# A look at ext4

In this Jupyter notebook I want to give an interactive "hello world" walkthrough of how data is organized within ext4, hoping that in the process I will also find out a bit about how it works.

**NOTE**: this notebook attempts to mount the created filesystem, which will require either root privileges or passwordless sudo. You might want to run `sudo true` in your terminal and restart Jupyter before running this.

## Creating the filesystem

One can create an ext4 filesystem on a file (which can also be a block device, such as a hard drive) using the `mkfs.ext4` command. Let's create such a file and take a look at what's there:

In [None]:
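from subprocess import check_output, STDOUT, CalledProcessError
# C(cmd): run a shell command and print its combined stdout/stderr
C = lambda *a, **kw: print(check_output(*a, **kw, stderr=STDOUT, shell=True).decode('ascii').rstrip())
# 128 blocks of 1024 zero bytes (131072 bytes in total), then put an ext4 layout on them
C("dd if=/dev/zero bs=1024 count=128 of=fs.img && mkfs.ext4 fs.img")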

As we can see from `mkfs.ext4`'s output, the file that used to contain 131072 zero bytes now has an ext4 layout initialized on it. It's probably worth noting that we got a warning that the filesystem is `too small for a journal`, which means that at least one feature (the journal) was disabled. Assuming that fs.img is still a valid filesystem, I won't bother figuring out what difference the lack of a journal makes, at least not for now. It also appears that we have room for 16 inodes, so I would guess I can't put more than 16 files in there (probably fewer, since ext4 reserves some inode numbers for itself). Before we try to mount the filesystem, let's take a brief look at the hexdump of the file (`*` means "previous row repeated"):

In [None]:
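from html import escape
from IPython.display import display, HTML
from subprocess import check_output
hexdump_output = check_output('hexdump -C fs.img', shell=True).decode('ascii')
# Escape first: the ASCII column may contain <, > or & characters
hexdump_output = escape(hexdump_output)
# Highlight zero bytes in red
hexdump_output = hexdump_output.replace(' 00', '<span style="color: red"> 00</span>')
display(HTML('<pre>%s</pre>' % hexdump_output))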

It appears that we're mostly looking at `\x00` and `\xFF` bytes; let's verify that:

In [None]:
import zlib
def see_file_size(fname):
    with open(fname, 'rb') as f:
        fs_img = f.read()
    b0x00_cnt = fs_img.count(b'\x00')
    b0xff_cnt = fs_img.count(b'\xff')
    other_count = len(fs_img) - b0x00_cnt - b0xff_cnt
    print("00=%d, ff=%d, other=%d" % (b0x00_cnt, b0xff_cnt, other_count))
    print("The file would compress with zlib to %dB." % len(zlib.compress(fs_img)))
see_file_size('fs.img')

About 600 bytes of actual data... hopefully that means there isn't much to figure out at this point! Now let's mount the filesystem to check that it's valid and actually contains something. We'll also need to back it up first if we want to find out whether merely mounting it changes its contents.

In [None]:
from subprocess import check_output, STDOUT, CalledProcessError
C = lambda *a, **kw: print(check_output(*a, **kw, stderr=STDOUT, shell=True).decode('ascii'), end='')
try:
    should_raise = False
    C('SUDO_ASKPASS=/bin/false sudo -A true')
except CalledProcessError:
    should_raise = True  # otherwise we would get a long stack trace
if should_raise:
    raise RuntimeError("This will not work without passwordless sudo.")

C('cp fs.img fs_empty.img')
C('mkdir -p out && sudo mount fs.img out')
C('ls -lra out')
C('sudo umount out')

Looks like we've only found the `lost+found` directory, which was also the only recognizable name we could spot in the hexdump. Now, has the file changed after mounting?

In [None]:
# Compare before (fs_empty.img) with after (fs.img), so "+" lines show what mounting wrote
! bash -c 'diff -u <( hexdump -C fs_empty.img ) <( hexdump -C fs.img ) > after_mounting.diff'

from IPython.display import display, HTML
from pygments import highlight
from pygments.lexers import DiffLexer
from pygments.formatters import HtmlFormatter

display(HTML('<style>%s</style>' % HtmlFormatter().get_style_defs('.highlight')))
with open('after_mounting.diff') as f:
    highlighted = highlight(f.read(), DiffLexer(), HtmlFormatter())
    display(HTML(highlighted))
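The hexdump diff works on 16-byte lines, which hides exactly which bytes changed. A minimal sketch of a byte-by-byte comparison of the backup and the mounted image, assuming the 1 KiB block size `mkfs.ext4` picked here (`changed_ranges` is just my own helper name), could map each change to the block it falls into:

```python
import os

BLOCK_SIZE = 1024  # assumption: the 1 KiB block size mkfs.ext4 chose for this image


def changed_ranges(before: bytes, after: bytes):
    """Yield (start, end) byte offsets (end exclusive) of runs where the images differ."""
    if len(before) != len(after):
        raise ValueError('the images should have the same size')
    start = None
    for i, (a, b) in enumerate(zip(before, after)):
        if a != b and start is None:
            start = i          # a run of differing bytes begins here
        elif a == b and start is not None:
            yield start, i     # the run ended just before offset i
            start = None
    if start is not None:      # a run that lasts until the end of the file
        yield start, len(before)


if os.path.exists('fs_empty.img') and os.path.exists('fs.img'):
    with open('fs_empty.img', 'rb') as f:
        before = f.read()
    with open('fs.img', 'rb') as f:
        after = f.read()
    for start, end in changed_ranges(before, after):
        print('bytes 0x%05x-0x%05x (%d changed) in block(s) %d-%d'
              % (start, end - 1, end - start, start // BLOCK_SIZE, (end - 1) // BLOCK_SIZE))
```

A single logical change (say, a timestamp) can show up as several short runs if some of its bytes happen to stay the same.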

It looks like we observed at least three changes, which is something I will probably explore later. While looking for a description of the ext4 header, I found a note about `dumpe2fs`, which gives some interesting output:

In [None]:
! dumpe2fs fs.img

This is confusing to me: it says that the superblock is at block \#1 and the block size is 1024, so unless I'm getting something wrong, that block should be a bunch of zeros... unless blocks are zero-indexed?

In [None]:
! dd if=fs.img bs=1024 count=1 | hexdump -C
! echo ">>>>>>TRYING SECOND BLOCK"
! dd if=fs.img bs=1024 count=1 skip=1 | hexdump -C
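Blocks are indeed numbered from zero: block N starts at byte N * block size, so with 1024-byte blocks block \#1 covers bytes 1024-2047, while block \#0 is left as padding for boot code. The superblock always starts at byte 1024 whatever the block size (with bigger blocks it sits inside block \#0 and `s_first_data_block` is 0). A minimal sketch that confirms this by checking the `0xEF53` magic at offset `0x38` of the superblock and reading a few fields at their fixed offsets, assuming the offsets from the ext4 disk layout documentation (`read_superblock_basics` is a hypothetical helper of mine):

```python
import os
import struct

SUPERBLOCK_OFFSET = 1024  # the superblock starts 1024 bytes in, whatever the block size
EXT4_MAGIC = 0xEF53       # s_magic, shared by ext2/ext3/ext4


def read_superblock_basics(image: bytes) -> dict:
    """Read a few superblock fields directly from their little-endian offsets."""
    sb = image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    (magic,) = struct.unpack_from('<H', sb, 0x38)
    if magic != EXT4_MAGIC:
        raise ValueError('no ext2/3/4 magic at byte 1024 (found 0x%04x)' % magic)
    inodes_count, blocks_count_lo = struct.unpack_from('<II', sb, 0x00)
    first_data_block, log_block_size = struct.unpack_from('<II', sb, 0x14)
    return {
        'inodes_count': inodes_count,
        'blocks_count_lo': blocks_count_lo,
        'first_data_block': first_data_block,  # 1 with 1 KiB blocks, 0 otherwise
        'block_size': 1024 << log_block_size,
    }


if os.path.exists('fs.img'):
    with open('fs.img', 'rb') as f:
        print(read_superblock_basics(f.read()))
```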

Yes, the second block looks much more promising. While trying to interpret it I came across [this kernel.org wiki entry](https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout) and decided to write some code to extract some data based on its description:

In [None]:
import requests
from lxml import html

# Visit the website, fetch its HTML and parse it
t = html.fromstring(requests.get('https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout').text)

In [None]:
import struct
from collections import OrderedDict

field_sizes = {'__le16': 2, '__le32': 4, '__le64': 8, '__u8': 1, '__u32': 4, 'char': 1}
field_types = {'__le16': 'H', '__le32': 'I', '__le64': 'Q', '__u8': 'B', '__u32': 'I'}


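def parse_ext4_description_at_offset_f(f, section_id):
    """Parse the structure described by the wiki table in section `section_id`,
    starting at the current position of the open binary file `f`."""
    ret = OrderedDict()
    started = f.tell()

    # Find the first table after the <h1> whose <span> has the given id,
    # then iterate over every row other than the header row
    table_xpath = '//h1/span[@id="%s"]/..//following::table[1]/tr' % section_id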
    for row in t.xpath(table_xpath)[1:]:
        if len(row) != 4:
            continue

        row_texts = [td.text_content().strip() for td in row]
        offset, field_type, field_name, description = row_texts
        offset = int(offset, 16)
        
        if 'bytes' in field_type:
            # This is here because "Inode table" has nested types inside
            # and I just want to show them raw
            field_cnt = int(field_type.split(' ')[0])
            field_type = 'char'
        elif '[' not in field_name:
            field_cnt = 1
        else:
            field_cnt = int(field_name.split('[')[1].split(']')[0])

        field_size = field_sizes[field_type] * field_cnt

        f.seek(started + offset)
        field_value = f.read(field_size)

        if field_type == '__u8' and field_size > 1:
            field_type = 'char'

        if field_type == 'char':
            try:
                field_value = field_value.decode('ascii')
            except UnicodeDecodeError:
                pass
        else:
            field_format = '<' + field_types[field_type] * field_cnt
            field_value = struct.unpack(field_format, field_value)
            
        ret[field_name] = field_value
    return ret

def parse_ext4_description_at_offset(skip, section_id):
    with open('fs.img', 'rb') as f:
        f.seek(skip)
        return parse_ext4_description_at_offset_f(f, section_id)

parse_ext4_description_at_offset(1024, 'The_Super_Block')

Yay! Looks like we managed to parse something! I still have no idea how to find the first inode, though. How about we cheat a bit and interpret `dumpe2fs`'s `First inode: 11` as a hint that we should be looking at the 12th block (or one of its neighbours)?

In [None]:
(
    parse_ext4_description_at_offset(1024 * 12, 'Inode_Table') ==
    parse_ext4_description_at_offset(1024 * 11, 'Inode_Table') == 
    parse_ext4_description_at_offset(1024 * 10, 'Inode_Table') ==
    parse_ext4_description_at_offset(1024 * 9, 'Inode_Table')
)

Either we have four copies of the first inode or - what's much more likely - we're out of luck with this trick: identical parse results for four different blocks suggest that none of them holds an inode at all. Sadly, it looks like I'll have to read the spec (which is boring) or find another way. I first came across https://github.com/skeledrew/ext4-raw-reader, but it was some ugly Python 2 code; then I tried https://github.com/tegrak/Fricando/tree/master/ext4img-parser, which didn't get me anywhere either. Then I decided to switch to my second favourite language, Rust (which I'm less fluent in), and after a bit of hacking I managed to run the tests of [ext4-rs](https://github.com/FauxFaux/ext4-rs) against this empty filesystem. Here's the proof:

```
<2> all-types-tiny.img: Directory([DirEntry { inode: 2, file_type: Directory, name: "." }, DirEntry { inode: 2, file_type: Directory, name: ".." }, DirEntry { inode: 11, file_type: Directory, name: "lost+found" }]) Stat { extracted_type: Directory, file_mode: 493, uid: 0, gid: 0, size: 1024, atime: Time { epoch_secs: 1527420384, nanos: None }, ctime: Time { epoch_secs: 1527420384, nanos: None }, mtime: Time { epoch_secs: 1527420384, nanos: None }, btime: None, link_count: 3, xattrs: {} }
<11> all-types-tiny.img/lost+found: Directory([DirEntry { inode: 11, file_type: Directory, name: "." }, DirEntry { inode: 2, file_type: Directory, name: ".." }]) Stat { extracted_type: Directory, file_mode: 448, uid: 0, gid: 0, size: 12288, atime: Time { epoch_secs: 1527420384, nanos: None }, ctime: Time { epoch_secs: 1527420384, nanos: None }, mtime: Time { epoch_secs: 1527420384, nanos: None }, btime: None, link_count: 2, xattrs: {} }
```
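The `file_mode` values in that output are plain decimal numbers and appear to contain only the permission bits (the type is reported separately as `extracted_type`). Python's `stat` module can turn them back into the familiar `ls -l` notation once the directory type bits are added back; `describe_dir_mode` is a hypothetical helper name:

```python
import stat


def describe_dir_mode(permission_bits: int) -> str:
    """Render a directory's permission bits the way `ls -l` would."""
    # stat.filemode() expects a full mode, so add the directory type bits (0o040000)
    return stat.filemode(stat.S_IFDIR | permission_bits)


for name, mode in [('/', 493), ('lost+found', 448)]:
    print(name, oct(mode), describe_dir_mode(mode))
```

This should print `0o755 drwxr-xr-x` for the root directory and `0o700 drwx------` for `lost+found`.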

I took a leap of faith and decided to read its code. I found that in order to get to an inode, I probably need to parse the block group descriptors, which are in the next 1024 bytes, so let's see whether `parse_ext4_description_at_offset` gets me anywhere:

In [None]:
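from pprint import pprint
sblk = parse_ext4_description_at_offset(1024, 'The_Super_Block')
# Number of block *groups*, not blocks
group_count = 1  # TODO: got this from ext4-rs, was nontrivial to calculate
# Descriptors are s_desc_size bytes with the 64bit feature (INCOMPAT_64BIT = 0x80), else 32
desc_size = sblk['s_desc_size'][0] if sblk['s_feature_incompat'][0] & 0x80 else 32
groups = []
with open('fs.img', 'rb') as f:
    for i in range(group_count):
        # The descriptor table starts in the block after the superblock (block 2 here)
        f.seek(2048 + i * desc_size)
        group = parse_ext4_description_at_offset_f(f, 'Block_Group_Descriptors')
        groups.append(group)
        pprint(group)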
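As for the hardcoded number of block groups in the cell above, it looks like it can be derived from the superblock. A minimal sketch, assuming the usual formula (the blocks after `s_first_data_block`, divided by `s_blocks_per_group` and rounded up) and the field offsets from the ext4 disk layout documentation; `block_group_count` is my own name. For this image it should come out as 1:

```python
import os
import struct

INCOMPAT_64BIT = 0x80  # s_feature_incompat bit: block counts may exceed 32 bits


def block_group_count(sb: bytes) -> int:
    """Number of block groups, computed from the raw 1024-byte superblock."""
    (blocks_lo,) = struct.unpack_from('<I', sb, 0x04)         # s_blocks_count_lo
    (first_data_block,) = struct.unpack_from('<I', sb, 0x14)  # s_first_data_block
    (blocks_per_group,) = struct.unpack_from('<I', sb, 0x20)  # s_blocks_per_group
    (incompat,) = struct.unpack_from('<I', sb, 0x60)          # s_feature_incompat
    blocks_hi = 0
    if incompat & INCOMPAT_64BIT:
        (blocks_hi,) = struct.unpack_from('<I', sb, 0x150)    # s_blocks_count_hi
    blocks_count = (blocks_hi << 32) | blocks_lo
    # Ceiling division: the last group may be only partially filled
    return -(-(blocks_count - first_data_block) // blocks_per_group)


if os.path.exists('fs.img'):
    with open('fs.img', 'rb') as f:
        f.seek(1024)
        print(block_group_count(f.read(1024)))
```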

It seems that the inode table starts at block 35 (`bg_inode_table_lo`), so that's where we should be looking for the inode. Let's try:

In [None]:
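# Block size is 1024 << s_log_block_size (1024 for this image)
block_size = 1024 << sblk['s_log_block_size'][0]
# Inodes are numbered from 1; s_first_ino (11) is lost+found's inode here
group, index = divmod(sblk['s_first_ino'][0] - 1, sblk['s_inodes_per_group'][0])
offset = block_size * groups[group]['bg_inode_table_lo'][0] + index * sblk['s_inode_size'][0]
parse_ext4_description_at_offset(offset, 'Inode_Table')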

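Yay! We've got a sane `i_mode` (`oct(16832) == '0o40700'`, i.e. a directory with `rwx------` permissions, which matches the `file_mode: 448` that ext4-rs reported for `lost+found`), and it looks like we're well on our way towards actually reading the directory's contents. We know that `i_size_lo = 12288`, but where exactly is that data stored, and how do we parse it?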
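A hedged sketch of where to look next, based on my reading of the ext4 disk layout documentation: on ext4 the 60-byte `i_block` field (offset `0x28` in the inode) usually holds an extent tree instead of a list of block numbers, starting with a 12-byte header whose magic is `0xF30A`. For a small directory like `lost+found` the tree is presumably a single leaf, where each 12-byte extent maps a run of logical blocks to physical ones, and the directory blocks themselves contain linear `ext4_dir_entry_2` records (inode, `rec_len`, `name_len`, file type, name). The helper names below are my own; the sketch ignores `bg_inode_table_hi` and does not follow index nodes (depth > 0):

```python
import os
import struct

EXT4_EXTENTS_FL = 0x80000  # i_flags bit: i_block holds an extent tree
EXTENT_MAGIC = 0xF30A      # eh_magic of every extent tree node
INCOMPAT_64BIT = 0x80


def read_inode(img: bytes, ino: int):
    """Return (raw inode bytes, block size) for inode number `ino`."""
    sb = img[1024:2048]
    block_size = 1024 << struct.unpack_from('<I', sb, 0x18)[0]
    (first_data_block,) = struct.unpack_from('<I', sb, 0x14)
    (inodes_per_group,) = struct.unpack_from('<I', sb, 0x28)
    (inode_size,) = struct.unpack_from('<H', sb, 0x58)
    (incompat,) = struct.unpack_from('<I', sb, 0x60)
    (desc_size,) = struct.unpack_from('<H', sb, 0xFE)
    if not incompat & INCOMPAT_64BIT:
        desc_size = 32
    group, index = divmod(ino - 1, inodes_per_group)  # inode numbers start at 1
    gdt = (first_data_block + 1) * block_size         # descriptors follow the superblock
    (table,) = struct.unpack_from('<I', img, gdt + group * desc_size + 0x8)  # bg_inode_table_lo
    start = table * block_size + index * inode_size
    return img[start:start + inode_size], block_size


def leaf_extents(inode: bytes):
    """Return (logical block, length, physical block) for each extent of a depth-0 tree."""
    (i_flags,) = struct.unpack_from('<I', inode, 0x20)
    if not i_flags & EXT4_EXTENTS_FL:
        raise ValueError('this inode does not use extents')
    i_block = inode[0x28:0x28 + 60]
    magic, entries, _max, depth = struct.unpack_from('<HHHH', i_block, 0)
    if magic != EXTENT_MAGIC:
        raise ValueError('bad extent header magic 0x%04x' % magic)
    if depth != 0:
        raise NotImplementedError('index nodes are not handled in this sketch')
    extents = []
    for i in range(entries):
        # 12-byte leaf entries follow the 12-byte header
        ee_block, ee_len, start_hi, start_lo = struct.unpack_from('<IHHI', i_block, 12 + 12 * i)
        if ee_len > 32768:  # uninitialized extent; the real length is ee_len - 32768
            ee_len -= 32768
        extents.append((ee_block, ee_len, (start_hi << 32) | start_lo))
    return extents


def dir_entries(block: bytes):
    """Yield (inode, file_type, name) for each linear ext4_dir_entry_2 in a block."""
    pos = 0
    while pos + 8 <= len(block):
        inode, rec_len, name_len, file_type = struct.unpack_from('<IHBB', block, pos)
        if rec_len < 8:
            break  # corrupt or unexpected record; stop instead of looping forever
        if inode != 0:  # inode 0 marks unused space (e.g. a checksum tail)
            yield inode, file_type, block[pos + 8:pos + 8 + name_len].decode('utf-8', 'replace')
        pos += rec_len


if os.path.exists('fs.img'):
    with open('fs.img', 'rb') as f:
        img = f.read()
    inode, block_size = read_inode(img, 11)  # 11 is lost+found on this image
    for logical, length, physical in leaf_extents(inode):
        for blk in range(physical, physical + length):
            found = list(dir_entries(img[blk * block_size:(blk + 1) * block_size]))
            if found:
                print(logical + blk - physical, found)
```

If this reading of the layout is right, the first block of `lost+found` should list `.` (inode 11) and `..` (inode 2), just like ext4-rs did, and the remaining blocks should be empty.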