# A look at ext4

In this Jupyter notebook I want to give an interactive "hello world" walkthrough of how data is organized within ext4, hoping that in the process I will also find out a bit about how it works.

**NOTE**: this notebook attempts to mount the created filesystem, which will require either root privileges or passwordless sudo. You might want to run `sudo true` in your terminal and restart Jupyter before running this.

## Creating the filesystem

One can create an ext4 filesystem in a regular file (or on a block device, such as a hard drive) using the `mkfs.ext4` command. Let's create such a file and take a look at what's there:

In [None]:
from subprocess import check_output, STDOUT, CalledProcessError
C = lambda *a, **kw: print(check_output(*a, **kw, stderr=STDOUT, shell=True).decode('ascii'))
C("dd if=/dev/zero bs=1024 count=128 of=fs.img && mkfs.ext4 fs.img")

As we can see from `mkfs.ext4`'s output, we now have a file that once contained 131072 zero bytes (128 blocks of 1024 bytes) and now holds an initialized ext4 layout. It's probably worth noting the warning that the filesystem is `too small for a journal`, which means that at least one feature was disabled. Since fs.img should already be a valid filesystem, I won't bother to figure out what difference the lack of a journal makes, at least not for now. It also appears that we have room for 16 inodes, so I would guess I can't put more than 16 files in there. Before we try to mount the filesystem, let's take a brief look at the hexdump of the file (`*` means "row repeated"):

In [None]:
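from html import escape
from IPython.display import display, HTML
from subprocess import check_output
hexdump_output = check_output('hexdump -C fs.img', shell=True).decode('ascii')
# Escape first: the ASCII column of `hexdump -C` may contain '<' or '&'.
hexdump_output = escape(hexdump_output)
hexdump_output = hexdump_output.replace(' 00', '<span style="color: red"> 00</span>')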
display(HTML('<pre>%s</pre>' % hexdump_output))
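
The first structure worth locating in that dump is the superblock, which ext2/3/4 place at byte offset 1024. As a minimal sketch (the helper name `parse_superblock` is my own, and only a handful of fields are decoded, with offsets taken from the kernel's ext4 on-disk layout documentation), the inode count reported by `mkfs.ext4` can be read back like this:

```python
import os
import struct

SUPERBLOCK_OFFSET = 1024  # the superblock starts 1024 bytes into the image
EXT4_MAGIC = 0xEF53       # s_magic, shared by ext2, ext3 and ext4

def parse_superblock(image):
    """Decode a few little-endian superblock fields from raw image bytes.

    Hypothetical helper for illustration; it ignores most of the superblock.
    """
    sb = image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    # s_inodes_count and s_blocks_count_lo (low 32 bits only)
    inodes_count, blocks_count = struct.unpack_from('<II', sb, 0x00)
    # s_free_blocks_count_lo and s_free_inodes_count
    free_blocks, free_inodes = struct.unpack_from('<II', sb, 0x0C)
    # s_log_block_size: the block size is 1024 shifted left by this value
    (log_block_size,) = struct.unpack_from('<I', sb, 0x18)
    (magic,) = struct.unpack_from('<H', sb, 0x38)
    if magic != EXT4_MAGIC:
        raise ValueError('no ext2/3/4 magic found: 0x%04x' % magic)
    return {
        'inodes_count': inodes_count,
        'blocks_count': blocks_count,
        'free_blocks_count': free_blocks,
        'free_inodes_count': free_inodes,
        'block_size': 1024 << log_block_size,
    }

# Only runs when the image from the earlier cell is present.
if os.path.exists('fs.img'):
    with open('fs.img', 'rb') as f:
        print(parse_superblock(f.read()))
```

If the offsets are right, `inodes_count` should match the number printed by `mkfs.ext4`, and `free_inodes_count` hints at how many files can still be created.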

It appears that we're mostly looking at `\x00` and `\xFF` bytes; let's verify that:

In [None]:
import zlib
def see_file_size(fname):
    with open(fname, 'rb') as f:
        fs_img = f.read()
    b0x00_cnt = fs_img.count(b'\x00')
    b0xff_cnt = fs_img.count(b'\xff')
    other_count = len(fs_img) - b0x00_cnt - b0xff_cnt
    print("00=%d, ff=%d, other=%d" % (b0x00_cnt, b0xff_cnt, other_count))
    print("The file would compress with zlib to %dB." % len(zlib.compress(fs_img)))
see_file_size('fs.img')
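
To see where the bytes that are neither `\x00` nor `\xff` actually sit, one option is to group them into contiguous runs. This is only a sketch: `interesting_runs` and its `gap` parameter are names I made up for illustration, and the 1024-byte block size used when printing is an assumption that fits this tiny image.

```python
import os

def interesting_runs(data, gap=16):
    """Return (start, end) offsets of runs of bytes other than 0x00 and 0xff.

    Runs separated by fewer than `gap` filler bytes are merged; `end` is exclusive.
    """
    runs = []
    for offset, byte in enumerate(data):
        if byte in (0x00, 0xFF):
            continue  # filler byte, not interesting
        if runs and offset - runs[-1][1] < gap:
            runs[-1][1] = offset + 1           # extend the current run
        else:
            runs.append([offset, offset + 1])  # start a new run
    return [tuple(r) for r in runs]

# Only runs when the image from the earlier cell is present.
if os.path.exists('fs.img'):
    with open('fs.img', 'rb') as f:
        for start, end in interesting_runs(f.read()):
            # Assumes 1 KiB blocks when naming the block a run starts in.
            print('0x%05x-0x%05x (%4d bytes, starts in block %d)'
                  % (start, end, end - start, start // 1024))
```

Each printed range is a candidate on-disk structure (superblock, group descriptors, inode table, directory data) to look at more closely.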

Less than 650 bytes of actual data... hopefully that's not too much to figure out at this point! Now let's mount the image to verify that it's valid and see whether it actually contains anything. We'll also back it up first, so that we can check whether just mounting it changes the contents.

In [None]:
from subprocess import check_output, STDOUT, CalledProcessError
C = lambda *a, **kw: print(check_output(*a, **kw, stderr=STDOUT, shell=True).decode('ascii'))
try:
    C('SUDO_ASKPASS=/bin/false sudo -A true')
except CalledProcessError:
    # `from None` hides the chained CalledProcessError, keeping the stack trace short
    raise RuntimeError("This will not work without passwordless sudo.") from None

C('cp fs.img fs_empty.img')
C('mkdir -p out && sudo mount fs.img out')
C('ls -lra out')
C('sudo umount out')

In [None]:
# Backup first, mounted image second, so that "+" lines show what mounting changed.
! bash -c 'diff -u <( hexdump -C fs_empty.img ) <( hexdump -C fs.img ) > after_mounting.diff'

from pygments import highlight
from pygments.lexers import DiffLexer
from pygments.formatters import HtmlFormatter

display(HTML('<style>%s</style>' % HtmlFormatter().get_style_defs('.highlight')))
with open('after_mounting.diff') as f:
    highlighted = highlight(f.read(), DiffLexer(), HtmlFormatter())
    display(HTML(highlighted))
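
The hexdump diff shows what changed, but not how much. A rough complement, assuming both images have the same length (`changed_ranges` is a hypothetical helper, not part of any library), is to compare the files byte by byte and report the ranges that differ:

```python
import os

def changed_ranges(old, new):
    """Return (start, end) offsets, end exclusive, where two equal-length byte strings differ."""
    if len(old) != len(new):
        raise ValueError('images differ in size: %d vs %d' % (len(old), len(new)))
    ranges = []
    start = None
    for offset, (a, b) in enumerate(zip(old, new)):
        if a != b and start is None:
            start = offset                  # a differing range begins here
        elif a == b and start is not None:
            ranges.append((start, offset))  # the range ended at the previous byte
            start = None
    if start is not None:
        ranges.append((start, len(old)))    # a range reaching the end of the data
    return ranges

# Only runs when both images from the earlier cells are present.
if os.path.exists('fs_empty.img') and os.path.exists('fs.img'):
    with open('fs_empty.img', 'rb') as f_old, open('fs.img', 'rb') as f_new:
        for start, end in changed_ranges(f_old.read(), f_new.read()):
            print('0x%05x-0x%05x: %d byte(s) changed' % (start, end, end - start))
```

The offsets can then be matched against the hexdump rows to see which structures the mount touched.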