# The Linux filesystem

> A Unix file is just a big bag of bytes, with no other attributes. In particular, there is no capability to 
> store information about the file type or a pointer to an associated application program outside the
> file's actual data.
>
> More generally, everything is a byte stream; even hardware devices are byte streams. This metaphor was a
> tremendous success of early Unix, and a real advance over a world in which (for example) compiled programs
> could not produce output that could be fed back to the compiler. Pipes and shell programming sprang from
> this metaphor.
>
> But Unix's byte-stream metaphor is so central that Unix has trouble integrating software objects with
> operations that don't fit neatly into the byte stream or file repertoire of operations (create, open, read, 
> write, delete). This is especially a problem for GUI objects such as icons, windows, and ‘live’ documents.
> 
> http://www.catb.org/~esr/writings/taoup/html/ch20s03.html

A filesystem is the part of an operating system that controls how data is stored and retrieved.
A filesystem is most often associated with physical storage media (disk, USB drive, optical media, 
magnetic tape, etc.). Most computer users are used to interacting with the filesystem using a
graphical file browser.

There are many different filesystems (https://en.wikipedia.org/wiki/List_of_file_systems). 
Windows filesystems include FAT, exFAT, and NTFS.
macOS uses APFS (Apple File System). Linux supports a multitude of filesystems; some common ones
include ext, ext2, ext3, ext4, ReiserFS, and XFS.

## Devices

A device is a physical piece of hardware that can store files, such as a disk drive, USB flash drive,
CD-ROM drive, or magnetic tape drive. In UNIX, each device corresponds to one or more special files called
device special files. A device special file is an interface to a device driver. Data can be read from and
written to the device through the file. Device special files are how UNIX models
devices as byte streams.
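
As a small illustration of the byte-stream model, the sketch below reads from and writes to two device special files. It assumes a typical Linux system where `/dev/null` and `/dev/zero` exist; these are pseudo-devices rather than physical hardware, but they are accessed through device special files in the same way and are safe to experiment with.

```shell
# A leading 'c' in the mode column marks a character device special file.
ls -l /dev/null /dev/zero

# Writing to /dev/null: the bytes are accepted and discarded.
echo "discarded" > /dev/null

# Reading from /dev/zero: an endless stream of zero bytes; take the first 16.
zero_bytes=$(head -c 16 /dev/zero | wc -c)
echo "bytes read from /dev/zero: $zero_bytes"
```

The same `>` and `|` operators used with ordinary files work unchanged here, which is the point of modelling devices as files.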

## Filesystems

A filesystem organizes data placed on a storage device. Data can be organized into files and files are
given names. Filesystems typically have directories which represent a collection of files having a collective
name. Most modern filesystems support nested directories, which allow hierarchies of directories.

The contents of a file are just bits. Imagine that an OS abstracts a storage device as a simple array. An
image of an empty storage device might look like an empty array:

![](./images/fs1.png)

Suppose that we write five files to the storage device. Using our array abstraction, an image of the
storage device might look like the following where each color corresponds to a different file:

![](./images/fs2.png)

Even with such a simplistic abstraction, the filesystem needs to store extra information so that it
can read the contents of a particular file. Take a moment to think about what extra information
needs to be stored. What other information might be useful?

## Inodes

UNIX uses *inodes* to store information about a file (its metadata). A partial list of the information stored
includes:

* the device where the inode resides
* file type
* user ID (owner of the file)
* file size
* time when the inode was last changed (some filesystems also record when the file was created)
* time when the file was last modified
* time when the file was last accessed
* and more (including where to find the file contents)

An inode has an integer number called the *inode number* that is a unique id for the inode on a given device.
Different devices can have inodes with the same inode number, so an inode number on its own does not identify
a file uniquely across the whole system; the device must also be known.
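
Much of the information listed above can be viewed with the `stat` command. The following is a minimal sketch that assumes the GNU coreutils version of `stat` found on most Linux distributions (the `-c` format option is not available in the BSD/macOS version). The scratch file is created only for the demonstration.

```shell
# Create a small scratch file to inspect.
scratch=$(mktemp)
printf 'hello\n' > "$scratch"

# Print selected inode fields using GNU stat format sequences.
stat -c 'device:   %d
inode:    %i
type:     %F
owner:    %U (uid %u)
size:     %s bytes
modified: %y
accessed: %x
changed:  %z' "$scratch"

# Individual fields can also be captured in variables.
inode_size=$(stat -c %s "$scratch")
inode_type=$(stat -c %F "$scratch")

rm "$scratch"
```

Note that the filename is supplied by us on the command line; it is not one of the fields stored in the inode.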

Notice that an inode does not store the filename of a file. Filenames are stored in a separate file
called a *directory*. A directory is a file whose contents are essentially a table mapping filenames to inode
numbers. For example, on the author's computer the directory containing the Bash notebooks looks like
the following:

| inode number | filename |
| :--- | :--- |
| 662400 | `images/` |
| 655381 | `scripts/` |
| 662411 | `animals.txt` |
| 655371 | `arithmetic.ipynb` |
| 657898 | `arrays.ipynb` |
| 661257 | `cli_intro.ipynb` |
| ... | |

In UNIX, ordinary files that contain data are called *regular files*. Directories are files, but they are not
regular files.
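
A table like the one above can be produced with the `-i` option of `ls`, and the shell's `[ -f ... ]` and `[ -d ... ]` tests distinguish regular files from directories. The sketch below works in a temporary directory with made-up names, so the inode numbers printed will differ from those in the table.

```shell
# Build a throwaway directory containing one subdirectory and one regular file.
demo=$(mktemp -d)
mkdir "$demo/images"
touch "$demo/animals.txt"

# Each output line is an inode number followed by a filename.
ls -i "$demo"
entries=$(ls -i "$demo" | wc -l)

# -d is true for directories, -f is true for regular files.
[ -d "$demo/images" ] && images_kind="directory"
[ -f "$demo/animals.txt" ] && animals_kind="regular file"
echo "images: $images_kind"
echo "animals.txt: $animals_kind"

rm -r "$demo"
```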

#### Hard links

A hard link is a directory entry that maps a filename to a file (inode). Every file has at least one hard link. The
table in the previous cell shows the hard links for the files in the directory containing this notebook.

It is possible to create additional hard links to a single file (Linux does not allow users to create additional
hard links to a directory). When this happens, multiple filenames all refer to the same file. If we edit the file
using one of the filenames and then save the results, then viewing the file with any of its alternate
filenames will show the edited file.
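
The behaviour described above can be tried with the `ln` command, which creates a hard link when used without options. This is a sketch in a temporary directory; the filenames `report.txt` and `alias.txt` are arbitrary.

```shell
demo=$(mktemp -d)
printf 'first draft\n' > "$demo/report.txt"

# Create a second hard link (a second filename) for the same inode.
ln "$demo/report.txt" "$demo/alias.txt"

# Both names show the same inode number.
ls -i "$demo"
ino_a=$(ls -i "$demo/report.txt" | awk '{print $1}')
ino_b=$(ls -i "$demo/alias.txt" | awk '{print $1}')

# Append through one name, then read through the other.
printf 'edited\n' >> "$demo/alias.txt"
contents=$(cat "$demo/report.txt")
echo "$contents"

rm -r "$demo"
```

Removing one of the two names would leave the file intact; the contents are released only when the last hard link is gone.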

See the *Files* notebook for additional information on hard links and symbolic links.

## Filesystem hierarchy standard

The [filesystem hierarchy standard](https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard)
defines the directory structure and directory contents for Linux distributions.
macOS uses a similar, but different, hierarchy.
We do not need to study this in detail but it is useful to know something about it.

There is a top-level (or uppermost) directory named `/` that contains all other directories. The
top-level directory is usually called *slash* or the *root* directory.

In a terminal, we can change to a directory using the `cd` built-in shell command. The following
will change the *current working directory* to the root directory:

In [None]:
cd /

The built-in shell command `pwd` will print the name of the current working directory:

In [None]:
pwd

The command `ls` (lowercase L followed by S, not the digit one followed by S) will list the files in a directory:

In [None]:
ls

There is a distinction between built-in shell commands and other commands.
Built-in commands are commands that are defined by the Bash shell.
Other commands are just programs that can be run by the Bash shell. To get documentation for a built-in
command use:

In [None]:
man builtins

or use the `help` built-in:

In [None]:
help cd

In [None]:
help pwd

For other commands, use the `man` command to obtain the documentation.

In [None]:
man ls
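
To find out whether a command is built in or a separate program, the `type` command can be used. The sketch below assumes Bash (or a similar shell); the exact wording and the path reported for `ls` vary between systems.

```shell
type cd     # reports that cd is a shell builtin
type pwd    # pwd is also a builtin
type ls     # typically reports the path of an executable file

# Capture the description of cd for later use.
cd_kind=$(type cd)
echo "$cd_kind"
```

This tells you where to look for documentation: `help` for built-ins, `man` for everything else.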

A (very) incomplete image of the directory structure described by the hierarchy standard is shown below:

![](./images/fs_hierarchy.png)

The standard states that the following directories are required in `/` (https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch03s02.html):

| Directory | Description |
| :--- | :--- |
| bin | Essential command binaries |
| boot | Static files of the boot loader |
| dev | Device files |
| etc | Host-specific system configuration |
| home | *Optional:* User home directories |
| lib | Essential shared libraries and kernel modules |
| media | Mount point for removable media |
| mnt | Mount point for mounting a filesystem temporarily |
| opt | Add-on application software packages |
| run | Data relevant to running processes |
| sbin | Essential system binaries |
| srv | Data for services provided by this system |
| tmp | Temporary files |
| usr | Secondary hierarchy |
| var | Variable data |

The contents of a specific directory can be listed by passing the directory name to the `ls` command as a
command line argument:

In [None]:
ls /bin
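
As a quick check of how closely a system follows the table above, the following sketch loops over the directory names and reports which ones exist. Minimal installations such as containers may legitimately lack some of them, so a "missing" line is not necessarily an error.

```shell
present=0
missing=0
for d in bin boot dev etc home lib media mnt opt run sbin srv tmp usr var; do
    if [ -d "/$d" ]; then
        echo "/$d: present"
        present=$((present + 1))
    else
        echo "/$d: missing"
        missing=$((missing + 1))
    fi
done
echo "$present present, $missing missing"
```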

## Absolute pathnames

A pathname is a name that identifies the location of a file in the directory hierarchy.
An absolute pathname includes all of the directory names that lead to the file, starting from the root directory.
Directories are separated by the `/` character. The following are some examples of
absolute pathnames:

| Absolute pathname | Description |
| :--- | :--- | 
| `/` | the root directory |
| `/bin` | the `bin` directory located in the root directory |
| `/bin/ls` | the `ls` file located in the `/bin` directory |
| `/home/cisc220/CISC220/notes/bash/filesystem.ipynb` | the absolute pathname of this notebook file |


## Relative pathnames

A relative pathname includes all of the directory names that lead to the file starting from the current
working directory. The following are some examples of relative pathnames assuming that the working
directory is `/`:

| Relative pathname | Description |
| :--- | :--- | 
| `.` | the root directory |
| `bin` | the `bin` directory located in the root directory |
| `bin/ls` | the `ls` file located in the `/bin` directory |

The following are some examples of relative pathnames assuming that the working
directory is `/home/cisc220/CISC220`:

| Relative pathname | Description |
| :--- | :--- | 
| `..` | the parent directory `/home/cisc220` |
| `notes` | the `notes` directory located in the directory `/home/cisc220/CISC220` |
| `notes/bash/filesystem.ipynb` | the relative pathname of this notebook file located in the directory `/home/cisc220/CISC220` |
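
The practical rule for telling the two kinds of pathname apart is that an absolute pathname starts with `/` and a relative pathname does not. A minimal sketch of that rule follows; `classify` is a hypothetical helper written for this example, not a standard command.

```shell
# classify PATHNAME: print "absolute" if the pathname starts with /,
# otherwise print "relative".
classify() {
    case $1 in
        /*) echo absolute ;;
        *)  echo relative ;;
    esac
}

classify /bin/ls                        # absolute
classify bin/ls                         # relative
classify ..                             # relative
classify notes/bash/filesystem.ipynb    # relative
```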

Most commands that accept a pathname will accept both absolute and relative paths. For example, if the
current working directory is the root directory, then we can list the contents of any other directory using a relative path:

In [None]:
# enter cd / if you've changed to a different directory
ls home

In [None]:
# enter cd / if you've changed to a different directory
ls usr

In [None]:
# enter cd / if you've changed to a different directory
ls home/cisc220
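
A relative pathname names different things depending on the current working directory, whereas an absolute pathname works from anywhere. The sketch below shows the same directory reached three ways. It uses a temporary directory tree with made-up names and finishes by changing to `/` before cleaning up.

```shell
# Build a throwaway tree: <top>/notes/bash/example.txt
top=$(mktemp -d)
mkdir -p "$top/notes/bash"
touch "$top/notes/bash/example.txt"

cd "$top"
from_top=$(ls notes/bash)           # relative to the top of the tree

cd notes
from_notes=$(ls bash)               # same directory, shorter relative path

absolute=$(ls "$top/notes/bash")    # absolute path, independent of where we are

echo "$from_top"
echo "$from_notes"
echo "$absolute"

cd /
rm -r "$top"
```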

## More about the `cd` command

`cd` on its own will change to the current user's home directory:

In [None]:
cd
pwd

The symbol `~` expands to the current user's home directory (more precisely, to the value of the `HOME` variable, which can be changed):

In [None]:
cd ~
pwd
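
The link between `~` and the `HOME` variable can be seen in a short sketch. The change to `HOME` is made inside a command substitution, which runs in a subshell, so it does not affect the rest of the session; `/tmp` is used only as an example value.

```shell
# By default ~ expands to the value of HOME.
tilde_default=$(echo ~)
echo "default: $tilde_default"

# Changing HOME changes what ~ expands to (only inside this subshell).
tilde_changed=$(HOME=/tmp; echo ~)
echo "changed: $tilde_changed"

# Inside quotes, ~ is not expanded at all.
tilde_quoted=$(echo "~")
echo "quoted:  $tilde_quoted"
```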

The symbol `.` represents the name of the current working directory:

In [None]:
cd
pwd
cd .
pwd

The symbol `..` represents the name of the parent directory of the current working directory:

In [None]:
cd
pwd
cd ..
pwd

You can move up multiple directories by using multiple `..`s:

In [None]:
cd
pwd
cd ../..
pwd
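
The effect of `.` and `..` is easier to follow in a directory tree whose shape is known in advance. The sketch below builds a throwaway tree `a/b/c` in a temporary directory (the names are arbitrary) and records where each `cd` ends up.

```shell
# Start in a fresh temporary directory and remember its path.
cd "$(mktemp -d)"
top=$PWD
mkdir -p a/b/c

cd a/b/c
cd .                 # no movement: . is the current directory
same_dir=$PWD
cd ..                # up one level, to a/b
up_one=$PWD
cd ../..             # up two more levels, back to the top of the tree
up_three=$PWD

echo "$same_dir"
echo "$up_one"
echo "$up_three"

cd /
rm -r "$top"
```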

If you have been navigating through the directory structure and want to go back to the previous working 
directory, then you can use `cd -`:

In [None]:
cd /usr
pwd
cd         # should now be in your home directory
pwd
cd -       # should now be back in /usr, should print /usr

`cd -` prints the absolute path of the directory that it changes to. You can suppress the output using `cd ~-`, where `~-` expands to the previous working directory:

In [None]:
cd /usr
pwd
cd         # should now be in your home directory
pwd
cd ~-      # should now be back in /usr
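
Behind both forms is the shell variable `OLDPWD`, which `cd` updates every time the working directory changes. The sketch below assumes that `/usr` exists, as it does on typical Linux systems, and shows two further ways of returning to the previous directory quietly.

```shell
cd /usr
cd /
previous=$OLDPWD          # the directory we were in before the last cd
echo "previous directory: $previous"

cd "$OLDPWD"              # go back using the variable directly; prints nothing
back_via_variable=$PWD

cd /
cd - > /dev/null          # go back with cd -, discarding the printed path
back_via_dash=$PWD

echo "$back_via_variable"
echo "$back_via_dash"
```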