In [2]:
%run -i ../python/common.py
publish=False

if not publish:
    # cleanup any old state
    bashCmds('''rm -rf bad-link
    rm -rf loopy''')
else:
    bashCmds('''rm -rf ~/*''')
    
closeAllOpenTtySessions()
bash = BashSession()

generated="~/myfile ~/errors ~/mydate ~/mydir ~/mynewdir ~/out"

# to keep a command out of the shell history, put a space at the beginning of the line
#bash.run(" history")

(cont:fs:interface)= 
# Interface

## Objects

General-purpose operating systems typically provide access to block
storage (i.e. disks) via a *file system*, which provides a much more
application- and user-friendly interface to storage. From the point of
view of the user, a file system contains the following elements:


- a *name space*, the set of names identifying objects;
- *objects* such as the files themselves as well as directories and other supporting objects;
- *operations* on these objects.

[^hier]: Very early file systems sometimes had a single flat directory per user or, like MS-DOS 1.0, a single directory per floppy disk.

**Hierarchical namespace:** File systems have traditionally used a
tree-structured namespace[^hier], as shown in {numref}`fs:tree-logical`. 
This tree is constructed via the use of
*directories*, or objects in the namespace which map strings to further
file system objects. A full filename thus specifies a *path* from the
root, through the tree, to the object (a file or directory) itself.
(Hence the use of the term "path" to mean "filename" in Unix
documentation.)

```{figure} ../images/pb-figures/fs/filesys-tree.png
---
width: 45%
name: fs:tree-logical
---
Logical view: hierarchical file system name space
```

**File:** Early operating systems supported many different file
types, including binary executables, text files, and record-structured
files. The Unix operating system is the earliest I know of that
restricted files to sequences of 8-bit bytes; it is probably not a
coincidence that Unix arrived at the same time as computers which dealt
only with multiples of 8-bit bytes (e.g. 16 and 32-bit words), replacing
older systems which frequently used odd word sizes such as 36 bits.
(Note that a machine with 36-bit instructions already needs two
incompatible types of files, one for text and one for executable code.)


% an attempt to put side by side; didn't work
% :::{figure-md} fig:filesys:tree
% ![alt](../images/pb-figures/fs/filesys-tree.png) ![alt](../images/pb-figures/fs/filesys-tree2.png)
% 
% Logical (left) and implementation (right) view of a hierarchical file system name space.
% :::


```{figure} ../images/pb-figures/fs/filesys-tree2.png
---
width: 40% 
name: fs:tree-imp
---
Implementation view: hierarchical file system name space. Gray blocks are directories that contain entries that point to files.
```

Modern operating systems follow the UNIX model, which imposes no
structure on a file---a file is merely a sequence of bytes.[^simple] Any
structure to the file (such as a JPEG image, an executable program, or a
database) is the responsibility of applications which read and write the
file. The file format is commonly indicated by a file extension like
.jpg or .xml, but this is just a convention followed by applications and
users. You can do things like rename file.pdf to file.jpg, which will
confuse some applications and users, but have no effect on the file
contents.

% example of a footnote
[^simple]: Almost. Apple's macOS (formerly OS X) uses resource forks to store information associated with a file (HFS and HFS+ file systems only); Windows NTFS allows multiple data streams in a single file, although they have seen little use; and several file systems support file attributes, small tags associated with a file.

Data in a byte-sequence file is identified by the combination of the
file and its offset (in bytes) within the file. Unlike in-memory objects
in an application, where a reference (pointer) to a component of an
object may be passed around independently, a portion of a file cannot be
named without identifying the file it is contained in. Data in a file
can be created by a write which appends more data to the end of a
shorter file, and modified by over-writing in the middle of a file.
However, it can't be "moved" from one offset to another: if you use a
text editor to add or delete text in the middle of a file, the editor
must re-write the entire file (or at least from the modified part to the
end).
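The following is a minimal Python sketch of what an editor has to do to insert bytes in the middle of a file: read everything from the insertion point to the end, then write the new data followed by the saved tail. The helper names `write_all`, `insert_at`, and `demo_insert` are made up for this illustration, and holding the whole tail in memory is a simplifying assumption.

```python
import os
import tempfile


def write_all(fd, data):
    """Keep calling write() until every byte of `data` has been written."""
    while data:
        n = os.write(fd, data)
        data = data[n:]


def insert_at(path, offset, data):
    """Insert `data` at byte `offset` by rewriting the file from `offset` to the end."""
    fd = os.open(path, os.O_RDWR)
    try:
        # Save everything from the insertion point to the end of the file.
        os.lseek(fd, offset, os.SEEK_SET)
        tail = b""
        while True:
            chunk = os.read(fd, 65536)
            if not chunk:          # an empty result means end of file
                break
            tail += chunk
        # Go back and overwrite: new data first, then the old tail.
        os.lseek(fd, offset, os.SEEK_SET)
        write_all(fd, data + tail)
    finally:
        os.close(fd)


def demo_insert():
    """Create a scratch file, insert a comma at offset 5, return the result."""
    fd, path = tempfile.mkstemp()
    try:
        write_all(fd, b"hello world")
        os.close(fd)
        insert_at(path, 5, b",")
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.unlink(path)
```

Calling `demo_insert()` returns `b"hello, world"`: the six bytes after the insertion point were all written a second time, one position later.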

**Unix file name translation:** each process has an associated *current
directory*, which may be changed via the `chdir` system call. File names
beginning in '`/`' are termed *absolute* names, and are interpreted
relative to the root of the naming tree, while *relative* names are
interpreted beginning at the current directory. (In addition, `d/..`
always points to the parent directory of `d`, and `d/.` points to `d`
itself.) Thus in the file system in
{numref}`fs:tree-logical`, if the current directory were `/home`,
then the paths `pjd/.profile` and `/home/pjd/.profile` refer to the same
file, and `../bin/cat` and `/bin/cat` refer to the same file.
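The rules for absolute and relative names can be sketched in a few lines of Python. This is purely lexical: it manipulates strings, while the kernel actually walks the directories (and follows symbolic links, which this sketch ignores). The function name `translate` is made up for the example.

```python
import posixpath


def translate(cwd, name):
    """Lexically turn `name` into an absolute path, given current directory `cwd`."""
    if name.startswith("/"):
        full = name                        # absolute: interpreted from the root
    else:
        full = posixpath.join(cwd, name)   # relative: interpreted from cwd
    # normpath collapses "d/." to "d" and "d/.." to the parent of "d"
    return posixpath.normpath(full)


print(translate("/home", "pjd/.profile"))   # /home/pjd/.profile
print(translate("/home", "../bin/cat"))     # /bin/cat
```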

## File System Operations

There are several common types of file operations supported by Linux
(and with slight differences, Windows). They can be classified into
three main categories: open/close, read/write, and naming and
directories.

**Open/close**: In order to access a file in Linux (or most operating
systems) you first need to open the file, passing the file name and
other parameters and receiving a *handle* (called a *file descriptor* in
Unix) which may be used for further operations. The corresponding system
calls are:

- `int desc = open(name, O_RDONLY)` - Verify that file `name` exists and may
be read, and then return a *descriptor* which may be used to refer to
that file when reading it.

- `int desc = open(name, O_WRONLY | flags, mode)` - Verify permissions and
open `name` for writing, creating it (or erasing existing contents) if
necessary as specified in `flags`. Returns a descriptor which may be
used for writing to that file.

- `close(desc)` - stop using this descriptor, and free any resources
allocated for it.


Note that application programs rarely use the system calls themselves to
access files, but instead use higher-level frameworks, ranging from Unix
Standard I/O to high-level application frameworks.
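Python's `os` module exposes thin wrappers around these system calls, so the open/close sequence can be tried directly. The sketch below assumes a Unix-like system; the file name `myfile` and the function name `open_close_demo` are arbitrary choices for the example.

```python
import os
import shutil
import tempfile


def open_close_demo():
    workdir = tempfile.mkdtemp()
    name = os.path.join(workdir, "myfile")
    try:
        # Open for writing. O_CREAT creates the file if needed, O_TRUNC erases
        # existing contents; 0o644 is the mode used if the file is created.
        wfd = os.open(name, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        os.write(wfd, b"some data\n")
        os.close(wfd)                      # release the descriptor

        # Open for reading; the descriptor is just a small integer.
        rfd = os.open(name, os.O_RDONLY)
        data = os.read(rfd, 100)
        os.close(rfd)

        # Opening a name that does not exist fails.
        try:
            os.open(os.path.join(workdir, "missing"), os.O_RDONLY)
            missing_failed = False
        except FileNotFoundError:
            missing_failed = True
        return data, missing_failed
    finally:
        shutil.rmtree(workdir, ignore_errors=True)
```

Most Python programs would use the built-in `open()` instead, which is one of the higher-level frameworks layered on top of these calls.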

**Read/Write operations**: To get a file with data in it, you need to
write it; to use that data you need to read it. To allow reading and
writing in units of less than an entire file without tedious calculations of
the current file offset, UNIX uses the concept of a *current position*
associated with a file descriptor. When you read 100 bytes (i.e. bytes 0
to 99) from a file this pointer advances by 100 bytes, so that the next
read will start at byte 100, and similarly for write. When a file is
opened for reading the pointer starts at 0; when open for writing the
application writer can choose to start at the beginning (default) and
overwrite old data, or start at the end (`O_APPEND` flag) to append new
data to the file.

System calls for reading and writing are:

- `n = read(desc, buffer, max)` - Read `max` bytes (or fewer if the end of
the file is reached) into `buffer`, starting at the current position,
and returning the actual number of bytes `n` read; the current position
is then incremented by `n`.

- `n = write(desc, buffer, len)` - Write `len` bytes from `buffer` into
the file, starting at the current position, returning the number of bytes
`n` actually written (normally `len`) and incrementing the current
position by `n`.

- `lseek(desc, offset, flag)` - Set an open file's current position to that
specified by `offset` and `flag`, which specifies whether `offset` is
relative to the beginning, end, or current position in the file.
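A short Python sketch of how the current position moves, using the `os` wrappers for `read`, `write`, and `lseek`. The scratch file and the function name `position_demo` are assumptions made for the example.

```python
import os
import tempfile


def position_demo():
    fd, path = tempfile.mkstemp()          # returns a descriptor open for read/write
    try:
        os.write(fd, b"0123456789")        # position advances from 0 to 10
        after_write = os.lseek(fd, 0, os.SEEK_CUR)   # offset 0 from "current" = query

        os.lseek(fd, 0, os.SEEK_SET)       # back to the beginning
        first = os.read(fd, 4)             # bytes 0..3, position is now 4
        second = os.read(fd, 4)            # bytes 4..7, position is now 8

        os.lseek(fd, -2, os.SEEK_END)      # two bytes before the end
        last = os.read(fd, 100)            # asks for 100 bytes, gets only 2
        at_eof = os.read(fd, 100)          # nothing left: empty result
        return after_write, first, second, last, at_eof
    finally:
        os.close(fd)
        os.unlink(path)
```

`position_demo()` returns `(10, b"0123", b"4567", b"89", b"")`: each read picks up where the previous one stopped, and a read at the end of the file returns zero bytes.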

[^3]: On Linux the `pread` and `pwrite` system calls allow specifying an
    offset for the read or write; other UNIX-derived operating systems
    have their own extensions for this purpose.


Note that in the basic Unix interface (unlike e.g. Windows) there is no
way to specify a particular location in a file to read or write
from[^3]. Programs like databases (e.g. SQLite, MySQL) which need to
write to and read from arbitrary file locations must instead move the
current position by using `lseek` before a read or write. However most
programs either read or write a file from the beginning to the end
(especially when written for an OS that makes it easier to do things
that way), and thus don't really need to perform seeks. Because most
Unix programs use simple "stream" input and output, these may be
re-directed so that the same program can---without any special
programming---read from or write to a terminal, a network connection, a
file, or a pipe from or to another program.
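To illustrate the difference between seeking before a read and the `pread` extension mentioned in the footnote, here is a small Python sketch. It assumes a Unix-like system where `os.pread` is available; `read_at_with_lseek` and `pread_demo` are names invented for the example.

```python
import os
import tempfile


def read_at_with_lseek(fd, offset, n):
    """Basic Unix interface: move the current position, then read."""
    os.lseek(fd, offset, os.SEEK_SET)
    return os.read(fd, n)


def pread_demo():
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"AAAABBBBCCCC")

        a = read_at_with_lseek(fd, 4, 4)              # reads bytes 4..7
        pos_after_seek = os.lseek(fd, 0, os.SEEK_CUR)  # position moved to 8

        b = os.pread(fd, 4, 0)                         # read 4 bytes at offset 0
        pos_after_pread = os.lseek(fd, 0, os.SEEK_CUR)  # position unchanged
        return a, pos_after_seek, b, pos_after_pread
    finally:
        os.close(fd)
        os.unlink(path)
```

`pread_demo()` returns `(b"BBBB", 8, b"AAAA", 8)`. The `lseek` plus `read` pair changes the shared current position, while `pread` leaves it alone, which matters when several threads use the same descriptor.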

[^4]: A hard link is an additional directory entry pointing to the same
    file, giving the file two (or more) names. Hard links are peculiar
    to Unix, and in modern systems have mostly been replaced with
    symbolic links (covered next); however Apple's Time Machine makes
    very good use of them: multiple backups can point to the same single
    copy of an un-modified file using hard links.

[^5]: Sort of. If there are multiple hard links to a file, then this
    just removes one of them; the file isn't deleted until the last link
    is removed. Even then it might not be removed yet - on Unix, if you
    delete an open file it won't actually be removed until all open file
    handles are closed. In general, deleting open files is a problem:
    while Unix solves the problem by deferring the actual delete,
    Windows solves it by protecting open files so that they cannot be
    deleted.

**Naming and Directories**: In Unix there is a difference between a name
(a directory entry) and the object (file or directory) that the name
points to. The naming and directories operations are:

- `rename(path1, path2)` - Rename an object (i.e. file or directory) by
either changing the name in its directory entry (if the destination is
in the same directory) or creating a new entry and deleting the old one
(if moving into a new directory).

- `link(path1, path2)` - Add a *hard link* to a file[^4].

- `unlink(path)` - Delete a file.[^5]

- `desc = opendir(path)`\
`readdir(desc, dirent*), dirent=(name,type,length)` - This interface
allows a program to enumerate names in a directory, and determine their
type (i.e. file, directory, symbolic link, or special-purpose file).

- `stat(file, statbuf)`\
`fstat(desc, statbuf)` - returns file attributes - size, owner,
permissions, modification time, etc. In Unix these are attributes of the
file itself, residing in the i-node, and can't be found in the directory
entry - otherwise it would be necessary to keep multiple copies
consistent.

- `mkdir(path)`\
`rmdir(path)` - directory operations: create a new, empty directory, or
delete an empty directory.
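The distinction between names and objects is easy to see by running these operations from Python, whose `os` module wraps the same system calls (`os.scandir` stands in for `opendir`/`readdir`). This is a sketch assuming a Unix-like system and a temporary directory on a file system that supports hard links; the file names and `naming_demo` are made up for the example.

```python
import os
import shutil
import tempfile


def naming_demo():
    d = tempfile.mkdtemp()
    try:
        a = os.path.join(d, "a.txt")
        b = os.path.join(d, "b.txt")
        with open(a, "w") as f:
            f.write("hello")

        os.link(a, b)                                  # second name, same file
        same_inode = os.stat(a).st_ino == os.stat(b).st_ino
        links = os.stat(a).st_nlink                    # number of names

        os.unlink(a)                                   # removes one name only
        with open(b) as f:
            survived = f.read()                        # data is still there

        sub = os.path.join(d, "sub")
        os.mkdir(sub)
        os.rename(b, os.path.join(sub, "c.txt"))       # move into new directory

        with os.scandir(d) as entries:                 # enumerate names and types
            top = sorted((e.name, e.is_dir()) for e in entries)
        inner = sorted(os.listdir(sub))
        return same_inode, links, survived, top, inner
    finally:
        shutil.rmtree(d, ignore_errors=True)
```

`naming_demo()` returns `(True, 2, "hello", [("sub", True)], ["c.txt"])`: both names referred to one i-node with a link count of 2, and removing the first name left the file contents reachable through the second.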


## Symbolic links

An alternative to hard links to allow multiple names for a file is a
third file system object (in addition to files and directories), a
*symbolic link*. This holds a text string which is interpreted as a
"pointer" to another location in the file system. When the kernel is
searching for a file and encounters a symbolic link, it substitutes this
text into the current portion of the path, and continues the translation
process.

Thus if we have:

<pre>
directory: /usr/program-1.0.1
  file:      /usr/program-1.0.1/file.txt
  sym link:  /usr/program-current -> "program-1.0.1"
</pre>

and if the OS is looking up the file `/usr/program-current/file.txt`, it
will:

1. look up `usr` in the root directory, finding a pointer to the `/usr`
directory
2. look up `program-current` in `/usr`, finding the link with contents
`program-1.0.1`

3. look up `program-1.0.1` and use this result instead of the result from
looking up `program-current`, getting a pointer to the
`/usr/program-1.0.1` directory.

4. look up `file.txt` in this directory, and find it.
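The lookup steps above can be sketched with a toy in-memory file system, where directories are Python dictionaries, files are strings, and symbolic links are instances of a small `Symlink` class. Everything here (`Symlink`, `lookup`, `tree`, and the limit `MAX_LINKS = 40`) is an assumption made for illustration and is not how a real kernel stores directories.

```python
class Symlink:
    """A symbolic link: nothing but a text string naming another path."""
    def __init__(self, target):
        self.target = target


MAX_LINKS = 40   # illustrative limit on links followed in one translation


def lookup(root, path):
    """Resolve an absolute `path` in the toy tree rooted at `root`."""
    parts = [p for p in path.split("/") if p]
    stack = [root]                 # directories from the root to where we are
    followed = 0
    while parts:
        name = parts.pop(0)
        if name == ".":
            continue
        if name == "..":
            if len(stack) > 1:
                stack.pop()
            continue
        cur = stack[-1]
        if not isinstance(cur, dict):
            raise NotADirectoryError(name)
        if name not in cur:
            raise FileNotFoundError(name)
        obj = cur[name]
        if isinstance(obj, Symlink):
            followed += 1
            if followed > MAX_LINKS:
                raise OSError("Too many levels of symbolic links")
            if obj.target.startswith("/"):
                stack = [root]     # absolute target: restart from the root
            # substitute the link text for the name we just looked up
            parts = [p for p in obj.target.split("/") if p] + parts
            continue
        stack.append(obj)
    return stack[-1]


tree = {
    "usr": {
        "program-1.0.1": {"file.txt": "contents of file.txt"},
        "program-current": Symlink("program-1.0.1"),
    },
    "loopy": Symlink("loopy"),     # a link that points to itself
}

print(lookup(tree, "/usr/program-current/file.txt"))   # contents of file.txt
```

Looking up `/loopy` in the same tree keeps substituting `loopy` for `loopy` until the counter passes `MAX_LINKS`, at which point the sketch raises an `OSError`.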


Note that unlike hard links, a symbolic link may be "broken", i.e. the
file it points to may not exist. This can happen if the link was
created in error, or the file or directory it points to is deleted
later. In that case path translation will fail with an error:

In [3]:
bash.run('''ln -s /bad/file/name bad-link
ls -l bad-link
cat bad-link''')


<pre>
pjd-1:tmp pjd$ ln -s /bad/file/name bad-link
pjd-1:tmp pjd$ ls -l bad-link 
lrwxr-xr-x  1 pjd  wheel  22 Aug  2 00:07 bad-link -> /bad/file/name
pjd-1:tmp pjd$ cat bad-link
cat: bad-link: No such file or directory
</pre>

Finally, to prevent loops there is a limit on how many levels of
symbolic link may be traversed in a single path translation:

<pre>
pjd@pjd-fx:/tmp$ ln -s loopy loopy
pjd@pjd-fx:/tmp$ ls -l loopy
lrwxrwxrwx 1 pjd pjd 5 Aug 24 04:25 loopy -> loopy
pjd@pjd-fx:/tmp$ cat loopy
cat: loopy: Too many levels of symbolic links
pjd@pjd-fx:/tmp$ 
</pre>

In [4]:
bash.run('''ln -s loopy loopy
ls -l loopy
cat loopy''')


In early versions of Linux (pre-2.6.18) the link translation code was
recursive, and this limit was set to 5 to avoid stack overflow. Current
versions use an iterative algorithm, and the limit is set to 40.
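Both failure cases can also be observed from a program, where they show up as different error codes. The Python sketch below assumes a Unix-like system and that `/bad/file/name` does not exist; the function name `symlink_errors` is made up for the example.

```python
import errno
import os
import shutil
import tempfile


def symlink_errors():
    d = tempfile.mkdtemp()
    try:
        bad = os.path.join(d, "bad-link")
        loopy = os.path.join(d, "loopy")
        os.symlink("/bad/file/name", bad)    # target is assumed not to exist
        os.symlink("loopy", loopy)           # link that points to itself

        results = {}
        for label, path in (("bad-link", bad), ("loopy", loopy)):
            try:
                os.close(os.open(path, os.O_RDONLY))
                results[label] = None        # open unexpectedly succeeded
            except OSError as e:
                results[label] = errno.errorcode[e.errno]

        # readlink returns the stored text without following the link
        results["text"] = os.readlink(bad)
        return results
    finally:
        shutil.rmtree(d, ignore_errors=True)
```

On Linux this returns `{"bad-link": "ENOENT", "loopy": "ELOOP", "text": "/bad/file/name"}`; `ENOENT` and `ELOOP` are the error codes behind the "No such file or directory" and "Too many levels of symbolic links" messages shown above.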

**Device Names vs. Mounting**: A typical system may provide access to
several file systems at once, e.g. a local disk and an external USB
drive or network volume. In order to unambiguously specify a file we
thus need to identify both the file system itself and the file within
possibly nested directories in that file system.
(In Unix this name is called an *absolute pathname*, providing an
unambiguous "path" to the file.) There are two common approaches to
identifying file systems:


[^6]: Modern Windows systems actually use a mount-like naming convention
    internally; e.g. the `C:` drive actually corresponds to the name
    `\DosDevices\C:` in this internal namespace.

1. Explicitly: each file system is given a name, so that a full pathname
looks like e.g. `C:\MyDirectory\file.txt` (Windows[^6]) or
`DISK1:[MYDIR]file.txt` (VMS).

2. Implicitly: a file system is transparently *mounted* onto a directory in
another file system, giving a single uniform namespace; thus on a Linux
system with a separate disk for user directories, the file "/etc/passwd"
would be on one file system (e.g. "disk1"), while "/home/pjd/file.txt"
would be on another (e.g. "disk2").


Mounting in Linux and other Unix-like
systems is implemented via a *mount table*, a small table in the kernel
mapping directories to directories on other file systems. In the example
above, one entry would map "/home" on disk1 to ("disk2", "/"). As the
kernel translates a pathname it checks each directory it reaches against
this table; if an entry is found, it substitutes the mapped file system
and directory before searching for the next name. Thus before searching
"/home" on disk1 (which is probably empty) for the entry "pjd", the
kernel will substitute the top-level directory on disk2, and then search
for "pjd".
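As a rough illustration of the mount-table check, here is a toy Python model in which each file system is a nested dictionary and the mount table maps a (file system, directory) pair to the root of another file system. The names `DISKS`, `MOUNTS`, and `mount_lookup` are invented for this sketch, which also assumes that mounts always attach the top-level directory of the mounted file system and ignores `.`, `..`, and symbolic links.

```python
# Two toy file systems: directories are dicts, files are strings.
DISKS = {
    "disk1": {"etc": {"passwd": "root:x:0:0"}, "home": {}},
    "disk2": {"pjd": {"file.txt": "hello"}},
}

# Mount table: (file system, directory) -> (file system, directory).
MOUNTS = {("disk1", "/home"): ("disk2", "/")}


def mount_lookup(path, root_fs="disk1"):
    """Return (file system name, object) for an absolute `path`."""
    fs, where, cur = root_fs, "/", DISKS[root_fs]
    for name in [p for p in path.split("/") if p]:
        cur = cur[name]                          # KeyError if the name is missing
        where = where.rstrip("/") + "/" + name   # path within the current fs
        if (fs, where) in MOUNTS:
            # This directory is a mount point: continue in the mounted
            # file system's top-level directory instead.
            fs, where = MOUNTS[(fs, where)]
            cur = DISKS[fs]
    return fs, cur


print(mount_lookup("/etc/passwd"))          # ('disk1', 'root:x:0:0')
print(mount_lookup("/home/pjd/file.txt"))   # ('disk2', 'hello')
```

The empty `home` directory on `disk1` is never searched: as soon as the walk reaches it, the table entry redirects the lookup to `disk2`.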

For a more thorough explanation of path translation in Linux and other
Unix systems see the `path_resolution(7)` man page, which may be
accessed with the command `man path_resolution`.

#### Review Questions
