# File system functionality

Provides the file naming organization (such as directories), manages the disk layout, picks the blocks that constitute a file, balances locality with expandability, and manages free space.  
The file system translates a file name and offset into a data block.

File metadata: the file header describes where the file is on disk and the attributes of the file, such as owner ID, size, permissions, last modified time, and the location of all data blocks. Metadata blocks are stored at a location known by the OS so they can be accessed without checking another data structure.  
Data: the contents that users actually care about. Directory data blocks map file names to file headers, and file data blocks contain file data.

# Designing the File Layout

We need to support sequential and random access.  
We need to lay out the files on the physical disk.   
We need to maintain file location information. 

Most files are small, so we need good support for small files. The block size can't be too large, or internal fragmentation wastes too much space.  
Most disk space is consumed by large files, so we must allow large files, and large file access should be reasonably efficient.   
I/O operations target both small and large files.
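
To make the block size trade-off concrete, here is a minimal sketch of how much space a single file wastes in its last block. The function name `wasted_bytes` and the example sizes are illustrative assumptions, not part of any real file system API.

```python
def wasted_bytes(file_size: int, block_size: int) -> int:
    """Bytes left unused in the file's last block (internal fragmentation)."""
    remainder = file_size % block_size
    # A file that fills its last block exactly (or is empty) wastes nothing.
    return 0 if remainder == 0 else block_size - remainder


# A 1000-byte file wastes far more space with 4096-byte blocks
# than with 512-byte blocks.
small_block_waste = wasted_bytes(1000, 512)
large_block_waste = wasted_bytes(1000, 4096)
```

Because most files are small, this per-file waste adds up quickly as the block size grows.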

The OS may choose to use a larger block size than the sector size of the physical disk. Each block consists of consecutive sectors, so a block can be read with sequential access. A larger block size increases transfer efficiency because more data is transferred sequentially and the head doesn't have to move much. It may also be convenient for the block size to match the machine's page size, so that one block fills exactly one page.  
Most systems allow transferring many sectors between interrupts.

# File Layout

# Contiguous Allocation

The OS maintains an ordered list of free disk blocks and allocates a contiguous chunk of free blocks when it creates a file. The location information in the file header need only contain the start location and the size.  
Advantages: simple.   
Disadvantages: changing file sizes is hard, and the disk fragments.

All file data is stored contiguously on disk, and the file header specifies the starting block and length. This gives the best performance for the initial write of a file. Once space has been allocated, later writes may cause the file to grow, which can require the file to be copied and moved.
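
With contiguous allocation, translating an offset into a disk block is plain arithmetic on the header. The following is a minimal sketch; the names `ContiguousHeader` and `block_for_offset` and the 4096-byte block size are assumptions made for illustration.

```python
from dataclasses import dataclass

BLOCK_SIZE = 4096  # assumed block size in bytes


@dataclass
class ContiguousHeader:
    start_block: int  # first disk block of the file
    num_blocks: int   # length of the file in blocks


def block_for_offset(header: ContiguousHeader, offset: int) -> int:
    """Return the disk block that holds the given byte offset."""
    index = offset // BLOCK_SIZE
    if offset < 0 or index >= header.num_blocks:
        raise ValueError("offset is outside the file")
    # Blocks are contiguous, so no lookup table is needed.
    return header.start_block + index
```

Random access costs one addition, which is why this layout is simple and fast as long as the file never has to grow.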

# Linked Allocation

The file is stored as a linked list of blocks. The file header keeps a pointer to the first and last sector/block allocated to that file, and each sector keeps a pointer to the next sector.   
Two implementations: 
1. Linked list of disk blocks, where data blocks point to other blocks. 
2. Linked list in a table (FAT).

# FAT File System

FAT (file allocation table)  
Parts:   
1. Index structures: the file allocation table is an array of 32-bit entries. Each element in the array represents a data block in the system, and each file is represented as a linked list embedded in the table's entries. The file number is the index of the file's first FAT entry; that entry holds the number of the next FAT entry, and so on.
2. Free space map: if a data block is free, its entry is 0. Find free blocks by scanning over the FAT.
3. Locality heuristics: can be as simple as next fit, which scans sequentially from the last allocated entry and returns the next free entry. Locality can be improved through defragmentation, which moves file data around so it is stored more contiguously on disk.

FAT is easy to implement, but it has poor random access, limited access control, no support for hard links, and limited volume and file sizes.
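
The chain walk and the next-fit scan can be sketched over a plain Python list standing in for the table. This is a toy model under stated assumptions: `END_OF_FILE = -1` is a hypothetical end-of-chain marker, and entry 0 is treated as reserved so that the value 0 can unambiguously mean "free".

```python
FREE = 0          # per the notes, a free entry holds 0
END_OF_FILE = -1  # hypothetical end-of-chain marker for this sketch


def file_blocks(fat: list, file_number: int) -> list:
    """Follow the chain starting at the file's first entry."""
    blocks = []
    entry = file_number
    while entry != END_OF_FILE:
        blocks.append(entry)
        entry = fat[entry]  # each entry names the next block of the file
    return blocks


def next_fit(fat: list, last_allocated: int):
    """Scan forward from the last allocated entry, wrapping around once."""
    n = len(fat)
    for step in range(1, n + 1):
        candidate = (last_allocated + step) % n
        # Entry 0 is assumed reserved, so it is never handed out.
        if candidate != 0 and fat[candidate] == FREE:
            return candidate
    return None  # no free entry left


# File number 1 occupies blocks 1 -> 4 -> 6.
fat = [0, 4, 0, 0, 6, 0, END_OF_FILE, 0]
```

Reaching the k-th block of a file requires following k links, which is the poor random access noted above.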

# Direct Allocation

The file header points to each data block.  
Advantages: easy to create, grow, and shrink files; little fragmentation; supports random access.  
Disadvantages: the file header is big or variably sized.

# Indexed Allocation

The OS keeps an array of block pointers for each file in a non-data block called the index block. The OS allocates an array to hold the pointers to all the blocks when it creates a file, but allocates the blocks themselves only on demand. The OS fills in the pointers as it allocates blocks.  
Advantages: supports both sequential and random access, with not much fragmentation.  
Disadvantages: a maximum file size, and lots of seeks since data is not contiguous.

Create a non-data block for each file, called the index block, that contains a list of pointers to file blocks. The number of pointers depends on the size of a pointer and the size of a block. The file header contains a pointer to the index block; the file header has no direct knowledge of where the data is on disk.

Files too large for one index block can be handled with a linked index block, where index blocks point to other index blocks.  
We can also use a multilevel index block: the file header points to an index block, which has pointers to other index blocks, which hold pointers to data blocks. This method can grow in levels to support larger files.
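
The maximum file size follows directly from the pointer size, the block size, and the number of index levels. A small sketch of that arithmetic, assuming 4096-byte blocks and 4-byte pointers (both values are illustrative, as are the function names):

```python
def pointers_per_block(block_size: int, pointer_size: int) -> int:
    """How many block pointers fit in one index block."""
    return block_size // pointer_size


def max_file_size(block_size: int, pointer_size: int, levels: int) -> int:
    """Largest file reachable through `levels` levels of index blocks."""
    fanout = pointers_per_block(block_size, pointer_size)
    # Each extra level multiplies the number of reachable data blocks.
    return fanout ** levels * block_size


single_level = max_file_size(4096, 4, levels=1)  # 4 MiB
two_levels = max_file_size(4096, 4, levels=2)    # 4 GiB
```

Each added level multiplies the limit by the fan-out, at the cost of one more block read per access.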

# Multilevel indexed files

Each file is a fixed asymmetric tree with fixed-size file blocks as its leaves. The root of the tree is the file's inode, which contains the file's metadata and a set of pointers. The first 10 point to data blocks, and the last three point to intermediate blocks.
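
One way to picture the asymmetric tree is to compute, for a given block of the file, how many intermediate levels must be traversed and which slot is used at each level. The sketch below assumes the three intermediate pointers are single, double, and triple indirect, and that an intermediate block holds 1024 pointers; neither detail is stated in the notes, and `locate` is a hypothetical helper.

```python
NUM_DIRECT = 10            # direct pointers in the inode, per the notes
POINTERS_PER_BLOCK = 1024  # assumed fan-out of an intermediate block


def locate(block_index: int):
    """Return (level, slots): level 0 is direct, 1 to 3 are indirect."""
    if block_index < 0:
        raise ValueError("negative block index")
    if block_index < NUM_DIRECT:
        return 0, [block_index]
    remaining = block_index - NUM_DIRECT
    for level in (1, 2, 3):
        capacity = POINTERS_PER_BLOCK ** level
        if remaining < capacity:
            slots = []
            # Peel off one slot per level, outermost block first.
            for depth in range(level - 1, -1, -1):
                slot, remaining = divmod(remaining, POINTERS_PER_BLOCK ** depth)
                slots.append(slot)
            return level, slots
        remaining -= capacity
    raise ValueError("block index exceeds the maximum file size")
```

Small files are served entirely from the direct pointers, so they pay no extra block reads; only large files pay for the deeper levels.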

# FFS (Fast file system)

Used by UNIX. It has a smart index structure: a multilevel index allows it to locate all blocks of a file.   
Uses locality heuristics: block group placement optimizes for the case where a file's data and metadata, and other files within the same directory, are accessed together.   
Reserved space: gives up some storage to allow the flexibility needed to achieve locality.

# Directories

A directory is just a file that contains a collection of mappings from file name to file number. The file number is an inode number.   
Only the OS can modify directories, which ensures the integrity of the mappings. Application programs can read directories.

The naive solution is to use one name space for the entire disk. If one user uses a name, no one else can.   
User-based strategy: each user has a separate directory, but all of each user's files must still have unique names.   
Multilevel directories: a tree-structured hierarchical name space. Directories are stored on disk just like files, except there is a special flag bit for directories.  
User programs can read directories like any other file, but only special system calls can write directories. Each directory contains name and file number pairs, and there is one root directory.

To find a block of a file, find the file header, which contains pointers to file blocks. To find a file header, we need its inumber. To find the inumber, read the directory that contains the file. To find that directory, we again need to find a file (the directory itself), so the lookup repeats up the path until it reaches the root directory.
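
The lookup chain can be sketched as a loop that starts at the root and reads one directory per path component. Everything here is a toy model: the `directories` dictionary stands in for directory files on disk, and `ROOT_INUMBER = 2` is an assumed well-known number for the root directory.

```python
ROOT_INUMBER = 2  # assumed well-known inumber of the root directory

# Hypothetical on-disk state: inumber -> {file name: inumber}
directories = {
    2: {"home": 5, "etc": 7},
    5: {"notes.txt": 12},
    7: {},
}


def resolve(path: str, directories: dict, root: int = ROOT_INUMBER) -> int:
    """Walk an absolute path and return the inumber it names."""
    inumber = root
    for name in path.strip("/").split("/"):
        if not name:
            continue  # the path "/" names the root itself
        entries = directories.get(inumber)
        if entries is None:
            raise NotADirectoryError(name)
        if name not in entries:
            raise FileNotFoundError(path)
        inumber = entries[name]  # one directory read per component
    return inumber
```

The loop terminates because the root's inumber is known in advance, so the first directory can be read without looking anything up.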

The OS can cache the current working directory, so users can specify relative file names. This is a direct optimization, because resolution starts from the cached directory instead of the root.

# File System Layout

Components of the entire disk:
1. MBR - Master boot record
2. Partition table- contains the addresses of first and last blocks of each partition
3. Disk partition
  
Components of each partition:
1. Boot block
2. Super block
3. Free space management
4. Inodes
5. Root directory
6. Files and directories
  
Components of a super block:
1. File system type
2. File system size
3. Key parameters of the file system
4. Other administrative info

# FFS Locality: Block Groups

Divide the partition into block groups and distribute metadata: the free space bitmap and inode array are distributed among the block groups.  
Place each file in a block group: when a new file is created, FFS looks for a free inode in the same block group as the file's directory.  
When a new directory is created, FFS places it in a different block group from its parent directory.  
Place data blocks

When a disk is close to full, it is hard to optimize locality, and a file may end up scattered across the disk. FFS therefore presents applications with a smaller disk (10% reserved space). A user write that encroaches on the reserved space fails, but the superuser is still able to allocate inodes.
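
The reserved-space rule amounts to a different capacity limit for ordinary users and the superuser. A minimal sketch, modeling the check in terms of blocks; `can_allocate` and its parameters are hypothetical names, not FFS interfaces.

```python
RESERVED_PERCENT = 10  # reserved share of the disk, per the notes


def can_allocate(total_blocks: int, used_blocks: int, requested: int,
                 is_superuser: bool = False) -> bool:
    """Check whether an allocation would stay within the caller's limit."""
    reserved = total_blocks * RESERVED_PERCENT // 100
    # Ordinary users see a disk that is smaller by the reserved amount.
    limit = total_blocks if is_superuser else total_blocks - reserved
    return used_blocks + requested <= limit
```

For example, on a 1000-block disk an ordinary user is refused once usage would pass 900 blocks, while the superuser can still use the remaining 100.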

# NTFS

Index structure: extents and a flexible tree.  
Extents: track ranges of contiguous blocks rather than single blocks.   
Flexible tree: each file is represented by a variable-depth tree. The MFT (master file table) is an array of 1 KB records holding the trees' roots, similar to an inode table. Each record stores a sequence of variable-sized attribute records.

NTFS example: a basic file with two data extents. NTFS has a master file table that holds all file headers, which are called MFT records.  
A record contains information such as the file name, data (resident), and free space. 
If the data is resident, it is stored inside the data segment of the record itself.  
Otherwise, the record contains pointers to extents, or an attribute list containing pointers to other records that hold those pointers.

A small file has its data resident in the record. A medium file has a single record with pointers to extents. Large and very large files span multiple records, which hold pointers to extents and pointers to the next records.  
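
Looking up a block through a list of extents can be sketched as follows. The `Extent` class and `block_for_index` function are illustrative names and do not reflect the actual NTFS on-disk format.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    start_block: int  # first disk block of the contiguous run
    length: int       # number of blocks in the run


def block_for_index(extents: list, file_block: int) -> int:
    """Map the n-th block of a file to a disk block via its extents."""
    remaining = file_block
    for extent in extents:
        if remaining < extent.length:
            return extent.start_block + remaining
        remaining -= extent.length  # skip past this whole run
    raise ValueError("block index is beyond the end of the file")


# A file with two data extents: blocks 100-103 and 500-501.
extents = [Extent(100, 4), Extent(500, 2)]
```

Two entries describe six blocks here, where a per-block index would need six pointers.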

NTFS stores most metadata in ordinary files with well-known numbers.   
File number 9 stores the access control list for every file, indexed by a fixed-length key. Files store the appropriate key in their MFT record.   
The MFT is file number 0. To read it, you need to know the first entry of the MFT, and a pointer to it is stored in the first sector of the NTFS volume. The MFT can start small and grow dynamically. To avoid fragmentation, NTFS reserves part of the start of the volume for MFT expansion.

NTFS takes advantage of locality by finding the smallest region large enough to fit a file (best fit). It caches the allocation status for a small area of the disk, so writes that occur together in time get clustered together.   
SetEndOfFile() lets users specify the expected length of a file at creation.
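
Choosing the smallest free region that is large enough is a best-fit search. A minimal sketch over a list of `(start_block, length)` tuples; the representation and the name `best_fit` are assumptions for illustration, not NTFS internals.

```python
def best_fit(free_regions: list, needed: int):
    """Return the smallest free (start, length) region that fits, or None."""
    candidates = [region for region in free_regions if region[1] >= needed]
    if not candidates:
        return None
    # Picking the tightest fit keeps larger regions intact for larger files.
    return min(candidates, key=lambda region: region[1])


free_regions = [(0, 10), (50, 4), (80, 6)]
```

Knowing the expected file length up front is what makes this choice possible, since the search needs the size before the first write.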