# Sources of Inconsistency: Cached Data Structures

File systems have many data structures that the OS caches in memory to get good performance. Keeping the cached and on-disk copies consistent is easy if we only read.

# Writing to Caches

Modified data kept only in memory can be lost in a crash.   
There are two options:
1. Write-through: write changes back to disk immediately. This is consistent but slow, since we have to wait for the write to hit the disk and generate an interrupt.  
2. Write-back: delay writing the modified data until the page is replaced in memory. This performs better, but can cause inconsistencies, since the modified data is lost if the system crashes first.
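
The difference between the two policies can be sketched with a toy model. This is only an illustration: `Disk`, `WriteThroughCache`, and `WriteBackCache` are hypothetical names, the "disk" is a Python dict, and `crash()` simply discards everything held in memory.

```python
class Disk:
    """Hypothetical stand-in for a disk: block number -> data."""

    def __init__(self):
        self.blocks = {}
        self.writes = 0  # number of (slow) disk writes issued

    def write(self, block_no, data):
        self.blocks[block_no] = data
        self.writes += 1


class WriteThroughCache:
    def __init__(self, disk):
        self.disk = disk
        self.cache = {}

    def write(self, block_no, data):
        self.cache[block_no] = data
        self.disk.write(block_no, data)  # every write waits for the disk

    def crash(self):
        self.cache.clear()  # memory is lost, but the disk is up to date


class WriteBackCache:
    def __init__(self, disk):
        self.disk = disk
        self.cache = {}
        self.dirty = set()  # blocks modified in memory only

    def write(self, block_no, data):
        self.cache[block_no] = data
        self.dirty.add(block_no)  # the disk write is deferred

    def evict(self, block_no):
        # Dirty data reaches the disk only when the page is replaced.
        if block_no in self.dirty:
            self.disk.write(block_no, self.cache[block_no])
            self.dirty.discard(block_no)
        self.cache.pop(block_no, None)

    def crash(self):
        self.cache.clear()
        self.dirty.clear()  # dirty blocks never reached the disk
```

In this model, two writes to the same block cost two disk writes under write-through but only one under write-back (at eviction), and any block still dirty at `crash()` is gone.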

Consider an operation that needs three separate writes: the data block, the inode, and the data bitmap. If only a single write reaches disk before a crash, the following scenarios can occur:
1. Only the data block is written to disk: the data is on disk, but there is no way to get to it.
2. Only the updated inode is written to disk: if we follow the pointer, we read garbage. The data bitmap says the block is free while the inode says it is used, so this must be fixed.  
3. Only the bitmap is written to disk: the data bitmap says the data block is used, but no inode points to it.

You need all three writes to succeed.
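
The scenarios above can be enumerated mechanically. The sketch below assumes the operation consists of exactly the three writes listed (data block, inode, data bitmap) and extends the same reasoning to the cases where two of the three writes succeed. The function name `problems` and the message strings are made up for illustration.

```python
from itertools import combinations

WRITES = ("data", "inode", "bitmap")


def problems(on_disk):
    """Return the problems left behind if only `on_disk` writes completed."""
    on_disk = set(on_disk)
    found = []
    if "inode" in on_disk and "data" not in on_disk:
        found.append("inode points to garbage")
    if "inode" in on_disk and "bitmap" not in on_disk:
        found.append("inode uses a block the bitmap says is free")
    if "bitmap" in on_disk and "inode" not in on_disk:
        found.append("block marked used but no inode points to it")
    return found


if __name__ == "__main__":
    # Print every partial outcome: 0, 1, 2, or 3 writes completed.
    for n in range(len(WRITES) + 1):
        for subset in combinations(WRITES, n):
            print(subset, "->", problems(subset) or "no metadata problem")
```

Note that when only the data block is written, the function reports no metadata problem: the write is simply lost, which matches scenario 1.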

Several file system operations update multiple global data structures. We need reliability from unreliable parts.

# UNIX approach

To keep metadata consistent, UNIX uses synchronous write-through. If multiple updates are needed, they are performed in a specific order. After a crash, check for in-progress operations and fix up any problems: run fsck to scan the entire disk for consistency.

# FSCK

fsck scans the entire disk for inconsistencies and repairs them. For example:
1. Crash prior to the update of the inode bitmap: the writes simply disappear.
2. Data block referenced in an inode but not marked in the data bitmap: update the data bitmap.
3. File created but not in any directory: delete the file.
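
A minimal sketch of these repair rules, assuming a drastically simplified file system: `inodes` maps inode numbers to lists of data block numbers, `data_bitmap` is a list of booleans, and `directories` maps names to inode numbers. All of these structures and the `fsck` function itself are hypothetical and only mirror the rules stated above.

```python
def fsck(inodes, data_bitmap, directories):
    """Repair the structures in place and return a list of repairs made."""
    repairs = []

    # Rule: a file that is not in any directory is deleted.
    linked = set(directories.values())
    for ino in sorted(set(inodes) - linked):
        del inodes[ino]
        repairs.append(f"deleted orphan inode {ino}")

    # Rule: trust the inodes over the bitmap and rebuild the bitmap from them.
    referenced = {block for blocks in inodes.values() for block in blocks}
    for block_no in range(len(data_bitmap)):
        should_be_used = block_no in referenced
        if data_bitmap[block_no] != should_be_used:
            data_bitmap[block_no] = should_be_used
            state = "used" if should_be_used else "free"
            repairs.append(f"marked block {block_no} {state}")

    return repairs
```

Even in this toy version, every inode and every bitmap entry is visited, which is why recovery time grows with the size of the disk rather than with the amount of work that was in progress.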

For regular user data, UNIX uses asynchronous write-back. Data written within the most recent write-back interval can be lost in a crash.  
Blocks are not guaranteed to be written to disk in any particular order.   
User programs that care about consistency and reliability store new versions of data in temporary files and replace the older version only when the user commits.
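
The temporary-file pattern can be sketched in Python as follows. The helper name `atomic_write` is made up; `tempfile.mkstemp`, `os.fsync`, and `os.replace` are standard library calls. The sketch assumes the temporary file lives in the same directory as the target, so that the rename stays within one file system.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the contents of `path` with `data` (bytes), all or nothing."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the new version out of the cache
        os.replace(tmp_path, path)  # "commit": swap in the new version
    except BaseException:
        # On failure, leave the old version untouched and clean up.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A reader of `path` sees either the old version or the new one, never a half-written file. For the rename itself to survive a crash on POSIX systems, the containing directory usually needs an `fsync` as well, which this sketch leaves out.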

Issues: synchronous writes lead to poor performance, recovery is slow because fsck scans the entire disk, and the reasoning about update order has to be exactly right.

We also run into issues when several file operations must be performed together atomically.

# Transactions in File Systems

Most file systems now use write-ahead logging; these are known as journaling file systems.   
Write all metadata changes to a transaction log before sending any changes to their final location on disk. File changes are individual updates: update a directory, allocate blocks, etc. Transactions are whole operations: create a directory, delete a file, etc.   
This eliminates the need for fsck after a crash.  
In the event of a crash, read the log. If the log is empty, then all updates made it to disk.  
If a transaction in the log is incomplete, do nothing.   
If a transaction in the log is committed, apply any of its changes that have not yet reached disk.
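
The recovery rules above can be sketched as a replay loop. The record layout is an assumption made for illustration: the log is a list of tuples `("TXBegin", tid)`, `("update", tid, block_no, data)`, and `("TXEnd", tid)`, and the disk is a dict from block number to data.

```python
def recover(log, disk):
    """Replay committed transactions from `log` onto `disk`.

    Returns the number of updates replayed and empties the log.
    """
    # A transaction counts as committed only if its TXEnd record is in the log.
    committed = {record[1] for record in log if record[0] == "TXEnd"}

    replayed = 0
    for record in log:
        if record[0] == "update" and record[1] in committed:
            _, _, block_no, data = record
            disk[block_no] = data  # idempotent: safe even if already applied
            replayed += 1
        # Updates of incomplete transactions are skipped: do nothing.

    log.clear()  # an empty log means all updates made it to disk
    return replayed
```

Because each update simply overwrites a block with its logged contents, replaying a change that already reached disk is harmless, so recovery does not need to know which changes were applied before the crash.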

Issuing 5 separate writes to the log one after another is slow. Instead, issue them all at once so they turn into a single sequential write.   
The problem is that a disk can schedule writes out of order, so TXEnd could reach disk before the rest of the transaction. The solution is to set a barrier before TXEnd, which blocks until the preceding log data is on disk.
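
At user level, the same ordering idea can be approximated with `os.fsync` acting as the barrier. This is a sketch under assumptions: the text-based record format, the `TXBegin` marker, and the function names are invented here, and whether `fsync` truly forces data past the drive's own cache depends on the platform and hardware.

```python
import os
import tempfile


def append_transaction(log_fd, tid, records):
    """Append one transaction to the log, with a barrier before TXEnd."""
    body = f"TXBegin {tid}\n".encode()
    body += b"".join(record + b"\n" for record in records)
    os.write(log_fd, body)  # all records issued as one sequential write
    os.fsync(log_fd)        # barrier: ask the OS to flush before continuing
    os.write(log_fd, f"TXEnd {tid}\n".encode())
    os.fsync(log_fd)        # the transaction is committed once this returns


def demo():
    """Write one transaction to a temporary log file and return its lines."""
    fd, path = tempfile.mkstemp()
    try:
        append_transaction(fd, 1, [b"inode 5", b"bitmap 6"])
    finally:
        os.close(fd)
    with open(path, "rb") as f:
        lines = f.read().splitlines()
    os.unlink(path)
    return lines
```

Without the first `fsync`, nothing would stop `TXEnd` from becoming durable before the records it is supposed to commit.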

The drawback is that every change is written twice: once to the log and once to its final location. It is, however, reliable.

# Copy on Write file systems

Data and metadata are not updated in place, but written to a new location. This transforms random writes into sequential writes.  
Motivations:
1. Small writes are expensive
2. Small writes are expensive on RAID
3. Expensive to update a single block, but efficient for entire stripes
4. Caches filter reads, so most of the traffic that reaches the disk is writes
5. Widespread adoption of flash storage
6. Large capacities enable versioning.
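
The core copy-on-write idea from the start of this section, along with the versioning motivation, can be sketched with an append-only store. `CowStore` is a hypothetical toy: the "disk" is a Python list that is only ever appended to, and the metadata is a single root mapping from logical block numbers to positions in that list.

```python
class CowStore:
    """Append-only block store: updates never overwrite live blocks."""

    def __init__(self):
        self.log = []   # sequential, append-only "disk"
        self.root = {}  # current metadata: logical block -> position in log

    def write(self, updates):
        """Apply {logical_block: data}; return the previous root as a snapshot."""
        old_root = self.root
        new_root = dict(old_root)  # metadata is copied, not modified in place
        for logical, data in updates.items():
            self.log.append(data)  # new data goes to the next free location
            new_root[logical] = len(self.log) - 1
        self.root = new_root       # switching the root publishes the update
        return old_root

    def read(self, logical, root=None):
        """Read through the current root, or through an older snapshot."""
        root = self.root if root is None else root
        return self.log[root[logical]]
```

Updates to scattered logical blocks all land at the end of the list, which is the random-to-sequential transformation, and since old blocks are never overwritten, keeping the old root around gives a readable older version for free.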