# File System
The operating system layer that controls how data is stored and retrieved.
![](FS.svg)

 - Example of no File System at all: Recording music to a tape
     - Only data is stored
 - Example of File System: MP3 player with an album in each folder and a song in each file.
     - Data + Metadata
 

# Files & Directories
There are two types of objects stored in file systems:
 - Files
     - The actual, useful data
 - Directories (or folders)
     - A special type of file that contains the names and on-disk locations of other files and directories

The directory structure of a file system is hierarchical (a tree).
![](FilesAndFolders.png)
[By FrancoGG - Own work, CC BY-SA 3.0](https://commons.wikimedia.org/w/index.php?curid=766938)
 

In [1]:
! tree root

root
├── branch_1
│   ├── branch_A
│   │   ├── leave.dat
│   │   └── leave.txt
│   ├── branch_B
│   │   ├── leave.doc
│   │   ├── leave.gif
│   │   └── leave.txt
│   └── branch_C
│       ├── branch_1C1
│       │   └── leave.txt
│       ├── branch_1C2
│       │   ├── leave.dat
│       │   └── leave.doc
│       ├── leave.jpeg
│       └── leave.txt
└── branch_2
    ├── branch_A
    │   ├── branch_2A1
    │   │   ├── leave.jpeg
    │   │   ├── leave.old
    │   │   └── leave.txt
    │   ├── branch_2A2
    │   │   ├── leave.doc
    │   │   ├── leave.jpeg
    │   │   └── leave.txt
    │   ├── leave.doc
    │   └── leave.txt
    └── branch_B
        └── leave.txt

11 directories, 19 files
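
A similar traversal can be sketched from Python with the standard-library `os.walk`. The snippet builds a small throwaway tree (the `demo_root` layout is an assumption made for illustration, not the `root` directory listed above) and counts directories and files below it, roughly as `tree` does.

```python
import os
import tempfile


def count_tree(root):
    """Count directories and files below root (root itself is not counted)."""
    n_dirs = 0
    n_files = 0
    for _current, dirnames, filenames in os.walk(root):
        n_dirs += len(dirnames)
        n_files += len(filenames)
    return n_dirs, n_files


# Hypothetical miniature tree, created inside a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    demo_root = os.path.join(tmp, "demo_root")
    os.makedirs(os.path.join(demo_root, "branch_1", "branch_A"))
    os.makedirs(os.path.join(demo_root, "branch_2"))
    for relative in ("branch_1/branch_A/leave.txt", "branch_1/leave.dat", "branch_2/leave.txt"):
        with open(os.path.join(demo_root, *relative.split("/")), "w") as handle:
            handle.write("demo")
    demo_counts = count_tree(demo_root)

print("%d directories, %d files" % demo_counts)
```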


# Paths
A particular file is accessed by specifying its path. There are two ways of referencing a file:
 - Absolute path:
     - `/home/ruben/Documents/curso\ 2023/FS\ vs\ DB/root/branch_1/branch_A/leave.txt`
 - Relative path:
     - From `/home/ruben/Documents/curso\ 2023/FS\ vs\ DB`
         - `root/branch_1/branch_A/leave.txt`
     - From `/home/ruben/Documents/curso\ 2023/FS\ vs\ DB/root/branch_2`
         - `../branch_1/branch_A/leave.txt`
     - From `/home/ruben/Documents/curso\ 2023/FS\ vs\ DB/root/branch_1/branch_A/`
         - `leave.txt`
         - `./leave.txt`
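
The relationship between the absolute path and its relative forms can be checked with the standard-library `posixpath` module. This is a minimal sketch that only manipulates strings (nothing has to exist on disk); the backslashes in the paths above are shell escapes for the spaces and are not needed inside Python strings.

```python
import posixpath

base = "/home/ruben/Documents/curso 2023/FS vs DB"
absolute = posixpath.join(base, "root/branch_1/branch_A/leave.txt")

starting_points = [
    base,
    posixpath.join(base, "root/branch_2"),
    posixpath.join(base, "root/branch_1/branch_A"),
]

# Relative path to the same file, as seen from each starting directory
relative_paths = [posixpath.relpath(absolute, start) for start in starting_points]

for start, relative in zip(starting_points, relative_paths):
    # Joining the starting directory and the relative path leads back to the same file
    rebuilt = posixpath.normpath(posixpath.join(start, relative))
    print(relative, "->", rebuilt == absolute)
```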

Not to be confused with the `$PATH` environment variable:
 - `$PATH` is a list of directories that the system searches for executables when a command is issued in the terminal.
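
The lookup can be sketched in a few lines of Python. This is a simplified illustration, not what the shell actually runs: `find_executable` and the `hostinfo_demo` script are hypothetical names, and the search list is built by hand rather than taken from the real `$PATH`. For real use, the standard library already offers `shutil.which`.

```python
import os
import stat
import tempfile


def find_executable(command, path_value):
    """Return the first executable file named command in path_value, or None."""
    for directory in path_value.split(os.pathsep):
        candidate = os.path.join(directory, command)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    empty_dir = os.path.join(tmp, "empty")
    bin_dir = os.path.join(tmp, "bin")
    os.mkdir(empty_dir)
    os.mkdir(bin_dir)

    script = os.path.join(bin_dir, "hostinfo_demo")
    with open(script, "w") as handle:
        handle.write("#!/bin/sh\necho demo\n")
    os.chmod(script, stat.S_IRWXU)  # read, write and execute for the owner

    # Directories are searched in order, exactly as they appear in the list
    demo_path = os.pathsep.join([empty_dir, bin_dir])
    found = find_executable("hostinfo_demo", demo_path)
    found_name = os.path.basename(found) if found else None
    missing = find_executable("no_such_command", demo_path)

print(found_name, missing)
```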


In [5]:
!ls /home/ruben/bin/*in*

/home/ruben/bin/accounting		  /home/ruben/bin/pringer_time.py
/home/ruben/bin/hostinfo		  /home/ruben/bin/shrinkpdf.sh
/home/ruben/bin/nuredduna-monitoring.txt


In [6]:
!/home/ruben/bin/hostinfo

8 Cores - 15862 MB


In [7]:
!echo $PATH|grep --color /home/ruben/bin

/home/ruben/.pyenv/versions/curso_python_23/bin:/home/ruben/.pyenv/libexec:/home/ruben/.pyenv/plugins/python-build/bin:/home/ruben/.pyenv/plugins/pyenv-virtualenv/bin:/home/ruben/.pyenv/plugins/pyenv-update/bin:/home/ruben/.pyenv/plugins/pyenv-installer/bin:/home/ruben/.pyenv/plugins/pyenv-doctor/bin:/home/ruben/.pyenv/plugins/pyenv-virtualenv/shims:/home/ruben/.pyenv/shims:/home/ruben/.pyenv/bin:/common/opt_intel/oneapi/vtune/2021.3.0/bin64:/common/opt_intel/oneapi/vpl/2021.2.2/bin:/common/opt_intel/oneapi/mkl/latest/bin/intel64:/common/opt_intel/oneapi/itac/2021.2.0/bin:/common/opt_intel/oneapi/itac/2021.2.0/bin:/common/opt_intel/oneapi/inspector/2021.2.0/bin64:/common/opt_intel/oneapi/dpcpp-ct/2021.2.0/bin:/common/opt_intel/oneapi/dev-utilities/2021.2.0/bin:/common/opt_intel/oneapi/debugger/10.1.1/gdb/intel64/bin:/common/opt_intel/oneapi/compiler/2021.2.0/linux/lib/oclfpga/llvm/aocl-bin:/common/opt_intel/oneapi/compiler/2021.2.0/linux/lib/oclfpga/bin:/common/opt_intel/oneapi/compile

In [8]:
!hostinfo

8 Cores - 15862 MB


# Inodes
Inodes are a very important part of file systems. An inode is created for each file or directory in the file system structure.
 - Inodes store metadata about the data:
     - Owner
     - Access, change, and modification times (some file systems also record the creation time)
     - Permissions
     - Pointers to the data blocks on disk
 - The number of inodes available is usually fixed when the file system is created
     - A very large number of directories or of very small (even empty) files can exhaust the available inodes. That leads to a "No space left on device" error even when free data blocks remain.
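
Part of this metadata is visible from Python through `os.stat`. The sketch below uses a temporary file, so the concrete values will differ on every machine; `describe_inode` is a hypothetical helper name. The last lines use `os.statvfs`, which exists only on Unix-like systems, to read the same inode counters that `df -i` reports.

```python
import os
import stat
import tempfile
import time


def describe_inode(path):
    """Return a few of the metadata fields the file system keeps for path."""
    info = os.stat(path)
    return {
        "inode": info.st_ino,
        "owner_uid": info.st_uid,
        "permissions": stat.filemode(info.st_mode),
        "size_bytes": info.st_size,
        "modified": time.ctime(info.st_mtime),
    }


with tempfile.TemporaryDirectory() as tmp:
    demo_file = os.path.join(tmp, "demo.txt")
    with open(demo_file, "w") as handle:
        handle.write("12345")
    file_info = describe_inode(demo_file)
    dir_info = describe_inode(tmp)  # directories have an inode too

print(file_info)
print(dir_info)

# Unix only: total and free inodes of the file system holding the current directory
if hasattr(os, "statvfs"):
    fs = os.statvfs(".")
    print("total inodes:", fs.f_files, "free inodes:", fs.f_ffree)
```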
     


In [5]:
!#mkdir disk1m
!#dd if=/dev/zero of=disk1m.img bs=1k count=1k
!#mkfs.ext4 disk1m.img 
!#sudo mount -o loop disk1m.img disk1m/
!#sudo chown -R ruben:users disk1m/

In [12]:
## !tree disk1m
!echo -e "\nSIZE OF FILE SYSTEM\n-------------------"
!df -h disk1m
!echo -e "\nINODES\n------"
!df -i disk1m


SIZE OF FILE SYSTEM
-------------------
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop20     992K  8.0K  916K   1% /home/ruben/Documents/curso 2023/01_Storing_Data/disk1m

INODES
------
Filesystem     Inodes IUsed IFree IUse% Mounted on
/dev/loop20       128    10   118    8% /home/ruben/Documents/curso 2023/01_Storing_Data/disk1m


In [10]:
!mkdir disk1m/directory1
!mkdir -p disk1m/directory2/dirA
!mkdir -p disk1m/directory2/dirB disk1m/directory2/dirC disk1m/directory2/dirD
!for i in `seq 1 120`; do touch disk1m/directory1/file_$i; done

mkdir: cannot create directory ‘disk1m/directory1’: Bad message
mkdir: cannot create directory ‘disk1m/directory2’: Bad message
mkdir: cannot create directory ‘disk1m/directory2’: Bad message
mkdir: cannot create directory ‘disk1m/directory2’: Bad message
mkdir: cannot create directory ‘disk1m/directory2’: Bad message
touch: cannot touch 'disk1m/directory1/file_1': Bad message
touch: cannot touch 'disk1m/directory1/file_2': Bad message
touch: cannot touch 'disk1m/directory1/file_3': Bad message
touch: cannot touch 'disk1m/directory1/file_4': Bad message
touch: cannot touch 'disk1m/directory1/file_5': Bad message
touch: cannot touch 'disk1m/directory1/file_6': Bad message
touch: cannot touch 'disk1m/directory1/file_7': Bad message
touch: cannot touch 'disk1m/directory1/file_8': Bad message
touch: cannot touch 'disk1m/directory1/file_9': Bad message
touch: cannot touch 'disk1m/directory1/file_10': Bad message
touch: cannot touch 'disk1m/directory1/file_11': Bad message
touch: cannot touc

In [74]:
!rm -r disk1m/*

# Using the File System from your program
Working with a file from a program takes three steps:
- Open (a path must be provided)
- Access (read or write)
- Close
    
Example: Opening the file `/home/ruben/Documents/curso\ 2023/FS\ vs\ DB/root/branch_1/branch_A/leave.txt` means:
 - Read the inode of the first directory in the path (`/home/`)
 - Check the permissions for that directory
 - Get the inode number of the next directory in the path (`/home/ruben/`)
 - Check permissions... and repeat until the last directory
 - Get the disk blocks that correspond to the file `leave.txt`
 - Start reading or writing as needed
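
The walk along the path can be imitated from user space. The operating system does this internally when `open` is called, so the following is only an illustrative sketch: `resolution_steps` is a hypothetical helper that looks up the inode number of every component and checks whether the current user may traverse it (directories) or read it (the final file).

```python
import os
import tempfile
from pathlib import Path


def resolution_steps(path):
    """List (path, inode number, accessible) for every component leading to path."""
    resolved = Path(path).resolve()
    # parents runs from the closest directory up to the root, so reverse it
    components = list(resolved.parents)[::-1] + [resolved]
    steps = []
    for component in components:
        info = os.stat(component)
        # Directories need search (execute) permission; the file itself needs read permission
        needed = os.X_OK if component.is_dir() else os.R_OK
        steps.append((str(component), info.st_ino, os.access(component, needed)))
    return steps


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp, "root", "branch_1", "leave.txt")
    target.parent.mkdir(parents=True)
    target.write_text("demo")
    resolution = resolution_steps(target)

for step_path, inode, allowed in resolution:
    print(inode, allowed, step_path)
```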
 
While the file remains "open" in your program, the program keeps in main memory a pointer to that file on disk, plus some buffers/caches that speed up and optimize access to the file.

That's why it is **very important** to close files when you finish operating on them.
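
A small experiment makes the buffering visible. It is a sketch under default buffering settings: right after `write`, the data usually still lives in an in-memory buffer, and it reaches the file only when the buffer is flushed, which `close` does.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "buffered.txt")

    handle = open(path, "w")
    handle.write("hello")
    # With default buffering the text is typically not on disk yet
    size_while_open = os.path.getsize(path)

    handle.close()  # flushes the buffer and releases the file descriptor
    size_after_close = os.path.getsize(path)

print("while open :", size_while_open, "bytes on disk")
print("after close:", size_after_close, "bytes on disk")
```

If the program ends or crashes before the buffer is flushed, the pending data can be lost, and every file left open also keeps an operating system file descriptor in use.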


In [13]:
file = open('file.txt','r')
data = file.read()

print(data)
print("doing something more")
print("...and forgot closing...")



file.close()


Esto es todo amigos

doing something more
...and forgot closing...


In [None]:
# The with-block closes the file automatically, even if an exception is raised inside it
with open('file.txt', 'r') as file:
    data = file.read()
print(data)
print("doing something more")
print("...but file was already closed...")


# Compression

Compression is possible thanks to the redundancy inherent in data.

A lossless compression algorithm analyzes the input data and tries to remove redundancy, storing only the information needed to reconstruct the original exactly.

**Internet Slang compression example:**
 * By the way, for your information: As far as I know, if you read the fine manual as soon as possible you'll understand.
 * BTW, FYI: AFAIK, if you RTFM ASAP you'll understand.
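
The role of redundancy can be checked with the standard-library `zlib` module (the same DEFLATE algorithm used by gzip). This sketch compresses a highly repetitive byte string and a random one of the same length; both sample inputs are made up for the demonstration.

```python
import os
import zlib

redundant = b"by the way, for your information. " * 100
random_bytes = os.urandom(len(redundant))  # essentially no redundancy to exploit

compressed_redundant = zlib.compress(redundant)
compressed_random = zlib.compress(random_bytes)

# Lossless: decompressing gives back exactly the original bytes
assert zlib.decompress(compressed_redundant) == redundant
assert zlib.decompress(compressed_random) == random_bytes

print(len(redundant), "->", len(compressed_redundant), "bytes (redundant text)")
print(len(random_bytes), "->", len(compressed_random), "bytes (random bytes)")
```

The repetitive input shrinks to a small fraction of its size, while the random input does not shrink at all.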

**Pros and cons**
 * Pros:
   * Less space used on disk
   * Faster data transfers

 * Cons:
   * Higher CPU utilization
   * Increased time to access the compressed data

Using compression to store data is a trade-off. Before deciding, we should consider:
 * How compressible the data is (compression ratio)
 * How often the data will have to be accessed or modified
 * Whether to compress all the data into one big file or divide it into chunks
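
The last point can be explored with a small sketch. The record contents and the chunk size of 1,000 records are arbitrary assumptions: one big compressed blob has to be decompressed in full to reach any record, whereas independent chunks let the program decompress only the piece it needs, typically at the cost of a somewhat worse overall ratio.

```python
import zlib

lines = [f"record {i}: some repetitive payload\n".encode() for i in range(10_000)]
chunk_size = 1_000  # records per chunk (an arbitrary choice)

# Option A: one big compressed blob
single_blob = zlib.compress(b"".join(lines))

# Option B: independently compressed chunks
chunks = [
    zlib.compress(b"".join(lines[start:start + chunk_size]))
    for start in range(0, len(lines), chunk_size)
]


def read_record(index):
    """Decompress only the chunk that holds the requested record."""
    chunk = zlib.decompress(chunks[index // chunk_size])
    return chunk.splitlines(keepends=True)[index % chunk_size]


print("single blob:", len(single_blob), "bytes")
print("chunked    :", sum(len(c) for c in chunks), "bytes in", len(chunks), "chunks")
print(read_record(4321))
```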

Most common compression algorithms:

 * **Gzip** (1992 - most popular, very low memory footprint)
 * **Bzip2** (1996 - better compression, CPU intensive, more memory)
 * **Zstandard** (2015 - better compression than gzip and much faster, especially when decompressing; memory intensive)

In [14]:
import gzip
import bz2
import zstandard as zstd

data = """
Demasiadas palabras y muy poco espacio en disco.

    Vamos a guardar esto comprimido.

Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Nunc fringilla sagittis ante. Sed sed nisl facilisis, eleifend 
libero sit amet, sollicitudin risus. Nunc ex urna, pellentesque 
quis tincidunt quis, sagittis ut lacus. Praesent cursus semper enim,
et varius augue sollicitudin nec. Nam quis neque in nunc tristique 
hendrerit sit amet sit amet quam. Nam interdum ligula 
sit amet tincidunt aliquet. Pellentesque vehicula tristique luctus.
Donec nisi nunc, hendrerit et scelerisque ac, gravida sed metus.   
"""

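# Plain, uncompressed copy for size comparison
with open("compress.txt", "w") as file:
    file.write(data)

# The compressed files are opened in binary mode, so the text is encoded to bytes first
with gzip.open("compress.gz", "wb") as file:
    file.write(data.encode())

with bz2.open("compress.bz", "wb") as file:
    file.write(data.encode())

with zstd.open("compress.zstd", "wb") as file:
    file.write(data.encode())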

In [15]:
!ls -l compress.*
!file compress.*

-rw-r--r-- 1 ruben users 379 Mar 27 09:55 compress.bz
-rw-r--r-- 1 ruben users 362 Mar 27 09:55 compress.gz
-rw-r--r-- 1 ruben users 603 Mar 27 09:55 compress.txt
-rw-r--r-- 1 ruben users 349 Mar 27 09:55 compress.zstd
compress.bz:   bzip2 compressed data, block size = 900k
compress.gz:   gzip compressed data, was "compress", last modified: Mon Mar 27 07:55:54 2023, max compression, original size modulo 2^32 603
compress.txt:  ASCII text
compress.zstd: Zstandard compressed data (v0.8+), Dictionary ID: None


In [18]:
with gzip.open("compress.gz", "r") as file:
    output = file.read()
print(output.decode())


Demasiadas palabras y muy poco espacio en disco.

    Vamos a guardar esto comprimido.

Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Nunc fringilla sagittis ante. Sed sed nisl facilisis, eleifend 
libero sit amet, sollicitudin risus. Nunc ex urna, pellentesque 
quis tincidunt quis, sagittis ut lacus. Praesent cursus semper enim,
et varius augue sollicitudin nec. Nam quis neque in nunc tristique 
hendrerit sit amet sit amet quam. Nam interdum ligula 
sit amet tincidunt aliquet. Pellentesque vehicula tristique luctus.
Donec nisi nunc, hendrerit et scelerisque ac, gravida sed metus.   



# File System Limitations
 - Storage is not structured/ordered
     - You can store any kind of data of any size in any file
     - You can even store data files and folders containing other types of data in the same folder
     - Structuring and ordering the data is up to the user
 - Storage is not indexed
     - Searching the FS tree means walking through all the directories and files
 - Consistency and duplication
     - Users can store the same data in different locations, or even contradictory data
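
The cost of the missing index can be illustrated with a short sketch (both helper names are hypothetical). `find_by_walking` visits every directory on each search, while `build_index` walks the tree once and answers later lookups from a dictionary. Keeping such an index up to date when files change is left to the user.

```python
import os
import tempfile
from collections import defaultdict


def find_by_walking(root, name):
    """Search by visiting every directory: the cost grows with the size of the tree."""
    return sorted(
        os.path.join(current, name)
        for current, _dirs, files in os.walk(root)
        if name in files
    )


def build_index(root):
    """Walk the tree once and remember where every file name lives."""
    index = defaultdict(list)
    for current, _dirs, files in os.walk(root):
        for filename in files:
            index[filename].append(os.path.join(current, filename))
    return index


with tempfile.TemporaryDirectory() as tmp:
    # Small made-up tree with two files that share a name
    for relative in ("a/leave.txt", "a/b/leave.txt", "c/leave.doc"):
        target = os.path.join(tmp, *relative.split("/"))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        open(target, "w").close()

    walked = sorted(os.path.relpath(p, tmp) for p in find_by_walking(tmp, "leave.txt"))
    index = build_index(tmp)
    indexed = sorted(os.path.relpath(p, tmp) for p in index["leave.txt"])

print(walked)
print(indexed)
```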
 
## One big file or many little files?
 - If I store all my data in one big file, I will need to load the whole file every time I need to read something... and my computer may crash
 - But if I store it in many little files and folders... accessing many of them will slow everything down

The answer: structured data, data storage/formatting libraries, and databases
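
As one small illustration of what structure buys, consider a single file made of fixed-size records. The layout below (a 4-byte integer id followed by an 8-byte name) is a made-up assumption for this sketch, not a standard format. Because every record has the same size, a program can jump straight to record `n` with `seek` instead of loading the whole file.

```python
import os
import struct
import tempfile

RECORD = struct.Struct("<i8s")  # hypothetical layout: 4-byte int id + 8-byte name


def write_records(path, records):
    """Write (id, name) pairs as fixed-size binary records."""
    with open(path, "wb") as handle:
        for record_id, name in records:
            handle.write(RECORD.pack(record_id, name.encode()))


def read_record(path, index):
    """Read one record by jumping to its offset instead of loading the whole file."""
    with open(path, "rb") as handle:
        handle.seek(index * RECORD.size)
        record_id, raw_name = RECORD.unpack(handle.read(RECORD.size))
    return record_id, raw_name.rstrip(b"\x00").decode()


with tempfile.TemporaryDirectory() as tmp:
    data_file = os.path.join(tmp, "records.bin")
    write_records(data_file, [(i, f"item{i}") for i in range(1000)])
    file_size = os.path.getsize(data_file)
    third = read_record(data_file, 2)
    last = read_record(data_file, 999)

print(file_size, third, last)
```

Data formatting libraries and databases generalize this idea with variable-size records, indexes, and consistency checks.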