# Files, file systems & Bash

This notebook contains exercises for learning about _files_ and _file systems_. The code cells will execute commands in the _Bourne Again SHell_, known simply as `bash`.

__NB:__ These operations may be more informative/natural to perform in a terminal console!

# Files

Files can be either _textual_ or _binary_ (there are cases where both are mixed, but this is rare). The former you can read yourself; the latter is better left to the computer!

All files are just a sequence of bits (0s and 1s) saved to some _persistent medium_. Bits are grouped together in units of 8 bits: this unit is known as a "_byte_".
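As a small illustration of the bit/byte relationship, the sketch below counts the bytes in a short ASCII string and converts that count to bits. The string `hello` is just an example; `wc -c` counts bytes and `od -An -tx1` prints each byte as two hexadecimal digits.

```shell
# Count the bytes in a five-character ASCII string (printf adds no newline).
n_bytes=$(printf 'hello' | wc -c)
n_bytes=$((n_bytes))        # normalise the number (some wc versions pad it with spaces)
n_bits=$((n_bytes * 8))     # 8 bits per byte
echo "$n_bytes bytes = $n_bits bits"

# Show each byte as two hexadecimal digits: 68 65 6c 6c 6f
printf 'hello' | od -An -tx1
```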

## File system

A file system is a system of files, organised in folders, stored on some medium that retains its bytes even when the power is cut (within some limits). We often distinguish between:

* local file system: the medium is built into the computer in question
* removable file system: _e.g._ a USB hard- or thumb drive
* network file system: non-local, accessed via a network, physically resides on _some other computer_
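To see which file system a given folder lives on, something like the following may help. It is only a sketch, assuming the standard `df` utility is available; the output differs from machine to machine, so none is shown here.

```shell
# Report the file system that holds the current working directory.
# -P asks for the portable (POSIX) output format: a header plus one line per file system.
fs_info=$(df -P . | tail -n 1)
echo "$fs_info"
```

The first column typically names the device (or, for a network file system, the remote host and path), and the last column shows where it is mounted.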

# Directory navigation & file manipulation

In [None]:
ls

In [None]:
pwd

In [None]:
cd ..

In [None]:
pwd

In [None]:
cd exercises

In [None]:
pwd
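The cells above can also be tried in a throwaway folder, which avoids depending on where the notebook happens to be started. This is a minimal sketch: the `sandbox` variable and the freshly created `exercises` sub-folder are illustrative assumptions, not part of the course material.

```shell
# Work in a temporary directory so nothing in the notebook folder is touched.
sandbox=$(mktemp -d)
mkdir "$sandbox/exercises"     # hypothetical sub-folder, named like the one above

cd "$sandbox/exercises"
inside=$(pwd -P)               # absolute path of the sub-folder
cd ..
parent=$(pwd -P)               # one level up
echo "was in: $inside"
echo "now in: $parent"

cd exercises                   # a relative path is resolved from the current directory
ls -a                          # even an empty folder lists '.' and '..'
```

Here `..` always means "the parent of the current directory", which is why `cd ..` followed by `cd exercises` ends up back where it started.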

## What type is file X?

Note that the _suffix_ of a file does not necessarily carry _any_ information on its contents! Some suffixes, such as `.txt`, are considered "standardised", but the operating system does not enforce any rules on the naming of files.
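A quick way to convince yourself of this is to save plain text under a misleading name. The file names below are made up for the illustration, and the call to `file` is guarded in case the utility is not installed.

```shell
sandbox=$(mktemp -d)

# Plain text, deliberately saved under a misleading (hypothetical) name.
printf 'just some text\n' > "$sandbox/not_really_an_image.jpg"

# Copying it under another suffix changes the name only; the bytes stay the same.
cp "$sandbox/not_really_an_image.jpg" "$sandbox/notes.txt"
if cmp -s "$sandbox/not_really_an_image.jpg" "$sandbox/notes.txt"; then
    same_contents=yes
else
    same_contents=no
fi
echo "identical contents: $same_contents"

# If available, 'file' inspects the contents rather than trusting the suffix.
if command -v file >/dev/null 2>&1; then
    file "$sandbox/not_really_an_image.jpg"
fi
```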

In [None]:
file short_textual_file.xyz

In [1]:
file long_textual_file.xyz

long_textual_file.xyz: UTF-8 Unicode text


## ASCII and Unicode

ASCII stands for American Standard Code for Information Interchange. Computers can only understand numbers, so an ASCII code is the numerical representation of a character, such as 'a' or '@', or of an action of some sort (a control code). See more information [on Wikipedia](https://en.wikipedia.org/wiki/ASCII).
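The mapping between characters and numbers can be explored directly in the shell. In this sketch, `printf` is given an argument that starts with a quote, which makes it print the numeric code of the character that follows; the variable names are arbitrary.

```shell
# A leading quote in the argument makes printf print the character's numeric code.
code_decimal=$(printf '%d' "'a")    # 97
code_hex=$(printf '%x' "'a")        # 61
at_decimal=$(printf '%d' "'@")      # 64
echo "a -> $code_decimal (0x$code_hex), @ -> $at_decimal"

# Going the other way: octal 141 is decimal 97, so this prints the letter a.
letter=$(printf '\141')
echo "$letter"
```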

Unicode...

## Displaying the contents of a textual file in bash

It is often most efficient to view the contents of small files in the terminal rather than in a graphical text editor.

In [2]:
cat short_textual_file.xyz

# The hash-sign is often considered a "comment"

Lines without hash-signs may carry important information, such as: 42

Encoding of "non-ASCII" characters can sometimes be challenging: å, æ, ø, ä, ö.
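One reason non-ASCII characters can be troublesome is that, in UTF-8, they take more than one byte each. The sketch below compares `a` with `å`; to keep the example independent of the terminal's own encoding, `å` is written out explicitly as its two UTF-8 bytes in octal notation.

```shell
# 'a' is an ASCII character: a single byte.
ascii_bytes=$(printf 'a' | wc -c)

# 'å' (U+00E5) written explicitly as its two UTF-8 bytes, 0xC3 0xA5, in octal.
utf8_bytes=$(printf '\303\245' | wc -c)

echo "a: $((ascii_bytes)) byte, å: $((utf8_bytes)) bytes"

# The two bytes in hexadecimal: c3 a5
printf '\303\245' | od -An -tx1
```

A program that assumes one byte per character would show `å` as two unrelated symbols.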


For longer files, the `less` command is very handy.

__NB:__ This will not work in the notebook; it must be run in a terminal!

Some navigation commands include:

* [arrow keys]: move up/down a line at a time
* [space or f]: move down (forward) a page
* [b]: move up (back) a page
* [q]: quit viewing the file
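Since `less` is interactive, a non-interactive way to peek at a long file from within the notebook is to print only its beginning or end with `head` and `tail`. This is a suggested alternative rather than part of the original exercises, and the generated file below is a stand-in for a real long file.

```shell
sandbox=$(mktemp -d)
long_file="$sandbox/long_example.txt"   # hypothetical file name

# Generate 100 numbered lines to stand in for a long textual file.
i=1
while [ "$i" -le 100 ]; do
    echo "line $i"
    i=$((i + 1))
done > "$long_file"

total_lines=$(wc -l < "$long_file")
first_lines=$(head -n 3 "$long_file")   # the first three lines
last_lines=$(tail -n 2 "$long_file")    # the last two lines

echo "total lines: $((total_lines))"
echo "$first_lines"
echo "..."
echo "$last_lines"
```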

## Binary

The term "binary" simply refers to the fact that data in the file is written as a _sequence of bits_. To contrast this to a textual file, we could say:

> Textual files have a pre-defined encoding of the bytes they contain, _e.g._, as ASCII characters ('a' is saved as the byte `0x61`, which in decimal numbers equates to 97).

> To extract data from a binary file, it is necessary to know how the bytes were encoded. Otherwise it's gibberish!
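
To make the role of the encoding concrete, the sketch below reads the same two bytes in two different ways: once as ASCII text and once as plain unsigned integers. The byte values are chosen for the illustration only.

```shell
# The two bytes 0x34 0x32 (octal 064 062), read as ASCII characters:
as_text=$(printf '\064\062')

# The same two bytes, read as unsigned one-byte integers:
as_numbers=$(printf '\064\062' | od -An -tu1)

echo "as text:    $as_text"
echo "as numbers:" $as_numbers
```

Read as text the bytes spell `42`; read as numbers they are 52 and 50. Neither reading is "the" correct one until you know how the file was written.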

In [1]:
# Printing raw binary data to the output kills the notebook :(
# Perhaps it is blocked as a security risk (arbitrary code execution)?
#cat binary_image_file.jpg
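A gentler way to look inside a binary file is to print a few of its bytes as hexadecimal numbers instead of sending the raw bytes to the screen. The sketch below uses a tiny stand-in file rather than the real image: it contains only the four bytes that JPEG (JFIF) files typically start with, followed by two arbitrary bytes.

```shell
sandbox=$(mktemp -d)

# Stand-in for a real image (hypothetical file): FF D8 FF E0, then two arbitrary bytes.
printf '\377\330\377\340\000\020' > "$sandbox/fake_image.jpg"

# Dump only the first 4 bytes (-N 4) as hexadecimal (-tx1), without offsets (-An).
first_bytes=$(od -An -tx1 -N 4 "$sandbox/fake_image.jpg")
echo "first bytes:" $first_bytes
```

The same `od` call can be pointed at a real binary file; limiting the byte count with `-N` keeps the output short.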



### Header files

Header files are usually textual files that define the encoding of a binary data file.
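As a rough sketch of the idea, the example below stores three numbers in a binary file and describes them in a separate textual header. The header format (`key=value` lines), the keys and the file names are all invented for this illustration; real header formats differ from one application to another.

```shell
sandbox=$(mktemp -d)

# Hypothetical binary data file: three one-byte unsigned integers (10, 20, 30).
printf '\012\024\036' > "$sandbox/data.bin"

# Hypothetical textual header describing how data.bin is encoded.
cat > "$sandbox/data.hdr" <<'EOF'
bytes_per_value=1
number_of_values=3
EOF

# Read the header (here simply by sourcing it as shell variable assignments) ...
. "$sandbox/data.hdr"

# ... and use it to decode the binary file as unsigned integers of the stated size.
values=$(od -An -v -tu"$bytes_per_value" \
            -N "$((bytes_per_value * number_of_values))" "$sandbox/data.bin")
echo "decoded values:" $values
```

Without the header, nothing in `data.bin` itself says whether its bytes are three small integers, part of a larger number, or text.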