# Learning Objectives

You will be able to:
* Distinguish file archiving from file compression
* Create archive files and unpack them.
* Apply commands to compress, decompress, and extract files from archives. 

# Archiving and compression

* Archiving and compression are distinct processes, which are usually combined. 
* Archiving is the process of storing information that you don’t use regularly but want to preserve. 
* An **“archive file”** is a collection of data files and directories that are stored as a single file.
* Archiving makes the collection more portable and serves as a backup in case of loss or corruption.
* File compression, on the other hand, involves reducing the size of a file by taking advantage of redundancy in its information content.
* The main advantages of compression include saving storage space, speeding up file transfers, and reducing bandwidth loads.
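How well compression works depends on how much redundancy the data contains. Here is a minimal sketch, assuming a Unix-like shell with `gzip`, `head`, and `/dev/urandom` available. The script compresses a highly repetitive text file and a file of random bytes, then prints their sizes. The file names are hypothetical, and exact sizes will vary by gzip version.

```shell
#!/bin/sh
set -e
# Work in a scratch directory so none of your own files are touched
cd "$(mktemp -d)"

# A file with lots of redundancy: the same line repeated 10,000 times
yes "week1 notes: review the same topic again" | head -n 10000 > repetitive.txt
# A file with almost no redundancy: 100,000 random bytes
head -c 100000 /dev/urandom > random.bin

# Compress both files, keeping the originals (-c writes to standard output)
gzip -c repetitive.txt > repetitive.txt.gz
gzip -c random.bin > random.bin.gz

# Compare byte counts before and after compression
for f in repetitive.txt repetitive.txt.gz random.bin random.bin.gz; do
    printf '%-20s %s bytes\n' "$f" "$(wc -c < "$f" | tr -d ' ')"
done
```

The repetitive file should shrink dramatically. The random file does not shrink at all and typically grows by a few bytes of gzip overhead.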

# Directory tree archiving

![image.png](attachment:359f31f1-0ba4-4ca2-9f76-e6f3fdc24196.png)

Now, suppose you have created a **“notes”** directory for keeping track of your course materials.

You decide it would be a good idea to archive your notes in case you need them in the future.

Your **“notes”** directory tree has the following structure:
* It contains two subfolders, **“math”** and **“physics”**.
* Each subfolder contains two files with the same names, **“week1”** and **“week2”**.

Using the `ls` command with the `-R` option (capital R), you can recursively list all the directories and files in the current directory tree. Lowercase `-r` does something different: it reverses the sort order.

You can see how the output corresponds to the graphical representation of the tree: the parent **“notes”** directory, the **“math”** and **“physics”** subdirectories, and the **“week1”** and **“week2”** files within each of them.
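To follow along, you can build the same **notes** tree yourself. This sketch assumes a POSIX shell. The file contents are placeholders, and the script works inside a fresh temporary directory (created with `mktemp -d`) so an existing **notes** folder is not touched.

```shell
#!/bin/sh
set -e
# Use a scratch directory so an existing "notes" folder is not overwritten
cd "$(mktemp -d)"

# Build the notes tree: two subject folders, each holding week1 and week2
mkdir -p notes/math notes/physics
for subject in math physics; do
    for week in week1 week2; do
        echo "$subject $week notes" > "notes/$subject/$week"
    done
done

# -R (capital R) lists the tree recursively
ls -R notes
```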

# File archiving and compression

**`tar`**:
* Stands for **Tape Archiver**.
* Archive & extract files.

You can use the **“tar”** or **“tape archiver”** command to **archive** and **de-archive** files and directories.

A popular term for an archived tar file is a **tarball**.

![image.png](attachment:d9262af8-3e73-4552-8695-8beafa12b718.png)

**`tar -cf notes.tar notes`**:
* Archives your entire **notes** directory, including its subdirectories and all the files in them, into a single archive file called **notes.tar**. The archive name comes first, followed by the file or directory you want to archive, which is **notes**.
* The `c` option means **create a new archive**.
* The `f` option tells tar to use the archive file named right after it (**notes.tar**) rather than tar's built-in default, which is often standard input/output (historically, a tape drive).
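Here is a minimal sketch of the archiving step, assuming GNU tar or bsdtar. The script recreates the same placeholder **notes** tree so it runs on its own. Adding `v` (as in `tar -cvf`) would also print each path as it is archived.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
# Recreate the example tree (file contents are placeholders)
mkdir -p notes/math notes/physics
for subject in math physics; do
    for week in week1 week2; do
        echo "$subject $week notes" > "notes/$subject/$week"
    done
done

# c = create a new archive, f = write it to the file named next (notes.tar)
tar -cf notes.tar notes

# Both the original folder and the new archive are present
ls
```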

Entering `ls` shows that your current directory now contains the original notes folder as well as the **notes.tar** archive file.

**`tar -czf notes.tar.gz notes`**:
* Archives and compresses in one step: the `z` option filters the archive through the **gzip** compression program.
* tar does not add the **“.gz”** suffix for you, so name the output **notes.tar.gz** yourself. The suffix helps other programs, for example Windows-based tools, recognize the file type correctly.

Entering `ls` now shows the compressed **notes.tar.gz** file that you created.
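The compressed variant can be sketched the same way, again assuming GNU tar or bsdtar with gzip available. The output name is spelled out in full as **notes.tar.gz**, and `gzip -t` then checks that the result is valid gzip data.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
# Recreate the example tree (file contents are placeholders)
mkdir -p notes/math notes/physics
for subject in math physics; do
    for week in week1 week2; do
        echo "$subject $week notes" > "notes/$subject/$week"
    done
done

# c = create, z = filter through gzip, f = write to notes.tar.gz
tar -czf notes.tar.gz notes

# gzip -t tests the compressed file's integrity without extracting it
gzip -t notes.tar.gz && echo "notes.tar.gz is a valid gzip file"
ls
```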

# Checking your archive contents

**`tar -tf notes.tar`**:
* The `-t` option lists all the files and directories in your tarball, that is, **notes.tar**.
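Here is a short sketch of inspecting an archive without extracting it, assuming GNU tar or bsdtar and the same placeholder tree. `-t` prints one path per entry, directories included. Adding `v` gives a long listing similar to `ls -l`.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
# Recreate the example tree and archive it (contents are placeholders)
mkdir -p notes/math notes/physics
for subject in math physics; do
    for week in week1 week2; do
        echo "$subject $week notes" > "notes/$subject/$week"
    done
done
tar -cf notes.tar notes

# t = list the archive's contents; nothing is written to disk
tar -tf notes.tar
# Adding v shows permissions, owner, size, and date for each entry
tar -tvf notes.tar
```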

![image.png](attachment:3f3e8376-6d16-4ee1-aac8-de81ad8d89f2.png)

# Extracting archived files

![image.png](attachment:fa345981-2202-460a-82db-d99bddbf27c1.png)

**`tar -xf notes.tar notes`**:
* The `-x` option tells tar to extract files and directories from the tarball. They are written into the current directory unless you choose another one with `-C DIR`.
* The trailing **notes** names the archive member to extract. Because it is a directory, everything under it is extracted too. Omitting it extracts the whole archive.

Now, if you enter **`ls -R`**, you can see that the archive has been unpacked into a parent folder called **notes**. It contains the subfolders **“math”** and **“physics”**, along with the **“week1”** and **“week2”** files you started with.

This verifies that the original structure of your **“notes”** directory is intact.
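To check that extraction really reproduces the original tree, you can unpack into a separate directory and compare the two. This sketch uses `-C`, which tells tar to change into the given directory before extracting. It also uses `diff -r`, which prints nothing when two trees match. The **restored** directory name is hypothetical.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
# Recreate the example tree and archive it (contents are placeholders)
mkdir -p notes/math notes/physics
for subject in math physics; do
    for week in week1 week2; do
        echo "$subject $week notes" > "notes/$subject/$week"
    done
done
tar -cf notes.tar notes

# Extract into a fresh directory so the result can be compared with the original
mkdir restored
# x = extract; -C switches into "restored" first; trailing "notes" selects that member
tar -xf notes.tar -C restored notes

# No output from diff -r means structure and contents are identical
diff -r notes restored/notes && echo "structure and contents match"
ls -R restored
```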


# Decompressing and extracting archives

**`tar -xzf notes.tar.gz notes`**:
* Decompresses **notes.tar.gz** with gzip (the `z` option), then extracts the **notes** directory and its contents into the current directory.
* Again, by entering `ls -R`, you can see that the directories and files have been unpacked as expected.
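The same round trip works for the compressed tarball. This sketch assumes GNU tar or bsdtar. Recent versions of both also detect gzip compression automatically when extracting, so `tar -xf notes.tar.gz` usually works as well, but `z` makes the intent explicit. The **restored** directory name is hypothetical.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
# Recreate the example tree and build a compressed tarball (contents are placeholders)
mkdir -p notes/math notes/physics
for subject in math physics; do
    for week in week1 week2; do
        echo "$subject $week notes" > "notes/$subject/$week"
    done
done
tar -czf notes.tar.gz notes

mkdir restored
# x = extract, z = decompress with gzip first, f = read from notes.tar.gz
tar -xzf notes.tar.gz -C restored notes

# Confirm the unpacked tree matches the original
diff -r notes restored/notes && echo "structure and contents match"
```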

![image.png](attachment:75665744-7bcc-4cae-8f0b-3abd07987360.png)

# File compression and archiving

![image.png](attachment:6cf0546b-8510-4a0d-948d-d238efac9364.png)

You can use the **zip** command to compress files and directories and package them into a single archive.
* Notice the order of operations that **zip** implements.
* **zip** **compresses each file individually before bundling them**, whereas tar's `-z` option bundles first and then applies **“gzip”** to the entire **tarball**.

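To compress your `notes` directory and package it into a zip file, enter `zip -r notes.zip notes`. The `-r` option makes **zip** recurse into the directory. Without it, only an empty **notes/** entry is stored.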

After entering `ls`, you can see that the **notes.zip** archive has been created.

# Extracting and decompressing archives

![image.png](attachment:86ff69f7-3987-44fc-ae7b-003c1acc093b.png)

The **unzip** command, as you might guess, extracts compressed files from a zip archive and decompresses them.
* To **unzip** your **notes.zip** file, simply enter `unzip notes.zip`. 
* After entering **`ls -R`**, you can see that **unzip** has created your **notes** folder, and unpacked your directories and your **“week1”** and **“week2”** files, as expected.

# Recap

You learned that:
* The main advantages of file compression include saving storage space, speeding up file transfers, and reducing bandwidth loads.
* You can use the **“zip”** command to compress files and directories and package them into a single archive of compressed files.
* You can use **tar** to archive files and directories into a tarball and optionally apply **gzip** compression to the **tarball** file.
* You can use **unzip** to unpack and decompress a zipped archive. 
* Finally, you can use **tar** to decompress and unpack a **tar.gz** archive.