# 2. Working with files and directories

<hr>
<center>This is part 2 of 5 of an <a href="00-unix-intro.ipynb" target="_blank">introduction to Unix</a>; it assumes <a href="01-unix-intro.ipynb" target="_blank">part 1</a> has been completed first.</center>
<hr>

> **Things covered here:**
> * Working with files
> * What a plain-text file is
> * Working with directories

---

**To be sure we are starting in the right place, let's run this first (and if you need to restart your kernel, you will need to re-run this command):**

In [None]:
cd ~/unix_intro

---

## Working with plain-text files and directories
Next we're going to look at some more ways to learn about and manipulate files and directories at the command line.

### Working with files
We will often want to get a look at a file to see how it's structured or what's in it. We've already used one very common tool for peeking at files, the `head` command. There is also `tail`, which prints the last 10 lines of a file by default:

In [None]:
head example.txt

In [None]:
tail example.txt

Programs like these can be especially helpful if a file is particularly large, as `head` will just print the first 10 lines and stop. This means it will be just as instantaneous whether the file is 10 kB or 10 GB.
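As a minimal sketch of controlling how many lines are printed, both commands accept `-n` followed by a number. The scratch file `numbers.txt` and the temporary directory are hypothetical stand-ins so that "example.txt" is left alone, and the `>` used to create the file is a redirector (covered in part 3):

```shell
# Hypothetical scratch file in a temporary directory.
workdir=$(mktemp -d)
seq 1 100 > "$workdir/numbers.txt"   # 100 lines: 1, 2, ..., 100

# -n sets how many lines to print instead of the default 10.
first_three=$(head -n 3 "$workdir/numbers.txt")
last_two=$(tail -n 2 "$workdir/numbers.txt")

echo "$first_three"
echo "$last_two"
```

Here `head -n 3` stops after the third line, and `tail -n 2` prints only the final two.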

Another standard, useful program for viewing the contents of a file is `less`, which opens the file in a searchable, read-only viewer that lets us scroll through it. But `less` doesn't work in a notebook setting like this, so we'll try it out when we switch to a terminal later.

The `wc` command (for **w**ord **c**ount) is useful for counting how many lines, words, and characters (bytes, by default) there are in a file:

In [None]:
wc example.txt

<div class="alert alert-block alert-success">
<center><b>QUICK PRACTICE!</b></center>

In the above open code cell, try to find an option/argument that will let us get <i>only</i> the number of lines in a file from the `wc` command.

<details>
<summary><b>Solution</b></summary>

<br>

`wc -l example.txt`

Adding the optional flag `-l` prints only the number of lines in a file. We could find this by looking through the help menu printed by `wc --help`, or by visiting our good friend Google 🙂

</details>
</div>
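To see the individual counts side by side, here is a small sketch using the `-l`, `-w`, and `-c` flags (lines, words, and bytes). The file `sample.txt` is a hypothetical scratch file, and the `>` and `<` symbols are redirectors from part 3, used here only to create the file and to keep the file name out of the output:

```shell
# Hypothetical scratch file: 2 lines, 3 words, 14 bytes.
workdir=$(mktemp -d)
printf 'one two\nthree\n' > "$workdir/sample.txt"

# With no flags, wc prints lines, words, and bytes, then the file name.
wc "$workdir/sample.txt"

# Each flag asks for a single count.
line_count=$(wc -l < "$workdir/sample.txt")
word_count=$(wc -w < "$workdir/sample.txt")
byte_count=$(wc -c < "$workdir/sample.txt")

echo "lines:" $line_count "words:" $word_count "bytes:" $byte_count
```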

Common command-line tools like these, and many others we'll see, are mostly only useful for operating on what are known as **plain-text files**, also referred to as "flat files".

#### BONUS ROUND: What's a plain-text file?
A general definition of a plain-text file is a text file that doesn't contain any special formatting characters or information, and that can be properly viewed and edited with any standard text editor.  

Common types of plain-text files are those ending with extensions like ".txt", ".tsv" (**t**ab-**s**eparated **v**alues), or ".csv" (**c**omma **s**eparated **v**alues). 

Some examples of common file types that are *not* plain-text files would be ".docx", ".pdf", or ".xlsx". This is because those file formats contain special types of compression and formatting information that are only interpretable by programs specifically designed to work with them.

<div class="alert alert-block alert-info">
    <b>A note on file extensions</b>
    <br>
    File extensions themselves do not actually do anything to the file format. They are <i>mostly</i> there just for our convenience/organization – "mostly" because some programs require a specific extension to be present for them to even try interacting with a file. But this has nothing to do with the file contents, just that the program won't let us interact with it unless it has a specific extension.
</div>

---

### Copying, moving, and renaming files

<div class="alert alert-block alert-warning">
    <b>WARNING!</b>
    <br>
Using commands that do things like create, copy, and move files at the command line will overwrite files if they have the same name in the same location. And using commands that delete things will do so permanently. Use caution while getting used to things – and then forever after 🙂
</div>
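The overwrite behavior in the warning can be sketched as follows, using the `cp` command introduced in the next subsection. The file names and temporary directory are hypothetical, and the `if [ -e ... ]` check is just one possible way to be careful, not the only one:

```shell
# Two hypothetical scratch files in a temporary directory.
workdir=$(mktemp -d)
echo "new results" > "$workdir/new.txt"
echo "old results" > "$workdir/old.txt"

# cp replaces the existing old.txt without asking.
cp "$workdir/new.txt" "$workdir/old.txt"
overwritten=$(head -n 1 "$workdir/old.txt")
echo "old.txt now contains: $overwritten"

# One way to be careful: only copy if nothing exists at the destination yet.
target="$workdir/old.txt"
if [ -e "$target" ]; then
    guard_message="refusing to overwrite $target"
else
    cp "$workdir/new.txt" "$target"
    guard_message="copied to $target"
fi
echo "$guard_message"
```

The original "old results" text is gone after the first `cp`, with no warning and no way to get it back.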

#### `cp`

The commands `cp` and `mv` (**c**o**p**y and **m**o**v**e) have the same basic structure. They both require two positional arguments – the first is the file we want to act on, and the second is where we want it to go (which can include the name we want to give it). 

To see how this works, let's make a copy of "example.txt". Here's what is in our current working directory initially:

In [None]:
ls

Here is how we can make a copy. Whatever we give as the second positional argument becomes the name of the new copy:

In [None]:
cp example.txt example_copy.txt

And then we can see that copy was made by running `ls` again:

In [None]:
ls

By just giving the second argument a file name and nothing else (meaning no path or directories in front of the name), we are implicitly saying we want it copied to where we currently are. 

To make a copy and put it somewhere else, like in our subdirectory "data", we could change the second positional argument using a **relative path** ("relative" because it starts from where we currently are).

Let's look at what's in the data sub-directory, then let's make a copy of "example.txt" in that sub-directory:

In [None]:
ls data

In [None]:
cp example.txt data/example_copy.txt

In [None]:
ls data

To copy it to that sub-directory but keep the same name, we could type the whole name out, but we can also just provide the directory and leave off the file name, and it will automatically keep the same original name:

In [None]:
cp example.txt data/

In [None]:
ls data
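The three forms of the second argument used so far can be compared in one place. This is a sketch in a hypothetical temporary directory with stand-ins for "example.txt" and "data"; the parentheses run the commands in a sub-shell so that the `cd` does not change where we are afterwards, and `mkdir` is introduced later in this notebook:

```shell
workdir=$(mktemp -d)
(
    cd "$workdir" || exit 1
    mkdir data                              # stand-in for the "data" sub-directory
    echo "some text" > example.txt          # stand-in for example.txt

    cp example.txt example_copy.txt         # new name, current directory
    cp example.txt data/example_copy.txt    # new name, inside data/
    cp example.txt data/                    # directory only: keeps the original name

    ls . data
)
```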

If we want to copy something *from somewhere else to our current working directory* and keep the same name, we can use another special character, a period (`.`), which specifies the current working directory. So here, for the second positional argument, we just provide `.`:

In [None]:
ls

In [None]:
cp experiment/notes.txt .

In [None]:
ls
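As a self-contained sketch of the same idea, the following recreates a hypothetical "experiment/notes.txt" in a temporary directory and copies it to `.`. The sub-shell (the parentheses) is only there so the `cd` does not affect the rest of the session:

```shell
workdir=$(mktemp -d)
mkdir "$workdir/experiment"
echo "day 1: set up" > "$workdir/experiment/notes.txt"
(
    cd "$workdir" || exit 1
    # "." means "here", so this is the same as: cp experiment/notes.txt ./notes.txt
    cp experiment/notes.txt .
    ls
)
```

The original stays in "experiment", and a copy with the same name appears in the directory the command was run from.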

#### `mv`

The `mv` command is used to **m**o**v**e files. Let's move the "example_copy.txt" file into the "experiment" subdirectory:

In [None]:
ls

In [None]:
ls experiment

In [None]:
mv example_copy.txt experiment/

Now it is no longer here:

In [None]:
ls

But is in the "experiment" sub-directory:

In [None]:
ls experiment

The `mv` command is also used to *rename* files. This may seem strange at first, but remember that the path (address) of a file actually includes its name too (otherwise everything in the same directory would have the same path). So when we change the file name of something, we are also changing its path/address.

In [None]:
ls

In [None]:
mv notes.txt notes_old.txt

In [None]:
ls
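A short sketch of renaming as "changing the path", using a hypothetical scratch file in a temporary directory: after `mv`, nothing exists at the old path, and the contents are unchanged at the new one.

```shell
workdir=$(mktemp -d)
echo "day 1: set up" > "$workdir/notes.txt"

# Same directory, different name: a rename.
mv "$workdir/notes.txt" "$workdir/notes_old.txt"

# Check whether anything is still at the old path.
if [ -e "$workdir/notes.txt" ]; then
    old_path="still there"
else
    old_path="gone"
fi
renamed_contents=$(head -n 1 "$workdir/notes_old.txt")

echo "notes.txt: $old_path"
echo "notes_old.txt contains: $renamed_contents"
```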

#### `rm`

To delete files there is the `rm` command (**r**e**m**ove). This requires at least one argument specifying the file we want to delete. But again, caution is warranted. There will be no confirmation or retrieval from a waste bin afterwards.

In [None]:
ls

In [None]:
rm notes_old.txt

In [None]:
ls
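Because `rm` takes at least one argument, several files can be deleted in one command. A minimal sketch with hypothetical file names in a temporary directory, listing the directory first as a habit before deleting anything:

```shell
workdir=$(mktemp -d)
echo "a" > "$workdir/draft_1.txt"
echo "b" > "$workdir/draft_2.txt"
echo "c" > "$workdir/keep.txt"

ls "$workdir"                                      # look before deleting
rm "$workdir/draft_1.txt" "$workdir/draft_2.txt"   # both are removed permanently

remaining=$(ls "$workdir")
echo "left after rm: $remaining"
```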

---

### Working with directories
Commands for working with directories for the most part operate similarly. 

#### `mkdir`

We can make a new directory with the command `mkdir` (for **m**a**k**e **dir**ectory): 

In [None]:
ls

In [None]:
mkdir subset

In [None]:
ls

#### `rmdir`

And similarly, directories can be deleted with `rmdir` (for **r**e**m**ove **dir**ectory):

In [None]:
rmdir subset/

In [None]:
ls

The command line is a little more forgiving when trying to delete a directory. If the directory is not empty, `rmdir` will give us an error:

In [None]:
rmdir experiment/
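The same behavior can be reproduced safely in a hypothetical temporary directory. This sketch tries `rmdir` on a directory that still holds a file, then again once it is empty; the `if` tests whether each `rmdir` succeeded:

```shell
workdir=$(mktemp -d)
mkdir "$workdir/subset"
echo "data" > "$workdir/subset/file.txt"

# Non-empty directory: rmdir prints an error and removes nothing.
if rmdir "$workdir/subset"; then
    first_try="removed"
else
    first_try="refused"
fi

# Once the contents are deleted, the same command succeeds.
rm "$workdir/subset/file.txt"
if rmdir "$workdir/subset"; then
    second_try="removed"
else
    second_try="refused"
fi

echo "non-empty: $first_try, empty: $second_try"
```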

---

## Summary
So far we've only seen individual commands and printing information to the screen. This is useful for in-the-moment things, but not so much for getting things done. Next we're going to start looking at some of the things that make the command line so versatile and powerful, starting with "redirectors" and "wildcards" in [03-unix-intro.ipynb](03-unix-intro.ipynb).

**Commands introduced:**

|Command     |Function          |
|:----------:|------------------|
|`head`      |prints the first lines of a file|
|`tail`      |prints the last lines of a file|
|`less`      |allows us to browse a file (exit with `q` key)|
|`wc`       |count lines, words, and characters in a file|
|`cp`      |copy a file or directory (use with caution)|
|`mv`      |move or rename a file or directory (use with caution)|
|`rm`      |delete a file or directory (use with caution)|
|`mkdir`       |create a directory|
|`rmdir`     |delete an empty directory|
|`nano`     |create and edit plain text files at the command line|


<br>

**Special characters introduced:**

|Characters     | Meaning          |
|:----------:|------------------|
| `.` | specifies the current working directory |


<br>

---
---

<a href="01-unix-intro.ipynb"><b>Previous:</b> 1. Getting started</a>

<div align="right"><a href="03-unix-intro.ipynb"><b>Next:</b> 3. Redirectors and wildcards</a></div>

---

<div class="alert alert-block alert-info" align="center">
<font size="-1">This is a notebook implementation of the <a href="https://astrobiomike.github.io/unix/unix-intro" target="_blank">Unix introduction</a> from <a href="https://astrobiomike.github.io" target="_blank">Happy Belly Bioinformatics.</a></font>
</div>
    
---