# Episode 3 - Working with Files and Directories
This notebook is based on a snapshot of [Episode 3](https://kmichali.github.io/SC-shell-novice/03-create/index.html) of the [Unix Shell lesson](https://kmichali.github.io/SC-shell-novice/) from [Software Carpentry](https://software-carpentry.org). The original material has more detail.

### Questions:
- How can I create, copy, rename and delete files and directories?

### Objectives:
- Create new files and directories
- Copy and move files and directories
- Use delete safely

<hr style="border: solid 1px red; margin-top: 1.5% ">

### Video
Learn with video:
- [part 1](https://imperial.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=94a14b5d-dad4-49f9-9fd2-abd50118f1b2)
- [part 2](https://imperial.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=450d3756-0838-473f-82e1-abd5011eb442)

### Practice data in Google Colab
If you are viewing this notebook in Colab and have saved it in your Drive ("File"->"Save a copy in Drive"), run the cell below to download practice data.


In [None]:
%%bash
[ -e data-shell ] && echo "data already exists" || { wget https://kmichali.github.io/SC-shell-novice/data/data-shell.zip; unzip data-shell.zip; } 

<hr style="border: solid 1px red; margin-top: 1.5% ">

If you just opened this notebook, your current directory is most likely to be **`somewhere_in_your_filesystem/GS_comm_line/notebooks`**. Let's start by changing directory to **`data-shell`**.  

Note: The **`cd`** cell below is run without the `%%bash` magic command so that the new working directory persists for the cells that follow.

In [None]:
%%bash
pwd

In [None]:
cd data-shell



## Make a new directory
<hr style="border: solid 1px gray; margin-top: 1.5% ">

Let’s create a new directory called thesis using the command **`mkdir thesis`**. This command produces no output, so we have to check with **`ls`**.


In [None]:
%%bash 
mkdir thesis
ls -F

Change working directory to **`thesis`**.

In [None]:
cd thesis


## Two ways of doing  the same thing

On your laptop, you can make a new directory (or folder) using a file explorer (File Explorer or Finder). This directory will appear on the command line as well.  Conversely, if you make a new directory with **`mkdir`**, you will see it in the file explorer.


## Good names for files and directories


Complicated names of files and directories can make your life painful when working on the command line. Here we provide a few useful tips for the names of your files.

- **Don’t use spaces.**

Spaces are used to separate arguments on the command line, so it is better to avoid them in names of files and directories. You can use **`-`** or **`_`** instead (e.g. **`north-pacific-gyre/`** rather than **`north pacific gyre/`**).

- **Don’t begin the name with `-` (dash).**

Commands treat names starting with **`-`** as flags.

- **Stick with letters, numbers, `.` (period or ‘full stop’), `-` (dash) and `_` (underscore).**

Many other characters have special meanings on the command line. Using them in names can cause a command to behave unexpectedly and can even result in data loss.


If you need to refer to names of files or directories that have spaces or other special characters, you should surround the name in quotes.
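
The effect of spaces and quotes can be tried out safely in a throwaway directory. The following is a minimal sketch: the scratch directory comes from `mktemp -d`, and the names are only illustrations, so nothing in **`data-shell`** is touched.

```shell
# Work in a scratch directory (an assumption made for this sketch).
demo=$(mktemp -d)
cd "$demo"

# Without quotes the shell passes three separate arguments,
# so mkdir creates three directories: north, pacific and gyre.
mkdir north pacific gyre

# With quotes the whole name is passed as a single argument.
mkdir "north pacific gyre"

# All four directories are now listed.
ls -F
```

To run it in this notebook, paste it into a cell with `%%bash` on the first line.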



## Creating files
<hr style="border: solid 1px gray; margin-top: 1.5% ">

The **`thesis`** directory is empty. Let's create an empty file called **`draft.txt`** with **`touch`**.

Use **`ls -lh`** to see the file and note the file is empty.

In [None]:
%%bash
touch draft.txt
ls -lh


## Editing a file

Later, we will learn about a simple text editor **`nano`** for the command line.  For now, we'll use **`echo`** (similar to print) to add a bit of text to **`draft.txt`**.

We can show the file content with **`cat`** (short for concatenate).

In [None]:
%%bash
echo "It is not publish or perish, it is share and thrive." > draft.txt
cat draft.txt

## Practice

In the cell below, make another empty file **`notes.txt`**.

In [None]:
%%bash
# add your command here

ls 

## File extensions

We gave the file a name ending in **`.txt`**.  This ending is called the **filename extension**.  It does not change the file, but it is a good convention and helps us keep different types of files apart.
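
As a small illustration that the extension is only part of the name, the sketch below writes the same text to two files with different extensions and prints both. The scratch directory and the file names are assumptions made for this example.

```shell
# Scratch directory so the practice data is left alone.
demo=$(mktemp -d)
cd "$demo"

# The same text stored under two different extensions.
echo "share and thrive" > notes.txt
echo "share and thrive" > notes.dat

# cat prints both files in the same way; the extension changes nothing.
cat notes.txt
cat notes.dat
```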

## Moving and renaming files and directories
<hr style="border: solid 1px gray; margin-top: 1.5% ">

Files can be renamed and moved between directories.

In the next example **`mv draft.txt quotes.txt`**, the command **`mv`** (move) will rename **`draft.txt`** to **`quotes.txt`**.  The command produces no output, and **`ls`** will show that there is only one file **`quotes.txt`** in the current directory.

In [None]:
%%bash
mv draft.txt quotes.txt
ls

The command takes two arguments: **the source (the file or directory to move) and the destination**.  These can include directory names as well. In the next example, we'll return to the **`data-shell`** directory and then rename **`quotes.txt`** back to **`draft.txt`**, using relative paths to specify the source file and the destination.

In [None]:
cd ../

In [None]:
%%bash 
mv thesis/quotes.txt thesis/draft.txt 
ls thesis

In the next example, we'll move **`draft.txt`** to **`data-shell`** (that is the current working directory).

In [None]:
%%bash 
mv thesis/draft.txt .
ls draft.txt

The move command can be used with more than two arguments if the last argument specifies a directory.
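
A minimal sketch of this form of **`mv`**, run in a scratch directory so the practice data is not affected. The directory name `archive` and the three file names are made up for the example.

```shell
# Scratch directory (an assumption made for this sketch).
demo=$(mktemp -d)
cd "$demo"

mkdir archive
touch a.txt b.txt c.txt

# Three source files first, then the destination directory as the last argument.
mv a.txt b.txt c.txt archive/

# The three files are now inside archive and no longer in the scratch directory.
ls archive
```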

## Exercise 1
<hr style="border: solid 1px gray; margin-top: 1.5% ">

After running the following commands, Jamie realizes that she put the files sucrose.dat and maltose.dat into the wrong folder. The files should have been placed in the raw folder.

```
$ ls -F
analyzed/ raw/
$ ls -F analyzed
fructose.dat glucose.dat maltose.dat sucrose.dat
$ cd analyzed
```

Fill in the blanks to move these files to the raw/ folder (i.e. the one she forgot to put them in).

```
$ mv sucrose.dat maltose.dat __/__
```

Solution can be found at the end of this notebook.


## Copying files and directories
<hr style="border: solid 1px gray; margin-top: 1.5% ">

The **`cp`** (copy) command works similarly to **`mv`**, except that it makes a copy of a file instead of moving it.  The following example takes **`draft.txt`** and makes a copy in **`thesis`** under a new name, **`initial_draft.txt`**.

In [None]:
%%bash
cp draft.txt thesis/initial_draft.txt
ls thesis

It is possible to copy a directory and its contents by using **`cp`** with the recursive flag, **`cp -r`**.  The following example makes a copy of **`thesis`** and its contents under a new name, **`thesis_backup`**.

In [None]:
%%bash
cp -r thesis thesis_backup
ls thesis thesis_backup/
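
To see why the **`-r`** flag matters, here is a minimal sketch in a scratch directory. The names `project`, `project_plain`, `project_copy` and `report.txt` are hypothetical; the exact error text printed by **`cp`** differs between systems, so it is hidden here.

```shell
# Scratch directory (an assumption made for this sketch).
demo=$(mktemp -d)
cd "$demo"

mkdir project
echo "version 1" > project/report.txt

# Without -r, cp does not copy a directory and reports an error.
cp project project_plain 2>/dev/null || echo "cp without -r did not copy the directory"

# With -r, the directory and everything inside it are copied.
cp -r project project_copy

# The copy is independent: changing it leaves the original untouched.
echo "version 2" > project_copy/report.txt
cat project/report.txt
cat project_copy/report.txt
```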

## Exercise 2
<hr style="border: solid 1px gray; margin-top: 1.5% ">

Suppose that you created a plain-text file in your current directory to contain a list of the statistical tests you will need to do to analyze your data, and named it: statstics.txt

After creating and saving this file you realize you misspelled the filename! You want to correct the mistake. Which of the following commands could you use to do so?

1. **`cp statstics.txt statistics.txt`**
1. **`mv statstics.txt statistics.txt`**
1. **`mv statstics.txt .`**
1. **`cp statstics.txt .`**

Solution can be found at the end of this notebook.

## Exercise 3
<hr style="border: solid 1px gray; margin-top: 1.5% ">

What is the output of the closing **`ls`** command in the sequence shown below?  To start with, the current directory is **`/Users/jamie/data`** and it contains one file, **`proteins.dat`**.

```
$ mkdir recombine
$ mv proteins.dat recombine/
$ cp recombine/proteins.dat ../proteins-saved.dat
$ ls

```
What is the output of the command sequence above?

1. **`proteins-saved.dat recombine`**
2. **`recombine`**
3. **`proteins.dat recombine`**
4. **`proteins-saved.dat`**

Solution can be found at the end of this notebook.

## Removing files and directories
<hr style="border: solid 1px gray; margin-top: 1.5% ">

The **`rm`** (remove) command removes files and directories permanently.  There is no easy way of recovering a file that has been deleted with **`rm`**.

Let's delete **`draft.txt`** from the current directory.

In [None]:
%%bash
rm draft.txt
ls -F

A directory that is not empty can be removed with recursive **`rm`**.  To remove **`thesis_backup`** in the current directory use:

In [None]:
%%bash
rm -r thesis_backup
ls -F
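
The difference between plain **`rm`** and **`rm -r`** on a directory can be sketched as follows. The scratch directory and the names `results`, `run1` and `output.txt` are assumptions made for this example; the error message from plain **`rm`** varies between systems, so it is hidden.

```shell
# Scratch directory so nothing in data-shell can be deleted by accident.
demo=$(mktemp -d)
cd "$demo"

mkdir results
mkdir results/run1
touch results/run1/output.txt

# Plain rm refuses to remove a directory and leaves it in place.
rm results 2>/dev/null || echo "rm without -r left the directory alone"
ls -F

# rm -r removes the directory and everything inside it, without asking.
rm -r results
ls -F
```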

## Use rm with caution

Used carelessly, the **`rm`** command can cause real damage.  Running it in interactive mode, **`rm -i`**, may be a good idea: you will be asked to confirm each deletion.
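
A minimal sketch of the interactive mode, under the assumption that it runs in a scratch directory with two made-up files. In a terminal you would type `y` or `n` at each prompt; a `%%bash` cell cannot wait for typed input, so the answers are piped in here instead.

```shell
# Scratch directory (an assumption made for this sketch).
demo=$(mktemp -d)
cd "$demo"
touch keep.txt discard.txt

# rm -i asks for confirmation before each removal and reads the answer
# from standard input.
echo n | rm -i keep.txt      # answered "n": the file stays
echo y | rm -i discard.txt   # answered "y": the file is removed

# Only keep.txt is left.
ls
```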

## Wildcards
<hr style="border: solid 1px gray; margin-top: 1.5% ">

**`*`** is a wildcard, which matches zero or more characters. Let’s consider the **`data-shell/molecules`** directory: **`*.pdb`** matches ethane.pdb, propane.pdb, and every file that ends with ‘.pdb’. On the other hand, **`p*.pdb`** only matches pentane.pdb and propane.pdb, because the ‘p’ at the front only matches filenames that begin with the letter ‘p’.

**`?`** is also a wildcard, but it matches exactly one character. So **`?ethane.pdb`** would match methane.pdb, whereas **`*ethane.pdb`** matches both ethane.pdb and methane.pdb.

Wildcards can be used in combination with each other e.g. **`???ane.pdb`** matches three characters followed by ane.pdb, giving cubane.pdb ethane.pdb octane.pdb.

When the shell sees a wildcard, it expands the wildcard to create a list of matching filenames before running the command that was asked for. As an exception, if a wildcard expression does not match any file, Bash passes the expression to the command unchanged. For example, typing **`ls *.pdf`** in the molecules directory (which contains only files with names ending in `.pdb`) results in an error message saying that there is no file called `*.pdf`.
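
The expansion can be made visible with **`echo`**, which simply prints the arguments it receives. The sketch below is an illustration under stated assumptions: it runs in a scratch directory with empty stand-in files named after the molecules mentioned above, and it assumes Bash with its default settings.

```shell
# Scratch directory with empty stand-ins for the molecule files.
demo=$(mktemp -d)
cd "$demo"
touch cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb

# The shell expands each pattern before echo runs.
echo p*.pdb        # prints: pentane.pdb propane.pdb
echo ?ethane.pdb   # prints: methane.pdb
echo *ethane.pdb   # prints: ethane.pdb methane.pdb
echo ???ane.pdb    # prints: cubane.pdb ethane.pdb octane.pdb

# No file ends in .pdf, so the pattern is passed on unchanged.
echo *.pdf         # prints: *.pdf
```

Using **`echo`** in front of a pattern is also a cheap way to check what a wildcard will match before handing it to a command such as **`rm`**.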

In [None]:
cd molecules

In [None]:
%%bash
ls *.pdb

In [None]:
%%bash
ls p*.pdb

<hr style="border: solid 1px red; margin-top: 1.5% ">

## Key points

- **`cp old new`** copies a file.
- **`mkdir path`** creates a new directory.
- **`mv old new`** moves (renames) a file or directory.
- **`rm path`** removes (deletes) a file.
- **`*`** matches zero or more characters in a filename, so **`*.txt`** matches all files ending in .txt.
- **`?`** matches any single character in a filename, so **`?.txt`** matches a.txt but not any.txt.
- The shell does not have a trash bin: once something is deleted, it’s really gone.
- Most files’ names are something.extension. The extension isn’t required, and doesn’t guarantee anything, but is normally used to indicate the type of data in the file.


<hr style="border: solid 1px gray; margin-top: 1.5% ">

### Solution to Exercise 1:
```
$ mv sucrose.dat maltose.dat ../raw
```



### Solution to Exercise 2:
1. No. While this would create a file with the correct name, the incorrectly named file still exists in the directory and would need to be deleted.
2. Yes, this would work to rename the file.
3. No, the period (`.`) indicates where to move the file, but does not provide a new file name; identical file names cannot be created.
4. No, the period (`.`) indicates where to copy the file, but does not provide a new file name; identical file names cannot be created.




### Solution to Exercise 3:

We start in the /Users/jamie/data directory, and create a new folder called recombine. The second line moves (mv) the file proteins.dat to the new folder (recombine). The third line makes a copy of the file we just moved. The tricky part here is where the file was copied to. Recall that .. means ‘go up a level’, so the copied file is now in /Users/jamie. Notice that .. is interpreted with respect to the current working directory, not with respect to the location of the file being copied. So, the only thing that will show using ls (in /Users/jamie/data) is the recombine folder.


1. No, see explanation above. proteins-saved.dat is located at /Users/jamie
2. Yes
3. No, see explanation above. proteins.dat is located at /Users/jamie/data/recombine
4. No, see explanation above. proteins-saved.dat is located at /Users/jamie
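
If you would like to check the answer yourself, the sequence can be replayed in a scratch directory. This is a sketch under one assumption: a temporary directory from `mktemp -d` stands in for /Users/jamie, and the variable name `jamie` is made up for the example.

```shell
# Temporary stand-in for /Users/jamie.
jamie=$(mktemp -d)
mkdir "$jamie/data"
cd "$jamie/data"
touch proteins.dat

# The commands from the exercise.
mkdir recombine
mv proteins.dat recombine/
cp recombine/proteins.dat ../proteins-saved.dat

ls      # only recombine is left in data
ls ..   # the copy sits one level up, next to data
```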