# Working With Files and Directories

## Creating directories
We now know how to explore files and directories, but how do we create them in the first place?

In this episode we will learn about creating and moving files and directories, using the **exercise-data/writing** directory as an example.

### Step one: see where we are and what we already have
We should still be in the shell-lesson-data directory in our projects folder, which we can check using:

In [None]:
pwd

Next we’ll move to the **exercise-data/writing** directory and see what it contains:

In [None]:
cd exercise-data/writing/

In [None]:
ls -F

### Create a directory

Let’s create a new directory called thesis using the command mkdir thesis (which has no output):

In [None]:
mkdir thesis

As you might guess from its name, <mark>mkdir</mark> means ‘make directory’. Since **thesis** is a relative path (i.e., does not have a leading slash, like **/what/ever/thesis**), the new directory is created in the current working directory:


In [None]:
ls -F

Since we’ve just created the **thesis** directory, there’s nothing in it yet:

In [None]:
ls -F thesis

Note that **mkdir** is not limited to creating single directories one at a time. The **-p** option allows **mkdir** to create a directory with nested subdirectories in a single operation:

In [None]:
mkdir -p ../project/data ../project/results

The **-R** option to the **ls** command will list all nested subdirectories within a directory. Let’s use **ls -FR** to recursively list the new directory hierarchy we just created in the **project** directory:

In [None]:
ls -FR ../project
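
To see what **-p** changes, here is a minimal sketch that runs in a throwaway directory (the `sandbox` variable and the use of `mktemp -d` are assumptions made for this illustration, so the lesson data is left alone). It first tries **mkdir** without **-p** on a nested path, then repeats the attempt with **-p**:

```shell
# Work in a temporary directory so the lesson data is untouched.
sandbox=$(mktemp -d)
cd "$sandbox"

# Without -p, mkdir refuses to create a directory whose parent is missing.
if mkdir project/data 2>/dev/null; then
    plain_result="created"
else
    plain_result="failed"
fi
echo "mkdir without -p: $plain_result"

# With -p, the missing parent (project) is created along the way.
mkdir -p project/data project/results
ls -FR project
```

The first attempt should fail because **project** does not exist yet; the second creates **project** together with both subdirectories.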

### *Starting Point 3_11_25*

Last time, we learned a few commands and small pieces of information about the bash shell. We learned the format of bash commands:

![image](./images/shell_command_syntax.svg)


We discovered a few of the most common commands.

- What can we use to print the current directory?
- What can we use to list the files in a directory?
- How can I move into the parent directory (the directory that contains my current directory)?
- What is the symbol used to refer to 


We also saw symbols that refer to specific locations.

- What symbol represents our home directory?
- What about our current directory? 

Let's move back into our "writing" directory. From 07_Command_Line, the relative path is shell-lesson-data/exercise-data/writing/. But we may be starting somewhere completely different. To move using the absolute path instead...

In [None]:
cd /projects/$USER/CM515-course-2025/modules/07_Command_Line/shell-lesson-data/exercise-data/writing

### *Two ways of doing the same thing*

Using the shell to create a directory is no different than using a file explorer. If you open the current directory using your operating system’s graphical file explorer, the **thesis** directory will appear there too. While the shell and the file explorer are two different ways of interacting with the files, the files and directories themselves are the same.

### *Good names for files and directories*
Complicated names of files and directories can make your life painful when working on the command line. Here we provide a few useful tips for the names of your files and directories.

* **Don’t use spaces.**  &nbsp;&nbsp;&nbsp;&nbsp;Spaces can make a name more meaningful, but since spaces are used to separate arguments on the command line it is better to avoid them in names of files and directories. You can use **-** or **_** instead (e.g. **north-pacific-gyre/** rather than **north pacific gyre/**). To test this out, try typing **mkdir north pacific gyre** and see what directory (or directories!) are made when you check with **ls -F**.

* Don’t begin the name with **-** (dash). Commands treat names starting with **-** as options.

* Stick with letters, numbers, **.** (period or ‘full stop’), **-** (dash) and **_** (underscore).

* **No Special Characters**  &nbsp;&nbsp;&nbsp;&nbsp;Many other characters have special meanings on the command line. We will learn about some of these during this lesson. There are special characters that can cause your command to not work as expected and can even result in data loss.

* If you need to refer to names of files or directories that have spaces or other special characters, you should surround the name in quotes (**""**).
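
As a small illustration of the quoting tip, the sketch below creates one directory whose name contains spaces and one that uses dashes instead. It runs in a temporary directory (the `sandbox` variable is an assumption for this example), and the names are the same illustrative ones used above:

```shell
# Work in a temporary directory so nothing in the lesson data changes.
sandbox=$(mktemp -d)
cd "$sandbox"

# Quoted: the whole name, spaces included, reaches mkdir as ONE argument.
mkdir "north pacific gyre"

# Dashes instead of spaces: no quoting needed, now or later.
mkdir north-pacific-gyre

ls -F

# Every later command has to quote the spaced name as well.
touch "north pacific gyre/notes.txt"
touch north-pacific-gyre/notes.txt
```

Both approaches work, but the quoted name has to be quoted every time it is used, which is why the dashed form is usually less error-prone.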

## Create a text file

### *Which Editor?*

**Nano is a text editor**  &nbsp;&nbsp;&nbsp;&nbsp;

When we say, ‘nano is a text editor’ we really do mean ‘text’. It can only work with plain character data, not tables, images, or any other human-friendly media. We use it in examples because it is one of the least complex text editors. However, because of this trait, it may not be powerful enough or flexible enough for the work you need to do after this workshop. On Unix systems (such as Linux and macOS), many programmers use Emacs or Vim (both of which require more time to learn), or a graphical editor such as Gedit or VS Code. On Windows, you may wish to use Notepad++. Windows also has a built-in editor called Notepad that can be run from the command line in the same way as nano for the purposes of this lesson.

![image](./images/VIMCheat.png)



**Windows and Unix Have Different End of Lines**

Be aware that Windows text editors will often add a carriage return character (\r) in addition to a newline (\n). This means files created in Windows may not work for Unix programs!
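
The difference is easy to see in byte counts. The sketch below fakes a Windows-style file with **printf**, then strips the carriage returns with **tr**. The file names and the `sandbox` directory are made up for this example, and **tr -d '\r'** is only one of several ways to do the conversion:

```shell
# Work in a temporary directory.
sandbox=$(mktemp -d)
cd "$sandbox"

# Simulate a file saved by a Windows editor: each line ends in \r\n.
printf 'first line\r\nsecond line\r\n' > windows.txt

# Delete every carriage return to get Unix-style line endings.
tr -d '\r' < windows.txt > unix.txt

# The Unix version is two bytes smaller: one \r removed per line.
wc -c windows.txt unix.txt
```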

*No matter what editor you use, you will need to know where it searches for and saves files. If you start it from the shell, it will (probably) use your current working directory as its default location. If you use your computer’s start menu, it may want to save files in your Desktop or Documents directory instead. You can change this by navigating to another directory the first time you ‘Save As…’*

*On Alpine, we will typically save files in a folder in either our home directory (for smaller files) or the projects directory (for larger tasks or research projects). What symbol do we use to refer to our home directory again?*

### Using Nano

Let’s change our working directory to **thesis** using **cd**, then run a text editor called Nano to create a file called **draft.txt**:

In [None]:
cd thesis
nano draft.txt

Let’s type in a few lines of text.

![image](./images/nano.png)

Once we’re happy with our text, we can press **Ctrl+O** (press the **Ctrl** or **Control** key and, while holding it down, press the O key) to write our data to disk. We will be asked to provide a name for the file that will contain our text. Press **Return** to accept the suggested default of draft.txt.

Once our file is saved, we can use **Ctrl+X** to quit the editor and return to the shell.

### *Control, Ctrl, or ^ Key*
The Control key is also called the ‘Ctrl’ key. There are various ways in which using the Control key may be described. For example, you may see an instruction to press the **Control** key and, while holding it down, press the **X** key, described as any of:

* Control-X
* Control+X
* Ctrl-X
* Ctrl+X
* ^X
* C-x
  
In nano, along the bottom of the screen you’ll see **^G Get Help ^O WriteOut**. This means that you can use **Control-G** to get help and **Control-O** to save your file.

**nano** doesn’t leave any output on the screen after it exits, but **ls** now shows that we have created a file called **draft.txt**:

In [None]:
ls

### *Creating Files a Different Way*

We have seen how to create text files using the nano editor. Now, try the following command:

In [None]:
touch my_file.txt
ls

In [None]:
head my_file.txt

**TOUCH**  &nbsp;&nbsp;&nbsp;&nbsp;The <mark>touch</mark> command generates a new file called **my_file.txt** in your current directory. You can observe this newly generated file by typing ls at the command line prompt. **my_file.txt** can also be viewed in your GUI file explorer.

* When you inspect the file with **ls -l**, note that the size of **my_file.txt** is 0 bytes. In other words, it contains no data. If you open **my_file.txt** using your text editor it is blank.

* Some programs do not generate output files themselves, but instead require that empty files have already been generated. When the program is run, it searches for an existing file to populate with its output. The touch command allows you to efficiently generate a blank text file to be used by such programs.

* To avoid confusion later on, we suggest removing the file you’ve just created before proceeding with the rest of the episode, otherwise future outputs may vary from those given in the lesson. To do this, use the following command:

In [None]:
rm my_file.txt
ls
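
If you would like to check the "0 bytes" claim without cluttering the lesson directory, here is a small sketch that works in a temporary directory (the `sandbox` variable and the **notes.txt** file are assumptions for this example). It also shows that running **touch** on a file that already exists leaves its contents alone:

```shell
# Work in a temporary directory.
sandbox=$(mktemp -d)
cd "$sandbox"

# touch creates the file if it does not exist yet.
touch my_file.txt
ls -l my_file.txt

# [ -s FILE ] is true only when FILE exists and is larger than 0 bytes.
if [ -s my_file.txt ]; then
    echo "my_file.txt has data"
else
    echo "my_file.txt is empty"
fi

# touch on an existing file only updates its timestamp.
echo "some text" > notes.txt
touch notes.txt
cat notes.txt
```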

### *What's in a name?*

**FILE EXTENSIONS**  &nbsp;&nbsp;&nbsp;&nbsp;You may have noticed that all of Nelle’s files are named ‘something dot something’, and in this part of the lesson, we always used the extension **.txt**. This is just a convention; we can call a file **mythesis** or almost anything else we want. However, most people use two-part names most of the time to help them (and their programs) tell different kinds of files apart. The second part of such a name is called the filename extension and indicates what type of data the file holds: **.txt** signals a plain text file, **.pdf** indicates a PDF document, **.cfg** is a configuration file full of parameters for some program or other, **.png** is a PNG image, and so on.

This is just a convention, albeit an important one. Files merely contain bytes; it’s up to us and our programs to interpret those bytes according to the rules for plain text files, PDF documents, configuration files, images, and so on.

Naming a PNG image of a whale as **whale.mp3** doesn’t somehow magically turn it into a recording of whale song, though it might cause the operating system to associate the file with a music player program. In this case, if someone double-clicked **whale.mp3** in a file explorer program, the music player would automatically (and erroneously) attempt to open the **whale.mp3** file.
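
A quick way to convince yourself that the extension is only a label: in the sketch below (file names and the `sandbox` directory are invented for the example), a plain text file is renamed to end in **.mp3**, and its contents are exactly what they were before.

```shell
# Work in a temporary directory.
sandbox=$(mktemp -d)
cd "$sandbox"

echo "just some text" > notes.txt

# Changing the extension changes only the name, not the bytes in the file.
mv notes.txt notes.mp3
cat notes.mp3
```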

As you may have learned at this point, there are two types of files: binary files and text files. We can work with both, but bash has a lot of tools that allow text file manipulation. **This is why many bioinformatic file types are plain text!**

## Moving Files and Directories
Let’s return to the **shell-lesson-data/exercise-data/writing** directory, the parent directory. How do we get there again?

In [None]:
cd 

In our **thesis** directory we have a file **draft.txt** which isn’t a particularly informative name, so let’s change the file’s name using <mark>mv</mark>, which is short for ‘move’:

In [None]:
mv thesis/draft.txt thesis/quotes.txt

The first argument tells **mv** what we’re ‘moving’, while the second is where it’s to go. In this case, we’re moving **thesis/draft.txt** to **thesis/quotes.txt**, which has the same effect as renaming the file. Sure enough, **ls** shows us that thesis now contains one file called **quotes.txt**:

In [None]:
ls thesis

One must be careful when specifying the target file name, since **mv** will silently overwrite any existing file with the same name, which could lead to data loss. By default, mv will not ask for confirmation before overwriting files. However, an additional option, **mv -i** (or **mv --interactive**), will cause **mv** to request such confirmation.
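
To make the overwrite risk concrete, here is a sketch that runs in a temporary directory with made-up file contents (the `sandbox` variable is an assumption). In the second half, the answer to the **-i** prompt is supplied through a pipe only so the example can run unattended; at the keyboard you would type the answer yourself.

```shell
# Work in a temporary directory.
sandbox=$(mktemp -d)
cd "$sandbox"

echo "old notes" > quotes.txt
echo "new draft" > draft.txt

# mv replaces quotes.txt without any warning; "old notes" is gone.
mv draft.txt quotes.txt
cat quotes.txt

# With -i, mv asks first. Answering "n" keeps both files as they are.
# The "|| true" is there in case this mv reports a declined move as a failure.
echo "another draft" > draft.txt
printf 'n\n' | mv -i draft.txt quotes.txt || true
cat quotes.txt
```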

Note that **mv** also works on directories.

Let’s move **quotes.txt** into the current working directory. We use **mv** once again, but this time we’ll use just the name of a directory as the second argument to tell **mv** that we want to keep the filename but put the file somewhere new. (This is why the command is called ‘move’.) In this case, the directory name we use is the special directory name **.** that we mentioned earlier.

In [None]:
mv thesis/quotes.txt .

The effect is to move the file from the directory it was in to the current working directory. **ls** now shows us that **thesis** is empty:

In [None]:
ls thesis

Alternatively, we can confirm the file **quotes.txt** is no longer present in the **thesis** directory by explicitly trying to list it:

In [None]:
ls thesis/quotes.txt

**ls** with a filename or directory as an argument only lists the requested file or directory. If the file given as the argument doesn’t exist, **ls** reports an error, as we saw above. We can use this to see that **quotes.txt** is now present in our current directory:

In [None]:
ls quotes.txt

## Copying files and directories

The <mark>cp</mark> command works very much like **mv**, except it copies a file instead of moving it. We can check that it did the right thing using **ls** with two paths as arguments — like most Unix commands, **ls** can be given multiple paths at once:


In [None]:
cp quotes.txt thesis/quotations.txt
ls quotes.txt thesis/quotations.txt

We can also copy a directory and all its contents by using the **recursive** option **-r**, e.g. to back up a directory:

In [None]:
cp -r thesis thesis_backup

We can check the result by listing the contents of both the **thesis** and **thesis_backup** directory:

In [None]:
ls thesis thesis_backup
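
One property worth checking is that the backup is a separate copy rather than a second name for the same files. The sketch below uses a temporary directory and invented file contents (both are assumptions for this illustration): it changes the original after copying and then looks at the backup.

```shell
# Work in a temporary directory.
sandbox=$(mktemp -d)
cd "$sandbox"

mkdir thesis
echo "first quote" > thesis/quotations.txt

# Copy the directory and everything inside it.
cp -r thesis thesis_backup

# Changing the original afterwards does not touch the backup.
echo "second quote" >> thesis/quotations.txt
cat thesis_backup/quotations.txt
```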

Suppose that you created a plain-text file in your current directory to contain a list of the statistical tests you will need to do to analyze your data, and named it **statstics.txt**.

After creating and saving this file you realize you misspelled the filename! Which of the following commands could you use to correct the mistake?

- cp statstics.txt statistics.txt
- mv statstics.txt statistics.txt
- mv statstics.txt .
- cp statstics.txt .

## Removing files and directories
Returning to the **shell-lesson-data/exercise-data/writing** directory, let’s tidy up this directory by removing the **quotes.txt** file we created. The Unix command we’ll use for this is <mark>rm</mark> (short for ‘remove’):

In [None]:
rm quotes.txt

We can confirm the file has gone using **ls**:

In [None]:
ls quotes.txt

### *IMPORTANT: Deleting is Forever*
The Unix shell doesn’t have a trash bin that we can recover deleted files from (though most graphical interfaces to Unix do). Instead, when we delete files, they are unlinked from the file system so that their storage space on disk can be recycled. Tools for finding and recovering deleted files do exist, but there’s no guarantee they’ll work in any particular situation, since the computer may recycle the file’s disk space right away.

### *Using rm Safely*
What happens when we execute **rm -i thesis_backup/quotations.txt**? Why would we want this protection when using rm?

In [None]:
rm -i thesis_backup/quotations.txt

The **-i** option will prompt before (every) removal (use **Y** to confirm deletion or **N** to keep the file). The Unix shell doesn’t have a trash bin, so all the files removed will disappear forever. By using the **-i** option, we have the chance to check that we are deleting only the files that we want to remove.

If we try to remove the **thesis** directory using **rm thesis**, we get an error message:

In [None]:
rm thesis

This happens because **rm** by default only works on files, not directories.

**rm** can remove a directory and all its contents if we use the recursive option **-r**, and it will do so *without any confirmation prompts*.

Given that there is no way to retrieve files deleted using the shell, **rm -r** should be used with great caution (you might consider adding the interactive option **rm -r -i**).
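
A safe way to watch **rm -r** in action is to try it on a disposable copy. The sketch below builds a small **thesis** directory inside a temporary directory (the `sandbox` variable and the file names are assumptions for this example) and then removes it:

```shell
# Work in a temporary directory so nothing real can be deleted.
sandbox=$(mktemp -d)
cd "$sandbox"

mkdir -p thesis/chapters
touch thesis/quotations.txt thesis/chapters/intro.txt

# Plain rm refuses to remove a directory.
rm thesis 2>/dev/null || echo "rm thesis failed: thesis is a directory"

# rm -r removes the directory and everything beneath it, with no prompt.
rm -r thesis
ls -F
```

The final **ls -F** prints nothing, because the directory and both files inside it are gone.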

## Operations with multiple files and directories

Oftentimes one needs to copy or move several files at once. This can be done by providing a list of individual filenames, or specifying a naming pattern using wildcards. Wildcards are special characters that can be used to represent unknown characters or sets of characters when navigating the Unix file system.

### *Copy with Multiple File Names*

For this exercise, you can test the commands in the shell-lesson-data/exercise-data directory.

In the example below, what does cp do when given several filenames and a directory name?


In [None]:
mkdir backup
cp creatures/minotaur.dat creatures/unicorn.dat backup/

If given more than one file name followed by a directory name (i.e. the destination directory must be the last argument), cp copies the files to the named directory.

In [None]:
cp minotaur.dat unicorn.dat basilisk.dat

If given three file names, cp reports an error, because it expects a directory name as the last argument.
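
Both cases can be reproduced side by side with empty stand-in files. In this sketch the `sandbox` directory and the `cp_status` variable are assumptions for the illustration, and the files are created flat in one directory rather than under **creatures/**:

```shell
# Work in a temporary directory with empty stand-in files.
sandbox=$(mktemp -d)
cd "$sandbox"
touch minotaur.dat unicorn.dat basilisk.dat
mkdir backup

# Last argument is a directory: both files are copied into it.
cp minotaur.dat unicorn.dat backup/
ls backup

# Last argument is a regular file: cp reports an error instead of copying.
if cp minotaur.dat unicorn.dat basilisk.dat 2>/dev/null; then
    cp_status="succeeded"
else
    cp_status="failed"
fi
echo "cp with three file names: $cp_status"
```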

### Using wildcards for accessing multiple files at once

### *Wildcards*

**\*** is a <mark>wildcard</mark>, which represents zero or more other characters. Let’s consider the **shell-lesson-data/exercise-data/alkanes** directory: **\*.pdb** represents **ethane.pdb**, **propane.pdb**, and every file that ends with ‘.pdb’. On the other hand, **p\*.pdb** only represents **pentane.pdb** and **propane.pdb**, because the ‘p’ at the front restricts the match to filenames that begin with the letter ‘p’.

**?** is also a wildcard, but it represents exactly one character. So **?ethane.pdb** could represent **methane.pdb** whereas **\*ethane.pdb** represents both **ethane.pdb** and **methane.pdb**.

Wildcards can be used in combination with each other. For example, **???ane.pdb** indicates three characters followed by **ane.pdb**, giving **cubane.pdb** **ethane.pdb** **octane.pdb**.
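
These patterns can be tried out without the lesson data. The sketch below creates empty files with the same names as those in the **alkanes** directory inside a temporary directory (the `sandbox` variable is an assumption), then uses **echo** to print what each pattern matches:

```shell
# Work in a temporary directory with empty stand-ins for the alkanes files.
sandbox=$(mktemp -d)
cd "$sandbox"
touch cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb

echo p*.pdb        # names that start with p
echo ?ethane.pdb   # exactly one character before "ethane.pdb"
echo *ethane.pdb   # zero or more characters before "ethane.pdb"
echo ???ane.pdb    # exactly three characters before "ane.pdb"
```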

When the shell sees a wildcard, it expands the wildcard to create a list of matching filenames before running the preceding command. As an exception, if a wildcard expression does not match any file, Bash will pass the expression as an argument to the command as it is. For example, typing **ls \*.pdf** in the **alkanes** directory (which contains only files with names ending with **.pdb**) results in an error message that there is no file called **\*.pdf**. However, generally commands like **wc** and **ls** see the lists of file names matching these expressions, but not the wildcards themselves. It is the shell, not the other programs, that expands the wildcards.
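
One way to see the expansion happen is to hand the pattern to something that simply reports its arguments. The sketch below defines a small helper function, `count_args`, which is hypothetical and exists only for this illustration. It assumes Bash's default settings, where an unmatched pattern is passed through unchanged.

```shell
# Work in a temporary directory with two stand-in files.
sandbox=$(mktemp -d)
cd "$sandbox"
touch ethane.pdb propane.pdb

# A stand-in for any command: it reports the arguments it received.
count_args() { echo "$# argument(s): $*"; }

# The shell expands the pattern first, so the function sees two file names.
count_args *.pdb

# Nothing matches, so the literal text *.pdf is passed as a single argument.
count_args *.pdf
```

The function never sees the **\*** in the first call; it only sees the file names the shell substituted for it.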

### *List filenames matching a pattern*
When run in the **alkanes** directory, which **ls** command(s) will produce this output?

**ethane.pdb methane.pdb**

1. **ls \*t*ane.pdb**
2. **ls \*t?ne.\***
3. **ls \*t??ne.pdb**
4. **ls ethane.\***

## Key Points
* **cp \[old] \[new]** copies a file.
* **mkdir \[path]** creates a new directory.
* **mv \[old] \[new]** moves (renames) a file or directory.
* **rm \[path]** removes (deletes) a file.
* **\*** matches zero or more characters in a filename, so **\*.txt** matches all files ending in **.txt**.
* **?** matches any single character in a filename, so **?.txt** matches **a.txt** but not **any.txt**.
* Use of the Control key may be described in many ways, including **Ctrl-X**, **Control-X**, and **^X**.
* The shell does not have a trash bin: once something is deleted, it’s really gone.
* Most files’ names are **something.extension**. The extension isn’t required, and doesn’t guarantee anything, but is normally used to indicate the type of data in the file.
* Depending on the type of work you do, you may need a more powerful text editor than Nano.