# Creating, moving, removing, and copying files

> Unix has a couple of unifying ideas or metaphors that shape its APIs and the development style that 
> proceeds from them. The most important of these are probably the "everything is a file" model and the
> pipe metaphor built on top of it.
>
> The Art of Unix Programming, http://www.catb.org/esr/writings/taoup/html/ch03s01.html#id2892028

The "everything is a file" model provides a single abstraction for performing input and output. Regular files,
directories, storage devices, peripheral devices, and even some networking can all be accessed using the
same set of basic commands.

What most people think of as a computer file is simply a collection of bytes. Reading the contents of a
file can be thought of as reading a stream of bytes. Writing data to a file can be thought of as
writing a stream of bytes. 

<div class="alert alert-block alert-info">
    The code in this notebook assumes that the current working directory is 
    <code>./scripts/files</code>. Run the next cell once when using this notebook.
    <br /><br />
    If you see errors related to missing files then the working directory is probably incorrect.
</div>

In [None]:
# run this cell exactly once each time you open this notebook
cd ./scripts/files

## Filenames

In UNIX, any character may appear in a filename except for the path separator character `/` and the null
character `\0`.

Filenames are case sensitive. The filename `abc` is distinct from the filename `Abc`.

A filename that begins with a `.` is usually considered to be a hidden file.

As we have already seen, there are characters that have special meaning to the shell. While not forbidden,
it is generally a good idea to avoid using the characters <code>!"#$&'()*;<=>?[\]^\`{|}~</code> in a filename.
Additionally, it is generally a good idea to avoid using whitespace in a filename (use the underscore `_` instead
of a space).

The hyphen `-` should never be the first character of a filename because it may be interpreted as a
command option.

A great deal of advice related to shell programming found on the Internet is devoted to explaining why
certain constructs are needed to deal with unusual characters in filenames.
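
As a small illustration of the advice above, the following sketch creates two awkwardly named files in a scratch directory and removes one of them. The scratch directory and the names `my notes.txt` and `-n` are made up for this example. It shows two common defensive habits: quoting a name that contains whitespace, and writing `./` before a name (or `--` before the operands) so that a leading hyphen is not parsed as an option.

```sh
# Work in a throwaway directory so the notebook's own files are untouched.
demo=$(mktemp -d)

# Quoting keeps the space inside a single argument; unquoted, the shell
# would pass "my" and "notes.txt" as two separate filenames.
touch "$demo/my notes.txt"

# A leading hyphen looks like an option. Writing ./-n, or using -- to end
# option parsing, makes the command treat it as a filename.
# The subshell (...) keeps the cd from changing the current directory.
(cd "$demo" && touch ./-n && rm -- -n)

# Count the entries that remain: only "my notes.txt" should be left.
remaining=$(ls -A "$demo" | wc -l | tr -d ' ')
echo "entries left: $remaining"

rm -r "$demo"
```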

## Creating files

Obviously we can use a software program such as a word processor, text editor, or drawing program to create
files. This section is concerned with creating text based files from the command line.

Three approaches are covered: the `touch` command, the `mkdir` command, and output redirection.

#### `touch`

The `touch` command creates one or more named files if they do not already exist. The created files are empty.

In [None]:
ls -l                  # should list only dir1, file1, and A.txt the first time you run this notebook 
touch file2 file3

If a file already exists, then `touch` updates the access and modification times of the file to the current time.

In [None]:
ls -l file1            # note the modification time of file1
touch file1
ls -l file1            # the modification time is now the current time
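
The two behaviours of `touch` can also be checked without reading timestamps by eye. This is a minimal sketch that works in a scratch directory created with `mktemp -d`; the filenames `newfile` and `existing` are hypothetical.

```sh
demo=$(mktemp -d)

# A file created by touch has a size of zero bytes.
touch "$demo/newfile"
new_size=$(wc -c < "$demo/newfile" | tr -d ' ')
echo "size of newfile: $new_size"

# Touching an existing file changes its timestamps, not its contents.
echo "hello" > "$demo/existing"
touch "$demo/existing"
contents=$(cat "$demo/existing")
echo "contents of existing: $contents"

rm -r "$demo"
```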

#### `mkdir`

The `mkdir` command creates one or more named directories.

In [None]:
echo "before mkdir: " && ls
mkdir dir2 dir3
echo "after mkdir: " && ls

When creating a hierarchy of directories, `mkdir` will fail if one or more of the parent directories are missing.

In [None]:
# fails because xxx and yyy do not exist
mkdir xxx/yyy/zzz

You could manually create the missing parent directories:

```sh
# could do this, but see next cell for an easier way
mkdir xxx
mkdir xxx/yyy
mkdir xxx/yyy/zzz
```

but it is easier to use the `-p` option to create missing parent directories when needed.

In [None]:
mkdir -p xxx/yyy/zzz
tree

The `tree` command lists contents of directories in a tree-like format.
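
The difference between `mkdir` with and without `-p` also shows up in the exit status, which is what a script would normally test. The sketch below uses a scratch directory and a hypothetical path `a/b/c`; it does not touch the notebook's files.

```sh
demo=$(mktemp -d)

# Without -p, mkdir reports an error and returns a non-zero exit status
# because the parent directories a/b do not exist yet.
if mkdir "$demo/a/b/c" 2>/dev/null; then
    plain_result="created"
else
    plain_result="failed"
fi
echo "mkdir without -p: $plain_result"

# With -p, the missing parents are created. Running the same command a
# second time is not an error, even though the directories now exist.
mkdir -p "$demo/a/b/c"
mkdir -p "$demo/a/b/c"
second_status=$?
echo "exit status of repeated mkdir -p: $second_status"

rm -r "$demo"
```

Because a repeated `mkdir -p` succeeds, it is a convenient way for a script to make sure a directory exists before writing into it.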

#### Redirecting command output

Many commands produce text output. It is possible to redirect this output directly to a file. The redirection
operator `>` redirects the standard (non-error) output of a command to a specified file, overwriting the file
if it already exists.

In [None]:
cowsay "Mooooo" > moo.txt

Notice that the output that would have normally appeared in the terminal no longer appears. Instead, the output
has been redirected to the file `moo.txt`. We can verify that this has happened by printing the contents of the file:

In [None]:
cat moo.txt

To append to an existing file (or to create the file if it does not exist), use the operator `>>`:

In [None]:
cowsay "Mooooo again" >> moo.txt

Printing the file contents should show two cows:

In [None]:
cat moo.txt

We will revisit the idea of redirection in a later notebook.
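
For readers without `cowsay` installed, the same overwrite-versus-append behaviour can be seen with `echo`. This is a sketch only; the file `out.txt` lives in a scratch directory and is not part of the notebook's files.

```sh
demo=$(mktemp -d)

# > creates the file, or truncates it if it already exists.
echo "first"  > "$demo/out.txt"
echo "second" > "$demo/out.txt"      # replaces the line "first"
after_overwrite=$(wc -l < "$demo/out.txt" | tr -d ' ')

# >> keeps the existing contents and adds to the end.
echo "third" >> "$demo/out.txt"
after_append=$(wc -l < "$demo/out.txt" | tr -d ' ')

echo "lines after >: $after_overwrite, lines after >>: $after_append"
final_contents=$(cat "$demo/out.txt")

rm -r "$demo"
```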

## Moving files

The `mv` command renames or moves files. To change the name of a file named *fname* to *newname* use:

`mv` *fname* *newname*

For example:

In [None]:
touch afile
mv afile bfile      # rename afile to bfile
ls

The destination directory for the newly named file may differ from the current directory. For example, we can
rename `bfile` to `cfile` and put the renamed file in the existing directory `dir1`:

In [None]:
mv bfile dir1/cfile
ls                  # no bfile in current dir
ls dir1/cfile   

To move a file *fname*, without renaming it, to a different directory *dirname* use:

`mv` *fname* *dirname*

For example, we can create a file named `zfile` and move it to directory `dir1`:

In [None]:
touch zfile
mv zfile dir1
ls                  # no zfile
ls dir1

Renaming or moving a file to an existing filename will overwrite the existing file, and there is no way to
recover the overwritten file. You may use the `-i` option to warn that overwriting is about to occur.
Interactive prompts do not work in Jupyter; readers should try the following example in an actual shell:

```sh
touch xfile yfile
mv -i xfile yfile
```
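
In a script or a notebook, where an interactive prompt is not available, one alternative is to test for the destination before moving. The following is a minimal sketch of that idea, reusing the names `xfile` and `yfile` inside a scratch directory. The check and the move are two separate steps, so this is a safeguard against mistakes rather than a guarantee when other programs are writing to the same directory.

```sh
demo=$(mktemp -d)
echo "keep me"  > "$demo/yfile"
echo "new data" > "$demo/xfile"

# Only move when nothing exists at the destination.
if [ -e "$demo/yfile" ]; then
    echo "yfile already exists; not moving xfile"
else
    mv "$demo/xfile" "$demo/yfile"
fi

# yfile still holds its original contents.
yfile_contents=$(cat "$demo/yfile")
echo "$yfile_contents"

rm -r "$demo"
```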

## Removing files

In UNIX, file deletion is permanent; there is no reliable way to completely recover the contents of a deleted
file. This is considered to be one of the problems in the design of UNIX (The Art of Unix Programming, 
http://www.catb.org/esr/writings/taoup/html/ch20s03.html#id3015538). Readers should consider themselves
warned!

The `rm` command removes a file or files. To remove a file named *fname* use:

`rm` *fname*

For example:

In [None]:
touch a.txt b.txt c.txt d.txt e.txt
ls

In [None]:
rm a.txt b.txt
ls

In [None]:
rm *txt
ls

The `-i` option can be used to interactively prompt whether a file should be removed. This is occasionally useful
when removing many files with a wildcard pattern.
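
Another cautious habit with wildcard patterns, which also works where prompts are unavailable, is to preview what the pattern expands to before deleting anything. The sketch below does this by putting `echo` in front of the command. The filenames are hypothetical and live in a scratch directory.

```sh
demo=$(mktemp -d)
touch "$demo/a.txt" "$demo/b.txt" "$demo/notes.md"

# Preview: echo prints the command line after the shell has expanded the
# pattern, without running rm.
(cd "$demo" && echo rm -- *.txt)

# Once the list looks right, run the real command.
(cd "$demo" && rm -- *.txt)

survivors=$(ls "$demo")
echo "left after rm: $survivors"

rm -r "$demo"
```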

The `rmdir` command will remove an empty directory.

In [None]:
mkdir somedir
ls

In [None]:
rmdir somedir
ls

Attempting to remove a non-empty directory using `rmdir` produces an error:

In [None]:
mkdir somedir
touch somedir/somefile
rmdir somedir

To remove a directory and all of the files in the directory (including subdirectories and their contents), use
the `-r` option of `rm` to recursively remove the directory:

In [None]:
rm -r somedir

## Copying files

The `cp` command copies one file to a second file, or copies multiple files to a directory.

To make a copy of a single file named *fname* use:

`cp` *fname* *newname*

where *newname* is the filename of the copy. If *newname* is the name of an existing file, then `cp` will 
overwrite the existing file with the copy. The following example copies the file `A.txt` to the file `A_words.txt`:

In [None]:
cp A.txt A_words.txt
ls

In [None]:
cat A.txt

In [None]:
cat A_words.txt

Multiple files can be copied into a directory; for example, we can copy `A.txt` and `A_words.txt` into the directory
`dir1`:

In [None]:
cp A*txt dir1
ls dir1

Directories and their contents can be copied using the `-r` option:

In [None]:
cp -r dir1 words

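The directory `words` contains copies of all of the files that are in `dir1`. This assumes that `words` did not
exist beforehand; if `words` already exists as a directory, `cp -r` copies `dir1` into it as `words/dir1`.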
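
A copy made with `cp` is independent of the original: changing one does not change the other. The following sketch checks this in a scratch directory, using the hypothetical names `src`, `copy`, and `words.txt`.

```sh
demo=$(mktemp -d)
mkdir "$demo/src"
echo "apple" > "$demo/src/words.txt"

# Copy the directory and everything in it.
cp -r "$demo/src" "$demo/copy"

# Right after copying, the two files have identical contents.
if cmp -s "$demo/src/words.txt" "$demo/copy/words.txt"; then
    before="same"
else
    before="different"
fi

# Changing the copy leaves the original untouched.
echo "avocado" >> "$demo/copy/words.txt"
original=$(cat "$demo/src/words.txt")

echo "before edit: $before; original still contains: $original"

rm -r "$demo"
```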

## Linking to files

#### Hard links

Recall that a hard link is a directory entry that maps a filename to an inode. On the author's computer,
the mappings between filenames and inodes for the directory `scripts/files`, obtained using `ls -i1`, are:

| inode number | filename |
| :--- | :--- |
| 655532 | `A.txt` |
| 667642 | `dir1` |
| 662687 | `file1` |

If you run this example on your own computer, you may see more files if you have run the previous
examples in this notebook, and the inode numbers will almost certainly be different.

A *long listing* that includes inode numbers, obtained using `ls -il`, produces the following output on the author's computer:

```
total 8
655532 -rw-rw-r-- 1 cisc220 cisc220   97 Sep  3 17:41 A.txt
667642 drwxrwxr-x 2 cisc220 cisc220 4096 Sep  3 17:07 dir1
662687 -rw-rw-r-- 1 cisc220 cisc220    0 Sep  3 14:48 file1
```

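The number in the third column (after the inode number and the permissions) is the hard link count: the
number of directory entries that map to that inode. For a directory such as `dir1`, the count on most Linux
filesystems is 2 plus the number of subdirectories: one link from its entry in the parent directory, one from
its own `.` entry, and one from the `..` entry of each subdirectory.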

The `ln` command can be used to create a hard link that maps a new filename to the inode that another
filename maps to. The syntax is:

`ln` *target* *link_name*

where *target* is an existing filename and *link_name* is the filename of the link.
For example, we can create a new filename `A` that maps to the inode that `A.txt` maps to like
so:

In [None]:
ln A.txt A

A *long listing* that includes inode numbers, obtained using `ls -il`, produces the following output on the author's computer:

```
total 12
655532 -rw-rw-r-- 2 cisc220 cisc220   97 Sep  3 17:41 A
655532 -rw-rw-r-- 2 cisc220 cisc220   97 Sep  3 17:41 A.txt
667642 drwxrwxr-x 2 cisc220 cisc220 4096 Sep  3 17:07 dir1
662687 -rw-rw-r-- 1 cisc220 cisc220    0 Sep  3 14:48 file1
```

Notice that the number of hard links to inode 655532 is now `2`, indicating that there are two
filenames that map to the inode. Also notice that the files `A` and `A.txt` have the same inode number.
This means that the filenames `A` and `A.txt` both refer to the same file data. Editing the file using the
filename `A` modifies the same data that using `A.txt` would. For example, we can append the string `ATM`
to the end of the file via the filename `A`:

In [None]:
echo "ATM" >> A

Printing the contents of the file using `A.txt` shows that the file has been modified:

In [None]:
cat A.txt

If there is exactly one hard link to an inode, then removing the file by name causes the inode to be deleted.
If there is more than one hard link to an inode, then removing the file by name only removes that filename
and decreases the hard link count by one. For example, if we remove the file named `A`:

In [None]:
rm A

then the inode (and the associated file data) is not removed because there is still a filename that links to
the inode (namely `A.txt`). A *long listing* that includes inode numbers, obtained using `ls -il`, produces
the following output on the author's computer:

```
total 8
655532 -rw-rw-r-- 1 cisc220 cisc220  101 Sep  3 20:03 A.txt
667642 drwxrwxr-x 2 cisc220 cisc220 4096 Sep  3 17:07 dir1
662687 -rw-rw-r-- 1 cisc220 cisc220    0 Sep  3 14:48 file1
```

Notice that the hard link count to inode 655532 has decreased to `1` and that the filename `A.txt` still
maps to inode 655532.
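
The same sequence can be reproduced without relying on the author's inode numbers. The sketch below works in a scratch directory with the hypothetical names `original` and `alias`. It reads the inode number from the first column of `ls -i` and the link count from the second column of `ls -l`.

```sh
demo=$(mktemp -d)
echo "alpha" > "$demo/original"
ln "$demo/original" "$demo/alias"

# Column 1 of ls -i is the inode number; column 2 of ls -l is the link count.
inode_original=$(ls -i "$demo/original" | awk '{print $1}')
inode_alias=$(ls -i "$demo/alias" | awk '{print $1}')
links_before=$(ls -l "$demo/original" | awk '{print $2}')

if [ "$inode_original" = "$inode_alias" ]; then
    same_inode="yes"
else
    same_inode="no"
fi

# Removing one name leaves the data reachable through the other name.
rm "$demo/original"
links_after=$(ls -l "$demo/alias" | awk '{print $2}')
data=$(cat "$demo/alias")

echo "same inode: $same_inode"
echo "links before rm: $links_before, after rm: $links_after, data: $data"

rm -r "$demo"
```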

#### Symbolic or soft links

A *symbolic* or *soft link* maps a filename to another filename. The `ln` command can be used to create
a symbolic link using the `-s` option. The syntax is:

`ln -s` *target* *link_name*

where *target* is the filename to link to and *link_name* is the filename of the link. For example, we can
create a symbolic link to `A.txt` like so:

In [None]:
ln -s A.txt words_starting_with_A

A *long listing* that includes inode numbers, obtained using `ls -il`, produces the following output on the author's computer:

```
total 8
655532 -rw-rw-r-- 1 cisc220 cisc220   97 Sep  3 20:13 A.txt
667642 drwxrwxr-x 2 cisc220 cisc220 4096 Sep  3 17:07 dir1
662687 -rw-rw-r-- 1 cisc220 cisc220    0 Sep  3 14:48 file1
655553 lrwxrwxrwx 1 cisc220 cisc220    5 Sep  3 20:21 words_starting_with_A -> A.txt
```

Notice that the long listing obtained using `ls` shows the symbolic link as
`words_starting_with_A -> A.txt`. Also notice that the symbolic link is a file having its own
inode (655553).

The file corresponding to `A.txt` can be accessed using either the name `A.txt` or the link name
`words_starting_with_A`. For example, we can append the string `AWOL` to the end of the file via the link name:

In [None]:
echo "AWOL" >> words_starting_with_A

Printing the contents of the file using `A.txt` shows that the file has been modified:

In [None]:
cat A.txt

What happens if we rename the file `A.txt`? The symbolic link `words_starting_with_A` no longer maps to
a valid filename. Trying to use the symbolic link to access the data in the file now fails because the
symbolic link refers to a nonexistent filename:

In [None]:
mv A.txt a.txt
cat words_starting_with_A

A symbolic link that refers to a nonexistent filename is called a *broken*, *dangling*, or *dead* link.
The link can be repaired by recreating the filename that the link maps to:

In [None]:
mv a.txt A.txt
cat words_starting_with_A

Many Linux distributions make extensive use of soft links to map names in `/bin` to executable programs located
elsewhere on the filesystem.
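
The life cycle of a symbolic link described earlier in this section (create, break by renaming the target, repair) can be checked from a script. This is a sketch in a scratch directory with the hypothetical names `target.txt`, `renamed.txt`, and `link`. It assumes the `readlink` command is available, which prints the name stored in a link. The test `-L` is true when the name itself is a symbolic link, while `-e` follows the link and is true only when the target exists.

```sh
demo=$(mktemp -d)
echo "alpha" > "$demo/target.txt"

# The link stores the relative name "target.txt", which is resolved from
# the directory that contains the link.
ln -s target.txt "$demo/link"
stored_name=$(readlink "$demo/link")

# Rename the target: the link itself still exists (-L is true), but
# following it fails (-e is false), so the link is dangling.
mv "$demo/target.txt" "$demo/renamed.txt"
if [ -L "$demo/link" ] && [ ! -e "$demo/link" ]; then
    state="dangling"
else
    state="ok"
fi

# Restoring the original name repairs the link.
mv "$demo/renamed.txt" "$demo/target.txt"
repaired=$(cat "$demo/link")

echo "link stores: $stored_name; after rename: $state; after repair: $repaired"

rm -r "$demo"
```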