<h1 style = "font-size: 35px">Terminal Commands and Bash Scripting</h1>

<img src="img/command_anatomy.png" width='800px'>

# File System Navigation

## `pwd`: where am I?

What is the path to my current location (aka working directory) in the file system?

In [None]:
pwd

<img src="img/whoami.jpg">

## To continue the existential questions...

Who am I logged in as? What is my username?

In [None]:
whoami

## `tree`: show the file system structure in a tree format
```bash
tree ~/module-1-programming
```
```
~/module-1-programming
├── Day1_1_Intro_to_Programming.ipynb
├── Day1_2_Python_Packages.ipynb
├── Day1_3_Intro_to_Pandas.ipynb
├── Day2_1_R_Basics.ipynb
├── Day2_2_Terminal_Commands_and_Bash_Scripting.ipynb
├── README.md
├── bash_playground
│   ├── scripts
│   │   ├── hello.sh
│   │   ├── hello_to_all_of_you.sh
│   │   └── hello_you.sh
│   └── sequences.txt
├── data
│   ├── chrom_lengths.tsv
│   ├── gene_chrom.tsv
│   └── hg19.chrom.sizes.txt
└── img
```

## `cd`: **c**hange my working **d**irectory

We can use the `cd` command to navigate to a directory, given the path to that directory:
```bash
cd [path_to_dir]
```
Without any arguments, `cd` will go to your home directory.

In [None]:
cd

Now where are we?

In [None]:
pwd

Usually, you'll want to give `cd` the path to a directory, though. Let's go to the bash_playground folder.

In [None]:
cd module-1-programming/bash_playground

You can go *up* one level with the `..` notation.

In [None]:
cd ..
pwd

And you can go back to your last directory by using `-` as the directory path. This will also print the new working directory.

In [None]:
cd -
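Here is a small sketch of `cd ..` and `cd -` working together. Because it can't assume your own folders exist, it builds a stand-in `module-1-programming/bash_playground` inside a temporary directory made with `mktemp -d`. The `$PWD` variable holds the current working directory, which is the same path `pwd` prints.

```bash
# Hypothetical stand-in for ~/module-1-programming/bash_playground, built in a temp directory
base="$(mktemp -d)"
mkdir -p "$base/module-1-programming/bash_playground"

cd "$base/module-1-programming/bash_playground"
cd ..                  # up one level, to module-1-programming
echo "after cd ..: $PWD"
cd - > /dev/null       # back to the previous directory; cd - normally prints it, so we hide that here
echo "after cd -:  $PWD"
```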

## `ls`: **l**i**s**t contents of a directory

We can use the `ls` command to print the contents of a directory, given the path to that directory:
```bash
ls [path_to_dir_or_file]...
```
Without any arguments, `ls` will show you the contents of your current working directory.

In [None]:
ls

But we can also give it the path to a file or directory (or multiple files or directories).

In [None]:
ls scripts/hello.sh scripts/hello_you.sh

Adding flags gets us more information. Let's use the long format (`-l`) with human-readable file sizes (`-h`).

In [None]:
ls -lh scripts/hello.sh scripts/hello_you.sh

Use the `-a` flag to show _all_ files, including hidden ones (files whose names start with a dot `.`).

In [None]:
ls -a
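To see exactly what `-a` adds, this sketch creates a throwaway directory holding one ordinary file and one hidden file. The file names are hypothetical.

```bash
# A throwaway directory with one ordinary file and one hidden file (hypothetical names)
demo="$(mktemp -d)"
touch "$demo/visible.txt" "$demo/.hidden.txt"

ls "$demo"       # only visible.txt
ls -a "$demo"    # ., .., .hidden.txt and visible.txt
```

The `.` and `..` entries that `-a` shows are the same "current folder" and "folder above" shortcuts used in paths.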

Just like with `cd`, we can also specify the path to the directory above our current one using the ".." notation.

In [None]:
ls ..

At the start of a path, the tilde `~` stands for our home directory, so we can view the files in that directory like this:

In [None]:
ls ~
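The shell replaces `~` before the command runs, the same way it expands wildcards. A quick check against the `HOME` variable, which holds the same path, makes this visible. One caveat is that the replacement does not happen inside quotes.

```bash
echo ~          # the shell swaps ~ for your home directory before echo runs
echo "$HOME"    # the same path, read from the HOME variable
echo "~"        # quoted, so no expansion happens and a literal ~ is printed
```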

# Paths

## Relative vs absolute paths

There are two types of paths:
1. relative - interpreted relative to your current working directory
2. absolute - independent of your current working directory (i.e. starting from the root directory `/`)

So these paths refer to the same file:
- `scripts/hello.sh` (relative)
- `~/module-1-programming/bash_playground/scripts/hello.sh` (absolute, because the shell expands `~` to the absolute path of your home directory)

If we changed our working directory away from `bash_playground`, then the first path would break but the second one wouldn't.

If we changed the location of the `module-1-programming` directory, then the second path would be the one to break. But the first path wouldn't break, assuming our working directory is still `bash_playground`!
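The following sketch shows the first scenario using a hypothetical copy of `bash_playground/` inside a temporary directory. Because `mktemp -d` returns an absolute path, `$playground` acts as the absolute path here. `[ -f path ]` is true when `path` names an existing regular file.

```bash
# Hypothetical stand-in for ~/module-1-programming/bash_playground, built in a temp directory
playground="$(mktemp -d)/bash_playground"
mkdir -p "$playground/scripts"
touch "$playground/scripts/hello.sh"

cd "$playground"
rel_before=no; [ -f scripts/hello.sh ] && rel_before=yes

cd /    # change the working directory away from bash_playground
rel_after=no; [ -f scripts/hello.sh ] && rel_after=yes
abs_after=no; [ -f "$playground/scripts/hello.sh" ] && abs_after=yes

echo "relative before cd: $rel_before, relative after cd: $rel_after, absolute after cd: $abs_after"
```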

## Different ways of writing a path
Assuming our working directory is `bash_playground/`, these are all equivalent paths!
- `scripts/hello.sh`
- `../bash_playground/scripts/hello.sh`
- `scripts/../scripts/hello.sh`
- `scripts/./hello.sh`

The double-dot `..` refers to the folder above and the single-dot `.` refers to the current folder.
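You can check that these four spellings really land on the same file. This sketch rebuilds a tiny hypothetical `bash_playground/` in a temporary directory, works from inside it, and counts how many of the paths name an existing file.

```bash
# Rebuild a tiny hypothetical copy of bash_playground/ in a temp directory and work from inside it
playground="$(mktemp -d)/bash_playground"
mkdir -p "$playground/scripts"
touch "$playground/scripts/hello.sh"
cd "$playground"

found=0
for p in scripts/hello.sh ../bash_playground/scripts/hello.sh scripts/../scripts/hello.sh scripts/./hello.sh
do
    # -f is true when the path names an existing regular file
    if [ -f "$p" ]; then
        echo "found: $p"
        found=$((found + 1))
    fi
done
echo "$found of 4 spellings point at hello.sh"
```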

# Understanding help and manual pages

Many commands print usage documentation when given `--help` (long format) **and/or** `-h` (short format):
```bash
cmd --help
```
```bash
cmd -h
```

Many commands also have a page in the built-in manual, which you can open with `man`:
```bash
man cmd
```
(press **q** to quit the manual once you're done reading)

## An example manual page: `man ls`
<img src="img/man_ls.png" width='800px'>

# Tab completion
While typing a command, you can press `<tab>` to use the terminal's auto-completion. It saves time and reduces the chance of typos, so make an effort to use tab completion as often as possible.

For example, let's tab-complete the `scripts/` directory.
```bash
cd sc<tab>
```
## double tap when there are multiple options
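If there are multiple possible ways to complete your command, the terminal only fills in the part they all share. Here, every option already starts with `scripts/hello`, so nothing new appears at first.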
```bash
ls scripts/hello<tab>
```
Press `<tab>` twice in quick succession to display the possible options.
```bash
ls scripts/hello<tab><tab>
```

# Altering the file system

## `mkdir`: **m**a**k**e a new **dir**ectory
```bash
mkdir path_to_dir...
```

In [None]:
mkdir shiny_new

Look, it's there!

In [None]:
ls

## `cp`: **c**o**p**y a file or directory
```bash
cp path_to_source... path_to_dest
```

In [None]:
cp sequences.txt shiny_new/sequences_copy.txt
ls shiny_new

### `cp -r` to copy a directory and all of its contents

In [None]:
cp -r shiny_new same_old
ls

We should now have two new directories in our bash playground: `same_old/` and `shiny_new/`.

In [None]:
ls same_old

And `shiny_new/` should contain the same files as `same_old/`.

In [None]:
ls shiny_new
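Why do we need `-r`? This sketch, run in a throwaway directory with hypothetical contents, shows that a plain `cp` refuses to copy a directory and exits with an error, while `cp -r` copies the directory and everything inside it.

```bash
demo="$(mktemp -d)"
cd "$demo"
mkdir shiny_new
touch shiny_new/sequences_copy.txt     # hypothetical file so the directory isn't empty

# Without -r, cp refuses to copy a directory and exits with an error
if ! cp shiny_new same_old 2>/dev/null; then
    echo "plain cp skipped the directory"
fi

cp -r shiny_new same_old   # -r copies the directory and everything inside it
ls same_old
```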

## `touch`: create a new, empty file
```bash
touch path_to_file...
```

In [None]:
touch empty_inside.txt

Don't read into that name too much. Just trust me, it's empty. Look at the size. Or go check it yourself later!

In [None]:
ls -lh
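If you'd rather not read sizes from `ls -lh`, the test operator `[ -s FILE ]` checks this directly. It is true only when `FILE` exists and is larger than zero bytes. Here is a sketch that uses a temporary directory.

```bash
demo="$(mktemp -d)"
touch "$demo/empty_inside.txt"

# [ -s FILE ] is true only when FILE exists and has a size greater than zero
if [ -s "$demo/empty_inside.txt" ]; then
    status="not empty"
else
    status="empty"
fi
echo "empty_inside.txt is $status"
```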

## `mv`: **m**o**v**e a file or directory
```bash
mv path_to_source... path_to_dest
```

Let's move `empty_inside.txt` to the `shiny_new/` directory.

In [None]:
mv empty_inside.txt shiny_new/
ls

Check: is it there?

In [None]:
ls shiny_new

### renaming a file
Just use `mv` where the destination is in the same folder!

Let's rename `empty_inside.txt` to `fulfilled.txt`. Hopefully, you'll be the same way after this lesson ;)

In [None]:
mv shiny_new/empty_inside.txt shiny_new/fulfilled.txt
ls shiny_new
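What `mv` does depends on the destination. If the destination is an existing directory, `mv` moves the file into it. If the destination is a new name, `mv` renames the file. This sketch repeats both steps in a throwaway directory.

```bash
demo="$(mktemp -d)"
cd "$demo"
mkdir shiny_new
touch empty_inside.txt

mv empty_inside.txt shiny_new/                          # destination is an existing directory: move the file into it
mv shiny_new/empty_inside.txt shiny_new/fulfilled.txt  # destination is a new name in the same directory: rename
ls shiny_new
```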

## `rm`: **r**e**m**ove a file or directory
```bash
rm path_to_file_or_dir...
```
With great power comes great responsibility... use this carefully! There is no "undo" button or trash can in the terminal!

In [None]:
rm shiny_new/fulfilled.txt
ls shiny_new

### `rm -r`: delete a directory and all of its contents, **r**ecursively

In [None]:
rm -r same_old
ls

## `rmdir`: **r**e**m**ove an empty **dir**ectory
```bash
rmdir path_to_empty_dir...
```

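In [None]:
# rmdir only works on empty directories, so first delete the copy we made with cp
rm shiny_new/sequences_copy.txt
rmdir shiny_new
ls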
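`rmdir` is the safe option because it refuses to delete a directory that still has something in it. This sketch, run in a throwaway directory with a hypothetical file, shows it failing and then succeeding once the directory is empty.

```bash
demo="$(mktemp -d)"
cd "$demo"
mkdir shiny_new
touch shiny_new/sequences_copy.txt

rmdir shiny_new 2>/dev/null || echo "rmdir refused: shiny_new is not empty"
rm shiny_new/sequences_copy.txt
rmdir shiny_new && echo "rmdir succeeded once the directory was empty"
```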

## `ln`: create a symlink to a file or directory
```bash
ln -s path_to_target... path_to_link
```
A symlink is a shortcut to a file or directory. This can be helpful when you want to have a file in multiple locations but don't want to copy it.

Recall the `data/` directory in the folder above `bash_playground/`.

In [None]:
ls ../data

Let's create a symlink to the `data/` directory from within the `bash_playground/` directory.

In [None]:
ln -s ../data data-sym
ls

A symlink is a special file that just references another file (or directory). You can see which path a symlink points to using `ls -l`.

In [None]:
ls -l data-sym

For most purposes, a symlink behaves just like its target file or directory.

In [None]:
ls data-sym

In [None]:
rm data-sym
ls

Deleting a symlink doesn't delete the target it points to.

In [None]:
ls ..
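This sketch repeats the whole symlink lifecycle in a temporary directory with hypothetical contents. It also uses `readlink`, which prints the target path a link stores and is not otherwise used in this lesson. A relative target like `../data` is resolved from the link's own location.

```bash
demo="$(mktemp -d)"
mkdir "$demo/data" "$demo/bash_playground"
touch "$demo/data/chrom_lengths.tsv"
cd "$demo/bash_playground"

ln -s ../data data-sym   # the link stores the text "../data", resolved relative to the link's location
readlink data-sym        # prints that stored target: ../data
ls data-sym              # lists the target's contents: chrom_lengths.tsv
rm data-sym              # deletes only the link
ls ../data               # chrom_lengths.tsv is still there
```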

# Printing things

## `cat`: print the contents of files
```bash
cat path_to_file...
```

In [None]:
cat sequences.txt

If you provide more than one file, `cat` will _concatenate_ them (aka append them to each other). This is where it gets its name.

In [None]:
cat sequences.txt .you_found_me.txt
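Here is a tiny self-contained version of the same idea. The file names are hypothetical, and `>`, which saves a command's output to a file, is used only to create the two input files.

```bash
demo="$(mktemp -d)"
printf 'line from a\n' > "$demo/a.txt"   # setup: write one line into a.txt
printf 'line from b\n' > "$demo/b.txt"   # setup: write one line into b.txt

cat "$demo/a.txt" "$demo/b.txt"          # prints a.txt, then b.txt, one after the other
```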

## `head`: print the top of a file
```bash
head [path_to_file]...
```
By default, `head` will print the first 10 lines of a file.

In [None]:
head sequences.txt

But you can also specify the number of lines with the `-n` flag.

In [None]:
head -n 5 sequences.txt

## `tail`: print the end of a file
```bash
tail [path_to_file]...
```
You can use `tail` in the exact same way as `head`.

In [None]:
tail sequences.txt

In [None]:
tail -n 5 sequences.txt
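Numbered lines make it easy to see which part of a file `head` and `tail` keep. This sketch builds a hypothetical file with `seq`, which prints a range of numbers one per line, and saves it with `>`.

```bash
# Setup: a hypothetical file containing the numbers 1 to 20, one per line
demo="$(mktemp -d)"
seq 1 20 > "$demo/numbers.txt"

head -n 5 "$demo/numbers.txt"   # first 5 lines: 1 to 5
tail -n 5 "$demo/numbers.txt"   # last 5 lines: 16 to 20
```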

## `echo`: print a string
```bash
echo [string]...
```

In [None]:
echo is anybody out there?

# Wildcards and globbing

Question marks `?` and asterisks `*` are considered _wildcards_.

You can use a question mark `?` in place of any single character. Bash will find all of the files that match. This is called _globbing_.

In [None]:
ls ?equences.txt

Use an asterisk `*` when you want to match zero or more characters.

In [None]:
ls *es.txt

This is useful when you want to refer to a bunch of similarly named files simultaneously.

In [None]:
ls scripts/*.sh

Wildcards get expanded to a space-separated list prior to the command executing. A great way to see this is to prepend the command with an `echo`.

In [None]:
echo ls scripts/*.sh
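To try both wildcards on files you control, this sketch works in a throwaway directory with hypothetical file names. `?` must match exactly one character, while `*` can match any number of characters, including none.

```bash
demo="$(mktemp -d)"
cd "$demo"
touch sequences.txt xequences.txt sequences_v2.txt genes.txt   # hypothetical file names

echo ?equences.txt   # ? matches exactly one character: sequences.txt xequences.txt
echo *es.txt         # * matches zero or more characters: genes.txt sequences.txt xequences.txt
echo ls *.txt        # echo shows the expanded command line instead of running ls
```

Note that `sequences_v2.txt` matches neither of the first two patterns, because it ends in `v2.txt` rather than `es.txt`.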

# Now let's get fancy

#### You can do quite a lot of text processing and run more complex file operations in the terminal. Check out the contents of the file sequences.txt in this directory using the `cat` command:

In [None]:
cat sequences.txt

### You can look at just the beginning or just the end of a file

In [None]:
head -2 sequences.txt

In [None]:
tail -2 sequences.txt 

### You can split up a file by columns with `cut`, which splits on tabs by default!

In [None]:
cut -f1 sequences.txt
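Because `cut` splits on tabs by default, it needs the `-d` flag for files that use a different separator, such as commas. The sketch below uses a hypothetical comma-separated file. A line that contains no delimiter at all is printed unchanged.

```bash
demo="$(mktemp -d)"
printf 'seqA,ACGT\nseqB,TTAA\n' > "$demo/mini.csv"   # hypothetical comma-separated file

cut -f1 "$demo/mini.csv"         # no tabs, so each whole line is printed unchanged
cut -d ',' -f1 "$demo/mini.csv"  # -d sets the delimiter to a comma: seqA, seqB
```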

In [None]:
### YOU TRY: print out just the second column



### Piping 

<img src="img/mariopipe.png">

#### You can use pipes ("|") to send the output of one command to another command as input

In [None]:
### What do you think will be output using this combination of commands?
head -3 sequences.txt | tail -2

In [None]:
### NOW YOU TRY: retrieve just the second column of the third line (seq C) from the sequences.txt file using 
# a combination of tail, head, and cut with a couple of pipes in the mix.



### grep: searching for specific text within a file

In [None]:
grep "TAAC" sequences.txt

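#### `grep` patterns are _regular expressions_, which can do more than match plain text. For example, square brackets match any _one_ of the characters inside them, so `[BD]` matches a "B" or a "D". (Inside square brackets, `|` is just a literal character, not "or".)

In [None]:
grep "[BD]" sequences.txt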
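The sketch below runs three patterns on a hypothetical file to show the difference between a bracket expression and "or". It uses `-c`, which prints the number of matching lines, and `-E`, which turns on extended regular expressions where `|` between two patterns means "or".

```bash
demo="$(mktemp -d)"
printf 'seqA\nseqB\nseqD\npipe|line\n' > "$demo/names.txt"   # hypothetical file

grep "[BD]" "$demo/names.txt"       # one character, B or D: seqB, seqD
grep "[B|D]" "$demo/names.txt"      # inside [], | is literal, so pipe|line matches too
grep -E "B|D" "$demo/names.txt"     # with -E, | between patterns means "or": seqB, seqD
grep -c "[B|D]" "$demo/names.txt"   # -c prints the number of matching lines: 3
```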

#### Put `man` before most commands to open their manual page and learn more about how to use them

In [None]:
man grep 

In [None]:
### NOW YOU TRY: Okay, try finding the line that contains "TAAC" plus the line after it using grep.




### tr: substitution

#### Let's use tr to turn our DNA sequences into RNA sequences instead!

In [None]:
cat sequences.txt | tr T U

#### We can use tr to find the complement of DNA sequences

In [None]:
cut -f2 sequences.txt | tr ACGT TGCA


## rev: reverse

#### We can tack on the `rev` command, which reverses each line character by character, to get the reverse complement of a DNA sequence

In [None]:
cut -f2 sequences.txt | tr ACGT TGCA | rev
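A single short sequence makes each step easy to check by hand. `ATGCCG` below is a hypothetical example, not one of the lines in `sequences.txt`.

```bash
dna="ATGCCG"                         # hypothetical example sequence
echo "$dna" | tr T U                 # RNA version: AUGCCG
echo "$dna" | tr ACGT TGCA           # complement: TACGGC
echo "$dna" | tr ACGT TGCA | rev     # reverse complement: CGGCAT
```

`tr` swaps characters position by position, so `A` becomes `T`, `C` becomes `G`, and so on. `rev` then flips the order of the letters.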


## wc: word/line/character count

In [None]:
wc sequences.txt

#### Get just lines

In [None]:
wc -l sequences.txt

#### Get just words

In [None]:
wc -w sequences.txt

#### Get just characters (strictly, `-c` counts bytes, which is one per character for plain ASCII text like DNA sequences)

In [None]:
wc -c sequences.txt
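On a tiny hypothetical file you can check the counts by hand. Keep in mind that each newline is counted as a byte.

```bash
demo="$(mktemp -d)"
printf 'one two\nthree\n' > "$demo/tiny.txt"   # hypothetical 2-line file

wc -l "$demo/tiny.txt"   # 2 lines
wc -w "$demo/tiny.txt"   # 3 words
wc -c "$demo/tiny.txt"   # 14 bytes: "one two" plus a newline is 8, "three" plus a newline is 6
```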

## sort: sort values in a column

In [None]:
cut -f2 sequences.txt | sort 

## uniq: check unique values in a column

### `uniq` only collapses duplicate lines that sit next to each other, which is why we `sort` first. Here you can see that the last two sequences were duplicates of one another

In [None]:
cut -f2 sequences.txt | sort | uniq
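The following sketch uses three hypothetical lines to show why `sort` comes before `uniq`. It also uses `uniq -c`, which puts a count of how many times each line appeared in front of it.

```bash
printf 'GATC\nAAAA\nGATC\n' | uniq             # duplicates aren't adjacent: all 3 lines survive
printf 'GATC\nAAAA\nGATC\n' | sort | uniq      # sorted first: AAAA, GATC
printf 'GATC\nAAAA\nGATC\n' | sort | uniq -c   # each unique line with its count
```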

# Common commands

### wget: downloading files, for example the genome reference files for hg38: https://hgdownload.soe.ucsc.edu/goldenPath/hg38/chromosomes/

In [None]:
# This will download chromosome chr21 
wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/chromosomes/chr21.fa.gz

#### Now we have it locally

In [None]:
ls 

### gunzip and gzip: unzipping and zipping files

In [None]:
gunzip chr21.fa.gz

In [None]:
ls

In [None]:
# Let's sneak a peek: head takes the first 194,500 lines and tail keeps the last 10 of those
head -194500 chr21.fa | tail

#### Ok back in the box you go

In [None]:
gzip chr21.fa

In [None]:
ls
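You don't always have to unzip a file to look at it. `gunzip -c` writes the uncompressed text to the screen and leaves the `.gz` file as it is. This sketch tries it on a tiny hypothetical FASTA file.

```bash
demo="$(mktemp -d)"
printf '>chrTest\nACGTACGT\n' > "$demo/tiny.fa"   # hypothetical tiny FASTA file

gzip "$demo/tiny.fa"                        # replaces tiny.fa with tiny.fa.gz
ls "$demo"
gunzip -c "$demo/tiny.fa.gz" | head -n 2    # peek inside without unzipping on disk
```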

# Advanced scripting

## For loops

In [None]:
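# Remember, the wildcard * expands to the names of all files (and directories) in the current directory.

In [None]:
# Loop over the glob directly; "$FILE" is quoted so names with spaces stay in one piece
for FILE in *
do
    echo "check out this sweet little file called $FILE"
done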
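This sketch uses hypothetical file names to show why looping over `$(ls *)` can go wrong. The shell splits the output of `ls` on spaces, so one filename can turn into several loop items. A bare glob `*` keeps each filename whole.

```bash
demo="$(mktemp -d)"
cd "$demo"
touch "my file.txt" other.txt   # hypothetical files, one with a space in its name

# Word splitting breaks "my file.txt" into two items (intentionally fragile)
count_ls=0; for f in $(ls *); do count_ls=$((count_ls + 1)); done
# The glob gives one item per file
count_glob=0; for f in *; do count_glob=$((count_glob + 1)); done

echo "via \$(ls *): $count_ls items; via *: $count_glob items"
```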

## If statements

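#### Change the value of `you` (try "bored") and rerun the cell to switch things up...

In [None]:
you="super crazy enthusiastic about all this bash stuff"
# Quote "$you": unquoted, its spaces would split it into many words and [ would fail with an error
if [ "$you" = "bored" ]
then
    echo "okay on to the next adventure!"
fi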
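An `else` branch gives the script something to do when the test is false. Here is a sketch. The reply texts are hypothetical.

```bash
you="super crazy enthusiastic about all this bash stuff"
# Quoted, so [ sees exactly three arguments even though the value contains spaces
if [ "$you" = "bored" ]; then
    reaction='okay on to the next adventure'
else
    reaction='keep going, then'
fi
echo "$reaction"
```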

In [None]:
for name in "Dolly" "Kitty" "Poppet"
do
    echo "$name"
done