# Terminal Commands and Bash Scripting

Adapted from http://people.duke.edu/~ccc14/duke-hts-2018/cliburn/Bash_in_Jupyter.html 

## pwd: where am I?

In [None]:
pwd

<img src="img/whoami.jpg">

## To continue the existential questions...

In [None]:
whoami

## cd: navigate between directories

* `cd <dirname>` --> go to a directory

In [None]:
cd bash_playground

In [None]:
# Now where are we?
pwd

###### You can go back *up* a level with the ".." notation

In [None]:
cd ..

In [None]:
pwd

Let's go back into our playground

In [None]:
cd bash_playground
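Two more navigation shortcuts are worth knowing: `..` can be chained to climb several levels at once, and `cd -` jumps back to wherever you were before the last `cd`. The sketch below runs in a throwaway directory made with `mktemp -d`; the names `base`, `outer` and `inner` are made up for illustration.

```shell
# Work inside a throwaway directory so the playground is left untouched
base=$(cd "$(mktemp -d)" && pwd)
mkdir -p "$base/outer/inner"

cd "$base/outer/inner"
cd ../..                 # ".." can be chained: this climbs two levels
after_up=$(pwd)

cd - > /dev/null         # "cd -" returns to the previous directory (it prints the path, so we silence it)
after_back=$(pwd)

echo "after cd ../..: $after_up"
echo "after cd -:     $after_back"
```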

## echo: print out a string

In [None]:
echo "is there anybody out there?"

## ls: check out the contents of a folder

In [None]:
ls

#### extra flags get you more information (-l gives a long listing, -h prints human-readable file sizes)

In [None]:
ls -lh

#### what if we only want to list files with a certain extension?

In [None]:
# The asterisk is a "wild card" that matches any run of characters (including none at all)
ls *.sh

In [None]:
ls -lh *.sh
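The shell expands wildcards before the command ever runs, so you can preview what a pattern matches with `echo`. Here is a small sketch comparing `*` with `?`, which matches exactly one character. The file names are invented for the demo and live in a temporary directory.

```shell
demo=$(mktemp -d)
cd "$demo"
touch a.sh b.sh notes.txt run1.log run22.log

star_sh=$(echo *.sh)        # * matches any run of characters, including none
one_char=$(echo run?.log)   # ? matches exactly one character
any_run=$(echo run*.log)    # so run*.log also picks up run22.log

echo "$star_sh"
echo "$one_char"
echo "$any_run"
```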

In [None]:
### YOU TRY: list only the files that end in .txt

## mkdir: make a directory

In [None]:
mkdir shiny_new

##### Look it's there!

In [None]:
ls -lh

## cp: Copy a file

In [None]:
cp sequences.txt shiny_new/sequences_copy.txt
ls -lh shiny_new

#### cp -r to copy a directory and all of its contents

In [None]:
cp -r shiny_new same_old


In [None]:
# So now we should have two new directories in our bash playground: same_old and shiny_new
ls -lh

In [None]:
ls -lh same_old

In [None]:
# And shiny new should contain the same contents as same old
ls -lh shiny_new

## touch: make an empty file

In [None]:
touch empty_inside.txt

In [None]:
# Don't read into that name too much. Just trust me, it's empty. Look at the size. Or go check it yourself later!
ls -lh

## mv: Move or rename a file
#### if you use mv where the destination is in the same folder you will simply rename the file

In [None]:
# You before this lesson --> you after this lesson? 
mv empty_inside.txt fulfilled.txt

In [None]:
ls -lh

### If you use mv where the destination is in a different folder you will move the file

In [None]:
mv fulfilled.txt shiny_new

In [None]:
# Not here at the main folder level anymore...
ls -lh

In [None]:
# Here!
ls -lh shiny_new

## rm: delete a file
### With great power comes great responsibility... use this carefully! There is no "undo" in the terminal...

In [None]:
rm shiny_new/fulfilled.txt

In [None]:
ls -lh shiny_new
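Since there is no undo, one cautious habit (a suggestion, not the only way) is to run `ls` with a pattern first and only then give the same pattern to `rm`. The sketch below uses made-up file names in a temporary directory.

```shell
demo=$(mktemp -d)
cd "$demo"
touch keep.txt old1.tmp old2.tmp

ls *.tmp          # preview: exactly which names does the pattern match?
rm *.tmp          # only then hand the same pattern to rm

remaining=$(ls)
echo "$remaining"
```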

### rm -r: delete directory and all of its contents

In [None]:
rm -r shiny_new


In [None]:
### YOU TRY: delete the directory "same_old" 



In [None]:
ls -lh

# Now let's get fancy

#### You can do quite a lot of text processing and execute more complex file operations in terminal. Check out the contents of the file sequences.txt in this directory by using the "cat" command:

In [None]:
cat sequences.txt

### You can look at just the beginning or just the end of a file

In [None]:
head -2 sequences.txt

In [None]:
tail -2 sequences.txt 
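The `-2` shorthand can also be written `-n 2`, and `tail -n +N` starts printing at line N instead of counting from the end, which is handy for skipping a header. The file `demo_seqs.txt` below is a hypothetical stand-in, not the real sequences.txt.

```shell
demo=$(mktemp -d)
cd "$demo"
# Hypothetical tab-separated demo file: name, then sequence
printf 'seqA\tACGT\nseqB\tTTAA\nseqC\tGGCC\nseqD\tGGCC\n' > demo_seqs.txt

first_two=$(head -n 2 demo_seqs.txt)      # same as head -2
from_second=$(tail -n +2 demo_seqs.txt)   # +2 means "start at line 2", i.e. skip line 1

echo "$first_two"
echo "---"
echo "$from_second"
```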

### You can split up a file by columns!

In [None]:
cut -f1 sequences.txt
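By default `cut` splits on tabs. For other separators you can name the delimiter with `-d`, and `-f` accepts a comma-separated list of columns. A small sketch with an invented comma-separated file:

```shell
demo=$(mktemp -d)
cd "$demo"
# Hypothetical comma-separated file: name, sequence, length
printf 'seqA,ACGT,4\nseqB,TTAACG,6\n' > demo_seqs.csv

# -d',' sets the delimiter; -f1,3 keeps the first and third columns
names_and_lengths=$(cut -d',' -f1,3 demo_seqs.csv)
echo "$names_and_lengths"
```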

In [None]:
### YOU TRY: print out just the second column



### Piping 

<img src="img/mariopipe.png">

#### You can use pipes ("|") to send the output of one command to another command as input

In [None]:
### What do you think will be output using this combination of commands?
head -3 sequences.txt | tail -2

In [None]:
### NOW YOU TRY: retrieve just the second column of the third line (seq C) from the sequences.txt file using 
# a combination of tail, head, and cut with a couple of pipes in the mix.



### grep: searching for specific text within a file

In [None]:
grep "TAAC" sequences.txt

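#### You can use certain expressions to filter with extra parameters. For example, square brackets match any one of the characters listed inside them, so [BD] means "B or D" (a | inside the brackets would be matched literally)

In [None]:
grep "[BD]" sequences.txt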
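To make the difference between a bracket set and real alternation concrete, here is a sketch on a made-up file. It assumes a grep that supports the standard `-E`, `-v` and `-c` options.

```shell
demo=$(mktemp -d)
cd "$demo"
# Hypothetical demo file, not the real sequences.txt
printf 'seqA\tACGTAAC\nseqB\tTTGGCC\nseqC\tGGCCAA\nseqD\tGGCCAA\n' > demo_seqs.txt

bracket=$(grep "seq[BD]" demo_seqs.txt)         # one character from the set B, D
either=$(grep -E "seqB|seqD" demo_seqs.txt)     # -E turns on | as "or" between whole patterns
without_ggcc=$(grep -vc "GGCC" demo_seqs.txt)   # -v inverts the match, -c counts matching lines
pipe_hits=$(printf 'x|y\n' | grep -c "[|]")     # inside brackets, | is just a literal character

echo "$bracket"
echo "$without_ggcc"
```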

#### Type "man" before any command name to open its manual page and learn more about how to use it

In [None]:
man grep 

In [None]:
### NOW YOU TRY: Okay, try finding the line that contains "TAAC" plus the line after it using grep.




### tr: substitution

#### Let's use tr to turn our DNA sequences into RNA sequences instead!

In [None]:
cat sequences.txt | tr T U

#### We can use tr to find the complement of DNA sequences

In [None]:
cut -f2 sequences.txt | tr ACGT TGCA


## rev: reverse

#### We can tack on the "rev" command, which reverses the characters on each line, to get the reverse complement of a DNA sequence

In [None]:
cut -f2 sequences.txt | tr ACGT TGCA | rev
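It can help to trace the pipeline on one short sequence you can check by hand. The sequence `AAGCT` is just an example; lowercase letters are added to the `tr` sets as an optional extra.

```shell
dna="AAGCT"

# Step 1: swap each base for its partner (A<->T, C<->G)
complement=$(echo "$dna" | tr ACGTacgt TGCAtgca)

# Step 2: reverse the complemented string
revcomp=$(echo "$dna" | tr ACGTacgt TGCAtgca | rev)

echo "$complement"   # TTCGA
echo "$revcomp"      # AGCTT
```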


## wc: word/line/character count

In [None]:
wc sequences.txt

#### Get just lines

In [None]:
wc -l sequences.txt

#### Get just words

In [None]:
wc -w sequences.txt

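#### Get just characters (-c actually counts bytes; for plain ASCII text like ours that is the same number, and -m counts characters in multi-byte text)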

In [None]:
wc -c sequences.txt
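When `wc` is given a file name it prints the name next to the count. If you want only the number, for example to store in a variable, one common approach is to feed the file in through `<` instead. The demo file below is hypothetical.

```shell
demo=$(mktemp -d)
cd "$demo"
printf 'seqA\tACGT\nseqB\tTTAA\nseqC\tGGCC\n' > demo_seqs.txt

with_name=$(wc -l demo_seqs.txt)               # count followed by the file name
n_lines=$(wc -l < demo_seqs.txt | tr -d ' ')   # just the count; tr strips padding some systems add

echo "$with_name"
echo "$n_lines"
```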

## sort: sort values in a column

In [None]:
cut -f2 sequences.txt | sort 

## uniq: check unique values in a column

### uniq only collapses duplicate lines that sit next to each other, which is why we sort first. Here you can see that the last two sequences were duplicates of one another

In [None]:
cut -f2 sequences.txt | sort | uniq
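The sketch below shows why the `sort` matters, using three made-up sequences where the duplicates are not adjacent. It also uses `uniq -c` to count occurrences and `uniq -d` to print only the repeated values.

```shell
# Without sort, the two GGCC lines are not neighbours, so uniq keeps both
unsorted_count=$(printf 'GGCC\nACGT\nGGCC\n' | uniq | wc -l | tr -d ' ')

# With sort, duplicates become adjacent and collapse into one line
sorted_count=$(printf 'GGCC\nACGT\nGGCC\n' | sort | uniq | wc -l | tr -d ' ')

# -d prints only the values that occur more than once
duplicated=$(printf 'GGCC\nACGT\nGGCC\n' | sort | uniq -d)

# -c prefixes each distinct value with how many times it occurred
printf 'GGCC\nACGT\nGGCC\n' | sort | uniq -c

echo "$unsorted_count $sorted_count $duplicated"
```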

# Common commands

### wget: downloading files -- for example genome reference files for the hg38 reference: https://hgdownload.soe.ucsc.edu/goldenPath/hg38/chromosomes/

In [None]:
# This will download the gzip-compressed hg38 sequence for chromosome 21 (chr21) into the current directory
wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/chromosomes/chr21.fa.gz

#### Now we have it locally

In [None]:
ls 

### gunzip and gzip: unzipping and zipping files

In [None]:
gunzip chr21.fa.gz

In [None]:
ls

In [None]:
# Let's sneak a peek
head -194500 chr21.fa | tail

#### Ok back in the box you go

In [None]:
gzip chr21.fa

In [None]:
ls
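For a quick look you do not have to unzip and re-zip at all: `gunzip -c` writes the decompressed text to standard output and leaves the .gz file as it is, so it can be piped straight into `head`. The sketch uses a tiny made-up FASTA file rather than the real chr21 download.

```shell
demo=$(mktemp -d)
cd "$demo"
# Tiny hypothetical FASTA file standing in for chr21.fa
printf '>chrDemo\nACGT\nTTAA\n' > demo.fa
gzip demo.fa                      # replaces demo.fa with demo.fa.gz

# -c sends the decompressed text down the pipe; the .gz file stays on disk
first_line=$(gunzip -c demo.fa.gz | head -n 1)
echo "$first_line"
ls
```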

# Advanced scripting

## For loops

In [None]:
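# Remember, the * wildcard on its own matches every file and folder name in the current directory.

In [None]:
# Loop over the wildcard directly; quoting "$FILE" keeps names with spaces in one piece
for FILE in *
do
    echo "check out this sweet little file called $FILE"
done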
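Loops become useful once the body does some work on each file. As a sketch, the loop below reports the line count of every .txt file in a temporary directory. The file names are invented, and one contains a space on purpose to show why the quotes around `"$FILE"` matter.

```shell
demo=$(mktemp -d)
cd "$demo"
printf 'a\nb\n' > "two lines.txt"
printf 'a\n' > one.txt

# Collect the loop's output in a variable so it can be printed (or reused) afterwards
report=$(
    for FILE in *.txt
    do
        n=$(wc -l < "$FILE" | tr -d ' ')
        echo "$FILE has $n line(s)"
    done
)
echo "$report"
```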

## If statements

#### Try changing the value of "you" to "bored" and rerun the cell: the message only prints when the comparison is true

In [None]:
you="super crazy enthusiastic about all this bash stuff"
if [ "$you" = "bored" ]   # the quotes around $you keep a multi-word value together as one argument
then
    echo "okay on to the next adventure!"
fi
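An `if` can also branch several ways with `elif` and `else`, and `[ ]` can test files as well as strings: `-f` is true for a regular file and `-d` for a directory. A minimal sketch with a hypothetical file in a temporary directory:

```shell
demo=$(mktemp -d)
touch "$demo/sequences_demo.txt"
target="$demo/sequences_demo.txt"

if [ -d "$target" ]
then
    kind="directory"
elif [ -f "$target" ]
then
    kind="file"
else
    kind="missing"
fi

echo "$target is a $kind"
```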

In [None]:
for name in "Dolly" "Kitty" "Poppet"
do
    echo "$name"
done