# Terminal Commands and Bash Scripting

Adapted from http://people.duke.edu/~ccc14/duke-hts-2018/cliburn/Bash_in_Jupyter.html 

## cd: navigate between directories

* cd <dirname> --> go to a directory

In [1]:
cd bash_playground
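
A few special targets are worth knowing: `..` means "the parent directory" and `cd -` jumps back to wherever you were before the last `cd`. Here is a minimal sketch in a throwaway directory created with `mktemp -d`; the names `outer` and `inner` are made up for illustration.

```shell
# Remember where we started so we can come back at the end
start=$PWD

# Build a throwaway directory tree so nothing in your real files is touched
playground=$(mktemp -d)
mkdir "$playground/outer" "$playground/outer/inner"

cd "$playground/outer/inner"   # go down into "inner"
cd ..                          # go up one level, into "outer"
here=$(basename "$PWD")
echo "$here"

cd "$playground"               # go somewhere else...
cd - > /dev/null               # ...and jump back to the previous directory
back=$(basename "$PWD")
echo "$back"

cd "$start"                    # return to where we began
```

Both `echo` lines should print `outer`. Running `cd` with no argument at all takes you to your home directory.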

## echo: print out a string

In [2]:
echo "is there anybody out there?"

is there anybody out there?


## ls: check out the contents of a folder

In [3]:
ls

hello.sh	hello_you.sh	same_old	sequences.txt	shiny_new


#### Extra flags get you more information (here `-l` gives a long listing and `-h` prints human-readable sizes)

In [4]:
ls -lh

total 24
-rw-r--r--  1 erickofman  staff    32B Dec 31 09:32 hello.sh
-rw-r--r--  1 erickofman  staff    38B Dec 31 09:32 hello_you.sh
drwxr-xr-x  4 erickofman  staff   128B Dec 31 10:07 same_old
-rw-r--r--  1 erickofman  staff   280B Dec 31 09:32 sequences.txt
drwxr-xr-x  3 erickofman  staff    96B Dec 31 10:07 shiny_new


#### what if we only want to list files with a certain extension?

In [5]:
# The asterisk is a "wild card" that matches any sequence of characters (including none)
ls *.sh

hello.sh	hello_you.sh
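
The asterisk is not the only wildcard. As a small sketch (the file names below are invented for the demo and live in a temporary directory), `?` matches exactly one character and `[...]` matches one character from a set:

```shell
# Make a few empty files in a throwaway directory
demo=$(mktemp -d)
touch "$demo/run1.sh" "$demo/run2.sh" "$demo/run10.sh" "$demo/notes.md"

# * matches any number of characters: all three run scripts
star=$(cd "$demo" && ls run*.sh)

# ? matches exactly one character: run1.sh and run2.sh, but not run10.sh
single=$(cd "$demo" && ls run?.sh)

# [12] matches one character that is either 1 or 2
set_match=$(cd "$demo" && ls run[12].sh)

echo "$star"
echo "$single"
echo "$set_match"
```

The shell expands the pattern into a list of matching names before `ls` ever runs, so the same patterns work with `cp`, `mv`, `rm`, and any other command.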


In [6]:
### YOU TRY: list only the files that end in .txt

## mkdir: make a directory

In [7]:
mkdir shiny_new

mkdir: shiny_new: File exists



##### Look, it's there! (The "File exists" message just means the directory had already been created.)

In [8]:
ls

hello.sh	hello_you.sh	same_old	sequences.txt	shiny_new
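
If you would rather not see a `File exists` error when a directory is already there, the `-p` flag helps: it stays quiet when the directory exists and also creates any missing parent directories along the way. A minimal sketch in a temporary directory (the `results/2021/run1` path is just an example):

```shell
base=$(mktemp -d)
mkdir "$base/shiny_new"

# A second plain mkdir fails because the directory already exists
if mkdir "$base/shiny_new" 2>/dev/null
then
    second="ok"
else
    second="failed"
fi
echo "plain mkdir the second time: $second"

# With -p there is no complaint if the directory exists...
mkdir -p "$base/shiny_new"

# ...and nested directories are created in one go
mkdir -p "$base/results/2021/run1"
ls "$base/results/2021"
```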


## cp: Copy a file

In [9]:
cp sequences.txt shiny_new/sequences_copy.txt
ls shiny_new

sequences_copy.txt


#### cp -r to copy a directory and all of its contents

In [10]:
cp -r shiny_new same_old
ls same_old

sequences_copy.txt	shiny_new


In [11]:
ls same_old/shiny_new

sequences_copy.txt


## touch: make an empty file

In [12]:
touch empty_inside.txt

In [13]:
# Trust me, it's empty. Or go check it yourself later!
ls 

empty_inside.txt	hello_you.sh		sequences.txt
hello.sh		same_old		shiny_new


## mv: Move or rename a file
#### If you use mv where the destination is in the same folder, you will simply rename the file

In [14]:
# You before this lesson --> you after this lesson? 
mv empty_inside.txt fulfilled.txt

In [15]:
ls

fulfilled.txt	hello_you.sh	sequences.txt
hello.sh	same_old	shiny_new


### If you use mv where the destination is in a different folder you will move the file

In [16]:
mv fulfilled.txt shiny_new

In [17]:
# Not here anymore...
ls

hello.sh	hello_you.sh	same_old	sequences.txt	shiny_new


In [18]:
# Here!
ls shiny_new

fulfilled.txt		sequences_copy.txt


### To continue the existential theme:
## pwd: where am I?


In [19]:
pwd

/Users/erickofman/UCSD/cmm262-2021/module-1-programming/bash_playground


### With great power comes great responsibility... use this carefully! There is no "undo" in the terminal...
## rm: delete a file

In [20]:
rm shiny_new/fulfilled.txt

In [21]:
ls shiny_new

sequences_copy.txt


### rm -r: delete directory and all of its contents

In [22]:
rm -r shiny_new
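
Because there is no undo, two cautious habits are worth sketching here (this is one possible approach, not the only one): use `rmdir` when you expect a directory to be empty, since it refuses to delete anything that still has contents, and preview a wildcard with `ls` before giving the same pattern to `rm`. The directory and file names below are made up and live in a temporary directory.

```shell
scratch=$(mktemp -d)
mkdir "$scratch/empty_dir" "$scratch/full_dir"
touch "$scratch/full_dir/keep_me.txt"

# rmdir only removes empty directories, so it cannot take files with it
rmdir "$scratch/empty_dir"

if rmdir "$scratch/full_dir" 2>/dev/null
then
    result="removed"
else
    result="refused"
fi
echo "rmdir on a non-empty directory: $result"

# Preview what a pattern matches before handing the same pattern to rm
ls "$scratch"/full_dir/*.txt
```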


In [23]:
### YOU TRY: delete the directory "same_old" 

In [24]:
ls

hello.sh	hello_you.sh	same_old	sequences.txt


# Now let's get fancy

#### You can do quite a lot of text processing and execute more complex file operations in the terminal. Check out the contents of the file sequences.txt in this directory by using the "cat" command:

In [25]:
cat sequences.txt

seqA	CCCTAACCCCCTAACCCCCTAACCCTCAGTCGGGGAGGCGACAATAGCTG
seqB	GTCATATGTTCTGTACGTTATTGGCCAACTGATCATACCTGAATCGAGCC
seqC	GAACCGGGATTATCAAAGACGAACATGGTCGGGTCCTTGAACCAAACGAA
seqD	TCTCCGTCCGCTGGCGTGTTTTTCTTTTCTCAAGTGGGCAAGTTACCCGG
seqE	TCTCCGTCCGCTGGCGTGTTTTTCTTTTCTCAAGTGGGCAAGTTACCCGG


### You can look at just the beginning or just the end of a file

In [26]:
head -2 sequences.txt

seqA	CCCTAACCCCCTAACCCCCTAACCCTCAGTCGGGGAGGCGACAATAGCTG
seqB	GTCATATGTTCTGTACGTTATTGGCCAACTGATCATACCTGAATCGAGCC


In [27]:
tail -2 sequences.txt 

seqD	TCTCCGTCCGCTGGCGTGTTTTTCTTTTCTCAAGTGGGCAAGTTACCCGG
seqE	TCTCCGTCCGCTGGCGTGTTTTTCTTTTCTCAAGTGGGCAAGTTACCCGG


### You can split up a file by columns!

In [28]:
cut -f1 sequences.txt

seqA
seqB
seqC
seqD
seqE
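
By default `cut` splits on tabs. For files that use another separator, such as commas, the `-d` flag sets the delimiter, and `-f` accepts a list of columns. A quick sketch with a made-up CSV file:

```shell
# A small comma-separated file (sample names and numbers are invented)
csv=$(mktemp)
printf 'sample1,liver,42\nsample2,brain,17\n' > "$csv"

# Third column only, splitting on commas instead of tabs
values=$(cut -d',' -f3 "$csv")
echo "$values"

# Columns 1 and 3 together; the delimiter is kept in the output
ends=$(cut -d',' -f1,3 "$csv")
echo "$ends"
```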


In [29]:
### YOU TRY: print out just the second column

### Piping 

#### You can use pipes ("|") to send the output of one command to another command as input

In [30]:
### What do you think will be output using this combination of commands?
head -3 sequences.txt | tail -2

seqB	GTCATATGTTCTGTACGTTATTGGCCAACTGATCATACCTGAATCGAGCC
seqC	GAACCGGGATTATCAAAGACGAACATGGTCGGGTCCTTGAACCAAACGAA
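
The same trick picks out any single line: take the first N lines with `head`, then keep only the last one with `tail`. Pipes can also be chained more than once, and `>` at the end saves the final output to a file instead of printing it. A minimal sketch with invented sequences (the `.out` file name is just an example):

```shell
seqs=$(mktemp)
printf 'seq1\tAAAA\nseq2\tCCCC\nseq3\tGGGG\nseq4\tTTTT\n' > "$seqs"

# Line 3 only: first three lines, then the last of those
third=$(head -n 3 "$seqs" | tail -n 1)
echo "$third"

# First two lines, second column only, saved to a new file
head -n 2 "$seqs" | cut -f2 > "$seqs.out"
cat "$seqs.out"
```

Be careful with `>`: it overwrites the target file without asking.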


### grep: searching for specific text within a file

In [31]:
grep "TAAC" sequences.txt

seqA	CCCTAACCCCCTAACCCCCTAACCCTCAGTCGGGGAGGCGACAATAGCTG


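#### You can use certain expressions to filter with extra parameters. For example, square brackets match any one of the characters listed inside them, so [BD] means "B or D". (No | is needed inside the brackets; there it would just be matched as a literal | character.)

In [32]:
grep "[BD]" sequences.txt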

seqB	GTCATATGTTCTGTACGTTATTGGCCAACTGATCATACCTGAATCGAGCC
seqD	TCTCCGTCCGCTGGCGTGTTTTTCTTTTCTCAAGTGGGCAAGTTACCCGG
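
A few more `grep` flags come up all the time: `-E` switches on extended regular expressions, where `|` separates alternative patterns; `-c` counts matching lines instead of printing them; and `-v` inverts the match. A short sketch on made-up sequences:

```shell
seqs=$(mktemp)
printf 'seqA\tCCTAACC\nseqB\tGTCATAT\nseqC\tGAACCGG\nseqD\tTCTCCGT\n' > "$seqs"

# Lines that contain either "seqB" or "seqD"
either=$(grep -E "seqB|seqD" "$seqs")
echo "$either"

# How many lines contain AACC?
count=$(grep -c "AACC" "$seqs")
echo "$count"

# Names of the sequences that do NOT contain AACC
without=$(grep -v "AACC" "$seqs" | cut -f1)
echo "$without"
```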


### tr: substitution

#### Let's use tr to turn our DNA sequences into RNA sequences instead!

In [33]:
cat sequences.txt | tr T U

seqA	CCCUAACCCCCUAACCCCCUAACCCUCAGUCGGGGAGGCGACAAUAGCUG
seqB	GUCAUAUGUUCUGUACGUUAUUGGCCAACUGAUCAUACCUGAAUCGAGCC
seqC	GAACCGGGAUUAUCAAAGACGAACAUGGUCGGGUCCUUGAACCAAACGAA
seqD	UCUCCGUCCGCUGGCGUGUUUUUCUUUUCUCAAGUGGGCAAGUUACCCGG
seqE	UCUCCGUCCGCUGGCGUGUUUUUCUUUUCUCAAGUGGGCAAGUUACCCGG
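
Keep in mind that `tr` works one character at a time: each character in the first set is replaced by the character at the same position in the second set. That also means it accepts ranges, and with `-d` it deletes characters instead of replacing them. A small sketch on made-up strings:

```shell
# Ranges: convert lowercase letters to uppercase
upper=$(echo "acgtnacgt" | tr 'a-z' 'A-Z')
echo "$upper"

# -d deletes every occurrence of the listed characters (here, ambiguous N bases)
no_n=$(echo "ACGTNNACGT" | tr -d 'N')
echo "$no_n"
```

Note that running `tr T U` on the whole file would also change any `T` in the sequence names; it works in the example only because the names contain none.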


#### We can use tr to find the complement of DNA sequences

In [34]:
cut -f2 sequences.txt | tr ACGT TGCA


GGGATTGGGGGATTGGGGGATTGGGAGTCAGCCCCTCCGCTGTTATCGAC
CAGTATACAAGACATGCAATAACCGGTTGACTAGTATGGACTTAGCTCGG
CTTGGCCCTAATAGTTTCTGCTTGTACCAGCCCAGGAACTTGGTTTGCTT
AGAGGCAGGCGACCGCACAAAAAGAAAAGAGTTCACCCGTTCAATGGGCC
AGAGGCAGGCGACCGCACAAAAAGAAAAGAGTTCACCCGTTCAATGGGCC


## rev: reverse

#### We can tack on a "reverse" command to get the reverse complement of a DNA sequence

In [35]:
cut -f2 sequences.txt | tr ACGT TGCA | rev


CAGCTATTGTCGCCTCCCCGACTGAGGGTTAGGGGGTTAGGGGGTTAGGG
GGCTCGATTCAGGTATGATCAGTTGGCCAATAACGTACAGAACATATGAC
TTCGTTTGGTTCAAGGACCCGACCATGTTCGTCTTTGATAATCCCGGTTC
CCGGGTAACTTGCCCACTTGAGAAAAGAAAAACACGCCAGCGGACGGAGA
CCGGGTAACTTGCCCACTTGAGAAAAGAAAAACACGCCAGCGGACGGAGA
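
If you need this often, the pipeline can be wrapped in a small shell function. The name `revcomp` is our own choice here, not a built-in command, and the sketch assumes uppercase sequences containing only A, C, G and T:

```shell
# Reverse complement of whatever arrives on standard input
revcomp() {
    tr ACGT TGCA | rev
}

result=$(echo "AACG" | revcomp)
echo "$result"

# GAATTC (the EcoRI recognition site) is its own reverse complement
palindrome=$(echo "GAATTC" | revcomp)
echo "$palindrome"
```

The first call prints `CGTT` and the second prints `GAATTC`. `rev` reverses each line separately, so multi-line input keeps its line order.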


## wc: line/word/byte count

In [36]:
wc sequences.txt

       5      10     280 sequences.txt


#### Get just lines

In [37]:
wc -l sequences.txt

       5 sequences.txt


#### Get just words

In [38]:
wc -w sequences.txt

      10 sequences.txt


#### Get just bytes (the same as the number of characters for plain ASCII text like this file)

In [39]:
wc -c sequences.txt

     280 sequences.txt
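
When you want the number on its own, for example to store it in a variable, feed the file to `wc` with `<` so there is no file name to print. Some versions of `wc` pad the number with spaces, so the sketch below runs it through arithmetic expansion to tidy it up (the file contents are made up):

```shell
f=$(mktemp)
printf 'one\ntwo\nthree\n' > "$f"

# Reading from standard input: wc has no file name to report
n=$(wc -l < "$f")

# Arithmetic expansion turns "       3" into a clean 3
n=$((n))
echo "The file has $n lines"
```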


## sort: sort values in a column

In [40]:
cut -f2 sequences.txt | sort 

CCCTAACCCCCTAACCCCCTAACCCTCAGTCGGGGAGGCGACAATAGCTG
GAACCGGGATTATCAAAGACGAACATGGTCGGGTCCTTGAACCAAACGAA
GTCATATGTTCTGTACGTTATTGGCCAACTGATCATACCTGAATCGAGCC
TCTCCGTCCGCTGGCGTGTTTTTCTTTTCTCAAGTGGGCAAGTTACCCGG
TCTCCGTCCGCTGGCGTGTTTTTCTTTTCTCAAGTGGGCAAGTTACCCGG


## uniq: check unique values in a column

### uniq only collapses duplicates that sit on adjacent lines, which is why we sort first. Here you can see that the last two sequences were duplicates of one another

In [41]:
cut -f2 sequences.txt | sort | uniq

CCCTAACCCCCTAACCCCCTAACCCTCAGTCGGGGAGGCGACAATAGCTG
GAACCGGGATTATCAAAGACGAACATGGTCGGGTCCTTGAACCAAACGAA
GTCATATGTTCTGTACGTTATTGGCCAACTGATCATACCTGAATCGAGCC
TCTCCGTCCGCTGGCGTGTTTTTCTTTTCTCAAGTGGGCAAGTTACCCGG


# Common commands

### wget: downloading files -- for example genome reference files for the hg38 reference: https://hgdownload.soe.ucsc.edu/goldenPath/hg38/chromosomes/

In [None]:
# This will download chromosome 21 (chr21)
wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/chromosomes/chr21.fa.gz

#### Now we have it locally

In [None]:
ls 

### gunzip and gzip: unzipping and zipping files

In [None]:
gunzip chr21.fa.gz

In [None]:
ls

In [None]:
# Let's sneak a peek
head -194500 chr21.fa | tail

#### Ok back in the box you go

In [None]:
gzip chr21.fa

In [None]:
ls
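
For big files you often do not need to unzip on disk at all just to look inside. `gunzip -c` writes the decompressed text to standard output and leaves the `.gz` file untouched, so it can be piped straight into `head`. A minimal sketch with a tiny made-up FASTA file standing in for `chr21.fa`:

```shell
work=$(mktemp -d)
printf '>chrDemo\nACGTACGT\nTTGGCCAA\n' > "$work/demo.fa"

# gzip replaces demo.fa with demo.fa.gz
gzip "$work/demo.fa"

# Peek at the first line without creating an uncompressed copy on disk
peek=$(gunzip -c "$work/demo.fa.gz" | head -n 1)
echo "$peek"

# Only the compressed file is present
ls "$work"
```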

# Advanced scripting

## For loops

In [42]:
# Let the shell expand * itself: $(ls *) would also list the contents of
# subdirectories and would split file names that contain spaces
for FILE in *
do
    echo "check out this sweet little file called" "$FILE"
done

check out this sweet little file called hello.sh
check out this sweet little file called hello_you.sh
check out this sweet little file called same_old
check out this sweet little file called sequences.txt
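
Loops become useful once the body runs a real command on each file. Here is a sketch that counts the lines in every `.txt` file of a temporary directory; the file names are invented, and one deliberately contains spaces to show why `"$FILE"` is wrapped in double quotes.

```shell
work=$(mktemp -d)
printf 'a\nb\n' > "$work/first.txt"
printf 'c\n' > "$work/second.txt"
touch "$work/name with spaces.txt"

count=0
for FILE in "$work"/*.txt
do
    # Quoting "$FILE" keeps a name with spaces together as one argument
    lines=$(wc -l < "$FILE")
    echo "$(basename "$FILE") has $((lines)) lines"
    count=$((count + 1))
done
echo "looked at $count files"
```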


## If statements

#### As written, nothing is printed because the condition is false. Change the value of "you" so that the message appears...

In [None]:
you="superenthusiastic"
if [ "$you" = "bored" ]
then
    echo "okay let's get on with our lives!"
fi
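
An `if` can also have `elif` and `else` branches for the other cases. Below is a sketch that wraps the test in a function so it is easy to try with different values; the function name `describe_mood` and the extra messages are made up for illustration. Putting double quotes around the variable matters: without them, an empty value would leave `[ = "bored" ]`, which is a syntax error.

```shell
describe_mood() {
    if [ "$1" = "bored" ]
    then
        echo "okay let's get on with our lives!"
    elif [ "$1" = "superenthusiastic" ]
    then
        echo "great, let's keep going!"
    else
        echo "not sure what to do with '$1'"
    fi
}

describe_mood "bored"
describe_mood "superenthusiastic"
describe_mood ""
```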