# Terminal Commands and Bash Scripting

Adapted from http://people.duke.edu/~ccc14/duke-hts-2018/cliburn/Bash_in_Jupyter.html 

## pwd: where am I?

In [12]:
pwd

/Users/erickofman/UCSD/cmm262-2021/module-1-programming/bash_playground


<img src="img/whoami.jpg">

## To continue the existential questions...

In [118]:
whoami

erickofman


## cd: navigate between directories

* `cd <dirname>` --> go to a directory

In [14]:
cd bash_playground

bash: cd: bash_playground: No such file or directory


: 1

In [15]:
# Now where are we?
pwd

/Users/erickofman/UCSD/cmm262-2021/module-1-programming/bash_playground


## echo: print out a string

In [16]:
echo "is there anybody out there?"

is there anybody out there?


## ls: check out the contents of a folder

In [18]:
ls

hello.sh	hello_you.sh	sequences.txt


#### Extra flags get you more information (-l gives the long format, -h gives human-readable sizes)

In [19]:
ls -lh

total 24
-rw-r--r--  1 erickofman  staff    32B Dec 31 09:32 hello.sh
-rw-r--r--  1 erickofman  staff    38B Dec 31 09:32 hello_you.sh
-rw-r--r--  1 erickofman  staff   280B Dec 31 09:32 sequences.txt


#### what if we only want to list files with a certain extension?

In [20]:
# The asterisk is a "wildcard" that matches any run of characters (including none)
ls *.sh

hello.sh	hello_you.sh


In [27]:
ls -lh *.sh

-rw-r--r--  1 erickofman  staff    32B Dec 31 09:32 hello.sh
-rw-r--r--  1 erickofman  staff    38B Dec 31 09:32 hello_you.sh
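The asterisk is not the only wildcard. As a rough sketch (the file names below are made up, and everything happens in a throwaway directory), `?` matches exactly one character and `[...]` matches one character from a set:

```shell
# Work in a scratch directory so nothing real is touched
cd "$(mktemp -d)"
touch hello.sh hello_you.sh notes1.txt notes2.txt notes10.txt

# * matches any run of characters, including none
star_match=$(echo hello*)
echo "$star_match"       # hello.sh hello_you.sh

# ? matches exactly one character, so notes10.txt is left out
question_match=$(echo notes?.txt)
echo "$question_match"   # notes1.txt notes2.txt

# [2] matches one character from the set inside the brackets
set_match=$(echo notes[2].txt)
echo "$set_match"        # notes2.txt
```

Using `echo` with a pattern is a handy way to see what the shell will expand it to before you hand it to another command.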


In [28]:
### YOU TRY: list only the files that end in .txt

## mkdir: make a directory

In [29]:
mkdir shiny_new

##### Look it's there!

In [31]:
ls -lh

total 24
-rw-r--r--  1 erickofman  staff    32B Dec 31 09:32 hello.sh
-rw-r--r--  1 erickofman  staff    38B Dec 31 09:32 hello_you.sh
-rw-r--r--  1 erickofman  staff   280B Dec 31 09:32 sequences.txt
drwxr-xr-x  2 erickofman  staff    64B Jan  3 10:46 shiny_new


## cp: Copy a file

In [32]:
cp sequences.txt shiny_new/sequences_copy.txt
ls -lh shiny_new

total 8
-rw-r--r--  1 erickofman  staff   280B Jan  3 10:47 sequences_copy.txt


#### cp -r to copy a directory and all of its contents

In [33]:
cp -r shiny_new same_old


sequences_copy.txt


In [36]:
# So now we should have two new directories in our bash playground: same_old and shiny_new
ls -lh

total 24
-rw-r--r--  1 erickofman  staff    32B Dec 31 09:32 hello.sh
-rw-r--r--  1 erickofman  staff    38B Dec 31 09:32 hello_you.sh
drwxr-xr-x  3 erickofman  staff    96B Jan  3 10:47 same_old
-rw-r--r--  1 erickofman  staff   280B Dec 31 09:32 sequences.txt
drwxr-xr-x  3 erickofman  staff    96B Jan  3 10:47 shiny_new


In [37]:
ls -lh same_old

sequences_copy.txt


In [41]:
# And shiny new should contain the same contents as same old
ls -lh shiny_new

total 8
-rw-r--r--  1 erickofman  staff   280B Jan  3 10:47 sequences_copy.txt


## touch: make an empty file

In [44]:
touch empty_inside.txt

In [46]:
# Don't read into that name too much. Just trust me, it's empty. Look at the size. Or go check it yourself later!
ls -lh

total 24
-rw-r--r--  1 erickofman  staff     0B Jan  3 10:48 empty_inside.txt
-rw-r--r--  1 erickofman  staff    32B Dec 31 09:32 hello.sh
-rw-r--r--  1 erickofman  staff    38B Dec 31 09:32 hello_you.sh
drwxr-xr-x  3 erickofman  staff    96B Jan  3 10:47 same_old
-rw-r--r--  1 erickofman  staff   280B Dec 31 09:32 sequences.txt
drwxr-xr-x  3 erickofman  staff    96B Jan  3 10:47 shiny_new


## mv: Move or rename a file
### If you use mv where the destination is in the same folder you will simply rename the file

In [47]:
# You before this lesson --> you after this lesson? 
mv empty_inside.txt fulfilled.txt

In [48]:
ls -lh

total 24
-rw-r--r--  1 erickofman  staff     0B Jan  3 10:48 fulfilled.txt
-rw-r--r--  1 erickofman  staff    32B Dec 31 09:32 hello.sh
-rw-r--r--  1 erickofman  staff    38B Dec 31 09:32 hello_you.sh
drwxr-xr-x  3 erickofman  staff    96B Jan  3 10:47 same_old
-rw-r--r--  1 erickofman  staff   280B Dec 31 09:32 sequences.txt
drwxr-xr-x  3 erickofman  staff    96B Jan  3 10:47 shiny_new


### If you use mv where the destination is in a different folder you will move the file

In [49]:
mv fulfilled.txt shiny_new

In [50]:
# Not here at the main folder level anymore...
ls -lh

total 24
-rw-r--r--  1 erickofman  staff    32B Dec 31 09:32 hello.sh
-rw-r--r--  1 erickofman  staff    38B Dec 31 09:32 hello_you.sh
drwxr-xr-x  3 erickofman  staff    96B Jan  3 10:47 same_old
-rw-r--r--  1 erickofman  staff   280B Dec 31 09:32 sequences.txt
drwxr-xr-x  4 erickofman  staff   128B Jan  3 10:49 shiny_new


In [51]:
# Here!
ls -lh shiny_new

total 8
-rw-r--r--  1 erickofman  staff     0B Jan  3 10:48 fulfilled.txt
-rw-r--r--  1 erickofman  staff   280B Jan  3 10:47 sequences_copy.txt


## rm: delete a file
### With great power comes great responsibility... use this carefully! There is no "undo" in the terminal...

In [52]:
rm shiny_new/fulfilled.txt

In [54]:
ls -lh shiny_new

total 8
-rw-r--r--  1 erickofman  staff   280B Jan  3 10:47 sequences_copy.txt
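Since there is no undo, one cautious habit is to preview what a pattern matches before deleting with it. This is only a sketch with made-up file names in a scratch directory:

```shell
cd "$(mktemp -d)"
touch draft1.txt draft2.txt keep_me.txt

# Step 1: preview what the pattern expands to
to_delete=$(echo draft*.txt)
echo "about to delete: $to_delete"

# Step 2: reuse the exact same pattern with rm once the preview looks right
rm draft*.txt

remaining=$(ls)
echo "left over: $remaining"   # left over: keep_me.txt
```

When working interactively, `rm -i` is another option: it asks for confirmation before each deletion.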


### rm -r: delete directory and all of its contents

In [55]:
rm -r shiny_new


In [59]:
### YOU TRY: delete the directory "same_old" 



In [60]:
ls -lh

total 24
-rw-r--r--  1 erickofman  staff    32B Dec 31 09:32 hello.sh
-rw-r--r--  1 erickofman  staff    38B Dec 31 09:32 hello_you.sh
drwxr-xr-x  3 erickofman  staff    96B Jan  3 10:47 same_old
-rw-r--r--  1 erickofman  staff   280B Dec 31 09:32 sequences.txt


# Now let's get fancy

#### You can do quite a lot of text processing and execute more complex file operations in terminal. Check out the contents of the file sequences.txt in this directory by using the "cat" command:

In [61]:
cat sequences.txt

seqA	CCCTAACCCCCTAACCCCCTAACCCTCAGTCGGGGAGGCGACAATAGCTG
seqB	GTCATATGTTCTGTACGTTATTGGCCAACTGATCATACCTGAATCGAGCC
seqC	GAACCGGGATTATCAAAGACGAACATGGTCGGGTCCTTGAACCAAACGAA
seqD	TCTCCGTCCGCTGGCGTGTTTTTCTTTTCTCAAGTGGGCAAGTTACCCGG
seqE	TCTCCGTCCGCTGGCGTGTTTTTCTTTTCTCAAGTGGGCAAGTTACCCGG


### You can look at just the beginning or just the end of a file

In [62]:
head -2 sequences.txt

seqA	CCCTAACCCCCTAACCCCCTAACCCTCAGTCGGGGAGGCGACAATAGCTG
seqB	GTCATATGTTCTGTACGTTATTGGCCAACTGATCATACCTGAATCGAGCC


In [63]:
tail -2 sequences.txt 

seqD	TCTCCGTCCGCTGGCGTGTTTTTCTTTTCTCAAGTGGGCAAGTTACCCGG
seqE	TCTCCGTCCGCTGGCGTGTTTTTCTTTTCTCAAGTGGGCAAGTTACCCGG
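Two variations that come up a lot: `-n 2` is the explicit spelling of `-2`, and `tail -n +2` prints from line 2 onward, which is a quick way to drop a header row. A small sketch, assuming a hypothetical table with a header (not one of the files in this lesson):

```shell
cd "$(mktemp -d)"
# Hypothetical tab-separated table with a header row
printf 'name\tlength\nseqA\t50\nseqB\t50\nseqC\t50\n' > table.txt

# First two lines: the header plus the first record
first_two=$(head -n 2 table.txt)
echo "$first_two"

# Start printing AT line 2, which skips the header
no_header=$(tail -n +2 table.txt)
echo "$no_header"
```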


### You can split up a file by columns!

In [64]:
cut -f1 sequences.txt

seqA
seqB
seqC
seqD
seqE
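`cut` splits on tabs by default, which is why it works out of the box on sequences.txt. For other separators, `-d` sets the delimiter. A minimal sketch, assuming a made-up comma-separated file:

```shell
cd "$(mktemp -d)"
# Hypothetical comma-separated file
printf 'seqA,50,human\nseqB,48,mouse\n' > samples.csv

# -d, splits on commas; -f1,3 keeps the first and third columns
names_and_species=$(cut -d, -f1,3 samples.csv)
echo "$names_and_species"
```

The selected columns come out joined by the same delimiter that was used to split them.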


In [65]:
### YOU TRY: print out just the second column



### Piping 

<img src="img/mariopipe.png">

#### You can use pipes ("|") to send the output of one command to another command as input

In [68]:
### What do you think will be output using this combination of commands?
head -3 sequences.txt | tail -2

seqB	GTCATATGTTCTGTACGTTATTGGCCAACTGATCATACCTGAATCGAGCC
seqC	GAACCGGGATTATCAAAGACGAACATGGTCGGGTCCTTGAACCAAACGAA


In [69]:
### NOW YOU TRY: retrieve just the second column of the third line (seq C) from the sequences.txt file using 
# a combination of tail, head, and cut with a couple of pipes in the mix.



### grep: searching for specific text within a file

In [71]:
grep "TAAC" sequences.txt

seqA	CCCTAACCCCCTAACCCCCTAACCCTCAGTCGGGGAGGCGACAATAGCTG


#### You can use regular expressions to make a search more flexible. For example, square brackets match any single character from the set inside them, so [BD] means "B or D"

In [72]:
grep "[BD]" sequences.txt

seqB	GTCATATGTTCTGTACGTTATTGGCCAACTGATCATACCTGAATCGAGCC
seqD	TCTCCGTCCGCTGGCGTGTTTTTCTTTTCTCAAGTGGGCAAGTTACCCGG
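Inside square brackets every character is taken literally, so a `|` placed there would just match a pipe character. To say "this pattern or that pattern", `grep -E` switches on extended regular expressions, where `|` really does mean "or". A sketch on a small made-up file:

```shell
cd "$(mktemp -d)"
# Hypothetical file with short sequences
printf 'seqA\tCCCTAACC\nseqB\tGTCATATG\nseqC\tGAACCGGG\nseqD\tTCTCCGTC\n' > demo.txt

# Bracket expression: "seq" followed by ONE character that is B or D
bracket_hits=$(grep "seq[BD]" demo.txt | cut -f1)
echo "$bracket_hits"

# Alternation: lines containing TAAC or CCGG
either_hits=$(grep -E "TAAC|CCGG" demo.txt | cut -f1)
echo "$either_hits"
```

Here the bracket search finds seqB and seqD, while the alternation finds seqA (which contains TAAC) and seqC (which contains CCGG).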


#### Put the "man" (manual) command in front of any command name to learn more about how to use it

In [91]:
man grep 


GREP(1)                   BSD General Commands Manual                  GREP(1)

NAME
     grep, egrep, fgrep, zgrep, zegrep, zfgrep -- file pattern searcher

SYNOPSIS
     grep [-abcdDEFGHhIiJLlmnOopqRSsUVvwxZ] [-A num] [-B num] [-C[num]]
          [-e pattern] [-f file] [--binary-files=value] [--color[=when]]
          [--colour[=when]] [--context[=num]] [--label] [--line-buffered]
          [--null] [pattern] [file ...]

DESCRIPTION
     The grep utility searches any given input files, selecting lines that
     match one or more patterns.  By default, a pattern matches an input line
     if the regular expression (RE) in the pattern matches the input line
     without its trailing newline.  An empty expression matches every line.
     Each input line that matches at least one of the patterns is written to
     the standard output.

     grep is used for simple patterns and basic regular expressions (BREs);
     egrep can handle extended regular expressions (EREs).  See re_format(7)


             until a match has been found, making searches potentially less
             expensive.

     -R, -r, --recursive
             Recursively search subdirectories listed.

     -S      If -R is specified, all symbolic links are followed.  The default
             is not to follow symbolic links.

     -s, --no-messages
             Silent mode.  Nonexistent and unreadable files are ignored (i.e.
             their error messages are suppressed).

     -U, --binary
             Search binary files, but do not attempt to print them.

     -V, --version
             Display version information and exit.

     -v, --invert-match
             Selected lines are those not matching any of the specified pat-
             terns.

     -w, --word-regexp
             The expression is searched for as a word (as if surrounded by
             `[[:<:]]' and `[[:>:]]'; see re_format(7)).

     -x, --line-regexp
             Only input lines selected against an entire fixed string or regu-
 

In [92]:
### NOW YOU TRY: Okay, try finding the line that contains "TAAC" plus the line after it using grep.




### tr: translate (swap) characters

#### Let's use tr to turn our DNA sequences into RNA sequences instead!

In [102]:
cat sequences.txt | tr T U

seqA	CCCUAACCCCCUAACCCCCUAACCCUCAGUCGGGGAGGCGACAAUAGCUG
seqB	GUCAUAUGUUCUGUACGUUAUUGGCCAACUGAUCAUACCUGAAUCGAGCC
seqC	GAACCGGGAUUAUCAAAGACGAACAUGGUCGGGUCCUUGAACCAAACGAA
seqD	UCUCCGUCCGCUGGCGUGUUUUUCUUUUCUCAAGUGGGCAAGUUACCCGG
seqE	UCUCCGUCCGCUGGCGUGUUUUUCUUUUCUCAAGUGGGCAAGUUACCCGG


#### We can use tr to find the complement of DNA sequences

In [103]:
cut -f2 sequences.txt | tr ACGT TGCA


GGGATTGGGGGATTGGGGGATTGGGAGTCAGCCCCTCCGCTGTTATCGAC
CAGTATACAAGACATGCAATAACCGGTTGACTAGTATGGACTTAGCTCGG
CTTGGCCCTAATAGTTTCTGCTTGTACCAGCCCAGGAACTTGGTTTGCTT
AGAGGCAGGCGACCGCACAAAAAGAAAAGAGTTCACCCGTTCAATGGGCC
AGAGGCAGGCGACCGCACAAAAAGAAAAGAGTTCACCCGTTCAATGGGCC
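`tr` pairs up the two sets by position: the first character of the first set becomes the first character of the second set, and so on. Sequence files sometimes mix in lowercase bases, so one possible extension (a sketch, not something the files in this lesson need) is to list the lowercase letters too:

```shell
# Pairs by position: A->T, C->G, G->C, T->A, then the same in lowercase
complement=$(printf 'ACGTacgt\n' | tr ACGTacgt TGCAtgca)
echo "$complement"   # TGCAtgca

# Character classes work as sets too, for example to upper-case everything
upper=$(printf 'acgtACGT\n' | tr '[:lower:]' '[:upper:]')
echo "$upper"        # ACGTACGT
```

Any character that is not in the first set, such as an N, passes through unchanged.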


## rev: reverse

#### We can tack on a "reverse" command to get the reverse complement of a DNA sequence

In [104]:
cut -f2 sequences.txt | tr ACGT TGCA | rev


CAGCTATTGTCGCCTCCCCGACTGAGGGTTAGGGGGTTAGGGGGTTAGGG
GGCTCGATTCAGGTATGATCAGTTGGCCAATAACGTACAGAACATATGAC
TTCGTTTGGTTCAAGGACCCGACCATGTTCGTCTTTGATAATCCCGGTTC
CCGGGTAACTTGCCCACTTGAGAAAAGAAAAACACGCCAGCGGACGGAGA
CCGGGTAACTTGCCCACTTGAGAAAAGAAAAACACGCCAGCGGACGGAGA
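The pipeline above drops the sequence names along the way. One way to keep them, sketched here with a made-up two-line file and the `paste` command (which this lesson has not covered), is to process the columns separately and glue them back together:

```shell
cd "$(mktemp -d)"
# Hypothetical file with short sequences
printf 'seqA\tAACG\nseqB\tGGTA\n' > demo.txt

# Column 1: the names, untouched
cut -f1 demo.txt > names.txt

# Column 2: complement each base, then reverse each line
cut -f2 demo.txt | tr ACGT TGCA | rev > revcomp.txt

# paste joins the two files line by line with a tab in between
result=$(paste names.txt revcomp.txt)
echo "$result"
```

For AACG the complement is TTGC, which reversed gives CGTT, so the first output line is seqA followed by CGTT.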


## wc: word/line/character count

In [105]:
wc sequences.txt

       5      10     280 sequences.txt


#### Get just lines

In [106]:
wc -l sequences.txt

       5 sequences.txt


#### Get just words

In [107]:
wc -w sequences.txt

      10 sequences.txt


#### Get just bytes (-c counts bytes, which is the same as the character count for plain ASCII text like this file)

In [108]:
wc -c sequences.txt

     280 sequences.txt
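When the count is needed inside a script, the file name in the output gets in the way. Feeding the file through standard input with `<` makes `wc` print only the number. A sketch on a small made-up file:

```shell
cd "$(mktemp -d)"
# Hypothetical two-line file
printf 'seqA\tACGT\nseqB\tGGCC\n' > demo.txt

# wc reads from standard input here, so it has no file name to print;
# tr -d ' ' removes the padding that some versions of wc add
n_lines=$(wc -l < demo.txt | tr -d ' ')
echo "demo.txt has $n_lines lines"   # demo.txt has 2 lines
```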


## sort: sort values in a column

In [109]:
cut -f2 sequences.txt | sort 

CCCTAACCCCCTAACCCCCTAACCCTCAGTCGGGGAGGCGACAATAGCTG
GAACCGGGATTATCAAAGACGAACATGGTCGGGTCCTTGAACCAAACGAA
GTCATATGTTCTGTACGTTATTGGCCAACTGATCATACCTGAATCGAGCC
TCTCCGTCCGCTGGCGTGTTTTTCTTTTCTCAAGTGGGCAAGTTACCCGG
TCTCCGTCCGCTGGCGTGTTTTTCTTTTCTCAAGTGGGCAAGTTACCCGG


## uniq: collapse repeated lines

### uniq only removes duplicates that sit next to each other, which is why we sort first. Here you can see that the last two sequences were duplicates of one another

In [110]:
cut -f2 sequences.txt | sort | uniq

CCCTAACCCCCTAACCCCCTAACCCTCAGTCGGGGAGGCGACAATAGCTG
GAACCGGGATTATCAAAGACGAACATGGTCGGGTCCTTGAACCAAACGAA
GTCATATGTTCTGTACGTTATTGGCCAACTGATCATACCTGAATCGAGCC
TCTCCGTCCGCTGGCGTGTTTTTCTTTTCTCAAGTGGGCAAGTTACCCGG
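To see why the `sort` matters, here is a small sketch with made-up sequences where the duplicates are not adjacent. It also uses `uniq -d`, which prints only the values that occur more than once:

```shell
cd "$(mktemp -d)"
# Hypothetical list in which the two TCTC lines are separated
printf 'TCTC\nCCCT\nTCTC\nGAAC\n' > seqs.txt

# Without sort, uniq sees no adjacent repeats and keeps all 4 lines
unsorted_count=$(uniq seqs.txt | wc -l | tr -d ' ')

# After sort, the duplicates are neighbours and collapse to 3 lines
sorted_count=$(sort seqs.txt | uniq | wc -l | tr -d ' ')

# -d reports only the duplicated values
duplicated=$(sort seqs.txt | uniq -d)
echo "$unsorted_count $sorted_count $duplicated"   # 4 3 TCTC
```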


# Common commands

### wget: downloading files -- for example genome reference files for the hg38 reference: https://hgdownload.soe.ucsc.edu/goldenPath/hg38/chromosomes/

In [None]:
# This will download chromosome 21 (chr21)
wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/chromosomes/chr21.fa.gz

#### Now we have it locally

In [None]:
ls 

### gunzip and gzip: unzipping and zipping files

In [None]:
gunzip chr21.fa.gz

In [None]:
ls

In [None]:
# Let's sneak a peek
head -194500 chr21.fa | tail

#### Ok back in the box you go

In [None]:
gzip chr21.fa

In [None]:
ls
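Unzipping just to peek and then zipping again can be slow for big files. As an alternative sketch, `gzip -dc` decompresses to standard output and leaves the .gz file alone, so it can be piped straight into `head`. The tiny file below is a made-up stand-in for a real FASTA file:

```shell
cd "$(mktemp -d)"
# Hypothetical miniature FASTA file
printf '>chrDemo\nACGTACGT\nGGCCGGCC\n' > demo.fa
gzip demo.fa    # replaces demo.fa with demo.fa.gz

# -d decompresses, -c writes to standard output instead of to a file
first_line=$(gzip -dc demo.fa.gz | head -n 1)
echo "$first_line"   # >chrDemo

# The compressed file is still the only thing on disk
files=$(ls)
echo "$files"        # demo.fa.gz
```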

# Advanced scripting

## For loops

In [111]:
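# Remember, ls lists the contents of the current directory. Note that ls * also lists what is inside any subdirectory, which is why "same_old:" and its contents show up below.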

In [113]:
for FILE in $(ls *)
do
    echo "check out this sweet little file called" $FILE
done

check out this sweet little file called hello.sh
check out this sweet little file called hello_you.sh
check out this sweet little file called sequences.txt
check out this sweet little file called same_old:
check out this sweet little file called sequences_copy.txt
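Notice that `same_old:` and the file inside it were treated as if they were files in the current directory. A more robust pattern, sketched here in a scratch directory with made-up contents, is to let the shell expand `*` directly and test each entry with `[ -f ... ]`:

```shell
cd "$(mktemp -d)"
touch hello.sh sequences.txt
mkdir same_old
touch same_old/sequences_copy.txt

found=""
for FILE in *
do
    # -f is true only for regular files, so the directory is skipped
    if [ -f "$FILE" ]
    then
        echo "check out this sweet little file called $FILE"
        found="$found $FILE"
    fi
done
```

Quoting `"$FILE"` also keeps the loop working for file names that contain spaces.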


## If statements

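#### Try changing the value of `you` to "bored" and see what happens. As written, nothing is printed because the test is false

In [121]:
you="super crazy enthusiastic about all this bash stuff"
# Quote "$you": without quotes, a value with several words is split into
# separate arguments and [ fails with "too many arguments"
if [ "$you" = "bored" ]
then
    echo "okay on to the next adventure!"
fi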
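An unquoted variable that holds several words is split into separate arguments, which is what makes `[` complain about "too many arguments". Quoting avoids that. The sketch below also adds `elif` and `else` branches (the extra messages are made up for illustration) so that something always gets printed:

```shell
you="super crazy enthusiastic about all this bash stuff"

# "$you" stays a single argument no matter how many words it holds
if [ "$you" = "bored" ]
then
    message="okay on to the next adventure!"
elif [ -z "$you" ]
then
    # -z is true when the string is empty
    message="you did not say how you feel"
else
    message="glad you are $you"
fi
echo "$message"
```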


In [125]:
for name in "Dolly" "Kitty" "Poppet"
do
    echo "$name"
done

Dolly
Kitty
Poppet
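Loops can also walk through a file line by line. As a rough sketch (the two-line file is made up), `while read` splits each line on the tab into a name and a sequence, and `${#seq}` gives the sequence length:

```shell
cd "$(mktemp -d)"
# Hypothetical tab-separated file of names and sequences
printf 'seqA\tACGTAC\nseqB\tGGCC\n' > demo.txt

tab=$(printf '\t')
report=""
# IFS tells read where to split; -r keeps backslashes as they are
while IFS="$tab" read -r name seq
do
    # ${#seq} expands to the number of characters in $seq
    echo "$name is ${#seq} bases long"
    report="$report$name=${#seq};"
done < demo.txt
```

This prints "seqA is 6 bases long" and "seqB is 4 bases long".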
