# Terminal Commands and Bash Scripting

Cheatsheets to refer to:

- This notebook (Adapted from http://people.duke.edu/~ccc14/duke-hts-2018/cliburn/Bash_in_Jupyter.html)

- [GettingGeneticsDone.blogspot.com](GettingGeneticsDone.blogspot.com): (via Tufts.edu) Basic cheatsheet for the most common Bash commands

- [https://devhints.io/bash](https://devhints.io/bash): Complete-ish vignette of common Bash commands


## This notebook provides a number of useful commands which you can refer to when navigating the terminal. 
- These are written in Bash, NOT Python (as we were working with in previous notebooks). 
- We are able to use bash commands in limited fashion within Jupyter notebooks by appending the "!" character to the start of each command (ie. "ls" inside a terminal window is equivalent to "! ls" inside a Jupyter notebook)
- Feel free to explore these commands interactively within this notebook, or within a terminal window (Launcher -> Other -> Terminal)


## pwd: where am I?

In [1]:
! pwd

/home/jovyan/work


<img src="public-data/1_programming/img/whoami.jpg">

## To continue the existential questions...

In [2]:
! whoami

jovyan


## cd: navigate between directories

* cd <dirname> --> go to a directory

In [3]:
! cd /home/jovyan/work/img;pwd

/bin/bash: line 0: cd: /home/jovyan/work/img: No such file or directory
/home/jovyan/work


###### You can go back *up* a level with the ".." notation

In [4]:
! cd ..;pwd

/home/jovyan


In [5]:
! pwd

/home/jovyan/work


Let's go back into our playground

## echo: print out a string

In [6]:
! echo "is there anybody out there?"

is there anybody out there?


## ls: check out the contents of a folder

In [7]:
! ls

 1_Intro_to_Programming.ipynb			 public-data
'2_Intro to Pandas.ipynb'			 shared-data
 3_Python_Packages_demo.ipynb			 Welcome.md
'4_Terminal Commands and Bash Scripting.ipynb'


#### extra tags get you more information

In [8]:
! ls -lh

total 336K
-rw-r--r-- 1 jovyan  1001  39K Jun 29 00:20  1_Intro_to_Programming.ipynb
-rw-r--r-- 1 jovyan  1001  93K Jun 29 00:22 '2_Intro to Pandas.ipynb'
-rw-r--r-- 1 jovyan  1001 145K Jun 29 00:24  3_Python_Packages_demo.ipynb
-rw-r--r-- 1 jovyan  1001  38K Jun 29 00:20 '4_Terminal Commands and Bash Scripting.ipynb'
drwxr-sr-x 8 jovyan users 6.0K Jun 28 23:16  public-data
drwxr-sr-x 2 jovyan  1001 6.0K Jun 28 20:55  shared-data
-r--r--r-- 1 root   root   361 Jun 29 00:15  Welcome.md


#### what if we only want to list files with a certain extension?

In [9]:
# The asterisk is a "wild card" that matches any character
! ls img/*.png

ls: cannot access 'img/*.png': No such file or directory


In [10]:
! ls -lh img/*.png

ls: cannot access 'img/*.png': No such file or directory


In [11]:
### YOU TRY: list only the files that end in .txt

## mkdir: make a directory

In [12]:
! mkdir shiny_new

##### Look it's there!

In [13]:
! ls -lh

total 340K
-rw-r--r-- 1 jovyan  1001  39K Jun 29 00:20  1_Intro_to_Programming.ipynb
-rw-r--r-- 1 jovyan  1001  93K Jun 29 00:22 '2_Intro to Pandas.ipynb'
-rw-r--r-- 1 jovyan  1001 145K Jun 29 00:24  3_Python_Packages_demo.ipynb
-rw-r--r-- 1 jovyan  1001  38K Jun 29 00:20 '4_Terminal Commands and Bash Scripting.ipynb'
drwxr-sr-x 8 jovyan users 6.0K Jun 28 23:16  public-data
drwxr-sr-x 2 jovyan  1001 6.0K Jun 28 20:55  shared-data
drwxr-sr-x 2 jovyan  1001 6.0K Jun 29 00:24  shiny_new
-r--r--r-- 1 root   root   361 Jun 29 00:15  Welcome.md


## cp: Copy a file

In [14]:
! cp data/hg19.chrom.sizes.txt shiny_new/hg19.chrom.sizes.txt.copy
! ls -lh shiny_new

cp: cannot stat 'data/hg19.chrom.sizes.txt': No such file or directory
total 0


#### cp -r to copy a directory and all of its contents

In [15]:
! cp -r shiny_new same_old


In [16]:
# So now we should have two new directories in our bash playground: same_old and shiny_new
! ls -lh

total 344K
-rw-r--r-- 1 jovyan  1001  39K Jun 29 00:20  1_Intro_to_Programming.ipynb
-rw-r--r-- 1 jovyan  1001  93K Jun 29 00:22 '2_Intro to Pandas.ipynb'
-rw-r--r-- 1 jovyan  1001 145K Jun 29 00:24  3_Python_Packages_demo.ipynb
-rw-r--r-- 1 jovyan  1001  38K Jun 29 00:20 '4_Terminal Commands and Bash Scripting.ipynb'
drwxr-sr-x 8 jovyan users 6.0K Jun 28 23:16  public-data
drwxr-sr-x 2 jovyan  1001 6.0K Jun 29 00:24  same_old
drwxr-sr-x 2 jovyan  1001 6.0K Jun 28 20:55  shared-data
drwxr-sr-x 2 jovyan  1001 6.0K Jun 29 00:24  shiny_new
-r--r--r-- 1 root   root   361 Jun 29 00:15  Welcome.md


In [17]:
! ls -lh same_old

total 0


In [18]:
# And shiny new should contain the same contents as same old
! ls -lh shiny_new

total 0


## touch: make an empty file

In [19]:
! touch empty_inside.txt

In [20]:
# Don't read into that name too much. Just trust me, it's empty. Look at the size. Or go check it yourself later!
! ls -lh

total 348K
-rw-r--r-- 1 jovyan  1001  39K Jun 29 00:20  1_Intro_to_Programming.ipynb
-rw-r--r-- 1 jovyan  1001  93K Jun 29 00:22 '2_Intro to Pandas.ipynb'
-rw-r--r-- 1 jovyan  1001 145K Jun 29 00:24  3_Python_Packages_demo.ipynb
-rw-r--r-- 1 jovyan  1001  38K Jun 29 00:20 '4_Terminal Commands and Bash Scripting.ipynb'
-rw-r--r-- 1 jovyan  1001    0 Jun 29 00:24  empty_inside.txt
drwxr-sr-x 8 jovyan users 6.0K Jun 28 23:16  public-data
drwxr-sr-x 2 jovyan  1001 6.0K Jun 29 00:24  same_old
drwxr-sr-x 2 jovyan  1001 6.0K Jun 28 20:55  shared-data
drwxr-sr-x 2 jovyan  1001 6.0K Jun 29 00:24  shiny_new
-r--r--r-- 1 root   root   361 Jun 29 00:15  Welcome.md


## mv: Move or rename a file
#### if you use mv where the destination is in the same folder you will simply rename the file

In [21]:
# You before this lesson --> you after this lesson? 
! mv empty_inside.txt fulfilled.txt

In [22]:
! ls -lh

total 348K
-rw-r--r-- 1 jovyan  1001  39K Jun 29 00:20  1_Intro_to_Programming.ipynb
-rw-r--r-- 1 jovyan  1001  93K Jun 29 00:22 '2_Intro to Pandas.ipynb'
-rw-r--r-- 1 jovyan  1001 145K Jun 29 00:24  3_Python_Packages_demo.ipynb
-rw-r--r-- 1 jovyan  1001  38K Jun 29 00:20 '4_Terminal Commands and Bash Scripting.ipynb'
-rw-r--r-- 1 jovyan  1001    0 Jun 29 00:24  fulfilled.txt
drwxr-sr-x 8 jovyan users 6.0K Jun 28 23:16  public-data
drwxr-sr-x 2 jovyan  1001 6.0K Jun 29 00:24  same_old
drwxr-sr-x 2 jovyan  1001 6.0K Jun 28 20:55  shared-data
drwxr-sr-x 2 jovyan  1001 6.0K Jun 29 00:24  shiny_new
-r--r--r-- 1 root   root   361 Jun 29 00:15  Welcome.md


### If you use mv where the destination is in a different folder you will move the file

In [23]:
! mv fulfilled.txt shiny_new

In [24]:
# Not here at the main folder level anymore...
! ls -lh

total 344K
-rw-r--r-- 1 jovyan  1001  39K Jun 29 00:20  1_Intro_to_Programming.ipynb
-rw-r--r-- 1 jovyan  1001  93K Jun 29 00:22 '2_Intro to Pandas.ipynb'
-rw-r--r-- 1 jovyan  1001 145K Jun 29 00:24  3_Python_Packages_demo.ipynb
-rw-r--r-- 1 jovyan  1001  38K Jun 29 00:20 '4_Terminal Commands and Bash Scripting.ipynb'
drwxr-sr-x 8 jovyan users 6.0K Jun 28 23:16  public-data
drwxr-sr-x 2 jovyan  1001 6.0K Jun 29 00:24  same_old
drwxr-sr-x 2 jovyan  1001 6.0K Jun 28 20:55  shared-data
drwxr-sr-x 2 jovyan  1001 6.0K Jun 29 00:24  shiny_new
-r--r--r-- 1 root   root   361 Jun 29 00:15  Welcome.md


In [25]:
# Here!
! ls -lh shiny_new

total 4.0K
-rw-r--r-- 1 jovyan 1001 0 Jun 29 00:24 fulfilled.txt


### With great power comes great responsibility... use this carefully! There is no "undo" in terminal...
# rm: delete file

In [26]:
! rm shiny_new/fulfilled.txt

In [27]:
! ls -lh shiny_new

total 0


### rm -r: delete directory and all of its contents

In [28]:
! rm -r shiny_new
! rm -r same_old

In [29]:
### YOU TRY: delete the directory "same_old" 



In [30]:
! ls -lh

total 336K
-rw-r--r-- 1 jovyan  1001  39K Jun 29 00:20  1_Intro_to_Programming.ipynb
-rw-r--r-- 1 jovyan  1001  93K Jun 29 00:22 '2_Intro to Pandas.ipynb'
-rw-r--r-- 1 jovyan  1001 145K Jun 29 00:24  3_Python_Packages_demo.ipynb
-rw-r--r-- 1 jovyan  1001  38K Jun 29 00:20 '4_Terminal Commands and Bash Scripting.ipynb'
drwxr-sr-x 8 jovyan users 6.0K Jun 28 23:16  public-data
drwxr-sr-x 2 jovyan  1001 6.0K Jun 28 20:55  shared-data
-r--r--r-- 1 root   root   361 Jun 29 00:15  Welcome.md


# Now let's get fancy

#### You can do quite a lot of text processing and execute more complex file operations in terminal. Check out the contents of the file sequences.txt in this directory by using the "cat" command:

In [31]:
! cat data/sequences.txt

cat: data/sequences.txt: No such file or directory


### You can look at just the beginning or just the end of a file

In [32]:
! head -2 data/sequences.txt

head: cannot open 'data/sequences.txt' for reading: No such file or directory


In [33]:
! tail -2 data/sequences.txt 

tail: cannot open 'data/sequences.txt' for reading: No such file or directory


### You can split up a file by columns!

In [34]:
! cut -f1 data/sequences.txt

cut: data/sequences.txt: No such file or directory


In [35]:
### YOU TRY: print out just the second column



### Piping 

<img src="public-data/1_programming/img/mariopipe.png">

#### You can use pipes ("|") to send the output of one command to another command as input

In [36]:
### What do you think will be output using this combination of commands?
! head -3 data/sequences.txt | tail -2

head: cannot open 'data/sequences.txt' for reading: No such file or directory


In [37]:
### NOW YOU TRY: retrieve just the second column of the third line (seq C) from the sequences.txt file using 
# a combination of tail, head, and cut with a couple of pipes in the mix.



### grep: searching for specific text within a file

In [38]:
! grep "TAAC" data/sequences.txt

grep: data/sequences.txt: No such file or directory


#### You can use certain expressions to filter with extra parameters. For example, in this context the | means "or"

In [39]:
! grep "[B|D]" data/sequences.txt

grep: data/sequences.txt: No such file or directory


In [40]:
### NOW YOU TRY: Okay, try finding the line that contains "TAAC" plus the line after it using grep.




### tr: substitution

#### Let's use tr to turn our DNA sequences into RNA sequences instead!

In [41]:
! cat data/sequences.txt | tr T U

cat: data/sequences.txt: No such file or directory


#### We can use tr to find the complement of DNA sequences

In [42]:
! cut -f2 data/sequences.txt | tr ACGT TGCA


cut: data/sequences.txt: No such file or directory


## rv: reverse 

#### We can tack on a "reverse" command to get the reverse complement of a DNA sequence

In [43]:
! cut -f2 data/sequences.txt | tr ACGT TGCA | rev


cut: data/sequences.txt: No such file or directory


## wc: word/line/character count

In [44]:
! wc data/sequences.txt

wc: data/sequences.txt: No such file or directory


#### Get just lines

In [45]:
! wc -l data/sequences.txt

wc: data/sequences.txt: No such file or directory


#### Get just words

In [46]:
! wc -w data/sequences.txt

wc: data/sequences.txt: No such file or directory


#### Get just characters

In [47]:
! wc -c data/sequences.txt

wc: data/sequences.txt: No such file or directory


## sort: sort values in a column

In [48]:
! cut -f2 data/sequences.txt | sort 

cut: data/sequences.txt: No such file or directory


## uniq: check unique values in a column

### Here you can see that the last two sequences were duplicates of one another

In [49]:
! cut -f2 data/sequences.txt | sort | uniq

cut: data/sequences.txt: No such file or directory


# Common commands

### wget: downloading files -- for example genome reference files for the hg38 reference: https://hgdownload.soe.ucsc.edu/goldenPath/hg38/chromosomes/

In [50]:
# This will download chromosome chr21 
! wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/chromosomes/chr21.fa.gz

--2022-06-29 00:24:43--  https://hgdownload.soe.ucsc.edu/goldenPath/hg38/chromosomes/chr21.fa.gz
Resolving hgdownload.soe.ucsc.edu (hgdownload.soe.ucsc.edu)... 128.114.119.163
Connecting to hgdownload.soe.ucsc.edu (hgdownload.soe.ucsc.edu)|128.114.119.163|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12709705 (12M) [application/x-gzip]
Saving to: ‘chr21.fa.gz’


2022-06-29 00:24:43 (30.9 MB/s) - ‘chr21.fa.gz’ saved [12709705/12709705]



#### Now we have it locally

In [51]:
! ls 

 1_Intro_to_Programming.ipynb			 chr21.fa.gz
'2_Intro to Pandas.ipynb'			 public-data
 3_Python_Packages_demo.ipynb			 shared-data
'4_Terminal Commands and Bash Scripting.ipynb'	 Welcome.md


### gunzip and gzip: unzipping and zipping files

In [52]:
! gunzip chr21.fa.gz

In [53]:
! ls

 1_Intro_to_Programming.ipynb			 chr21.fa
'2_Intro to Pandas.ipynb'			 public-data
 3_Python_Packages_demo.ipynb			 shared-data
'4_Terminal Commands and Bash Scripting.ipynb'	 Welcome.md


In [54]:
# Let's sneak a peak
! head -194500 chr21.fa | tail

TAGACCTTCAACAGTAAGTCAGTTTCACAATACTATTTTTAAATTTCCTA
TTAAAATATCACTCTATTTCTTAGTATATCACTTTGGCATATCTGCTTCT
TTCTCTGTATTAATAAATAGCGCATACAGTTTGCCTTTGGTACTTTGTAC
AATGTTGTTTATCTCAGTGTAAATTGGTAGCGTGTCCACAAAGGCGATTG
GAGTGTGAGGCGTGAGTCCTTAGGAGCCTGTCTGCCATCTAAGCCCTGTT
AGCATTTTCCTTTACTAATGTTGGGGTGGGGGGACCTCAGAAGGGGCACA
GCAAGCATATGAAAGTTTTGTTACAGAGATGCCAGTATTTGTCCTTAGAA
CAGGTCCAGTTGACAAAGGCACTGCAGGATATGAAAGATTCTCATTACAA
TGTCACGGCAACATGACTGAAATTATTAACTCTCCACGTGGGATGATGGA
TGGTATAGGGTGGAGATGTCCTTGGCAGAACATGTTGCTTAATTATCTTC


#### Ok back in the box you go

In [55]:
! gzip chr21.fa

In [56]:
! ls

 1_Intro_to_Programming.ipynb			 chr21.fa.gz
'2_Intro to Pandas.ipynb'			 public-data
 3_Python_Packages_demo.ipynb			 shared-data
'4_Terminal Commands and Bash Scripting.ipynb'	 Welcome.md


#### Remove the files so we are back to square one

In [57]:
! rm chr21.fa.gz