# Getting Started with the Terminal
------
### Learning Objectives:

+ Learn to navigate and interact with a Jupyter notebook

+ Learn how to open a terminal window in the Jupyter notebook and on your own machine

+ Understand how to navigate directory structures with absolute and relative paths

+ Establish best practices for naming files on the terminal

<span style="color:black">Watch [this video](https://www.youtube.com/watch?v=VjAwUJpgZWM&list=PLXaEJPtnQ4w7Vu7vqWbttBjUGrPp4Qa7b&index=2) for an Introduction to Submodule 1</span>


## Jupyter Notebook Basics
--------
Jupyter notebook is a web interface built around the Python coding language. It is an excellent tool for creating and sharing coding resources because it lets you build a document from blocks called cells, each in code format, markdown text format, or raw text format. 


<p align="center">
    <img src="images/jupyterNotebook_annotated.png" alt="jupyterNotebook" width="50%"/>
</p>


Cells in the code format can be selected with your mouse and executed within the notebook, and the results of the code chunk will print to your screen. The cell below is in the **code format**, and it contains three types of lines. 

- The first line, `%%bash`, indicates that the code that follows is written in the BASH coding language.
- The second line is in green and starts with a `#`, which makes it a comment line. Comment lines are used to describe the action the code line below will take. These lines are important because they help you think logically about what the code is doing. Importantly, any line that begins with a `#` will not be executed, so if there is code in a cell that you don't want to run, you can add a `#` in front of it.
- The third line is the code line. Here we use the `echo` command to print the argument "Hello World" to the screen.

To run the code in a code cell, click on the cell and use the **Run Code** button in the menu above. The code will be executed and the results will be printed below the code cell in your notebook. You can also run the command by clicking on the code cell and using the `shift + enter` keys. 


In [None]:
%%bash

# The echo command prints the string argument to the screen
echo "Hello World"

<div class="alert alert-block alert-warning">
    <i class="fa fa-question-circle-o" aria-hidden="true"></i>
    <b>TEST YOUR SKILLS</b> 
      <p>Practice your skills in the code block below</p>
        <div style="background-color: white ; color:black; padding: 3px;">Copy the command above, but add a # symbol in front of the code line.<br><br>How does this change the output?<br>Why?<br><br> Run the #FLASHCARD code block to see the answer.</div>
    
</div>

In [None]:
%%bash

# TEST YOUR SKILLS (enter and run your answer here)

# Copy the command above but add a # symbol in front of the code line 


# How does this change the output? 
# Why?

In [None]:
# FLASHCARDS

from IPython.display import IFrame
IFrame("quiz_files/quiz1-1.html", width=600, height=350)


When a cell is selected, a vertical line appears to its left. You can click on that line to collapse the cell, and the same is true for the output of a code cell. Collapsing cells comes in handy when the results of a code cell are long! Another way to manage commands with lengthy results is to enable scrolling: right click anywhere within the notebook and select `Enable Scrolling for Outputs`.

<p align="center">
    <img src="images/enableScrolling.png" alt="enableScrolling" width="30%"/>
</p>


**Markdown text cells** are cells with the text formatted as you might see it in a document created with a word processor. These cells make annotating the code blocks easier to read as you can **bold text** or *italicize text* in addition to using headers, creating tables, etc. The cell you're currently reading is in the markdown format and if you double click it you will notice the shorthands used to format the text and can add your own text to the cell. 

**Raw text cells** are useful for taking your own notes as you move through the lesson. 

The combination of code cells, formatted explanations in the markdown format, and space for your own notes in the raw text format makes Jupyter notebook a powerful teaching tool: you can work with computational data without keeping multiple windows open. However, Jupyter notebook is not a scalable solution for working with large genomics datasets. For these datasets you will need to interact with the terminal environment. You can open a terminal window within your Jupyter notebook by using the *File* menu and selecting *New* and then *Terminal* from the drop-down menu. 

<p align="center">
<img src="images/openTerminal.png" alt="openTerminal" width="50%"/>
</p>


The terminal window will open in a separate tab from your notebook, but you can (and SHOULD) copy the code from the notebook's code cells into it to get practice navigating the terminal environment. One thing to note when copying and pasting: each Jupyter notebook code cell begins with the line `%%bash`, which does not need to be copied into the terminal. This line indicates that the *BASH* coding language is used in the code cell. *BASH* is the default coding language on most Linux-based machines, i.e., it comes preinstalled and is the default interface for the terminal application. The base coding language in the Jupyter notebook is Python, which is why the line is required in the notebook but not in the terminal.


## Terminal Basics
--------

The terminal environment is a program that takes commands from your keyboard and passes them to an operating system that will execute them. If you have not used a terminal environment to interact with genomic data you might have used a *Graphical User Interface (GUI)* which is a pictorial wrapper for the terminal environment. Rather than passing commands through a keyboard a GUI enables you to select preinstalled options that can be handed to the operating system. 

Interacting with a system through the terminal environment has many advantages over a GUI. The terminal allows you to quickly and easily navigate through directories on your computer, make, copy, and search files in a systematic way, and construct pipelines that will execute complex tasks on big datasets.

Importantly, the terminal allows us to do each of these in the context of bioinformatics data and bioinformatics software.

### Why Learn to Use the Terminal?

GUIs enable you to interact with your files and software in very limited ways by clicking buttons or selecting check boxes that correspond to choices you can make about how the software can run. Due to the design of the GUI the options available for any piece of software are limited to the most popular options, but these options do not represent the full potential of the software used by the GUI wrapper. These options may not be optimal for the dataset that you are working with, and the software might not be the latest version. When this happens you will run into what we call "bugs" where the GUI crashes or times out before the data are processed. In this case you may be able to update your GUI, but often you're left looking desperately for another tool that will do something similar. 


<table>
<tr><th>Advantages of using the terminal </th></tr>
<tr><td><table></table>

|Considerations|GUI|Terminal| 
|--|--|--|
|Software options available| Limited options|All possible options|
|Software version|Static|Easy to update |
|Debugging|Difficult without input from the developer|Some Google-fu required from the user with information from error logs|

</td></tr> </table>

In the terminal environment it is easy to update a software package if it crashes or times out while processing your data. By interacting with the software through the command line interface (CLI), you also have access to the full suite of options intended by the software developer. This gives you more flexibility in your analysis and lets you choose the options that are optimal for your dataset. Lastly, when the software does crash there is generally an error message or a log file explaining what process caused the crash. Mitigating these issues requires a little "Google-fu" on your part: combing through Stack Exchange posts where previous users of the same software got the same error message and implemented various fixes. More on this in submodule 7, **Error mitigation**. 



### Getting Started in the Terminal Environment

There are different types of terminal environments, but the most common is *BASH* (the *Bourne Again Shell*). As I mentioned previously, this is the type of terminal used in the Jupyter notebook. You also have access to a terminal environment on your local machine.

- On a Mac or Linux system, the *Terminal* application provides access to the shell. There are also applications that you can download that provide customization not present in the Terminal application, such as [iTerm2](https://iterm2.com/).
- On a Windows system, you can use an application such as [MobaXterm](https://mobaxterm.mobatek.net/).

When you open your terminal window you will be presented with the command prompt `$` where you are able to input commands. If the terminal is busy and cannot currently accept new commands, you will not be presented with the prompt.

When the prompt is shown, you can enter commands by typing them in after the prompt. Commands are typically composed of three components:  
- the command itself  
- any flags or options you wish to run the command with (not always required)
- and an argument


<p align="center">
  <img src="images/terminal_annotated.png" width="60%"/>
</p>


In the above example, we are asking the terminal to pass the `mkdir` command (for making directories) to the operating system with the `-p` option (which lets us make parent and sub-directories at the same time) and an argument giving the names of the directories we want the command to make.
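
As a minimal sketch of how the command, flag, and argument fit together, the block below runs `mkdir` without and then with the `-p` flag. It works inside a temporary scratch directory so nothing in your notebook folder changes. The variable `scratch` and the directory names `project`, `data`, and `raw` are illustrative assumptions, not part of the lesson's files.

```shell
# Work in a throwaway directory so the lesson's own folders are untouched
scratch=$(mktemp -d)
cd "$scratch"

# Without -p, mkdir fails because the parent directory 'project' does not exist yet
mkdir project/data/raw 2>/dev/null || echo "mkdir failed: parent directories are missing"

# With -p, mkdir creates the parent directories and the sub-directory in one step
#   command: mkdir    flag: -p    argument: project/data/raw
mkdir -p project/data/raw

# List every directory that was created, one path per line
find project -type d | sort
```

If you paste this into the terminal tab, leave out any `%%bash` line; the commands themselves are the same in both places.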

Manual pages for specific commands can be accessed using the `man` command. You can run this command by selecting the code cell below and pressing `shift + return` to execute the command in the notebook. You can also copy the command and paste it into your terminal window tab. The output of the command will be identical.


In [None]:
%%bash

# Look at the manual for the mkdir command
man mkdir

Notice that in the terminal window we had to press `q` to exit the manual and get the prompt to return. Another way of looking at a summary of the information in the manual page is with the `--help` flag. Try typing `mkdir --help` in the terminal window. The information in the manual pages tends to be more clearly laid out and better organized, but the `--help` flag is a quick and easy way to remind yourself of the flags available.

The terminal has a number of commands that allow us to explore our current working directory, as well as change the current working directory to another location. 

For example:


In [None]:
%%bash

# 'ls' command lists files in our current working directory
ls

In [None]:
%%bash

# Run ls with the '-a' flag to include hidden files
ls -a

In [None]:
%%bash

# Now let's check the permissions of files in the current working directory 
ls -la

# In the command above I combined 2 flags, 'l' and 'a', behind a single hyphen. I could also have written that command as:
#ls -l -a

In [None]:
%%bash

# let's look at all of the options available with the 'ls' command
ls --help

<div class="alert alert-block alert-warning">
    <i class="fa fa-question-circle-o" aria-hidden="true"></i>
    <b>TEST YOUR SKILLS</b> 
      <p>Practice your skills in the code block below</p>
        <div style="background-color: white ; color:black; padding: 3px;"> Follow the prompts to create commands that leverage different flags of the ls command: <br><br> 1. Write a command that uses the long format to list files but with a human readable format. <br>2. Write a command that lists the files in the reverse order (including hidden files). <br> 3. Write a command that lists the contents of the directory called 'figures'.</div>
    
</div>

In [None]:
%%bash

# TEST YOUR SKILLS - (enter and run your answers here) 
#Follow the prompts to create commands that leverage different flags of the ls command:

# Write a command that uses the long format to list files but with a human readable format

# Write a command that lists the files in the reverse order (including hidden files)

# Write a command that lists the contents of the directory called 'figures'


## Paths, Where Is Your Data Stored?
-------------------

<span style="color:black">**Learn more by watching [this video](https://www.youtube.com/watch?v=1UsjiH4h7iA&list=PLXaEJPtnQ4w7Vu7vqWbttBjUGrPp4Qa7b&index=3)**</span>

A path is the address that indicates where your data are stored. If I told you that my data were stored in a directory (folder) called `data`, you would understand that you should search for a folder on my computer called `data`. In some cases there may be multiple directories called `data`, and we would need to distinguish which directory contains the data of interest. In the image below there is a directory (folder) called `data_analysis`, which contains two directories called `project-1` and `project-2`. Each project directory contains another directory called `data`.

<p align="center">
    <img src="images/directoryStructure.png" alt="jupyterNotebook" width="30%"/>
</p>

The path `data_analysis/project-1/data/` refers specifically to the `data` directory in `project-1`. Each directory and sub-directory in the path are separated by the forward slash `/` to indicate the path through the directories to the directory of interest. For most bioinformatic software you will need to submit the location of the files you would like to analyze using a path. 
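
To make the idea concrete, here is a small sketch that rebuilds the directory layout from the image and lists both `data` directories by their paths. It runs in a temporary scratch directory; the `scratch` variable and its location are assumptions made for this illustration only.

```shell
# Build the example layout in a throwaway directory
scratch=$(mktemp -d)
cd "$scratch"

# Each project gets its own 'data' directory
mkdir -p data_analysis/project-1/data
mkdir -p data_analysis/project-2/data

# Both directories are named 'data', but their paths are different
ls -d data_analysis/*/data
```

The two lines printed by `ls` share the same final directory name and differ only in the project directory in the middle of the path, which is exactly what lets software tell them apart.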

The `pwd` command in BASH prints the path of the current directory; `pwd` stands for *print working directory*. Test out the `pwd` command below to see what is returned.


In [None]:
%%bash

# Run `pwd` to find out where we are in the virtual machine
pwd

In [None]:
%%bash

# Run the `ls` command to look at what the organization of the working directory looks like
ls

You can see that we are in the `jupyter` directory and this directory is inside a directory called `home`. 

Within `jupyter` there is another directory called `images`. We can move into that directory with the `cd` *change directory* command and check our path again.


In [None]:
%%bash

# Move into the images directory
cd images

#Check the path to your current working directory
pwd

In [None]:
%%bash

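# Now let's make a new folder inside the images directory
# (each %%bash cell starts back in the notebook's directory, so the path begins with images/)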
mkdir -p images/new_folder

# Navigate to the new folder and print path
cd images/new_folder
pwd
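
Each `%%bash` cell runs in its own shell process, so a `cd` in one cell does not carry over to the next. That is why the cell above uses the path `images/new_folder` rather than just `new_folder`. The sketch below imitates this behavior with a command substitution `$( ... )`, which also runs its commands in a separate shell. The scratch directory and its `images` folder are stand-ins, not the lesson's real directories.

```shell
# Make a stand-in working directory that contains an 'images' folder
scratch=$(mktemp -d)
mkdir "$scratch/images"
cd "$scratch"

# The commands inside $( ... ) run in a separate shell, much like a separate %%bash cell
inside=$(cd images && pwd)

# Back in the original shell, the working directory has not changed
outside=$(pwd)

echo "inside the separate shell: $inside"
echo "back in the original shell: $outside"
```

In a terminal window, by contrast, every command runs in the same shell, so a `cd` stays in effect until you change directories again.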

### Absolute vs. Relative Paths

The command `pwd` returns the **absolute path** to your current working directory: the full list of directories, starting at the root directory `/`, that leads to the current directory. You can see that absolute paths can get long and unwieldy, especially if you have very detailed or long directory names. 

One "shortcut" that makes navigating the command line a bit easier is using a **relative path**. A **relative path** uses the directory structure (which we can see in our absolute path returned by the command `pwd`) to move up or down through directories using shortcuts. One very common shortcut is `..` which translates to the directory one level "above" your current directory. 

From `/home/jupyter/images/new_folder` we could get back to `/home/jupyter/images` using the **absolute path** with the command `cd /home/jupyter/images`, or we can use the **relative path** with the command `cd ../`. 

This shortcut saves a lot of time and typing BUT it requires that you have a good understanding of where you are in your directory structure, so do not be shy about using the `pwd` command. 

A key difference to remember: an **absolute path** will always point you to the same location regardless of your current working directory. A **relative path** like `../` will always point to the directory above your current working directory, so where you end up depends on where you started. 
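
The difference can be sketched with a few `cd` commands. The block below builds a stand-in for `images/new_folder` inside a temporary scratch directory (an assumption for this example, so that it runs anywhere) and records where each `cd` lands.

```shell
# Stand-in for the images/new_folder layout, built in a throwaway location
scratch=$(mktemp -d)
mkdir -p "$scratch/images/new_folder"

# Relative path: '..' means "one level up from wherever I am now"
cd "$scratch/images/new_folder"
cd ..
from_new_folder=$(pwd)

# The same relative command from a different starting point lands somewhere else
cd "$scratch/images"
cd ..
from_images=$(pwd)

# Absolute path: the destination is the same no matter where we start
cd "$scratch/images/new_folder"
cd "$scratch/images"
absolute_result=$(pwd)

echo "cd .. from new_folder -> $from_new_folder"
echo "cd .. from images     -> $from_images"
echo "absolute path         -> $absolute_result"
```

The first and third lines of output match, while the second shows that the same `cd ..` took us to a different directory because we started from a different place.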


<div class="alert alert-block alert-warning">
    <i class="fa fa-question-circle-o" aria-hidden="true"></i>
    <b>TEST YOUR SKILLS</b> 
      <p>Practice your skills in the code block below</p>
    <div style="background-color: white ; color:black; padding: 3px;">1. Create a directory in /home/jupyter called: my_directory<br>2. Create a directory in /home/jupyter called: new_dir<br>3. Practice using `..` to move through your directories<br><br> Run the #FLASHCARD code block to see the answers.</div>
    
</div>

In [None]:
%%bash
#  TEST YOUR SKILLS (enter and run your answers here)

# Create a directory in /home/jupyter called my_directory

# Create a directory in /home/jupyter called new_dir

# Practice using the shortcut `..` to move through your directories


In [None]:
#FLASHCARD
from IPython.display import IFrame
IFrame("quiz_files/quiz1-2.html", width=600, height=350)

<div class="alert alert-block alert-warning">
    <i class="fa fa-question-circle-o" aria-hidden="true"></i>
    <b>TEST YOUR SKILLS</b> 
      <p>Practice your skills in the code block below</p>
    <div style="background-color: white ; color:black; padding: 3px;">1. Navigate to the directory you just made called my_directory using an absolute path<br>2. Check where you are with pwd<br>3. Use a relative path to navigate to /home/jupyter/new_dir<br><br> Run the #FLASHCARD code block to see the answer.</div>
    
</div>

In [None]:
%%bash
# TEST YOUR SKILLS (enter and run your answers here)

# Navigate to the directory /home/jupyter/my_directory with an absolute path

# Check where you are with pwd

# Use a relative path to navigate to the directory /home/jupyter/new_dir

In [None]:
# FLASHCARD
from IPython.display import IFrame
IFrame("quiz_files/quiz1-3.html", width=600, height=350)

## File Naming Considerations
---------

You might have noticed that all of my filenames use `_` or `-` to separate words rather than spaces. Within commands, spaces are generally used to separate arguments. Spaces within filenames need to be *escaped* with the backslash symbol `\` in the terminal environment. That is, we need to explicitly indicate that the space is part of the filename and NOT an indication that a second argument is being provided to the command.

To see what I mean, let's try creating a new file with spaces in the name. To create our new file we will use the `nano` text editor application. This application isn't available in the Jupyter notebook, so follow the steps below in your terminal window.

1. Type `nano` into the terminal window
2. Copy and paste the following into the window: `Some silly text to demo bad file naming syntax`
3. Use the `ctrl + X` keys to exit the nano text editor
4. Use the `Y` key to indicate that you would like to save this file
5. Copy and paste this name into the filename prompt and press `enter`: `Some filename with spaces.txt`


Now let's have a look at the first ten lines of our file with the command `head`.

In [None]:
%%bash

# Look at the first ten lines of our file with the head command 
head Some filename with spaces.txt


The error you received was because the spaces were not escaped, so the command interpreted each word separated by a space as a new argument. Essentially this command was looking for one file called "Some", one called "filename", one called "with", and one more called "spaces.txt". 
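
One way to see this splitting directly is to count the arguments the shell hands to a command. In the sketch below, `count_args` is a hypothetical helper function written for this illustration (it is not a standard command); it simply prints how many arguments it received. The last call uses quotation marks, another common way to keep a name with spaces together.

```shell
# count_args is a hypothetical helper that reports how many arguments it received
count_args() {
    echo "$#"
}

# Unescaped spaces: the shell splits the name into four separate arguments
unescaped=$(count_args Some filename with spaces.txt)

# Escaped spaces: the backslashes keep the name together as one argument
escaped=$(count_args Some\ filename\ with\ spaces.txt)

# Quoting the whole name also produces a single argument
quoted=$(count_args "Some filename with spaces.txt")

echo "unescaped: $unescaped, escaped: $escaped, quoted: $quoted"
# prints: unescaped: 4, escaped: 1, quoted: 1
```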

Now let's use the `head` command again but this time with the spaces escaped.

In [None]:
%%bash

# Look at the first ten lines of our file with the head command
head Some\ filename\ with\ spaces.txt


You can see what I mean about spaces in filenames being annoying. This is especially true if spaces appear in both directory names and filenames, because your paths will get confusing. Instead, most programmers opt to use a substitute for the space:

**1. Using _ instead of spaces:** <br>
 filename_with_no_spaces.txt <br>
**2. Using . instead of spaces:** <br>
 filename.with.no.spaces.txt<br>
**3. Using - instead of spaces:**<br>
 filename-with-no-spaces.txt<br>
**4. Using camel case, where every word after the first is capitalized to mark the start of a new word:**<br>
 fileNameWithNoSpaces.txt<br>
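
If you already have a file with spaces in its name, one possible way to switch to the underscore convention is sketched below. It uses `tr` to translate spaces into underscores and `mv` to rename the file. The scratch directory and the empty stand-in file are assumptions for this example, so the file you made with `nano` is left untouched.

```shell
# Work in a throwaway directory with an empty stand-in file
scratch=$(mktemp -d)
cd "$scratch"
touch "Some filename with spaces.txt"

# Build the new name by translating every space into an underscore
old_name="Some filename with spaces.txt"
new_name=$(printf '%s' "$old_name" | tr ' ' '_')

# Quote both names so the spaces in the old one are not split into separate arguments
mv "$old_name" "$new_name"

# Show the renamed file
ls
```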
