<img src=images/ucsc_banner.png width="500">

# Command Line and Git

## Command line interface

The command line interface (CLI) is a powerful way to interact with your computer's operating system. A graphical user interface (GUI) may be more convenient, but it gives you less control over how the software runs. Also, most cutting-edge research tools are first developed for the command line.

<img src=images/gvng.jpg height="75%">
*http://www.datacarpentry.org/shell-genomics/lessons/01_the_filesystem.html

The shell takes commands from the command line and converts them into instructions for your computer to execute. There are many shells out there. The most popular shell is called bash, but other popular shells include zsh, csh, and tcsh. We are going to use bash here, but I recommend trying out zsh. The shell runs inside a terminal emulator, and how you start a terminal session depends on your operating system. Macs and Linux machines usually have a terminal program pre-installed, but Windows users will need to download a terminal emulator.

### Mac
Applications -> Utilities -> Terminal

### Windows
Download and install a terminal emulator like PuTTY or MobaXterm

### Linux 
On Ubuntu there is a shortcut: ctrl-alt-t

## Basic Commands

We're going to be working with data on a remote Linux server. The interface is entirely text, so we will need to use the command line. Log onto your server using SSH, and then we will explore the filesystem. The first command we are going to look at is **ls**, which lists directory contents.

<img src=images/command_line.jpg>


Try these different flags:
* ls --help
* ls --version
* ls -lha

The manual page for **ls** can be found using this command:
```
man ls
```
You can search the manual page by typing / and then a search term. Try searching for what -h does by typing /-h and hitting enter. You can move the cursor to the next instance of the search term by typing n. Shift-N searches in the reverse direction.

Another important command is **cd** - change directory. First, let's go to the root of the filesystem. This can be done using:
```
cd /
```
Use ls to list all of the files in your root directory. These folders have all of the programs and configuration files that make up your operating system. 

### Pick a program in /bin and look at its manual page

<img src=images/filesystem.jpg>
*http://www.blackmoreops.com/wp-content/uploads/2015/02/Linux-file-system-hierarchy-Linux-file-structure-optimized.jpg


To go back to your home directory type:
```
cd ~
```
Another important utility is **mkdir** - make directory. Let's make a directory in our home directory called **bin**.
```
mkdir bin 
```
Your ~/bin directory is a good place to store executables.
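
A directory like ~/bin only becomes convenient once the shell knows to look there. The sketch below is one way to try this out, under a few assumptions: it uses a throwaway directory from `mktemp -d` as a stand-in for your real ~/bin, and `hello-workshop` is a made-up script name. It shows three steps: write a script, mark it executable with **chmod**, and add the directory to PATH.

```shell
# Use a temporary directory as a stand-in for ~/bin (assumption: this keeps
# the sketch from touching your real home directory).
demo_bin="$(mktemp -d)"

# Write a tiny script; "hello-workshop" is a hypothetical name.
cat > "$demo_bin/hello-workshop" <<'EOF'
#!/bin/sh
echo "hello from my own command"
EOF

# Mark the script as executable.
chmod +x "$demo_bin/hello-workshop"

# Prepend the directory to PATH so the shell can find the script by name.
PATH="$demo_bin:$PATH"

# Run it like any other command and capture what it prints.
greeting="$(hello-workshop)"
echo "$greeting"
```

To get the same effect for your real ~/bin in every new session, you could add a line such as `export PATH="$HOME/bin:$PATH"` to your ~/.bashrc.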

# More shell commands
Every program on the command line has:

1. STDIN = data fed into the program
2. STDOUT = output of the program, which by default is printed to the terminal
3. STDERR = error messages, which also default to the terminal

The ">" symbol allows us to redirect the STDOUT to a file, instead of printing it to the screen. 
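
The three streams can be seen separately with a small sketch, meant to be run as a script in a terminal. It assumes a scratch directory and made-up file names (`out.txt`, `err.txt`, `real-file.txt`). `2>` redirects STDERR the same way `>` redirects STDOUT, and `<` feeds a file into a program's STDIN.

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)"

# Create one file that exists; the other name is deliberately missing.
touch real-file.txt

# STDOUT (the listing) goes to out.txt, STDERR (the complaint about the
# missing file) goes to err.txt. Neither appears on the terminal.
ls real-file.txt no-such-file.txt > out.txt 2> err.txt

# STDIN: wc reads each file through "<" and counts its lines.
stdout_lines="$(wc -l < out.txt)"
stderr_lines="$(wc -l < err.txt)"
echo "lines on STDOUT: $stdout_lines, lines on STDERR: $stderr_lines"
```

Keeping errors in their own file is handy for long-running jobs, because the results file stays free of warning messages.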

We can run unix shell commands within jupyter notebooks by adding a "!" in front of the command. 

### Redirecting command output to a file

In [2]:
#list the files in the current directory
!ls

1-Introduction.ipynb                Pipfile
2-Shell-and-Git.ipynb               TODO
3-Anaconda-Pandas-Dataframes.ipynb  data
4-Evolution-and-Mutations.ipynb     images
5-Visualization.ipynb               install.sh
6-Next-Generation-Sequencing.ipynb  style-notebook.css
Bonus-SSH-and-Cloud-Computing.ipynb style-table.css


In [3]:
#save the output of the ls to a file 
!ls > ls-file.txt

In [4]:
#read the file
!cat ls-file.txt

1-Introduction.ipynb
2-Shell-and-Git.ipynb
3-Anaconda-Pandas-Dataframes.ipynb
4-Evolution-and-Mutations.ipynb
5-Visualization.ipynb
6-Next-Generation-Sequencing.ipynb
Bonus-SSH-and-Cloud-Computing.ipynb
Pipfile
TODO
data
images
install.sh
ls-file.txt
style-notebook.css
style-table.css


### Piping

We can use the "|" symbol to "pipe" the output of one command into the input of another command.

In [5]:
#pipe the ls output to the head command, which here shows the first 5 lines
!ls | head -5

1-Introduction.ipynb
2-Shell-and-Git.ipynb
3-Anaconda-Pandas-Dataframes.ipynb
4-Evolution-and-Mutations.ipynb
5-Visualization.ipynb


In [6]:
#adding another pipe, using tail to read the last line
!ls | head -5 |tail -1

5-Visualization.ipynb


In [7]:
#append this output to the ls-file.txt we made earlier by using ">>" (a single ">" would overwrite the file)
!ls | head -5 |tail -1 >> ls-file.txt

In [8]:
#read the updated file
!cat ls-file.txt

1-Introduction.ipynb
2-Shell-and-Git.ipynb
3-Anaconda-Pandas-Dataframes.ipynb
4-Evolution-and-Mutations.ipynb
5-Visualization.ipynb
6-Next-Generation-Sequencing.ipynb
Bonus-SSH-and-Cloud-Computing.ipynb
Pipfile
TODO
data
images
install.sh
ls-file.txt
style-notebook.css
style-table.css
5-Visualization.ipynb
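
The difference between `>` and `>>` is easy to miss: `>` replaces a file's contents, while `>>` adds to the end. A minimal sketch, assuming a scratch directory and a made-up file name `notes.txt`:

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)"

# ">" creates the file, or overwrites it if it already exists.
echo "first"  > notes.txt
echo "second" > notes.txt      # "first" is gone now

# ">>" appends to the end of the file instead.
echo "third" >> notes.txt

# A pipeline can feed ">>" too: sort three words, keep the first, append it.
# "head -n 1" is the longer spelling of "head -1".
printf 'pear\napple\nfig\n' | sort | head -n 1 >> notes.txt

cat notes.txt
```

After this runs, notes.txt holds three lines: second, third, and apple.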


## Pro Practice

1. Count the number of lines in data/titles.csv (look at **wc**)

2. Use the **find** command to locate a file called cpuinfo and report the number of CPUs on the machine.

3. Pipe several commands together to sort the titles in data/titles.csv. Try to remove the comma and the date at the end of the line without modifying the file (use **cat**, **sed** or **awk** and **sort**)

# Version control with Git

A great way to merge code from several collaborators and to fix broken code while also developing new features is to use a version control system called **git**. Git keeps a history of code changes in a source repository that you can share with other programmers.

<img src=images/git_fig.png width="500">
*https://www.atlassian.com/git/tutorials/why-git/git-for-developers

## Install git
Many computers do not come with git installed. To install it, follow the [official installation instructions](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git).

On Debian-based Linux distributions such as Ubuntu, use this command to install git.
```
sudo apt-get install git
```
apt-get is a package manager that can install many different kinds of software. It will show you all of the software to be installed and ask whether you want to continue with the installation. Type Y and press enter. The sudo command runs another command with administrator (root) privileges, which lets it modify important files and directories. With great power comes great responsibility, so be careful whenever you use sudo!

Now enter **git** into the command line and you should see all of the git commands you can run.

## Make a GitHub account

Go to [GitHub](https://github.com/) and click the sign up button. This is a great place to save your code in case your laptop breaks or gets stolen. It's also a great place to collaborate on projects and to share your code with others. Git and GitHub have special vocabulary you will need to learn to get the most out of this software. For instance, you store your code in what is called a GitHub repository, or repo.

Once you have your GitHub account set up, go to the [workshop repository](https://github.com/bsaintjo/BD2K-Summer-Workshop). Here is all of the code we prepared for the workshop. To make your own working copy of the repository, **fork** it by clicking the fork button. You now own a copy of the repo that you can download and change. To download your workshop repo, go to the terminal session that is connected to the remote server and type in this command:
```
git clone https://url/to/your/repo
```
Let's **cd** into the repo and list the files using **ls**. Let's check the status of the repo using this command:
```
git status
```
Now let's make a change to the repo. We will add a file that lists the contributors to the repo.
```
echo John Smith > contributors.txt 
```
Run `git status` again and you should see that git lists contributors.txt as an untracked file in your working directory. Now we need to add the file to the git index (also called the staging area).
```
git add contributors.txt
```

Now contributors.txt is in the index and next we want to commit the file to your local repository. At this point, we are going to add a message to our commit so that we can keep track of all the commits in the repository.
```
git commit -m 'Add a contributors file'
```
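
The add-then-commit cycle can be rehearsed safely in a throwaway repository. This sketch assumes git is installed; the directory comes from `mktemp -d`, and the name and email are placeholders stored only in that repository. It uses `git status --short`, which prints a compact two-letter code per file.

```shell
# Create an empty repository in a scratch directory.
cd "$(mktemp -d)"
git init --quiet .

# Placeholder identity, saved only in this repository's config.
git config user.name "John Smith"
git config user.email "john@example.com"

# New file: git reports it as untracked ("??" in the short status).
echo "John Smith" > contributors.txt
status_before="$(git status --short)"

# Stage the file in the index ("A" = added), then commit it.
git add contributors.txt
status_staged="$(git status --short)"
git commit --quiet -m 'Add a contributors file'

# The working tree is clean again and the history holds one commit.
status_after="$(git status --short)"
commit_count="$(git rev-list --count HEAD)"
echo "before: $status_before | staged: $status_staged | commits: $commit_count"
```

Checking the status between steps is a cheap habit that shows exactly what the next commit will contain.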
Now that you have committed changes to your local repository, let's copy these changes to your GitHub repository. To do this, we **push** changes to the remote repository. The remote repository is called **origin**, and **master** is the branch. If your repository's default branch is named main instead, substitute main for master in the commands below.
```
git push origin master
```
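Git will prompt you for your GitHub username and password. GitHub no longer accepts your account password for Git operations over HTTPS, so enter a personal access token in place of the password (or set up an SSH key). If someone else makes changes to your GitHub repository, you will need to update your local repository using the **pull** command.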
```
git pull origin master
```
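
`origin` is only a nickname for a remote repository, so push and pull can be rehearsed without a network. The sketch below assumes git is installed and uses a local bare repository (`hub.git`, a made-up name) as a stand-in for GitHub, with two clones (`alice` and `bob`, also made up) playing two collaborators. It reads the branch name from git rather than hard-coding `master`, since the default name depends on how git is configured.

```shell
cd "$(mktemp -d)"

# A bare repository plays the role of the GitHub repo.
git init --quiet --bare hub.git

# Two collaborators each clone it; "origin" points at hub.git in both.
# (Cloning an empty repository prints a warning, silenced here.)
git clone --quiet hub.git alice 2>/dev/null
git clone --quiet hub.git bob 2>/dev/null

# Alice commits a file and pushes her branch to origin.
cd alice
git config user.name "Alice"
git config user.email "alice@example.com"
echo "Alice" > contributors.txt
git add contributors.txt
git commit --quiet -m 'Add a contributors file'
branch="$(git rev-parse --abbrev-ref HEAD)"
git push --quiet origin "$branch"

# Bob pulls the same branch and receives Alice's file.
cd ../bob
git pull --quiet origin "$branch"
cat contributors.txt
```

With a real GitHub repository the commands are the same; only the location that `origin` points to changes.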
When you start working on a new feature in a git repo, it's a good idea to work from a separate branch. This way, if you break something, you can just delete the branch and switch back to the master branch. When you are ready to add your changes to master, you can **rebase** your branch onto master, which replays your commits on top of the latest master so they can be merged cleanly. To make a new branch, use the **branch** command.
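```
# list the existing branches; the current branch is marked with *
git branch
# create a new branch called bd2k-workshop-2019
git branch bd2k-workshop-2019
```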
Use the **checkout** command to switch to your new branch.
```
git checkout bd2k-workshop-2019
```
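
The whole branch workflow, including a plain (non-interactive) rebase, can be sketched in a throwaway repository. This assumes git is installed; the identity, `notes.txt`, and the extra contributor name are placeholders. The starting branch name is read from git because it may be `master` or `main` depending on configuration.

```shell
cd "$(mktemp -d)"
git init --quiet .
git config user.name "John Smith"
git config user.email "john@example.com"

# One commit on the starting branch so there is something to branch from.
echo "John Smith" > contributors.txt
git add contributors.txt
git commit --quiet -m 'Add a contributors file'
main_branch="$(git rev-parse --abbrev-ref HEAD)"

# Create the feature branch and switch to it.
git branch bd2k-workshop-2019
git checkout --quiet bd2k-workshop-2019

# Commit a change on the feature branch.
echo "Jane Doe" >> contributors.txt
git commit --quiet -am 'Add Jane to contributors'

# Meanwhile, the starting branch gets a commit of its own.
git checkout --quiet "$main_branch"
echo "workshop notes" > notes.txt
git add notes.txt
git commit --quiet -m 'Add notes'

# Rebase replays the feature commit on top of the updated starting branch.
git checkout --quiet bd2k-workshop-2019
git rebase --quiet "$main_branch"

# Fast-forward the starting branch so it includes the feature work.
git checkout --quiet "$main_branch"
git merge --quiet --ff-only bd2k-workshop-2019
git log --oneline
```

Because the rebased branch sits directly on top of the starting branch, the final merge is a fast-forward and the history stays linear with three commits.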

## Pro practice

* Commit a new change to your branch and do an interactive rebase onto the master branch
* Check out this article about writing informative [commit messages](http://chris.beams.io/posts/git-commit/). 

NIH BD2K Center for Big Data in Translational Genomics, UCSC Genomics Institute