# First TSCC login and program setup

**1) Log on to TSCC!**

In your terminal window, type:

    ssh ucsd-train##@tscc-login.sdsc.edu
    
Make sure you replace the "##" with the specific number you have been assigned. You will receive a number assignment from me after you email me a copy of your public key.

For Windows users, open PuTTY and load your saved "tscc" settings along with the appropriate private key.

**2) Download and install anaconda**

Download the Anaconda Python/R package manager using wget (web-get). The link below is from the Anaconda downloads [page](https://www.continuum.io/downloads). TSCC runs on Linux, so we will download the Linux 64-bit version for Python 2.7. Do this in your home directory. Find the link on the webpage and right-click on the download button. Select "Copy Link Address" to copy it to your clipboard, then paste it into your terminal window after the wget command.

    cd ~
    
    wget https://repo.continuum.io/archive/Anaconda2-4.2.0-Linux-x86_64.sh
    
Use ls to see that your file is now in the directory:

    ls
    
Your result should look like:

    Anaconda2-4.2.0-Linux-x86_64.sh

To install Anaconda, run the shell script with bash (this will take some time). It will ask you a series of questions. Use the defaults for them (press enter), apart from the PATH question discussed below. Remember to use tab completion to avoid typos!

    bash Anaconda2-4.2.0-Linux-x86_64.sh
    
or
    
    bash Ana<tab>
 
    
One important question it will ask during installation is whether you want to add Anaconda to the PATH variable in your ~/.bashrc. You should answer yes, but we can add it manually if it does not get added. Once the installation has finished, you can check whether this was done by looking at your .bashrc file:

    less ~/.bashrc
    
There should be a line in there that looks something like this, located BELOW the line "#User specific aliases and functions":
    
    export PATH=/home/ecwheele/anaconda2/bin:$PATH
    
If this is in there, then success! To activate the changes, source your .bashrc:

    source ~/.bashrc
    
If it is not in your ~/.bashrc, you need to put it in there. Do so by copying the following:

    export PATH=/home/ucsd-train##/anaconda2/bin:$PATH
    
Remember to change the ## to your number. Then open your ~/.bashrc with vi and press i to enter insert mode:

    vi ~/.bashrc
    i

**Use the arrow keys to move around the file and paste the following line BELOW "#User specific aliases and functions":**

    export PATH=/home/ucsd-train##/anaconda2/bin:$PATH

Press esc to leave insert mode, type :wq to save and quit, and then source the file:

    esc
    :wq
    
    source ~/.bashrc
    
The file is automatically sourced when you log in to TSCC, but since we have not logged out and back in since installing Anaconda, we need to source it manually.
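
If you would rather not edit the file in vi, the same change can be made from the command line. The following is only a sketch: `rc_file`, `path_line`, and `add_path_line` are names made up for this example, and it assumes your home directory is /home/ucsd-train##, so that `$HOME/anaconda2/bin` is the same folder as in the line above. The new line is appended to the end of the file, which places it below the "#User specific aliases and functions" comment.

```shell
# Sketch: append the Anaconda PATH line to a bashrc-style file only if it is missing.
# rc_file is a hypothetical name; it defaults to a scratch file so the example is
# safe to run. Set rc_file="$HOME/.bashrc" first to edit the real file.
rc_file="${rc_file:-$(mktemp)}"

# Single quotes keep $HOME and $PATH unexpanded, so they are written literally.
path_line='export PATH=$HOME/anaconda2/bin:$PATH'

add_path_line() {
    # -q: quiet, -x: match the whole line, -F: treat the pattern as plain text
    if grep -qxF "$path_line" "$rc_file"; then
        echo "PATH line already present in $rc_file"
    else
        printf '%s\n' "$path_line" >> "$rc_file"
        echo "PATH line added to $rc_file"
    fi
}

add_path_line
add_path_line   # the second call finds the line and does not add a duplicate
```

Checking before appending means you can run this more than once without filling your .bashrc with repeated lines.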

Check that it is working properly by searching for python:

    which python
    
The output should look something like:

    ~/anaconda2/bin/python
    
What python version are you running?

    python --version
    
My output looks like:
    
    Python 2.7.12 :: Anaconda 4.2.0 (64-bit)
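
To double-check which python your shell picks up, a small helper can classify the path for you. This is a sketch: `classify_python` is a made-up name, and it assumes Anaconda was installed to ~/anaconda2 as in the steps above.

```shell
# Sketch: classify where a python executable lives.
# Assumes Anaconda was installed to ~/anaconda2.
classify_python() {
    case "$1" in
        "$HOME"/anaconda2/bin/*) echo "anaconda" ;;
        "")                      echo "missing" ;;
        *)                       echo "other" ;;
    esac
}

# command -v prints the full path of the python that the shell would run.
found="$(command -v python || true)"
echo "python on PATH: ${found:-none} ($(classify_python "$found"))"
```

If this reports "other" or "missing", the PATH line in your ~/.bashrc is probably absent or has not been sourced yet.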
    

# Download some more useful programs with Anaconda

**Conda, bioconda, and pip to install programs**

Anaconda is really nice because it automates all the installations and downloads for us, rather than having to download the source code and install each program manually. [Conda](http://conda.pydata.org/docs/intro.html) is the preferred installation method, but not all programs are available with this method. Some things that are not available in conda are available in [bioconda](https://bioconda.github.io/). And finally, if it isn't available in either, you can try pip. As a general rule of thumb, try conda first, then bioconda, then pip. If it is not available with any of those, then you have to download the source code and follow the specific installation instructions provided in the README document for the software. You can also search Google with those specific keywords (conda, bioconda, pip) to figure out how to install packages.

    conda install package_name
    
    conda install -c bioconda package_name
    
    pip install package_name
    

We want to install a few more packages that we will use later on.
    
    conda install -c bioconda STAR
    
    conda install -c bioconda fastqc
    
    conda install -c bioconda samtools 
    
For all of these, accept the defaults on what will be automatically upgraded (yes to all).

*NOTE - if later on you get an error message that a specific package cannot be found when trying to run a program, use one of these methods to install that package and then try to run your program again.*
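
The "conda first, then bioconda, then pip" rule of thumb can be written as a small shell function. This is a sketch, not part of conda or pip: `install_pkg` is a made-up helper name, and the `-y` flag is an addition that answers yes to conda's confirmation prompt automatically.

```shell
# Sketch: try conda, then the bioconda channel, then pip, stopping at the first success.
# install_pkg is a hypothetical helper name, not part of conda or pip.
install_pkg() {
    pkg="$1"
    if conda install -y "$pkg"; then
        echo "installed $pkg with conda"
    elif conda install -y -c bioconda "$pkg"; then
        echo "installed $pkg from bioconda"
    elif pip install "$pkg"; then
        echo "installed $pkg with pip"
    else
        echo "could not install $pkg; check the software's README" >&2
        return 1
    fi
}

# Example call (commented out so nothing is installed by accident):
# install_pkg samtools
```

Because each step only runs when the previous one fails, a package that exists in conda is never installed a second time through bioconda or pip.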

    

# Jupyter Notebook Setup

[Jupyter](http://jupyter.org/) notebooks are a great tool to keep track of the workflow for your data analysis. You can load up your results, manipulate them, make pretty figures, and export your final data and figures to a file, all in one place!

Jupyter came installed by default with Anaconda, so no more installations will be necessary. You can see the executables for jupyter in your anaconda bin.

    ls ~/anaconda2/bin
    
**1) To start a notebook, on TSCC run the following command.** Replace the 4-digit number at the end with a random number between 2000 and 9999. Do a good job of picking randomly! If anyone else is using this number, your notebook will not load. Add the & sign at the end of the line to allow this command to "run in the background". Without it, you will not be able to return to the command line while running a notebook in this window.

    jupyter notebook --no-browser --port #### &
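
If you do not trust yourself to pick randomly, you can let the shell draw the number. This sketch assumes the GNU `shuf` utility is available; `port` is just a variable name chosen for the example. A random draw can still collide with someone else's port, so if the notebook does not load, draw again.

```shell
# Sketch: draw a random port between 2000 and 9999 instead of choosing by hand.
# shuf is part of GNU coreutils; -i gives the range and -n 1 asks for one number.
port="$(shuf -i 2000-9999 -n 1)"

# Print the command rather than running it, so you can inspect it first.
echo "jupyter notebook --no-browser --port $port &"
```

Write the number down, since you will need the same port again when setting up the tunnel. Then run the printed command to start the notebook.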
    
Wait a minute for the following to appear on your screen:

    [1] 40110
    [ucsd-train01@tscc-login1 ~]$ [W 12:06:56.912 NotebookApp] Unrecognized JSON config file version, assuming version 1
    [I 12:06:57.957 NotebookApp] [nb_conda_kernels] enabled, 2 kernels found
    [I 12:06:59.812 NotebookApp] ✓ nbpresent HTML export ENABLED
    [W 12:06:59.813 NotebookApp] ✗ nbpresent PDF export DISABLED: No module named nbbrowserpdf.exporters.pdf
    [I 12:06:59.837 NotebookApp] [nb_conda] enabled
    [I 12:07:00.201 NotebookApp] [nb_anacondacloud] enabled
    [I 12:07:00.222 NotebookApp] Serving notebooks from local directory: /home/ucsd-train01
    [I 12:07:00.222 NotebookApp] 0 active kernels 
    [I 12:07:00.222 NotebookApp] The Jupyter Notebook is running at: http://localhost:6221/
    [I 12:07:00.222 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
    
**2) Then press enter to return to the command line.**

**3) Pay attention to the number after login on your command line.** This is your login node. TSCC has two login nodes and you are randomly assigned one when logging in to TSCC. However, the tunnel must connect to the same login node that is running the notebook, so this number is important for the next step. In this example, I am on login node 1.

    [ucsd-train01@tscc-login1 ~]$ 

**3) Now move to a new tab on your local machine (not TSCC)**

**MAC** 

We are going to tunnel our connection through our local laptop in order to view Jupyter notebooks in a web browser. Remember, TSCC does not have a web interface so we have to take this extra step. Run the following command:

    ssh -NL ####:localhost:#### ucsd-train##@tscc-login#.sdsc.edu

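There are a few things that will be specific to you. 1) The 4-digit numbers should both be the port number that you chose above. 2) The two numbers after ucsd-train should be the numbers you were assigned as your login username. 3) The number after login should be the specific login node you noted from your TSCC command prompt.

You will be prompted to enter your password. Do that to continue. The command will then appear to hang with no output. That is expected, since the -N option opens the tunnel without starting a remote shell.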
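
To see how the three pieces fit together, here is a sketch that assembles the tunnel command from them. `tunnel_cmd` is a made-up helper name; the values in the example call come from the sample notebook output earlier in this section.

```shell
# Sketch: build the tunnel command from its three pieces.
# tunnel_cmd is a hypothetical helper; run it on your laptop, not on TSCC.
tunnel_cmd() {
    port="$1"   # 4-digit port chosen when starting the notebook
    user="$2"   # TSCC username, e.g. ucsd-train01
    node="$3"   # login node number from the TSCC prompt
    echo "ssh -NL ${port}:localhost:${port} ${user}@tscc-login${node}.sdsc.edu"
}

# Values taken from the example output earlier in this section.
tunnel_cmd 6221 ucsd-train01 1
# prints: ssh -NL 6221:localhost:6221 ucsd-train01@tscc-login1.sdsc.edu
```

The function only prints the command, so you can check it against your own port, username, and login node before running it.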

**WINDOWS** 

Step 1: Create a new PuTTY session.
So that you only need to have one PuTTY session open, we'll make a new TSCC session. Create one with ucsd-train##@tscc-login#.sdsc.edu, and call the session "TSCC Jupyter".

Step 2: Add your private key and allow forwarding
Go to Connection > SSH > Auth > Load your private key file

Step 3: Add a tunnel
Go to "Connection > SSH > Tunnels". Then:
Click the checkbox next to "Local ports accept connections from other hosts"
Add your #### for your source port
Add localhost:#### for your Destination
Click "Local"
Click "Add"

Step 4: Save your settings!
So you don't have to do this every time... Save your settings! Go all the way back to the "Session" window and click "Save". Remember to save this with a different name than your normal login information. Maybe "tscc_jupyter".

Step 5: Click open and continue through the login information. 

**4) Open a web browser.** In the address bar, type the following with your specific 4-digit port number.

    localhost:####
    
**5) Success! You have now started a jupyter notebook!** Look, you should be able to see your TSCC home directory. Make a new folder in your home called jupyter_notebooks (either on this interface, or on the command line with mkdir). Move into that folder to start a new notebook. You can confirm that this notebook is running on TSCC by running the following command on the TSCC command line, replacing username with your own (ucsd-train##):

    ps -u username
    
For example, my output looks like:

      PID TTY          TIME CMD
     4758 ?        00:00:00 sshd
     4759 pts/108  00:00:00 bash
    24349 ?        00:00:00 sshd
    24350 pts/298  00:00:00 bash
    40110 pts/136  00:00:01 jupyter-noteboo
    40400 ?        00:00:02 sshd
    40401 pts/209  00:00:00 bash
    51978 pts/136  00:00:00 ps
    60325 ?        00:00:00 sshd
    60326 pts/136  00:00:00 bash
    
If I want to kill my jupyter notebook I can do that with:

    kill -9 40110
    
Notice that 40110 is the PID of the notebook, the same number that was printed when we started it in the background.
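
If you want to practice the PID workflow without touching your notebook, here is a sketch that uses a harmless `sleep` as a stand-in. The variable names `pid` and `state` are made up for the example. It uses plain `kill`, which sends a polite request to exit; `kill -9` can be kept as a fallback for a process that ignores it.

```shell
# Sketch: the PID workflow, using a harmless sleep as a stand-in for the notebook.
sleep 300 &
pid=$!                            # $! holds the PID of the most recent background job
echo "started background job with PID $pid"

kill "$pid"                       # plain kill sends SIGTERM, a request to exit
wait "$pid" 2>/dev/null || true   # collect the job; a killed job reports a non-zero status

# kill -0 sends no signal; it only tests whether the process still exists.
if kill -0 "$pid" 2>/dev/null; then
    state="running"
else
    state="stopped"
fi
echo "process $pid is $state"
```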

**6) Starting a notebook** Play around with all the features of the notebooks that you see. We will work through these together initially. Notice that when you select "File - New Notebook", you can select python to open a new python notebook.

# Now organize your home directory

**1) Make a softlink to your scratch directory in your home**

Softlinks are a great way to easily access files without copying them into a new directory. Copying files uses a lot of unnecessary space, but it is annoying to have to give the full path of a file every time you want to use it. To get around this, we make a softlink: a pointer to the real file or directory that you can put wherever you want and that takes up almost no space. Since we will be using scratch a lot, we are going to make a softlink to that directory in our home.

To make a softlink:

    ln -s sourcefilename destination
    
For example, mine looks something like the following. Remember to replace my username (ecwheele) with your username (ucsd-train##).

    ln -s /oasis/tscc/scratch/ecwheele ~/scratch
    
Check that your softlink worked properly:

    ls -l scratch
    
My output looks like this:

    ecwheele yeo-group 28 Jun 30  2015 scratch -> /oasis/tscc/scratch/ecwheele
    
This is great because now when I want to access a file in my scratch directory, I can use:

    ls ~/scratch/filename.txt
    
Rather than:

    ls /oasis/tscc/scratch/ecwheele/filename.txt
    
If you made a mistake and want to delete a softlink, use the following. This removes only the link, not the file or directory it points to:

    rm -f bad_softlink_filename.txt
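
To try the whole softlink cycle safely, the sketch below works inside a throwaway directory. The names `demo`, `real_scratch`, and `filename.txt` are placeholders for this example only.

```shell
# Sketch: create, inspect, and remove a softlink inside a throwaway directory.
demo="$(mktemp -d)"
mkdir "$demo/real_scratch"
echo "hello" > "$demo/real_scratch/filename.txt"

ln -s "$demo/real_scratch" "$demo/scratch"   # ln -s target link_name
readlink "$demo/scratch"                     # prints the path the link points to
cat "$demo/scratch/filename.txt"             # the file is reachable through the link

rm -f "$demo/scratch"                        # removes the link only
ls "$demo/real_scratch"                      # the original file is still there
```

The last two commands show the important point: deleting the link leaves the real directory and its contents untouched.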
    

**2) Add folders for scripts, processed data, raw data**

Organization is a really difficult thing in computational biology, and everyone has their own preferences on how to organize files. I recommend making at least the following folders in your home, in addition to sub-folders within each directory as we add new projects. Really it doesn't matter how you do this, as long as you are organized and understand your own setup. For the purposes of this class, it is easiest for discussion if we are all operating under the same setup. So make the following directories:

    mkdir ~/projects
    mkdir ~/raw_data
    mkdir ~/processed_data
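
The same folders can also be created in one pass with a loop. This is a sketch: `base_dir` is a made-up variable that defaults to a throwaway directory so the example is safe to try.

```shell
# Sketch: create the three folders in one pass.
# base_dir is a hypothetical variable; it defaults to a throwaway directory so the
# example is safe to run. Set base_dir="$HOME" first to create the real folders.
base_dir="${base_dir:-$(mktemp -d)}"

for name in projects raw_data processed_data; do
    mkdir -p "$base_dir/$name"   # -p: no error if the folder already exists
done

ls "$base_dir"
```

Using `mkdir -p` means the loop can be run again later without complaining about folders that are already there.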