# In this notebook, we will describe how to log onto TSCC, download programs, and set up a basic file structure

**1) Log on to TSCC!**

In your terminal window, type:

    ssh ucsd-train##@tscc-login.sdsc.edu
    
Make sure you replace the "##" with the specific number you have been assigned. You will receive a number assignment from me after you send me the email with a copy of your public key.

For Windows users, open PuTTY and load your saved "tscc" settings along with the appropriate private key.

**2) Download and install anaconda**

Download the Anaconda Python/R package manager using wget (web get). The link below is from the Anaconda downloads [page](https://www.continuum.io/downloads). TSCC runs on Linux, so we will download the Python 2.7 Linux version (make sure to switch to the Linux tab on the downloads page). Right-click on the download button and choose "Copy Link Address". Move to your home directory on TSCC and paste the link address after a wget command.

    cd ~
    
    wget https://repo.continuum.io/archive/Anaconda2-4.4.0-Linux-x86_64.sh
    
Use ls to see that your file is now in the directory:

    ls
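
Before running the installer, it can help to confirm that the download actually produced a non-empty file. The following is a minimal sketch; the helper name `check_download` is hypothetical and not part of any tool.

```shell
# Hypothetical helper: succeed only if the file exists and is not empty.
check_download() {
    if [ -s "$1" ]; then
        echo "found: $1"
    else
        echo "missing or empty: $1"
        return 1
    fi
}

# Installer name taken from the wget command above.
check_download Anaconda2-4.4.0-Linux-x86_64.sh || echo "Re-run the wget command."
```

If the file is reported as missing or empty, the download was probably interrupted and is worth repeating before you continue.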

To install Anaconda, run the shell script with bash (this will take some time). It will ask you a series of questions; use the defaults for all of them (press ENTER for all). Remember to use tab completion to avoid typos!

    bash Anaconda2-4.4.0-Linux-x86_64.sh
    
or
    
    bash Ana<tab>
    
Start pressing ENTER through all the license agreements. Type 'yes' when it asks if you agree to everything.

Next, the installer asks where to install Anaconda. You will get a prompt that looks like this:

    Anaconda2 will now be installed into this location:
    /home/ucsd-train01/anaconda2

      - Press ENTER to confirm the location
      - Press CTRL-C to abort the installation
      - Or specify a different location below

    [/home/ucsd-train01/anaconda2] >>>
    
Press ENTER to confirm the location.

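At the end of the installation, the installer asks whether to add Anaconda to the PATH variable in your ~/.bashrc. This is one of the important questions. The prompt may be preceded by a warning, like this: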

    WARNING:
    You currently have a PYTHONPATH environment variable set. This may cause
    unexpected behavior when running the Python interpreter in Anaconda2.
    For best results, please verify that your PYTHONPATH only points to
    directories of packages that are compatible with the Python interpreter
    in Anaconda2: /home/ucsd-train01/anaconda2
    Do you wish the installer to prepend the Anaconda2 install location
    to PATH in your /home/ucsd-train01/.bashrc ? [yes|no]
    
Type yes to add the path.

**Check that the path was properly added to your .bashrc**

Check that the anaconda bin was added to your path by looking at your bashrc file:

    less ~/.bashrc
    
There should be a line in there that looks something like this located BELOW the line "#User specific aliases and functions":
    
    export PATH=/home/ucsd-train01/anaconda2/bin:$PATH


#### NOTE:  
You can choose whether to prepend or append the directory to your $PATH variable.  

    export PATH=$PATH:/home/ucsd-train01/anaconda2/bin 
    
would put this directory at the end of your $PATH variable, whereas

    export PATH=/home/ucsd-train01/anaconda2/bin:$PATH 
    
would put this directory at the front of your $PATH variable.

**The shell searches the directories in $PATH from front to back and runs the first matching executable it finds, so the order matters if you have multiple versions of a program.**  
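
A small experiment can make the effect of directory order in $PATH concrete. This is only a sketch: the directory names `first` and `second` and the script name `hello` are made up for illustration, and everything happens in a temporary directory.

```shell
# Build two temporary directories, each with its own script named "hello".
tmp=$(mktemp -d)
mkdir "$tmp/first" "$tmp/second"
printf '#!/bin/sh\necho "hello from first"\n' > "$tmp/first/hello"
printf '#!/bin/sh\necho "hello from second"\n' > "$tmp/second/hello"
chmod +x "$tmp/first/hello" "$tmp/second/hello"

# Run in subshells so the PATH of your real session is left unchanged.
first_wins=$( PATH="$tmp/first:$tmp/second:$PATH"; hello )
second_wins=$( PATH="$tmp/second:$tmp/first:$PATH"; hello )

echo "$first_wins"    # the copy in "first" ran
echo "$second_wins"   # the copy in "second" ran

rm -rf "$tmp"
```

The same command name runs a different file depending only on which directory comes earlier in $PATH. That is why the Anaconda bin directory needs to sit in front of the system directories for `python` to resolve to the Anaconda version.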



If the export line is in your ~/.bashrc, then success! To activate the changes, source your .bashrc:

    source ~/.bashrc
    
If it is not in your ~/.bashrc, you need to put it in there. Do so by copying the following line of code into your .bashrc file:

    export PATH=/home/ucsd-train##/anaconda2/bin:$PATH
    
Remember to change the ## into your number. Open your ~/.bashrc with vi, then press i to enter insert mode:

    vi ~/.bashrc
    i 

**Use the arrow keys to move around the file and paste the following line BELOW "#User specific aliases and functions":** 

    export PATH=/home/ucsd-train##/anaconda2/bin:$PATH

Then press ESC to leave insert mode, type :wq to save and quit, and source the file:

    esc
    :wq
    
    source ~/.bashrc
    
The file is automatically sourced when you log in to TSCC, but since we have not logged out and back in since installing Anaconda, we need to source it manually.
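
If you would rather not edit the file by hand, the line can also be appended from the command line. The sketch below is one possible approach, not the method used in class: it works on a temporary stand-in file instead of your real ~/.bashrc, the helper `add_line_once` is hypothetical, and it assumes Anaconda sits in the default location, written here as `$HOME/anaconda2`.

```shell
# Stand-in for ~/.bashrc so this sketch is safe to run anywhere.
rc_file=$(mktemp)
printf '# User specific aliases and functions\n' > "$rc_file"

# Single quotes keep $HOME and $PATH unexpanded, so they are written literally.
path_line='export PATH=$HOME/anaconda2/bin:$PATH'

# Hypothetical helper: append a line only if the file lacks an identical one.
add_line_once() {
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

add_line_once "$rc_file" "$path_line"
add_line_once "$rc_file" "$path_line"   # second call changes nothing

line_count=$(grep -cxF -- "$path_line" "$rc_file")
echo "matching lines: $line_count"
```

Checking with grep before appending keeps the file from collecting duplicate export lines if the step is repeated. To apply the idea for real, you would point `rc_file` at ~/.bashrc instead of a temporary file.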

Check that it is working properly by searching for python:

    which python
    
The output should look something like:

    ~/anaconda2/bin/python
    
What python version are you running?

    python --version
    
    
My output looks like:
    
    Python 2.7.13 :: Anaconda 4.4.0 (64-bit)
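
The same check can be scripted. This is a rough sketch with a hypothetical helper, `classify_python`, which only looks at the path it is given and assumes the default install directory name `anaconda2`.

```shell
# Hypothetical helper: describe where a python path points.
classify_python() {
    case "$1" in
        */anaconda2/bin/python) echo "anaconda" ;;
        "")                     echo "missing" ;;
        *)                      echo "other" ;;
    esac
}

# command -v prints the path the shell would run, or nothing if none is found.
classify_python "$(command -v python || true)"
```

A result of "other" usually means a system python is found first, so it is worth re-checking the order of directories in your $PATH.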
    

# Download some more useful programs with Anaconda

**Conda, bioconda, and pip to install programs**

Anaconda is really nice because it automates all the installations and downloads for us, rather than having to download the source code and install each one manually. Conda is the preferred installation method, but not all programs are available with this method. Some things that are not available in conda's default channel are available in the bioconda channel. And finally, if a program isn't available in either, you can try pip. As a general rule of thumb, try conda first, then bioconda, then pip. If it is not available with any of those, then you have to download the source code and follow the specific installation instructions provided in the README document for the software. You can also search Google with those specific keywords (conda, bioconda, pip) to figure out how to install packages.

    conda install package_name
    
    conda install -c bioconda package_name
    
    pip install package_name
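
The rule of thumb (conda, then bioconda, then pip) can be written as a small shell function. This is a sketch only: `install_with_fallback` is a hypothetical name, and the `-y` flag is added so conda does not stop to ask for confirmation.

```shell
# Hypothetical helper: try each installer in order until one succeeds.
install_with_fallback() {
    pkg=$1
    if conda install -y "$pkg"; then
        echo "installed $pkg with conda"
    elif conda install -y -c bioconda "$pkg"; then
        echo "installed $pkg with bioconda"
    elif pip install "$pkg"; then
        echo "installed $pkg with pip"
    else
        echo "could not install $pkg; check its README" >&2
        return 1
    fi
}

# Example call, commented out so that nothing is installed by accident:
# install_with_fallback package_name
```

Each branch runs only if the previous installer failed, which mirrors the order you would follow when typing the commands by hand.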
    
We are going to use the following programs, so install them all from the bioconda channel:

    conda install -c bioconda STAR
    
    conda install -c bioconda fastqc
    
    conda install -c conda-forge -c bioconda samtools bzip2
    
For all of these, accept the defaults on what will be automatically upgraded (yes to all).

You can check that each program installed properly by searching for the executable with which:

    which STAR
    
    which fastqc
    
    which samtools
    
**Why did it find these executables in your ~/anaconda2/bin/ folder?**

Because you added it to your path of places to search!
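
To check several programs at once, a short loop works too. The helper `check_tools` below is hypothetical; it uses the shell builtin `command -v`, which reports where an executable would be found on your $PATH.

```shell
# Hypothetical helper: report the location of each tool, fail if any is missing.
check_tools() {
    status=0
    for tool in "$@"; do
        if tool_path=$(command -v "$tool"); then
            echo "$tool -> $tool_path"
        else
            echo "$tool -> not found on PATH"
            status=1
        fi
    done
    return "$status"
}

check_tools STAR fastqc samtools || echo "Install the missing programs, then check again."
```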

*NOTE - if you later get an error message that a specific package cannot be found when trying to run a program, use one of these methods to install that package and then try to run your program again.*

    

# Now organize your home directory

**1) Make a softlink to your scratch directory in your home**

Softlinks (also called symbolic links) are a great way to easily access files without copying them into a new directory. Copying files uses a lot of unnecessary space, but it is annoying to give the full path of a file every time you want to use it. To get around this, we make a softlink: a pointer to the real file or directory that you can put wherever you want and that takes up almost no space. Since we will be using scratch a lot, we are going to make a softlink to that directory in our home.

To make a softlink:

    ln -s sourcefilename destination
    
For example, the command would look like the following. Remember to replace the ## with your ucsd-train number.

    ln -s /oasis/tscc/scratch/ucsd-train## ~/scratch
    
Check that your softlink worked properly:

    ls -l scratch
    
My output looks like this:

    ucsd-train01 biom262-group  32 Sep 16 11:36 scratch -> /oasis/tscc/scratch/ucsd-train01
    
This is great because now when I want to access a file in my scratch directory, I can use:

    ls ~/scratch/filename.txt
    
Rather than:

    ls /oasis/tscc/scratch/ucsd-train01/filename.txt
    
If you messed up and want to delete a softlink (this removes only the link, not the file or directory it points to), use:

    rm -f bad_softlink_filename.txt
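
To see how softlinks behave without touching your real scratch directory, you can try them in a temporary directory first. In this sketch the names `real_scratch` and `scratch` are stand-ins chosen for illustration.

```shell
# Work inside a throwaway directory.
tmp=$(mktemp -d)
mkdir "$tmp/real_scratch"
echo "hello" > "$tmp/real_scratch/filename.txt"

# Create the softlink: ln -s <existing target> <name of the new link>
ln -s "$tmp/real_scratch" "$tmp/scratch"

# The link points at the real directory, and files are readable through it.
link_target=$(readlink "$tmp/scratch")
contents=$(cat "$tmp/scratch/filename.txt")
echo "$link_target"
echo "$contents"

# Removing the link leaves the real directory and its files in place.
rm "$tmp/scratch"
if [ -f "$tmp/real_scratch/filename.txt" ]; then survived=yes; else survived=no; fi
echo "original file still present: $survived"

rm -rf "$tmp"
```

Note that the link is removed by name with no trailing slash.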
    
**Now try it on your own**

Make another softlink to our shared class folder. The full path of this folder is:

    /oasis/tscc/scratch/biom200/
    
Make a softlink to this folder in your home directory and call it biom200_shared.

If you do this correctly, then running the following in your home directory:

    ls -l
    
should produce output that includes a line like:

    ucsd-train01 biom262-group  28 Sep 21 11:05 biom200_shared -> /oasis/tscc/scratch/biom200/

**2) Add folders for scripts, processed data, raw data**

Organization is a really difficult thing in computational biology, and everyone has their own preferences on how to organize files. I recommend making at least these two folders in your home to start. As you add new projects, consider adding each project as a directory within these folders. Really it doesn't matter how you do this, as long as you are organized and understand your own setup. For the purposes of this class, it is easiest for discussion if we are all operating under the same setup. So make the following directories:

    mkdir ~/projects
    mkdir ~/raw_data
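
When you later add a project, `mkdir -p` can create the nested directories in one step. The sketch below uses a temporary directory as a stand-in for your home, and `rnaseq_demo` is a hypothetical project name.

```shell
# Stand-in for your home directory so the sketch does not touch real files.
demo_home=$(mktemp -d)

# "rnaseq_demo" is a hypothetical project name.
project="rnaseq_demo"

# -p creates missing parent directories and does not complain if they exist.
mkdir -p "$demo_home/projects/$project" "$demo_home/raw_data/$project"
mkdir -p "$demo_home/projects/$project"   # safe to repeat

ls "$demo_home/projects" "$demo_home/raw_data"
```

In your own account you would replace `$demo_home` with `~` and pick a project name that means something to you.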