# How to submit jobs to the cluster and receive the output

**Why submit jobs?**

When you log in to TSCC you are automatically placed on the **head node**, also called the **login node**. The resources dedicated to the login node are enough for basic commands like making files and moving things around, but not enough to power computationally intensive jobs like aligning reads to a genome or quantifying gene expression. To do these things we will ask for more resources and submit that request as a job to the cluster.


**Submission script**

To submit a job, we will use PBS flags to tell the supercomputer how many resources we need and how to keep track of our job. All the details of this are [here](http://www.sdsc.edu/support/user_guides/tscc-quick-start.html) but we will go over the most important points together. The first lines of your submission script contain all the parameters the computer needs to know how to run your job. 

    #!/bin/bash
    #PBS -q hotel
    #PBS -N jobname
    #PBS -l nodes=1:ppn=8
    #PBS -l walltime=1:00:00
    #PBS -o outputfile
    #PBS -e errorfile

-q: this is the submission queue where the job will be sent. You have access to hotel and condo. 

-N: give your job a unique name so you can track it!

-l: Request nodes, processors per node, and walltime. This will vary depending on the program you are running. You can request up to 16 processors per node. It is good practice to request as many processors as you can on one node before requesting multiple nodes. The walltime is approximately how long you think your job will take to run. Overestimate this value, because if your job is not finished when the walltime ends, the scheduler will kill it in the middle. If your job finishes before the walltime ends, it simply terminates and sends you the output; it does not wait for the rest of the walltime.

-o: name of the job output file that you get from this run. I find it helpful to use the same name as the jobname with a .out extension. More on this later.

-e: name of the job errorfile. Again, I like to use the same name as the jobname with a .err extension. 
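
As a rough sketch of how these flags fit together, the header can be generated from a few shell variables instead of being typed by hand each time. The function name `make_pbs_header` and its argument order are hypothetical, made up for this illustration; only the `#PBS` flags themselves come from the example above, and the sketch assumes a single-node job.

```shell
#!/bin/bash
# Hypothetical helper (not part of PBS or TSCC): print a PBS header
# for a job that runs on a single node.
# Usage: make_pbs_header <jobname> <ppn> <walltime> [queue]
make_pbs_header() {
    local jobname="$1" ppn="$2" walltime="$3" queue="${4:-hotel}"
    cat <<EOF
#!/bin/bash
#PBS -q ${queue}
#PBS -N ${jobname}
#PBS -l nodes=1:ppn=${ppn}
#PBS -l walltime=${walltime}
#PBS -o ${jobname}.out
#PBS -e ${jobname}.err
EOF
}

# Print a header for an 8-processor, 1-hour job in the hotel queue.
make_pbs_header jobname 8 1:00:00
```

Naming the .out and .err files after the jobname, as done here, follows the convention suggested above and keeps the three pieces easy to match up later.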

**How to run a job**

Let's practice submitting a command as a job. For example, we are going to make a job that will write the phrase "I successfully submitted a job" to a file. First, let's make a new folder to work in. In your home directory, make a new folder called job_submission_test.

    cd ~
    mkdir job_submission_test

The command to write these words to a file is as follows. Try it out on the login node (your current command prompt) to see that it works. 

    echo "I successfully submitted a job" > ~/job_submission_test/test.txt
    
Did it work? Move into that directory and check out the file:

    cd job_submission_test
    ls
    
    less test.txt
    
Great! We made a file. Now let's write a script to submit as a job that will do the same thing. Edit a new file called test_submission.sh (.sh is the suffix used for bash scripts) and put your submission parameters and command inside it. Ask for 1 node, 1 processor per node, and 5 minutes of walltime (this is a tiny job). Give your job a meaningful name, output file, and error file. Make sure you know what directory you are in and where these things are going. I want to put the script in the job_submission_test folder, so I am first going to check that I am there with pwd (print working directory).

    pwd
    
    /home/ucsd-train01/job_submission_test
    
Great! I'm where I want to be. Now let's make the file:

    vi test_submission.sh
    i 
    
    #!/bin/bash
    #PBS -q hotel
    #PBS -N test_job_submission
    #PBS -l nodes=1:ppn=1
    #PBS -l walltime=00:05:00
    #PBS -o test_job_submission.out
    #PBS -e test_job_submission.err

    echo "I successfully submitted a job" > ~/job_submission_test/test_submission.txt
    
    esc
    :wq
    
Notice I gave this file a different name (test_submission.txt) so that I can compare it to the file we made on the head node and see if the contents are the same.

After you have saved your file, submit it with qsub!

    qsub test_submission.sh
    
You will get a confirmation that the job was submitted. Mine looks like this:

    9687451.tscc-mgr.local
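
If you want to reuse the job ID later (for example with qstat or qdel), it can help to pull the numeric part out of the confirmation string. The sketch below uses the confirmation shown above as a literal value so that it runs anywhere; on the cluster you could capture it with `job_id=$(qsub test_submission.sh)` instead. The variable names are my own choice.

```shell
#!/bin/bash
# Confirmation string copied from the example above. On the cluster you
# could capture it directly with: job_id=$(qsub test_submission.sh)
job_id="9687451.tscc-mgr.local"

# Strip everything from the first dot onward to keep only the number.
job_number="${job_id%%.*}"

echo "Full job ID: ${job_id}"
echo "Job number:  ${job_number}"
```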
    
**Check the status of the job**

You can check the status of your job with qstat and your username.

    qstat -u ucsd-train##

Mine looks like this:

    Job ID                  Username    Queue    Jobname          SessID  NDS   TSK   Memory   Time    S   Time
    ----------------------- ----------- -------- ---------------- ------ ----- ------ ------ --------- - ---------
    9687451.tscc-mgr.local  ucsd-train0 hotel    test_job_submiss      0     1      1    --   00:05:00 R  00:00:51
    
Check out the "S" column. This is your status. R means running, Q means it is sitting in the queue, C means the job is complete. 
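
If you only care about the status letter, you can pick it out of the qstat line with awk. This is a minimal sketch that assumes the column layout shown above, where the status is the 10th whitespace-separated field; other qstat output formats may put it elsewhere. The job line is pasted in as a literal so the example runs without the scheduler.

```shell
#!/bin/bash
# One job line copied from the qstat output above. On the cluster this
# would come from: qstat -u <your username>
qstat_line='9687451.tscc-mgr.local  ucsd-train0 hotel    test_job_submiss      0     1      1    --   00:05:00 R  00:00:51'

# In this layout the status letter is the 10th whitespace-separated field.
status="$(printf '%s\n' "$qstat_line" | awk '{ print $10 }')"

# Translate the letter into the meanings described above.
case "$status" in
    R) meaning="running" ;;
    Q) meaning="waiting in the queue" ;;
    C) meaning="complete" ;;
    *) meaning="some other state" ;;
esac

echo "Job status: ${status} (${meaning})"
```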

If you want to delete the job you can use qdel followed by the JOBID listed next to your job. So to delete this job the command would be as follows. I am not actually going to run it though, because I want my job to finish!

    qdel 9687451
    
**Job outputs, outfile, errorfile**

There are 3 key outputs that you should look for after submitting a job. 

1) The output of your script. What command did you include in the job? Did that command execute properly and give you the output you expected? In this example, you are looking for the test_submission.txt file. 

2) The job .out file. What did you call this file? Where did it end up? This file will automatically go into the directory where you submitted the job and be called whatever you named it with the -o flag in your script. Sometimes when a command is running, it will automatically print updates to your screen on how the job is progressing. These get saved in this file. This can be useful later on for debugging, to find out how far your command progressed before dying. Take a look in the file. With this example, it will only print the nodes that were used for processing.

3) The job .err file. The same rules apply as for the .out file: you should find it in the directory where you submitted your job. This file contains any error messages produced by your script. If you don't get the output you expected from your command, this is a good place to read the error messages that tell you what went wrong. Take a look inside the file from this job. Is there anything in it? If not, great! Your command most likely ran successfully.
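
The three checks above can be sketched as a small shell function. The name `check_job_outputs` is hypothetical, and the file names in the example call are the ones used in this tutorial; adjust them to whatever you chose in your own script.

```shell
#!/bin/bash
# Hypothetical helper: report on the three outputs of a finished job.
# Usage: check_job_outputs <result file> <.out file> <.err file>
check_job_outputs() {
    local result="$1" outfile="$2" errfile="$3"

    # 1) Did the command produce the file we expected?
    if [ -f "$result" ]; then
        echo "result: found"
    else
        echo "result: missing"
    fi

    # 2) Was the job .out file written?
    if [ -f "$outfile" ]; then
        echo "out: found"
    else
        echo "out: missing"
    fi

    # 3) -s is true only when the file exists and is not empty.
    if [ -s "$errfile" ]; then
        echo "err: has messages"
    elif [ -f "$errfile" ]; then
        echo "err: empty"
    else
        echo "err: missing"
    fi
}

# Example call, run from the directory where the job was submitted.
check_job_outputs test_submission.txt test_job_submission.out test_job_submission.err
```

An empty .err file is a good sign but not proof of success, so it is still worth opening the result file and checking that it contains what you expected.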


**Congratulations!! You are now an expert on submitting jobs to the supercomputer.**

# Interactive Jobs

Submitting jobs is nice because you can send your command away and the computer will do its job and then let you know when it's done. No babysitting required! But sometimes you want to work interactively. You don't want the job to be hidden from you until it is done. You want the output in real time and the flexibility to make changes on your command line. But you still need more resources than are available on the head node. To do this, you can request an interactive job that will give you the compute resources you want, but let you work directly on the command line. 

**Requesting an interactive job**

You need to specify the queue, nodes, processors per node, and walltime for interactive jobs. The command is as follows:

    qsub -I -l nodes=1:ppn=1 -l walltime=1:00:00 -q hotel
    
In this example, I am requesting 1 node and 1 processor for 1 hour. An interactive job runs for exactly the length of time you request unless you quit it early. Try running the command to get on an interactive node. It will let you know when it is ready:

    qsub: waiting for job 9687582.tscc-mgr.local to start
    qsub: job 9687582.tscc-mgr.local ready
    
Notice that your command prompt changed. Instead of showing the login node:

    [ucsd-train01@tscc-login1 ~]$
    
It now shows the compute node you were assigned:

    [ucsd-train01@tscc-2-55 ~]$
    
You can check the status of this job with qstat, the same as before:

    qstat -u ucsd-train01
    
My output looks like this:

    tscc-mgr.local: 
                                                                                      Req'd    Req'd       Elap
    Job ID                  Username    Queue    Jobname          SessID  NDS   TSK   Memory   Time    S   Time
    ----------------------- ----------- -------- ---------------- ------ ----- ------ ------ --------- - ---------
    9687582.tscc-mgr.local  ucsd-train0 hotel    STDIN             31418     1      1    --   01:00:00 R  00:01:56
    
Notice that the interactive job has been running for 1 minute and 56 seconds. When I am finished using this job, I can get out of it by typing exit on the command line:

    exit
    
You will get this message letting you know the job shut down:

    logout

    qsub: job 9687582.tscc-mgr.local completed
    
It is wasteful to hold resources in an interactive job that you aren't using, so make sure you exit when you are finished. An interactive job can be a great tool for debugging a command that you have never used before, because you get the output in real time and don't have to wait for a job submission to send the reports back to you in .out and .err files.
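
If you lose track of whether your current shell is on the login node or inside a job, the prompt is the quickest clue. As a hedged alternative, the sketch below checks the `PBS_ENVIRONMENT` variable, which I am assuming the scheduler sets to `PBS_INTERACTIVE` or `PBS_BATCH` inside a job (this is the documented TORQUE behavior, but confirm it on your cluster with `echo $PBS_ENVIRONMENT`). The function name `where_am_i` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: describe where the current shell is running,
# based on the PBS_ENVIRONMENT variable (assumed to be set by the scheduler).
where_am_i() {
    case "${PBS_ENVIRONMENT:-}" in
        PBS_INTERACTIVE) echo "interactive job" ;;
        PBS_BATCH)       echo "batch job" ;;
        *)               echo "not inside a job (probably the login node)" ;;
    esac
}

where_am_i
```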