# DAY 2

Today we will be looking at:

1. Structure of the UNAM HPC
2. PBS Job Scheduler
3. Creating a PBS script
4. Job scheduling and submission
5. Controlling and monitoring jobs
6. Job outputs

### TIPS

* Have a terminal open and logged into the UHPC.

## 1. Structure of UHPC
<img src="structure.png" style="width:401px">

### Some terminology

* Login (manager and storage) node: Refers to the computer we access via the internet to reach the cluster. This computer manages the compute nodes as well as serving as a storage server for our files.
* Compute (worker) node: individual computer (housed in a rack), responsible for performing computations in a cluster. Typically has a couple of CPUs, onboard RAM and runs its own instance of an operating system.
* HPC/Cluster/Supercomputer: A set of individual compute nodes connected together via a computer network and in some aspects viewed as a single computer. 
* CPU (processor): integrated electronic circuitry within a computer that executes instructions that make up a computer program. The CPU performs basic arithmetic, logic, controlling, and input/output (I/O) operations specified by the instructions in the program.
* CPU Core: a single processing unit within a multi-core CPU. CPUs can have multiple cores, each of which can independently execute different instructions.
* Walltime: the estimated amount of real (wall-clock) time the job might need on the cluster.


### After logging on to the HPC you should see a message such as this:


<img src="greeting.png" style="width:400px">

## 2. PBS Job Scheduler

* A job scheduler on an HPC is software that manages how, when and where processes (jobs) run on the cluster.
* This ensures that jobs run at their highest performance without overloading a compute node, while giving each user's job a chance to be processed "fairly".


![](job_scheduler.svg)

### The Portable Batch System (PBS)

* The UHPC uses PBS for job scheduling.
* PBS is responsible for allocating computational tasks, i.e., jobs, among the available computing resources (nodes).
* PBS takes a set of commands in a script and runs them on a node.

## 3. Creating a PBS Script

### Simplest form of a PBS script.

Save the text below in a text file called "submission"

In [None]:
nano submission

In [None]:
#!/bin/bash

echo "Hello World"

Line 1: Defines the interpreter (here the bash shell) used to run the script <br>
Line 3: Prints the words "Hello World" on the console.

### qsub

Now that the "submission" file has been created, we can submit it to the job scheduler, which will find an available node to process the job. <br>

To do this we use the __*qsub*__ command

In [None]:
qsub submission

<xxxx\>.uhpc.unam.na

*xxxx* = job number automatically assigned by PBS
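The job number can be separated from the rest of the identifier with plain shell parameter expansion. This is a minimal sketch using a made-up identifier (`1234` is a placeholder, not a real job):

```shell
# Hypothetical job identifier, in the form printed by qsub.
job_id="1234.uhpc.unam.na"

# Strip everything from the first dot onwards to keep only the job number.
job_number="${job_id%%.*}"

echo "Job number: $job_number"
```

On the cluster the identifier could be captured directly with `job_id=$(qsub submission)`, since qsub prints it to standard output.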

If this file were executed on the head node, where we are logged in, we would expect the output:

"Hello World"

### So why did we not see anything?

Since the script was submitted to the job scheduler, it was processed on a compute node, and we "can't" interactively see what is happening there.<br>

Therefore PBS produces an __OUTPUT__ and an __ERROR__ file. Let us look at these.

In [None]:
ls

submission &emsp; submission.e<xxxx\> &emsp; submission.o<xxxx\>

In [None]:
cat submission.o<xxxx>

Hello World

So the file was processed on a node and the output was placed into the output file. If there had been any errors, they would have gone into the error file.
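The split into an output file and an error file mirrors the two standard streams every shell program has: stdout and stderr. The sketch below reproduces the idea locally, without PBS. The file names `demo.out` and `demo.err` are hypothetical stand-ins for the `.o` and `.e` files.

```shell
# Work in a throwaway directory so nothing in the current directory is touched.
demo_dir=$(mktemp -d)

# A tiny script that writes one line to stdout and one line to stderr.
cat > "$demo_dir/demo.sh" <<'EOF'
#!/bin/bash
echo "Hello World"
echo "something went wrong" >&2
EOF

# Send stdout and stderr to separate files, much as PBS does for a job.
bash "$demo_dir/demo.sh" > "$demo_dir/demo.out" 2> "$demo_dir/demo.err"

cat "$demo_dir/demo.out"   # the normal output
cat "$demo_dir/demo.err"   # the error message
```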


We have seen the simplest form of a PBS script; however, there are quite a few details we may want to give PBS so that it knows how to schedule our job. This is done with a number of PBS-specific "directives" placed in our submission file.



__Format of the PBS file__

In [None]:
#!/bin/bash

PBS directives

bash command

Some PBS directives


| Directive | Description |
| --- | --- |
| #PBS -S /bin/bash | Sets the shell in which the job will be executed on the compute node |
| #PBS -l nodes=N:ppn=M | Requests N nodes with M processors per node |
| #PBS -l walltime=HH:MM:SS | Requests the amount of time the job will need on the cluster |
| #PBS -N JobName | Specifies the job name in the script |
| #PBS -o stdout_file | Specifies the output file for the job |
| #PBS -e stderr_file | Specifies file to store errors for the job |
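The `HH:MM:SS` walltime format is simply hours, minutes and seconds. As a sanity check on a request, the value can be converted to seconds. The helper name `walltime_to_seconds` is made up for this sketch and is not part of PBS.

```shell
# Convert an HH:MM:SS walltime string to a number of seconds.
# awk treats each field as a decimal number, so leading zeros are harmless.
walltime_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

walltime_to_seconds "2:00:00"    # two hours
walltime_to_seconds "00:30:15"   # thirty minutes and fifteen seconds
```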

So let us edit our current submission file to give PBS some more details.

In [None]:
nano submission

In [None]:
#!/bin/bash

#PBS -N Hello
#PBS -o output.txt
#PBS -e error.txt
#PBS -l walltime=2:00:00

echo "Hello World"
touch text.txt
echo "Hello text file" > text.txt


In [None]:
qsub submission

<xxxx\>.uhpc.unam.na

Now we have specified quite a number of things to PBS.

* Line 3: The name of our Job
* Line 4: The output file we would like to use
* Line 5: The error file we would like to use
* Line 6: The maximum time this job would take.

We also do a little more than just output some text.

* Line 8: Output text like before
* Line 9: create a text.txt file
* Line 10: echo some text and write it to the text.txt file (`>` replaces the file's contents; `>>` would _append_ instead)
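The two redirection operators, `>` and `>>`, are easy to confuse. The following sketch can be run anywhere (no PBS needed) to see the difference. It uses a temporary file so that no existing file is touched.

```shell
demo_file=$(mktemp)

echo "first line"  > "$demo_file"   # > replaces whatever the file contained
echo "second line" > "$demo_file"   # the first line is gone now
cat "$demo_file"

echo "third line" >> "$demo_file"   # >> adds to the end of the file
cat "$demo_file"
```

Because `>` creates the file if it does not exist, the `touch` on line 9 of the submission script is not strictly required.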

In [None]:
ls

error.txt &emsp; output.txt &emsp; submission &emsp; text.txt

Since we have specified the output file and the error file, these two files have been created.

In [None]:
cat output.txt

Hello World

In [None]:
cat text.txt

Hello text file

## 4. Job Scheduling and Submission

### A few things to keep in mind when writing your script

1. The script will behave as if you just logged into the cluster and are starting to do some work.
    - Thus the directory you are in by default is the __home__ directory.
    - It is a common mistake to think you will be in the directory you were in when you submitted the job.
2. All outputs go into the present working directory unless otherwise specified.

    PBS has a solution to this...

### PBS variables

PBS sets a number of variables every time a job is submitted.

| variable | Description |
| --- | --- |
| $PBS_JOBNAME | contains the job name supplied by the user |
| $PBS_O_WORKDIR | the absolute path of the current working directory of the qsub command |

To name a few.

With this information, we can now use $PBS_O_WORKDIR inside the bash script to go to the directory from which we submitted the job.

In [None]:
cd $PBS_O_WORKDIR
echo $PBS_JOBNAME > text.txt

* Line 1: change directory to the directory from which I submitted __this__ job
* Line 2: create a text file by the name _text.txt_ and __echo__ the name of __this__ job there.
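These variables only exist inside a PBS job. If the same script may also be run directly for testing, one possible pattern is to fall back to a default when a variable is missing. The sketch below assumes that approach. The placeholder job name `interactive` is arbitrary.

```shell
# Only for this demo: start in a throwaway directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

# Use the submission directory when PBS provides it; otherwise stay put.
# ${VAR:-default} substitutes the default if VAR is unset or empty.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

# Same idea for the job name.
echo "${PBS_JOBNAME:-interactive}" > text.txt
cat text.txt
```

Quoting the variable, as in `"$PBS_O_WORKDIR"`, also keeps the `cd` working when the directory name contains spaces.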

## 5. Controlling and Monitoring Jobs

It is necessary for us to manage our job once it is running.

### A few useful PBS commands 

| command | Description |
| --- | --- |
| qsub | Submit a job |
| qdel | Delete a batch job |
| qsig | Send a signal to batch job |
| qhold | Hold a batch job |
| qrls | Release held jobs |
| qrerun | Rerun a batch job |
| qmove | Move a batch job to another queue |
| qstat | See the status of jobs |



### qstat

This command lists the jobs that are currently running. With the right options, it can also list jobs that have finished running.



In [None]:
qstat -fax

In [None]:
uhpc.unam.na: 
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
7532.uhpc.unam. alimbo   workq    MyTestJob   32509   5  10    --    --  F 00:00
7533.uhpc.unam. daniels  workq    sub         24137   1   1    --    --  F 00:00
7534.uhpc.unam. jshapopi workq    submission   6749   1   1    --    --  F 00:00
7535.uhpc.unam. test2    workq    submission  14016   1   1    --    --  F 00:00
7536.uhpc.unam. jshapopi workq    tester      14164   1   1    --    --  F 00:00
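On a busy cluster this listing gets long, so it helps to filter it. The sketch below picks out one user's jobs with awk, using sample lines copied from the listing above in place of live qstat output. The column positions are an assumption based on that listing.

```shell
# Sample lines copied from the listing above; on the cluster this text would
# come from qstat itself rather than from a here-document.
sample_output=$(cat <<'EOF'
7532.uhpc.unam. alimbo   workq    MyTestJob   32509   5  10    --    --  F 00:00
7533.uhpc.unam. daniels  workq    sub         24137   1   1    --    --  F 00:00
7534.uhpc.unam. jshapopi workq    submission   6749   1   1    --    --  F 00:00
7535.uhpc.unam. test2    workq    submission  14016   1   1    --    --  F 00:00
7536.uhpc.unam. jshapopi workq    tester      14164   1   1    --    --  F 00:00
EOF
)

# Keep the job ID, job name and state (columns 1, 4 and 10) for one user.
my_jobs=$(echo "$sample_output" | awk -v user="jshapopi" '$2 == user { print $1, $4, $10 }')
echo "$my_jobs"
```

qstat also has a `-u` option for selecting jobs by user name, which is usually the simpler route. The awk version is handy when other columns need to be matched too.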


### qdel

This command simply deletes a running job, identified by its job number.



In [None]:
qdel <xxxx>

## 6. Outputs

We have now discussed outputs quite a lot, so let us run a job using everything we have learned.

Yesterday you were asked to produce the following directory structure:

<img src="structure3.png" style="width:700px">

Create a job1.sh file in the Jobs directory with the following contents:

In [None]:
#!/bin/sh

#PBS -N tester
#PBS -e /home/<username_here>/Documents/Jobs/error.txt
#PBS -o /home/<username_here>/Documents/Jobs/output.txt


export PBS_O_HOME=$PBS_O_WORKDIR
cd $PBS_O_WORKDIR
SERVER=$PBS_O_HOST
WORKDIR=/scratch/PBS_$PBS_JOBID
PERMDIR=${HOME}

SERVPERMDIR=${PBS_O_HOST}:${PERMDIR}

echo server is $SERVER
echo workdir is $WORKDIR
echo permdir is $PERMDIR
echo servpermdir is $SERVPERMDIR
echo working is $PBS_O_WORKDIR
echo home is $PBS_O_HOME

touch one
touch two
touch three
python /home/<username_here>/Documents/Code/Python/hello.py

### Important lines description

* Line 1-5: Prepare the environment and tell PBS where to put the output and error files
* Line 9: Go to the directory the job was submitted from (the Jobs directory)
* Line 10-14: Define some variables
* Line 16-21: Echo some things.
* Line 23-25: Create 3 empty files
* Line 26: run the python code

Create a Python file called hello.py in the Python directory with the following contents:

In [None]:
print("Hello World")
text = "This is some text I want to output to a file"
fi = open("python_text.txt", "w")
fi.write(text)
fi.close()

* Line 1: Print Hello World to console
* Line 2: create a variable with some text
* Line 3: open a text file for writing
* Line 4: write the variable in line 2 to the file opened
* Line 5: close the file

Now go back to the Jobs directory and submit the job1.sh

In [None]:
qsub job1.sh

## Pop Quiz

1. Where will all the outputs go?
2. Why was it necessary to give the full path of the python file on line 26 of the job1.sh file?
3. Do _qsub job1.sh_ and _qsub /home/<username_here>/Documents/Jobs/job1.sh_ have the same output?
4. What will be different?


# Advanced Content

1. The bashrc file
2. How to install programs
3. _rsync_

## 1. The bashrc file

There is a file in all our home directories called the bashrc file. This file is named _.bashrc_. The leading dot means that it is hidden and cannot be seen unless you explicitly tell the _ls_ command to show hidden files.

In [None]:
ls -la

.bashrc &emsp; ...

The .bashrc file is executed every time someone logs in. It is therefore a useful place to put the things you would like done automatically at login. For many people this file can become quite large.

### Common things you may put in a .bashrc file

1. aliases: these are shortcuts to otherwise very long commands
    - e.g. _alias jobs='qstat -r -wf | grep Job_Name'_
    - This command lists all the active jobs by jobname
    - This is long and I am lazy, so I make an alias (shortcut)
    - Now when I type _jobs_ I will get what I desire
    - You can also use these if there are commands you easily forget.
2. Environment variables:
    - If you would like to store some variables or change a default variable every time you log in, this would be the place to do it.
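As an illustration of both kinds of entry, here is a small fragment that could be added to a .bashrc file. The alias name `ll` and the variable `MY_PROJECT_DIR` are hypothetical, chosen only for this sketch.

```shell
# --- possible additions to ~/.bashrc ---

# Alias: a short name for a longer command.
alias ll='ls -la'

# Environment variable: visible to every program started from this shell.
# The name and the path are placeholders for illustration.
export MY_PROJECT_DIR="$HOME/Documents"

# Show what was defined.
alias ll
echo "$MY_PROJECT_DIR"
```

After editing the file, run `source ~/.bashrc` (or log in again) for the changes to take effect.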

#### Basically you can use this file for just about anything.



## 2. How to install programs

Most basic programs used in science are installed on the HPC, but if something is not installed you have two options:

1. Compile it from source code (This can get technical)
2. Ask your closest system admin (Myself, Anton or Hiiko)

## 3. _rsync_

_rsync_ is another way of copying files, most commonly used for large file transfers.

* _rsync_ is used to synchronize data between your local computer and the HPC.
* It does the job in a smart way: only files that are new or have changed are transferred.
* It can synchronize two directories.
* It works similarly to scp.
    - Specifying the port is a little different.
    
**Synopsis**: <em>rsync -e "ssh -p 1510" <source\> <destination\></em>