# Running Python Jobs on OSC
The jupyter environment that we have been using for the past 2 months is a powerful platform for testing code and exploring datasets and tools, but it is not the right platform for a production environment.   In addition, once we start analyzing very large datasets that will necessarily require long processing times, it may be more sensible to use straight python scripts.   When running such scripts on a farm-processing system such as OSC with potentially hundreds of simultaneous users, there is an additional requirement to use a **scheduling system** to handle the running of each person's **job**.   This notebook will walk us through each of these situations.

In this notebook, I will give you instructions showing you:
1.  How to build a simple python script based on work we have already done in previous jupyter notebooks.
2.  How to run that job interactively from the **command line**.   Note that this should only be done for **very, very** short jobs.
3.  How to build a **bash** script to run that python script using the **pbs** batch submission system on OSC.
4.  How to utilize the gpu capabilities of the OSC system.

# Getting started
You should already have a jupyter session started on the OSC Pitzer system.   We will need to use this **as well as** a separate terminal window for **shell** access.   The terminal window will give us the ability to use the **command line**.   We can get shell access in one of two ways:
1.  Go to your OnDemand dashboard, select the **Clusters** tab along the top, then select **>_Pitzer Shell Access**.   This will open a tab in your browser which is connected directly to your **/home** file system on OSC.   **This is the option I will assume for the directions below.**
2.  Go to your OnDemand dashboard, select the **Interactive Apps** tab along the top, then select **Pitzer Desktop**.  You will be prompted to set up your environment - the only change I would recommend is the number of hours; use the defaults for everything else.   You will then need to wait until the resources for your desktop are allocated.   Once they are ready, click **Launch**, and a graphical window will open up showing the "desktop" OSC environment.   If you click on the **terminal** icon (along the bottom of the desktop window), a terminal window will open and you can follow the remaining instructions below.   The primary difference between this method and the **Pitzer Shell** approach is that you can view graphical objects (like jpeg/png/pdf) in the Pitzer Desktop.  The other difference is that the Pitzer Shell has no time limit, while the Pitzer Desktop does.

# Making a python script
In the terminal window, navigate to your assignments/assignments_11_prep directory.

We will need to make a new file, and to do this we need an editor.   If you are using the graphical environment (Pitzer Desktop) you can use a graphical editor such as **gedit**, but if you are using the Pitzer Shell like I am, you need a command line editor.  There are a number of such editors available, but the easiest to use is called **nano**.  Note that nano can also be used in the terminal window on the Pitzer Desktop.

At the command prompt (which I am assuming is the "$\$$" symbol), type:

     $ nano cnn_intro.py

and then hit return.  

You will see a screen that looks like this:
![nano](files/nano_screenshot.png "nano screen")

An important feature about nano: the mouse does not do anything!   To move around in the window, you can use:
* arrow or page keys
* "^V" (or control-V) to move down
* "^Y" (or control-Y) to move up

To put code in the window, you can of course type it in, but I want you to copy the following three code blocks and paste them in the nano window (in the order shown):



In [1]:
import numpy as np
#
# Used to implement the multi-dimensional counter we need in the performance class
from collections import defaultdict
def autovivify(levels=1, final=dict):
    return (defaultdict(final) if levels < 2 else
            defaultdict(lambda: autovivify(levels-1, final)))
def getPerformance(network,images,labels_cat,labels):
#
# Get the overall performance for the test sample
    loss, acc = network.evaluate(images,labels_cat)
#
# Get the individual predictions for each sample in the test set
    predictions = network.predict(images)
#
# Get the max probability for each row
    probs = np.max(predictions, axis = 1)
#
# Get the predicted classes for each row
    classes = np.argmax(predictions, axis = 1)
#
# Now loop over the first twenty samples and compare truth to prediction
#print("Label\t Pred\t Prob")
#for label,cl,pr in zip(smear_labels[:20],classes[:20],probs[:20]):
#    print(label,'\t',cl,'\t',round(pr,3))
#
# Get confusion matrix
    cf = autovivify(2,int)
    for label,cl in zip(labels,classes):
        cf[label][cl] += 1
#
    return loss,acc,cf


In [3]:
from keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

#
# Change the following to False to run on the full data
# NOTE: Keep true when running interactively!!
short = True
if short:
    train_images = train_images[:7000,:]
    train_labels = train_labels[:7000]
    test_images = test_images[:3000,:]
    test_labels = test_labels[:3000]
#
print("Train info",train_images.shape, train_labels.shape)
print("Test info",test_images.shape, test_labels.shape)
train_images = train_images.reshape((train_images.shape[0],28*28))
train_images = train_images.astype('float32')/255

test_images = test_images.reshape((test_images.shape[0],28*28))
test_images = test_images.astype('float32')/255
from keras.utils import to_categorical

train_labels_cat = to_categorical(train_labels)
test_labels_cat = to_categorical(test_labels)


Train info (7000, 28, 28) (7000,)
Test info (3000, 28, 28) (3000,)


In [8]:
from keras import models
from keras import layers
from keras.callbacks import EarlyStopping, ModelCheckpoint
#
# Make sure the shape of the input is correct (the last ",1" is the number of "channels"=1 for grayscale)
train_images = train_images.reshape((train_images.shape[0],28,28,1))
test_images = test_images.reshape((test_images.shape[0],28,28,1))
#
cnn_network = models.Sequential()
#
# First convolutional layer
cnn_network.add(layers.Conv2D(30,(5,5),activation='relu',input_shape=(28,28,1)))
# Pool
cnn_network.add(layers.MaxPooling2D((2,2)))
#
# Second convolutional layer
cnn_network.add(layers.Conv2D(25,(5,5),activation='relu'))
# Pool
cnn_network.add(layers.MaxPooling2D((2,2)))
#
# Connect to a dense output layer - just like an FCN
cnn_network.add(layers.Flatten())
cnn_network.add(layers.Dense(64,activation='relu'))
cnn_network.add(layers.Dense(10,activation='softmax'))
#
# Compile
cnn_network.compile(optimizer='rmsprop',loss='categorical_crossentropy',metrics=['accuracy'])

patienceCount = 10
callbacks = [EarlyStopping(monitor='val_loss', patience=patienceCount),
             ModelCheckpoint(filepath='best_model.h5', monitor='val_loss', save_best_only=True)]
#
# Fit/save/print summary
history = cnn_network.fit(train_images,train_labels_cat,epochs=50,batch_size=256,callbacks=callbacks,validation_data=(test_images,test_labels_cat))
cnn_network.save('fully_trained_model_cnn.h5')
print(cnn_network.summary())
#
# Get the overall performance for the test sample
test_loss, test_acc = cnn_network.evaluate(test_images,test_labels_cat)
print("Test sample loss: ",test_loss, "; Test sample accuracy: ",test_acc)

loss,acc,cf = getPerformance(cnn_network,test_images,test_labels_cat,test_labels)

print("Test confusion matrix")
for trueClass in range(10):
    print("True: ",trueClass,end="")
    for predClass in range(10):
        print("\t",cf[trueClass][predClass],end="")
    print()
print()


Train on 7000 samples, validate on 3000 samples
Epoch 1/50
(epochs 2 through 34 omitted)
Epoch 35/50
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_9 (Conv2D)            (None, 24, 24, 30)        780       
_________________________________________________________________
max_pooling2d_9 (MaxPooling2 (None, 12, 12, 30)        0         
_________________________________________________________________
conv2d_10 (Conv2D)           (None, 8, 8, 25)          18775     
_________________________________________________________________
max_pooling2d
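
Looking back at the first code block: the `autovivify(2, int)` call in `getPerformance` builds a two-level dictionary whose missing entries spring into existence as 0, which is what lets `cf[label][cl] += 1` work without setting up every cell first. Here is a minimal sketch of that idea on its own. The `truth` and `predicted` lists are made up for illustration and are not MNIST output.

```python
from collections import defaultdict

def autovivify(levels=1, final=dict):
    return (defaultdict(final) if levels < 2 else
            defaultdict(lambda: autovivify(levels-1, final)))

# Hypothetical true and predicted classes for six samples
truth = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 0]

# Two-level counter: cf[true class][predicted class]
cf = autovivify(2, int)
for label, cl in zip(truth, predicted):
    cf[label][cl] += 1

# Rows are the true class, columns are the predicted class
for true_class in range(3):
    row = [cf[true_class][pred_class] for pred_class in range(3)]
    print("True:", true_class, row)
```

Cells that were never incremented simply read back as 0, so the printing loop does not need any special cases.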

## Saving our changes
Once you have entered the above three blocks, make sure you can move up and down through the code using the arrow keys as well as the page up and down functions (I found that control-y and control-Y moved up, but that only control-V moved down).

To save your changes, type control-x.  You will be asked whether you want to "Save modified buffer"; answer "Yes" by typing y.

You will then be asked for the file name you want to save your changes as.  If you had originally just typed "nano" with no file, you would enter a new file name here.  If as above you had typed "nano cnn_intro.py", you should see:

    File Name to Write: cnn_intro.py 
    
Just hit return and it will be saved as that file name.

## Running our python script
To run our script, simply type:

    $ python cnn_intro.py 
    
If you see the following message:

    ImportError: No module named keras.datasets
    
Then there is something wrong with your python version (probably), so type:

    $ python --version
    
If this returns "Python 2.7.5" then you have the wrong version of python.  To fix this, type:

    module load python/3.6-conda5.2
    
Try "python --version" again, and you should see: "Python 3.6.6 :: Anaconda custom (64-bit)".   Now try running your python script again:

    python cnn_intro.py 
    
Your code should now run, and after the 2 epochs (running on the small sample), you should see an accuracy of around 88%.   We kept the sample size and number of epochs low because we have an interactive session that we are sharing with many users.  Below we will show how we can run much longer jobs.
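
If you would rather have the script itself complain about the wrong interpreter, a guard like the following could go at the very top of cnn_intro.py. This is only a sketch: the helper name `python_is_new_enough` is hypothetical, and the minimum of 3.6 is an assumption based on the module we loaded above.

```python
import sys

def python_is_new_enough(version_info=sys.version_info, minimum=(3, 6)):
    """Return True if the interpreter version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum

# Stop early with a helpful hint instead of a confusing ImportError
if not python_is_new_enough():
    sys.exit("Need Python 3.6 or newer; try: module load python/3.6-conda5.2")
```

The guard avoids any Python-3-only syntax on purpose, so that an old interpreter can still parse it and print the hint.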

**Aside:** what if in the future I can't remember the exact version of python?  Or if I want to know if other modules/software are available (like matlab)?  I can use the "module spider xxxx" command, where xxxx is the base name of the software I am interested in.  For example, if I type:  

    module spider python  

I get the following output:

    Versions:
        python/2.7-conda5.2
        python/3.6-conda5.2


## The PBS System
The batch system at OSC can do some incredibly complicated things, but our initial use will be very simple: run a single, short "job", and return the results.  We will submit the above python script to the batch system and let it process the data - we simply sit back and wait for it to finish!

There is a lot of detail that I will skip over, since you can get a lot out of the simple things I will show you.  However, after class I encourage you to take a look at the full documentation available [here](https://www.osc.edu/supercomputing/batch-processing-at-osc).

Let's first submit a job, and then while we wait, we can look at how we control the job we submitted.

In the terminal window, let's use nano again to make a new file, this time a **bash** script. 

    nano pbs_run.sh

Copy the following code block (which is **not** python but **bash** shell commands) into the above file.   After entering the text, type control-x to save the file.


In [None]:
#PBS -N cnn_mnist
#PBS -l walltime=0:30:00
#PBS -l nodes=1:ppn=1
#PBS -l mem=4761MB
#PBS -j oe

# uncomment if using qsub
cd $PBS_O_WORKDIR
echo $PBS_O_WORKDIR

module load python/3.6-conda5.2
python -u cnn_intro.py >& cnn_intro_output.lg


## Submitting a job to the batch system.
If you type "ls -alrt" and hit return in your terminal window, you should see something like the following (of course the time stamps as well as the formatting will be different):  
xxxxxxx   3335 Mar  6 22:24 cnn_intro.py  
xxxxxxx 394688 Mar  6 22:31 fully_trained_model_cnn.h5  
xxxxxxx    242 Mar  6 22:43 pbs_run.sh  

"ls -alrt" lists information about files in a folder (or directory), in "reverse time order", so that the most recently modified files are at the bottom.   You will find this to be a very useful command!

The above files are the following:
*  **cnn_intro.py**: Your original python script.
*  **fully_trained_model_cnn.h5**:  The trained model you got from running that python script interactively.
*  **pbs_run.sh**: The script you just made to run the python script on the batch system

To "submit" your job, simply type:

    qsub pbs_run.sh
    
You may be asked to specify an account.  If so, modify the above to be something like:

    qsub -A PASxxxx pbs_run.sh

where "PASxxxx" is a valid account number.  If your command is successful, you should see a message like this printed to the screen:

    408559.pitzer-batch.ten.osc.edu

That first number (408559) is the job ID.   To get an idea if your job is running (or queued or completed), type:

    qstat -u osuXXXX
    
where "osuXXXX" is your user ID.   You can find this on your OnDemand desktop in the upper right hand corner.
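
The string that qsub prints is worth holding on to, because PBS uses the job name and the job ID to name the output file it writes when the job finishes. As a small sketch of how the pieces fit together (both function names are hypothetical helpers, not part of any PBS tool):

```python
def parse_job_id(qsub_output):
    """Return the numeric part of the 'NNNN.server' string printed by qsub."""
    return qsub_output.strip().split(".")[0]

def pbs_output_file(job_name, job_id):
    """Build the '.o' file name: job name, then '.o', then the job ID."""
    return "{}.o{}".format(job_name, job_id)

job_id = parse_job_id("408559.pitzer-batch.ten.osc.edu")
print(job_id)
print(pbs_output_file("cnn_mnist", job_id))
```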

Your job may take a few minutes to start, and then it will finish quickly.   While we wait for that, let's look at the bash script in detail.

## The bash script
A **bash** script is simply a collection of commands that you could execute in the terminal.  However, "scripting" the commands can get fairly complex, to the point where a bash script could look much like a "real" program (like one written in python).   Our batch scripts will be fairly simple.   A key difference for our **batch** scripts is that they contain commands (the \#PBS lines below) that **also** tell the batch submission system **PBS** how to execute our job.

Let's walk through the above batch script line by line, and see what each line does.
* #PBS -N cnn_mnist  
  This command tells PBS what your job's name is.  
* #PBS -l walltime=0:30:00  
  This command tells PBS how much time your job will need.   You want this to be comfortably larger than the time your job will actually take, since as soon as you exceed this time, your job is stopped. 
* #PBS -l nodes=1:ppn=1 
    This command tells PBS how many nodes and ppn (processors per node, or cores) your job needs.  Since each core contributes about 4GB to the available memory, specifying the number of cores also determines how much memory your job has.
* #PBS -l mem=4761MB  
    This command tells PBS how much memory your job needs.
* #PBS -j oe  
    This command tells PBS to join standard output as well as errors into the same file.  
* \# uncomment if using qsub  
    In batch scripts, lines beginning with \# are ignored by the system.  Any other line is executed as though it was a command you typed in the terminal.  So this line is just a comment.   The above \#PBS lines are **also** comments - but they are special in that the batch system is designed to look for lines beginning with \#PBS and interpret them as commands.
* cd \$PBS_O_WORKDIR  
    This line "changes directory" to the **environment variable** pointed to by **\$PBS_O_WORKDIR**.   This is the directory the job was submitted from.  
* echo \$PBS_O_WORKDIR  
    The bash **echo** command is like the python **print** command - it prints things to the screen.   This simply prints the **value** of the \$PBS_O_WORKDIR environment variable.  
* module load python/3.6-conda5.2  
    This makes sure your submitted job uses the correct version of python.  
* python -u cnn_intro.py >& cnn_intro_output.lg  
    Finally!   Our actual python script!   There are some changes to how we ran it before (which we did with a simple "python cnn_intro.py").
    * The "-u" says "print stuff to the screen right away, and don't buffer it first" (buffering is the default).
    * The ">& cnn_intro_output.lg" says: take all output that would go to the screen, including all errors, and put it into a file named "cnn_intro_output.lg".   You can look at this file while your job is running.
    * I use "cnn_intro_output.lg" instead of "cnn_intro_output.log", because I want to get access to your .lg files for the assignment.   By default our .gitignore file would not allow us to track .log files, so I use .lg instead.
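
The "-u" flag matters because you will want to watch the .lg file grow while the job runs. If you ever run a script without "-u", passing `flush=True` to `print` has a similar effect for that one call. Here is a small sketch; the `log` helper is hypothetical and is not part of the script above.

```python
import sys
import time

def log(message, stream=sys.stdout):
    """Print a time-stamped message and flush it right away."""
    stamp = time.strftime("%H:%M:%S")
    print("[{}] {}".format(stamp, message), file=stream, flush=True)

log("starting training")
```

Time stamps in the log also make it easier to work out how long each stage of a job took.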


## When the job finishes
Once the job starts, as well as when it finishes, new files will appear in the folder that you submitted the job from.
If you type "ls -alrt" and hit return in your terminal window, you should see the following:  
xxxxxxx   3335 Mar  6 22:24 cnn_intro.py  
xxxxxxx 394688 Mar  6 22:31 fully_trained_model_cnn.h5  
xxxxxxx    242 Mar  6 22:43 pbs_run.sh  
xxxxxxx   8031 Mar  6 22:51 cnn_intro_output.lg  
xxxxxxx    407 Mar  6 22:51 cnn_mnist.o408559  

There are two new files:
* **cnn_intro_output.lg**: This file is all of the output that would normally go to the screen, if you were running the job interactively (or in a jupyter notebook).  If something went wrong with your job, you may be able to diagnose it here. 
* **cnn_mnist.o408559**: This is the ".o" file.  Notice it starts with the name you gave the job, and ends with the job ID assigned by the system.  This file contains useful information from the batch system, indicating how much cputime the job took, as well as how much memory it used.   This will be **very** helpful, especially when you are thinking of running larger or smaller jobs.   You can use this information to determine if your next job needs more resources (or else it might crash or end too soon) or fewer resources (in which case you can usually get your results faster).
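
One way to turn the time reported in the ".o" file into your next walltime request is to pad it with a safety factor. The sketch below assumes you have already converted the reported time to seconds yourself. The function name and the default factor of 2 are illustrative choices, not OSC recommendations.

```python
import math

def walltime_request(measured_seconds, safety_factor=2.0):
    """Format a padded run time as H:MM:SS for a '#PBS -l walltime=' line."""
    total = int(math.ceil(measured_seconds * safety_factor))
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return "{}:{:02d}:{:02d}".format(hours, minutes, seconds)

# A job that took 7.5 minutes (450 seconds), padded by a factor of 2
print("#PBS -l walltime=" + walltime_request(450))
```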

## Running a longer job
Now let's run a longer job using the pbs system.  Make the following changes:
* Modify the python script so that it runs over the full MNIST sample.   
* Also, run for 10 epochs instead of 2.
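
Rather than editing the script each time, one option is to read these two settings from the command line. The sketch below uses the standard `argparse` module. The flag names `--full` and `--epochs` and the default of 2 epochs are hypothetical, and you would still need to wire `short` and `args.epochs` into the data selection and the `fit` call yourself.

```python
import argparse

def parse_settings(argv=None):
    """Parse run settings; argv=None means 'read the real command line'."""
    parser = argparse.ArgumentParser(description="Train the MNIST CNN")
    parser.add_argument("--full", action="store_true",
                        help="use the full MNIST sample instead of the short one")
    parser.add_argument("--epochs", type=int, default=2,
                        help="maximum number of training epochs")
    return parser.parse_args(argv)

# Explicit list here for illustration; in the script, call parse_settings()
args = parse_settings(["--full", "--epochs", "10"])
short = not args.full
print("short =", short, "; epochs =", args.epochs)
```

With something like this in place, the python line in the batch script could become "python -u cnn_intro.py --full --epochs 10", and the interactive test would keep the short defaults.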

With these changes, do you think we need to modify our bash script to add more time?  I think not, but if I make a mistake in this estimate (especially on a job which might take hours), it might be a sad day!   Let's leave the bash script as it was.  Submit the bash script again using qsub (after having made the above changes to your **python** script).   While it is running see if you can estimate how much longer it might take to finish.
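
A rough way to make that estimate is to assume every remaining epoch takes about as long as the average epoch so far. The sketch below does just that. The function name is hypothetical, and the numbers in the example are made up rather than measured on OSC.

```python
def estimate_remaining_seconds(elapsed_seconds, epochs_done, total_epochs):
    """Assume each remaining epoch takes as long as the average so far."""
    if epochs_done <= 0:
        raise ValueError("need at least one finished epoch")
    per_epoch = elapsed_seconds / epochs_done
    return per_epoch * (total_epochs - epochs_done)

# Example: 3 of 10 epochs finished after 90 seconds
print(estimate_remaining_seconds(90, 3, 10))
```

Keep in mind that this is an upper-end guess if early stopping is in use, since training may end before the last epoch.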

## Using GPUs
As we know, our machine learning programs require extensive matrix mathematics in order to determine the weights and biases for the models we design.   It turns out that **graphics processing units** or **GPUs** are very efficient at matrix mathematics, and if we have access to GPU processing power, we can potentially speed up our jobs by factors of 10 or more.  A nice (and short) article (along with some good references) on the use of GPUs in machine learning can be found [here](https://www.datascience.com/blog/cpu-gpu-machine-learning).

The OSC system has GPU resources on both the **Owens** and **Pitzer** clusters:
* Owens: 160 NVIDIA Tesla P100
* Pitzer: 64 NVIDIA Tesla V100 (two each on 32 nodes)

To take advantage of GPUs, we need to make some modifications to our python environment, specifically with **tensorflow**.   You **need** to have tensorflow-gpu installed to use GPUs.   To check this, type:

    $ pip list
    
You should see a version of tensorflow-gpu listed.   If instead you only have tensorflow, then in your terminal window, type the following commands:

    $ pip uninstall tensorflow  

    $ pip install --user tensorflow-gpu

The **tensorflow-gpu** package can run on both **CPU-only** systems as well as systems which have **GPUs**.  If no GPU is detected, the software will default to the CPU version.
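
A quick way to confirm what tensorflow can actually see is to list its devices from python. The sketch below uses `device_lib.list_local_devices()`, which lives in a semi-internal module (`tensorflow.python.client`) and so may change between tensorflow versions. The `visible_devices` wrapper is hypothetical, and it returns an empty list when tensorflow is not installed at all.

```python
def visible_devices():
    """Return (name, type) pairs for the devices tensorflow reports."""
    try:
        # Semi-internal module; assumed to be present in your tensorflow install
        from tensorflow.python.client import device_lib
    except ImportError:
        return []  # tensorflow is not installed in this environment
    return [(d.name, d.device_type) for d in device_lib.list_local_devices()]

devices = visible_devices()
for name, kind in devices:
    print(kind, name)
print("GPU visible:", any(kind == "GPU" for _, kind in devices))
```

On a machine without a GPU you should expect to see only CPU entries, so the place to run this check is inside a batch job that requested a GPU.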

Next, we need to modify our pbs script to tell pbs that we want to use GPUs.  Note that since these are scarce (relative to CPUs), you should be careful to only use them when necessary.  As we will see, GPUs can greatly accelerate your job's **running** time, but in general they will delay your job's **starting** time.  Typically what you care about is your job's **finishing** time (which is the sum of the above starting and running times).
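
To make the trade-off concrete, here is a tiny sketch comparing finishing times. All of the numbers are made up purely for illustration; your actual queue waits and run times will differ.

```python
def finishing_time(queue_wait_minutes, run_minutes):
    """Finishing time is the wait to start plus the time spent running."""
    return queue_wait_minutes + run_minutes

# Hypothetical numbers: the GPU job waits longer but runs much faster
cpu = finishing_time(queue_wait_minutes=2, run_minutes=25)
gpu = finishing_time(queue_wait_minutes=20, run_minutes=3)

print("CPU job finishes after", cpu, "minutes")
print("GPU job finishes after", gpu, "minutes")
print("GPU wins" if gpu < cpu else "CPU wins or ties")
```

For a short job the queue wait can easily dominate, which is why it pays to think about the total and not just the running time.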

In the terminal window, use nano to make a new pbs script for running our gpu version:

    nano pbs_run_gpu.sh

Copy the following script into the above file, then exit and save.

In [None]:
#PBS -N cnn_mnist_gpu
#PBS -l walltime=0:30:00
#PBS -l nodes=1:ppn=1:gpus=1
#PBS -l mem=4761MB
#PBS -j oe

# uncomment if using qsub
cd $PBS_O_WORKDIR
echo $PBS_O_WORKDIR

module load python/3.6-conda5.2
module load cuda/10.0.130
python -u cnn_intro.py >& cnn_intro_output_gpu.lg

## The GPU script
The main changes in the pbs submission script are the following:
1. We modified one of the pbs commands to add a single gpu:  
    #PBS -l nodes=1:ppn=1:gpus=1
2. We made sure to load the **cuda** module so that **tensorflow** can take advantage of the available GPU resources.

What is **CUDA**?  CUDA stands for Compute Unified Device Architecture.  It is an extension of the C programming language created by nVidia, a company that designs graphics processing units for the gaming and professional markets.  Using CUDA allows programmers to take advantage of the massive parallel computing power of an nVidia graphics card in order to do general purpose computation.  We don't explicitly use CUDA in our python scripts, and neither does keras.  It is tensorflow that utilizes CUDA.

We submit the above script in exactly the same way:

    qsub pbs_run_gpu.sh
    
As noted, this may take longer to start, but it will be **much** faster to run.   When it is done, look at the ".o" file returned by the pbs system.   Does the running time make sense?

Given how long the CPU version took compared to the GPU version, does it make sense for a job like this to use GPUs?

# Assignment
1.  Add a confusion matrix calculation to your python code.   Test it with the small sample first.
2.  Increase the number of epochs, and add early stopping to your code.  Test it with the small sample first.   You will have to submit this version to PBS. 
3.  Modify the code to run over the larger MNIST data sample, and then submit both the cpu and gpu versions.   Note that the data size is about 10 times larger so the cpu time needed will also be about 10 times larger.   You may need to adjust (increase) the number of epochs compared to the value you used for the smaller sample.  

The "deliverables" for this assignment are the following:
- the "lg" files for both the cpu and gpu versions, after all of the above modifications.  Rename them **after** the jobs have completed by doing this:
       mv cnn_intro_output.lg  cnn_intro_output_final.lg
       mv cnn_intro_output_gpu.lg cnn_intro_output_gpu_final.lg

