# Testing the Casper Compute Environment and Running a GPU Program

By: Daniel Howard, March 14th, 2022

Here is a test notebook for NCAR's GPU Computing Workshop series. To initialize and run this notebook on Casper, **click this [nbgitpuller link](https://jupyterhub.hpc.ucar.edu/stable/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2FNCAR%2FGPU_workshop&urlpath=lab%2Ftree%2FGPU_workshop%2F00_TestCasper%2FTestCasper.ipynb&branch=CSG_tutorial)** to automatically connect to NCAR's JupyterHub portal, pull this git repository into your `$HOME` directory, and load this notebook. If you encounter issues and you've already synced this repository to your user space on NCAR's cluster, then start a JupyterHub server from the [NCAR JupyterHub portal](https://jupyterhub.hpc.ucar.edu/stable/) and navigate to this notebook. For this notebook, choose "Casper login node" under the "Cluster Selection" pulldown. Then, run each code cell below in order by selecting the cell and pressing CTRL+ENTER. Please report any issues or concerns to dhoward@ucar.edu.

For all registered workshop users, your provided NCAR CIT account should have permission to use the **UCIS0004** project below. Please use this project ID to charge your compute jobs when running work on Casper. You may use this ID for small workshop-related learning work on the order of 30 minutes of walltime or less, ideally less than 5 minutes. However, no production-scale jobs should be submitted using this project's allocation, as it is meant to be shared across the full GPU workshop learning community. If you'd like to request your own allocation for more compute-intensive work, please see the [Allocations documentation](https://arc.ucar.edu/knowledge_base/74317835). For student and early-career faculty university users, there are [opportunities available](https://arc.ucar.edu/knowledge_base/75694351) for small one-time allocation awards for unsponsored work, typically to enable dissertation research or provide seed grants towards funded research.

Please run the below cell to initialize the workshop Project ID for later cells. Edit the project code as needed.

In [None]:
export PROJECT=UCIS0004
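
Later cells pass `$PROJECT` to `qsub -A`, so it can help to confirm the variable is set before submitting anything. The following is a minimal sketch, assuming a bash kernel; the fallback to `UCIS0004` simply mirrors the cell above and should be changed if you charge to a different project.

```shell
# Fall back to the workshop project code if PROJECT is unset or empty.
PROJECT="${PROJECT:-UCIS0004}"
export PROJECT
echo "Jobs will be charged to project: $PROJECT"
```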

## Display Information about the GPU

First, we are going to submit a job on Casper's PBS job scheduler to run some simple work on a GPU node. To submit jobs, we are going to use the qsub command. You can learn more about qsub and other options for submitting compute jobs to Casper's HPC cluster, including GPUs, at the documentation portal at [arc.ucar.edu - Starting Casper Jobs with PBS](https://arc.ucar.edu/knowledge_base/72581396).

We will now run a script to display information about the GPUs on Casper. This is done with two commands:

* `nvaccelinfo` - Displays static information about all currently connected GPUs.
* `nvidia-smi` - Displays dynamic information about all currently connected GPUs. More detailed queries of the GPU state are available through the options listed in the command's help text (`nvidia-smi -h`).
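
As one example of a more targeted query, `nvidia-smi` accepts a `--query-gpu` field list together with `--format=csv`. The sketch below is hedged in two ways: the chosen fields are only an illustration, and the command is guarded so that the cell still runs on a node without the NVIDIA tools (such as a login node), where it just prints a notice.

```shell
# Query a few GPU properties as CSV; the field list is an illustrative choice.
if command -v nvidia-smi >/dev/null 2>&1; then
    # Capture both stdout and stderr so driver errors are shown rather than lost.
    gpu_report=$(nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv 2>&1) || true
else
    gpu_report="nvidia-smi not found; run this on a GPU node"
fi
echo "$gpu_report"
```

The full list of queryable fields can be printed with `nvidia-smi --help-query-gpu`.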

You can run the included, pre-configured script [batch_accelinfo.sh](batch_accelinfo.sh) via the PBS batch job scheduler by running the cell below. If you are running on a local GPU-enabled machine or are already running interactively on a GPU node, expand and run the second cell instead (click the ellipsis ...). It is best to use the `gpudev` queue on weekdays from 8am to 6:30pm MT. If you are running this outside those hours, edit the queue in [batch_accelinfo.sh](batch_accelinfo.sh) to `-q casper` instead.

In [None]:
qsub -A $PROJECT batch_accelinfo.sh

In [None]:
./batch_accelinfo.sh
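
While a submitted job is queued or running, its state can be checked with `qstat`. The following is a rough sketch rather than a required step: it assumes the PBS client tools are on your `PATH` (it prints a notice otherwise) and that the job writes its output to `accelinfo.out` in the current directory, which is the file name this notebook expects.

```shell
# List your queued and running PBS jobs, if the PBS client tools are available.
if command -v qstat >/dev/null 2>&1; then
    qstat -u "${USER:-$(id -un)}" || true
else
    echo "qstat not found; PBS commands are only available on the cluster"
fi

# Report whether the job's output file has been written yet.
if [ -f accelinfo.out ]; then
    job_output_status="ready"
else
    job_output_status="pending"
fi
echo "accelinfo.out is $job_output_status"
```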

After the job has completed, you should see a new file, `accelinfo.out`, in your working directory in the left panel of Jupyter Lab. Then, you can run the below cell to view the job's output. You should see a printed list of the loaded modules during the job and output from the `nvaccelinfo` and `nvidia-smi` commands respectively.


In [None]:
cat accelinfo.out

You may want to try targeting different GPUs on Casper to see how other devices differ. To do so, edit the file [batch_accelinfo.sh](batch_accelinfo.sh) and set `gpu_type` to `gp100`. Then re-run the job submission and output cells above.
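
One way to make this kind of edit without opening an editor is a `sed` substitution. The sketch below works on a scratch file rather than the real script: the three-line PBS header is a hypothetical stand-in (the actual contents of `batch_accelinfo.sh` may differ), and the `#PBS -l gpu_type=v100` form is assumed from the `qsub` options used elsewhere in this notebook.

```shell
# Hypothetical stand-in for the PBS header of batch_accelinfo.sh;
# the real script's contents may differ.
demo_dir=$(mktemp -d)
cat > "$demo_dir/demo_job.sh" <<'EOF'
#!/bin/bash
#PBS -q gpudev
#PBS -l gpu_type=v100
EOF

# Write a modified copy that requests a gp100 instead of a v100.
sed -e 's/gpu_type=v100/gpu_type=gp100/' "$demo_dir/demo_job.sh" > "$demo_dir/demo_job_gp100.sh"

# Show the changed line to confirm the substitution took effect.
grep -n 'gpu_type' "$demo_dir/demo_job_gp100.sh"
```

If the real script contains the same `gpu_type=v100` text, the same substitution could be applied to a copy of `batch_accelinfo.sh`.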

## Running a GPU Program
Now we are going to make sure you can compile CPU/GPU programs and run them. We will go over more details about this process in future sessions.

First, the cell below loads the needed compiler software and then compiles both a CPU and a GPU program that run a simple Jacobi heat equation solver. Note that the same source files are used in both compilations, but the GPU compilation is asked to honor the OpenACC directives, which are included as comment lines in the source files [jacobi.f90](jacobi.f90) and [laplace2d.f90](laplace2d.f90). Future sessions will use examples from [miniWeather](https://github.com/mrnorman/miniWeather) and [MPAS](https://ncar.ucar.edu/what-we-offer/models/model-prediction-across-scales-mpas).

Note that small compilation tasks and other work with a minimal computational load are permitted on the login nodes, as we are doing below. However, the compiled executables and other computationally expensive work should only be run on batch compute nodes. If a user process runs an expensive application on the login nodes and impacts other users' experience, the program may be automatically terminated, and repeat incidents may cause the user's account to be temporarily limited.

In [None]:
module load nvhpc/22.2 &> /dev/null
nvfortran -fast -o laplace_cpu laplace2d.f90 jacobi.f90 && echo 'Compilation for CPU Successful!'
nvfortran -fast -gpu=cc70 -acc -Minfo=accel -o laplace_gpu laplace2d.f90 jacobi.f90 && echo 'Compilation for GPUs Successful!'
rm -f *.o *.mod
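
Before submitting jobs, it can be worth confirming that both executables were actually produced, since a failed compile only suppresses the success message. This is a small sketch that assumes the output names `laplace_cpu` and `laplace_gpu` from the cell above and that it is run in the same directory.

```shell
# Count how many of the expected executables are missing or not executable.
missing=0
for exe in laplace_cpu laplace_gpu; do
    if [ -x "$exe" ]; then
        echo "$exe: ready"
    else
        echo "$exe: missing"
        missing=$((missing + 1))
    fi
done
echo "$missing executable(s) missing"
```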

We can now run the compiled code on Casper's batch compute nodes by running the `qsub` commands below:

In [None]:
qsub -l walltime=00:01:00 -l select=1:ncpus=1 -q casper -A $PROJECT -j oe -o laplace_cpu.out -- "$(pwd)/laplace_cpu"
qsub -l walltime=00:01:00 -l select=1:ncpus=1:ngpus=1 -l gpu_type=v100 -q gpudev -A $PROJECT -j oe -o laplace_gpu.out -- "$(pwd)/laplace_gpu"
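
Rather than guessing when a job's output has arrived, a small polling helper can wait for the file. The function below is a hypothetical convenience, not part of the workshop materials: `wait_for_file` and its 120-second default timeout are assumptions made for illustration, and a file appearing does not by itself guarantee the job has finished writing to it.

```shell
# Hypothetical helper: poll until a file exists or a timeout (in seconds) passes.
wait_for_file() {
    file=$1
    timeout=${2:-120}
    waited=0
    while [ ! -e "$file" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "Timed out after ${timeout}s waiting for $file" >&2
            return 1
        fi
        sleep 2
        waited=$((waited + 2))
    done
    echo "$file is available"
}
```

For example, `wait_for_file laplace_cpu.out 120` would block for up to two minutes and return a non-zero status if the file never appears.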

Once the jobs have started, you should see the output files [laplace_cpu.out](laplace_cpu.out) and [laplace_gpu.out](laplace_gpu.out) in your job submit directory (see left pane in Jupyter Lab). The CPU program takes ~50 seconds to complete, so after waiting a minute, run the cells below to view the results. The GPU job should complete much faster, but when it starts depends on the availability of GPU nodes at the time. The program also tracks and prints its runtime, so you should see a real measurement of the substantially lower runtime for the GPU program.

In [None]:
cat laplace_cpu.out

In [None]:
cat laplace_gpu.out

## Conclusion
If you were able to get through all the above examples with no problems, **CONGRATULATIONS!** You should be ready for future interactive sessions as part of this GPU Computing Workshop series.

If not, please reach out to dhoward@ucar.edu or other workshop support team members on the [NCAR GPU Users Slack](https://ncargpuusers.slack.com/).