# Interactive Jupyter Notebooks Documentation

## Introduction to Cheyenne for New Users

<p>Adapted from materials by Brian Vanderwende of the Consulting Services Group at the National Center for Atmospheric Research (NCAR)</p>

<p>Authors: Brian Vanderwende, Thomas Johnson III</p>

In [None]:
#Reference https://github.com/takluyver/bash_kernel
#Commands to install Bash Kernel for Jupyter Lab
# $ pip install bash_kernel
# $ python -m bash_kernel.install
# css display property for putting divs side by side: https://stackoverflow.com/questions/2637696/how-to-place-div-side-by-side/24292602

<div style="background-color:#84d0ff;z-index:-1; border-radius:15px;border-style:solid;border-color:#84d0ff;">
<h2 style="text-align:center;background-color:#84d0ff;">Overview of Tutorial</h2>
<div style="border-color:#84d0ff; border-style:solid; background-color:white; border-radius:15px; padding:5px">
This tutorial introduces new users to accessing and using Cheyenne's resources. It is delivered as a Jupyter Notebook in which the code in selected cells can be run interactively. This notebook uses the Bash kernel, so its code cells accept bash commands.
This tutorial will cover:
<ul>
<li>Signing into the HPC Systems and Managing Data</li>
<li>Accessing and building software</li>
<li>Submitting jobs utilizing the PBS and Slurm schedulers</li>
<li>Customizing your user environment</li>
</ul>
For more in-depth user documentation, please refer to <a href="https://www2.cisl.ucar.edu/resources/resources-overview">https://www2.cisl.ucar.edu/resources/resources-overview</a>.
</div>
</div>

<h2 style="text-align:center">Short Breakdown of NCAR HPC Systems</h2>

<div style="margin-left: 5%; margin-right:5%;" id="holding_div">
<div style="display:table-row;" id="row_div">
<div style="background-color:#ceffff; height:600px; display:table-cell; width:50%;" id="column_div_1">
<br>
    <div style="margin-left:5%;">
<h3>HPC-Simulation</h3>
    <h3>Cheyenne</h3> <p>- 4032 nodes featuring:</p>
<ul>
    <li>2-socket 18-core Intel Broadwell Xeon CPUs</li>
    <li>3168 nodes with 64 GB mem</li>
    <li>864 nodes with 128 GB mem</li>
    <li>SUSE Enterprise Linux 12</li>
    <li>PBS job scheduler</li>
</ul>
<br>
    </div>
</div>

<div style="background-color:#a7f9b9; height:600px;display:table-cell;width:50%;" id="column_div_2">
<br>
    <div style="margin-left:5%;">
<h3>Data Analysis and Visualization (DAV), Machine Learning/Deep Learning</h3>
    <h3>Casper</h3> <p>- 26 nodes featuring:</p>
<ul>
    <li>2-socket 18-core Intel Skylake</li>
    <li>2 TB local NVMe SSD storage</li>
    <li>CentOS 7 Linux</li>
    <li>Slurm job scheduler</li>
</ul>
    <p>GPUs available:</p>
    <ul>
        <li>8 nodes - 1 NVIDIA GP100</li>
        <li>2 nodes - 4 NVIDIA Tesla V100s</li>
        <li>2 nodes - 8 NVIDIA Tesla V100s</li>
    </ul>
<br>
</div>
</div>
    </div>
    </div>

<h2 style="text-align:center;">Logging into the HPC Environment</h2>
<ul>
    <li>Use your authentication token (YubiKey or Duo) and your username to log in:<br>
        <p><code>ssh -X -l username cheyenne.ucar.edu</code></p></li>
    <li>You will be placed onto one of six login nodes</li>
    <li>Your default shell is tcsh, but you can change your default shell at <a href="https://sam.ucar.edu">sam.ucar.edu</a></li>
    <li>cron jobs (scheduled recurring tasks) are shared among all login nodes</li>
</ul>
<p>Please note that login nodes are merely an entry point that places you in your home directory. Run code and jobs from your work or scratch directories, which are introduced later.</p>

In [None]:
# Demonstration of the login command
# Replace username with your NCAR username when running this cell
ssh -X -l username cheyenne.ucar.edu

<p>Note: If your username on your local computer (for example, a CISL workstation) is the same as your NCAR username, you can omit the <code>-l username</code> option, because ssh defaults to your local username: <code>ssh -X cheyenne.ucar.edu</code></p>

<h2 style="text-align:center;">Be Aware of Your Usage of Shared Resources such as Login Nodes</h2>
<ul>
<li>Your programs share processors and memory with those of tens to hundreds of other users</li>
<li>Therefore, limit your usage to:
<ul>
<li>Reading and writing text/code</li>
<li>Compiling smaller programs</li>
<li>Performing data transfers</li>
<li>Interacting with the job scheduler</li>
</ul></li>
<li>Programs that use excessive resources on the login nodes will be terminated</li>
</ul>
<p>Please do not attempt any commands that require sudo/root privileges on the HPC systems. Instead, use commands and methods that run with ordinary user permissions.</p>

<h2 style="text-align:center">Personal Data Storage at NCAR</h2>

<h3>GLADE Parallel Hard-disk Storage</h3>
<ul>
    <li>Optimized for parallel input/output</li>
    <li>Accessible from all HPC systems</li>
</ul>
<div style="background-color:#a7f9b9">
<br>
<table>
    <tr><th>File Space</th><th>Path</th><th>Quota</th><th>Backup</th><th>Uses</th></tr>
    <tr><td>Home</td><td>/glade/u/home/USER</td><td>25 GB</td><td>Yes</td><td>Settings, code, scripts</td></tr>
    <tr><td>Work</td><td>/glade/work/USER</td><td>1 TB</td><td>No</td><td>Compiled codes, models</td></tr>
    <tr><td>Flash</td><td>/glade/flash/USER</td><td>N/A</td><td>No</td><td>Fast temporary space (by request)</td></tr>
    <tr><td>Scratch</td><td>/glade/scratch/USER</td><td>10 TB</td><td>No (purged!)</td><td>Run directories, temporary output</td></tr>
</table>
<br>
</div>

In the table above, USER stands for your username, which is also stored in the <code>$USER</code> environment variable.

<p>To navigate to these directories, use the <code>cd</code> command with the directory path as its argument. To see which directory you are currently in, use the <code>pwd</code> command.</p>

In [None]:
# Demonstration of the cd and pwd commands
# $USER expands to your username automatically
echo "The present working directory:"
pwd
echo "Change directory to Work:"
cd /glade/work/$USER
pwd
echo "Change directory to Flash (exists only if you requested Flash space):"
cd /glade/flash/$USER
pwd
echo "Change directory to Scratch:"
cd /glade/scratch/$USER
pwd
echo "Change directory to Home:"
cd /glade/u/home/$USER
pwd

<p>To keep track of your resources utilizing GLADE, use the <code>gladequota</code> command.</p>

In [None]:
#Demonstration of the gladequota command
echo "Will print out the glade resource usage."
gladequota

<p>Above is your current usage of GLADE resources. Keep track of this usage to ensure that you stay within your quotas.</p>

<h2 style="text-align: center;">Collaborative and Long-term Storage</h2>
<ul>
    <li>Dedicated GLADE project spaces.</li>
    <li>Campaign Storage for publication-scale storage lifespans (5-year purge).</li>
    <li>HPSS tape archive for cold archival.</li>
</ul>
<p>Access to these spaces depends on your project/lab status. See our web documentation for more details.</p>

<h2 style="text-align: center;">Moving Data to and from GLADE</h2>
<ul>
    <li>For short transfers, you can use scp/sftp to transfer files</li>
    <li>Large or lengthy transfers will benefit from Globus</li>
    <ul>
        <li>To use Globus, create a Globus ID if you do not have one, and search for NCAR GLADE or NCAR Campaign Storage endpoints</li>
        <li>CISL endpoints can currently be activated for up to 30 days</li>
        <li>Globus has a web interface and a command-line interface</li>
        <li>Globus Connect Personal can manage transfers from your local workstation as well</li>
    </ul>
    <li>Transfers to and from the HPSS tape archive are made using the HSI interface and HTAR utility</li>
</ul>
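For quick scp transfers from your workstation, a small shell function can save retyping the destination path. This is a minimal sketch: the function name `to_glade_scratch` is hypothetical, and it assumes you copy into your GLADE scratch directory through the `cheyenne.ucar.edu` login host, where you will still be prompted to authenticate with your token.

```shell
# Hypothetical helper: copy one local file into your GLADE scratch directory.
# Usage: to_glade_scratch FILE [NCAR_USERNAME]  (username defaults to $USER)
to_glade_scratch() {
    local file=$1
    local user=${2:-$USER}
    # Assumes the Cheyenne login host accepts scp connections
    scp "$file" "${user}@cheyenne.ucar.edu:/glade/scratch/${user}/"
}
```

For example, `to_glade_scratch data.nc jdoe` copies `data.nc` to `/glade/scratch/jdoe/`. For large or lengthy transfers, Globus remains the better choice.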

<p style="text-align:center; font-weight:bold">See the GLADE tutorial held at 10 AM on January 17th for more details on storage!</p>

<h2 style="text-align:center;"> CISL Builds Software for Users to Load with Environment Modules</h2>
<ul>
    <li>Modules provide access to program binaries (e.g., ncl, Python, ifort)</li>
    <li>Many modules will help you compile and link to common libraries (e.g., netCDF, MPI)</li>
    <li>Modules also prevent you from loading conflicting software into your environment</li>
    <li>Note that Cheyenne and Casper each have independent collections of modules!</li>
</ul>
    

<h2 style="text-align:center">Using Environment Modules</h2>
<ul>
    <li><code>module load/unload [software]</code> - load and unload software (e.g., Python, MATLAB)</li>
    <li><code>module avail</code> - show all currently-loadable modules</li>
    <li><code>module list</code> - show loaded modules</li>
    <li><code>module purge</code> - remove all loaded modules</li>
    <li><code>module save/restore [name]</code> - create/load a saved set of software</li>
    <li><code>module spider [software]</code> - search for a particular module</li>
</ul>

In [None]:
#Demonstrating module avail
module avail

---------------- /glade/u/apps/ch/modulefiles/default/compilers ----------------
   gnu/4.9.2        gnu/7.2.0       intel/17.0.1 (L,D)    pgi/17.5
   gnu/6.2.0        gnu/7.3.0       intel/18.0.1          pgi/17.9 (D)
   gnu/6.3.0 (D)    gnu/8.1.0       intel/18.0.5
   gnu/6.4.0        intel/16.0.1    intel/19.0.2
   gnu/7.1.0        intel/16.0.3    pgi/16.5

------------------ /glade/u/apps/ch/modulefiles/default/idep -------------------
   R/3.3.2                       joe/4.1
   R/3.4.0                       julia/0.6.2       (D)
   R/3.5.2                (D)    julia/1.0.0
   allinea-forge/7.0.4           matlab/R2015b
   allinea-forge/7.1             matlab/R2016b
   allinea-forge/18.0.2   (D)    matlab/R2018a
   allinea-reports/7.0.4         matlab/R2019a     (D)
   allinea-reports/7.1           multijob/1.0
   allinea-reports/18.0.2 (D)    nano/2.7.4
   arm-forge/18.1.2              ncarenv/1.0
   arm-forge/19

In [1]:
#Demonstrating module load (the argument is the piece of software that you are loading)
module load matlab

In [2]:
#Demonstrate module list
module list

Currently Loaded Modules:
  1) ncarenv/1.2           4) mpt/2.19       7) matlab/R2019a
  2) intel/17.0.1          5) netcdf/4.6.1
  3) ncarcompilers/0.4.1   6) python/3.6.8

In [3]:
#Demonstrate module unload
module unload matlab

In [None]:
#Demonstrate module spider (software you are looking for is the argument)
module spider python

----------------------------------------------------------------------------
  python:
----------------------------------------------------------------------------
     Versions:
        python/2.7.13
        python/2.7.14
        python/2.7.15
        python/3.6.2
        python/3.6.4
        python/3.6.8
     Other possible modules matches:
        netcdf4-python  wrf-python

----------------------------------------------------------------------------
  To find other possible module matches execute:

      $ module -r spider '.*python.*'

----------------------------------------------------------------------------
  For detailed information about a specific "python" module (including how to load the modules) use the module's full name.
  For example:[m


In [1]:
#Saving a module set using the module save command
#All modules to be saved should be already loaded
module save mod_config_1

Saved current collection of modules to: "mod_config_1", for system: "ch"



In [5]:
# Demonstrate module purge (module purge prints nothing, so module list confirms the result)
module purge
module list

No modules loaded

In [6]:
#Restoring a set of saved software utilizing module restore (module list being used to show restored modules)
module restore mod_config_1
module list

Restoring modules from user's mod_config_1, for system: "ch"

Currently Loaded Modules:
  1) ncarenv/1.2    3) ncarcompilers/0.4.1   5) netcdf/4.6.1
  2) intel/17.0.1   4) mpt/2.19              6) python/3.6.8

In [1]:
# Clean up: delete the saved module set, then unload all modules
module delete mod_config_1
module purge

<p>Memorizing or keeping notes on these commands makes interacting with NCAR's HPC systems much smoother. Saving module configurations that you will use repeatedly can also cut down on preparation time.</p>
<p style="font-weight:bold;">Load only the modules that a given task needs. Conserve your resources, please.</p>

<h2 style="text-align:center;">Considerations when Compiling Software</h2>
<ul>
<li>Use the <span style="font-weight:bold;">ncarcompilers</span> module along with library modules to simplify compiling and linking (it adds include and link flags for you)</li>
<li>When using MPI, make sure you run with the same MPI library you used to compile your parallel code</li>
<li style="font-weight:bold;">CISL recommends building code for the machine on which you will run it</li>
    <ul>
        <li>Cheyenne and Casper have different CPUs and operating systems</li>
    </ul>
</ul>
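A minimal sketch of compiling an MPI Fortran program that uses netCDF, following the advice above. The function `build_model` and the file names are hypothetical; it assumes the default `intel`, `mpt`, `ncarcompilers`, and `netcdf` modules (names taken from the module listings earlier) and that `mpif90` is the MPI Fortran compiler wrapper in that environment.

```shell
# Hypothetical helper: build an MPI Fortran program against netCDF.
# Run it on the system where the program will run (Cheyenne here).
# Usage: build_model SOURCE.f90 EXECUTABLE
build_model() {
    local src=$1 exe=$2
    # Compiler, MPI library, compiler wrappers, and the netCDF library
    module load intel mpt ncarcompilers netcdf
    # ncarcompilers supplies the netCDF include and link flags,
    # so none are listed on the compile line
    mpif90 -o "$exe" "$src"
}
```

For example, `build_model my_model.f90 my_model` on a Cheyenne login node. Run the resulting executable with the same `mpt` module loaded.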

<h2 style="text-align:center;">Run Large Tasks on Compute Nodes Using Batch Jobs</h2>
<ul>
    <li>Many tasks require too many resources to run on a login node</li>
    <li>Schedule these tasks to run on the Cheyenne compute nodes using PBS or on Casper nodes using Slurm</li>
    <li>Jobs request a given number of compute tasks for an estimated wall-time on specified hardware</li>
    <li>Jobs use core-hours, which are charged against your selected project/account</li>
    <ul>
        <li>Remaining resources are viewable in SAM</li>
    </ul>
    <li>Programs often write temporary files; set the TMPDIR variable to a directory in your scratch space to avoid job failures</li>
</ul>
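In a bash job script (or at the start of an interactive session), pointing temporary files at scratch space might look like the following. The `temp` subdirectory name is an arbitrary choice, not a requirement.

```shell
# Send temporary files to GLADE scratch instead of limited node-local space
export TMPDIR=/glade/scratch/$USER/temp
# Create the directory if it does not exist yet
mkdir -p "$TMPDIR" || echo "Could not create $TMPDIR" >&2
```

tcsh users would write `setenv TMPDIR /glade/scratch/$USER/temp` instead of the `export` line.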

<div style="background-color:#84d0ff;">
<h2 style="text-align:center;">Example PBS Job Script</h2>
</div>

In [None]:
$ cat > basic_mpi.pbs << EOF
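As a sketch of a basic MPI job script for Cheyenne, the here-document below writes one to `basic_mpi.pbs`. `PROJECT_CODE` and `executable_name` are placeholders, and the node count, queue, and wall time are assumptions to adjust for your own work; it also assumes the default `mpt` module is loaded in the job environment.

```shell
# Write the job script; the quoted 'EOF' keeps $USER and $TMPDIR literal
cat > basic_mpi.pbs << 'EOF'
#!/bin/bash
#PBS -N basic_mpi
#PBS -A PROJECT_CODE
#PBS -q regular
#PBS -l walltime=01:00:00
#PBS -j oe
#PBS -l select=2:ncpus=36:mpiprocs=36

# Keep temporary files in scratch space
export TMPDIR=/glade/scratch/$USER/temp
mkdir -p "$TMPDIR"

# Launch 72 MPI tasks (2 nodes x 36) with MPT's launcher
mpiexec_mpt ./executable_name
EOF
```

Submit it with `qsub basic_mpi.pbs`. Because the delimiter is quoted, `$USER` and `$TMPDIR` are evaluated when the job runs, not when the file is written.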

<div style="background-color:#84d0ff">
<h2 style="text-align:center;">Example Slurm Job Script</h2>
</div>

In [None]:
# Display the example Slurm job script saved in text.txt
cat text.txt
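A comparable sketch of a single-task Slurm script for Casper. The partition name `dav`, the account placeholder `PROJECT_CODE`, and the analysis script `my_analysis.py` are assumptions; the sketch also assumes a python module is loaded when you submit, since `sbatch` passes your current environment to the job by default.

```shell
# Write the job script; the quoted 'EOF' keeps $USER and $TMPDIR literal
cat > basic_casper.slurm << 'EOF'
#!/bin/bash
#SBATCH --job-name=basic_casper
#SBATCH --account=PROJECT_CODE
#SBATCH --partition=dav
#SBATCH --ntasks=1
#SBATCH --time=01:00:00
#SBATCH --output=basic_casper.out.%j

# Keep temporary files in scratch space
export TMPDIR=/glade/scratch/$USER/temp
mkdir -p "$TMPDIR"

# Run a (hypothetical) serial analysis script
python my_analysis.py
EOF
```

Submit it with `sbatch basic_casper.slurm`; the `%j` in the output file name is replaced by the job ID.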

<h2 style="text-align:center;">Interacting with the Job Schedulers</h2>

<h3>PBS on Cheyenne</h3>
<p><code>qsub [script]</code> - submit batch job</p>
<p><code>qstat [jobid]</code> - query job status</p>
<p><code>qdel [jobid]</code> - delete/kill a job</p>
<p><code>qinteractive -A [project]</code> - Run an interactive job</p>
<p><code>qcmd -A [project] -- cmd.exe</code> - Run a command on a single compute node</p>

<h3>Slurm on DAV</h3>
<p><code>sbatch [script]</code> - submit batch job</p>
<p><code>squeue -j [jobid]</code> - query job status</p>
<p><code>scancel [jobid]</code> - delete/kill a job</p>
<p><code>execdav -A [project]</code> - Run interactive job on DAV</p>
<br>
<p><i>See our Casper tutorial and documentation for more details on requesting memory/GPUs with the execdav utility</i></p>
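Because `qsub` prints the new job ID, you can capture it and pass it straight to `qstat`. A minimal sketch; the function name `submit_and_check` is hypothetical.

```shell
# Hypothetical helper: submit a PBS script, then query the new job's status.
# Usage: submit_and_check SCRIPT
submit_and_check() {
    local script=$1 jobid
    # qsub writes the job ID to standard output; stop if submission fails
    jobid=$(qsub "$script") || return 1
    echo "Submitted $jobid"
    qstat "$jobid"
}
```

For example, `submit_and_check basic_mpi.pbs`. On Casper, `sbatch --parsable` similarly prints just the job ID, which you can pass to `squeue -j`.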

<h2 style="text-align:center;"> Using threads/OpenMP Parallelism on Cheyenne with MPT</h2>

<div style="background-color:#84d0ff;">
<h3>Only OpenMP</h3>
</div>

In [None]:
#!/bin/tcsh
#PBS -l select=1:ncpus=10:ompthreads=10
module load mpt/2.19f
# Run program with 10 threads
omplace ./executable_name

<div style="background-color:#84d0ff;">
<h3>Hybrid MPI/OpenMP</h3>
</div>

In [None]:
#!/bin/tcsh
#PBS -l select=2:ncpus=36:mpiprocs=1:ompthreads=36
module load mpt/2.19f
# Run program with one MPI task and 36 OpenMP
# threads per node (two nodes)
mpiexec_mpt omplace ./executable_name

<h2 style="text-align:center;">Running Serial Code on Multiple Data Files Using Command File Jobs</h2>

<div style="background-color:#84d0ff;">
<h3>cmdfile contents</h3>
</div>

In [None]:
./cmd1.exe < input1 > output1
./cmd2.exe < input2 > output2
./cmd3.exe < input3 > output3
./cmd4.exe < input4 > output4

<div style="background-color:#84d0ff;">
<h3>PBS Job script</h3>
</div>

In [None]:
#!/bin/tcsh
#PBS -l select=1:ncpus=4:mpiprocs=4
module load mpt/2.15f
# This setting is required to use command files
setenv MPI_SHEPHERD true
mpiexec_mpt launch_cf.sh cmdfile

<p><i>Note: This approach works best when the commands have similar runtimes.</i></p>
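Rather than typing a command file by hand, you can generate it with a loop. This sketch reproduces the four-line `cmdfile` shown above; it assumes the executables and input files already exist in the job's working directory.

```shell
# Write one command per line: ./cmdN.exe reads inputN and writes outputN
for i in 1 2 3 4; do
    echo "./cmd${i}.exe < input${i} > output${i}"
done > cmdfile
```

To scale up, extend the list (for example, `$(seq 1 36)`) and keep `ncpus` and `mpiprocs` in the PBS script equal to the number of lines, so each command gets its own MPI task.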

<h2 style="text-align:center;">Placing Casper Jobs on Specific Resources</h2>

<div style="background-color:#84d0ff;">
<h3>Casper Jobs on Specific Resources</h3>
</div>

<ul>
    <li>This job can only run on a node with 100 GB of free memory and 2 V100 GPUs</li>
    <li>If multiple resources are specified they must be compatible</li>
    <ul>
        <li>Otherwise, job will be stuck in a pending state</li>
    </ul>
</ul>
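A sketch of a Slurm script that requests the resources described above (100 GB of memory and 2 V100 GPUs on one node). The partition `dav`, the GPU type name `v100` in the `--gres` request, and `PROJECT_CODE` are assumptions; check the Casper documentation for the exact names.

```shell
# Write the job script; the quoted 'EOF' writes its lines unchanged
cat > gpu_job.slurm << 'EOF'
#!/bin/bash
#SBATCH --job-name=gpu_job
#SBATCH --account=PROJECT_CODE
#SBATCH --partition=dav
#SBATCH --time=01:00:00
#SBATCH --ntasks=1
# Only nodes with at least 100 GB of free memory are eligible
#SBATCH --mem=100G
# Two V100 GPUs on the same node (assumes GPU type name "v100")
#SBATCH --gres=gpu:v100:2

./executable_name
EOF
```

If no node can satisfy both the memory and GPU requests, the job will stay pending, as noted above.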

<h2 style="text-align:center;">PBS Job Submission Queues</h2>

<div style="background-color:#ceffff">
    <br>
<table>
    <tr><th>PBS Queue</th><th>Priority</th><th>Wall Clock</th><th>Details</th></tr>
    <tr><td>premium</td><td>1</td><td>12 h</td><td>Jobs are charged at 150% of regular rate</td></tr>
    <tr><td>regular</td><td>2</td><td>12 h</td><td>Most production compute jobs go here</td></tr>
    <tr><td>economy</td><td>3</td><td>12 h</td><td>Jobs are charged at 70% of regular rate</td></tr>
    <tr><td>share</td><td>N/A</td><td>6 h</td><td>Memory is shared among all users on a node; jobs are limited to 18 cores or fewer</td></tr>
</table>
    <br>
</div>

<p><i>Job charges depend on the queue type:</i></p>
<p><i>Exclusive:</i> wall-clock hours × nodes × 36 cores/node × queue factor</p>
<p><i>Shared:</i> core-seconds / 3600 (DAV jobs are also charged as shared)</p>
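The two charging formulas can be checked with a little arithmetic. In this sketch the function names are hypothetical, and the queue factors come from the table above (regular 1.0, premium 1.5, economy 0.7); `awk` handles the non-integer factors.

```shell
# Exclusive jobs: wall-clock hours x nodes x 36 cores/node x queue factor
exclusive_charge() {
    awk -v h="$1" -v n="$2" -v f="$3" 'BEGIN { printf "%.1f\n", h * n * 36 * f }'
}

# Shared jobs: cores x seconds / 3600 (core-seconds converted to core-hours)
shared_charge() {
    awk -v c="$1" -v s="$2" 'BEGIN { printf "%.1f\n", c * s / 3600 }'
}

exclusive_charge 3 2 1.0   # 3 h on 2 nodes, regular queue -> 216.0
exclusive_charge 3 2 1.5   # same job in premium           -> 324.0
exclusive_charge 3 2 0.7   # same job in economy           -> 151.2
shared_charge 4 5400       # 4 cores for 90 minutes         -> 6.0
```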

<h2 style="text-align:center;">When Running Programs with GUI (e.g., VAPOR), use a TurboVNC Session</h2>

<p>VNC can be used to run a remote GNOME/KDE desktop.</p>
<p>Usage:</p>

In [None]:
vncserver_submit -a [project]

<p>(Alternatively, set the DAV_PROJECT environment variable instead of passing <code>-a</code>.)</p>

<h2 style="text-align:center;"> Shell Startup Files - Customizing Your Default Environment</h2>

<h3>tcsh/csh</h3>

<h3>bash</h3>
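A sketch of typical additions to `~/.bashrc`, which bash reads for each new interactive shell. The directory, editor, and alias names are illustrative choices, not requirements. tcsh users would put the equivalents in `~/.tcshrc`, using `setenv VAR value` and `alias name 'command'`.

```shell
# Example additions to ~/.bashrc

# Put a personal bin directory at the front of the search path
export PATH="$HOME/bin:$PATH"

# Default editor for programs that open one (e.g., crontab -e)
export EDITOR=vim

# Shortcuts for common commands ($USER expands when the alias runs)
alias ll='ls -lh'
alias myjobs='qstat -u $USER'

# Do not add "module load" lines here; use "module save default" instead
```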

<h2 style="text-align:center;">Changing Your Default Modules</h2>

<ul>
    <li>If you commonly load certain modules, you may wish to have them load automatically when logging onto a cluster</li>
    <li>The right way to do so is with saved module sets:</li>
    <ul>
        <li><code>module load ncl python nco mkl</code></li>
        <li><code>module save default</code></li>
    </ul>
    <li>You can manually load a module set using <code>module restore [set]</code></li>
    <li> The <code>[set]</code> argument is the name of the module configuration you saved</li>
    <li>Avoid putting module load commands in your shell startup files!</li>
</ul>

<div style="background-color:#40e0d0;">
<h2 style="text-align:center;">CISL Help Desk/Consulting</h2>
</div>

<a href="https://www2.cisl.ucar.edu/user-support/getting-help">https://www2.cisl.ucar.edu/user-support/getting-help</a>

<ul>
    <li>Walk-in: ML 1B Suite 55</li>
    <li>Email: cislhelp@ucar.edu</li>
    <li>Phone: 303-497-2400</li>
</ul>
<p>For specific questions or feedback regarding this material:</p>
<ul>
    <li>Email: vanderwb@ucar.edu</li>
</ul>