Will Furnass
Research Software Engineering team, University of Sheffield
https://rse.shef.ac.uk/rse-dcs-pres-on-hpc/
- What is HPC? Why bother?
- HPC at TUOS
- Using HPC
- Further resources
- Edit and run code in one place
- ~4 cores - some parallelism
- Full control!
but...
- Limited RAM
- Basic CPU
- Limited number of cores
- Limited, fragile storage
- Modest GPU
- Limited network connection
- How to distribute work between laptops?
- How to run a series of tasks overnight?
- "1am: Check if run 3 finished & start run 4"
- "3am: Check if run 4 finished & start run 5"
- "5am: Check if run 5 finished & start run 6"
'HPC': a computer cluster with:
- Computing resources
- many nodes
- each with many cores, much RAM, maybe GPUs
- connected by fast networking
- Storage
- both shared and per-node
- resilient and fast
- Job management
- Queue up jobs to run over a week
- Command-line as default interface
- Linux OS + optimised research software
- Bessemer:
- Newest hardware
- Best for single-node jobs
- ShARC:
- Good for multi-node jobs as it has high-bandwidth, low-latency interconnects
- Majority of nodes public (free at point of use)
- But DCS has some private nodes in Bessemer and ShARC:
- Non-std hardware specs
- Less contention (sometimes!)
- 8x nodes, each with
- 4x NVIDIA V100 GPUs
- Fast NVLink interconnects between GPUs
- 192 GB RAM
https://docs.hpc.shef.ac.uk/en/latest/bessemer/groupnodes/
- 1x node with
- 8x NVIDIA P100 GPUs (NB 1x GPU currently faulty)
- Fast NVLink interconnects between GPUs
- 512 GB RAM
- 8x nodes each with
- 768 GB RAM
- 5 with 1 TB SSDs
- 4x nodes each with
Can also run Jupyter Notebooks on the cluster!
Ask for more info
- SSH-based methods are your friends here (rsync, scp, sftp)
- Or:
- Use a storage area directly accessible to both your local machine and HPC?
- Just use HPC?
Location | Shared? | Quota | Backups? | Multi-HPC |
---|---|---|---|---|
/home/$USER | ✓ | 10 GB | ✓ | ✓ |
/data/$USER | ✓ | 100 GB | ✓ | ✓ |
/fastdata/$USER | ✓ | - | ✗ | ✓ |
/scratch | ✗ | - | ✗ | ✗ |
/shared/$PROJNAME | ✓ | 10 TB | ✗ | ✓ |
Location | Remote access? | Speed | Suited to |
---|---|---|---|
/home/$USER | SSH | > | Personal data |
/data/$USER | SSH | > | Personal data |
/fastdata/$USER | SSH | >>> | Temporary big files |
/scratch | - | >>> | Temporary small files |
/shared/$PROJNAME | SSH + CIFS | > | Project files |
- Users submit jobs to a job scheduler
- e.g. Slurm (Bessemer) or SGE (ShARC)
- A distributed resource manager
- Not intuitive!
- V. powerful
- Request
- Interactive or batch job
- Run time (e.g. 2h or 4d)
- Computational resources (cores, RAM, GPUs)
- Access to private resources
- Notifications
- Type of job
- Interactive sessions (if resources are available)
- Batch jobs (submit job to a queue)
[me@mylaptop ~]$ ssh te1st@bessemer.sheffield.ac.uk
...
[te1st@bessemer-login1 ~]$ srun \
--partition=dcs-gpu-test \
--account=dcs-res \
--cpus-per-gpu=4 \
--mem-per-cpu=2G \
--gpus=1 \
--pty \
/bin/bash
[te1st@bessemer-node030 ~]$ ./my_simulation_program --num-cores=4
...
Create a shell script, my-job-script.slurm:
#!/bin/bash
#SBATCH --partition=dcs-gpu
#SBATCH --account=dcs-res
#SBATCH --cpus-per-gpu=4
#SBATCH --mem-per-gpu=2G
#SBATCH --gpus=4
#SBATCH --mail-user=me@sheffield.ac.uk
./my_simulation_program --num-cores=16
Copy this file to Bessemer then log on to Bessemer and submit this to Slurm:
[me@mylaptop ~]$ ssh te1st@bessemer.sheffield.ac.uk
[te1st@bessemer-login1 ~]$ sbatch my-job-script.slurm
Now go home for dinner!
You can then:
- Wait for an email notification
- Check status (running/queueing)
- Cancel/amend job
- Run short test jobs
- View resource utilisation
- Extrapolate
- Submit larger jobs
- Compilers, libraries, apps, dev tools etc.
- Activate a package by loading a modulefile, e.g.
module load $MODULENAME
where, for e.g. cuDNN, $MODULENAME could be one of:
libs/cudnn/4.0/binary-cuda-7.5.18
libs/cudnn/5.1/binary-cuda-7.5.18
libs/cudnn/5.1/binary-cuda-8.0.44
libs/cudnn/6.0/binary-cuda-8.0.44
libs/cudnn/7.0/binary-cuda-8.0.44
libs/cudnn/7.0/binary-cuda-9.1.85
...
Several options:
- Install non-optimised binary packages in e.g. your home directory
- Conda
- Build optimised software stacks from source
- Spack, EasyBuild
- Run containers
- Singularity - similar to Docker
All useful for e.g. provisioning/using complex Deep Learning software stacks!
- Laptop may be faster than a single-core job on HPC:
- CPUs in servers run at lower clock speeds
- Executables may not exploit advanced CPU features
- Performance often comes from one or more of:
- Optimising for CPU architecture
- CPU parallelism (multiple cores, multiple nodes)
- Accelerators (GPUs, TPUs, Xeon Phi etc)
- In Bessemer and ShARC the CPUs support
- hardware vectorisation (same instruction applied to multiple elements in memory)
- fused multiply-add (useful for matrix multiplication)
- Either use pre-compiled libraries that can dynamically use these
- e.g. Intel Math Kernel Library (MKL)
- or compile to produce builds optimised for those CPUs
(At least) 5 flavours:
- Single node
- Multiple CPU cores
- Typically 1 thread per core
- Thread-local and shared variables
- Many applications do this via OpenMP and/or Intel MKL
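The shared-address-space model above (what OpenMP provides for C/Fortran) can be illustrated with Python threads. Note this is a sketch of the memory model only, not a performance recipe: CPython's GIL means pure-Python threads don't give true CPU parallelism.

```python
# Shared vs thread-local variables: threads share one address space,
# so shared state needs synchronisation while thread-local state does not.
import threading

counter = 0                    # shared between all threads
lock = threading.Lock()        # protects the shared variable
local = threading.local()      # each thread sees its own copy

def work(n):
    global counter
    local.scratch = 0          # thread-local: no lock needed
    for _ in range(n):
        local.scratch += 1
    with lock:                 # shared: must synchronise
        counter += local.scratch

threads = [threading.Thread(target=work, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 4000
```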
- Multiple CPU cores
- Typically 1 process per core
- Separate address spaces
- Data (and code?) passed between processes
- e.g. joblib with multiprocessing or ipyparallel; MATLAB parfor; R parallel/foreach
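A minimal sketch of this flavour, with a hypothetical simulate() standing in for real work: one worker process per core, separate address spaces, data pickled between them.

```python
# Single-node, multi-process parallelism (the model behind
# multiprocessing/joblib): independent workers, results collected in order.
from multiprocessing import Pool

def simulate(x):
    # placeholder for an expensive, independent computation
    return x * x

def run_parallel(inputs, nproc=4):
    with Pool(processes=nproc) as pool:       # roughly one process per core
        return pool.map(simulate, inputs)     # data passed via pickling

if __name__ == "__main__":
    print(run_parallel(range(8)))  # [0, 1, 4, 9, 16, 25, 36, 49]
```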
- Multiple CPU cores per node (symmetric?)
- Typically 1 process per core
- Separate address spaces per process
- Data (and code?) passed between processes
- within a node
- between nodes
- V. fast interconnects between nodes
- Facilitated by
- MPI (API + software for exploiting fast interconnects)
- Apps/libs that understand MPI (ipyparallel, PETSc, MATLAB DCE)
- Set of near-identical tasks
- Embarrassingly parallel
- Each scheduled separately by the job scheduler
- Which then packs out its schedule with them!
- Great for sensitivity analyses
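On Slurm this pattern maps onto job arrays: one submission (e.g. sbatch --array=0-3) creates many near-identical tasks, each given its own SLURM_ARRAY_TASK_ID. A sketch of a script driven that way (the parameter values are made up):

```python
# One script serves the whole task array: Slurm sets SLURM_ARRAY_TASK_ID
# per array element, and each task uses it to pick its own parameter.
import os

PARAMS = [0.1, 0.2, 0.5, 1.0]  # hypothetical: one value per array task

def param_for_task(task_id):
    return PARAMS[task_id]

if __name__ == "__main__":
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    print(f"Task {task_id} running with parameter {param_for_task(task_id)}")
```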
- Massive data parallelism
- Very effective for linear-algebra-heavy ops
- ML, DL
- Can either write low-level code in CUDA
- Or use higher-level libs that speak CUDA
- Tensorflow, PyTorch etc for DL
High-level APIs for working with large datasets, possibly out of core:
- Spark
- Dask
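The core idea these libraries build on can be sketched by hand: process data in chunks so the full dataset never has to fit in memory at once. Spark and Dask add lazy task graphs and parallel/distributed execution on top of this. A toy illustration:

```python
# Out-of-core processing by hand: only one chunk of the data is
# materialised at a time, then per-chunk results are combined.
def chunks(iterable, size):
    """Yield successive lists of at most `size` items."""
    chunk = []
    for item in iterable:
        chunk.append(item)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

def chunked_sum(data, size=1000):
    # combine per-chunk partial results into the final answer
    return sum(sum(c) for c in chunks(data, size))

print(chunked_sum(range(10_000)))  # 49995000
```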
- Potential issues
- Jobs too big / queue times too long for ShARC?
- Want newer GPUs/processors?
- Options
- JADE/JADE2: Tier 2 HPC facilities for Deep Learning
- JADE: 22x DGX-1 systems: 22x 8x NVIDIA V100 cards (NVLINK between GPUs in nodes)
- JADE2 (pilot phase): similar to JADE but with 63 nodes instead of 22
- Bede: new N8 Tier 2 HPC facility for distributed DL/ML
- 32x IBM AC922 nodes (2x POWER9 CPU; 4x V100 GPU; NVLINK between GPUs and CPUs)
- 4x IBM IC922 'inference' nodes with T4 GPUs
- 100Gbps Infiniband EDR interconnects
- Better suited to hybrid CPU+GPU codes and scaling to multiple nodes than JADE/JADE2
- Options (continued)
- Other Tier 2 facilities (https://www.hpc-uk.ac.uk/facilities/)
- Tier 1 HPC facility: Archer
- Cloud (AWS, Azure, GCP etc)
- Alces Flight - traditional HPC in the cloud
- Docs: https://docs.hpc.shef.ac.uk (not a tutorial)
- For DCS nodes: https://docs.hpc.shef.ac.uk/en/latest/sharc/groupnodes/
- Workshops
- RSE team runs various workshops on fundamentals:
- UNIX shell, Git, Python/R/MATLAB, relational databases...
- and more advanced topics:
- multithreading/multiprocessing, CUDA, deep learning...
- IT Services also offer training in C/C++, Fortran, Python, MATLAB and HPC
- IT Services' helpdesk
- Talks
- LunchBytes talks
- Code Clinic
- Book an appointment to get help with a coding issue
- Hire an RSE to help with your project(s)!
- Either as part of a grant proposal
- Or just for a few days
For more info (inc. mailing list and events schedule) see https://rse.shef.ac.uk/.
- 13.5 RSEs
- Team kick-started by 2x EPSRC RSE fellowships
- Based in Computer Science
- but work closely with IT Services
- Some current and recent projects:
- High-performance agent-based modelling (CUDA)
- Deep learning and workflows for NLP
- MRI image alignment (registration) software (C++/PETSc)
- Agile web apps for visualising datasets (R/Shiny)
- Augmenting cell modelling software (C++)
rse@sheffield.ac.uk