# HPC Architecture

## Overview
- HPC Job Scheduling
- Performance analysis
- Debugging
- Profiling
- Petascale and exascale computing

## Cluster Terminology


## What is a Job
- Job
    - The user's program (name of the executable)
    - Input data and parameters
    - Environment variables
    - Required libraries
    - Description of the computing resources required
- Job script
    - A formal specification of the job
    - Identifies the application to run, along with its input data and environment variables
    - Requests computing resources
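
A minimal sketch of how the components of a job map onto a job script, written as a hypothetical shell helper `make_job_script`. It assumes Slurm `#SBATCH` syntax; the executable `./myapp`, the file `input.dat`, and the resource values are placeholders, and real module names are site-specific.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a Slurm job script from the components of a job.
# make_job_script OUTFILE EXECUTABLE INPUT NODES TASKS WALLTIME
make_job_script() {
  local out=$1 exe=$2 input=$3 nodes=$4 tasks=$5 walltime=$6
  # Unquoted EOF: the $variables below are expanded when the file is written.
  cat > "$out" <<EOF
#!/bin/bash
#SBATCH --nodes=$nodes
#SBATCH --ntasks=$tasks
#SBATCH --time=$walltime

# Environment variables and required libraries
export OMP_NUM_THREADS=1
# module load <library>    (module names are site-specific)

# The user's program with its input data
$exe $input
EOF
}

# Example with placeholder values: 1 node, 4 tasks, 30 minutes.
make_job_script job.slurm ./myapp input.dat 1 4 00:30:00
```

The resource requests become `#SBATCH` directives at the top, while the environment, libraries, and the program invocation become ordinary shell lines below them.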

## Local Resource Manager (LRM) / Batch System

- Job scheduler or workload manager
    - Identifies which jobs to run, selects the resources for each job, and decides when to run it.
- Resource manager
    - Identifies the compute resources, tracks their usage, and feeds this information back to the workload manager.
- Execution manager
    - Coordinates job initiation and the start of execution on the allocated resources.
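
To make the resource manager's bookkeeping concrete, here is a toy model (not Slurm code) that tracks free cores per node and answers "which node can host a job needing N cores?". The node names, core counts, and the helpers `find_node`, `allocate`, and `release` are all hypothetical; associative arrays require bash 4 or later.

```shell
#!/usr/bin/env bash
# Toy model of the resource-manager role: free cores per node (made-up values).
declare -A free_cores=( [node01]=4 [node02]=16 [node03]=8 )

# find_node CORES -> prints the first node (by name) with enough free cores.
find_node() {
  local need=$1 node
  for node in $(printf '%s\n' "${!free_cores[@]}" | sort); do
    if (( free_cores[$node] >= need )); then
      echo "$node"
      return 0
    fi
  done
  return 1    # nothing fits: the scheduler would keep the job queued
}

# allocate/release NODE CORES -> update the usage the scheduler relies on.
allocate() { free_cores[$1]=$(( free_cores[$1] - $2 )); }
release()  { free_cores[$1]=$(( free_cores[$1] + $2 )); }

# The scheduler asks for a node; the execution manager would then start the job there.
node=$(find_node 8) && allocate "$node" 8
echo "job placed on $node; free cores left there: ${free_cores[$node]}"
```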

## Job Scheduling
- The LRM receives and parses the job script.
- If a job cannot be executed immediately (for example, because the requested resources are busy), it is added to a queue.
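
A minimal sketch of the queue in action, assuming the simplest policy (FCFS, listed below): jobs start in submission order until the first one that does not fit, and that job blocks everything behind it. The helper `fcfs_start` and the job names and sizes are made up for illustration.

```shell
#!/usr/bin/env bash
# Strict FCFS sketch.
# fcfs_start FREE_CORES NAME:CORES...   -> prints the jobs that can start now
fcfs_start() {
  local free=$1
  shift
  local entry name cores
  for entry in "$@"; do
    name=${entry%%:*}
    cores=${entry##*:}
    if (( cores > free )); then
      break    # FCFS never lets a later job overtake the blocked one
    fi
    echo "$name"
    free=$(( free - cores ))
  done
}

# 32 free cores: jobA (16) starts; jobB (24) does not fit in the remaining 16,
# so jobB waits, and jobC (8) waits behind it even though it would fit.
fcfs_start 32 jobA:16 jobB:24 jobC:8
```

The idle cores left behind the blocked head job are exactly what backfilling tries to use.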

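## Job Scheduling Policies
- FCFS (first come, first served): jobs start in the order they were submitted.
- Multi-priority queues: jobs are placed in queues of different priority, and higher-priority queues are served first.
- Backfilling: lower-priority jobs are started on idle resources as long as they do not delay the start of higher-priority jobs.
- Fair-share: job priority is adjusted according to how much of its allotted share a user or group has recently consumed.
- Preemptive: a higher-priority job may suspend, requeue, or cancel a running lower-priority job to obtain its resources.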
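
The backfilling rule can be sketched as a simple check, in the spirit of EASY backfilling: the job at the head of the queue holds a reservation at a "shadow" time, and a later job may start now only if it fits in the idle cores and its time limit ends before that reservation. This is a simplified illustration, not Slurm's implementation; the helper names, times (in minutes), and job sizes are hypothetical.

```shell
#!/usr/bin/env bash
# can_backfill FREE_NOW CORES WALLTIME NOW SHADOW
# True if the job fits in the idle cores now AND finishes before SHADOW,
# the reserved start time of the job at the head of the queue.
can_backfill() {
  local free=$1 cores=$2 walltime=$3 now=$4 shadow=$5
  (( cores <= free && now + walltime <= shadow ))
}

# backfill_candidates FREE_NOW NOW SHADOW NAME:CORES:WALLTIME...
# Walks the jobs queued behind the head job and prints those that may start now.
backfill_candidates() {
  local free=$1 now=$2 shadow=$3
  shift 3
  local entry name cores walltime
  for entry in "$@"; do
    IFS=: read -r name cores walltime <<< "$entry"
    if can_backfill "$free" "$cores" "$walltime" "$now" "$shadow"; then
      echo "$name"
      free=$(( free - cores ))   # these cores are now busy
    fi
  done
}

# 16 cores idle at t=0; the head job (needs 32 cores) is reserved for t=60.
# jobC and jobF fit and end in time; jobD runs too long; jobE is too large.
backfill_candidates 16 0 60 jobC:8:30 jobD:8:90 jobE:24:10 jobF:8:45
```

Because the decision relies on each job's time limit, a realistic `--time` request makes a job easier to backfill. A full EASY scheduler would also let longer jobs run on cores that the head job will not need at its reserved start; this sketch omits that refinement.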

## Job Execution on Compute Nodes
- User -> Job script -> Login node (head node) -> Network switch -> Compute nodes
- Job script: based on the resources it requests, the batch system selects the appropriate node(s) for execution; the executable and input files are copied to the compute nodes and the job is started.
- Login node: used to monitor the status of submitted jobs. It is not meant to run computations; only the compute nodes execute jobs.
- After execution, the output files are written to the location specified by the user.
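
The copy-in, run, copy-out flow can be sketched inside a job script as follows. It assumes node-local scratch space under `${TMPDIR:-/tmp}`; `input.dat` and the `wc -l` "application" are placeholders for real inputs and executables, and the script creates a demo input only so it can run stand-alone.

```shell
#!/usr/bin/env bash
# Stage-in / run / stage-out sketch (placeholder input and application).
set -euo pipefail

submit_dir=${SLURM_SUBMIT_DIR:-$PWD}   # Slurm sets this to the sbatch directory
job_id=${SLURM_JOB_ID:-local}          # falls back to "local" outside Slurm
scratch=$(mktemp -d "${TMPDIR:-/tmp}/job_${job_id}_XXXX")

# Demo input so the sketch runs on its own; a real job would already have one.
[[ -f "$submit_dir/input.dat" ]] || printf 'a\nb\nc\n' > "$submit_dir/input.dat"

cp "$submit_dir/input.dat" "$scratch/"                # stage in
( cd "$scratch" && wc -l < input.dat > output.dat )   # run (placeholder application)
cp "$scratch/output.dat" "$submit_dir/"               # stage out to the user's location
rm -rf "$scratch"                                     # clean up node-local scratch
```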

## Local Resource Manager - SLURM
- When you log in to an HPC cluster, you land on a login node.
    - Login nodes are not meant to run jobs.
    - They are used to submit jobs to the compute nodes.
- To submit a job to the cluster, you write a job script for the scheduler.
- SLURM: Simple Linux Utility for Resource Management.
- It is a local resource manager that provides a framework for job queues, allocation of compute nodes, and starting and executing jobs.
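
One way to respect the "no jobs on login nodes" rule is a guard at the top of compute-heavy scripts. This sketch assumes that being inside a Slurm allocation is signalled by `SLURM_JOB_ID`, which Slurm exports to the jobs it starts; the function name `require_allocation` is hypothetical.

```shell
#!/usr/bin/env bash
# Refuse to run heavy work unless we are inside a Slurm job (i.e. not on a login node).
require_allocation() {
  if [[ -z "${SLURM_JOB_ID:-}" ]]; then
    echo "Not inside a Slurm job; submit with sbatch or use srun instead." >&2
    return 1
  fi
  echo "Running in job $SLURM_JOB_ID on ${HOSTNAME:-unknown}"
}

# Typical use at the top of a compute-heavy script:
#   require_allocation || exit 1
```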

## SLURM Components
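- Client commands (sbatch, squeue, sinfo, ...)
- Compute node daemons
    - slurmd: runs on every compute node; launches and monitors the job's tasks
- Controller daemons
    - slurmctld: central manager that monitors resources and schedules work
    - secondary slurmctld: optional backup controller
    - slurmdbd: records accounting information in the database
- Database
- Other clusters (optional)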

## SLURM Commands
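The most commonly used Slurm commands (see the lecture slides for details):
- sbatch -> submit a batch job script
- squeue -> info on the job queue
- sinfo -> info on all nodes and partitions, and their availability
- scancel -> cancel a pending or running job
- scontrol -> view or modify Slurm state (jobs, nodes, partitions), e.g. scontrol show job \<jobid\>; most modifications require administrator rights
- sacct -> accounting information for running and completed jobs
- srun -> launch parallel tasks, or start an interactive job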
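
These commands are easy to script. A sketch that summarises one user's jobs by state using the `squeue` options `-h` (no header), `-u` (filter by user) and `-o "%T"` (print only the job state); the function name `job_state_summary` is hypothetical.

```shell
#!/usr/bin/env bash
# job_state_summary [USER] -> prints "STATE COUNT" lines, e.g. "PENDING 2".
job_state_summary() {
  local user=${1:-$USER}
  squeue -h -u "$user" -o "%T" |
    awk '{ count[$1]++ } END { for (s in count) print s, count[s] }' |
    sort
}

# On the cluster:  job_state_summary "$USER"
```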



## SLURM: Sample Job Script for Serial Jobs
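
A minimal serial job script, offered as a sketch: one node, one task. The job name, output file names, and the summation loop standing in for a real program are placeholders. Many sites also expect a `--partition` line; check `sinfo` for the partition names on your cluster.

```shell
#!/bin/bash
#SBATCH --job-name=serial_sum      # job name shown by squeue
#SBATCH --nodes=1                  # a serial job needs one node...
#SBATCH --ntasks=1                 # ...and one task (one core)
#SBATCH --time=00:10:00            # wall-clock limit HH:MM:SS
#SBATCH --output=serial_%j.out     # %j expands to the job ID
#SBATCH --error=serial_%j.err

# Slurm starts batch jobs in the submission directory by default; make it explicit.
cd "${SLURM_SUBMIT_DIR:-$PWD}"

# Placeholder serial workload: sum 1..1000 in a single process.
# Replace this loop with your own executable, e.g. ./my_program input.dat
total=0
for (( i = 1; i <= 1000; i++ )); do
  total=$(( total + i ))
done
echo "$total" > sum.txt
echo "Job ${SLURM_JOB_ID:-none} finished on ${HOSTNAME:-unknown}"
```

Submit it with `sbatch`, then follow it with `squeue -u <username>`.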


## SLURM: Sample Job Script for Parallel Jobs on GPUs
```bash
#!/bin/bash
#SBATCH -N 1                        # number of nodes
#SBATCH --ntasks-per-node=40        # tasks per node (one core each by default)
#SBATCH --output=3mm.out            # file for standard output
#SBATCH --error=3mm.err             # file for standard error
#SBATCH --time=01:00:00             # wall-clock time limit (HH:MM:SS)
#SBATCH --gres=gpu:2                # request 2 GPUs per node
#SBATCH --partition=gpu             # partition (queue) name

export OMP_NUM_THREADS=40           # OpenMP threads used by the application
# the command that launches the GPU application goes here
```
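
Rather than hard-coding the thread count, a GPU job script can derive its settings from variables Slurm exports. This sketch assumes the cores are requested with `--cpus-per-task=40` (together with `--ntasks-per-node=1`) so that `SLURM_CPUS_PER_TASK` is set, and that the site's GPU configuration exports `CUDA_VISIBLE_DEVICES` for the allocated GPUs; `slurm_gpu_summary` is a hypothetical helper.

```shell
#!/bin/bash
# Print "threads=<n> gpus=<m>" for the current allocation.
slurm_gpu_summary() {
  local threads=${SLURM_CPUS_PER_TASK:-1} count=0 ids
  if [[ -n "${CUDA_VISIBLE_DEVICES:-}" ]]; then
    IFS=, read -ra ids <<< "$CUDA_VISIBLE_DEVICES"   # e.g. "0,1" -> 2 GPUs
    count=${#ids[@]}
  fi
  echo "threads=$threads gpus=$count"
}

# Use the allocated cores for OpenMP (falls back to 1 outside such a job).
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
slurm_gpu_summary
```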

## Commands executed in Param Utkarsh:
- sinfo
- squeue
- ls /tmp/slurm-samples
- cp -r /tmp/slurm-samples ~/
- cd slurm-samples
- sbatch test1.slurm
- ls -lrt
- squeue -u \<username\>
- sbatch test2.slurm
- cp test1.slurm test-sleep.slurm
- vim test-sleep.slurm
- (Add the line "sleep 120" so that the job sleeps for 120 seconds.)
- sbatch test-sleep.slurm
- squeue -u \<username\>
- scontrol show job \<jobid\>
- scancel \<jobid\>
- sacct -u \<username\>
- srun --nodes=1 --ntasks-per-node=1 --time=00:05:00 --pty bash
- (This srun command queues an interactive job and waits until resources are allocated to its job ID; a shell then opens on the allocated compute node.)
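
The copy-and-edit step in this walkthrough can be done without an editor. A sketch with a hypothetical helper `make_sleep_job`; the commented lines show how it might be combined with `sbatch --parsable`, which prints only the job ID, to drive the remaining steps on the cluster.

```shell
#!/usr/bin/env bash
# make_sleep_job SRC DST [SECONDS] -> copy a job script and append a sleep,
# so the job stays in the queue long enough to inspect or cancel it.
make_sleep_job() {
  local src=$1 dst=$2 secs=${3:-120}
  cp "$src" "$dst"
  printf '\nsleep %s\n' "$secs" >> "$dst"
}

# On the cluster:
#   make_sleep_job test1.slurm test-sleep.slurm 120
#   jobid=$(sbatch --parsable test-sleep.slurm)
#   squeue -u "$USER"
#   scontrol show job "$jobid"
#   scancel "$jobid"
```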