# P-Cluster Library and Job Management

This tutorial introduces the [Spack](https://spack.io) package manager and the [Slurm](https://slurm.schedmd.com) job manager tools available on the P-Cluster.

## The Spack Package Manager

*Spack is a flexible package manager designed for building and managing multiple software versions in high-performance computing environments. It allows users to easily install software with different configurations, dependencies, and compilers without interference between installations. Spack supports reproducibility and portability, making it ideal for complex scientific workflows across different systems.* - ChatGPT

### Setting up your environment for Spack
The [example .bashrc](./example.bashrc) initializes the Spack profile. One can use Spack to install software and generate new modules even as a non-root user, although the collection of modules already available on the P-Cluster is extensive.
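
As a rough sketch of what non-root use of Spack can look like once the profile is initialized: the package name `zlib` below is an arbitrary illustration, not something this tutorial depends on, and the guard simply prints a hint when Spack is not set up in the current shell.

```shell
# Hypothetical example: zlib is an arbitrary small package chosen for illustration.
pkg="zlib"

if command -v spack >/dev/null 2>&1; then
    spack find              # list packages that are already installed
    spack info "$pkg"       # show known versions and build options
    spack install "$pkg"    # build and install the package (can take a while)
    spack load "$pkg"       # make the installed package usable in this shell
else
    echo "spack not found: check that your .bashrc initializes Spack"
fi
```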

### Adding software to your environment using the `module` Command

The `module` command is used on the P-Cluster to manage environment modules. Modules allow users to load, unload, and switch between different software environments without manually modifying environment variables like `PATH` and `LD_LIBRARY_PATH`. This command is especially useful for managing multiple versions of software or libraries in shared environments.

When you load a module, it configures your environment to use a specific version of software. You can list available modules, load and unload modules, and reset your environment using various `module` subcommands.

Below is a list of common `module` commands and their functions:

| Command             | Description                                                                            | Example                    |
|---------------------|----------------------------------------------------------------------------------------|----------------------------|
| `module avail`      | Lists all available modules that can be loaded.                                         | `module avail`             |
| `module list`       | Shows a list of currently loaded modules in your environment.                           | `module list`              |
| `module load`       | Loads a specific module into your environment, making the software available for use.   | `module load gcc/9.3.0`    |
| `module unload`     | Unloads a specific module, removing it from your environment.                           | `module unload gcc/9.3.0`  |
| `module purge`      | Unloads all currently loaded modules, resetting your environment.                       | `module purge`             |
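
The commands in the table can be combined into a short session like the sketch below. The module name `gcc/9.3.0` is copied from the table and may not match what is installed; `module avail` shows the actual names. The guard only prints a hint when `module` is not defined in the current shell.

```shell
# gcc/9.3.0 is the example version from the table; check `module avail` for real names.
mod="gcc/9.3.0"

if command -v module >/dev/null 2>&1; then
    module purge            # start from a clean environment
    module avail gcc        # list the gcc modules that are installed
    module load "$mod"      # make this compiler version available
    module list             # confirm what is currently loaded
    gcc --version           # check which compiler is now on PATH
    module unload "$mod"    # remove it again when finished
else
    echo "module command not found: run this in a login shell on the cluster"
fi
```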


## The Slurm Batch System

*SLURM (Simple Linux Utility for Resource Management) is an open-source batch scheduling system widely used in high-performance computing (HPC) environments to manage and allocate computational resources. It enables users to submit, schedule, and manage jobs on clusters, ensuring efficient use of available nodes and resources. SLURM provides flexible scheduling policies, supports parallel and distributed workloads, and includes features for job prioritization and resource accounting.* - ChatGPT

### Common Slurm commands

| Command   | Description                                                                 | Example                                |
|-----------|-----------------------------------------------------------------------------|----------------------------------------|
| `sbatch`  | Submits a job script to the Slurm scheduler.                                | `sbatch job_script.sh`                 |
| `scancel` | Cancels a pending or running job.                                           | `scancel <job_id>`                     |
| `squeue`  | Displays information about jobs in the queue.                              | `squeue`                                |
| `sinfo`   | Displays information about available Slurm nodes and partitions.            | `sinfo`                                |
| `salloc`  | Allocates resources for a job interactively.                               | `salloc --ntasks=2 --ntasks-per-node=2 --partition=sealevel-c5xl-demand --time=01:00:00`    |
| `srun`    | Submits a job or launches parallel tasks (can be used in a script or interactively). | `srun --ntasks=4 ./my_program`|

There are numerous resources on the web about how to use Slurm. One useful example can be found [here](https://hpc.nmsu.edu/discovery/slurm/commands/#_slurm_script_main_parts).
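
For orientation, a minimal batch script might look like the sketch below. The partition name is taken from the `salloc` example in this tutorial; the job name, time limit, and output file name are illustrative assumptions. Because `#SBATCH` lines are ordinary comments to bash, the script also runs outside Slurm, where the fallback values are used.

```shell
#!/bin/bash
# Hypothetical job name, time limit, and output file; adjust to your needs.
# In the output file name, %j expands to the job ID.
#SBATCH --job-name=hello
#SBATCH --partition=sealevel-c5xl-demand
#SBATCH --ntasks=2
#SBATCH --ntasks-per-node=2
#SBATCH --time=00:10:00
#SBATCH --output=hello_%j.out

# Slurm sets these variables inside a job; the defaults let the script run standalone.
job_id="${SLURM_JOB_ID:-none}"
ntasks="${SLURM_NTASKS:-1}"

echo "Job ${job_id} on $(uname -n) with ${ntasks} task(s)"
```

Saved as `job_script.sh`, this would be submitted with `sbatch job_script.sh`, and `squeue` would show it while it is pending or running.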

Note that the head node (the machine at 34.210.1.198 where you first log in) has very limited resources. It is suitable for editing, but not for intensive data analysis or processing. We recommend using `salloc` to request an interactive node for heavy data processing.


### Starting an interactive node

As stated above, the head node is not suited to intensive data analysis or processing, so we recommend requesting an interactive node with `salloc` for that work.

For instance, to start an interactive node with machine type [c5.xlarge](https://instances.vantage.sh/aws/ec2/c5.xlarge), issue the following command from your home directory:
```
salloc --ntasks=2 --ntasks-per-node=2 --partition=sealevel-c5xl-demand --time=01:00:00 
```
This command requests an interactive node on the partition called `sealevel-c5xl-demand`, with two tasks (a term similar to processes) and a time limit of one hour.

On Slurm systems, a "partition" is a set of compute nodes grouped for job submission. Each partition defines a collection of resources with particular attributes or policies, such as job limits or access to specific hardware. The equivalent of a "partition" on [PBS (Portable Batch System)](https://altair.com/pbs-professional) is the "queue."

After issuing the `salloc` command and waiting a few minutes, one receives a message like the following, which includes the job identification number (ID), here 128:
```
salloc: Granted job allocation 128
salloc: Waiting for resource configuration
```
**Slurm may take several minutes to allocate and configure the requested resources.** Once the resources are ready, the prompt will appear as follows:
```
salloc: Nodes sealevel-c5xl-demand-dy-c5xlarge-1 are ready for job
USERNAME@ip-10-20-22-69:~$ 
```
Then, one can run commands or executable scripts. When the interactive node is no longer needed, use `scancel JOB_ID` to end the job and release the requested resources.
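
A short sketch of working inside the allocation is shown below. It relies on the `SLURM_JOB_ID` environment variable, which Slurm sets in the shell started by `salloc`, so the job ID does not have to be typed by hand; outside an allocation the snippet only prints a hint.

```shell
# SLURM_JOB_ID is set by Slurm in the shell that salloc starts.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    squeue -j "$SLURM_JOB_ID"    # show the state of this allocation
    srun hostname                # run one copy of hostname per allocated task
    scancel "$SLURM_JOB_ID"      # release the resources when finished
    status="released"
else
    status="no allocation"
    echo "Not inside a Slurm allocation: run salloc first"
fi
```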

In addition to `salloc`, `srun` can also be used to request an interactive node, though it is often used to run a specific script or job. `salloc`, on the other hand, allows users to run multiple commands once the resources are allocated.
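
The two uses of `srun` can be sketched as follows. The partition name is copied from the `salloc` example above, and the task count and time limits are illustrative assumptions; `--pty` asks Slurm to attach a terminal so that the shell is interactive.

```shell
# Partition name copied from the salloc example; limits are illustrative.
partition="sealevel-c5xl-demand"

if command -v srun >/dev/null 2>&1; then
    # Run a single command on a compute node, then return to the head node.
    srun --ntasks=1 --partition="$partition" --time=00:05:00 hostname

    # Start an interactive shell on a compute node; type `exit` to leave it.
    srun --ntasks=1 --partition="$partition" --time=01:00:00 --pty bash -i
else
    echo "srun not found: run this on the cluster head node"
fi
```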