# 1: Choosing a System

Princeton University has a range of high-performance computing resources that are available to faculty and students. Broadly speaking, there are two types of systems, small and large, which are described in more detail below. The smaller systems are meant for exploration, small production runs, and visualization, and are available upon request to any Princeton University user. The larger systems are meant for computationally demanding research. Getting access to them requires either a proposal or sponsorship by a faculty member who already uses one of them.

## 1.1: Small Systems

The small systems offer users more computational power and storage than what is available on a personal laptop. They offer capabilities for running small parallel jobs, post-processing of data, and remote visualization. Getting started with one of the small systems is also a natural way to become familiar with how Princeton University's HPC resources work, before moving on to using one of the larger systems.

There are three systems available in this category:

* **[NOBEL](https://researchcomputing.princeton.edu/systems-and-services/available-systems/nobel)**
  - Great starting point for computational researchers and students.
  - Commercially licensed software available (Matlab, Intel compilers, etc.).
  - 56 cores distributed on 2 nodes (28 cores per node).
  - No job scheduling system.
  - 5GB storage per user.
  - Frequently used for course work.
  - Princeton University users have access by default.
* **[ADROIT](https://researchcomputing.princeton.edu/systems-and-services/available-systems/adroit)**
  - Use to run small parallel jobs and get familiar with the job scheduling system before moving to a bigger system.
  - Commercially licensed software available (Matlab, Intel compilers, etc.).
  - 288 cores distributed on 9 nodes (32 cores per node).
  - 16 GPUs distributed on 4 nodes (4 GPUs per node).
  - SLURM scheduling system.
  - 10GB storage in `/home` and 100GB in `/scratch/network`.
  - Frequently used for course work and testing code that needs to run in parallel.
  - Princeton University users can request access through this [online form](https://forms.rc.princeton.edu/registration/?q=adroit).
* **[TIGRESSDATA](https://researchcomputing.princeton.edu/systems-and-services/available-systems/tigressdata)**
  - Intended for developing, debugging, and testing code, as well as for small production runs, post-processing, and remote visualization of data produced on the large systems.
  - Commercially licensed software available (Matlab, Intel compilers, etc.).
  - Visualization software available (VirtualGL, TurboVNC, ParaView, etc.).
  - 20 cores on a single node.
  - No job scheduling system.
  - Mounts the `/tigress` file system used by the large clusters.
  - Princeton University users can inquire about access by sending an email to <cses@princeton.edu>.
 
Note that Nobel and Tigressdata do not use a job scheduler to manage system resources. Because all users on these machines share the same resources, be mindful of others when running computationally expensive tasks. A job that uses all the available cores or memory will slow down everyone else on the system.
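One practical way to be a good neighbor on an unscheduled machine is to size your parallel work conservatively instead of launching one process per core. The sketch below is illustrative only. The function `polite_worker_count` and the 25% cap are hypothetical choices, not an official Research Computing policy. It reads the node's core count and 1-minute load average from Python's standard library and picks a modest worker count.

```python
import os

# Hypothetical policy (an assumption, not an official guideline): on an
# unscheduled shared machine such as Nobel or Tigressdata, use at most a
# quarter of the cores, and back off further if the node is already busy.
MAX_FRACTION = 0.25


def polite_worker_count(max_fraction: float = MAX_FRACTION) -> int:
    """Return a modest number of worker processes for a shared node."""
    total = os.cpu_count() or 1
    try:
        # 1-minute load average: roughly how many cores are already busy.
        busy = os.getloadavg()[0]
    except (AttributeError, OSError):
        busy = 0.0  # load average is unavailable on this platform
    idle = max(total - busy, 1.0)
    cap = max(int(total * max_fraction), 1)
    # Never exceed the cap or the idle cores, and always allow one worker.
    return max(min(cap, int(idle)), 1)


if __name__ == "__main__":
    print(f"Using {polite_worker_count()} of {os.cpu_count()} cores")
```

The returned value can be passed as the `processes` argument of `multiprocessing.Pool`, or used in a similar way, so that your job leaves most of the node free for other users.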

Adroit is different. It uses a job scheduler called SLURM to manage its resources. You request resources, and SLURM schedules the work so that the available resources are distributed "fairly" among users. The large Princeton clusters use the same scheduler, so getting familiar with SLURM on Adroit is a natural step before moving on to the large systems. Details on how to use SLURM are given in a later notebook.
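Under a scheduler, your program should use the resources SLURM granted to it instead of guessing from the hardware. As a minimal sketch, a Python program can read the `SLURM_CPUS_PER_TASK` environment variable, which SLURM sets inside a job when `--cpus-per-task` is requested. The helper name `allocated_cpus` and the fallback default are illustrative assumptions.

```python
import os


def allocated_cpus(default: int = 1) -> int:
    """Number of CPUs SLURM granted to this task, or `default` otherwise.

    SLURM sets SLURM_CPUS_PER_TASK only when --cpus-per-task was requested,
    so outside a job (or without that option) we fall back to `default`.
    """
    value = os.environ.get("SLURM_CPUS_PER_TASK")
    if value is None:
        return default
    return int(value)


if __name__ == "__main__":
    in_job = "SLURM_JOB_ID" in os.environ  # set for every SLURM job
    print(f"Inside SLURM job: {in_job}; CPUs to use: {allocated_cpus()}")
```

For example, in a job submitted with `--cpus-per-task=4`, `allocated_cpus()` returns 4, so the program uses exactly the cores it was scheduled on.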


## 1.2: Large Systems

If your research is computationally demanding and requires a lot of storage, you should consider using one of the large HPC systems at Princeton University. These systems are meant for expensive parallel computations that generate massive amounts of data. For this reason, access is more restricted: it requires either a proposal or sponsorship by a faculty member who already has access to one of the systems.

There are three available systems:

* **[TIGER](https://researchcomputing.princeton.edu/systems-and-services/available-systems/tiger)**
  - General purpose cluster used by the largest, most demanding parallel codes. Supports GPU jobs.
  - Commercially licensed software available (Matlab, Intel compilers, etc.).
  - 16320 cores distributed on 408 nodes (40 cores per node).
  - 320 GPUs distributed on 80 nodes (4 GPUs per node).
  - SLURM scheduling system.
  - 20GB storage in `/home`, 512GB in `/scratch/gpfs`, and 512GB in `/tigress`.
* **[DELLA](https://researchcomputing.princeton.edu/systems-and-services/available-systems/della)**
  - General purpose cluster that works well for most parallel jobs and for users with large numbers of serial jobs.
  - Commercially licensed software available (Matlab, Intel compilers, etc.).
  - 5632 cores distributed on 224 nodes (20-32 cores per node).
  - Has nodes with different Intel CPU architectures (Ivy Bridge, Haswell, Broadwell, and Skylake).
  - SLURM scheduling system.
  - 10GB storage in `/home`, 512GB in `/scratch/gpfs`, and 512GB in `/tigress`.
* **[PERSEUS](https://researchcomputing.princeton.edu/systems-and-services/available-systems/perseus)**
  - Well suited for large, computationally intensive parallel jobs because of its high core count per node.
  - Commercially licensed software available (Matlab, Intel compilers, etc.).
  - 8960 cores distributed on 320 nodes (28 cores per node).
  - All cores have the latest AVX vector processing units.
  - SLURM scheduling system.
  - 10GB storage in `/home`, 512GB in `/scratch/gpfs`, and 512GB in `/tigress`.
  
As this overview shows, the systems differ in important ways. Before deciding which one to use, take some time to consider the requirements of the application you intend to run. If it requires GPUs, Tiger is the answer. If it is a large collection of serial jobs, Della might be the best choice. If the application makes heavy use of vector processing units, Perseus could be the best option.
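The rules of thumb above can be written down as a small decision function. This is a hypothetical sketch. The function name, its flags, and the fallback for applications with no special needs are all assumptions. The fallback picks Della only because it is described above as working well for most parallel jobs. Real decisions should also weigh storage, queue times, and your allocation.

```python
def suggest_system(needs_gpus: bool,
                   many_serial_jobs: bool,
                   heavy_vectorization: bool) -> str:
    """Apply the section's rules of thumb in order of priority."""
    if needs_gpus:
        return "Tiger"    # the only large system listed with GPUs
    if many_serial_jobs:
        return "Della"    # suited to large numbers of serial jobs
    if heavy_vectorization:
        return "Perseus"  # every core has AVX vector units
    # Assumed fallback: Della is described as working well for most parallel jobs.
    return "Della"


if __name__ == "__main__":
    print(suggest_system(needs_gpus=False, many_serial_jobs=False,
                         heavy_vectorization=True))
```

GPU requirements are checked first because they rule out the other two large systems entirely. The remaining criteria are preferences rather than hard constraints.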

Fortunately, working with the different systems is similar. They are all Linux clusters that use SLURM for resource management, and they have similar file system layouts.