# NYU HPC Guide for Beyond the Hype: LLMs vs. Traditional ML in the Real World

## Table of Contents
1. Logging In
2. Understanding the Filesystem
3. Setting up the Environment
   - Installing Conda
4. Request Compute Node
   - Interactive Jobs
     - With GPU
     - Without GPU
   - Batch Jobs
     - With GPU
     - Without GPU
5. Using OOD and Jupyter Notebook
   - With GPU
   - Without GPU

## Logging In

To access the Greene HPC cluster, you need to be on the NYU network. If you're off-campus, connect via the [NYU VPN](https://www.nyu.edu/life/information-technology/infrastructure/network-services/vpn.html).

### Steps to Log In

1. **Open a Terminal** on your local machine.

2. **Connect via SSH** (replace `[netid]` with your NYU NetID):
   ```bash
   ssh [netid]@greene.hpc.nyu.edu
   ```

3. You will see a welcome message upon successful login.
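
If you log in often, an SSH config entry saves retyping the full hostname. The sketch below prints such an entry. The alias `greene` and the helper name `greene_ssh_config` are illustrative choices, not part of the workshop setup.

```bash
# Hypothetical helper: print an SSH config entry for the Greene login node.
# The alias "greene" is an arbitrary choice; pass your own NetID as $1.
greene_ssh_config() {
    local netid="$1"
    printf 'Host greene\n    HostName greene.hpc.nyu.edu\n    User %s\n' "$netid"
}

# Example (run once on your local machine, not on the cluster):
#   greene_ssh_config ab1234 >> ~/.ssh/config
```

After appending the entry, `ssh greene` should behave like the full command in step 2.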

## Understanding the Filesystem

All the datasets and environment files for the workshop are located under `/vast/rs8020-share/`.

The Greene HPC cluster has different directories optimized for various storage needs:

| Directory  | Variable   | Purpose                | Flushed?            | Quota             |
|------------|------------|------------------------|---------------------|-------------------|
| `/archive` | `$ARCHIVE` | Long-term storage      | No                  | 2TB / 20K inodes  |
| `/home`    | `$HOME`    | Configuration files    | No                  | 50GB / 30K inodes |
| `/scratch` | `$SCRATCH` | Temporary data storage | Yes (after 60 days) | 5TB / 1M inodes   |

Check your quota:
```bash
myquota
```

**Recommended:** Store your data in `/scratch/[netid]` for the workshop.
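
Quotas cap both bytes and inodes (roughly, the number of files and directories), so a directory with many small files can hit its limit while using little space. The helper below is a minimal sketch for checking one directory with standard tools. The name `dir_usage` is hypothetical.

```bash
# Report the size and inode count (files + directories) of a directory.
# Defaults to the current directory when no argument is given.
dir_usage() {
    local dir="${1:-.}"
    local size inodes
    size="$(du -sh "$dir" | cut -f1)"              # human-readable total size
    inodes="$(find "$dir" | wc -l | tr -d ' ')"    # every entry, including $dir itself
    echo "$dir: $size, $inodes inodes"
}
```

For example, `dir_usage /scratch/[netid]/hackathon` gives a rough per-directory view to compare against the totals that `myquota` reports.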

## Setting up the Environment

### Installing Conda Inside the Singularity Container

1. **Create a directory in scratch** (the examples in this guide use the NetID `cm6627`; replace it with your own NetID in every path):
   ```bash
   mkdir /scratch/cm6627/hackathon
   cd /scratch/cm6627/hackathon
   ```

2. **Copy overlay and environment files:**
   ```bash
   cp -rp /vast/rs8020-share/overlay-25GB-500K.ext3 .
   cp -rp /vast/rs8020-share/environment.yml .
   ```

3. **Copy Singularity image:**
   ```bash
   cp -rp /scratch/work/public/singularity/cuda11.8.86-cudnn8.7-devel-ubuntu22.04.2.sif .
   ```

4. **Start Singularity:**
   ```bash
   singularity exec --bind /scratch --nv --overlay /scratch/cm6627/hackathon/overlay-25GB-500K.ext3:rw \
   /scratch/cm6627/hackathon/cuda11.8.86-cudnn8.7-devel-ubuntu22.04.2.sif /bin/bash
   ```

5. **Install Conda inside Singularity:**

   Inside the Singularity container:

   ```bash
   cd /ext3/
   wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
   bash ./Miniconda3-latest-Linux-x86_64.sh -b -p /ext3/miniconda3
   rm Miniconda3-latest-Linux-x86_64.sh  # Remove installer
   ```

6. **Initialize Conda:**

   ```bash
   source /ext3/miniconda3/etc/profile.d/conda.sh
   export PATH=/ext3/miniconda3/bin:$PATH
   ```

7. **Create the Conda environment from the YAML file:**

   ```bash
   conda env create -f /scratch/cm6627/hackathon/environment.yml
   conda activate hackathon
   ```
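
The overlay and image paths from the steps above recur in every later `singularity exec` command. One way to avoid retyping them is to keep them in shell variables. This is a sketch that assumes your login name (`$USER`) is your NetID and that you kept the file names from steps 1 to 3. The function `container_cmd` is hypothetical.

```bash
# Assumed layout from the setup steps; adjust if you used different names.
NETID="${USER:-your_netid}"
WORKDIR="/scratch/$NETID/hackathon"
OVERLAY="$WORKDIR/overlay-25GB-500K.ext3"
IMAGE="$WORKDIR/cuda11.8.86-cudnn8.7-devel-ubuntu22.04.2.sif"

# Print the singularity command for a mode: "gpu" adds --nv, anything else omits it.
# Printing instead of running lets you inspect the command first.
container_cmd() {
    local nv=""
    if [ "$1" = "gpu" ]; then
        nv="--nv "
    fi
    echo "singularity exec ${nv}--bind /scratch --overlay $OVERLAY:rw $IMAGE /bin/bash"
}
```

Run `container_cmd gpu` to see the full command, or `$(container_cmd gpu)` to execute it on a compute node. Note that an overlay mounted `:rw` can typically be opened by only one process at a time; once the environment is installed, mounting it `:ro` is the usual workaround if you need several jobs to share it.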

## Request Compute Node

**IMPORTANT: Do not run computations on the login node. Always request a compute node.**

### Interactive Jobs (for Development)

#### With GPU

1. **Request an interactive session with GPU:**

   ```bash
   srun --reservation=cds-hackathon --gres=gpu:rtx8000:1 --time=04:00:00 --mem=64G --pty /bin/bash
   ```

2. **Once on the compute node, start Singularity:**

   ```bash
   singularity exec --overlay /scratch/cm6627/hackathon/overlay-25GB-500K.ext3:rw \
       --bind /scratch \
       --nv \
       /scratch/cm6627/hackathon/cuda11.8.86-cudnn8.7-devel-ubuntu22.04.2.sif \
       /bin/bash
   ```

3. **Set up environment:**

   ```bash
   source /ext3/miniconda3/etc/profile.d/conda.sh
   conda activate hackathon
   ```

4. **Run your code:**

   ```bash
   python your_script.py
   ```
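
Before starting a long run, it can help to confirm that the GPU is actually visible inside the container. The function below is a minimal sketch; the name `gpu_status` is hypothetical, and it relies only on `nvidia-smi`, which the `--nv` flag makes available in the container.

```bash
# Report the visible GPU(s), or say so if none can be reached.
gpu_status() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        # Print the name and total memory of each visible GPU.
        nvidia-smi --query-gpu=name,memory.total --format=csv,noheader 2>/dev/null \
            || echo "nvidia-smi found, but no GPU is reachable"
    else
        echo "no GPU visible (nvidia-smi not found)"
    fi
}

gpu_status
```

If the environment includes PyTorch (an assumption; check `environment.yml`), `python -c "import torch; print(torch.cuda.is_available())"` gives a second confirmation from inside Python.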

#### Without GPU

1. **Request an interactive session without GPU:**

   ```bash
   srun --reservation=cds-hackathon --time=04:00:00 --mem=64G --pty /bin/bash
   ```

2. **Once on the compute node, start Singularity (without the `--nv` flag):**

   ```bash
   singularity exec --overlay /scratch/cm6627/hackathon/overlay-25GB-500K.ext3:rw \
       --bind /scratch \
       /scratch/cm6627/hackathon/cuda11.8.86-cudnn8.7-devel-ubuntu22.04.2.sif \
       /bin/bash
   ```

3. **Set up environment:**

   ```bash
   source /ext3/miniconda3/etc/profile.d/conda.sh
   conda activate hackathon
   ```

4. **Run your code:**

   ```bash
   python your_script.py
   ```

### Batch Jobs (for Training)

#### With GPU

1. **Create a job script (save as `job_gpu.slurm`):**

   ```bash
   #!/bin/bash
   #SBATCH --job-name=hackathon_gpu
   #SBATCH --output=slurm_%j.out
   #SBATCH --error=slurm_%j.err
   #SBATCH --export=ALL
   #SBATCH --time=04:00:00
   #SBATCH --mem=64G
   #SBATCH --reservation=cds-hackathon
   #SBATCH --gres=gpu:rtx8000:1

   singularity exec --overlay /scratch/cm6627/hackathon/overlay-25GB-500K.ext3:rw \
       --bind /scratch \
       --nv \
       /scratch/cm6627/hackathon/cuda11.8.86-cudnn8.7-devel-ubuntu22.04.2.sif \
       /bin/bash -c "
   source /ext3/miniconda3/etc/profile.d/conda.sh
   conda activate hackathon
   python your_script.py
   "
   ```

2. **Submit the job:**

   ```bash
   sbatch job_gpu.slurm
   ```

#### Without GPU

1. **Create a job script (save as `job_nogpu.slurm`):**

   ```bash
   #!/bin/bash
   #SBATCH --job-name=hackathon_nogpu
   #SBATCH --output=slurm_%j.out
   #SBATCH --error=slurm_%j.err
   #SBATCH --export=ALL
   #SBATCH --time=04:00:00
   #SBATCH --mem=64G
   #SBATCH --reservation=cds-hackathon

   singularity exec --overlay /scratch/cm6627/hackathon/overlay-25GB-500K.ext3:rw \
       --bind /scratch \
       /scratch/cm6627/hackathon/cuda11.8.86-cudnn8.7-devel-ubuntu22.04.2.sif \
       /bin/bash -c "
   source /ext3/miniconda3/etc/profile.d/conda.sh
   conda activate hackathon
   python your_script.py
   "
   ```

2. **Submit the job:**

   ```bash
   sbatch job_nogpu.slurm
   ```

3. **Monitor your job** (the same commands apply to GPU jobs):

   ```bash
   squeue -u $USER  # Check job status
   scancel <job_id>  # Cancel job if needed
   ```
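
The job scripts above write their output to `slurm_<job_id>.out`, so following a job's progress means knowing its ID. The helpers below are a sketch that extracts the ID from the `Submitted batch job <id>` line that `sbatch` prints. The function names are hypothetical.

```bash
# Extract the numeric job ID from sbatch's confirmation message.
job_id_from_sbatch() {
    echo "$1" | sed -n 's/^Submitted batch job \([0-9][0-9]*\).*/\1/p'
}

# Name of the stdout log the job scripts above produce (slurm_%j.out).
job_log() {
    echo "slurm_$1.out"
}

# Typical use on Greene (not run here):
#   id="$(job_id_from_sbatch "$(sbatch job_gpu.slurm)")"
#   tail -f "$(job_log "$id")"
```

`tail -f` keeps printing new lines as the job writes them; press Ctrl-C to stop following without affecting the job.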

## Using OOD and Jupyter Notebook

### Access OOD

1. **Log in to the Greene cluster at least once from a terminal.**

2. **Connect to the NYU VPN.**

3. **Open OOD (Open OnDemand) at [https://ood.hpc.nyu.edu](https://ood.hpc.nyu.edu).**

### Set Up Custom Kernel

1. **Inside the Singularity container, with the `hackathon` environment activated, install `ipykernel`:**

   ```bash
   conda install ipykernel
   ```

2. **Create a custom kernel:**

   ```bash
   python -m ipykernel install --user --name=hackathon
   ```

3. **Copy kernel template:**

   ```bash
   mkdir -p ~/.local/share/jupyter/kernels/hackathon
   cd ~/.local/share/jupyter/kernels/hackathon
   cp -R /share/apps/mypy/src/kernel_template/* .
   ```

4. **Update the `python` file in the kernel directory:**

   Replace the content with:

   ```bash
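   #!/bin/bash
   # Wrapper that Jupyter calls in place of python; it runs the real
   # interpreter inside the Singularity container.

   # Pass --nv only when an NVIDIA driver is available on this node.
   nv=""
   if command -v nvidia-smi >/dev/null 2>&1; then
       nv="--nv"
   fi

   # The trailing `bash "$@"` forwards Jupyter's arguments into the
   # inner shell, where they become the "$@" handed to python.
   singularity exec $nv \
       --overlay /scratch/cm6627/hackathon/overlay-25GB-500K.ext3:rw \
       /scratch/cm6627/hackathon/cuda11.8.86-cudnn8.7-devel-ubuntu22.04.2.sif \
       /bin/bash -c "source /ext3/miniconda3/etc/profile.d/conda.sh; conda activate hackathon; python \"\$@\"" bash "$@"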
   ```

   Make sure to replace `cm6627` with your actual NetID.

5. **Update `kernel.json`** (replace `cm6627` in the path with your NetID):

   ```json
   {
     "argv": [
       "/home/cm6627/.local/share/jupyter/kernels/hackathon/python",
       "-m",
       "ipykernel_launcher",
       "-f",
       "{connection_file}"
     ],
     "display_name": "hackathon",
     "language": "python"
   }
   ```

6. **Launch Jupyter through OOD:**

   - Go to **Interactive Apps > Jupyter Notebook**.
   - Configure with these settings:

     - **Partition:** `nvidia` (or `cpu` for no GPU)
     - **Number of hours:** `4`
     - **Memory:** `64GB`
     - **Number of GPUs:** `1` (set to `0` for no GPU)
     - **GPU Type:** `RTX8000`
     - **Additional Slurm Parameters:**

       ```bash
       --reservation=cds-hackathon
       ```

   - **Launch** and wait for the job to start.

7. **Select your custom kernel** when the Jupyter Notebook interface loads.
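
The `kernel.json` in step 5 hardcodes a home directory, which is easy to mistype. As an alternative, the sketch below generates the same file from the kernel directory you pass in. The function name `write_kernel_json` is hypothetical.

```bash
# Write kernel.json into the given kernel directory, deriving the wrapper
# path from that directory instead of hardcoding a NetID.
write_kernel_json() {
    local kdir="$1"
    cat > "$kdir/kernel.json" <<EOF
{
  "argv": [
    "$kdir/python",
    "-m",
    "ipykernel_launcher",
    "-f",
    "{connection_file}"
  ],
  "display_name": "hackathon",
  "language": "python"
}
EOF
}
```

Calling `write_kernel_json ~/.local/share/jupyter/kernels/hackathon` writes the file with your own home directory filled in. Jupyter executes the first `argv` entry directly, so the `python` wrapper must be executable; `chmod +x ~/.local/share/jupyter/kernels/hackathon/python` ensures that.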

#### Without GPU

To launch Jupyter Notebook in OOD without a GPU, change these settings:

- **Partition:** `cpu`
- **Number of GPUs:** `0`
- Remove `--gres` options from Additional Slurm Parameters if present.