# Remote into JupyterLab running in a Slurm job

Often a researcher does not have the environment set up locally, or does not have the right machine to run the experiments. In my case, I have an Apple Silicon laptop, which is not very compatible with the tools used. One can use an SSH file manager like [FilesRemote](https://github.com/allanrbo/filesremote), a git repository, or `rsync` and/or `rclone` to synchronize code and files. Personally, I waste a lot of time figuring out how to work these tools, or even just typing in the commands.

Running JupyterLab in a Slurm job is a quick and simple way to start an interactive session and speed up the iterative development/troubleshooting process.

Below is a quick tutorial on how I got it working on the cluster that I have access to. This document serves as a record for my future self, and for anyone else looking to implement this solution.

## 1. Set up the conda environment
---

First, install the necessary packages and tools. We will use conda (the package manager) to install and manage them.

TODO: Check whether `jupyter` needs to be installed for this to work, and update these notes.

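Save the YAML below as `env.yaml`, then run whichever command applies:

- `conda env create -f env.yaml` creates the environment for the first time.
- `conda env update -f env.yaml` updates an existing environment after `env.yaml` changes.
- `conda env update -f env.yaml --prune` updates it and additionally removes packages that are no longer listed in `env.yaml`.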

```yaml
# Conda environment for running JupyterLab in a Slurm job

name: JupyterLab_on_slurm
channels:
  - conda-forge
  - pytorch
  - nvidia
dependencies:
  - jupyterlab
  - nodejs
  - python[version='<3.11']
  - nb_conda_kernels  # Needed for kernels from other conda env to show up in Jupyter
  ## Additional packages you want (comment out ones you don't need):
  - conda-forge::nest-simulator  # This can take a while
  - numpy
  - matplotlib
  - seaborn
  - scipy
  - pytorch::pytorch
  - pytorch::torchvision
  - pytorch::torchaudio
  - pytorch::pytorch-cuda=11.7
  - conda-forge::ray-tune
```
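
Once the environment has been created, a quick sanity check can confirm that JupyterLab is actually available in it before a job is submitted. The sketch below assumes the environment name from `env.yaml` above; `check_env` is a hypothetical helper name, and it relies on `conda run -n <env>`, which executes a command inside a named environment without activating it.

```bash
#!/bin/bash
# Hypothetical helper (not part of the original notes):
# confirm that JupyterLab can be started from a given conda environment.
check_env() {
    local env_name="${1:-JupyterLab_on_slurm}"

    # `conda run -n <env>` runs the command inside that environment.
    if conda run -n "${env_name}" jupyter lab --version; then
        echo "JupyterLab found in ${env_name}"
    else
        echo "JupyterLab missing from ${env_name}" >&2
        return 1
    fi
}

# Usage (on the cluster):
#   check_env JupyterLab_on_slurm
```

If the check fails, re-running `conda env update -f env.yaml` is probably the first thing to try.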

## 2. Walkthrough of the `sbatch` script
---

Once the conda environment is created, we are ready to start a Slurm job with a batch script.
Submit the following script to Slurm with `sbatch`.

```bash
#!/bin/bash

## Slurm script that launches JupyterLab on a compute node, to be reached via an SSH tunnel
## Source: https://researchcomputing.princeton.edu/support/knowledge-base/jupyter#sbatch

## Please refer to your cluster's documentation for details on how to set the
## parameters below. Clusters are configured differently and may expect users
## to specify different parameters.

#SBATCH --job-name=jovian
#SBATCH --partition=math-alderaan
#SBATCH --time=08:00:00
#SBATCH --ntasks=32
#SBATCH --output=slurm_jovian.out


## Get info for tunneling
node=$(hostname -s)      # Gather the short hostname
node_ip=$(hostname -i)   # Gather the host IP
user=$(whoami)           # Gather the username

## These values differ between clusters
cluster="math-alderaan"  # Hostname of the login node (domain is added below)
port=8889                # Specify the port to use - Make sure it is unused!

## Print tunneling instructions into the slurm output
echo -e "
################################################################################
Command to create SSH tunnel from local machine:
  ssh -N -f -L ${port}:${node}:${port} ${user}@${cluster}.ucdenver.pvt

Use a Browser on your local machine to go to:
  localhost:${port}  (prefix w/ https:// if using password)

node name: ${node}
node ip: ${node_ip}
################################################################################
"

## Load modules / Activate conda environment
source ~/.bashrc
conda activate JupyterLab_on_slurm

## Run JupyterLab
jupyter-lab --no-browser --port=${port} --ip=${node_ip}
```
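
The script above hard-codes `port=8889` and asks you to make sure it is unused. If the port is taken, JupyterLab may fall back to a different one, and the printed tunnel command would then point at the wrong port. As a rough sketch, the port could instead be picked at job start. The helper names `port_in_use` and `find_free_port` are hypothetical; the check uses bash's built-in `/dev/tcp` redirection and only probes the loopback address of the node it runs on, so it is a best-effort test rather than a guarantee.

```bash
#!/bin/bash
# Hypothetical helpers for choosing a port; not part of the original script.

# Succeeds if something accepts TCP connections on 127.0.0.1:<port>.
# Relies on bash's /dev/tcp redirection, so no extra tools are needed.
port_in_use() {
    (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Print the first port at or above <start> that is not in use.
find_free_port() {
    local port="$1"
    while port_in_use "${port}"; do
        port=$((port + 1))
    done
    echo "${port}"
}

# Usage inside the sbatch script, replacing the fixed value:
#   port=$(find_free_port 8889)
```

Because the port is chosen before the tunnel instructions are printed, the `ssh` command in the Slurm output would still show the port that was actually selected.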

## 3. Running the script and SSH tunneling
---
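
Submitting the script and waiting for the tunnel instructions can be wrapped in one step. This is a minimal sketch, assuming the script was saved under a name such as `jupyterlab.sbatch` (a made-up file name) and writes to `slurm_jovian.out` as configured above; `submit_and_wait` is a hypothetical helper. It deletes any output file left over from a previous run so that stale instructions are not picked up.

```bash
#!/bin/bash
# Hypothetical helper: submit the job script, then wait until the tunnel
# instructions appear in the Slurm output file.
submit_and_wait() {
    local script="$1" out_file="$2" timeout="${3:-300}"
    local job_id waited=0

    # Remove output from an earlier run so old instructions are not matched.
    rm -f -- "${out_file}"

    # --parsable makes sbatch print only the job ID (plus ";cluster" if set).
    job_id=$(sbatch --parsable "${script}") || return 1
    echo "Submitted job ${job_id%%;*}"

    # Poll until the job has started and printed the ssh command.
    until grep -q "ssh -N" "${out_file}" 2>/dev/null; do
        if [ "${waited}" -ge "${timeout}" ]; then
            echo "Timed out waiting for ${out_file}" >&2
            return 1
        fi
        sleep 5
        waited=$((waited + 5))
    done
    grep "ssh -N" "${out_file}"
}

# Usage (on the login node):
#   submit_and_wait jupyterlab.sbatch slurm_jovian.out
```

The default timeout of 300 seconds is arbitrary; on a busy partition the job may sit in the queue for much longer.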

Open the Slurm output file to find the SSH tunnel command and the JupyterLab address:
```bash
grep -A 9 -B 2 ssh slurm_jovian.out  # Tunnel instructions printed by the script
grep -F 127.0.0.1 slurm_jovian.out   # Lines from JupyterLab that contain the local URL
```
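
To copy the address straight into a browser, the URL can be pulled out of the log on its own. The sketch below assumes JupyterLab writes a line containing `http://127.0.0.1:<port>/...` to the Slurm output, which is what the second `grep` above looks for; `jupyter_url` is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper: print the first loopback URL found in a log file.
jupyter_url() {
    # -E: extended regex, -o: print only the matching part of each line.
    grep -Eo 'https?://127\.0\.0\.1:[0-9]+/[^[:space:]]*' "$1" | head -n 1
}

# Usage:
#   jupyter_url slurm_jovian.out
```

Since the tunnel forwards the same port number on your local machine, the printed URL should work locally as-is once the tunnel is open.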

Run the printed `ssh` command on your local machine, then open the printed address in a browser.


## 4. Terminate the Slurm job and close the tunnel
---

Find the SSH tunnel process and close it:
```bash
## On the local machine
## src: https://superuser.com/questions/87014/how-do-i-remove-an-ssh-forwarded-port

ps aux | grep ssh  # Find all the ssh-related processes
kill <pid>         # Replace <pid> with the tunnel's process ID (add -9 only if it refuses to exit)
```
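
An alternative that avoids hunting for the process ID is to open the tunnel with an SSH control socket and close it through that socket. This is a sketch rather than the method used above: the socket path and the function names `open_tunnel` and `close_tunnel` are made up for illustration, while `-M`, `-S`, and `-O exit` are standard OpenSSH options.

```bash
#!/bin/bash
# Hypothetical helpers; the socket path and function names are illustrative.
TUNNEL_SOCKET="${HOME}/.ssh/jupyter-tunnel.sock"

# Open the tunnel in the background with a control socket (-M -S).
open_tunnel() {
    local port="$1" node="$2" login="$3"
    ssh -M -S "${TUNNEL_SOCKET}" -f -N -L "${port}:${node}:${port}" "${login}"
}

# Ask the background ssh process to exit via the control socket.
close_tunnel() {
    local login="$1"
    ssh -S "${TUNNEL_SOCKET}" -O exit "${login}"
}

# Usage (on the local machine); <node> and <user> are placeholders:
#   open_tunnel 8889 <node> <user>@math-alderaan.ucdenver.pvt
#   close_tunnel <user>@math-alderaan.ucdenver.pvt
```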

Cancel the Slurm job:
```bash
## On the cluster
squeue --me       # Find the job ID
scancel <job_id>  # Replace <job_id> with the ID found above
```
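
Since the script sets `--job-name=jovian`, the job can also be cancelled by name instead of by ID. A small sketch, with `cancel_jupyter_job` as a hypothetical helper name; it cancels every job of yours with that name, so it assumes you only run one at a time.

```bash
#!/bin/bash
# Hypothetical helper: cancel your own jobs that carry the given job name.
cancel_jupyter_job() {
    local job_name="${1:-jovian}"
    # --user restricts the match to your own jobs.
    scancel --name="${job_name}" --user="$(whoami)"
}

# Usage (on the cluster):
#   cancel_jupyter_job          # uses the default name "jovian"
#   cancel_jupyter_job my-name  # any other job name
```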