# Computing Clusters

**CS5483 Data Warehousing and Data Mining**
$\def\abs#1{\left\lvert #1 \right\rvert}
\def\Set#1{\left\{ #1 \right\}}
\def\mc#1{\mathcal{#1}}
\def\M#1{\boldsymbol{#1}}
\def\R#1{\mathsf{#1}}
\def\RM#1{\boldsymbol{\mathsf{#1}}}
\def\op#1{\operatorname{#1}}
\def\E{\op{E}}
\def\d{\mathrm{\mathstrut d}}
$
___

This notebook gives brief instructions on how to use the CSLab computing clusters. For more information about the clusters, please refer to
[HTCC](https://cslab.cs.cityu.edu.hk/services/high-throughput-computing-cluster-htcc), 
[HTGC1](https://cslab.cs.cityu.edu.hk/services/high-throughput-computing-cluster-htcc), 
[HTGC2](https://cslab.cs.cityu.edu.hk/services/high-throughput-computing-cluster-htcc), 
[HTGC3](https://cslab.cs.cityu.edu.hk/services/high-throughput-computing-cluster-htcc).

## Prerequisites

You will need:
1. A valid CSLab UNIX account
2. An SSH client (**Visual Studio Code** is highly recommended)
3. A VPN connection to the CS network if you are not physically in the labs  
  <https://cslab.cs.cityu.edu.hk/services/cslab-vpn-sonicwall>
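
To check that the VPN (or lab network) connection works before opening an SSH session, one can test whether a cluster host accepts TCP connections. The following is a minimal sketch, assuming the SSH service listens on the default port 22; the function name `ssh_port_reachable` is made up for illustration.

```python
import socket


def ssh_port_reachable(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and connects within `timeout` seconds.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


# Example call (port 22 is an assumption, not taken from the CSLab pages):
# ssh_port_reachable("htcc1.cs.cityu.edu.hk")
```

A `False` result only says that the port could not be reached from your machine; it does not tell whether the cause is the VPN, the host name, or the server itself.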

## Connecting to the clusters

### Using a terminal

Run the following in a terminal (the hostname depends on which cluster you want to use):
```shell
ssh your_eid@htcc1.cs.cityu.edu.hk
```

### Using VS Code

You can access the clusters using VS Code
- installed locally on your computer or 
- in Xpra remote desktop.

VS Code from JupyterHub does not support this yet.

 1. Install the **Remote - SSH** extension
 2. Press ```F1``` and run the ```Remote-SSH: Connect to SSH Host...``` command
 3. Enter ```your_eid@htcc1.cs.cityu.edu.hk```
 
See [Remote Development using SSH](https://code.visualstudio.com/docs/remote/ssh) for more details.

## Preparing conda environment

 1. Connect to any host that can access your home directory, including
     - htcc1.cs.cityu.edu.hk
     - htgc1.cs.cityu.edu.hk
     - htgc2.cs.cityu.edu.hk
     - htgc3.cs.cityu.edu.hk
     - ltjh.cs.cityu.edu.hk
     - gateway.cs.cityu.edu.hk
 2. Open [Anaconda](https://www.anaconda.com/products/individual) webpage.
 3. Find and copy the download link of `Linux 64-Bit (x86) installer`.
 4. Download the installer to your home directory by running
 ```shell
 wget the_link_you_copied
 ```
 5. Install Anaconda by running (run ```ls``` to see the file name)
 ```shell
 sh the_file_you_downloaded
 ```
 6. Try launching the Python interpreter installed by Anaconda by running (assuming the default install location in your home directory)
 ```shell
 ~/anaconda3/bin/python
 ```
 Press ```Ctrl``` + ```D``` to exit the interpreter.
 
 7. If everything works, you can use conda to install any packages you need, such as TensorFlow or PyTorch.
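
After installing packages, it can help to confirm that the Anaconda interpreter actually sees them before submitting a job. Below is a minimal sketch of such a check; the script name `check_env.py`, the helper `check_packages`, and the package list are only illustrative and should be adapted to what you installed.

```python
# check_env.py (hypothetical name): report which packages the interpreter can find.
import importlib.util
import sys


def check_packages(names):
    """Map each top-level package name to True if it can be located for import."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # The path printed here should point inside your anaconda3 directory.
    print("interpreter:", sys.executable)
    for name, found in check_packages(["numpy", "torch", "tensorflow"]).items():
        print(f"{name}: {'found' if found else 'missing'}")
```

Run it with the interpreter from step 6, for example `~/anaconda3/bin/python check_env.py`, so that the check uses the same Python as your jobs.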

## Job submission

1. Prepare a job submission file **run_hello.condor**. A minimal example is 
```shell
eid             = your_eid
name            = job_name_you_want
environment     = PYTHONHOME=/home/grads/$(eid)/anaconda3/
# CUDA device name depends on which node you are using
# check CSLab web page to find the names
requirements = (CUDADeviceName == "GeForce RTX 2080 Ti") 
executable      = /home/grads/$(eid)/anaconda3/bin/python
arguments       = /home/grads/$(eid)/hello.py --para1 123
error           = $(name).err
log             = $(name).log
queue
```

2. Submit the job by running `condor_submit run_hello.condor`.
3. Check the job status by running `condor_q`.
4. If your job completes much faster than expected, something may have gone wrong. Check the `.err` file for error messages.
5. To remove all of your jobs, run `condor_rm your_eid`. To remove a single job, run `condor_rm` followed by the job ID shown by `condor_q`.
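6. A user can run up to five jobs concurrently. Several jobs can be submitted from one submission file by calling `queue` multiple times, each time after setting the entries for that job. For example: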
```shell
eid             = your_eid
name            = job1_name_you_want
environment     = PYTHONHOME=/home/grads/$(eid)/anaconda3/
# CUDA device name depends on which node you are using
# check CSLab web page to find the names
requirements = (CUDADeviceName == "GeForce RTX 2080 Ti") 
executable      = /home/grads/$(eid)/anaconda3/bin/python
arguments       = /home/grads/$(eid)/hello1.py
error           = $(name).err
log             = $(name).log
queue
```

```shell
eid             = your_eid
name            = job2_name_you_want
environment     = PYTHONHOME=/home/grads/$(eid)/anaconda3/
# CUDA device name depends on which node you are using
# check CSLab web page to find the names
requirements = (CUDADeviceName == "GeForce RTX 2080 Ti") 
executable      = /home/grads/$(eid)/anaconda3/bin/python
arguments       = /home/grads/$(eid)/hello2.py
error           = $(name).err
log             = $(name).log
queue
```
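
The submission files above run a script such as `hello.py` with the argument `--para1 123`, but the script itself is not shown. A minimal sketch of what such a script might look like is given below; the meaning of `--para1` and the computation are placeholders, not part of the CSLab demo files.

```python
# hello.py: toy job script (illustrative only).
import argparse
import sys


def parse_args(argv=None):
    """Parse the command-line arguments passed via `arguments` in the submission file."""
    parser = argparse.ArgumentParser(description="Toy job for testing job submission.")
    parser.add_argument("--para1", type=int, default=0, help="example integer parameter")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # Diagnostics go to standard error, which the submission file redirects via `error`.
    print(f"running with para1={args.para1}", file=sys.stderr)
    result = args.para1 * 2  # placeholder computation
    print(f"result={result}")
    return result


if __name__ == "__main__":
    main()
```

The diagnostic message is written to standard error, so it should appear in the `$(name).err` file. The submission files above do not set HTCondor's `output` entry, so to keep what the script prints to standard output, a line such as `output = $(name).out` could be added.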

## More details

Additional instructions and sample condor demo files can be found in `/public/condor_demo` on the servers.