# How to connect remotely to DICE and the GPU cluster

Before starting, note that three types of server are involved:

1. Local server (your laptop)
2. DICE server (plus the shared server)
3. GPU server
    
### 1. From local to Dice

You can connect to the DICE machine from your local server over ssh.
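
The exact connection command is not reproduced in this guide. As a minimal sketch, assuming a standard `ssh` login, the following builds and prints the command so you can check it before running it. `DICE_USER` and `DICE_HOST` are hypothetical placeholders, not values from this guide; substitute your own student ID and the DICE login host you were given.

```shell
# Placeholders (assumptions): replace with your student ID and DICE login host.
DICE_USER="sXXXXXXX"
DICE_HOST="dice-login-host.example"

# Build the ssh command as a string so it can be inspected first.
dice_ssh_command() {
    # $1 = user name, $2 = host name
    printf 'ssh %s@%s\n' "$1" "$2"
}

dice_ssh_command "$DICE_USER" "$DICE_HOST"
```

Running the printed command in your terminal opens the connection.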

After running the command, you will be asked for your DICE password once.
Then log in to a shared server from the DICE server, and activate your mlp virtual environment. If the connection succeeded, the server name in the prompt changes (e.g. `ashbury:~$`).

### 2. Using Jupyter Notebook on your Local server

Because Jupyter Notebook runs in a browser (Firefox), you need a specific command to start it on the server.
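
The original command is not shown here. One common way to do this, offered as a sketch rather than the command this guide originally used, is to start the notebook server without opening a browser on the remote machine. `--no-browser` and `--port` are standard `jupyter notebook` options; the port value is an assumption chosen to match the example value 8888 used in this guide.

```shell
# Assumption: port 8888, matching the example port used in this guide.
NOTEBOOK_PORT=8888

# Print the command that starts the server without launching a browser.
notebook_command() {
    # $1 = port the notebook server should listen on
    printf 'jupyter notebook --no-browser --port=%s\n' "$1"
}

notebook_command "$NOTEBOOK_PORT"
```

Run the printed command on the shared server and leave that terminal open while you work.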

Once the command is running, the NotebookApp log output tells you the localhost port number the notebook is served on.

*Then go back to your local server and open another terminal.* To open Jupyter Notebook in your local Firefox, forward a local port to the notebook port on the DICE server over ssh.

The command is long and can be confusing, so here are the values from my example:



* `[Any_localhost_number_you_want]` = 8880
* `[Dice_server_localhost_number]` = 8888 (the port number reported in the previous step)
* `[name_of_server]` = ashbury
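
With the example values above, a port-forwarding command can be sketched as follows. This is an assumption about the shape of the command, not the original: `-L` forwards a local port to a port on the remote host and `-N` tells ssh not to run a remote command. `DICE_USER` and `DICE_DOMAIN` are hypothetical placeholders, and whether the named server can be reached directly from your laptop depends on your network setup.

```shell
LOCAL_PORT=8880                      # [Any_localhost_number_you_want]
REMOTE_PORT=8888                     # [Dice_server_localhost_number]
SERVER_NAME="ashbury"                # [name_of_server]
DICE_USER="sXXXXXXX"                 # hypothetical placeholder
DICE_DOMAIN="dice-domain.example"    # hypothetical placeholder

# Print an ssh command that forwards LOCAL_PORT to REMOTE_PORT on the server.
tunnel_command() {
    # $1 = local port, $2 = remote port, $3 = user, $4 = fully qualified host
    printf 'ssh -N -L %s:localhost:%s %s@%s\n' "$1" "$2" "$3" "$4"
}

tunnel_command "$LOCAL_PORT" "$REMOTE_PORT" "$DICE_USER" "${SERVER_NAME}.${DICE_DOMAIN}"
```

While the forwarding command is running, anything sent to the local port is passed to the notebook port on the server.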


Finally, open Jupyter Notebook in your local Firefox by typing `localhost:[Any_localhost_number_you_want]` into the address bar.

The first time, you may need to set a password for Jupyter Notebook for security reasons.

### 3. From Dice to GPU server

*Let's go back to the DICE server terminal.*
You can connect to the GPU server with either of two commands.
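
The two commands are not reproduced here. As a sketch, assuming the host aliases `mlp1` and `mlp2` that this guide uses in its rsync commands also work for ssh from the DICE terminal:

```shell
# Print the ssh command for a given cluster number (1 or 2).
gpu_ssh_command() {
    case "$1" in
        1|2) printf 'ssh mlp%s\n' "$1" ;;
        *)   echo "cluster number must be 1 or 2" >&2; return 1 ;;
    esac
}

gpu_ssh_command 1
```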

You are now on the GPU server (the GPU cluster) and need to set up the GPU environment once.

During installation, type "yes" at the first question and press ENTER at the next one.

Then run the remaining setup commands to finish.

If you want to use both servers, mlp1 and mlp2, you need to set up the GPU environment on each of them.

Whenever you log in to the cluster, remember to run:

Install scipy, sklearn and matplotlib.

Copy your data and code up to the cluster. To do this, use the following commands, filling in data_directory, code_directory, cluster_number (1 or 2, whichever you are set up on) and student_id.

```bash
rsync -r <data_directory> mlp<cluster_number>:/home/<student_id>
rsync -r <code_directory> mlp<cluster_number>:/home/<student_id>
```

You have now set up your GPU environment. Before running experiments, create the necessary directories with the following script, passing in the start and end numbers for your experiments (see the --model_dir column of our google spreadsheet for the experiment numbers you've been assigned).

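```bash
#!/bin/bash
# Arguments: first and last experiment numbers (inclusive).
START=$1
END=$2
for ((i=$START; i<=$END; i++)); do
   # -p also creates the parent directories if they are missing
   mkdir -p experiments/experiment"$i"/err
done
```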
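
To see what the script produces, the same loop can be tried in a scratch directory. This sketch uses a POSIX `while` loop so it runs in any shell, and the experiment numbers 3 to 5 are made up for illustration.

```shell
# Work in a scratch directory so real experiment folders are not touched.
demo_dir=$(mktemp -d)

START=3   # hypothetical first experiment number
END=5     # hypothetical last experiment number

i=$START
while [ "$i" -le "$END" ]; do
    mkdir -p "$demo_dir/experiments/experiment$i/err"
    i=$((i + 1))
done

# List the experiment directories that were created.
ls "$demo_dir/experiments"
```

Each of experiment3, experiment4 and experiment5 now contains an `err` subdirectory.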

Now, because of the way Slurm works, we need to generate a new script for every setting of the hyperparameters. (You'd think you could just pass new hyperparameters as arguments to the same script, but I (Ben) couldn't get this to work. Neither could Luke in my office, and he seems to know what he is doing.)

So, below is a bash script that generates bash scripts that we can give to slurm. You need to pass it two arguments:
* a text file where each new line is a python command copied from the google sheet
* the experiment number (see google sheet) of the first python command.

Note: the script below assumes that your python commands correspond to consecutive experiment numbers.
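
The generator script itself is not reproduced in this guide, so what follows is only a sketch of how such a generator could look, not the original. It assumes that Slurm output should go to `experiments/experiment<N>/out_file` (the path the results script in this guide reads) and errors to a hypothetical `err/err_file` inside the `err` directory created earlier. `--output` and `--error` are standard `sbatch` options; the function name, the `train.py` commands and the `--lr` flag in the demo are made up for illustration.

```shell
# Write one Slurm script per line of a commands file.
generate_train_scripts() {
    commands_file=$1                       # text file, one python command per line
    first_experiment=$2                    # experiment number of the first command
    out_dir=${3:-$HOME/train_scripts}      # where the generated scripts go

    mkdir -p "$out_dir"
    n=$first_experiment
    # The "|| [ -n ... ]" part keeps a last line that lacks a trailing newline.
    while IFS= read -r cmd || [ -n "$cmd" ]; do
        if [ -z "$cmd" ]; then
            continue                       # skip blank lines without using up a number
        fi
        {
            printf '%s\n' '#!/bin/bash'
            printf '#SBATCH --output=experiments/experiment%s/out_file\n' "$n"
            printf '#SBATCH --error=experiments/experiment%s/err/err_file\n' "$n"
            printf '%s\n' "$cmd"
        } > "$out_dir/experiment$n.sh"
        n=$((n + 1))
    done < "$commands_file"
}

# Demo in a scratch directory with two hypothetical commands, starting at experiment 7.
demo_dir=$(mktemp -d)
printf '%s\n' 'python train.py --lr 0.1' 'python train.py --lr 0.01' > "$demo_dir/commands.txt"
generate_train_scripts "$demo_dir/commands.txt" 7 "$demo_dir/train_scripts"
cat "$demo_dir/train_scripts/experiment8.sh"
```

Because each command gets the next experiment number in order, the commands file must list consecutive experiments, as noted above.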

Now you have a directory called 'train_scripts' containing the scripts we need to pass to Slurm. We can pass them all in one go with the following script, which automatically moves them into the 'complete' subfolder afterwards so that we don't accidentally run them again later.

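```bash
#!/bin/bash
# Make sure the destination exists; otherwise mv would rename a script to "complete".
mkdir -p ~/train_scripts/complete
for filename in ~/train_scripts/*.sh; do
   sbatch "$filename"
   mv "$filename" ~/train_scripts/complete
done
```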

To check your jobs, run:
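
The command is not reproduced here. A minimal sketch, assuming Slurm's standard `squeue` tool is what was meant: `squeue -u <user>` lists the queued and running jobs of one user. The check for `squeue` is only there so the snippet fails gracefully on machines without Slurm.

```shell
if command -v squeue >/dev/null 2>&1; then
    slurm_available=yes
    # List the jobs belonging to the current user.
    squeue -u "$(id -un)"
else
    slurm_available=no
    echo "squeue not found: run this on the GPU cluster, where Slurm is installed"
fi
```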

If you don't want your job on the GPU cluster to be cancelled automatically when you exit ssh, you can use nohup and longjob.
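
As a sketch of the `nohup` half only (`longjob` is specific to DICE and its options are not covered here), the helper below starts a command that ignores the hangup signal sent when the terminal closes, and writes its output to a log file. The helper name and the demo command are made up for illustration.

```shell
# Start a command under nohup in the background, logging stdout and stderr.
run_detached() {
    log_file=$1
    shift
    nohup "$@" > "$log_file" 2>&1 < /dev/null &
    detached_pid=$!    # process ID of the background job
}

# Demo with a harmless command; replace it with your own long-running command.
demo_log=$(mktemp)
run_detached "$demo_log" sh -c 'echo experiment finished'
wait "$detached_pid"   # only for the demo: wait so the log can be shown
cat "$demo_log"
```

In real use you would skip the `wait` and simply log out; the log file keeps collecting the output.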

Once your experiments have completed, you want to be able to quickly check the results and paste the final AUCs into the google sheet. Here is a script that helps you do that. It does 3 things:

* creates a results.txt file containing the output of the final 20 epochs of each experiment you ran. It automatically reorders the output so that training outputs are listed first, followed by validation outputs (rather than alternating continually, which is hard to read).

* creates a final_train_auc.txt that contains the final AUC score for each experiment on the training data. This can be copied and pasted directly into the corresponding column in the google doc.

* creates a final_val_auc.txt that does the same as above, but for validation sets.

```bash
#!/bin/bash

# Arguments: first and last experiment numbers (inclusive).
START=$1
END=$2

# reset files
> results.txt
> final_train_auc.txt
> final_val_auc.txt

for ((i=$START; i<=$END; i++)); do
   echo '=====================================================================
                   EXPERIMENT' "$i" '
=====================================================================' >> results.txt
   # odd-numbered lines of the tail first, then the even-numbered lines
   tail -n 41 experiments/experiment"$i"/out_file | awk 'NR % 2 == 1' >> results.txt
   echo '---------------------------------------------------------------------' >> results.txt
   tail -n 41 experiments/experiment"$i"/out_file | awk 'NR % 2 == 0' >> results.txt
   # extract the number between "AUC: " and " (" from the last lines
   tail -n 3 experiments/experiment"$i"/out_file | awk 'NR % 2 == 1' | grep -Po "(?<=AUC: ).*(?= \()" >> final_train_auc.txt
   tail -n 3 experiments/experiment"$i"/out_file | awk 'NR % 2 == 0' | grep -Po "(?<=AUC: ).*(?= \()" >> final_val_auc.txt
done
```
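
To make the two building blocks of the script concrete, here they are applied to a few made-up log lines. The log format is an assumption for illustration only (alternating training and validation lines, each containing `AUC: <value> (`); your real out_file may look different. `grep -P` needs GNU grep.

```shell
# Hypothetical log excerpt: training and validation lines alternate.
sample_log='train epoch 19 AUC: 0.9012 (loss 0.31)
val epoch 19 AUC: 0.8533 (loss 0.42)
train epoch 20 AUC: 0.9104 (loss 0.29)
val epoch 20 AUC: 0.8571 (loss 0.41)'

# Odd-numbered lines are the training lines, even-numbered the validation lines.
train_lines=$(printf '%s\n' "$sample_log" | awk 'NR % 2 == 1')
val_lines=$(printf '%s\n' "$sample_log" | awk 'NR % 2 == 0')

# Keep only the text between "AUC: " and " (" on the last validation line.
final_val_auc=$(printf '%s\n' "$val_lines" | tail -n 1 | grep -Po '(?<=AUC: ).*(?= \()')
echo "$final_val_auc"   # prints 0.8571
```

The lookbehind `(?<=AUC: )` and lookahead `(?= \()` match the surrounding text without including it, which is why only the number is written out.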


Finally, you can copy your results from the GPU server to your DICE server.
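
The copy command is not reproduced here. As a sketch, assuming the same `rsync` form used earlier in this guide to upload data, with source and destination swapped and run from the DICE terminal. The student ID and file name in the example call are placeholders.

```shell
# Print an rsync command that pulls a file or directory from the cluster.
fetch_results_command() {
    # $1 = cluster number (1 or 2), $2 = student ID,
    # $3 = path under your cluster home directory, $4 = destination on DICE
    printf 'rsync -r mlp%s:/home/%s/%s %s\n' "$1" "$2" "$3" "$4"
}

fetch_results_command 1 sXXXXXXX results.txt .
```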