# Deep Learning - Part 0

This notebook explains how to install all the prerequisites and libraries that you will need to run the following tutorials. If you can execute all the cells below, you are good to go.

## Environment configuration

There are two major package managers in Python: pip and conda. For this tutorial we will be using conda, which, besides being a package manager, is also useful as a version manager. There are two main ways to install conda: [Anaconda](https://conda.io/docs/install/quick.html) and [Miniconda](https://conda.io/miniconda.html).

To install TensorFlow, we recommend following the [official documentation](https://www.tensorflow.org/install/install_linux#installing_with_anaconda). In particular, for a conda installation, it advises using pip instead of conda, as the only available Anaconda package for TensorFlow is not actively maintained.

All the available TensorFlow versions (for both Python 2 and 3, with CPU or GPU support) can be found [at this link](https://www.tensorflow.org/install/install_linux#top_of_page). For this course we will use this TensorFlow version for CPU: https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.10.0-cp35-cp35m-linux_x86_64.whl and this one for GPU (for Nabucodonosor): https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.10.0-cp35-cp35m-linux_x86_64.whl


The commands to set up the environment are the following:

```
$ wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ bash Miniconda3-latest-Linux-x86_64.sh
$ conda create --name diplodatos python=3.5
$ source activate diplodatos
(diplodatos) $ conda install numpy scipy scikit-learn jupyter nb_conda keras
(diplodatos) $ pip install --ignore-installed --upgrade YOUR_TENSORFLOW_URL
(diplodatos) $ jupyter notebook
```

(Note: it is important to install Keras before TensorFlow, because installing Keras overwrites the TensorFlow version.)

In [1]:
import keras

print(keras.__version__)

2.2.2


## Optional libraries

These are some optional libraries that you can install in order to see some visualizations. They take a while to download, so if you don't have a good Internet connection or are short on time, you can skip them.

```
# For the fasttext embeddings
(diplodatos) $ pip install gensim
# To visualize keras graphs
(diplodatos) $ pip install pydot pydotplus
(diplodatos) $ conda install graphviz matplotlib
```
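To see which of these optional libraries are visible from the active environment, a check along the following lines may help. It is only a sketch: the module names are assumed to match the pip package names, and the conda `graphviz` package is assumed to provide the `dot` executable rather than a Python module.

```python
import importlib.util
import shutil

# Importable module names, assumed to match the pip package names above.
OPTIONAL_MODULES = ["gensim", "pydot", "pydotplus", "matplotlib"]


def check_optional():
    """Map each optional dependency to True if it looks available."""
    status = {}
    for name in OPTIONAL_MODULES:
        # find_spec returns None when a top-level module cannot be found.
        status[name] = importlib.util.find_spec(name) is not None
    # Assumption: conda's graphviz package installs the `dot` program.
    status["graphviz (dot)"] = shutil.which("dot") is not None
    return status


for name, available in sorted(check_optional().items()):
    print("{:<15} {}".format(name, "ok" if available else "missing"))
```

A library reported as missing only affects the visualizations, not the rest of the tutorials.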

## Download the embeddings and the dataset

### 2nd class

The dataset we will use (MNIST) will be downloaded by Keras automatically the first time you use it. To save time, you can download it now by running the next cell.

In [2]:
from keras.datasets import mnist
mnist.load_data();

### Assignment 1
We will use the FastText embeddings. We provide a smaller version, filtered to contain only the words in the movie reviews dataset. You can download it from
https://cs.famaf.unc.edu.ar/~mteruel/datasets/diplodatos/filtered_fastext_movie_review.pickle

You can also download the original versions, if you want (most languages are available), from here https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md. The English version is about 9GB.

If you do not have the dataset, you can download and uncompress it from http://www.cs.cornell.edu/people/pabo/movie-review-data/review_polarity.tar.gz using the cell:

In [3]:
%%bash
DIRECTORY=dataset
if [ ! -d "$DIRECTORY" ]; then
    # Control will enter here if dataset directory doesn't exist.
    wget http://www.cs.cornell.edu/people/pabo/movie-review-data/review_polarity.tar.gz
    mkdir "$DIRECTORY"
    mv review_polarity.tar.gz "$DIRECTORY"
    tar -xvf "$DIRECTORY"/review_polarity.tar.gz -C "$DIRECTORY"/
fi
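
If `wget` or `tar` are not available on your machine, the same steps can be sketched in plain Python. This is a rough equivalent of the bash cell, not a replacement for it; the function name `fetch_dataset` and its parameters are made up for this example.

```python
import os
import shutil
import tarfile
import urllib.request

DATASET_URL = ("http://www.cs.cornell.edu/people/pabo/"
               "movie-review-data/review_polarity.tar.gz")


def fetch_dataset(url=DATASET_URL, directory="dataset"):
    """Download and unpack the archive unless the directory already exists.

    Returns True if the dataset was downloaded, False if it was skipped.
    """
    if os.path.isdir(directory):
        # Same check as the bash cell: do nothing if the folder is there.
        return False
    archive_name = os.path.basename(url)
    # Download into the current directory first, like wget does.
    urllib.request.urlretrieve(url, archive_name)
    os.makedirs(directory)
    archive_path = os.path.join(directory, archive_name)
    shutil.move(archive_name, archive_path)
    # Unpack next to the archive, like `tar -xvf ... -C dataset/`.
    with tarfile.open(archive_path) as archive:
        archive.extractall(directory)
    return True
```

Calling `fetch_dataset()` with no arguments downloads the movie review dataset into a `dataset` folder, and a second call returns `False` without downloading anything.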

## Using the server

### Tunneling and ssh

**How do you run a notebook on a remote machine?** You use an ssh connection with port forwarding. This way, everything served on that port of the server machine (like a jupyter notebook) is also reachable on your localhost.

It is likely that everyone will be using the same ports, so we recommend that you pick a random port number before connecting. The port in the ssh command must be the same one that you use to start the notebook.

```
$ ssh -L PORT:localhost:PORT USER@SERVER
$ source activate diplodatos
(diplodatos) $ jupyter notebook --port PORT --no-browser
```

Now you can use the notebook as if it were running on your computer.
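
If you want help choosing the port, the snippet below draws a random one and prints the two commands with the same value filled in. It is a minimal sketch: the helper name `tunnel_commands` and the 10000-60000 range are arbitrary choices for this example, and nothing checks whether the port is actually free on the server.

```python
import random


def tunnel_commands(user, server, port=None):
    """Return (port, ssh command, jupyter command) sharing the same port."""
    if port is None:
        # Arbitrary range, chosen only to stay away from well-known ports.
        port = random.randint(10000, 60000)
    ssh_cmd = "ssh -L {0}:localhost:{0} {1}@{2}".format(port, user, server)
    jupyter_cmd = "jupyter notebook --port {0} --no-browser".format(port)
    return port, ssh_cmd, jupyter_cmd


# Replace USER and SERVER with your own login and the server address.
port, ssh_cmd, jupyter_cmd = tunnel_commands("USER", "SERVER")
print(ssh_cmd)       # run this one on your computer
print(jupyter_cmd)   # run this one on the server, inside the environment
```

If the notebook fails to start because the port is taken, run the snippet again to get a different number.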

### Using slurm

The Nabucodonosor server uses a queue system called slurm, which grants exclusive access to the CPU resources. You should enqueue everything you run that takes more than 10 minutes!

#### Set up

1. Download the script https://raw.githubusercontent.com/MIREL-UNC/mirel-scripts/master/run_scripts/submit_job_slurm.sh

2. Create a logs folder

#### Enqueue things

To enqueue a job on slurm, first put your command in a file, for example command.txt, and then submit it:
```
$ sbatch submit_job_slurm.sh command.txt
```

The queue will assign your job a number JOBID. All the output of your process will be redirected to logs/JOBID.out and logs/JOBID.err
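
The set-up and enqueue steps can also be scripted. The sketch below creates the logs folder, writes the command file and builds the `sbatch` call described above; the helper names `prepare_job` and `submit` are invented for this example, and it assumes that `submit_job_slurm.sh` is in the current directory and reads one command per line from the file.

```python
import os
import shutil
import subprocess


def prepare_job(command, command_file="command.txt", logs_dir="logs"):
    """Create the logs folder and command file; return the sbatch call."""
    os.makedirs(logs_dir, exist_ok=True)
    with open(command_file, "w") as handle:
        # Assumption: the submit script expects one command per line.
        handle.write(command + "\n")
    return ["sbatch", "submit_job_slurm.sh", command_file]


def submit(argv):
    """Run the sbatch call, or just print it when sbatch is missing."""
    if shutil.which("sbatch") is None:
        print("sbatch not found; would run: " + " ".join(argv))
        return None
    return subprocess.check_output(argv).decode()
```

For example, `submit(prepare_job("python my_experiment.py"))` would enqueue a hypothetical script called `my_experiment.py` when run on the server.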

#### Controlling things

To see the state of the queue run `$ squeue`

To cancel a job run `$ scancel JOBID`

## Using Keras with GPUs 

If you installed TensorFlow with GPU support, now is a good time to check whether it actually detects your devices.

In [4]:
import tensorflow
print(tensorflow.__version__)

1.10.0


In [5]:
from tensorflow.python.client import device_lib

def get_available_gpus():
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type == 'GPU']
get_available_gpus()

['/device:GPU:0']

If the above gives an error, try setting the environment variables below. Variables exported in a `%%bash` cell only last for that cell, so to make the change permanent add these `export` lines to your .bashrc.

In [6]:
%%bash
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cuda/9.0/extras/CUPTI/lib64/:/opt/cuda/9.0/lib64:/opt/cudnn/v7.0/
export CUDA_HOME=/opt/cuda/9.0

### Avoid using GPUs

If all the GPUs are being used, you can still force Keras to use the CPU. For simple models this is still a very good option.

The easiest way is to run your command with `CUDA_VISIBLE_DEVICES=""`. For example:
```
(diplodatos) $ CUDA_VISIBLE_DEVICES="" jupyter notebook --no-browser
(diplodatos) $ CUDA_VISIBLE_DEVICES="" python exercise_1.py --experiment_name mlp_200
```
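
If the notebook is already running, the same variable can be set from Python instead. The sketch below assumes it is executed before tensorflow or Keras is imported in that kernel; the helper name `hide_gpus` is made up for this example.

```python
import os
import sys


def hide_gpus():
    """Hide all CUDA devices from libraries that are imported afterwards.

    Returns False if tensorflow was already imported, in which case the
    setting may come too late to have any effect.
    """
    already_loaded = "tensorflow" in sys.modules
    # An empty value means that no GPU is visible to this process.
    os.environ["CUDA_VISIBLE_DEVICES"] = ""
    return not already_loaded


if not hide_gpus():
    print("tensorflow is already imported; restart the kernel and run this first")
```

After running it, `get_available_gpus()` from the cell above should return an empty list.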