From Zero to Lasagne on Ubuntu 14.04

Enrico Ferrero edited this page Apr 13, 2016 · 2 revisions

This guide provides step-by-step instructions to get Lasagne up and running on Ubuntu 14.04 (and possibly others), including BLAS and CUDA support.

If you run into trouble or have any suggestions for improvements, please let us know on the mailing list: https://groups.google.com/forum/#!forum/lasagne-users
Also let us know if you successfully used or adapted the steps for other versions of Ubuntu.

Basics

This installs the bare minimum needed: a compiler, pip, numpy and scipy, Theano and Lasagne.

Prerequisites

Installing the prerequisites is fairly easy. Open a terminal and run:

sudo apt-get install -y gcc g++ gfortran build-essential git wget libopenblas-dev python-dev python-pip python-nose python-numpy python-scipy

Theano and Lasagne

Still in a terminal, run:

pip install --user --upgrade --no-deps https://github.com/Theano/Theano/archive/master.zip
pip install --user --upgrade --no-deps https://github.com/Lasagne/Lasagne/archive/master.zip

This will install the bleeding-edge versions of Theano and Lasagne for your user. Whenever you want to update to the latest version, just run the two commands again.

Testing

To test your installation, download Lasagne's MNIST example. We assume you want to put it into your home directory, in the subdirectory code/mnist. Still in a terminal, run:

mkdir -p ~/code/mnist
cd ~/code/mnist
wget https://github.com/Lasagne/Lasagne/raw/master/examples/mnist.py
python mnist.py mlp 5

If everything was installed correctly, this will download the MNIST dataset (11 MiB) and train a simple network on it for 5 epochs. Including compilation, this should take between 1 and 5 minutes.

BLAS

BLAS (Basic Linear Algebra Subprograms) is a specification for a set of linear algebra building blocks that many other libraries depend on, including numpy and Theano. Several vendors and open-source projects provide optimized implementations of these routines. The installation instructions above already install a precompiled version of OpenBLAS that should be usable by Theano.

Testing

To test whether Theano can use OpenBLAS, download Theano's BLAS check and run it. We assume you want to put it in a temporary directory. Still in a terminal, run:

cd /tmp
wget https://github.com/Theano/Theano/raw/master/theano/misc/check_blas.py
python check_blas.py

If everything works correctly, near the end of the output, it should say:

Total execution time: 31.37s on CPU (with direct Theano binding to blas).

The execution time may be very different; the important part is "with direct Theano binding to blas".
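When the check runs from a script, the summary line can be inspected automatically instead of by eye. The helper below is a minimal sketch and not part of Theano: the name check_blas_summary is made up for illustration, and it only looks for the phrases quoted in this guide ("with direct Theano binding to blas" and "on GPU"), so it may need adjusting if the script's output format differs.

```shell
# check_blas_summary (hypothetical helper): read check_blas.py output on stdin
# and report which backend the final "Total execution time" line mentions.
check_blas_summary() {
    # keep only the last summary line; "|| true" tolerates a missing line
    summary=$(grep -F 'Total execution time:' | tail -n 1) || true
    case $summary in
        *'(with direct Theano binding to blas)'*) echo "direct BLAS binding" ;;
        *'on GPU'*) echo "GPU" ;;
        '') echo "no summary line found"; return 1 ;;
        *) echo "no direct BLAS binding"; return 1 ;;
    esac
}
```

It could then be used as: python check_blas.py | check_blas_summary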

Self-compiled BLAS

For improved performance, you may want to compile OpenBLAS for your specific CPU architecture. The easiest way to do so is to use the source package of Ubuntu's OpenBLAS. The following commands should do the trick:

# create a directory to work in
cd /tmp
mkdir OpenBLAS
cd OpenBLAS
# obtain the source code and install tools needed to build it
apt-get source openblas
sudo apt-get build-dep openblas
sudo apt-get install build-essential dpkg-dev cdbs devscripts patch
# change some configuration options
cd openblas-*
nano -w Makefile.rule   # uncomment "NO_WARMUP = 1" and "NO_AFFINITY = 1"
# compile (will take about one minute)
fakeroot debian/rules custom
# if compilation went through, you can install the new OpenBLAS
sudo dpkg -i ../libopenblas-base*.deb
# and set it up as usual
sudo update-alternatives --config libblas.so.3gf
sudo update-alternatives --config liblapack.so.3gf 

Run the test again, as described in the previous section. Most likely, performance will have improved.

For even better performance, you may want to try compiling OpenBLAS from the original source as described on http://www.openblas.net/. (Please feel free to extend this guide accordingly.)

If compilation fails, it's possible that your CPU architecture is newer than what Ubuntu's OpenBLAS supports. The easiest solution in this case is to specify an alternative architecture manually. To do so, you would need to edit another file:

nano -w debian/rules

Where it says LANG=C debian/rules TARGET=custom build binary, replace custom in TARGET=custom with one of the architectures listed in the TargetList.txt file, choosing the latest one that your CPU supports. Then run the compilation again (fakeroot debian/rules custom) and continue from there.

Nvidia GPU support (CUDA)

To be able to train networks on an Nvidia GPU using CUDA, we will need to install the proprietary Nvidia driver and CUDA and adapt some configuration files.

Prerequisites

First we need to install another prerequisite:

sudo apt-get install linux-headers-generic

Without this, the driver module cannot be compiled.

Driver and CUDA

At https://developer.nvidia.com/cuda-downloads, choose the download for Linux > x86_64 > Ubuntu > 14.04 > deb (network). Save it somewhere locally (we assume /tmp/cuda.deb) and install it from a terminal:

sudo dpkg -i /tmp/cuda.deb

It is important to understand that this has not installed anything yet; it has only added Nvidia's package repository to Ubuntu's repository list. Run the following command to update Ubuntu's package database with the new packages available:

sudo apt-get update

Again, this didn't install anything.
There are two ways to proceed from here.

Option A: Install everything at once

If you don't need control over the individual components, you can install CUDA along with the examples and the latest driver in one step:

sudo apt-get install cuda

Option B: Install driver and toolkit separately

To have better control over what is installed when, you can use the repository to install only the latest driver. Run the following in a terminal to see all available Nvidia driver versions:

aptitude search nvidia-3 -F'%p' | grep -vF ':i386'

For example, this may produce:

...
nvidia-346
nvidia-346-dev
nvidia-346-updates
nvidia-346-updates-dev
nvidia-346-updates-uvm
nvidia-346-uvm
nvidia-352
nvidia-352-dev
nvidia-352-updates
nvidia-352-updates-dev
nvidia-352-uvm

Usually, you will want to install the latest driver. You need both the normal and the "uvm" version. Here, it would be:

sudo apt-get install nvidia-352 nvidia-352-uvm
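Picking the newest driver from that list can also be scripted. The following is a rough sketch under the assumption that the plain driver packages are named nvidia-NNN, as in the listing above; latest_driver is a hypothetical helper name, not an existing tool.

```shell
# latest_driver (hypothetical helper): read package names on stdin, one per line,
# and print the plain "nvidia-NNN" package with the highest version number.
latest_driver() {
    # awk trims any column padding and drops -dev, -updates and -uvm variants
    awk '$1 ~ /^nvidia-[0-9]+$/ { print $1 }' | sort -t - -k 2,2n | tail -n 1
}
```

Piping the aptitude search command from above into latest_driver should print the name to pass to apt-get install; the matching "uvm" package is that name with -uvm appended.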

Now install the toolkit via the runfile from https://developer.nvidia.com/cuda-downloads, at Linux > x86_64 > Ubuntu > 14.04 > runfile (local). When it asks where to install the toolkit, accept the default location (/usr/local/cuda-x.y) and let it create the symlink (/usr/local/cuda). You do not need to install the samples, and you must not install the driver.

Note: When installing the CUDA toolkit via the runfile, never install the driver from the runfile. Always install it via apt-get as explained before, so the package manager knows. In the worst case, you may end up with an unbootable system otherwise!

Configuration

Independently of whether you chose Option A or Option B above, there are a few configuration files we need to create or adapt now.

To make the CUDA compiler available to all users, adapt /etc/environment:

sudo nano -w /etc/environment

Append :/usr/local/cuda/bin to the end of the PATH value, inside the closing quote. If instead you want to make it available for the current user only, add export PATH=/usr/local/cuda/bin:"$PATH" to the end of your ~/.profile file.

To make the libraries available to all users, run the following:

sudo sh -c "echo /usr/local/cuda/lib64 > /etc/ld.so.conf.d/cuda.conf"
sudo ldconfig

If instead you want to make it available for the current user only, add export LD_LIBRARY_PATH=/usr/local/cuda/lib64:"$LD_LIBRARY_PATH" to the end of your ~/.profile file.
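For the per-user variant, it helps to add the two export lines in a way that is safe to repeat, so that re-running the setup does not duplicate them. The snippet below is a sketch only: append_once is a hypothetical helper, and it demonstrates the idea on a scratch file rather than on the real ~/.profile.

```shell
# append_once FILE LINE (hypothetical helper): append LINE to FILE
# unless an identical line is already present.
append_once() {
    touch "$1"
    grep -qxF -e "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Demonstration on a scratch file; use "$HOME/.profile" instead for real use.
profile=$(mktemp)
append_once "$profile" 'export PATH=/usr/local/cuda/bin:"$PATH"'
append_once "$profile" 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:"$LD_LIBRARY_PATH"'
append_once "$profile" 'export PATH=/usr/local/cuda/bin:"$PATH"'   # already there, so nothing is added
cat "$profile"
```

The single quotes keep $PATH and $LD_LIBRARY_PATH unexpanded, so they are evaluated at login time rather than when the line is written.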

Finally, if you haven't done so since the driver installation, you will need to reboot your machine and cross fingers. This is required after every driver update, otherwise CUDA will stop working -- take care when you're maintaining a GPU server.

Testing

For a first sanity check, run nvidia-smi from a terminal. It should display information about all supported GPU devices.

To also try CUDA, compile a simple test program.

cd /tmp
wget https://gist.github.com/f0k/0d6431e3faa60bffc788f8b4daa029b1/raw/2e37a83a97f5df27e53326ec16879fcbd94850bf/cuda_check.c
nvcc -o cuda_check cuda_check.c -lcuda

If everything worked, you've got a program you can run now:

./cuda_check

It will produce output similar to the following:

Found 2 device(s).
Device: 0
  Name: GeForce GTX 970
  Compute Capability: 5.2
  Multiprocessors: 13
  CUDA Cores: 2496
  Concurrent threads: 26624
  GPU clock: 1329 MHz
  Memory clock: 3505 MHz
  Total Memory: 4093 MiB
  Free Memory: 4014 MiB
Device: 1
  Name: Tesla K40c
  Compute Capability: 3.5
  Multiprocessors: 15
  CUDA Cores: 2880
  Concurrent threads: 30720
  GPU clock: 875.5 MHz
  Memory clock: 3004 MHz
  Total Memory: 11519 MiB
  Free Memory: 11421 MiB

If it doesn't, the driver may not have been loaded properly. This can often be fixed by running any CUDA program as root, such as the one we just compiled:

sudo ./cuda_check

Afterwards it should also work as a normal user. To do this automatically on every boot, do the following:

sudo cp -a cuda_check /root
sudo sh -c "echo '@reboot root /root/cuda_check' > /etc/cron.d/cuda"

GPU boost

More recent GPUs support a boosting mode with increased core frequency. It can be enabled as follows:

sudo nvidia-smi -i 0 -pm 1  # set persistence mode
sudo nvidia-smi -i 0 -ac <mem>,<core>  # set gpu boost

Where 0 is the device number, and <mem>,<core> is a pair of frequencies to set (e.g., 3004,875 for a Tesla K40c). The highest supported frequencies for a device can be listed with:

nvidia-smi -i 0 --query-supported-clocks=mem --format=csv,nounits | head -n2
nvidia-smi -i 0 --query-supported-clocks=gr --format=csv,nounits | head -n2

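To enable boost automatically on every boot, place a shell script executing the correct commands in /root/gpu_boost.sh, make it executable, and append a matching @reboot line to /etc/cron.d/cuda, following the pattern shown in the previous section. Use >> rather than > when appending, so the existing cuda_check entry is not overwritten.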
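A possible shape for /root/gpu_boost.sh is sketched below. It uses only the nvidia-smi options shown in this section; the function names max_clock and gpu_boost are illustrative. It assumes that the CSV output consists of one header line followed by one frequency per line, and that the highest memory and graphics clocks form a valid pair on your device, so check the values by hand before relying on it.

```shell
#!/bin/sh
# Sketch of a possible /root/gpu_boost.sh; function names are illustrative.

# max_clock: read "nvidia-smi --query-supported-clocks=... --format=csv,nounits"
# output on stdin (assumed: one header line, then one frequency per line)
# and print the largest frequency.
max_clock() {
    awk 'NR > 1 && $1 ~ /^[0-9]+$/ && $1 + 0 > max { max = $1 + 0 }
         END { if (max) print max }'
}

# gpu_boost DEVICE: set persistence mode, then apply the highest supported clocks.
# Must be run as root.
gpu_boost() {
    dev=$1
    mem=$(nvidia-smi -i "$dev" --query-supported-clocks=mem --format=csv,nounits | max_clock)
    core=$(nvidia-smi -i "$dev" --query-supported-clocks=gr --format=csv,nounits | max_clock)
    nvidia-smi -i "$dev" -pm 1
    nvidia-smi -i "$dev" -ac "$mem,$core"
}
```

To use it, add a call such as gpu_boost 0 as the last line of the script, once per device to boost.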

Adapt Theano configuration

To make Theano use CUDA automatically for all your scripts, all that's left to do is to create a configuration file in your home directory:

echo "[global]
device=gpu
floatX=float32" > ~/.theanorc

Testing

Again, we will use Theano's BLAS check to test the installation:

cd /tmp
wget -N https://github.com/Theano/Theano/raw/master/theano/misc/check_blas.py
python check_blas.py

Near the end, it should say:

Total execution time: 0.61s on GPU.

Again, the execution time varies widely depending on the GPU (it could be over 10 seconds); the critical part is "on GPU".

cuDNN

Finally, for improved performance especially for ConvNets, you should install Nvidia's cuDNN library. After registering at https://developer.nvidia.com/cudnn, you can download it from https://developer.nvidia.com/rdp/cudnn-download.

You will obtain a .tar.gz file. Extract it directly into your CUDA installation:

cd /usr/local
sudo tar -xzf <your-downloaded-file>.tar.gz

And update the shared library cache:

sudo ldconfig

Theano and Lasagne should now be able to use cuDNN. To check, run:

python -c "from theano.sandbox.cuda.dnn import dnn_available as da; print(da() or da.msg)"

If everything is configured correctly, it will say something like:

Using gpu device 0: GeForce GTX 970 (CNMeM is disabled, CuDNN 4007)
True

Otherwise you will receive an error message you can search for online.

For somewhat improved performance, you can adapt your .theanorc file to include some additional flags:

nano -w ~/.theanorc

Append the following lines:

[dnn.conv]
algo_fwd = time_once
algo_bwd_data = time_once
algo_bwd_filter = time_once

[lib]
cnmem = .45

For more configuration options, consult the Theano documentation.