Update doc with instructions for using new gpu backend
slefrancois committed May 11, 2016
1 parent 319382b commit bd54467
Showing 10 changed files with 343 additions and 573 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -37,3 +37,4 @@ Theano.suo
.ipynb_checkpoints
.pydevproject
.ropeproject
core
4 changes: 2 additions & 2 deletions doc/extending/extending_theano.txt
@@ -681,8 +681,8 @@ For instance, to verify the Rop method of the DoubleOp, you can use this:
Testing GPU Ops
^^^^^^^^^^^^^^^

Ops to be executed on the GPU should inherit from the
``theano.sandbox.cuda.GpuOp`` and not ``theano.Op``. This allows
When using the old GPU backend, Ops to be executed on the GPU should inherit
from ``theano.sandbox.cuda.GpuOp`` and not ``theano.Op``. This allows
Theano to distinguish them. Currently, we use this to test if the
NVIDIA driver works correctly with our sum reduction code on the GPU.
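
With the old backend enabled, a minimal sketch of this distinction might look
like the following (the shared variable and the expression are purely
illustrative):

.. code-block:: python

import numpy
import theano
import theano.tensor as T
from theano.sandbox.cuda import GpuOp  # old backend only

x = theano.shared(numpy.ones(10, dtype='float32'))
f = theano.function([], T.exp(x))

# Ops scheduled for the GPU are instances of GpuOp, which is how
# Theano tells them apart from CPU ops.
nodes = f.maker.fgraph.toposort()
gpu_nodes = [node for node in nodes if isinstance(node.op, GpuOp)]
print('%d of %d ops will run on the GPU' % (len(gpu_nodes), len(nodes)))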

14 changes: 8 additions & 6 deletions doc/install.txt
@@ -375,7 +375,7 @@ If ``theano-nose`` is not found by your shell, you will need to add

If you want GPU-related tests to run on a specific GPU device, and not
the default one, you should use :attr:`~config.init_gpu_device`.
For instance: ``THEANO_FLAGS=device=cpu,init_gpu_device=gpu1``.
For instance: ``THEANO_FLAGS=device=cpu,init_gpu_device=cuda1``.

See :ref:`libdoc_config` for more information on how to change these
configuration options.
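
Because ``THEANO_FLAGS`` is read when ``theano`` is first imported, it must be
set beforehand. A minimal sketch (the device names are illustrative):

.. code-block:: python

import os
# Flags must be set before theano is imported for the first time.
os.environ['THEANO_FLAGS'] = 'device=cpu,init_gpu_device=cuda1'

import theano
print(theano.config.device)           # cpu
print(theano.config.init_gpu_device)  # cuda1
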
@@ -508,25 +508,25 @@ Any one of them is enough.

:ref:`Ubuntu instructions <install_ubuntu_gpu>`.


Next, install `libgpuarray <http://deeplearning.net/software/libgpuarray/installation.html>`_.

Once that is done, the only thing left is to change the ``device`` option to name the GPU device in your
computer, and set the default floating point computations to float32.
For example: ``THEANO_FLAGS='cuda.root=/path/to/cuda/root,device=gpu,floatX=float32'``.
For example: ``THEANO_FLAGS='cuda.root=/path/to/cuda/root,device=cuda,floatX=float32'``.
You can also set these options in the .theanorc file's ``[global]`` section:

.. code-block:: cfg

[global]
device = gpu
device = cuda
floatX = float32

Note that:

* If your computer has multiple GPUs and you use 'device=gpu', the driver
* If your computer has multiple GPUs and you use 'device=cuda', the driver
selects the one to use (usually gpu0).
* You can use the program nvidia-smi to change this policy.
* You can choose one specific GPU by specifying 'device=gpuX', with X the
* You can choose one specific GPU by specifying 'device=cudaX', with X the
corresponding GPU index (0, 1, 2, ...)
* By default, when ``device`` indicates preference for GPU computations,
Theano will fall back to the CPU if there is a problem with the GPU.
@@ -794,6 +794,8 @@ setup CUDA, but be aware of the following caveats:
toggle your GPU on, which can be done with
`gfxCardStatus <http://codykrieger.com/gfxCardStatus>`__.

Next, install `libgpuarray <http://deeplearning.net/software/libgpuarray/installation.html>`_.

Once your setup is complete, head to :ref:`using_gpu` to find how to verify
everything is working properly.
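
As a quick sanity check, a small script along these lines can confirm that the
GPU is actually picked up (the vector length and iteration count are
arbitrary):

.. code-block:: python

from theano import function, config, shared, tensor
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], tensor.exp(x))
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
# If any Elemwise op in the compiled graph is not a GPU op, the
# computation stayed on the CPU.
if numpy.any([isinstance(node.op, tensor.Elemwise) and
              ('Gpu' not in type(node.op).__name__)
              for node in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')

Run it with ``THEANO_FLAGS=device=cuda,floatX=float32`` and check that it
reports the GPU.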

22 changes: 11 additions & 11 deletions doc/install_ubuntu.txt
@@ -43,7 +43,7 @@ For Ubuntu 11.10 through 14.04:

sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ libopenblas-dev git
sudo pip install Theano

On 14.04, this will install Python 2 by default. If you want to use Python 3:

.. code-block:: bash
@@ -104,30 +104,30 @@ For Ubuntu 11.04:
The development version of Theano supports Python 3.3 and
probably supports Python 3.2, but we do not test on it.


Bleeding Edge Installs
----------------------

If you would like, instead, to install the bleeding edge Theano (from github)
such that you can edit and contribute to Theano, replace the `pip install Theano`
command with:

.. code-block:: bash

git clone git://github.com/Theano/Theano.git
cd Theano
python setup.py develop --user
cd ..

VirtualEnv
----------

If you would like to install Theano in a VirtualEnv, you will want to pass the
`--system-site-packages` flag when creating the VirtualEnv so that it will pick up
the system-provided `Numpy` and `SciPy`.

.. code-block:: bash

virtualenv --system-site-packages -p python2.7 theano-env
source theano-env/bin/activate
pip install Theano
@@ -208,7 +208,7 @@ Updating Bleeding Edge Installs
Change to the Theano directory and run:

.. code-block:: bash

git pull


@@ -303,7 +303,7 @@ Test GPU configuration

.. code-block:: bash

THEANO_FLAGS=floatX=float32,device=gpu python /usr/lib/python2.*/site-packages/theano/misc/check_blas.py
THEANO_FLAGS=floatX=float32,device=cuda python /usr/lib/python2.*/site-packages/theano/misc/check_blas.py
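
If you prefer a self-contained check, a rough sketch along the same lines is
shown below (the matrix size and iteration count are arbitrary):

.. code-block:: python

import time
import numpy
import theano
import theano.tensor as T

# Time a large float32 matrix product; with device=cuda and a working
# GPU setup this should be much faster than the CPU run.
A = theano.shared(numpy.random.rand(2000, 2000).astype('float32'))
B = theano.shared(numpy.random.rand(2000, 2000).astype('float32'))
f = theano.function([], T.dot(A, B))

t0 = time.time()
for i in range(10):
    f()
print("10 products of 2000x2000 matrices took %f seconds" % (time.time() - t0))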

.. note::

12 changes: 7 additions & 5 deletions doc/install_windows.txt
@@ -423,16 +423,16 @@ Create a test file containing:
print("NP time: %f[s], theano time: %f[s] (times should be close when run on CPU!)" %(
      np_end-np_start, t_end-t_start))
print("Result difference: %f" % (np.abs(AB-tAB).max(), ))

.. testoutput::
:hide:
:options: +ELLIPSIS

NP time: ...[s], theano time: ...[s] (times should be close when run on CPU!)
Result difference: ...

.. code-block:: none

NP time: 1.480863[s], theano time: 1.475381[s] (times should be close when run on CPU!)
Result difference: 0.000000
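
A minimal sketch of such a NumPy-versus-Theano timing script (the matrix size
is illustrative) could look like this:

.. code-block:: python

import time
import numpy as np
import theano
import theano.tensor as T

N = 2000
A = np.random.rand(N, N).astype(theano.config.floatX)
B = np.random.rand(N, N).astype(theano.config.floatX)

# Plain NumPy matrix product.
np_start = time.time()
AB = A.dot(B)
np_end = time.time()

# The same product compiled with Theano.
x, y = T.matrices('x', 'y')
f = theano.function([x, y], T.dot(x, y))
t_start = time.time()
tAB = f(A, B)
t_end = time.time()

print("NP time: %f[s], theano time: %f[s] (times should be close when run on CPU!)" % (
      np_end - np_start, t_end - t_start))
print("Result difference: %f" % (np.abs(AB - tAB).max(), ))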

@@ -445,6 +445,8 @@ routine for matrix multiplication)
Configure Theano for GPU use
############################

Install `libgpuarray <http://deeplearning.net/software/libgpuarray/installation.html>`_ if you have not already done so.

Theano can be configured with a ``.theanorc`` text file (or
``.theanorc.txt``, whichever is easier for you to create under
Windows). It should be placed in the directory pointed to by the
@@ -457,7 +459,7 @@ To use the GPU please write the following configuration file:
.. code-block:: cfg

[global]
device = gpu
device = cuda
floatX = float32

[nvcc]
@@ -498,7 +500,7 @@ within an MSYS shell if you installed Nose manually as described above.
Compiling a faster BLAS
~~~~~~~~~~~~~~~~~~~~~~~

If you installed Python through WinPython or EPD, Theano will automatically
link with the MKL library, so you should not need to compile your own BLAS.
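
To see which BLAS Theano will link against, you can inspect the
``blas.ldflags`` configuration value; a quick sketch:

.. code-block:: python

import theano
# An empty value means no BLAS library was found and Theano falls back
# to a slower default implementation.
print(theano.config.blas.ldflags)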

.. note::
2 changes: 1 addition & 1 deletion doc/optimizations.txt
@@ -32,6 +32,7 @@ Optimization FAST_RUN FAST_COMPILE
========================================================= ========= ============ =============
:term:`merge` x x
:term:`constant folding<constant folding>` x x
:term:`GPU transfer` x x
:term:`shape promotion<shape promotion>` x
:term:`fill cut<fill cut>` x
:term:`inc_subtensor srlz.<inc_subtensor serialization>` x
@@ -52,7 +53,6 @@ Optimization FAST_RUN FAST_COMPILE
:term:`inplace_elemwise` x
:term:`inplace_random` x
:term:`elemwise fusion` x
:term:`GPU transfer` x
:term:`local_log_softmax` x x
:term:`local_remove_all_assert`
========================================================= ========= ============ =============
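
To confirm which mode moves the computation to the GPU, one can compile a
small function under each mode and look for GPU ops in the resulting graph;
a rough sketch (assuming ``device=cuda`` and ``floatX=float32``):

.. code-block:: python

import numpy
import theano
import theano.tensor as T

x = theano.shared(numpy.ones((100, 100), dtype='float32'))
for mode in ('FAST_RUN', 'FAST_COMPILE'):
    f = theano.function([], T.exp(x), mode=mode)
    on_gpu = any('Gpu' in type(node.op).__name__
                 for node in f.maker.fgraph.toposort())
    print('%s: %s' % (mode, 'GPU ops present' if on_gpu else 'no GPU ops'))
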
47 changes: 0 additions & 47 deletions doc/tutorial/aliasing.txt
@@ -261,52 +261,6 @@ combination of ``return_internal_type=True`` and ``borrow=True`` arguments to
hints that give more flexibility to the compilation and optimization of the
graph.

For GPU graphs, this borrowing can have a major speed impact. See the following code:

.. code-block:: python

from theano import function, config, shared, sandbox, tensor, Out
import numpy
import time

vlen = 10 * 30 * 768 # 10 x # cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f1 = function([], sandbox.cuda.basic_ops.gpu_from_host(tensor.exp(x)))
f2 = function([],
              Out(sandbox.cuda.basic_ops.gpu_from_host(tensor.exp(x)),
                  borrow=True))
t0 = time.time()
for i in range(iters):
    r = f1()
t1 = time.time()
no_borrow = t1 - t0
t0 = time.time()
for i in range(iters):
    r = f2()
t1 = time.time()
print(
    "Looping %s times took %s seconds without borrow "
    "and %s seconds with borrow" % (iters, no_borrow, (t1 - t0))
)
if numpy.any([isinstance(x.op, tensor.Elemwise) and
              ('Gpu' not in type(x.op).__name__)
              for x in f1.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')

Which produces this output:

.. code-block:: none

$ THEANO_FLAGS=device=gpu0,floatX=float32 python test1.py
Using gpu device 0: GeForce GTX 275
Looping 1000 times took 0.368273973465 seconds without borrow and 0.0240728855133 seconds with borrow.
Used the gpu

*Take home message:*

When an input *x* to a function is not needed after the function
@@ -317,4 +271,3 @@ requirement. When a return value *y* is large (in terms of memory
footprint), and you only need to read from it once, right away when
it's returned, then consider marking it with an ``Out(y,
borrow=True)``.
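
A minimal sketch of the ``Out(y, borrow=True)`` pattern (the expression is
purely illustrative):

.. code-block:: python

import numpy
from theano import function, shared, tensor, Out

x = shared(numpy.ones(1000, dtype='float32'))
# Marking the output as borrowed lets Theano reuse internal storage for
# it, so read the result right away and do not keep it across calls.
f = function([], Out(tensor.exp(x), borrow=True))
r = f()
print(r.sum())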
