# License
    IPython notebook for running a trivial OpenCL program
    Copyright (C) 2015 Andre.Brodtkorb@ifi.uio.no

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.

# Getting started

To run this notebook, you need a reasonably new computer (so that it supports OpenCL), IPython Notebook, NumPy, and a few other dependencies. Installing these and getting started is described below.


## Windows
The first thing we need is Python. I usually use Enthought Python, as it has a set of pre-compiled packages readily available. Other alternatives include PythonXY.

I strongly recommend using 32-bit Python and dependencies on Windows, as many packages are not readily available in 64-bit.

After installing a basic Python distribution (including NumPy etc.), the next thing we need is OpenCL for Python (PyOpenCL). Download it from http://www.lfd.uci.edu/~gohlke/pythonlibs/ and install it using
```
pip install <filename>
```
on the command line. Make sure that you install the correct version (again, that means 32-bit on Windows).

We also need an OpenCL driver. Download and install one from https://software.intel.com/en-us/articles/opencl-drivers, or from NVIDIA or AMD if you want to run on your GPU.

## Ubuntu 14.04
On Ubuntu 14.04, most of our prerequisites can be installed using apt-get install:
```
sudo apt-get install \
    ipython \
    ipython-notebook \
    python-numpy \
    python-pyopencl
```
In addition, you may need an OpenCL driver, depending on your hardware (python-pyopencl installs the NVIDIA driver by default). To run on an Intel CPU, download the driver from https://software.intel.com/en-us/articles/opencl-drivers and install it.

## Ubuntu 14.04 in VirtualBox
Follow the instructions for Ubuntu 14.04 above.
In addition, you need to enable SSE 4.1 and 4.2 in your VirtualBox image by issuing
```
VBoxManage setextradata <vbox-image-filename> VBoxInternal/CPUM/SSE4.1 1
VBoxManage setextradata <vbox-image-filename> VBoxInternal/CPUM/SSE4.2 1
```
whilst your virtual machine is powered down.
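
Whichever platform you installed on, a quick way to check that PyOpenCL can find a driver is to list the available platforms and devices. The following is a minimal sketch and not part of the original notebook; the helper name `list_opencl_devices` is made up for illustration, and it simply returns an empty list when PyOpenCL or a driver is missing.

```python
def list_opencl_devices():
    """Return (platform name, device name) pairs, or [] if none are usable."""
    try:
        import pyopencl as cl
    except ImportError:
        # PyOpenCL itself is missing: revisit the installation steps above
        return []
    try:
        # Enumerate every device on every installed OpenCL platform (driver)
        return [(p.name, d.name)
                for p in cl.get_platforms()
                for d in p.get_devices()]
    except cl.Error:
        # PyOpenCL imported, but no usable OpenCL driver was found
        return []

for platform_name, device_name in list_opencl_devices():
    print(platform_name, "->", device_name)
```

If nothing is printed, the most likely cause is a missing OpenCL driver.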

In [1]:
#Show matplotlib figures inline, at high ("retina") resolution
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

#Enable the PyOpenCL IPython extension (provides the %%cl_kernel magic)
%load_ext pyopencl.ipython_ext

#Import packages we need
import numpy as np
import pyopencl as cl
import os
from matplotlib import animation, rc
from matplotlib import pyplot as plt

#Set large figure sizes
rc('figure', figsize=(16.0, 12.0))
rc('animation', html='html5')

In [2]:
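#Configure PyOpenCL through environment variables (set before creating a context)
#Show OpenCL compiler output (warnings and errors)
os.environ["PYOPENCL_COMPILER_OUTPUT"] = "1"
#Select the first OpenCL platform without an interactive prompt
os.environ["PYOPENCL_CTX"] = "0"
#Disable NVIDIA's compiled-kernel cache, so kernels are always recompiled
os.environ["CUDA_CACHE_DISABLE"] = "1"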

In [3]:
#Create OpenCL context
cl_ctx = cl.create_some_context()

#Create an OpenCL command queue
cl_queue = cl.CommandQueue(cl_ctx)

In [4]:
%%cl_kernel 
__kernel void add_kernel(__global const float *a, __global const float *b, __global float *c) {
  int gid = get_global_id(0);
  c[gid] = a[gid] + b[gid];
}
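
To make the execution model of the kernel above concrete: OpenCL runs the kernel body once per work-item, and `get_global_id(0)` tells each work-item which element it owns. The sketch below emulates this sequentially in plain Python (a real device runs the work-items in parallel). The names `add_kernel_emulated` and `run_emulated` are hypothetical and exist only for this illustration.

```python
from array import array

def add_kernel_emulated(gid, a, b, c):
    # Body of the kernel for one work-item; gid stands in for get_global_id(0)
    c[gid] = a[gid] + b[gid]

def run_emulated(a, b):
    # Output buffer of single-precision floats, one per input element
    c = array('f', [0.0] * len(a))
    # The OpenCL runtime launches one work-item per element; here we loop instead
    for gid in range(len(a)):
        add_kernel_emulated(gid, a, b, c)
    return c

a = array('f', [1.0, 2.0, 3.0])
b = array('f', [0.5, 0.25, 0.125])
print(list(run_emulated(a, b)))  # [1.5, 2.25, 3.125]
```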

In [5]:
def opencl_add(a, b):
    #Make sure that the data is single precision floating point
    assert(np.issubdtype(a.dtype, np.float32))
    assert(np.issubdtype(b.dtype, np.float32))

    #Check that they have the same shape
    assert(a.shape == b.shape)

    #Upload data to the device
    mf = cl.mem_flags
    a_g = cl.Buffer(cl_ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
    b_g = cl.Buffer(cl_ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)

    #Allocate output data
    c_g = cl.Buffer(cl_ctx, mf.WRITE_ONLY, a.nbytes)

    #Execute program on device, with one work-item per element
    #(the kernel only uses get_global_id(0), so launch a 1D range)
    add_kernel(cl_queue, (a.size,), None, a_g, b_g, c_g)

    #Allocate data on the host for result
    c = np.empty_like(a)

    #Copy data from device to host
    cl.enqueue_copy(cl_queue, c, c_g)

    #Return result
    return c

In [6]:
#Create test input data
a = np.random.rand(50000).astype(np.float32)
b = np.random.rand(50000).astype(np.float32)

#Add using OpenCL
c = opencl_add(a, b)

#Compute reference using Numpy
c_ref = a + b

#Print result, reference, and the sum of absolute differences (Sad)
print("C   = ", c)
print("Ref = ", c_ref)
print("Sad = ", np.sum(np.abs(c - c_ref)))

C   =  [ 0.45644426  1.09116375  1.12599587 ...,  1.05299234  1.25434113
  1.08579791]
Ref =  [ 0.45644426  1.09116375  1.12599587 ...,  1.05299234  1.25434113
  1.08579791]
Sad =  0.0
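
The sum of absolute differences is exactly zero here because both the kernel and NumPy perform the same single-precision addition. For kernels that do more arithmetic, small rounding differences are common, so comparing against a tolerance is usually more robust. A minimal sketch, assuming a hypothetical helper `compare_to_reference` and an arbitrarily chosen relative tolerance:

```python
import numpy as np

def compare_to_reference(result, reference, rtol=1e-6):
    """Return (max absolute error, True if all elements agree within rtol)."""
    max_abs_err = float(np.max(np.abs(result - reference)))
    within_tol = bool(np.allclose(result, reference, rtol=rtol, atol=0.0))
    return max_abs_err, within_tol

# Stand-in data; in the notebook you would pass c and c_ref instead
reference = np.array([1.0, 2.0, 3.0], dtype=np.float32)
result = reference.copy()
print(compare_to_reference(result, reference))  # (0.0, True)
```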
