<img src='img/anaconda-logo.png' align='left' style="padding:10px">
<br>
*Copyright Continuum 2012-2016 All Rights Reserved.*

# Accelerate FFT Convolution with PySpark

## Table of Contents
* [Distributed FFT convolution (with PySpark)](#Distributed-FFT-convolution-%28with-PySpark%29)
	* [GPU FFT Convolution Code](#GPU-FFT-Convolution-Code)
* [Using PySpark](#Using-PySpark)
	* [Apply PySpark](#Apply-PySpark)
		* [Note on multiGPU usage](#Note-on-multiGPU-usage)


# Distributed FFT convolution (with PySpark)

**Set up PySpark in standalone mode**

Start the master node:

```bash
start-master.sh
```

Start the workers:

```bash
export PYTHONHASHSEED=0   # needed on Python 3 so string hashes are consistent across processes
start-slave.sh spark://hostname:7077   # find the hostname in the master's log
```

Start the notebook:

```bash
export PYSPARK_PYTHON=`which ipython`
export IPYTHON_OPTS="notebook --no-browser --ip=<outgoing ip>"
pyspark --master=spark://hostname:7077
```

## GPU FFT Convolution Code

The following code is the same as in the earlier lesson on FFT convolution.

In [None]:
from __future__ import division, print_function

import sys

import numpy as np
from scipy.signal import fftconvolve
import skimage.data
from skimage.color import rgb2gray
from matplotlib import pyplot as plt
from numba import cuda, vectorize
from timeit import default_timer as timer

%matplotlib inline

In [None]:
# Build a 5x5 Laplacian filter
laplacian_pts = '''
-4 -1 0 -1 -4
-1  2 3  2 -1
 0  3 4  3  0
-1  2 3  2 -1
-4 -1 0 -1 -4
'''.split()

laplacian = np.array(laplacian_pts, dtype=np.float32).reshape(5, 5)

In [None]:
import accelerate.cuda.fft as cufft


@vectorize(['complex64(complex64, complex64)'], target='cuda')
def gpu_mult(a, b):
    # a GPU ufunc to compute the elementwise product 
    return a * b

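def gpu_fftconvolve(image):
    # Convolve `image` with the 5x5 `laplacian` filter on the GPU using cuFFT.
    # The filter is zero-padded to the image shape before it is transformed.
    image_complex = image.astype(np.complex64)
    response_complex = np.zeros_like(image_complex)
    response_complex[:5, :5] = laplacian.astype(np.complex64)
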
    # explicit CPU->GPU memory transfer
    d_image_complex = cuda.to_device(image_complex)
    d_response_complex = cuda.to_device(response_complex)

    # GPU forward FFT
    cufft.fft_inplace(d_image_complex)
    cufft.fft_inplace(d_response_complex)

    # GPU ufunc
    gpu_mult(d_image_complex, d_response_complex, out=d_image_complex)

    # GPU inverse FFT
    cufft.ifft_inplace(d_image_complex)

    # explicit GPU->CPU memory transfer
    cvimage_gpu = d_image_complex.copy_to_host().real
    return cvimage_gpu
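
To check the approach on a machine without a GPU, the same frequency-domain steps can be sketched with NumPy alone. This is an illustrative CPU analogue, not the lesson's GPU code: `cpu_fftconvolve` and `direct_circular_convolve` are hypothetical names, and the kernel is passed as an argument instead of being read from the global `laplacian`. As in `gpu_fftconvolve`, the kernel is zero-padded into the top-left corner, so the result is a circular (wrap-around) convolution.

```python
import numpy as np


def cpu_fftconvolve(image, kernel):
    """Circular 2D convolution via FFT, mirroring the steps of gpu_fftconvolve."""
    # Zero-pad the kernel to the image shape, anchored at the top-left corner
    response = np.zeros(image.shape, dtype=np.complex64)
    kh, kw = kernel.shape
    response[:kh, :kw] = kernel

    # Forward FFTs followed by the elementwise product
    spectrum = np.fft.fft2(image.astype(np.complex64)) * np.fft.fft2(response)
    # np.fft.ifft2 applies the 1/N normalization itself
    return np.fft.ifft2(spectrum).real


def direct_circular_convolve(image, kernel):
    """Slow reference: sum shifted copies of the image, wrapping at the edges."""
    out = np.zeros(image.shape, dtype=np.float64)
    for i in range(kernel.shape[0]):
        for j in range(kernel.shape[1]):
            out += kernel[i, j] * np.roll(image, shift=(i, j), axis=(0, 1))
    return out


# Small random inputs keep the slow reference cheap
rng = np.random.RandomState(0)
image = rng.rand(16, 16).astype(np.float32)
kernel = rng.rand(5, 5).astype(np.float32)

fft_result = cpu_fftconvolve(image, kernel)
direct_result = direct_circular_convolve(image, kernel)
print(np.allclose(fft_result, direct_result, atol=1e-3))
```

Comparing against the direct sum is a cheap sanity check that padding and multiplying in the frequency domain really performs a convolution.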

# Using PySpark

Define a function to generate random images:

In [None]:
def generate_image(size):
    return skimage.data.binary_blobs(length=size).astype(np.float32)

View a sample image:

In [None]:
im = generate_image(512)
plt.figure(figsize=(8,8))
plt.imshow(im, cmap=plt.cm.gray)

Test our GPU FFT convolution function:

In [None]:
out = gpu_fftconvolve(im)

In [None]:
plt.figure(figsize=(8,8))
plt.imshow(out, cmap=plt.cm.gray)

## Apply PySpark

In the notebook environment, the Spark Context is available as `sc`.

In [None]:
sc

Generate 10 images

In [None]:
images = [generate_image(size=512) for _ in range(10)]

Send our data to the cluster

In [None]:
rdd_images = sc.parallelize(images)

Apply our GPU FFT convolution function on the loaded images.

The function references a GPU ufunc and cuFFT functions. The JIT-compiled GPU ufunc can be serialized and transferred to the worker nodes, where it is deserialized and finalized to machine code.

In [None]:
rdd_convolved = rdd_images.map(gpu_fftconvolve)

No computation has occurred so far, because RDDs are evaluated lazily. Calling `collect` triggers the computation and gathers the results back to the driver.

In [None]:
convolved = rdd_convolved.collect()
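
The lazy behaviour can be seen without a cluster. The sketch below is only an analogy using Python 3's built-in `map`, which, like `RDD.map`, records the work to do and defers it until results are requested. `slow_square` and `calls` are hypothetical names for illustration, and this does not reflect how Spark schedules tasks internally.

```python
calls = []  # records each time the mapped function actually runs


def slow_square(x):
    calls.append(x)
    return x * x


lazy = map(slow_square, range(4))   # nothing runs yet: Python 3 map is lazy
calls_before = len(calls)

results = list(lazy)                # plays the role of collect(): forces evaluation
calls_after = len(calls)

print(calls_before, calls_after, results)  # 0 4 [0, 1, 4, 9]
```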

In [None]:
plt.figure(figsize=(8,8))
plt.imshow(convolved[0], cmap=plt.cm.gray)

In [None]:
plt.figure(figsize=(8,8))
plt.imshow(convolved[1], cmap=plt.cm.gray)

### Note on multiGPU usage

As in the Dask Distributed version, a specific GPU can be assigned to each worker by setting the ``CUDA_VISIBLE_DEVICES`` environment variable when launching the workers.
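
As a rough sketch of that idea, the helper below builds a per-worker environment that exposes a single GPU to each worker process. `worker_env` and the round-robin assignment are assumptions made for illustration, not part of Spark or Accelerate. The resulting environment would be handed to whatever command launches each worker, for example through the `env` argument of `subprocess.Popen`.

```python
import os


def worker_env(worker_index, n_gpus, base_env=None):
    """Return a copy of the environment exposing one GPU, chosen round-robin."""
    env = dict(os.environ if base_env is None else base_env)
    # CUDA only enumerates the devices listed here; inside the worker
    # the selected GPU then appears as device 0.
    env["CUDA_VISIBLE_DEVICES"] = str(worker_index % n_gpus)
    return env


# Hypothetical machine with 2 GPUs running 4 workers
assignments = [worker_env(i, n_gpus=2, base_env={})["CUDA_VISIBLE_DEVICES"]
               for i in range(4)]
print(assignments)  # ['0', '1', '0', '1']
```

Because each worker sees only its own device, `gpu_fftconvolve` needs no changes to run on a multi-GPU machine.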
