<a href="https://www.nvidia.com/dli"> <img src="images/DLI Header.png" alt="Header" style="width: 400px;"/> </a>

### Accelerate Neural Network Calculations

The following is a simple version of performing some work needed to create a hidden layer in a neural network. It normalizes a million grayscale values (simply created randomly here), weighs them, and applies an activation function.

The task is to move this work to the GPU using NVIDIA CUDA accelerated-computing techniques while keeping the calculations correct, and to bring the run time of the function calls, which the `timeit` magic currently measures at about *50 ms*, to under *5 ms*.

Here are some hints used in this work:
To accelerate this on the GPU, first transfer the data from the CPU (host) to the GPU (device), and allocate the output arrays on the device as well. Then decorate each function with `@vectorize`, giving it a type signature and `target='cuda'`. Inside these functions, use scalar functions from the `math` module rather than NumPy functions.

### First, I define and run the neural network operations on the CPU

In [2]:
import numpy as np

In [3]:
# Do not modify this cell, these are the values that you will be assessed against.
n = 1000000

greyscales = np.floor(np.random.uniform(0, 255, n).astype(np.float32))
weights = np.random.normal(.5, .1, n).astype(np.float32)

The cell immediately below is used to import libraries, define data structures, and define functions.

In [4]:
from numpy import exp

# Consider modifying the 3 values in this cell to optimize host <-> device memory movement
normalized = np.empty_like(greyscales)
weighted = np.empty_like(greyscales)
activated = np.empty_like(greyscales)

# Modify these 3 function calls to run on the GPU
def normalize(grayscales):
    return grayscales / 255

def weigh(values, weights):
    return values * weights
        
def activate(values):
    return ( np.exp(values) - np.exp(-values) ) / ( np.exp(values) + np.exp(-values) )

Run the function calls and time them to see how long they take to execute.

In [5]:
%%timeit
# Feel free to modify the 3 function calls in this cell
normalized = normalize(greyscales)
weighted = weigh(normalized, weights)
SOLUTION = activate(weighted)

47.9 ms ± 149 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
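
The `%%timeit` line above reports the mean and spread of the per-loop time over 7 runs of 10 loops each. A minimal sketch of roughly the same measurement with the standard-library `timeit` module is shown below. The pure-Python `pipeline` function and the 1,000-element input are hypothetical stand-ins chosen so the snippet runs anywhere; they are not the notebook's NumPy code, and using the population standard deviation is an assumption about how the magic summarizes its runs.

```python
import math
import random
import statistics
import timeit

# Hypothetical stand-in data: 1,000 values instead of the notebook's million.
random.seed(0)
values = [float(random.randrange(255)) for _ in range(1_000)]
weights = [random.gauss(0.5, 0.1) for _ in range(1_000)]

def pipeline():
    # Normalize, weigh, and activate each value, one scalar at a time.
    out = []
    for v, w in zip(values, weights):
        x = (v / 255) * w
        out.append((math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x)))
    return out

runs, loops = 7, 10
# timeit.repeat returns, for each run, the total time of `loops` calls.
totals = timeit.repeat(pipeline, repeat=runs, number=loops)
per_loop = [t / loops for t in totals]

mean = statistics.mean(per_loop)
spread = statistics.pstdev(per_loop)
print(f"{mean * 1e3:.3f} ms ± {spread * 1e6:.1f} µs per loop "
      f"(mean ± std. dev. of {runs} runs, {loops} loops each)")
```

The absolute numbers from this sketch are not comparable with the notebook's timings, since it works on far fewer values and avoids NumPy entirely; it only illustrates how the reported figures are assembled.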


In [None]:
This ran in approximately 50 ms.
Next, apply CUDA acceleration on the GPU.

### Now redefine and run the neural network operations on the GPU

In [6]:

#Import needed modules
from numba import cuda
from numba import vectorize
import math

#Transfer data to the device
grey_dev = cuda.to_device(greyscales)
weights_dev = cuda.to_device(weights)

#Allocate output arrays on the device, matching the shape and dtype of the inputs
normalized = cuda.device_array_like(grey_dev)
weighted = cuda.device_array_like(weights_dev)
SOLUTION = cuda.device_array_like(grey_dev)


# Modify the 3 function calls to run on the GPU using appropriate decorators

#Normalize inputs
@vectorize(['float32(float32)'],target = 'cuda')
def normalize(grayscales):
    return grayscales / 255

#Apply the weights to the normalized values
@vectorize(['float32(float32,float32)'],target = 'cuda')
def weigh(values, weights):
    return values * weights

#Activation function
@vectorize(['float32(float32)'],target = 'cuda')
def activate(values):
    return ( math.exp(values) - math.exp(-values) ) / ( math.exp(values) + math.exp(-values) )
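
The expression in `activate` is the hyperbolic tangent written out with exponentials. The small sketch below, which uses only the standard `math` module and a hypothetical helper named `activate_scalar`, checks this on a few sample inputs. It runs on the CPU and is meant only to illustrate the formula, not to replace the CUDA ufunc above.

```python
import math

def activate_scalar(x):
    # Same formula as the activation above, applied to one Python float.
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

# Weighted inputs here are (greyscale / 255) * weight, so most of them
# fall in a small range around [0, 1]; sample a few values near that range.
for x in (-1.0, 0.0, 0.25, 0.5, 1.0):
    assert math.isclose(activate_scalar(x), math.tanh(x),
                        rel_tol=1e-12, abs_tol=1e-12)
```

Given this equivalence, `math.tanh(values)` might serve as a shorter body for the scalar function; whether the CUDA target accepts it in your Numba version is worth confirming in the Numba documentation before relying on it.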

Use the functions, data, and variables above to run the commands below, which perform a typical neural network operation: normalizing the input data, applying the weights, and applying an activation function. The result is stored in the `SOLUTION` variable.

In [9]:
%%timeit
# Pass device arrays as inputs; the first two calls write into preallocated device arrays via out, and activate returns a new device array
normalize(grey_dev, out = normalized)
weigh(normalized, weights_dev, out = weighted)
SOLUTION = activate(weighted)

958 µs ± 162 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [None]:
### That ran in under 1 ms (958 µs), roughly 50 times faster than the 47.9 ms CPU version and well under the 5 ms target.
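
The size of the improvement can be worked out directly from the two `%%timeit` means reported in this notebook (47.9 ms on the CPU and 958 µs on the GPU). The variable names in this short sketch are illustrative only.

```python
cpu_seconds = 47.9e-3   # 47.9 ms, mean from the CPU %%timeit output
gpu_seconds = 958e-6    # 958 µs, mean from the GPU %%timeit output

# How many times faster the GPU run is than the CPU run.
speedup = cpu_seconds / gpu_seconds
# The same gain expressed as a percentage increase in speed.
percent_faster = (speedup - 1) * 100
# The share of the original run time that was eliminated.
time_reduction = (1 - gpu_seconds / cpu_seconds) * 100

print(f"speedup: {speedup:.1f}x")
print(f"speed increase: {percent_faster:.0f}%")
print(f"run time reduction: {time_reduction:.1f}%")
```

Stating the result as a ratio of the two measured times avoids the ambiguity of a bare percentage, which can mean either the increase in speed or the reduction in run time.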
