# Introduction to GPU Programming with Python
## Questions or Exercises

### Mandelbrot Example
![](images/322px-Mandel_zoom_00_mandelbrot_set.jpeg)

The Mandelbrot set is defined by the iteration $z_{n+1} = z_n^2 + c$, starting from $z_0 = 0$.
Images are created by applying this iteration to each pixel, using the pixel's position in the image to choose the number $c$.

$c$ is obtained by mapping the pixel's position in the image to the corresponding point on the complex plane.
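The pixel-to-plane mapping can be sketched as a linear rescaling. This is only an illustration: the helper name `pixel_to_complex` is hypothetical, the bounds follow the `min_x, max_x, min_y, max_y` convention used later in this notebook, and it assumes that row 0 corresponds to `min_y`.

```python
def pixel_to_complex(col, row, width, height, min_x, max_x, min_y, max_y):
    """Map a pixel (col, row) to a point c on the complex plane (hypothetical helper)."""
    # Width and height of one pixel, measured in complex-plane units
    pixel_size_x = (max_x - min_x) / width
    pixel_size_y = (max_y - min_y) / height
    real = min_x + col * pixel_size_x
    imag = min_y + row * pixel_size_y  # assumption: row 0 maps to min_y
    return complex(real, imag)


# A 1024x1024 image covering [-2, 1] on the real axis and [-1, 1] on the imaginary axis
c_first = pixel_to_complex(0, 0, 1024, 1024, -2.0, 1.0, -1.0, 1.0)
c_center = pixel_to_complex(512, 512, 1024, 1024, -2.0, 1.0, -1.0, 1.0)
print(c_first, c_center)
```

The first pixel lands on the corner $-2 - 1i$ and the central pixel on $-0.5 + 0i$, which is a quick way to check that the scaling is right.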

In this exercise, the mandel function performs the Mandelbrot calculation for a given (x, y) position on the complex plane, where x is the real part and y the imaginary part of $c$. It returns the number of iterations completed before the computation "escapes", or max_iters if it never does.

In [None]:
import numpy as np
from matplotlib.pyplot import imshow, show
from timeit import default_timer as timer
from numba import jit,cuda

In [None]:
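def mandel(x, y, max_iters):
  """Return the iteration at which c = x + yj escapes, or max_iters if it never does."""
  c = complex(x, y)
  z = 0.0j
  for i in range(max_iters):
    z = z*z + c
    # |z|^2 >= 4 is the same test as |z| >= 2, without the square root
    if (z.real*z.real + z.imag*z.imag) >= 4: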
      return i

  return max_iters
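As a quick sanity check, the function can be evaluated at a few points whose behaviour is easy to follow by hand. The sketch below repeats the same loop so that it runs on its own; the three sample points are chosen only for illustration.

```python
def mandel(x, y, max_iters):
    """Same escape-time loop as above, repeated so this block runs on its own."""
    c = complex(x, y)
    z = 0.0j
    for i in range(max_iters):
        z = z * z + c
        if (z.real * z.real + z.imag * z.imag) >= 4:
            return i
    return max_iters


inside = mandel(0.0, 0.0, 20)     # z stays at 0, so it never escapes
near_edge = mandel(1.0, 0.0, 20)  # z goes 1, 2: escapes on the second iteration
outside = mandel(2.0, 2.0, 20)    # |z|^2 is already 8 after the first iteration
print(inside, near_edge, outside)  # prints: 20 1 0
```

Points inside the set return `max_iters`, while points far from it return small values; this difference is what gives the image its contrast.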

Then we need to make a function that iterates over all the pixels in the image, computing the complex coordinates from the pixel coordinates, and calls the mandel function at each pixel. The return value of mandel is used to color the pixel.

In [None]:
# Part 1: Complete the create_fractal function
def create_fractal(min_x, max_x, min_y, max_y, image, iters):
    height = image.shape[0]
    width = image.shape[1]

    # Size of one pixel in complex-plane units
    pixel_size_x = ###
    pixel_size_y = ###
    for x in ###:
        real = ###
        for y in ###:
            imag = ###
            color = ###
            image[###] = ###
 

Next we create a 1024x1024 pixel image as a NumPy array of bytes. We then call create_fractal with coordinates chosen so that the whole Mandelbrot set fits in the image.

In [None]:
# Part 2: Create an empty 1024x1024 array of type np.uint8. Call create_fractal with coordinates
# that fit the whole Mandelbrot set, show the image, and measure the execution time.
image = ###
create_fractal(-2.0, 1.0, -1.0, 1.0, image, 20) 
imshow(image)
show()
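The timing asked for in this part can follow the usual start/stop pattern with `default_timer`, which is already imported at the top of the notebook. In this minimal sketch, `busy_work` is a hypothetical stand-in for the call being measured.

```python
from timeit import default_timer as timer


def busy_work(n):
    """Hypothetical stand-in for the function being timed."""
    total = 0
    for k in range(n):
        total += k * k
    return total


start = timer()                # read the clock just before the call
result = busy_work(100_000)
elapsed = timer() - start      # elapsed wall-clock time in seconds
print(f"busy_work took {elapsed:.6f} s")
```

When the measured function is JIT-compiled, the first call also includes compilation time, so timing a second call usually gives a fairer number.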

In [None]:
# Part 3: Modify both the mandel and create_fractal functions so that they are compiled
# with the jit decorator and run faster on the CPU


In [None]:
#Part 4: Run again and measure the execution time
image = ###
create_fractal(-2.0, 1.0, -1.0, 1.0, image, 20) 
imshow(image)
show()

In [None]:
# Part 5: Write the kernel function mandel_kernel with numba.cuda. Also use cuda.jit to turn
# mandel into mandel_gpu, a function that can be called from inside the kernel
mandel_gpu = ###
def mandel_kernel(min_x, max_x, min_y, max_y, image, iters):
    height = image.shape[0]
    width = image.shape[1]
    #####

In [None]:
# Part 6: Initialize the image array and define the CUDA grid (blocks per grid, threads per block)
image = ###

In [None]:
#Part 7: Run the kernel. Also measure the execution time.

### Matrix Multiplication with Shared Memory
![](images/05-matmulshared.png)

In [None]:
import numpy as np
from numba import cuda, float32

In [None]:
#Part 3: Create a CUDA kernel with @cuda.jit decorator

# Controls threads per block and shared memory usage.
# The computation will be done on blocks of TPBxTPB elements.
TPB = 16

def fast_matmul(A, B, C):
    # Define an array in the shared memory
    # The size and type of the arrays must be known at compile time
    sA = cuda.shared.array(shape=(TPB, TPB), dtype=float32)
    sB = cuda.shared.array(shape=(TPB, TPB), dtype=float32)
    
    # Define global and thread indices
    
    # Define number of blocks per grid
    
    tmp = 0.
    for i in range(bpg):
        # Preload data into shared memory
        #####
        
        # Wait until all threads finish preloading
        
        # Compute the partial product from the tiles held in shared memory
        for j in range(TPB):
            #####
            
        # Wait until all threads finish computing
        
    # Put tmp into C matrix
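The tiling idea behind the kernel skeleton can be tried out on the CPU first. The sketch below is a NumPy analogue, not the CUDA kernel itself: it assumes each output tile plays the role of one thread block, with the slices `sA` and `sB` standing in for the shared-memory arrays. The function name `tiled_matmul` is hypothetical.

```python
import numpy as np

TILE = 16  # plays the role of TPB in the kernel skeleton


def tiled_matmul(A, B, tile=TILE):
    """CPU analogue of tiled matrix multiplication (hypothetical helper)."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m), dtype=A.dtype)
    for i0 in range(0, n, tile):          # one (i0, j0) pair per "thread block"
        for j0 in range(0, m, tile):
            acc = np.zeros((min(tile, n - i0), min(tile, m - j0)), dtype=A.dtype)
            for k0 in range(0, k, tile):  # walk along the shared dimension
                sA = A[i0:i0 + tile, k0:k0 + tile]  # tile "preloaded" from A
                sB = B[k0:k0 + tile, j0:j0 + tile]  # tile "preloaded" from B
                acc += sA @ sB                      # partial product of the tiles
            C[i0:i0 + tile, j0:j0 + tile] = acc
    return C


rng = np.random.default_rng(0)
A = rng.random((128, 128), dtype=np.float32)
B = rng.random((128, 128), dtype=np.float32)
C = tiled_matmul(A, B)
print(np.allclose(C, A @ B, rtol=1e-4))
```

On the GPU, every thread of a block accumulates one element of `acc` in its own `tmp` variable, and the two synchronization points in the skeleton keep any thread from reading a tile before it is fully loaded or overwriting it while others are still using it.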

In [None]:
# Part 1: Create matrices A, B, C as NumPy arrays (size 128x128, float32). Fill A and B with random numbers.

In [None]:
#Part 2: Calculate number of blocks and threads
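A common way to size the grid is ceiling division, so that the blocks cover the whole matrix even when its dimensions are not multiples of the block size. The shape below is a hypothetical example chosen to show the rounding. Which grid dimension corresponds to rows and which to columns is an assumption that has to match the indexing used inside the kernel.

```python
import math

TPB = 16                # threads per block along each dimension
rows, cols = 1000, 700  # hypothetical matrix shape, not a multiple of TPB

threads_per_block = (TPB, TPB)
# Round up so that the last, partially filled block is still launched
blocks_per_grid = (math.ceil(rows / TPB), math.ceil(cols / TPB))
print(threads_per_block, blocks_per_grid)  # prints: (16, 16) (63, 44)
```

When the grid overshoots the matrix like this, threads that fall outside its bounds have to be guarded against in the kernel, typically with a bounds check before reading or writing.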

In [None]:
#Part 4: Call the kernel function and time it to get the execution time