# About

*by Dr [Paul Richmond](http://paulrichmond.shef.ac.uk/) (University of Sheffield)*




## Exercise 01 ##

Exercise 1 requires us to decipher some encrypted text. The text provided in the file [`encrypted01.bin`](./encryted01.bin) (click file to open in JupyterLab) has been encrypted using an affine cipher. The affine cipher is a very simple type of monoalphabetic substitution cipher in which each character of the alphabet is mapped to a number and encrypted using a mathematical function. The encryption function is defined as:

$E(x) = (Ax + B) \bmod M$

Where $A$ and $B$ are the keys of the cipher, $\bmod$ is the modulo operation, and $A$ and $M$ are co-prime. For this exercise the value of $A$ is `15`, $B$ is `27` and $M$ is `128` (the size of the ASCII alphabet). The affine decryption function is defined as:

$D(x) = A^{-1}(x - B) \bmod M$

Where $A^{-1}$ is the modular multiplicative inverse of $A$ modulo $M$. For this exercise $A^{-1}$ has a value of `111`.
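
As a quick sanity check of these values (a worked example added for illustration, not part of the exercise files), the inverse can be verified by hand and a single character can be passed through both functions. The character `A` (ASCII value `65`) is an arbitrary choice:

$15 \times 111 = 1665 = 13 \times 128 + 1 \equiv 1 \pmod{128}$

$E(65) = (15 \times 65 + 27) \bmod 128 = 1002 \bmod 128 = 106$

$D(106) = 111 \times (106 - 27) \bmod 128 = 8769 \bmod 128 = 65$

Decrypting the encrypted value recovers the original `65`, as expected.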

Note: The $\bmod$ operation is not the same as the remainder operator (`%`) for negative numbers: in C, `%` can return a negative result, whereas the modulo operation used here always yields a value between `0` and `M - 1`. A suitable $\bmod$ function (`modulo`) has been provided for the exercise. The provided function takes the form `modulo(int a, int b)`, where `a` is everything to the left of the $\bmod$ operator in the affine decryption function and `b` is everything to the right of it.
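
The difference between `%` and a true modulo can be seen with a small host-only sketch. The function `modulo_sketch` below is a hypothetical stand-in written for this illustration; the `modulo` function supplied in `exercise01.cu` may well be implemented differently. It assumes `b` is positive.

```cuda
#include <stdio.h>

// Hypothetical stand-in for the provided `modulo` function (assumes b > 0).
int modulo_sketch(int a, int b)
{
    int r = a % b;              // C remainder: takes the sign of `a`
    return (r < 0) ? r + b : r; // shift negative remainders into [0, b)
}

int main(void)
{
    int x = 10 - 27;            // (x - B) for a hypothetical encrypted value of 10
    printf("%% gives %d, modulo_sketch gives %d\n", x % 128, modulo_sketch(x, 128));
    // prints "% gives -17, modulo_sketch gives 111"
    return 0;
}
```

A negative remainder would not correspond to any ASCII character, which is why the decryption needs the wrapped value.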

As each of the encrypted character values is independent, we can use the GPU to decrypt them in parallel. To do this we will launch a thread for each of the encrypted character values and use a kernel function to perform the decryption. Starting from the code provided in [`exercise01.cu`](./exercise01.cu) (click file to open and edit in JupyterLab), complete the following tasks:


### Exercise 01 Step 01

Modify the `modulo` function by adding the `__device__` decorator so that it can be called on the device by the `affine_decrypt` kernel. The `__device__` decorator ensures that the function is compiled as device code, making it available to CUDA kernels and other device functions.
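
As a rough illustration of what the `__device__` qualifier enables, the sketch below defines a device function and calls it from a kernel. The names `wrap_sketch`, `wrap_kernel` and `run_wrap_kernel` are hypothetical and do not come from `exercise01.cu`; the host code around the kernel is only there to make the example runnable.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper (not the exercise's `modulo`). The __device__ qualifier
// compiles it for the GPU, so kernels and other device functions can call it.
__device__ int wrap_sketch(int a, int b)
{
    int r = a % b;
    return (r < 0) ? r + b : r;
}

// A kernel (__global__) is launched from the host and may call __device__ code.
__global__ void wrap_kernel(int *d_result)
{
    *d_result = wrap_sketch(-17, 128);
}

// Host helper: launches a single thread and copies its result back.
// Returns -1 if a CUDA call fails.
int run_wrap_kernel(void)
{
    int h_result = -1;
    int *d_result = NULL;
    if (cudaMalloc((void **)&d_result, sizeof(int)) != cudaSuccess)
        return -1;
    wrap_kernel<<<1, 1>>>(d_result);
    if (cudaMemcpy(&h_result, d_result, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess)
        h_result = -1;
    cudaFree(d_result);
    return h_result;
}

int main(void)
{
    printf("wrap_sketch(-17, 128) on the device = %d\n", run_wrap_kernel());
    return 0;
}
```

Without the qualifier, the compiler treats the function as host-only code and rejects the call from inside the kernel.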

Although your code won't perform any decryption at this point, you can try to build it by running the following cell. Alternatively, you can open a new Terminal from the JupyterLab File menu and run the command yourself.


In [2]:
!nvcc exercise01.cu -o exercise01



The above will compile and link the `exercise01.cu` file with the NVIDIA CUDA compiler (`nvcc`) and output (`-o`) the executable `exercise01`. At this stage the output buffer will contain only junk, but you can run the executable (in the code cell below) to confirm that it produces an output.

In [None]:
!./exercise01

### Exercise 01 Step 02

Implement the decryption kernel (`affine_decrypt`) for a single block of threads with an `x` dimension of `N` (`1024`). A kernel definition stub is already provided in the source file. The function has two arguments: the input is provided in `d_input`, and you should perform your calculation and store the result in `d_output`. You can use the C pre-processor definitions at the top of the source file for the inverse modulus of `A` and for `B` and `M`.
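
One possible shape for such a kernel is sketched below. It is not the reference solution. The macro names (`KEY_A`, `KEY_A_INV`, `KEY_B`, `KEY_M`), the `int` element type, and the helpers `modulo_sketch` and `round_trip` are all assumptions made for this illustration and may not match `exercise01.cu`. To be runnable on its own, the sketch encrypts a short ASCII string on the host and then decrypts it on the device.

```cuda
#include <stdio.h>
#include <string.h>
#include <cuda_runtime.h>

// Values are those given in the text; the macro names are hypothetical.
#define MAX_N     1024
#define KEY_A     15
#define KEY_A_INV 111
#define KEY_B     27
#define KEY_M     128

__device__ int modulo_sketch(int a, int b)
{
    int r = a % b;
    return (r < 0) ? r + b : r;
}

// One thread per character; with a single block, threadIdx.x is the index.
__global__ void affine_decrypt_sketch(const int *d_input, int *d_output)
{
    int i = threadIdx.x;
    d_output[i] = modulo_sketch(KEY_A_INV * (d_input[i] - KEY_B), KEY_M);
}

// Encrypts ASCII `text` on the host, decrypts it on the device and writes the
// result to `out`, which must hold strlen(text) + 1 bytes. Returns 0 on success.
int round_trip(const char *text, char *out)
{
    int h_input[MAX_N], h_output[MAX_N];
    int n = (int)strlen(text);
    if (n < 1 || n > MAX_N) return 1;
    for (int i = 0; i < n; i++)
        h_input[i] = (KEY_A * text[i] + KEY_B) % KEY_M; // E(x)

    int *d_input = NULL, *d_output = NULL;
    size_t size = n * sizeof(int);
    cudaMalloc((void **)&d_input, size);
    cudaMalloc((void **)&d_output, size);
    cudaMemcpy(d_input, h_input, size, cudaMemcpyHostToDevice);
    affine_decrypt_sketch<<<1, n>>>(d_input, d_output); // 1 block of n threads
    cudaError_t err = cudaMemcpy(h_output, d_output, size, cudaMemcpyDeviceToHost);
    cudaFree(d_input);
    cudaFree(d_output);
    if (err != cudaSuccess) return 1;

    for (int i = 0; i < n; i++)
        out[i] = (char)h_output[i];
    out[n] = '\0';
    return 0;
}

int main(void)
{
    char out[MAX_N + 1];
    if (round_trip("Hello, GPU!", out) == 0)
        printf("%s\n", out);
    return 0;
}
```

Indexing with `threadIdx.x` alone is only valid while there is exactly one block, which is why the sketch launches `<<<1, n>>>`.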


### Exercise 01 Step 03

Allocate some memory on the device for the input (`d_input`) and output (`d_output`). 
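
Device allocation is typically done with `cudaMalloc`, whose return value can be checked against `cudaSuccess`. The sketch below is a minimal illustration under assumptions: the buffer size `N_SKETCH`, the `int` element type and the helper `allocate_buffers` are hypothetical and may not match the exercise source.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define N_SKETCH 1024 // hypothetical element count, matching N in the text

// Allocates two device buffers of N_SKETCH ints. Returns 0 on success.
int allocate_buffers(int **d_input, int **d_output)
{
    size_t size = N_SKETCH * sizeof(int);

    cudaError_t err = cudaMalloc((void **)d_input, size);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    err = cudaMalloc((void **)d_output, size);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        cudaFree(*d_input); // release the first buffer before giving up
        *d_input = NULL;
        return 1;
    }
    return 0;
}

int main(void)
{
    int *d_input = NULL, *d_output = NULL;
    if (allocate_buffers(&d_input, &d_output) != 0)
        return 1;
    printf("allocated 2 x %zu bytes on the device\n", N_SKETCH * sizeof(int));
    cudaFree(d_input);  // device memory must be released with cudaFree
    cudaFree(d_output);
    return 0;
}
```

The pointers returned by `cudaMalloc` refer to device memory, so they should only be dereferenced inside kernels, not in host code.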


### Exercise 01 Step 04

Copy the host input values in `h_input` to the device memory `d_input`.

### Exercise 01 Step 05

Configure a single block of `N` threads and launch the `affine_decrypt` kernel.
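
A launch configuration can be written with `dim3` values for the grid and the block. The sketch below uses a hypothetical kernel, `write_index`, rather than `affine_decrypt`, and also shows one common way of checking for launch errors. The names `N_SKETCH` and `launch_single_block` are assumptions for this illustration.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define N_SKETCH 1024 // hypothetical, matching N in the text

// Hypothetical kernel: each thread records its own index within the block.
__global__ void write_index(int *d_out)
{
    d_out[threadIdx.x] = threadIdx.x;
}

// Launches one block of N_SKETCH threads and copies the result into h_out.
// Returns 0 on success.
int launch_single_block(int *h_out)
{
    int *d_out = NULL;
    size_t size = N_SKETCH * sizeof(int);
    if (cudaMalloc((void **)&d_out, size) != cudaSuccess)
        return 1;

    dim3 blocksPerGrid(1, 1, 1);          // a single block
    dim3 threadsPerBlock(N_SKETCH, 1, 1); // N_SKETCH threads in x
    write_index<<<blocksPerGrid, threadsPerBlock>>>(d_out);

    cudaError_t err = cudaGetLastError(); // invalid launch configuration, etc.
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();    // errors raised while the kernel ran
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out, d_out, size, cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}

int main(void)
{
    static int h_out[N_SKETCH];
    if (launch_single_block(h_out) == 0)
        printf("last thread wrote %d\n", h_out[N_SKETCH - 1]);
    return 0;
}
```

Kernel launches are asynchronous and return no status themselves, which is why the sketch queries `cudaGetLastError` straight after the launch.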



### Exercise 01 Step 06

Copy the device output values in `d_output` to the host memory `h_output`.
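
The host-to-device and device-to-host copies both use `cudaMemcpy`, with the destination first, the source second, and a flag giving the direction. The sketch below copies a buffer to the device and straight back, with no kernel in between, to show the two calls side by side. The names `N_SKETCH` and `copy_round_trip` and the `int` element type are assumptions for this illustration.

```cuda
#include <stdio.h>
#include <string.h>
#include <cuda_runtime.h>

#define N_SKETCH 1024 // hypothetical, matching N in the text

// Copies host data to the device and back again.
// Returns the number of mismatching elements, or -1 on a CUDA error.
int copy_round_trip(void)
{
    static int h_input[N_SKETCH], h_output[N_SKETCH];
    for (int i = 0; i < N_SKETCH; i++)
        h_input[i] = i % 128;
    memset(h_output, 0, sizeof(h_output));

    int *d_buffer = NULL;
    size_t size = sizeof(h_input);
    if (cudaMalloc((void **)&d_buffer, size) != cudaSuccess)
        return -1;

    // cudaMemcpy(destination, source, size in bytes, direction)
    cudaError_t err = cudaMemcpy(d_buffer, h_input, size, cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
        err = cudaMemcpy(h_output, d_buffer, size, cudaMemcpyDeviceToHost);
    cudaFree(d_buffer);
    if (err != cudaSuccess)
        return -1;

    int mismatches = 0;
    for (int i = 0; i < N_SKETCH; i++)
        if (h_output[i] != h_input[i])
            mismatches++;
    return mismatches;
}

int main(void)
{
    printf("mismatches after round trip: %d\n", copy_round_trip());
    return 0;
}
```

The size argument is in bytes rather than elements, so multiplying by `sizeof` the element type is easy to forget.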


### Exercise 01 Step 07 

Compile and execute your program. If you have completed the steps correctly, the output should be the decrypted, readable text.

### Exercise 01 Step 08

Don’t go running off through the forest just yet! Complete the `affine_decrypt_multiblock` kernel so that it works when launched with multiple blocks of threads, then change your grid and block dimensions so that you launch `8` blocks of `128` threads.
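
With more than one block, `threadIdx.x` repeats in every block, so each thread usually derives a unique global index from `blockIdx.x`, `blockDim.x` and `threadIdx.x`. The sketch below shows that index calculation with a hypothetical kernel, `global_index_sketch`, rather than the decryption itself. All names in it are assumptions for this illustration.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define BLOCKS_SKETCH  8
#define THREADS_SKETCH 128
#define TOTAL_SKETCH   (BLOCKS_SKETCH * THREADS_SKETCH)

// Hypothetical kernel: each thread writes its unique global index.
__global__ void global_index_sketch(int *d_out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x; // block offset + local index
    d_out[i] = i;
}

// Launches 8 blocks of 128 threads and counts elements that were not written
// with their own index. Returns -1 on a CUDA error.
int count_index_errors(void)
{
    static int h_out[TOTAL_SKETCH];
    int *d_out = NULL;
    size_t size = sizeof(h_out);
    if (cudaMalloc((void **)&d_out, size) != cudaSuccess)
        return -1;

    global_index_sketch<<<BLOCKS_SKETCH, THREADS_SKETCH>>>(d_out);
    cudaError_t err = cudaMemcpy(h_out, d_out, size, cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess)
        return -1;

    int errors = 0;
    for (int i = 0; i < TOTAL_SKETCH; i++)
        if (h_out[i] != i)
            errors++;
    return errors;
}

int main(void)
{
    printf("index errors: %d\n", count_index_errors());
    return 0;
}
```

Because `8 * 128` equals `1024`, every element is covered exactly once and no bounds check is needed here; a launch that creates more threads than elements would need one.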
