# Introduction to CUDA C

In [None]:
import os
# The Jupyter notebook is launched from $HOME, so change the working directory to the exercises folder.
# This assumes a directory named after your username already exists under /scratch/vp91.
os.chdir(os.path.expandvars("/scratch/vp91/$USER/introduction-to-cuda/exercises"))

## 0. Hello, world!
Our first example ([hello_world.cu](./exercises/hello_world.cu)) prints "Hello, world!" 64 times to the console in parallel.

Run the next cell to compile the program, and the one after that to run it.

In [None]:
!nvcc hello_world.cu -o hello_world

In [None]:
!./hello_world

## 1. AXPY

The goal is to write a CUDA kernel function called `axpy` in [axpy.cu](./exercises/axpy.cu) that takes `a`, `X`, `Y`, and `Z` as inputs and stores `a * X + Y` in `Z`, where `X`, `Y`, and `Z` are arrays of equal length and `a` is a scalar.
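As a point of comparison, the AXPY operation can be sketched as a plain C loop on the CPU. The function name `axpy_reference` and the `float` element type are assumptions made for illustration; the exercise file may use different names and types. In a CUDA kernel, the loop typically disappears and each thread computes one index `i` instead.

```c
#include <stddef.h>

/* CPU reference for AXPY: z[i] = a * x[i] + y[i] for every i in [0, n).
 * The name axpy_reference and the float element type are assumptions
 * made for this sketch, not taken from the exercise files. */
void axpy_reference(float a, const float *x, const float *y, float *z, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        z[i] = a * x[i] + y[i];
    }
}
```

A reference like this can also be used to check the values your kernel copies back from the device.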

Run the cell below to compile and test your code.

In [None]:
!nvcc axpy.cu -o axpy && ./axpy

## 2. Overlapping memory transfer and compute

Modify the `test_axpy()` function in [axpy_overlapped.cu](./exercises/axpy_overlapped.cu) to overlap memory transfers with computation using CUDA streams.
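One common way to get overlap is to split the arrays into chunks and give each chunk to its own stream, so that one chunk's copy can proceed while another chunk's kernel runs. The bookkeeping for that split can be sketched in plain C as below. The names `chunk_t` and `chunk_for_stream` are hypothetical, and the number of chunks is left to you; this is only the index arithmetic, not a full solution.

```c
#include <stddef.h>

/* Describes the slice of the arrays handled by one stream (hypothetical type). */
typedef struct {
    size_t offset; /* index of the first element in the slice */
    size_t count;  /* number of elements in the slice */
} chunk_t;

/* Split n elements into num_chunks nearly equal slices and return slice k.
 * The first (n % num_chunks) slices get one extra element, so every element
 * is covered exactly once. Requires num_chunks > 0 and k < num_chunks. */
chunk_t chunk_for_stream(size_t n, size_t num_chunks, size_t k)
{
    size_t base = n / num_chunks;
    size_t extra = n % num_chunks;
    chunk_t c;
    c.offset = k * base + (k < extra ? k : extra);
    c.count = base + (k < extra ? 1 : 0);
    return c;
}
```

Each stream would then issue its host-to-device copies, kernel launch, and device-to-host copy for `count` elements starting at `offset`. Note that `cudaMemcpyAsync` generally needs page-locked (pinned) host memory for the copies to overlap with kernel execution.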

Compare the execution time of your code with that of the non-overlapped version using the commands below:

**Non-Overlapped:**

In [None]:
!nvcc axpy_non_overlapped.cu -o axpy_non_overlapped && ./axpy_non_overlapped

**Overlapped:**

In [None]:
!nvcc axpy_overlapped.cu -o axpy_overlapped && ./axpy_overlapped

## 3. Modular code

Use the cell below to create a new folder called `axpy_modular` and change into it. Split your original [axpy.cu](./exercises/axpy.cu) code (or the code in [solutions/axpy.cu](./exercises/solutions/axpy.cu)) into a `main.c` file compiled with `gcc`, an `axpy.cu` file compiled with `nvcc`, and an `axpy.h` file to allow `main.c` to access the AXPY code. For completeness, you could also create an `axpy.cuh` file that's included by `axpy.cu` to forward-declare the kernel (this would be useful in a larger project if multiple files needed to access the kernel function).

The `axpy.cu` file should contain the CUDA kernel and a C function that launches it.
`main.c` should then call that C function. Try keeping the implementation of the `test_axpy()` function in `main.c`: calls to CUDA runtime API functions are valid from regular C code, although the `<<<...>>>` kernel launch syntax is only understood by `nvcc` and so must stay in `axpy.cu`.
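A minimal sketch of what the shared header might look like is given below. The wrapper name `axpy_host` and its signature are assumptions for illustration. The `extern "C"` guard matters because `nvcc` compiles `.cu` files as C++, and without it the C++ name mangling would stop `gcc` from finding the symbol at link time. The loop at the bottom is only a stand-in for `axpy.cu` so that the sketch builds with `gcc` alone; in the exercise, that definition would live in `axpy.cu` and launch the kernel.

```c
#include <stddef.h>

/* ---- axpy.h (hypothetical contents) ---- */
#ifndef AXPY_H
#define AXPY_H

#ifdef __cplusplus
extern "C" { /* keep the symbol name C-compatible when compiled by nvcc */
#endif

/* Wrapper callable from C: computes z[i] = a * x[i] + y[i] for n elements.
 * Whether the pointers refer to host or device memory depends on where you
 * choose to do the allocations and copies. */
void axpy_host(float a, const float *x, const float *y, float *z, size_t n);

#ifdef __cplusplus
}
#endif

#endif /* AXPY_H */

/* ---- CPU stand-in for axpy.cu, so this sketch builds without nvcc ---- */
void axpy_host(float a, const float *x, const float *y, float *z, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        z[i] = a * x[i] + y[i];
    }
}
```

With this layout, `main.c` only needs to include `axpy.h` and never sees any CUDA-specific syntax.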

In [None]:
# Create the axpy_modular directory (if it does not already exist) and change into it
import os
axpy_dir = os.path.expandvars("/scratch/vp91/$USER/introduction-to-cuda/exercises/axpy_modular")
os.makedirs(axpy_dir, exist_ok=True)
os.chdir(axpy_dir)

In [None]:
!nvcc # TODO: compile axpy.cu into an object file (axpy.o)

In [None]:
!gcc # TODO: compile main.c and link it with axpy.o