# Practice Exercise: Where Does Your CUDA Code Run?

In this notebook, we'll practice identifying where code executes in a CUDA program. Execution spaces are a fundamental concept, and understanding them will help you write effective GPU code.

<img src="./images/image3.jpg" width="800" height="600">

[image credits](https://unsplash.com/illustrations/a-black-and-white-photo-of-a-keyboard-poUqDRe3Z7U)

# Introduction to Execution Spaces

When writing CUDA programs, you need to be conscious of where each line of code runs:

- **CPU (Host)**: The main processor that coordinates computation

- **GPU (Device)**: The accelerator that performs parallel computation

A common misconception is that compiling with the CUDA compiler (`nvcc`) makes code run on the GPU. In reality, code runs on the CPU unless you explicitly specify which parts should run on the GPU.
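As a minimal sketch of what "explicitly specify" can look like, the program below is compiled entirely by `nvcc`, yet only the function marked `__global__` and launched with the `<<<...>>>` syntax runs on the GPU. The names `cpu_hello` and `gpu_hello` are hypothetical and are not part of this exercise.

```cuda
#include <cstdio>

// Ordinary function: compiled by nvcc, but still runs on the CPU.
void cpu_hello() {
    std::printf("cpu_hello: running on the CPU\n");
}

// __global__ marks a kernel: it runs on the GPU when launched from the host.
__global__ void gpu_hello() {
    printf("gpu_hello: running on the GPU\n");
}

int main() {
    cpu_hello();              // plain call, executes on the host

    gpu_hello<<<1, 1>>>();    // explicit launch: 1 block of 1 thread on the device
    cudaDeviceSynchronize();  // wait for the kernel so its output is flushed

    return 0;
}
```

Assuming the file is saved as `hello.cu`, it can be built with `nvcc -o hello hello.cu`. Without the `__global__` qualifier and the launch syntax, nothing in this file would execute on the GPU.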

# Exercise: Identifying Execution Spaces

Let's see if you can determine where different parts of a program execute. The following code uses a helper function, `execution_location()`, that reports where it is running.

Your task is to replace each "???" with either "CPU" or "GPU" based on where you think that part of the code executes.

In [1]:
# Add the directory containing nvcc to PATH so the Jupyter notebook can find it. nvcc is the NVIDIA CUDA compiler, used to compile CUDA code.
import os
os.environ['PATH'] = "/packages/apps/spack/21/opt/spack/linux-rocky8-zen3/gcc-12.1.0/cuda-12.6.1-cf4xlcbcfpwchqwo5bktxyhjagryzcx6/bin:" + os.environ['PATH']

In [1]:
%%writefile codes/particle_locations.cu
#include <thrust/execution_policy.h>
#include <thrust/for_each.h>
#include <thrust/iterator/counting_iterator.h>
#include <cstdio>

// Helper function to print execution location, callable from both host and device
__host__ __device__ void execution_location(const char* location) {
#ifdef __CUDA_ARCH__
    printf("Currently executing on: GPU (%s)\n", location);
#else
    printf("Currently executing on: CPU (%s)\n", location);
#endif
}

int main() {
    
    execution_location("???");

    thrust::for_each_n(thrust::device,
                       thrust::counting_iterator<int>(0), 1,
                       [=] __host__ __device__ (int) {
                           execution_location("???");
                       });

    
    thrust::for_each_n(thrust::host,
                       thrust::counting_iterator<int>(0), 1,
                       [=] __host__ __device__ (int) {
                           execution_location("???");
                       });

    // Runs on CPU
    execution_location("???");

    return 0;
}

Overwriting codes/particle_locations.cu


After filling in your answers, compile and run the code:


In [21]:
%%bash
nvcc -o codes/particle_locations --extended-lambda codes/particle_locations.cu
./codes/particle_locations              

Currently executing on: CPU (???)
Currently executing on: GPU (???)
Currently executing on: CPU (???)
Currently executing on: CPU (???)
