# Q1: Identify the roles of !, %, and %% in a Google Colab cell.

In [19]:
# The ! prefix runs the rest of the line as a shell command
!nvidia-smi

Sun Feb  1 18:03:41 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   35C    P8              9W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

In [None]:
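# % prefixes a line magic: it applies only to the rest of that line (e.g. %time)
# %% prefixes a cell magic: it must be the first line of the cell and applies to the whole cell (e.g. %%writefile)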
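
As a rough illustration of what the `!` prefix does, a plain Python script can hand a command to the system shell with `subprocess.run`. This is only a sketch: `run_shell` is a hypothetical helper, not part of Colab or IPython, and IPython's own implementation differs. Line and cell magics have no plain-Python syntax, because IPython interprets them itself.

```python
import subprocess


def run_shell(command: str) -> str:
    """Run a command through the system shell and return its standard output."""
    completed = subprocess.run(
        command, shell=True, capture_output=True, text=True, check=True
    )
    return completed.stdout


# Roughly what typing `!echo hello from the shell` in a cell would show.
print(run_shell("echo hello from the shell"))
```

In a notebook the `!` form is shorter and streams output as it arrives; a helper like this is mainly useful when the same logic has to run outside Colab.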

# Q2: Identify all key nvidia-smi commands with multiple options

In [2]:
!nvidia-smi --help


NVIDIA System Management Interface -- v550.54.15

NVSMI provides monitoring information for Tesla and select Quadro devices.
The data is presented in either a plain text or an XML format, via stdout or a file.
NVSMI also provides several management operations for changing the device state.

Note that the functionality of NVSMI is exposed through the NVML C-based
library. See the NVIDIA developer website for more information about NVML.
Python wrappers to NVML are also available.  The output of NVSMI is
not guaranteed to be backwards compatible; NVML and the bindings are backwards
compatible.

http://developer.nvidia.com/nvidia-management-library-nvml/
http://pypi.python.org/pypi/nvidia-ml-py/
Supported products:
- Full Support
    - All Tesla products, starting with the Kepler architecture
    - All Quadro products, starting with the Kepler architecture
    - All GRID products, starting with the Kepler architecture
    - GeForce Titan products, starting with the Kepler architecture
- L
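
Beyond `--help`, one commonly used combination of options is `--query-gpu=<fields>` together with `--format=csv,noheader,nounits`, which prints selected fields as comma-separated values. The sketch below builds that command from Python and parses the result. The helper names `parse_gpu_csv` and `query_gpus` are hypothetical, the sample line is made up to resemble the Tesla T4 table above, and the field list is just one possible selection.

```python
import shutil
import subprocess

# Field names passed to --query-gpu; this particular selection is an assumption.
FIELDS = ["name", "memory.total", "memory.used", "utilization.gpu"]


def parse_gpu_csv(text, fields=FIELDS):
    """Turn `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    for line in text.strip().splitlines():
        values = [value.strip() for value in line.split(",")]
        rows.append(dict(zip(fields, values)))
    return rows


def query_gpus(fields=FIELDS):
    """Run nvidia-smi if it is installed; return an empty list otherwise."""
    if shutil.which("nvidia-smi") is None:
        return []
    command = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(fields),
        "--format=csv,noheader,nounits",
    ]
    try:
        result = subprocess.run(command, capture_output=True, text=True, check=True)
    except (subprocess.CalledProcessError, OSError):
        return []
    return parse_gpu_csv(result.stdout, fields)


# Hypothetical sample line, so the parser can be tried without a GPU.
sample = "Tesla T4, 15360, 0, 0\n"
print(parse_gpu_csv(sample))
print(query_gpus())
```

In a Colab cell the same query can be run directly as `!nvidia-smi --query-gpu=name,memory.total,memory.used,utilization.gpu --format=csv,noheader,nounits`.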

# Q3: Debug common CUDA errors (zero output, incorrect indexing, PTX errors)

In [None]:
# Zero output error
# Caused when the kernel is never launched, the host exits without calling cudaDeviceSynchronize(),
# a device printf is missing its trailing newline, or the grid/block configuration is wrong

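# Incorrect indexing: ignores the block offset, so IDs repeat in every block when more than one block is launched
int id = threadIdx.x;

# Correct indexing: gives each thread a unique global ID across all blocks
int id = blockIdx.x * blockDim.x + threadIdx.x;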
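
The difference between the two formulas can be checked without a GPU. The sketch below simulates a 1D launch in plain Python; `thread_ids` is a hypothetical helper, and the loop variables stand in for `blockIdx.x`, `blockDim.x`, and `threadIdx.x`.

```python
def thread_ids(grid_dim, block_dim):
    """Return (local, global) thread IDs for every thread in a 1D launch."""
    local_ids, global_ids = [], []
    for block_idx in range(grid_dim):          # plays the role of blockIdx.x
        for thread_idx in range(block_dim):    # plays the role of threadIdx.x
            local_ids.append(thread_idx)
            global_ids.append(block_idx * block_dim + thread_idx)
    return local_ids, global_ids


local_ids, global_ids = thread_ids(grid_dim=2, block_dim=4)
print("threadIdx.x only:", local_ids)   # [0, 1, 2, 3, 0, 1, 2, 3]
print("global index:    ", global_ids)  # [0, 1, 2, 3, 4, 5, 6, 7]
```

With a single block the two lists are identical, which is why the bug only shows up once a kernel is launched with more than one block.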


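# PTX errors occur when the CUDA toolkit is newer than the installed driver supports,
# or when the code is compiled for an architecture the GPU does not support (e.g. a wrong -arch value)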
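
One way to avoid an architecture mismatch is to derive the `-arch` value from the GPU's compute capability. The Tesla T4 has compute capability 7.5, which corresponds to `sm_75`. The sketch below does only the string conversion; `arch_flag` is a hypothetical helper. On recent drivers `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` should report the value, but treat that field as an assumption to verify on your driver version.

```python
def arch_flag(compute_capability: str) -> str:
    """Convert a compute capability such as '7.5' into an nvcc -arch flag."""
    major, minor = compute_capability.strip().split(".")
    return f"-arch=sm_{major}{minor}"


print(arch_flag("7.5"))  # -arch=sm_75
```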


# 4. Write a CUDA C/C++ program to demonstrate GPU kernel execution and thread indexing.
* a. Launch a CUDA kernel using: 1 block and 8 threads
* b. Each thread must print: Hello from GPU thread <global_thread_id>
* c. Compute the global thread ID using: global_thread_id = blockIdx.x * blockDim.x + threadIdx.x
* d. Clearly separate: Host code (CPU) & Device code (GPU kernel)

In [6]:
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Jun__6_02:18:23_PDT_2024
Cuda compilation tools, release 12.5, V12.5.82
Build cuda_12.5.r12.5/compiler.34385749_0


In [11]:
%%writefile firstprogram.cu
#include <stdio.h>

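// Device code (GPU kernel): runs once per thread
__global__ void helloKernel() {
    int global_thread_id = blockIdx.x * blockDim.x + threadIdx.x;
    printf("Hello from GPU thread %d\n", global_thread_id);
}

// Host code (CPU): launches the kernel and waits for it to finish
int main() {
    helloKernel<<<1, 8>>>();   // 1 block, 8 threads
    cudaDeviceSynchronize();   // wait for the kernel so its printf output is flushed
    return 0;
}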


Overwriting firstprogram.cu


In [12]:
!nvcc firstprogram.cu -o firstprogram -arch=sm_75
!./firstprogram

Hello from GPU thread 0
Hello from GPU thread 1
Hello from GPU thread 2
Hello from GPU thread 3
Hello from GPU thread 4
Hello from GPU thread 5
Hello from GPU thread 6
Hello from GPU thread 7


In [14]:
%%writefile question5.cu
#include <stdio.h>

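// Device code: each thread prints one element of the array.
// threadIdx.x alone is enough here because only one block is launched.
__global__ void printKernel(int *d_arr) {
    int id = threadIdx.x;
    printf("GPU thread %d sees value %d\n", id, d_arr[id]);
}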

int main() {
    int h_arr[5] = {10, 20, 30, 40, 50};
    int *d_arr;

    cudaMalloc((void**)&d_arr, 5 * sizeof(int));
    cudaMemcpy(d_arr, h_arr, 5 * sizeof(int), cudaMemcpyHostToDevice);

    printKernel<<<1, 5>>>(d_arr);
    cudaDeviceSynchronize();

    cudaMemcpy(h_arr, d_arr, 5 * sizeof(int), cudaMemcpyDeviceToHost);

    printf("\nValues copied back to CPU:\n");
    for (int i = 0; i < 5; i++) {
        printf("%d ", h_arr[i]);
    }

    cudaFree(d_arr);
    return 0;
}


Writing question5.cu


In [15]:
!nvcc question5.cu -o question5 -arch=sm_75
!./question5

GPU thread 0 sees value 10
GPU thread 1 sees value 20
GPU thread 2 sees value 30
GPU thread 3 sees value 40
GPU thread 4 sees value 50

Values copied back to CPU:
10 20 30 40 50 

# 6. Compare CPU times of lists/tuples with NumPy arrays.

In [20]:
import numpy as np
import time

SIZE = 1_000_000

my_list = list(range(SIZE))

my_array = np.arange(SIZE)

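# perf_counter() is a monotonic, high-resolution clock suited to timing short intervals
start = time.perf_counter()
total = sum(my_list)
list_time = (time.perf_counter() - start) * 1000

start = time.perf_counter()
total = np.sum(my_array)
numpy_time = (time.perf_counter() - start) * 1000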

print(f"Sum - List: {list_time:.2f} ms | NumPy: {numpy_time:.2f} ms | Speedup: {list_time/numpy_time:.1f}x")

Sum - List: 8.90 ms | NumPy: 0.56 ms | Speedup: 15.9x
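
The cell above times only a list, and a single run can be noisy. As a hedged sketch, the comparison can be extended to a tuple and repeated with `timeit`, keeping the best of several runs. The names `best_ms`, `REPEATS`, and the `data_*` variables are illustrative choices, and the timings will vary from machine to machine.

```python
import timeit

import numpy as np

SIZE = 1_000_000
REPEATS = 5  # number of timed runs per container; the best one is reported

data_list = list(range(SIZE))
data_tuple = tuple(data_list)
data_array = np.arange(SIZE, dtype=np.int64)


def best_ms(func):
    """Return the fastest of REPEATS single calls to func, in milliseconds."""
    return min(timeit.repeat(func, number=1, repeat=REPEATS)) * 1000


timings = {
    "list": best_ms(lambda: sum(data_list)),
    "tuple": best_ms(lambda: sum(data_tuple)),
    "numpy": best_ms(lambda: data_array.sum()),
}

for name, ms in timings.items():
    print(f"{name:>5}: {ms:.2f} ms")
```

Taking the minimum rather than the mean reduces the influence of background activity on the machine, which matters for intervals of a few milliseconds.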
