# **CUDA Examples**

In [None]:
import os
os.environ["PATH"] += ":/usr/local/cuda/bin"

# Verify nvcc is now accessible
!nvcc --version

## **1. Matrix**

### **a. Matrix Transpose**

Matrix transposition is the process of swapping the rows and columns of a matrix. For a given matrix A, its transpose A<sup>T</sup> is formed by moving the element at position (i, j) in A to position (j, i) in A<sup>T</sup>.
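As a CPU-side reference for the index mapping, here is a minimal Python sketch. It assumes a row-major matrix stored as a flat list, which is how device buffers are commonly laid out; the function name `transpose_flat` is hypothetical and is not taken from `matrix_transpose.cu`.

```python
def transpose_flat(src, rows, cols):
    """Transpose a row-major rows x cols matrix stored as a flat list."""
    dst = [0] * (rows * cols)
    # In a GPU kernel, i and j would typically be derived from block and
    # thread indices; here two plain loops visit every (i, j) pair.
    for i in range(rows):
        for j in range(cols):
            # Element (i, j) of the input lands at (j, i) of the output,
            # and the output has `rows` columns.
            dst[j * rows + i] = src[i * cols + j]
    return dst


a = [1, 2, 3,
     4, 5, 6]  # 2 x 3 matrix
print(transpose_flat(a, 2, 3))  # [1, 4, 2, 5, 3, 6], i.e. a 3 x 2 matrix
```

Every output element is written exactly once and no write depends on another, which is why the operation maps well onto one thread per element.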

In [None]:
!make SRC=./matrix/matrix_transpose.cu run

### **b. Matrix Addition**

Matrix addition is an element-wise operation where corresponding elements from two matrices of the same dimensions are added together. The resulting matrix has the same dimensions as the input matrices, with each element being the sum of the corresponding elements from the input matrices. In CUDA, this operation parallelizes naturally: no addition depends on any other, so each one can be assigned to a separate thread and they can all run concurrently instead of one after another as in a sequential CPU loop.
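The one-thread-per-element idea can be sketched on the CPU as follows. This is an illustration only: `add_element`, `matrix_add`, and the `block_size` parameter are hypothetical names, and the nested loops merely stand in for a grid of blocks and threads.

```python
def add_element(a, b, c, idx, n):
    """Work done by one hypothetical thread: add a single pair of elements."""
    if idx < n:  # bounds guard, needed when the grid overshoots n
        c[idx] = a[idx] + b[idx]


def matrix_add(a, b, rows, cols, block_size=4):
    """Add two row-major rows x cols matrices stored as flat lists."""
    n = rows * cols
    c = [0] * n
    # Round up so that every element is covered by some thread.
    num_blocks = (n + block_size - 1) // block_size
    for block in range(num_blocks):          # stands in for blockIdx.x
        for thread in range(block_size):     # stands in for threadIdx.x
            add_element(a, b, c, block * block_size + thread, n)
    return c


print(matrix_add([1, 2, 3, 4, 5, 6], [10, 20, 30, 40, 50, 60], 2, 3))
# [11, 22, 33, 44, 55, 66]
```

With 6 elements and a block size of 4, two blocks are launched and the last two thread indices are rejected by the bounds guard.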

In [None]:
!make SRC=./matrix/matrix_addition.cu run

### **c. Matrix Multiplication**

Matrix multiplication is an operation that combines two matrices to produce a third matrix. Given matrices A and B, the element C[i,j] in the result matrix C is calculated as the dot product of the i-th row of A and the j-th column of B. This operation is widely used in fields like machine learning, computer graphics, and scientific computing.
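A plain Python reference for the row-times-column definition is sketched below. It assumes row-major flat lists, with A of size m x k and B of size k x n; the names `matmul_element` and `matmul` are hypothetical, and this naive form says nothing about how `matrix_multiplication.cu` tiles or optimizes the work.

```python
def matmul_element(a, b, i, j, k_dim, n_cols):
    """Dot product of row i of A with column j of B (one thread's work)."""
    return sum(a[i * k_dim + k] * b[k * n_cols + j] for k in range(k_dim))


def matmul(a, b, m, k_dim, n):
    """Multiply an m x k_dim matrix by a k_dim x n matrix, both row-major."""
    return [matmul_element(a, b, i, j, k_dim, n)
            for i in range(m)
            for j in range(n)]


a = [1, 2, 3,
     4, 5, 6]        # 2 x 3
b = [7, 8,
     9, 10,
     11, 12]         # 3 x 2
print(matmul(a, b, 2, 3, 2))  # [58, 64, 139, 154]
```

Each call to `matmul_element` reads only the inputs and writes one output, so every C[i,j] could be computed by its own thread.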

In [None]:
!make SRC=./matrix/matrix_multiplication.cu run

In [None]:
!make SRC=./matrix/matrix_transpose.cu clean
!make SRC=./matrix/matrix_addition.cu clean
!make SRC=./matrix/matrix_multiplication.cu clean

## **2. Reduction**

### **a. Maximum/Minimum**

Finds the maximum value in an array.

In [None]:
!make SRC=./reduction/max.cu run

*Implemented only for integers because `atomicMax` supports only integer types. For floating-point numbers, a different technique based on `atomicCAS` is needed to find the maximum value.*
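The compare-and-swap approach for floats can be illustrated with a single-threaded Python simulation. Everything here is a stand-in: `atomic_cas` mimics the semantics of a CAS on a one-element list, the 32-bit reinterpretation uses `struct`, and the function names are hypothetical. It shows the retry loop only, not real concurrency.

```python
import struct


def float_to_bits(x):
    """Reinterpret a float32 value as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<f", x))[0]


def bits_to_float(b):
    """Reinterpret a signed 32-bit integer as a float32 value."""
    return struct.unpack("<f", struct.pack("<i", b))[0]


def atomic_cas(cell, expected, new):
    """Simulated CAS: store `new` only if the cell still holds `expected`.
    Always returns the value that was in the cell before the call."""
    old = cell[0]
    if old == expected:
        cell[0] = new
    return old


def atomic_max_float(cell, value):
    """Raise the float stored (as bits) in `cell` to at least `value`."""
    old = cell[0]
    while True:
        assumed = old
        if bits_to_float(assumed) >= value:
            break  # stored maximum is already large enough
        old = atomic_cas(cell, assumed, float_to_bits(value))
        if old == assumed:
            break  # our swap succeeded; otherwise retry with the new value


def max_float(values):
    cell = [float_to_bits(float("-inf"))]
    for v in values:  # on a GPU, each v would come from a different thread
        atomic_max_float(cell, v)
    return bits_to_float(cell[0])


print(max_float([1.5, -2.25, 7.0, 3.75]))  # 7.0
```

On real hardware the loop matters because another thread may change the cell between the read and the swap; the failed CAS returns the fresh value and the comparison is repeated.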

### **b. Sum**

Computes the sum of all elements in an array.
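One common way to parallelize a sum is a tree-shaped reduction, sketched below in Python. Whether `sum.cu` uses this pattern, atomics, or something else is not stated here, so treat this as an assumption; `tree_reduce_sum` is a hypothetical name.

```python
def tree_reduce_sum(values):
    """Sum a sequence by pairwise combination with a doubling stride."""
    data = list(values)
    n = len(data)
    if n == 0:
        return 0
    stride = 1
    while stride < n:
        # All iterations of this inner loop touch disjoint elements, so on
        # a GPU each i could be handled by a different thread, followed by
        # a barrier before the stride doubles.
        for i in range(0, n - stride, 2 * stride):
            data[i] += data[i + stride]
        stride *= 2
    return data[0]


print(tree_reduce_sum(range(1, 11)))  # 55
```

The number of passes grows with log2(n) rather than n, which is the main appeal of the pattern when each pass runs in parallel.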

In [None]:
!make SRC=./reduction/sum.cu run

In [None]:
!make SRC=./reduction/max.cu clean
!make SRC=./reduction/sum.cu clean

## **3. Parallel Scan**

### **a. Parallel Prefix Sum (Hillis-Steele Inclusive Scan)**

<div style="text-align: center;">
  <img src="./parallel_scan/hillis_steele.png" alt="Hillis Steele" width="400">
</div>
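The Hillis-Steele inclusive scan can be simulated step by step on the CPU. The sketch below is a reference for the algorithm's data flow, not a transcription of `hillis_steele_prefix_sum.cu`; the function name is hypothetical, and the copy of the array stands in for the double buffering (or synchronization) a kernel would need.

```python
def hillis_steele_scan(values):
    """Inclusive prefix sum using the Hillis-Steele doubling-offset scheme."""
    data = list(values)
    n = len(data)
    offset = 1
    while offset < n:
        # Read from a snapshot so every "thread" sees the previous step's
        # values, as a double-buffered kernel would.
        prev = data[:]
        for i in range(offset, n):  # one thread per element i
            data[i] = prev[i] + prev[i - offset]
        offset *= 2
    return data


print(hillis_steele_scan([3, 1, 7, 0, 4, 1, 6, 3]))
# [3, 4, 11, 11, 15, 16, 22, 25]
```

The scan finishes in about log2(n) steps, but each step performs close to n additions, so the total work is O(n log n).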

In [None]:
!make SRC=./parallel_scan/hillis_steele_prefix_sum.cu run

### **b. Blelloch Scan Prefix Sum**

<div style="text-align: center;">
  <img src="./parallel_scan/blelloch_scan_reduce.png" alt="Blelloch Scan Reduce" width="400">
</div>

<div style="text-align: center;">
  <img src="./parallel_scan/blelloch_scan_down_sweep.png" alt="Blelloch Scan Down Sweep" width="400">
</div>
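The two phases pictured above, the reduce (up-sweep) and the down-sweep, can be traced with a small Python reference. This sketch assumes the input length is a power of two and produces an exclusive scan, which is the usual form of the Blelloch algorithm; the function name is hypothetical and the code is not taken from `blelloch_prefix_sum.cu`.

```python
def blelloch_scan(values):
    """Exclusive prefix sum via up-sweep and down-sweep (power-of-two length)."""
    n = len(values)
    if n == 0:
        return []
    if n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two length")
    data = list(values)

    # Up-sweep (reduce): build partial sums in place, bottom-up.
    stride = 1
    while stride < n:
        for i in range(2 * stride - 1, n, 2 * stride):
            data[i] += data[i - stride]
        stride *= 2

    # Down-sweep: clear the root, then push prefixes back down the tree.
    data[n - 1] = 0
    stride = n // 2
    while stride >= 1:
        for i in range(2 * stride - 1, n, 2 * stride):
            left = data[i - stride]
            data[i - stride] = data[i]   # left child receives the parent's prefix
            data[i] += left              # right child adds the left subtree's sum
        stride //= 2
    return data


print(blelloch_scan([3, 1, 7, 0, 4, 1, 6, 3]))
# [0, 3, 4, 11, 11, 15, 16, 22]
```

Both phases together perform O(n) additions, which is the work-efficiency advantage over the Hillis-Steele scheme.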

In [None]:
!make SRC=./parallel_scan/blelloch_prefix_sum.cu run

In [None]:
!make SRC=./parallel_scan/hillis_steele_prefix_sum.cu clean
!make SRC=./parallel_scan/blelloch_prefix_sum.cu clean

## **4. Searching**

### **a. Parallel Binary Search**

In [None]:
!make SRC=./searching/parallel_binary_search.cu run

### **b. K-Nearest Neighbors (KNN) Search**

In [None]:
!make SRC=./searching/knn_search.cu run

In [None]:
!make SRC=./searching/parallel_binary_search.cu clean
!make SRC=./searching/knn_search.cu clean

## **5. Image/Signal Processing**

### **a. Image Convolution (Gaussian Blur)**

In [None]:
import cv2
import matplotlib.pyplot as plt

img = cv2.imread('./image_signal_processing/input.jpg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) # Convert BGR to RGB
plt.figure(figsize=(10,8))
plt.imshow(img)
plt.axis('off')
plt.show()

In [None]:
!nvcc ./image_signal_processing/gaussian_blur_image.cu -o ./image_signal_processing/gaussian_blur_image `pkg-config --cflags --libs opencv4` -diag-suppress 611
!./image_signal_processing/gaussian_blur_image

In [None]:
img = cv2.imread('./image_signal_processing/result_gpu.jpg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) # Convert BGR to RGB
plt.figure(figsize=(10,8))
plt.imshow(img)
plt.axis('off')
plt.show()

In [None]:
!make SRC=./image_signal_processing/gaussian_blur_image.cu clean

### **b. Fast Fourier Transform (FFT) on a signal**

This example links against libsndfile. On Debian/Ubuntu, install the development package first: `sudo apt-get install libsndfile1-dev`

In [None]:
!nvcc -o ./image_signal_processing/fft_mp3 ./image_signal_processing/fft_mp3.cu -lcufft -lsndfile
!./image_signal_processing/fft_mp3

In [None]:
!python3 ./image_signal_processing/visualize_frequency_domain.py

In [None]:
img = cv2.imread('./image_signal_processing/fft_comparison.jpg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) # Convert BGR to RGB
plt.figure(figsize=(10,8))
plt.imshow(img)
plt.axis('off')
plt.show()

In [None]:
!make SRC=./image_signal_processing/fft_mp3.cu clean

## **7. Histogram Equalization**

## **8. Statistical Simulation**

### **a. Monte Carlo Simulation**

## **9. Physics Simulation**

### **a. N-Body Simulation**

### **b. Navier-Stokes Fluid Simulation**

### **c. Heat Diffusion**

## **10. Graph Algorithms**

### **a. Breadth-First Search (BFS)**