# CPU and GPU Computing Evaluation
### By Aria Moradi

In this project we aim to compare the execution performance of the CPU and the GPU on simple computational problems such as matrix multiplication and image processing.

There are 2 scenarios. The first one is the multiplication of two large random matrices:

In [2]:
""" The Explanation of the Code:
In this code, we start by defining the size of the square matrices to be multiplied (matrix_size)
and the number of test runs (test_count). We then generate test_count pairs of random matrices
using the np.random.rand function.

We define four functions: cpu_matrix_multiplication, gpu_matrix_multiplication_cupy,
gpu_matrix_multiplication_tensorflow and gpu_matrix_multiplication_pytorch,
which perform matrix multiplication on the CPU (NumPy) and on the GPU (CuPy, TensorFlow, PyTorch).

In the CPU function, we use the NumPy np.dot function for matrix multiplication.
In the CuPy function, we use the cp.asarray function to transfer the matrices to the GPU,
perform matrix multiplication using cp.dot, and then transfer the result back to the CPU using cp.asnumpy.
The TensorFlow function runs tf.matmul inside a tf.device('/GPU:0') block, and the PyTorch
function converts the arrays to tensors with torch.from_numpy and multiplies them with torch.matmul.

After running each function on every matrix pair,
we can optionally compare the results and print the execution times.

Please note that for the GPU matrix multiplication to work, you need the cupy library
(which provides a NumPy-compatible interface for GPU computations) as well as
GPU-enabled builds of TensorFlow and PyTorch.
Also, make sure you have a compatible GPU and CUDA drivers installed.

Feel free to adjust the matrix size and modify the code according to your specific requirements.
"""

import numpy as np
import cupy as cp
import tensorflow as tf
import torch
import time

# Define the matrix sizes
matrix_size = 1000
test_count = 100
matrix_pairs = []

for i in range(test_count):
    matrix_a = np.random.rand(matrix_size, matrix_size)
    matrix_b = np.random.rand(matrix_size, matrix_size)
    matrix_pairs.append([matrix_a, matrix_b])


# CPU Matrix Multiplication
def cpu_matrix_multiplication(matrix_a, matrix_b):
    start_time = time.time()

    # Perform matrix multiplication using CPU
    cpu_result = np.dot(matrix_a, matrix_b)

    end_time = time.time()
    execution_time = end_time - start_time
    return cpu_result, execution_time


# GPU Matrix Multiplication
def gpu_matrix_multiplication_cupy(matrix_a, matrix_b):
    start_time = time.time()

    # Perform matrix multiplication using GPU
    gpu_matrix_a = cp.asarray(matrix_a)
    gpu_matrix_b = cp.asarray(matrix_b)
    gpu_result = cp.dot(gpu_matrix_a, gpu_matrix_b)
    cpu_result = cp.asnumpy(gpu_result)

    end_time = time.time()
    execution_time = end_time - start_time
    return cpu_result, execution_time


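def gpu_matrix_multiplication_tensorflow(matrix_a, matrix_b):
    start_time = time.time()

    # Run the multiplication on the first GPU
    with tf.device('/GPU:0'):
        gpu_result = tf.matmul(matrix_a, matrix_b)

    # Note: gpu_result is a tf.Tensor that stays in GPU memory; unlike the CuPy
    # version, the result is not copied back to system RAM inside the timed section
    end_time = time.time()
    execution_time = end_time - start_time
    return gpu_result, execution_time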


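def gpu_matrix_multiplication_pytorch(matrix_a, matrix_b):
    start_time = time.time()

    # torch.from_numpy creates CPU tensors, so move them to the GPU explicitly;
    # otherwise torch.matmul would run on the CPU
    gpu_matrix_a = torch.from_numpy(matrix_a).to("cuda")
    gpu_matrix_b = torch.from_numpy(matrix_b).to("cuda")
    gpu_result = torch.matmul(gpu_matrix_a, gpu_matrix_b)
    # Copy the result back to system RAM; .cpu() also waits for the GPU to finish
    gpu_result = gpu_result.cpu().numpy()

    end_time = time.time()
    execution_time = end_time - start_time
    return gpu_result, execution_time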


def cpu_test():
    test_result = []

    # Multiply every generated matrix pair, not just the last one created
    for matrix_a, matrix_b in matrix_pairs:
        result, execution_time = cpu_matrix_multiplication(matrix_a, matrix_b)
        # test_result.append([result, execution_time])
        test_result.append(execution_time)

    return test_result


def gpu_test(mul_function):
    test_result = []

    # Multiply every generated matrix pair, not just the last one created
    for matrix_a, matrix_b in matrix_pairs:
        result, execution_time = mul_function(matrix_a, matrix_b)
        # test_result.append([result, execution_time])
        test_result.append(execution_time)

    return test_result


# Run CPU Matrix Multiplication
cpu_test_result = cpu_test()

# Run GPU Matrix Multiplication
gpu_test_result_cupy = gpu_test(gpu_matrix_multiplication_cupy)
gpu_test_result_tensorflow = gpu_test(gpu_matrix_multiplication_tensorflow)
gpu_test_result_pytorch = gpu_test(gpu_matrix_multiplication_pytorch)

# Compare the results (optional)
# print("CPU Result:")
# print(cpu_result)
# print("GPU Result:")
# print(gpu_result)

# Print the execution times
print("CPU Execution Time:              ", cpu_test_result, "seconds")
print("GPU Execution Time (CuPy):       ", gpu_test_result_cupy, "seconds")
print("GPU Execution Time (TensorFlow): ", gpu_test_result_tensorflow, "seconds")
print("GPU Execution Time (PyTorch):    ", gpu_test_result_pytorch, "seconds")

print("\n")

print("CPU Execution Time Average:              ", sum(cpu_test_result) / test_count, "seconds")
print("GPU Execution Time Average (CuPy):       ", sum(gpu_test_result_cupy) / test_count, "seconds")
print("GPU Execution Time Average (TensorFlow): ", sum(gpu_test_result_tensorflow) / test_count, "seconds")
print("GPU Execution Time Average (PyTorch):    ", sum(gpu_test_result_pytorch) / test_count, "seconds")


CPU Execution Time:               [0.09871101379394531, 0.09735655784606934, 0.0996243953704834, 0.09606766700744629, 0.0982978343963623, 0.09641170501708984, 0.09520339965820312, 0.09458136558532715, 0.12127375602722168, 0.09517240524291992, 0.10121512413024902, 0.10441279411315918, 0.09888458251953125, 0.0973055362701416, 0.0970005989074707, 0.09534716606140137, 0.10003113746643066, 0.10153770446777344, 0.0951070785522461, 0.10240030288696289, 0.09217095375061035, 0.11425399780273438, 0.1124114990234375, 0.10115885734558105, 0.08875465393066406, 0.07398223876953125, 0.05490469932556152, 0.052301645278930664, 0.05430030822753906, 0.053090572357177734, 0.05406379699707031, 0.05340075492858887, 0.07061004638671875, 0.051386117935180664, 0.0543673038482666, 0.05651116371154785, 0.05031251907348633, 0.050954580307006836, 0.05486464500427246, 0.06138300895690918, 0.05218815803527832, 0.052088022232055664, 0.05606245994567871, 0.049783945083618164, 0.05771970748901367, 0.05292630195617676, 

In this scenario we compare the execution time of the CPU (using NumPy) with 3 GPU-accelerated methods based on CuPy, TensorFlow and PyTorch for matrix multiplication.

For a multiplication of two N x N matrices, the standard algorithm performs on the order of N^3 operations (N^3 multiplications and N^2(N - 1) additions). In this test N = 1000, so we expect on the order of 10^9 operations per multiplication. We also run 100 tests to average out the execution times.
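The operation count can be turned into a throughput figure. The sketch below is an illustration that is not part of the original notebook, and the helper names `matmul_flops` and `gflops` are hypothetical. It counts the floating-point operations of the standard algorithm, n * n * (2n - 1), and divides that count by a measured time.

```python
def matmul_flops(n: int) -> int:
    """Floating-point operations for the standard (n x n) @ (n x n) product.

    Each of the n * n output entries needs n multiplications and n - 1 additions.
    """
    return n * n * (n + (n - 1))


def gflops(n: int, seconds: float) -> float:
    """Achieved throughput in GFLOP/s for one n x n multiplication taking `seconds`."""
    return matmul_flops(n) / seconds / 1e9


# For the matrix size used in this notebook
print(matmul_flops(1000))  # 1999000000, i.e. about 2 * 10^9
```

For instance, a single multiplication that takes 0.05 seconds gives `gflops(1000, 0.05)`, which is about 40 GFLOP/s.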

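As the output shows, execution on the GPU is generally faster, because GPUs are designed to accelerate vector operations and run many cores in parallel.

We also observe that:
- TensorFlow is the fastest. Part of the reason is that its timed section leaves the result in GPU memory: unlike the CuPy function, it never copies the product back to system RAM.
- CuPy and NumPy timings are close. CuPy implements the NumPy API on CUDA, but the timed CuPy function also copies both input matrices from system RAM to GPU memory and copies the result back, and these transfers count toward its execution time.
- PyTorch is faster than CuPy, but not by much. PyTorch does not operate on NumPy arrays directly: torch.from_numpy wraps each array as a CPU tensor, which has to be moved to the GPU (for example with .to("cuda")) before torch.matmul runs there, and the result has to be copied back to a NumPy array.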
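Two effects can distort timings like these. First, the first call to a GPU library includes one-off setup costs such as context and library-handle creation. Second, GPU work is queued asynchronously, so a timer can stop before the GPU has finished unless the timed function copies its result back or synchronizes. The sketch below is a fairer timing helper that assumes only the standard library and NumPy. The name `benchmark` and its `repeats`/`warmup` parameters are hypothetical. It runs a few untimed warm-up calls and reports the median of `time.perf_counter` measurements. CuPy also ships `cupyx.profiler.benchmark` for the same purpose.

```python
import statistics
import time

import numpy as np


def benchmark(func, *args, repeats=10, warmup=2):
    """Time func(*args) and return (median_seconds, all_timings).

    The warm-up calls are not timed, so one-off setup costs (for example
    creating a GPU context or library handles) do not skew the results.
    For GPU code, func should copy its result to host memory or synchronize
    before returning, otherwise the timer may stop before the GPU finishes.
    """
    for _ in range(warmup):
        func(*args)

    timings = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        func(*args)
        timings.append(time.perf_counter() - start)

    return statistics.median(timings), timings


# Example: time the NumPy CPU multiplication on two random 500 x 500 matrices
a = np.random.rand(500, 500)
b = np.random.rand(500, 500)
median_s, all_s = benchmark(np.dot, a, b, repeats=5)
print(f"median: {median_s:.4f} s over {len(all_s)} runs")
```

The matrix dtype is also worth checking. np.random.rand produces float64 arrays, and many GPUs have far lower float64 than float32 throughput, so comparing runs on `a.astype(np.float32)` may be informative.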

In [1]:
""" The Explanation of the Code:
In this code, we load each image listed in image_paths using the cv2.imread function.
Then, we define three functions: cpu_image_processing, gpu_image_processing and gpu_image_processing_2,
which convert the image to grayscale on the CPU (OpenCV), on the GPU with the OpenCV CUDA module,
and on the GPU with CuPy, respectively.

The CPU version uses the cv2.cvtColor function and the OpenCV CUDA version uses cv2.cuda.cvtColor.
The CuPy version computes the grayscale value of each pixel as a weighted sum of its color channels.

Each function measures its execution time using the time module; the CuPy version times only
the computation, not the transfers between system RAM and GPU memory.

After running the CPU and GPU image processing functions test_count times for each image,
the execution times are printed. The results can optionally be displayed using cv2.imshow.

Please note that the OpenCV CUDA version only works if OpenCV is installed with CUDA support,
and the CuPy version requires the cupy library and a CUDA-capable GPU.
Additionally, you may need to modify the image processing operations based on
your specific requirements.

Remember to replace the entries of image_paths with the actual paths to your image files.
"""

import cv2
import numpy as np
import time
import cupy as cp

# Paths of the images to process
image_paths = [
    "drive/MyDrive/3.jpg",
    "drive/MyDrive/1.jpg",
    "drive/MyDrive/bigImage.png"
]

test_count = 50

# CPU Image Processing
def cpu_image_processing(image):
    start_time = time.time()

    # Perform image processing operations using CPU
    # Example: Convert the image to grayscale
    gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    end_time = time.time()
    execution_time = end_time - start_time
    return gray_image, execution_time


# GPU Image Processing
def gpu_image_processing(image):
    start_time = time.time()

    # Perform image processing operations using GPU
    # Example: Convert the image to grayscale
    gpu_image = cv2.cuda_GpuMat()
    gpu_image.upload(image)
    gpu_gray_image = cv2.cuda.cvtColor(gpu_image, cv2.COLOR_BGR2GRAY)
    gray_image = gpu_gray_image.download()

    end_time = time.time()
    execution_time = end_time - start_time
    return gray_image, execution_time


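def gpu_image_processing_2(image):
    # Copy the image and the channel weights to the GPU (not timed).
    # cv2.imread returns channels in B, G, R order, so the weights are ordered B, G, R.
    # float32 is used because many GPUs are much slower at float64 arithmetic.
    gpu_image = cp.asarray(image, dtype=cp.float32)
    weights = cp.asarray([0.114, 0.587, 0.299], dtype=cp.float32)

    start_time = time.time()

    # Weighted sum over the channel axis: Y = 0.299 R + 0.587 G + 0.114 B
    gpu_result = cp.dot(gpu_image, weights)
    # CuPy kernels run asynchronously; wait for the GPU before stopping the clock
    cp.cuda.Device().synchronize()

    end_time = time.time()

    # Copy the result back to system RAM (not timed) and convert it to 8-bit grayscale
    gray_image = cp.asnumpy(cp.rint(gpu_result)).astype(np.uint8)

    execution_time = end_time - start_time
    return gray_image, execution_time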

def cpu_test(image):
    test_result = []

    for i in range(test_count):
        result, execution_time = cpu_image_processing(image)
        test_result.append(execution_time)

    return test_result

def gpu_test(image):
    test_result = []

    for i in range(test_count):
        result, execution_time = gpu_image_processing_2(image)
        test_result.append(execution_time)

    return test_result

for image_path in image_paths:
  print("Testing image ", image_path)

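  image = cv2.imread(image_path)
  # cv2.imread returns None instead of raising when the file cannot be read
  if image is None:
    raise FileNotFoundError(f"Could not read image: {image_path}")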

  print("Image size is ", image.shape)

  # Run CPU Image Processing
  # cpu_result, cpu_execution_time = cpu_image_processing(image)
  cpu_test_result = cpu_test(image)

  # Run GPU Image Processing
  # gpu_result, gpu_execution_time = gpu_image_processing_2(image)
  gpu_test_result = gpu_test(image)

  # print(cpu_result)
  # print(gpu_result)

  # Display the results and execution times
  # cv2.imshow("CPU Result", cpu_result)
  # cv2.imshow("GPU Result", gpu_result)
  # cv2.waitKey(0)

  print("CPU Execution Time:              ", cpu_test_result, "seconds")
  print("GPU Execution Time (CuPy):       ", gpu_test_result, "seconds")

  print("\n")

  print("CPU Execution Time Average:              ", sum(cpu_test_result) / test_count, "seconds")
  print("GPU Execution Time Average (CuPy):       ", sum(gpu_test_result) / test_count, "seconds")

  print("#################")



Testing image  drive/MyDrive/3.jpg
Image size is  (960, 640, 3)
CPU Execution Time:               [0.02121734619140625, 0.0005185604095458984, 0.0002722740173339844, 0.0002689361572265625, 0.0016677379608154297, 0.0003948211669921875, 0.0003974437713623047, 0.0003726482391357422, 0.0003523826599121094, 0.0003590583801269531, 0.00035572052001953125, 0.00037598609924316406, 0.0004093647003173828, 0.00037670135498046875, 0.0003974437713623047, 0.0003979206085205078, 0.00046706199645996094, 0.00040435791015625, 0.000408172607421875, 0.0004074573516845703, 0.0003693103790283203, 0.0008378028869628906, 0.0008804798126220703, 0.0008389949798583984, 0.0008230209350585938, 0.0008797645568847656, 0.0007486343383789062, 0.00039958953857421875, 0.0003848075866699219, 0.00037980079650878906, 0.0003952980041503906, 0.0003821849822998047, 0.00036525726318359375, 0.0003650188446044922, 0.0003581047058105469, 0.0003783702850341797, 0.0003635883331298828, 0.0003554821014404297, 0.0008742809295654297, 0.

In this scenario we compare the execution time of the CPU (using OpenCV) with 2 GPU-accelerated methods, based on OpenCV's CUDA module and on CuPy, for converting an RGB image to grayscale.

Note that the readme for this scenario proposed using PIL, but PIL does not support GPU acceleration.
Also, OpenCV's CUDA module does not work on Google Colab, and I was not able to compile OpenCV with CUDA support for my personal system either.
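You can check whether an OpenCV installation can use the GPU at all before running the CUDA version. The sketch below assumes that OpenCV's Python bindings expose `cv2.cuda.getCudaEnabledDeviceCount()`, which reports 0 on builds without CUDA. The helper name `opencv_cuda_device_count` is hypothetical.

```python
def opencv_cuda_device_count() -> int:
    """Return the number of CUDA devices OpenCV can use, or 0 if it cannot use any."""
    try:
        import cv2
    except ImportError:
        return 0  # OpenCV is not installed
    try:
        return int(cv2.cuda.getCudaEnabledDeviceCount())
    except (AttributeError, cv2.error):
        return 0  # this OpenCV build has no usable CUDA module


if opencv_cuda_device_count() > 0:
    print("OpenCV can run the cv2.cuda version on this machine")
else:
    print("OpenCV has no CUDA support here; only the CPU and CuPy versions can run")
```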

According to the OpenCV documentation, converting an RGB image to grayscale can be done by computing, for every pixel, a weighted sum of its color channels: Y = 0.299 R + 0.587 G + 0.114 B.
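The weighted sum can be sketched on the CPU with NumPy. The sketch assumes the image comes from cv2.imread and is therefore in BGR channel order, so the weights are applied in B, G, R order. The function name `bgr_to_gray` is hypothetical. The same expression should also work with CuPy arrays by replacing `np` with `cp`. OpenCV's own cvtColor may round some pixels slightly differently.

```python
import numpy as np

# Luma weights from the OpenCV cvtColor documentation: Y = 0.299 R + 0.587 G + 0.114 B.
# cv2.imread returns channels in B, G, R order, so the weights are listed in that order.
BGR_WEIGHTS = np.array([0.114, 0.587, 0.299])


def bgr_to_gray(image: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) uint8 BGR image into an (H, W) uint8 grayscale image."""
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("expected an (H, W, 3) BGR image")
    # Weighted sum over the last (channel) axis for every pixel at once
    gray = image.astype(np.float64) @ BGR_WEIGHTS
    # Round to the nearest integer and keep the values inside the 8-bit range
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)


# One row of pure red, pure blue, pure green and white pixels (BGR order)
pixels = np.array([[[0, 0, 255], [255, 0, 0], [0, 255, 0], [255, 255, 255]]], dtype=np.uint8)
print(bgr_to_gray(pixels))  # [[ 76  29 150 255]]
```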

As the output shows, execution on the GPU is generally faster, because GPUs are designed to accelerate vector operations and run many cores in parallel. However, as in the previous scenario, the CuPy execution times are not significantly faster than the CPU execution times.

Note that, according to my internet research, running the same operation with OpenCV's CUDA module is about 300% faster than on the CPU; I could not verify this myself.