# Performance comparison of CPU and GPU serial and parallel execution

This notebook explores the performance differences between serial and parallel execution on CPU and GPU using PyTorch. We'll compare the execution times of intensive computational tasks performed sequentially on CPU and GPU, as well as in parallel configurations.

We will:
- Define a function `intensive_computation` that performs a computationally intensive task using PyTorch operations.
- Implement serial execution on CPU (`serial_cpu`) and GPU (`serial_gpu`).
- Implement parallel execution on multiple CPU cores (`parallel_cpu`) using one `multiprocessing.Process` per core.
- Implement a multi-GPU variant (`parallel_gpu`) that starts one process per available GPU.
- Compare the execution times across these different configurations using a significant workload.

In [1]:
import torch
import torch.nn as nn
import time
import multiprocessing as mp
import warnings

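# Workload: sums the squares of 1000 random values, repeated `size` times.
# torch.rand() is called without a device argument, so the tensors are
# created on the default device (the CPU unless it is changed elsewhere).
def intensive_computation(size):
    result = 0
    for _ in range(size):
        result += torch.sum(torch.rand(1000) ** 2)
    return result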

# Serial execution on CPU
def serial_cpu(size):
    start_time = time.time()
    intensive_computation(size)
    duration = time.time() - start_time
    print(f"Serial CPU Execution Time: {duration:.4f} seconds")

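# Serial execution intended for the GPU (if available).
# NOTE: `device` is never passed to intensive_computation(), which creates
# its tensors on the CPU, so this function times CPU work.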
def serial_gpu(size):
    if torch.cuda.is_available():
        device = torch.device("cuda")
        print(f"Using GPU: {torch.cuda.get_device_name()}")

        start_time = time.time()
        with torch.no_grad():
            intensive_computation(size)
        duration = time.time() - start_time
        print(f"Serial GPU Execution Time: {duration:.4f} seconds")
    else:
        print("No GPU available, cannot perform serial GPU execution.")

# Parallel execution on multiple CPU cores
def parallel_cpu(size):
    start_time = time.time()
    processes = []
    num_cpus = mp.cpu_count()
    print(f"Number of CPUs available: {num_cpus}")
    for _ in range(num_cpus):
        p = mp.Process(target=intensive_computation, args=(size // num_cpus,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
    duration = time.time() - start_time
    print(f"Parallel CPU Execution Time: {duration:.4f} seconds")

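# Parallel execution intended for multiple GPUs.
# NOTE: `models` is never used, and each worker process runs
# intensive_computation() on the CPU (one process per visible GPU).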
def parallel_gpu(size):
    if torch.cuda.is_available():
        num_gpus = torch.cuda.device_count()
        print(f"Number of GPUs available: {num_gpus}")

        start_time = time.time()
        models = [nn.Sequential(nn.Linear(1000, 1000)).cuda() for _ in range(num_gpus)]

        processes = []
        for i in range(num_gpus):
            p = mp.Process(target=intensive_computation, args=(size // num_gpus,))
            p.start()
            processes.append(p)
        for p in processes:
            p.join()
        duration = time.time() - start_time
        print(f"Parallel GPU Execution Time: {duration:.4f} seconds")
    else:
        print("No GPUs available, cannot perform parallel GPU execution.")

In [2]:
# Suppress warnings from PyTorch
warnings.filterwarnings("ignore")

size_of_task = 2000000  # Number of loop iterations in the workload

print("--- Serial CPU Execution ---")
serial_cpu(size_of_task)

print("\n--- Parallel CPU Execution ---")
parallel_cpu(size_of_task)

print("\n--- Serial GPU Execution ---")
serial_gpu(size_of_task)

print("\n--- Parallel GPU Execution ---")
parallel_gpu(size_of_task)

--- Serial CPU Execution ---
Serial CPU Execution Time: 23.7251 seconds

--- Parallel CPU Execution ---
Number of CPUs available: 24
Parallel CPU Execution Time: 27.9879 seconds

--- Serial GPU Execution ---
Using GPU: NVIDIA A100 80GB PCIe
Serial GPU Execution Time: 23.3915 seconds

--- Parallel GPU Execution ---
Number of GPUs available: 1
Parallel GPU Execution Time: 23.7863 seconds


## Results analysis

**Serial CPU Execution**: This is the baseline. Running the intensive computation sequentially in a single process took 23.73 seconds.

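**Parallel CPU Execution**: The workload is split across one `multiprocessing.Process` per logical CPU (`mp.cpu_count()`, 24 on this machine). At 27.99 seconds, this run was about 18% slower than the serial baseline, not faster. Each worker performs many very small tensor operations, so the cost of starting 24 processes and the contention between them (for example, each process running its own PyTorch CPU threads) plausibly outweighs the benefit of splitting the loop. The notebook does not isolate the cause.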
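One detail of the split is worth noting: the code gives each worker `size // num_cpus` iterations, which drops the remainder when `size` is not a multiple of the worker count (2,000,000 // 24 = 83,333 per worker, so 8 iterations never run). A minimal sketch of a split that keeps every iteration follows; `split_evenly` is a hypothetical helper, not part of the notebook.

```python
def split_evenly(total, workers):
    """Split `total` iterations into `workers` chunk sizes that sum to `total`."""
    base, extra = divmod(total, workers)
    # The first `extra` workers each take one additional iteration.
    return [base + 1 if i < extra else base for i in range(workers)]


chunks = split_evenly(2_000_000, 24)
print(sum(chunks), min(chunks), max(chunks))
```

Each entry of `chunks` could then be passed as the `args` of one worker process in place of `size // num_cpus`.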

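**Serial GPU Execution**: At 23.39 seconds, this run is within about 1.4% of the serial CPU time. That follows from the code: `serial_gpu` creates a `device` but never passes it on, and `intensive_computation` calls `torch.rand(1000)` without a device argument, so the tensors live on the CPU and the GPU does no work. The timing therefore says nothing about GPU performance.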
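For a GPU timing to measure GPU work, the tensors have to be created on the device, and the clock should only be read after the device has finished, because CUDA kernels are launched asynchronously. The following is a minimal sketch under those assumptions; `intensive_computation_on` and `timed` are hypothetical names that do not appear in the notebook, and the code falls back to the CPU when no GPU is present.

```python
import time

import torch


def intensive_computation_on(size, device):
    """Sum of squared random values, with all tensors created on `device`."""
    result = torch.zeros((), device=device)
    for _ in range(size):
        result += torch.sum(torch.rand(1000, device=device) ** 2)
    return result


def timed(size, device):
    """Return (elapsed seconds, result as a Python float) for one run."""
    if device.type == "cuda":
        torch.cuda.synchronize(device)  # finish pending GPU work before timing
    start = time.perf_counter()
    result = intensive_computation_on(size, device)
    if device.type == "cuda":
        torch.cuda.synchronize(device)  # wait for the queued kernels to complete
    return time.perf_counter() - start, result.item()


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
seconds, total = timed(1_000, device)
print(f"{device.type}: {seconds:.4f} s")
```

Even with this change, a loop over 1000-element tensors launches a very large number of tiny kernels, so a large GPU advantage should not be assumed for this workload.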

**Parallel GPU Execution**: Only one GPU was available, so a single worker process was started, and it also runs `intensive_computation` on the CPU. The `nn.Linear` models moved to the GPU are never used. The 23.79 seconds measured here is effectively a third serial CPU measurement and shows no multi-GPU speedup.

## Conclusion

In this notebook, none of the four configurations produced a meaningful speedup for the given workload. The serial CPU (23.73 s), serial GPU (23.39 s), and parallel GPU (23.79 s) runs all finished within about 2% of one another because all three execute the same computation on the CPU, and the parallel CPU run (27.99 s) was the slowest. The measurements therefore do not support any conclusion about GPU versus CPU performance.
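The relative performance can be read directly from the recorded timings by dividing the serial CPU time by each configuration's time. A short sketch, using only the numbers printed in the output above (the variable names are illustrative):

```python
# Timings copied from the recorded output, in seconds.
timings = {
    "Serial CPU": 23.7251,
    "Parallel CPU": 27.9879,
    "Serial GPU": 23.3915,
    "Parallel GPU": 23.7863,
}

baseline = timings["Serial CPU"]
# Speedup relative to the serial CPU run; values below 1 mean slower.
speedups = {name: baseline / seconds for name, seconds in timings.items()}

for name, speedup in speedups.items():
    print(f"{name:<13} {speedup:.2f}x")
```

A value above 1 means the configuration was faster than the serial CPU baseline, and a value below 1 means it was slower.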

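A meaningful comparison would require creating the tensors on the GPU, synchronizing the device before reading the clock, and issuing fewer, larger operations, since a loop over 1000-element tensors is likely dominated by per-call overhead rather than arithmetic. Further tuning of the workload distribution can then be evaluated against specific computational requirements and hardware capabilities.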
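One possible optimization is to batch the work so that each PyTorch call processes many rows at once. The sketch below computes the same kind of sum of squares in blocks; `batched_computation` and its `batch` parameter are hypothetical and not part of the notebook, and because it changes the shape of the workload, its timings would not be directly comparable to the per-iteration loop.

```python
import torch


def batched_computation(size, device, batch=10_000):
    """Sum of squares over `size` rows of 1000 random values, `batch` rows at a time."""
    total = torch.zeros((), device=device, dtype=torch.float64)
    done = 0
    while done < size:
        rows = min(batch, size - done)
        # One large tensor replaces `rows` separate 1000-element tensors.
        x = torch.rand(rows, 1000, device=device)
        # Accumulate in float64 to limit rounding error over many values.
        total += (x ** 2).sum(dtype=torch.float64)
        done += rows
    return total.item()


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
value = batched_computation(2_500, device, batch=1_000)
print(f"{value:.1f}")
```

The batch size trades memory for fewer calls: a batch of 10,000 rows of 1000 `float32` values occupies about 40 MB.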