<h1>GPU test notebook using PyTorch and CUDA</h1>
This GPU test notebook is designed to showcase the output of a system equipped with a GPU and running the OPAL software.

<h3>What is PyTorch</h3>
PyTorch is an open-source machine learning framework built on Python that enables you to perform scientific and tensor computations. Adding a GPU (Graphics Processing Unit) can significantly speed up deep learning workloads in PyTorch. PyTorch is known for being flexible and easy to use. Here is what PyTorch offers:

- **Deep learning focus:** PyTorch is particularly well-suited for building deep learning models, a type of artificial intelligence inspired by the structure and function of the brain. These models are used in various applications like image recognition, natural language processing, and recommendation systems.
- **Pythonic nature:** Because its primary interface is Python, a widely used high-level programming language, PyTorch is considered beginner-friendly for those already familiar with Python. This can make it easier to learn and use than some other deep learning frameworks.
- **Dynamic computational graphs:** One of PyTorch's strengths is its use of dynamic computational graphs. These graphs define the relationships between different parts of a model and how data flows through it. PyTorch allows for these graphs to be modified on the fly, which is helpful for fast experimentation and prototyping.
- **GPU support:** For demanding computations, PyTorch has excellent support for graphics processing units (GPUs). GPUs can significantly accelerate training deep learning models.
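To make the idea of a dynamic computational graph concrete, here is a minimal sketch (the function `forward` and its inputs are illustrative, not part of this notebook). The number of operations recorded depends on the data flowing through the model, and autograd still computes the gradient through whatever graph was built on that run:

```python
import torch

def forward(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    # Ordinary Python control flow decides the graph at run time: how many
    # doublings get recorded depends on the values flowing through the model.
    y = x * w
    while y.norm() < 10:
        y = y * 2
    return y.sum()

x = torch.tensor([1.0, 2.0])
w = torch.tensor(0.5, requires_grad=True)

out = forward(x, w)  # this input triggers four doublings, so out = 16 * w * (1 + 2)
out.backward()       # autograd walks back through the graph recorded on this call

print(out.item(), w.grad.item())  # 24.0 48.0
```

A different input would take a different number of loop iterations, and therefore build a different graph, without any change to the code.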

PyTorch documentation: [PyTorch Documentation](https://pytorch.org/get-started/locally/)

<h3>What is CUDA</h3>
CUDA is a programming model and computing toolkit developed by NVIDIA.  It takes advantage of the power of GPUs for general computing tasks, especially computations that can be parallelized or broken down into many smaller tasks that can be done at the same time.

- **Parallel computing platform:** CUDA provides a way to write programs that run on many GPU cores simultaneously. This can produce a large speedup for tasks that can be broken down into many smaller pieces.

- **Programming model:** CUDA includes a set of tools and libraries that make it easier to write programs that run on GPUs. This includes extensions to C/C++ (CUDA C/C++) that let you write functions, called kernels, that run on the GPU.
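The "many smaller pieces" idea can be illustrated from Python without writing a CUDA kernel by hand. This sketch, an illustration rather than how CUDA itself is programmed, computes `2x + 1` two ways: one element at a time in a Python loop, and as a single tensor operation that PyTorch dispatches to a CUDA kernel when a GPU is present. It assumes PyTorch is installed and falls back to the CPU when no GPU is visible:

```python
import torch

# Use the GPU when one is visible; otherwise fall back to the CPU so the sketch still runs.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

n = 1_000
x = torch.arange(n, dtype=torch.float32)

# Sequential view: one element at a time, the way a single CPU thread would walk the data.
sequential = torch.empty(n)
for i in range(n):
    sequential[i] = x[i] * 2.0 + 1.0

# Data-parallel view: one call over the whole tensor. On a CUDA device, PyTorch
# launches a kernel in which many GPU threads each process part of the elements.
parallel = (x.to(device) * 2.0 + 1.0).cpu()

print(torch.equal(sequential, parallel))  # True: same result, different execution strategy
```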

<h3>CUDA can also be used for</h3>

- **Machine Learning:** Machine learning algorithms often involve a lot of mathematical computations that are well-suited to GPUs.
- **Scientific computing:** Simulations and other scientific calculations can be accelerated with CUDA.
- **Video editing and processing:** Many video editing and processing applications use CUDA to improve performance.
- **Cryptocurrency mining:** Some cryptocurrencies can be mined more efficiently using GPUs with CUDA.

CUDA documentation: [CUDA Toolkit Documentation](https://docs.nvidia.com/cuda/)

<h3>How do PyTorch and CUDA work together?</h3>

You can think of CUDA as the layer that unlocks the GPU's hardware acceleration for higher-level software. In this case, PyTorch is the user-friendly interface: it bridges the gap between your Python code and the underlying CUDA capabilities, and its ```torch.cuda``` module provides an API for managing devices and tensors (PyTorch's core data structure) on the GPU.

- **Tensor Placement:** You can create tensors on the GPU by passing the ```device='cuda'``` argument at creation time, or move existing tensors with the ```.to('cuda')``` method. ```'cuda'``` refers to the current CUDA device (GPU 0 by default), and PyTorch places the tensors on that device.
- **Computation Acceleration:** Once your tensors reside on the GPU, PyTorch leverages CUDA to perform computations in parallel across the GPU cores. This significantly accelerates operations compared to the CPU.
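- **Data Transfer:** Copying data between CPU and GPU memory carries overhead, and these copies mostly happen where your code asks for them (for example with ```.to()``` or ```.cpu()```). For best performance, move data to the GPU once and keep it there while you compute.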
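The three points above map onto a few lines of code. This is a minimal sketch, assuming only that PyTorch is installed; it uses the GPU when available and otherwise runs on the CPU so the same code works on any machine:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensor placement: create a tensor directly on the device, or move an existing CPU tensor.
a = torch.ones(3, 3, device=device)
b = torch.arange(9, dtype=torch.float32).reshape(3, 3).to(device)

# Computation acceleration: the matrix multiply runs on the device that holds both operands.
c = a @ b

# Data transfer: copy the result back to host memory once, at the end.
result = c.cpu()

print(c.device, result.device)
print(result)  # every row is [9., 12., 15.], the column sums of b
```

Both operands of an operation must live on the same device; mixing a CPU tensor and a CUDA tensor in one operation raises an error instead of silently copying.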

PyTorch CUDA documentation: [PyTorch documentation using CUDA](https://pytorch.org/docs/stable/cuda.html)

<h2>Getting started</h2>
First, check your GPU system with the NVIDIA System Management Interface (```nvidia-smi```). This command-line utility is intended to aid in the management and monitoring of NVIDIA GPU devices.

In [1]:
!nvidia-smi

Thu Mar 21 15:34:22 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.12             Driver Version: 535.104.12   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla V100-SXM2-16GB           Off | 00000000:00:1E.0 Off |                    0 |
| N/A   20C    P0              35W / 300W |      0MiB / 16384MiB |      1%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

This gives you a quick printout of important information about your GPU, including the driver and CUDA versions, the GPU model, its temperature and power draw, and memory usage.
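If you only need a few fields, `nvidia-smi` can also print them as CSV through its `--query-gpu` and `--format` options. Here is a minimal sketch that calls it from Python; the helper name `query_gpus` and its default field list are illustrative choices, and it returns `None` when the tool is missing or cannot reach a GPU:

```python
import shutil
import subprocess

def query_gpus(fields=("name", "memory.total", "driver_version")):
    """Return one CSV row per GPU for the requested fields, or None if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None  # NVIDIA driver tools are not installed on this machine
    cmd = ["nvidia-smi", "--query-gpu=" + ",".join(fields), "--format=csv,noheader"]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        return None  # nvidia-smi exists but could not communicate with a GPU
    return [line.strip() for line in proc.stdout.splitlines() if line.strip()]

print(query_gpus())
```

The output format suits scripts that log GPU state, since each GPU becomes one comma-separated line.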

<h4>Now let's use PyTorch to test CUDA</h4>

In [9]:
# import PyTorch; this is needed before calling any torch functions
import torch

In [10]:
# check whether a CUDA-capable GPU is available. If this call returns False, there is an issue with your GPU configuration
torch.cuda.is_available()

True

In [11]:
# get the index of the currently selected GPU
torch.cuda.current_device()

0

In [14]:
# get the number of GPUs available to PyTorch on this system
torch.cuda.device_count()

1

In [12]:
# get the name of the GPU
torch.cuda.get_device_name(0)

'Tesla V100-SXM2-16GB'

In [15]:
# release unused cached memory held by PyTorch's caching allocator so other GPU applications can use it; it does not free memory held by tensors that are still in use
torch.cuda.empty_cache()
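To see what `empty_cache()` actually changes, you can compare the memory occupied by live tensors with the memory PyTorch keeps reserved in its cache. This sketch uses `torch.cuda.memory_allocated()` and `torch.cuda.memory_reserved()`; the helper `memory_mib` and the 64 MiB test tensor are illustrative choices, and the exact numbers will vary by system:

```python
import torch

def memory_mib():
    """Return (allocated, reserved) GPU memory in MiB for the current device."""
    # memory_allocated: bytes occupied by live tensors
    # memory_reserved: bytes held by PyTorch's caching allocator, including cached but unused blocks
    return torch.cuda.memory_allocated() / 2**20, torch.cuda.memory_reserved() / 2**20

if torch.cuda.is_available():
    x = torch.empty(16, 1024, 1024, device="cuda")  # 16 * 1024 * 1024 float32 values = 64 MiB
    print("after allocation :", memory_mib())
    del x
    print("after del        :", memory_mib())  # allocated drops; the block typically stays reserved
    torch.cuda.empty_cache()
    print("after empty_cache:", memory_mib())  # unused cached blocks are returned to the driver
else:
    print("CUDA is not available; nothing to report")
```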

Now that we have some basic information about the GPU from PyTorch, we can run tests to measure the difference a GPU can make.

In [18]:
# this test pits the CPU against the GPU: each repeatedly adds a 400x400 tensor filled with ones (created with torch.ones) to itself

import torch
import time
 
###CPU
start_time = time.time()
a = torch.ones(400,400)
for _ in range(1000000):
    a += a
elapsed_time = time.time() - start_time
 
print('CPU time = ',elapsed_time)
 
###GPU
start_time = time.time()
b = torch.ones(400,400).cuda()
for _ in range(1000000):
    b += b
elapsed_time = time.time() - start_time
 
print('GPU time = ',elapsed_time)

CPU time =  12.13370656967163
GPU time =  4.866923570632935
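CUDA operations in PyTorch are asynchronous: a call like `b += b` queues a kernel and returns, so reading a wall-clock timer without waiting can under-report GPU time. The cell above also counts one-time CUDA setup and the host-to-device copy. A more careful version of the same benchmark is sketched below; the helper `time_add_loop` and its default of 10,000 iterations are illustrative choices, not from the original, and it synchronizes before reading the clock:

```python
import time
import torch

def time_add_loop(device, iters=10_000, size=400):
    """Time `iters` in-place additions of a size x size tensor to itself on `device`."""
    t = torch.ones(size, size, device=device)
    t += t  # warm-up: the first CUDA operation pays one-time setup costs
    if device.type == "cuda":
        torch.cuda.synchronize()  # make sure setup work has finished before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        # values overflow to inf after about 128 doublings, as in the original cell;
        # only the timing matters here
        t += t
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for all queued kernels to finish before stopping the clock
    return time.perf_counter() - start

cpu_s = time_add_loop(torch.device("cpu"))
print(f"CPU time = {cpu_s:.3f} s")
if torch.cuda.is_available():
    gpu_s = time_add_loop(torch.device("cuda"))
    print(f"GPU time = {gpu_s:.3f} s")
```

Note that a 400x400 tensor is small for a GPU, so per-kernel launch overhead can dominate; larger tensors usually show a wider gap between CPU and GPU.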


In [17]:
# this example can be run on the CPU or the GPU. It fits a third-order polynomial to sin(x) using manually computed gradients; torch.randn initializes the weights with random numbers from a standard normal distribution

import torch
import math


dtype = torch.float
# device = torch.device("cpu")  # uncomment this line (and comment out the next one) to run on the CPU
device = torch.device("cuda:0")  # run on the first GPU

# Create input data (2000 evenly spaced points from -pi to pi) and target output sin(x)
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

# Randomly initialize weights
a = torch.randn((), device=device, dtype=dtype)
b = torch.randn((), device=device, dtype=dtype)
c = torch.randn((), device=device, dtype=dtype)
d = torch.randn((), device=device, dtype=dtype)

learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute predicted y
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum().item()
    if t % 100 == 99:
        print(t, loss)

    # Backprop to compute gradients of a, b, c, d with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_a = grad_y_pred.sum()
    grad_b = (grad_y_pred * x).sum()
    grad_c = (grad_y_pred * x ** 2).sum()
    grad_d = (grad_y_pred * x ** 3).sum()

    # Update weights using gradient descent
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
    c -= learning_rate * grad_c
    d -= learning_rate * grad_d


print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')

99 366.05731201171875
199 247.44046020507812
299 168.3035888671875
399 115.47830200195312
499 80.19639587402344
599 56.618072509765625
699 40.85108947753906
799 30.30082130432129
899 23.23638916015625
999 18.50265121459961
1099 15.328434944152832
1199 13.1982421875
1299 11.767601013183594
1399 10.805967330932617
1499 10.159031867980957
1599 9.72340202331543
1699 9.429821968078613
1799 9.231765747070312
1899 9.098038673400879
1999 9.007640838623047
Result: y = -0.009008153341710567 + 0.8673025369644165 x + 0.0015540558379143476 x^2 + -0.09483269602060318 x^3
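The backprop lines in the cell above apply the chain rule by hand. With loss L = Σ(ŷ − y)² and ŷ = a + bx + cx² + dx³, the gradients are ∂L/∂a = Σ 2(ŷ − y), ∂L/∂b = Σ 2(ŷ − y)x, ∂L/∂c = Σ 2(ŷ − y)x², and ∂L/∂d = Σ 2(ŷ − y)x³. This sketch checks those formulas against PyTorch's autograd; the fixed weight values are arbitrary, chosen only to make the check deterministic:

```python
import math
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.linspace(-math.pi, math.pi, 2000, device=device)
y = torch.sin(x)

# Arbitrary fixed weights marked as leaves that autograd should track
a, b, c, d = (torch.tensor(v, device=device, requires_grad=True) for v in (0.1, 0.9, -0.05, -0.1))

y_pred = a + b * x + c * x ** 2 + d * x ** 3
loss = (y_pred - y).pow(2).sum()
loss.backward()  # autograd fills a.grad, b.grad, c.grad, d.grad

# Manual gradients, computed the same way as in the notebook cell above
with torch.no_grad():
    grad_y_pred = 2.0 * (y_pred - y)
    manual = [grad_y_pred.sum(), (grad_y_pred * x).sum(),
              (grad_y_pred * x ** 2).sum(), (grad_y_pred * x ** 3).sum()]

for name, p, g in zip("abcd", (a, b, c, d), manual):
    print(name, p.grad.item(), g.item(), torch.allclose(p.grad, g, rtol=1e-4, atol=1e-3))
```

In practice, letting autograd compute these gradients (via `loss.backward()`) removes the need to derive them by hand as the model grows.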
