<a href="https://colab.research.google.com/github/wbandabarragan/computational-physics-2/blob/main/unit-3/310_GPU_and_APIs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# GPUs and APIs

# TensorFlow and GPU Parallelization:

TensorFlow is designed to take advantage of GPUs (Graphics Processing Units) to accelerate machine learning computations, especially deep learning models that involve large matrix operations.

GPU parallelization capabilities and functions:

- TensorFlow allows you to explicitly specify which device (CPU or GPU) to use for a particular operation.

## GPU abstraction and parallelization:

- TensorFlow abstracts away the complexities of GPU programming, so you don't need to write low-level CUDA or OpenCL code.

- TensorFlow handles GPU memory management, kernel execution, and data transfer between CPU and GPU.

- Many TensorFlow operations (such as matrix multiplication, convolution, and activation functions) have GPU implementations and are automatically parallelized across the GPU's cores.

- TensorFlow's runtime optimizes these operations for efficient GPU execution and uses a memory allocator to allocate and reuse GPU memory.
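
As a minimal sketch of this automatic placement (assuming TensorFlow 2.x is installed; the matrix size and variable names are illustrative), the snippet below multiplies two matrices without naming any device and then inspects where the result lives:

```python
import tensorflow as tf

# Create two small matrices without naming a device.
x = tf.random.normal((256, 256))
y = tf.random.normal((256, 256))

# TensorFlow chooses the device; no CUDA code is written by the user.
z = tf.matmul(x, y)

# The `device` attribute reports where the result lives: a string ending
# in "GPU:0" on a GPU runtime, or in "CPU:0" when no GPU is visible.
print("Result placed on:", z.device)
print("Result shape:", z.shape.as_list())
```

On a machine without a GPU the same code runs unchanged on the CPU, which is the practical meaning of the abstraction described above.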

## Key Functions:

- **`tf.device()`:** Context manager that directs operations to a specific device (CPU or GPU). If you don't specify a device, TensorFlow places operations on an available GPU automatically whenever the operation has a GPU implementation.

- **`tf.config.list_physical_devices('GPU')`:** This function returns a list of all physical GPU devices that are available to TensorFlow. You can use this to check if TensorFlow is detecting your GPUs.

- **`tf.device('/GPU:0')`:** This context manager allows you to explicitly place operations on a specific GPU. `/GPU:0` refers to the first GPU, `/GPU:1` to the second, and so on.
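
The two functions are often combined. The following sketch (assuming TensorFlow 2.x; the fallback to `/CPU:0` and the variable names are illustrative choices, not part of the original notebook) requests the first GPU when one is detected and the CPU otherwise:

```python
import tensorflow as tf

# Pick the first GPU if TensorFlow detects one, otherwise use the CPU.
gpus = tf.config.list_physical_devices('GPU')
device_name = '/GPU:0' if gpus else '/CPU:0'

# Operations created inside the context are placed on the requested device.
with tf.device(device_name):
    v = tf.constant([1.0, 2.0, 3.0])
    total = tf.reduce_sum(v * v)  # 1 + 4 + 9

print("Requested device:", device_name)
print("Sum of squares:", float(total))
```

Selecting the device name at run time keeps the same script usable on a laptop without a GPU and on a GPU node.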

    
## CUDA Integration:

- TensorFlow relies on NVIDIA's CUDA and cuDNN libraries for GPU acceleration.

- These libraries provide highly optimized routines for deep learning operations.
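
To see which CUDA and cuDNN versions a given TensorFlow build was compiled against, one option is `tf.sysconfig.get_build_info()`. This is a sketch assuming TensorFlow 2.3 or newer; on CPU-only builds the CUDA-related keys may be absent, so the code falls back to a placeholder string:

```python
import tensorflow as tf

# Dictionary describing how this TensorFlow binary was built.
build = tf.sysconfig.get_build_info()

print("Built with CUDA:", tf.test.is_built_with_cuda())
# The keys below are only expected in CUDA-enabled builds.
print("CUDA version: ", build.get("cuda_version", "not a CUDA build"))
print("cuDNN version:", build.get("cudnn_version", "not a CUDA build"))
```

These are the versions TensorFlow was built for; the driver on the machine still has to support them for the GPU to be usable.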


## Conda installation on a laptop or HPC with GPUs:

```
conda install tensorflow-gpu
```

## Use on Google Colab:

1. In the Runtime menu, choose "Change runtime type" and select the T4 GPU.


2. Import tensorflow:
```Python
import tensorflow as tf
```

In [4]:
!nvidia-smi

Tue Apr 15 18:10:51 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   38C    P8             11W /   70W |       2MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

### GPU cores:

- **CUDA Cores:** The NVIDIA Tesla T4 GPU has 2560 CUDA cores. These are general-purpose parallel processing units.

- **Tensor Cores:** The Tesla T4 also features 320 Tensor Cores. These are specialized cores designed to accelerate matrix multiplications, which are fundamental for deep learning workloads.

In [7]:
import tensorflow as tf


In [8]:
# Number of GPUs detected
gpus = tf.config.list_physical_devices('GPU')
print("Num GPUs Available:", len(gpus))

Num GPUs Available: 1


While `tf.config.list_physical_devices('GPU')` reports a single physical GPU device, that device (the Tesla T4) contains 2560 CUDA cores and 320 Tensor Cores that TensorFlow can use for parallel computation.

In [9]:
# Details of each GPU
for gpu in gpus:
    print("GPU Name:", gpu.name, " Type:", gpu.device_type)

GPU Name: /physical_device:GPU:0  Type: GPU


In [10]:
# Boolean indicating GPU availability

gpu_available = tf.config.list_physical_devices('GPU')
print("GPU Available:", bool(gpu_available))

GPU Available: True


In [11]:
# Whether this TensorFlow build was compiled with CUDA support
# (this does not by itself mean that a GPU is present)
built_with_cuda = tf.test.is_built_with_cuda()
print("Built with CUDA:", built_with_cuda)

Built with CUDA: True


## Usage example:

In [12]:
import time

In [29]:
# Define the size of the matrices
n_points = 4096  # A relatively large matrix

# Create two random matrices
a = tf.random.normal((n_points, n_points))
b = tf.random.normal((n_points, n_points))

print(type(a))

with tf.device('/GPU:0'):
    # Start time stamp
    start_gpu = time.time()

    # Matrix Multiplication
    c_gpu = tf.matmul(a, b)

    # End time stamp
    end_gpu = time.time()

    # Execution time
    gpu_time = end_gpu - start_gpu

    print(f"Matrix multiplication on GPU took: {gpu_time:.4f} seconds")

<class 'tensorflow.python.framework.ops.EagerTensor'>
Matrix multiplication on GPU took: 0.0004 seconds


### Matrix multiplication on GPU:

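- When the `tf.matmul(a, b)` operation is placed on the GPU, the backend automatically distributes the many independent multiply-add operations of the matrix product across the CUDA cores of the GPU.

- Each core computes part of the result in parallel, which can significantly reduce the total execution time compared with running the same product on the CPU.

- GPU kernels are launched asynchronously: `tf.matmul` can return before the product has been computed, so a timer stopped right after the call mostly measures the time needed to enqueue the operation. Reading the result back (for example with `c_gpu.numpy()`) before stopping the timer forces the computation to finish.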
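
Timing a GPU operation reliably requires waiting for the result before stopping the clock. The helper below is a sketch of one way to do this, not the notebook's original method: `timed_matmul`, `a_small`, `b_small`, and the reduced size `n = 1024` are hypothetical names and values chosen to keep the run short, and the device falls back to the CPU when no GPU is present.

```python
import time
import tensorflow as tf

def timed_matmul(a, b, device_name, repeats=3):
    """Return the best wall-clock time (s) of tf.matmul(a, b) on device_name."""
    with tf.device(device_name):
        # Warm-up run: keeps one-time library initialization out of the timing.
        tf.matmul(a, b).numpy()
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            c = tf.matmul(a, b)
            # Reading the result back blocks until the computation has finished.
            _ = c.numpy()
            best = min(best, time.perf_counter() - start)
    return best

n = 1024  # smaller than the 4096 used above, to keep the sketch quick
a_small = tf.random.normal((n, n))
b_small = tf.random.normal((n, n))

device_name = '/GPU:0' if tf.config.list_physical_devices('GPU') else '/CPU:0'
elapsed = timed_matmul(a_small, b_small, device_name)
print(f"Synchronized matmul on {device_name}: {elapsed:.4f} s")
```

Note that `c.numpy()` also copies the result to host memory, so the measured time includes that transfer. Reducing the result to a scalar on the device first (for example `tf.reduce_sum(c).numpy()`) is an alternative that keeps the transfer small.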

In [30]:
with tf.device('/GPU:0'):
    # Start time stamp
    start_gpu = time.time()

    # Matrix Multiplication
    c_gpu = tf.matmul(a, b)

    # End time stamp
    end_gpu = time.time()

    # Execution time
    gpu_time = end_gpu - start_gpu

    print(f"Matrix multiplication on GPU took: {gpu_time:.4f} seconds")


Matrix multiplication on GPU took: 0.0006 seconds


### Matrix multiplication on CPU:

In [31]:
# Start time stamp
start_cpu = time.time()

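# Matrix multiplication with NumPy arrays as inputs.
# Caution: .numpy() copies the tensors to host memory, but tf.matmul converts
# them back to tensors and, when a GPU is visible, may still run the product
# on the GPU. To force CPU execution, wrap the call in
# `with tf.device('/CPU:0'):` or use the pure NumPy product commented out below.
c_cpu = tf.matmul(a.numpy(), b.numpy())
#c_cpu = a.numpy() @ b.numpy()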

# End time stamp
end_cpu = time.time()

# Execution time
cpu_time = end_cpu - start_cpu

print(f"Matrix multiplication on CPU took: {cpu_time:.4f} seconds")

Matrix multiplication on CPU took: 0.3267 seconds
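
When a GPU is visible, TensorFlow may place `tf.matmul` on it even if the inputs are NumPy arrays, so a CPU measurement is only meaningful if the placement is explicit. The sketch below (illustrative sizes and names) pins the product to the CPU and confirms where it ran:

```python
import tensorflow as tf

x = tf.random.normal((512, 512))
y = tf.random.normal((512, 512))

# Pin the product to the CPU regardless of whether a GPU is available.
with tf.device('/CPU:0'):
    c_on_cpu = tf.matmul(x, y)

# The device string ends in "CPU:0" when the product was computed on the CPU.
print("Computed on:", c_on_cpu.device)
```

Checking the `device` attribute of the result is a quick way to verify that a "CPU" timing really was taken on the CPU.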


### Speedup:

In [32]:
# Ratio of execution times
speedup = cpu_time / gpu_time
print(f"Speedup (CPU/GPU): {speedup:.2f}x")

Speedup (CPU/GPU): 577.17x
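
A rough operation count helps judge whether measured times are physically plausible. The sketch below estimates a lower bound for the GPU time of the 4096 x 4096 product. The peak rate is an assumption based on the published Tesla T4 figures (2560 CUDA cores, one fused multiply-add per core per cycle, a boost clock of about 1.59 GHz), not a measurement.

```python
# Floating-point operations in an n x n by n x n matrix product:
# about n multiplications and n additions for each of the n*n outputs.
n_points = 4096
flops = 2 * n_points**3

# Assumed FP32 peak of the Tesla T4: 2560 CUDA cores, one fused
# multiply-add (2 operations) per core per cycle, ~1.59 GHz boost clock.
peak_flops = 2560 * 2 * 1.59e9

min_gpu_time = flops / peak_flops
print(f"Operations in the product: {flops:.3e}")
print(f"Assumed FP32 peak:         {peak_flops / 1e12:.2f} TFLOP/s")
print(f"Lower bound on GPU time:   {min_gpu_time * 1e3:.1f} ms")

# Throughput implied by the 0.0006 s reported above for the GPU cell.
reported_gpu_time = 0.0006
implied = flops / reported_gpu_time
print(f"Implied rate at 0.0006 s:  {implied / 1e12:.0f} TFLOP/s")
```

Under these assumptions the product needs roughly 17 ms at best, while a time of 0.0006 s would imply a rate far above the assumed peak. This suggests that the GPU timings above captured the asynchronous launch rather than the full computation, so the reported speedup should be read as an upper bound, not as a measured ratio of compute times.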


## Running on GPUs on the HPC-Cedia cluster:

- The NVIDIA A100 GPU has 6912 CUDA cores.

- It also features 432 third-generation Tensor Cores, which are specialized units designed to accelerate matrix multiplications and deep learning applications.


### Install:

- Activate your environment or create one for GPUs specifically:

```
conda activate py39
```

- Install tensorflow with GPU support:

```
conda install tensorflow-gpu
```

- Request resources interactively:

```
salloc -p gpu -n 1 -c 16  --mem=1GB --gres=gpu:a100_2g.10gb:1 --time=00:30:00
```
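
Once the allocation is granted, a short check can confirm what the job received. This is a sketch under the assumption that the cluster's Slurm configuration exports the usual environment variables; which of them are set depends on the site.

```python
import os

# Variables that Slurm commonly exports inside a job. Whether each one is
# set depends on the request and on the site configuration (assumption).
names = ("SLURM_JOB_ID", "SLURM_CPUS_PER_TASK", "CUDA_VISIBLE_DEVICES")
job_env = {name: os.environ.get(name, "<not set>") for name in names}

for name, value in job_env.items():
    print(f"{name} = {value}")
```

If `CUDA_VISIBLE_DEVICES` is not set inside the allocation, TensorFlow is unlikely to find a GPU, and `tf.config.list_physical_devices('GPU')` can be used to confirm this.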

# (Optional) Application Programming Interfaces (APIs)


- An API is a set of rules and protocols that allows different software applications to communicate and exchange data with each other.

- An API lists the operations that are available and how to request them, without requiring any knowledge of how the service works internally.

The Python C API is a set of C functions, types, and macros (declared in `Python.h`) that lets C code define modules and functions that Python can import and call. It acts as a simple interface between the Python interpreter and the underlying C logic.
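
The standard library already contains modules built this way. As a small illustration (assuming the CPython interpreter, where the `math` module is written in C), the caller uses `math.sqrt` like any Python function without seeing the C implementation behind it:

```python
import math

# In CPython, math.sqrt is a C function exposed to Python through the C API.
kind = type(math.sqrt).__name__
root = math.sqrt(16.0)

print("Type of math.sqrt:", kind)
print("math.sqrt(16.0) =", root)
```

The type name `builtin_function_or_method` is how CPython labels functions implemented in C, which is also what the function built in the example below will report.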


## Example:

Print a greeting from C, called from Python.


### 1. Basic C module with Python C APIs:

```bash
mkdir C_API_example && cd C_API_example

vim hola_modulo.c
```


```C
// Includes the Python.h header file, which provides the Python C API.
#include <Python.h>

// Includes the standard input/output library for C.
#include <stdio.h>

// C function exposed to Python. 'self' is the module object; 'args' is unused
// because the function is registered with METH_NOARGS.
static PyObject* py_hello(PyObject *self, PyObject *args) {

    // Print a message:
    printf("Hola Mundo desde el lenguaje C!\n");

    // No value to be returned. Returns a 'None' object.
    Py_RETURN_NONE;
}

// Method table for the module, mapping Python function names to C functions.
static PyMethodDef HolaMetodos[] = {
    // "hola"      -> name of the function as seen from Python.
    // py_hello    -> pointer to the C function that implements it.
    // METH_NOARGS -> the function takes no arguments from Python.
    {"hola",  py_hello, METH_NOARGS, "Print 'Hola Mundo desde el lenguaje C!'."},
    // Sentinel marking the end of the array of method definitions.
    {NULL, NULL, 0, NULL}
};

// Module structure: provides metadata about the Python module.
static struct PyModuleDef hola_modulo = {
    // Internal members of the module definition structure.
    PyModuleDef_HEAD_INIT,

    // Name of the module
    "hola_modulo",

    // Module documentation, in this case NULL
    NULL,
    // -1 so the module keeps state in global variables
    -1,
    // Pointer to the PyMethodDef table defined earlier
    HolaMetodos
};

// Module initialization function, called by Python when the module is imported.
// Its name must be PyInit_<module name>.
PyMODINIT_FUNC PyInit_hola_modulo(void) {
    // Creates and returns the Python module object based on the definition in 'hola_modulo'.
    return PyModule_Create(&hola_modulo);
}
```


### 2. Setup script:

```bash
vim setup.py
```

```Python
# import tools
from setuptools import setup, Extension

# For compilation
module = Extension('hola_modulo', sources=['hola_modulo.c'])

# Setup
setup(
    name='HolaMundoCModule',
    version='0.1.0',
    ext_modules=[module]
)
```

### 3. Build interface (inplace for development):

```bash
python setup.py build_ext --inplace
```
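
The in-place build writes a shared library next to the source file, with a file name suffix that depends on the interpreter and platform. The sketch below lists the suffixes the current interpreter accepts and looks for a built `hola_modulo` file in the working directory (running it from the build directory is an assumption):

```python
import importlib.machinery
import pathlib

# File name suffixes this interpreter accepts for compiled extension modules.
suffixes = importlib.machinery.EXTENSION_SUFFIXES
print("Extension suffixes for this interpreter:", suffixes)

# List any built copies of the module in the current directory.
built = [p.name for p in pathlib.Path(".").glob("hola_modulo*")
         if any(p.name.endswith(s) for s in suffixes)]
print("Built extension files:", built or "none found")
```

A module built with one Python version is generally not importable from another, which is why the suffix usually encodes the interpreter version.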

### 4. Test:

```Bash
python
Python 3.9.15 | packaged by conda-forge | (main, Nov 22 2022, 08:52:10)
[Clang 14.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import hola_modulo
>>> hola_modulo.hola()
Hola Mundo desde el lenguaje C!
```
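
The same check can be scripted. This is a minimal sketch that assumes it is run from the directory containing the built module; if the module is not found, it reports that instead of raising an error.

```python
import importlib
import importlib.util

if importlib.util.find_spec("hola_modulo") is None:
    status = "hola_modulo not found: build it first and run from the build directory"
else:
    hola_modulo = importlib.import_module("hola_modulo")
    # The call prints the greeting from C and returns None (Py_RETURN_NONE).
    result = hola_modulo.hola()
    status = f"hola() returned {result!r}"

print(status)
```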