# HPC as a solution for AI: PyTorch

<p style='text-align: justify;'>
This section shows how to optimize PyTorch models by using GPUs to accelerate training and execution.
</p>    

The principal goals are:
* **Use** the PyTorch library on GPU environments for the first time to accelerate the training of image classification models,
* **Familiarize** yourself with the CIFAR-10 and CIFAR-100 datasets by classifying their various classes,
* **Evaluate** and **Compare** the performance of your models on GPU and CPU environments to understand the benefits of GPU acceleration in AI tasks.

## The problem: Resource-intensive training and model scalability

<p style='text-align: justify;'>
As AI research progresses, deep neural networks have become critical for tasks like image generation and language translation. However, resource-intensive training challenges arise as networks become more complex and demanding in performance.
</p>

<p style='text-align: justify;'>
Research and development in artificial intelligence have made remarkable strides in recent decades, mainly driven by deep neural networks. These networks are computational structures loosely inspired by the functioning of the human brain. They are particularly well-suited for tasks that involve large volumes of data, such as pattern recognition in images, natural language processing, and more.
</p>

<p style='text-align: justify;'>
However, as the problems being addressed become more complex and performance demands increase, the need for computational resources grows rapidly. The scalability of these models also becomes a concern as they grow in size and complexity: maintaining and optimizing constantly expanding AI models is a challenge for the research and development community.
</p>

## The solution: GPUs and PyTorch

<p style='text-align: justify;'>
PyTorch, a popular machine learning and AI framework, offers a flexible interface for designing, training, and evaluating neural networks. It can offload these computations to GPUs, whose parallel processing power shortens training considerably.
</p>
<p style='text-align: justify;'>
Furthermore, Intel® PyTorch is well equipped to fully utilize the optimizations and hardware support of Intel® processors and GPUs. This synergy results in an even more efficient and performance-oriented machine-learning experience. It enables practitioners to extract maximum computational throughput from their hardware infrastructure.
</p>

## ☆ Challenge: Zoo breakout! ☆

<p style='text-align: justify;'> 
    Recently, an unexpected incident occurred at the local zoo, <b>Orange Grove Zoo</b>: all the animals escaped from their enclosures and are now roaming freely. To deal with this situation, we need your help locating and classifying the escaped animals, distinguishing each animal class, and identifying possible vehicles in the same environment.
</p>
<p style='text-align: justify;'> 
You have been assigned to develop a computer vision system capable of identifying and classifying the escaped animals and detecting the presence of vehicles in the images. We will use the CIFAR-10 dataset and the PyTorch library to train a deep-learning model for this challenge.
</p>
The CIFAR-10 and CIFAR-100 datasets are collections of $32$x$32$ pixel color images grouped into $10$ and $100$ distinct classes, respectively.

- [CIFAR-10 Dataset](https://www.cs.toronto.edu/~kriz/cifar.html): CIFAR-10 consists of $60,000$ images, each belonging to one of the ten classes: airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. This dataset offers a diverse set of images representing everyday objects.

- [CIFAR-100 Dataset](https://www.cs.toronto.edu/~kriz/cifar.html): CIFAR-100 expands upon the CIFAR-10 concept, containing 60,000 images as well. However, it introduces a more challenging task by categorizing images into 100 classes. These classes include various subcategories such as fruits, animals, vehicles, and more.

a) **Create** a deep neural network model with the PyTorch library to classify animals and vehicles in a GPU environment using the CIFAR-10 dataset.

b) **Conduct** a comparative analysis between models trained on a CPU and GPU to highlight disparities in results.

c) Now, use the CIFAR-100 dataset for the classification of animals and vehicles on a GPU. Would it be a good decision to use a GPU or CPU environment for the training process?

### ☆  Solution for CIFAR-10 using PyTorch on a GPU  ☆

#### ⊗ Importing packages

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import time

#### ⊗ Define processing device

In [None]:
device = torch.device("cuda:0")
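
The cell above assumes that a CUDA-capable GPU is present. A minimal sketch of a more defensive variant, which falls back to the CPU when PyTorch cannot see a GPU, might look like the following. The name `fallback_device` is illustrative and not part of the original notebook.

```python
import torch

# Illustrative name (an assumption); the notebook itself uses `device`.
fallback_device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Selected device: {fallback_device}")

if fallback_device.type == "cuda":
    # Report which GPU PyTorch picked up.
    print(torch.cuda.get_device_name(fallback_device))
```

With a fallback like this the notebook still runs on a machine without a GPU, but the timings reported later would then describe CPU training.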

#### ⊗ Transformations to the data

As part of the data preparation process, we create a ```transforms``` object to apply specific transformations to the data. These transformations are commonly applied to training datasets to increase data diversity and to prepare images for a deep learning model, such as a convolutional neural network (CNN). The pipeline below randomly flips each image horizontally, takes a random $32$x$32$ crop after padding the image by $4$ pixels, converts the image to a tensor, and normalizes each channel with a mean and standard deviation of $0.5$:

In [None]:
transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
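
The arithmetic behind two of these steps can be checked by hand. The sketch below is plain Python and assumes the documented behavior of the transforms: `ToTensor` scales 8-bit pixel values to the range [0, 1], `Normalize` computes `(x - mean) / std` per channel, and `RandomCrop(32, padding=4)` pads every side by 4 pixels before cropping. The helper name `normalize` is hypothetical.

```python
# Plain-Python arithmetic only; no PyTorch is needed for this check.
def normalize(value, mean=0.5, std=0.5):
    """Per-channel formula applied by Normalize: (value - mean) / std."""
    return (value - mean) / std

# ToTensor scales 8-bit pixel values from [0, 255] to [0.0, 1.0].
darkest = normalize(0 / 255)
brightest = normalize(255 / 255)
print(darkest, brightest)

# RandomCrop(32, padding=4) pads 4 pixels on every side before cropping.
padded_side = 32 + 2 * 4
print(padded_side)
```

With a mean and standard deviation of 0.5, pixel values end up in the range [-1, 1], and each random crop is a 32x32 window taken from a 40x40 padded image.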

Following that, download the CIFAR-10 dataset and load it into the code. Define the neural network as we have done in previous notebooks, and remember to move this network instance to the previously defined device.

In [None]:
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True, num_workers=4)

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(128 * 8 * 8, 512)
        self.fc2 = nn.Linear(512, 10)

    def forward(self, x):
        x = torch.relu(self.conv1(x))
        x = torch.max_pool2d(x, 2)
        x = torch.relu(self.conv2(x))
        x = torch.max_pool2d(x, 2)
        x = x.view(-1, 128 * 8 * 8)
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

net = Net()
net.to(device)
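
The input size of `fc1`, `128 * 8 * 8`, follows from how each layer changes the spatial size of a 32x32 image. The sketch below traces that size using the standard output-size formulas for convolution and pooling; the helper names `conv2d_out` and `max_pool_out` are hypothetical and only illustrate the arithmetic.

```python
def conv2d_out(size, kernel_size, padding=0, stride=1):
    """Spatial output size of a convolution (standard formula, dilation 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1

def max_pool_out(size, kernel_size):
    """Spatial output size of max pooling with stride equal to the kernel size."""
    return (size - kernel_size) // kernel_size + 1

side = 32                                            # CIFAR images are 32x32
side = conv2d_out(side, kernel_size=3, padding=1)    # conv1 keeps 32
side = max_pool_out(side, 2)                         # first pooling halves to 16
side = conv2d_out(side, kernel_size=3, padding=1)    # conv2 keeps 16
side = max_pool_out(side, 2)                         # second pooling halves to 8

flattened = 128 * side * side                        # 128 channels from conv2
print(side, flattened)
```

If the kernel size, padding, or number of pooling steps changes, the first argument of `fc1` and the `view` call in `forward` have to change accordingly.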

#### ⊗ Training the network

In [None]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)

gpu_start_time = time.time()

for epoch in range(10):  
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f'Epoch {epoch+1}, Loss: {running_loss / len(trainloader)}')

gpu_end_time = time.time()

print(f'GPU Training time: {gpu_end_time - gpu_start_time:.2f} seconds')

torch.save(net.state_dict(), 'cifar10_gpu_model.pth')
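
CUDA operations run asynchronously, so wall-clock measurements are only meaningful once the GPU has finished its queued work. In the loop above, `loss.item()` forces that synchronization on every iteration. A minimal sketch that makes the synchronization explicit, and that can be reused for both the CPU and the GPU measurement, is shown below. The helper `timed_steps`, the small stand-in model, and the synthetic data are assumptions for illustration, not the notebook's `Net` or the CIFAR-10 loader.

```python
import time
import torch
import torch.nn as nn

def timed_steps(device, steps=3, batch_size=128):
    """Time a few training steps on synthetic CIFAR-shaped data (hypothetical helper)."""
    # Small stand-in model; the notebook's Net could be used instead.
    model = nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.Flatten(),
        nn.Linear(8 * 32 * 32, 10),
    ).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    # Random tensors with the shape of a CIFAR batch: (N, 3, 32, 32).
    inputs = torch.randn(batch_size, 3, 32, 32, device=device)
    labels = torch.randint(0, 10, (batch_size,), device=device)

    if device.type == "cuda":
        torch.cuda.synchronize(device)  # finish pending work before starting the clock
    start = time.perf_counter()
    for _ in range(steps):
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()
    if device.type == "cuda":
        torch.cuda.synchronize(device)  # wait for the GPU before stopping the clock
    return time.perf_counter() - start

print(f"CPU: {timed_steps(torch.device('cpu')):.4f} seconds")
if torch.cuda.is_available():
    print(f"GPU: {timed_steps(torch.device('cuda:0')):.4f} seconds")
```

For such a tiny model and so few steps the GPU may not be faster, since launch and transfer overheads dominate. The benefit shows up with larger models and full epochs.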

### Comments about the results

<p style='text-align: justify;'>
We explored training neural networks with PyTorch, comparing CPU and GPU performance on the CIFAR-10 dataset. In the CPU environment, training took approximately <b>811 seconds (13.52 minutes)</b>. In the GPU environment, the same training took approximately <b>191 seconds (3.2 minutes)</b>. The GPU therefore achieved a <b>speedup of roughly 4.2x</b> over the CPU when training for 10 epochs. Thanks to its parallel computing capabilities, the GPU substantially increased training speed, which is particularly advantageous when handling large datasets and complex models in deep learning.
</p>
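
The speedup follows directly from the two reported times. A short sketch of the arithmetic, using only the figures quoted above:

```python
cpu_seconds = 811.0  # CPU training time reported above
gpu_seconds = 191.0  # GPU training time reported above

speedup = cpu_seconds / gpu_seconds          # how many times faster the GPU run was
time_saved = 1 - gpu_seconds / cpu_seconds   # fraction of the CPU time that was saved

print(f"Speedup: {speedup:.2f}x, time saved: {time_saved:.0%}")
```

These numbers depend on the specific CPU, GPU, batch size, and data-loading settings, so a different machine will give different ratios.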

## Summary
In this notebook we have shown how to:

- Install and use PyTorch in GPU environments,
- Run comparative performance tests between CPU and GPU for model training.

## Clear the memory

Before moving on, please execute the following cell to shut down the kernel and free the memory it holds. This is required before moving on to the next notebook.

In [None]:
import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(True)

## Next

Congratulations on finishing the HPC simulations topic! You have completed the last of the learning objectives for this part of the course. As a final exercise, complete the applied problem in the assessment in [_04-hpc-simulations-assessment.ipynb_](04-hpc-simulations-assessment.ipynb).