## Import Libraries
torch - the deep learning framework<br>
torch.nn.functional - functional API for operations such as loss functions (imported as `F`)<br>
Dataset - stores the samples and their corresponding labels<br>
DataLoader - wraps an iterable around the Dataset, giving easy access to the samples in batches<br>
MyTrainDataset - custom dataset class, defined in the next section<br>

## Make a custom dataset class

A map-style dataset represents a map from keys (here, integer indices) to data samples.
Subclasses of `Dataset` override `__getitem__`, which fetches the data sample for a given key.
They usually also implement `__len__`, which returns the total number of samples.
    

In [10]:
import torch
from torch.utils.data import Dataset

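class MyTrainDataset(Dataset):
    """Toy dataset of random (features, target) pairs."""

    def __init__(self, size):
        self.size = size
        # Each sample is a 20-dimensional feature vector and a 1-dimensional target.
        self.data = [(torch.rand(20), torch.rand(1)) for _ in range(size)]

    def __len__(self):
        # Total number of samples in the dataset.
        return self.size

    def __getitem__(self, index):
        # Return the (features, target) pair stored at the given index.
        return self.data[index]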
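
As a quick check of how the two methods are used, the sketch below indexes a small dataset and asks for its length. `ToyDataset` is a hypothetical stand-in that mirrors `MyTrainDataset`, so that the example runs on its own.

```python
import torch
from torch.utils.data import Dataset


class ToyDataset(Dataset):
    """Hypothetical stand-in that mirrors MyTrainDataset."""

    def __init__(self, size):
        self.data = [(torch.rand(20), torch.rand(1)) for _ in range(size)]

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        return self.data[index]


toy = ToyDataset(8)
features, label = toy[0]   # indexing calls __getitem__
num_samples = len(toy)     # len() calls __len__
print(num_samples, features.shape, label.shape)
```

A `DataLoader` relies on exactly these two methods to pick indices and assemble batches.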

In [11]:
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader

### Class Trainer


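We create a `Trainer` class to train our network. <br>
Its constructor arguments, which are stored as instance attributes, are:<br>
gpu_id: index of the GPU to train on (for example, 0 for cuda:0)<br>
model: the model to train; it is moved to the GPU given by gpu_id<br>
train_data: DataLoader that yields the training batches<br>
optimizer: optimizer that updates the model parameters from their gradients<br>
save_every: number of epochs between checkpoint saves<br>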

### Class methods
#### _run_batch
`_run_batch(self, source, targets)` runs one optimization step on a single batch of inputs (`source`) and labels (`targets`).<br>
self.optimizer.zero_grad(): resets the gradients of all model parameters, so that gradients from the previous batch do not accumulate; this must happen before loss.backward()<br>
self.model(source): runs the forward pass and returns the model output<br>
F.cross_entropy(output, targets): computes the cross-entropy loss between the output and the targets<br>
loss.backward(): performs backpropagation, computing the gradient of the loss with respect to each model parameter (weights and biases) and storing it on that parameter<br>
self.optimizer.step(): uses the stored gradients to update the model parameters<br>
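
The four calls above can be tried in isolation. The following is a minimal sketch, not the `Trainer` code itself: it runs on the CPU and, as an assumption made for this example, swaps in a mean-squared-error loss because the toy targets are continuous values rather than class labels.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Linear(20, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

source = torch.rand(32, 20)   # one batch of 32 samples
targets = torch.rand(32, 1)

weights_before = model.weight.detach().clone()

optimizer.zero_grad()                 # clear gradients from any previous step
output = model(source)                # forward pass
loss = F.mse_loss(output, targets)    # assumption: MSE instead of cross-entropy
loss.backward()                       # store a gradient on every parameter
optimizer.step()                      # apply the SGD update

weights_changed = not torch.equal(weights_before, model.weight)
print(f"loss={loss.item():.4f}, weights changed: {weights_changed}")
```

Comparing the weights before and after the step is a simple way to confirm that the update was applied.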

#### _run_epoch
`_run_epoch(self, epoch)` trains the model for one full pass over the training data.<br>
epoch: the current epoch number<br>
b_sz: the batch size, read from the first batch of self.train_data and used only for logging<br>
for source, targets in self.train_data: loops over self.train_data, which yields one batch of data per iteration<br>
source = source.to(self.gpu_id): moves the source tensor to the GPU given by self.gpu_id<br>
targets = targets.to(self.gpu_id): moves the targets tensor to the same GPU<br>
self._run_batch(source, targets): calls the _run_batch method on the batch<br>
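
The number of steps that `_run_epoch` logs follows from the dataset size and the batch size. A small sketch of that arithmetic, using the values from this notebook and assuming the `DataLoader` default `drop_last=False`:

```python
import math

dataset_size = 2048   # samples created by MyTrainDataset(2048)
batch_size = 32       # value passed to main() in this notebook

# With drop_last=False, a final partial batch would still count as a step.
steps_per_epoch = math.ceil(dataset_size / batch_size)
print(steps_per_epoch)  # 64
```

This matches the `Steps: 64` value printed in the training log.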


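#### _save_checkpoint
`_save_checkpoint(self, epoch)` saves the model's state_dict to checkpoint.pt with torch.save, overwriting any earlier checkpoint. It does not load a checkpoint.<br>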
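
Restoring a model from such a checkpoint is not part of the `Trainer` class. A minimal sketch of the save and reload round trip could look like this; the temporary directory is an assumption made so that the example leaves no files behind.

```python
import os
import tempfile

import torch

model = torch.nn.Linear(20, 1)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "checkpoint.pt")  # temporary path, for illustration only
    torch.save(model.state_dict(), path)           # same call pattern as _save_checkpoint

    restored = torch.nn.Linear(20, 1)              # must match the saved architecture
    state = torch.load(path, map_location="cpu")   # load the tensors onto the CPU
    restored.load_state_dict(state)

weights_match = torch.equal(model.weight, restored.weight)
print(weights_match)
```

The restored model has to be built with the same architecture as the saved one, because a state_dict holds only parameter tensors and not the model definition.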

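#### train
`train(self, max_epochs)` runs the training loop: it calls _run_epoch once per epoch and calls _save_checkpoint whenever the epoch number is a multiple of save_every.<br>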
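
To see which epochs trigger a checkpoint, the condition `epoch % self.save_every == 0` can be evaluated on its own. The sketch below uses the values passed to `main` later in this notebook.

```python
max_epochs = 50
save_every = 10

# Epochs for which `epoch % save_every == 0` holds.
checkpoint_epochs = [epoch for epoch in range(max_epochs) if epoch % save_every == 0]
print(checkpoint_epochs)  # [0, 10, 20, 30, 40]
```

Epoch 0 always triggers a save, and with these values the final epoch (49) does not.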
    

In [12]:
class Trainer:
    def __init__(
        self,
        model: torch.nn.Module,
        train_data: DataLoader,
        optimizer: torch.optim.Optimizer,
        gpu_id: int,
        save_every: int, 
    ) -> None:
        self.gpu_id = gpu_id
        self.model = model.to(gpu_id)
        self.train_data = train_data
        self.optimizer = optimizer
        self.save_every = save_every

    def _run_batch(self, source, targets):
        self.optimizer.zero_grad()
        output = self.model(source)
        loss = F.cross_entropy(output, targets)
        loss.backward()
        self.optimizer.step()

    def _run_epoch(self, epoch):
        b_sz = len(next(iter(self.train_data))[0])
        print(f"[GPU{self.gpu_id}] Epoch {epoch} | Batchsize: {b_sz} | Steps: {len(self.train_data)}")
        for source, targets in self.train_data:
            source = source.to(self.gpu_id)
            targets = targets.to(self.gpu_id)
            self._run_batch(source, targets)

    def _save_checkpoint(self, epoch):
        ckp = self.model.state_dict()
        PATH = "checkpoint.pt"
        torch.save(ckp, PATH)
        print(f"Epoch {epoch} | Training checkpoint saved at {PATH}")

    def train(self, max_epochs: int):
        for epoch in range(max_epochs):
            self._run_epoch(epoch)
            if epoch % self.save_every == 0:
                self._save_checkpoint(epoch)


### load_train_objs

The load_train_objs function creates the three objects needed for training: the dataset, the model, and the optimizer.<br>
###### train_set = MyTrainDataset(2048)
This line creates our custom dataset with 2048 randomly generated training samples.<br>
###### model = torch.nn.Linear(20, 1)
This line creates a linear model with PyTorch's torch.nn.Linear class, which applies a linear transformation from 20 input features to 1 output feature.<br>
###### optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
This line creates a Stochastic Gradient Descent (SGD) optimizer with a learning rate (lr) of 0.001 (1e-3). The optimizer updates the model's weights from the gradients computed during training in order to minimize the loss function.<br>
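
A short sketch can confirm the shapes involved. It builds the same model and optimizer and passes a hypothetical batch of 4 samples through the model.

```python
import torch

model = torch.nn.Linear(20, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

batch = torch.rand(4, 20)   # hypothetical batch: 4 samples, 20 features each
output = model(batch)       # one output value per sample

num_params = sum(p.numel() for p in model.parameters())  # 20 weights + 1 bias
learning_rate = optimizer.param_groups[0]["lr"]
print(output.shape, num_params, learning_rate)
```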



In [13]:
def load_train_objs():
    train_set = MyTrainDataset(2048)  # load your dataset
    model = torch.nn.Linear(20, 1)  # load your model
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    return train_set, model, optimizer

## Preparing dataloader 
The prepare_dataloader function wraps a dataset in a PyTorch DataLoader, which provides an iterable over the dataset. The DataLoader handles batching and shuffling and, when num_workers is set, loads data in parallel worker processes (this function leaves num_workers at its default). Each argument is explained below:

###### dataset

The first argument to the DataLoader constructor is the dataset object from which samples are loaded.

###### batch_size=batch_size

This sets how many samples the DataLoader returns in each iteration, and therefore how much data the model processes in one forward/backward pass. If the dataset size is not divisible by batch_size, the last batch is smaller.

###### pin_memory=True

With pin_memory=True, the DataLoader places each batch in page-locked ("pinned") host memory before returning it. This is a performance optimization for GPU training, because transfers from pinned host (CPU) memory to device (GPU) memory can be faster.

###### shuffle=True

This argument reshuffles the data at the beginning of each epoch. Shuffling prevents the model from picking up on the order of the samples and makes the batches differ from epoch to epoch, which generally helps the model generalize.
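
The effect of these arguments can be checked on a small example. The sketch below uses a hypothetical `TensorDataset` of 128 random samples and, as an assumption for portability, enables pinned memory only when a CUDA device is available.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy data: 128 samples with 20 features and 1 target each.
dataset = TensorDataset(torch.rand(128, 20), torch.rand(128, 1))

loader = DataLoader(
    dataset,
    batch_size=32,
    pin_memory=torch.cuda.is_available(),  # assumption: pin only when a GPU is present
    shuffle=True,
)

source, targets = next(iter(loader))
print(len(loader), source.shape, targets.shape)
```

`len(loader)` gives the number of batches per epoch, here 128 / 32 = 4.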

In [14]:
def prepare_dataloader(dataset: Dataset, batch_size: int):
    return DataLoader(
        dataset,
        batch_size=batch_size,
        pin_memory=True,
        shuffle=True
    )

## The main function

###### dataset, model, optimizer = load_train_objs()

Calls the load_train_objs function, which returns a tuple of three elements, and unpacks it into the dataset, the model, and the optimizer.

###### train_data = prepare_dataloader(dataset, batch_size)

Calls the prepare_dataloader function with the loaded dataset and the given batch_size. The returned DataLoader, assigned to train_data, provides an iterable over the dataset and handles batching and shuffling.

###### trainer = Trainer(model, train_data, optimizer, device, save_every)

Creates a Trainer object from the model, the train_data DataLoader, the optimizer, the device, and the checkpoint frequency (save_every). The Trainer class defined earlier encapsulates the training loop.

###### trainer.train(total_epochs)

Calls the train method of the Trainer object. It runs the training loop for total_epochs epochs: in each epoch it iterates over the dataset, performs the forward and backward passes, and updates the model's weights. It also saves the model's state every save_every epochs.
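
The notebook passes `device = 0`, which assumes that a CUDA GPU is present. For machines without one, a fallback could be sketched as follows; `pick_device` is a hypothetical helper and not part of the original code.

```python
import torch


def pick_device():
    """Hypothetical helper: use cuda:0 when available, otherwise the CPU."""
    return torch.device("cuda:0" if torch.cuda.is_available() else "cpu")


device = pick_device()
model = torch.nn.Linear(20, 1).to(device)
batch = torch.rand(4, 20).to(device)
output = model(batch)
print(device, output.shape)
```

If such a device object were passed to `Trainer` as gpu_id, the `[GPU...]` prefix in the log line would show the device instead of a GPU index.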

In [15]:
def main(device, total_epochs, save_every, batch_size):
    dataset, model, optimizer = load_train_objs()
    train_data = prepare_dataloader(dataset, batch_size)
    trainer = Trainer(model, train_data, optimizer, device, save_every)
    trainer.train(total_epochs)

In [16]:
if __name__ == "__main__":
    device = 0  # shorthand for cuda:0
    main(device, total_epochs=50, save_every=10, batch_size=32)

[GPU0] Epoch 0 | Batchsize: 32 | Steps: 64
Epoch 0 | Training checkpoint saved at checkpoint.pt
[GPU0] Epoch 1 | Batchsize: 32 | Steps: 64
[GPU0] Epoch 2 | Batchsize: 32 | Steps: 64
[GPU0] Epoch 3 | Batchsize: 32 | Steps: 64
[GPU0] Epoch 4 | Batchsize: 32 | Steps: 64
[GPU0] Epoch 5 | Batchsize: 32 | Steps: 64
[GPU0] Epoch 6 | Batchsize: 32 | Steps: 64
[GPU0] Epoch 7 | Batchsize: 32 | Steps: 64
[GPU0] Epoch 8 | Batchsize: 32 | Steps: 64
[GPU0] Epoch 9 | Batchsize: 32 | Steps: 64
[GPU0] Epoch 10 | Batchsize: 32 | Steps: 64
Epoch 10 | Training checkpoint saved at checkpoint.pt
[GPU0] Epoch 11 | Batchsize: 32 | Steps: 64
[GPU0] Epoch 12 | Batchsize: 32 | Steps: 64
[GPU0] Epoch 13 | Batchsize: 32 | Steps: 64
[GPU0] Epoch 14 | Batchsize: 32 | Steps: 64
[GPU0] Epoch 15 | Batchsize: 32 | Steps: 64
[GPU0] Epoch 16 | Batchsize: 32 | Steps: 64
[GPU0] Epoch 17 | Batchsize: 32 | Steps: 64
[GPU0] Epoch 18 | Batchsize: 32 | Steps: 64
[GPU0] Epoch 19 | Batchsize: 32 | Steps: 64
[GPU0] Epoch 20 | Batch