## Importing required modules

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

plt.style.use("ggplot")

import torch
import torchvision 
import torch.nn as nn
import torch.nn.functional as F
import timm
from torchvision import datasets
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from tqdm.notebook import tqdm

import cv2
from PIL import Image

import os
import json
import random

from sklearn.model_selection import train_test_split

%matplotlib inline

In [None]:
## Setting up the environment

In [None]:
# Initialize the distributed backend
# os.environ['MASTER_ADDR'] = 'localhost'
# os.environ['MASTER_PORT'] = '12345'
# torch.distributed.init_process_group(backend='nccl', world_size=4)

# For parallel TPUs,GPUs
os.environ["XLA_USE_BF16"] = "1"
os.environ["XLA_TENSOR_ALLOCATOR_MAXSIZE"] = "100000000"

These two lines set environment variables for the current Python process. Both are XLA settings, so they matter only when the notebook runs through PyTorch/XLA (for example on a TPU); the CUDA path used later in this notebook does not depend on them.

The first line sets the "XLA_USE_BF16" environment variable to "1". XLA (Accelerated Linear Algebra) is a domain-specific compiler for linear algebra operations that can accelerate machine learning workloads on CPUs, GPUs, and TPUs. BF16 (bfloat16) is a floating-point format that uses 16 bits instead of the 32 bits of float32. It keeps float32's 8-bit exponent, so the representable range is similar, but it has fewer mantissa bits and therefore lower precision. Setting this variable to 1 tells XLA to use bfloat16 where possible, which can be faster and use less memory on hardware that supports it.

The second line sets the "XLA_TENSOR_ALLOCATOR_MAXSIZE" environment variable to "100000000" (about 100 MB). This variable sets the maximum size, in bytes, that the XLA tensor allocator is allowed to allocate. The tensor allocator manages the memory used by tensors (multidimensional arrays) in XLA computations. A larger limit lets XLA allocate more memory, which can help with larger models or datasets.

Overall, these variables configure the XLA runtime and may improve the performance of machine learning workloads that run through it.
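
To make the bfloat16 trade-off concrete, here is a minimal standard-library sketch. It assumes plain truncation of the float32 bit pattern (real hardware typically rounds), and the helper names `to_bfloat16_bits`, `from_bfloat16_bits`, and `round_trip` are illustrative only; they are not part of XLA or PyTorch.

```python
import struct

def to_bfloat16_bits(x: float) -> int:
    """Keep the upper 16 bits of the float32 encoding of x (assumed truncation, no rounding)."""
    (bits32,) = struct.unpack(">I", struct.pack(">f", x))
    return bits32 >> 16  # sign bit, 8 exponent bits, 7 mantissa bits

def from_bfloat16_bits(bits16: int) -> float:
    """Expand a 16-bit bfloat16 pattern back into a Python float."""
    (value,) = struct.unpack(">f", struct.pack(">I", bits16 << 16))
    return value

def round_trip(x: float) -> float:
    """Show what survives when x is stored as bfloat16."""
    return from_bfloat16_bits(to_bfloat16_bits(x))

print(round_trip(1.0))      # exactly representable
print(round_trip(3.14159))  # loses most of its decimal digits
print(round_trip(1e30))     # large magnitudes still fit, thanks to the 8-bit exponent
```

The printed values illustrate why bfloat16 is usually acceptable for neural network training: the range matches float32, and only the fine precision is reduced.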

## Loading and reading data

In [None]:
def seed_everything(seed):
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
seed_everything(1001)

**os.environ["PYTHONHASHSEED"] = str(seed)**

This line sets the "PYTHONHASHSEED" environment variable to the value of `seed`, converted to a string with str() because environment variables must be strings.

Python's built-in hash() function generates hash values for hashable objects such as strings, bytes, and tuples. Dictionaries and sets use these values to place keys in their hash tables.

By default, Python salts the hashes of `str` and `bytes` objects with a random value chosen when the interpreter starts. The same string can therefore hash to different values in different runs of the same program, which can change, for example, the iteration order of a set of strings.

Setting "PYTHONHASHSEED" to a fixed integer makes these hashes reproducible across runs. Note that the interpreter reads the variable at startup, so assigning it through os.environ inside a running session affects only Python processes launched afterwards, not the current interpreter. To cover the session itself, the variable has to be set before Python starts.

The value of "seed" can be any integer from 0 to 4294967295; the value 0 disables hash randomization.
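
The effect of "PYTHONHASHSEED" can be checked with a small sketch. Because the variable is read when an interpreter starts, the sketch launches child interpreters with the variable set in their environment. The helper name `child_hash` and the sample string are assumptions made for illustration.

```python
import os
import subprocess
import sys

def child_hash(text: str, seed: int) -> int:
    """Return hash(text) as computed by a fresh interpreter started with PYTHONHASHSEED=seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)

first = child_hash("cassava", 1001)
second = child_hash("cassava", 1001)
print(first == second)  # two separate interpreters agree when the seed is fixed
```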

**torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False**

These lines seed PyTorch's random number generators and configure cuDNN for reproducible results.

The first line seeds PyTorch's random number generator. The seed is an integer that initializes the generator, so fixing it makes PyTorch produce the same sequence of random numbers every time you run your program.

The second line seeds the generator of the current GPU (CUDA) device. Recent PyTorch versions already seed the CUDA generators in torch.manual_seed, so this call is mostly an explicit safeguard; with several GPUs, torch.cuda.manual_seed_all(seed) seeds all of them.

The third line sets a flag in PyTorch's cuDNN backend. cuDNN is NVIDIA's library of GPU primitives for deep learning, which PyTorch uses for operations such as convolutions. Setting the flag to True restricts cuDNN to deterministic algorithms, so the same inputs give the same outputs on every run.

The fourth line disables cuDNN benchmarking. With benchmarking enabled, cuDNN times several algorithms for each new input configuration and picks the fastest. The choice can differ between runs, which can change the numerical results slightly. Setting the flag to False keeps the algorithm selection fixed, at the cost of possibly slower convolutions.
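
The effect of seeding can be checked on the CPU with a short sketch; the helper name `sample` is an assumption for illustration.

```python
import torch

def sample(seed: int) -> torch.Tensor:
    """Seed PyTorch, then draw three uniform random numbers."""
    torch.manual_seed(seed)
    return torch.rand(3)

a = sample(1001)
b = sample(1001)
print(torch.equal(a, b))  # the same seed reproduces the same numbers
```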

## Setting Variables

In [None]:
data_path = "/kaggle/input/cassava-leaf-disease-classification/"
train_images = "/kaggle/input/cassava-leaf-disease-classification/train_images/"
test_images = "/kaggle/input/cassava-leaf-disease-classification/test_images/"
train_labels = "/kaggle/input/cassava-leaf-disease-classification/train.csv"
with open("/kaggle/input/cassava-leaf-disease-classification/label_num_to_disease_map.json","rb") as f:
    label_match = json.load(f)

In [None]:
df = pd.read_csv(train_labels)
df.head()

In [None]:
df.info()

In [None]:
df.label.value_counts().plot(kind="bar")
plt.show()

## Visualizing the training images

In [None]:
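# Take the first nine rows so that each image is shown with its own label.
# Column 0 holds the file name, as assumed by CassavaDataset below.
sample = df.head(9)
plt.figure(figsize=(10,5))
for i in range(len(sample)):
    plt.subplot(3,3,i+1)
    # OpenCV loads images as BGR; convert to RGB so matplotlib shows true colors
    image = cv2.cvtColor(cv2.imread(train_images+sample.iloc[i,0]), cv2.COLOR_BGR2RGB)
    plt.imshow(image)
    a = sample.iloc[i]["label"]
    plt.title(f"{a}:{label_match[str(a)]}",fontdict={'fontsize':9})
    plt.axis('off')
plt.subplots_adjust(wspace=1)
plt.show()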

## Data formatting 

In [None]:
#splitting data into training and validation
train_df,val_df = train_test_split(df,test_size=0.1,random_state=42)
train_df.shape,val_df.shape

In [None]:
#preparing the dataset 
class CassavaDataset(torch.utils.data.Dataset):
    """Helper class to create a PyTorch dataset from the image table."""
    def __init__(self,df,data_path = data_path,mode="train",transforms=None):
        super().__init__()
        self.df_data = df.values
        self.data_path = data_path
        self.transforms = transforms
        self.mode = mode
        self.data_dir = "train_images" if mode == "train" else "test_images"
    def __len__(self):
        return len(self.df_data)
    def __getitem__(self,index):
        img_name,label = self.df_data[index]
        img_path = os.path.join(self.data_path,self.data_dir,img_name)
        img = Image.open(img_path).convert("RGB")
        # Apply the transforms only if given; return the image either way
        if self.transforms is not None:
            img = self.transforms(img)
        return img,label

## Augmentation 

In [None]:
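# Per-channel (mean, std) for Normalize. Note: these values are the commonly used
# CIFAR-10 statistics rather than statistics computed from this dataset.
stats = ((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))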
train_tfms = transforms.Compose([
                 transforms.Resize((224, 224)),
                 transforms.RandomHorizontalFlip(p=0.3),
                 transforms.RandomVerticalFlip(p=0.3),
                 transforms.RandomCrop(224, padding=4, padding_mode='reflect'), 
                 transforms.ToTensor(), 
                 transforms.Normalize(*stats,inplace=True),])
valid_tfms = transforms.Compose([transforms.Resize((224, 224)),
                         transforms.ToTensor(), 
                         transforms.Normalize(*stats)])
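
As a worked example of the last step in both pipelines: `transforms.Normalize` computes `(x - mean) / std` per channel on the tensor produced by `ToTensor`, whose values lie in [0, 1]. The sketch below reproduces that arithmetic for a single RGB pixel in plain Python; the helper name `normalize_pixel` is an assumption for illustration.

```python
mean = (0.4914, 0.4822, 0.4465)
std = (0.2023, 0.1994, 0.2010)

def normalize_pixel(pixel, mean, std):
    """Apply (x - mean) / std to each channel of one RGB pixel."""
    return tuple((p - m) / s for p, m, s in zip(pixel, mean, std))

at_mean = normalize_pixel(mean, mean, std)          # a pixel equal to the mean maps to zero
white = normalize_pixel((1.0, 1.0, 1.0), mean, std) # a white pixel maps to positive values
print(at_mean)
print(white)
```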

In [None]:
print("Available Vision Transformer Models: ")
timm.list_models("vit*")

## Augmented Data 

In [None]:
train_dataset = CassavaDataset(train_df, transforms=train_tfms)
valid_dataset = CassavaDataset(val_df, transforms=valid_tfms)

In [None]:
#DataLoader
train_loader = DataLoader(train_dataset,100,shuffle=True,num_workers=2)
val_loader = DataLoader(valid_dataset,100,num_workers=2)

### Image Classification base

In [None]:
def accuracy(outputs, labels):
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))


class ImageClassificationBase(nn.Module):
    def training_step(self, batch):
        images, labels = batch
        out = self(images)                  # Generate predictions
        loss = F.cross_entropy(out, labels)  # Calculate loss
        return loss

    def validation_step(self, batch):
        images, labels = batch
        out = self(images)                    # Generate predictions
        loss = F.cross_entropy(out, labels)   # Calculate loss
        acc = accuracy(out, labels)           # Calculate accuracy
        return {'val_loss': loss.detach(), 'val_acc': acc}
    
    def add_batch(self, x):
        return x.unsqueeze(0)

    def validation_epoch_end(self, outputs):
        batch_losses = [x['val_loss'] for x in outputs]
        epoch_loss = torch.stack(batch_losses).mean()   # Combine losses
        batch_accs = [x['val_acc'] for x in outputs]
        epoch_acc = torch.stack(batch_accs).mean()      # Combine accuracies
        return {'val_loss': epoch_loss.item(), 'val_acc': epoch_acc.item()}

    def epoch_end(self, epoch, result):
        # result['lrs'] may be absent or empty when no scheduler recorded learning rates
        print("Epoch [{}],{} train_loss: {:.4f}, val_loss: {:.4f}, val_acc: {:.4f}".format(
            epoch, "last_lr: {:.5f},".format(result['lrs'][-1]) if result.get('lrs') else '', 
            result['train_loss'], result['val_loss'], result['val_acc']))
        
    @torch.no_grad()
    def evaluate(model, val_loader):
        model.eval()
        outputs = [model.validation_step(batch) for batch in val_loader]
        return model.validation_epoch_end(outputs)
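
To see what the `accuracy` helper computes, the sketch below repeats its definition and applies it to three made-up rows of logits; the logits and labels are illustrative values, not model output.

```python
import torch

def accuracy(outputs, labels):
    # The index of the largest logit in each row is the predicted class
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

logits = torch.tensor([[2.0, 0.1], [0.3, 1.5], [0.9, 0.2]])
labels = torch.tensor([0, 1, 1])
acc = accuracy(logits, labels)  # predictions are [0, 1, 0], so two of three match
print(acc.item())
```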

In [None]:
class ViT(ImageClassificationBase):
    def __init__(self, num_classes, pretrained=True):
        super().__init__()
        self.network = timm.create_model('vit_base_patch16_224', pretrained=pretrained)
        self.network.head = nn.Linear(self.network.head.in_features, num_classes)
    def forward(self, xb):
        return self.network(xb)

In [None]:
def get_default_device():
    if torch.cuda.is_available():
        return torch.device('cuda')
    else:
        return torch.device('cpu')
def to_device(data, device):
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)
class DeviceDataLoader():
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device
    def __iter__(self):
        for b in self.dl:
            yield to_device(b, self.device)
    def __len__(self):
        return len(self.dl)

In [None]:
def fit(epochs, max_lr, model, train_loader, val_loader,
                  weight_decay=0, grad_clip=None, opt_func=torch.optim.SGD):
    history = []
    optimizer = opt_func(model.parameters(), max_lr, weight_decay=weight_decay)
    for epoch in range(epochs):
        model.train()
        train_losses = []
        lrs = []
        for batch in tqdm(train_loader):
            loss = model.training_step(batch)
            train_losses.append(loss.detach())  # store the value without the autograd graph
            loss.backward()
            if grad_clip:
                nn.utils.clip_grad_value_(model.parameters(), grad_clip)
            optimizer.step()
            optimizer.zero_grad()
        result = model.evaluate(val_loader)
        result['train_loss'] = torch.stack(train_losses).mean().item()
        result['lrs'] = lrs
        model.epoch_end(epoch, result)
        history.append(result)
    return history

def get_lr(optimizer):
    for param_group in optimizer.param_groups:
        return param_group['lr']

def fit_one_cycle(epochs, max_lr, model, train_loader, val_loader,
                  weight_decay=0, grad_clip=None, opt_func=torch.optim.SGD):
    torch.cuda.empty_cache()
    history = []
    optimizer = opt_func(model.parameters(), max_lr, weight_decay=weight_decay)
    sched = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr, epochs=epochs,
                                                steps_per_epoch=len(train_loader))
    for epoch in range(epochs):
        model.train()
        train_losses = []
        lrs = []
        for batch in tqdm(train_loader):
            loss = model.training_step(batch)
            train_losses.append(loss.detach())  # store the value without the autograd graph
            loss.backward()
            if grad_clip:
                nn.utils.clip_grad_value_(model.parameters(), grad_clip)
            optimizer.step()
            optimizer.zero_grad()
            lrs.append(get_lr(optimizer))
            sched.step()
        result = model.evaluate(val_loader)
        result['train_loss'] = torch.stack(train_losses).mean().item()
        result['lrs'] = lrs
        model.epoch_end(epoch, result)
        history.append(result)
    return history
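
The learning-rate schedule that `fit_one_cycle` follows can be inspected without training. The sketch below steps a `OneCycleLR` scheduler on a dummy parameter and records the rate at each step, relying on the scheduler's default settings. The parameter and the step counts are assumptions chosen for illustration.

```python
import torch

max_lr, epochs, steps_per_epoch = 0.01, 2, 10

# A dummy parameter stands in for the model weights
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.Adam([param], max_lr)
sched = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr, epochs=epochs,
                                            steps_per_epoch=steps_per_epoch)

lrs = []
for _ in range(epochs * steps_per_epoch):
    optimizer.step()  # no gradients here; called only to keep the usual step order
    lrs.append(optimizer.param_groups[0]["lr"])
    sched.step()

print(lrs[0], max(lrs), lrs[-1])  # starts low, peaks at max_lr, ends far below the start
```

The recorded rates rise to `max_lr` early in training and then decay, which is why the function passes `max_lr` both to the optimizer and to the scheduler.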

In [None]:
device = get_default_device()
device

In [None]:
train_dl = DeviceDataLoader(train_loader, device)
valid_dl = DeviceDataLoader(val_loader, device)

## Finetuning the Pretrained Model

In [None]:
# model = ViT(df.label.nunique())
# to_device(model, device)

In [None]:
# history = [model.evaluate(valid_dl)]
# history

In [None]:
epochs = 15
max_lr = 0.01
grad_clip = 0.1
weight_decay = 1e-4
opt_func = torch.optim.Adam

In [None]:
# history += fit_one_cycle(epochs, max_lr, model, train_dl, valid_dl,grad_clip=grad_clip, weight_decay=weight_decay, opt_func=opt_func)

In [None]:
model1 = ViT(df.label.nunique())
to_device(model1, device)

In [None]:
def fit(epochs, max_lr, model, train_loader, val_loader,
                  weight_decay=0, grad_clip=None, opt_func=torch.optim.SGD):
    history = []
    optimizer = opt_func(model.parameters(), max_lr, weight_decay=weight_decay)
    for epoch in range(epochs):
        model.train()
        train_losses = []
        for batch in tqdm(train_loader):
            loss = model.training_step(batch)
            train_losses.append(loss.detach())  # store the value without the autograd graph
            loss.backward()
            if grad_clip:
                nn.utils.clip_grad_value_(model.parameters(), grad_clip)
            optimizer.step()
            optimizer.zero_grad()
        result = model.evaluate(val_loader)
        result['train_loss'] = torch.stack(train_losses).mean().item()
        model.epoch_end(epoch, result)
        history.append(result)
    return history

In [None]:
history = [model1.evaluate(valid_dl)]
history

In [None]:
history += fit(epochs, max_lr, model1, train_dl, valid_dl,grad_clip=grad_clip, weight_decay=weight_decay, opt_func=opt_func)