## **Image Classification**

---
In transfer learning, we do not need to update the parameters of the entire model. Since our ViT has already learned feature representations from millions of images, we can train only the last layer of the model to make it perform well on our new dataset.
We will be using the [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224) model from the Hugging Face Hub.



In [None]:
!pip install fsspec==2023.9.2

In [None]:
!pip install --quiet evaluate

In [None]:
import torch
import torch.nn as nn
from huggingface_hub import notebook_login
from datasets import load_dataset, DatasetDict
from transformers import AutoImageProcessor, ViTForImageClassification, TrainingArguments, Trainer
from torchvision.transforms import ToTensor
import evaluate


In [None]:

!git config --global credential.helper store

In [None]:
# Log in to Hugging Face
notebook_login()

### Loading our image classification dataset
---
We'll use the [Oxford-IIIT Pets Dataset](https://huggingface.co/datasets/pcuenq/oxford-pets) and load it from the Hub with the [Hugging Face Datasets](https://huggingface.co/datasets) library.

In [None]:
dataset = load_dataset("pcuenq/oxford-pets",cache_dir=None)

In [None]:
dataset

In [None]:
print(dataset['train'][0])

In [None]:
# dataset['train'][0]['image'] directly returns a PIL Image object
image = dataset['train'][0]['image']
image.show()

In [None]:
labels = dataset['train'].unique('label')
print(len(labels),labels)

In [None]:
import matplotlib.pyplot as plt
import numpy as np

In [None]:
torch.manual_seed(4209)
fig = plt.figure(figsize=(9, 9))
rows, cols = 4, 4
# Show a 4x4 grid of randomly chosen training images with their labels
for i in range(1, rows * cols + 1):
  random_idx = torch.randint(0, len(dataset['train']), size=[1]).item()
  sample = dataset['train'][random_idx]
  fig.add_subplot(rows, cols, i)
  plt.imshow(np.asarray(sample['image']).squeeze())
  plt.title(f"{sample['label']}")
  plt.axis(False)

## Preprocessing our dataset
---



Since the original dataset only has a `train` split, we'll keep 80% of it for training, use 10% for `validation`, and use the remaining 10% as our `test` split. We can use the built-in `train_test_split` method to do so.

In [None]:
train_set = dataset['train'].train_test_split(test_size=0.2)  # 80% train, 20% held out
val_set = train_set['test'].train_test_split(test_size=0.5)  # held-out part: half validation, half test

# Combine our splits into a single DatasetDict
our_dataset = DatasetDict({
    'train': train_set['train'],
    'validation': val_set['train'],
    'test': val_set['test']
})
our_dataset
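
As a quick sanity check on the two-stage split, the expected split sizes can be worked out by hand. The sketch below is plain Python and does not touch the dataset; the helper name `split_sizes`, the sample count of 1000, and the round-up behaviour are assumptions made for illustration, so compare the result with the `num_rows` printed by `our_dataset`.

```python
import math

def split_sizes(n_samples, holdout_frac=0.2, test_frac_of_holdout=0.5):
    """Return (train, validation, test) sizes for the two-stage split."""
    # Stage 1: hold out a fraction of the data (rounding up is an assumption)
    n_holdout = math.ceil(n_samples * holdout_frac)
    n_train = n_samples - n_holdout
    # Stage 2: split the held-out part into validation and test
    n_test = math.ceil(n_holdout * test_frac_of_holdout)
    n_val = n_holdout - n_test
    return n_train, n_val, n_test

# Hypothetical dataset of 1000 images
print(split_sizes(1000))  # (800, 100, 100)
```

Because the split is random, passing a `seed` to `train_test_split` is one way to get the same three splits on every run.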


Models cannot work with labels in their `string` format, so we create two mappings, `label2id` and `id2label`, to convert the labels to integer IDs and back. These mappings are also useful when we initialize our model, to update its configuration.

In [None]:
label2id = {c:idx for idx,c in enumerate(labels)}
id2label = {idx:c for idx,c in enumerate(labels)}
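
To make the two mappings concrete, here is a small standalone sketch. The three breed names and the `example_*` variables are illustrative only and are not taken from the dataset. Sorting the labels first is an optional choice that keeps the ID assignment the same regardless of the order in which the labels are listed.

```python
# Illustrative breed names, not read from the dataset
example_labels = ["Siamese", "Bengal", "Pug"]

# Sorting makes the ID assignment independent of the input order
ordered = sorted(example_labels)
example_label2id = {name: idx for idx, name in enumerate(ordered)}
example_id2label = {idx: name for idx, name in enumerate(ordered)}

print(example_label2id)  # {'Bengal': 0, 'Pug': 1, 'Siamese': 2}

# Round trip: label -> id -> label gives back the original label
for name in example_labels:
    assert example_id2label[example_label2id[name]] == name
```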


### Image Processor

To apply the right transforms on our images, we will be using [AutoImageProcessor](https://huggingface.co/docs/transformers/main_classes/image_processor) which will apply the transforms according to the model we will use.

In [None]:
processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")
processor
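
To get a feel for what the processor does to each pixel, the sketch below reproduces the rescale-and-normalize arithmetic in plain Python. It assumes that 8-bit values are divided by 255 and then normalized with a mean and standard deviation of 0.5 per channel; treat these constants as assumptions and check them against the `processor` printout above. The function name `normalize_pixel` is hypothetical.

```python
# Assumed preprocessing constants; verify them against the printed `processor`
IMAGE_MEAN = 0.5
IMAGE_STD = 0.5

def normalize_pixel(value):
    """Map a raw 8-bit channel value (0-255) to its normalized value."""
    rescaled = value / 255.0              # rescale to the range [0, 1]
    return (rescaled - IMAGE_MEAN) / IMAGE_STD

# Darkest and brightest possible channel values
print(normalize_pixel(0))    # -1.0
print(normalize_pixel(255))  # 1.0
```

Under these assumptions, every channel value ends up in the range [-1, 1].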

To apply the transforms to batches at training time, we can write a function that preprocesses a batch. Once we attach it to the dataset using `with_transform`, it is applied on the fly whenever samples are accessed, for example when the trainer fetches a batch.

Our `transforms` function will take care of the following:

- Converting all images into RGB
- Converting the string labels to integers
- Applying the image transforms through the `processor`

The resultant dataset features will be:
```py
{
    'pixel_values': torch.Tensor,
    'labels': List[int]
}
```

We'll pair the function with our dataset using the `with_transform()` method.

In [None]:
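def transforms(batch):
  # Make sure every image has three RGB channels
  batch['image'] = [img.convert('RGB') for img in batch['image']]
  # Apply the model-specific image transforms and return PyTorch tensors
  inputs = processor(images=batch['image'], return_tensors='pt')
  # Map the string labels to their integer IDs
  inputs['labels'] = [label2id[label] for label in batch['label']]
  return inputs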

In [None]:
processed_dataset = our_dataset.with_transform(transforms)

#### Data Collation
For `pixel_values`, the input shape for the model should be `(batch, channels, height, width)`, and for `labels` the shape should be `(batch,)`. The collate function below stacks the individual samples into these shapes.

In [None]:
def collate_fn(batch):
  return {
    'pixel_values': torch.stack([item['pixel_values'] for item in batch]),
    'labels': torch.tensor([item['labels'] for item in batch])
  }

### Evaluation Metrics

In [None]:
accuracy_metrics = evaluate.load("accuracy")
def compute_metrics(eval_pred):
  logits,labels = eval_pred
  # logits: raw model outputs with shape (batch_size, num_classes)
  # labels: true labels with shape (batch_size,)
  predictions = logits.argmax(-1)
  return accuracy_metrics.compute(predictions=predictions,references=labels)
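
For intuition, the same computation can be written out in plain Python: take the index of the largest logit in each row as the prediction, then count how often it matches the true label. This is only a sketch of what the accuracy metric reports; the helper names and the toy logits are made up for illustration.

```python
def argmax(row):
    """Index of the largest value in a list of scores."""
    return max(range(len(row)), key=lambda i: row[i])

def accuracy(logits, labels):
    """Fraction of rows whose highest-scoring class equals the true label."""
    predictions = [argmax(row) for row in logits]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# Made-up logits for 4 samples and 3 classes
toy_logits = [
    [2.0, 0.1, -1.0],  # predicts class 0
    [0.2, 1.5, 0.3],   # predicts class 1
    [0.0, 0.4, 0.9],   # predicts class 2
    [1.2, 0.8, 0.1],   # predicts class 0
]
toy_labels = [0, 1, 2, 1]

print(accuracy(toy_logits, toy_labels))  # 0.75
```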

### Load transformer model for transfer learning
We will use [ViTForImageClassification](https://huggingface.co/docs/transformers/main/en/model_doc/vit#transformers.ViTForImageClassification) to load our pre-trained model.

We will replace the final classification layer so that it outputs one prediction per label in our dataset.

We also need to pass `ignore_mismatched_sizes=True`, because the new classification layer has a different shape from the one stored in the checkpoint.

In [None]:
model = ViTForImageClassification.from_pretrained(
    'google/vit-base-patch16-224',
    num_labels=len(labels),
    id2label=id2label,
    label2id=label2id,
    ignore_mismatched_sizes=True
)

In [None]:
model

Since we are not going to update the entire model, we can "freeze" everything except the new `classifier` layer by setting `requires_grad` to `False` on all other parameters.

In [None]:
for name,p in model.named_parameters():
    if not name.startswith('classifier'):
        p.requires_grad = False

In [None]:
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f"Total parameters: {total_params}")
print(f"Trainable parameters: {trainable_params}")
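
The trainable count can be cross-checked by hand, since the only unfrozen part is a single linear layer with one weight per (input feature, label) pair plus one bias per label. The sketch below assumes a hidden size of 768 for ViT-Base and 37 pet breeds; both numbers are assumptions here, so compare them with `model.config.hidden_size` and `len(labels)` in your own run.

```python
def linear_layer_params(in_features, out_features):
    """Number of weights plus biases in a fully connected layer."""
    return in_features * out_features + out_features

HIDDEN_SIZE = 768  # assumed ViT-Base hidden size
NUM_LABELS = 37    # assumed number of pet breeds

print(linear_layer_params(HIDDEN_SIZE, NUM_LABELS))  # 28453
```

If the printed trainable parameter count differs from this figure, something other than the classifier is probably still unfrozen.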

## Training using HF Trainer

In [None]:
training_args = TrainingArguments(
    output_dir="./vit-base-oxford-iiit-pets",
    per_device_train_batch_size=8,
    eval_strategy="epoch",
    save_strategy="epoch",
    logging_steps=100,
    num_train_epochs=5,
    learning_rate=3e-4,
    save_total_limit=2,
    remove_unused_columns=False,
    push_to_hub=True,
    report_to='tensorboard',
    load_best_model_at_end=True,
)
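
To relate `per_device_train_batch_size`, `num_train_epochs` and `logging_steps` to each other, it helps to estimate how many optimizer steps a run takes. The sketch below assumes a single device, no gradient accumulation, and that the last partial batch is kept; the training-set size of 800 is a hypothetical figure, and `optimizer_steps` is an illustrative helper rather than part of the Trainer API.

```python
import math

def optimizer_steps(n_train, batch_size, epochs):
    """Steps per epoch and total steps for single-device training."""
    steps_per_epoch = math.ceil(n_train / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs

# Hypothetical training set of 800 images, batch size 8, 5 epochs
print(optimizer_steps(n_train=800, batch_size=8, epochs=5))  # (100, 500)
```

With these hypothetical numbers, `logging_steps=100` would log the training loss once per epoch.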

In [None]:
trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=collate_fn,
    compute_metrics=compute_metrics,
    train_dataset=processed_dataset['train'],
    eval_dataset=processed_dataset['validation'],
    tokenizer=processor  # saved with the model; newer transformers releases name this argument `processing_class`
)


In [None]:
trainer.train()

Evaluation on the held-out `test` split:

In [None]:
trainer.evaluate(processed_dataset['test'])

We reach an accuracy of 94% on the test split while training only the classification layer.

In [None]:
kwargs = {
    "finetuned_from": model.config._name_or_path,
    "dataset": 'pcuenq/oxford-pets',
    "tasks": "image-classification",
    "tags": ['image-classification'],
}

In [None]:
trainer.save_model()
trainer.push_to_hub('up and running', **kwargs)