# Final Project: Hugging Face (Generating the Model)

This notebook follows the vision section of the Hugging Face Datasets quickstart: [https://huggingface.co/docs/datasets/en/quickstart#vision]

Important note: This notebook shows how to fine-tune a model for image classification. To run and edit it, you must first create a Hugging Face account and log in while running the notebook, so that the fine-tuned model can be saved to your profile.
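The login step could be sketched as follows. This is a minimal sketch, assuming the access token is stored in an environment variable named `HF_TOKEN`; the `login_from_env` helper and the variable name are illustrative assumptions, not part of the original notebook. In an interactive notebook, `huggingface_hub.notebook_login()` is an alternative that prompts for the token instead.

```python
import os


def login_from_env(var_name="HF_TOKEN"):
    """Log in to the Hugging Face Hub if a token is available.

    Returns True when a login was attempted and False when no token was found.
    `var_name` is an assumed environment variable name (hypothetical default).
    """
    token = os.environ.get(var_name)
    if not token:
        print(f"No token found in ${var_name}; skipping login.")
        return False
    # Imported lazily so the helper can be defined without the package installed.
    from huggingface_hub import login
    login(token=token)
    return True


if __name__ == "__main__":
    login_from_env()
```

Without a login, training itself still runs, but pushing the model to the Hub (enabled later through `push_to_hub=True`) needs valid credentials.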

If you choose not to train your own model and only want to test my plant_classification model, see the testing_model.ipynb notebook.

In [1]:
### MAKE SURE YOU HAVE ALL OF THIS INSTALLED BEFORE RUNNING THE CODE ###

# Pip installing the necessary packages
# Extras such as [vision] are quoted so that zsh does not treat the brackets as a glob pattern
! pip install datasets
! pip install "datasets[vision]" # used to work with the Image features

# I will be using PyTorch for this project, so I will install it here. But you can use TensorFlow if you prefer
! pip install torch
! pip install torchvision


# I will be using the Hugging Face Transformers library for this project
! pip install "transformers[torch]"
! pip install "accelerate>=0.26.0"
! pip install transformers
! pip install evaluate
! pip install Pillow

! pip install keras
! pip install tensorflow
! pip install astroNN
! pip install --upgrade tensorflow tensorflow-probability
! pip install tensorflow-probability\[tf\]


In [2]:
import numpy as np 
import evaluate
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

from datasets import load_dataset  # datasets.Image is not imported here: PIL's Image (imported below) would shadow it
from transformers import AutoImageProcessor
from torchvision.transforms import Compose, ColorJitter, ToTensor,  RandomResizedCrop, Normalize
from transformers import DefaultDataCollator
from torchvision import transforms as Trans
from torch.utils.data import DataLoader
from transformers import AutoModelForImageClassification, TrainingArguments, Trainer
from transformers import ViTFeatureExtractor, ViTForImageClassification, ViTImageProcessor, SwinForImageClassification
from transformers import pipeline
from PIL import Image
from evaluate import load




This is the dataset that we will be playing with: [https://huggingface.co/datasets/AI-Lab-Makerere/beans]

In [3]:
# Loading the beans dataset
# beans is a public Hugging Face dataset
# It contains images of plants and their health classification
dataset = load_dataset("beans") 

In [4]:
# To check the features of the training data, we can use the following
# This shows that there are three different classes in the dataset: angular_leaf_spot, bean_rust, healthy
dataset["train"].features

{'image_file_path': Value(dtype='string', id=None),
 'image': Image(mode=None, decode=True, id=None),
 'labels': ClassLabel(names=['angular_leaf_spot', 'bean_rust', 'healthy'], id=None)}

In [5]:
# save the label names in a variable 
labels = dataset['train'].features['labels'].names
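The label names are later turned into `id2label` and `label2id` dictionaries when the model is loaded. A minimal sketch of those mappings, with the three class names copied from the features printed above so that it runs without downloading the dataset (the `decode` helper is hypothetical and only for illustration):

```python
# Class names copied from the ClassLabel feature shown in the notebook output.
labels = ["angular_leaf_spot", "bean_rust", "healthy"]

# Same dictionary comprehensions that are passed to from_pretrained later on.
id2label = {str(i): name for i, name in enumerate(labels)}
label2id = {name: str(i) for i, name in enumerate(labels)}


def decode(class_id):
    """Map an integer class index back to its human-readable name."""
    return id2label[str(class_id)]


print(id2label)
print(decode(1))  # bean_rust
```

Storing the mappings in the model config means that predictions can be reported as class names rather than bare indices.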

In [6]:
# A feature extractor is needed to preprocess the image into a TENSOR
# I use the Swin Transformer for this project
checkpoint = "microsoft/swin-tiny-patch4-window7-224"
image_processor = AutoImageProcessor.from_pretrained(checkpoint)


In [None]:
# Define a transform pipeline for random cropping
random_crop_transform = Trans.Compose([
    Trans.RandomResizedCrop(size=(224, 224)),  # Random crop to 224x224
    Trans.RandomHorizontalFlip(),              # Optional: Add random flipping for augmentation
])

In [8]:
# This function preprocesses the images
# It takes a batch of images, applies the random crop, and converts the result into tensors
# Your data must be in tensor form before it is passed to the model for training
def transforms(example_batch):
    # Take a list of PIL images and turn them to pixel values
    # inputs = image_processor([x for x in example_batch['image']], return_tensors='pt')

    cropped_images = [random_crop_transform(x) for x in example_batch['image']]
    inputs = image_processor(cropped_images, return_tensors='pt')

    # Don't forget to include the labels!
    inputs['labels'] = example_batch['labels']
    return inputs

In [9]:
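# Apply the transform function to the dataset
# Note: with_transform applies it to every split, so the validation images are also randomly cropped and flipped
dataset_new = dataset.with_transform(transforms)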

In [10]:
# Now we combine the individually processed samples from the dataset into a single batch to be passed to the model.
# For this, I will use the function below. It is useful to define it explicitly, because the default collate function
# may not work with the custom dataset that we are using.

def collate_fn(batch):
    return {
        'pixel_values': torch.stack([x['pixel_values'] for x in batch]),
        'labels': torch.tensor([x['labels'] for x in batch])
    }
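To see what the collate function produces, it can be run on a few fake samples. This is a sketch under the assumption that each processed sample is a dictionary holding a `3 x 224 x 224` tensor and an integer label; the random tensors stand in for real images.

```python
import torch


def collate_fn(batch):
    # Stack per-sample image tensors into one batch tensor and gather the labels.
    return {
        "pixel_values": torch.stack([x["pixel_values"] for x in batch]),
        "labels": torch.tensor([x["labels"] for x in batch]),
    }


# Four fake samples shaped like processed 224x224 RGB images (random values).
fake_batch = [
    {"pixel_values": torch.randn(3, 224, 224), "labels": i % 3}
    for i in range(4)
]

out = collate_fn(fake_batch)
print(out["pixel_values"].shape)  # torch.Size([4, 3, 224, 224])
print(out["labels"])              # tensor([0, 1, 2, 0])
```

The batch dimension comes first, which is the layout the model expects for `pixel_values`.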


In [11]:
# Next, we define the metric and a function that computes it while we train our model.
# This helps evaluate the performance of the model
metric = evaluate.load("accuracy")

def compute_metrics(p):
    return metric.compute(predictions=np.argmax(p.predictions, axis=1), references=p.label_ids)
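The logic of `compute_metrics` can be illustrated on toy data. The sketch below is a plain NumPy stand-in for the `evaluate` accuracy metric, not the library call itself, and uses a `SimpleNamespace` in place of the prediction object that the trainer passes in; the logits and labels are made up.

```python
from types import SimpleNamespace

import numpy as np


def accuracy_from_logits(p):
    # Pick the highest-scoring class per row, then compare with the true labels.
    predicted = np.argmax(p.predictions, axis=1)
    return {"accuracy": float(np.mean(predicted == p.label_ids))}


# Toy logits for four images and three classes (made-up values).
toy = SimpleNamespace(
    predictions=np.array([
        [2.0, 0.1, 0.3],  # predicted class 0
        [0.2, 1.5, 0.1],  # predicted class 1
        [0.3, 0.2, 0.9],  # predicted class 2
        [1.1, 0.4, 0.2],  # predicted class 0, but the true label is 1
    ]),
    label_ids=np.array([0, 1, 2, 1]),
)

print(accuracy_from_logits(toy))  # {'accuracy': 0.75}
```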



In [12]:
labels = dataset['train'].features['labels'].names

# Load the model with ignore_mismatched_sizes to bypass a size mismatch issue
model = AutoModelForImageClassification.from_pretrained(
    checkpoint,
    num_labels=len(labels),
    id2label={str(i): c for i, c in enumerate(labels)},
    label2id={c: str(i) for i, c in enumerate(labels)},
    ignore_mismatched_sizes=True
)

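# Re-create the classifier layer with the correct number of labels
# (from_pretrained above already resized the head to len(labels), so this only re-initializes its weights)
model.classifier = nn.Linear(in_features=model.classifier.in_features, out_features=len(labels))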


Some weights of SwinForImageClassification were not initialized from the model checkpoint at microsoft/swin-tiny-patch4-window7-224 and are newly initialized because the shapes did not match:
- classifier.bias: found shape torch.Size([1000]) in the checkpoint and torch.Size([3]) in the model instantiated
- classifier.weight: found shape torch.Size([1000, 768]) in the checkpoint and torch.Size([3, 768]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [None]:
# Now we can define the training arguments
# These arguments are the ones that typically change according to your model and dataset
training_args = TrainingArguments(
    output_dir="plant_classification",           # This is the output directory where the model checkpoints and predictions will be written
    per_device_train_batch_size=16,              # Batch size for training
    evaluation_strategy="steps",                 # Evaluate the model every eval_steps steps (100, set below)
    num_train_epochs=3,                          # We will train the model for 3 epochs
    warmup_ratio=0.1,                            # Ratio of total steps used for a linear warmup from 0 to the initial learning rate           
    weight_decay=0.01,                           # Regularization term to prevent overfitting
    fp16=False,                                  # Disable mixed precision training (useful for GPUs; enable by setting this to True)
    save_steps=100,                              # Save model checkpoint every 100 steps
    eval_steps=100,                              # Evaluate the model every 100 steps
    logging_steps=10,                            # Log metrics and training progress every 10 steps
    learning_rate=2e-5,                          # Initial learning rate for the optimizer
    save_total_limit=2,                          # Keep only the last 2 checkpoints to save disk space
    remove_unused_columns=False,                 # Prevent removing unused columns from the dataset during training
    push_to_hub=True,                            # Save the trained model to the Hugging Face Hub
    report_to='tensorboard',                     # Log metrics and training information to TensorBoard
    load_best_model_at_end=True,                 # Automatically load the best model at the end of training
    no_cuda=True,                                # Force the use of CPU (if you have a GPU, remove this line, and you may set fp16 to True)
)    
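The effect of `warmup_ratio` and `learning_rate` can be worked out by hand. The sketch below assumes the default linear scheduler (linear warmup followed by linear decay to zero) and takes the total of 195 optimizer steps from the progress bar of the training run in this notebook; `linear_schedule_lr` is a hypothetical helper written for illustration, not a Transformers function.

```python
import math


def linear_schedule_lr(step, total_steps, warmup_ratio, peak_lr):
    """Learning rate after `step` optimizer steps under linear warmup and decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: ramp linearly from peak_lr down to 0 at the last step.
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, remaining)


for step in (10, 20, 30, 100):
    lr = linear_schedule_lr(step, total_steps=195, warmup_ratio=0.1, peak_lr=2e-5)
    print(step, lr)
```

With these settings the warmup lasts 20 steps, so the rate is 1e-05 at step 10, peaks at 2e-05 at step 20, and then decays, which matches the `learning_rate` values logged during training.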



In [18]:
# Now we can define the trainer
trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=collate_fn,
    compute_metrics=compute_metrics,
    train_dataset=dataset_new["train"],
    eval_dataset=dataset_new["validation"],
    tokenizer=image_processor,
)



In [None]:
# Here we actually train the model and evaluate it
#### This takes about 4 minutes to run on a CPU ####
train_results = trainer.train()

trainer.save_model()
trainer.log_metrics("train", train_results.metrics)
trainer.save_metrics("train", train_results.metrics)
trainer.save_state()

  0%|          | 0/195 [00:00<?, ?it/s]

{'loss': 0.1489, 'grad_norm': 6.943637371063232, 'learning_rate': 1e-05, 'epoch': 0.15}
{'loss': 0.1356, 'grad_norm': 9.124574661254883, 'learning_rate': 2e-05, 'epoch': 0.31}
{'loss': 0.1911, 'grad_norm': 7.689700603485107, 'learning_rate': 1.885714285714286e-05, 'epoch': 0.46}
{'loss': 0.1066, 'grad_norm': 7.400233745574951, 'learning_rate': 1.7714285714285717e-05, 'epoch': 0.62}
{'loss': 0.0544, 'grad_norm': 3.349994659423828, 'learning_rate': 1.6571428571428574e-05, 'epoch': 0.77}
{'loss': 0.134, 'grad_norm': 9.765637397766113, 'learning_rate': 1.542857142857143e-05, 'epoch': 0.92}
{'loss': 0.1074, 'grad_norm': 19.633617401123047, 'learning_rate': 1.4285714285714287e-05, 'epoch': 1.08}
{'loss': 0.1085, 'grad_norm': 3.591390609741211, 'learning_rate': 1.3142857142857145e-05, 'epoch': 1.23}
{'loss': 0.1634, 'grad_norm': 27.50710105895996, 'learning_rate': 1.2e-05, 'epoch': 1.38}
{'loss': 0.1008, 'grad_norm': 4.543381690979004, 'learning_rate': 1.0857142857142858e-05, 'epoch': 1.54}


  0%|          | 0/17 [00:00<?, ?it/s]

{'eval_loss': 0.03722032904624939, 'eval_accuracy': 0.9849624060150376, 'eval_runtime': 5.0736, 'eval_samples_per_second': 26.214, 'eval_steps_per_second': 3.351, 'epoch': 1.54}
{'loss': 0.1279, 'grad_norm': 2.21695613861084, 'learning_rate': 9.714285714285715e-06, 'epoch': 1.69}
{'loss': 0.2238, 'grad_norm': 6.652666091918945, 'learning_rate': 8.571428571428571e-06, 'epoch': 1.85}
{'loss': 0.0992, 'grad_norm': 0.9972898960113525, 'learning_rate': 7.428571428571429e-06, 'epoch': 2.0}
{'loss': 0.0509, 'grad_norm': 18.499359130859375, 'learning_rate': 6.285714285714286e-06, 'epoch': 2.15}
{'loss': 0.0489, 'grad_norm': 6.813304424285889, 'learning_rate': 5.142857142857142e-06, 'epoch': 2.31}
{'loss': 0.1177, 'grad_norm': 1.1648198366165161, 'learning_rate': 4.000000000000001e-06, 'epoch': 2.46}
{'loss': 0.0439, 'grad_norm': 1.3854161500930786, 'learning_rate': 2.8571428571428573e-06, 'epoch': 2.62}
{'loss': 0.1212, 'grad_norm': 16.453393936157227, 'learning_rate': 1.7142857142857145e-06, 

In [21]:
metrics = trainer.evaluate()

# some nice to haves:
trainer.log_metrics("eval", metrics)
trainer.save_metrics("eval", metrics)

  0%|          | 0/17 [00:00<?, ?it/s]

***** eval metrics *****
  epoch                   =        3.0
  eval_accuracy           =     0.9699
  eval_loss               =     0.0792
  eval_runtime            = 0:00:04.49
  eval_samples_per_second =     29.575
  eval_steps_per_second   =       3.78
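The accuracy figures can be translated into image counts with a little arithmetic. This is a rough sketch: the validation set size is not printed anywhere in the notebook, so it is estimated here from the logged throughput and runtime, and the results are estimates because the logged values are rounded. `estimate_counts` is a hypothetical helper.

```python
def estimate_counts(accuracy, samples_per_second, runtime_seconds):
    """Estimate (number of samples, correct, misclassified) from logged metrics."""
    n_samples = round(samples_per_second * runtime_seconds)
    n_correct = round(accuracy * n_samples)
    return n_samples, n_correct, n_samples - n_correct


# Values copied from the evaluation logged during training (epoch 1.54).
mid_training = estimate_counts(0.9849624060150376, 26.214, 5.0736)
# Values copied from the final eval metrics (runtime 0:00:04.49 is 4.49 seconds).
final = estimate_counts(0.9699, 29.575, 4.49)

print(mid_training)  # (133, 131, 2)
print(final)         # (133, 129, 4)
```

Under these assumptions, the gap between the two accuracy values corresponds to only two images out of roughly 133.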


Congrats, you have now trained your model! See the testing_model.ipynb notebook for a way to test it.