
# Image Classification with Huggingface's Deep Learning Model

Image classification is the task of assigning a label to an image from a predefined set of categories. It's one of the core tasks in Computer Vision that, despite its simplicity, has a large variety of practical applications. 

In this lab, we'll use a model from Huggingface's model hub to classify images. Deep learning models, especially Convolutional Neural Networks (CNNs), have shown impressive performance on image classification tasks. They automatically and adaptively learn spatial hierarchies of features from images.
___
## How does Image Classification work?

1. **Feature Extraction**: The model reads the image and extracts important features from it. This is done using convolutional layers, which have filters that slide over the image to capture local patterns.
2. **Classification**: After extracting features, the model uses fully connected layers to classify the image into one of the categories.
3. **Output**: The model outputs a probability distribution over all categories. The category with the highest probability is the model's prediction.

Keep in mind that deeper networks (those with more layers) can capture more complex patterns, but they are also more computationally expensive to train and run.
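The three steps above can be sketched in a few lines of plain Python. This is a toy illustration, not the model used in this lab: the 4x4 image, the 2x2 filter, and the class weights are made-up values, and `convolve2d`, `softmax`, and `classify` are hypothetical helper names chosen for this example.

```python
import math

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(image, kernel, weights, labels):
    # 1. Feature extraction: one convolutional filter followed by ReLU
    feature_map = convolve2d(image, kernel)
    features = [max(0.0, v) for row in feature_map for v in row]
    # 2. Classification: one fully connected layer (a dot product per class)
    logits = [sum(w * f for w, f in zip(class_w, features)) for class_w in weights]
    # 3. Output: probability distribution and the most likely label
    probs = softmax(logits)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs

# Made-up 4x4 "image" with a vertical edge between dark (0) and bright (1) pixels
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
kernel = [[-1, 1], [-1, 1]]          # responds to dark-to-bright vertical edges
labels = ["edge", "flat"]
weights = [[0.5] * 9, [-0.5] * 9]    # made-up weights, one row per class

label, probs = classify(image, kernel, weights, labels)
print(label, probs)
```

A real network learns its filters and weights from data and stacks many such layers, but the flow from pixels to features to probabilities is the same.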

Before running the model, let's take a closer look at how such a network is structured.

---
## Understanding Image Classification at a Low Level

The image below provides a visual representation of a Convolutional Neural Network (CNN), which is commonly used in image classification tasks.

![CNN Architecture](https://www.mdpi.com/applsci/applsci-12-12873/article_deploy/html/images/applsci-12-12873-g001.png)

---
A CNN processes an image through multiple layers:
- **Input Layer**: Takes the raw image pixels.
- **Hidden Layers**: Layers in between the input and output, responsible for feature extraction and transformation.
- **Output Layer**: The final layer that provides the classification result based on the processed features.

Each layer transforms the image data, capturing patterns and features that help the model make its final prediction.


---
# The model we'll be using in this notebook is a variant of Microsoft's BEiT model
## BEiT Model Overview

The BEiT model is a Vision Transformer pre-trained in a self-supervised manner on ImageNet-22k and fine-tuned on the same dataset at resolution 224x224. It's capable of image classification tasks and introduced a unique pre-training objective to predict visual tokens from masked patches. The model processes images as sequences of fixed-size patches and employs relative position embeddings for capturing spatial relationships among patches. The model was introduced by Hangbo Bao, Li Dong, and Furu Wei in the paper titled "BEiT: BERT Pre-Training of Image Transformers".

- **Origin**: Introduced by Hangbo Bao, Li Dong, and Furu Wei in the paper titled "BEiT: BERT Pre-Training of Image Transformers".
- **Pre-training**: Pre-trained on ImageNet-22k dataset in a self-supervised manner.
- **Fine-tuning**: Fine-tuned on the same dataset at resolution 224x224.
- **Architecture**: Vision Transformer (ViT) that processes images as sequences of fixed-size patches.
- **Position Embeddings**: Employs relative position embeddings instead of absolute ones.
- **Pre-training Objective**: Unique objective to predict visual tokens from masked patches.
- **Classification**: Capable of image classification tasks.
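
To make the "sequence of fixed-size patches" idea more concrete, here is a small sketch in plain Python. The 16-pixel patch size is read from the `patch16` part of the model name used later in this notebook; `num_patches` and `split_into_patches` are hypothetical helper names. The real model also turns each patch into an embedding vector, which is omitted here.

```python
def num_patches(image_size, patch_size):
    """Number of non-overlapping square patches in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side * per_side

def split_into_patches(image, patch_size):
    """Split a 2D grid (list of rows) into a row-major sequence of patches."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [row[left:left + patch_size]
                     for row in image[top:top + patch_size]]
            patches.append(patch)
    return patches

# A 224x224 image cut into 16x16 patches gives a 14x14 grid
print(num_patches(224, 16))  # 196

# Tiny 4x4 example with 2x2 patches so the result is easy to inspect
tiny = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = split_into_patches(tiny, 2)
print(len(patches))  # 4
print(patches[0])    # [[0, 1], [4, 5]]
```

The transformer then treats this list of patches much like a sentence of words, which is why position embeddings are needed to tell it where each patch came from.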
___

### Installing the requirements to run Huggingface on Paperspace

This code cell installs the Python packages used in this notebook.



In [None]:
!pip install -q --upgrade transformers torch torchvision torchaudio
!pip install -q tokenizers==0.14 evaluate
!pip install -q bitsandbytes transformers accelerate gradio thread6

---
### The model is about 414 megabytes

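This code cell does the following operations:

- Imports necessary libraries and modules.
- Creates the classifier pipeline (the model is downloaded the first time this runs).
- Downloads and displays the image to be classified.
- Classifies the image and prints the result.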


In [None]:
import requests
from PIL import Image
from io import BytesIO
import matplotlib.pyplot as plt
from transformers import pipeline

# Define the classifier pipeline
classifier = pipeline(model="microsoft/beit-base-patch16-224-pt22k-ft22k")

# URL of the image
image_url = "https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png"

# Load and display the image
response = requests.get(image_url)
response.raise_for_status()  # Stop early if the download failed
img = Image.open(BytesIO(response.content))
plt.imshow(img)
plt.axis('off')  # Turn off axis numbers and ticks
plt.show()

# Classify the image and print the result
result = classifier(image_url)
print(result)


## Interpreting the Model's Output

When we provide an image to the model, the pipeline returns a list of predictions, sorted from the highest score to the lowest. By default it returns the five highest-scoring labels. Each prediction consists of:
- **Label**: The category or class the model believes the image belongs to.
- **Score**: A value between 0 and 1 indicating how confident the model is in that label.

The higher the score, the more confident the model is that the image belongs to the corresponding label. In this notebook, we visualize these predictions using a horizontal bar chart, where each bar represents a label, and the length of the bar indicates the score.
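
As a small sketch of how such predictions can be read in code: the plotting cells in this notebook rely on each prediction being a dictionary with `label` and `score` keys, and the example below follows that structure. The values in `example_result` are invented for illustration and are not real model output; `top_prediction` and `format_predictions` are hypothetical helper names.

```python
def top_prediction(result):
    """Return the (label, score) pair with the highest score."""
    best = max(result, key=lambda item: item["score"])
    return best["label"], best["score"]

def format_predictions(result):
    """Return one 'label: percentage' line per prediction, highest score first."""
    ranked = sorted(result, key=lambda item: item["score"], reverse=True)
    return [f"{item['label']}: {item['score']:.1%}" for item in ranked]

# Invented values that mimic the structure of the pipeline's output
example_result = [
    {"label": "macaw", "score": 0.91},
    {"label": "parrot", "score": 0.05},
    {"label": "toucan", "score": 0.01},
]

label, score = top_prediction(example_result)
print(f"Top prediction: {label} ({score:.1%})")
for line in format_predictions(example_result):
    print(line)
```

The same helpers should work on the `result` variable from the cell above, since it has the same list-of-dictionaries shape.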
___


This code cell does the following operations:

- Creates a visualization using `matplotlib`.
- Plots a horizontal bar chart to visualize model predictions.
- Opens and displays the image that was classified.


In [None]:
# Extract scores and labels from the result
scores = [item['score'] for item in result]
labels = [item['label'] for item in result]

# Plotting
fig, ax = plt.subplots(figsize=(10, 6))

# Create horizontal bar chart
bars = ax.barh(labels, scores, color='skyblue')

# Invert the y-axis to have the highest score at the top
ax.invert_yaxis()

# Set labels for x and y axis
ax.set_xlabel('Prediction Score')
ax.set_ylabel('Label')
ax.set_title('Model Predictions')

# Display the scores just to the right of each bar
# (left-aligned so that labels on very short bars do not overlap the y-axis)
for bar in bars:
    width = bar.get_width()
    ax.text(width + 0.01, bar.get_y() + bar.get_height()/2,
            f'{width:.2%}',
            ha='left', va='center', color='black')

# Display the image and the chart
img = Image.open(BytesIO(response.content))
plt.figure(figsize=(5,5))
plt.imshow(img)
plt.axis('off')

plt.tight_layout()
plt.show()

---
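The interactive code cell further below does the following operations:

- Imports necessary libraries and modules.
- Defines a function that classifies an image from a URL and plots the predictions as a horizontal bar chart using `matplotlib`.
- Opens and displays the image that was classified.
- Builds a small `ipywidgets` interface with a text box for the image URL and a button that runs the classification.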
___

The interactive cell below asks for an image URL:
- Go to Google Images.
- Search for the image you want.
- Right-click the image and click "Copy image link".
- Run the cell, then paste the link into the "Image URL:" text box that appears.
- Click the "Classify and Plot" button.

In [None]:
from transformers import pipeline
import matplotlib.pyplot as plt
import requests
from PIL import Image
from io import BytesIO
from ipywidgets import widgets, HBox, VBox, Layout, Output
from IPython.display import display, clear_output

def classify_and_plot(image_url):
    """
    Classifies the image from the given URL using HuggingFace pipeline and plots the results.
    
    Args:
    - image_url (str): URL of the image to be classified.
    """
    # Classify the image and store the result
    result = classifier(image_url)

    # Extract scores and labels from the result
    scores = [item['score'] for item in result]
    labels = [item['label'] for item in result]

    # Plotting the classified results
    fig, ax = plt.subplots(figsize=(10, 6))

    # Create horizontal bar chart
    bars = ax.barh(labels, scores, color='skyblue')

    # Invert the y-axis to have the highest score at the top
    ax.invert_yaxis()

    # Set labels for x and y axis
    ax.set_xlabel('Prediction Score')
    ax.set_ylabel('Label')
    ax.set_title('Model Predictions')

    # Display the scores just to the right of each bar
    # (left-aligned so that labels on very short bars do not overlap the y-axis)
    for bar in bars:
        width = bar.get_width()
        ax.text(width + 0.01, bar.get_y() + bar.get_height()/2,
                f'{width:.2%}',
                ha='left', va='center', color='black')

    # Display the image and the chart
    response = requests.get(image_url)
    img = Image.open(BytesIO(response.content))
    plt.figure(figsize=(5,5))
    plt.imshow(img)
    plt.axis('off')

    plt.tight_layout()
    plt.show()

# Define the classifier pipeline
classifier = pipeline(model="microsoft/beit-base-patch16-224-pt22k-ft22k")

# Create an Output widget to capture and display the function's output
output_widget = Output()

# Create a Text widget for the image URL
image_url_widget = widgets.Text(
    value='',
    placeholder='Enter image URL',
    description='Image URL:',
    disabled=False
)

# Create a Button widget to trigger the classification and plotting
classify_button = widgets.Button(
    description='Classify and Plot',
    button_style='info', # 'success', 'info', 'warning', 'danger' or ''
    tooltip='Click to classify and plot',
    icon='check'
)

# Define the button click event
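def on_button_click(button):
    with output_widget:
        clear_output(wait=True)
        url = image_url_widget.value.strip()
        if not url:
            # Nothing to classify yet, so ask for input instead of failing
            print("Please enter an image URL first.")
            return
        classify_and_plot(url)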

# Bind the button click event to the Button widget
classify_button.on_click(on_button_click)

# Create a UI with the widgets
ui = VBox([image_url_widget, classify_button, output_widget])

# Display the UI
display(ui)

---
## Huggingface and Pipelines

[Huggingface](https://huggingface.co/) provides a platform for the community to upload and share their trained models. Their ecosystem revolves around the Transformers library, which offers implementations of state-of-the-art deep learning models for various tasks.

One of the main features Huggingface offers is the **pipeline**. A pipeline abstracts away the complexities of setting up a model for inference. Instead of worrying about pre-processing (tokenization for text, resizing and normalization for images), model loading, and post-processing, you can simply use a pipeline to get predictions. It's like a high-level helper that streamlines the process from input to output.

For instance, the image classification task we're doing in this notebook uses a pipeline. We simply provide an image, and the pipeline handles the rest, giving us the predicted labels and their scores.
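
To give a sense of what the pipeline is hiding, here is a rough sketch of the same three stages written out by hand with the Transformers `AutoImageProcessor` and `AutoModelForImageClassification` classes. It is an approximation under stated assumptions, not the pipeline's actual implementation: `classify_manually` and `rank_labels` are hypothetical helper names, and the exact scores may differ slightly from the pipeline's.

```python
def classify_manually(image,
                      model_id="microsoft/beit-base-patch16-224-pt22k-ft22k",
                      top_k=5):
    """Rough manual equivalent of the image classification pipeline.

    `image` is expected to be an RGB PIL image. The heavy libraries are
    imported inside the function so that defining it stays cheap.
    """
    import torch
    from transformers import AutoImageProcessor, AutoModelForImageClassification

    # 1. Pre-processing: resize and normalize the image into a tensor
    processor = AutoImageProcessor.from_pretrained(model_id)
    inputs = processor(images=image, return_tensors="pt")

    # 2. Model loading and forward pass
    model = AutoModelForImageClassification.from_pretrained(model_id)
    model.eval()
    with torch.no_grad():
        logits = model(**inputs).logits

    # 3. Post-processing: logits -> probabilities -> top-k labels
    probs = logits.softmax(dim=-1)[0].tolist()
    return rank_labels(probs, model.config.id2label, top_k)


def rank_labels(probs, id2label, top_k=5):
    """Pair each probability with its label and keep the top_k highest."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return [{"label": id2label[i], "score": p} for i, p in ranked[:top_k]]
```

With the `img` object loaded earlier in this notebook, calling `classify_manually(img.convert("RGB"))` should return a list in the same shape as the pipeline's output. The pipeline version is a single line, which is the main point of using it.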

### Benefits of Pipelines:
1. **Simplicity**: No need to deal with the intricacies of model setup.
2. **Versatility**: Pipelines support a wide range of tasks, from text classification to image recognition.
3. **Efficiency**: Quickly switch between models and tasks with minimal code changes.
___


## Exploring the Huggingface Model Hub

The [Huggingface Model Hub](https://huggingface.co/models) is a repository of thousands of pre-trained models. You can explore models for various tasks, including image classification, text generation, translation, and more.

### Steps to Use a New Model:
1. Visit the [Model Hub](https://huggingface.co/models).
2. Use the filters to narrow down models suitable for "Image Classification".
3. Choose a model and take note of its ID (usually in the format `username/model-name`).
4. Replace the model ID in the code below with your chosen model's ID.
5. Provide an image and get the prediction!


In [None]:
# Replace 'your-model-id' with the model ID you chose from the Model Hub
model_id = 'your-model-id'

# Load the pipeline for your chosen model
new_pipeline = pipeline('image-classification', model=model_id)

# Classify an image (replace 'path_to_your_image.jpg' with your image path)
image_path = 'path_to_your_image.jpg' 
result = new_pipeline(image_path)
print(result)


- To set the `image_path` variable, first upload an image using the "Upload Files" button on the left side of the screen.
- After the upload, right-click the image and select the "Copy Path" option.
- Paste the copied path in place of `'path_to_your_image.jpg'`.
- Make sure the path is inside quotes!
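
If the pipeline complains about the path, a quick check like the one below can help narrow down the problem before loading the model. This is only a sketch: `check_image_path` is a hypothetical helper, and the list of extensions in `IMAGE_SUFFIXES` is an assumption for illustration rather than the full set of formats the pipeline accepts.

```python
from pathlib import Path

# Assumed set of common image extensions, for illustration only
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".webp"}

def check_image_path(path_str):
    """Return a list of problems found with the path (an empty list means it looks fine)."""
    problems = []
    path = Path(path_str.strip())
    if not path.is_file():
        problems.append(f"No file found at {path}")
    if path.suffix.lower() not in IMAGE_SUFFIXES:
        problems.append(f"Unexpected file extension: {path.suffix or '(none)'}")
    return problems

image_path = 'path_to_your_image.jpg'  # placeholder, replace with your copied path
for problem in check_image_path(image_path):
    print(problem)
```

If nothing is printed, the path points to an existing file with a familiar image extension and can be passed to the pipeline.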