## Introduction to Hugging Face and Pipelines

**Hugging Face** is a company that has created a platform for building, training, and deploying machine learning models, especially those based on the Transformer architecture. One of the most popular aspects of Hugging Face is their Transformers library, which provides pre-trained models for a variety of tasks. 

**Pipelines** are a high-level component of the Transformers library. They abstract away most of the complicated details and provide a simple interface to use these models for various tasks such as text classification, translation, summarization, and more.
___

## Setup and Installation

Before we dive in, we need to set up our environment. This means installing the Hugging Face Transformers library along with the other dependencies needed to run this notebook on Paperspace. In a notebook cell, run the following commands:

In [None]:
!pip install -q --upgrade transformers torch torchvision torchaudio
!pip install -q tokenizers==0.14 evaluate
!pip install -q bitsandbytes transformers accelerate gradio thread6

---
## Overview of Different Pipelines

Pipelines can be used for a multitude of tasks. Some of the common ones include:

- **Text Classification**: Categorizing a piece of text into one or multiple categories.
- **Token Classification**: Classifying individual words or tokens in a sentence.
- **Text Generation**: Generating coherent and contextually relevant text based on a given prompt.
- **Translation**: Translating text from one language to another.
- **Summarization**: Providing a concise summary of a longer text.
- ... and many more!

For our demonstration, we'll focus on a single task: **Zero-Shot Image Classification**.

---
## How does Zero-Shot Image Classification work?

At a high level, zero-shot classification combines the power of natural language processing and vision. The idea is to represent both the image and possible labels (categories) in a shared embedding space. If the embeddings of the image and a label are close in this space, it means the image likely belongs to that category.

**Here's a simplified step-by-step process:**
   - The image is passed through a vision model (such as a CNN or a Vision Transformer) to get an image embedding.
   - Each potential label (like "cat" or "dog") is passed through a text model to get a label embedding.
   - The similarity between the image embedding and each label embedding is computed (for example, cosine similarity, where a higher value means the two embeddings are closer).
   - The label whose embedding is closest to the image embedding is predicted as the category of the image.
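
The steps above can be sketched with toy numbers. This is only an illustration under stated assumptions, not how CLIP is implemented: the three-dimensional vectors are made up (real embeddings come from the model's image and text encoders and have hundreds of dimensions), and the helper names `cosine_similarity` and `softmax` are our own. The sketch measures closeness with cosine similarity, where a higher value means closer, and turns the similarities into scores that sum to 1.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def softmax(values):
    """Turn arbitrary numbers into positive scores that sum to 1."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up embeddings, for illustration only.
image_embedding = [0.9, 0.1, 0.2]
label_embeddings = {
    "cat": [0.8, 0.2, 0.1],
    "dog": [0.1, 0.9, 0.3],
    "car": [0.0, 0.2, 0.9],
}

labels = list(label_embeddings)
similarities = [cosine_similarity(image_embedding, label_embeddings[name]) for name in labels]
scores = softmax(similarities)

# The predicted label is the one with the highest score.
best_label = labels[scores.index(max(scores))]
print(best_label)
```

With these invented vectors the image embedding points in nearly the same direction as the "cat" embedding, so "cat" wins. Real CLIP-style models also rescale the similarities before the softmax step, which makes the winning score stand out more sharply than it does here.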

---
### Importing Libraries and Model Initialization

In the realm of data analysis and machine learning, there are a multitude of tools and libraries at our disposal. Think of them as the various departments in a company, each specializing in a specific task. Just as a business would call upon its marketing department to launch a campaign, we call upon specific libraries to execute certain functions.

In this cell, we're:
1. **Importing Libraries:** Bringing in specialized tools to handle tasks like fetching images from the internet and processing them.
2. **Model Initialization:** Setting up our machine learning model, which in this context, is like hiring an expert. We're using a model specifically designed for vision tasks, making it our 'visual expert' for the images we'll analyze.

This setup is crucial as it lays the foundation for the tasks we'll execute in the subsequent steps. In business terms, it's akin to ensuring you have the right team and strategy before launching a project.


**Disregard any warnings that may appear.**

In [None]:
# requests: To make HTTP requests and fetch the image from the given URL
# transformers: Hugging Face's library for using transformer models
# Image from PIL: To process and manipulate the image data
import requests
from transformers import pipeline
from PIL import Image

# Specify the checkpoint for the model we want to use.
# In this case, we're using "openai/clip-vit-large-patch14", a CLIP model that embeds both images and text so they can be compared.
checkpoint = "openai/clip-vit-large-patch14"

# Initialize the zero-shot image classification pipeline.
# This pipeline will use the specified model checkpoint and is designed for zero-shot image classification.
detector = pipeline(model=checkpoint, task="zero-shot-image-classification")

### Fetching and Displaying the Image

In this step, we are working with real-world data. We fetch an image directly from a provided URL. Imagine you're browsing the internet and you come across an image you'd like to categorize. Instead of manually downloading the image, we can programmatically fetch it using a simple line of code.

The displayed image below is what we'll be attempting to classify in the next step. It's essential to visually inspect our data (in this case, the image) to have an understanding of what we are working with.

In [None]:
# Provide the URL of the image we want to classify.
# This image is hosted by Google, and we'll fetch it directly using the requests library. (Look at the lil doge.)
url = "https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcR2v8jGQFEHwDE0bEIm2Sofs-0n5RUWyiNtY_JQw46IozVB-YPU"

# Fetch the image from the provided URL and open it using PIL's Image module.
# The "stream=True" ensures that the content isn't downloaded until accessed, and ".raw" provides the raw content.
image = Image.open(requests.get(url, stream=True).raw)

# Display the fetched image (this will render the image in the Jupyter notebook or the respective environment).
print("lil doge")
image

### Image Classification

Once we have our image, the next logical step is to determine what's depicted in it. Remember, in the world of business, quick and accurate decision-making can be crucial. If we had thousands of images, manually categorizing them would be time-consuming and prone to errors.

Here, we use a powerful machine learning model for a task called "zero-shot image classification." In layman's terms, this means that even if the model hasn't explicitly been trained on a specific category (e.g., "fox" or "bear"), it can still make educated guesses based on its knowledge. 

The results below show the model's confidence level for each category. The higher the score, the more confident the model is that the image belongs to that category. This is similar to how a business decision might be backed by data-driven insights, providing a level of confidence in the choices made.

In [None]:
# Use the detector (zero-shot image classification pipeline) to classify the image.
# We provide a list of candidate labels ["fox", "bear", "seagull", "owl"] and ask the model to classify the image among these labels.
predictions = detector(image, candidate_labels=["fox", "bear", "seagull", "owl"])

# Print the classification results.
print(predictions)
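
The printed result is a list of dictionaries, each with a `"label"` and a `"score"`, ordered from the highest score to the lowest. Below is a minimal sketch of how such a result could be read. The sample values are invented and use different labels from the cell above so they are not mistaken for real output, and `top_prediction` with its `min_score` threshold is a hypothetical helper, not part of the Transformers library.

```python
# Invented example in the same shape as the pipeline's output.
sample_predictions = [
    {"score": 0.91, "label": "cat"},
    {"score": 0.06, "label": "dog"},
    {"score": 0.03, "label": "bird"},
]

def top_prediction(predictions, min_score=0.5):
    """Return the best label, or None if the best score is below min_score."""
    best = max(predictions, key=lambda p: p["score"])
    return best["label"] if best["score"] >= min_score else None

# Print one readable line per candidate label.
for p in sample_predictions:
    print(f"{p['label']:<8} {p['score']:.1%}")

print(top_prediction(sample_predictions))
```

A threshold like `min_score` is one way to flag images where none of the candidate labels fits well; the right value depends on your labels and images.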

### Assignment: Image Classification Challenge

Now that you've seen the power of machine learning in image classification, it's time to put your knowledge to the test!

**Task:** 
1. Choose an image from the internet that you're curious about. It could be any subject or object, but preferably something not overly complex.
2. Use the tools and code structure we've discussed to fetch the image and display it.
3. Utilize the zero-shot image classification pipeline to classify your chosen image among a set of candidate labels you provide.
4. Analyze the results. Did the model correctly classify your image? Were you surprised by the outcome?

Remember, this is a hands-on way to understand the potential and limitations of machine learning models. By choosing different images and labels, you'll gain insights into how the model thinks and makes decisions.

**Hint:** You can re-use and modify the code from the previous cells. Just make sure to provide a new image URL and set of candidate labels. Be creative and have fun!
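
One thing to keep in mind when choosing labels: to our knowledge, the pipeline does not pass bare labels to the text model. It first wraps each one in a short sentence using its `hypothesis_template` parameter, which defaults to `"This is a photo of {}."`. The sketch below reproduces only that formatting step in plain Python; `build_prompts` is a hypothetical helper written for this illustration.

```python
# Assumed default template of the zero-shot image classification pipeline.
DEFAULT_TEMPLATE = "This is a photo of {}."

def build_prompts(candidate_labels, template=DEFAULT_TEMPLATE):
    """Turn each candidate label into the sentence the text model would see."""
    return [template.format(label) for label in candidate_labels]

for prompt in build_prompts(["a red sports car", "a city bus", "a bicycle"]):
    print(prompt)
```

Because labels end up inside a sentence, short descriptive phrases such as "a red sports car" tend to read more naturally than single keywords.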


In [None]:
# TODO: Provide the URL of the image you want to classify
url_assignment = "YOUR_IMAGE_URL_HERE"

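# TODO: Fetch the image from the provided URL and display it
image_assignment = Image.open(requests.get(url_assignment, stream=True).raw)
# A bare expression is only rendered when it is the last line of a cell, so we call display() explicitly here.
display(image_assignment)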

# TODO: Classify the image using the zero-shot image classification pipeline
# Provide a list of candidate labels for classification
predictions_assignment = detector(image_assignment, candidate_labels=["YOUR_LABEL_1", "YOUR_LABEL_2", "YOUR_LABEL_3", "..."])

# Print the classification results
print(predictions_assignment)
