
### **`Exploring different pretrained models on Hugging Face`**

Objective: The primary goal of this assignment is to familiarize students with the Python programming environment and the use of pretrained models on Hugging Face. This foundational knowledge will be crucial for training and fine-tuning our own models in future assignments.

1. Setup and Requirements Installation


In [None]:
!pip install transformers
!pip install datasets

Collecting datasets
  Downloading datasets-2.17.1-py3-none-any.whl (536 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl (116 kB)
Collecting multiprocess (from datasets)
  Downloading multiprocess-0.70.16-py310-none-any.whl (134 kB)
Installing collected packages: dill, multiprocess, datasets
Successfully installed datasets-2.17.1 dill-0.3.8 multiprocess-0.70.16


2. Sentiment Analysis with a Pretrained Model:

We will start with a sentiment analysis task using a pretrained model from Hugging Face. Access the model via this link:
https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest


Exercise 1:

Use the following Python code to perform sentiment analysis. Your task is to modify the text variable with different prompts and observe how the model's sentiment predictions change. Pay attention to preprocessing the text for optimal model performance.

In [None]:
# Set up dependencies and load the model
from transformers import AutoModelForSequenceClassification
from transformers import AutoTokenizer, AutoConfig
import numpy as np
from scipy.special import softmax

# Preprocess text (username and link placeholders)
def preprocess(text):
    new_text = []
    for t in text.split(" "):
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return " ".join(new_text)

MODEL = "cardiffnlp/twitter-roberta-base-sentiment-latest"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
config = AutoConfig.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
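To make the preprocessing step concrete, here is a minimal sketch that repeats the notebook's `preprocess` helper so it runs on its own, then applies it to two inputs. The sample sentences are made up for illustration and are not part of the assignment.

```python
def preprocess(text):
    """Copy of the notebook's helper, repeated so this example is self-contained."""
    new_text = []
    for t in text.split(" "):
        # Replace user mentions with a generic placeholder
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        # Replace links with a generic placeholder
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return " ".join(new_text)

# Hypothetical tweet, made up for illustration
sample = "@alice loved the demo at https://example.com today"
print(preprocess(sample))  # -> @user loved the demo at http today

# A lone "@" is left unchanged because of the len(t) > 1 check
print(preprocess("meet @ noon"))  # -> meet @ noon
```

Because the helper splits on single spaces only, a mention followed directly by punctuation (for example `@alice,`) is replaced together with its punctuation.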


# Example running case

In [None]:
# Customize your input query
text = "I'm so happy!"

# Preprocess sentence before passing to the model
text = preprocess(text)
encoded_input = tokenizer(text, return_tensors='pt')
# Pass the input to the model and get the raw output
output = model(**encoded_input)
scores = output[0][0].detach().numpy()
scores = softmax(scores)

# Print labels and scores
ranking = np.argsort(scores)
ranking = ranking[::-1]
for i in range(scores.shape[0]):
    l = config.id2label[ranking[i]]
    s = scores[ranking[i]]
    print(f"{i+1}) {l} {np.round(float(s), 4)}")

1) positive 0.9845
2) neutral 0.0098
3) negative 0.0057


# Integrate the sampling code into a single function

In [None]:
def clasifySentence(text):
    """Print every sentiment label with its softmax probability, highest first."""
    # Preprocess sentence before passing to the model
    text = preprocess(text)
    encoded_input = tokenizer(text, return_tensors='pt')
    # Pass the input to the model and get the raw output (logits)
    output = model(**encoded_input)
    scores = output[0][0].detach().numpy()
    # Convert the logits to probabilities that sum to 1
    scores = softmax(scores)

    # Print labels and scores in descending order of probability
    ranking = np.argsort(scores)
    ranking = ranking[::-1]
    for i in range(scores.shape[0]):
        label = config.id2label[ranking[i]]
        score = scores[ranking[i]]
        print(f"{i+1}) {label} {np.round(float(score), 4)}")

In [None]:
clasifySentence('I am excited for my vacation')

1) positive 0.9822
2) neutral 0.0137
3) negative 0.0041


In [None]:
clasifySentence('I am learning Flutter today.')

1) neutral 0.5437
2) positive 0.446
3) negative 0.0104


In [None]:
clasifySentence('I feel hungry.')

1) neutral 0.625
2) negative 0.2138
3) positive 0.1612


Exercise 2:

 * Select another sentiment analysis model from Hugging Face and compare its performance with the model used in Exercise 1. Document your findings.

 * Encapsulate the prediction task into a single function, like the example we provided in Exercise 1.

List of text classification pretrained models:

https://huggingface.co/models?pipeline_tag=text-classification&sort=trending


In [None]:
# Set up dependencies and load the model
from transformers import AutoTokenizer, AutoConfig, AutoModelForSequenceClassification
import torch

# Preprocess text (no-op placeholder: the text is passed through unchanged)
def preprocess_text(text):
    return text

MODEL = "michellejieli/emotion_text_classifier"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
config = AutoConfig.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

In [None]:
# Function to perform sentiment analysis using a specified model
def predict_sentiment(text, tokenizer, model):
    preprocessed_text = preprocess_text(text)
    inputs = tokenizer(preprocessed_text, return_tensors="pt")
    # Inference only, so skip gradient tracking
    with torch.no_grad():
        outputs = model(**inputs)
    predicted_class = torch.argmax(outputs.logits, dim=1).item()
    # The returned score is the largest raw logit (unnormalized), not a probability
    return model.config.id2label[predicted_class], torch.max(outputs.logits).item()

# Example text for analysis
text = "I hate this product! It's terrible!"

# Perform sentiment analysis using the third model
sentiment3_label, sentiment3_score = predict_sentiment(text, tokenizer, model)
print("Model 3 Sentiment:", sentiment3_label, "- Score:", sentiment3_score)


Model 3 Sentiment: disgust - Score: 3.9379727840423584


In [None]:
def classifySentence(text):
  sentiment_label, sentiment_score = predict_sentiment(text, tokenizer, model)
  print("Sentiment:", sentiment_label, "- Score:", sentiment_score)


In [None]:
classifySentence("I am really upset today!")

Sentiment: sadness - Score: 2.237473964691162


In [None]:
classifySentence("I am overly excited today")

Sentiment: joy - Score: 6.344247341156006


In [None]:
classifySentence("What?")

Sentiment: surprise - Score: 4.341314315795898
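The scores printed in Exercise 2 are raw logits, while Exercise 1 reports softmax probabilities, so the two sets of numbers are not directly comparable. The following is a minimal sketch of how logits could be turned into ranked probabilities. The helper name `rank_labels`, the example logits, and the label mapping are all hypothetical and chosen only for illustration. In the notebook, the logits would come from `outputs.logits[0]` and the mapping from `model.config.id2label`.

```python
import numpy as np

def rank_labels(logits, id2label):
    """Convert raw logits to softmax probabilities and return (label, prob) pairs, highest first."""
    logits = np.asarray(logits, dtype=np.float64)
    # Subtract the maximum before exponentiating for numerical stability
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    probs = exp / exp.sum()
    # Sort class indices by descending probability
    order = np.argsort(probs)[::-1]
    return [(id2label[int(i)], float(probs[i])) for i in order]

# Hypothetical logits and labels, made up for illustration
example_logits = [2.0, 0.5, -1.0]
example_labels = {0: "joy", 1: "neutral", 2: "sadness"}

for label, prob in rank_labels(example_logits, example_labels):
    print(f"{label}: {prob:.4f}")
```

With probabilities on both sides, the outputs of the two models can be read on the same 0 to 1 scale, although the label sets still differ (sentiment polarity versus emotion categories).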


Exercise 3:
 * Utilize the ResNet 50 pretrained model for image classification. You can access the model through this link:
 https://huggingface.co/microsoft/resnet-50
 * Your task is to encapsulate the prediction task into a single function, like the example we provided in Exercise 1.
 * Pick and upload your own images.
 * Classify and visualize 3 custom images using this model.




In [None]:
from transformers import AutoImageProcessor, ResNetForImageClassification
import torch
from PIL import Image
import matplotlib.pyplot as plt

# Load the image processor and ResNet-50 model
processor = AutoImageProcessor.from_pretrained("microsoft/resnet-50")
model = ResNetForImageClassification.from_pretrained("microsoft/resnet-50")

def classify_custom_image(image_path):
    # Open the custom image and convert it to RGB (three channels) before preprocessing
    image = Image.open(image_path).convert("RGB")
    inputs = processor(image, return_tensors="pt")

    # Perform classification without tracking gradients
    with torch.no_grad():
        logits = model(**inputs).logits

    # Map the highest-scoring class index to its ImageNet label
    predicted_label = logits.argmax(-1).item()
    predicted_class = model.config.id2label[predicted_label]

    return predicted_class

# Define paths to three custom images
image_paths = ["/content/Husky.webp", "/content/taco.webp", "/content/kiwi.jpeg"]

# Classify and visualize each custom image
for image_path in image_paths:
    predicted_class = classify_custom_image(image_path)

    # Print the prediction, then display the image with its predicted class as the title
    print(f"Image: {image_path} - Predicted Class: {predicted_class}")
    plt.imshow(Image.open(image_path))
    plt.title(predicted_class)
    plt.axis("off")
    plt.show()





FileNotFoundError: [Errno 2] No such file or directory: '/content/Husky.webp'

In [None]:
classify_custom_image("/content/Husky.webp")

FileNotFoundError: [Errno 2] No such file or directory: '/content/Husky.webp'

In [None]:
classify_custom_image("/content/taco.webp")

'hotdog, hot dog, red hot'

In [None]:
classify_custom_image("/content/kiwi.jpeg")

'coil, spiral, volute, whorl, helix'
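The `FileNotFoundError` for `/content/Husky.webp` stops the loop before the other two images are classified. One possible cause is a filename mismatch, since paths on Colab are case-sensitive, but the notebook does not confirm this. As a minimal sketch, the paths could be checked before classification so that a missing upload is reported instead of raising. The helper name `split_existing` is hypothetical.

```python
from pathlib import Path

def split_existing(paths):
    """Partition paths into (found, missing) depending on whether each file exists."""
    found, missing = [], []
    for p in paths:
        (found if Path(p).is_file() else missing).append(p)
    return found, missing

# Same paths as in the notebook cell above
image_paths = ["/content/Husky.webp", "/content/taco.webp", "/content/kiwi.jpeg"]

found, missing = split_existing(image_paths)
for p in missing:
    print(f"Skipping missing file: {p}")

# Each path in `found` can then be passed to classify_custom_image
```

Filtering first keeps one missing upload from hiding the results for the remaining images.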