## **Sentiment Analysis Model Testing and Comparison**

In this section, I compare the **base model** (`cardiffnlp/twitter-xlm-roberta-base-sentiment-multilingual`) with the **fine-tuned model** (`AmaanP314/youtube-xlm-roberta-base-sentiment-multilingual`) on a set of **nuanced YouTube comments**.

### Objective:
The goal was to evaluate how both models handle **nuanced comments** that combine positive and negative feedback, where the overall sentiment typically falls between **neutral** and slightly **negative**.


In [1]:
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

In [2]:
model_finetuned_path = "AmaanP314/youtube-xlm-roberta-base-sentiment-multilingual"
model_finetuned = AutoModelForSequenceClassification.from_pretrained(model_finetuned_path)
tokenizer_finetuned = AutoTokenizer.from_pretrained(model_finetuned_path)

model_cardiff_path = "cardiffnlp/twitter-xlm-roberta-base-sentiment-multilingual"
model_cardiff = AutoModelForSequenceClassification.from_pretrained(model_cardiff_path)
tokenizer_cardiff = AutoTokenizer.from_pretrained(model_cardiff_path)


In [3]:
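# Class names, assumed to be in the order of both models' output indices (0, 1, 2).
labels = ["Negative", "Neutral", "Positive"]

def predict_sentiment(model, tokenizer, comments):
    """Return one sentiment label per comment for the given model/tokenizer pair."""
    # Tokenize the whole batch; comments longer than 64 tokens are truncated.
    inputs = tokenizer(comments, return_tensors="pt", truncation=True, padding=True, max_length=64)
    # Inference only, so gradient tracking is disabled.
    with torch.no_grad():
        logits = model(**inputs).logits
    # The highest-scoring class index is the prediction for each comment.
    predicted_class_indices = torch.argmax(logits, dim=-1).tolist()
    return [labels[idx] for idx in predicted_class_indices]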
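
The hard-coded `labels` list above assumes both checkpoints use the same index order. As a hedged alternative, the list could be derived from each model's `config.id2label` mapping instead. The sketch below is not part of the original notebook: `labels_from_config` is a hypothetical helper, and `example_id2label` is an illustrative mapping standing in for `model_finetuned.config.id2label` or `model_cardiff.config.id2label`.

```python
def labels_from_config(id2label):
    """Build an index-ordered label list from an id2label mapping.

    Keys may be ints or numeric strings (as they appear after a JSON round trip).
    """
    by_index = {int(idx): str(name) for idx, name in id2label.items()}
    # Fail loudly if the class indices are not exactly 0..n-1.
    if sorted(by_index) != list(range(len(by_index))):
        raise ValueError(f"Non-contiguous class indices: {sorted(by_index)}")
    # Capitalize so the names match the display style used in this notebook.
    return [by_index[i].capitalize() for i in range(len(by_index))]


# Illustrative mapping only; in the notebook this would come from model.config.id2label.
example_id2label = {0: "negative", 1: "neutral", 2: "positive"}
print(labels_from_config(example_id2label))
```

Building one list per model this way would catch a mismatch in label order between the two checkpoints before any predictions are compared.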

### Comments Tested:
I selected a set of **nuanced YouTube comments** that express a mix of both positive and negative feedback, as well as constructive criticism. Here are a few examples:

- "The video was quite informative, though I feel it missed some key points."
- "I loved the content, but the background noise was really distracting."
- "Great explanations, but I wish the video was a little shorter. It felt like it dragged at times."
- "Nice video, but I wasn’t a fan of the background music."

In [15]:
comments = [
    "The video was quite informative, though I feel it missed some key points.",
    "I loved the content, but the background noise was really distracting.",
    "Great explanations, but I wish the video was a little shorter. It felt like it dragged at times.",
    "I can’t believe how fast the speaker talks, but the information was valuable.",
    "The video quality was perfect, but I found the pacing a bit off.",
    "The video gave some good insights, though I expected more real-world examples.",
    "The tutorial was clear and easy to follow, but it would’ve been nice to have more advanced content.",
    "I was hoping for more tips, but the video still gave me a decent understanding of the topic.",
    "Nice video, but I wasn’t a fan of the background music.",
    "The content was good, but I don’t think it’s suitable for beginners. Could be more beginner-friendly."
]

predictions_finetuned = predict_sentiment(model_finetuned, tokenizer_finetuned, comments)
predictions_cardiff = predict_sentiment(model_cardiff, tokenizer_cardiff, comments)

print("Comparison of Sentiments:")
print("-" * 50)
for comment, finetuned_sentiment, cardiff_sentiment in zip(comments, predictions_finetuned, predictions_cardiff):
    print(f"Comment: {comment}")
    print(f"Base Model Prediction: {cardiff_sentiment}")
    print(f"Fine-Tuned Model Prediction: {finetuned_sentiment}")
    print("-" * 50)

Comparison of Sentiments:
--------------------------------------------------
Comment: The video was quite informative, though I feel it missed some key points.
Base Model Prediction: Negative
Fine-Tuned Model Prediction: Negative
--------------------------------------------------
Comment: I loved the content, but the background noise was really distracting.
Base Model Prediction: Negative
Fine-Tuned Model Prediction: Negative
--------------------------------------------------
Comment: Great explanations, but I wish the video was a little shorter. It felt like it dragged at times.
Base Model Prediction: Negative
Fine-Tuned Model Prediction: Neutral
--------------------------------------------------
Comment: I can’t believe how fast the speaker talks, but the information was valuable.
Base Model Prediction: Positive
Fine-Tuned Model Prediction: Positive
--------------------------------------------------
Comment: The video quality was perfect, but I found the pacing a bit off.
Base Model 




#### Results:
- Both models agreed on several comments, especially those with a **clearly negative** tone (e.g., criticism of pacing or background noise).
- The **fine-tuned model** performed better on **mixed-feedback** comments, identifying them as **neutral** where **positive and negative aspects were balanced**.
- For example, the comment **"Great explanations, but I wish the video was a little shorter"** was classified as **neutral** by the fine-tuned model, whereas the base model incorrectly predicted **negative**.
- The fine-tuned model also did better on comments where **expectations were not fully met**, such as **"The video gave some good insights, though I expected more real-world examples"**, which it classified as **neutral** rather than **positive**.
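
A quick way to quantify the agreement described above is to tally the prediction pairs. The following is a minimal sketch, not part of the original notebook: `summarize_agreement` is a hypothetical helper, and the two lists contain only the four prediction pairs that appear in full in the printed output above (in the notebook, `predictions_cardiff` and `predictions_finetuned` would be passed instead).

```python
from collections import Counter


def summarize_agreement(base_preds, finetuned_preds):
    """Return (number of matching predictions, Counter of (base, fine-tuned) pairs)."""
    if len(base_preds) != len(finetuned_preds):
        raise ValueError("Prediction lists must have the same length")
    pairs = Counter(zip(base_preds, finetuned_preds))
    agree = sum(count for (base, tuned), count in pairs.items() if base == tuned)
    return agree, pairs


# First four prediction pairs, copied from the printed comparison.
base = ["Negative", "Negative", "Negative", "Positive"]
finetuned = ["Negative", "Negative", "Neutral", "Positive"]

agree, pairs = summarize_agreement(base, finetuned)
print(f"Agreement: {agree}/{len(base)}")
for (base_label, tuned_label), count in sorted(pairs.items()):
    print(f"base={base_label:<8} fine-tuned={tuned_label:<8} count={count}")
```

The off-diagonal pairs (where the two labels differ) are the comments worth reading by hand, since they show where fine-tuning changed the outcome.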

#### Observations:
- The fine-tuned model was more adept at handling **nuanced sentiment** in YouTube-specific comments, especially those that were **constructive or mixed**.
- The base model showed limitations in identifying **neutral sentiment** in comments that contained both **positive and negative** feedback. It often classified these comments as **negative**, which suggests it weighs the criticism more heavily than the overall tone.
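
One way to probe the base model's tendency toward **negative** would be to look at class probabilities rather than only the top label: a small gap between the top two classes indicates a near-tie, not a confident call. The sketch below is an assumption-laden illustration, not the notebook's method. The logits are made up for demonstration, and `softmax` and `top_two_margin` are hypothetical helpers written in plain Python; inside the notebook, `torch.softmax(logits, dim=-1)` would serve the same purpose.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_two_margin(probs):
    """Gap between the most likely and second most likely class."""
    ranked = sorted(probs, reverse=True)
    return ranked[0] - ranked[1]


labels = ["Negative", "Neutral", "Positive"]
# Made-up logits for illustration only; these are NOT model outputs.
hypothetical_logits = [1.2, 1.0, -0.5]

probs = softmax(hypothetical_logits)
for label, prob in zip(labels, probs):
    print(f"{label}: {prob:.3f}")
print(f"Top-two margin: {top_two_margin(probs):.3f}")
```

A small margin on a mixed comment would mean the label could flip with minor rewording, which is worth knowing before drawing conclusions from a single predicted class.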

#### Conclusion:
On this small sample, the fine-tuned model is **better suited** to sentiment analysis of YouTube comments, particularly those with **mixed or neutral sentiment**. This suggests that fine-tuning on domain-specific data improves performance on domain-specific tasks.

