In [None]:
import pandas as pd
from transformers import pipeline

# Load the dataset (with 'talks_about' column already present)
df = pd.read_csv("classified_car_reviews.csv")

# Initialize the sentiment analysis pipeline
sentiment_analyzer = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")

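# Function to classify sentiment and add percentage confidence
def detect_sentiment_with_score(text):
    # Missing or non-string reviews (e.g. NaN) cannot be classified
    if not isinstance(text, str) or not text.strip():
        return "error", 0.0
    try:
        # truncation=True keeps long reviews within the model's 512-token limit
        result = sentiment_analyzer(text, truncation=True)[0]
        sentiment = result['label']  # 'POSITIVE' or 'NEGATIVE'
        score = result['score']  # Confidence score for the predicted label
        return sentiment, round(score * 100, 2)  # Return sentiment and percentage
    except Exception as e:
        # Report the failure instead of hiding it silently
        print(f"Sentiment analysis failed: {e}")
        return "error", 0.0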

# Apply sentiment analysis and add confidence percentage
df[['sentiment', 'sentiment_percentage']] = df['Review'].apply(
    lambda x: pd.Series(detect_sentiment_with_score(x))
)

# Save the updated dataset
df.to_csv("updated_car_reviews_with_sentiment_and_percentage.csv", index=False)

# Print confirmation
print("Dataset updated with sentiment and percentages saved as updated_car_reviews_with_sentiment_and_percentage.csv.")


The distilbert-base-uncased-finetuned-sst-2-english model, used here through the sentiment-analysis pipeline, is a widely used baseline for English sentiment analysis. It is DistilBERT fine-tuned on the Stanford Sentiment Treebank (SST-2), a standard benchmark of movie-review sentences labeled positive or negative. DistilBERT is a distilled version of BERT that is smaller and faster while retaining most of BERT's accuracy. The model assigns each text one of two labels, POSITIVE or NEGATIVE, together with a confidence score. It has no neutral class, so mixed or purely factual reviews are still forced into one of the two labels. Because SST-2 consists of movie reviews, accuracy on car reviews, social media posts, and other user-generated content should be checked on a labeled sample rather than assumed. The model integrates with the Hugging Face Transformers library in a few lines of code, as the cell above shows.
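Since the model only ever returns POSITIVE or NEGATIVE, one possible way to handle borderline reviews is to treat low-confidence predictions as uncertain. The sketch below assumes the result dictionary has the `label` and `score` keys used in the cell above; the function name `to_signed_score` and the 0.75 cutoff are illustrative assumptions, not part of the Transformers library or a recommended setting.

```python
def to_signed_score(result, min_confidence=0.75):
    """Map one pipeline result to (label, signed score in [-1, 1]).

    `result` is a dict such as {'label': 'POSITIVE', 'score': 0.98}.
    `min_confidence` is a hypothetical cutoff chosen for illustration.
    """
    label = result["label"]
    score = float(result["score"])
    # Below the cutoff, do not trust either label
    if score < min_confidence:
        return "UNCERTAIN", 0.0
    # Positive predictions keep their sign, negative ones are flipped
    signed = score if label == "POSITIVE" else -score
    return label, round(signed, 4)
```

Applied to the output of `sentiment_analyzer(text)[0]`, this gives a single numeric column that can be averaged per car model, with uncertain reviews contributing zero. The cutoff would need tuning against labeled car reviews.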

The sentiment-analysis pipeline is well suited to near-real-time sentiment monitoring and feedback loops, since DistilBERT inference is fast and the model performs well on English text. Typical uses include understanding customer feedback, tracking brand sentiment, and monitoring social media trends. Because the checkpoint is already fine-tuned for sentiment classification, it can be used without further training, which makes it accessible for projects with limited labeled data or resources. Its lightweight DistilBERT architecture also makes deployment on CPU-only or otherwise resource-constrained machines practical, although throughput will be lower than on a GPU.
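For larger datasets, throughput usually improves when reviews are passed to the pipeline as a list rather than one at a time. The following is a minimal sketch under that assumption: `score_reviews` is a hypothetical helper, `analyzer` is expected to behave like the `sentiment_analyzer` pipeline above (accepting a list of strings plus the `batch_size` and `truncation` arguments), and the default batch size of 32 is only a starting point.

```python
def score_reviews(analyzer, texts, batch_size=32):
    """Return one (label, percentage) pair per input, in input order.

    Inputs that are not non-empty strings (e.g. NaN) get ("error", 0.0).
    """
    results = [("error", 0.0)] * len(texts)
    # Keep only usable strings and remember their original positions
    valid = [(i, t) for i, t in enumerate(texts)
             if isinstance(t, str) and t.strip()]
    if not valid:
        return results
    positions = [i for i, _ in valid]
    cleaned = [t for _, t in valid]
    # A single call on a list lets the pipeline batch the forward passes
    predictions = analyzer(cleaned, batch_size=batch_size, truncation=True)
    for pos, pred in zip(positions, predictions):
        results[pos] = (pred["label"], round(pred["score"] * 100, 2))
    return results
```

With the DataFrame from the cell above, the two columns could then be filled with `pd.DataFrame(score_reviews(sentiment_analyzer, df['Review'].tolist()), index=df.index)`. Whether batching helps on CPU depends on review lengths, so it is worth timing both approaches.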