PHASE 1

SENTIMENT ANALYSIS CODE WITHOUT HUGGING FACE.

## Interactive Sentiment Analysis

### Subtask:
Allow the user to input a sentence and get a real-time sentiment prediction (Positive/Negative) using the updated `predict_sentiment` function.

In [15]:
import pandas as pd
import numpy as np
import re
import warnings
warnings.filterwarnings("ignore")

In [16]:
import nltk
nltk.download("punkt", quiet=True)
nltk.download("punkt_tab", quiet=True)
nltk.download("wordnet", quiet=True)
nltk.download("omw-1.4", quiet=True)
nltk.download("stopwords", quiet=True)

from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

STOP = set(stopwords.words("english"))
LEMMATIZER = WordNetLemmatizer()

def preprocess(text):
    """NLP pipeline: lowercase, remove non-letters, tokenize, remove stop-words, lemmatize."""
    if pd.isna(text) or not isinstance(text, str):
        return ""
    text = text.lower().strip()
    text = re.sub(r"[^a-z\s]", " ", text)  # keep only lowercase letters and whitespace
    tokens = word_tokenize(text)
    tokens = [t for t in tokens if t not in STOP and len(t) > 1]
    tokens = [LEMMATIZER.lemmatize(t) for t in tokens]
    return " ".join(tokens)

# Assumes `df` is an already-loaded DataFrame with a "text" column and a 0/1 "label" column.
df["text_clean"] = df["text"].astype(str).apply(preprocess)
print("Example:")
print("  Original:", df["text"].iloc[0][:80])
print("  Cleaned: ", df["text_clean"].iloc[0][:80])

Example:
  Original: Dumb is as dumb does, in this thoroughly uninteresting, supposed black comedy. E
  Cleaned:  dumb dumb thoroughly uninteresting supposed black comedy essentially start chris


In [17]:
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

X_train, X_test, y_train, y_test = train_test_split(
    df["text_clean"], df["label"], test_size=0.2, random_state=42, stratify=df["label"]
)

vectorizer = TfidfVectorizer(max_features=10_000, ngram_range=(1, 2), min_df=2)
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

model = LogisticRegression(max_iter=500)
model.fit(X_train_vec, y_train)

y_pred = model.predict(X_test_vec)
print("Test accuracy:", round(accuracy_score(y_test, y_pred), 4))
print(classification_report(y_test, y_pred, target_names=["Negative", "Positive"]))

Test accuracy: 0.852
              precision    recall  f1-score   support

    Negative       0.87      0.83      0.85       503
    Positive       0.83      0.88      0.85       497

    accuracy                           0.85      1000
   macro avg       0.85      0.85      0.85      1000
weighted avg       0.85      0.85      0.85      1000



In [18]:
def predict_sentiment(sentence):
    """Given a sentence, return Positive or Negative using our NLP pipeline."""
    cleaned = preprocess(sentence)
    X = vectorizer.transform([cleaned])
    pred = model.predict(X)[0]
    return "Positive" if pred == 1 else "Negative"

# Try it on a few examples
examples = [
    "I really enjoyed the movie!",
    "This is the worst product ever.",
    "Not bad, actually quite good.",
]
for s in examples:
    print(f"  '{s}' -> {predict_sentiment(s)}")

  'I really enjoyed the movie!' -> Positive
  'This is the worst product ever.' -> Negative
  'Not bad, actually quite good.' -> Negative
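The third example is labelled Negative even though it reads as mild praise. One plausible contributor is stop-word removal: NLTK's English stop-word list includes "not", so the negation is dropped before vectorization and the model only sees "bad". The sketch below illustrates the effect in isolation. It is not the notebook's `preprocess` function: `STOP_SKETCH` is a tiny hypothetical stand-in for the NLTK list, `NEGATIONS` and `clean` are illustrative names, and tokenization is done with a plain regex so the block runs without NLTK.

```python
import re

# Hypothetical, tiny stand-in for NLTK's English stop-word list (assumption for illustration).
STOP_SKETCH = {"i", "a", "the", "is", "it", "this", "not", "no"}

# Hypothetical set of negation words we might choose to keep.
NEGATIONS = {"not", "no", "nor", "never"}

def clean(text, stop_words):
    """Lowercase, keep alphabetic tokens, drop stop words and single letters."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(t for t in tokens if t not in stop_words and len(t) > 1)

sentence = "Not bad, actually quite good."

# With "not" treated as a stop word, the negation disappears.
print(clean(sentence, STOP_SKETCH))              # bad actually quite good

# Removing negations from the stop-word set keeps "not" in the text.
print(clean(sentence, STOP_SKETCH - NEGATIONS))  # not bad actually quite good
```

In the real pipeline the equivalent change would be something like `STOP = set(stopwords.words("english")) - NEGATIONS`. Because the vectorizer uses `ngram_range=(1, 2)`, a retained "not" could then form bigrams such as "not bad", though whether this improves the prediction would need to be checked by retraining.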


In [19]:
# Change this sentence and run the cell to get the predicted sentiment
sentence = "I love this place, the food was amazing and the staff were friendly."

result = predict_sentiment(sentence)
print(f"Input: {sentence}")
print(f"Sentiment: {result}")

Input: I love this place, the food was amazing and the staff were friendly.
Sentiment: Positive


### PHASE 2 : Install Hugging Face Transformers and PyTorch

### Subtask:
Install the `transformers` library from Hugging Face and `torch`, which are necessary to work with transformer models. This step ensures all required dependencies for the new model are in place.


**Reasoning**:
The subtask requires the `transformers` library, so this cell installs it with pip, using the notebook's own Python interpreter.



In [20]:
import sys
!{sys.executable} -m pip install transformers
print("Installed 'transformers' library.")

Installed 'transformers' library.


**Reasoning**:
The subtask also requires the `torch` library, so this cell installs it with pip in the same way.



In [21]:
import sys
!{sys.executable} -m pip install torch
print("Installed 'torch' library.")

Installed 'torch' library.
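The `print` calls in the two install cells run whether or not pip actually succeeded, so the messages alone do not prove that the packages are available. A minimal sketch of a follow-up check is shown here, using only the standard library; `missing_packages` is a hypothetical helper name, not part of the original notebook.

```python
import importlib.util

def missing_packages(names):
    """Return the packages from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# An empty list means both packages can be found by the current interpreter.
print(missing_packages(["transformers", "torch"]))
```

If the list is not empty, restarting the kernel after installation is a common next step before retrying the import.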


**Reasoning**:
To initialize the Hugging Face sentiment analysis pipeline, I need to import the `pipeline` function from the `transformers` library and then call it with 'sentiment-analysis' as the task.



In [22]:
from transformers import pipeline

# Initialize the sentiment analysis pipeline. This will download a pre-trained model.
hf_sentiment_pipeline = pipeline('sentiment-analysis')

print("Hugging Face sentiment analysis pipeline initialized successfully.")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


Hugging Face sentiment analysis pipeline initialized successfully.
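The log output above warns against relying on the default model. A minimal sketch of how the pipeline could be created with the model and revision pinned is shown below. The model name and revision are copied from that log; `build_sentiment_pipeline` is a hypothetical wrapper, and the import is placed inside it so the block can be loaded even where `transformers` is not yet installed.

```python
# Values taken from the pipeline's own log message above.
MODEL_NAME = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
MODEL_REVISION = "714eb0f"

def build_sentiment_pipeline():
    """Create the sentiment pipeline with an explicit model and revision."""
    # Imported lazily: requires the transformers and torch packages,
    # and downloads the model on first use.
    from transformers import pipeline
    return pipeline("sentiment-analysis", model=MODEL_NAME, revision=MODEL_REVISION)

# Usage (not executed here because it downloads the model):
# hf_sentiment_pipeline = build_sentiment_pipeline()
```

Pinning both values is intended to keep later runs on the same weights even if the library's default changes.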


**Reasoning**:
The previous step successfully initialized the Hugging Face sentiment analysis pipeline. Before the new model can be compared with the Phase 1 classifier, `predict_sentiment` is redefined so that it calls the pipeline and maps the pipeline's `POSITIVE`/`NEGATIVE` labels to the `Positive`/`Negative` strings used earlier.



In [23]:
def predict_sentiment(sentence):
    """Given a sentence, return Positive or Negative using the Hugging Face sentiment analysis pipeline."""
    # Pass the sentence to the Hugging Face pipeline
    result = hf_sentiment_pipeline(sentence)
    # The pipeline returns a list of dictionaries, e.g., [{'label': 'POSITIVE', 'score': 0.999}]
    label = result[0]['label']

    # Map the Hugging Face labels to 'Positive' or 'Negative'
    if label == 'POSITIVE':
        return 'Positive'
    elif label == 'NEGATIVE':
        return 'Negative'
    else:
        # In case of any unexpected label, though typically it's binary
        return 'Neutral'

print("The `predict_sentiment` function has been updated to use the Hugging Face model.")


The `predict_sentiment` function has been updated to use the Hugging Face model.
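Each pipeline result also carries a `score`, which the function above discards. The sketch below shows one way a variant could expose it and flag low-confidence predictions. It rests on assumptions: `predict_with_score` and the `min_score` threshold are hypothetical, the classifier is passed in as an argument so that either `hf_sentiment_pipeline` or a stub can be used, and the stub's scores are made up for illustration rather than produced by the model.

```python
def predict_with_score(sentence, classifier, min_score=0.6):
    """Return (label, score); the label is 'Uncertain' below `min_score`."""
    result = classifier(sentence)[0]      # e.g. {'label': 'POSITIVE', 'score': 0.999}
    score = float(result["score"])
    if score < min_score:
        return "Uncertain", score
    return result["label"].capitalize(), score   # 'POSITIVE' -> 'Positive'

def stub_classifier(sentence):
    """Stand-in for hf_sentiment_pipeline with invented scores."""
    if "good" in sentence.lower():
        return [{"label": "POSITIVE", "score": 0.97}]
    return [{"label": "NEGATIVE", "score": 0.55}]

print(predict_with_score("Quite good.", stub_classifier))  # ('Positive', 0.97)
print(predict_with_score("Hmm.", stub_classifier))         # ('Uncertain', 0.55)
```

With the real pipeline the call would be `predict_with_score(sentence, hf_sentiment_pipeline)`.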


**Reasoning**:
To confirm that the `predict_sentiment` function has been successfully updated and is using the Hugging Face model, I need to test it with example sentences and observe the output, similar to how the original function was tested.



In [24]:
print('Testing the updated predict_sentiment function:')
# Try it on a few examples
examples = [
    "I really enjoyed the movie!",
    "This is the worst product ever.",
    "Not bad, actually quite good.",
    "I hate this product, it's terrible and a waste of money.",
    "The service was exceptional, I'm very happy."
]
for s in examples:
    print(f"  '{s}' -> {predict_sentiment(s)}")

# Also test with the previous `sentence` variable
print(f"\nInput: {sentence}")
print(f"Sentiment: {predict_sentiment(sentence)}")


Testing the updated predict_sentiment function:
  'I really enjoyed the movie!' -> Positive
  'This is the worst product ever.' -> Negative
  'Not bad, actually quite good.' -> Positive
  'I hate this product, it's terrible and a waste of money.' -> Negative
  'The service was exceptional, I'm very happy.' -> Positive

Input: I love this place, the food was amazing and the staff were friendly.
Sentiment: Positive
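The spot checks above cover only a handful of sentences. To compare the transformer with the Phase 1 classifier on equal terms, its predictions could be scored against the same held-out labels. The sketch below outlines the scoring step only. `hf_outputs_to_labels` and `accuracy` are hypothetical helpers, and `stub_outputs` and `stub_true` are made-up values standing in for a real call such as `hf_sentiment_pipeline(raw_texts, truncation=True)`. There, `raw_texts` would be the original, uncleaned reviews for the test rows, for example `df.loc[X_test.index, "text"].tolist()`, since the transformer has its own tokenizer and `X_test` holds the stop-word-stripped text.

```python
def hf_outputs_to_labels(outputs):
    """Map pipeline dicts such as {'label': 'POSITIVE', ...} to 1/0 labels."""
    mapping = {"NEGATIVE": 0, "POSITIVE": 1}
    return [mapping[out["label"]] for out in outputs]

def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Invented pipeline outputs and labels, for illustration only.
stub_outputs = [
    {"label": "POSITIVE", "score": 0.99},
    {"label": "NEGATIVE", "score": 0.98},
    {"label": "POSITIVE", "score": 0.71},
    {"label": "NEGATIVE", "score": 0.88},
]
stub_true = [1, 0, 0, 0]

y_pred_hf = hf_outputs_to_labels(stub_outputs)
print(accuracy(stub_true, y_pred_hf))  # 0.75
```

In the notebook itself, `accuracy_score(y_test, y_pred_hf)` and `classification_report` from Phase 1 could be reused in place of the hand-written `accuracy` helper. Passing `truncation=True` is intended to guard against reviews longer than the model's input limit.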


## Final Task

### Subtask:
Confirm that the sentiment analysis has been successfully updated to use a Hugging Face transformer model, that the analysis is working, and that comments and descriptions have been added to the relevant code sections as requested.


### Confirming the Hugging Face Model Used

To explicitly see which model the `hf_sentiment_pipeline` is using, we can inspect its `model.name_or_path` attribute.

In [26]:
print(f"The Hugging Face sentiment analysis pipeline is using the model: {hf_sentiment_pipeline.model.name_or_path}")

The Hugging Face sentiment analysis pipeline is using the model: distilbert/distilbert-base-uncased-finetuned-sst-2-english


In [25]:
# Prompt the user for a sentence
user_sentence = input("Enter a sentence for sentiment analysis: ")

# Get the sentiment prediction using the updated function
predicted_sentiment = predict_sentiment(user_sentence)

# Print the input sentence and its predicted sentiment
print(f"\nYour input: '{user_sentence}'")
print(f"Predicted sentiment: {predicted_sentiment}")

Enter a sentence for sentiment analysis: I am not well today. But I am happy that I get to spend time with my family.

Your input: 'I am not well today. But I am happy that I get to spend time with my family.'
Predicted sentiment: Positive
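The input above contains two sentences that point in different directions, yet the function returns a single label for the whole text. One possible refinement, sketched here under stated assumptions, is to classify each sentence separately. `split_sentences` and `per_sentence_sentiment` are hypothetical helpers, the regex split is a rough heuristic rather than a full sentence tokenizer, and `stub_predict` is a toy stand-in so the block runs without the model. Its outputs say nothing about what the Hugging Face model would return.

```python
import re

def split_sentences(text):
    """Split on whitespace that follows '.', '!' or '?' (rough heuristic)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def per_sentence_sentiment(text, predict):
    """Apply `predict` to each sentence and return (sentence, label) pairs."""
    return [(s, predict(s)) for s in split_sentences(text)]

def stub_predict(sentence):
    """Toy rule for illustration only: 'not' anywhere means Negative."""
    return "Negative" if "not" in sentence.lower().split() else "Positive"

text = "I am not well today. But I am happy that I get to spend time with my family."
for sent, label in per_sentence_sentiment(text, stub_predict):
    print(f"{label}: {sent}")
```

With the notebook's function, the call would be `per_sentence_sentiment(user_sentence, predict_sentiment)`.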
