
# 🎥 MOVIE SENTIMENT ANALYSIS

**Author:** T.MAHESH

---

## 📌 Project Overview
MOVIE SENTIMENT ANALYSIS is a text classification project that predicts whether a given movie review is **positive**, **neutral**, or **negative**.
It uses **TF-IDF vectorization** for feature extraction and **Logistic Regression** for classification.
The interface is built with **Gradio** to provide an interactive sentiment prediction tool that displays both the predicted label (with emoji) and the model's confidence as a bar chart.
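
To make the feature-extraction step concrete, here is a minimal pure-Python sketch of how a TF-IDF weight can be computed. It assumes the smoothed IDF, `ln((1 + n) / (1 + df)) + 1`, followed by L2 normalization of each row, which is what scikit-learn documents as the default behavior of `TfidfVectorizer`. The toy documents and the `tfidf_rows` helper are hypothetical, and the sketch covers unigrams only, whereas the project also uses bigrams.

```python
import math
from collections import Counter

# Hypothetical, already-cleaned toy reviews (not the project's dataset).
docs = ["loved movie amazing", "worst film ever", "okay movie"]

def tfidf_rows(docs):
    """Return (vocabulary, rows) where each row is an L2-normalized TF-IDF vector."""
    tokenized = [d.split() for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed inverse document frequency (assumed scikit-learn default).
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        raw = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(w * w for w in raw)) or 1.0
        rows.append([w / norm for w in raw])
    return vocab, rows

vocab, rows = tfidf_rows(docs)
for term, weight in zip(vocab, rows[0]):
    if weight > 0:
        print(f"{term}: {weight:.3f}")
```

In the first toy review, `movie` receives a lower weight than `loved` because it occurs in two of the three documents, which is the effect TF-IDF is meant to capture.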

---

## 📂 How to Run
1. **Install dependencies**
   ```bash
   pip install gradio pandas nltk scikit-learn matplotlib seaborn
   ```
2. **Run the script**
   ```bash
   python movie_sentiment_analysis.py
   ```
3. **Access the demo**
   - Open the local URL printed in the terminal.
   - Because the script calls `demo.launch(share=True)`, Gradio also tries to create a public share link, which can be opened from another machine.

---

## 📊 Model Evaluation
The script prints both accuracies after training on the embedded sample dataset; the values below are placeholders until they are copied from that output:
- **Train Accuracy**: ~X.XXX
- **Test Accuracy**: ~X.XXX

The model is a **Logistic Regression** classifier trained on **TF-IDF features** (unigrams and bigrams, capped at 3,000 features).
The data is split **80% train / 20% test** with stratified sampling, so the 30 embedded reviews leave only about six for testing.
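
As a rough illustration of what the stratified 80/20 split does, the sketch below applies `train_test_split` to labels with the same class counts as the embedded dataset (12 positive, 12 negative, 6 neutral). The placeholder review texts are hypothetical; only the label counts and the split parameters come from the project code.

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Label counts mirror the embedded dataset; the texts are placeholders.
labels = ["positive"] * 12 + ["negative"] * 12 + ["neutral"] * 6
texts = [f"review {i}" for i in range(len(labels))]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

# Stratification keeps every class represented on both sides of the split.
print("train:", Counter(y_train))
print("test: ", Counter(y_test))
```

With a test set this small, a single misclassified review shifts test accuracy by roughly 17 percentage points, so the printed score is best read as a sanity check.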

---

## 📝 Example Predictions
The reviews below are taken from the embedded sample dataset.

| Review | Predicted Sentiment |
|--------|--------------------|
| I absolutely loved this movie, it was amazing! | 🎬 Positive |
| This was the worst film I've ever seen | 💢 Negative |
| It was okay, not great but not bad | 😐 Neutral |
| Fantastic soundtrack and visuals | 🎬 Positive |
| Boring and predictable storyline | 💢 Negative |

---

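## 📊 Demo Predictions on Custom Inputs
These inputs also appear in the embedded sample dataset, so the model may have seen them during training: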
- **"Heartwarming and beautifully shot"** → 🎬 Positive
- **"Predictable ending"** → 💢 Negative
- **"Not bad, but I wouldn't watch again"** → 😐 Neutral

---

## ✅ Conclusion
The **MOVIE SENTIMENT ANALYSIS** project demonstrates how natural language processing techniques can be applied to determine the sentiment of movie reviews.
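By combining **TF-IDF vectorization** with **Logistic Regression**, the pipeline stays simple and its learned weights remain easy to inspect. With only 30 embedded reviews, the reported accuracy is illustrative and not a robust performance estimate.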
The Gradio interface makes it easy for users to interact with the model, visualize its confidence, and understand predictions in an intuitive way.


In [None]:
# ---------------- Install & Imports ----------------
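# Notebook-only shell escape: remove this line and install Gradio with pip
# beforehand when running the code as a plain .py script.
!pip install gradio --quiet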
import io
import pandas as pd
import nltk
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from nltk.corpus import stopwords
import gradio as gr

# ---------------- NLTK (fix for punkt_tab) ---------------
nltk.download('punkt')
nltk.download('punkt_tab')
nltk.download('stopwords')

# ---------------- Embedded sample dataset (offline) ---------------
csv_data = """review,sentiment
"I absolutely loved this movie, it was amazing!",positive
"This was the worst film I've ever seen",negative
"The acting was fantastic and the story was touching",positive
"Terrible plot and bad cinematography",negative
"Average movie, nothing memorable",neutral
"Brilliant performance by the lead actor",positive
"The pacing was awful and it dragged on",negative
"I enjoyed every minute of it",positive
"Poor direction and weak script",negative
"It was okay, not great but not bad",neutral
"Fantastic soundtrack and visuals",positive
"Boring and predictable storyline",negative
"A masterpiece with great emotional depth",positive
"Too long and felt unnecessary",negative
"Decent but could have been better",neutral
"Heartwarming and beautifully shot",positive
"Dialogues were cringe-worthy",negative
"Perfect balance of comedy and drama",positive
"Unrealistic and poorly executed scenes",negative
"Good entertainment for the whole family",positive
"Overrated and disappointing",negative
"Not bad, but I wouldn't watch again",neutral
"An inspiring and uplifting film",positive
"Terrible casting choices",negative
"Some good moments but overall average",neutral
"Engaging from start to finish",positive
"Plot holes everywhere",negative
"Visually stunning and emotional",positive
"Predictable ending",negative
"Nice performances but slow pace",neutral
"""
df = pd.read_csv(io.StringIO(csv_data))

# ---------------- Preprocessing ----------------
stop_words = set(stopwords.words('english'))
def clean_text(s):
    """Lowercase the text, keep alphabetic tokens only, and drop English stop words."""
    tokens = [w.lower() for w in nltk.word_tokenize(str(s)) if w.isalpha()]
    tokens = [t for t in tokens if t not in stop_words]
    return " ".join(tokens)

df['review_clean'] = df['review'].apply(clean_text)

# ---------------- Train/Test Split & Vectorize ----------------
X_train, X_test, y_train, y_test = train_test_split(
    df['review_clean'], df['sentiment'], test_size=0.2, random_state=42, stratify=df['sentiment']
)

vectorizer = TfidfVectorizer(max_features=3000, ngram_range=(1,2))
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

# ---------------- Train Model ----------------
model = LogisticRegression(max_iter=500)
model.fit(X_train_tfidf, y_train)

print(f"✅ Train acc: {model.score(X_train_tfidf, y_train):.3f}  |  Test acc: {model.score(X_test_tfidf, y_test):.3f}")

# ---------------- Prediction function (emoji + confidence plot) ----------------
label_to_emoji = {"positive": "🎬 Positive", "neutral": "😐 Neutral", "negative": "💢 Negative"}

def predict_with_visuals(text):
    """Return the emoji-labelled prediction and a bar chart of the class probabilities."""
    clean = clean_text(text)
    vec = vectorizer.transform([clean])
    probs = model.predict_proba(vec)[0]
    labels = model.classes_
    
    pred_idx = probs.argmax()
    pred_label = labels[pred_idx]
    pred_emoji_text = label_to_emoji.get(pred_label, pred_label)
    
    fig, ax = plt.subplots(figsize=(5,3))
    sns.barplot(x=list(labels), y=probs, ax=ax)
    ax.set_ylim(0,1)
    ax.set_ylabel("Confidence")
    ax.set_xlabel("Label")
    ax.set_title(f"Model confidence — predicted: {pred_emoji_text}")
    for i, p in enumerate(probs):
        ax.text(i, p + 0.02, f"{p:.2f}", ha='center')
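    plt.tight_layout()
    # Detach the figure from pyplot so repeated requests do not accumulate open figures.
    plt.close(fig)

    return pred_emoji_text, fig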

# ---------------- Gradio Interface ----------------
demo = gr.Interface(
    fn=predict_with_visuals,
    inputs=gr.Textbox(lines=3, placeholder="Type a movie review..."),
    outputs=["text", "plot"],
    title="🎥 Movie Review Sentiment (Emoji + Confidence)",
    description="Type a movie review and get sentiment (emoji + label) with a confidence bar chart.",
    allow_flagging="never"  # note: recent Gradio releases rename this parameter to flagging_mode
)

demo.launch(share=True)
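
One thing worth checking in the preprocessing above: NLTK's English stop-word list includes negations such as "not" and "no", and the `isalpha()` filter drops the `n't` token that `word_tokenize` produces for contractions. A review like "not great but not bad" therefore loses its negations before vectorization. The sketch below shows one possible variant that keeps them. It is an assumption-laden illustration, not the project's method: it swaps in a regex tokenizer to stay dependency-free, and `STOP_WORDS`, `NEGATIONS`, and `clean_text_keep_negations` are hypothetical names with hand-picked contents.

```python
import re

# Hypothetical, hand-picked subset standing in for NLTK's English stop-word list.
STOP_WORDS = {"i", "it", "was", "but", "the", "a", "and", "this", "not", "no"}
# Hypothetical set of negation words to keep even though they are stop words.
NEGATIONS = {"not", "no", "never", "nor"}

def clean_text_keep_negations(text):
    """Lowercase, expand "n't" to "not", and drop stop words except negations."""
    text = re.sub(r"n't\b", " not", str(text).lower())
    tokens = re.findall(r"[a-z]+", text)
    return " ".join(t for t in tokens if t in NEGATIONS or t not in STOP_WORDS)

print(clean_text_keep_negations("It was okay, not great but not bad"))
print(clean_text_keep_negations("Not bad, but I wouldn't watch again"))
```

Because the vectorizer also extracts bigrams, keeping "not" lets pairs such as "not bad" become features of their own. The simple `n't` expansion is only approximate (for example, "can't" becomes "ca not").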
