# Sentiment Analysis Basics

**Objective:** Understand the fundamentals of sentiment analysis — how to process text data and classify sentiment (positive, negative, neutral) using machine learning.

---
## What is Sentiment Analysis?

Sentiment analysis is a **Natural Language Processing (NLP)** task that classifies a piece of text as expressing **positive**, **negative**, or **neutral** sentiment.

**Example:**
- "I love this movie!" → Positive 🎉
- "This product is terrible." → Negative 😡

We’ll build a simple classifier with scikit-learn: TF-IDF features fed into a Logistic Regression model.

---
## Import Required Libraries

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

import seaborn as sns
import matplotlib.pyplot as plt

---
## Sample Dataset

Let’s create a small sample dataset for demonstration. In practice, you would use larger datasets like IMDb reviews or Twitter Sentiment140.

In [None]:
data = {
    'text': [
        'I love this movie!',
        'This is an amazing product.',
        'I am so happy with the service.',
        'I hate this item.',
        'This is the worst experience ever.',
        'I am not satisfied with the quality.',
        'Absolutely fantastic!',
        'Terrible and disappointing.',
        'It was okay, not great.',
        'Pretty decent overall.'
    ],
    'sentiment': [1, 1, 1, 0, 0, 0, 1, 0, 2, 2]  # 1=Positive, 0=Negative, 2=Neutral
}

df = pd.DataFrame(data)
df.head()

---
## Data Preprocessing
We'll split the dataset and convert text to numerical features using **TF-IDF**.

In [None]:
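X = df['text']
y = df['sentiment']

# Hold out 30% of the rows for testing (3 of the 10 sentences here).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Fit the vocabulary and IDF weights on the training text only, then reuse them on the test text.
# Note: scikit-learn's built-in English stop list also drops negations such as "not".
vectorizer = TfidfVectorizer(stop_words='english')
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

print('✅ TF-IDF transformation complete!')
print('Vocabulary size:', len(vectorizer.get_feature_names_out()))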
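
One side effect of `stop_words='english'` is worth checking before relying on it for sentiment. The sketch below uses `build_analyzer()` to show which tokens survive for one of the sample sentences. The variable names are illustrative and not part of the notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

sentence = "I am not satisfied with the quality."

# build_analyzer() returns the function the vectorizer applies to each document:
# lowercasing, tokenizing, and (optionally) stop-word filtering.
analyze_with_stop_list = TfidfVectorizer(stop_words="english").build_analyzer()
analyze_without_stop_list = TfidfVectorizer().build_analyzer()

tokens_filtered = analyze_with_stop_list(sentence)
tokens_full = analyze_without_stop_list(sentence)

print("With stop list:   ", tokens_filtered)
print("Without stop list:", tokens_full)
```

With the stop list enabled, the negation "not" is filtered out, so this negative sentence is represented only by neutral or positive-looking words. If negation matters for your data, one option is to omit `stop_words` and pass `ngram_range=(1, 2)` so that pairs like "not satisfied" become features.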

---
## Model Training (Logistic Regression)
We’ll train a simple **Logistic Regression classifier** to predict sentiment labels.

In [None]:
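# max_iter raises the solver's iteration cap so it has room to converge.
model = LogisticRegression(max_iter=200)
model.fit(X_train_tfidf, y_train)

y_pred = model.predict(X_test_tfidf)

print('✅ Model training complete!')
# With only 3 test sentences, treat this accuracy as a rough sanity check.
print('\nAccuracy:', accuracy_score(y_test, y_pred))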

---
## Evaluation

In [None]:
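labels = [0, 1, 2]  # fixed label order: 0=Negative, 1=Positive, 2=Neutral
class_names = ['Negative', 'Positive', 'Neutral']

# Passing labels keeps the report and matrix at three classes even if one is missing from the test split.
print('\nClassification Report:\n',
      classification_report(y_test, y_pred, labels=labels, target_names=class_names, zero_division=0))

cm = confusion_matrix(y_test, y_pred, labels=labels)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Neg', 'Pos', 'Neu'], yticklabels=['Neg', 'Pos', 'Neu'])
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()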
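
A single 3-sentence test split gives a noisy estimate. One way to get a steadier number, sketched here as a suggestion rather than part of the original notebook, is to wrap the vectorizer and classifier in a `Pipeline` and cross-validate it. Two folds is an assumption forced by the sample data, which has only two Neutral sentences.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

texts = [
    'I love this movie!',
    'This is an amazing product.',
    'I am so happy with the service.',
    'I hate this item.',
    'This is the worst experience ever.',
    'I am not satisfied with the quality.',
    'Absolutely fantastic!',
    'Terrible and disappointing.',
    'It was okay, not great.',
    'Pretty decent overall.',
]
labels = [1, 1, 1, 0, 0, 0, 1, 0, 2, 2]  # 1=Positive, 0=Negative, 2=Neutral

# The pipeline refits TF-IDF inside every fold, so no test text leaks into the vocabulary.
pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=200))

# Stratified folds keep one Neutral sentence in each half.
cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, texts, labels, cv=cv, scoring='accuracy')

print('Fold accuracies:', scores)
print('Mean accuracy:', scores.mean())
```

On a dataset this small the scores will still vary a lot between folds. The pattern becomes more useful on larger corpora such as IMDb reviews.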

---
## Test with Custom Sentences
You can input your own sentences to test how the model performs.

In [None]:
samples = [
    'I absolutely love the design!',
    'This is horrible and useless.',
    'It’s fine, nothing special.'
]

sample_features = vectorizer.transform(samples)
predictions = model.predict(sample_features)

# Map numeric labels back to readable names once, outside the loop.
label_names = {0: 'Negative 😡', 1: 'Positive 😊', 2: 'Neutral 😐'}

for text, label in zip(samples, predictions):
    print(f'{text} → {label_names[label]}')
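
When a custom sentence shares no words with the training vocabulary, its TF-IDF vector is all zeros and the prediction rests on the model's intercept alone. The sketch below counts how many known terms each sample contains. The three training sentences are a reduced stand-in for the notebook's training split, and `known_terms` is an illustrative name.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

train_texts = [
    'I love this movie!',
    'I hate this item.',
    'Pretty decent overall.',
]
vec = TfidfVectorizer(stop_words='english').fit(train_texts)

samples = [
    'I absolutely love the design!',
    'It’s fine, nothing special.',
]
features = vec.transform(samples)

# Count the non-zero TF-IDF entries per sample, i.e. the words the vectorizer recognizes.
known_terms = (features.toarray() != 0).sum(axis=1)

for text, n in zip(samples, known_terms):
    print(f'{text} -> {n} known term(s)')
```

A count of zero is a signal that the prediction for that sentence should not be trusted. A larger training set is the usual remedy.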

---
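## Key Insights
- A sentiment classifier needs numeric input, so the text is first converted into TF-IDF feature vectors.
- Logistic Regression is a fast, interpretable baseline that works well with sparse TF-IDF features, though ten sentences are far too few to judge real accuracy.
- For large datasets or text with subtle context (negation, sarcasm), consider deep learning models such as **LSTM** or **BERT**.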

---
## Summary
In this notebook, we:
- Built a basic text classification pipeline.
- Used TF-IDF to represent each sentence as a vector of weighted term scores.
- Trained a Logistic Regression model for sentiment prediction.

---
**Next:** `08-Advanced_Sentiment_Analysis_with_LSTM.ipynb`, where you learn how to build sentiment models using **deep learning**.