# Sentiment Analysis using TF-IDF and Logistic Regression

This notebook performs sentiment analysis on customer reviews using TF-IDF vectorization and Logistic Regression.

In [None]:
import pandas as pd
import numpy as np
import re
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

## Load Dataset

For demonstration, we use a small hand-written sample of six customer reviews, three labeled positive and three labeled negative.

In [None]:
data = {
    'review': [
        'This product is amazing and works perfectly',
        'Worst purchase ever, very disappointed',
        'Good quality and value for money',
        'Terrible experience, will not buy again',
        'Excellent service and fantastic product',
        'Not good, product broke after one day'
    ],
    'sentiment': ['positive', 'negative', 'positive', 'negative', 'positive', 'negative']
}
df = pd.DataFrame(data)
df

## Text Preprocessing

In [None]:
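def clean_text(text):
    """Lowercase the text and drop everything except letters and whitespace."""
    text = text.lower()
    # Digits and punctuation are deleted outright; no space is inserted in their place.
    text = re.sub(r'[^a-z\s]', '', text)
    return text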

df['clean_review'] = df['review'].apply(clean_text)
df
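To see what the cleaning step keeps and discards, the sketch below applies the same function to a made-up review. The sample string is an assumption for illustration and is not part of the dataset. Because characters are deleted rather than replaced by spaces, a hyphenated word collapses into a single token.

```python
import re

def clean_text(text):
    # Same logic as the cell above, repeated so this snippet runs on its own.
    text = text.lower()
    text = re.sub(r'[^a-z\s]', '', text)
    return text

# Hypothetical review, not part of the dataset above.
sample = "Well-made, 5 stars!"
cleaned = clean_text(sample)

print(repr(cleaned))    # 'wellmade  stars'
print(cleaned.split())  # ['wellmade', 'stars']
```

The doubled space is harmless here because the vectorizer tokenizes on word boundaries, but the merged token `wellmade` may matter on real data.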

## Train-Test Split

In [None]:
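# Stratify so both classes appear in the train and test splits of this tiny dataset.
X_train, X_test, y_train, y_test = train_test_split(
    df['clean_review'], df['sentiment'],
    test_size=0.3, random_state=42, stratify=df['sentiment']
)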

## TF-IDF Vectorization

In [None]:
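# Fit vocabulary and IDF weights on the training split only, then reuse them for the test split.
# Caveat: the built-in 'english' stop list contains "not", so "not good" is reduced to "good".
tfidf = TfidfVectorizer(stop_words='english')
X_train_tfidf = tfidf.fit_transform(X_train)
X_test_tfidf = tfidf.transform(X_test)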
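The weighting itself can be worked through by hand. The following is a minimal pure-Python sketch, assuming the smoothed IDF form `ln((1 + n) / (1 + df)) + 1` and the L2 normalization that scikit-learn documents as its defaults. The toy corpus and the helper names (`df_counts`, `idf`, `tfidf_vector`) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Toy corpus (hypothetical), already cleaned, tokenized and stop-word filtered.
docs = [
    ["product", "amazing", "works"],
    ["product", "broke", "day"],
    ["excellent", "product"],
]

n_docs = len(docs)
# Document frequency: number of documents that contain each term.
df_counts = Counter(term for doc in docs for term in set(doc))

def idf(term):
    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1
    return math.log((1 + n_docs) / (1 + df_counts[term])) + 1

def tfidf_vector(doc):
    # Raw term count times IDF, then scaled to unit L2 length.
    counts = Counter(doc)
    weights = {t: c * idf(t) for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()}

vec = tfidf_vector(docs[0])
for term, weight in sorted(vec.items()):
    print(f"{term}: {weight:.3f}")
```

In this corpus "product" occurs in every document, so its IDF is exactly 1 and it receives a lower weight than "amazing" or "works", which each occur in only one document.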

## Train Logistic Regression Model

In [None]:
model = LogisticRegression()
model.fit(X_train_tfidf, y_train)
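At prediction time, a binary logistic regression model computes a weighted sum of the features and passes it through the sigmoid function to obtain a probability. The sketch below illustrates that step only. The weights and bias are hypothetical values picked by hand, not coefficients learned by the model above, and `predict_proba_positive` is an illustrative helper rather than a scikit-learn function.

```python
import math

def sigmoid(z):
    # Maps any real-valued score to the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights and bias; a fitted model would learn these from data.
weights = {"amazing": 1.2, "excellent": 1.0, "terrible": -1.4, "broke": -0.9}
bias = 0.0

def predict_proba_positive(features):
    # `features` maps term -> TF-IDF weight; terms without a weight contribute nothing.
    z = bias + sum(weights.get(t, 0.0) * v for t, v in features.items())
    return sigmoid(z)

p = predict_proba_positive({"amazing": 0.8, "product": 0.6})
label = "positive" if p >= 0.5 else "negative"
print(label, round(p, 3))
```

A score of zero maps to a probability of exactly 0.5, which is why 0.5 is the usual decision threshold.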

## Model Evaluation

In [None]:
y_pred = model.predict(X_test_tfidf)
print('Accuracy:', accuracy_score(y_test, y_pred))
print('\nClassification Report:\n')
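# zero_division=0 avoids warnings when a class is never predicted on this tiny test set.
print(classification_report(y_test, y_pred, zero_division=0))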
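The per-class figures in the report can be reproduced by counting true positives, false positives and false negatives. The sketch below does this in plain Python, assuming made-up label lists. `y_true`, `y_hat` and `precision_recall_f1` are hypothetical names, and the labels are not the output of the model above.

```python
# Hypothetical labels for illustration, not predictions from the model above.
y_true = ["positive", "negative", "positive", "negative", "positive"]
y_hat = ["positive", "negative", "negative", "negative", "positive"]

def precision_recall_f1(y_true, y_hat, label):
    pairs = list(zip(y_true, y_hat))
    tp = sum(t == label and p == label for t, p in pairs)  # correctly predicted as label
    fp = sum(t != label and p == label for t, p in pairs)  # wrongly predicted as label
    fn = sum(t == label and p != label for t, p in pairs)  # label missed by the model
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

accuracy = sum(t == p for t, p in zip(y_true, y_hat)) / len(y_true)
pos_precision, pos_recall, pos_f1 = precision_recall_f1(y_true, y_hat, "positive")

print(f"accuracy={accuracy:.2f}")
print(f"positive: precision={pos_precision:.2f} recall={pos_recall:.2f} f1={pos_f1:.2f}")
```

Here one positive review is misclassified as negative, so precision for the positive class stays at 1.0 while recall drops to 2/3.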

## Confusion Matrix

In [None]:
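# Fix the label order so rows and columns on the heatmap are identifiable.
labels = model.classes_
cm = confusion_matrix(y_test, y_pred, labels=labels)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=labels, yticklabels=labels)
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()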

## Analysis

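- TF-IDF converts each review into a numerical vector, weighting a term by how often it occurs in that review and discounting terms that occur in many reviews.
- Logistic Regression is a common and effective baseline for binary sentiment classification.
- Preprocessing (lowercasing, stripping punctuation and digits) shrinks the vocabulary and typically helps, although this notebook does not measure its effect.
- Model performance depends heavily on dataset size and quality; with only six reviews, the scores reported here are not reliable estimates.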