# 📝 Sentiment Analysis on Amazon Product Reviews

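This notebook demonstrates sentiment analysis with Natural Language Processing (NLP) techniques: text cleaning, TF-IDF features, and a Multinomial Naive Bayes classifier. The file loaded here is `text_emotion.csv`, whose rows are short, tweet-style messages rather than Amazon product reviews, each labeled with one of 13 emotion categories; the text column is renamed to `Review` for the rest of the notebook.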

In [None]:
import pandas as pd

# Load dataset (replace with your path if needed)
df = pd.read_csv(r"C:\Users\9xrah\Downloads\archive (6)\text_emotion.csv")
df = df[['content', 'sentiment']]
df.columns = ['Review', 'Sentiment']
df.head()

| | Review | Sentiment |
|---|---|---|
| 0 | @tiffanylue i know i was listenin to bad habi... | empty |
| 1 | Layin n bed with a headache ughhhh...waitin o... | sadness |
| 2 | Funeral ceremony...gloomy friday... | sadness |
| 3 | wants to hang out with friends SOON! | enthusiasm |
| 4 | @dannycastillo We want to trade with someone w... | neutral |


## 🔄 Data Preprocessing

In [5]:
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download('stopwords')
nltk.download('wordnet')

stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

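def clean_text(text):
    """Strip URLs and non-letters, lowercase, drop stop words, lemmatize."""
    text = re.sub(r"http\S+", "", text)      # remove URLs
    text = re.sub(r"[^a-zA-Z]", " ", text)   # keep letters only; "@user" becomes "user"
    tokens = text.lower().split()
    tokens = [lemmatizer.lemmatize(word) for word in tokens if word not in stop_words]
    return " ".join(tokens)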

df['Cleaned_Review'] = df['Review'].apply(clean_text)
df.head()


| | Review | Sentiment | Cleaned_Review |
|---|---|---|---|
| 0 | @tiffanylue i know i was listenin to bad habi... | empty | tiffanylue know listenin bad habit earlier sta... |
| 1 | Layin n bed with a headache ughhhh...waitin o... | sadness | layin n bed headache ughhhh waitin call |
| 2 | Funeral ceremony...gloomy friday... | sadness | funeral ceremony gloomy friday |
| 3 | wants to hang out with friends SOON! | enthusiasm | want hang friend soon |
| 4 | @dannycastillo We want to trade with someone w... | neutral | dannycastillo want trade someone houston ticke... |
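
The cleaned output above still carries Twitter handles as ordinary tokens (`tiffanylue`, `dannycastillo`), because `clean_text` removes the `@` character but not the name after it. Below is a minimal sketch of one way to drop handles before the other cleaning steps. The helper name `strip_mentions` is hypothetical and not part of the original notebook, and it assumes every `@` followed by word characters is a handle, so it would also mangle email addresses.

```python
import re

# Hypothetical helper (not in the original notebook): removes @handles and
# URLs before the letter-only filtering that clean_text performs.
MENTION_RE = re.compile(r"@\w+")
URL_RE = re.compile(r"http\S+")

def strip_mentions(text: str) -> str:
    """Remove @handles and URLs, then collapse the leftover whitespace."""
    text = MENTION_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    return " ".join(text.split())

print(strip_mentions("@dannycastillo We want to trade http://example.com now"))
```

If this fits the data, it could be applied first, for example `df['Review'].apply(strip_mentions).apply(clean_text)`. Whether removing handles helps the classifier is an open question to check against the evaluation metrics.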


## 🔠 Text Vectorization

In [6]:
from sklearn.feature_extraction.text import TfidfVectorizer

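tfidf = TfidfVectorizer(max_features=5000)
# Keep the sparse matrix: MultinomialNB accepts it directly, and a dense
# rows-by-5000 float array costs far more memory for the same result.
# Note: fitting on all rows before the split lets test-set IDF statistics leak into training.
X = tfidf.fit_transform(df['Cleaned_Review'])
y = df['Sentiment']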
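
To make the vectorization step less opaque, here is a small pure-Python sketch of the weighting that `TfidfVectorizer` applies with its default settings, as I understand them: raw term counts, a smoothed IDF of `ln((1 + n) / (1 + df)) + 1`, and L2 normalisation of each row. The function `tfidf_rows` and the three sample documents are illustrative assumptions. It tokenizes on whitespace, whereas scikit-learn's default tokenizer also drops single-character tokens.

```python
import math
from collections import Counter

def tfidf_rows(docs):
    """Smoothed-IDF, L2-normalised TF-IDF weights, one dict per document."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        rows.append({t: w / norm for t, w in weights.items()} if norm else {})
    return rows

# Toy documents for illustration only.
docs = ["funeral ceremony gloomy friday", "want hang friend soon", "gloomy friday friend"]
rows = tfidf_rows(docs)
for term, weight in sorted(rows[0].items()):
    print(f"{term}: {weight:.3f}")
```

In the first document, `funeral` and `ceremony` appear in only one document and so receive a larger weight than `gloomy` and `friday`, which appear in two.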

## 🤖 Model Building & Evaluation

In [7]:
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report, confusion_matrix

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = MultinomialNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

       anger       0.00      0.00      0.00        19
     boredom       0.00      0.00      0.00        31
       empty       0.00      0.00      0.00       162
  enthusiasm       0.00      0.00      0.00       163
         fun       0.00      0.00      0.00       338
   happiness       0.34      0.26      0.30      1028
        hate       0.00      0.00      0.00       268
        love       0.50      0.27      0.35       762
     neutral       0.31      0.56      0.40      1740
      relief       0.00      0.00      0.00       352
     sadness       0.33      0.11      0.16      1046
    surprise       0.50      0.00      0.00       425
       worry       0.30      0.58      0.39      1666

    accuracy                           0.32      8000
   macro avg       0.18      0.14      0.12      8000
weighted avg       0.29      0.32      0.26      8000
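
The summary rows of the report can be reproduced from the per-class rows, which helps when judging how far 0.32 accuracy is from a trivial baseline. The sketch below copies the F1 and support columns from the report above and recomputes the macro average, the weighted average, and the accuracy of always predicting the most frequent class. The variable names are illustrative.

```python
# Per-class (f1, support) pairs copied from the classification report above.
report = {
    "anger": (0.00, 19), "boredom": (0.00, 31), "empty": (0.00, 162),
    "enthusiasm": (0.00, 163), "fun": (0.00, 338), "happiness": (0.30, 1028),
    "hate": (0.00, 268), "love": (0.35, 762), "neutral": (0.40, 1740),
    "relief": (0.00, 352), "sadness": (0.16, 1046), "surprise": (0.00, 425),
    "worry": (0.39, 1666),
}

total = sum(support for _, support in report.values())
# Macro average: every class counts equally, regardless of its size.
macro_f1 = sum(f1 for f1, _ in report.values()) / len(report)
# Weighted average: each class counts in proportion to its support.
weighted_f1 = sum(f1 * support for f1, support in report.values()) / total
# Accuracy of a classifier that always predicts the largest class.
majority_label, (_, majority_support) = max(report.items(), key=lambda kv: kv[1][1])
majority_baseline = majority_support / total
zero_f1 = [label for label, (f1, _) in report.items() if f1 == 0.0]

print(f"macro F1 = {macro_f1:.2f}, weighted F1 = {weighted_f1:.2f}")
print(f"majority baseline ({majority_label}) = {majority_baseline:.2%}")
print(f"classes with zero F1: {len(zero_f1)} of {len(report)}")
```

The gap between the macro and weighted averages reflects the class imbalance: the small classes pull the macro average down because each one counts as much as `neutral` or `worry`.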





## 📊 Insights & Conclusion
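- With TF-IDF features and MultinomialNB, the model reaches 0.32 accuracy and a macro-averaged F1 of 0.12 across 13 emotion classes. That is better than always predicting the most frequent class (`neutral`, 1740 of 8000 test rows, about 0.22), but eight classes have an F1 of 0.00, so the model almost never identifies them correctly.
- Deep learning models (e.g., LSTM or BERT) are worth trying and may improve on these results, as may addressing the strong class imbalance.
- Explore misclassified examples to see which emotions are confused with each other and to improve the model further.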
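
One simple way to start exploring misclassifications is to count which (true, predicted) label pairs occur most often among the errors. The sketch below uses only the standard library. The helper `top_confusions` and the toy label lists are hypothetical and are not results from this notebook.

```python
from collections import Counter

def top_confusions(y_true, y_pred, k=3):
    """Return the k most frequent (true, predicted) pairs where the prediction was wrong."""
    errors = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    return errors.most_common(k)

# Toy labels for illustration only; not output from the model above.
y_true_demo = ["sadness", "sadness", "worry", "love", "fun", "sadness"]
y_pred_demo = ["worry", "worry", "worry", "happiness", "neutral", "sadness"]

for (true_label, predicted_label), count in top_confusions(y_true_demo, y_pred_demo):
    print(f"{true_label} -> {predicted_label}: {count}")
```

In the notebook itself, the same function could be called as `top_confusions(y_test, y_pred)`. The `confusion_matrix` function that is already imported gives the full table of the same counts.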