# NLP Sentiment Analysis: Amazon Product Reviews
### Step-by-Step Project Insight & Documentation

This notebook walks through a small sentiment analysis pipeline: data loading, preprocessing, feature extraction, and model training on a toy sample of reviews.

## 1. Setup and Data Preparation
We start by importing the necessary libraries, downloading the NLTK stop word list, and building a small in-memory sample of reviews.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, accuracy_score
import nltk
from nltk.corpus import stopwords
import string

nltk.download('stopwords')
stop_words = set(stopwords.words('english'))

# Sample Data for demonstration
data = {
    'reviewText': [
        'I love this product, it is amazing!',
        'Terrible quality, broke in two days.',
        'It is okay, does the job but could be better.',
        'Excellent purchase, highly recommend.',
        'Worst experience ever, very disappointed.'
    ],
    'sentiment': [1, 0, 1, 1, 0]
}
df = pd.DataFrame(data)
print(df.head())

**Outcome:** `df` holds five sample reviews with binary sentiment labels (1 = positive, 0 = negative).

## 2. Preprocessing & Feature Extraction
We lowercase each review, drop English stop words, and convert the cleaned text to TF-IDF vectors.

In [None]:
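def clean(text):
    # Lowercase and strip punctuation so tokens like 'product,' match the stop word list
    text = text.lower().translate(str.maketrans('', '', string.punctuation))
    # Drop English stop words
    return ' '.join([w for w in text.split() if w not in stop_words])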

df['cleaned'] = df['reviewText'].apply(clean)
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(df['cleaned'])
y = df['sentiment']

## 3. Model Training
We use Logistic Regression to predict sentiment.

In [None]:
model = LogisticRegression()
model.fit(X, y)
print('Model trained successfully')
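
Once a model is fitted, new reviews have to pass through the same fitted vectorizer with `transform` (not `fit_transform`), so that their columns line up with the training features. The following is a minimal sketch under stated assumptions: it reuses the five sample reviews, substitutes scikit-learn's built-in `stop_words="english"` for the NLTK-based `clean` function so it runs without a download, and the `new_reviews` strings are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# The five sample reviews and labels from section 1
train_texts = [
    "I love this product, it is amazing!",
    "Terrible quality, broke in two days.",
    "It is okay, does the job but could be better.",
    "Excellent purchase, highly recommend.",
    "Worst experience ever, very disappointed.",
]
train_labels = [1, 0, 1, 1, 0]

# Assumption: built-in stop word list instead of the NLTK-based clean()
tfidf_demo = TfidfVectorizer(stop_words="english")
X_train = tfidf_demo.fit_transform(train_texts)

model_demo = LogisticRegression()
model_demo.fit(X_train, train_labels)

# Hypothetical unseen reviews
new_reviews = [
    "Amazing, I highly recommend it",
    "Very disappointed, terrible quality",
]
X_new = tfidf_demo.transform(new_reviews)  # reuse the fitted vocabulary

preds = model_demo.predict(X_new)
probs = model_demo.predict_proba(X_new)  # columns follow model_demo.classes_
for review, pred, prob in zip(new_reviews, preds, probs):
    print(f"{review!r} -> label {pred}, P(positive) = {prob[1]:.2f}")
```

With only five training rows the probabilities will sit close to the class prior, so the output is useful for checking the plumbing rather than the quality of the model.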

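**Final Insight:** The pipeline turns raw text into TF-IDF features and fits a classifier that can predict sentiment. Because the model is trained on all five samples with no held-out split, nothing here measures its accuracy. Next steps include using a larger dataset, evaluating on held-out data, and trying more complex models.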
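
The imports in section 1 include `train_test_split`, `classification_report`, and `accuracy_score`, but the cells above never use them. One way they could fit together is sketched below. The eight reviews are hypothetical toy data, and wrapping the vectorizer and classifier with `make_pipeline` is an assumption for illustration, not the notebook's own approach.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical toy reviews; 1 = positive, 0 = negative
texts = [
    "I love this product, it is amazing!",
    "Terrible quality, broke in two days.",
    "Excellent purchase, highly recommend.",
    "Worst experience ever, very disappointed.",
    "Great value and works perfectly.",
    "Awful, stopped working after a week.",
    "Really happy with this, fantastic build.",
    "Poor design, complete waste of money.",
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

# Hold out 25% of the rows, keeping the class balance in both splits
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

# The pipeline fits TF-IDF on the training split only, which avoids
# leaking test vocabulary into the features
clf = make_pipeline(TfidfVectorizer(stop_words="english"), LogisticRegression())
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print("Accuracy:", acc)
print(classification_report(y_test, y_pred, zero_division=0))
```

Scores on two held-out rows are not meaningful; the same code becomes informative once `texts` and `labels` come from a dataset with thousands of reviews.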