```xml
<VSCode.Cell language="markdown">
# Sentiment Analysis AI - Demo Notebook

This notebook demonstrates the end-to-end workflow of the sentiment analysis model:
1. Data loading and exploration
2. Text preprocessing
3. Model training
4. Evaluation
5. Making predictions
6. Interactive prediction
7. Model comparison

Let's get started!
</VSCode.Cell>

<VSCode.Cell language="python">
# Import required libraries
import sys
from pathlib import Path

# Add project root to path (assumes the notebook runs from the project root or its notebooks/ folder)
project_root = Path.cwd().parent if Path.cwd().name == 'notebooks' else Path.cwd()
sys.path.insert(0, str(project_root))

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from src.data_loader import load_data, DataLoader
from src.preprocessor import TextPreprocessor
from src.model import SentimentModel
from src.trainer import ModelTrainer
from src.evaluator import ModelEvaluator
from src.inference import SentimentPredictor
from src.utils import plot_confusion_matrix

# Set style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

print("✓ All imports successful!")
</VSCode.Cell>
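
<VSCode.Cell language="markdown">
The `sys.path` setup above assumes the notebook is launched either from the project root or from a `notebooks/` folder directly beneath it. A slightly more robust alternative is to walk up the directory tree until a folder containing `src/` turns up. The helper below is a hypothetical sketch: the name `find_project_root` and the `src` marker are assumptions, not part of this project.
</VSCode.Cell>

<VSCode.Cell language="python">
from pathlib import Path


def find_project_root(start: Path, marker: str = "src") -> Path:
    """Return the first directory at or above `start` that contains `marker/`.

    Hypothetical helper: assumes the project root is the directory holding
    the `src/` package. Falls back to `start` if no marker directory is found.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    return start


# Example usage (hypothetical replacement for the project_root line above):
# project_root = find_project_root(Path.cwd())
# sys.path.insert(0, str(project_root))
</VSCode.Cell>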

<VSCode.Cell language="markdown">
## 1. Data Loading and Exploration

Let's load our dataset and explore its contents.
</VSCode.Cell>

<VSCode.Cell language="python">
# Load the dataset
data_path = project_root / 'data' / 'raw' / 'reviews.csv'
df = pd.read_csv(data_path)

print(f"Dataset shape: {df.shape}")
print("\nFirst 10 rows:")
print(df.head(10))
</VSCode.Cell>

<VSCode.Cell language="python">
# Check class distribution
print("Sentiment Distribution:")
print(df['sentiment_label'].value_counts())
print("\nClass balance (fraction of rows):")
print(df['sentiment_label'].value_counts(normalize=True))

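# Visualize distribution. value_counts() sorts by frequency, so sort_index() is used
# to keep a fixed label order (assumed: negative, neutral, positive) that matches the colors.
plt.figure(figsize=(10, 6))
df['sentiment_label'].value_counts().sort_index().plot(kind='bar', color=['red', 'gray', 'green'])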
plt.title('Sentiment Distribution', fontsize=16, fontweight='bold')
plt.xlabel('Sentiment')
plt.ylabel('Count')
plt.xticks(rotation=0)
plt.tight_layout()
plt.show()
</VSCode.Cell>
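
<VSCode.Cell language="markdown">
`value_counts(normalize=True)` divides each class count by the total number of rows. The stdlib-only sketch below reproduces that calculation with `collections.Counter` on a small made-up label list (not the real dataset). Sorting by label, rather than by frequency as `value_counts()` does, gives a stable order, which matters when a fixed color list is paired with the bars.
</VSCode.Cell>

<VSCode.Cell language="python">
from collections import Counter


def class_distribution(labels):
    """Return (counts, proportions) keyed by label, in sorted label order.

    Mirrors value_counts() / value_counts(normalize=True), except that the
    output is ordered by label rather than by frequency.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    ordered = sorted(counts)
    return (
        {label: counts[label] for label in ordered},
        {label: counts[label] / total for label in ordered},
    )


# Toy labels for illustration only (not taken from reviews.csv)
toy_labels = ["positive", "negative", "positive", "neutral", "positive"]
counts, proportions = class_distribution(toy_labels)
print(counts)       # {'negative': 1, 'neutral': 1, 'positive': 3}
print(proportions)  # {'negative': 0.2, 'neutral': 0.2, 'positive': 0.6}
</VSCode.Cell>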

<VSCode.Cell language="python">
# Text length analysis
df['text_length'] = df['text'].str.len()
df['word_count'] = df['text'].str.split().str.len()

print("Text Statistics:")
print(df[['text_length', 'word_count']].describe())

# Visualize text lengths by sentiment
fig, axes = plt.subplots(1, 2, figsize=(15, 5))

df.boxplot(column='text_length', by='sentiment_label', ax=axes[0])
axes[0].set_title('Text Length by Sentiment')
axes[0].set_xlabel('Sentiment')
axes[0].set_ylabel('Character Count')

df.boxplot(column='word_count', by='sentiment_label', ax=axes[1])
axes[1].set_title('Word Count by Sentiment')
axes[1].set_xlabel('Sentiment')
axes[1].set_ylabel('Word Count')

plt.suptitle('')  # clear the automatic "Boxplot grouped by ..." title that pandas adds
plt.tight_layout()
plt.show()
</VSCode.Cell>
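
<VSCode.Cell language="markdown">
`str.len()` counts characters and `str.split().str.len()` counts whitespace-separated words. The boxplots compare those distributions per class; the stdlib sketch below computes the matching per-class medians (the line inside each box) on a few invented rows, assuming whitespace tokenization as in the notebook.
</VSCode.Cell>

<VSCode.Cell language="python">
from collections import defaultdict
from statistics import median


def length_stats_by_label(rows):
    """Median character and word counts per sentiment label.

    `rows` is an iterable of (text, label) pairs. Words are split on
    whitespace, matching pandas' str.split() with no arguments.
    """
    chars, words = defaultdict(list), defaultdict(list)
    for text, label in rows:
        chars[label].append(len(text))
        words[label].append(len(text.split()))
    return {
        label: {"median_chars": median(chars[label]),
                "median_words": median(words[label])}
        for label in sorted(chars)
    }


# Invented rows for illustration only
toy_rows = [
    ("Great value", "positive"),
    ("Love it so much", "positive"),
    ("Broke after a day", "negative"),
]
print(length_stats_by_label(toy_rows))
</VSCode.Cell>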

<VSCode.Cell language="markdown">
## 2. Text Preprocessing

Let's see how the preprocessing pipeline transforms our text.
</VSCode.Cell>

<VSCode.Cell language="python">
# Initialize preprocessor
preprocessor = TextPreprocessor(
    lowercase=True,
    remove_stopwords=True,
    remove_punctuation=True,
    remove_numbers=False,
    lemmatize=True
)

# Test on sample texts
sample_texts = [
    "This product is absolutely AMAZING! Best purchase ever!!!",
    "Terrible quality. Very disappointed with this product.",
    "It's okay. Nothing special but does the job."
]

print("Preprocessing Examples:\n")
for i, text in enumerate(sample_texts, 1):
    processed = preprocessor.preprocess(text)
    print(f"{i}. Original:  {text}")
    print(f"   Processed: {processed}\n")
</VSCode.Cell>
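
<VSCode.Cell language="markdown">
The exact behavior of `TextPreprocessor` lives in `src/preprocessor.py`, which is not shown here. As a rough mental model, a pipeline configured like the one above might lowercase, strip punctuation, drop stopwords, and keep numbers, in that order (the order is an assumption). The stdlib sketch below uses a tiny hypothetical stopword list and skips lemmatization, which normally needs a lexical resource such as NLTK's WordNet.
</VSCode.Cell>

<VSCode.Cell language="python">
import string

# Hypothetical, deliberately tiny stopword list for illustration; real
# pipelines typically use a much larger list (e.g. NLTK's English stopwords).
STOPWORDS = {"this", "is", "it", "the", "a", "an", "with", "but", "very", "does"}

# Translation table that maps every ASCII punctuation character to a space
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def simple_preprocess(text, lowercase=True, remove_stopwords=True,
                      remove_punctuation=True, remove_numbers=False):
    """Approximate the configured cleaning steps; lemmatization is omitted."""
    if lowercase:
        text = text.lower()
    if remove_punctuation:
        # Replace (rather than delete) punctuation so "great.love" still splits
        text = text.translate(PUNCT_TO_SPACE)
    tokens = text.split()
    if remove_numbers:
        tokens = [t for t in tokens if not t.isdigit()]
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return " ".join(tokens)


print(simple_preprocess("This product is absolutely AMAZING! Best purchase ever!!!"))
# → product absolutely amazing best purchase ever
</VSCode.Cell>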

<VSCode.Cell language="markdown">
## 3. Model Training

Now let's split the data and train our model.
</VSCode.Cell>

<VSCode.Cell language="python">
# Load and split data
X_train, X_val, X_test, y_train, y_val, y_test = load_data(
    str(data_path),
    test_size=0.15,
    val_size=0.15,
    random_seed=42
)

print(f"Training set size:   {len(X_train)}")
print(f"Validation set size: {len(X_val)}")
print(f"Test set size:       {len(X_test)}")
</VSCode.Cell>
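
<VSCode.Cell language="markdown">
`load_data` is defined in `src/data_loader.py` and is not shown. Assuming `test_size` and `val_size` are both fractions of the full dataset (roughly a 70/15/15 split) and that the split is stratified by label, the logic could look like the stdlib sketch below. Both assumptions should be checked against the real implementation.
</VSCode.Cell>

<VSCode.Cell language="python">
import random
from collections import defaultdict


def stratified_split(texts, labels, test_size=0.15, val_size=0.15, seed=42):
    """Split (texts, labels) into train/val/test, preserving label proportions.

    Assumption: test_size and val_size are fractions of the *whole* dataset.
    Returns (X_train, X_val, X_test, y_train, y_val, y_test).
    """
    rng = random.Random(seed)  # local RNG so the split is reproducible
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)

    parts = {"train": [], "val": [], "test": []}
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_test = round(len(items) * test_size)
        n_val = round(len(items) * val_size)
        parts["test"] += [(t, label) for t in items[:n_test]]
        parts["val"] += [(t, label) for t in items[n_test:n_test + n_val]]
        parts["train"] += [(t, label) for t in items[n_test + n_val:]]

    for name in ("train", "val", "test"):
        rng.shuffle(parts[name])  # interleave classes instead of label-sorted blocks
    xs = [[t for t, _ in parts[name]] for name in ("train", "val", "test")]
    ys = [[y for _, y in parts[name]] for name in ("train", "val", "test")]
    return (*xs, *ys)


# Toy data for illustration only
demo_texts = [f"review {i}" for i in range(20)]
demo_labels = ["negative", "positive"] * 10
demo_splits = stratified_split(demo_texts, demo_labels)
print([len(part) for part in demo_splits[:3]])
</VSCode.Cell>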

<VSCode.Cell language="python">
# Initialize model
model = SentimentModel(
    model_type='naive_bayes',
    max_features=5000,
    ngram_range=(1, 2),
    min_df=2,
    max_df=0.95
)

# Initialize trainer
trainer = ModelTrainer(model=model, preprocessor=preprocessor)

# Train the model
print("Training model...")
trainer.train(X_train, y_train)
print("✓ Training complete!")
</VSCode.Cell>
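
<VSCode.Cell language="markdown">
The `SentimentModel` arguments resemble scikit-learn's `CountVectorizer`/`TfidfVectorizer` options, although that is an assumption because `src/model.py` is not shown. Under those semantics, `ngram_range=(1, 2)` adds bigrams, `min_df=2` drops terms found in fewer than two documents, `max_df=0.95` drops terms present in more than 95% of documents, and `max_features` keeps the most frequent survivors. A stdlib sketch of that vocabulary filtering (tie-breaking is alphabetical here, which may differ from the real library):
</VSCode.Cell>

<VSCode.Cell language="python">
from collections import Counter


def ngrams(tokens, n_min, n_max):
    """All contiguous n-grams of length n_min..n_max, joined by spaces."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def build_vocabulary(docs, ngram_range=(1, 2), min_df=2, max_df=0.95,
                     max_features=5000):
    """Return a sorted vocabulary after document-frequency filtering.

    min_df is an absolute document count and max_df a fraction of documents
    (scikit-learn's int vs. float convention). max_features keeps the terms
    with the highest total counts; ties are broken alphabetically here.
    """
    doc_freq, term_freq = Counter(), Counter()
    for doc in docs:
        grams = ngrams(doc.split(), *ngram_range)
        term_freq.update(grams)        # total occurrences across the corpus
        doc_freq.update(set(grams))    # number of documents containing the term

    n_docs = len(docs)
    kept = [t for t, df in doc_freq.items()
            if df >= min_df and df / n_docs <= max_df]
    kept.sort(key=lambda t: (-term_freq[t], t))
    return sorted(kept[:max_features])


toy_docs = ["great product", "great value", "bad product", "great product again"]
print(build_vocabulary(toy_docs))  # → ['great', 'great product', 'product']
</VSCode.Cell>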

<VSCode.Cell language="markdown">
## 4. Model Evaluation

Let's evaluate the model, first on the validation set and then on the held-out test set.
</VSCode.Cell>

<VSCode.Cell language="python">
# Evaluate on validation set
print("Evaluating on validation set...")
val_metrics = trainer.evaluate(X_val, y_val)

print("\nValidation Metrics:")
print(f"  Accuracy:          {val_metrics['accuracy']:.4f}")
print(f"  Precision (macro): {val_metrics['precision_macro']:.4f}")
print(f"  Recall (macro):    {val_metrics['recall_macro']:.4f}")
print(f"  F1-Score (macro):  {val_metrics['f1_macro']:.4f}")
</VSCode.Cell>
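
<VSCode.Cell language="markdown">
The `_macro` suffix on the metric keys suggests macro averaging: each metric is computed per class and then averaged with equal weight, so a rare class counts as much as a common one. Assuming that definition (it matches scikit-learn's `average='macro'`, with undefined ratios treated as 0), a stdlib version looks like this:
</VSCode.Cell>

<VSCode.Cell language="python">
def macro_scores(y_true, y_pred):
    """Macro-averaged precision, recall and F1 over labels in y_true and y_pred.

    A class with no predicted (or no true) examples contributes 0 for the
    undefined metric, similar to scikit-learn's zero_division=0.
    """
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "precision_macro": sum(precisions) / n,
        "recall_macro": sum(recalls) / n,
        "f1_macro": sum(f1s) / n,
    }


print(macro_scores(["pos", "pos", "neg", "neg", "neu"],
                   ["pos", "neg", "neg", "neg", "pos"]))
</VSCode.Cell>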

<VSCode.Cell language="python">
# Evaluate on test set
print("Evaluating on test set...")
test_metrics = trainer.evaluate(X_test, y_test)

evaluator = ModelEvaluator()
evaluator.print_metrics(test_metrics)
</VSCode.Cell>

<VSCode.Cell language="python">
# Visualize confusion matrix
plot_confusion_matrix(
    test_metrics['confusion_matrix'],
    classes=['Negative', 'Neutral', 'Positive'],
    title='Test Set Confusion Matrix'
)
</VSCode.Cell>
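
<VSCode.Cell language="markdown">
`plot_confusion_matrix` receives a precomputed matrix from `test_metrics['confusion_matrix']`. Assuming the common convention (rows are true labels, columns are predicted labels, both in the Negative, Neutral, Positive order implied by the `classes` argument), the matrix can be built as below. The diagonal holds correct predictions, so its sum divided by the total is the accuracy.
</VSCode.Cell>

<VSCode.Cell language="python">
def confusion_matrix(y_true, y_pred, labels):
    """Rows = true label, columns = predicted label, both in `labels` order."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


labels = ["negative", "neutral", "positive"]
cm = confusion_matrix(
    ["negative", "positive", "positive", "neutral"],
    ["negative", "positive", "neutral", "neutral"],
    labels,
)
for label, row in zip(labels, cm):
    print(f"{label:9s} {row}")

# Accuracy = correct predictions (diagonal) / all predictions
accuracy = sum(cm[i][i] for i in range(len(labels))) / sum(map(sum, cm))
print(f"accuracy: {accuracy:.2f}")
</VSCode.Cell>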

<VSCode.Cell language="python">
# Feature importance analysis
top_n = 10
class_names = ['Negative', 'Neutral', 'Positive']  # assumes class indices 0, 1, 2 in this order
print(f"Top {top_n} predictive features for each class:\n")

feature_importance = trainer.get_feature_importance(n=top_n)

for class_idx, features in feature_importance.items():
    class_name = class_names[class_idx]
    print(f"\n{class_name} Sentiment - Top Features:")
    print("-" * 50)

    for i, (feature, score) in enumerate(features[:top_n], 1):
        print(f"  {i:2d}. {feature:25s} {score:8.4f}")
</VSCode.Cell>
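
<VSCode.Cell language="markdown">
For `model_type='naive_bayes'`, one common way to rank features is by how much more likely a term is under one class than under the others. How `get_feature_importance` actually scores features is not shown, so treat the following as an illustrative multinomial Naive Bayes sketch with Laplace smoothing (alpha = 1) on toy data. The scoring rule (log P(term | class) minus its mean over the other classes) is a choice made for this example.
</VSCode.Cell>

<VSCode.Cell language="python">
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit log class priors and per-class log P(term | class) with Laplace smoothing."""
    term_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        term_counts[label].update(doc.split())
    vocab = sorted({t for counts in term_counts.values() for t in counts})
    label_counts = Counter(labels)
    log_prior = {c: math.log(label_counts[c] / len(docs)) for c in term_counts}
    log_likelihood = {}
    for c, counts in term_counts.items():
        denom = sum(counts.values()) + alpha * len(vocab)
        log_likelihood[c] = {t: math.log((counts[t] + alpha) / denom) for t in vocab}
    return log_prior, log_likelihood


def top_features(log_likelihood, n=10):
    """Rank terms per class by log P(t|c) minus its mean over the other classes."""
    classes = sorted(log_likelihood)
    ranked = {}
    for c in classes:
        others = [k for k in classes if k != c]
        scores = {
            t: lp - sum(log_likelihood[k][t] for k in others) / len(others)
            for t, lp in log_likelihood[c].items()
        }
        ranked[c] = sorted(scores.items(), key=lambda kv: -kv[1])[:n]
    return ranked


# Toy corpus for illustration only
nb_docs = ["great great love", "love great", "awful hate", "hate awful awful", "okay fine"]
nb_labels = ["positive", "positive", "negative", "negative", "neutral"]
_, nb_log_likelihood = train_multinomial_nb(nb_docs, nb_labels)
for label, feats in top_features(nb_log_likelihood, n=2).items():
    print(label, [term for term, _ in feats])
</VSCode.Cell>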

<VSCode.Cell language="markdown">
## 5. Making Predictions

Let's use our trained model to make predictions on new text.
</VSCode.Cell>

<VSCode.Cell language="python">
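# Save the model first (create the models/ directory if it does not exist yet)
model_path = project_root / 'models' / 'demo_model.pkl'
model_path.parent.mkdir(parents=True, exist_ok=True)
trainer.save_model(str(model_path))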
print(f"Model saved to: {model_path}")
</VSCode.Cell>
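
<VSCode.Cell language="markdown">
`trainer.save_model` writes a `.pkl` file, which suggests pickle (or joblib) serialization, though the actual format is not shown. A minimal stdlib round-trip that creates the target directory first is sketched below, with a plain dict standing in for a trained model. The helper names `save_pickle` and `load_pickle` are hypothetical.
</VSCode.Cell>

<VSCode.Cell language="python">
import pickle
from pathlib import Path


def save_pickle(obj, path):
    """Serialize `obj` to `path`, creating parent directories as needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)
    return path


def load_pickle(path):
    """Load a pickled object. Unpickling can run arbitrary code: trusted files only."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)
</VSCode.Cell>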

<VSCode.Cell language="python">
# Load the model for inference
predictor = SentimentPredictor(str(model_path))

# Test predictions
test_texts = [
    "This is the best product I've ever used!",
    "Absolutely terrible. Waste of money.",
    "It's okay, nothing special.",
    "Love it! Highly recommend!",
    "Disappointed with the quality.",
    "Average product, does the job."
]

print("Making predictions:\n")
results = predictor.predict(test_texts, return_proba=True)

for result in results:
    sentiment_label = result['sentiment_label']
    confidence = result['confidence']
    
    # Color code by sentiment
    if sentiment_label == 'positive':
        color = '🟢'
    elif sentiment_label == 'negative':
        color = '🔴'
    else:
        color = '🟡'
    
    print(f"{color} {sentiment_label.upper():8s} ({confidence:.1%}) - {result['text']}")
</VSCode.Cell>
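
<VSCode.Cell language="markdown">
Each result carries a `confidence` and a `probabilities` mapping. A plausible reading, not confirmed by the source, is that `confidence` is the largest class probability. For a Naive Bayes model the per-class scores are log-probabilities, and converting them into probabilities that sum to 1 is usually done with the log-sum-exp trick to avoid underflow. The function below is a hypothetical sketch of that step, not the `SentimentPredictor` implementation.
</VSCode.Cell>

<VSCode.Cell language="python">
import math


def scores_to_prediction(log_scores):
    """Normalize per-class log-scores and pick the most likely class.

    Log-sum-exp: subtracting the max before exponentiating keeps very
    negative log-scores from underflowing to zero.
    """
    top = max(log_scores.values())
    log_norm = top + math.log(sum(math.exp(s - top) for s in log_scores.values()))
    probabilities = {c: math.exp(s - log_norm) for c, s in log_scores.items()}
    label = max(probabilities, key=probabilities.get)
    return {"sentiment_label": label,
            "confidence": probabilities[label],
            "probabilities": probabilities}


# Log-scores this negative would all underflow to 0.0 if exponentiated directly
result = scores_to_prediction({"negative": -1002.0, "neutral": -1001.0, "positive": -1000.0})
print(result["sentiment_label"], f"{result['confidence']:.1%}")  # → positive 66.5%
</VSCode.Cell>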

<VSCode.Cell language="python">
# Visualize prediction probabilities
fig, ax = plt.subplots(figsize=(14, 8))

sentiments = ['negative', 'neutral', 'positive']
colors = ['#ff6b6b', '#95a5a6', '#51cf66']

# Get probabilities for each text
proba_data = []
for result in results:
    probs = [result['probabilities'][s] for s in sentiments]
    proba_data.append(probs)

proba_array = np.array(proba_data)

# Create stacked bar chart
x = np.arange(len(test_texts))
bottom = np.zeros(len(test_texts))

for i, sentiment in enumerate(sentiments):
    ax.bar(x, proba_array[:, i], bottom=bottom, label=sentiment.capitalize(), 
           color=colors[i], alpha=0.8)
    bottom += proba_array[:, i]

ax.set_ylabel('Probability', fontsize=12)
ax.set_xlabel('Text Sample', fontsize=12)
ax.set_title('Prediction Probability Distribution', fontsize=16, fontweight='bold')
ax.set_xticks(x)
ax.set_xticklabels([f"Text {i+1}" for i in range(len(test_texts))], rotation=0)
ax.legend(loc='upper right')
ax.grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.show()

print("\nText samples:")
for i, text in enumerate(test_texts, 1):
    print(f"  Text {i}: {text}")
</VSCode.Cell>

<VSCode.Cell language="markdown">
## 6. Interactive Prediction

Try your own text!
</VSCode.Cell>

<VSCode.Cell language="python">
# Interactive prediction function
def predict_sentiment(text):
    """Predict and display sentiment for custom text."""
    # Wrap in a list to match the batch call above (one result per input text)
    result = predictor.predict([text], return_proba=True)[0]
    
    print("="*70)
    print(f"TEXT: {text}")
    print("="*70)
    
    sentiment_label = result['sentiment_label']
    confidence = result['confidence']
    
    print(f"\n✨ PREDICTION: {sentiment_label.upper()}")
    print(f"📊 CONFIDENCE: {confidence:.1%}\n")
    
    print("Probability Distribution:")
    for label, prob in result['probabilities'].items():
        bar_length = int(prob * 40)
        bar = "█" * bar_length + "░" * (40 - bar_length)
        print(f"  {label:8s} [{bar}] {prob:.1%}")
    
    print("="*70)

# Example usage - modify the text below to test different inputs
predict_sentiment("This product exceeded all my expectations! Absolutely fantastic!")
</VSCode.Cell>
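
<VSCode.Cell language="markdown">
`predict_sentiment` draws each probability as a 40-character text bar by truncating `prob * 40` to an integer. A standalone version of that rendering logic is sketched below; the clamp to [0, 1] is an addition not present in the original, so small floating-point overshoots cannot produce a negative or overlong bar.
</VSCode.Cell>

<VSCode.Cell language="python">
def render_bar(prob, width=40, full="█", empty="░"):
    """Render a probability in [0, 1] as a fixed-width text bar."""
    prob = min(max(prob, 0.0), 1.0)  # clamp float noise such as 1.0000001
    filled = int(prob * width)       # truncate, as the notebook does
    return full * filled + empty * (width - filled)


for label, prob in {"negative": 0.05, "neutral": 0.15, "positive": 0.80}.items():
    print(f"  {label:8s} [{render_bar(prob)}] {prob:.1%}")
</VSCode.Cell>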

<VSCode.Cell language="markdown">
## 7. Model Comparison

Let's compare different model types.
</VSCode.Cell>

<VSCode.Cell language="python">
# Train and compare multiple models
model_types = ['naive_bayes', 'logistic_regression']
comparison_results = {}

for model_type in model_types:
    print(f"\nTraining {model_type}...")
    
    # Initialize model and trainer
    model = SentimentModel(model_type=model_type, max_features=5000)
    trainer = ModelTrainer(model=model, preprocessor=preprocessor)
    
    # Train
    trainer.train(X_train, y_train)
    
    # Evaluate
    metrics = trainer.evaluate(X_test, y_test)
    comparison_results[model_type] = metrics
    
    print(f"  Accuracy: {metrics['accuracy']:.4f}")
    print(f"  F1-Score: {metrics['f1_macro']:.4f}")

print("\n✓ Model comparison complete!")
</VSCode.Cell>
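
<VSCode.Cell language="markdown">
Once `comparison_results` holds one metrics dict per model type, picking a winner is a single `max` over the chosen metric. The sketch below uses made-up scores, not results from this notebook, and shows that the choice of metric can change the winner; macro F1 weighs all three sentiment classes equally, while accuracy favors the majority class.
</VSCode.Cell>

<VSCode.Cell language="python">
def best_model(results, metric="f1_macro"):
    """Return (model_name, score) with the highest value of `metric`."""
    name = max(results, key=lambda m: results[m][metric])
    return name, results[name][metric]


# Hypothetical scores for illustration only
toy_results = {
    "naive_bayes": {"accuracy": 0.81, "f1_macro": 0.74},
    "logistic_regression": {"accuracy": 0.80, "f1_macro": 0.77},
}
print(best_model(toy_results))              # ('logistic_regression', 0.77)
print(best_model(toy_results, "accuracy"))  # ('naive_bayes', 0.81)
</VSCode.Cell>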

<VSCode.Cell language="python">
# Visualize comparison
metrics_to_compare = ['accuracy', 'precision_macro', 'recall_macro', 'f1_macro']
model_names = list(comparison_results.keys())

fig, ax = plt.subplots(figsize=(12, 6))

x = np.arange(len(metrics_to_compare))
width = 0.35

for i, model_name in enumerate(model_names):
    values = [comparison_results[model_name][metric] for metric in metrics_to_compare]
    ax.bar(x + i*width, values, width, label=model_name.replace('_', ' ').title())

ax.set_xlabel('Metrics', fontsize=12)
ax.set_ylabel('Score', fontsize=12)
ax.set_title('Model Performance Comparison', fontsize=16, fontweight='bold')
ax.set_xticks(x + width / 2)
ax.set_xticklabels([m.replace('_', ' ').title() for m in metrics_to_compare])
ax.legend()
ax.grid(axis='y', alpha=0.3)
ax.set_ylim([0, 1])

plt.tight_layout()
plt.show()
</VSCode.Cell>
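
<VSCode.Cell language="markdown">
The chart above offsets the second model's bars by `width` and places the tick labels at `x + width / 2`, which centers them only when there are exactly two models. A general version centers each group of bars on its tick for any number of models. The helper `group_offsets` is a hypothetical sketch, not part of `src.utils`; it relies on matplotlib's default `align='center'` for `ax.bar`.
</VSCode.Cell>

<VSCode.Cell language="python">
def group_offsets(n_groups, n_series, total_width=0.8):
    """Bar centers for grouped bars, centered on integer tick positions.

    Returns (bar_width, positions) where positions[i][j] is the x-center
    of series i (a model) in group j (a metric).
    """
    bar_width = total_width / n_series
    start = -total_width / 2 + bar_width / 2
    positions = [
        [g + start + i * bar_width for g in range(n_groups)]
        for i in range(n_series)
    ]
    return bar_width, positions


# Possible use with the comparison chart (matplotlib calls left as comments):
# bar_width, positions = group_offsets(len(metrics_to_compare), len(model_names))
# for i, model_name in enumerate(model_names):
#     values = [comparison_results[model_name][m] for m in metrics_to_compare]
#     ax.bar(positions[i], values, bar_width, label=model_name)
# ax.set_xticks(range(len(metrics_to_compare)))
</VSCode.Cell>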

<VSCode.Cell language="markdown">
## Summary

In this notebook, we:

1. ✅ Loaded and explored the sentiment dataset
2. ✅ Preprocessed text data with cleaning and normalization
3. ✅ Trained a sentiment analysis model
4. ✅ Evaluated the model with accuracy, macro precision/recall/F1, and a confusion matrix
5. ✅ Made predictions on new text data
6. ✅ Compared two model types (Naive Bayes and logistic regression)

### Next Steps

- Try different preprocessing configurations
- Experiment with model hyperparameters
- Add more training data
- Deploy the model to production
- Integrate with a web application

**Happy modeling! 🚀**
</VSCode.Cell>
```