# 📚 Neural Additive Models (NAM) - Complete Educational Tutorial

**Welcome!** This notebook teaches you how to build, train, and interpret Neural Additive Models.

## 🎯 Learning Objectives:
1. Understand NAM architecture and why it's explainable
2. Load and process daily sales data (250 records)
3. Build a single-layer NAM for interpretability
4. Train with proper validation strategies
5. Visualize predictions and elasticity curves
6. Decompose predictions into business drivers

## 📊 What Makes NAM Special?

**Traditional Neural Networks:**
```
y = NN(x₁, x₂, ..., xₙ)  # Black box!
```

**Neural Additive Models:**
```
y = f₁(x₁) + f₂(x₂) + ... + fₙ(xₙ)  # Explainable!
```

Each feature has its **own neural network**, and predictions are **summed**.

**Benefits:**
- ✅ Can plot individual feature contribution curves
- ✅ No feature interactions to untangle (easier to explain, though interaction effects are not modeled)
- ✅ Business-friendly interpretations
- ✅ Supports regulatory explainability requirements (explainable AI)
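To make the additive structure concrete, here is a minimal NumPy sketch of a NAM forward pass, assuming one hidden layer of 16 ReLU units per feature. The function names and random weights are illustrative only; this is not the `SimpleNAM` implementation used later in the notebook.

```python
import numpy as np

def init_feature_net(rng, hidden=16):
    """Random weights for one per-feature subnetwork (1 -> hidden -> 1)."""
    return {
        "w1": rng.normal(size=(1, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(size=(hidden, 1)),
        "b2": np.zeros(1),
    }

def feature_net_forward(params, x_col):
    """Evaluate f_i on a single feature column of shape (n_samples,)."""
    h = np.maximum(0.0, x_col[:, None] @ params["w1"] + params["b1"])  # ReLU
    return (h @ params["w2"] + params["b2"])[:, 0]

def nam_forward(all_params, X):
    """Return per-feature contributions and their sum (the prediction)."""
    contributions = np.stack(
        [feature_net_forward(p, X[:, i]) for i, p in enumerate(all_params)],
        axis=1,
    )  # shape: (n_samples, n_features)
    return contributions, contributions.sum(axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                       # 5 samples, 3 features
nets = [init_feature_net(rng) for _ in range(3)]  # one subnetwork per feature
contributions, y_hat = nam_forward(nets, X)
```

Because each column of `contributions` depends on one feature only, changing feature 0 leaves the other columns untouched. That property is what makes the per-feature curves plottable.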

In [None]:
# Setup - Run this first!
import os
os.environ['KERAS_BACKEND'] = 'jax'

import sys
from pathlib import Path
sys.path.insert(0, str(Path('.').absolute() / 'src'))

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots

print("✓ Environment setup complete!")
print("Using Keras backend: JAX")

## 📦 Section 1: Load Daily Sales Data

**Key Insight:** Daily data provides 20x more samples than monthly aggregation!

**Data Flow:**
```
Sales.csv (1M+ transactions)
    ↓ Parse dates
    ↓ Aggregate by day
    ↓ Result: 250 daily records
```

In [None]:
from src.data.data_loader import DataLoader

# Load daily data
loader = DataLoader('data/raw')
daily_data = loader.load_daily_sales()

print("📊 Daily Data Loaded:")
print(f"   Records: {len(daily_data):,}")
print(f"   Date range: {daily_data['Date'].min().date()} to {daily_data['Date'].max().date()}")
print(f"   Columns: {len(daily_data.columns)}")

# Show first few rows
daily_data.head()

## 🔧 Section 2: Feature Engineering & Scaling

**Critical Steps:**
1. Log-transform large values (GMV, investment)
2. StandardScaler for all features
3. Drop raw (unscaled) columns

**Why?** Neural networks need features in similar scales!
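The log-then-standardize step can be sketched in plain NumPy as below. The helper name and the sample values are made up for illustration; the notebook's own `scale_features` may differ in detail. Reusing the training mean and standard deviation for later data is a common precaution against leaking future information.

```python
import numpy as np

def log_then_standardize(values, mean=None, std=None):
    """Apply log1p, then z-score using (optionally supplied) training statistics."""
    logged = np.log1p(np.asarray(values, dtype=float))
    if mean is None:
        mean = logged.mean()
    if std is None:
        std = logged.std()
    return (logged - mean) / std, mean, std

# Made-up GMV values, for illustration only.
gmv_train = np.array([1.2e6, 9.5e5, 1.8e6, 2.4e6, 7.0e5])
scaled_train, mu, sigma = log_then_standardize(gmv_train)

# Later data reuses the training statistics rather than refitting them.
gmv_test = np.array([3.1e6, 1.1e6])
scaled_test, _, _ = log_then_standardize(gmv_test, mu, sigma)
```

Standardized values have mean 0 and standard deviation 1 on the data they were fitted on, but they are not bounded: an unusually large day can land outside [-3, +3].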

In [None]:
from src.data.data_preprocessing import DataPreprocessor
from src.data.feature_engineering import FeatureEngineer

# Preprocessing
preprocessor = DataPreprocessor({})
data = preprocessor.handle_missing_values(daily_data)
data = preprocessor.treat_outliers(data)

# Feature engineering
engineer = FeatureEngineer({})
data = engineer.engineer_all_features(data)

print("✓ Feature engineering complete")
print(f"   Total columns: {len(data.columns)}")

# Scaling
data_scaled, scalers = preprocessor.scale_features(data)

print("✓ Feature scaling complete")
print("   Features standardized (mean 0, std 1); values typically fall within [-3, +3]")

# Show distribution
numeric_cols = data_scaled.select_dtypes(include=[np.number]).columns
print(f"\n📊 Scaled Features ({len(numeric_cols)}):")
for col in list(numeric_cols)[:5]:
    print(f"   {col}: [{data_scaled[col].min():.2f}, {data_scaled[col].max():.2f}]")

## 📊 Section 3: Train/Val/Test Split (Time Series)

**Important:** For time series, we NEVER shuffle!

**Split Strategy:**
- Train: 70% (175 days)
- Val: 15% (37 days)
- Test: 15% (38 days)

**Result:** 38 test days for **clear trend visualization**!
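As a quick check of the split arithmetic above, the following sketch computes the three chronological slices for 250 rows. The helper name is hypothetical; the logic mirrors the `int(...)` truncation used in the next cell, which is why the test set receives the leftover row.

```python
def chronological_split(n_rows, train_frac=0.70, val_frac=0.15):
    """Return (train, val, test) slices in time order, with no shuffling."""
    train_end = int(n_rows * train_frac)
    val_end = train_end + int(n_rows * val_frac)
    return slice(0, train_end), slice(train_end, val_end), slice(val_end, n_rows)

rows = list(range(250))  # stand-in for 250 daily records, already sorted by date
train_sl, val_sl, test_sl = chronological_split(len(rows))
sizes = (len(rows[train_sl]), len(rows[val_sl]), len(rows[test_sl]))
print(sizes)
```

Every validation row comes after every training row, and every test row comes after every validation row, so the model is always evaluated on dates it has not seen.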

In [None]:
# Time series split
train_size = int(len(data_scaled) * 0.70)
val_size = int(len(data_scaled) * 0.15)

train_data = data_scaled.iloc[:train_size]
val_data = data_scaled.iloc[train_size:train_size+val_size]
test_data = data_scaled.iloc[train_size+val_size:]

print("📊 Data Split:")
print(f"   Train: {len(train_data)} days")
print(f"   Val:   {len(val_data)} days")
print(f"   Test:  {len(test_data)} days")
print("\n✓ Statistical Power:")
print(f"   Samples per feature: {len(train_data) / 9:.1f} (EXCELLENT!)")
print(f"   vs Monthly: 0.27 samples/feature (POOR)")

## 🏗️ Section 4: Build Single-Layer NAM

**Architecture Choice:** Single layer [16] for explainability

**Structure:**
```
For each feature i:
    fᵢ(xᵢ) = Dense(16, relu)(xᵢ) → Dense(1)(·)

Final prediction = Σ fᵢ(xᵢ)
```

**Parameters:** Only 441 total (highly interpretable!)
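The 441 figure can be reproduced with simple arithmetic, assuming 9 input features (the value used in the samples-per-feature calculation earlier) and no extra global bias term. The helper below is an illustrative sketch, not part of the repo.

```python
def nam_parameter_count(n_features, hidden_units=16):
    """Count weights and biases for one Dense(hidden) -> Dense(1) net per feature."""
    hidden_layer = 1 * hidden_units + hidden_units  # 16 weights + 16 biases = 32
    output_layer = hidden_units * 1 + 1             # 16 weights + 1 bias = 17
    return n_features * (hidden_layer + output_layer)

total_params = nam_parameter_count(9)  # 9 * (32 + 17)
print(total_params)
```

If `model.count_params()` reports a different number, the feature count or the subnetwork layout differs from these assumptions.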

In [None]:
from src.models.simple_nam import SimpleNAM
from src.training.trainer import NAMTrainer

# Prepare data
X_train, y_train = NAMTrainer.prepare_data_for_keras(train_data)

print(f"üìä Prepared Data:")
print(f"   X shape: {X_train.shape}")
print(f"   y shape: {y_train.shape}")
print(f"   Features: {X_train.shape[1]}")

# Build model
model = SimpleNAM(
    n_features=X_train.shape[1],
    feature_types=['unconstrained'] * X_train.shape[1],
    hidden_dims=[16]  # Single layer!
)

# Build
_ = model(X_train[:1])

print("\n🏗️ Model Built:")
print(f"   Architecture: Single-layer NAM")
print(f"   Parameters: {model.count_params():,}")
print("   Explainability: HIGH ★★★★☆")

# Show model summary
model.summary()

## 🎓 Section 5: Train the Model

**Training Strategy:**
- Optimizer: Adam (lr=0.001)
- Early stopping (patience=30)
- Learning rate reduction on plateau
- Model checkpointing
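The early-stopping rule in the list above can be illustrated with a few lines of plain Python. This is a sketch of the general logic only; `NAMTrainer` presumably delegates to framework callbacks, and the function name here is hypothetical.

```python
def early_stopping_epoch(val_losses, patience=30, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops once `patience` epochs have passed without the
    validation loss improving on the best value by more than `min_delta`.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses), best_epoch

# Small example with patience=3: the best epoch is 3, so training stops at epoch 6.
result = early_stopping_epoch([1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.75], patience=3)
print(result)
```

With `patience=30` and a 50-epoch budget, this rule can only trigger if the validation loss stops improving by epoch 20; otherwise training simply runs all 50 epochs.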

**Watch the convergence!**

In [None]:
from src.utils.config import load_config

# Load training config
training_config = load_config('configs/training_config.yaml')

# Create trainer
trainer = NAMTrainer(model, training_config['training'])

# Train!
print("🚀 Starting training...")
history = trainer.train(train_data, val_data, epochs=50)

print("\n✓ Training complete!")
print(f"   Best val_loss: {min(history.history['val_loss']):.4f}")
print(f"   Total epochs: {len(history.history['loss'])}")

## 📈 Section 6: Visualize Training (Interactive!)

**Interactive Plotly charts** - zoom, pan, hover for details!

In [None]:
# Create interactive training history
fig = make_subplots(
    rows=1, cols=2,
    subplot_titles=('Training & Validation Loss', 'Mean Absolute Error')
)

# Loss curves
epochs = range(1, len(history.history['loss']) + 1)
fig.add_trace(
    go.Scatter(x=list(epochs), y=history.history['loss'], name='Train Loss',
               line=dict(color='#2E86AB', width=3)),
    row=1, col=1
)
fig.add_trace(
    go.Scatter(x=list(epochs), y=history.history['val_loss'], name='Val Loss',
               line=dict(color='#A23B72', width=3)),
    row=1, col=1
)

# MAE curves
fig.add_trace(
    go.Scatter(x=list(epochs), y=history.history['mae'], name='Train MAE',
               line=dict(color='#2E86AB', width=3)),
    row=1, col=2
)
fig.add_trace(
    go.Scatter(x=list(epochs), y=history.history['val_mae'], name='Val MAE',
               line=dict(color='#A23B72', width=3)),
    row=1, col=2
)

fig.update_layout(height=500, title_text="Training History (Interactive - Try Hovering!)")
fig.show()

print("💡 Hover over the lines to see exact values!")
print("💡 Double-click legend to isolate a curve!")

## 🎯 Section 7: Make Predictions on Test Set (38 Days!)

**This shows the complete time series trend - a key advantage of daily data!**

In [None]:
# Get test predictions
X_test, y_test = NAMTrainer.prepare_data_for_keras(test_data)
predictions = model.predict(X_test).flatten()

# Inverse transform (if needed)
# For simplicity, we'll use scaled values here
# (See main_daily.py for full inverse transform)

test_dates = test_data['Date'].values

print("📊 Test Predictions:")
print(f"   Test samples: {len(predictions)}")
print(f"   Date range: {test_dates[0]} to {test_dates[-1]}")
print(f"   ✓ Complete trend visible with {len(predictions)} daily points!")

In [None]:
# Interactive time series visualization
fig = go.Figure()

# Actual
fig.add_trace(go.Scatter(
    x=test_dates,
    y=y_test,
    mode='lines+markers',
    name='Actual',
    line=dict(color='#2E86AB', width=3),
    marker=dict(size=8),
    hovertemplate='<b>Actual</b><br>Date: %{x}<br>GMV: %{y:.3f}<extra></extra>'
))

# Predicted
fig.add_trace(go.Scatter(
    x=test_dates,
    y=predictions,
    mode='lines+markers',
    name='Predicted',
    line=dict(color='#A23B72', width=2, dash='dash'),
    marker=dict(size=6, symbol='square'),
    hovertemplate='<b>Predicted</b><br>Date: %{x}<br>GMV: %{y:.3f}<extra></extra>'
))

fig.update_layout(
    title='38-Day Test Period: Actual vs Predicted (Interactive!)',
    xaxis_title='Date',
    yaxis_title='GMV (Scaled)',
    height=600,
    hovermode='x unified'
)

fig.show()

print("\n💡 Try these interactions:")
print("   - Hover to see exact values")
print("   - Click and drag to zoom")
print("   - Double-click to reset zoom")
print("   - Click legend to show/hide series")

## 📊 Section 8: Calculate Advanced Metrics

**Beyond R² and MAPE - industry-standard KPIs!**
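For reference, here is a minimal NumPy sketch of several of these metrics using common textbook definitions. It is an assumption that `compute_all_metrics` uses the same formulas; definitions of sMAPE and bias in particular vary between libraries.

```python
import numpy as np

def forecast_metrics(y_true, y_pred):
    """Common forecast error metrics; percentages are returned on a 0-100 scale."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    return {
        "mae": np.mean(np.abs(err)),
        "rmse": np.sqrt(np.mean(err ** 2)),
        # Weighted MAPE: total absolute error relative to total actuals.
        "wmape": 100 * np.sum(np.abs(err)) / np.sum(np.abs(y_true)),
        # Symmetric MAPE: error relative to the sum of |actual| and |forecast|.
        "smape": 100 * np.mean(2 * np.abs(err) / (np.abs(y_true) + np.abs(y_pred))),
        # Bias: positive means over-forecasting on aggregate.
        "bias_pct": 100 * np.sum(err) / np.sum(y_true),
    }

m = forecast_metrics([100, 200, 300], [110, 190, 330])
```

Percentage-based metrics divide by the actual values, so they become unstable when the target is standardized and sits near zero. Computing them on inverse-transformed GMV is usually more meaningful.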

In [None]:
from src.evaluation.advanced_metrics import compute_all_metrics

# Compute all metrics
metrics = compute_all_metrics(y_test, predictions)

# Display as table
metrics_df = pd.DataFrame([
    {'Metric': 'R² Score', 'Value': f"{metrics['r2']:.4f}"},
    {'Metric': 'MAE', 'Value': f"{metrics['mae']:.4f}"},
    {'Metric': 'RMSE', 'Value': f"{metrics['rmse']:.4f}"},
    {'Metric': 'MAPE', 'Value': f"{metrics['mape']:.2f}%"},
    {'Metric': 'Weighted MAPE', 'Value': f"{metrics['wmape']:.2f}%"},
    {'Metric': 'Symmetric MAPE', 'Value': f"{metrics['smape']:.2f}%"},
    {'Metric': 'Bias %', 'Value': f"{metrics['bias_pct']:.2f}%"},
])

print("📊 Comprehensive Metrics:")
print(metrics_df.to_string(index=False))

# Highlight
if metrics['r2'] > 0.3:
    print(f"\n✓ R² = {metrics['r2']:.3f} indicates good learning!")
if metrics['smape'] < 50:
    print(f"✓ sMAPE = {metrics['smape']:.1f}% is acceptable for forecasting!")

## 🔬 Section 9: NAM Explainability - Feature Contributions

**The POWER of NAM:** We can extract how much each feature contributes!

**Example:** "Price contributed -$X to GMV" vs "Marketing contributed +$Y"

In [None]:
# Extract feature contributions
if hasattr(model, 'get_feature_contributions'):
    contributions = model.get_feature_contributions(X_test)
    
    # Average contribution per feature
    feature_names = [col for col in data_scaled.select_dtypes(include=[np.number]).columns 
                     if col != 'total_gmv_log']
    
    avg_contributions = {}
    for i, contrib in enumerate(contributions[:len(feature_names)]):
        avg_contributions[feature_names[i]] = np.mean(contrib)
    
    # Plot
    contrib_df = pd.DataFrame(list(avg_contributions.items()), 
                              columns=['Feature', 'Contribution'])
    contrib_df = contrib_df.sort_values('Contribution', ascending=True)
    
    fig = go.Figure(go.Bar(
        x=contrib_df['Contribution'],
        y=contrib_df['Feature'],
        orientation='h',
        marker=dict(color=contrib_df['Contribution'], 
                   colorscale='RdBu', cmid=0)
    ))
    
    fig.update_layout(
        title='Average Feature Contributions',
        xaxis_title='Contribution to GMV',
        yaxis_title='Feature',
        height=400
    )
    
    fig.show()
    
    print("\n💡 This is NAM's interpretability advantage!")
    print("   Blue bars: Positive contribution")
    print("   Red bars: Negative contribution")
else:
    print("Note: Feature contribution extraction available in full model")

## 🎨 Section 10: Elasticity Curves

**Business Question:** "How does GMV change if I adjust price by 10%?"

**NAM Answer:** Plot the learned response curve and read off the feature value that maximizes predicted GMV within the plotted range!
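The idea can be sketched with a single batched prediction call: hold every feature at a baseline value, sweep one feature over a grid, and predict once for the whole grid. The helper and the toy model below are hypothetical stand-ins for a trained NAM.

```python
import numpy as np

def response_curve(predict_fn, x_baseline, feature_idx, grid):
    """Predict along `grid` for one feature while holding the others at baseline."""
    base = np.asarray(x_baseline, dtype=float).reshape(1, -1)
    X_grid = np.repeat(base, len(grid), axis=0)  # one row per grid point
    X_grid[:, feature_idx] = grid
    return np.asarray(predict_fn(X_grid)).reshape(-1)

# Toy additive model standing in for a trained NAM (made-up shape functions).
def toy_predict(X):
    return -(X[:, 0] - 1.0) ** 2 + 0.5 * X[:, 1]

grid = np.linspace(-3, 3, 61)
curve = response_curve(toy_predict, [0.0, 2.0], feature_idx=0, grid=grid)
best_x = grid[np.argmax(curve)]  # grid value with the highest prediction
```

In an additive model the other features only shift this curve up or down by a constant, so its shape is the learned function for that feature alone. Grid values outside the range seen in training are extrapolations and should be read with caution.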

In [None]:
# Extract elasticity for a feature
def plot_feature_elasticity(model, X_baseline, feature_idx, feature_name):
    """Plot how GMV changes as we vary one feature"""
    
    # Vary feature from -3 to +3 (scaled range)
    feature_values = np.linspace(-3, 3, 100)
    predictions_range = []
    
    for val in feature_values:
        X_test = X_baseline.copy()
        X_test[:, feature_idx] = val
        pred = model.predict(X_test, verbose=0).flatten()[0]
        predictions_range.append(pred)
    
    # Plot
    fig = go.Figure()
    
    fig.add_trace(go.Scatter(
        x=feature_values,
        y=predictions_range,
        mode='lines',
        name='Elasticity Curve',
        line=dict(color='#06A77D', width=4)
    ))
    
    # Mark current value
    current_val = X_baseline[0, feature_idx]
    current_pred = predictions_range[np.argmin(np.abs(feature_values - current_val))]
    fig.add_trace(go.Scatter(
        x=[current_val],
        y=[current_pred],
        mode='markers',
        name='Current Point',
        marker=dict(size=15, color='red', symbol='star')
    ))
    
    # Mark optimal
    optimal_idx = np.argmax(predictions_range)
    fig.add_trace(go.Scatter(
        x=[feature_values[optimal_idx]],
        y=[predictions_range[optimal_idx]],
        mode='markers',
        name='Optimal Point',
        marker=dict(size=15, color='gold', symbol='diamond')
    ))
    
    fig.update_layout(
        title=f'Elasticity Curve: {feature_name}',
        xaxis_title=f'{feature_name} (Scaled)',
        yaxis_title='GMV Contribution',
        height=500
    )
    
    return fig

# Example: Plot elasticity for first feature
X_baseline = np.median(X_train, axis=0, keepdims=True)
feature_names = [col for col in data_scaled.select_dtypes(include=[np.number]).columns 
                 if col != 'total_gmv_log']

if len(feature_names) > 0:
    fig = plot_feature_elasticity(model, X_baseline, 0, feature_names[0])
    fig.show()
    
    print("\n💡 Interpretation:")
    print("   - Red star: Baseline (median training) feature value")
    print("   - Gold diamond: Value with the highest predicted GMV on the plotted grid")
    print("   - Curve shape: How GMV responds to this feature")

## 📚 Section 11: Student Exercises

**Try These:**

1. **Experiment with architecture:**
   - Try `hidden_dims=[8]` (simpler) or `[32]` (more complex)
   - Compare R² scores

2. **Different train/test splits:**
   - Try 60/20/20 split
   - How does it affect performance?

3. **Feature importance:**
   - Extract contributions for all features
   - Rank by importance
   - Which drives GMV most?

4. **Elasticity analysis:**
   - Plot curves for all features
   - Find optimal points
   - Calculate revenue impact

5. **Walk-forward validation:**
   - Enable in config
   - Run with 10-day holdouts
   - Analyze robustness
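For exercise 5, a walk-forward scheme with 10-day holdouts could be sketched as follows. This is a generic expanding-window generator with hypothetical names; the config-driven version in this repo may be organized differently.

```python
def walk_forward_splits(n_rows, initial_train, holdout=10):
    """Yield (train_indices, test_indices) with an expanding training window."""
    start = initial_train
    while start + holdout <= n_rows:
        # Train on everything before `start`, test on the next `holdout` rows.
        yield range(0, start), range(start, start + holdout)
        start += holdout

folds = list(walk_forward_splits(n_rows=250, initial_train=175, holdout=10))
for train_idx, test_idx in folds:
    print(f"train: 0-{train_idx[-1]}, test: {test_idx[0]}-{test_idx[-1]}")
```

With 250 rows and an initial window of 175, this yields seven folds; the last five rows do not fill a complete holdout and are left unused.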

## 🎓 Key Takeaways

**What You Learned:**
1. ✅ NAM provides interpretability via additive structure
2. ✅ Daily data (250 records) >> Monthly data (12 records)
3. ✅ Single-layer [16] balances explainability & performance
4. ✅ Proper scaling is critical (log + StandardScaler)
5. ✅ 38 test points give clear trend visualization
6. ✅ Can extract feature contributions and elasticities

**Why NAM for Business:**
- Explainable predictions (regulatory compliance)
- Feature contribution curves (investment decisions)
- Elasticity analysis (pricing optimization)
- No black-box problem (stakeholder trust)

**Next Steps:**
- Try exercises above
- Explore `main_daily.py` for production code
- Read `FINAL_SUMMARY.md` for complete system details
- Check `START_HERE.md` for quick reference

## 📖 Additional Resources

**Documentation in this repo:**
- `START_HERE.md` - Quick start guide
- `FINAL_SUMMARY.md` - Complete technical summary
- `HOW_TO_RUN_VISUALIZATIONS.md` - Visualization guide
- `INTERACTIVE_VISUALIZATION_GUIDE.md` - Plotly dashboards

**Academic Papers:**
- Agarwal et al. (2021), "Neural Additive Models: Interpretable Machine Learning with Neural Nets" (NeurIPS 2021) - the original NAM paper from Google Research

**Questions?** Check the documentation or experiment with the code!