# Notebook 5: Model Evaluation & Insights

## Purpose
This notebook evaluates the best-performing model from Notebook 4 in depth, analyzes where its errors occur, extracts business insights, and creates polished visualizations for the portfolio presentation.

## Objectives
1. Load best model and test data
2. Perform detailed evaluation on test set
3. Conduct error analysis (where does the model fail?)
4. Extract and visualize feature importance
5. Test model on edge cases and outliers
6. Answer business questions using model insights
7. Create compelling visualizations for the portfolio
8. Document limitations and future improvements
9. Provide actionable business recommendations

## Key Questions to Answer
- Which features matter most for box office prediction?
- How does budget impact revenue? (linear or non-linear?)
- What's the value of having A-list actors?
- How much does release timing matter?
- Are sequels reliably more valuable?
- What types of movies does the model struggle with?

## Deliverables
- Feature importance visualization
- Actual vs Predicted scatter plot
- Error distribution analysis
- Business insights and recommendations
- Limitations and future work documentation

## Notes
- This is the "presentation" notebook: keep it clean and polished
- Focus on insights and interpretation, not experimentation
- All visualizations should be publication-quality
- Document any surprising or counter-intuitive findings

---
## Setup and Imports

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pickle
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

# Visualization settings
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (14, 8)
plt.rcParams['font.size'] = 12

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.precision', 2)

---
## Load Best Model and Data

In [None]:
# Load saved model
# with open('models/best_model.pkl', 'rb') as f:
#     model = pickle.load(f)

# Load test data
# X_test, y_test = ...

---
## Detailed Model Evaluation

### Overall Performance Metrics

In [None]:
# Calculate and display:
# - R-squared
# - MAE (in millions)
# - RMSE (in millions)
# - Percentage of predictions within 20% of actual
# - Percentage of predictions within $50M of actual
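One way to compute the metrics listed above, as a sketch. It assumes `y_true` and `y_pred` are revenue in raw dollars (pass `scale=1, abs_tol=50` if they are already in millions) and that predictions have been transformed back to dollars if the model was trained on a transformed target. `evaluate_predictions` and its parameter names are hypothetical; "within 20%" is measured relative to the actual revenue.

```python
import numpy as np
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error


def evaluate_predictions(y_true, y_pred, scale=1e6, pct_tol=0.20, abs_tol=50e6):
    """Summarize regression accuracy; revenue assumed to be in raw dollars."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    abs_err = np.abs(y_pred - y_true)

    # "Within 20%" is relative to the actual value; films with zero actual revenue are skipped
    nonzero = y_true != 0
    within_pct = np.mean(abs_err[nonzero] <= pct_tol * np.abs(y_true[nonzero]))

    return {
        "r2": r2_score(y_true, y_pred),
        "mae_millions": mean_absolute_error(y_true, y_pred) / scale,
        "rmse_millions": np.sqrt(mean_squared_error(y_true, y_pred)) / scale,
        "pct_within_20pct": 100 * within_pct,
        "pct_within_50m": 100 * np.mean(abs_err <= abs_tol),
    }
```

In the notebook this would be called as `evaluate_predictions(y_test, model.predict(X_test))`, and wrapping the result in `pd.Series(...)` gives a tidy display.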

### Actual vs Predicted Visualization

In [None]:
# Create scatter plot of actual vs predicted revenue
# Add perfect prediction line (y=x)
# Color points by genre or budget tier
# Highlight notable outliers with labels
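A sketch of the planned scatter plot using plain matplotlib. `groups` can hold a genre or budget-tier column and `labels` the movie titles; both are optional, and the function name and arguments are hypothetical. "Notable outliers" are taken here to be the points with the largest absolute error, which is one reasonable definition among several.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_actual_vs_predicted(y_true, y_pred, groups=None, labels=None, n_label=5, ax=None):
    """Scatter actual vs predicted revenue with a y = x perfect-prediction line."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if ax is None:
        _, ax = plt.subplots(figsize=(10, 8))

    # One scatter layer per group (e.g. genre or budget tier) so each gets its own color
    if groups is None:
        ax.scatter(y_true, y_pred, alpha=0.6)
    else:
        groups = np.asarray(groups)
        for g in np.unique(groups):
            mask = groups == g
            ax.scatter(y_true[mask], y_pred[mask], alpha=0.6, label=str(g))
        ax.legend(title="Group")

    # Perfect-prediction reference line spanning the combined range of both axes
    lo = min(y_true.min(), y_pred.min())
    hi = max(y_true.max(), y_pred.max())
    ax.plot([lo, hi], [lo, hi], "k--", linewidth=1)

    # Label the n_label films with the largest absolute error
    if labels is not None:
        labels = np.asarray(labels)
        worst = np.argsort(np.abs(y_pred - y_true))[::-1][:n_label]
        for i in worst:
            ax.annotate(str(labels[i]), (y_true[i], y_pred[i]),
                        xytext=(5, 5), textcoords="offset points", fontsize=9)

    ax.set_xlabel("Actual revenue")
    ax.set_ylabel("Predicted revenue")
    ax.set_title("Actual vs Predicted Revenue")
    return ax
```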

---
## Error Analysis

### Error Distribution

In [None]:
# Plot distribution of prediction errors
# Check whether errors are approximately normally distributed
# Check for systematic bias (over/under prediction)
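A sketch of the residual checks. Errors are defined here as predicted minus actual, so a positive mean error indicates systematic over-prediction. Skewness and a histogram give only a rough read on normality; a Q-Q plot (for example via `scipy.stats.probplot`) is a more direct check. `residual_summary` is a hypothetical helper.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def residual_summary(y_true, y_pred, plot=True):
    """Residual = predicted - actual, so positive values mean over-prediction."""
    resid = pd.Series(np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float))
    summary = {
        "mean_error": resid.mean(),           # systematic bias; near 0 means unbiased on average
        "median_error": resid.median(),
        "pct_overpredicted": 100 * (resid > 0).mean(),
        "skew": resid.skew(),                 # sample skewness; about 0 for a symmetric distribution
    }
    if plot:
        _, ax = plt.subplots(figsize=(10, 6))
        ax.hist(resid, bins=30, edgecolor="white")
        ax.axvline(0, color="k", linestyle="--", linewidth=1)
        ax.axvline(summary["mean_error"], color="red",
                   label=f"Mean error = {summary['mean_error']:,.0f}")
        ax.set_xlabel("Prediction error (predicted - actual)")
        ax.set_ylabel("Count")
        ax.legend()
    return summary
```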

### Biggest Prediction Errors

In [None]:
# Identify top 10 over-predictions and under-predictions
# Analyze common characteristics
# What types of movies does the model struggle with?
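A sketch for pulling the largest misses, assuming a hypothetical results DataFrame with `actual` and `predicted` columns plus descriptive columns such as title or genre. Percentage error is undefined when actual revenue is zero.

```python
import pandas as pd


def biggest_errors(df, actual_col="actual", pred_col="predicted", n=10):
    """Return the n largest over-predictions and the n largest under-predictions."""
    out = df.copy()
    out["error"] = out[pred_col] - out[actual_col]              # > 0 means over-prediction
    out["pct_error"] = 100 * out["error"] / out[actual_col]
    over = out[out["error"] > 0].nlargest(n, "error")
    under = out[out["error"] < 0].nsmallest(n, "error")
    return over, under
```

Comparing `over["genre"].value_counts()` with `under["genre"].value_counts()` (or `.describe()` on numeric columns) is one way to look for shared characteristics; the `genre` column name is hypothetical.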

### Error by Category

In [None]:
# Average absolute error by:
# - Genre
# - Budget category
# - Release month
# - Sequel vs original
# Identify where model performs best/worst
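A sketch of the per-category breakdown on the same hypothetical results DataFrame. The count column matters: a category with only a few films can show an extreme MAE by chance. Continuous variables such as budget need to be binned first, for example with `pd.cut`.

```python
import pandas as pd


def error_by_category(df, category_col, actual_col="actual", pred_col="predicted"):
    """Mean and median absolute error plus count per category, worst MAE first."""
    abs_err = (df[pred_col] - df[actual_col]).abs()
    return (
        abs_err.groupby(df[category_col])
        .agg(["mean", "median", "count"])
        .rename(columns={"mean": "mae", "median": "median_ae", "count": "n"})
        .sort_values("mae", ascending=False)
    )
```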

---
## Feature Importance Analysis

### Top 15 Most Important Features

In [None]:
# Extract feature importance from model
# Create horizontal bar chart of top 15 features
# Compare with correlation findings from EDA
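A sketch for extracting importances across model families, assuming `X` is a pandas DataFrame whose columns match the model's inputs. Tree ensembles expose `feature_importances_`, linear models expose `coef_` (magnitudes are only comparable if features were scaled), and anything else, including a scikit-learn `Pipeline`, falls back to `sklearn.inspection.permutation_importance`, which needs `y`. The function name is hypothetical. Impurity-based importances can favor high-cardinality features, so permutation importance on the test set is a useful cross-check.

```python
import numpy as np
import pandas as pd
from sklearn.inspection import permutation_importance


def get_feature_importance(model, X, y=None, top_n=15, random_state=0):
    """Top-n feature importances: native attribute if available, else permutation importance."""
    if hasattr(model, "feature_importances_"):        # e.g. random forests, gradient boosting
        values = model.feature_importances_
        kind = "impurity/gain"
    elif hasattr(model, "coef_"):                      # linear models: absolute coefficient size
        values = np.abs(np.ravel(model.coef_))
        kind = "|coefficient|"
    else:
        if y is None:
            raise ValueError("y is required for permutation importance")
        result = permutation_importance(model, X, y, n_repeats=10, random_state=random_state)
        values = result.importances_mean
        kind = "permutation"
    imp = pd.Series(values, index=X.columns, name=kind).sort_values(ascending=False)
    return imp.head(top_n)
```

For the horizontal bar chart, `get_feature_importance(model, X_test, y_test).sort_values().plot.barh()` puts the most important feature at the top.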

### Feature Impact Analysis

In [None]:
# For top features, show how they impact predictions:
# - Budget vs predicted revenue
# - Star power vs predicted revenue
# - Release month vs predicted revenue
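The "how does each feature move the prediction" plots correspond to partial dependence: set one feature to each grid value for every film, predict, and average. The sketch below implements that directly so it works with any fitted model exposing `predict`; scikit-learn's `PartialDependenceDisplay.from_estimator` offers a plotting version. The function name is hypothetical, and for one-hot-encoded features (such as a month encoded as twelve dummy columns) this single-column sweep does not apply.

```python
import numpy as np
import pandas as pd


def partial_dependence_curve(model, X, feature, grid=None, n_points=20):
    """Average prediction as one feature is swept over a grid, other features left as observed."""
    if grid is None:
        # Quantile-based grid keeps the sweep inside the observed data range
        grid = np.unique(np.quantile(X[feature], np.linspace(0.05, 0.95, n_points)))
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[feature] = value                 # same value for every film
        averages.append(model.predict(X_mod).mean())
    return pd.Series(averages, index=pd.Index(grid, name=feature), name="avg_prediction")
```

With hypothetical column names, usage would look like `partial_dependence_curve(model, X_test, "budget").plot()` or `partial_dependence_curve(model, X_test, "release_month", grid=range(1, 13))`.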

---
## Business Insights

### 1. Budget Impact

In [None]:
# How does budget affect revenue?
# Is the relationship linear or non-linear?
# What's the ROI for different budget tiers?
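A sketch for the budget questions. Fitting a line to log(revenue) against log(budget) gives an elasticity: a slope near 1 means revenue grows proportionally with budget, and a slope below 1 suggests diminishing returns. ROI here is (revenue - budget) / budget on box office gross alone, so it ignores marketing spend and the exhibitors' share and overstates studio returns. Column names and tier edges are hypothetical.

```python
import numpy as np
import pandas as pd


def budget_analysis(df, budget_col="budget", revenue_col="revenue", bins=None):
    """Log-log elasticity of revenue with respect to budget, plus ROI by budget tier."""
    data = df[(df[budget_col] > 0) & (df[revenue_col] > 0)]   # logs need positive values

    # Slope of log(revenue) on log(budget): ~1 proportional, <1 diminishing, >1 increasing returns
    slope, _ = np.polyfit(np.log(data[budget_col]), np.log(data[revenue_col]), deg=1)

    if bins is None:
        bins = [0, 20e6, 50e6, 100e6, 200e6, np.inf]          # hypothetical tier edges in dollars
    tiers = pd.cut(data[budget_col], bins=bins)
    roi = (data[revenue_col] - data[budget_col]) / data[budget_col]
    by_tier = roi.groupby(tiers, observed=True).agg(["median", "mean", "count"])
    return slope, by_tier
```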

### 2. Star Power Value

In [None]:
# What's the revenue impact of A-list actors?
# Is director reputation or lead-actor star power more important?
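One model-based way to put a number on star power: switch a binary feature on and then off for every film, keeping everything else as observed, and average the difference in predictions. This measures what the model has learned, not a causal effect, since A-list casting tends to correlate with budget and genre. The function and feature names are hypothetical; comparing an actor flag with a director flag answers the director-versus-lead question only if both are encoded comparably.

```python
import pandas as pd


def counterfactual_effect(model, X, column, on=1, off=0):
    """Average change in predicted revenue when `column` switches from `off` to `on` for every film."""
    X_on, X_off = X.copy(), X.copy()
    X_on[column] = on
    X_off[column] = off
    return float((model.predict(X_on) - model.predict(X_off)).mean())
```

For example, compare `counterfactual_effect(model, X_test, "has_a_list_actor")` with `counterfactual_effect(model, X_test, "top_director")`.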

### 3. Release Timing Strategy

In [None]:
# Best and worst months for releases
# Value of summer/holiday releases
# Impact of competition from other releases on the same weekend
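A descriptive sketch of revenue by release month. The peak-season set (May to July for summer, November and December for the holidays) is an assumption to adjust. Medians are used because box office revenue is heavily right-skewed. Measuring competition would need a feature such as the number of wide releases opening the same weekend, which is not assumed to exist here. Column names are hypothetical.

```python
import pandas as pd


def revenue_by_month(df, month_col="release_month", revenue_col="revenue",
                     peak_months=(5, 6, 7, 11, 12)):
    """Median revenue per release month plus a peak-season vs off-peak comparison."""
    by_month = (
        df.groupby(month_col)[revenue_col]
        .agg(["median", "mean", "count"])
        .sort_values("median", ascending=False)
    )
    # Label each film as peak or off-peak, then compare medians
    season_label = df[month_col].isin(peak_months).map({True: "peak", False: "off-peak"})
    season = df.groupby(season_label)[revenue_col].median()
    return by_month, season
```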

### 4. Sequel Premium

In [None]:
# Do sequels consistently outperform originals?
# At what installment does franchise fatigue set in?
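A sketch comparing sequels with originals and tracking revenue by installment number, assuming hypothetical `is_sequel` and `installment` columns (originals as installment 1). A falling median at higher installments is consistent with franchise fatigue, but counts there are usually small, so those rows deserve caution. The comparison is raw and not controlled for budget.

```python
import pandas as pd


def sequel_premium(df, sequel_col="is_sequel", installment_col="installment",
                   revenue_col="revenue"):
    """Percent difference in median revenue (sequels vs originals) and medians by installment."""
    medians = df.groupby(df[sequel_col].astype(bool))[revenue_col].median()
    premium_pct = 100 * (medians.loc[True] / medians.loc[False] - 1)
    by_installment = df.groupby(installment_col)[revenue_col].agg(["median", "count"])
    return premium_pct, by_installment
```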

### 5. Genre Performance

In [None]:
# Which genres have highest predicted revenue?
# Which genres are most predictable?
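A sketch covering both genre questions on the hypothetical results DataFrame: mean predicted revenue per genre, and median absolute percentage error as a predictability score (lower means more predictable). Films listed under several genres would need one row per genre first, for example with `DataFrame.explode`.

```python
import pandas as pd


def genre_summary(df, genre_col="genre", actual_col="actual", pred_col="predicted"):
    """Per-genre mean predicted revenue and median absolute percentage error."""
    d = df[df[actual_col] > 0].copy()                 # percentage error needs positive actuals
    d["ape"] = 100 * (d[pred_col] - d[actual_col]).abs() / d[actual_col]
    out = d.groupby(genre_col).agg(
        mean_predicted=(pred_col, "mean"),
        median_ape=("ape", "median"),
        n=(actual_col, "size"),
    )
    return out.sort_values("median_ape")              # most predictable genres first
```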

---
## Edge Case Analysis

In [None]:
# Test model on specific scenarios:
# - Very low-budget indie films
# - $200M+ blockbusters
# - First-time directors
# - Movies with no A-list actors
# - R-rated action films
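A sketch for evaluating the listed scenarios as filters over the test results. Every column name and threshold in `EDGE_CASE_SCENARIOS` is hypothetical (for instance, "very low budget" is taken as under $5M) and should be adjusted to the actual feature set. Bias is the mean of predicted minus actual, so its sign shows the direction of the miss.

```python
import pandas as pd

# Hypothetical column names and thresholds; adapt to the real feature set
EDGE_CASE_SCENARIOS = {
    "low-budget indie (< $5M)": lambda d: d["budget"] < 5e6,
    "$200M+ blockbusters": lambda d: d["budget"] >= 200e6,
    "first-time directors": lambda d: d["director_film_count"] == 0,
    "no A-list actors": lambda d: d["has_a_list_actor"] == 0,
    "R-rated action": lambda d: (d["mpaa_rating"] == "R") & (d["genre"] == "Action"),
}


def evaluate_scenarios(df, scenarios, actual_col="actual", pred_col="predicted"):
    """Count, MAE, and bias for each named subset of the test set (NaN if a subset is empty)."""
    rows = {}
    for name, condition in scenarios.items():
        subset = df[condition(df)]
        err = subset[pred_col] - subset[actual_col]
        rows[name] = {"n": len(subset), "mae": err.abs().mean(), "bias": err.mean()}
    return pd.DataFrame.from_dict(rows, orient="index")
```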

---
## Model Limitations

### Documented Limitations

1. **Data Constraints**
   - Limited to 2010-2024 (may not capture long-term trends)
   - Missing data for some variables (trailers, awards)
   - Survivorship bias (only includes movies with reported revenue)

2. **Feature Limitations**
   - Cannot capture word-of-mouth or critical reception
   - Marketing spend not included (data unavailable)
   - Subjective quality not quantified

3. **Model Assumptions**
   - Assumes past patterns continue (may not hold during disruptions)
   - Struggles with unprecedented events (COVID, streaming shift)
   - Cannot predict breakout hits with no historical precedent

4. **Use Case Constraints**
   - Best for wide releases, less accurate for limited releases
   - Domestic box office only (international revenue not modeled)
   - Pre-release prediction (cannot incorporate opening weekend buzz)

---
## Future Improvements

### Potential Enhancements

1. **Additional Data Sources**
   - Social media sentiment analysis (Twitter/X, Reddit)
   - Marketing budget data
   - Early review scores (e.g., Rotten Tomatoes scores once the review embargo lifts)
   - Google Trends search volume

2. **Feature Engineering**
   - NLP on plot summaries to identify themes
   - Director/actor collaboration history
   - Studio track record with similar films
   - Weather data for the release weekend

3. **Modeling Improvements**
   - Ensemble methods combining multiple models
   - Neural networks for complex interactions
   - Separate models for different genres/budget tiers
   - Time series components for trend detection

4. **Expanded Scope**
   - International box office prediction
   - Opening weekend vs total gross (two-stage model)
   - Streaming performance prediction
   - Post-release update model (incorporating opening-weekend data)

---
## Business Recommendations

### Actionable Insights for Studios

Based on model findings, provide specific recommendations:

1. **Budget Allocation**
   - [Insert finding on optimal budget ranges]
   
2. **Release Strategy**
   - [Insert finding on best release windows]
   
3. **Talent Decisions**
   - [Insert finding on star power ROI]
   
4. **Genre Focus**
   - [Insert finding on genre performance]
   
5. **Risk Assessment**
   - [Insert finding on predictable vs risky projects]

---
## Final Visualizations for Portfolio

In [None]:
# Create 3-5 compelling, publication-quality visualizations:
# 1. Actual vs Predicted scatter (main result)
# 2. Feature importance bar chart
# 3. Error analysis by category
# 4. Budget vs Revenue relationship
# 5. Model comparison (from Notebook 4)
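A small sketch for exporting the final figures at print resolution. The `figures` output directory, the function name, and the choice of PNG plus PDF are assumptions; 300 dpi with `bbox_inches="tight"` is a common setting for publication-quality output.

```python
from pathlib import Path

import matplotlib.pyplot as plt


def save_publication_figure(fig, name, out_dir="figures", formats=("png", "pdf"), dpi=300):
    """Save one figure in several formats with a tight bounding box; returns the written paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for fmt in formats:
        path = out / f"{name}.{fmt}"
        fig.savefig(path, dpi=dpi, bbox_inches="tight")
        paths.append(path)
    return paths
```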

---
## Conclusion

### Project Summary

**Objective:** Predict movie box office revenue using pre-release data

**Dataset:** [X] movies from 2010-2024

**Best Model:** [Model name] with:
- R² = [value]
- MAE = $[value]M
- RMSE = $[value]M

**Success Criteria Met:** [Yes/No - R² > 0.70, MAE < $25M]
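The success criteria can be checked in code so the conclusion stays consistent with the computed metrics. A minimal sketch with a hypothetical helper name, using the thresholds stated in the success criteria (R² > 0.70 and MAE < $25M, with MAE expressed in millions):

```python
def meets_success_criteria(r2, mae_millions, r2_target=0.70, mae_target_millions=25.0):
    """Return (all_met, per-criterion results) for the R² and MAE targets."""
    checks = {"r2": r2 > r2_target, "mae": mae_millions < mae_target_millions}
    return all(checks.values()), checks
```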

**Key Findings:**
1. [Top finding]
2. [Second finding]
3. [Third finding]

**Most Important Features:**
1. [Top feature]
2. [Second feature]
3. [Third feature]

**Limitations:** [Brief summary]

**Future Work:** [Brief summary]