# 05 — Executive Summary & Key Findings

This notebook summarises the full analysis in a format that a hiring manager can scan in about 3 minutes.

---

## Problem

Which game genres, tag combinations, and price points represent the highest-revenue opportunities for publishers entering the Steam market?

In [None]:
import json
import sys
from pathlib import Path

import pandas as pd
from IPython.display import Markdown, display

sys.path.insert(0, str(Path.cwd().parent))

PROCESSED_DIR = Path("../data/processed")
RESULTS_DIR = Path("../results")

## Data Overview

In [None]:
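# Data quality summary
def fmt_count(value):
    """Add thousands separators to numbers; pass placeholders such as 'N/A' through."""
    # The ',' format spec raises ValueError on strings, so guard the fallback value.
    return f"{value:,}" if isinstance(value, (int, float)) else str(value)

report_path = PROCESSED_DIR / "data_quality_report.json"
if report_path.exists():
    with open(report_path) as f:
        report = json.load(f)
    display(Markdown(f"""
| Metric | Value |
|--------|-------|
| Total users collected | {fmt_count(report.get('users_total', 'N/A'))} |
| Total unique games | {fmt_count(report.get('games_total', 'N/A'))} |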
| Median games per user | {report.get('median_games_per_user', 'N/A')} |
| RAWG metadata match rate | {report.get('rawg_match_rate', 0):.1%} |
| Median playtime (hours) | {report.get('playtime_median_hrs', 'N/A')} |
| Zero-playtime rate | {report.get('playtime_zero_pct', 0):.1%} |
"""))
else:
    print("Data quality report not yet generated.")

## Finding 1: Top Underserved Market Niches

In [None]:
niches_path = RESULTS_DIR / "tables" / "top_niches.csv"
if niches_path.exists():
    niches = pd.read_csv(niches_path)
    display(Markdown("### Top 10 Market Opportunities"))
    display(niches.head(10))
else:
    print("Run notebook 04 to generate niche analysis.")

## Finding 2: Recommendation Engine Performance

In [None]:
display(Markdown("""
*To be populated after model training:*

| Segment | P@10 | NDCG@10 | Revenue-Weighted HR@10 |
|---------|------|---------|------------------------|
| All users | — | — | — |
| Warm items (≥100 interactions) | — | — | — |
| Cold-start items (<100 interactions) | — | — | — |
| Popularity baseline | — | — | — |

The hybrid model improves cold-start recommendations by X% over the popularity baseline,
demonstrating that content features provide signal where collaborative filtering lacks data.
"""))

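For readers unfamiliar with the metrics in the table above, the sketch below shows one common way to compute P@10 and NDCG@10 for a single user under binary relevance. The function names and the toy item ids are illustrative assumptions, not the project's evaluation code. The revenue-weighted hit rate is left out because its definition is not given in this notebook.

```python
import math


def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k recommended items that are relevant."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


def ndcg_at_k(recommended, relevant, k=10):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so position 1 gets log2(2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Toy example with hypothetical item ids: two of three relevant items are retrieved.
recommended = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
relevant = {"a", "c", "z"}
p10 = precision_at_k(recommended, relevant)
ndcg10 = ndcg_at_k(recommended, relevant)
```

Averaging these per-user values over each segment (all users, warm items, cold-start items) would give the kind of figures the table is meant to hold.
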
## Finding 3: Price Sensitivity

In [None]:
elasticities_path = RESULTS_DIR / "tables" / "genre_elasticities.csv"
if elasticities_path.exists():
    elasticities = pd.read_csv(elasticities_path)
    display(Markdown("### Price Sensitivity by Genre"))
    display(elasticities[["genre", "n_games", "price_pct_per_dollar", "r_squared", "interpretation"]])
else:
    print("Run notebook 04 to generate price analysis.")
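
The `price_pct_per_dollar` and `r_squared` columns suggest a per-genre regression of a logged demand measure on price. The sketch below shows one way such numbers could be produced, assuming a semi-log model `log(owners) = a + b * price`. The model form, the function name, and the synthetic data are assumptions for illustration and may differ from what notebook 04 actually does.

```python
import math


def price_sensitivity(prices, owners):
    """Fit log(owners) = a + b * price by ordinary least squares.

    Returns (percent change in owners per extra $1, R^2).
    """
    xs = [float(p) for p in prices]
    ys = [math.log(o) for o in owners]  # requires owners > 0
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    # In a semi-log model, exp(b) - 1 is the relative change per unit of price.
    pct_per_dollar = (math.exp(slope) - 1.0) * 100.0
    return pct_per_dollar, r_squared


# Synthetic data: owners fall by exactly 5% for each extra dollar.
prices = [5.0, 10.0, 15.0, 20.0, 30.0]
owners = [100_000 * 0.95 ** p for p in prices]
pct, r2 = price_sensitivity(prices, owners)
```

On real data the fit describes an association across games in a genre, so a negative slope should not be read as the effect of changing one game's price.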

## Finding 4: Revenue Estimates for New Entrants

In [None]:
estimates_path = RESULTS_DIR / "tables" / "revenue_estimates.json"
if estimates_path.exists():
    with open(estimates_path) as f:
        estimates = json.load(f)
    for est in estimates[:5]:
        display(Markdown(f"""
### {est['niche']}
- **Existing games:** {est['num_existing_games']}
- **Total players:** {est['total_estimated_players']:,}
- **Revenue range:** \\${est['revenue_low']:,.0f} – \\${est['revenue_high']:,.0f}
- **Avg price point:** \\${est['avg_price']:.2f}
- **Recency trend:** {est['recency_multiplier']:.2f}x
"""))
else:
    print("Run notebook 04 to generate revenue estimates.")

## Action Items

*To be populated with specific, actionable recommendations after analysis:*

1. Publishers targeting **[niche X]** can expect \$A–\$B revenue at the \$C price point
2. The \$10–15 price range maximises total revenue for indie **[genre Y]**
3. **[Niche Z]** shows a strong recency trend (Nx) — newer titles outperform, suggesting growing demand

## Assumptions & Limitations

- All revenue estimates are derived from SteamSpy owner-range midpoints × current price, not historical prices
- Owner estimates carry ±20–30% uncertainty for smaller titles
- Friend-graph crawling introduces selection bias toward socially connected users
- Price analysis is observational, so it cannot support causal claims about pricing
- RAWG metadata coverage is incomplete, so very small or niche titles may be underrepresented
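
The first two limitations can be made concrete with a small sketch of the midpoint × price calculation. It assumes the owner estimate arrives as a range string of the form `"20,000 .. 50,000"` and that the uncertainty is applied as a symmetric band around the point estimate. The function names, the input format, and the 30% default are illustrative assumptions, and the `revenue_low` and `revenue_high` values produced by notebook 04 may be computed differently.

```python
def parse_owner_range(owners):
    """Parse a range such as '20,000 .. 50,000' into (low, high) integers."""
    low_text, high_text = owners.split("..")
    low = int(low_text.replace(",", "").strip())
    high = int(high_text.replace(",", "").strip())
    return low, high


def estimate_revenue(owners, current_price, uncertainty=0.30):
    """Gross revenue estimate: owner-range midpoint times current price.

    `uncertainty` is a hypothetical symmetric band (0.30 means +/-30%).
    """
    low, high = parse_owner_range(owners)
    midpoint = (low + high) / 2
    point = midpoint * current_price
    return {
        "revenue_low": point * (1 - uncertainty),
        "revenue_mid": point,
        "revenue_high": point * (1 + uncertainty),
    }


# Hypothetical title: 20,000 to 50,000 owners at a current price of $14.99.
example = estimate_revenue("20,000 .. 50,000", current_price=14.99)
```

Because the calculation uses the current list price, any copies sold at a discount or at an earlier price are valued incorrectly, which is the bias the first bullet describes.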

---

## Key Visualisations

See `results/figures/` for all exported charts:
- `niche_bubble_chart.html` — interactive supply vs. demand
- `revenue_range_comparison.png` — revenue potential by niche
- `niche_metrics_heatmap.png` — multi-metric scorecard
- `genre_cooccurrence_heatmap.png` — genre landscape
- `revenue_by_genre_violin.png` — revenue distribution
- `playtime_vs_owners_scatter.png` — engagement vs. scale
- `releases_over_time.png` — genre growth trends