# 04 — Market Gap Analysis & Price Sensitivity

**This is the headline deliverable.** We identify underserved niches where player demand is high but game supply is low, and quantify the revenue opportunity for a new entrant.

We also model price sensitivity across genres to ask: which price ranges are associated with the highest total revenue?

In [None]:
import json
import sys
from pathlib import Path

import pandas as pd

# Make the repo root importable; assumes the notebook runs from a subdirectory one level below it.
sys.path.insert(0, str(Path.cwd().parent))

from src.models.market_gaps import (
    build_opportunity_table,
    compute_recency_trend,
    estimate_new_entrant_revenue,
    score_niches,
)
from src.models.price_analysis import (
    compute_genre_elasticities,
    compute_price_segments,
    find_optimal_price_range,
    fit_price_model,
)
from src.processing.features import build_niche_descriptors
from src.visualisation.niche_explorer import (
    plot_niche_bubble_chart,
    plot_niche_metrics_heatmap,
    plot_opportunity_distribution,
    plot_revenue_range_comparison,
)

PROCESSED_DIR = Path("../data/processed")
RESULTS_DIR = Path("../results")

In [None]:
games = pd.read_json(PROCESSED_DIR / "games.json", lines=True)
print(f"Games loaded: {len(games):,}")

## Part 1: Niche Identification

For every pairwise and triple-wise tag combination with ≥5 games, we compute:

| Metric | Definition |
|--------|------------|
| **Supply** | Number of games in this niche |
| **Demand** | Total estimated owners across all games in this niche |
| **Engagement** | Median playtime per owner |
| **Satisfaction** | Median review score |
| **Revenue** | Median (owners × price) per game |
| **Opportunity** | (demand × engagement × satisfaction) / supply |
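
The metrics above can be illustrated on toy data. The block below is a minimal sketch, not the implementation of `build_niche_descriptors` or `score_niches`, whose internals are not shown here (they may normalise or weight the terms differently). The function `toy_niche_table`, the column names (`tags`, `owners`, `median_playtime_hours`, `review_score`), and the data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

import pandas as pd

# Toy data with hypothetical column names; this is NOT the schema of games.json.
toy_games = pd.DataFrame(
    {
        "tags": [
            ["Roguelike", "Deckbuilder"],
            ["Roguelike", "Deckbuilder"],
            ["Roguelike", "Platformer"],
        ],
        "owners": [100_000, 300_000, 50_000],
        "median_playtime_hours": [20.0, 40.0, 5.0],
        "review_score": [0.90, 0.80, 0.70],
    }
)


def toy_niche_table(df, max_combo_size=3, min_games=2):
    """Enumerate tag pairs/triples and apply the Opportunity formula from the table."""
    # Map each sorted tag combination to the row labels of the games that carry it.
    members = defaultdict(list)
    for idx, tags in df["tags"].items():
        unique_tags = sorted(set(tags))
        for size in range(2, max_combo_size + 1):
            for combo in combinations(unique_tags, size):
                members[combo].append(idx)

    rows = []
    for combo, idxs in members.items():
        if len(idxs) < min_games:
            continue  # drop niches with too few games
        sub = df.loc[idxs]
        supply = len(sub)
        demand = sub["owners"].sum()
        engagement = sub["median_playtime_hours"].median()
        satisfaction = sub["review_score"].median()
        rows.append(
            {
                "niche": " + ".join(combo),
                "supply": supply,
                "demand": demand,
                "engagement": engagement,
                "satisfaction": satisfaction,
                "opportunity": demand * engagement * satisfaction / supply,
            }
        )

    columns = ["niche", "supply", "demand", "engagement", "satisfaction", "opportunity"]
    table = pd.DataFrame(rows, columns=columns)
    return table.sort_values("opportunity", ascending=False).reset_index(drop=True)


toy_table = toy_niche_table(toy_games)
print(toy_table)
```

With `min_games=2`, only the "Deckbuilder + Roguelike" pair survives the supply threshold. Because supply sits in the denominator, a niche with the same demand, engagement, and satisfaction but twice as many games would score half as high.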

In [None]:
# Build niche descriptors
niche_df = build_niche_descriptors(games, max_combo_size=3, min_games_per_niche=5)
print(f"Total niches identified: {len(niche_df):,}")

# Score and rank
scored = score_niches(niche_df)
print("\nTop 10 niches by opportunity score:")
display_cols = [
    "rank",
    "niche",
    "supply",
    "demand_proxy",
    "engagement",
    "satisfaction",
    "median_revenue",
    "opportunity_score",
]
scored.head(10)[display_cols]

In [None]:
# Visualise
plot_niche_bubble_chart(scored, top_n=30)
plot_opportunity_distribution(scored)
plot_revenue_range_comparison(scored, top_n=15)
plot_niche_metrics_heatmap(scored, top_n=20)

## Part 2: Revenue Estimates for Top Niches

For each of the top 10 niches, we estimate a low-to-high revenue range that a new entrant could expect, scaled by the niche's recency trend multiplier.

In [None]:
estimates = []
for _, row in scored.head(10).iterrows():
    tags = row["niche"].split(" + ")
    recency = compute_recency_trend(games, tags)
    est = estimate_new_entrant_revenue(row, recency_multiplier=recency)
    estimates.append(est)
    print(f"\n--- {est['niche']} ---")
    print(f"  Games in niche:    {est['num_existing_games']}")
    print(f"  Total players:     {est['total_estimated_players']:,}")
    print(f"  Revenue estimate:  ${est['revenue_low']:,.0f} – ${est['revenue_high']:,.0f}")
    print(f"  Recency trend:     {recency:.2f}x")

## Part 3: Price Sensitivity Analysis

### Price Segments by Genre

In [None]:
segments = compute_price_segments(games)
print(f"Price segments: {len(segments)} genre × price combinations")
segments.head(20)

### Log-Linear Price Model

**Caveat:** This is observational, not causal. Higher-quality games tend to be priced higher, creating endogeneity. We cannot claim "lowering price by $5 will increase sales by X%." We can say: "Games priced at $X–$Y tend to have higher total revenue."
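
To make the log-linear form concrete, here is a minimal sketch of a log-log fit using ordinary least squares in NumPy. It is not the implementation of `fit_price_model`, which is not shown here and may include additional covariates. The helper `fit_log_log` and the toy data are hypothetical, and the toy data is constructed to follow an exact power law so that the fitted slope is known in advance.

```python
import numpy as np


def fit_log_log(prices, owners):
    """Fit log(owners) = intercept + slope * log(price) by ordinary least squares."""
    x = np.log(np.asarray(prices, dtype=float))
    y = np.log(np.asarray(owners, dtype=float))
    # Design matrix with an intercept column.
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    ss_res = float((residuals ** 2).sum())
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return {
        "intercept": float(coef[0]),
        "slope": float(coef[1]),
        "r_squared": 1.0 - ss_res / ss_tot,
    }


# Synthetic, noise-free data: owners = 1e6 * price ** -1.5 (assumed for illustration only).
toy_prices = np.array([5.0, 10.0, 20.0, 40.0])
toy_owners = 1_000_000 * toy_prices ** -1.5

fit = fit_log_log(toy_prices, toy_owners)
print(f"slope: {fit['slope']:.3f}, R²: {fit['r_squared']:.3f}")
```

Under this functional form, revenue (price × owners) scales as price^(1 + slope), so fitted revenue rises with price when the slope is above -1 and falls when it is below -1. Free games (price 0) have no defined logarithm and would need to be excluded or handled separately. The slope describes an association in the data, subject to the same endogeneity caveat as above.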

In [None]:
model_result = fit_price_model(games)
print(f"R²: {model_result['r_squared']}")
print(f"N:  {model_result['n_observations']:,}")
print("\nCoefficients:")
for name, coef in model_result["coefficients"].items():
    print(f"  {name}: {coef}")
print(f"\nInterpretation: {model_result['interpretation']['price_effect']}")
print(f"Caveat: {model_result['interpretation']['caveat']}")

### Genre-Specific Elasticities

In [None]:
elasticities = compute_genre_elasticities(games)
print("Price elasticity by genre (sorted by sensitivity):")
elasticities

## Save Results

In [None]:
tables_dir = RESULTS_DIR / "tables"
tables_dir.mkdir(parents=True, exist_ok=True)

# Opportunity table
opportunity = build_opportunity_table(scored, top_n=25)
opportunity.to_csv(tables_dir / "top_niches.csv", index=False)

# Price segments
segments.to_csv(tables_dir / "price_segments.csv", index=False)

# Elasticities
elasticities.to_csv(tables_dir / "genre_elasticities.csv", index=False)

# Revenue estimates
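# default= converts NumPy scalars (common in values taken from DataFrame rows) to plain Python values
with open(tables_dir / "revenue_estimates.json", "w", encoding="utf-8") as f:
    json.dump(estimates, f, indent=2, default=lambda o: o.item() if hasattr(o, "item") else str(o))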

print(f"Results saved to {tables_dir}")