AI Sports Sponsorship Intelligence Platform
WorldCupROI blends match performance, media attention, fan behavior, sponsor investment, scenario simulation, and uncertainty risk into one sponsor ROI decision platform. The goal is not only to predict football results, but to help answer: which sponsorship strategy should a brand choose, under what risk, and why?
| Link | Target |
|---|---|
| Live Demo | make dashboard |
| Static Dashboard | dashboard/panel_dashboard.html |
| Executive Summary | reports/executive_summary.pdf |
| Business Report | reports/business_insights.md |
| Data Card | reports/data_card.md |
| Model Card | reports/model_card.md |
| Deployment Guide | docs/deployment.md |
Core result snapshot
| Area | Current value |
|---|---|
| Platform health score | 100 / 100 |
| Match accuracy | 0.5566 |
| Match log loss | 0.9780 |
| Sponsor ROI MAE | 0.1177 |
| Sponsor ROI R2 | 0.8838 |
| Match conformal coverage | 0.9021 |
| ROI interval coverage | 0.8814 |
| Average Monte Carlo std | 0.1320 |
WorldCupROI is a reproducible sports sponsorship analytics project with four layers:
| Layer | What it does | Business value |
|---|---|---|
| Data intelligence | Separates real historical data, real text data, and proxy/mock commercial data | Makes data boundaries visible before decisions |
| ML modeling | Trains match outcome and sponsor ROI models with validation outputs | Converts sports and attention signals into measurable ROI forecasts |
| Risk and explainability | Adds SHAP-style drivers, conformal intervals, Monte Carlo risk, and scenario lift | Turns point estimates into defensible decisions |
| Product dashboard | Discover -> Explain -> Predict -> Simulate -> Recommend | Makes the work usable by analysts and business reviewers |
What it shows: Compares trained baseline and benchmark models on primary evaluation metrics.
Why it matters: It shows whether the current model choice is a stable baseline or only a placeholder.
Business takeaway: Use the benchmark spread to decide which model family deserves production tuning first.
| Task | Model | Metric | Value |
|---|---|---|---|
| Match outcome | Centroid classifier | Accuracy | 0.5566 |
| Match outcome | Centroid classifier | Log loss | 0.9780 |
| Sponsor ROI | Ridge regression | MAE | 0.1177 |
| Sponsor ROI | Ridge regression | R2 | 0.8838 |
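For reference, the four benchmark metrics in the table above can be computed as in the minimal sketch below. The function names and toy data are illustrative assumptions, not the project's evaluation code. If match outcomes are three-way (an assumption; the class count is not stated here), a uniform guess has a log loss of ln 3 ≈ 1.0986, which gives the reported 0.9780 some context.

```python
import math


def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def log_loss(y_true, proba, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    `proba` holds one dict per row, mapping class label -> probability.
    """
    total = 0.0
    for label, dist in zip(y_true, proba):
        p = min(max(dist[label], eps), 1.0 - eps)  # clip to avoid log(0)
        total -= math.log(p)
    return total / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy data, invented for illustration only.
outcomes = ["home", "draw", "away", "home"]
proba = [
    {"home": 0.60, "draw": 0.25, "away": 0.15},
    {"home": 0.40, "draw": 0.35, "away": 0.25},
    {"home": 0.20, "draw": 0.30, "away": 0.50},
    {"home": 0.30, "draw": 0.30, "away": 0.40},
]
predicted = [max(dist, key=dist.get) for dist in proba]
print("accuracy:", accuracy(outcomes, predicted))
print("log loss:", round(log_loss(outcomes, proba), 4))
```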
What it shows: Ranks the strongest sponsor ROI drivers using SHAP-style feature contribution scores.
Why it matters: Explainability keeps ROI recommendations auditable and helps detect proxy-label overdependence.
Business takeaway: Improve brand heat, sponsor-team fit, media exposure, and activation quality before scaling spend.
What it shows: Ranks sponsors by predicted commercial ROI and network influence evidence.
Why it matters: A sponsor can look attractive because expected ROI is high or because relationship influence is broad.
Business takeaway: Prioritize sponsors that combine high ROI with strong team-player-network leverage.
What it shows: Conservative, balanced, and aggressive strategy lift against the baseline.
Why it matters: Scenario analysis turns the model from prediction into a decision simulator.
Business takeaway: Select aggressive strategies only when lift is positive and risk remains tolerable.
What it shows: Displays ROI point estimates with conformal-style prediction intervals.
Why it matters: Prediction intervals show forecast reliability, not just expected value.
Business takeaway: Prefer narrow-interval opportunities when sponsor budgets are constrained.
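One common way to build such intervals is split conformal prediction: hold out a calibration set, take a finite-sample quantile of the absolute residuals, and add that margin to each new point estimate. The sketch below assumes this simple constant-width variant; the project's own interval method may differ, and all names and numbers are hypothetical.

```python
import math


def conformal_quantile(residuals, alpha):
    """Finite-sample conformal quantile of absolute calibration residuals."""
    n = len(residuals)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-based rank in sorted order
    if rank > n:
        return float("inf")  # too few calibration points for this alpha
    return sorted(residuals)[rank - 1]


def conformal_intervals(cal_true, cal_pred, new_pred, alpha=0.1):
    """Symmetric intervals around new point predictions."""
    residuals = [abs(t - p) for t, p in zip(cal_true, cal_pred)]
    q = conformal_quantile(residuals, alpha)
    return [(p - q, p + q) for p in new_pred]


def coverage(y_true, intervals):
    """Share of true values that fall inside their interval."""
    hits = sum(lo <= y <= hi for y, (lo, hi) in zip(y_true, intervals))
    return hits / len(y_true)


# Toy calibration data, invented for illustration only.
cal_true = [0.42, 0.55, 0.31, 0.60, 0.48, 0.52, 0.39, 0.65, 0.44, 0.58]
cal_pred = [0.40, 0.50, 0.35, 0.66, 0.47, 0.49, 0.45, 0.60, 0.41, 0.55]
for low, high in conformal_intervals(cal_true, cal_pred, [0.50, 0.62], alpha=0.2):
    print(f"ROI interval: [{low:.3f}, {high:.3f}]")
```

In this variant every interval has the same width (twice the quantile), so ranking opportunities by interval width would need a locally adaptive method.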
What it shows: The distribution of Monte Carlo ROI standard deviation and risk scores.
Why it matters: The spread of risk is often more important than average ROI for sponsorship planning.
Business takeaway: Use high-risk tails as triggers for staged spend, insurance clauses, or additional analyst review.
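A minimal sketch of how a Monte Carlo ROI spread could be produced: perturb a point estimate with random exposure and conversion shocks, then summarize the draws. The multiplicative shock model, the parameter names, and the risk-band cutoffs are assumptions made for this example, not the project's simulation settings.

```python
import random
import statistics


def simulate_roi(base_roi, exposure_sd, conversion_sd, n_draws=2000, seed=0):
    """Draw ROI samples by applying two independent Gaussian shocks."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    draws = []
    for _ in range(n_draws):
        exposure_shock = rng.gauss(0.0, exposure_sd)
        conversion_shock = rng.gauss(0.0, conversion_sd)
        draws.append(base_roi * (1 + exposure_shock) * (1 + conversion_shock))
    return draws


def risk_summary(draws, low_cut=0.10, high_cut=0.20):
    """Mean, standard deviation, 5th percentile, and a coarse risk band."""
    std = statistics.pstdev(draws)
    ordered = sorted(draws)
    p5 = ordered[int(0.05 * (len(ordered) - 1))]
    if std < low_cut:
        band = "low"
    elif std < high_cut:
        band = "medium"
    else:
        band = "high"
    return {"mean": statistics.fmean(draws), "std": std, "p5": p5, "band": band}


summary = risk_summary(simulate_roi(0.55, exposure_sd=0.20, conversion_sd=0.15))
print(summary)
```

The 5th percentile is the kind of tail figure that could trigger staged spend or extra review, independent of the mean.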
What it shows: Visualizes sponsor, team, and player relationships as a weighted commercial graph.
Why it matters: Graph position captures activation leverage that flat tables miss.
Business takeaway: Use central sponsors and teams as anchor partnerships for campaign portfolios.
What it shows: Future sponsor ROI forecasts across the 2026, 2030, and 2034 World Cup cycles.
Why it matters: It makes time dependence visible instead of treating every tournament as the same planning context.
Business takeaway: Use the trend as a budget planning prior, then review uncertainty before committing long-cycle spend.
What it shows: Compares ROI deltas for positive sentiment spikes, stage attention shocks, and baseline attention events.
Why it matters: Sentiment can change conversion quality even when media exposure is high.
Business takeaway: Prepare contingency messaging and spend limits around high-attention negative events.
What it shows: Maps risk-adjusted ROI under different sponsor budgets and media multiplier combinations.
Why it matters: Resource optimization converts model output into a concrete allocation recommendation.
Business takeaway: Scale spend where the sensitivity surface is high and stable, not only where raw ROI is high.
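The "high and stable" idea can be sketched as a grid search: score each budget and media-multiplier cell by risk-adjusted ROI, measure how much the score changes relative to adjacent cells, and pick the best cell whose local change stays under a tolerance. The response curves, risk-aversion weight, and tolerance below are hypothetical placeholders, not the project's optimizer.

```python
from itertools import product


def risk_adjusted_surface(budgets, multipliers, roi_fn, risk_fn, risk_aversion=1.0):
    """Risk-adjusted ROI for every (budget, media multiplier) grid cell."""
    return {
        (b, m): roi_fn(b, m) - risk_aversion * risk_fn(b, m)
        for b, m in product(budgets, multipliers)
    }


def local_spread(surface, budgets, multipliers, cell):
    """Largest absolute change between a cell and its adjacent grid cells."""
    i, j = budgets.index(cell[0]), multipliers.index(cell[1])
    spread = 0.0
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < len(budgets) and 0 <= nj < len(multipliers):
            neighbour = surface[(budgets[ni], multipliers[nj])]
            spread = max(spread, abs(surface[cell] - neighbour))
    return spread


def recommend(surface, budgets, multipliers, max_spread=float("inf")):
    """Best cell among those whose local spread is within tolerance."""
    stable = [
        cell for cell in surface
        if local_spread(surface, budgets, multipliers, cell) <= max_spread
    ]
    return max(stable, key=surface.get) if stable else None


# Hypothetical response curves, invented for this sketch.
def demo_roi(budget, media_multiplier):
    return 0.6 * media_multiplier / (1.0 + budget / 5.0)  # diminishing returns


def demo_risk(budget, media_multiplier):
    return 0.02 * budget + 0.1 * (media_multiplier - 1.0) ** 2


budgets = [1.0, 2.0, 4.0, 8.0]
multipliers = [1.0, 1.5, 2.0]
surface = risk_adjusted_surface(budgets, multipliers, demo_roi, demo_risk)
print("recommended (budget, multiplier):", recommend(surface, budgets, multipliers, 0.2))
```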
What it shows: Ranks sponsor nodes by graph attention-style contribution to ROI.
Why it matters: It explains relationship leverage beyond flat sponsor ranking or tabular SHAP alone.
Business takeaway: Use high-contribution sponsors as anchor nodes in portfolio planning.
What it shows: Stress-tests key player injury, sentiment crisis, sponsor policy change, and positive viral upside scenarios.
Why it matters: Extreme cases reveal downside intervals that average ROI hides.
Business takeaway: Pre-approve response playbooks before the tournament starts.
What it shows: Combines ROI, media exposure value, fan conversion, social spread, and brand influence.
Why it matters: Sponsor decisions are multi-objective; ROI alone is too narrow for portfolio planning.
Business takeaway: Prioritize high composite score opportunities, then review interval width and graph influence.
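A composite score of this kind is often a weighted sum of min-max normalized components. The sketch below assumes that form; the column names and weights are hypothetical and are not taken from data/commercial_decision_metrics.csv.

```python
def min_max(values):
    """Scale values to [0, 1]; constant columns map to 0.5."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical weights; a real deployment would tune or justify these.
WEIGHTS = {
    "roi": 0.4,
    "media_value": 0.2,
    "fan_conversion": 0.2,
    "social_spread": 0.1,
    "brand_influence": 0.1,
}


def composite_scores(rows, weights=WEIGHTS):
    """Weighted average of normalized metrics, one score per row."""
    names = list(weights)
    scaled = {k: min_max([row[k] for row in rows]) for k in names}
    total_weight = sum(weights.values())
    return [
        sum(weights[k] * scaled[k][i] for k in names) / total_weight
        for i in range(len(rows))
    ]


# Toy sponsors, invented for illustration only.
sponsors = [
    {"roi": 0.62, "media_value": 8.1, "fan_conversion": 0.031, "social_spread": 5.2, "brand_influence": 0.7},
    {"roi": 0.48, "media_value": 9.4, "fan_conversion": 0.024, "social_spread": 7.9, "brand_influence": 0.9},
    {"roi": 0.55, "media_value": 6.3, "fan_conversion": 0.040, "social_spread": 3.1, "brand_influence": 0.5},
]
print([round(score, 3) for score in composite_scores(sponsors)])
```

Because the components are rescaled within the candidate set, scores are relative: adding or removing a sponsor can change every other sponsor's score.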
What it shows: Visualizes sponsor and player influence pathways from the heterogeneous commercial graph.
Why it matters: Player and sponsor influence can amplify or weaken projected ROI under the same match context.
Business takeaway: Pair high-influence sponsors with resilient player/team nodes before selecting activation themes.
Most football analytics projects stop at predicting who wins. Sponsorship decisions need more: media exposure, fan attention, brand fit, player availability, commercial momentum, downside risk, and an explanation a non-technical stakeholder can trust.
WorldCupROI frames the World Cup as an attention market where sponsor ROI depends on both match context and business activation.
Tournament sponsorship budgets are committed before all outcomes are known. A high-profile campaign can underperform if the model ignores uncertainty, audience behavior, or sponsor-team fit.
| Audience | Value |
|---|---|
| Sports business analysts | Compare ROI, risk, sponsor fit, and scenario lift |
| ML reviewers | Inspect model cards, validation, feature importance, and leakage risks |
| Researchers | Study links between match performance, text signals, user attention, and ROI |
| Product reviewers | Open a dashboard and reproduce the analysis end to end |
| Innovation | Implementation |
|---|---|
| Data boundary documentation | reports/data_card.md, reports/data_quality_report.md |
| Generalization checks | K-fold, sub-sample, and temporal sliding validation in reports/cross_validation_summary.csv |
| User research chain | Media exposure -> user attention -> social interaction -> sponsor conversion |
| Explainable ROI modeling | SHAP-style feature contributions and grouped driver reports |
| Risk-aware decisions | Conformal intervals, bootstrap intervals, Monte Carlo risk, scenario ranking |
| Dynamic ROI and sentiment impact | Future cycle ROI forecast plus key event sentiment ROI deltas |
| Resource allocation | Budget/media mix optimization and sensitivity analysis |
| Extreme scenario planning | Key-player injury, sentiment crisis, policy change, and viral upside stress tests |
| Graph intelligence | NetworkX centrality plus reproducible GCN/GraphSAGE-style and graph-attention contribution scores |
| Commercial decision score | Media value, fan conversion, social spread, brand influence, and ROI combined |
| Productized workflow | Dashboard pages: Discover -> Explain -> Predict -> Simulate -> Recommend |
- How much do match strength, player availability, and tournament stage change sponsor ROI?
- Do media narratives and fan behavior improve ROI analysis beyond match results?
- Which sponsor features create the strongest ROI lift under uncertainty?
- How stable are ROI predictions under cross-validation and subsample checks?
- Can graph centrality reveal sponsor-team-player influence patterns?
- How can a dashboard convert model output into a business recommendation?
| Data category | Examples | Trust level | Boundary |
|---|---|---|---|
| Real historical data | International match records, World Cup history | Medium-high | Public historical sports facts |
| Real text data | GDELT/Wikimedia style article metadata and text windows | Medium | Real-source text, lightweight NLP features |
| Proxy/mock commercial data | Sponsor spend, ad exposure, activation quality, conversion proxy | Medium-low | Reproducible demo data, not audited revenue |
| Derived model outputs | Predicted ROI, risk score, scenario lift | Model-dependent | Decision support only |
Detailed documentation:
- reports/data_card.md
- reports/data_quality_report.md
- docs/data_card.md
Deep analysis landing artifacts:
- reports/deep_analysis_landing_report.md
- reports/deep_analysis_landing_report.pdf
- reports/future_roi_forecast.csv
- reports/sentiment_event_roi_impact.csv
- reports/resource_optimization_recommendations.csv
- reports/extreme_scenario_roi_risk.csv
- data/commercial_decision_metrics.csv
- assets/figures/deep_analysis_figure_notes.md
Validation is generated by src/model_validation.py and saved to reports/cross_validation_summary.csv. It covers random k-fold validation, sub-sample sensitivity checks, and tournament-era temporal sliding validation.
| Validation | Task | Model | Metric | Folds | Mean | Std | Min | Max |
|---|---|---|---|---|---|---|---|---|
| kfold | match_outcome | CentroidOutcomeModel | accuracy | 5 | 0.5436 | 0.0389 | 0.5026 | 0.6010 |
| kfold | sponsor_roi | RidgeROIModel | r2 | 5 | 0.8836 | 0.0126 | 0.8680 | 0.9026 |
| subsample_70pct | match_outcome | CentroidOutcomeModel | accuracy | 1 | 0.5552 | 0.0000 | 0.5552 | 0.5552 |
| subsample_70pct | sponsor_roi | RidgeROIModel | r2 | 1 | 0.8813 | 0.0000 | 0.8813 | 0.8813 |
| temporal_train_to_2014_test_2018 | match_outcome | CentroidOutcomeModel | accuracy | 1 | 0.6094 | 0.0000 | 0.6094 | 0.6094 |
| temporal_train_to_2018_test_2022 | sponsor_roi | RidgeROIModel | r2 | 1 | 0.8885 | 0.0000 | 0.8885 | 0.8885 |
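The two split styles in the table can be sketched as follows: shuffled k-fold index splits, and a temporal split that trains on tournaments up to a cutoff year and tests on a later one. This is an illustrative sketch, not the contents of src/model_validation.py; the `year` column name and the use of population standard deviation are assumptions. Population standard deviation of a single score is 0, which would be consistent with the 0.0000 entries for single-fold checks.

```python
import random
import statistics


def kfold_indices(n_rows, n_folds=5, seed=0):
    """Yield (train, test) index lists for shuffled k-fold validation."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, folds[k]


def temporal_split(rows, train_until, test_year, year_key="year"):
    """Train on rows up to `train_until`, test on rows from `test_year`."""
    train = [r for r in rows if r[year_key] <= train_until]
    test = [r for r in rows if r[year_key] == test_year]
    return train, test


def summarize(scores):
    """Aggregate fold scores into the columns used in the summary table."""
    return {
        "folds": len(scores),
        "mean": statistics.fmean(scores),
        "std": statistics.pstdev(scores),
        "min": min(scores),
        "max": max(scores),
    }


# Toy rows, invented for illustration only.
matches = [{"year": y, "id": i} for i, y in enumerate([2010, 2014, 2014, 2018, 2018, 2022])]
train, test = temporal_split(matches, train_until=2014, test_year=2018)
print(len(train), "training rows,", len(test), "test rows")
```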
Model governance:
- reports/model_card.md
- reports/match_outcome_model_card.md
- reports/sponsor_roi_model_card.md
Explainability artifacts:
- reports/roi_feature_importance.csv
- reports/roi_driver_explanations.csv
- reports/explainability_report.md
- assets/figures/roi_feature_importance_shap.png
The ROI explanation layer is designed for business review: it connects model output to sponsor spend, brand heat, media exposure, FanScore, stage premium, player influence, and sponsor-team fit.
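For a linear model such as ridge regression, a SHAP-style contribution for one feature reduces to the coefficient times the feature's deviation from its average (exact when features are treated as independent). The sketch below assumes that formulation; the coefficients, means, and feature values are hypothetical, and the project's own driver calculation may differ.

```python
def linear_contributions(coefs, feature_means, row):
    """Per-feature contribution relative to the average row."""
    return {name: coefs[name] * (row[name] - feature_means[name]) for name in coefs}


def explain(intercept, coefs, feature_means, row):
    """Return baseline prediction, row prediction, and drivers ranked by size."""
    contributions = linear_contributions(coefs, feature_means, row)
    baseline = intercept + sum(coefs[n] * feature_means[n] for n in coefs)
    prediction = baseline + sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return baseline, prediction, ranked


# Hypothetical fitted coefficients and feature averages.
coefs = {"brand_heat": 0.30, "media_exposure": 0.20, "sponsor_team_fit": 0.25}
means = {"brand_heat": 0.50, "media_exposure": 0.60, "sponsor_team_fit": 0.40}
sponsor = {"brand_heat": 0.80, "media_exposure": 0.55, "sponsor_team_fit": 0.70}

baseline, prediction, drivers = explain(0.10, coefs, means, sponsor)
print(f"baseline {baseline:.3f} -> prediction {prediction:.3f}")
for name, value in drivers:
    print(f"  {name}: {value:+.3f}")
```

The contributions sum to the gap between the row's prediction and the baseline, which is what makes each recommendation traceable to named drivers.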
| Reliability layer | Output | Current value |
|---|---|---|
| Match conformal prediction | Coverage rate | 0.9021 |
| Match conformal prediction | Average set size | 2.3814 |
| ROI conformal prediction | Coverage rate | 0.8557 |
| ROI conformal prediction | Average interval width | 0.4745 |
| Monte Carlo risk | Average std | 0.1320 |
| Monte Carlo risk | Medium-risk cases | 119 |
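For classification, conformal prediction returns a set of plausible outcomes rather than an interval. A minimal split-conformal sketch, assuming the nonconformity score is one minus the probability of the true class and that outcomes are home, draw, or away (both are assumptions, not confirmed project details):

```python
import math


def calibrate_threshold(cal_proba, cal_labels, alpha=0.1):
    """Conformal quantile of the scores 1 - p(true class)."""
    scores = sorted(1.0 - dist[label] for dist, label in zip(cal_proba, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-based rank
    return scores[rank - 1] if rank <= n else 1.0  # 1.0 keeps every class


def prediction_set(dist, threshold):
    """All classes whose score is within the calibrated threshold."""
    return {label for label, p in dist.items() if 1.0 - p <= threshold}


def summarize_sets(proba, labels, threshold):
    """Empirical coverage and average prediction-set size."""
    sets = [prediction_set(dist, threshold) for dist in proba]
    coverage = sum(label in s for label, s in zip(labels, sets)) / len(labels)
    avg_size = sum(len(s) for s in sets) / len(sets)
    return coverage, avg_size


# Toy calibration data, invented for illustration only.
cal_proba = [
    {"home": 0.55, "draw": 0.25, "away": 0.20},
    {"home": 0.30, "draw": 0.40, "away": 0.30},
    {"home": 0.20, "draw": 0.30, "away": 0.50},
    {"home": 0.45, "draw": 0.35, "away": 0.20},
]
cal_labels = ["home", "away", "away", "draw"]
threshold = calibrate_threshold(cal_proba, cal_labels, alpha=0.25)
print(prediction_set({"home": 0.50, "draw": 0.30, "away": 0.20}, threshold))
```

Coverage and set size trade off against each other: with three classes, an average set size near 2.4 means most sets contain two or three outcomes, so high coverage is being bought with wide sets.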
Risk artifacts:
- data/roi_uncertainty.csv
- reports/conformal_prediction_report.md
- reports/uncertainty_summary.md
- assets/figures/monte_carlo_risk_distribution.png
- assets/figures/prediction_interval_conformal.png
WorldCupROI supports conservative, balanced, and aggressive sponsor strategies. Each scenario includes ROI, lift, risk score, confidence interval, recommendation reason, and rank.
| Strategy | Intended use |
|---|---|
| Conservative | Reduce downside when uncertainty is high |
| Balanced | Default planning mode for stable sponsor activation |
| Aggressive | Capture high-attention stages when upside justifies risk |
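To make the ranking logic concrete, here is a hedged sketch that computes lift against a baseline, penalizes risk, and ranks eligible strategies first. The `Scenario` class, the penalty weight, and the risk ceiling are hypothetical choices for this example; the project's ranking rules are documented in reports/scenario_ranking.md.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    roi: float   # predicted ROI under this strategy
    risk: float  # risk score, higher means more uncertain


def rank_scenarios(baseline_roi, scenarios, risk_penalty=0.5, max_risk=0.25):
    """Rank strategies by risk-penalized lift, eligible ones first."""
    rows = []
    for s in scenarios:
        lift = s.roi - baseline_roi
        rows.append({
            "name": s.name,
            "lift": lift,
            "score": lift - risk_penalty * s.risk,
            # A strategy qualifies only with positive lift and tolerable risk.
            "eligible": lift > 0 and s.risk <= max_risk,
        })
    rows.sort(key=lambda r: (not r["eligible"], -r["score"]))
    for rank, row in enumerate(rows, start=1):
        row["rank"] = rank
    return rows


# Toy numbers, invented for illustration only.
ranked = rank_scenarios(0.40, [
    Scenario("Conservative", roi=0.42, risk=0.05),
    Scenario("Balanced", roi=0.50, risk=0.12),
    Scenario("Aggressive", roi=0.60, risk=0.30),
])
for row in ranked:
    print(row["rank"], row["name"], f"lift={row['lift']:+.2f}", "eligible" if row["eligible"] else "blocked")
```

In this toy run the aggressive strategy has the largest lift but exceeds the risk ceiling, so it ranks last.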
Generated artifacts:
- data/scenario_recommendations.csv
- reports/scenario_ranking.md
- reports/scenario_strategy_summary.csv
- assets/figures/scenario_roi_lift.png
The graph layer upgrades a flat sponsor table into a heterogeneous team-player-sponsor-match network.
| Graph output | File |
|---|---|
| Node centrality | reports/graph_node_centrality.csv |
| Sponsor influence | reports/sponsor_influence_scores.csv |
| Player influence | reports/player_commercial_influence.csv |
| GCN / GraphSAGE baseline | reports/gnn_baseline_node_scores.csv |
| GNN + SHAP bridge | reports/gnn_explainability_bridge.md |
| Graph report | reports/graph_analysis_report.md |
| Network figure | assets/figures/sponsor_team_player_network.png |
The current graph baseline uses centrality features, weighted two-hop propagation, and a GraphSAGE-style neighbor aggregation score. It is not a production neural GNN yet, but it gives reviewers a reproducible bridge from relationship structure to sponsor/player influence and SHAP-style ROI drivers.
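The idea of weighted two-hop propagation with neighbor aggregation can be sketched without any neural network. The example below uses plain dictionaries rather than NetworkX so it runs without dependencies; the edge weights, node names, and the 50/50 blend between a node's own feature and its neighbors' weighted mean are assumptions for illustration.

```python
def build_graph(edges):
    """Undirected weighted adjacency map from (node_a, node_b, weight) triples."""
    graph = {}
    for a, b, w in edges:
        graph.setdefault(a, {})[b] = w
        graph.setdefault(b, {})[a] = w
    return graph


def weighted_degree(graph):
    """Sum of incident edge weights per node, a simple centrality proxy."""
    return {node: sum(nbrs.values()) for node, nbrs in graph.items()}


def aggregate(graph, features, self_weight=0.5):
    """One aggregation step: blend own feature with weighted neighbor mean."""
    out = {}
    for node, nbrs in graph.items():
        total_w = sum(nbrs.values())
        nbr_mean = (
            sum(features[n] * w for n, w in nbrs.items()) / total_w if total_w else 0.0
        )
        out[node] = self_weight * features[node] + (1 - self_weight) * nbr_mean
    return out


def two_hop_scores(graph, features, self_weight=0.5):
    """Apply the aggregation twice so signal travels two edges."""
    return aggregate(graph, aggregate(graph, features, self_weight), self_weight)


# Toy graph: a star player's influence reaches sponsors via the team.
graph = build_graph([
    ("sponsor_a", "team_x", 2.0),
    ("sponsor_b", "team_x", 1.0),
    ("team_x", "player_1", 1.0),
])
seed = {"sponsor_a": 0.0, "sponsor_b": 0.0, "team_x": 0.0, "player_1": 1.0}
print(two_hop_scores(graph, seed))
```

After one step the sponsors still score zero; only the second step carries the player's signal through the team node, which is why two hops are needed for a sponsor-team-player chain.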
What it shows: The full platform architecture from data sources to features, models, risk controls, reports, and dashboard delivery.
Why it matters: Reviewers can understand how data, modeling, uncertainty, graph intelligence, and product outputs connect.
Business takeaway: Sponsors can trace a recommendation back to evidence rather than treating the dashboard as a black box.
What it shows: The modeling pipeline for match prediction, ROI prediction, explanation, conformal intervals, and scenario outputs.
Why it matters: Separating match outcome modeling from sponsor ROI modeling keeps the business target clear.
Business takeaway: Match probability becomes one commercial input, not the final product.
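As an illustration of how a centroid-based outcome model can yield probabilities that feed the ROI stage, the sketch below fits one mean vector per class and converts distances into probabilities with a softmax. This is an assumed formulation; the project's CentroidOutcomeModel may compute its probabilities differently, and the features and `temperature` parameter are hypothetical.

```python
import math


def fit_centroids(X, y):
    """Mean feature vector per class label."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_proba(centroids, row, temperature=1.0):
    """Softmax over negative Euclidean distances to each class centroid."""
    dists = {label: math.dist(row, c) for label, c in centroids.items()}
    nearest = min(dists.values())  # subtract the minimum for numerical stability
    weights = {l: math.exp(-(d - nearest) / temperature) for l, d in dists.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


# Toy features (rating gap, home advantage flag), invented for illustration.
X = [[1.2, 1.0], [0.8, 1.0], [0.1, 0.0], [-0.1, 1.0], [-1.0, 0.0], [-0.7, 0.0]]
y = ["home", "home", "draw", "draw", "away", "away"]
centroids = fit_centroids(X, y)
print(predict_proba(centroids, [0.9, 1.0]))
```

The resulting probability dictionary is the kind of output that could be passed to the ROI model as one input feature among the commercial variables.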
The Streamlit app is structured as five decision pages:
Discover -> Explain -> Predict -> Simulate -> Recommend
| Page | Purpose | Interactive/exportable outputs |
|---|---|---|
| Discover | Select team, sponsor, stage, and year context | KPI export |
| Explain | Inspect ROI, sponsor ranking, and attention map | Chart hover and filtered tables |
| Predict | Review FanScore and prediction intervals | Risk CSV / PDF / Markdown |
| Simulate | Compare weather, venue, and stage effects | Scenario charts |
| Recommend | Compare conservative/balanced/aggressive strategies and inspect sponsor network influence | Scenario and Network CSV / PDF / Markdown |
What it shows: The dashboard pages as a decision workflow, not a loose chart wall.
Why it matters: Each page answers one business question and passes the user to the next step.
Business takeaway: Analysts can move from evidence to recommendation without leaving the platform.
Additional GIF previews cover the static dashboard overview, scenario simulation, risk analysis, and network analysis.
The repository includes generated demo media:
- assets/videos/worldcuproi_demo.mp4
- assets/gifs/platform_hero_overview.gif
- assets/gifs/static_platform_dashboard.gif
- assets/gifs/scenario_simulation.gif
- assets/gifs/risk_uncertainty.gif
- assets/gifs/network_graph.gif
The main README GIF is captured from the static HTML dashboard, so the GitHub preview works without a server while still reflecting the five-page, World Cup-styled decision workflow.
Windows:

    git clone https://github.com/2417467487-hub/WorldCupROI.git
    cd WorldCupROI
    python -m venv .venv
    .venv\Scripts\activate
    pip install -r requirements.txt
    python src/pipeline.py --demo
    streamlit run dashboard/app.py -- --demo

macOS / Linux:

    git clone https://github.com/2417467487-hub/WorldCupROI.git
    cd WorldCupROI
    python3 -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt
    python src/pipeline.py --demo
    streamlit run dashboard/app.py -- --demo

Make targets:

    make pipeline
    make dashboard
    make assets
    make demo

Docker:

    docker build -t worldcuproi .
    docker run --rm -p 8501:8501 worldcuproi

CI/CD and deployment files:
- .github/workflows/ci.yml
- .github/workflows/streamlit-cloud.yml
- docs/deployment.md
The Streamlit Cloud workflow runs the demo pipeline, smoke-tests the Streamlit app, and optionally calls STREAMLIT_DEPLOY_HOOK_URL when configured.
| Contribution type | What this project contributes |
|---|---|
| Academic | Data card, model card, k-fold/sub-sample/temporal validation, uncertainty quantification, GNN baseline |
| Engineering | Reproducible pipeline, Makefile, Dockerfile, CI/CD, dashboard, generated assets |
| Business | Sponsor ROI ranking, strategy templates, risk-aware recommendations, user research funnel |
| Phase | Product goal |
|---|---|
| v1 Portfolio platform | Stable demo mode, generated reports, dashboard, README showcase |
| v2 Data upgrade | Replace proxy commercial data with licensed sponsor CRM, broadcast, social, and sales data |
| v3 Model upgrade | Calibrated boosted models, uplift modeling, drift monitoring, stronger temporal feature stores |
| v4 Graph AI | Production GraphSAGE/GCN training with licensed conversion labels, temporal graph influence, sponsor portfolio optimization |
| v5 Deployment | Hosted Streamlit Cloud demo, GitHub Pages static site, automated release artifacts |
This repository is a portfolio and research demonstration project. Commercial sponsor variables are documented as proxy/mock where audited campaign data is unavailable.
Repository: github.com/2417467487-hub/WorldCupROI


















