WorldCupROI

AI Sports Sponsorship Intelligence Platform

WorldCupROI blends match performance, media attention, fan behavior, sponsor investment, scenario simulation, and uncertainty risk into one sponsor ROI decision platform. The goal is not only to predict football results, but to help answer: which sponsorship strategy should a brand choose, under what risk, and why?

Link	Target
Live Demo	`make dashboard`
Static Dashboard	dashboard/panel_dashboard.html
Executive Summary	reports/executive_summary.pdf
Business Report	reports/business_insights.md
Data Card	reports/data_card.md
Model Card	reports/model_card.md
Deployment Guide	docs/deployment.md

Platform Hero Overview

Core result snapshot

Area	Current value
Platform health score	100 / 100
Match accuracy	0.5566
Match log loss	0.9780
Sponsor ROI MAE	0.1177
Sponsor ROI R2	0.8838
Match conformal coverage	0.9021
ROI interval coverage	0.8814
Average Monte Carlo std	0.1320

10-Second Overview

WorldCupROI is a reproducible sports sponsorship analytics project with four layers:

Layer	What it does	Business value
Data intelligence	Separates real historical data, real text data, and proxy/mock commercial data	Makes data boundaries visible before decisions
ML modeling	Trains match outcome and sponsor ROI models with validation outputs	Converts sports and attention signals into measurable ROI forecasts
Risk and explainability	Adds SHAP-style drivers, conformal intervals, Monte Carlo risk, and scenario lift	Turns point estimates into defensible decisions
Product dashboard	Discover -> Explain -> Predict -> Simulate -> Recommend	Makes the work usable by analysts and business reviewers

Results Showcase

Model Performance Comparison

What it shows: Compares trained baseline and benchmark models on primary evaluation metrics.

Why it matters: It shows whether the current model choice is a stable baseline or only a placeholder.

Business takeaway: Use the benchmark spread to decide which model family deserves production tuning first.

Task	Model	Metric	Value
Match outcome	Centroid classifier	Accuracy	0.5566
Match outcome	Centroid classifier	Log loss	0.9780
Sponsor ROI	Ridge regression	MAE	0.1177
Sponsor ROI	Ridge regression	R2	0.8838
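The four metrics in this table have standard definitions. The sketch below is a dependency-free illustration of how they are computed, not the project's evaluation code. The function names and the 0/1/2 outcome encoding are assumptions.

```python
import math
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of matches whose predicted outcome class equals the actual one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def log_loss(y_true: Sequence[int], proba: Sequence[Sequence[float]], eps: float = 1e-15) -> float:
    """Mean negative log-probability assigned to the actual outcome class."""
    total = 0.0
    for label, row in zip(y_true, proba):
        p = min(max(row[label], eps), 1 - eps)  # clip to avoid log(0)
        total -= math.log(p)
    return total / len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute ROI error, in the same units as ROI."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """1 - SS_res / SS_tot: share of ROI variance explained by the model."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy example: 0 = home win, 1 = draw, 2 = away win (assumed encoding).
outcomes = [0, 2, 1, 0]
predicted = [0, 2, 0, 0]
print(accuracy(outcomes, predicted))  # 0.75
```

Log loss rewards calibrated probabilities, which is why it is reported next to accuracy for a three-way outcome.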

ROI Feature Importance / SHAP

What it shows: Ranks the strongest sponsor ROI drivers using SHAP-style feature contribution scores.

Why it matters: Explainability keeps ROI recommendations auditable and helps detect over-reliance on proxy labels.

Business takeaway: Improve brand heat, sponsor-team fit, media exposure, and activation quality before scaling spend.
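Because the sponsor ROI model in the comparison table is a ridge regression, SHAP-style contributions can be sketched exactly for a linear model. Assuming independent features, the SHAP value of feature j is w_j * (x_j - mean_j) over a background sample. The feature names, weights, and intercept below are hypothetical, not the project's fitted values.

```python
from typing import Dict, List

Row = Dict[str, float]


def linear_shap(weights: Row, background: List[Row], row: Row) -> Row:
    """Exact SHAP values for a linear model under feature independence:
    phi_j = w_j * (x_j - mean_j), with means taken over the background sample."""
    means = {f: sum(b[f] for b in background) / len(background) for f in weights}
    return {f: w * (row[f] - means[f]) for f, w in weights.items()}


def predict(weights: Row, intercept: float, row: Row) -> float:
    """Linear ROI prediction used to check that contributions add up."""
    return intercept + sum(w * row[f] for f, w in weights.items())


# Hypothetical standardized drivers and ridge weights (illustrative only).
weights = {"brand_heat": 0.30, "sponsor_team_fit": 0.20, "media_exposure": 0.15}
background = [
    {"brand_heat": 0.0, "sponsor_team_fit": 0.0, "media_exposure": 0.0},
    {"brand_heat": 1.0, "sponsor_team_fit": 0.5, "media_exposure": 2.0},
]
row = {"brand_heat": 1.5, "sponsor_team_fit": 0.25, "media_exposure": 1.0}
contrib = linear_shap(weights, background, row)
ranked = sorted(contrib.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

The contributions sum to the gap between this sponsor's prediction and the average background prediction, which is the property that makes per-driver explanations auditable.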

Sponsor ROI Ranking

What it shows: Ranks sponsors by predicted commercial ROI and network influence evidence.

Why it matters: A sponsor can look attractive because expected ROI is high or because relationship influence is broad.

Business takeaway: Prioritize sponsors that combine high ROI with strong team-player-network leverage.

Scenario ROI Lift

What it shows: Shows conservative, balanced, and aggressive strategy lift against the baseline.

Why it matters: Scenario analysis turns the model from prediction into a decision simulator.

Business takeaway: Select aggressive strategies only when lift is positive and risk remains tolerable.

Prediction Interval / Conformal Prediction

What it shows: Displays ROI point estimates with conformal-style prediction intervals.

Why it matters: Prediction intervals show forecast reliability, not just expected value.

Business takeaway: Prefer narrow-interval opportunities when sponsor budgets are constrained.
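A minimal sketch of standard split conformal regression, one common way to build such intervals. The project's "conformal-style" procedure may differ, and the function name is hypothetical. Absolute residuals on a held-out calibration set set the interval half-width.

```python
import math
from typing import List, Tuple


def split_conformal_interval(cal_true: List[float], cal_pred: List[float],
                             new_pred: float, alpha: float = 0.1) -> Tuple[float, float]:
    """Widen a point ROI forecast by the ceil((n + 1) * (1 - alpha))-th
    smallest absolute calibration residual (target coverage 1 - alpha)."""
    residuals = sorted(abs(t - p) for t, p in zip(cal_true, cal_pred))
    n = len(residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:  # too few calibration points for this alpha: interval is unbounded
        return (-math.inf, math.inf)
    q = residuals[k - 1]
    return (new_pred - q, new_pred + q)
```

Interval width equals 2q for every sponsor in this basic form, so "narrow-interval opportunities" only become sponsor-specific when the residuals are scaled, for example by a per-sponsor volatility estimate.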

Monte Carlo Risk Distribution

What it shows: Shows the distribution of Monte Carlo ROI standard deviation and risk scores.

Why it matters: The spread of risk is often more important than average ROI for sponsorship planning.

Business takeaway: Use high-risk tails as triggers for staged spend, insurance clauses, or additional analyst review.
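One way to produce a per-sponsor Monte Carlo ROI standard deviation is to simulate ROI under random shocks and summarize the spread. The sketch below assumes independent Gaussian exposure and conversion multipliers. The noise model, function names, and risk-tier thresholds are hypothetical, not taken from the project.

```python
import random
import statistics
from typing import Dict


def monte_carlo_roi(base_roi: float, exposure_sd: float, conversion_sd: float,
                    n_draws: int = 5000, seed: int = 42) -> Dict[str, float]:
    """Simulate ROI under multiplicative exposure and conversion shocks
    and summarize the spread (assumed noise model, illustration only)."""
    rng = random.Random(seed)
    draws = sorted(
        base_roi * rng.gauss(1.0, exposure_sd) * rng.gauss(1.0, conversion_sd)
        for _ in range(n_draws)
    )
    return {
        "mean": statistics.fmean(draws),
        "std": statistics.stdev(draws),
        "p05": draws[int(0.05 * n_draws)],  # lower tail used for downside review
    }


def risk_flag(std: float, medium: float = 0.10, high: float = 0.20) -> str:
    """Bucket the Monte Carlo std into risk tiers (thresholds are hypothetical)."""
    if std >= high:
        return "high"
    return "medium" if std >= medium else "low"
```

The 5th-percentile draw is a simple tail trigger: a sponsor whose p05 falls below break-even is a candidate for staged spend.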

Sponsor-Team-Player Network

What it shows: Visualizes sponsor, team, and player relationships as a weighted commercial graph.

Why it matters: Graph position captures activation leverage that flat tables miss.

Business takeaway: Use central sponsors and teams as anchor partnerships for campaign portfolios.

Future Event ROI Trend Forecast

What it shows: Shows future sponsor ROI forecasts across the 2026, 2030, and 2034 World Cup cycles.

Why it matters: It makes time dependence visible instead of treating every tournament as the same planning context.

Business takeaway: Use the trend as a budget planning prior, then review uncertainty before committing long-cycle spend.
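The forecasting method behind the cycle trend is not described here. As a minimal illustration of a trend prior, the sketch fits an ordinary least-squares line to past-cycle ROI and extrapolates to 2026, 2030, and 2034. The linear form and the function name are assumptions.

```python
from typing import Dict, List


def linear_trend_forecast(years: List[int], roi: List[float],
                          future_years: List[int]) -> Dict[int, float]:
    """Fit ROI = a + b * year by ordinary least squares on past cycles
    and extrapolate to future World Cup years."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(roi) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, roi))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return {year: intercept + slope * year for year in future_years}
```

A straight line carries no uncertainty of its own, which is why the takeaway above pairs the trend with an interval review before long-cycle commitments.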

Sentiment Event Impact on ROI

What it shows: Compares ROI deltas for positive sentiment spikes, stage attention shocks, and baseline attention events.

Why it matters: Sentiment can change conversion quality even when media exposure is high.

Business takeaway: Prepare contingency messaging and spend limits around high-attention negative events.
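An ROI delta per event type can be read as the average ROI change relative to a baseline. The sketch below groups events by type. The event labels and the single-baseline comparison are hypothetical simplifications.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def mean_roi_delta(events: List[Tuple[str, float]], baseline_roi: float) -> Dict[str, float]:
    """Average ROI change per event type relative to a baseline ROI."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for event_type, roi in events:
        sums[event_type] += roi - baseline_roi
        counts[event_type] += 1
    return {etype: sums[etype] / counts[etype] for etype in sums}
```

A negative mean delta for a high-attention event type is the signal that the contingency messaging and spend limits above are meant to cover.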

Budget and Media Sensitivity

What it shows: Maps risk-adjusted ROI under different sponsor budgets and media multiplier combinations.

Why it matters: Resource optimization converts model output into a concrete allocation recommendation.

Business takeaway: Scale spend where the sensitivity surface is high and stable, not only where raw ROI is high.
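A sensitivity surface can be sketched as a grid of budget and media-multiplier cells, each scored by risk-adjusted ROI (mean minus a risk-aversion factor times the standard deviation). The response function `toy_roi` is entirely hypothetical: it assumes diminishing returns in spend and volatility that grows with media push.

```python
import math
from itertools import product
from typing import Callable, Dict, List, Tuple

Cell = Tuple[float, float]


def sensitivity_grid(budgets: List[float], multipliers: List[float],
                     roi_fn: Callable[[float, float], Tuple[float, float]],
                     risk_aversion: float = 1.0) -> Dict[Cell, float]:
    """Risk-adjusted ROI (mean - risk_aversion * std) for every budget x media cell."""
    grid = {}
    for budget, mult in product(budgets, multipliers):
        mean, std = roi_fn(budget, mult)
        grid[(budget, mult)] = mean - risk_aversion * std
    return grid


def toy_roi(budget: float, mult: float) -> Tuple[float, float]:
    """Hypothetical response: diminishing returns in spend, risk grows with media push."""
    mean = mult * math.log1p(budget) / (1 + 0.1 * budget)
    std = 0.05 * budget * mult
    return mean, std


budgets, mults = [1.0, 2.0, 4.0, 8.0], [0.8, 1.0, 1.2]
best_risk_adjusted = max((g := sensitivity_grid(budgets, mults, toy_roi)), key=g.get)
raw = sensitivity_grid(budgets, mults, toy_roi, risk_aversion=0.0)
best_raw = max(raw, key=raw.get)
```

With these toy numbers the risk-adjusted optimum sits at a smaller budget than the raw-ROI optimum, which is the point of the takeaway above.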

Graph Attention ROI Contribution

What it shows: Ranks sponsor nodes by graph attention-style contribution to ROI.

Why it matters: It explains relationship leverage beyond flat sponsor ranking or tabular SHAP alone.

Business takeaway: Use high-contribution sponsors as anchor nodes in portfolio planning.
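The exact attention mechanism is not described here. As a stand-in, the sketch uses scaled dot-product attention over a sponsor's graph neighbors: similarity scores are softmax-normalized and used to weight each neighbor's ROI signal. Node embeddings, names, and ROI signals are hypothetical.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_contribution(sponsor_vec: List[float],
                           neighbors: Dict[str, List[float]],
                           neighbor_roi: Dict[str, float]) -> Dict[str, float]:
    """Scaled dot-product attention over a sponsor's neighbors; returns each
    neighbor's attention-weighted share of the ROI signal."""
    names = list(neighbors)
    dim = len(sponsor_vec)
    scores = [sum(a * b for a, b in zip(sponsor_vec, neighbors[n])) / math.sqrt(dim)
              for n in names]
    weights = softmax(scores)
    return {n: w * neighbor_roi[n] for n, w in zip(names, weights)}
```

Summing a sponsor's per-neighbor contributions gives one node-level score that can be ranked alongside the flat sponsor ranking.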

Extreme Scenario ROI and Risk Intervals

What it shows: Stress-tests key player injury, sentiment crisis, sponsor policy change, and positive viral upside scenarios.

Why it matters: Extreme cases reveal downside intervals that average ROI hides.

Business takeaway: Pre-approve response playbooks before the tournament starts.
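A stress test can be sketched by applying a shock to expected ROI and to ROI volatility for each named scenario, then reporting a normal-approximation interval. The shock sizes in `SCENARIOS` are hypothetical placeholders, not values from reports/extreme_scenario_roi_risk.csv.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Shock:
    roi_multiplier: float  # shift applied to the expected ROI
    std_multiplier: float  # widening applied to ROI volatility


# Hypothetical shock sizes for the four stress tests named above.
SCENARIOS: Dict[str, Shock] = {
    "key_player_injury": Shock(0.85, 1.3),
    "sentiment_crisis": Shock(0.70, 1.6),
    "sponsor_policy_change": Shock(0.90, 1.2),
    "viral_upside": Shock(1.25, 1.4),
}


def stress_test(base_roi: float, base_std: float,
                z: float = 1.645) -> Dict[str, Tuple[float, float, float]]:
    """Return (lower, expected, upper) ROI per scenario using a normal
    approximation; z = 1.645 gives a two-sided ~90% interval."""
    results = {}
    for name, shock in SCENARIOS.items():
        mean = base_roi * shock.roi_multiplier
        std = base_std * shock.std_multiplier
        results[name] = (mean - z * std, mean, mean + z * std)
    return results
```

The lower bound, not the expected value, is what a pre-approved response playbook should be sized against.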

Integrated Commercial Decision Score

What it shows: Combines ROI, media exposure value, fan conversion, social spread, and brand influence.

Why it matters: Sponsor decisions are multi-objective; ROI alone is too narrow for portfolio planning.

Business takeaway: Prioritize high composite score opportunities, then review interval width and graph influence.
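A composite score of this kind is often built by min-max normalizing each metric across candidates and taking a weighted sum. The sketch below follows that approach. The metric keys and weights are hypothetical, and the project may weight or normalize differently.

```python
from typing import Dict

METRICS = ["roi", "media_value", "fan_conversion", "social_spread", "brand_influence"]


def composite_scores(rows: Dict[str, Dict[str, float]],
                     weights: Dict[str, float]) -> Dict[str, float]:
    """Min-max normalize each metric across candidates, then take a weighted sum."""
    scores = {name: 0.0 for name in rows}
    for metric in METRICS:
        values = [r[metric] for r in rows.values()]
        lo, hi = min(values), max(values)
        span = hi - lo
        for name, r in rows.items():
            norm = (r[metric] - lo) / span if span else 0.0  # constant metric adds nothing
            scores[name] += weights[metric] * norm
    return scores
```

Normalizing first keeps a large-scale metric such as media value from dominating ROI simply because of its units.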

Sponsor and Player Influence Network

What it shows: Visualizes sponsor and player influence pathways from the heterogeneous commercial graph.

Why it matters: Player and sponsor influence can amplify or weaken projected ROI under the same match context.

Business takeaway: Pair high-influence sponsors with resilient player/team nodes before selecting activation themes.

Problem

Most football analytics projects stop at predicting who wins. Sponsorship decisions need more: media exposure, fan attention, brand fit, player availability, commercial momentum, downside risk, and an explanation a non-technical stakeholder can trust.

WorldCupROI frames the World Cup as an attention market where sponsor ROI depends on both match context and business activation.

Why It Matters

Tournament sponsorship budgets are committed before all outcomes are known. A high-profile campaign can underperform if the model ignores uncertainty, audience behavior, or sponsor-team fit.

Audience	Value
Sports business analysts	Compare ROI, risk, sponsor fit, and scenario lift
ML reviewers	Inspect model cards, validation, feature importance, and leakage risks
Researchers	Study links between match performance, text signals, user attention, and ROI
Product reviewers	Open a dashboard and reproduce the analysis end to end

Key Innovations

Innovation	Implementation
Data boundary documentation	`reports/data_card.md`, `reports/data_quality_report.md`
Generalization checks	K-fold, sub-sample, and temporal sliding validation in `reports/cross_validation_summary.csv`
User research chain	Media exposure -> user attention -> social interaction -> sponsor conversion
Explainable ROI modeling	SHAP-style feature contributions and grouped driver reports
Risk-aware decisions	Conformal intervals, bootstrap intervals, Monte Carlo risk, scenario ranking
Dynamic ROI and sentiment impact	Future cycle ROI forecast plus key event sentiment ROI deltas
Resource allocation	Budget/media mix optimization and sensitivity analysis
Extreme scenario planning	Key-player injury, sentiment crisis, policy change, and viral upside stress tests
Graph intelligence	NetworkX centrality plus reproducible GCN/GraphSAGE-style and graph-attention contribution scores
Commercial decision score	Media value, fan conversion, social spread, brand influence, and ROI combined
Productized workflow	Dashboard pages: Discover -> Explain -> Predict -> Simulate -> Recommend
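The user research chain reads as a four-stage funnel. A minimal sketch of step and end-to-end conversion rates follows. The stage keys are hypothetical, and per the data card the underlying counts would be proxy data.

```python
from typing import Dict, List, Tuple

FUNNEL = ["media_exposure", "user_attention", "social_interaction", "sponsor_conversion"]


def funnel_rates(counts: Dict[str, float]) -> List[Tuple[str, float]]:
    """Step conversion rate between consecutive funnel stages."""
    return [(f"{a}->{b}", counts[b] / counts[a]) for a, b in zip(FUNNEL, FUNNEL[1:])]


def end_to_end_rate(counts: Dict[str, float]) -> float:
    """Share of exposed users who reach sponsor conversion."""
    return counts[FUNNEL[-1]] / counts[FUNNEL[0]]
```

Step rates show where a campaign loses its audience, while the end-to-end rate equals their product.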

Research Questions

How much do match strength, player availability, and tournament stage change sponsor ROI?
Do media narratives and fan behavior improve ROI analysis beyond match results?
Which sponsor features create the strongest ROI lift under uncertainty?
How stable are ROI predictions under cross-validation and subsample checks?
Can graph centrality reveal sponsor-team-player influence patterns?
How can a dashboard convert model output into a business recommendation?

Dataset & Data Sources

Data category	Examples	Trust level	Boundary
Real historical data	International match records, World Cup history	Medium-high	Public historical sports facts
Real text data	GDELT/Wikimedia style article metadata and text windows	Medium	Real-source text, lightweight NLP features
Proxy/mock commercial data	Sponsor spend, ad exposure, activation quality, conversion proxy	Medium-low	Reproducible demo data, not audited revenue
Derived model outputs	Predicted ROI, risk score, scenario lift	Model-dependent	Decision support only

Detailed documentation:

reports/data_card.md
reports/data_quality_report.md
docs/data_card.md

Deep analysis landing artifacts:

reports/deep_analysis_landing_report.md
reports/deep_analysis_landing_report.pdf
reports/future_roi_forecast.csv
reports/sentiment_event_roi_impact.csv
reports/resource_optimization_recommendations.csv
reports/extreme_scenario_roi_risk.csv
data/commercial_decision_metrics.csv
assets/figures/deep_analysis_figure_notes.md

Model Performance

Validation is generated by src/model_validation.py and saved to reports/cross_validation_summary.csv. It covers random 5-fold validation, a 70% sub-sample sensitivity check, and tournament-era temporal sliding validation.

Validation	Task	Model	Metric	Folds	Mean	Std	Min	Max
kfold	match_outcome	CentroidOutcomeModel	accuracy	5	0.5436	0.0389	0.5026	0.6010
kfold	sponsor_roi	RidgeROIModel	r2	5	0.8836	0.0126	0.8680	0.9026
subsample_70pct	match_outcome	CentroidOutcomeModel	accuracy	1	0.5552	0.0000	0.5552	0.5552
subsample_70pct	sponsor_roi	RidgeROIModel	r2	1	0.8813	0.0000	0.8813	0.8813
temporal_train_to_2014_test_2018	match_outcome	CentroidOutcomeModel	accuracy	1	0.6094	0.0000	0.6094	0.6094
temporal_train_to_2018_test_2022	sponsor_roi	RidgeROIModel	r2	1	0.8885	0.0000	0.8885	0.8885
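The three validation schemes differ only in how rows are split. The sketch below is dependency-free and not the code in src/model_validation.py; all function names are hypothetical. It reads "temporal sliding" as walk-forward validation, where the training window grows one tournament cycle at a time.

```python
import random
from typing import Iterator, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]


def kfold_splits(n: int, k: int = 5, seed: int = 0) -> Iterator[Split]:
    """Shuffled k-fold: every row is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def subsample_split(n: int, frac: float = 0.7, seed: int = 0) -> Split:
    """One random train/holdout split keeping `frac` of rows for training."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(n * frac)
    return idx[:cut], idx[cut:]


def temporal_split(years: Sequence[int], train_until: int, test_year: int) -> Split:
    """Train on tournaments up to `train_until`, test on `test_year` only."""
    train = [i for i, y in enumerate(years) if y <= train_until]
    test = [i for i, y in enumerate(years) if y == test_year]
    return train, test


def sliding_temporal_splits(years: Sequence[int], cycles: List[int]) -> Iterator[Split]:
    """Walk-forward: train through one cycle, test on the next, then advance."""
    for train_until, test_year in zip(cycles, cycles[1:]):
        yield temporal_split(years, train_until, test_year)
```

Temporal splits never let a model see a later tournament during training, which is why they are a stricter leakage check than shuffled k-fold.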

Model governance:

reports/model_card.md
reports/match_outcome_model_card.md
reports/sponsor_roi_model_card.md

Explainability & SHAP

Explainability artifacts:

reports/roi_feature_importance.csv
reports/roi_driver_explanations.csv
reports/explainability_report.md
assets/figures/roi_feature_importance_shap.png

The ROI explanation layer is designed for business review: it connects model output to sponsor spend, brand heat, media exposure, FanScore, stage premium, player influence, and sponsor-team fit.

Uncertainty & Conformal Prediction

Reliability layer	Output	Current value
Match conformal prediction	Coverage rate	0.9021
Match conformal prediction	Average set size	2.3814
ROI conformal prediction	Coverage rate	0.8557
ROI conformal prediction	Average interval width	0.4745
Monte Carlo risk	Average std	0.1320
Monte Carlo risk	Medium-risk cases	119
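The match-side coverage rate and average set size come from conformal prediction sets. A minimal split-conformal sketch for a three-way outcome follows. It uses 1 - p(true class) as the nonconformity score, a common choice that is assumed here rather than confirmed. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple


def conformal_threshold(cal_proba: Sequence[Sequence[float]], cal_labels: Sequence[int],
                        alpha: float = 0.1) -> float:
    """Nonconformity score = 1 - p(true class); return the finite-sample quantile."""
    scores = sorted(1 - p[y] for p, y in zip(cal_proba, cal_labels))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return math.inf if k > n else scores[k - 1]


def prediction_set(proba: Sequence[float], qhat: float) -> Set[int]:
    """All outcome classes whose score does not exceed the threshold."""
    return {c for c, p in enumerate(proba) if 1 - p <= qhat}


def coverage_and_size(sets: List[Set[int]], labels: Sequence[int]) -> Tuple[float, float]:
    """Share of sets containing the true outcome, and mean set size."""
    covered = sum(y in s for s, y in zip(sets, labels)) / len(labels)
    avg_size = sum(len(s) for s in sets) / len(sets)
    return covered, avg_size
```

With three possible outcomes, an average set size well above 2 means many matches remain genuinely uncertain even at the target coverage.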

Risk artifacts:

data/roi_uncertainty.csv
reports/conformal_prediction_report.md
reports/uncertainty_summary.md
assets/figures/monte_carlo_risk_distribution.png
assets/figures/prediction_interval_conformal.png

Scenario Simulation

WorldCupROI supports conservative, balanced, and aggressive sponsor strategies. Each scenario includes ROI, lift, risk score, confidence interval, recommendation reason, and rank.

Strategy	Intended use
Conservative	Reduce downside when uncertainty is high
Balanced	Default planning mode for stable sponsor activation
Aggressive	Capture high-attention stages when upside justifies risk
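The ranking logic can be sketched as follows. Each strategy gets a lift against the baseline and a ~90% normal interval. A strategy is admissible only when lift is positive and risk stays within tolerance, and admissible strategies are ranked by ROI minus risk. The admissibility rule, risk-adjusted key, and class name are assumptions, not the project's documented scoring.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Scenario:
    name: str
    roi: float
    std: float


def rank_scenarios(scenarios: List[Scenario], baseline_roi: float,
                   max_std: float, z: float = 1.645) -> List[dict]:
    """Compute lift vs. baseline and a ~90% normal interval, mark scenarios
    with non-positive lift or excess risk as inadmissible, and rank."""
    rows = []
    for s in scenarios:
        lift = s.roi - baseline_roi
        rows.append({
            "name": s.name, "roi": s.roi, "lift": lift, "risk": s.std,
            "interval": (s.roi - z * s.std, s.roi + z * s.std),
            "admissible": lift > 0 and s.std <= max_std,
        })
    # Admissible scenarios first, then by risk-adjusted ROI (roi - risk).
    rows.sort(key=lambda r: (r["admissible"], r["roi"] - r["risk"]), reverse=True)
    for rank, row in enumerate(rows, start=1):
        row["rank"] = rank
    return rows
```

Under a tight risk tolerance the aggressive strategy drops to the bottom even when its raw ROI is highest.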

Generated artifacts:

data/scenario_recommendations.csv
reports/scenario_ranking.md
reports/scenario_strategy_summary.csv
assets/figures/scenario_roi_lift.png

Graph Intelligence

The graph layer upgrades a flat sponsor table into a heterogeneous team-player-sponsor-match network.

Graph output	File
Node centrality	`reports/graph_node_centrality.csv`
Sponsor influence	`reports/sponsor_influence_scores.csv`
Player influence	`reports/player_commercial_influence.csv`
GCN / GraphSAGE baseline	`reports/gnn_baseline_node_scores.csv`
GNN + SHAP bridge	`reports/gnn_explainability_bridge.md`
Graph report	`reports/graph_analysis_report.md`
Network figure	`assets/figures/sponsor_team_player_network.png`

The current graph baseline uses centrality features, weighted two-hop propagation, and a GraphSAGE-style neighbor aggregation score. It is not a production neural GNN yet, but it gives reviewers a reproducible bridge from relationship structure to sponsor/player influence and SHAP-style ROI drivers.
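The three graph signals named above can be sketched without NetworkX: weighted-degree centrality, weighted two-hop propagation of a seed score, and one GraphSAGE-style mean-aggregation step. This is an illustration, not the project's implementation. The node names, `decay`, and `self_weight` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]
Adj = Dict[str, Dict[str, float]]


def build_adjacency(edges: List[Edge]) -> Adj:
    """Undirected weighted adjacency map."""
    adj: Adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w
    return adj


def weighted_degree(adj: Adj) -> Dict[str, float]:
    """Strength centrality: sum of incident edge weights."""
    return {n: sum(nbrs.values()) for n, nbrs in adj.items()}


def two_hop_influence(adj: Adj, seed: Dict[str, float], decay: float = 0.5) -> Dict[str, float]:
    """Own score + weighted 1-hop neighbor scores + decayed weighted 2-hop scores."""
    out = {}
    for node, nbrs in adj.items():
        one_hop = sum(w * seed.get(n, 0.0) for n, w in nbrs.items())
        two_hop = sum(w1 * w2 * seed.get(m, 0.0)
                      for n, w1 in nbrs.items()
                      for m, w2 in adj[n].items() if m != node)
        out[node] = seed.get(node, 0.0) + one_hop + decay * two_hop
    return out


def sage_mean_aggregate(adj: Adj, features: Dict[str, float],
                        self_weight: float = 0.5) -> Dict[str, float]:
    """GraphSAGE-style step: mix a node's feature with the mean of its neighbors'."""
    return {
        node: self_weight * features[node]
        + (1 - self_weight) * sum(features[n] for n in nbrs) / len(nbrs)
        for node, nbrs in adj.items()
    }


adj = build_adjacency([("sponsor_x", "team_a", 1.0), ("team_a", "player_1", 0.5),
                       ("sponsor_y", "team_a", 0.2)])
influence = two_hop_influence(adj, {"player_1": 1.0})
```

A sponsor strongly tied to a team inherits more of a star player's influence through the two-hop path, which a flat sponsor table cannot express.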

Architecture

What it shows: The full platform architecture from data sources to features, models, risk controls, reports, and dashboard delivery.

Why it matters: Reviewers can understand how data, modeling, uncertainty, graph intelligence, and product outputs connect.

Business takeaway: Sponsors can trace a recommendation back to evidence rather than treating the dashboard as a black box.

What it shows: The modeling pipeline for match prediction, ROI prediction, explanation, conformal intervals, and scenario outputs.

Why it matters: Separating match outcome modeling from sponsor ROI modeling keeps the business target clear.

Business takeaway: Match probability becomes one commercial input, not the final product.

Dashboard Gallery

The Streamlit app is structured as five decision pages:

Discover -> Explain -> Predict -> Simulate -> Recommend

Page	Purpose	Interactive/exportable outputs
Discover	Select team, sponsor, stage, and year context	KPI export
Explain	Inspect ROI, sponsor ranking, and attention map	Chart hover and filtered tables
Predict	Review FanScore and prediction intervals	Risk CSV / PDF / Markdown
Simulate	Compare weather, venue, and stage effects	Scenario charts
Recommend	Compare conservative/balanced/aggressive strategies and inspect sponsor network influence	Scenario and Network CSV / PDF / Markdown

What it shows: The dashboard pages as a decision workflow, not a loose chart wall.

Why it matters: Each page answers one business question and passes the user to the next step.

Business takeaway: Analysts can move from evidence to recommendation without leaving the platform.

Additional GIF previews:

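Preview	GIF
Static dashboard overview	assets/gifs/static_platform_dashboard.gif
Scenario simulation	assets/gifs/scenario_simulation.gif
Risk analysis	assets/gifs/risk_uncertainty.gif
Network analysis	assets/gifs/network_graph.gif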

Demo Video

The repository includes generated demo media:

assets/videos/worldcuproi_demo.mp4
assets/gifs/platform_hero_overview.gif
assets/gifs/static_platform_dashboard.gif
assets/gifs/scenario_simulation.gif
assets/gifs/risk_uncertainty.gif
assets/gifs/network_graph.gif

The main README GIF is captured from the polished static HTML dashboard, so the GitHub preview works without a server while still reflecting the five-page World Cup styled decision workflow.

Installation & Reproducibility

Windows PowerShell

git clone https://github.com/2417467487-hub/WorldCupROI.git
cd WorldCupROI
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
python src/pipeline.py --demo
streamlit run dashboard/app.py -- --demo

macOS

git clone https://github.com/2417467487-hub/WorldCupROI.git
cd WorldCupROI
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python src/pipeline.py --demo
streamlit run dashboard/app.py -- --demo

Linux

git clone https://github.com/2417467487-hub/WorldCupROI.git
cd WorldCupROI
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python src/pipeline.py --demo
streamlit run dashboard/app.py -- --demo

Make Shortcuts

make pipeline
make dashboard
make assets
make demo

Docker

docker build -t worldcuproi .
docker run --rm -p 8501:8501 worldcuproi

CI/CD

.github/workflows/ci.yml
.github/workflows/streamlit-cloud.yml
docs/deployment.md

The Streamlit Cloud workflow runs the demo pipeline, smoke-tests the Streamlit app, and optionally calls STREAMLIT_DEPLOY_HOOK_URL when configured.

Contributions

Contribution type	What this project contributes
Academic	Data card, model card, k-fold/sub-sample/temporal validation, uncertainty quantification, GNN-style graph baseline
Engineering	Reproducible pipeline, Makefile, Dockerfile, CI/CD, dashboard, generated assets
Business	Sponsor ROI ranking, strategy templates, risk-aware recommendations, user research funnel

Roadmap

Phase	Product goal
v1 Portfolio platform	Stable demo mode, generated reports, dashboard, README showcase
v2 Data upgrade	Replace proxy commercial data with licensed sponsor CRM, broadcast, social, and sales data
v3 Model upgrade	Calibrated boosted models, uplift modeling, drift monitoring, stronger temporal feature stores
v4 Graph AI	Production GraphSAGE/GCN training with licensed conversion labels, temporal graph influence, sponsor portfolio optimization
v5 Deployment	Hosted Streamlit Cloud demo, GitHub Pages static site, automated release artifacts

License / Contact

This repository is a portfolio and research demonstration project. Commercial sponsor variables are documented as proxy/mock where audited campaign data is unavailable.

Repository: github.com/2417467487-hub/WorldCupROI
