# Executive Exploratory Data Analysis

## Georgia Tech MSA Spring 2026 Practicum

# EDA Executive Summary

## Winning Without an xG Advantage: Positional Structure in xG-Parity Matches

## 1. Project Motivation and Research Question

Expected goals (xG) is widely used to evaluate team performance and explain match outcomes.
However, a substantial number of matches feature teams with nearly identical xG totals yet
different results. In these cases, chance quality alone does not fully explain why one team wins
while the other does not.

This project focuses on **xG-parity matches**, defined as matches in which the absolute difference
between the two teams' total xG does not exceed a small threshold. By conditioning on this subset of
matches, we
control for overall chance quality and investigate whether **positional and possession-based
structures** help explain differences in outcomes beyond xG alone.
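
The parity condition can be sketched in a few lines of pandas. This is a minimal illustration, not the project's code: the column names (`home_xg`, `away_xg`), the toy values, and the threshold of 0.25 are all placeholders, since the actual threshold is not stated here.

```python
import pandas as pd

# Toy match-level table; column names and values are illustrative only.
matches = pd.DataFrame({
    "match_id": [1, 2, 3],
    "home_xg": [1.20, 2.10, 0.85],
    "away_xg": [1.05, 0.60, 0.95],
})


def flag_xg_parity(df: pd.DataFrame, threshold: float) -> pd.DataFrame:
    """Add abs_xg_diff and is_xg_parity (|home_xg - away_xg| <= threshold)."""
    out = df.copy()
    out["abs_xg_diff"] = (out["home_xg"] - out["away_xg"]).abs()
    out["is_xg_parity"] = out["abs_xg_diff"] <= threshold
    return out


# 0.25 is a placeholder threshold, not the value used in the project.
parity = flag_xg_parity(matches, threshold=0.25)
parity_matches = parity[parity["is_xg_parity"]]
```

Keeping the threshold as a function argument makes it straightforward to check how sensitive the parity subset is to that choice.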

Among the many factors that could influence outcomes in these matches, we focus on positional and
possession structure as a stable, event-level signal that reflects tactical intent over the full
duration of a match.

The central research question is:

**In matches where teams generate similar expected goals, do winning teams exhibit systematically
different positional and possession structures than non-winning teams (those that draw or lose)?**


## 2. Data and EDA Scope

Our exploratory analysis uses StatsBomb open event data, aggregated to the team–match level.
Match-level expected goals are constructed by summing shot xG values, and outcomes are assigned
from official match results.
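
As a rough sketch of the xG aggregation step, the snippet below sums shot xG per team and match. It assumes a flattened event table with columns named `match_id`, `team`, `type`, and `shot_statsbomb_xg`; those names and the toy rows are assumptions for illustration, and the notebook's actual schema may differ.

```python
import pandas as pd

# Toy event rows; column names mimic a flattened event table (assumed).
events = pd.DataFrame({
    "match_id": [10, 10, 10, 10, 10],
    "team": ["A", "A", "B", "B", "A"],
    "type": ["Shot", "Pass", "Shot", "Shot", "Shot"],
    "shot_statsbomb_xg": [0.30, None, 0.10, 0.45, 0.20],
})

# Keep shot events only, then sum xG within each team-match.
shots = events[events["type"] == "Shot"]
team_xg = (
    shots.groupby(["match_id", "team"], as_index=False)["shot_statsbomb_xg"]
    .sum()
    .rename(columns={"shot_statsbomb_xg": "team_xg"})
)
```

A team that takes no shots in a match would be absent from `team_xg` and would need to be filled in with an xG of 0 before computing xG differences.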

To study structure rather than volume, possession is decomposed into **positional touch shares**.
Touch events are aggregated by standardized positional groups and normalized by total team touches
within each match. This framework allows comparison of how teams allocate possession across
positions, independent of overall possession share.
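
The normalization described above can be sketched as follows. The position-to-group mapping, the column names, and the toy rows are hypothetical; which event types count as a touch is also left as an assumption, so each row here simply stands for one touch.

```python
import pandas as pd

# Hypothetical mapping from event-level position labels to coarser groups.
POSITION_GROUPS = {
    "Center Back": "center_back",
    "Left Back": "fullback",
    "Right Back": "fullback",
    "Center Defensive Midfield": "defensive_mid",
    "Center Attacking Midfield": "attacking_mid",
    "Left Wing": "wide_forward",
    "Right Wing": "wide_forward",
    "Center Forward": "striker",
}

# Toy data: one row per touch event.
touches = pd.DataFrame({
    "match_id": [10] * 8,
    "team": ["A"] * 5 + ["B"] * 3,
    "position": [
        "Center Back", "Center Back", "Left Wing", "Center Forward", "Right Back",
        "Center Defensive Midfield", "Center Back", "Center Attacking Midfield",
    ],
})

# Positions missing from the mapping become NaN and are dropped by groupby,
# so the mapping should cover every label that appears in the data.
touches["position_group"] = touches["position"].map(POSITION_GROUPS)

counts = (
    touches.groupby(["match_id", "team", "position_group"])
    .size()
    .rename("touches")
    .reset_index()
)

# Normalize by total team touches within each match.
team_totals = counts.groupby(["match_id", "team"])["touches"].transform("sum")
counts["touch_share"] = counts["touches"] / team_totals

# One row per team-match, one column per positional group.
shares = counts.pivot_table(
    index=["match_id", "team"],
    columns="position_group",
    values="touch_share",
    fill_value=0.0,
)
```

Because each row of `shares` sums to 1, two teams can be compared on how they distribute touches even when one of them has far more of the ball.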

All exploratory work supporting this analysis is contained in the accompanying technical EDA
notebook.


## 3. Key Exploratory Findings

Restricting analysis to xG-parity matches reveals consistent structural differences between winning
and non-winning teams, despite comparable chance quality.

Winning teams allocate a greater share of possession to **advanced and connective attacking roles**,
particularly attacking midfielders, wide forwards, and strikers. In contrast, non-winning teams
retain a larger share of touches in **defensive midfield and backline positions**, including center
backs and fullbacks.

These differences are modest in magnitude but consistent in direction across the dataset. The
pattern suggests that winning teams in tightly contested matches tend to shift possession forward
and engage attacking structures more frequently, while non-winning teams circulate possession more
conservatively in deeper areas.
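
A comparison of this kind can be sketched as a difference in group means. The table below uses made-up numbers chosen only to exercise the code; they are not results from the project, and the column names are assumptions.

```python
import pandas as pd

# Synthetic team-match rows (NOT project data): touch shares by positional group.
team_match = pd.DataFrame({
    "outcome_group": ["win", "win", "non_win", "non_win"],
    "attacking_mid": [0.14, 0.12, 0.10, 0.11],
    "center_back": [0.20, 0.22, 0.25, 0.24],
})

# Mean touch share per positional group, split by outcome.
group_means = team_match.groupby("outcome_group").mean()

# Positive values: winners use that group more; negative: non-winners do.
share_gap = group_means.loc["win"] - group_means.loc["non_win"]
```

Reporting the gap per positional group keeps both the direction and the size of each difference visible, which matters when the differences are small.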


![image.png](attachment:image.png)

## 4. Why These Findings Motivate the Final Project

The observed structural differences suggest that match outcomes in xG-parity contexts are neither
purely random nor fully explained by chance quality. Instead, **how possession is distributed
across positions** appears to be associated with winning outcomes in this exploratory analysis.

These findings motivate a deeper investigation into whether teams can be grouped into distinct
possession and positional archetypes within xG-parity matches. Identifying such “winning shapes”
would provide a structural complement to xG-based evaluation and offer a richer understanding of
tactical performance in closely contested games.


## 5. Next Steps and Planned Deliverable

Building on this exploratory analysis, the next phase of the project will extend positional touch
share analysis to capture richer notions of possession and control within xG-parity matches.
While touch share provides a baseline measure of how possession is distributed across positions,
additional proxies will be developed to better reflect progression and influence.

These extensions may include progression-weighted touch shares, zone-adjusted possession measures,
and composite control scores that combine positional usage with forward movement and ball security.
Such metrics aim to distinguish between sterile circulation and possession structures that actively
support attacking outcomes.

Using these representations, team–match observations will be clustered to identify recurring
possession and positional archetypes. The final deliverable will be an interactive soccer analytics
dashboard (Track 2) that allows users to explore these archetypes, compare winning and non-winning
structures, and examine representative match examples across leagues and seasons.
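
To make the clustering step concrete, here is a minimal sketch using plain k-means (Lloyd's algorithm) written with NumPy. The choice of k-means, the number of clusters, the feature columns, and the toy values are all assumptions for illustration; the clustering method for the project is not fixed in this summary, and a library implementation such as scikit-learn's `KMeans` would be the more usual choice in practice.

```python
import numpy as np


def kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct rows of X.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Synthetic team-match rows (NOT project data). Columns, in order:
# center_back, defensive_mid, attacking_mid, striker touch shares.
X = np.array([
    [0.30, 0.25, 0.10, 0.08],
    [0.31, 0.24, 0.11, 0.07],
    [0.29, 0.26, 0.09, 0.09],
    [0.18, 0.15, 0.20, 0.16],
    [0.17, 0.16, 0.21, 0.15],
    [0.19, 0.14, 0.19, 0.17],
])

labels, centroids = kmeans(X, k=2)
```

Each centroid is itself a touch-share profile, so it can be read directly as a candidate archetype and displayed alongside the matches assigned to it.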

Additional exploratory analyses supporting these findings are summarized in the technical EDA notebook.