# Georgia Tech MSA Spring 2026 Practicum

#### Team 4 - Alexander Avramov, Noah Boonin, Thomas LaRock
(Track 2) Soccer Analytics Dashboard Exploration

## EDA Executive Summary

### Winning Without an xG Advantage: Positional Structure in xG-Parity Matches

The central research question is:

**In matches where teams generate similar expected goals, do winning teams exhibit systematically
different positional and possession structures than non-winning teams?**

### Key Findings
- Among 1,302 xG-parity matches (|ΔxG| ≤ 0.3), winning teams allocate **2.67% more** 
  touches to advanced attacking positions than non-winning teams
- Non-winning teams retain **1.61% more** touches in defensive/holding positions
- Progressive carry distance is higher for attacking players on winning teams,
  and lower for defenders, mirroring the touch share pattern
- Pass location variance increases in wins for attacking players, suggesting 
  more varied, unpredictable movement
- **Deliverable:** Dashboard to explore positional archetypes in 
  xG-parity matches

## 1. Project Motivation

Expected goals (xG) is widely used to evaluate team performance and explain match outcomes.
However, a substantial number of matches feature teams with nearly identical xG totals. In these cases, chance quality alone does not fully explain why one team wins while the other does not.

This project focuses on **xG-parity matches**, defined as matches in which the absolute difference
in total team xG does not exceed a small threshold (0.3 xG). By conditioning on this subset of matches, we
control for overall chance quality and investigate whether **positional and possession-based
structures** help explain differences in outcomes beyond xG alone.

Among the many factors which could influence outcomes in these matches, we focus on positional and
possession structure as a stable, event-level signal which reflects tactical intent over the full
duration of a match.


## 2. Data and EDA Scope

Our exploratory analysis uses <a href="https://github.com/statsbomb/open-data" target="_blank">StatsBomb</a> open event data, a collection of football data provided in a structured JSON format. The data is organized into two primary systems linked by the unique `match_id` field, which together support analysis of both match context and detailed player activity.

The StatsBomb data contains 3,464 matches in total, of which 1,302 meet the xG-parity threshold of 0.3. These 1,302 matches are distributed across 18 distinct competitions spanning 25 different seasons. Our analysis requires data aggregated to the team–match level. Match-level expected goals are constructed by summing shot xG values, and outcomes are assigned from official match results.
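The aggregation and filtering step can be sketched as follows. This is a minimal illustration, not the notebook's actual code: it assumes shots have already been flattened into dictionaries with hypothetical keys `match_id`, `team`, and `xg` (StatsBomb stores the value under `shot.statsbomb_xg` in its event JSON), and the function names are ours.

```python
from collections import defaultdict

PARITY_THRESHOLD = 0.3  # |ΔxG| cutoff stated in the text


def team_match_xg(shots):
    """Sum shot xG per (match_id, team)."""
    totals = defaultdict(float)
    for shot in shots:
        totals[(shot["match_id"], shot["team"])] += shot["xg"]
    return dict(totals)


def xg_parity_matches(shots, threshold=PARITY_THRESHOLD):
    """Return {match_id: |ΔxG|} for matches within the parity threshold."""
    by_match = defaultdict(dict)
    for (match_id, team), xg in team_match_xg(shots).items():
        by_match[match_id][team] = xg

    parity = {}
    for match_id, teams in by_match.items():
        # Assumption: both teams appear in the shot list. A team with no
        # shots would need to be seeded with 0.0 from the match table.
        if len(teams) != 2:
            continue
        xg_a, xg_b = teams.values()
        diff = abs(xg_a - xg_b)
        if diff <= threshold:
            parity[match_id] = diff
    return parity
```

Summing before filtering keeps the parity test at the match level, so a single high-value chance does not exclude a match unless it moves the team total outside the threshold.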

To study structure rather than volume, possession is decomposed into **positional touch shares**.
Touch events are aggregated by standardized positional groups and normalized by total team touches
within each match. This framework allows comparison of how teams allocate possession across
positions, independent of overall possession share.
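A minimal sketch of the touch share calculation is shown below. It assumes each touch event has already been mapped to a standardized positional group; the keys `match_id`, `team`, and `position_group` and the function name are hypothetical and may not match the notebook.

```python
from collections import Counter, defaultdict


def positional_touch_shares(touches):
    """Return {(match_id, team): {position_group: share}}.

    Shares within each team-match sum to 1, so teams are comparable
    regardless of how much of the ball they had overall.
    """
    counts = defaultdict(Counter)
    for touch in touches:
        key = (touch["match_id"], touch["team"])
        counts[key][touch["position_group"]] += 1

    shares = {}
    for key, by_group in counts.items():
        total = sum(by_group.values())
        shares[key] = {group: n / total for group, n in by_group.items()}
    return shares
```

Because the denominator is the team's own touch total in that match, a team with 35% possession and a team with 65% possession can still show identical positional profiles.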

All exploratory work supporting this analysis is contained in the accompanying technical EDA
notebook.


## 3. Exploratory Findings

Restricting analysis to xG-parity matches reveals consistent structural differences between winning
and non-winning teams, despite comparable chance quality.

Winning teams allocate a greater share of possession to **advanced and connective attacking roles**,
particularly attacking midfielders, wide forwards, and strikers. In contrast, non-winning teams
retain a larger share of touches in **defensive midfield and backline positions**, including center
backs and fullbacks.

![image.png](attachment:image.png)

These differences are modest in magnitude but consistent in direction across the dataset. The
pattern suggests winning teams in tightly contested matches tend to shift possession forward
and engage attacking structures more frequently, while non-winning teams circulate possession more
conservatively in deeper areas.

Pass location distribution was another point of interest in our EDA. The heatmaps below show pass locations for the highest-volume passer in each position bin, comparing wins (red) to non-wins (blue) across xG-parity matches. Attacking players show the most dramatic shifts: their passing locations spread across the attacking third in wins but collapse into tighter, more conservative clusters in non-wins.

![image.png](attachment:image.png)
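One simple way to quantify the spread visible in the heatmaps is the total variance of pass locations, Var(x) + Var(y). The sketch below is an assumption about how such a measure could be computed, not the notebook's definition; it takes pass origins as `(x, y)` tuples, and the function name is hypothetical.

```python
from statistics import pvariance


def pass_location_spread(passes):
    """Total positional variance, Var(x) + Var(y), of pass origins.

    `passes` is a non-empty list of (x, y) tuples. Larger values mean
    passes are played from a wider range of locations.
    """
    xs = [p[0] for p in passes]
    ys = [p[1] for p in passes]
    return pvariance(xs) + pvariance(ys)
```

Comparing this value for the same position group in wins and non-wins gives a single number to set alongside the visual comparison.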


Progressive carry distance (distance gained toward the opponent's goal) shows a pattern very similar to the possession share results above.

The chart below shows the difference in average progressive carry distance (yards gained toward goal) between winning and non-winning teams across xG-parity matches. Strikers carry the ball nearly 0.4 yards further per carry in wins, and Attacking Midfielders and Wide Forwards also show strong positive differences. Center Backs and Defensive Midfielders, by contrast, carry less progressively in wins, reinforcing the same forward-shift pattern seen in the touch share data.

![image.png](attachment:image.png)
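Progressive carry distance can be sketched as the reduction in straight-line distance to the opponent's goal between the start and end of a carry. The version below assumes StatsBomb's 120 x 80 pitch with the attacked goal centred at (120, 40), and it floors backward carries at zero; the flooring, the key names, and the function names are our assumptions and may differ from the notebook.

```python
from math import hypot

# Centre of the attacked goal on a 120 x 80 StatsBomb pitch.
GOAL_X, GOAL_Y = 120.0, 40.0


def progressive_distance(start, end):
    """Distance gained toward goal; 0 for carries that move away from it."""
    d_start = hypot(GOAL_X - start[0], GOAL_Y - start[1])
    d_end = hypot(GOAL_X - end[0], GOAL_Y - end[1])
    return max(0.0, d_start - d_end)  # assumption: backward carries count as 0


def mean_progressive_distance(carries):
    """Average progressive distance per carry for each position group."""
    totals, counts = {}, {}
    for carry in carries:
        group = carry["position_group"]
        gained = progressive_distance(carry["start"], carry["end"])
        totals[group] = totals.get(group, 0.0) + gained
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}
```

Running `mean_progressive_distance` separately on carries from winning and non-winning team-matches and subtracting the results per group yields the kind of difference plotted in the chart.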

## 4. Why These Findings Motivate the Final Project

The observed structural differences indicate that match outcomes in xG-parity contexts are neither
purely random nor fully explained by chance quality. Instead, **how possession is distributed
across positions** appears to be meaningfully associated with winning outcomes.

These findings motivate a deeper investigation into whether teams can be grouped into distinct
possession and positional archetypes within xG-parity matches. Identifying such “winning shapes”
would provide a structural complement to xG-based evaluation and offer a richer understanding of
tactical performance in closely contested games.


## 5. Next Steps and Planned Deliverable

Building on this exploratory analysis, the next phase of the project will extend positional touch
share analysis to capture richer notions of possession and control within xG-parity matches.
While touch share provides a baseline measure of how possession is distributed across positions,
additional proxies will be developed to better reflect progression and influence.

These extensions may include progression-weighted touch shares, zone-adjusted possession measures,
and composite control scores which combine positional usage with forward movement and ball security.
Such metrics aim to distinguish between sterile circulation and possession structures which actively
support attacking outcomes.

Using these representations, team–match observations will be clustered to identify recurring
possession and positional archetypes. The final deliverable will be an interactive soccer analytics
dashboard which allows users to explore these archetypes, compare winning and non-winning
structures, and examine representative match examples across leagues and seasons.
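The clustering step has not been specified yet. As a rough illustration only, the sketch below applies plain k-means (Lloyd's algorithm) to team–match feature vectors such as positional touch shares. The choice of k-means, the deterministic seeding, and the function name are all assumptions; the final project may use a different method or a library implementation.

```python
from math import dist


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm over a list of equal-length numeric tuples.

    Centroids are seeded with the first k points so results are
    reproducible (an illustrative choice, not a recommended default).
    Returns (labels, centroids).
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels, centroids
```

Each resulting centroid is itself a vector of positional shares, so it can be read directly as a candidate archetype and displayed in the dashboard.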

Additional exploratory analyses supporting these findings, including data integrity checks, competition coverage analysis, and the full xG-parity filtering methodology, are documented in <a href="\EDA.ipynb">EDA.ipynb</a>.