---
## ⚽ Interactive xG Dashboard: Explore Team Performance Like Never Before  

In this notebook, we introduce a **Dash web app** designed to provide an interactive experience for exploring xG-based insights.  
Rather than relying on static tables and charts, this app enables **real-time filtering, comparisons, and deeper analysis** of team performances.  

### 🚀 Features of the Dash App:
🔹 **Select and compare teams** across multiple seasons.  
🔹 **Visualize xG metrics** in an interactive way.  
🔹 **Examine key statistics** such as xG for, xG against, goal-to-xG ratios, and more.  
🔹 **Identify trends** in finishing efficiency and over/underperformance.  
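Once each match is seen from one team's side, the finishing-efficiency metrics above reduce to simple arithmetic. Below is a minimal sketch on made-up numbers. It assumes hypothetical `goals` and `xg` columns already expressed from the selected team's perspective; the real dataset stores home and away values in separate columns.

```python
import pandas as pd

# Hypothetical per-match rows from one team's perspective (made-up numbers)
matches = pd.DataFrame({
    "goals": [2, 0, 3, 1],
    "xg": [1.2, 0.8, 2.1, 1.4],
})

total_goals = matches["goals"].sum()
total_xg = matches["xg"].sum()

# Over/underperformance: goals scored minus expected goals (positive = overperforming)
goals_minus_xg = total_goals - total_xg
# Goal-to-xG ratio: above 1.0 suggests finishing above the xG model's expectation
goal_to_xg_ratio = total_goals / total_xg

print(f"Goals - xG: {goals_minus_xg:+.2f}")
print(f"Goals / xG: {goal_to_xg_ratio:.2f}")
```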

### 🔧 How to Use This Notebook:
📌 This notebook contains the **full source code** for the Dash app.  
📌 Follow the instructions to **run the app locally** and interact with the data.  
📌 If you prefer, you can view screenshots of the interface before running it.

---

🔽 **Next, we walk through the code behind the app and how to launch it.**

## 📊 Expanding the Dataset: Adding More Context  

While our previous analysis focused on **xG-related insights**, understanding a team’s performance requires **broader context**.  
To enhance our team-by-team xG analysis, we incorporated an **additional dataset** containing:  

- ⏰ **Match time** (kick-off time, later grouped into afternoon and evening kick-offs)  
- 🎯 **Total shots** taken  
- 📊 **Formation** used  
- 🔄 **Possession %**  

This extra data allows for a **deeper tactical breakdown**, helping us explore how these factors influence xG performance.
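The notebook's code splits kick-off times at 17:00: `hour < 17` counts as afternoon, and everything from 17:00 onward counts as evening. Below is a minimal sketch of that rule on a few hypothetical times. As an added assumption, it leaves a missing time unclassified instead of letting it fall into either bucket.

```python
import numpy as np
import pandas as pd

# Hypothetical kick-off times; None stands in for a match missing from the extra dataset
times = pd.Series(["12:30", "15:00", "16:30", "17:30", "20:00", None])

# Parse "HH:MM" strings; errors="coerce" turns missing/unparseable values into NaT
hours = pd.to_datetime(times, format="%H:%M", errors="coerce").dt.hour

# Before 17:00 -> Afternoon, 17:00 onward -> Evening, missing -> NaN (unclassified)
periods = pd.Series(np.where(hours < 17, "Afternoon", "Evening"), index=times.index)
periods.loc[hours.isna()] = np.nan

print(pd.DataFrame({"Time": times, "Match Period": periods}))
```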

### 📥 Loading & Merging the Expanded Dataset  

Now, we'll import the new dataset and merge it with our existing xG data to integrate key match-level insights.  
This ensures we can analyze xG trends alongside formation choices, shot volume, possession, and game timing.
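The merge below attaches each team's stats twice: once keyed on `Date` + `Home`, and once on `Date` + `Away`. Below is a minimal sketch of that two-pass left merge on toy data. The column names mirror the notebook's, but the values are made up. It also adds `indicator=True`, a standard `DataFrame.merge` option, as an assumed extra check that flags matches with no row in the extra dataset.

```python
import pandas as pd

# Toy match-level data (one row per match)
matches = pd.DataFrame({
    "Date": ["2023-08-11", "2023-08-12"],
    "Home": ["Burnley", "Arsenal"],
    "Away": ["Manchester City", "Nott'ham Forest"],
})

# Toy team-level data (one row per team per match); Nott'ham Forest's row is missing on purpose
extra = pd.DataFrame({
    "Date": ["2023-08-11", "2023-08-11", "2023-08-12"],
    "Team": ["Burnley", "Manchester City", "Arsenal"],
    "Poss": [35.0, 65.0, 78.0],
})

# Pass 1: attach the home side's stats
home = matches.merge(extra, left_on=["Date", "Home"], right_on=["Date", "Team"], how="left")
home = home.rename(columns={"Poss": "Home Possession"}).drop(columns=["Team"])

# Pass 2: attach the away side's stats; indicator=True adds a '_merge' column
merged = home.merge(extra, left_on=["Date", "Away"], right_on=["Date", "Team"],
                    how="left", indicator=True)
merged = merged.rename(columns={"Poss": "Away Possession"}).drop(columns=["Team"])

# Rows marked 'left_only' found no away-team row in the extra dataset
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched[["Date", "Home", "Away"]])
```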


```python
from Team_Name_Formatting import standardize_team_name
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import warnings

# Suppress SettingWithCopyWarning
warnings.simplefilter(action='ignore', category=pd.errors.SettingWithCopyWarning)

# Load the CSV files
extra_stats = pd.read_csv(r'C:\Users\d_par\OneDrive\Desktop\Danny\2025\Data Science\Portolio Projects\prem-xg-analysis\Data\matches.csv')
match_stats = pd.read_csv(r'C:\Users\d_par\OneDrive\Desktop\Danny\2025\Data Science\Portolio Projects\prem-xg-analysis\Data\cleaned_prem_data.csv')

# Standardize team names
extra_stats['Team'] = extra_stats['Team'].apply(standardize_team_name)
match_stats['Home'] = match_stats['Home'].apply(standardize_team_name)
match_stats['Away'] = match_stats['Away'].apply(standardize_team_name)

# Fix Man Utd naming
match_stats.loc[match_stats['Home'] == 'manchester utd', 'Home'] = 'Manchester Utd'
match_stats.loc[match_stats['Away'] == 'manchester utd', 'Away'] = 'Manchester Utd'

# Drop 'Time' from match_stats (the kick-off time is taken from extra_stats instead)
match_stats = match_stats.drop(columns=['Time'])

# 🔹 Ensure 'Formation' is stored as a string to prevent Excel formatting issues
extra_stats['Formation'] = extra_stats['Formation'].astype(str)

# Merge for Home Stats (Possession, Formation, Shots, Time)
home_merge = match_stats.merge(
    extra_stats[['Date', 'Team', 'Poss', 'Formation', 'Sh', 'Time']],
    left_on=['Date', 'Home'],
    right_on=['Date', 'Team'],
    how='left'
).rename(columns={
    'Poss': 'Home Possession',
    'Formation': 'Home Formation',
    'Sh': 'Home Shots'
}).drop(columns=['Team'], errors='ignore')

# Merge for Away Stats (Possession, Formation, Shots)
final_merge = home_merge.merge(
    extra_stats[['Date', 'Team', 'Poss', 'Formation', 'Sh']],
    left_on=['Date', 'Away'],
    right_on=['Date', 'Team'],
    how='left'
).rename(columns={
    'Poss': 'Away Possession',
    'Formation': 'Away Formation',
    'Sh': 'Away Shots'
}).drop(columns=['Team'], errors='ignore')

# Drop any additional unexpected duplicate columns (suffixes left by overlapping names)
duplicate_cols = [col for col in final_merge.columns if col.endswith(('_x', '_y', '.1'))]
final_merge = final_merge.drop(columns=duplicate_cols, errors='ignore')
```
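```python
# Work on a copy so final_merge stays unchanged
team_data = final_merge.copy()

# Calculate Net xG (home perspective: Home xG minus Away xG)
team_data['Net xG'] = team_data['Home_xg'] - team_data['Away_xg']

# Classify kick-off time: Afternoon (before 17:00) or Evening (17:00 onward).
# errors='coerce' turns missing times into NaT so those rows stay unclassified
# instead of silently falling into 'Evening'.
kickoff_hour = pd.to_datetime(team_data['Time'], format='%H:%M', errors='coerce').dt.hour
team_data['Match Period'] = np.where(kickoff_hour < 17, 'Afternoon', 'Evening')
team_data.loc[kickoff_hour.isna(), 'Match Period'] = np.nan

# Convert Date to month name
team_data['Month'] = pd.to_datetime(team_data['Date']).dt.strftime('%B')

# Save the merged dataset
team_data.to_csv('team_data.csv', index=False)

display(team_data.head())
```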

| Unnamed: 0 | Day | Date | Home | Home_xg | Score | Away_xg | Away | Home Rank | Away Rank | Home_Goals | ... | Home Possession | Home Formation | Home Shots | Time | Away Possession | Away Formation | Away Shots | Net xG | Match Period | Month |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Fri | 2023-08-11 | Burnley | 0.3 | 0–3 | 1.9 | Manchester City | 19.0 | 1.0 | 0 | ... | 35.0 | 5-4-1 | 6.0 | 20:00 | 65.0 | 4-2-3-1 | 17.0 | -1.6 | Evening | August |
| 1 | Sun | 2023-10-08 | Arsenal | 0.4 | 1–0 | 0.5 | Manchester City | 2.0 | 1.0 | 1 | ... | 49.0 | 4-3-3 | 12.0 | 16:30 | 51.0 | 4-3-3 | 4.0 | -0.1 | Afternoon | October |
| 2 | Wed | 2023-12-27 | Everton | 1.0 | 1–3 | 2.4 | Manchester City | 15.0 | 1.0 | 1 | ... | 28.0 | 4-2-3-1 | 7.0 | 20:15 | 72.0 | 4-2-3-1 | 22.0 | -1.4 | Evening | December |
| 3 | Sun | 2023-08-27 | Sheffield Ud | 0.7 | 1–2 | 3.5 | Manchester City | 20.0 | 1.0 | 1 | ... | 21.0 | 3-5-2 | 6.0 | 14:00 | 79.0 | 4-2-3-1 | 29.0 | -2.8 | Afternoon | August |
| 4 | Thu | 2024-04-25 | Brighton | 0.6 | 0–4 | 1.4 | Manchester City | 11.0 | 1.0 | 0 | ... | 36.0 | 4-2-3-1 | 7.0 | 20:00 | 64.0 | 4-1-4-1 | 14.0 | -0.8 | Evening | April |


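## 🛠️ Building the Analysis Functions

Each function below takes a team name and the merged match DataFrame (`team_data`) and returns one or more Plotly figures for that team.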

### 1️⃣ Possession vs Net xG
- Separate charts for home and away games
- Quadratic (degree-2) regression line to capture non-linearity
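The quadratic trend fits $\widehat{\text{Net xG}} = \beta_0 + \beta_1 p + \beta_2 p^2$, where $p$ is possession %. Unlike a straight line, this form has a turning point at $p^* = -\beta_1 / (2\beta_2)$. That is the possession level where fitted Net xG peaks (if $\beta_2 < 0$) or bottoms out (if $\beta_2 > 0$).

Below is a minimal sketch on synthetic points generated from a known curve. It uses `np.polyfit`, which solves the same least-squares problem as the notebook's `PolynomialFeatures` + `LinearRegression` pair.

```python
import numpy as np

# Synthetic possession values and a known curve: Net xG peaks at 55% possession
possession = np.array([35.0, 40.0, 45.0, 50.0, 55.0, 60.0, 65.0, 70.0])
net_xg = -0.002 * (possession - 55.0) ** 2 + 0.8

# Degree-2 least-squares fit; np.polyfit returns coefficients highest power first
beta2, beta1, beta0 = np.polyfit(possession, net_xg, 2)

# Turning point of the fitted parabola: p* = -beta1 / (2 * beta2)
p_star = -beta1 / (2 * beta2)
kind = "peak" if beta2 < 0 else "trough"
print(f"Fitted {kind} at {p_star:.1f}% possession")
```

With real match data, the turning point only means something if it falls inside the observed possession range.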

```python
# Function to analyze Net xG vs Possession (separate Home & Away charts)
def analyze_xg_vs_possession(team, matches):
    # Separate the team's home and away matches
    home_matches = matches[matches['Home'] == team].copy()
    away_matches = matches[matches['Away'] == team].copy()

    # Express possession and Net xG from the selected team's perspective
    home_matches = home_matches.assign(
        Possession=home_matches['Home Possession'], Opponent=home_matches['Away'], Venue='H',
        Net_xG=home_matches['Home_xg'] - home_matches['Away_xg'])
    away_matches = away_matches.assign(
        Possession=away_matches['Away Possession'], Opponent=away_matches['Home'], Venue='A',
        Net_xG=away_matches['Away_xg'] - away_matches['Home_xg'])

    fig_home = go.Figure()
    fig_away = go.Figure()

    # Scatter the matches and overlay a quadratic (degree-2) regression curve
    def quadratic_fit(data, fig, color, label):
        # Matches without possession data (unmatched in the merge) would make the fit fail on NaN
        data = data.dropna(subset=['Possession', 'Net_xG'])

        fig.add_trace(go.Scatter(
            x=data['Possession'], y=data['Net_xG'], mode='markers',
            marker=dict(color=color), name=label,
            hovertemplate="Opponent: %{text} (%{customdata})",
            text=data['Opponent'], customdata=data['Venue']
        ))

        if len(data) < 3:  # a quadratic needs at least 3 points
            return
        X = data['Possession'].astype(float).values.reshape(-1, 1)
        y = data['Net_xG'].astype(float).values
        poly = PolynomialFeatures(degree=2)
        model = LinearRegression().fit(poly.fit_transform(X), y)
        x_range = np.linspace(X.min(), X.max(), 100)
        y_pred = model.predict(poly.transform(x_range.reshape(-1, 1)))
        fig.add_trace(go.Scatter(x=x_range, y=y_pred, mode='lines', line=dict(color=color), name=f'{label} Trend'))

    quadratic_fit(home_matches, fig_home, 'blue', 'Home')
    quadratic_fit(away_matches, fig_away, 'red', 'Away')

    fig_home.update_layout(title=f"{team} - Home xG vs Possession", xaxis_title="Possession (%)", yaxis_title="Net xG", template="plotly_white")
    fig_away.update_layout(title=f"{team} - Away xG vs Possession", xaxis_title="Possession (%)", yaxis_title="Net xG", template="plotly_white")

    return fig_home, fig_away
```

### 2️⃣ Formation vs xG (Multicolor Bar Chart)
- Green bars: Average xG created
- Red bars: Average xG conceded
- Bars are grouped side by side to compare attacking vs defensive impact
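The per-formation averages and game counts can also be computed in a single pass with named aggregation in `groupby().agg()`. This keeps each count aligned with its means by construction. Below is a minimal sketch on made-up matches. It assumes the `Formation`, `xG_Created` and `xG_Conceded` columns have already been set from the selected team's perspective.

```python
import pandas as pd

# Made-up matches already expressed from one team's perspective
team_matches = pd.DataFrame({
    "Formation": ["4-3-3", "4-3-3", "4-2-3-1", "4-3-3", "4-2-3-1"],
    "xG_Created": [2.0, 1.0, 0.5, 1.5, 1.5],
    "xG_Conceded": [0.5, 1.0, 1.5, 0.3, 0.7],
})

# One groupby: games played plus average xG created/conceded per formation
summary = team_matches.groupby("Formation").agg(
    Games=("xG_Created", "size"),
    xG_Created=("xG_Created", "mean"),
    xG_Conceded=("xG_Conceded", "mean"),
)

# Axis labels in the same "formation (games)" style as the bar chart
labels = [f"{formation} ({games})" for formation, games in summary["Games"].items()]
print(summary)
print(labels)
```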

```python
# Function to analyze xG vs Formation
def analyze_xg_vs_formation(team, matches):
    team_matches = matches[(matches['Home'] == team) | (matches['Away'] == team)].copy()

    # Formation and xG from the selected team's perspective
    team_matches = team_matches.assign(
        Formation=np.where(team_matches['Home'] == team, team_matches['Home Formation'], team_matches['Away Formation']),
        xG_Created=np.where(team_matches['Home'] == team, team_matches['Home_xg'], team_matches['Away_xg']),
        xG_Conceded=np.where(team_matches['Home'] == team, team_matches['Away_xg'], team_matches['Home_xg'])
    )

    formation_counts = team_matches['Formation'].value_counts()
    formation_xg_created = team_matches.groupby("Formation")["xG_Created"].mean()
    formation_xg_conceded = team_matches.groupby("Formation")["xG_Conceded"].mean()

    # Label each formation with the number of games played in it
    formatted_labels = [f"{f} ({formation_counts[f]})" for f in formation_xg_created.index]

    fig = go.Figure()
    fig.add_trace(go.Bar(x=formatted_labels, y=formation_xg_created, name="xG Created", marker_color="green"))
    fig.add_trace(go.Bar(x=formatted_labels, y=formation_xg_conceded, name="xG Conceded", marker_color="red"))

    fig.update_layout(
        barmode='group',
        title=f"{team} - xG by Formation",
        xaxis_title="Formation (Games Played)",
        yaxis_title="Average xG",
        template="plotly_white"
    )

    return fig
```

### 3️⃣ Shots vs xG (Scatterplot)
- Y-axis shows xG created (rather than Net xG)
- Scatterplot with a linear regression line
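In a linear fit of xG on shots, the slope is roughly how much extra xG each additional shot has been worth to the team. Total xG divided by total shots gives a separate figure: average chance quality, or xG per shot. Below is a minimal sketch on made-up numbers.

```python
import numpy as np

# Made-up per-match shots and xG for one team
shots = np.array([8, 10, 12, 15, 20], dtype=float)
xg = np.array([0.7, 0.9, 1.1, 1.4, 1.9])

# Least-squares line: xG = slope * shots + intercept
slope, intercept = np.polyfit(shots, xg, 1)

# Average chance quality across all matches
xg_per_shot = xg.sum() / shots.sum()

print(f"Marginal xG per extra shot: {slope:.3f}")
print(f"Average xG per shot: {xg_per_shot:.3f}")
```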

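```python
# Function to analyze xG vs Shots (scatterplot with a linear trend line)
def analyze_xg_vs_shots(team, matches):
    team_matches = matches[(matches['Home'] == team) | (matches['Away'] == team)].copy()

    # Shots, xG and match details from the selected team's perspective
    team_matches = team_matches.assign(
        Shots=np.where(team_matches['Home'] == team, team_matches['Home Shots'], team_matches['Away Shots']),
        xG=np.where(team_matches['Home'] == team, team_matches['Home_xg'], team_matches['Away_xg']),
        Opponent=np.where(team_matches['Home'] == team, team_matches['Away'], team_matches['Home']),
        Venue=np.where(team_matches['Home'] == team, 'H', 'A'),
        Match_Date=pd.to_datetime(team_matches['Date']).dt.strftime('%m/%d')
    )

    fig = px.scatter(
        team_matches, x="Shots", y="xG",
        hover_data={"Shots": False, "xG": False, "Opponent": True, "Venue": True, "Match_Date": True},
        title=f"{team} - xG vs Shots Taken",
        labels={"Shots": "Shots Taken", "xG": "xG"}
    )
    fig.update_traces(marker=dict(size=8, color="blue"))

    # Regression line: least-squares linear fit of xG on shots (skipping rows with missing data)
    valid = team_matches[['Shots', 'xG']].dropna().astype(float)
    if len(valid) >= 2:
        slope, intercept = np.polyfit(valid['Shots'], valid['xG'], 1)
        x_range = np.linspace(valid['Shots'].min(), valid['Shots'].max(), 100)
        fig.add_trace(go.Scatter(x=x_range, y=slope * x_range + intercept, mode='lines',
                                 line=dict(color='blue'), name='Trend'))
    return fig
```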



### 4️⃣ xG by Month (Line Chart)
- Blue Line: Average xG created each month
- Red Line: Average xG conceded each month
- Points connected for better visualization
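Grouping by month name sorts months alphabetically (April, August, December, ...), which scrambles a line chart. One way to keep a season-shaped x-axis is an ordered `pd.Categorical`. Below is a minimal sketch with made-up values. The August-to-July order is an assumption about how a Premier League season should be displayed.

```python
import pandas as pd

# Made-up monthly rows, deliberately out of order
df = pd.DataFrame({
    "Month": ["April", "August", "December", "August", "January"],
    "xG_Created": [1.2, 1.8, 1.0, 1.4, 1.6],
})

season_order = ["August", "September", "October", "November", "December", "January",
                "February", "March", "April", "May", "June", "July"]

# An ordered categorical makes groupby sort by season position, not alphabetically
df["Month"] = pd.Categorical(df["Month"], categories=season_order, ordered=True)

# observed=True keeps only months that actually appear in the data
monthly = df.groupby("Month", observed=True)["xG_Created"].mean()
print(monthly)
```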

```python
# Function to analyze xG by Month (line chart for xG created & conceded)
def analyze_xg_vs_month(team, matches):
    team_matches = matches[(matches['Home'] == team) | (matches['Away'] == team)].copy()

    # xG created/conceded from the selected team's perspective
    team_matches = team_matches.assign(
        xG_Created=np.where(team_matches['Home'] == team, team_matches['Home_xg'], team_matches['Away_xg']),
        xG_Conceded=np.where(team_matches['Home'] == team, team_matches['Away_xg'], team_matches['Home_xg'])
    )

    # groupby sorts month names alphabetically (April, August, December, ...),
    # so reindex into season order (August to July) and drop months with no matches
    season_order = ['August', 'September', 'October', 'November', 'December', 'January',
                    'February', 'March', 'April', 'May', 'June', 'July']
    monthly_xg_created = team_matches.groupby("Month")["xG_Created"].mean().reindex(season_order).dropna()
    monthly_xg_conceded = team_matches.groupby("Month")["xG_Conceded"].mean().reindex(season_order).dropna()

    fig = go.Figure()
    fig.add_trace(go.Scatter(x=monthly_xg_created.index, y=monthly_xg_created, mode='lines+markers', name="xG Created", line=dict(color="blue")))
    fig.add_trace(go.Scatter(x=monthly_xg_conceded.index, y=monthly_xg_conceded, mode='lines+markers', name="xG Conceded", line=dict(color="red")))

    fig.update_layout(title=f"{team} - xG Trend by Month", xaxis_title="Month", yaxis_title="Average xG", template="plotly_white")
    return fig
```


### 5️⃣ xG by Time of Match (Bar Chart for Net xG)
- Two bars for Afternoon & Evening
- Shows Net xG averages
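The data stores one row per match, so Net xG must be flipped to the selected team's side before averaging. For a home game it is home xG minus away xG; for an away game it is the reverse. Below is a minimal sketch of that flip plus the per-period mean and game count, using made-up rows and the notebook's column names.

```python
import numpy as np
import pandas as pd

team = "Arsenal"

# Made-up matches involving the selected team
matches = pd.DataFrame({
    "Home": ["Arsenal", "Chelsea", "Arsenal", "Fulham"],
    "Away": ["Wolves", "Arsenal", "Everton", "Arsenal"],
    "Home_xg": [2.0, 1.5, 1.0, 0.5],
    "Away_xg": [0.5, 1.0, 1.2, 2.5],
    "Match Period": ["Afternoon", "Evening", "Evening", "Afternoon"],
})

# Net xG from the selected team's perspective (positive = team out-created opponent)
is_home = matches["Home"] == team
matches["Net_xG"] = np.where(is_home,
                             matches["Home_xg"] - matches["Away_xg"],
                             matches["Away_xg"] - matches["Home_xg"])

# Average Net xG and number of games per kick-off period
by_period = matches.groupby("Match Period")["Net_xG"].agg(["mean", "size"])
print(by_period)
```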

```python
# Function to analyze Net xG by Time of Match (bar chart with game counts and a y=0 line)
def analyze_xg_vs_time(team, matches):
    team_matches = matches[(matches['Home'] == team) | (matches['Away'] == team)].copy()

    # Net xG from the selected team's perspective
    team_matches = team_matches.assign(
        Net_xG=np.where(team_matches['Home'] == team, team_matches['Home_xg'] - team_matches['Away_xg'], team_matches['Away_xg'] - team_matches['Home_xg'])
    )

    xg_by_time = team_matches.groupby("Match Period")["Net_xG"].mean()
    match_counts = team_matches["Match Period"].value_counts()

    labels = [f"{period} ({match_counts[period]})" for period in xg_by_time.index]
    # Map colors by period so they stay correct even if one period is missing
    colors = ["blue" if period == "Afternoon" else "red" for period in xg_by_time.index]

    fig = go.Figure()
    fig.add_trace(go.Bar(x=labels, y=xg_by_time.values, marker_color=colors, name="Net xG"))

    fig.add_hline(y=0, line=dict(color='black', width=1))  # Add y=0 reference line

    fig.update_layout(title=f"{team} - Net xG by Match Time", xaxis_title="Match Period", yaxis_title="Average Net xG", template="plotly_white")
    return fig
```

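```python
teams = sorted(team_data['Home'].unique())

for team in teams:
    print(f"Analyzing {team}...\n")

    # The functions return figures; call .show() so they render inside the loop
    fig_home, fig_away = analyze_xg_vs_possession(team, team_data)
    fig_home.show()
    fig_away.show()
    analyze_xg_vs_formation(team, team_data).show()
    analyze_xg_vs_shots(team, team_data).show()
    analyze_xg_vs_month(team, team_data).show()
    analyze_xg_vs_time(team, team_data).show()

    print("\n")
```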


Output: the loop prints `Analyzing <team>...` for each of the 20 teams (Arsenal, Aston Villa, Bournemouth, Brentford, Brighton, Burnley, Chelsea, Crystal Palace, Everton, Fulham, Liverpool, Luton Town, Manchester City, Manchester Utd, Newcastle Utd, Nott'ham Forest, Sheffield Ud, Tottenham, West Ham, Wolves); the charts are not reproduced here.