In [None]:
#| label: load-packages
#| include: false

# Load packages here
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
# Set up plot theme and figure resolution
sns.set_theme(style="whitegrid")
sns.set_context("notebook", font_scale=1.1)

plt.rcParams['figure.dpi'] = 300
plt.rcParams['savefig.dpi'] = 300
plt.rcParams['figure.figsize'] = (6, 6 * 0.618)  # width x height in inches, golden-ratio aspect

In [None]:
#| label: load-data
#| include: false
# Load the match data; later cells refer to this DataFrame as `df`
df = pd.read_csv("data/soccer21-22.csv")

# Question 1


##
### What is the connection between in-game metrics such as shots on goal, fouls committed, and cards received, and the outcomes of soccer matches?

#### Approach


We select in-game features for both the home and away teams: shots, shots on target, fouls committed, and yellow and red cards. The dataset is split into training and test sets so that the model is evaluated on matches it has not seen. A logistic regression model is then trained to predict whether the home team wins, and its performance is assessed with an accuracy score and a confusion matrix.
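The approach can be sketched roughly as follows. This is a minimal illustration, not the exact pipeline behind the slides: it assumes scikit-learn as the modeling library, uses synthetic counts in place of `data/soccer21-22.csv`, and the 80/20 split, random seed, and toy target rule are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the match data (assumption: the real file
# provides columns with these names).
rng = np.random.default_rng(42)
n_matches = 200
features = ['HS', 'AS', 'HST', 'AST', 'HF', 'AF', 'HY', 'AY', 'HR', 'AR']
X = pd.DataFrame(rng.poisson(5, size=(n_matches, len(features))), columns=features)

# Toy target (assumption): a home win is more likely when the home side
# has more shots on target than the away side.
y = (X['HST'] - X['AST'] + rng.normal(0, 2, n_matches) > 0).astype(int)

# Hold out 20% of the matches for evaluation (split size is an assumption)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit the logistic regression on the training matches only
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate on the held-out matches
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])

print(f"Accuracy: {accuracy:.2f}")
print(cm)
```

Rows of the confusion matrix are the true classes and columns are the predicted classes, with 0 meaning "not a home win" and 1 meaning "home win".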

## Exploring the complex connections between in-game metrics

::: panel-tabset
### Critical variables

-   **FTHG** and **FTAG** (full-time home and away goals) give the final score
-   **FTR** (full-time result) serves as the target variable
-   **HS**, **AS**, **HST**, and **AST** (shots and shots on target) reflect attacking performance
-   **HF**, **AF**, **HY**, **AY**, **HR**, and **AR** (fouls, yellow cards, and red cards) indicate team discipline and aggression

### Data preparation

-   Combining match information with team-level season summaries
-   The consistency of the dataset ensures reliability of our analysis


In [None]:
#| echo: true
# Creating a binary target variable 'Result' where 1 represents a home win ('H') and 0 otherwise
df['Result'] = (df['FTR'] == 'H').astype(int)

# Feature selection
features = ['HS', 'AS', 'HST', 'AST', 'HF', 'AF', 'HY', 'AY', 'HR', 'AR']
X = df[features]
y = df['Result']

### Analysis

-   Splitting the data into training and test sets
-   Applying regression techniques (logistic regression)
-   Discovering patterns and relationships critical for building predictive models
:::

## Feature Importance Visualization

![](images/image1.png){.nostretch fig-align="center" width="725px"}

## Confusion Matrix

![](images/Image2.png){.nostretch fig-align="center" width="900px"}

## Feature Correlation Heatmap

![](images/image3.png){.nostretch fig-align="center" width="900px"}

# Question 2

## What if the matches ended at halftime?

::: panel-tabset
### Critical variables

-   **FTHG** and **FTAG** are needed to determine team placements
-   **FTR** denotes the actual final outcome
-   **HTHG**, **HTAG**, and **HTR** give the half-time score and result
-   **HomeTeam** and **AwayTeam**, combined with the rest, help determine the winner

### Data preparation

-   Performing thorough checks and creating visualizations
-   Making sure the data has no missing values

### Analysis

-   Creating a function to determine results of matches at halftime and using this re-calibration as a new baseline of analysis.
-   Chronologically organized data
:::
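A half-time table could be derived along these lines. This is only a sketch under stated assumptions: the four fixtures and the team names `A`, `B`, and `C` are made up, and `halftime_result` is a hypothetical helper that rebuilds the result from **HTHG** and **HTAG** (the dataset's own **HTR** column could be used directly instead).

```python
import pandas as pd

# Hypothetical mini fixture list; only the column names follow the dataset
matches = pd.DataFrame({
    'HomeTeam': ['A', 'B', 'C', 'A'],
    'AwayTeam': ['B', 'C', 'A', 'C'],
    'HTHG': [1, 0, 2, 0],
    'HTAG': [0, 0, 1, 1],
})

def halftime_result(home_goals, away_goals):
    """Return 'H', 'A', or 'D' from the half-time score."""
    if home_goals > away_goals:
        return 'H'
    if home_goals < away_goals:
        return 'A'
    return 'D'

matches['HTR'] = [
    halftime_result(h, a) for h, a in zip(matches['HTHG'], matches['HTAG'])
]

# 3 points for a win, 1 for a draw, 0 for a loss
home_pts = matches['HTR'].map({'H': 3, 'D': 1, 'A': 0})
away_pts = matches['HTR'].map({'H': 0, 'D': 1, 'A': 3})

# Stack home and away rows, then total the points per team
ht_table = (
    pd.concat([
        pd.DataFrame({'Team': matches['HomeTeam'], 'Points': home_pts}),
        pd.DataFrame({'Team': matches['AwayTeam'], 'Points': away_pts}),
    ])
    .groupby('Team', as_index=False)['Points'].sum()
    .sort_values('Points', ascending=False)
    .reset_index(drop=True)
)
ht_table['Ranking'] = range(1, len(ht_table) + 1)

print(ht_table)
```

Stacking home and away rows before grouping avoids a separate merge step, so a team that appears only as a home or only as an away side still gets a row.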

## Code


In [None]:
#| echo: true
#| code-fold: false
#| code-summary: "Data Prep & Pre-processing"
# Import libraries
import pandas as pd

# Load the dataset
df = pd.read_csv('data/soccer21-22.csv')

# Function to calculate the home team's points from the full-time result
def calculate_points(row):
    if row['FTR'] == 'H':
        return 3
    elif row['FTR'] == 'D':
        return 1
    else:
        return 0

# Apply the function to calculate home points for each match
df['HomePoints'] = df.apply(calculate_points, axis = 1)
# Away points mirror the home result: 3 for an away win, 1 for a draw, 0 for a home win
df['AwayPoints'] = df.apply(lambda row: 3 - calculate_points(row) if row['FTR'] != 'D' else 1, axis = 1)

# Aggregate points for each team
home_points = df.groupby('HomeTeam')['HomePoints'].sum().reset_index()
away_points = df.groupby('AwayTeam')['AwayPoints'].sum().reset_index()

# Combine home and away points
team_points = pd.merge(home_points, away_points, how = 'outer', left_on = 'HomeTeam', right_on = 'AwayTeam')
team_points['TotalPoints'] = team_points['HomePoints'] + team_points['AwayPoints']

# Sort team_points DataFrame based on TotalPoints
team_points = team_points.sort_values(by = 'TotalPoints', ascending = False)

# Create ranking DataFrame
ft_ranking = pd.DataFrame({
    'Team': team_points['HomeTeam'],  # You can choose 'HomeTeam' or 'AwayTeam' because they are the same after merging
    'Points': team_points['TotalPoints'],
    'Ranking': range(1, len(team_points) + 1)
})

<!-- ```{python}
#| echo: true
#| code-fold: true
#| code-summary: "Official Rankings"
# Aggregate goals scored and conceded for each team
home_goals_scored = df.groupby('HomeTeam')['FTHG'].sum().reset_index()
home_goals_conceded = df.groupby('HomeTeam')['FTAG'].sum().reset_index()
away_goals_scored = df.groupby('AwayTeam')['FTAG'].sum().reset_index()
away_goals_conceded = df.groupby('AwayTeam')['FTHG'].sum().reset_index()

# Merge home and away goals scored and conceded with team_points
team_points = pd.merge(team_points, home_goals_scored, how = 'left', left_on = 'HomeTeam', right_on = 'HomeTeam')
team_points = pd.merge(team_points, home_goals_conceded, how = 'left', left_on = 'HomeTeam', right_on = 'HomeTeam')
team_points = pd.merge(team_points, away_goals_scored, how = 'left', left_on = 'HomeTeam', right_on = 'AwayTeam')
team_points = pd.merge(team_points, away_goals_conceded, how = 'left', left_on = 'HomeTeam', right_on = 'AwayTeam')

# Rename columns during merge
team_points.rename(columns={'FTHG_x': 'HomeGoalsScored', 'FTAG_x': 'HomeGoalsConceded',
                            'FTAG_y': 'AwayGoalsScored', 'FTHG_y': 'AwayGoalsConceded'}, inplace = True)

# Fill NaN values with 0
team_points.fillna(0, inplace = True)

# Calculate total goals scored and conceded
team_points['TotalGoalsScored'] = team_points['HomeGoalsScored'] + team_points['AwayGoalsScored']
team_points['TotalGoalsConceded'] = team_points['HomeGoalsConceded'] + team_points['AwayGoalsConceded']

# Calculate goal difference
team_points['GoalDifference'] = team_points['TotalGoalsScored'] - team_points['TotalGoalsConceded']

# Sort team_points DataFrame based on TotalPoints and GoalDifference
team_points = team_points.sort_values(by = ['TotalPoints', 'GoalDifference'], ascending = [False, False])

# Create ranking DataFrame
ft_ranking = pd.DataFrame({
    'Team': team_points['HomeTeam'],  # You can choose 'HomeTeam' or 'AwayTeam' because they are the same after merging
    'Points': team_points['TotalPoints'],
    'GoalDifference': team_points['GoalDifference'],
    'Ranking': range(1, len(team_points) + 1)
})
``` -->

<!-- ```{python}
#| echo: true
#| code-fold: true
#| code-summary: "Half Time Results"
# Import libraries
import pandas as pd

# Load the dataset
df = pd.read_csv('data/soccer21-22.csv')

# Function to determine the winner based on points
def calculate_points(row):
    if row['HTR'] == 'H':
        return 3
    elif row['HTR'] == 'D':
        return 1
    else:
        return 0

# Apply the function to calculate points for each match
df['HomePoints'] = df.apply(lambda row: calculate_points(row), axis = 1)
df['AwayPoints'] = df.apply(lambda row: 3 - calculate_points(row) if row['HTR'] != 'D' else 1, axis = 1)

# Aggregate points for each team
home_points = df.groupby('HomeTeam')['HomePoints'].sum().reset_index()
away_points = df.groupby('AwayTeam')['AwayPoints'].sum().reset_index()

# Combine home and away points
team_points = pd.merge(home_points, away_points, how = 'outer', left_on = 'HomeTeam', right_on = 'AwayTeam')
team_points['TotalPoints'] = team_points['HomePoints'] + team_points['AwayPoints']

# Sort team_points DataFrame based on TotalPoints
team_points = team_points.sort_values(by = 'TotalPoints', ascending = False)

# Create ranking DataFrame
ht_ranking = pd.DataFrame({
    'Team': team_points['HomeTeam'],  # You can choose 'HomeTeam' or 'AwayTeam' because they are the same after merging
    'Points': team_points['TotalPoints'],
    'Ranking': range(1, len(team_points) + 1)
})
``` -->


<!-- ```{python}
#| echo: true
#| code-fold: true
#| code-summary: "Repeated Half Time Results"
# Aggregate goals scored and conceded for each team
home_goals_scored = df.groupby('HomeTeam')['HTHG'].sum().reset_index()
home_goals_conceded = df.groupby('HomeTeam')['HTAG'].sum().reset_index()
away_goals_scored = df.groupby('AwayTeam')['HTAG'].sum().reset_index()
away_goals_conceded = df.groupby('AwayTeam')['HTHG'].sum().reset_index()

# Merge home and away goals scored and conceded with team_points
team_points = pd.merge(team_points, home_goals_scored, how = 'left', left_on = 'HomeTeam', right_on = 'HomeTeam')
team_points = pd.merge(team_points, home_goals_conceded, how = 'left', left_on = 'HomeTeam', right_on = 'HomeTeam')
team_points = pd.merge(team_points, away_goals_scored, how = 'left', left_on = 'HomeTeam', right_on = 'AwayTeam')
team_points = pd.merge(team_points, away_goals_conceded, how = 'left', left_on = 'HomeTeam', right_on = 'AwayTeam')

# Rename columns during merge
team_points.rename(columns={'HTHG_x': 'HomeGoalsScored', 'HTAG_x': 'HomeGoalsConceded',
                            'HTAG_y': 'AwayGoalsScored', 'HTHG_y': 'AwayGoalsConceded'}, inplace = True)

# Fill NaN values with 0
team_points.fillna(0, inplace = True)

# Calculate total goals scored and conceded
team_points['TotalGoalsScored'] = team_points['HomeGoalsScored'] + team_points['AwayGoalsScored']
team_points['TotalGoalsConceded'] = team_points['HomeGoalsConceded'] + team_points['AwayGoalsConceded']

# Calculate goal difference
team_points['GoalDifference'] = team_points['TotalGoalsScored'] - team_points['TotalGoalsConceded']

# Sort team_points DataFrame based on TotalPoints and GoalDifference
team_points = team_points.sort_values(by = ['TotalPoints', 'GoalDifference'], ascending = [False, False])

# Create ranking DataFrame
ht_ranking = pd.DataFrame({
    'Team': team_points['HomeTeam'],  # You can choose 'HomeTeam' or 'AwayTeam' because they are the same after merging
    'Points': team_points['TotalPoints'],
    'GoalDifference': team_points['GoalDifference'],
    'Ranking': range(1, len(team_points) + 1)
})
``` --> 
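One way to compare the two tables is to join them on the team name and compute the change in position. The sketch below uses made-up teams and points purely for illustration; the `RankChange` column and the `_FT`/`_HT` suffixes are naming assumptions, and only the `Team`, `Points`, and `Ranking` columns mirror the ranking DataFrames built in the code above.

```python
import pandas as pd

# Hypothetical full-time and half-time rankings (values are illustrative only)
ft_ranking = pd.DataFrame({
    'Team': ['A', 'B', 'C'],
    'Points': [80, 70, 60],
    'Ranking': [1, 2, 3],
})
ht_ranking = pd.DataFrame({
    'Team': ['B', 'A', 'C'],
    'Points': [65, 60, 55],
    'Ranking': [1, 2, 3],
})

# Join the two tables on team name, keeping both sets of columns
shift = ft_ranking.merge(ht_ranking, on='Team', suffixes=('_FT', '_HT'))

# Positive values mean the team finishes higher at full time than at half-time
shift['RankChange'] = shift['Ranking_HT'] - shift['Ranking_FT']
shift = shift.sort_values('RankChange', ascending=False).reset_index(drop=True)

print(shift[['Team', 'Ranking_FT', 'Ranking_HT', 'RankChange']])
```

Teams at the top of this table gain the most from second halves, while teams at the bottom lose the most ground after the break.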

## Plots
::: panel-tabset

###### Full-Time
![](images/fifa1.png){.nostretch fig-align="center" width="700px"}

###### Half-Time
![](images/fifa2.png){.nostretch fig-align="center" width="700px"}

:::

# Discussion
Overall, the insights from this analysis can guide teams whose rankings shift significantly between the half-time and full-time tables. By identifying areas for improvement, such as halftime strategies, conditioning, or tactical adjustments, teams can make informed decisions to improve their performance and competitiveness in professional football leagues.

#

![](images/suui.gif){.nostretch fig-align="center" width="30000px"}