# End of Season Prediction

## 1. Introduction

This notebook contains a data science project that predicts the final Ligue 1 table using data from the first 18 matchdays. Goals are modelled with independent Poisson distributions for the home and away team to predict match outcomes. Based on these predictions, a Monte Carlo simulation with `n=1000` runs estimates the probability of each team finishing in each position of the table.

Two ways of estimating `Lambda` for the Poisson distribution are considered:
1. The average home and away goals are used directly as the estimate.
2. The home and away efficiency is calculated: attacking efficiency is the ratio of goals scored to chances created (shots), and defending efficiency is the ratio of goals conceded to chances allowed. `Lambda` is then the expected number of goals, defined as `chances created * attacking efficiency * (1 - opponent's defending efficiency)`.

At the end, the two models are compared to see whether the more complex model yields better results.

## 2. Data Preparation

Before modelling, we need to bring the data into the correct format. We require data from which we can derive the necessary statistics and, additionally, the full list of matches for the season. These two data sources have different characteristics, so we need to make them compatible and usable for our model.

In [1]:
# load packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## 2.1 Load all matches of the season

First we load a table that contains all league matches with the columns `Round Number`, `Home Team`, `Away Team` and `Result`. The `Result` column has the format `0 - 0`. The data is loaded from https://fixturedownload.com/results/ligue-1-2025.

In [3]:
# load all matches from dataset
all_matches = pd.read_csv("../data/raw/ligue-1-2025-UTC.csv")

# drop the columns Date, Location, Match Number
all_matches = all_matches.drop(columns=["Date", "Location", "Match Number"], axis=1)

all_matches.head()

Unnamed: 0,Round Number,Home Team,Away Team,Result
0,1,Stade Rennais FC,Olympique de Marseille,1 - 0
1,1,RC Lens,Olympique Lyonnais,0 - 1
2,1,AS Monaco,Havre Athletic Club,3 - 1
3,1,OGC Nice,Toulouse FC,0 - 1
4,1,Stade Brestois 29,LOSC Lille,3 - 3
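The `Result` column stores the score as a single string such as `3 - 1`. The following is a minimal sketch of how such a string could be split into two integers. The helper name `parse_result` is hypothetical and not part of the notebook, and it assumes that unplayed fixtures appear as missing values (NaN) or empty strings.

```python
from typing import Optional, Tuple


def parse_result(result) -> Optional[Tuple[int, int]]:
    """Split a score string like "3 - 1" into (home_goals, away_goals).

    Returns None when there is no result yet (NaN, None or an empty string).
    """
    # Missing values read from a CSV are floats (NaN), not strings
    if not isinstance(result, str) or not result.strip():
        return None
    home, away = result.split("-")
    # int() ignores the surrounding whitespace left over from "3 - 1"
    return int(home), int(away)


print(parse_result("3 - 1"))       # (3, 1)
print(parse_result(float("nan")))  # None
```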


Later in the project we need to match team names across the two data sources, so we first list the unique team names in the `all_matches` data.

In [4]:
# show unique values of Home Team and Away Team in all matches and sort them alphabetically
team_names = all_matches["Home Team"].unique().tolist()
team_names.sort()
team_names

['AJ Auxerre',
 'AS Monaco',
 'Angers SCO',
 'FC Lorient',
 'FC Metz',
 'FC Nantes',
 'Havre Athletic Club',
 'LOSC Lille',
 'OGC Nice',
 'Olympique Lyonnais',
 'Olympique de Marseille',
 'Paris FC',
 'Paris Saint-Germain',
 'RC Lens',
 'RC Strasbourg Alsace',
 'Stade Brestois 29',
 'Stade Rennais FC',
 'Toulouse FC']

## 2.2 Load match statistics

After having loaded all matches of the season, we now load a table with all played matches with results and statistics.

The data is loaded from https://www.football-data.co.uk/francem.php. This table contains many columns, but we will only use the following:

Key to results data:

`Div = League Division`

`HomeTeam = Home Team`

`AwayTeam = Away Team`

`FTHG and HG = Full Time Home Team Goals`

`FTAG and AG = Full Time Away Team Goals`

`FTR and Res = Full Time Result (H=Home Win, D=Draw, A=Away Win)`

`HS = Home Team Shots`

`AS = Away Team Shots`

`HST = Home Team Shots on Target`

`AST = Away Team Shots on Target`

In [5]:
# Load the dataset containing Ligue 1 match data for the 2025 season
stats = pd.read_csv("../data/raw/ligue1_2025.csv")

# Display the first few rows of the dataset to understand its structure and contents
stats.head()

Unnamed: 0,Div,Date,Time,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,...,B365CAHH,B365CAHA,PCAHH,PCAHA,MaxCAHH,MaxCAHA,AvgCAHH,AvgCAHA,BFECAHH,BFECAHA
0,F1,15/08/2025,19:45,Rennes,Marseille,1,0,H,0,0,...,1.95,1.9,1.94,1.98,1.95,1.97,1.86,1.89,1.97,2.02
1,F1,16/08/2025,16:00,Lens,Lyon,0,1,A,0,1,...,1.8,2.05,1.76,2.18,1.82,2.1,1.77,2.02,1.86,2.15
2,F1,16/08/2025,18:00,Monaco,Le Havre,3,1,H,1,0,...,2.0,1.85,2.1,1.79,2.05,1.85,1.96,1.77,2.07,1.88
3,F1,16/08/2025,20:05,Nice,Toulouse,0,1,A,0,0,...,1.98,1.88,2.14,1.78,1.98,1.93,1.93,1.83,2.06,1.92
4,F1,17/08/2025,14:00,Brest,Lille,3,3,D,1,2,...,2.1,1.78,2.29,1.68,2.12,1.78,2.02,1.7,2.22,1.79


In [6]:
# inspect dataset
stats.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 180 entries, 0 to 179
Columns: 131 entries, Div to BFECAHA
dtypes: float64(108), int64(16), object(7)
memory usage: 184.3+ KB


In [7]:
# print full list of columns
print(stats.columns.tolist())

['Div', 'Date', 'Time', 'HomeTeam', 'AwayTeam', 'FTHG', 'FTAG', 'FTR', 'HTHG', 'HTAG', 'HTR', 'HS', 'AS', 'HST', 'AST', 'HF', 'AF', 'HC', 'AC', 'HY', 'AY', 'HR', 'AR', 'B365H', 'B365D', 'B365A', 'BFDH', 'BFDD', 'BFDA', 'BMGMH', 'BMGMD', 'BMGMA', 'BVH', 'BVD', 'BVA', 'BWH', 'BWD', 'BWA', 'CLH', 'CLD', 'CLA', 'LBH', 'LBD', 'LBA', 'PSH', 'PSD', 'PSA', 'MaxH', 'MaxD', 'MaxA', 'AvgH', 'AvgD', 'AvgA', 'BFEH', 'BFED', 'BFEA', 'B365>2.5', 'B365<2.5', 'P>2.5', 'P<2.5', 'Max>2.5', 'Max<2.5', 'Avg>2.5', 'Avg<2.5', 'BFE>2.5', 'BFE<2.5', 'AHh', 'B365AHH', 'B365AHA', 'PAHH', 'PAHA', 'MaxAHH', 'MaxAHA', 'AvgAHH', 'AvgAHA', 'BFEAHH', 'BFEAHA', 'B365CH', 'B365CD', 'B365CA', 'BFDCH', 'BFDCD', 'BFDCA', 'BMGMCH', 'BMGMCD', 'BMGMCA', 'BVCH', 'BVCD', 'BVCA', 'BWCH', 'BWCD', 'BWCA', 'CLCH', 'CLCD', 'CLCA', 'LBCH', 'LBCD', 'LBCA', 'PSCH', 'PSCD', 'PSCA', 'MaxCH', 'MaxCD', 'MaxCA', 'AvgCH', 'AvgCD', 'AvgCA', 'BFECH', 'BFECD', 'BFECA', 'B365C>2.5', 'B365C<2.5', 'PC>2.5', 'PC<2.5', 'MaxC>2.5', 'MaxC<2.5', 'AvgC>

We reduce our DataFrame to only the columns that we will use during modelling. 

In [8]:
# reduce size of the dataset
stats = stats[
    ["Date", "HomeTeam", "AwayTeam", "FTHG", "FTAG", "FTR", "HS", "AS", "HST", "AST"]
]
stats.tail(20)

Unnamed: 0,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HS,AS,HST,AST
160,18/01/2026,Rennes,Le Havre,1,1,D,19,7,7,4
161,18/01/2026,Lyon,Brest,2,1,H,19,5,6,2
162,23/01/2026,Auxerre,Paris SG,0,1,A,8,16,0,8
163,24/01/2026,Rennes,Lorient,0,2,A,22,6,5,4
164,24/01/2026,Le Havre,Monaco,0,0,D,7,10,1,2
165,24/01/2026,Marseille,Lens,3,1,H,10,7,5,2
166,25/01/2026,Nantes,Nice,1,4,A,20,7,5,4
167,25/01/2026,Brest,Toulouse,0,2,A,13,8,5,5
168,25/01/2026,Metz,Lyon,2,5,A,15,13,6,10
169,25/01/2026,Paris FC,Angers,0,0,D,8,7,1,1


To have matching column names in our `stats` and `all_matches` DataFrames, we need to rename some columns.

In [9]:
# change the column names for HomeTeam to Home Team and AwayTeam to Away Team
stats = stats.rename(columns={"HomeTeam": "Home Team", "AwayTeam": "Away Team"})
stats.head()

Unnamed: 0,Date,Home Team,Away Team,FTHG,FTAG,FTR,HS,AS,HST,AST
0,15/08/2025,Rennes,Marseille,1,0,H,12,24,5,2
1,16/08/2025,Lens,Lyon,0,1,A,18,11,5,3
2,16/08/2025,Monaco,Le Havre,3,1,H,13,9,3,4
3,16/08/2025,Nice,Toulouse,0,1,A,13,10,2,2
4,17/08/2025,Brest,Lille,3,3,D,18,10,5,6


In [10]:
# create a list of unique team names from HomeTeam in data and sort them alphabetically
team_names_stats = stats["Home Team"].unique().tolist()
team_names_stats.sort()
team_names_stats

['Angers',
 'Auxerre',
 'Brest',
 'Le Havre',
 'Lens',
 'Lille',
 'Lorient',
 'Lyon',
 'Marseille',
 'Metz',
 'Monaco',
 'Nantes',
 'Nice',
 'Paris FC',
 'Paris SG',
 'Rennes',
 'Strasbourg',
 'Toulouse']

The team names differ between the `all_matches` and `stats` DataFrames, so we need to map one naming scheme onto the other.

In [None]:
# map the team names used in stats to the names used in all_matches
team_name_map = {
    "Angers": "Angers SCO",
    "Auxerre": "AJ Auxerre",
    "Brest": "Stade Brestois 29",
    "Le Havre": "Havre Athletic Club",
    "Lens": "RC Lens",
    "Lille": "LOSC Lille",
    "Lorient": "FC Lorient",
    "Lyon": "Olympique Lyonnais",
    "Marseille": "Olympique de Marseille",
    "Metz": "FC Metz",
    "Monaco": "AS Monaco",
    "Nantes": "FC Nantes",
    "Nice": "OGC Nice",
    "Paris FC": "Paris FC",
    "Paris SG": "Paris Saint-Germain",
    "Rennes": "Stade Rennais FC",
    "Strasbourg": "RC Strasbourg Alsace",
    "Toulouse": "Toulouse FC",
}

# apply the mapping to the home teams
stats["Home Team"] = stats["Home Team"].replace(team_name_map)

In [None]:
# apply the same mapping to the away teams
stats["Away Team"] = stats["Away Team"].replace(team_name_map)
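Because a missing or misspelled entry in a name mapping would only surface later as a lookup error, it can help to check the mapping right away. The sketch below shows one possible check on a small subset of the two name lists shown above. The helper `unmapped_names` and the variable names are hypothetical and not part of the notebook.

```python
def unmapped_names(source_names, target_names, mapping):
    """Return the source names whose mapped value is not a known target name."""
    targets = set(target_names)
    return sorted(
        # Names without a mapping entry are passed through unchanged
        name for name in source_names if mapping.get(name, name) not in targets
    )


# Illustrative subset of the names in the two data sources
stats_names = ["Brest", "Lens", "Paris FC", "Paris SG"]
fixture_names = ["Paris FC", "Paris Saint-Germain", "RC Lens", "Stade Brestois 29"]
partial_map = {"Brest": "Stade Brestois 29", "Lens": "RC Lens"}

# "Paris SG" has no mapping entry and does not exist in the fixture list
print(unmapped_names(stats_names, fixture_names, partial_map))  # ['Paris SG']
```

An empty list means that every name in the statistics data resolves to a name in the fixture list.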

## 2.3 Team Statistics

We now derive team statistics from the match data, separately for home and away matches. For each team we calculate the average goals scored and conceded at home and away, as well as the home and away efficiency: the attacking efficiency is the ratio of goals scored to chances created, and the defending efficiency is the ratio of goals conceded to chances allowed.

In [None]:
# create home statistics
home_stats = stats.groupby("Home Team").agg(
    {"FTHG": "mean", "FTAG": "mean", "HS": "mean", "AS": "mean"}
)

home_stats.columns = [
    "avg_home_goals_scored",
    "avg_home_goals_conceded",
    "avg_home_shots_made",
    "avg_home_shots_conceded",
]

home_stats

In [None]:
# create away statistics
away_stats = stats.groupby("Away Team").agg(
    {"FTAG": "mean", "FTHG": "mean", "AS": "mean", "HS": "mean"}
)

away_stats.columns = [
    "avg_away_goals_scored",
    "avg_away_goals_conceded",
    "avg_away_shots_made",
    "avg_away_shots_conceded",
]

away_stats

In [None]:
# create measure of efficiency for home and away teams
home_stats["home_attack_eff"] = (
    home_stats["avg_home_goals_scored"] / home_stats["avg_home_shots_made"]
)

home_stats["home_defense_eff"] = (
    home_stats["avg_home_goals_conceded"] / home_stats["avg_home_shots_conceded"]
)

away_stats["away_attack_eff"] = (
    away_stats["avg_away_goals_scored"] / away_stats["avg_away_shots_made"]
)

away_stats["away_defense_eff"] = (
    away_stats["avg_away_goals_conceded"] / away_stats["avg_away_shots_conceded"]
)
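Dividing the average goals by the average shots over the same matches gives the same value as dividing total goals by total shots. The sketch below illustrates this with plain Python on three Rennes home matches taken from the `stats` rows displayed above. The function `home_efficiencies` is a hypothetical helper, and the three matches are only a subset, so the numbers are not the team's season values.

```python
from collections import defaultdict


def home_efficiencies(matches):
    """Return {team: (attack_eff, defense_eff)} for home matches.

    Each match is (home_team, home_goals, away_goals, home_shots, away_shots).
    """
    totals = defaultdict(lambda: [0, 0, 0, 0])
    for team, goals_for, goals_against, shots_for, shots_against in matches:
        row = totals[team]
        row[0] += goals_for
        row[1] += goals_against
        row[2] += shots_for
        row[3] += shots_against
    return {
        team: (gf / sf, ga / sa)
        for team, (gf, ga, sf, sa) in totals.items()
        if sf > 0 and sa > 0  # skip teams without any recorded shots
    }


# Three Rennes home matches from the rows shown earlier (a subset only)
rennes_home = [
    ("Rennes", 1, 0, 12, 24),
    ("Rennes", 1, 1, 19, 7),
    ("Rennes", 0, 2, 22, 6),
]
attack_eff, defense_eff = home_efficiencies(rennes_home)["Rennes"]
print(round(attack_eff, 4), round(defense_eff, 4))  # 0.0377 0.0811
```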

In [None]:
# sort home teams by home_attack_eff
home_stats = home_stats.sort_values(by="home_attack_eff", ascending=False)

# sort away teams by away_attack_eff
away_stats = away_stats.sort_values(by="away_attack_eff", ascending=False)

In [None]:
home_stats

In [None]:
away_stats

### 2.3.1 Visualisations of Team Statistics

To understand the data better we create scatter plots of different stats.

In [None]:
plt.figure(figsize=(14, 10))
sns.scatterplot(
    data=home_stats,
    x="avg_home_goals_scored",
    y="avg_home_goals_conceded",
    s=100,
)

# Annotate each point with the team name
for i in range(home_stats.shape[0]):
    plt.text(
        home_stats["avg_home_goals_scored"].iloc[i],
        home_stats["avg_home_goals_conceded"].iloc[i],
        home_stats.index[i],
        fontsize=9,
        ha="right",
    )

plt.title("avg_home_goals_scored vs avg_home_goals_conceded")
plt.xlabel("avg_home_goals_scored")
plt.ylabel("avg_home_goals_conceded")
plt.grid()
plt.show()

## 2.4 Create Current League Table

We create the current league table after 18 matches (17 for some teams, as matches were postponed because of weather conditions) as a reference for later determining the probabilities of finishing in a different position of the table.

In [None]:
# create a function that calculates the league table based on the DataFrame stats
def calculate_league_table(matches):
    """
    Calculates the league table based on the given matches.

    Parameters:
    matches (DataFrame): A Pandas DataFrame containing match data.
                         Expected columns are "Home Team", "Away Team",
                         "Home Goals", "Away Goals", and "Result".

    Returns:
    DataFrame: A DataFrame representing the league table with teams,
                number of matches played, wins, draws, losses,
                goals for, goals against, goal difference, points, and position.
    """
    teams = matches["Home Team"].unique()
    league_table = pd.DataFrame(
        {
            "Team": teams,
            "Played": 0,
            "Wins": 0,
            "Draws": 0,
            "Losses": 0,
            "Goals For": 0,
            "Goals Against": 0,
            "Goal Difference": 0,
            "Points": 0,
        }
    )
    league_table.set_index("Team", inplace=True)

    for _, match in matches.iterrows():
        home_team = match["Home Team"]
        away_team = match["Away Team"]
        home_goals = match["Home Goals"]
        away_goals = match["Away Goals"]
        result = match["Result"]

        league_table.at[home_team, "Played"] += 1
        league_table.at[away_team, "Played"] += 1
        league_table.at[home_team, "Goals For"] += home_goals
        league_table.at[home_team, "Goals Against"] += away_goals
        league_table.at[away_team, "Goals For"] += away_goals
        league_table.at[away_team, "Goals Against"] += home_goals

        if result == "H":
            league_table.at[home_team, "Wins"] += 1
            league_table.at[away_team, "Losses"] += 1
            league_table.at[home_team, "Points"] += 3
        elif result == "A":
            league_table.at[away_team, "Wins"] += 1
            league_table.at[home_team, "Losses"] += 1
            league_table.at[away_team, "Points"] += 3
        else:
            league_table.at[home_team, "Draws"] += 1
            league_table.at[away_team, "Draws"] += 1
            league_table.at[home_team, "Points"] += 1
            league_table.at[away_team, "Points"] += 1

    league_table["Goal Difference"] = (
        league_table["Goals For"] - league_table["Goals Against"]
    )

    league_table = league_table.sort_values(
        by=["Points", "Goal Difference", "Goals For"], ascending=False
    )
    league_table["Position"] = range(1, len(league_table) + 1)

    return league_table.reset_index()

In [None]:
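# calculate the league table based on played matches
# (calculate_league_table expects "Home Goals", "Away Goals" and "Result" columns)
league_table = calculate_league_table(
    stats.rename(columns={"FTHG": "Home Goals", "FTAG": "Away Goals", "FTR": "Result"})
)
league_table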
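The ordering used in `calculate_league_table` sorts by points first, then goal difference, then goals scored. The sketch below shows the same ordering in plain Python on a few hypothetical teams. The function `rank_teams` and the team values are illustrative assumptions, and no claim is made that this matches the league's official tie-break rules.

```python
def rank_teams(rows):
    """Order teams by points, then goal difference, then goals for (all descending)."""
    ordered = sorted(
        rows,
        key=lambda r: (r["Points"], r["Goal Difference"], r["Goals For"]),
        reverse=True,
    )
    return [(position, r["Team"]) for position, r in enumerate(ordered, start=1)]


# Hypothetical teams: A, B and C are level on points
table_rows = [
    {"Team": "A", "Points": 30, "Goal Difference": 5, "Goals For": 20},
    {"Team": "B", "Points": 30, "Goal Difference": 5, "Goals For": 24},
    {"Team": "C", "Points": 30, "Goal Difference": 8, "Goals For": 18},
    {"Team": "D", "Points": 33, "Goal Difference": 1, "Goals For": 15},
]

print(rank_teams(table_rows))  # [(1, 'D'), (2, 'C'), (3, 'B'), (4, 'A')]
```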

## 2.5 Remaining Matches

In [None]:
# only keep matches that have not yet been played and reset index
remaining_matches = all_matches[all_matches["Result"].isnull()].reset_index(drop=True)

# drop the round number column and the result column
remaining_matches = remaining_matches.drop(columns=["Round Number", "Result"], axis=1)

remaining_matches.head()

## 2.6 Played Matches

In [None]:
# create played_matches from stats, which only contains matches that have already been played
played_matches = stats.copy()

# drop the Date, HS, AS, HST, AST columns
played_matches = played_matches.drop(columns=["Date", "HS", "AS", "HST", "AST"], axis=1)

# rename the FTHG column to Home Goals and FTAG to Away Goals and FTR to Result
played_matches = played_matches.rename(
    columns={"FTHG": "Home Goals", "FTAG": "Away Goals", "FTR": "Result"}
)

played_matches.head()

## 3. Poisson Model

We model the goals scored by the home and away team independently of each other, each with a Poisson distribution.
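With independent Poisson goal counts, the probability of a given scoreline is the product of the two individual probabilities, so the chances of a home win, draw and away win can be summed up directly. The sketch below illustrates this. The function names and the rates `1.6` and `1.1` are hypothetical, and the sum is truncated at `max_goals`, which is an assumption that ignores a negligible tail.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals for a Poisson rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def outcome_probabilities(lam_home, lam_away, max_goals=15):
    """Return (P(home win), P(draw), P(away win)) for independent Poisson goals."""
    p_home = p_draw = p_away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            # Independence: joint probability is the product of the marginals
            p = poisson_pmf(h, lam_home) * poisson_pmf(a, lam_away)
            if h > a:
                p_home += p
            elif h == a:
                p_draw += p
            else:
                p_away += p
    return p_home, p_draw, p_away


p_home, p_draw, p_away = outcome_probabilities(1.6, 1.1)
print(round(p_home, 3), round(p_draw, 3), round(p_away, 3))
```

The simulation in this notebook draws random scorelines instead of summing probabilities, but both approaches rest on the same independence assumption.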

## 3.1 Defining the Model

In [None]:
# defining a function to simulate a football match between two teams using their statistics and a poisson distribution
def simulate_match(home_team, away_team, home_stats, away_stats):
    """
    Simulates a football match between two teams based on their statistics.

    Parameters:
    home_team (str): The name of the home team.
    away_team (str): The name of the away team.
    home_stats (DataFrame): A DataFrame containing statistics for home teams.
    away_stats (DataFrame): A DataFrame containing statistics for away teams.

    Returns:
    tuple: A tuple containing the number of goals scored by the home team and the away team.
    """

    # Retrieve statistics for the home team
    home_team_stats = home_stats.loc[home_team]

    # Retrieve statistics for the away team
    away_team_stats = away_stats.loc[away_team]

    # Look up the precomputed efficiency metrics for the home team
    home_attack_eff = home_stats.loc[home_team, "home_attack_eff"]
    home_defense_eff = home_stats.loc[home_team, "home_defense_eff"]

    # Look up the precomputed efficiency metrics for the away team
    away_attack_eff = away_stats.loc[away_team, "away_attack_eff"]
    away_defense_eff = away_stats.loc[away_team, "away_defense_eff"]

    # Calculate expected goals for the home team
    expected_home_goals = (
        home_team_stats["avg_home_shots_made"]
        * home_attack_eff
        * (1 - away_defense_eff)
    )

    # Calculate expected goals for the away team
    expected_away_goals = (
        away_team_stats["avg_away_shots_made"]
        * away_attack_eff
        * (1 - home_defense_eff)
    )

    # Simulate the number of goals scored by the home team using a Poisson distribution
    home_goals = np.random.poisson(expected_home_goals)

    # Simulate the number of goals scored by the away team using a Poisson distribution
    away_goals = np.random.poisson(expected_away_goals)

    return home_goals, away_goals
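To make the rate used in `simulate_match` concrete, the sketch below works through the arithmetic with made-up inputs and then draws from the resulting Poisson distribution. The numbers (14 shots, 0.12 attack efficiency, 0.10 opponent defense efficiency) are hypothetical, and the sampler uses Knuth's multiplication method from the standard library as a stand-in for `np.random.poisson`.

```python
import math
import random


def expected_goals(avg_shots, attack_eff, opponent_defense_eff):
    """Poisson rate, following the formula used in simulate_match."""
    return avg_shots * attack_eff * (1 - opponent_defense_eff)


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


# 14 shots * 0.12 goals per shot * (1 - 0.10) = 1.512 expected goals
lam = expected_goals(avg_shots=14.0, attack_eff=0.12, opponent_defense_eff=0.10)
print(round(lam, 3))  # 1.512

# A seeded generator makes the draws reproducible
rng = random.Random(42)
draws = [sample_poisson(lam, rng) for _ in range(20000)]
print(sum(draws) / len(draws))  # close to lam
```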

In [None]:
# test the model for a single match
home_team = "Paris FC"
away_team = "RC Lens"

# simulate test match
simulate_match(home_team, away_team, home_stats, away_stats)

## 3.2 Simulating Remaining Matches

In [None]:
# define a function to simulate the rest of the season
def simulate_season(remaining_matches, home_stats, away_stats):
    """
    Simulates the remaining matches of a football season.

    Parameters:
    remaining_matches (DataFrame): A DataFrame containing the remaining matches to be played.
    home_stats (DataFrame): A DataFrame containing statistics for home teams.
    away_stats (DataFrame): A DataFrame containing statistics for away teams.

    Returns:
    DataFrame: A DataFrame containing the results of the simulated matches.
    """

    # Create a copy of the remaining matches DataFrame to store results
    simulated_results = remaining_matches.copy()

    # Iterate through each match in the remaining matches
    for index, row in simulated_results.iterrows():
        home_team = row["Home Team"]  # Home team name
        away_team = row["Away Team"]  # Away team name

        # Simulate the match and get the number of goals scored by each team
        home_goals, away_goals = simulate_match(
            home_team, away_team, home_stats, away_stats
        )

        # Store the simulated results in the DataFrame
        simulated_results.at[index, "Home Goals"] = home_goals
        simulated_results.at[index, "Away Goals"] = away_goals

        # Determine the result of the match based on the number of goals scored
        if home_goals > away_goals:
            simulated_results.at[index, "Result"] = "H"  # Home team wins
        elif away_goals > home_goals:
            simulated_results.at[index, "Result"] = "A"  # Away team wins
        else:
            simulated_results.at[index, "Result"] = "D"  # Draw

    return simulated_results  # Return the DataFrame with simulated results

In [None]:
# simulate the rest of the season
simulated_season_results = simulate_season(remaining_matches, home_stats, away_stats)
simulated_season_results

## 4. Monte Carlo Simulation

We will create a Monte Carlo Simulation with `N=1000` simulations of the rest of the season. 

In [None]:
# define a function to perform Monte Carlo simulation of the remaining matches
def monte_carlo_simulation(
    all_matches, played_matches, home_stats, away_stats, n_simulations=1000
):
    """
    Performs a Monte Carlo simulation of the remaining matches in a football season.

    Parameters:
    all_matches (DataFrame): A DataFrame containing all matches in the season.
    played_matches (DataFrame): A DataFrame containing the matches already played,
                                with "Home Goals", "Away Goals" and "Result" columns.
    home_stats (DataFrame): A DataFrame containing statistics for home teams.
    away_stats (DataFrame): A DataFrame containing statistics for away teams.
    n_simulations (int): The number of simulations to run.

    Returns:
    tuple: A list of league tables (one DataFrame per simulation) and a list of
           full season results (one DataFrame per simulation).
    """

    league_tables = []  # Initialize a list to store league tables from each simulation

    full_season_results_list = (
        []
    )  # Initialize a list to store full season results from each simulation

    # Perform the specified number of simulations
    for simulation_number in range(n_simulations):

        # Identify remaining matches that have not yet been played
        remaining_matches = all_matches[all_matches["Result"].isnull()].reset_index(
            drop=True
        )

        # drop the round number column and the result column
        remaining_matches = remaining_matches.drop(
            columns=["Round Number", "Result"], axis=1
        )

        # Simulate the remaining matches
        simulated_results = simulate_season(remaining_matches, home_stats, away_stats)

        # Combine simulated results with already played matches
        full_season_results = pd.concat(
            [played_matches, simulated_results], ignore_index=True
        )
        full_season_results_list.append(full_season_results)

        # Create the league table based on the full season results
        league_table = calculate_league_table(full_season_results)

        # Append the league table to the list
        league_tables.append(league_table)

    # Return the league tables and the full season results from all simulations
    return league_tables, full_season_results_list

In [None]:
# run monte carlo simulation with 1000 simulations
league_tables_simulations, full_season_results_simulations = monte_carlo_simulation(
    all_matches, played_matches, home_stats, away_stats, n_simulations=1000
)
# display first league table from simulations
league_tables_simulations[0]

# display first full season results from simulations
# full_season_results_simulations[0]
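The number of simulations determines how precisely a probability can be estimated. As a rough guide, and assuming the simulated seasons are independent, the standard error of an estimated probability `p` from `n` runs is `sqrt(p * (1 - p) / n)`. The helper below is a hypothetical illustration of that formula and is not part of the notebook.

```python
import math


def monte_carlo_standard_error(p, n_simulations):
    """Standard error of a probability estimated from independent simulations."""
    return math.sqrt(p * (1 - p) / n_simulations)


# Worst case p = 0.5: ten times more runs shrink the error by about sqrt(10)
for n in (1000, 10000):
    print(n, round(monte_carlo_standard_error(0.5, n), 4))
# 1000 0.0158
# 10000 0.005
```

With 1000 runs, an estimated probability near 0.5 is therefore uncertain by roughly 1.6 percentage points.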

## 5. Visualisation of Monte Carlo Results

In [None]:
# create a DataFrame to store the final positions of each team across all simulations
final_positions = pd.DataFrame(0, index=team_names, columns=range(1, 19))
# populate the final_positions DataFrame containing the count of final positions for each team
for league_table in league_tables_simulations:
    for _, row in league_table.iterrows():
        team = row["Team"]
        position = row["Position"]
        final_positions.at[team, position] += 1

# normalize the final_positions DataFrame to get probabilities
final_positions = final_positions.div(final_positions.sum(axis=1), axis=0)

# change the order so that each team is in the row of its current position in the league table
current_league_table = calculate_league_table(played_matches)
final_positions = final_positions.reindex(current_league_table["Team"])

# display final positions DataFrame
final_positions
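The counting and normalising step above can be illustrated without pandas. The sketch below turns a handful of simulated rankings into per-team position probabilities. The function `position_probabilities` and the teams `A`, `B` and `C` are hypothetical and only serve to show the idea.

```python
from collections import Counter


def position_probabilities(simulated_rankings):
    """Map each team to {position: share of simulations finishing there}.

    simulated_rankings is a list of rankings, each a list of team names
    ordered from first to last place.
    """
    counts = {}
    for ranking in simulated_rankings:
        for position, team in enumerate(ranking, start=1):
            counts.setdefault(team, Counter())[position] += 1
    n = len(simulated_rankings)
    return {
        team: {pos: count / n for pos, count in sorted(counter.items())}
        for team, counter in counts.items()
    }


# Four hypothetical simulated seasons with three teams
rankings = [["A", "B", "C"], ["A", "C", "B"], ["B", "A", "C"], ["A", "B", "C"]]
probabilities = position_probabilities(rankings)
print(probabilities["A"])  # {1: 0.75, 2: 0.25}
```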

In [None]:
# create a heatmap to visualize the probabilities of each team finishing in each position
plt.figure(figsize=(14, 10))
ax = sns.heatmap(
    final_positions,
    annot=True,
    fmt=".2f",
    cmap="YlGnBu",
    cbar_kws={"label": "Probability"},
)

plt.title("Probabilities of Final League Positions after Monte Carlo Simulations")
plt.xlabel("Final Position")
plt.ylabel("Team")
plt.yticks(rotation=0)

# move the x-axis ticks and label to the top
ax.xaxis.tick_top()
ax.xaxis.set_label_position("top")

plt.show()
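Each row of the heatmap is a probability distribution over final positions, so simple summaries can be read from it. The sketch below computes the expected final position and the probability of finishing in the top `k` places for one hypothetical row. The function names and the probabilities are illustrative assumptions, not results from the simulation.

```python
def expected_position(position_probs):
    """Probability-weighted average of the final position."""
    return sum(pos * p for pos, p in position_probs.items())


def top_k_probability(position_probs, k):
    """Probability of finishing in position k or better."""
    return sum(p for pos, p in position_probs.items() if pos <= k)


# Hypothetical row: 50% first, 30% second, 20% third
row = {1: 0.5, 2: 0.3, 3: 0.2}
print(round(expected_position(row), 2))     # 1.7
print(round(top_k_probability(row, 2), 2))  # 0.8
```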