# Predicting a Combined League Table Using Elo Ratings


# **Football Elo Rating System: Serie A & Ligue 1 (With Inter-League Matches)**

## **Overview**
This notebook implements an **Elo rating system** to evaluate the relative strength of football teams from **Serie A (Italy) and Ligue 1 (France)**. It processes real match results from both leagues, updates Elo ratings accordingly, and then simulates inter-league matches to form a combined ranking.

---



In [2]:
import pandas as pd
from IPython.display import display

# Load the datasets: bd = Serie A, fd = Ligue 1
file_path1 = "seriea-2324.csv"
file_path2 = "Ligue1-2324.csv"
bd = pd.read_csv(file_path1)
fd = pd.read_csv(file_path2)
print("\n")
display(fd)
display(bd)







Unnamed: 0,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,HTR,Referee,...,HST,AST,HF,AF,HC,AC,HY,AY,HR,AR
0,11/08/23,Nice,Lille,1,1,D,1,0,H,,...,3,5,13,9,3,3,3,2,0,0
1,12/08/23,Marseille,Reims,2,1,H,1,1,D,,...,4,4,7,16,10,10,1,3,0,0
2,12/08/23,Paris SG,Lorient,0,0,D,0,0,D,,...,4,0,8,6,9,2,0,0,0,0
3,13/08/23,Brest,Lens,3,2,H,1,2,A,,...,8,2,13,15,6,8,2,3,0,1
4,13/08/23,Clermont,Monaco,2,4,A,1,2,A,,...,7,8,7,16,6,2,0,4,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
301,19/05/24,Lyon,Strasbourg,2,1,H,1,0,H,,...,6,4,7,10,7,2,1,0,0,0
302,19/05/24,Metz,Paris SG,0,2,A,0,2,A,,...,2,8,6,9,1,11,2,1,0,0
303,19/05/24,Monaco,Nantes,4,0,H,3,0,H,,...,6,3,14,23,2,2,1,2,0,0
304,19/05/24,Reims,Rennes,2,1,H,0,0,D,,...,4,5,11,17,2,4,1,1,0,0


Unnamed: 0,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,HTR,Referee,...,HST,AST,HF,AF,HC,AC,HY,AY,HR,AR
0,19/08/23,Empoli,Verona,0,1,A,0,0,D,,...,4,4,17,18,2,4,2,2,0,0
1,19/08/23,Frosinone,Napoli,1,3,A,1,2,A,,...,1,8,14,17,4,6,3,3,0,0
2,19/08/23,Genoa,Fiorentina,1,4,A,0,3,A,,...,2,5,14,13,3,4,2,3,0,0
3,19/08/23,Inter,Monza,2,0,H,1,0,H,,...,3,2,8,13,8,3,1,1,0,0
4,20/08/23,Roma,Salernitana,2,2,D,1,1,D,,...,3,2,12,9,9,1,0,4,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
375,26/05/24,Empoli,Roma,2,1,H,1,1,D,,...,7,3,18,12,4,5,4,1,0,0
376,26/05/24,Frosinone,Udinese,0,1,A,0,0,D,,...,5,5,6,13,11,1,0,2,0,0
377,26/05/24,Lazio,Sassuolo,1,1,D,0,0,D,,...,6,3,15,11,5,3,3,1,0,0
378,26/05/24,Verona,Inter,2,2,D,2,2,D,,...,8,11,17,8,9,3,1,1,0,0


### Now let's check for missing values that could affect our analysis

In [3]:
# Check for missing values
print("Serie A Missing Values:\n", bd.isnull().sum(), "\n")
print("Ligue 1 Missing Values:\n", fd.isnull().sum(), "\n")

# Ensure column names are the same
print("Serie A Columns:", bd.columns)
print("Ligue 1 Columns:", fd.columns)

# Check unique teams in both leagues
print("Serie A Teams:", bd["HomeTeam"].unique())
print("Ligue 1 Teams:", fd["HomeTeam"].unique())

Serie A Missing Values:
 Date          0
HomeTeam      0
AwayTeam      0
FTHG          0
FTAG          0
FTR           0
HTHG          0
HTAG          0
HTR           0
Referee     380
HS            0
AS            0
HST           0
AST           0
HF            0
AF            0
HC            0
AC            0
HY            0
AY            0
HR            0
AR            0
dtype: int64 

Ligue 1 Missing Values:
 Date          0
HomeTeam      0
AwayTeam      0
FTHG          0
FTAG          0
FTR           0
HTHG          0
HTAG          0
HTR           0
Referee     306
HS            0
AS            0
HST           0
AST           0
HF            0
AF            0
HC            0
AC            0
HY            0
AY            0
HR            0
AR            0
dtype: int64 

Serie A Columns: Index(['Date', 'HomeTeam', 'AwayTeam', 'FTHG', 'FTAG', 'FTR', 'HTHG', 'HTAG',
       'HTR', 'Referee', 'HS', 'AS', 'HST', 'AST', 'HF', 'AF', 'HC', 'AC',
       'HY', 'AY', 'HR', 'AR'],
      dtype='o

### The "Referee" column is entirely missing (100% NaN) in both leagues. We will drop it, since it is irrelevant for Elo ratings, and convert the Date column to datetime format.


In [4]:
# Drop the Referee column
bd.drop(columns=["Referee"],inplace=True)
fd.drop(columns=["Referee"],inplace=True)

# Convert Date to datetime
bd["Date"] = pd.to_datetime(bd["Date"], format="%d/%m/%y")
fd["Date"] = pd.to_datetime(fd["Date"], format="%d/%m/%y")
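
As a quick sanity check of these two cleaning steps, here is a minimal sketch on a toy DataFrame. The `toy` frame is made up for illustration and is not part of the datasets. It uses `dropna(axis=1, how="all")` as a generic alternative to naming the column, and it shows that `format="%d/%m/%y"` reads `11/08/23` as 11 August 2023 rather than 8 November.

```python
import pandas as pd

# Toy frame mimicking the relevant columns (illustrative values only)
toy = pd.DataFrame({
    "Date": ["11/08/23", "19/05/24"],
    "HomeTeam": ["Nice", "Lyon"],
    "Referee": [None, None],  # entirely missing, as in both datasets
})

# Drop every column that contains only missing values
toy = toy.dropna(axis=1, how="all")

# Parse day-first dates with a two-digit year
toy["Date"] = pd.to_datetime(toy["Date"], format="%d/%m/%y")

print(toy.columns.tolist())
print(toy["Date"].dt.strftime("%Y-%m-%d").tolist())
```

Passing an explicit `format` avoids any ambiguity between day-first and month-first dates.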


## With the data cleaned, the next step is to define the Elo rating update formula that will be applied to each match in our dataset.
### 1. Expected Score Calculation

Each team's expected score is computed using:

\[
E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}
\]

Where:

- \( E_A \) = Expected score of Team A
- \( R_A \) = Current rating of Team A
- \( R_B \) = Current rating of Team B

### 2. Rating Update

After the match, the rating is updated as:

\[
R_A^{new} = R_A + K (S_A - E_A)
\]

Where:

- \( R_A^{new} \) = Updated rating of Team A
- \( K \) = Adjustment factor (typically 20-40 for football)
- \( S_A \) = Actual match result:
  - 1 if Team A wins
  - 0.5 if the match is a draw
  - 0 if Team A loses
- \( E_A \) = Expected score of Team A

    

In [11]:
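def update_elo(rating_A, rating_B, result, K=40, home_advantage=50):
    """Return the updated (home, away) ratings after one match.

    rating_A is the home team's rating, rating_B the away team's.
    result is the home team's score: 1 (win), 0.5 (draw) or 0 (loss).
    """
    # Home advantage is added to the home rating only when computing expectations
    expected_A = 1 / (1 + 10 ** ((rating_B - (rating_A + home_advantage)) / 400))
    expected_B = 1 - expected_A  # Expected scores sum to 1

    new_rating_A = rating_A + K * (result - expected_A)
    new_rating_B = rating_B + K * ((1 - result) - expected_B)

    return new_rating_A, new_rating_B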
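
To see the function in action, the sketch below re-declares `update_elo` with the same logic (so the block runs on its own) and applies it to two teams that both start at 1500. With a 50-point home advantage the home side is expected to score about 0.571, so a home win earns it fewer points than an away win costs it.

```python
def update_elo(rating_A, rating_B, result, K=40, home_advantage=50):
    """Same logic as the notebook's update_elo; team A is the home side."""
    expected_A = 1 / (1 + 10 ** ((rating_B - (rating_A + home_advantage)) / 400))
    expected_B = 1 - expected_A
    new_rating_A = rating_A + K * (result - expected_A)
    new_rating_B = rating_B + K * ((1 - result) - expected_B)
    return new_rating_A, new_rating_B

# Two teams on the default rating of 1500, one line per possible outcome
for label, result in [("home win", 1), ("draw", 0.5), ("away win", 0)]:
    home, away = update_elo(1500, 1500, result)
    # The update is zero-sum: whatever one side gains, the other loses
    print(f"{label:>8}: home={home:.6f} away={away:.6f} total={home + away:.1f}")
```

The away-win case gives 1477.141475 for the home team and 1522.858525 for the away team, which are the values shown in the first rows of the Elo history printed later in the notebook.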

    



## **Steps in Elo Rating Calculation**

### **1. Initialize Elo Ratings**
- Each team starts with a default **Elo rating of 1500**.
- Teams are loaded from datasets of **Serie A (`bd`)** and **Ligue 1 (`fd`)**.

### **2. Process Intra-League Matches**
- Matches within **Serie A** and **Ligue 1** are processed separately.
- Elo ratings are updated based on the match outcome:
  - **Win**: 1 point
  - **Draw**: 0.5 points
  - **Loss**: 0 points
- The `update_elo` function recalculates ratings after each match.
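
The datasets also carry the full-time result in the `FTR` column (`H`, `D` or `A`), so the score can be read directly instead of comparing goals. A minimal sketch follows. `FTR_TO_SCORE` and `score_from_goals` are illustrative names that are not used elsewhere in the notebook, and the sample rows are copied from the Ligue 1 preview above.

```python
# Map the full-time result code to the home team's Elo score
FTR_TO_SCORE = {"H": 1.0, "D": 0.5, "A": 0.0}

def score_from_goals(home_goals, away_goals):
    """Home team's score derived from the goal counts."""
    if home_goals > away_goals:
        return 1.0
    if home_goals < away_goals:
        return 0.0
    return 0.5

# (HomeTeam, AwayTeam, FTHG, FTAG, FTR)
sample = [
    ("Nice", "Lille", 1, 1, "D"),
    ("Marseille", "Reims", 2, 1, "H"),
    ("Clermont", "Monaco", 2, 4, "A"),
]

for home, away, fthg, ftag, ftr in sample:
    # Both routes should agree on every row
    assert FTR_TO_SCORE[ftr] == score_from_goals(fthg, ftag)
    print(home, "vs", away, "->", FTR_TO_SCORE[ftr])
```

Either route works; comparing the two on the full dataset is a cheap consistency check on the source data.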

### **3. Simulate Inter-League Matches**
- Each **Serie A team** plays home and away matches against each **Ligue 1 team**.
- Elo ratings determine win probabilities using the formula:

  \[
  P_{home} = \frac{1}{1 + 10^{\left(\frac{R_{away} - (R_{home} + H)}{400}\right)}}
  \]

  where \( R_{home} \) and \( R_{away} \) are the current Elo ratings of the two teams and \( H \) is the **home advantage** (set to 50 Elo points).
- Match outcomes are simulated randomly based on these probabilities.
- Elo ratings are updated accordingly.
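
A minimal sketch of this simulation step for a single fixture. `simulate_match` is a hypothetical helper, and the seeded `random.Random` instance is an addition for reproducibility; the notebook itself calls the unseeded `random.random()`. Because the home and away win probabilities sum to 1, this scheme only ever yields a home win or an away win.

```python
import random

def simulate_match(home_elo, away_elo, rng, home_adv=50):
    """Return 1 for a simulated home win, 0 for an away win (hypothetical helper)."""
    home_prob = 1 / (1 + 10 ** ((away_elo - (home_elo + home_adv)) / 400))
    return 1 if rng.random() < home_prob else 0

rng = random.Random(42)  # fixed seed so repeated runs give the same results

# Equal ratings: the home side should win roughly 57% of the time
n = 10_000
home_wins = sum(simulate_match(1500, 1500, rng) for _ in range(n))
print(f"home win share: {home_wins / n:.3f}")
```

Without a fixed seed, every run of the notebook produces a different set of inter-league results and therefore a different final table.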

### **4. Final Elo Rankings**
- After processing **both intra-league and inter-league matches**, the final Elo ratings are compiled into a **ranking table**.
- The rankings provide a combined view of **team strength across both leagues**.

---

## **Key Outputs**
- `elo_df`: A DataFrame with one row per match (real and simulated), holding both teams' post-match Elo ratings and the match result.
- `final_elo_table`: The final ranking of all teams based on their Elo ratings.
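
One way to read the combined table is to tag each team with its league and compare the two groups. The sketch below uses made-up ratings and a hypothetical `League` column. In the notebook the membership sets would be `serie_a_teams` and `ligue_1_teams`.

```python
import pandas as pd

# Illustrative ratings only, not results from the notebook
ratings = {"Inter": 1650.0, "Milan": 1600.0, "Monaco": 1620.0, "Lille": 1580.0}
serie_a = {"Inter", "Milan"}

table = pd.DataFrame(list(ratings.items()), columns=["Team", "Elo"])

# Tag each team by set membership (anything not in Serie A is Ligue 1 here)
table["League"] = table["Team"].map(lambda t: "Serie A" if t in serie_a else "Ligue 1")
table = table.sort_values(by="Elo", ascending=False).reset_index(drop=True)

print(table)
print(table.groupby("League")["Elo"].mean())
```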




In [18]:
import random
elo_ratings = {}
initial_elo = 1500  # Default Elo rating for new teams

# Load teams from Serie A & Ligue 1 datasets
serie_a_teams = set(bd["HomeTeam"])  # Serie A teams from 'bd'
ligue_1_teams = set(fd["HomeTeam"])  # Ligue 1 teams from 'fd'
all_teams = serie_a_teams.union(ligue_1_teams)

# Assign initial Elo to all teams
for team in all_teams:
    elo_ratings.setdefault(team, initial_elo)

elo_history = []  # Store Elo ratings after each match

# Process intra-league matches (Serie A and Ligue 1 separately)
for df in [bd, fd]:  
    for index, row in df.iterrows():
        home_team = row["HomeTeam"]
        away_team = row["AwayTeam"]
        home_goals = row["FTHG"]
        away_goals = row["FTAG"]

        # Determine actual match result
        if home_goals > away_goals:
            result = 1  # Home win
        elif home_goals < away_goals:
            result = 0  # Away win
        else:
            result = 0.5  # Draw

        # Update Elo ratings with home advantage
        elo_ratings[home_team], elo_ratings[away_team] = update_elo(
            elo_ratings[home_team], elo_ratings[away_team], result
        )

        # Store match result
        elo_history.append({
            "Date": row["Date"],
            "HomeTeam": home_team,
            "AwayTeam": away_team,
            "HomeElo": elo_ratings[home_team],
            "AwayElo": elo_ratings[away_team],
            "Result": result
        })

# Simulate inter-league matches (home and away)
inter_league_matches = []

for sa_team in serie_a_teams:
    for l1_team in ligue_1_teams:
        for home, away, home_adv in [(sa_team, l1_team, 50), (l1_team, sa_team, 50)]:
            home_elo = elo_ratings[home]
            away_elo = elo_ratings[away]

            # Calculate the home win probability with home advantage
            home_prob = 1 / (1 + 10 ** ((away_elo - (home_elo + home_adv)) / 400))

            # Simulate match result. The two-outcome Elo model leaves no
            # probability for a draw (home and away win probabilities sum
            # to 1), so the simulation yields either a home or an away win.
            rand = random.random()
            if rand < home_prob:
                result = 1  # Home team wins
            else:
                result = 0  # Away team wins

            # Update Elo ratings
            elo_ratings[home], elo_ratings[away] = update_elo(
                elo_ratings[home], elo_ratings[away], result, home_advantage=home_adv
            )

            # Store match result
            inter_league_matches.append({
                "Date": "Inter-League",
                "HomeTeam": home,
                "AwayTeam": away,
                "HomeElo": elo_ratings[home],
                "AwayElo": elo_ratings[away],
                "Result": result
            })

# Combine all match results
elo_history.extend(inter_league_matches)

# Convert to DataFrame
elo_df = pd.DataFrame(elo_history)

# Final combined Elo table
final_elo_table = pd.DataFrame(elo_ratings.items(), columns=["Team", "Elo"])
final_elo_table = final_elo_table.sort_values(by="Elo", ascending=False).reset_index(drop=True)

print(elo_df.head())  # First few matches in Elo history
print(final_elo_table)  # Final Elo ranking across both leagues


                  Date   HomeTeam     AwayTeam      HomeElo      AwayElo  \
0  2023-08-19 00:00:00     Empoli       Verona  1477.141475  1522.858525   
1  2023-08-19 00:00:00  Frosinone       Napoli  1477.141475  1522.858525   
2  2023-08-19 00:00:00      Genoa   Fiorentina  1477.141475  1522.858525   
3  2023-08-19 00:00:00      Inter        Monza  1517.141475  1482.858525   
4  2023-08-20 00:00:00       Roma  Salernitana  1497.141475  1502.858525   

   Result  
0     0.0  
1     0.0  
2     0.0  
3     1.0  
4     0.5  
           Team          Elo
0         Reims  1790.794140
1         Lille  1721.493783
2        Rennes  1703.659373
3        Monaco  1701.248226
4      Juventus  1682.842554
5       Bologna  1663.483553
6          Lens  1648.424564
7         Milan  1632.374573
8         Inter  1622.544033
9          Nice  1595.081762
10     Atalanta  1585.661233
11     Paris SG  1569.391102
12       Verona  1564.229034
13     Clermont  1562.030174
14     Cagliari  1556.071590
15     