This analysis will centre on the 24 teams in the 2024 UEFA European Football Championship. Data will be collected from a variety of sources, all of which may play a role in influencing a nation's odds of winning. This data will be merged to create one coherent dataset.

# World ranking

The first metric that could help predict a nation's chance of winning the tournament is their world ranking. This is a metric that ranks each of the countries based on their previous sporting performance. This data is scraped from an online source.

In [None]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

# URL of the webpage
url = "https://www.sportingnews.com/uk/football/news/fifa-rankings-euro-2024-teams-soccer-uefa-euro-championship/880ff126e23e0e4833ce8e08"

# Fetch the webpage content
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

# Extract the table data
table_data = []
table = soup.find('table')

for row in table.find_all('tr'):
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    if cols:
        table_data.append(cols)

# Create a DataFrame
df = pd.DataFrame(table_data, columns=['Team', 'FIFA Rank'])

# Drop the first row
df = df.drop(0)

# Sort table alphabetically by Team
df_sorted = df.sort_values(by='Team')

#Rename Czechia for consistency across datasets
df_sorted['Team'] = df_sorted['Team'].replace('Czechia', 'Czech Republic')

# Reset index after dropping rows and rearranging order
df_sorted.reset_index(drop=True, inplace=True)
df_sorted

Unnamed: 0,Team,FIFA Rank
0,Albania,66
1,Austria,25
2,Belgium,3
3,Croatia,10
4,Czech Republic,36
5,Denmark,21
6,England,4
7,France,2
8,Georgia,75
9,Germany,16
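
The row-extraction logic in the scraping cell can be tried offline, without requesting the live page. The sketch below is an illustration only: it uses the standard-library `HTMLParser` instead of BeautifulSoup, and `sample_html` is a hypothetical fragment, not the real markup of the source page. It also converts the rank to an integer, which the scraped table leaves as text.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows
        self._row = None        # cells of the row currently being read
        self._in_cell = False   # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # rows without <td> cells (e.g. <th> headers) are skipped
                self.rows.append([cell.strip() for cell in self._row])
            self._row = None


# Hypothetical page fragment, NOT the real markup of the source page
sample_html = """
<table>
  <tr><th>Team</th><th>FIFA Rank</th></tr>
  <tr><td>France</td><td>2</td></tr>
  <tr><td>Czechia</td><td>36</td></tr>
</table>
"""

parser = TableRows()
parser.feed(sample_html)
ranking = {team: int(rank) for team, rank in parser.rows}
print(ranking)
```

Storing the rank as an integer at this stage would avoid a later type conversion before modelling.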


# Manager data

Each football team has a manager who is responsible for picking players and organising them into a coherent team. A manager's experience and time in charge may be meaningful metrics when it comes to tournament odds.

**Note:** For managers with ? in the 'Contract until' column, this information was not stored in the online dataset. This will be resolved later in the analysis.

In [None]:
#Copied and pasted into excel from https://www.transfermarkt.co.uk/europameisterschaft-2024/trainer/pokalwettbewerb/EM24
managers = pd.read_csv('Managers.csv', encoding = 'ISO-8859-1')
managers

Unnamed: 0,Manager,Country,Age,Installation,Contract until
0,Didier Deschamps,France,55,11 years 11 months 04 days,2026
1,Gareth Southgate,England,53,07 years 08 months 15 days,2024
2,Zlatko Dalic,Croatia,57,06 years 08 months 05 days,2026
3,Marco Rossi,Hungary,59,05 years 11 months 24 days,2025
4,Matjaz Kek,Slovenia,62,05 years 06 months 16 days,2025
5,Steve Clarke,Scotland,60,05 years 23 days,2026
6,Kasper Hjulmand,Denmark,52,03 years 10 months 11 days,2026
7,Willy Sagnol,Georgia,47,03 years 03 months 28 days,2024
8,Dragan Stojkovic,Serbia,59,03 years 03 months 09 days,?
9,Murat Yakin,Switzerland,49,02 years 10 months 03 days,2024


These managers are sorted by the country they manage and the dataset index is reset.

In [None]:
# Sort table alphabetically by Team
managers_sorted = managers.sort_values(by='Country')
# Reset index after dropping rows and rearranging order
managers_sorted.reset_index(drop=True, inplace=True)
managers_sorted

Unnamed: 0,Manager,Country,Age,Installation,Contract until
0,Sylvinho,Albania,50,01 year 05 months 10 days,2024
1,Ralf Rangnick,Austria,65,02 years 11 days,2025
2,Domenico Tedesco,Belgium,38,01 year 04 months 04 days,2026
3,Zlatko Dalic,Croatia,57,06 years 08 months 05 days,2026
4,Ivan Hasek,Czech Republic,60,05 months 08 days,2025
5,Kasper Hjulmand,Denmark,52,03 years 10 months 11 days,2026
6,Gareth Southgate,England,53,07 years 08 months 15 days,2024
7,Didier Deschamps,France,55,11 years 11 months 04 days,2026
8,Willy Sagnol,Georgia,47,03 years 03 months 28 days,2024
9,Julian Nagelsmann,Germany,36,08 months 21 days,2026


Age is renamed as manager age to avoid confusion with player age when data is merged later in the analysis.

In [None]:
managers_sorted = managers_sorted.rename(columns={'Age': 'Manager_Age'})
managers_sorted.head(3)

Unnamed: 0,Manager,Country,Manager_Age,Installation,Contract until
0,Sylvinho,Albania,50,01 year 05 months 10 days,2024
1,Ralf Rangnick,Austria,65,02 years 11 days,2025
2,Domenico Tedesco,Belgium,38,01 year 04 months 04 days,2026


In [None]:
managers_sorted = managers_sorted.rename(columns={'Country': 'Team'})
managers_sorted.head(3)

Unnamed: 0,Manager,Team,Manager_Age,Installation,Contract until
0,Sylvinho,Albania,50,01 year 05 months 10 days,2024
1,Ralf Rangnick,Austria,65,02 years 11 days,2025
2,Domenico Tedesco,Belgium,38,01 year 04 months 04 days,2026


In [None]:
managers_sorted

Unnamed: 0,Manager,Team,Manager_Age,Installation,Contract until
0,Sylvinho,Albania,50,01 year 05 months 10 days,2024
1,Ralf Rangnick,Austria,65,02 years 11 days,2025
2,Domenico Tedesco,Belgium,38,01 year 04 months 04 days,2026
3,Zlatko Dalic,Croatia,57,06 years 08 months 05 days,2026
4,Ivan Hasek,Czech Republic,60,05 months 08 days,2025
5,Kasper Hjulmand,Denmark,52,03 years 10 months 11 days,2026
6,Gareth Southgate,England,53,07 years 08 months 15 days,2024
7,Didier Deschamps,France,55,11 years 11 months 04 days,2026
8,Willy Sagnol,Georgia,47,03 years 03 months 28 days,2024
9,Julian Nagelsmann,Germany,36,08 months 21 days,2026


In [None]:
managers_sorted['Team'] = managers_sorted['Team'].replace('Türkiye', 'Turkey')

# Players

Each nation's squad is generally made up of 20-25 players. Personal data such as name, position and age give generic information about each player. The number of caps and goals each player has, as well as their 'Market Value', could be considered more indicative of the player's sporting quality and experience.

In [None]:
#Data downloaded from https://www.kaggle.com/datasets/damirdizdarevic/uefa-euro-2024-players
# Accented names in this file are stored as UTF-8; reading them as ISO-8859-1 garbles them
players = pd.read_csv('euro2024_players.csv', encoding='utf-8')
players

Unnamed: 0,Name,Position,Age,Club,Height,Foot,Caps,Goals,MarketValue,Country
0,Marc-André ter Stegen,Goalkeeper,32,FC Barcelona,187,right,40,0,28000000,Germany
1,Manuel Neuer,Goalkeeper,38,Bayern Munich,193,right,119,0,4000000,Germany
2,Oliver Baumann,Goalkeeper,34,TSG 1899 Hoffenheim,187,right,0,0,3000000,Germany
3,Nico Schlotterbeck,Centre-Back,24,Borussia Dortmund,191,left,12,0,40000000,Germany
4,Jonathan Tah,Centre-Back,28,Bayer 04 Leverkusen,195,right,25,0,30000000,Germany
...,...,...,...,...,...,...,...,...,...,...
618,Adam Hlozek,Second Striker,21,Bayer 04 Leverkusen,188,right,31,2,12000000,Czech Republic
619,Patrik Schick,Centre-Forward,28,Bayer 04 Leverkusen,191,left,37,18,22000000,Czech Republic
620,Mojmír Chytil,Centre-Forward,25,SK Slavia Prague,187,-,12,4,6500000,Czech Republic
621,Jan Kuchta,Centre-Forward,27,AC Sparta Prague,185,right,20,3,5000000,Czech Republic


These players are sorted by the country they play for and the index is reset.

In [None]:
# Sort table alphabetically by Country
players_sorted = players.sort_values(by='Country')
# Reset index after dropping rows and rearranging order
players_sorted.reset_index(drop=True, inplace=True)
players_sorted

Unnamed: 0,Name,Position,Age,Club,Height,Foot,Caps,Goals,MarketValue,Country
0,Elhan Kastrati,Goalkeeper,27,AS Cittadella,189,right,2,0,1300000,Albania
1,Thomas Strakosha,Goalkeeper,29,Brentford FC,193,right,28,0,3000000,Albania
2,Etrit Berisha,Goalkeeper,35,FC Empoli,194,left,80,0,500000,Albania
3,Berat Djimsiti,Centre-Back,31,Atalanta BC,190,right,57,1,10000000,Albania
4,Marash Kumbulla,Centre-Back,24,US Sassuolo,191,right,19,0,4500000,Albania
...,...,...,...,...,...,...,...,...,...,...
618,Mykola Matvienko,Centre-Back,28,Shakhtar Donetsk,182,left,64,0,18000000,Ukraine
619,Ilya Zabarnyi,Centre-Back,21,AFC Bournemouth,189,right,35,1,32000000,Ukraine
620,Georgiy Bushchan,Goalkeeper,30,Dynamo Kyiv,196,right,18,0,7000000,Ukraine
621,Oleksandr Tymchyk,Right-Back,27,Dynamo Kyiv,180,right,17,1,4000000,Ukraine


In [None]:
player_metrics = players_sorted[['Age', 'Height', 'Caps', 'Goals', 'MarketValue', 'Country']]
player_metrics

Unnamed: 0,Age,Height,Caps,Goals,MarketValue,Country
0,27,189,2,0,1300000,Albania
1,29,193,28,0,3000000,Albania
2,35,194,80,0,500000,Albania
3,31,190,57,1,10000000,Albania
4,24,191,19,0,4500000,Albania
...,...,...,...,...,...,...
618,28,182,64,0,18000000,Ukraine
619,21,189,35,1,32000000,Ukraine
620,30,196,18,0,7000000,Ukraine
621,27,180,17,1,4000000,Ukraine


Grouping the players by country allows the squads to be compared easily on the key playing metrics. 'Grouping' here means gathering the players of each country together and then calculating the mean of every numeric metric across that country's players.

This leaves the dataset as one row per country with the metrics representing the average value for the players in that squad.
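
As a minimal sketch of what this grouping does, the toy example below averages two metrics for two placeholder countries. The table `toy_players` and all of its values are invented for illustration and are not taken from the players dataset.

```python
import pandas as pd

# Hypothetical mini squad list (placeholder countries, invented values)
toy_players = pd.DataFrame({
    "Country": ["A", "A", "B", "B", "B"],
    "Age": [24, 30, 21, 27, 33],
    "Caps": [10, 50, 0, 20, 40],
})

# One row per country; each metric becomes the squad average
toy_grouped = toy_players.groupby("Country").mean(numeric_only=True).reset_index()
print(toy_grouped)
```

The mean is sensitive to outliers, so a single very valuable player can lift a squad's average market value considerably. If that becomes a concern, the median may be a reasonable alternative aggregate.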

In [None]:
players_grouped = player_metrics.groupby('Country').mean().reset_index()
players_grouped

Unnamed: 0,Country,Age,Height,Caps,Goals,MarketValue
0,Albania,27.307692,183.615385,26.115385,1.538462,4292308.0
1,Austria,26.807692,183.192308,23.576923,3.576923,9057692.0
2,Belgium,26.88,184.68,37.96,7.08,23380000.0
3,Croatia,27.692308,184.115385,44.307692,5.653846,12603850.0
4,Czech Republic,25.307692,185.538462,15.576923,2.5,7457692.0
5,Denmark,27.692308,186.269231,41.192308,5.192308,15980770.0
6,England,26.076923,182.461538,25.038462,3.846154,58269230.0
7,France,26.88,184.44,33.44,7.68,49360000.0
8,Georgia,27.153846,184.5,28.846154,2.461538,6159615.0
9,Germany,28.115385,185.384615,34.846154,5.153846,32730770.0


In [None]:
players_grouped = players_grouped.rename(columns={'Country': 'Team'})
players_grouped.head(3)

Unnamed: 0,Team,Age,Height,Caps,Goals,MarketValue
0,Albania,27.307692,183.615385,26.115385,1.538462,4292308.0
1,Austria,26.807692,183.192308,23.576923,3.576923,9057692.0
2,Belgium,26.88,184.68,37.96,7.08,23380000.0


In [None]:
# Replace 'Turkiye' with 'Turkey' in the 'Team' column
players_grouped['Team'] = players_grouped['Team'].replace('Turkiye', 'Turkey')

In [None]:
# Round 'MarketValue' to whole numbers and store it as an integer column,
# so it displays without scientific notation but remains numeric
players_grouped['MarketValue'] = players_grouped['MarketValue'].round().astype('int64')

players_grouped

Unnamed: 0,Team,Age,Height,Caps,Goals,MarketValue
0,Albania,27.307692,183.615385,26.115385,1.538462,4292308
1,Austria,26.807692,183.192308,23.576923,3.576923,9057692
2,Belgium,26.88,184.68,37.96,7.08,23380000
3,Croatia,27.692308,184.115385,44.307692,5.653846,12603846
4,Czech Republic,25.307692,185.538462,15.576923,2.5,7457692
5,Denmark,27.692308,186.269231,41.192308,5.192308,15980769
6,England,26.076923,182.461538,25.038462,3.846154,58269231
7,France,26.88,184.44,33.44,7.68,49360000
8,Georgia,27.153846,184.5,28.846154,2.461538,6159615
9,Germany,28.115385,185.384615,34.846154,5.153846,32730769


# Historical results

Historical results refer to the games each nation has previously played. They are useful data points because they indicate a team's quality and ability to win in recent times. Data from as early as 1872 clearly isn't relevant to this summer's tournament, so results are only considered if they were played after the most recent European Football Championship (Summer 2021). Matches that involve no team competing in this summer's tournament are not considered. A win percentage is then calculated for each competing team as an indicator of its match-winning ability over the last 3 years.
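
The win percentage is simply wins divided by matches played, counting both home and away fixtures. A minimal plain-Python sketch of that calculation follows; the function name `win_percentage`, the placeholder teams and the scorelines are all invented for illustration.

```python
def win_percentage(results, team):
    """Share of a team's matches that it won, as a percentage.

    `results` is a list of (home_team, away_team, home_score, away_score).
    """
    played = wins = 0
    for home, away, home_score, away_score in results:
        if team not in (home, away):
            continue  # match does not involve this team
        played += 1
        if team == home:
            team_score, opponent_score = home_score, away_score
        else:
            team_score, opponent_score = away_score, home_score
        if team_score > opponent_score:
            wins += 1
    return 100 * wins / played if played else 0.0


# Invented scorelines between placeholder teams
toy_results = [
    ("A", "B", 2, 1),  # A wins at home
    ("B", "A", 0, 0),  # draw
    ("C", "A", 1, 3),  # A wins away
    ("A", "C", 0, 1),  # A loses at home
]
print(win_percentage(toy_results, "A"))
```

Draws and losses both count against the percentage in this definition, so two teams with the same win percentage can still have quite different records.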

In [None]:
#Data was downloaded from: https://www.kaggle.com/datasets/martj42/international-football-results-from-1872-to-2017?select=results.csv
# Accented place names in this file are stored as UTF-8; reading them as ISO-8859-1 garbles them
results = pd.read_csv('results.csv', encoding='utf-8')
results

Unnamed: 0,date,home_team,away_team,home_score,away_score,tournament,city,country,neutral
0,1872-11-30,Scotland,England,0.0,0.0,Friendly,Glasgow,Scotland,False
1,1873-03-08,England,Scotland,4.0,2.0,Friendly,London,England,False
2,1874-03-07,Scotland,England,2.0,1.0,Friendly,Glasgow,Scotland,False
3,1875-03-06,England,Scotland,2.0,2.0,Friendly,London,England,False
4,1876-03-04,Scotland,England,3.0,0.0,Friendly,Glasgow,Scotland,False
...,...,...,...,...,...,...,...,...,...
47320,2024-07-06,,,,,UEFA Euro,Düsseldorf,Germany,True
47321,2024-07-06,,,,,UEFA Euro,Berlin,Germany,True
47322,2024-07-09,,,,,UEFA Euro,Munich,Germany,True
47323,2024-07-10,,,,,UEFA Euro,Dortmund,Germany,True


In [None]:
# Convert the 'date' column to datetime format
results['date'] = pd.to_datetime(results['date'])
# Define the window of interest
cutoff_date = pd.Timestamp('2021-07-12') #Day after last euros tournament final
cutoff_date2 = pd.Timestamp('2024-06-12') #Day of data collection
# Keep only matches played strictly between the two cutoff dates
results_filtered = results[(results['date'] > cutoff_date) & (results['date'] < cutoff_date2)]
results_filtered

Unnamed: 0,date,home_team,away_team,home_score,away_score,tournament,city,country,neutral
44159,2021-07-13,Botswana,Zambia,1.0,2.0,COSAFA Cup,Port Elizabeth,South Africa,True
44160,2021-07-13,South Africa,Lesotho,4.0,0.0,COSAFA Cup,Port Elizabeth,South Africa,False
44161,2021-07-13,Senegal,Zimbabwe,2.0,1.0,COSAFA Cup,Port Elizabeth,South Africa,True
44162,2021-07-13,Malawi,Namibia,1.0,1.0,COSAFA Cup,Port Elizabeth,South Africa,True
44163,2021-07-13,Qatar,Panama,3.0,3.0,Gold Cup,Houston,United States,True
...,...,...,...,...,...,...,...,...,...
47262,2024-06-11,Saint Kitts and Nevis,Bahamas,1.0,0.0,FIFA World Cup qualification,Basseterre,Saint Kitts and Nevis,False
47263,2024-06-11,Saint Lucia,Aruba,2.0,2.0,FIFA World Cup qualification,Bridgetown,Barbados,True
47264,2024-06-11,Guyana,Belize,3.0,1.0,FIFA World Cup qualification,Bridgetown,Barbados,True
47265,2024-06-11,Dominican Republic,British Virgin Islands,4.0,0.0,FIFA World Cup qualification,San Cristóbal,Dominican Republic,False


In [None]:
# Filter rows where either home_team or away_team is in df_sorted['Team']
filtered_results2 = results_filtered[(results_filtered['home_team'].isin(df_sorted['Team'])) | (results_filtered['away_team'].isin(df_sorted['Team']))]
filtered_results2

Unnamed: 0,date,home_team,away_team,home_score,away_score,tournament,city,country,neutral
44204,2021-09-01,Qatar,Serbia,0.0,4.0,Friendly,Debrecen,Hungary,True
44205,2021-09-01,Switzerland,Greece,2.0,1.0,Friendly,Basel,Switzerland,False
44212,2021-09-01,Portugal,Republic of Ireland,2.0,1.0,FIFA World Cup qualification,Faro-Loulé,Portugal,False
44214,2021-09-01,Kazakhstan,Ukraine,2.0,2.0,FIFA World Cup qualification,Nur-Sultan,Kazakhstan,False
44215,2021-09-01,France,Bosnia and Herzegovina,1.0,1.0,FIFA World Cup qualification,Strasbourg,France,False
...,...,...,...,...,...,...,...,...,...
47207,2024-06-10,Czech Republic,North Macedonia,2.0,1.0,Friendly,Hradec Králové,Czech Republic,False
47208,2024-06-10,Netherlands,Iceland,4.0,0.0,Friendly,Rotterdam,Netherlands,False
47209,2024-06-10,Poland,Turkey,2.0,1.0,Friendly,Warsaw,Poland,False
47226,2024-06-11,Moldova,Ukraine,0.0,4.0,Friendly,Chișinău,Moldova,False


In [None]:
# Initialize a list to store results
teams_stats_list = []

# Extract unique teams from managers_sorted
selected_teams = managers_sorted['Team'].unique()

# Filter results to include only selected teams
filtered_results = filtered_results2[(filtered_results2['home_team'].isin(selected_teams)) | (filtered_results2['away_team'].isin(selected_teams))]

# Calculate the win percentage for each selected team
for team in selected_teams:
    is_home = filtered_results['home_team'] == team
    is_away = filtered_results['away_team'] == team
    home_win = filtered_results['home_score'] > filtered_results['away_score']
    away_win = filtered_results['away_score'] > filtered_results['home_score']

    matches_played = (is_home | is_away).sum()
    wins = ((is_home & home_win) | (is_away & away_win)).sum()
    win_percentage = (wins / matches_played) * 100 if matches_played > 0 else 0

    # Append this team's result to teams_stats_list
    teams_stats_list.append({'Team': team, 'Win Percentage': win_percentage})

# Create DataFrame from teams_stats_list
teams_stats = pd.DataFrame(teams_stats_list)

# Sort the DataFrame by Team
teams_stats = teams_stats.sort_values(by='Team').reset_index(drop=True)

print(teams_stats)

              Team  Win Percentage
0          Albania       38.709677
1          Austria       54.838710
2          Belgium       57.575758
3          Croatia       59.459459
4   Czech Republic       48.387097
5          Denmark       65.625000
6          England       55.882353
7           France       63.888889
8          Georgia       51.724138
9          Germany       50.000000
10         Hungary       54.838710
11           Italy       46.875000
12     Netherlands       67.647059
13          Poland       52.941176
14        Portugal       75.000000
15         Romania       41.935484
16        Scotland       51.612903
17          Serbia       60.606061
18        Slovakia       45.161290
19        Slovenia       50.000000
20           Spain       64.705882
21     Switzerland       44.117647
22          Turkey       51.612903
23         Ukraine       44.827586


# Titles

Titles in this context means the number of European Football Championships each of the competing countries has won before. Having won the tournament in the past may benefit a country's chances of winning it again.

In [None]:
#Data was copied from: https://www.uefa.com/uefaeuro/history/winners/
titles = pd.read_csv('Titles.csv')
titles

Unnamed: 0,Country,Titles
0,Germany,3.0
1,Spain,3.0
2,Italy,2.0
3,France,2.0
4,Netherlands,1.0
5,Czechia,1.0
6,Slovakia,1.0
7,Russia,1.0
8,Greece,1.0
9,Denmark,1.0


In [None]:
# Drop the final row of the titles table
titles.drop(titles.tail(1).index,inplace=True)

In [None]:
titles = titles.rename(columns={'Country': 'Team'})
titles.head(3)

Unnamed: 0,Team,Titles
0,Germany,3.0
1,Spain,3.0
2,Italy,2.0


In [None]:
#Drop Russia, Greece from titles as they are not competing this year
titles = titles[~titles['Team'].isin(['Russia', 'Greece'])]
#Rename Czechia as Czech republic
titles.loc[titles['Team'] == 'Czechia', 'Team'] = 'Czech Republic'

In [None]:
titles

Unnamed: 0,Team,Titles
0,Germany,3.0
1,Spain,3.0
2,Italy,2.0
3,France,2.0
4,Netherlands,1.0
5,Czech Republic,1.0
6,Slovakia,1.0
9,Denmark,1.0
10,Portugal,1.0


# Qualifying stats

Qualifying statistics describe how each nation qualified for the tournament. The host nation (Germany) qualifies automatically, so data referring to the other 23 competing countries is collected.

In [None]:
#https://footystats.org/international/uefa-euro-qualifiers
quali = pd.read_csv('Qualifying.csv')
quali

Unnamed: 0.1,Unnamed: 0,Team,MP,W,D,L,GF,GA,GD,Last 5,Form,CS,BTTS,xGF,1.5+,2.5+,AVG
0,1.0,Portugal,10.0,10.0,0.0,0.0,36.0,2.0,34.0,W,3.0,90%,10%,2.46,80%,60%,3.8
1,,,,,,,,,,W,,,,,,,
2,,,,,,,,,,W,,,,,,,
3,,,,,,,,,,W,,,,,,,
4,,,,,,,,,,W,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
260,53.0,San Marino,10.0,0.0,0.0,10.0,3.0,31.0,-28.0,L,0.0,0%,30%,0.38,100%,80%,3.4
261,,,,,,,,,,L,,,,,,,
262,,,,,,,,,,L,,,,,,,
263,,,,,,,,,,L,,,,,,,


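The 'Last 5' column causes an unusual pattern in the dataset: each team's five most recent results are spread over five rows, and every other column is empty on four of them. As this information is represented in the Form column, the dataset can be compacted to one row per country.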

In [None]:
quali = quali.dropna(subset=['Team'])
quali

Unnamed: 0.1,Unnamed: 0,Team,MP,W,D,L,GF,GA,GD,Last 5,Form,CS,BTTS,xGF,1.5+,2.5+,AVG
0,1.0,Portugal,10.0,10.0,0.0,0.0,36.0,2.0,34.0,W,3.0,90%,10%,2.46,80%,60%,3.8
5,2.0,Denmark,10.0,7.0,1.0,2.0,19.0,10.0,9.0,W,2.2,30%,60%,1.9,80%,60%,2.9
10,3.0,France,8.0,7.0,1.0,0.0,29.0,3.0,26.0,W,2.75,75%,25%,2.57,75%,63%,4.0
15,4.0,Slovakia,10.0,7.0,1.0,2.0,17.0,8.0,9.0,W,2.2,50%,40%,1.37,60%,50%,2.5
20,5.0,Slovenia,10.0,7.0,1.0,2.0,20.0,9.0,11.0,W,2.2,40%,50%,1.41,90%,60%,2.9
25,6.0,Romania,10.0,6.0,4.0,0.0,16.0,5.0,11.0,W,2.2,60%,40%,1.55,70%,40%,2.1
30,7.0,Spain,8.0,7.0,0.0,1.0,25.0,5.0,20.0,W,2.63,50%,38%,2.29,88%,63%,3.75
35,8.0,England,8.0,6.0,2.0,0.0,22.0,4.0,18.0,W,2.5,50%,50%,1.56,100%,50%,3.25
40,9.0,Belgium,8.0,6.0,2.0,0.0,22.0,4.0,18.0,W,2.5,63%,38%,1.73,88%,63%,3.25
45,10.0,Ukraine,10.0,6.0,2.0,2.0,15.0,10.0,5.0,W,2.0,30%,60%,1.21,80%,50%,2.5


Unnecessary columns are dropped. As stated previously, 'Last 5' is represented in the Form column.

In [None]:
columns_to_drop = ['Unnamed: 0', 'Last 5']
quali = quali.drop(columns=columns_to_drop)
quali.head(3)

Unnamed: 0,Team,MP,W,D,L,GF,GA,GD,Form,CS,BTTS,xGF,1.5+,2.5+,AVG
0,Portugal,10.0,10.0,0.0,0.0,36.0,2.0,34.0,3.0,90%,10%,2.46,80%,60%,3.8
5,Denmark,10.0,7.0,1.0,2.0,19.0,10.0,9.0,2.2,30%,60%,1.9,80%,60%,2.9
10,France,8.0,7.0,1.0,0.0,29.0,3.0,26.0,2.75,75%,25%,2.57,75%,63%,4.0


In [None]:
quali = quali.reset_index(drop=True)
quali.head(3)

Unnamed: 0,Team,MP,W,D,L,GF,GA,GD,Form,CS,BTTS,xGF,1.5+,2.5+,AVG
0,Portugal,10.0,10.0,0.0,0.0,36.0,2.0,34.0,3.0,90%,10%,2.46,80%,60%,3.8
1,Denmark,10.0,7.0,1.0,2.0,19.0,10.0,9.0,2.2,30%,60%,1.9,80%,60%,2.9
2,France,8.0,7.0,1.0,0.0,29.0,3.0,26.0,2.75,75%,25%,2.57,75%,63%,4.0


The Form feature summarises the team's results in qualifying, where a team gets 3 points for a win, 1 for a draw and 0 for a loss. For instance, Portugal played 10 games and won all 10, finishing on 30 points. Dividing total points (30) by games played (10) gives a Form of 3. This feature therefore shows the points per game each team achieved across its qualifying campaign.

Because this feature encapsulates the match statistics MP, W, D and L, those columns can be dropped.
The features BTTS, 1.5+, 2.5+ and AVG refer to more specific in-game statistics that aren't important in this analysis, so they are also dropped.
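
The points-per-game arithmetic can be checked against the W, D and MP columns shown in the table above. The helper name `points_per_game` is introduced here for illustration only.

```python
def points_per_game(wins, draws, matches_played):
    """Qualifying points per game: 3 for a win, 1 for a draw, 0 for a loss."""
    if matches_played <= 0:
        raise ValueError("matches_played must be positive")
    return (3 * wins + draws) / matches_played


# W, D and MP taken from the qualifying table above
portugal = points_per_game(wins=10, draws=0, matches_played=10)
denmark = points_per_game(wins=7, draws=1, matches_played=10)
france = points_per_game(wins=7, draws=1, matches_played=8)
print(portugal, denmark, france)
```

These reproduce the Form values listed for the three teams (3.0, 2.2 and 2.75), which supports reading Form as points per game over the whole qualifying campaign.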

In [None]:
# Form encapsulates MP, W, D, L
# BTTS and the last 3 columns (1.5+, 2.5+, AVG) refer to both the team and its opponent, so they are not suitable
columns_to_drop = ['MP', 'W', 'D', 'L', 'BTTS', '1.5+', '2.5+', 'AVG']
quali = quali.drop(columns=columns_to_drop)
quali.head(3)

Unnamed: 0,Team,GF,GA,GD,Form,CS,xGF
0,Portugal,36.0,2.0,34.0,3.0,90%,2.46
1,Denmark,19.0,10.0,9.0,2.2,30%,1.9
2,France,29.0,3.0,26.0,2.75,75%,2.57


In [None]:
# Sort table alphabetically by Team
quali_sorted = quali.sort_values(by='Team')
# Reset index after dropping rows and rearranging order
quali_sorted.reset_index(drop=True, inplace=True)
quali_sorted.head(3)

Unnamed: 0,Team,GF,GA,GD,Form,CS,xGF
0,Albania,12.0,4.0,8.0,1.88,50%,1.13
1,Andorra,3.0,20.0,-17.0,0.2,10%,0.46
2,Armenia,9.0,11.0,-2.0,1.0,0%,1.22


In [None]:
# Define new column names
new_column_names = {
    'Team': 'Team',
    'GF': 'Q_GF',
    'GA': 'Q_GA',
    'GD': 'Q_GD',
    'Form': 'Q_PPG_Last_5',
    'CS': 'Q_Clean_Sheets',
    'xGF': 'Q_xGF',
}

# Rename columns
quali_sorted = quali_sorted.rename(columns=new_column_names)

quali_sorted.head(3)

Unnamed: 0,Team,Q_GF,Q_GA,Q_GD,Q_PPG_Last_5,Q_Clean_Sheets,Q_xGF
0,Albania,12.0,4.0,8.0,1.88,50%,1.13
1,Andorra,3.0,20.0,-17.0,0.2,10%,0.46
2,Armenia,9.0,11.0,-2.0,1.0,0%,1.22


In [None]:
quali_sorted

Unnamed: 0,Team,Q_GF,Q_GA,Q_GD,Q_PPG_Last_5,Q_Clean_Sheets,Q_xGF
0,Albania,12.0,4.0,8.0,1.88,50%,1.13
1,Andorra,3.0,20.0,-17.0,0.2,10%,0.46
2,Armenia,9.0,11.0,-2.0,1.0,0%,1.22
3,Austria,17.0,7.0,10.0,2.38,38%,1.91
4,Azerbaijan,7.0,17.0,-10.0,0.88,25%,0.75
5,Belarus,9.0,14.0,-5.0,1.2,40%,1.0
6,Belgium,22.0,4.0,18.0,2.5,63%,1.73
7,Bosnia-Herzegovina,10.0,22.0,-12.0,0.82,18%,1.21
8,Bulgaria,7.0,14.0,-7.0,0.5,0%,1.21
9,Croatia,13.0,4.0,9.0,2.0,63%,2.23


The dataset above contains data about all teams that attempted to qualify for the tournament. This analysis focuses only on the teams competing at the tournament, so they are isolated.

In [None]:
#Isolate teams in competition
competing_countries = players_grouped['Team'].unique()
# Filter the quali_sorted dataset to include only rows with countries in competing_countries
quali_sorted_filtered = quali_sorted[quali_sorted['Team'].isin(competing_countries)]
# Print the filtered DataFrame
print(quali_sorted_filtered)

              Team  Q_GF  Q_GA  Q_GD  Q_PPG_Last_5 Q_Clean_Sheets  Q_xGF
0          Albania  12.0   4.0   8.0          1.88            50%   1.13
3          Austria  17.0   7.0  10.0          2.38            38%   1.91
6          Belgium  22.0   4.0  18.0          2.50            63%   1.73
9          Croatia  13.0   4.0   9.0          2.00            63%   2.23
11  Czech Republic  12.0   6.0   6.0          1.88            50%   1.83
12         Denmark  19.0  10.0   9.0          2.20            30%   1.90
13         England  22.0   4.0  18.0          2.50            50%   1.56
18          France  29.0   3.0  26.0          2.75            75%   2.57
19         Georgia  14.0  18.0  -4.0          1.20            30%   1.00
22         Hungary  16.0   7.0   9.0          2.25            38%   1.67
25           Italy  16.0   9.0   7.0          1.75            38%   1.66
35     Netherlands  17.0   7.0  10.0          2.25            63%   2.01
38          Poland  15.0  11.0   4.0          1.50 

In [None]:
quali_sorted_filtered

Unnamed: 0,Team,Q_GF,Q_GA,Q_GD,Q_PPG_Last_5,Q_Clean_Sheets,Q_xGF
0,Albania,12.0,4.0,8.0,1.88,50%,1.13
3,Austria,17.0,7.0,10.0,2.38,38%,1.91
6,Belgium,22.0,4.0,18.0,2.5,63%,1.73
9,Croatia,13.0,4.0,9.0,2.0,63%,2.23
11,Czech Republic,12.0,6.0,6.0,1.88,50%,1.83
12,Denmark,19.0,10.0,9.0,2.2,30%,1.9
13,England,22.0,4.0,18.0,2.5,50%,1.56
18,France,29.0,3.0,26.0,2.75,75%,2.57
19,Georgia,14.0,18.0,-4.0,1.2,30%,1.0
22,Hungary,16.0,7.0,9.0,2.25,38%,1.67


# Odds

Odds is the key target feature in this analysis. It refers to the odds given to each country of winning the tournament by the British gambling company Bet365. Teams with shorter odds are considered more likely to win the tournament.
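
To make "shorter odds means more likely" concrete, fractional odds of x/1 correspond to an implied probability of 1 / (x + 1). The sketch below assumes the odds have already been reduced to their "to one" number; the function name `implied_probability` is introduced for illustration and is not part of the analysis itself.

```python
def implied_probability(odds_to_one):
    """Implied chance of winning for fractional odds of `odds_to_one`/1."""
    if odds_to_one < 0:
        raise ValueError("odds must be non-negative")
    return 1 / (odds_to_one + 1)


# Odds of 3.5/1 and 500/1, the shortest and longest prices in the table below
favourite = implied_probability(3.5)
outsider = implied_probability(500)
print(round(favourite, 4), round(outsider, 4))
```

Bookmaker prices include a margin, so the implied probabilities across all 24 teams would be expected to sum to more than 1. They are best read as a relative ordering of the teams.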

In [None]:
#https://www.oddschecker.com/football/euro-2024/winner
#First column: BET365 used
odds = pd.read_csv('Odds.csv')
odds

Unnamed: 0,Team,Bet365
0,England,3.5/1
1,France,4.0/1
2,Germany,5.5/1
3,Portugal,6.5/1
4,Spain,8.0/1
5,Italy,14.0/1
6,Netherlands,16.0/1
7,Belgium,16.0/1
8,Croatia,40/1
9,Denmark,40/1


In [None]:
# Sort table alphabetically by Team
odds_sorted = odds.sort_values(by='Team')
# Reset index after dropping rows and rearranging order
odds_sorted.reset_index(drop=True, inplace=True)
odds_sorted

Unnamed: 0,Team,Bet365
0,Albania,500/1
1,Austria,66/1
2,Belgium,16.0/1
3,Croatia,40/1
4,Czech Republic,150/1
5,Denmark,40/1
6,England,3.5/1
7,France,4.0/1
8,Georgia,500/1
9,Germany,5.5/1


# Merge datasets

Once this data has been collected and partially cleaned, it is merged to create one large dataset used for subsequent analysis. Each of the above datasets has been compacted to contain one row per nation, meaning they can be merged easily.
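
Merging on 'Team' only works if every source spells each nation the same way, which is why Czechia and Türkiye were renamed earlier. A small pre-merge check along the following lines may help catch any remaining mismatches. The helper names and the example lists are hypothetical; only the aliases come from this notebook.

```python
# Alternative spellings already handled elsewhere in this notebook
ALIASES = {"Czechia": "Czech Republic", "Türkiye": "Turkey", "Turkiye": "Turkey"}


def normalise(names):
    """Map known alternative spellings onto one canonical team name."""
    return {ALIASES.get(name.strip(), name.strip()) for name in names}


def unmatched(left_names, right_names):
    """Teams that appear in only one of two sources after normalising."""
    return sorted(normalise(left_names) ^ normalise(right_names))


# Hypothetical extracts from two of the tables
ranking_teams = ["Albania", "Czechia", "Turkey"]
manager_teams = ["Albania", "Czech Republic", "Türkiye "]
print(unmatched(ranking_teams, manager_teams))

# A team missing from one source is reported by name
print(unmatched(["Albania", "Georgia"], ["Albania"]))
```

Running such a check on each pair of tables before merging makes silently dropped rows easier to spot than inspecting the merged output afterwards.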

In [None]:
merged_df = df_sorted.merge(odds_sorted, on='Team')
merged_df

Unnamed: 0,Team,FIFA Rank,Bet365
0,Albania,66,500/1
1,Austria,25,66/1
2,Belgium,3,16.0/1
3,Croatia,10,40/1
4,Czech Republic,36,150/1
5,Denmark,21,40/1
6,England,4,3.5/1
7,France,2,4.0/1
8,Georgia,75,500/1
9,Germany,16,5.5/1


In [None]:
merged_df2 = merged_df.merge(managers_sorted, on='Team')
merged_df2

Unnamed: 0,Team,FIFA Rank,Bet365,Manager,Manager_Age,Installation,Contract until
0,Albania,66,500/1,Sylvinho,50,01 year 05 months 10 days,2024
1,Austria,25,66/1,Ralf Rangnick,65,02 years 11 days,2025
2,Belgium,3,16.0/1,Domenico Tedesco,38,01 year 04 months 04 days,2026
3,Croatia,10,40/1,Zlatko Dalic,57,06 years 08 months 05 days,2026
4,Czech Republic,36,150/1,Ivan Hasek,60,05 months 08 days,2025
5,Denmark,21,40/1,Kasper Hjulmand,52,03 years 10 months 11 days,2026
6,England,4,3.5/1,Gareth Southgate,53,07 years 08 months 15 days,2024
7,France,2,4.0/1,Didier Deschamps,55,11 years 11 months 04 days,2026
8,Georgia,75,500/1,Willy Sagnol,47,03 years 03 months 28 days,2024
9,Germany,16,5.5/1,Julian Nagelsmann,36,08 months 21 days,2026


Countries not represented in the titles dataset have never won the Championship, so their value in the 'Titles' column is set to 0.
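
The fill-with-zero step can be sketched on a toy example. This sketch uses a left join rather than the outer join in the cell below, as an assumption about what is wanted: a left join keeps exactly the competing teams, whereas an outer join would also add any title holder that is missing from the left table. The frames `teams` and `past_titles` are small stand-ins for the real tables.

```python
import pandas as pd

# Stand-in for the competing teams
teams = pd.DataFrame({"Team": ["Albania", "Denmark", "France"]})
# Only past winners appear in a titles table; Greece is not in `teams`
past_titles = pd.DataFrame({
    "Team": ["Denmark", "France", "Greece"],
    "Titles": [1, 2, 1],
})

# how="left" keeps exactly the rows of `teams`, so Greece is not pulled in
with_titles = teams.merge(past_titles, on="Team", how="left")
# Teams without a match get NaN, which is replaced by 0 titles
with_titles["Titles"] = with_titles["Titles"].fillna(0).astype(int)
print(with_titles)
```

Because non-competing winners were already removed from `titles`, the outer join used in this notebook gives the same rows; the left join simply makes that assumption explicit.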

In [None]:
merged_df3 = merged_df2.merge(titles, on='Team', how='outer')
# Fill missing values in the 'Titles' column with 0
merged_df3['Titles'] = merged_df3['Titles'].fillna(0)
merged_df3

Unnamed: 0,Team,FIFA Rank,Bet365,Manager,Manager_Age,Installation,Contract until,Titles
0,Albania,66,500/1,Sylvinho,50,01 year 05 months 10 days,2024,0.0
1,Austria,25,66/1,Ralf Rangnick,65,02 years 11 days,2025,0.0
2,Belgium,3,16.0/1,Domenico Tedesco,38,01 year 04 months 04 days,2026,0.0
3,Croatia,10,40/1,Zlatko Dalic,57,06 years 08 months 05 days,2026,0.0
4,Czech Republic,36,150/1,Ivan Hasek,60,05 months 08 days,2025,1.0
5,Denmark,21,40/1,Kasper Hjulmand,52,03 years 10 months 11 days,2026,1.0
6,England,4,3.5/1,Gareth Southgate,53,07 years 08 months 15 days,2024,0.0
7,France,2,4.0/1,Didier Deschamps,55,11 years 11 months 04 days,2026,2.0
8,Georgia,75,500/1,Willy Sagnol,47,03 years 03 months 28 days,2024,0.0
9,Germany,16,5.5/1,Julian Nagelsmann,36,08 months 21 days,2026,3.0


**Manager Contract until solution:** Further manual investigation showed that the contracts of both Poland's and Serbia's managers expire in 2026, so this information can be entered into the dataset.
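
Because the column originally held '?' placeholders, it is likely stored as text rather than numbers. One possible way to fill the gaps and end up with a numeric column is sketched below. The frame `contracts`, the placeholder team names and the `manual_fixes` mapping are hypothetical.

```python
import pandas as pd

# Placeholder rows; "?" marks a missing contract year as in the source table
contracts = pd.DataFrame({
    "Team": ["Team A", "Team B", "Team C"],
    "Contract until": ["2026", "?", "2024"],
})

# Values found by manual lookup (hypothetical here)
manual_fixes = {"Team B": 2026}

# "?" cannot be parsed as a number, so errors="coerce" turns it into NaN
years = pd.to_numeric(contracts["Contract until"], errors="coerce")
# Fill the gaps from the manual lookup, then store whole years as integers
years = years.fillna(contracts["Team"].map(manual_fixes))
contracts["Contract until"] = years.astype(int)
print(contracts)
```

Keeping the manual values in one mapping makes it easy to see which entries did not come from the original source.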

In [None]:
# Update contract end year for Poland and Serbia to 2026
merged_df3.loc[merged_df3['Team'] == 'Poland', 'Contract until'] = 2026
merged_df3.loc[merged_df3['Team'] == 'Serbia', 'Contract until'] = 2026
merged_df3

Unnamed: 0,Team,FIFA Rank,Bet365,Manager,Manager_Age,Installation,Contract until,Titles
0,Albania,66,500/1,Sylvinho,50,01 year 05 months 10 days,2024,0.0
1,Austria,25,66/1,Ralf Rangnick,65,02 years 11 days,2025,0.0
2,Belgium,3,16.0/1,Domenico Tedesco,38,01 year 04 months 04 days,2026,0.0
3,Croatia,10,40/1,Zlatko Dalic,57,06 years 08 months 05 days,2026,0.0
4,Czech Republic,36,150/1,Ivan Hasek,60,05 months 08 days,2025,1.0
5,Denmark,21,40/1,Kasper Hjulmand,52,03 years 10 months 11 days,2026,1.0
6,England,4,3.5/1,Gareth Southgate,53,07 years 08 months 15 days,2024,0.0
7,France,2,4.0/1,Didier Deschamps,55,11 years 11 months 04 days,2026,2.0
8,Georgia,75,500/1,Willy Sagnol,47,03 years 03 months 28 days,2024,0.0
9,Germany,16,5.5/1,Julian Nagelsmann,36,08 months 21 days,2026,3.0


Currently, the 'Installation' column is not in a format that regression models will recognise, e.g. 02 years 10 months 03 days, so it needs to be converted into a numeric format. This is done by converting each record into total months. The metric ignores the days but still gives a reasonably representative measure of each manager's time in charge.
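
The same conversion can be expressed more compactly with a regular expression. This is an alternative sketch, not the method used in the cell below; the function name `months_installed` and the pattern `_PART` are introduced here for illustration.

```python
import re

# Matches "<number> <unit>" pairs such as "02 years" or "1 month"
_PART = re.compile(r"(\d+)\s+(year|month|day)s?")


def months_installed(text):
    """Whole months in a string like '02 years 10 months 03 days' (days ignored)."""
    total = 0
    for value, unit in _PART.findall(text):
        if unit == "year":
            total += int(value) * 12
        elif unit == "month":
            total += int(value)
        # "day" parts are matched but deliberately not counted
    return total


print(months_installed("02 years 10 months 03 days"))
print(months_installed("05 months 08 days"))
```

Applied to the real column, this could be used as `merged_df3['Installation'].map(months_installed)`, which avoids the explicit row loop.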

In [None]:
# Convert 'Installation' column to total months
merged_df3['Months_installed'] = 0

# Split the 'Installation' column into components and calculate total months
for index, row in merged_df3.iterrows():
    components = row['Installation'].split()
    total_months = 0

    i = 0
    while i < len(components):
        if components[i].isdigit():  # Check if the component is a number
            value = int(components[i])  # Convert the number to integer
            unit = components[i + 1]  # Get the next component as the unit
            if unit == 'year' or unit == 'years':
                total_months += value * 12  # Convert years to months
            elif unit == 'month' or unit == 'months':
                total_months += value  # Add months directly
            # Days match neither branch and are ignored; move past this value/unit pair
            i += 2
        else:
            # Not a number, so move on to the next component
            i += 1

    merged_df3.at[index, 'Months_installed'] = total_months

merged_df3

Unnamed: 0,Team,FIFA Rank,Bet365,Manager,Manager_Age,Installation,Contract until,Titles,Months_installed
0,Albania,66,500/1,Sylvinho,50,01 year 05 months 10 days,2024,0.0,17
1,Austria,25,66/1,Ralf Rangnick,65,02 years 11 days,2025,0.0,24
2,Belgium,3,16.0/1,Domenico Tedesco,38,01 year 04 months 04 days,2026,0.0,16
3,Croatia,10,40/1,Zlatko Dalic,57,06 years 08 months 05 days,2026,0.0,80
4,Czech Republic,36,150/1,Ivan Hasek,60,05 months 08 days,2025,1.0,5
5,Denmark,21,40/1,Kasper Hjulmand,52,03 years 10 months 11 days,2026,1.0,46
6,England,4,3.5/1,Gareth Southgate,53,07 years 08 months 15 days,2024,0.0,92
7,France,2,4.0/1,Didier Deschamps,55,11 years 11 months 04 days,2026,2.0,143
8,Georgia,75,500/1,Willy Sagnol,47,03 years 03 months 28 days,2024,0.0,39
9,Germany,16,5.5/1,Julian Nagelsmann,36,08 months 21 days,2026,3.0,8
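
As an alternative to the row-by-row loop above, the same conversion could be sketched with pandas string methods. This is only an illustration under the assumption that every record uses the "NN years NN months NN days" wording seen in the table; the helper name `installation_to_months` and the `demo` series are hypothetical and not part of the original notebook.

```python
import pandas as pd


def installation_to_months(installation: pd.Series) -> pd.Series:
    """Convert strings such as '02 years 11 days' to whole months (days ignored)."""
    # Pull out the number that precedes 'year(s)' and 'month(s)'; missing parts become NaN
    years = installation.str.extract(r"(\d+)\s+years?", expand=False)
    months = installation.str.extract(r"(\d+)\s+months?", expand=False)
    # Treat a missing component as zero
    years = pd.to_numeric(years).fillna(0)
    months = pd.to_numeric(months).fillna(0)
    return (years * 12 + months).astype(int)


# Three 'Installation' values copied from the table above
demo = pd.Series([
    "01 year 05 months 10 days",
    "02 years 11 days",
    "08 months 21 days",
])
demo_months = installation_to_months(demo)
print(demo_months.tolist())  # [17, 24, 8]
```

The three results agree with the 'Months_installed' values shown for Albania, Austria and Germany.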


Now that 'Installation' has been converted into the numeric 'Months_installed' column, the original column can be dropped.

In [None]:
#Drop Installation column
merged_df3 = merged_df3.drop(columns=['Installation'])
merged_df3

Unnamed: 0,Team,FIFA Rank,Bet365,Manager,Manager_Age,Contract until,Titles,Months_installed
0,Albania,66,500/1,Sylvinho,50,2024,0.0,17
1,Austria,25,66/1,Ralf Rangnick,65,2025,0.0,24
2,Belgium,3,16.0/1,Domenico Tedesco,38,2026,0.0,16
3,Croatia,10,40/1,Zlatko Dalic,57,2026,0.0,80
4,Czech Republic,36,150/1,Ivan Hasek,60,2025,1.0,5
5,Denmark,21,40/1,Kasper Hjulmand,52,2026,1.0,46
6,England,4,3.5/1,Gareth Southgate,53,2024,0.0,92
7,France,2,4.0/1,Didier Deschamps,55,2026,2.0,143
8,Georgia,75,500/1,Willy Sagnol,47,2024,0.0,39
9,Germany,16,5.5/1,Julian Nagelsmann,36,2026,3.0,8


Currently, the 'Bet365' odds column is in an unclean format. Every record shares the same denominator (1), so removing it turns these records from fractions into plain numbers, which are more interpretable.

In [None]:
# Remove the '/1' denominator from the 'Bet365' column and convert the result to numbers
merged_df3['Bet365'] = merged_df3['Bet365'].str.replace('/1', '', regex=False).astype(float)
# Rename the 'Bet365' column to 'Odds_to_One'
merged_df3 = merged_df3.rename(columns={'Bet365': 'Odds_to_One'})
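
Stripping '/1' relies on every denominator being 1. A slightly more defensive sketch, shown here on a small hypothetical series rather than on `merged_df3`, splits each fraction and divides, so a record with a different denominator would still be converted correctly. The function name `fractional_to_number` is an assumption made for illustration.

```python
import pandas as pd


def fractional_to_number(odds: pd.Series) -> pd.Series:
    """Turn 'a/b' fractional odds into the numeric ratio a / b."""
    # Split on the first '/' into numerator and denominator columns
    parts = odds.str.split("/", n=1, expand=True)
    numerator = pd.to_numeric(parts[0])
    denominator = pd.to_numeric(parts[1])
    return numerator / denominator


# Odds strings in the same style as the 'Bet365' column
demo_odds = pd.Series(["500/1", "3.5/1", "16.0/1"])
odds_to_one = fractional_to_number(demo_odds)
print(odds_to_one.tolist())  # [500.0, 3.5, 16.0]
```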

The players dataset, which describes each nation's playing squad, is merged into the dataframe.

In [None]:
merged_df4 = merged_df3.merge(players_grouped, on='Team', how='outer')
merged_df4

Unnamed: 0,Team,FIFA Rank,Odds_to_One,Manager,Manager_Age,Contract until,Titles,Months_installed,Age,Height,Caps,Goals,MarketValue
0,Albania,66,500.0,Sylvinho,50,2024,0.0,17,27.307692,183.615385,26.115385,1.538462,4292308
1,Austria,25,66.0,Ralf Rangnick,65,2025,0.0,24,26.807692,183.192308,23.576923,3.576923,9057692
2,Belgium,3,16.0,Domenico Tedesco,38,2026,0.0,16,26.88,184.68,37.96,7.08,23380000
3,Croatia,10,40.0,Zlatko Dalic,57,2026,0.0,80,27.692308,184.115385,44.307692,5.653846,12603846
4,Czech Republic,36,150.0,Ivan Hasek,60,2025,1.0,5,25.307692,185.538462,15.576923,2.5,7457692
5,Denmark,21,40.0,Kasper Hjulmand,52,2026,1.0,46,27.692308,186.269231,41.192308,5.192308,15980769
6,England,4,3.5,Gareth Southgate,53,2024,0.0,92,26.076923,182.461538,25.038462,3.846154,58269231
7,France,2,4.0,Didier Deschamps,55,2026,2.0,143,26.88,184.44,33.44,7.68,49360000
8,Georgia,75,500.0,Willy Sagnol,47,2024,0.0,39,27.153846,184.5,28.846154,2.461538,6159615
9,Germany,16,5.5,Julian Nagelsmann,36,2026,3.0,8,28.115385,185.384615,34.846154,5.153846,32730769
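
Because the merge uses `how='outer'`, a team name spelled differently in the two sources would produce an extra, half-empty row instead of an error. One way to check for this is the `indicator` option of `merge`. The sketch below uses two toy frames with a deliberately mismatched name; `left`, `right`, `checked` and `unmatched` are hypothetical names, not objects from the notebook.

```python
import pandas as pd

# Toy frames: the second one spells 'Czech Republic' as 'Czechia'
left = pd.DataFrame({"Team": ["Albania", "Austria", "Czech Republic"],
                     "FIFA Rank": [66, 25, 36]})
right = pd.DataFrame({"Team": ["Albania", "Austria", "Czechia"],
                      "Caps": [26.115, 23.577, 15.577]})

# indicator=True adds a '_merge' column saying where each row came from
checked = left.merge(right, on="Team", how="outer", indicator=True)

# Rows present in only one of the two frames point to a naming mismatch
unmatched = checked.loc[checked["_merge"] != "both", ["Team", "_merge"]]
print(unmatched)
```

An empty `unmatched` frame would suggest that every team was found in both sources.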


In [None]:
merged_df4 = merged_df4.round(3)
merged_df4

Unnamed: 0,Team,FIFA Rank,Odds_to_One,Manager,Manager_Age,Contract until,Titles,Months_installed,Age,Height,Caps,Goals,MarketValue
0,Albania,66,500.0,Sylvinho,50,2024,0.0,17,27.308,183.615,26.115,1.538,4292308
1,Austria,25,66.0,Ralf Rangnick,65,2025,0.0,24,26.808,183.192,23.577,3.577,9057692
2,Belgium,3,16.0,Domenico Tedesco,38,2026,0.0,16,26.88,184.68,37.96,7.08,23380000
3,Croatia,10,40.0,Zlatko Dalic,57,2026,0.0,80,27.692,184.115,44.308,5.654,12603846
4,Czech Republic,36,150.0,Ivan Hasek,60,2025,1.0,5,25.308,185.538,15.577,2.5,7457692
5,Denmark,21,40.0,Kasper Hjulmand,52,2026,1.0,46,27.692,186.269,41.192,5.192,15980769
6,England,4,3.5,Gareth Southgate,53,2024,0.0,92,26.077,182.462,25.038,3.846,58269231
7,France,2,4.0,Didier Deschamps,55,2026,2.0,143,26.88,184.44,33.44,7.68,49360000
8,Georgia,75,500.0,Willy Sagnol,47,2024,0.0,39,27.154,184.5,28.846,2.462,6159615
9,Germany,16,5.5,Julian Nagelsmann,36,2026,3.0,8,28.115,185.385,34.846,5.154,32730769


Each team's win percentage since the last tournament is merged into the dataframe. This value represents their ability to win games over the last 3 years (2021-2024).

In [None]:
merged_df5 = merged_df4.merge(teams_stats, on='Team', how='outer')
merged_df5

Unnamed: 0,Team,FIFA Rank,Odds_to_One,Manager,Manager_Age,Contract until,Titles,Months_installed,Age,Height,Caps,Goals,MarketValue,Win Percentage
0,Albania,66,500.0,Sylvinho,50,2024,0.0,17,27.308,183.615,26.115,1.538,4292308,38.709677
1,Austria,25,66.0,Ralf Rangnick,65,2025,0.0,24,26.808,183.192,23.577,3.577,9057692,54.83871
2,Belgium,3,16.0,Domenico Tedesco,38,2026,0.0,16,26.88,184.68,37.96,7.08,23380000,57.575758
3,Croatia,10,40.0,Zlatko Dalic,57,2026,0.0,80,27.692,184.115,44.308,5.654,12603846,59.459459
4,Czech Republic,36,150.0,Ivan Hasek,60,2025,1.0,5,25.308,185.538,15.577,2.5,7457692,48.387097
5,Denmark,21,40.0,Kasper Hjulmand,52,2026,1.0,46,27.692,186.269,41.192,5.192,15980769,65.625
6,England,4,3.5,Gareth Southgate,53,2024,0.0,92,26.077,182.462,25.038,3.846,58269231,55.882353
7,France,2,4.0,Didier Deschamps,55,2026,2.0,143,26.88,184.44,33.44,7.68,49360000,63.888889
8,Georgia,75,500.0,Willy Sagnol,47,2024,0.0,39,27.154,184.5,28.846,2.462,6159615,51.724138
9,Germany,16,5.5,Julian Nagelsmann,36,2026,3.0,8,28.115,185.385,34.846,5.154,32730769,50.0
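
For reference, a win percentage of this kind is wins divided by matches played, multiplied by 100. The sketch below assumes a results table with win, draw and loss counts; the `results` frame, its column names and its numbers are invented for illustration and are not taken from the `teams_stats` source.

```python
import pandas as pd

# Hypothetical win / draw / loss counts for two made-up teams
results = pd.DataFrame({
    "Team": ["Team A", "Team B"],
    "W": [12, 8],
    "D": [9, 4],
    "L": [10, 4],
})

# Matches played is the sum of wins, draws and losses
played = results[["W", "D", "L"]].sum(axis=1)
results["Win Percentage"] = results["W"] / played * 100
print(results[["Team", "Win Percentage"]])
```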


Each team's qualifying record is added to the dataframe. This data represents each team's performance in qualifying in detail.

In [None]:
merged_df6 = merged_df5.merge(quali_sorted_filtered, on='Team', how='outer')
merged_df6

Unnamed: 0,Team,FIFA Rank,Odds_to_One,Manager,Manager_Age,Contract until,Titles,Months_installed,Age,Height,Caps,Goals,MarketValue,Win Percentage,Q_GF,Q_GA,Q_GD,Q_PPG_Last_5,Q_Clean_Sheets,Q_xGF
0,Albania,66,500.0,Sylvinho,50,2024,0.0,17,27.308,183.615,26.115,1.538,4292308,38.709677,12.0,4.0,8.0,1.88,50%,1.13
1,Austria,25,66.0,Ralf Rangnick,65,2025,0.0,24,26.808,183.192,23.577,3.577,9057692,54.83871,17.0,7.0,10.0,2.38,38%,1.91
2,Belgium,3,16.0,Domenico Tedesco,38,2026,0.0,16,26.88,184.68,37.96,7.08,23380000,57.575758,22.0,4.0,18.0,2.5,63%,1.73
3,Croatia,10,40.0,Zlatko Dalic,57,2026,0.0,80,27.692,184.115,44.308,5.654,12603846,59.459459,13.0,4.0,9.0,2.0,63%,2.23
4,Czech Republic,36,150.0,Ivan Hasek,60,2025,1.0,5,25.308,185.538,15.577,2.5,7457692,48.387097,12.0,6.0,6.0,1.88,50%,1.83
5,Denmark,21,40.0,Kasper Hjulmand,52,2026,1.0,46,27.692,186.269,41.192,5.192,15980769,65.625,19.0,10.0,9.0,2.2,30%,1.9
6,England,4,3.5,Gareth Southgate,53,2024,0.0,92,26.077,182.462,25.038,3.846,58269231,55.882353,22.0,4.0,18.0,2.5,50%,1.56
7,France,2,4.0,Didier Deschamps,55,2026,2.0,143,26.88,184.44,33.44,7.68,49360000,63.888889,29.0,3.0,26.0,2.75,75%,2.57
8,Georgia,75,500.0,Willy Sagnol,47,2024,0.0,39,27.154,184.5,28.846,2.462,6159615,51.724138,14.0,18.0,-4.0,1.2,30%,1.0
9,Germany,16,5.5,Julian Nagelsmann,36,2026,3.0,8,28.115,185.385,34.846,5.154,32730769,50.0,,,,,,
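
Note that 'Q_Clean_Sheets' is displayed with a trailing '%', which suggests it is stored as text and, like 'Installation' earlier, may not be recognised by regression models. A minimal sketch of one possible conversion follows, using a small hypothetical series instead of `merged_df6`.

```python
import pandas as pd

# Values in the style of the 'Q_Clean_Sheets' column, including a missing entry
clean_sheets = pd.Series(["50%", "38%", None], name="Q_Clean_Sheets")

# Drop the trailing '%' and convert to numbers; missing entries stay as NaN
clean_sheets_pct = pd.to_numeric(clean_sheets.str.rstrip("%"))
print(clean_sheets_pct.tolist())
```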


#Final dataset

Once data from all sources has been collected, it is ready for further exploration and analysis.

To summarise, the new dataset has information regarding:

1.   Nation (Team)
2.   Nation Ranking (FIFA Rank)
3.   Tournament Odds (Odds_to_One)
4.   Manager Statistics (Manager, Manager_Age, Contract until, Months_installed)
5.   Nation Tournament Wins (Titles)
6.   Playing Squad Data (Age, Height, Caps, Goals, MarketValue, Win Percentage)
7.   Qualifying Team Data (Q_GF, Q_GA, Q_GD, Q_PPG_Last_5, Q_Clean_Sheets, Q_xGF)

In [None]:
final_data = merged_df6
final_data

Unnamed: 0,Team,FIFA Rank,Odds_to_One,Manager,Manager_Age,Contract until,Titles,Months_installed,Age,Height,Caps,Goals,MarketValue,Win Percentage,Q_GF,Q_GA,Q_GD,Q_PPG_Last_5,Q_Clean_Sheets,Q_xGF
0,Albania,66,500.0,Sylvinho,50,2024,0.0,17,27.308,183.615,26.115,1.538,4292308,38.709677,12.0,4.0,8.0,1.88,50%,1.13
1,Austria,25,66.0,Ralf Rangnick,65,2025,0.0,24,26.808,183.192,23.577,3.577,9057692,54.83871,17.0,7.0,10.0,2.38,38%,1.91
2,Belgium,3,16.0,Domenico Tedesco,38,2026,0.0,16,26.88,184.68,37.96,7.08,23380000,57.575758,22.0,4.0,18.0,2.5,63%,1.73
3,Croatia,10,40.0,Zlatko Dalic,57,2026,0.0,80,27.692,184.115,44.308,5.654,12603846,59.459459,13.0,4.0,9.0,2.0,63%,2.23
4,Czech Republic,36,150.0,Ivan Hasek,60,2025,1.0,5,25.308,185.538,15.577,2.5,7457692,48.387097,12.0,6.0,6.0,1.88,50%,1.83
5,Denmark,21,40.0,Kasper Hjulmand,52,2026,1.0,46,27.692,186.269,41.192,5.192,15980769,65.625,19.0,10.0,9.0,2.2,30%,1.9
6,England,4,3.5,Gareth Southgate,53,2024,0.0,92,26.077,182.462,25.038,3.846,58269231,55.882353,22.0,4.0,18.0,2.5,50%,1.56
7,France,2,4.0,Didier Deschamps,55,2026,2.0,143,26.88,184.44,33.44,7.68,49360000,63.888889,29.0,3.0,26.0,2.75,75%,2.57
8,Georgia,75,500.0,Willy Sagnol,47,2024,0.0,39,27.154,184.5,28.846,2.462,6159615,51.724138,14.0,18.0,-4.0,1.2,30%,1.0
9,Germany,16,5.5,Julian Nagelsmann,36,2026,3.0,8,28.115,185.385,34.846,5.154,32730769,50.0,,,,,,


Germany has NaNs for its qualifying record because it is the host nation and therefore did not have to qualify.
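
A quick way to confirm which teams lack a qualifying record is to look for rows where every 'Q_' column is missing. The sketch below runs on a three-row toy frame with values copied from the table above; `demo`, `q_cols` and `missing_quali` are hypothetical names.

```python
import pandas as pd

# Toy subset of the final dataset: Germany has no qualifying figures
demo = pd.DataFrame({
    "Team": ["France", "Georgia", "Germany"],
    "Q_GF": [29.0, 14.0, None],
    "Q_GA": [3.0, 18.0, None],
})

# Select the qualifying columns by their 'Q_' prefix
q_cols = [c for c in demo.columns if c.startswith("Q_")]

# Teams for which all qualifying columns are NaN
missing_quali = demo.loc[demo[q_cols].isna().all(axis=1), "Team"].tolist()
print(missing_quali)  # ['Germany']
```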

In [None]:
final_data.to_csv('euros_data.csv', index=False)