## Hi There!

This is a personal project. I was keen to do some data analysis in Python and then attempt to predict the Euro 2024 scores.

My goals:
1. Predict at least 50% of the group stage winners
2. Have the predicted winner make it to at least the quarter-finals (QF)

### Extracting tables from the upcoming Euros

In [84]:
#Libraries
import pandas as pd
from string import ascii_uppercase as alphabet
import pickle
from bs4 import BeautifulSoup
import requests

#Import the wikipedia page:
all_tables = pd.read_html('https://en.wikipedia.org/wiki/UEFA_Euro_2024')

#Inspecting the candidate tables to find the six group tables:
all_tables[18]
all_tables[25]
all_tables[32]
all_tables[39]
all_tables[46]
all_tables[53]



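|   | Pos | Teamvte | Pld | W | D | L | GF | GA | GD | Pts | Qualification |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Turkey | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Advance to knockout stage |
| 1 | 2 | Georgia | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Advance to knockout stage |
| 2 | 3 | Portugal | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Possible knockout stage based on ranking |
| 3 | 4 | Czech Republic | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |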
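
Hard-coded positions such as 18 or 53 stop working if the page gains or loses a table. As a rough alternative, the sketch below picks out tables by their column names instead. The helper `find_standings_tables` and the constant `STANDINGS_COLUMNS` are hypothetical names, not part of the notebook, and the two small DataFrames are stand-ins for what `pd.read_html` returns. I have not checked this against the live page, where other standings-style tables (for example a ranking of third-placed teams) would probably match as well.

```python
import pandas as pd

# Hypothetical helper (not in the original notebook): keep only tables
# whose columns look like a group standings table.
STANDINGS_COLUMNS = {"Pos", "Pld", "W", "D", "L", "GF", "GA", "GD", "Pts"}


def find_standings_tables(tables):
    """Return the DataFrames whose columns include every standings column."""
    return [
        df for df in tables
        if STANDINGS_COLUMNS.issubset({str(col) for col in df.columns})
    ]


# Tiny stand-ins for the list that pd.read_html would return.
standings = pd.DataFrame({
    "Pos": [1], "Teamvte": ["Turkey"], "Pld": [0], "W": [0], "D": [0],
    "L": [0], "GF": [0], "GA": [0], "GD": [0], "Pts": [0],
})
other = pd.DataFrame({"Date": ["(placeholder)"], "Venue": ["(placeholder)"]})

group_tables = find_standings_tables([other, standings, other])
print(len(group_tables))
```

Filtering by columns trades one assumption (fixed positions) for another (stable column names), so it is still worth eyeballing the result before using it.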


In [85]:
#Create dictionary + assign the group letters to the groups
dict_table = {}
for letter, i in zip(alphabet, range(18,60, 7)):
    df = all_tables[i]
    #Rename from "Teamvte" to Team
    df.rename(columns={df.columns[1]: 'Team'}, inplace=True)
    #Remove the qual column from the group
    df.pop("Qualification")
    dict_table[f'Group {letter}'] = df
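
To see how the loop pairs letters with table positions, here is the same `zip` call on its own. `zip` stops at the shorter input, so only the first six letters are used.

```python
from string import ascii_uppercase as alphabet

# range(18, 60, 7) yields every 7th index starting at 18 and stopping before 60.
pairs = list(zip(alphabet, range(18, 60, 7)))
print(pairs)
# [('A', 18), ('B', 25), ('C', 32), ('D', 39), ('E', 46), ('F', 53)]
```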

In [86]:
dict_table['Group F']

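|   | Pos | Team | Pld | W | D | L | GF | GA | GD | Pts |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Turkey | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 2 | Georgia | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 3 | Portugal | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 4 | Czech Republic | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |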


In [87]:
#Use pickle to export the dictionary
with open('dict_table', 'wb') as output:
    pickle.dump(dict_table, output)
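
Reading the file back later mirrors the export. This is a minimal round-trip sketch that uses a plain dictionary as a stand-in for `dict_table` (the real values are DataFrames) and a temporary directory so it does not touch the project files.

```python
import os
import pickle
import tempfile

# Stand-in for dict_table; in the notebook each value is a DataFrame.
sample = {"Group F": ["Turkey", "Georgia", "Portugal", "Czech Republic"]}

path = os.path.join(tempfile.mkdtemp(), "dict_table")

# Write in binary mode, exactly as the export cell does.
with open(path, "wb") as output:
    pickle.dump(sample, output)

# Read in binary mode to get the same object structure back.
with open(path, "rb") as source:
    restored = pickle.load(source)

print(restored == sample)
```

Pickle files should only be loaded from sources you trust, since unpickling can run arbitrary code.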

### Extracting Football Matches

In [119]:
years = [1960, 1964, 1968, 1972, 1976, 1980, 1984, 1988, 1992, 1996, 2000, 2004, 2008, 2012, 2016, 2020]

def get_matches(year):
    """Scrape home team, score and away team for every match of one Euro."""
    web = f'https://en.wikipedia.org/wiki/UEFA_Euro_{year}'
    response = requests.get(web)
    content = response.text
    soup = BeautifulSoup(content, 'lxml')

    #Each match on the page sits in a div with the class "footballbox"
    matches = soup.find_all('div', class_='footballbox')

    home = []
    score = []
    away = []

    for match in matches:
        home.append(match.find('th', class_='fhome').get_text())
        score.append(match.find('th', class_='fscore').get_text())
        away.append(match.find('th', class_='faway').get_text())

    dict_football = {'home': home, 'score': score, 'away': away}
    df_football = pd.DataFrame(dict_football)
    df_football['year'] = year
    return df_football
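
The parsing step can be tried without hitting Wikipedia. The sketch below runs the same `find_all` / `find` lookups on a hand-written HTML snippet. `SAMPLE_HTML`, `parse_matches` and the team names are made up for illustration, the real page markup has more nesting, and `html.parser` is used here only so the example does not depend on `lxml`.

```python
from bs4 import BeautifulSoup

# Hand-written snippet imitating the structure the scraper relies on;
# the real Wikipedia markup has more nesting and attributes.
SAMPLE_HTML = """
<div class="footballbox">
  <table><tr>
    <th class="fhome">Team A </th>
    <th class="fscore">2–1 (a.e.t.)</th>
    <th class="faway"> Team B</th>
  </tr></table>
</div>
"""


def parse_matches(html):
    """Return one dict per footballbox div found in the HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for box in soup.find_all("div", class_="footballbox"):
        rows.append({
            "home": box.find("th", class_="fhome").get_text(),
            "score": box.find("th", class_="fscore").get_text(),
            "away": box.find("th", class_="faway").get_text(),
        })
    return rows


matches = parse_matches(SAMPLE_HTML)
print(matches)
```

The extracted names keep their surrounding whitespace, which is why a `strip()` pass is needed during cleaning.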

In [116]:
euro = [get_matches(year) for year in years]
df_euro = pd.concat(euro, ignore_index=True)
df_euro.to_csv('euro_historical_data.csv', index=False)

In [118]:
#Fixtures for 2024
df_fixture = get_matches(2024)
df_fixture.to_csv('euro_2024_fixtures.csv', index=False)

### Data Cleaning and Transformation

In [120]:
df_historical_data = pd.read_csv('euro_historical_data.csv')
df_fixture = pd.read_csv('euro_2024_fixtures.csv')

In [None]:
#Cleaning
df_fixture['home'] = df_fixture['home'].str.strip()
df_fixture['away'] = df_fixture['away'].str.strip()

df_historical_data['home'] = df_historical_data['home'].str.strip()
df_historical_data['away'] = df_historical_data['away'].str.strip()

#Keep only digits and the en dash, which removes suffixes such as "(a.e.t.)"
df_historical_data['score'] = df_historical_data['score'].str.replace(r'[^\d–]', '', regex=True)
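
To make the pattern concrete, here is the same character class applied with the standard `re` module. `clean_score` is a hypothetical helper name and the inputs are made-up examples. One caveat worth knowing: the pattern keeps the en dash (–) only, so a score written with a plain hyphen would lose its separator.

```python
import re

# Anything that is not a digit or an en dash gets removed.
NON_SCORE = re.compile(r"[^\d–]")


def clean_score(raw):
    """Strip everything except digits and the en dash from a score string."""
    return NON_SCORE.sub("", raw)


print(clean_score("2–1 (a.e.t.)"))  # 2–1
print(clean_score(" 0–0 "))         # 0–0
print(clean_score("2-1"))           # 21 (plain hyphen is stripped too)
```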

In [None]:
#Splitting the cleaned score into home and away goals
df_historical_data[['Home Goal', 'Away Goal']] = df_historical_data['score'].str.split('–', expand=True)

In [None]:
df_historical_data.drop('score', axis=1, inplace=True)

In [173]:
#Renaming Columns + Converting goals from object to int
df_historical_data.rename(columns={'home': 'Home Team', 'away': 'Away Team', 'year': 'Year'}, inplace=True)
df_historical_data = df_historical_data.astype({'Home Goal': int, 'Away Goal': int, 'Year': int})

In [176]:
#Creating new column for total goals
df_historical_data['Total Goals'] = df_historical_data['Home Goal'] + df_historical_data['Away Goal']
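
The integer conversion matters because the split produces strings, and adding two strings would concatenate them ("2" + "1" gives "21"). A small end-to-end sketch on made-up scores, assuming the goal columns are named `Home Goal` and `Away Goal`:

```python
import pandas as pd

# Made-up scores, already cleaned down to digits and an en dash.
df = pd.DataFrame({"score": ["2–1", "0–0", "3–3"]})

# Split on the en dash into two string columns.
df[["Home Goal", "Away Goal"]] = df["score"].str.split("–", expand=True)

# Convert to integers so that + means addition, not concatenation.
df = df.astype({"Home Goal": int, "Away Goal": int})
df["Total Goals"] = df["Home Goal"] + df["Away Goal"]

totals = df["Total Goals"].tolist()
print(totals)  # [3, 0, 6]
```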

### Exporting the Cleaned DF

In [178]:
df_historical_data.to_csv('cleaned_euro_historical_data.csv', index=False)

In [179]:
df_fixture.to_csv('cleaned_euro_2024_fixtures.csv', index=False)

### Time to build the model