## Modeling with a Dataframe

After webscraping all of the Euros data and formatting into a dataframe then csv in my [euros_scraping.ipynb](http://localhost:8888/files/Documents/Coding/Notebooks/repos/euros-WebScraping/euros_scraping.ipynb?_xsrf=2%7C8c3fecd8%7C02d37b9b4c7c7af0dd7bca94af94aa7f%7C1716421226) notebook, I will now attempt to perform Modeling and predictions

THe hope is I scraped the relevant data, having all results recorded from Euro's and Euro's qualifiers into the csv.

In this notebook, I will use pandas as the main method of modeling this data.

In [111]:
import pandas as pd

In [112]:
matches = pd.read_csv("euros1960_2024q.csv", index_col=0)
matches["mDate"] = pd.to_datetime(matches["mDate"], format="%d.%m.%y")
matches.loc[matches["mDate"].dt.year > 2024, "mDate"] -= pd.DateOffset(years=100) 

# necesssary to offset pre-2000 mDates being interpreted as 20%yy (10.05.59 as 1959)

In [113]:
matches["hTeam"].value_counts().head()

hTeam
Italy       84
Germany     84
Spain       83
Denmark     81
Portugal    80
Name: count, dtype: int64

In [114]:
matches["aTeam"].value_counts().head()

aTeam
Spain          88
Italy          79
Netherlands    77
Denmark        76
Portugal       76
Name: count, dtype: int64

### Next Steps 

Currently, a single National team's data is split between home and away sections. For example, when we scraped the page data, sometimes Italy was listed as the left team and sometimes it was listed as the right team.

For brevity, I selected to set Italy as the home team whenever it appeared on the left and designated it as the away team when it appeared on the right.

It will be better to reorganize the data, coalescing all of Italy's matches together, re-stiching the data from

> hTeam --> team <BR>
> hGoals --> gScored <BR>
> hPenalties --> pScored <BR>
> hResult --> result <BR>
> aTeam --> opponent <BR>
> aGoals --> gAllowed <BR>
> aPenalties --> pAllowed <BR>

This will be difficult and require duplicate row data <br>
(i.e Italy as home team, Denmark as away, recording same match data for both in their own "team" as opposite's "opponent" so we can focus on a Nation's individual performance)

Instead of manipulating our current dataframe, it may be best to update into a new dataframe

In [115]:
# comp | mType | team | opponent | result | goals | gAllowed | penalties | pAllowed | mDate
mInfo = {"team": ["filler"], "opponent": ["fller"],
        "result": ['f'], "goals": [1.1], "gAllowed": [1.1], 
        "penalties": [1.1], "pAllowed": [1.1], "mDate": [pd.NA], 
        "SoC": ["filler"], 'mType': ["filler"], 'comp': ["filler"],}
cMatches = pd.DataFrame(data=mInfo)
cMatches[["mDate"]] = cMatches[["mDate"]].astype("datetime64[ns]")
cMatches = cMatches.drop(index=0)
cMatches

Unnamed: 0,team,opponent,result,goals,gAllowed,penalties,pAllowed,mDate,SoC,mType,comp


In [116]:
mType_corrections = { "Group Stage" : "Group Stage",             
                    "Quarter-finals": "Quarter-finals",       
                    "Round of 16": "Round of 16",                
                    "Semi-finals": "Semi-finals",                
                    "Qualifying Round": "Qualifying Round",
                    "Third place play-off": "Third place play-off",      
                    "Finals": "Final",                     
                    "Quater-finals": "Quarter-finals",   
                    "Final":"Final",     
                    "Head to head round": "Head-To-Head",                             
                    "Head to Head": "Head-To-Head"}

SoC_corrections = { "Group Stage" : "Group Stage",             
                    "Quarter-finals": "Knockout Round",       
                    "Round of 16": "Knockout Round",                
                    "Semi-finals": "Knockout Round",                
                    "Qualifying Round": "Knockout Round",
                    "Third place play-off": "Knockout Round",      
                    "Finals": "Knockout Round",                     
                    "Quater-finals": "Knockout Round",   
                    "Final":"Knockout Round",     
                    "Head to head round": "Head-To-Head",                             
                    "Head to Head": "Head-To-Head"}          
           
for index, row in matches.iterrows():
    mType = mType_corrections[row["mType"]] 
    SoC = SoC_corrections[row["mType"]] 
    cMatches.loc[-1] = [row["hTeam"], row["aTeam"], row["hResult"], row["hGoals"], row["aGoals"], row["hPenalties"], row["aPenalties"], row["mDate"], SoC, mType, row["comp"], ]
    cMatches.index += 1
    cMatches.loc[-1] = [row["aTeam"], row["hTeam"], row["aResult"], row["aGoals"], row["hGoals"], row["aPenalties"], row["hPenalties"], row["mDate"], SoC, mType, row["comp"]]
    cMatches.index += 1
cMatches.index = list(range(len(cMatches["team"])))

In [117]:
cMatches.head()

Unnamed: 0,team,opponent,result,goals,gAllowed,penalties,pAllowed,mDate,SoC,mType,comp
0,Georgia,Greece,W,0,0,4.0,2.0,2024-03-26,Knockout Round,Final,"Euros 2024, Qualifiers"
1,Greece,Georgia,L,0,0,2.0,4.0,2024-03-26,Knockout Round,Final,"Euros 2024, Qualifiers"
2,Wales,Poland,L,0,0,4.0,5.0,2024-03-26,Knockout Round,Final,"Euros 2024, Qualifiers"
3,Poland,Wales,W,0,0,5.0,4.0,2024-03-26,Knockout Round,Final,"Euros 2024, Qualifiers"
4,Ukraine,Iceland,W,2,1,,,2024-03-26,Knockout Round,Final,"Euros 2024, Qualifiers"


In [118]:
cMatches.tail()

Unnamed: 0,team,opponent,result,goals,gAllowed,penalties,pAllowed,mDate,SoC,mType,comp
6089,Denmark,ČSSR,L,1,5,,,1959-10-18,Knockout Round,Round of 16,Euros 1960
6090,Ireland,ČSSR,W,2,0,,,1959-04-05,Knockout Round,Qualifying Round,Euros 1960
6091,ČSSR,Ireland,L,0,2,,,1959-04-05,Knockout Round,Qualifying Round,Euros 1960
6092,ČSSR,Ireland,W,4,0,,,1959-05-10,Knockout Round,Qualifying Round,Euros 1960
6093,Ireland,ČSSR,L,0,4,,,1959-05-10,Knockout Round,Qualifying Round,Euros 1960


In [119]:
cMatches["mType"].value_counts()

mType
Group Stage             5590
Quarter-finals           136
Round of 16               96
Semi-finals               88
Head-To-Head              70
Qualifying Round          54
Final                     48
Third place play-off      12
Name: count, dtype: int64

In [120]:
cMatches["SoC"].value_counts()

SoC
Group Stage       5590
Knockout Round     434
Head-To-Head        70
Name: count, dtype: int64

## Setting category predictors

Below, we are adding additional columns as metrics for predicting a match. A team may perform differently based on the match type of the competition and their competitor 

The target column is if a team achieved its desired result, 

> a "W" was desired, equating to a 3 <br>
> a "D" equating to a 1 <br>
> a "L" equating to a 0

In [121]:
cMatches["team"] = cMatches["team"].astype("category")
cMatches["opponent"] = cMatches["opponent"].astype("category")
cMatches["SoC"] = cMatches["SoC"].astype("category")
cMatches["mType"] = cMatches["mType"].astype("category")

dT_keys = cMatches["team"].to_list() 
dO_keys = cMatches["opponent"].to_list()
dS_keys = cMatches["SoC"].to_list()
dM_keys = cMatches["mType"].to_list()

dT_values = cMatches["team"].cat.codes
dO_values = cMatches["opponent"].cat.codes
dS_values = cMatches["SoC"].cat.codes
dM_values = cMatches["mType"].cat.codes

d_Teams = dict(zip(dT_keys, dT_values)) # functionally same dict as d_opp, maybe overlap 
d_Opp = dict(zip(dO_keys, dO_values)) 
d_SoC = dict(zip(dS_keys, dS_values))
d_Matches = dict(zip(dM_keys, dM_values))

cMatches["tCode"] = cMatches["team"].cat.codes
cMatches["oppCode"] = cMatches["opponent"].cat.codes
cMatches["socCode"] = cMatches["SoC"].cat.codes
cMatches["mCode"] = cMatches["mType"].cat.codes

In [122]:
cMatches.loc[(cMatches["team"] == "Georgia") | (cMatches["opponent"] == "Georgia")].head()
    # dataframe.loc with multiple conditionals prefers 
    # () to isolate each coniditional and 
    # uses & for conditional and
    # uses | for conditional or 

Unnamed: 0,team,opponent,result,goals,gAllowed,penalties,pAllowed,mDate,SoC,mType,comp,tCode,oppCode,socCode,mCode
0,Georgia,Greece,W,0,0,4.0,2.0,2024-03-26,Knockout Round,Final,"Euros 2024, Qualifiers",19,22,2,0
1,Greece,Georgia,L,0,0,2.0,4.0,2024-03-26,Knockout Round,Final,"Euros 2024, Qualifiers",22,19,2,0
6,Georgia,Luxembourg,W,2,0,,,2024-03-21,Knockout Round,Semi-finals,"Euros 2024, Qualifiers",19,33,2,6
7,Luxembourg,Georgia,L,0,2,,,2024-03-21,Knockout Round,Semi-finals,"Euros 2024, Qualifiers",33,19,2,6
22,Georgia,Norway,D,1,1,,,2023-03-28,Group Stage,Group Stage,"Euros 2024, Qualifiers",19,40,0,1


In [123]:
points = {"W": 3, "D": 1, "L": 0} 
pointsR = {3: "W", 1: "D", 0: "L"}

cMatches["target"] = [ points[row["result"]] for index, row in cMatches.iterrows()]
cMatches["result"].value_counts()

result
W    2460
L    2460
D    1174
Name: count, dtype: int64

## Model 
With a RandomForestClassifer, we can run simulations with our category codes non-linearity,

We then split from a "trained" and "test" dataframe 

> A Train dataframe to . . . train our rf model on our selected predictors (opponent and match type) to see what the expected result (target) would be in our test model <br>
> A Test data frame to see how our predictions actually compare with that of the outcome 

In [124]:
from sklearn.ensemble import RandomForestClassifier

# associates non-linearity 
# i.e oppCode 22 doesn't imply numerical difference between oppCode 23
# just categorical differences

rf = RandomForestClassifier(n_estimators=100, min_samples_split=10, random_state=42)

train_set = cMatches[cMatches["mDate"] < "2000-01-01"] 
test_set = cMatches[cMatches["mDate"] > "2000-01-01"]

predictors = ["tCode", "oppCode", "socCode", "mCode"]

rf.fit(train_set[predictors], train_set["target"])
# runs our forest model on known oppCode mCode combos, then we provide the result that came out of it "target"

preds = rf.predict(test_set[predictors])
# then with our test set, we produce a precition column on what result would be given oppCode and mCode

## Comparisons

With our trained rf, we then create predictions based on the same metrics for the test data set

Using the accuracy_score, we're able to see how often our trained model guessed correctlty

In [125]:
from sklearn.metrics import accuracy_score

# measures our predictions 
acc = accuracy_score(test_set["target"], preds)
acc

# compares the likeness of each prediction to the actual result (target) and reports our accuracy

0.5600797266514806

In [126]:
combined = pd.DataFrame(dict(actual=test_set["target"], prediction=preds))
pd.crosstab(index=combined["actual"], columns=combined["prediction"])

prediction,0,1,3
actual,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
0,956,133,358
1,274,72,272
3,352,156,939


In [127]:
from sklearn.metrics import precision_score

"""
Precision Score then defined as 

    ps = tp / (tp + fp) 

    tp = true positive 
    fp = false positive 

Essentially reports the score "how often did we correctly call a winning result
"""

precision_score(test_set["target"], preds, average=None)

array([0.60429836, 0.19944598, 0.59847036])

## Predictions 

With our model, we will now predict the results of Group Stage Fixtures with our training model and move on from there until a winner is decided

Here's what happening below 

1. Web-scraping for all the Group Stages in the Euros 2024
2. Isolating to just the fixtures
3. Cleaning up given fixture table into our new tables
4. Repeating the data twice per table (with swapped team and opponent)

Now en_tables should be an array of our tables cleaned up with doubled match data


In [128]:
import requests
import io 
from bs4 import BeautifulSoup

e_2024 = "https://web.archive.org/web/20240516182458/https://terrikon.com/en/euro-2024"
e_2024_page = requests.get(e_2024)
e_html = e_2024_page.text

e_tables = pd.read_html(io.StringIO(e_html))
group_tables = [table for table in e_tables if table.shape[0] == 4]
group_tables = [table.drop(["Unnamed: 0", "G", "S", "M"], axis=1) for table in group_tables]
group_tables = [table.rename(columns={"Unnamed: 1": "Team"}) for table in group_tables]
lineup_tables = [table for table in e_tables if table.shape[0] == 6]
en_tables = [] 

mInfo = {"team": ["filler"], "opponent": ["fller"],  "mDate": [pd.NA], 
        'mType': ["filler"], "SoC": ["filler"], 'comp': ["filler"], "tCode": "filler", "oppCode": ["filler"], "mCode": ["filler"], "socCode": ["filler"]}
for table in lineup_tables:
    f_table = pd.DataFrame(data=mInfo)
    f_table = f_table.drop(index=0)
    table[5] = table[5].apply(lambda x: x.split(" ")[0])
    table[5] = pd.to_datetime(table[5], format="%d.%m.%y")
    for index, row in table.iterrows():
        f_table.loc[-1] = [row[1], row[3], row[5], "Group Stage", "Group Stage"," Euros 2024", d_Teams[row[1]], d_Opp[row[3]], d_Matches["Group Stage"], d_SoC["Group Stage"]]
        f_table.index += 1
        f_table.loc[-1] = [row[3], row[1], row[5], "Group Stage", "Group Stage", "Euros 2024", d_Teams[row[3]], d_Opp[row[1]], d_Matches["Group Stage"], d_SoC["Group Stage"]]
        f_table.index += 1
    f_table.index = range(f_table.shape[0])
    en_tables.append(f_table)

## Predicting Group Stage 

Per table, we are now going to produce a prediction column to see who the result of the match.

Should the same-matchup produce different results, we will chalk it up as a tie



In [129]:
for table in group_tables:
  table["W"] = [0 for num in range(table.shape[0])]
  table["D"] = [0 for num in range(table.shape[0])]
  table["L"] = [0 for num in range(table.shape[0])] 
  table["Pts"] = [0 for num in range(table.shape[0])]

# needed as matches are being played

In [130]:
group_tables[0]

Unnamed: 0,Team,W,D,L,-,Pts
0,Switzerland,0,0,0,-,0
1,Hungary,0,0,0,-,0
2,Scotland,0,0,0,-,0
3,Germany,0,0,0,-,0


In [131]:
en_tables[0]

Unnamed: 0,team,opponent,mDate,mType,SoC,comp,tCode,oppCode,mCode,socCode
0,Germany,Scotland,2024-06-14,Group Stage,Group Stage,Euros 2024,20,46,1,0
1,Scotland,Germany,2024-06-14,Group Stage,Group Stage,Euros 2024,46,20,1,0
2,Hungary,Switzerland,2024-06-15,Group Stage,Group Stage,Euros 2024,23,52,1,0
3,Switzerland,Hungary,2024-06-15,Group Stage,Group Stage,Euros 2024,52,23,1,0
4,Germany,Hungary,2024-06-19,Group Stage,Group Stage,Euros 2024,20,23,1,0
5,Hungary,Germany,2024-06-19,Group Stage,Group Stage,Euros 2024,23,20,1,0
6,Scotland,Switzerland,2024-06-19,Group Stage,Group Stage,Euros 2024,46,52,1,0
7,Switzerland,Scotland,2024-06-19,Group Stage,Group Stage,Euros 2024,52,46,1,0
8,Switzerland,Germany,2024-06-23,Group Stage,Group Stage,Euros 2024,52,20,1,0
9,Germany,Switzerland,2024-06-23,Group Stage,Group Stage,Euros 2024,20,52,1,0


### Making a prediction 

In [132]:
def run_prediction(uMatches, kMatches, metrics): 
    comp_set = kMatches[cMatches["mDate"] > "1980-01-01"] 
    training_set = kMatches # add conditional here if desired to select the training set
    forest = RandomForestClassifier(n_estimators=100, min_samples_split=10, random_state=42) 

    forest.fit(training_set[metrics], training_set["target"])

    cPreds = forest.predict(comp_set[metrics])
    uPreds = forest.predict(uMatches[metrics])

    ps = precision_score(comp_set["target"], cPreds, average=None)
    ac = accuracy_score(comp_set["target"], cPreds)
    
    return uPreds, ps, ac  

def decide_result(resultA, resultB): 
    match resultA:
        case "W":
            match resultB:
                case "W": return "D", "D"
                case "D": return "W", "L"
                case "L": return "W", "L"
        case "D":
            match resultB:
                case "W": return "L", "W"
                case "D": return "D", "D"
                case "L": return "W", "L"
        case "L":
            match resultB:
                case "W": return "L", "W"
                case "D": return "L", "W"
                case "L": return "D", "D"

In [133]:
for g, group in enumerate(en_tables): 
    preds, confidence, acc = run_prediction(group, cMatches, ["tCode", "oppCode", "mCode", "socCode"])
    group["prediction"] = [ pointsR[pred] for pred in preds]
    print(preds, confidence, acc)
    corrected_results = []
    for index in range(group.shape[0]):
        if index % 2 == 0: 
            team_a = group.loc[index]["team"]
            team_b = group.loc[index]["opponent"]
            result_a = group.loc[index]["prediction"] 
            result_b = group.loc[index+1]["prediction"]
            result, result_opp = decide_result(result_a, result_b)

            corrected_results.append(result)
            corrected_results.append(result_opp)

            i_a= group_tables[g].loc[group_tables[g]["Team"] == team_a].index
            i_b= group_tables[g].loc[group_tables[g]["Team"] == team_b].index
            
            group_tables[g].loc[i_a, result] += 1
            group_tables[g].loc[i_a, "Pts"] += points[result]

            group_tables[g].loc[i_b, result_opp] += 1
            group_tables[g].loc[i_b, "Pts"] += points[result_opp]
    group["correctedPred"] = corrected_results
    group_tables[g] = group_tables[g].sort_values(by="Pts", ascending=False)

[3 0 0 3 1 1 0 3 3 3 0 3] [0.74037213 0.62586605 0.73758562] 0.7293307086614174
[3 3 3 0 1 1 1 1 0 3 1 1] [0.74037213 0.62586605 0.73758562] 0.7293307086614174
[0 3 0 3 3 0 0 3 3 0 3 0] [0.74037213 0.62586605 0.73758562] 0.7293307086614174
[3 0 0 3 3 0 3 0 3 0 1 0] [0.74037213 0.62586605 0.73758562] 0.7293307086614174
[3 1 3 3 1 3 0 3 0 3 3 0] [0.74037213 0.62586605 0.73758562] 0.7293307086614174
[3 0 3 0 3 0 0 3 0 3 0 3] [0.74037213 0.62586605 0.73758562] 0.7293307086614174


In [134]:
for table in group_tables:
    table.index = range(table.shape[0])

In [135]:
group_tables[0]

Unnamed: 0,Team,W,D,L,-,Pts
0,Switzerland,2,1,0,-,7
1,Germany,1,2,0,-,5
2,Hungary,1,1,1,-,4
3,Scotland,0,0,3,-,0


In [136]:
en_tables[0]

Unnamed: 0,team,opponent,mDate,mType,SoC,comp,tCode,oppCode,mCode,socCode,prediction,correctedPred
0,Germany,Scotland,2024-06-14,Group Stage,Group Stage,Euros 2024,20,46,1,0,W,W
1,Scotland,Germany,2024-06-14,Group Stage,Group Stage,Euros 2024,46,20,1,0,L,L
2,Hungary,Switzerland,2024-06-15,Group Stage,Group Stage,Euros 2024,23,52,1,0,L,L
3,Switzerland,Hungary,2024-06-15,Group Stage,Group Stage,Euros 2024,52,23,1,0,W,W
4,Germany,Hungary,2024-06-19,Group Stage,Group Stage,Euros 2024,20,23,1,0,D,D
5,Hungary,Germany,2024-06-19,Group Stage,Group Stage,Euros 2024,23,20,1,0,D,D
6,Scotland,Switzerland,2024-06-19,Group Stage,Group Stage,Euros 2024,46,52,1,0,L,L
7,Switzerland,Scotland,2024-06-19,Group Stage,Group Stage,Euros 2024,52,46,1,0,W,W
8,Switzerland,Germany,2024-06-23,Group Stage,Group Stage,Euros 2024,52,20,1,0,W,D
9,Germany,Switzerland,2024-06-23,Group Stage,Group Stage,Euros 2024,20,52,1,0,W,D


In [137]:
groups = ["A", "B", "C", "D", "E", "F"] 

In [138]:
for t in range(len(group_tables)): 
    group_tables[t].to_csv(f"./predictions/group_preds/group_{groups[t]}.csv")

## Predicting Knockout Stage

The prediction algorithm is all set, the next steps is formatting a table for the rest of the tournament with the knockout stage, including 

> Round of 16 <br>
> Quarter-Finals <br>
> Semi-Finals <br>
> Finals

Luckily, there is a posted [bracket page](https://www.uefa.com/euro2024/fixtures-results/bracket/) so we can organize the matches by team better closer to what the tournmanent may fold out

BUT Eufa doesn't like page requests from scripts, so I will be using the correspodning [Wiki page](https://en.wikipedia.org/wiki/UEFA_Euro_2024#Knockout_stage) with the same info


In [139]:
import requests
wiki24_url = "https://web.archive.org/web/20240613021538/https://en.wikipedia.org/wiki/UEFA_Euro_2024#Knockout_stage"
wiki24_page = requests.get(wiki24_url)
wiki24_soup = BeautifulSoup(wiki24_page.text)
wiki24_tables = pd.read_html(io.StringIO(wiki24_page.text))

In [140]:
ko_lineups = wiki24_soup.find_all("div", class_="footballbox")

ro16_lineups = ko_lineups[36:44]
qf_lineups = ko_lineups[44:48]
sf_lineups = ko_lineups[48:50]
f_lineup = ko_lineups[50]

In [141]:
ro16_matches = pd.DataFrame(data=mInfo)
ro16_matches.insert(0, "mNumber", [0])
ro16_matches = ro16_matches.drop(index=0)

In [142]:
sel_plac = {"Winner": 0, "Runner-Up": 1, "3rd": 2} 
    # for indexing the group 

In [143]:
import random
third_round_selects = [] 
for lineup in ro16_lineups: 
    mDate = lineup.find("div", class_="fdate").find("span", class_="bday").text
    parA = lineup.find("th", class_="fhome").span.text.split(" ")
    mNum = int(lineup.find("th", class_="fscore").a.text[-2:])
    parB = lineup.find("th", class_="faway").span.text.split(" ")

    teamA = "err"
    teamB = "err"
    
    if "Winner" in parA:
        teamA = group_tables[groups.index(parA[2])].loc[0, "Team"]  
    elif "Runner-up" in parA:
        teamA = group_tables[groups.index(parA[2])].loc[1, "Team"]
        
    if "Winner" in parB:
        teamB = group_tables[groups.index(parB[2])].loc[0, "Team"]  
    elif "Runner-up" in parB:
        teamB = group_tables[groups.index(parB[2])].loc[1, "Team"]  
    elif "3rd" in parB:
        options = parB[2].split("/")
        random.shuffle(options)
        for option in options: 
            holder = group_tables[groups.index(option)].loc[2, "Team"]
            if holder not in third_round_selects:
                teamB = holder
                third_round_selects.append(teamB)
                break

    ro16_matches.loc[-1] = [mNum, teamA, teamB, mDate, "Round of 16", "Knockout Round", "Euros 2024", d_Teams[teamA], d_Opp[teamB], d_Matches["Round of 16"], d_SoC["Knockout Round"]]
    ro16_matches.index += 1
    ro16_matches.loc[-1] = [mNum, teamB, teamA, mDate, "Round of 16", "Knockout Round", "Euros 2024", d_Teams[teamB], d_Opp[teamA], d_Matches["Round of 16"], d_SoC["Knockout Round"]]
    ro16_matches.index += 1

In [144]:
ro16_matches.head()

Unnamed: 0,mNumber,team,opponent,mDate,mType,SoC,comp,tCode,oppCode,mCode,socCode
15,38,Germany,Spain,2024-06-29,Round of 16,Knockout Round,Euros 2024,20,50,5,2
14,38,Spain,Germany,2024-06-29,Round of 16,Knockout Round,Euros 2024,50,20,5,2
13,37,Switzerland,Denmark,2024-06-29,Round of 16,Knockout Round,Euros 2024,52,12,5,2
12,37,Denmark,Switzerland,2024-06-29,Round of 16,Knockout Round,Euros 2024,12,52,5,2
11,40,England,Georgia,2024-06-30,Round of 16,Knockout Round,Euros 2024,13,19,5,2


In [145]:
ro16_preds, ro16_ps, ro16_acc = run_prediction(ro16_matches, cMatches, ["tCode", "oppCode", "mCode", "socCode"])
print(ro16_preds, ro16_ps, ro16_acc)
ro16_matches["pred"] = [ pointsR[pred] for pred in ro16_preds]

[3 3 3 0 0 0 3 0 0 3 3 0 0 3 0 0] [0.74037213 0.62586605 0.73758562] 0.7293307086614174


In [146]:
def correct_draws(ko):
    for r in range(ko.shape[0]):
        if r % 2 == 0:
            res_a = ko.loc[r, "pred"]
            res_b = ko.loc[r+1, "pred"]
            if res_a == res_b: 
                teamA = ko.loc[r, "team"]
                teamB = ko.loc[r+1, "team"]
                mCode = ko.loc[r, "mCode"]
                socCode = ko.loc[r, "socCode"]
                
                res_a, res_b = head_to_head(teamA, teamB, mCode, socCode)
                ko.loc[r, "pred"] = res_a 
                ko.loc[r+1, "pred"] = res_b

def head_to_head(teamA, teamB, mCode, socCode): 
    dInfo = {"team": ["filler"], "opponent": ["fller"], "tCode": "filler", "oppCode": ["filler"], "mCode": ["filler"], "socCode": ["filler"]}
    dummy = pd.DataFrame(dInfo) 
    dummy = dummy.drop(index=0) 

    for i in range(6):
        dummy.loc[-1] = [teamA, teamB, d_Teams[teamA], d_Opp[teamB], d_Matches["Head-To-Head"], socCode]
        dummy.index+= 1
        dummy.loc[-1] = [teamB, teamA, d_Teams[teamB], d_Opp[teamA], d_Matches["Head-To-Head"], socCode]
        dummy.index+= 1
    dummy.loc[-1] = [teamA, teamB, d_Teams[teamA], d_Opp[teamB], mCode, socCode]
    dummy.index+= 1
    dummy.loc[-1] = [teamB, teamA, d_Teams[teamB], d_Opp[teamA], mCode, socCode]
    dummy.index+= 1

    dPreds, d_ps, d_acc = run_prediction(dummy, cMatches, ["tCode", "oppCode", "mCode", "socCode"])
    dummy["pred"] = [pointsR[pred] for pred in dPreds]

    a_wins = dummy.loc[(dummy["team"] == teamA) & (dummy["pred"] == "W")].shape[0]
    b_wins = dummy.loc[(dummy["team"] == teamB) & (dummy["pred"] == "W")].shape[0]

    a_res = "err"
    b_res = "err" 

    if a_wins > b_wins: 
        a_res = "W" 
        b_res = "L"
    elif a_wins < b_wins: 
        a_res = "L" 
        b_res = "W"
    else: 
        print("ERR. head to head produced more draws") 

    return a_res, b_res 


In [147]:
ro16_matches.index = range(ro16_matches.shape[0])
correct_draws(ro16_matches)
ro16_matches

Unnamed: 0,mNumber,team,opponent,mDate,mType,SoC,comp,tCode,oppCode,mCode,socCode,pred
0,38,Germany,Spain,2024-06-29,Round of 16,Knockout Round,Euros 2024,20,50,5,2,W
1,38,Spain,Germany,2024-06-29,Round of 16,Knockout Round,Euros 2024,50,20,5,2,L
2,37,Switzerland,Denmark,2024-06-29,Round of 16,Knockout Round,Euros 2024,52,12,5,2,W
3,37,Denmark,Switzerland,2024-06-29,Round of 16,Knockout Round,Euros 2024,12,52,5,2,L
4,40,England,Georgia,2024-06-30,Round of 16,Knockout Round,Euros 2024,13,19,5,2,L
5,40,Georgia,England,2024-06-30,Round of 16,Knockout Round,Euros 2024,19,13,5,2,W
6,39,Italy,Netherlands,2024-06-30,Round of 16,Knockout Round,Euros 2024,27,38,5,2,W
7,39,Netherlands,Italy,2024-06-30,Round of 16,Knockout Round,Euros 2024,38,27,5,2,L
8,42,France,Ukraine,2024-07-01,Round of 16,Knockout Round,Euros 2024,17,55,5,2,L
9,42,Ukraine,France,2024-07-01,Round of 16,Knockout Round,Euros 2024,55,17,5,2,W


### Predciting Quarter-finals



In [148]:
qf_matches = pd.DataFrame(data=mInfo)
qf_matches.insert(0, "mNumber", [0])
qf_matches = qf_matches.drop(index=0)

In [149]:
for lineup in qf_lineups: 
    mDate = lineup.find("div", class_="fdate").find("span", class_="bday").text
    parA = int(lineup.find("th", class_="fhome").text[-2:])
    mNum = int(lineup.find("th", class_="fscore").a.text[-2:])
    parB = int(lineup.find("th", class_="faway").text[-2:])
    
    teamA = ro16_matches.loc[(ro16_matches["mNumber"] == parA) & (ro16_matches["pred"] == "W")]["team"].to_string().split(" ")[-1]
    teamB = ro16_matches.loc[(ro16_matches["mNumber"] == parB) & (ro16_matches["pred"] == "W")]["team"].to_string().split(" ")[-1]

    qf_matches.loc[-1] = [mNum, teamA, teamB, mDate, "Quarter-finals", "Knockout Round", "Euros 2024", d_Teams[teamA], d_Opp[teamB], d_Matches["Quarter-finals"], d_SoC["Knockout Round"]]
    qf_matches.index += 1
    qf_matches.loc[-1] = [mNum, teamB, teamA, mDate, "Quarter-finals", "Knockout Round", "Euros 2024", d_Teams[teamB], d_Opp[teamA], d_Matches["Quarter-finals"], d_SoC["Knockout Round"]]
    qf_matches.index += 1

In [150]:
qf_matches

Unnamed: 0,mNumber,team,opponent,mDate,mType,SoC,comp,tCode,oppCode,mCode,socCode
7,45,Italy,Switzerland,2024-07-05,Quarter-finals,Knockout Round,Euros 2024,27,52,4,2
6,45,Switzerland,Italy,2024-07-05,Quarter-finals,Knockout Round,Euros 2024,52,27,4,2
5,46,Portugal,Ukraine,2024-07-05,Quarter-finals,Knockout Round,Euros 2024,42,55,4,2
4,46,Ukraine,Portugal,2024-07-05,Quarter-finals,Knockout Round,Euros 2024,55,42,4,2
3,48,Georgia,Germany,2024-07-06,Quarter-finals,Knockout Round,Euros 2024,19,20,4,2
2,48,Germany,Georgia,2024-07-06,Quarter-finals,Knockout Round,Euros 2024,20,19,4,2
1,47,Slovenia,Poland,2024-07-06,Quarter-finals,Knockout Round,Euros 2024,49,41,4,2
0,47,Poland,Slovenia,2024-07-06,Quarter-finals,Knockout Round,Euros 2024,41,49,4,2


In [151]:
qf_preds, qf_ps, qf_acc = run_prediction(qf_matches, cMatches, ["tCode", "oppCode", "mCode", "socCode"])
print(ro16_preds, qf_ps, qf_acc)
qf_matches["pred"] = [ pointsR[pred] for pred in qf_preds]

[3 3 3 0 0 0 3 0 0 3 3 0 0 3 0 0] [0.74037213 0.62586605 0.73758562] 0.7293307086614174


In [152]:
qf_matches.index = range(qf_matches.shape[0])
correct_draws(qf_matches)
qf_matches

Unnamed: 0,mNumber,team,opponent,mDate,mType,SoC,comp,tCode,oppCode,mCode,socCode,pred
0,45,Italy,Switzerland,2024-07-05,Quarter-finals,Knockout Round,Euros 2024,27,52,4,2,W
1,45,Switzerland,Italy,2024-07-05,Quarter-finals,Knockout Round,Euros 2024,52,27,4,2,L
2,46,Portugal,Ukraine,2024-07-05,Quarter-finals,Knockout Round,Euros 2024,42,55,4,2,W
3,46,Ukraine,Portugal,2024-07-05,Quarter-finals,Knockout Round,Euros 2024,55,42,4,2,L
4,48,Georgia,Germany,2024-07-06,Quarter-finals,Knockout Round,Euros 2024,19,20,4,2,L
5,48,Germany,Georgia,2024-07-06,Quarter-finals,Knockout Round,Euros 2024,20,19,4,2,W
6,47,Slovenia,Poland,2024-07-06,Quarter-finals,Knockout Round,Euros 2024,49,41,4,2,W
7,47,Poland,Slovenia,2024-07-06,Quarter-finals,Knockout Round,Euros 2024,41,49,4,2,L


### Predicting the Semi's 

In [153]:
sf_matches = pd.DataFrame(data=mInfo)
sf_matches.insert(0, "mNumber", [0])
sf_matches = sf_matches.drop(index=0)

In [154]:
for lineup in sf_lineups: 
    mDate = lineup.find("div", class_="fdate").find("span", class_="bday").text
    parA = int(lineup.find("th", class_="fhome").text[-2:])
    mNum = int(lineup.find("th", class_="fscore").a.text[-2:])
    parB = int(lineup.find("th", class_="faway").text[-2:])
    
    teamA = qf_matches.loc[(qf_matches["mNumber"] == parA) & (qf_matches["pred"] == "W")]["team"].to_string().split(" ")[-1]
    teamB = qf_matches.loc[(qf_matches["mNumber"] == parB) & (qf_matches["pred"] == "W")]["team"].to_string().split(" ")[-1]

    sf_matches.loc[-1] = [mNum, teamA, teamB, mDate, "Semi-finals", "Knockout Round", "Euros 2024", d_Teams[teamA], d_Opp[teamB], d_Matches["Semi-finals"], d_SoC["Knockout Round"]]
    sf_matches.index += 1
    sf_matches.loc[-1] = [mNum, teamB, teamA, mDate, "Semi-finals", "Knockout Round", "Euros 2024", d_Teams[teamB], d_Opp[teamA], d_Matches["Semi-finals"], d_SoC["Knockout Round"]]
    sf_matches.index += 1

In [155]:
sf_matches

Unnamed: 0,mNumber,team,opponent,mDate,mType,SoC,comp,tCode,oppCode,mCode,socCode
3,49,Italy,Portugal,2024-07-09,Semi-finals,Knockout Round,Euros 2024,27,42,6,2
2,49,Portugal,Italy,2024-07-09,Semi-finals,Knockout Round,Euros 2024,42,27,6,2
1,50,Slovenia,Germany,2024-07-10,Semi-finals,Knockout Round,Euros 2024,49,20,6,2
0,50,Germany,Slovenia,2024-07-10,Semi-finals,Knockout Round,Euros 2024,20,49,6,2


In [156]:
sf_preds, sf_ps, sf_acc = run_prediction(sf_matches, cMatches, ["tCode", "oppCode", "mCode", "socCode"])
print(sf_preds, sf_ps, sf_acc)
sf_matches["pred"] = [pointsR[pred] for pred in sf_preds]

[3 0 0 0] [0.74037213 0.62586605 0.73758562] 0.7293307086614174


In [157]:
sf_matches.index = range(sf_matches.shape[0])
correct_draws(sf_matches)
sf_matches

Unnamed: 0,mNumber,team,opponent,mDate,mType,SoC,comp,tCode,oppCode,mCode,socCode,pred
0,49,Italy,Portugal,2024-07-09,Semi-finals,Knockout Round,Euros 2024,27,42,6,2,W
1,49,Portugal,Italy,2024-07-09,Semi-finals,Knockout Round,Euros 2024,42,27,6,2,L
2,50,Slovenia,Germany,2024-07-10,Semi-finals,Knockout Round,Euros 2024,49,20,6,2,L
3,50,Germany,Slovenia,2024-07-10,Semi-finals,Knockout Round,Euros 2024,20,49,6,2,W


In [158]:
f_match = pd.DataFrame(data=mInfo)
f_match.insert(0, "mNumber", [0])
f_match = f_match.drop(index = 0)

In [159]:
fDate = f_lineup.find("div", class_="fdate").find("span", class_="bday").text
fparA = int(f_lineup.find("th", class_="fhome").text[-2:])
fmNum = int(f_lineup.find("th", class_="fscore").a.text[-2:])
fparB = int(f_lineup.find("th", class_="faway").text[-2:])

finalistA = sf_matches.loc[(sf_matches["mNumber"] == fparA) & (sf_matches["pred"] == "W")]["team"].to_string().split(" ")[-1]
finalistB = sf_matches.loc[(sf_matches["mNumber"] == fparB) & (sf_matches["pred"] == "W")]["team"].to_string().split(" ")[-1]

f_match.loc[-1] = [fmNum, finalistA, finalistB, fDate, "Final", "Knockout Round", "Euros 2024", d_Teams[finalistA], d_Opp[finalistB], d_Matches["Final"], d_SoC["Knockout Round"]]
f_match.index += 1
f_match.loc[-1] = [fmNum, finalistB, finalistA, fDate, "Final", "Knockout Round", "Euros 2024", d_Teams[finalistB], d_Opp[finalistA], d_Matches["Final"], d_SoC["Knockout Round"]]
f_match.index += 1

In [160]:
f_match

Unnamed: 0,mNumber,team,opponent,mDate,mType,SoC,comp,tCode,oppCode,mCode,socCode
1,51,Italy,Germany,2024-07-14,Final,Knockout Round,Euros 2024,27,20,0,2
0,51,Germany,Italy,2024-07-14,Final,Knockout Round,Euros 2024,20,27,0,2


In [161]:
f_pred, f_ps, f_acc = run_prediction(f_match, cMatches, ["tCode", "oppCode", "mCode", "socCode"])
print(f_pred, f_ps, f_acc)
f_match["pred"] = [pointsR[pred] for pred in f_pred]

[3 3] [0.74037213 0.62586605 0.73758562] 0.7293307086614174


In [162]:
f_match.index = range(f_match.shape[0])
correct_draws(f_match)
f_match

Unnamed: 0,mNumber,team,opponent,mDate,mType,SoC,comp,tCode,oppCode,mCode,socCode,pred
0,51,Italy,Germany,2024-07-14,Final,Knockout Round,Euros 2024,27,20,0,2,L
1,51,Germany,Italy,2024-07-14,Final,Knockout Round,Euros 2024,20,27,0,2,W


## The Euros 2024 Reported 

Now, all that's left is to formatt them into a pretty csv 

In [163]:
dummytable = {"stage": ["filler"],
              "team_a": ["filler"], 
              "res_a": ["filler"],
              "res_b": ["filler"],
              "team_b": ["filler"],
              "date": [pd.NA]} 

euros_2024 = pd.DataFrame(data=dummytable) 
euros_2024 = euros_2024.drop(index=0)

## group stage
for g, table in enumerate(en_tables): 
    for index in range(table.shape[0]):
        if index % 2 == 0: 
            euros_2024.loc[-1] = [f"Group {groups[g]}", table.loc[index, "team"], table.loc[index, "correctedPred"], table.loc[index+1, "correctedPred"], table.loc[index, "opponent"], table.loc[index,"mDate"]]
            euros_2024.index += 1
        index += 1

## ro16 
for r, row in ro16_matches.iterrows(): 
    if r % 2 == 0: 
        euros_2024.loc[-1] = ["Round of 16", row["team"], row["pred"], ro16_matches.loc[r+1, "pred"], row["opponent"], row["mDate"]]
        euros_2024.index += 1
## qf
for r, row in qf_matches.iterrows(): 
    if r % 2 == 0: 
        euros_2024.loc[-1] = ["Quarter-finals", row["team"], row["pred"], qf_matches.loc[r+1, "pred"], row["opponent"], row["mDate"]]
        euros_2024.index += 1

## sf
for r, row in sf_matches.iterrows(): 
    if r % 2 == 0: 
        euros_2024.loc[-1] = ["Semi-finals", row["team"], row["pred"], sf_matches.loc[r+1, "pred"], row["opponent"], row["mDate"]]
        euros_2024.index += 1

## f
for r, row in f_match.iterrows(): 
    if r % 2 == 0: 
        euros_2024.loc[-1] = ["Final", row["team"], row["pred"], f_match.loc[r+1, "pred"], row["opponent"], row["mDate"]]
        euros_2024.index += 1

euros_2024.index = range(euros_2024.shape[0])
euros_2024["date"] = pd.to_datetime(euros_2024["date"])

In [164]:
euros_2024 = euros_2024.groupby("stage").apply(lambda x: x, include_groups=False)
euros_2024 = euros_2024.sort_index(level=1)
euros_2024

Unnamed: 0_level_0,Unnamed: 1_level_0,team_a,res_a,res_b,team_b,date
stage,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Group A,0,Germany,W,L,Scotland,2024-06-14
Group A,1,Hungary,L,W,Switzerland,2024-06-15
Group A,2,Germany,D,D,Hungary,2024-06-19
Group A,3,Scotland,L,W,Switzerland,2024-06-19
Group A,4,Switzerland,D,D,Germany,2024-06-23
Group A,5,Scotland,L,W,Hungary,2024-06-23
Group B,6,Spain,D,D,Croatia,2024-06-15
Group B,7,Italy,W,L,Albania,2024-06-15
Group B,8,Croatia,D,D,Albania,2024-06-19
Group B,9,Spain,D,D,Italy,2024-06-20


In [165]:
import os

trial = 0 
e_file = f"./predictions/match_preds/t{trial}_euros2024.csv"
while(os.path.isfile(e_file)):
    e_file = f"./predictions/match_preds/t{trial}_euros2024.csv"
    trial+= 1
euros_2024.to_csv(e_file)