# Soccer World Cup Qatar 2022 Predictions
This project predicts the outcomes of the 2022 FIFA World Cup in Qatar. Using historical World Cup results, we estimate how many goals each team tends to score and concede, then feed those rates into a Poisson-based model that forecasts every match, from the group stage through to the final. Join us as we combine sports and data to anticipate how the tournament might unfold.

## Extracting the Missing Data
The Extracting_data.py file already extracts all the historical data and fixture information the model needs, but it does not collect the group tables, so we fetch them here.

In [17]:
import pandas as pd
import numpy as np
from string import ascii_uppercase as alphabet

tables = pd.read_html("https://en.wikipedia.org/wiki/2022_FIFA_World_Cup")
tables[16]

Unnamed: 0,Pos,Teamvte,Pld,W,D,L,GF,GA,GD,Pts,Qualification
0,1,England,3,2,1,0,9,2,+7,7,Advanced to knockout stage
1,2,United States,3,1,2,0,2,1,+1,5,Advanced to knockout stage
2,3,Iran,3,1,0,2,4,7,−3,3,
3,4,Wales,3,0,1,2,1,6,−5,1,


In [18]:
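# Groups A through H: when this page was scraped, the group tables sat at indices 9, 16, ..., 58
dict_tables = {}
for i, letter in enumerate(alphabet[:8]):
    df = tables[9 + i * 7]
    df.rename(columns={df.columns[1]: 'Team'}, inplace=True)  # 'Teamvte' -> 'Team'
    df["Pts"] = np.zeros(len(df))  # reset the points; the model fills them in later
    df.pop('Qualification')
    dict_tables[f'Group {letter}'] = df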

In [19]:
dict_tables['Group B']

Unnamed: 0,Pos,Team,Pld,W,D,L,GF,GA,GD,Pts
0,1,England,3,2,1,0,9,2,+7,0.0
1,2,United States,3,1,2,0,2,1,+1,0.0
2,3,Iran,3,1,0,2,4,7,−3,0.0
3,4,Wales,3,0,1,2,1,6,−5,0.0
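The Wikipedia tables mark the host nation with a suffix: the Group A printout later in this notebook lists `Qatar (H)`, while the fixture file uses `Qatar`. Because the group loop filters fixtures with `isin(teams_in_group)` on the `home` column, that mismatch means Qatar's home fixtures are not matched. Below is a minimal sketch of one way to normalize the names. `clean_team_name` is a hypothetical helper, not part of the original notebook.

```python
import re

def clean_team_name(name):
    """Drop a trailing host marker such as ' (H)' from a team name."""
    return re.sub(r"\s*\(H\)$", "", name).strip()

teams = ["Netherlands", "Senegal", "Ecuador", "Qatar (H)"]
cleaned = [clean_team_name(t) for t in teams]
print(cleaned)

# On a group table this could be applied with:
# df["Team"] = df["Team"].map(clean_team_name)
```

Applying this inside the loop that builds `dict_tables` would keep the team names aligned with the fixture data.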


## Building the model
Now that we have all the data we need, it's time to build the model that predicts the World Cup results.

In [8]:
# Loading data extracted
import pandas as pd
from scipy.stats import poisson 

# Raw strings keep the backslashes in Windows paths from being read as escape sequences
df_historical_data = pd.read_csv(r'D:\Proyectos\Soccer_World_Championship\data\clean_historical_data_world_cups.csv')
df_fixture = pd.read_csv(r'D:\Proyectos\Soccer_World_Championship\data\clean_fifa_worldcup_fixture.csv')
df_fixture

Unnamed: 0,home,score,away,year
0,Qatar,Match 1,Ecuador,2022
1,Senegal,Match 2,Netherlands,2022
2,Qatar,Match 18,Senegal,2022
3,Netherlands,Match 19,Ecuador,2022
4,Ecuador,Match 35,Senegal,2022
...,...,...,...,...
59,Winners Match 51,Match 59,Winners Match 52,2022
60,Winners Match 57,Match 61,Winners Match 58,2022
61,Winners Match 59,Match 62,Winners Match 60,2022
62,Losers Match 61,Match 63,Losers Match 62,2022


In [9]:
# Calculate the strength of each team, starting from the goals in its home and away matches
df_home = df_historical_data[['HomeTeam','HomeGoals', 'AwayGoals']]
df_away = df_historical_data[['AwayTeam','HomeGoals', 'AwayGoals']]

In [10]:
# rename() without inplace returns new DataFrames, which avoids pandas' SettingWithCopyWarning
df_home = df_home.rename(columns={'HomeTeam': 'Team','HomeGoals':'GoalsScored', 'AwayGoals':'GoalsConceded'})
df_away = df_away.rename(columns={'AwayTeam': 'Team','HomeGoals':'GoalsConceded', 'AwayGoals':'GoalsScored'})


In [11]:
df_team_strength = pd.concat([df_home, df_away], ignore_index=True).groupby('Team').mean()
df_team_strength

Team,GoalsScored,GoalsConceded
Algeria,1.000000,1.461538
Angola,0.333333,0.666667
Argentina,1.691358,1.148148
Australia,0.812500,1.937500
Austria,1.482759,1.620690
...,...,...
Uruguay,1.553571,1.321429
Wales,0.800000,0.800000
West Germany,2.112903,1.241935
Yugoslavia,1.666667,1.272727


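Each match is predicted from the teams' historical World Cup records. The function below treats the goals of each side as independent Poisson variables: a side's expected goals are its average goals scored multiplied by the opponent's average goals conceded. Summing the probabilities of all scorelines up to 10 goals per side gives the chances of a home win, a draw, and an away win. Each team's expected points are then 3 times its win probability plus the draw probability, mirroring the 3 points awarded for a win and 1 for a draw.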
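The calculation carried out by `predict_points` can be written out as follows. The symbols are introduced here for illustration only: $s_T$ and $c_T$ denote team $T$'s average goals scored and conceded in `df_team_strength`, and $X$ and $Y$ are the goals of the home and away side, assumed to be independent Poisson variables.

$$\lambda_{home} = s_{home} \, c_{away}, \qquad \lambda_{away} = s_{away} \, c_{home}$$

$$P(X = x,\, Y = y) = \frac{\lambda_{home}^{x} e^{-\lambda_{home}}}{x!} \cdot \frac{\lambda_{away}^{y} e^{-\lambda_{away}}}{y!}$$

$$E[\text{Pts}_{home}] = 3\,P(X > Y) + P(X = Y), \qquad E[\text{Pts}_{away}] = 3\,P(X < Y) + P(X = Y)$$

In the code, the sums over $x$ and $y$ are truncated at 10 goals per side.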

In [12]:
def predict_points(home,away):
    if home in df_team_strength.index and away in df_team_strength.index:
        #goals_scored * goals_conceded
        lamb_home = df_team_strength.at[home,'GoalsScored'] * df_team_strength.at[away,'GoalsConceded']
        lamb_away = df_team_strength.at[away,'GoalsScored'] * df_team_strength.at[home,'GoalsConceded']
        
        prob_home, prob_away, prob_draw = 0,0,0

        for x in range(0,11):
            for y in range(0,11):
                p = poisson.pmf(x, lamb_home) * poisson.pmf(y, lamb_away)
                if x == y:
                    prob_draw += p
                elif x > y:
                    prob_home += p
                else:
                    prob_away += p
        
        points_home = 3 * prob_home + prob_draw
        points_away = 3 * prob_away + prob_draw
        return (points_home, points_away)
    else:
        return (0,0)


In [13]:
# Testing the model

predict_points('England','United States')

(2.2356147635326007, 0.5922397535606193)
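As a sanity check on the truncation at 10 goals, the three outcome probabilities should add up to almost exactly 1. The sketch below reproduces the double loop with a hand-written Poisson formula so that it runs without SciPy. The scoring rates 1.5 and 1.2 are hypothetical and `outcome_probabilities` is an illustrative helper, not part of the notebook.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number is lam."""
    return lam ** k * exp(-lam) / factorial(k)

def outcome_probabilities(lamb_home, lamb_away, max_goals=10):
    """Return (home win, draw, away win) probabilities over a truncated score grid."""
    prob_home = prob_draw = prob_away = 0.0
    for x in range(max_goals + 1):
        for y in range(max_goals + 1):
            p = poisson_pmf(x, lamb_home) * poisson_pmf(y, lamb_away)
            if x == y:
                prob_draw += p
            elif x > y:
                prob_home += p
            else:
                prob_away += p
    return prob_home, prob_draw, prob_away

# Hypothetical scoring rates, not taken from the historical data
prob_home, prob_draw, prob_away = outcome_probabilities(1.5, 1.2)
total = prob_home + prob_draw + prob_away
print(f"home={prob_home:.3f} draw={prob_draw:.3f} away={prob_away:.3f} total={total:.6f}")
```

Because the two expected-points values add up to 3 minus the draw probability, the England vs United States result above (about 2.83 points in total) implies a draw probability of roughly 0.17.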

## Making Predictions

In [50]:
# Split the fixture data frame into the different phases of the World Cup
df_fixture_group_48 = df_fixture[:48].copy()
df_fixture_knockout = df_fixture[48:56].copy()
df_fixture_quarter = df_fixture[56:60].copy()
df_fixture_semifinal = df_fixture[60:62].copy()
df_fixture_final = df_fixture[62:].copy()
df_fixture_group_48

Unnamed: 0,home,score,away,year
0,Qatar,Match 1,Ecuador,2022
1,Senegal,Match 2,Netherlands,2022
2,Qatar,Match 18,Senegal,2022
3,Netherlands,Match 19,Ecuador,2022
4,Ecuador,Match 35,Senegal,2022
5,Netherlands,Match 36,Qatar,2022
6,England,Match 3,Iran,2022
7,United States,Match 4,Wales,2022
8,Wales,Match 17,Iran,2022
9,England,Match 20,United States,2022


In [51]:
# Accumulating the expected points of every team, group by group
for group in dict_tables:
    teams_in_group = dict_tables[group]['Team'].values
    df_fixture_group_6 = df_fixture_group_48[df_fixture_group_48['home'].isin(teams_in_group)]

    for index, row in df_fixture_group_6.iterrows():
        home,away = row['home'], row['away']
        points_home, points_away = predict_points(home,away)
        dict_tables[group].loc[dict_tables[group]['Team'] == home, 'Pts'] += points_home
        dict_tables[group].loc[dict_tables[group]['Team'] == away, 'Pts'] += points_away

    dict_tables[group] = dict_tables[group].sort_values('Pts', ascending=False).reset_index()
    dict_tables[group] = dict_tables[group][['Team','Pts']]
    dict_tables[group] = dict_tables[group].round(0)


In [52]:
# Showing the points in each group
for group in dict_tables:
    print(group)
    print(dict_tables[group])

Group A
          Team  Pts
0  Netherlands  8.0
1      Senegal  4.0
2      Ecuador  4.0
3    Qatar (H)  0.0
Group B
            Team   Pts
0        England  12.0
1          Wales  10.0
2  United States   6.0
3           Iran   4.0
Group C
           Team   Pts
0     Argentina  14.0
1        Poland  12.0
2        Mexico   8.0
3  Saudi Arabia   2.0
Group D
        Team   Pts
0     France  14.0
1    Denmark  12.0
2    Tunisia   6.0
3  Australia   4.0
Group E
         Team   Pts
0     Germany  14.0
1       Spain  10.0
2       Japan   6.0
3  Costa Rica   4.0
Group F
      Team   Pts
0  Croatia  14.0
1  Belgium  12.0
2  Morocco   8.0
3   Canada   0.0
Group G
          Team   Pts
0       Brazil  16.0
1  Switzerland   8.0
2       Serbia   6.0
3     Cameroon   4.0
Group H
          Team   Pts
0     Portugal  12.0
1      Uruguay  10.0
2        Ghana   8.0
3  South Korea   4.0
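
A team can earn at most 9 points from three group matches, yet several totals above exceed that, and every total is an even number. One possible explanation, offered only as a guess, is that the prediction cell was executed twice on the same `dict_tables`, so the already rounded totals were added to again and roughly doubled. The ranking within each group would be largely unaffected, but the absolute values would be. A minimal sketch of a re-runnable alternative follows. `group_points` and `always_home` are hypothetical names, and the stand-in predictor only exists to make the demo self-contained.

```python
def group_points(teams, fixtures, predict):
    """Return (team, expected points) pairs, highest first, without mutating shared state."""
    pts = {team: 0.0 for team in teams}
    for home, away in fixtures:
        if home in pts and away in pts:
            points_home, points_away = predict(home, away)
            pts[home] += points_home
            pts[away] += points_away
    return sorted(pts.items(), key=lambda item: item[1], reverse=True)

# Stand-in predictor for the demo: the home side always takes all 3 points
def always_home(home, away):
    return 3.0, 0.0

teams = ["A", "B", "C"]
fixtures = [("A", "B"), ("A", "C"), ("B", "C")]
first = group_points(teams, fixtures, always_home)
second = group_points(teams, fixtures, always_home)
print(first)
```

Since the totals are rebuilt from zero on every call, running the cell again gives the same table instead of accumulating further. In the notebook, `predict_points` would take the place of `always_home`.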


In [53]:
# Round of 16
df_fixture_knockout

Unnamed: 0,home,score,away,year
48,Winners Group A,Match 49,Runners-up Group B,2022
49,Winners Group C,Match 50,Runners-up Group D,2022
50,Winners Group D,Match 52,Runners-up Group C,2022
51,Winners Group B,Match 51,Runners-up Group A,2022
52,Winners Group E,Match 53,Runners-up Group F,2022
53,Winners Group G,Match 54,Runners-up Group H,2022
54,Winners Group F,Match 55,Runners-up Group E,2022
55,Winners Group H,Match 56,Runners-up Group G,2022


### Round of 16 Phase Results

In [54]:
# Updating the Round of 16 fixture with the group winners and runners-up
for group in dict_tables:
    group_winner = dict_tables[group].loc[0, 'Team']
    runners_up = dict_tables[group].loc[1, 'Team']
    df_fixture_knockout.replace({f'Winners {group}': group_winner,
                                 f'Runners-up {group}': runners_up},inplace=True)

df_fixture_knockout['winner'] = '?'
df_fixture_knockout['loser'] = '?'
df_fixture_knockout

Unnamed: 0,home,score,away,year,winner,loser
48,Netherlands,Match 49,Wales,2022,?,?
49,Argentina,Match 50,Denmark,2022,?,?
50,France,Match 52,Poland,2022,?,?
51,England,Match 51,Senegal,2022,?,?
52,Germany,Match 53,Belgium,2022,?,?
53,Brazil,Match 54,Uruguay,2022,?,?
54,Croatia,Match 55,Spain,2022,?,?
55,Portugal,Match 56,Switzerland,2022,?,?


In [55]:
# Function to get the winner
def get_winner(df_fixture_updated):
    for index,row in df_fixture_updated.iterrows():
        home, away = row['home'], row['away']
        points_home, points_away = predict_points(home,away)
        if points_home >= points_away:
            # Assumption: a tie in expected points goes to the side listed as home,
            # so that winner and loser are always defined
            winner, loser = home, away
        else:
            winner, loser = away, home
        df_fixture_updated.loc[index, 'winner'] = winner
        df_fixture_updated.loc[index, 'loser'] = loser

    return df_fixture_updated


In [56]:
# Function to update the fixture
print(df_fixture_quarter.head())

def update_table(df_fixture_round_1,df_fixture_round_2):
    for index, row in df_fixture_round_1.iterrows():
        winner = df_fixture_round_1.loc[index, 'winner']
        loser = df_fixture_round_1.loc[index, 'loser']
        match =  df_fixture_round_1.loc[index, 'score']
        df_fixture_round_2.replace({f'Winners {match}':winner}, inplace = True)
        df_fixture_round_2.replace({f'Losers {match}':loser}, inplace = True)
    df_fixture_round_2['winner'] = '?'
    df_fixture_round_2['loser'] = '?'
    return df_fixture_round_2

                home     score              away  year
56  Winners Match 53  Match 58  Winners Match 54  2022
57  Winners Match 49  Match 57  Winners Match 50  2022
58  Winners Match 55  Match 60  Winners Match 56  2022
59  Winners Match 51  Match 59  Winners Match 52  2022


In [57]:
# Getting the winners for Round of 16
df_fixture_knockout_updated = get_winner(df_fixture_knockout)
df_fixture_knockout_updated


Unnamed: 0,home,score,away,year,winner,loser
48,Netherlands,Match 49,Wales,2022,Netherlands,Wales
49,Argentina,Match 50,Denmark,2022,Argentina,Denmark
50,France,Match 52,Poland,2022,France,Poland
51,England,Match 51,Senegal,2022,England,Senegal
52,Germany,Match 53,Belgium,2022,Germany,Belgium
53,Brazil,Match 54,Uruguay,2022,Brazil,Uruguay
54,Croatia,Match 55,Spain,2022,Spain,Croatia
55,Portugal,Match 56,Switzerland,2022,Portugal,Switzerland


In [59]:
# Updating the quarter-final fixture with the winners of the Round of 16
fixture_quarter_updated = update_table(df_fixture_knockout_updated,df_fixture_quarter)
fixture_quarter_updated

Unnamed: 0,home,score,away,year,winner,loser
56,Germany,Match 58,Brazil,2022,?,?
57,Netherlands,Match 57,Argentina,2022,?,?
58,Spain,Match 60,Portugal,2022,?,?
59,England,Match 59,France,2022,?,?


### Quarter-Finals Phase Results

In [60]:
# Getting the winners for Quarter Finals
fixture_quarter_updated = get_winner(fixture_quarter_updated)
fixture_quarter_updated

Unnamed: 0,home,score,away,year,winner,loser
56,Germany,Match 58,Brazil,2022,Brazil,Germany
57,Netherlands,Match 57,Argentina,2022,Netherlands,Argentina
58,Spain,Match 60,Portugal,2022,Portugal,Spain
59,England,Match 59,France,2022,France,England


In [61]:
# Updating the semifinals fixture
fixture_semifinals_updated = update_table(fixture_quarter_updated, df_fixture_semifinal)
fixture_semifinals_updated

Unnamed: 0,home,score,away,year,winner,loser
60,Netherlands,Match 61,Brazil,2022,?,?
61,France,Match 62,Portugal,2022,?,?


### Semi-Finals Phase Results

In [62]:
# Getting the winners for Semifinals
fixture_semifinals_updated = get_winner(fixture_semifinals_updated)
fixture_semifinals_updated

Unnamed: 0,home,score,away,year,winner,loser
60,Netherlands,Match 61,Brazil,2022,Brazil,Netherlands
61,France,Match 62,Portugal,2022,France,Portugal


In [63]:
# Updating the finals fixture
fixture_finals_updated = update_table(fixture_semifinals_updated, df_fixture_final)
fixture_finals_updated

Unnamed: 0,home,score,away,year,winner,loser
62,Netherlands,Match 63,Portugal,2022,?,?
63,Brazil,Match 64,France,2022,?,?


### Finals Results

In [64]:
# Getting the winners for Finals
fixture_finals_updated = get_winner(fixture_finals_updated)
fixture_finals_updated

Unnamed: 0,home,score,away,year,winner,loser
62,Netherlands,Match 63,Portugal,2022,Netherlands,Portugal
63,Brazil,Match 64,France,2022,Brazil,France


In [70]:
# Creating a DataFrame with the final Results
final_results_df = pd.DataFrame({"Team":[fixture_finals_updated.iloc[1]["winner"],
                                        fixture_finals_updated.iloc[1]["loser"],
                                        fixture_finals_updated.iloc[0]["winner"],
                                        fixture_finals_updated.iloc[0]["loser"]],
                                "Place":["Champion", "Runner-up","Third place","Fourth place"]})
final_results_df

Unnamed: 0,Team,Place
0,Brazil,Champion
1,France,Runner-up
2,Netherlands,Third place
3,Portugal,Fourth place
