<a href="https://colab.research.google.com/github/Andy14py/my_projects/blob/main/DSProject_ModeloPredictivo_Mundial_Qatar.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# <center> **PREDICTING THE QATAR WORLD CUP CHAMPION** </center>

# PHASE 3: Predictive model
After Phase 1 (Extraccion_Data) and Phase 2 (Limpieza_Transformacion), we are ready to build the model that predicts the World Cup champion. The model is based on a Poisson distribution, a common choice for goal counts because goals are discrete, non-negative events that occur at a roughly constant average rate per match.
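
As a quick illustration of the distribution itself, the sketch below evaluates the Poisson probability mass function P(X = k) = λ^k · e^(−λ) / k! using only the standard library. The helper name `poisson_pmf` and the rate `lam = 1.5` are assumptions made for this example; the notebook itself relies on `scipy.stats.poisson`.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam): lam**k * exp(-lam) / k!"""
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 1.5  # hypothetical average number of goals per match
probs = [poisson_pmf(k, lam) for k in range(11)]  # 0 to 10 goals

for k, p in enumerate(probs[:5]):
    print(f"P({k} goals) = {p:.3f}")

# Almost all of the probability mass lies between 0 and 10 goals
print(f"mass covered by 0..10 goals: {sum(probs):.6f}")
```

For small rates like this one, cutting the grid off at 10 goals loses only a negligible share of the probability.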

In [None]:
import pandas as pd
import pickle
import requests
from scipy.stats import poisson
# Download the pickled file from GitHub and load the dict_table object
url = 'https://github.com/Andy14py/my_projects/raw/a986c7eb1ab77cb27e1cbbe6ecabcc3f6373858f/data_base_final/dict_table'
dict_table = pickle.loads(requests.get(url).content)
# Alternative: load a local copy with pickle.load (only works if the file is on your machine)
# dict_table2 = pickle.load(open('dict_table', 'rb'))
df_data_historica=pd.read_csv('https://raw.githubusercontent.com/Andy14py/my_projects/main/data_base_final/clean_data_historica_fifa_worldcup.csv')
df_fixture=pd.read_csv('https://raw.githubusercontent.com/Andy14py/my_projects/main/data_base_final/clean_fifa_worldcup_fixture.csv')

## Calculating a team's strength
This parameter depends on the average number of goals a team scores and the average number of goals it concedes per match.


In [None]:
# Split the historical df into df_home and df_away
df_home=df_data_historica[['HomeTeam', 'HomeGoals', 'AwayGoals']]
df_away=df_data_historica[['AwayTeam', 'HomeGoals', 'AwayGoals']]
# Rename the columns so that both frames share the same schema.
# From the home team's perspective (HomeTeam), HomeGoals are the goals it scores
# and AwayGoals are the goals it concedes. For the away team (AwayTeam) the roles
# are swapped, which is why the mapping is reversed in df_away.
df_home=df_home.rename(columns={'HomeTeam':'Team','HomeGoals':'GoalsScored', 'AwayGoals':'GoalsConceded'})
df_away=df_away.rename(columns={'AwayTeam':'Team','HomeGoals':'GoalsConceded', 'AwayGoals':'GoalsScored'})
# Concatenate df_home and df_away, then average the goals per team
df_team_strength=pd.concat([df_home,df_away],ignore_index=True).groupby('Team').mean()
df_team_strength

Team,GoalsScored,GoalsConceded
Algeria,1.000000,1.461538
Angola,0.333333,0.666667
Argentina,1.691358,1.148148
Australia,0.812500,1.937500
Austria,1.482759,1.620690
...,...,...
Uruguay,1.553571,1.321429
Wales,0.800000,0.800000
West Germany,2.098361,1.213115
Yugoslavia,1.666667,1.272727
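
To make the home/away reshaping easier to follow, here is a dependency-free sketch of the same idea on three made-up results. The teams `A`, `B`, `C` and their scores are hypothetical, and plain dictionaries stand in for the `rename`, `concat` and `groupby('Team').mean()` steps used above.

```python
from collections import defaultdict

# Hypothetical results: (home team, home goals, away goals, away team)
matches = [
    ("A", 2, 1, "B"),
    ("B", 0, 0, "C"),
    ("C", 3, 1, "A"),
]

scored = defaultdict(list)
conceded = defaultdict(list)
for home, home_goals, away_goals, away in matches:
    # Home perspective: scores home_goals, concedes away_goals
    scored[home].append(home_goals)
    conceded[home].append(away_goals)
    # Away perspective: the same two numbers with the roles swapped
    scored[away].append(away_goals)
    conceded[away].append(home_goals)

# Average goals scored and conceded per team
strength = {
    team: (
        sum(scored[team]) / len(scored[team]),
        sum(conceded[team]) / len(conceded[team]),
    )
    for team in sorted(scored)
}

for team, (goals_scored, goals_conceded) in strength.items():
    print(f"{team}: GoalsScored={goals_scored:.2f}, GoalsConceded={goals_conceded:.2f}")
```

Each match contributes one row per team, so a side that appears both as home and as away ends up with a single average across all of its matches.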


## Prediction function

In [None]:
# lamb_home and lamb_away are the lambdas (λ) of the Poisson distribution: the expected
# number of goals per match (90 minutes) for the home team and the away team.
# x and y are the numbers of goals the home team and the away team might score;
# if a match ends 2-3, x is 2 and y is 3.
def predict_points(home, away):
    if home in df_team_strength.index and away in df_team_strength.index:
        # goals_scored * goals_conceded
        lamb_home = df_team_strength.at[home,'GoalsScored'] * df_team_strength.at[away,'GoalsConceded']
        lamb_away = df_team_strength.at[away,'GoalsScored'] * df_team_strength.at[home,'GoalsConceded']
        prob_home, prob_away, prob_draw = 0, 0, 0
        # Loop over every scoreline from 0-0 to 10-10 (121 combinations)
        for x in range(0,11): #number of goals home team
            for y in range(0, 11): #number of goals away team
                p = poisson.pmf(x, lamb_home) * poisson.pmf(y, lamb_away)
                if x == y:
                    prob_draw += p
                elif x > y:
                    prob_home += p
                else:
                    prob_away += p
        # Expected points: 3 for a win and 1 for a draw, each weighted by its probability.
        # E.g. if prob_home were 100%, the home team would get 3 points and the away team 0.
        points_home = 3 * prob_home + prob_draw
        points_away = 3 * prob_away + prob_draw
        return (points_home, points_away)
    else:
        return (0, 0)
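
The following standalone sketch walks through the same calculation with made-up inputs, so the intermediate probabilities can be inspected. The four averages are hypothetical, and `poisson_pmf` and `expected_points` are illustrative helpers written for this example rather than part of the notebook.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def expected_points(lamb_home, lamb_away, max_goals=10):
    """Expected points (3 win, 1 draw, 0 loss) over a truncated score grid."""
    p_home = p_draw = p_away = 0.0
    for x in range(max_goals + 1):      # goals by the home team
        for y in range(max_goals + 1):  # goals by the away team
            p = poisson_pmf(x, lamb_home) * poisson_pmf(y, lamb_away)
            if x > y:
                p_home += p
            elif x == y:
                p_draw += p
            else:
                p_away += p
    return 3 * p_home + p_draw, 3 * p_away + p_draw, (p_home, p_draw, p_away)

# Hypothetical team averages (goals scored, goals conceded)
home_scored, home_conceded = 1.7, 1.1
away_scored, away_conceded = 1.2, 1.4

lamb_home = home_scored * away_conceded  # home attack times away defence
lamb_away = away_scored * home_conceded  # away attack times home defence

pts_home, pts_away, probs = expected_points(lamb_home, lamb_away)
print(f"P(home win)={probs[0]:.3f}  P(draw)={probs[1]:.3f}  P(away win)={probs[2]:.3f}")
print(f"expected points: home={pts_home:.3f}, away={pts_away:.3f}")
print(f"probability mass inside the 0..10 grid: {sum(probs):.5f}")
```

Because the grid stops at 10 goals per side, the three probabilities add up to slightly less than 1; for rates in this range the missing mass is very small.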

## Functional test

In [None]:
predict_points('Argentina', 'Mexico')

(2.3129151525530505, 0.5378377125059863)

In [None]:
print(predict_points('England', 'United States'))
print(predict_points('Argentina', 'Mexico'))
print(predict_points('Qatar (H)', 'Ecuador')) # Qatar vs Team X -> 0 points to both

(2.2356147635326007, 0.5922397535606193)
(2.3129151525530505, 0.5378377125059863)
(0, 0)


## Group stage

In [None]:
df_fixture_group_48 = df_fixture[:48].copy()
df_fixture_knockout = df_fixture[48:56].copy()
df_fixture_quarter = df_fixture[56:60].copy()
df_fixture_semi = df_fixture[60:62].copy()
df_fixture_final = df_fixture[62:].copy()

In [None]:
# Iterate over the groups: select the fixtures whose home team belongs to the group,
# predict the expected points of each match and add them to the group table
for group in dict_table:
    teams_in_group = dict_table[group]['Team'].values
    df_fixture_group_6 = df_fixture_group_48[df_fixture_group_48['home'].isin(teams_in_group)]
    for index, row in df_fixture_group_6.iterrows():
        home, away = row['home'], row['away']
        points_home, points_away = predict_points(home, away)
        dict_table[group].loc[dict_table[group]['Team'] == home, 'Pts'] += points_home
        dict_table[group].loc[dict_table[group]['Team'] == away, 'Pts'] += points_away

    dict_table[group] = dict_table[group].sort_values('Pts', ascending=False).reset_index()
    dict_table[group] = dict_table[group][['Team', 'Pts']]
    dict_table[group] = dict_table[group].round(0)

In [None]:
dict_table['Group A']

,Team,Pts
0,Netherlands,4.0
1,Senegal,2.0
2,Ecuador,2.0
3,Qatar (H),0.0
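
Two teams can show the same rounded total, as Senegal and Ecuador do here, because the loop sorts on the unrounded expected points and rounds only afterwards. A minimal sketch of that ordering, using hypothetical teams and point totals:

```python
# Hypothetical unrounded expected points for one group
group = {"Team A": 4.42, "Team B": 2.31, "Team C": 1.78, "Team D": 0.0}

# Rank on the unrounded values first...
ranking = sorted(group.items(), key=lambda item: item[1], reverse=True)

# ...then round only for display, which keeps the ranking intact
table = [(team, float(round(pts))) for team, pts in ranking]

for position, (team, pts) in enumerate(table):
    print(position, team, pts)
```

Rounding before sorting would leave the order of the two middle teams to the sort's tie handling instead of to their expected points.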


## Round of 16

In [None]:

df_fixture_knockout

,home,score,away,year
48,Winners Group A,Match 49,Runners-up Group B,2022
49,Winners Group C,Match 50,Runners-up Group D,2022
50,Winners Group D,Match 52,Runners-up Group C,2022
51,Winners Group B,Match 51,Runners-up Group A,2022
52,Winners Group E,Match 53,Runners-up Group F,2022
53,Winners Group G,Match 54,Runners-up Group H,2022
54,Winners Group F,Match 55,Runners-up Group E,2022
55,Winners Group H,Match 56,Runners-up Group G,2022


In [None]:
for group in dict_table:
    group_winner = dict_table[group].loc[0, 'Team']
    runners_up = dict_table[group].loc[1, 'Team']
    df_fixture_knockout.replace({f'Winners {group}':group_winner,
                                 f'Runners-up {group}':runners_up}, inplace=True)

df_fixture_knockout['winner'] = '?'
df_fixture_knockout

,home,score,away,year,winner
48,Netherlands,Match 49,Wales,2022,?
49,Argentina,Match 50,Denmark,2022,?
50,France,Match 52,Poland,2022,?
51,England,Match 51,Senegal,2022,?
52,Germany,Match 53,Belgium,2022,?
53,Brazil,Match 54,Uruguay,2022,?
54,Croatia,Match 55,Spain,2022,?
55,Portugal,Match 56,Switzerland,2022,?


In [None]:
# Determine the winner of each match in a round and store it in the 'winner' column
def get_winner(df_fixture_updated):
    for index, row in df_fixture_updated.iterrows():
        home, away = row['home'], row['away']
        points_home, points_away = predict_points(home, away)
        if points_home > points_away:
            winner = home
        else:
            winner = away
        df_fixture_updated.loc[index, 'winner'] = winner
    return df_fixture_updated
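
The comparison in `get_winner` has a tie-break worth knowing about: when the points are equal, including the `(0, 0)` that `predict_points` returns for names missing from `df_team_strength`, the away side is picked. The sketch below reproduces that rule in isolation; `pick_winner` and `stub_predict` are hypothetical stand-ins written for this example.

```python
def pick_winner(home, away, predict):
    """Same rule as get_winner: home wins only with strictly more points."""
    points_home, points_away = predict(home, away)
    return home if points_home > points_away else away

def stub_predict(home, away):
    """Hypothetical stand-in for predict_points with one known pairing."""
    known = {("Team A", "Team B"): (1.9, 0.8)}
    return known.get((home, away), (0, 0))  # unknown names yield (0, 0)

print(pick_winner("Team A", "Team B", stub_predict))                    # Team A
print(pick_winner("Losers Match 61", "Losers Match 62", stub_predict))  # Losers Match 62
```

A winner reported for a row whose team names are still placeholders is therefore a result of this fallback, not a prediction.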

In [None]:
get_winner(df_fixture_knockout)

,home,score,away,year,winner
48,Netherlands,Match 49,Wales,2022,Netherlands
49,Argentina,Match 50,Denmark,2022,Argentina
50,France,Match 52,Poland,2022,France
51,England,Match 51,Senegal,2022,England
52,Germany,Match 53,Belgium,2022,Germany
53,Brazil,Match 54,Uruguay,2022,Brazil
54,Croatia,Match 55,Spain,2022,Spain
55,Portugal,Match 56,Switzerland,2022,Portugal


## Quarter-finals

In [None]:
# Fill the next round's fixture with the winners of the previous round
def update_table(df_fixture_round_1, df_fixture_round_2):
    for index, row in df_fixture_round_1.iterrows():
        winner = df_fixture_round_1.loc[index, 'winner']
        match = df_fixture_round_1.loc[index, 'score']
        df_fixture_round_2.replace({f'Winners {match}':winner}, inplace=True)
    df_fixture_round_2['winner'] = '?'
    return df_fixture_round_2
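
The placeholder substitution can be pictured without pandas. In this sketch, plain dictionaries stand in for the fixture rows, the team names are hypothetical, and a lookup table plays the role of `DataFrame.replace`.

```python
# Hypothetical results of the previous round
previous_round = [
    {"score": "Match 49", "winner": "Team A"},
    {"score": "Match 50", "winner": "Team B"},
]

# Next-round fixtures still expressed as placeholders
next_round = [
    {"home": "Winners Match 49", "score": "Match 57", "away": "Winners Match 50"},
    {"home": "Losers Match 61", "score": "Match 63", "away": "Losers Match 62"},
]

# Map each "Winners Match N" placeholder to the team that won match N
mapping = {f"Winners {match['score']}": match["winner"] for match in previous_round}

for fixture in next_round:
    for side in ("home", "away"):
        # Names without a mapping entry are left untouched
        fixture[side] = mapping.get(fixture[side], fixture[side])
    fixture["winner"] = "?"  # reset until the round is played

for fixture in next_round:
    print(fixture)
```

Only `Winners ...` placeholders have an entry in the mapping, so any `Losers ...` entry passes through unchanged.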

In [None]:
update_table(df_fixture_knockout, df_fixture_quarter)

,home,score,away,year,winner
56,Germany,Match 58,Brazil,2022,?
57,Netherlands,Match 57,Argentina,2022,?
58,Spain,Match 60,Portugal,2022,?
59,England,Match 59,France,2022,?


In [None]:
get_winner(df_fixture_quarter)

,home,score,away,year,winner
56,Germany,Match 58,Brazil,2022,Brazil
57,Netherlands,Match 57,Argentina,2022,Netherlands
58,Spain,Match 60,Portugal,2022,Portugal
59,England,Match 59,France,2022,France


## Semi-finals

In [None]:

update_table(df_fixture_quarter, df_fixture_semi)

,home,score,away,year,winner
60,Netherlands,Match 61,Brazil,2022,?
61,France,Match 62,Portugal,2022,?


In [None]:
get_winner(df_fixture_semi)

,home,score,away,year,winner
60,Netherlands,Match 61,Brazil,2022,Brazil
61,France,Match 62,Portugal,2022,France


## World Cup final



In [None]:

update_table(df_fixture_semi, df_fixture_final)

,home,score,away,year,winner
62,Losers Match 61,Match 63,Losers Match 62,2022,?
63,Brazil,Match 64,France,2022,?


In [None]:
get_winner(df_fixture_final)

,home,score,away,year,winner
62,Losers Match 61,Match 63,Losers Match 62,2022,Losers Match 62
63,Brazil,Match 64,France,2022,Brazil


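As a result, the model predicts a final between Brazil and France, with Brazil winning the title. Row 62 (Match 63, the third-place match) is left unresolved: update_table only replaces 'Winners ...' placeholders, so the 'Losers Match ...' entries remain and the winner shown for that row is not a real prediction.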