# Calculating Probabilities: **Premier League 2023-2024**
_________________

### Computing match probabilities for the **2023-2024 Premier League season**.

### After downloading an updated data file, rerun the code to obtain new probabilities.

- Data source: https://www.football-data.co.uk/books.php
_____________________

# 1. Loading the data
________________

In [1]:
# Importing libraries
import pandas as pd
import warnings
warnings.filterwarnings("ignore")

# Loading the data table
df = pd.read_csv('E0.csv')

# Viewing a sample of the data
df.head(3)

Unnamed: 0,Div,Date,Time,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,...,AvgC<2.5,AHCh,B365CAHH,B365CAHA,PCAHH,PCAHA,MaxCAHH,MaxCAHA,AvgCAHH,AvgCAHA
0,E0,11/08/2023,20:00,Burnley,Man City,0,3,A,0,2,...,2.28,1.5,1.95,1.98,1.95,1.97,,,1.92,1.95
1,E0,12/08/2023,12:30,Arsenal,Nott'm Forest,2,1,H,2,0,...,2.63,-2.0,1.95,1.98,1.93,1.97,2.01,2.09,1.95,1.92
2,E0,12/08/2023,15:00,Bournemouth,West Ham,1,1,D,0,0,...,2.12,0.0,2.02,1.91,2.01,1.92,2.06,1.96,1.96,1.91


# 2. Building a league table
________________

In [2]:
# Selecting the columns used in the analysis

# Data dictionary:
    # HomeTeam - Home team
    # AwayTeam - Away team
    # FTHG     - Full-time goals scored by the home team
    # FTAG     - Full-time goals scored by the away team
    # FTR      - Full-time result (H = home win, D = draw, A = away win)

# .copy() avoids assigning a new column to a view of the original frame
df = df[['HomeTeam', 'AwayTeam', 'FTHG', 'FTAG', 'FTR']].copy()
df['Match'] = 1
pl = df.copy()
pl.head(3)

Unnamed: 0,HomeTeam,AwayTeam,FTHG,FTAG,FTR,Match
0,Burnley,Man City,0,3,A,1
1,Arsenal,Nott'm Forest,2,1,H,1
2,Bournemouth,West Ham,1,1,D,1


In [3]:
# Building tables of goals scored at home and away
TeamHomeGoalsFor = pl.groupby('HomeTeam')[['Match','FTHG']].sum()
TeamAwayGoalsFor = pl.groupby('AwayTeam')[['Match','FTAG']].sum()
# Merging the goals-scored tables
total_goals_for = TeamHomeGoalsFor.merge(TeamAwayGoalsFor, left_index=True, right_index=True)
# Viewing a sample
total_goals_for.head(3)

HomeTeam,Match_x,FTHG,Match_y,FTAG
Arsenal,10,22,10,15
Aston Villa,10,29,11,14
Bournemouth,9,11,10,17


In [4]:
# Creating a column with total goals scored (home + away)
total_goals_for['GF'] = total_goals_for['FTHG'] + total_goals_for['FTAG']
# Creating a column with total matches played (home + away)
total_goals_for['MATCH'] = total_goals_for['Match_x'] + total_goals_for['Match_y']
total_goals_for.drop(columns=['Match_x', 'Match_y'], inplace=True)
# Renaming the goal columns
total_goals_for.rename(columns={'FTHG':'GFH',
                                 'FTAG':'GFA'}, inplace=True)
# Goals scored at home and away, plus total matches played
total_goals_for.head(3)

HomeTeam,GFH,GFA,GF,MATCH
Arsenal,22,15,37,20
Aston Villa,29,14,43,21
Bournemouth,11,17,28,19


In [5]:
# Building tables of goals conceded at home and away
TeamHomeGoalsAgainst = pl.groupby('HomeTeam')['FTAG'].sum()
TeamAwayGoalsAgainst = pl.groupby('AwayTeam')['FTHG'].sum()
# Building a DataFrame of goals conceded
total_goals_against = pd.DataFrame({'GAH':TeamHomeGoalsAgainst,
                                 'GAA':TeamAwayGoalsAgainst})
# Creating a column with total goals conceded (home + away)
total_goals_against['GA'] = total_goals_against['GAH'] + total_goals_against['GAA']
# Viewing a sample of the goals-conceded table
total_goals_against.head(3)

Unnamed: 0,GAH,GAA,GA
Arsenal,10,10,20
Aston Villa,8,19,27
Bournemouth,12,23,35


In [6]:
# Merging the goals-scored and goals-conceded tables
result_goals = total_goals_for.merge(total_goals_against, left_index=True, right_index=True)
# Creating the goal-difference column
result_goals['GD'] = result_goals['GF'] - result_goals['GA']

# Glossary of column names:

#       GFH   - Goals For, Home
#       GFA   - Goals For, Away
#       MATCH - Matches played
#       GF    - Goals For
#       GAH   - Goals Against, Home
#       GAA   - Goals Against, Away
#       GA    - Goals Against
#       GD    - Goal Difference

# Viewing a sample of the data
result_goals.head(3)

HomeTeam,GFH,GFA,GF,MATCH,GAH,GAA,GA,GD
Arsenal,22,15,37,20,10,10,20,17
Aston Villa,29,14,43,21,8,19,27,16
Bournemouth,11,17,28,19,12,23,35,-7


In [7]:
# Building a league table (points, wins, draws, and losses) from the match data
pl.head(3)

Unnamed: 0,HomeTeam,AwayTeam,FTHG,FTAG,FTR,Match
0,Burnley,Man City,0,3,A,1
1,Arsenal,Nott'm Forest,2,1,H,1
2,Bournemouth,West Ham,1,1,D,1


In [8]:
# Functions that compute the points each side earns in a match
def home_pts(resultado):
    if resultado =='H':
        return 3
    elif resultado=='D':
        return 1
    else:
        return 0
def away_pts(resultado):
    if resultado =='A':
        return 3
    elif resultado =='D':
        return 1
    else:
        return 0
# Building the points columns with the functions above
pl['HomePoints'] = pl['FTR'].apply(home_pts)
pl['AwayPoints'] = pl['FTR'].apply(away_pts)
# Viewing a sample of the data
pl.head(3)


Unnamed: 0,HomeTeam,AwayTeam,FTHG,FTAG,FTR,Match,HomePoints,AwayPoints
0,Burnley,Man City,0,3,A,1,0,3
1,Arsenal,Nott'm Forest,2,1,H,1,3,0
2,Bournemouth,West Ham,1,1,D,1,1,1
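
The two point functions can also be written as dictionary lookups. The following is a minimal sketch, assuming a small illustrative frame `toy` (hypothetical, not the notebook's data) that uses the same `FTR` codes; `HOME_POINTS` and `AWAY_POINTS` are names introduced here for the example.

```python
import pandas as pd

# Toy results using the same FTR codes as the dataset (H, D, A).
toy = pd.DataFrame({"FTR": ["A", "H", "D"]})

# Points awarded to each side for every possible full-time result.
HOME_POINTS = {"H": 3, "D": 1, "A": 0}
AWAY_POINTS = {"H": 0, "D": 1, "A": 3}

toy["HomePoints"] = toy["FTR"].map(HOME_POINTS)
toy["AwayPoints"] = toy["FTR"].map(AWAY_POINTS)
print(toy)
```

One difference worth keeping in mind: `map` yields `NaN` for a result code missing from the dictionary, whereas `home_pts` and `away_pts` fall through to 0.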


In [9]:
# Home results: counting matches by points earned (0 = loss, 1 = draw, 3 = win)
result_home = pl.groupby('HomeTeam')['HomePoints'].value_counts().unstack().reset_index().fillna(0)
result_home.columns.name = None
result_home.columns=['HomeTeam', 'LH','DH','WH']
result_home.head(3)

Unnamed: 0,HomeTeam,LH,DH,WH
0,Arsenal,1.0,2.0,7.0
1,Aston Villa,0.0,1.0,9.0
2,Bournemouth,3.0,3.0,3.0


In [10]:
# Away results: counting matches by points earned (0 = loss, 1 = draw, 3 = win)
result_away = pl.groupby('AwayTeam')['AwayPoints'].value_counts().unstack().reset_index().fillna(0)
result_away.columns.name = None
result_away.columns=['AwayTeam', 'LA','DA','WA']
result_away.head(3)

Unnamed: 0,AwayTeam,LA,DA,WA
0,Arsenal,3.0,2.0,5.0
1,Aston Villa,4.0,3.0,4.0
2,Bournemouth,5.0,1.0,4.0


In [11]:
# Merging the two tables
results = result_home.merge(result_away, left_on= 'HomeTeam', right_on='AwayTeam')
# Viewing a sample
results.head(3)

Unnamed: 0,HomeTeam,LH,DH,WH,AwayTeam,LA,DA,WA
0,Arsenal,1.0,2.0,7.0,Arsenal,3.0,2.0,5.0
1,Aston Villa,0.0,1.0,9.0,Aston Villa,4.0,3.0,4.0
2,Bournemouth,3.0,3.0,3.0,Bournemouth,5.0,1.0,4.0


In [12]:
# Creating the columns with total wins (W), draws (D), and losses (L)
results['W'] = results['WH'] + results['WA']
results['D'] = results['DH'] + results['DA']
results['L'] = results['LH'] + results['LA']
# Dropping the redundant column
results.drop(columns='AwayTeam', inplace=True)
# Viewing the data
results.head(3)

Unnamed: 0,HomeTeam,LH,DH,WH,LA,DA,WA,W,D,L
0,Arsenal,1.0,2.0,7.0,3.0,2.0,5.0,12.0,4.0,4.0
1,Aston Villa,0.0,1.0,9.0,4.0,3.0,4.0,13.0,4.0,4.0
2,Bournemouth,3.0,3.0,3.0,5.0,1.0,4.0,7.0,4.0,8.0


In [13]:
# Casting the count columns to integers
cols = results.columns[1:]
results[cols] = results[cols].astype(int)
# Renaming the HomeTeam column
results.rename(columns={'HomeTeam':'TEAM'}, inplace=True)
# Creating the points column (1 per draw, 3 per win)
results['P'] = results['DH']  + results['DA'] + (results['WH']*3) + (results['WA']*3)
# Sorting by points
results = results.sort_values(by='P', ascending=False)
results = results.reset_index(drop=True)
results = results.set_index('TEAM')
results.head(3)

TEAM,LH,DH,WH,LA,DA,WA,W,D,L,P
Liverpool,0,2,8,1,4,5,13,6,1,45
Aston Villa,0,1,9,4,3,4,13,4,4,43
Man City,0,3,6,3,1,7,13,4,3,43


In [14]:
# Adding the goal statistics to complete the league table
# Merging result_goals and results
tb_classification = pd.merge(result_goals, results, left_index=True, right_index=True)
# Selecting the target columns and casting them to integers
tb_classification = tb_classification[['MATCH','W', 'D', 'L','GF','GA','GD','P']].astype(int)
# Sorting by points, then wins, then goal difference
# (note: the official Premier League tiebreakers after points are goal difference, then goals scored)
tb_classification = tb_classification.sort_values(by=['P','W','GD'], ascending=[False, False, False])
# Resetting the index to create the numeric ranking
tb_classification = tb_classification.reset_index()
# Making the ranking start at 1
tb_classification.index = tb_classification.index + 1
# Renaming the 'index' column to TEAM
tb_classification.rename(columns={'index':'TEAM'}, inplace=True)
# Viewing the table
tb_classification

Unnamed: 0,TEAM,MATCH,W,D,L,GF,GA,GD,P
1,Liverpool,20,13,6,1,43,18,25,45
2,Man City,20,13,4,3,48,23,25,43
3,Aston Villa,21,13,4,4,43,27,16,43
4,Arsenal,20,12,4,4,37,20,17,40
5,Tottenham,21,12,4,5,44,31,13,40
6,West Ham,20,10,4,6,33,30,3,34
7,Man United,21,10,2,9,24,29,-5,32
8,Chelsea,21,9,4,8,35,31,4,31
9,Brighton,20,8,7,5,38,33,5,31
10,Newcastle,21,9,2,10,41,32,9,29


# 3. Building the average-goals table
_____________

In [15]:
# Keeping only the goals scored at home and away
pl = pl[['HomeTeam','AwayTeam','FTHG','FTAG']]
pl.head(3)

Unnamed: 0,HomeTeam,AwayTeam,FTHG,FTAG
0,Burnley,Man City,0,3
1,Arsenal,Nott'm Forest,2,1
2,Bournemouth,West Ham,1,1


In [16]:
# Average goals scored and conceded by each team at home
Home = pl.groupby('HomeTeam').mean(numeric_only=True)
Home.columns=['HomeGoalsFor', 'HomeGoalsAgainst']
Home.head(3)

HomeTeam,HomeGoalsFor,HomeGoalsAgainst
Arsenal,2.2,1.0
Aston Villa,2.9,0.8
Bournemouth,1.222222,1.333333


In [17]:
# Average goals conceded and scored by each team away
Away = pl.groupby('AwayTeam').mean(numeric_only=True)
Away.columns=['AwayGoalsAgainst', 'AwayGoalsFor']
Away.head(3)

AwayTeam,AwayGoalsAgainst,AwayGoalsFor
Arsenal,1.0,1.5
Aston Villa,1.727273,1.272727
Bournemouth,2.3,1.7


In [18]:
# Merging the tables
tb_data = Home.merge(Away, left_index=True, right_index=True)
tb_data = tb_data.sort_index().reset_index()
tb_data.rename(columns={'HomeTeam':'Team'}, inplace=True)
tb_data

Unnamed: 0,Team,HomeGoalsFor,HomeGoalsAgainst,AwayGoalsAgainst,AwayGoalsFor
0,Arsenal,2.2,1.0,1.0,1.5
1,Aston Villa,2.9,0.8,1.727273,1.272727
2,Bournemouth,1.222222,1.333333,2.3,1.7
3,Brentford,1.7,1.8,1.444444,1.0
4,Brighton,2.2,1.4,1.9,1.6
5,Burnley,1.0,2.272727,1.7,1.0
6,Chelsea,1.636364,1.272727,1.7,1.7
7,Crystal Palace,1.1,1.4,1.5,1.1
8,Everton,1.0,1.090909,1.6,1.3
9,Fulham,1.9,1.2,2.181818,0.818182


# 4. Computing outcome probabilities (using the Poisson distribution)
___________

We compute the probability of a home win, a draw, and an away win:

1. P_Draw - Draw
2. P_Home - Home win
3. P_Away - Away win

___________

The Poisson distribution is a discrete probability distribution that models the number of events occurring in a fixed interval of time or space, given an average rate of occurrence. It is named after the French mathematician Siméon Denis Poisson.

The logic behind the Poisson distribution rests on the following key points:

Rare and independent events: The Poisson distribution is appropriate for events that are rare and independent of one another. "Rare" means that the probability of more than one event occurring in a very short interval is practically zero. Independence means that the occurrence of one event does not affect the occurrence of others.

Mean rate of occurrence (λ): The distribution is characterized by a single parameter, λ (lambda), the average number of events in the interval considered. In this notebook, the interval is one match and λ is a team's expected number of goals.

Probability of k events: The probability mass function (PMF) of the Poisson distribution is:

P(X = k) = (e^(-λ) * λ^k) / k!

where:

- P(X = k) is the probability that exactly k events occur,
- λ is the mean rate of occurrence,
- e is the base of the natural logarithm (approximately 2.71828),
- k! is the factorial of k.

Expectation and variance: The expectation (mean) and the variance of a Poisson distribution are both equal to λ. The expected number of observed events therefore equals the mean rate of occurrence.

The Poisson distribution is widely used in statistics, queueing theory, reliability theory, and the modeling of natural phenomena, wherever rare and independent events are common.
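
To make the formula concrete, here is a small sketch that evaluates the PMF by hand and checks numerically that the mean and the variance both come out as λ. The helper `poisson_pmf` is a hypothetical name introduced for this example (the notebook itself relies on `scipy.stats.poisson`), and the rate 2.2 is Arsenal's average home goals from the table in section 3.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson distribution with mean rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 2.2  # example rate: Arsenal's average home goals
# 30 terms are enough here; the probability mass beyond that is negligible.
probs = [poisson_pmf(k, lam) for k in range(30)]

mean = sum(k * p for k, p in enumerate(probs))
variance = sum((k - mean) ** 2 * p for k, p in enumerate(probs))

print(f"P(X = 0) = {probs[0]:.4f}")
print(f"mean = {mean:.4f}, variance = {variance:.4f}")
```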

In [19]:
# Predicting the outcome of every fixture from the statistics so far
# Importing poisson
from scipy.stats import poisson
lista_team = list(tb_data['Team'].unique())
result_list = []
for home_team in lista_team:
    for away_team in lista_team:
        # Computing lambda (expected goals) for the home and away teams
        lambda_home = tb_data.loc[tb_data['Team'] == home_team, 'HomeGoalsFor'].iloc[0] * tb_data.loc[tb_data['Team'] == away_team, 'AwayGoalsAgainst'].iloc[0]
        lambda_away = tb_data.loc[tb_data['Team'] == away_team, 'AwayGoalsFor'].iloc[0] * tb_data.loc[tb_data['Team'] == home_team, 'HomeGoalsAgainst'].iloc[0]
        # Accumulators for home win, draw, and away win
        pv_home = 0
        pv_draw = 0
        pv_away = 0

        # Summing scoreline probabilities for home win, draw, and away win
        # (the grid stops at 8 goals per team, so the three values sum to slightly less than 1)
        for i in range(9):
            for j in range(9):
                p_result = poisson.pmf(i, lambda_home) * poisson.pmf(j, lambda_away)
                if i == j:
                    pv_draw += p_result
                elif i > j:
                    pv_home += p_result
                elif i < j:
                    pv_away += p_result

        # Appending the results to the list
        result_list.append({
            'HomeTeam': home_team,
            'AwayTeam': away_team,
            'P_Draw': pv_draw,
            'P_Home': pv_home,
            'P_Away': pv_away
        })
        
# Building a prediction for every possible fixture
result_df = pd.DataFrame(result_list)
# Removing pairings of a team against itself
result_df = result_df[result_df['HomeTeam']!= result_df['AwayTeam']]
# Resetting the index
result_df.reset_index(drop=True, inplace=True)
# Viewing a sample of the data
result_df


Unnamed: 0,HomeTeam,AwayTeam,P_Draw,P_Home,P_Away
0,Arsenal,Aston Villa,0.100165,0.804265,0.079578
1,Arsenal,Bournemouth,0.068968,0.798144,0.060741
2,Arsenal,Brentford,0.119523,0.791811,0.083193
3,Arsenal,Brighton,0.098164,0.782636,0.091938
4,Arsenal,Burnley,0.088140,0.840708,0.056566
...,...,...,...,...,...
375,Wolves,Newcastle,0.149419,0.615069,0.224768
376,Wolves,Nott'm Forest,0.139088,0.721593,0.133589
377,Wolves,Sheffield United,0.059023,0.892683,0.026930
378,Wolves,Tottenham,0.169055,0.342830,0.482298
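
The nested loop over scorelines can also be expressed as an outer product of the two goal distributions. The sketch below is one possible vectorized version, not the notebook's own code: `outcome_probabilities` and `max_goals` are hypothetical names, and the lambdas are rebuilt from the rounded averages shown in section 3 for Arsenal (home) against Aston Villa.

```python
import numpy as np
from scipy.stats import poisson

def outcome_probabilities(lambda_home, lambda_away, max_goals=8):
    """Return (P_Home, P_Draw, P_Away) on a scoreline grid of 0..max_goals goals per team."""
    goals = np.arange(max_goals + 1)
    # grid[i, j] = P(home scores i) * P(away scores j), assuming independent scores
    grid = np.outer(poisson.pmf(goals, lambda_home), poisson.pmf(goals, lambda_away))
    p_home = np.tril(grid, k=-1).sum()  # below the diagonal: i > j
    p_draw = np.trace(grid)             # on the diagonal: i == j
    p_away = np.triu(grid, k=1).sum()   # above the diagonal: i < j
    return p_home, p_draw, p_away

# Arsenal at home against Aston Villa, using the rounded averages from section 3.
lambda_home = 2.2 * 1.727273   # HomeGoalsFor * AwayGoalsAgainst
lambda_away = 1.272727 * 1.0   # AwayGoalsFor * HomeGoalsAgainst
p_home, p_draw, p_away = outcome_probabilities(lambda_home, lambda_away)
print(p_home, p_draw, p_away)
print("mass outside the grid:", 1 - (p_home + p_draw + p_away))
```

The last line reports how much probability falls outside the 9x9 grid, which is why the three values in the table do not add up to exactly 1.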


# 5. Computing the probability of the total number of goals
____________

In [20]:
import pandas as pd
from scipy.stats import poisson

lista_team = list(tb_data['Team'].unique())

goals_list = []

for home_team in lista_team:
    for away_team in lista_team:
        # Expected goals (lambda) for the home and away teams
        lambda_home = tb_data.loc[tb_data['Team'] == home_team, 'HomeGoalsFor'].iloc[0] * tb_data.loc[
            tb_data['Team'] == away_team, 'AwayGoalsAgainst'].iloc[0]

        lambda_away = tb_data.loc[tb_data['Team'] == away_team, 'AwayGoalsFor'].iloc[0] * tb_data.loc[
            tb_data['Team'] == home_team, 'HomeGoalsAgainst'].iloc[0]

        # Probability of each total from 0 to 9 goals
        for goals in range(10):
            p_goals = 0
            for i in range(goals + 1):
                j = goals - i
                # Away goals are capped at 8, so the 0-9 scoreline is left out of the 9-goal total
                if j < 9:
                    p_goals += poisson.pmf(i, lambda_home) * poisson.pmf(j, lambda_away)
            
            # Appending to the results list
            goals_list.append({
                'HomeTeam': home_team,
                'AwayTeam': away_team,
                'TotalGoals': goals,
                'P_TotalGoals': p_goals
            })

# Creating a DataFrame from the results list
data_gols = pd.DataFrame(goals_list)

# Pivoting 'TotalGoals' into columns with unstack
data_gols = data_gols.set_index(['HomeTeam', 'AwayTeam', 'TotalGoals']).unstack().reset_index()
# Dropping the extra column level
data_gols.columns = data_gols.columns.droplevel(1)
# Renaming the columns
data_gols.columns=['HomeTeam','AwayTeam','PG_0', 'PG_1','PG_2','PG_3','PG_4','PG_5','PG_6','PG_7','PG_8','PG_9'] 
# Removing pairings of a team against itself
data_gols = data_gols[data_gols['HomeTeam']!= data_gols['AwayTeam']]
# Resetting the index
data_gols.reset_index(drop=True, inplace=True)
# Viewing a sample of the data
data_gols

Unnamed: 0,HomeTeam,AwayTeam,PG_0,PG_1,PG_2,PG_3,PG_4,PG_5,PG_6,PG_7,PG_8,PG_9
0,Arsenal,Aston Villa,0.006265,0.031782,0.080611,0.136306,0.172861,0.175375,0.148272,0.107449,0.068132,0.038402
1,Arsenal,Bournemouth,0.001159,0.007836,0.026487,0.059684,0.100866,0.136371,0.153644,0.148377,0.125378,0.094173
2,Arsenal,Brentford,0.015333,0.064056,0.133806,0.186337,0.194619,0.162615,0.113228,0.067577,0.035290,0.016382
3,Arsenal,Brighton,0.003089,0.017853,0.051595,0.099405,0.143641,0.166049,0.159960,0.132082,0.095429,0.061286
4,Arsenal,Burnley,0.008739,0.041421,0.098168,0.155106,0.183800,0.174243,0.137652,0.093210,0.055227,0.029086
...,...,...,...,...,...,...,...,...,...,...,...,...
375,Wolves,Newcastle,0.003151,0.018150,0.052273,0.100364,0.144525,0.166493,0.159833,0.131520,0.094694,0.060592
376,Wolves,Nott'm Forest,0.010052,0.046238,0.106348,0.163068,0.187528,0.172526,0.132270,0.086920,0.049979,0.025544
377,Wolves,Sheffield United,0.009095,0.042748,0.100457,0.157383,0.184925,0.173830,0.136167,0.091426,0.053713,0.028050
378,Wolves,Tottenham,0.003438,0.019506,0.055325,0.104614,0.148362,0.168323,0.159142,0.128967,0.091449,0.057421
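
Because the two scores are modeled as independent Poisson variables, their sum is itself Poisson with rate λ_home + λ_away, so each total-goals probability also has a closed form. The sketch below compares the scoreline-by-scoreline sum with that closed form for totals 0 to 8; `poisson_pmf` is a hypothetical helper, and the lambdas use the rounded section 3 averages for Arsenal (home) against Aston Villa.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson distribution with mean rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lambda_home = 2.2 * 1.727273   # HomeGoalsFor * AwayGoalsAgainst
lambda_away = 1.272727 * 1.0   # AwayGoalsFor * HomeGoalsAgainst

# Sum over every scoreline (i, total - i) that produces the given total.
convolved = [
    sum(poisson_pmf(i, lambda_home) * poisson_pmf(total - i, lambda_away)
        for i in range(total + 1))
    for total in range(9)
]
# Closed form: the total is Poisson with rate lambda_home + lambda_away.
closed_form = [poisson_pmf(total, lambda_home + lambda_away) for total in range(9)]

for total, (a, b) in enumerate(zip(convolved, closed_form)):
    print(total, round(a, 6), round(b, 6))
```

For a total of 9 goals the notebook's loop skips the 0-9 scoreline (the `j < 9` check), so `PG_9` comes out slightly below the closed-form value.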


# 6. Merging the outcome and goal-count probability tables
__________

In [21]:
# Merging the two tables
tb_data_prob = result_df.merge(data_gols, how='left', on=['HomeTeam','AwayTeam'])
tb_data_prob

Unnamed: 0,HomeTeam,AwayTeam,P_Draw,P_Home,P_Away,PG_0,PG_1,PG_2,PG_3,PG_4,PG_5,PG_6,PG_7,PG_8,PG_9
0,Arsenal,Aston Villa,0.100165,0.804265,0.079578,0.006265,0.031782,0.080611,0.136306,0.172861,0.175375,0.148272,0.107449,0.068132,0.038402
1,Arsenal,Bournemouth,0.068968,0.798144,0.060741,0.001159,0.007836,0.026487,0.059684,0.100866,0.136371,0.153644,0.148377,0.125378,0.094173
2,Arsenal,Brentford,0.119523,0.791811,0.083193,0.015333,0.064056,0.133806,0.186337,0.194619,0.162615,0.113228,0.067577,0.035290,0.016382
3,Arsenal,Brighton,0.098164,0.782636,0.091938,0.003089,0.017853,0.051595,0.099405,0.143641,0.166049,0.159960,0.132082,0.095429,0.061286
4,Arsenal,Burnley,0.088140,0.840708,0.056566,0.008739,0.041421,0.098168,0.155106,0.183800,0.174243,0.137652,0.093210,0.055227,0.029086
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
375,Wolves,Newcastle,0.149419,0.615069,0.224768,0.003151,0.018150,0.052273,0.100364,0.144525,0.166493,0.159833,0.131520,0.094694,0.060592
376,Wolves,Nott'm Forest,0.139088,0.721593,0.133589,0.010052,0.046238,0.106348,0.163068,0.187528,0.172526,0.132270,0.086920,0.049979,0.025544
377,Wolves,Sheffield United,0.059023,0.892683,0.026930,0.009095,0.042748,0.100457,0.157383,0.184925,0.173830,0.136167,0.091426,0.053713,0.028050
378,Wolves,Tottenham,0.169055,0.342830,0.482298,0.003438,0.019506,0.055325,0.104614,0.148362,0.168323,0.159142,0.128967,0.091449,0.057421


# 7. Building the tables of played and unplayed matches
_________

In [22]:
# Starting from the main table
pl = df.copy()
# Selecting the columns
pl_cols = ['HomeTeam', 'AwayTeam', 'FTHG', 'FTAG', 'FTR']
# Filtering the columns
tb_final = pl[pl_cols]
# Right-merging the results onto tb_data_prob, so every possible fixture is kept
tb_final_stats = tb_final.merge(tb_data_prob, how='right', on=['HomeTeam','AwayTeam'])
# Identifying unplayed matches (no full-time result yet)
tb_final_stats_not_realized = tb_final_stats.loc[tb_final_stats['FTR'].isna()]
tb_final_stats_not_realized = tb_final_stats_not_realized.reset_index(drop=True)
# Identifying played matches
tb_final_stats_realized = tb_final_stats.loc[~tb_final_stats['FTR'].isna()]
tb_final_stats_realized = tb_final_stats_realized.reset_index(drop=True)
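
As an alternative to testing `FTR` for missing values, pandas can label the origin of each merged row through the `indicator` argument of `merge`. The following is a sketch on hypothetical toy frames (`played` and `fixtures` stand in for `tb_final` and `tb_data_prob`, with illustrative values), not a change to the notebook's logic.

```python
import pandas as pd

# Hypothetical toy frames standing in for tb_final and tb_data_prob.
played = pd.DataFrame({"HomeTeam": ["Arsenal"], "AwayTeam": ["Burnley"], "FTR": ["H"]})
fixtures = pd.DataFrame({
    "HomeTeam": ["Arsenal", "Arsenal"],
    "AwayTeam": ["Burnley", "Chelsea"],
    "P_Home": [0.84, 0.73],  # illustrative values only
})

merged = played.merge(fixtures, how="right", on=["HomeTeam", "AwayTeam"], indicator=True)
# "_merge" is "both" for fixtures with a result and "right_only" for unplayed ones.
already_played = merged[merged["_merge"] == "both"]
not_played = merged[merged["_merge"] == "right_only"]
print(not_played)
```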

In [23]:
# Table of played matches
tb_final_stats_realized

Unnamed: 0,HomeTeam,AwayTeam,FTHG,FTAG,FTR,P_Draw,P_Home,P_Away,PG_0,PG_1,PG_2,PG_3,PG_4,PG_5,PG_6,PG_7,PG_8,PG_9
0,Arsenal,Brighton,2.0,0.0,H,0.098164,0.782636,0.091938,0.003089,0.017853,0.051595,0.099405,0.143641,0.166049,0.159960,0.132082,0.095429,0.061286
1,Arsenal,Burnley,3.0,1.0,H,0.088140,0.840708,0.056566,0.008739,0.041421,0.098168,0.155106,0.183800,0.174243,0.137652,0.093210,0.055227,0.029086
2,Arsenal,Fulham,2.0,2.0,D,0.039767,0.885450,0.018966,0.003631,0.020401,0.057308,0.107322,0.150739,0.169376,0.158598,0.127290,0.089392,0.055803
3,Arsenal,Man City,1.0,0.0,H,0.177685,0.515217,0.304224,0.006862,0.034183,0.085147,0.141396,0.176102,0.175462,0.145686,0.103683,0.064566,0.035719
4,Arsenal,Man United,3.0,1.0,H,0.169687,0.708158,0.121242,0.036153,0.120027,0.199245,0.220498,0.183014,0.121521,0.067242,0.031892,0.013235,0.004882
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
198,Wolves,Liverpool,1.0,3.0,A,0.194221,0.254958,0.549962,0.018686,0.074369,0.147994,0.196339,0.195357,0.155504,0.103151,0.058649,0.029178,0.012777
199,Wolves,Man City,2.0,1.0,H,0.165811,0.246158,0.583496,0.006152,0.031321,0.079727,0.135295,0.172193,0.175324,0.148760,0.108189,0.068848,0.038552
200,Wolves,Newcastle,2.0,2.0,D,0.149419,0.615069,0.224768,0.003151,0.018150,0.052273,0.100364,0.144525,0.166493,0.159833,0.131520,0.094694,0.060592
201,Wolves,Nott'm Forest,1.0,1.0,D,0.139088,0.721593,0.133589,0.010052,0.046238,0.106348,0.163068,0.187528,0.172526,0.132270,0.086920,0.049979,0.025544


In [24]:
# Table of unplayed matches
tb_final_stats_not_realized

Unnamed: 0,HomeTeam,AwayTeam,FTHG,FTAG,FTR,P_Draw,P_Home,P_Away,PG_0,PG_1,PG_2,PG_3,PG_4,PG_5,PG_6,PG_7,PG_8,PG_9
0,Arsenal,Aston Villa,,,,0.100165,0.804265,0.079578,0.006265,0.031782,0.080611,0.136306,0.172861,0.175375,0.148272,0.107449,0.068132,0.038402
1,Arsenal,Bournemouth,,,,0.068968,0.798144,0.060741,0.001159,0.007836,0.026487,0.059684,0.100866,0.136371,0.153644,0.148377,0.125378,0.094173
2,Arsenal,Brentford,,,,0.119523,0.791811,0.083193,0.015333,0.064056,0.133806,0.186337,0.194619,0.162615,0.113228,0.067577,0.035290,0.016382
3,Arsenal,Chelsea,,,,0.122840,0.731424,0.131081,0.004339,0.023607,0.064210,0.116435,0.158352,0.172287,0.156206,0.121395,0.082548,0.049895
4,Arsenal,Crystal Palace,,,,0.117898,0.787149,0.088039,0.012277,0.054020,0.118845,0.174305,0.191736,0.168728,0.123734,0.077775,0.042776,0.020913
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
172,Wolves,Fulham,,,,0.109486,0.798311,0.082479,0.009693,0.044940,0.104178,0.161003,0.186617,0.173045,0.133717,0.088566,0.051328,0.026441
173,Wolves,Luton,,,,0.135973,0.716368,0.139908,0.007447,0.036488,0.089396,0.146014,0.178867,0.175290,0.143153,0.100207,0.061377,0.033415
174,Wolves,Man United,,,,0.234047,0.491457,0.274396,0.048801,0.147380,0.222543,0.224027,0.169140,0.102161,0.051421,0.022184,0.008375,0.002809
175,Wolves,Sheffield United,,,,0.059023,0.892683,0.026930,0.009095,0.042748,0.100457,0.157383,0.184925,0.173830,0.136167,0.091426,0.053713,0.028050


# 8. Analyzing the predicted probabilities
________

In [25]:
# Checking the predictions against matches already played
import numpy as np

# Function that labels each prediction as 'Acerto' (hit) or 'Erro' (miss)
def validar_previsao(row):
    if row['FTR'] == 'D':
        return 'Acerto' if row['P_Draw'] > row['P_Home'] and row['P_Draw'] > row['P_Away'] else 'Erro'
    elif row['FTR'] == 'A':
        return 'Acerto' if row['P_Away'] > row['P_Draw'] and row['P_Away'] > row['P_Home'] else 'Erro'
    elif row['FTR'] == 'H':
        return 'Acerto' if row['P_Home'] > row['P_Draw'] and row['P_Home'] > row['P_Away'] else 'Erro'
    else:
        return np.nan

# Applying the function to each row of the DataFrame
tb_final_stats_realized['AVALIACAO'] = tb_final_stats_realized.apply(validar_previsao, axis=1)

In [26]:
# Displaying the hit and miss rates (%) for each actual result
(tb_final_stats_realized.groupby('FTR')['AVALIACAO'].value_counts(normalize=True)*100).reset_index(name='ACERTO/ERRO (%)')

Unnamed: 0,FTR,AVALIACAO,ACERTO/ERRO (%)
0,A,Acerto,52.238806
1,A,Erro,47.761194
2,D,Erro,100.0
3,H,Acerto,91.666667
4,H,Erro,8.333333


## 8.1 Evaluation
________

#### 1. Among matches that ended in a **home win**, the model gave the home team the **highest probability** in **91.67%** of cases.
#### 2. Among matches that ended in an **away win**, the model gave the away team the **highest probability** in **52.24%** of cases.
#### 3. Among matches that ended in a **draw**, the model never gave the draw the highest probability (**100% error**).
_______
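
The rates above are grouped by the actual result. A complementary view groups by the outcome the model predicted, which answers "when the model favors the home team, how often is it right?". The sketch below shows both groupings on hypothetical toy data (the values are made up for illustration and are not the notebook's results); `Predicted` and `Hit` are column names introduced for the example.

```python
import pandas as pd

# Hypothetical toy data, not the notebook's results.
toy = pd.DataFrame({
    "FTR":    ["H", "H", "D", "A", "A", "D"],
    "P_Home": [0.60, 0.30, 0.50, 0.20, 0.55, 0.45],
    "P_Draw": [0.25, 0.30, 0.30, 0.25, 0.25, 0.30],
    "P_Away": [0.15, 0.40, 0.20, 0.55, 0.20, 0.25],
})

# Most likely outcome according to the model, mapped back to FTR codes.
label = {"P_Home": "H", "P_Draw": "D", "P_Away": "A"}
toy["Predicted"] = toy[["P_Home", "P_Draw", "P_Away"]].idxmax(axis=1).map(label)
toy["Hit"] = toy["Predicted"] == toy["FTR"]

overall_accuracy = toy["Hit"].mean()
# Share of correct calls for each predicted outcome.
hit_rate_by_prediction = toy.groupby("Predicted")["Hit"].mean()
# Share of correct calls for each actual result (the grouping used in cell 26).
hit_rate_by_result = toy.groupby("FTR")["Hit"].mean()

print(overall_accuracy)
print(hit_rate_by_prediction)
print(hit_rate_by_result)
```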

# 9. Looking up the probabilities for a single match
_________

In [27]:
# List of teams in the league table
print(tb_classification.TEAM.unique())

['Liverpool' 'Man City' 'Aston Villa' 'Arsenal' 'Tottenham' 'West Ham'
 'Man United' 'Chelsea' 'Brighton' 'Newcastle' 'Wolves' 'Everton'
 'Bournemouth' 'Fulham' 'Crystal Palace' "Nott'm Forest" 'Brentford'
 'Luton' 'Burnley' 'Sheffield United']


In [28]:
# Choosing the teams to look up (time_A is the home team, time_B the away team)
time_A = 'Chelsea'
time_B = 'Liverpool'

# Looking up the fixture
print('\nMATCH PLAYED:')
display(tb_final_stats_realized.query(f"HomeTeam=='{time_A}' & AwayTeam=='{time_B}'"))
print('\nMATCH NOT PLAYED:')
display(tb_final_stats_not_realized.query(f"HomeTeam=='{time_A}' & AwayTeam=='{time_B}'"))


MATCH PLAYED:


Unnamed: 0,HomeTeam,AwayTeam,FTHG,FTAG,FTR,P_Draw,P_Home,P_Away,PG_0,PG_1,PG_2,PG_3,PG_4,PG_5,PG_6,PG_7,PG_8,PG_9,AVALIACAO
66,Chelsea,Liverpool,1.0,1.0,D,0.206316,0.293605,0.499607,0.022371,0.085009,0.161517,0.204588,0.194359,0.147713,0.093551,0.050785,0.024123,0.010121,Erro



MATCH NOT PLAYED:


Unnamed: 0,HomeTeam,AwayTeam,FTHG,FTAG,FTR,P_Draw,P_Home,P_Away,PG_0,PG_1,PG_2,PG_3,PG_4,PG_5,PG_6,PG_7,PG_8,PG_9


In [29]:
# Choosing the teams to look up (time_A is the home team, time_B the away team)
time_A = 'Liverpool'
time_B = 'Chelsea'

# Looking up the fixture
print('\nMATCH PLAYED:')
display(tb_final_stats_realized.query(f"HomeTeam=='{time_A}' & AwayTeam=='{time_B}'"))
print('\nMATCH NOT PLAYED:')
display(tb_final_stats_not_realized.query(f"HomeTeam=='{time_A}' & AwayTeam=='{time_B}'"))


MATCH PLAYED:


Unnamed: 0,HomeTeam,AwayTeam,FTHG,FTAG,FTR,P_Draw,P_Home,P_Away,PG_0,PG_1,PG_2,PG_3,PG_4,PG_5,PG_6,PG_7,PG_8,PG_9,AVALIACAO



MATCH NOT PLAYED:


Unnamed: 0,HomeTeam,AwayTeam,FTHG,FTAG,FTR,P_Draw,P_Home,P_Away,PG_0,PG_1,PG_2,PG_3,PG_4,PG_5,PG_6,PG_7,PG_8,PG_9
90,Liverpool,Chelsea,,,,0.077198,0.82625,0.059874,0.003089,0.017853,0.051595,0.099405,0.143641,0.166049,0.15996,0.132082,0.095429,0.061286


# 10. Estimating the final league points
_________

In [30]:
# Taking the current points from the league table
# (.copy() so the points column can be updated later without touching tb_classification)
tb_classification_atualizada = tb_classification[['TEAM', 'P']].copy()
tb_classification_atualizada

Unnamed: 0,TEAM,P
1,Liverpool,45
2,Man City,43
3,Aston Villa,43
4,Arsenal,40
5,Tottenham,40
6,West Ham,34
7,Man United,32
8,Chelsea,31
9,Brighton,31
10,Newcastle,29


In [31]:
# Checking the unplayed matches
tb_final_stats_not_realized.head(3)

Unnamed: 0,HomeTeam,AwayTeam,FTHG,FTAG,FTR,P_Draw,P_Home,P_Away,PG_0,PG_1,PG_2,PG_3,PG_4,PG_5,PG_6,PG_7,PG_8,PG_9
0,Arsenal,Aston Villa,,,,0.100165,0.804265,0.079578,0.006265,0.031782,0.080611,0.136306,0.172861,0.175375,0.148272,0.107449,0.068132,0.038402
1,Arsenal,Bournemouth,,,,0.068968,0.798144,0.060741,0.001159,0.007836,0.026487,0.059684,0.100866,0.136371,0.153644,0.148377,0.125378,0.094173
2,Arsenal,Brentford,,,,0.119523,0.791811,0.083193,0.015333,0.064056,0.133806,0.186337,0.194619,0.162615,0.113228,0.067577,0.03529,0.016382


In [32]:
# Expected points from the remaining home matches (3 per win, 1 per draw)
pontuacao_casa = tb_final_stats_not_realized.groupby('HomeTeam')[['P_Home', 'P_Draw']].sum().reset_index()
pontuacao_casa['Pts_casa'] = 3 * pontuacao_casa['P_Home'] + pontuacao_casa['P_Draw']
pontuacao_casa = pontuacao_casa.set_index('HomeTeam')
pontuacao_casa.head(3)

HomeTeam,P_Home,P_Draw,Pts_casa
Arsenal,6.85014,0.979757,21.530178
Aston Villa,7.235817,0.57076,22.278213
Bournemouth,4.66242,1.963665,15.950925


In [33]:
# Expected points from the remaining away matches (3 per win, 1 per draw)
pontuacao_fora = tb_final_stats_not_realized.groupby('AwayTeam')[['P_Away','P_Draw']].sum().reset_index()
pontuacao_fora['Pts_fora'] = 3 * pontuacao_fora['P_Away'] + pontuacao_fora['P_Draw']
pontuacao_fora = pontuacao_fora.set_index('AwayTeam')
pontuacao_fora.head(3)

AwayTeam,P_Away,P_Draw,Pts_fora
Arsenal,4.62781,1.611635,15.495064
Aston Villa,1.939919,1.150692,6.970448
Bournemouth,1.825442,0.959604,6.43593


In [34]:
# Function that adds the expected home and away points to the current points
def atualizar_pontuacao_previsao(row):
    time = row['TEAM']
    pontuacao = row['P'] + pontuacao_casa.loc[time, 'Pts_casa'] + pontuacao_fora.loc[time, 'Pts_fora']
    return pontuacao

# Applying the function to the table
tb_classification_atualizada['P'] = tb_classification_atualizada.apply(atualizar_pontuacao_previsao, axis=1)
# Sorting the ranking by points
tb_classification_atualizada = tb_classification_atualizada.sort_values(by='P', ascending=False)
# Resetting the index to match the new ranking
tb_classification_atualizada = tb_classification_atualizada.reset_index(drop=True)
# Making the ranking start at 1
tb_classification_atualizada.index = tb_classification_atualizada.index+1
# Viewing the table of predicted final points
tb_classification_atualizada


Unnamed: 0,TEAM,P
1,Liverpool,82.550063
2,Man City,81.885171
3,Arsenal,77.025242
4,Aston Villa,72.24866
5,Tottenham,68.31367
6,West Ham,62.773506
7,Brighton,59.615262
8,Newcastle,57.233085
9,Chelsea,55.651518
10,Man United,53.341559


# 11. Comparing the tables: current vs. predicted
_____________

In [35]:
# Viewing both tables side by side
pd.concat([tb_classification, tb_classification_atualizada], axis=1)

Unnamed: 0,TEAM,MATCH,W,D,L,GF,GA,GD,P,TEAM.1,P.1
1,Liverpool,20,13,6,1,43,18,25,45,Liverpool,82.550063
2,Man City,20,13,4,3,48,23,25,43,Man City,81.885171
3,Aston Villa,21,13,4,4,43,27,16,43,Arsenal,77.025242
4,Arsenal,20,12,4,4,37,20,17,40,Aston Villa,72.24866
5,Tottenham,21,12,4,5,44,31,13,40,Tottenham,68.31367
6,West Ham,20,10,4,6,33,30,3,34,West Ham,62.773506
7,Man United,21,10,2,9,24,29,-5,32,Brighton,59.615262
8,Chelsea,21,9,4,8,35,31,4,31,Newcastle,57.233085
9,Brighton,20,8,7,5,38,33,5,31,Chelsea,55.651518
10,Newcastle,21,9,2,10,41,32,9,29,Man United,53.341559


# 12. Final Analysis
___________

### This project has three goals:

#### 1. Compute the probability of each match outcome (home win, draw, away win).
#### 2. Compute the probability of the total number of goals in a match (0 to 9 goals).
#### 3. Estimate the expected final points of each team in the 2023-2024 Premier League.
___________