# Bayesian Bivariate Model

In [1]:
import sys

sys.path.append("../../")

import penaltyblog as pb



## Get data from football-data.co.uk

In [2]:
fb = pb.scrapers.FootballData("ENG Premier League", "2019-2020")
df = fb.get_fixtures()

df.head()

id,date,datetime,season,competition,div,time,team_home,team_away,fthg,ftag,...,b365_cahh,b365_caha,pcahh,pcaha,max_cahh,max_caha,avg_cahh,avg_caha,goals_home,goals_away
1565308800---liverpool---norwich,2019-08-09,2019-08-09 20:00:00,2019-2020,ENG Premier League,E0,20:00,Liverpool,Norwich,4,1,...,1.91,1.99,1.94,1.98,1.99,2.07,1.9,1.99,4,1
1565395200---bournemouth---sheffield_united,2019-08-10,2019-08-10 15:00:00,2019-2020,ENG Premier League,E0,15:00,Bournemouth,Sheffield United,1,1,...,1.95,1.95,1.98,1.95,2.0,1.96,1.96,1.92,1,1
1565395200---burnley---southampton,2019-08-10,2019-08-10 15:00:00,2019-2020,ENG Premier League,E0,15:00,Burnley,Southampton,3,0,...,1.87,2.03,1.89,2.03,1.9,2.07,1.86,2.02,3,0
1565395200---crystal_palace---everton,2019-08-10,2019-08-10 15:00:00,2019-2020,ENG Premier League,E0,15:00,Crystal Palace,Everton,0,0,...,1.82,2.08,1.97,1.96,2.03,2.08,1.96,1.93,0,0
1565395200---tottenham---aston_villa,2019-08-10,2019-08-10 17:30:00,2019-2020,ENG Premier League,E0,17:30,Tottenham,Aston Villa,3,1,...,2.1,1.7,2.18,1.77,2.21,1.87,2.08,1.8,3,1


## Train the Model

In [3]:
clf = pb.models.BayesianBivariateGoalModel(
    df["goals_home"], df["goals_away"], df["team_home"], df["team_away"]
)
clf.fit()

Only 312 samples in chain.
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (8 chains in 8 jobs)
NUTS: [tau_att, atts_star, tau_def, def_star, tau_rho, rho, mu, eta]


Sampling 8 chains for 2_000 tune and 312 draw iterations (16_000 + 2_496 draws total) took 14 seconds.


## The Model's Parameters

In [4]:
clf

Module: Penaltyblog

Model: Bayesian Random Intercept

Number of parameters: 62
Team                 Attack               Defence              rho                 
--------------------------------------------------------------------------------
Arsenal              0.358                -0.106               -0.146              
Aston Villa          -0.167               0.541                -0.207              
Bournemouth          -0.829               0.182                0.044               
Brighton             -0.368               0.239                -0.251              
Burnley              -0.004               0.255                -0.37               
Chelsea              0.354                -0.07                0.117               
Crystal Palace       -0.618               0.219                -0.412              
Everton              -0.485               -0.024               -0.009              
Leicester            0.736                -0.219               -0.221              

## Predict Match Outcomes

In [5]:
probs = clf.predict("Liverpool", "Wolves")
probs

Module: Penaltyblog

Class: FootballProbabilityGrid

Home Goal Expectation: 1.5061418732653107
Away Goal Expectation: 0.9006395259292537

Home Win: 0.5140247310308917
Draw: 0.260587847680428
Away Win: 0.22538742120150582

### 1x2 Probabilities

In [6]:
probs.home_draw_away

[0.5140247310308917, 0.260587847680428, 0.22538742120150582]

In [7]:
probs.home_win

0.5140247310308917

In [8]:
probs.draw

0.260587847680428

In [9]:
probs.away_win

0.22538742120150582
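
A common next step is to express these probabilities as fair decimal odds, which are simply their reciprocals. The short sketch below uses the three values printed above; the variable names are illustrative only.

```python
# 1x2 probabilities copied from the output above: home, draw, away
home_draw_away = [0.5140247310308917, 0.260587847680428, 0.22538742120150582]

# Fair (zero-margin) decimal odds are the reciprocal of each probability
fair_odds = [1.0 / p for p in home_draw_away]

# Sanity check: the implied probabilities of fair odds sum to 1
implied_total = sum(1.0 / o for o in fair_odds)

for label, odds in zip(["home", "draw", "away"], fair_odds):
    print(f"{label}: {odds:.2f}")
```

Bookmaker odds include a margin, so their implied probabilities usually sum to more than 1; comparing them against fair odds like these is one way to look for value.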

### Probability of Total Goals >1.5

In [10]:
probs.total_goals("over", 1.5)

0.6930325074761895
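
For intuition, an over/under probability can be approximated in closed form. If home and away goals are treated as independent Poisson variables (an assumption made here for illustration), their sum is Poisson with the combined mean, so "over 1.5" is one minus the probability of zero or one total goals. The function name `prob_over` is hypothetical.

```python
from math import exp, factorial, floor


def prob_over(home_exp, away_exp, line):
    """P(total goals > line), assuming independent Poisson goal counts."""
    total_exp = home_exp + away_exp
    # Sum P(total = k) for every whole number of goals at or below the line
    under = sum(
        exp(-total_exp) * total_exp**k / factorial(k)
        for k in range(floor(line) + 1)
    )
    return 1.0 - under


# Goal expectations taken from the prediction output earlier in the notebook
over_1_5 = prob_over(1.5061418732653107, 0.9006395259292537, 1.5)
print(over_1_5)
```

With these inputs the result comes out at roughly 0.693, in line with the value returned by `total_goals` above.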

### Probability of Asian Handicap 1.5 (Home)

In [11]:
probs.asian_handicap("home", 1.5)

0.2629398273490633
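
One way to read this number is as the probability that the home side wins by more than 1.5 goals, that is, by two or more. The sketch below computes that quantity from the goal expectations shown earlier, assuming independent Poisson goal counts and a 15-goal cut-off. Both the interpretation of the handicap and the helper `prob_home_margin_over` are assumptions for illustration rather than the library's definition.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k goals when goals are Poisson with mean lam."""
    return exp(-lam) * lam**k / factorial(k)


def prob_home_margin_over(home_exp, away_exp, line, max_goals=15):
    """P(home goals - away goals > line), assuming independent Poisson goals."""
    total = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            if h - a > line:
                total += poisson_pmf(h, home_exp) * poisson_pmf(a, away_exp)
    return total


# Goal expectations taken from the prediction output earlier in the notebook
home_by_two_plus = prob_home_margin_over(
    1.5061418732653107, 0.9006395259292537, 1.5
)
print(home_by_two_plus)
```

Under this reading the result is close to the 0.263 shown above.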

### Probability of Both Teams Scoring

In [12]:
probs.both_teams_to_score

0.4620311858837887
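
Both teams scoring also has a simple closed form when the two goal counts are assumed to be independent Poisson variables: each side scores at least once with probability one minus its chance of scoring zero, and the two are multiplied. This is a sketch under that assumption; `prob_both_score` is a hypothetical helper.

```python
from math import exp


def prob_both_score(home_exp, away_exp):
    """P(home >= 1 and away >= 1), assuming independent Poisson goal counts."""
    home_scores = 1.0 - exp(-home_exp)  # 1 - P(home scores 0)
    away_scores = 1.0 - exp(-away_exp)  # 1 - P(away scores 0)
    return home_scores * away_scores


# Goal expectations taken from the prediction output earlier in the notebook
btts = prob_both_score(1.5061418732653107, 0.9006395259292537)
print(btts)
```

With the expectations above this gives approximately 0.462, consistent with `both_teams_to_score`.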

## Train the Model with Recent Matches Weighted More Heavily

In [13]:
weights = pb.models.dixon_coles_weights(df["date"], 0.001)

clf = pb.models.BayesianBivariateGoalModel(
    df["goals_home"], df["goals_away"], df["team_home"], df["team_away"], weights
)
clf.fit()

Only 312 samples in chain.
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (8 chains in 8 jobs)
NUTS: [tau_att, atts_star, tau_def, def_star, tau_rho, rho, mu, eta]


Sampling 8 chains for 2_000 tune and 312 draw iterations (16_000 + 2_496 draws total) took 13 seconds.
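
The weighting used here follows the time-decay idea from Dixon and Coles, where a match's weight shrinks exponentially with its age. The sketch below shows that general form, `exp(-xi * age)`. Measuring age in days relative to the most recent match in the data is an assumption made for illustration, and `time_decay_weights` is a hypothetical stand-in, not the library function.

```python
from datetime import date
from math import exp


def time_decay_weights(dates, xi):
    """Exponential time-decay weights: the newest match gets weight 1.0.

    Assumes age is measured in days back from the latest date supplied.
    """
    latest = max(dates)
    return [exp(-xi * (latest - d).days) for d in dates]


# Three illustrative match dates, 100 days apart (not taken from the data)
example_dates = [date(2020, 1, 1), date(2020, 4, 10), date(2020, 7, 19)]
weights = time_decay_weights(example_dates, 0.001)

for d, w in zip(example_dates, weights):
    print(d, round(w, 4))
```

A larger `xi` discounts older matches faster; with `xi = 0.001` a match from 100 days before the latest one still carries about 90% of the weight of the newest match.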


In [14]:
clf

Module: Penaltyblog

Model: Bayesian Random Intercept

Number of parameters: 62
Team                 Attack               Defence              rho                 
--------------------------------------------------------------------------------
Arsenal              0.395                -0.119               -0.138              
Aston Villa          -0.239               0.501                -0.208              
Bournemouth          -0.824               0.136                0.072               
Brighton             -0.405               0.249                -0.246              
Burnley              -0.076               0.197                -0.33               
Chelsea              0.362                -0.05                0.114               
Crystal Palace       -0.63                0.229                -0.38               
Everton              -0.431               -0.022               -0.031              
Leicester            0.636                -0.213               -0.162              