An R package for soccer modelling

regista

Lifecycle: experimental

Overview

regista is a package for performing some of the common modelling tasks in soccer analytics.

Installation

regista is not currently available on CRAN, but it can be installed from GitHub:

# install.packages("devtools")
devtools::install_github("torvaney/regista")

Examples

Dixon-Coles

The “Dixon-Coles model” is a modified Poisson model, designed specifically for estimating teams’ strengths and for predicting football matches.
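
As a rough illustration of what "modified" means here, the sketch below starts from two independent Poisson distributions and applies the low-score correction described in Dixon and Coles' original formulation. The function names (`dc_tau`, `dc_score_prob`) and the example values for the rates and `rho` are assumptions chosen for illustration; this is not regista's internal code.

```r
# Low-score correction factor: only the scorelines 0-0, 0-1, 1-0 and 1-1
# are adjusted; every other scoreline keeps a factor of 1.
dc_tau <- function(hg, ag, home_rate, away_rate, rho) {
  if (hg == 0 && ag == 0) return(1 - home_rate * away_rate * rho)
  if (hg == 0 && ag == 1) return(1 + home_rate * rho)
  if (hg == 1 && ag == 0) return(1 + away_rate * rho)
  if (hg == 1 && ag == 1) return(1 - rho)
  1
}

# Probability of a single scoreline (hg home goals, ag away goals)
dc_score_prob <- function(hg, ag, home_rate, away_rate, rho) {
  dc_tau(hg, ag, home_rate, away_rate, rho) *
    dpois(hg, home_rate) *
    dpois(ag, away_rate)
}

# Illustrative (made-up) values
home_rate <- 1.5
away_rate <- 1.1
rho       <- -0.1

dc_score_prob(0, 0, home_rate, away_rate, rho)  # adjusted 0-0 probability
dpois(0, home_rate) * dpois(0, away_rate)       # plain independent Poisson
```

With a negative `rho`, the 0-0 and 1-1 scorelines become more likely than under independence, while 1-0 and 0-1 become less likely. The four adjustments cancel out, so the scoreline probabilities still sum to one.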

regista provides an implementation of this model:

library(regista)

fit <- dixoncoles(hgoal, agoal, home, away, data = premier_league_2010)

print(fit)
#> 
#> Dixon-Coles model with specification:
#> 
#> Home goals: hgoal ~ off(home) + def(away) + hfa + 0
#> Away goals: agoal ~ off(away) + def(home) + 0
#> Weights   : 1

The Dixon-Coles model provides estimates of each team’s offensive and defensive strength, along with an estimate of home-field advantage (hfa):

parameters <- tibble::tibble(
  parameter = names(fit$par),
  value     = fit$par
)

parameters
#> # A tibble: 42 x 2
#>    parameter                value
#>    <chr>                    <dbl>
#>  1 off___Arsenal           0.297 
#>  2 off___Aston Villa      -0.102 
#>  3 off___Birmingham City  -0.345 
#>  4 off___Blackburn Rovers -0.133 
#>  5 off___Blackpool         0.0696
#>  6 off___Bolton Wanderers -0.0303
#>  7 off___Chelsea           0.232 
#>  8 off___Everton          -0.0604
#>  9 off___Fulham           -0.0807
#> 10 off___Liverpool         0.0901
#> # ... with 32 more rows

regista also provides a predict method for estimating either team's goal-scoring rate, or the probabilities of the possible scorelines or match outcomes:

# Create a copy of the original data and attach predictions
to_predict <- premier_league_2010
to_predict$predictions <- predict(fit, newdata = premier_league_2010)

to_predict
#> # A tibble: 380 x 8
#>    date      home    away          hgoal agoal result hfa   predictions   
#>    <chr>     <fct>   <fct>         <dbl> <dbl> <fct>  <lgl> <list>        
#>  1 2011-05-… Arsenal Aston Villa       1     2 A      TRUE  <tibble [2 × …
#>  2 2010-10-… Arsenal Birmingham C…     2     1 H      TRUE  <tibble [2 × …
#>  3 2011-04-… Arsenal Blackburn Ro…     0     0 D      TRUE  <tibble [2 × …
#>  4 2010-08-… Arsenal Blackpool         6     0 H      TRUE  <tibble [2 × …
#>  5 2010-09-… Arsenal Bolton Wande…     4     1 H      TRUE  <tibble [2 × …
#>  6 2010-12-… Arsenal Chelsea           3     1 H      TRUE  <tibble [2 × …
#>  7 2011-02-… Arsenal Everton           2     1 H      TRUE  <tibble [2 × …
#>  8 2010-12-… Arsenal Fulham            2     1 H      TRUE  <tibble [2 × …
#>  9 2011-04-… Arsenal Liverpool         1     1 D      TRUE  <tibble [2 × …
#> 10 2011-01-… Arsenal Manchester C…     0     0 D      TRUE  <tibble [2 × …
#> # ... with 370 more rows
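
To make the link between goal-scoring rates, scorelines and match outcomes concrete, here is a minimal base-R sketch. It assumes two independent Poisson rates (the values are made up) and leaves out the Dixon-Coles low-score correction for brevity, so it shows the general idea rather than what `predict` does internally.

```r
# Assumed goal-scoring rates (illustrative values, not model output)
home_rate <- 1.6
away_rate <- 1.0
max_goals <- 10  # truncate the scoreline grid; the tail mass beyond this is tiny

goals <- 0:max_goals

# Matrix of scoreline probabilities:
# rows index home goals, columns index away goals
score_probs <- outer(dpois(goals, home_rate), dpois(goals, away_rate))

outcomes <- c(
  home_win = sum(score_probs[lower.tri(score_probs)]),  # home goals > away goals
  draw     = sum(diag(score_probs)),                    # home goals == away goals
  away_win = sum(score_probs[upper.tri(score_probs)])   # home goals < away goals
)

# Renormalise to account for the truncated grid
outcomes <- outcomes / sum(outcomes)
outcomes
```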

The regista package is designed to work fluidly with the tidyverse and tidy principles. For instance, predictions can be handled easily with the broom package.

To get predicted Home/Draw/Away probabilities as columns in a data frame, you can use the broom::augment function:

library(tidyverse)
library(broom)

fit %>% 
  augment(newdata = premier_league_2010, type = "outcomes") %>% 
  unnest() %>%
  mutate(prob = scales::percent(prob, 2)) %>%  # Prettify output
  spread(outcome, prob) %>% 
  select(home, away, home_win, draw, away_win, result)
#> # A tibble: 380 x 6
#>    home                  away               home_win draw  away_win result
#>    <fct>                 <fct>              <chr>    <chr> <chr>    <fct> 
#>  1 Aston Villa           West Ham United    56%      24%   20%      H     
#>  2 Blackburn Rovers      Everton            34%      30%   36%      H     
#>  3 Bolton Wanderers      Fulham             38%      30%   30%      D     
#>  4 Chelsea               West Bromwich Alb… 78%      16%   8%       H     
#>  5 Sunderland            Birmingham City    50%      30%   20%      D     
#>  6 Tottenham Hotspur     Manchester City    32%      32%   36%      D     
#>  7 Wigan Athletic        Blackpool          46%      26%   28%      A     
#>  8 Wolverhampton Wander… Stoke City         36%      30%   34%      H     
#>  9 Liverpool             Arsenal            40%      28%   32%      D     
#> 10 Manchester United     Newcastle United   72%      18%   10%      H     
#> # ... with 370 more rows

Or to get model parameters in a table format:

tidy(fit)
#> # A tibble: 42 x 3
#>    parameter team               value
#>    <chr>     <chr>              <dbl>
#>  1 off       Arsenal           0.297 
#>  2 off       Aston Villa      -0.102 
#>  3 off       Birmingham City  -0.345 
#>  4 off       Blackburn Rovers -0.133 
#>  5 off       Blackpool         0.0696
#>  6 off       Bolton Wanderers -0.0303
#>  7 off       Chelsea           0.232 
#>  8 off       Everton          -0.0604
#>  9 off       Fulham           -0.0807
#> 10 off       Liverpool         0.0901
#> # ... with 32 more rows

(Note that team parameter estimates are in log space.)
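
As a small worked example of what "log space" implies, the sketch below combines parameters according to the home-goals formula printed earlier (`off(home) + def(away) + hfa`) and exponentiates the sum, assuming the usual log link for a Poisson model. Only the Arsenal `off` value comes from the output above; the `def_away` and `hfa` values are hypothetical.

```r
# Log-space parameters for one hypothetical fixture
off_home <- 0.297   # Arsenal's offence estimate from the table above
def_away <- -0.15   # hypothetical defence parameter for the away team
hfa      <- 0.30    # hypothetical home-field advantage

# Linear predictor on the log scale, following the home-goals formula
log_rate <- off_home + def_away + hfa

# Back-transform to an expected number of home goals
home_goal_rate <- exp(log_rate)
home_goal_rate

# Because the parameters add on the log scale, they multiply on the goal scale
exp(off_home) * exp(def_away) * exp(hfa)
```

One consequence is that a difference of 0.1 between two teams' `off` parameters corresponds to roughly a 10% difference in expected goals, all else being equal, since `exp(0.1)` is about 1.105.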

A more flexible API is provided by dixoncoles_ext, which allows the base Dixon-Coles model to be extended arbitrarily.

There are some more extensive examples and analyses using regista available at the following links:

Other options

  • The goalmodel R package contains an implementation of the Dixon-Coles model, along with some additional methods for modelling the number of goals scored in sports games.