# Betting strategies 
## _England Premier League Results & Odds Dataset_

### I. Introduction

> New to sports betting, or looking to implement a new strategy? This notebook introduces the basics and builds a **betting strategy** that you can apply on your own.

In this report, we use football as our example, as it is the sport with the highest number of bets. More precisely, we focus on **English Premier League results**, because the large number of matches lets us build and test a robust strategy.

The dataset we chose covers the results of these matches since 2008, with fields such as the date, the home and away teams, the goals scored, the match result and the referee. It also contains the odds offered on match results and on the number of goals, taken from several betting sites. The betting vocabulary is explained in the section further below.

In [2]:
# Useful starting lines
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import collections  as mc
%load_ext autoreload
%autoreload 2
import pandas as pd 
import seaborn as sns
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
sns.set_style("white")

In [3]:
data = pd.read_csv('https://raw.githubusercontent.com/abdul232/DMML_Team_Rolex/master/data/England%202008%202018%20Premiere%20League%20Clean%20DATA.csv', sep=';')
# view the first 10 rows 
data.head(10)

Unnamed: 0,Match_ID,Date,HomeTeam,AwayTeam,Home Team Goals,Away Team Goals,Match Result,Referee,Home Team Shots,Away Team Shots,...,Betbrain Average Home,Betbrain Maximum Draw,Betbrain Average Draw,Betbrain Maximum Away,Betbrain Average Away,Betbrain Numbers of bookmakers Goals,Betbrain Max > 2.5 Goals,Betbrain Average > 2.5 Goals,Betbrain Max < 2.5 Goals,Betbrain Average < 2.5 Goals
0,1.0,16.08.2008,Arsenal,West Brom,1.0,0.0,H,H Webb,24.0,5.0,...,1.22,6.25,5.6,17.0,13.52,37.0,1.71,1.65,2.25,2.14
1,2.0,16.08.2008,Bolton,Stoke,3.0,1.0,H,C Foy,14.0,8.0,...,1.81,3.51,3.35,5.3,4.54,37.0,2.33,2.16,1.7,1.64
2,3.0,16.08.2008,Everton,Blackburn,2.0,3.0,A,A Marriner,10.0,15.0,...,1.98,3.39,3.25,4.21,3.86,37.0,2.34,2.17,1.7,1.63
3,4.0,16.08.2008,Hull,Fulham,2.0,1.0,H,P Walton,11.0,12.0,...,2.56,3.38,3.23,2.89,2.68,37.0,2.32,2.15,1.69,1.64
4,5.0,16.08.2008,Middlesbrough,Tottenham,2.0,1.0,H,M Atkinson,14.0,8.0,...,3.19,3.55,3.31,2.3,2.2,37.0,1.9,1.81,2.12,1.92
5,6.0,16.08.2008,Sunderland,Liverpool,0.0,1.0,A,A Wiley,6.0,14.0,...,5.24,3.75,3.5,1.76,1.68,37.0,2.22,2.08,1.76,1.69
6,7.0,16.08.2008,West Ham,Wigan,2.0,1.0,H,S Bennett,15.0,22.0,...,1.94,3.4,3.27,4.4,3.98,37.0,2.22,2.1,1.75,1.68
7,8.0,17.08.2008,Aston Villa,Man City,4.0,2.0,H,P Dowd,14.0,13.0,...,1.84,3.55,3.32,5.05,4.36,34.0,2.33,2.12,1.73,1.66
8,9.0,17.08.2008,Chelsea,Portsmouth,4.0,0.0,H,M Dean,18.0,12.0,...,1.32,5.2,4.69,12.0,10.05,36.0,2.01,1.9,1.93,1.83
9,10.0,17.08.2008,Man United,Newcastle,1.0,1.0,D,M Riley,18.0,11.0,...,1.28,5.5,5.03,13.0,11.0,36.0,1.83,1.72,2.2,2.03


In [4]:
data.shape

(4182, 47)

In [5]:
# Drop the last two rows, which contain only NaN values
data = data.drop([4180, 4181])

# Check the new shape
data.shape

(4180, 47)

The raw dataset has **4,182** rows and **47** columns; after dropping the two empty trailing rows, **4,180** rows remain.
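The two trailing rows are dropped by hard-coded index above. As a minimal sketch, assuming those rows are missing in every column, `dropna(how="all")` removes them without relying on their position. The `toy` frame below is made up for illustration and is not the real data.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data: the last two rows are entirely NaN
toy = pd.DataFrame({
    "Match_ID": [1.0, 2.0, np.nan, np.nan],
    "HomeTeam": ["Arsenal", "Bolton", np.nan, np.nan],
    "Home Team Goals": [1.0, 3.0, np.nan, np.nan],
})

# Drop only the rows in which every column is missing
cleaned = toy.dropna(how="all")

print(cleaned.shape)
```

Rows that are only partly missing are kept by this call, so it is worth checking `data.isna().sum()` separately if partial gaps are a concern.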

##### Betting vocabulary
First, let's set up the vocabulary. Every bet you place is tied to _odds_. Betting odds reflect how likely the bookmaker considers an event to be, and they determine how much money you win if your bet succeeds.


You can bet on different types of results before a match. Here, we consider two types of bets: those on the number of goals, and those on the match result (home team wins, away team wins, or draw).
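To make the link between odds, probability and winnings concrete, here is a small sketch. It assumes the odds in the dataset are in decimal (European) format, which the values in the table above are consistent with. The helper names `implied_probability` and `payout` are ours, not part of any library.

```python
def implied_probability(decimal_odds):
    """Probability implied by decimal odds, bookmaker margin included."""
    return 1.0 / decimal_odds


def payout(stake, decimal_odds):
    """Total amount returned on a winning bet (stake included)."""
    return stake * decimal_odds


# Betbrain average odds for Arsenal v West Brom (first row of the table above)
odds = {"home": 1.22, "draw": 5.6, "away": 13.52}

probs = {outcome: implied_probability(o) for outcome, o in odds.items()}
overround = sum(probs.values())  # exceeds 1 by the bookmaker's margin

for outcome, p in probs.items():
    print(f"{outcome}: implied probability {p:.3f}")
print(f"sum of implied probabilities: {overround:.3f}")
print(f"a stake of 10 on home returns {payout(10, odds['home']):.2f} if it wins")
```

The implied probabilities add up to more than 1. The excess is the bookmaker's margin, which is why betting on every outcome at these odds loses money on average.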

In [6]:
data.dtypes

Match_ID                                float64
Date                                     object
HomeTeam                                 object
AwayTeam                                 object
Home Team Goals                         float64
Away Team Goals                         float64
Match Result                             object
Referee                                  object
Home Team Shots                         float64
Away Team Shots                         float64
Home Team Shots on Target               float64
Away Team Shots on Target               float64
Home Fouls Committed                    float64
Away Fouls Committed                    float64
Home Corners                            float64
Away Cornners                           float64
Home Yellow Cards                       float64
Away Yellow Cards                       float64
Home Red Cards                          float64
Away Red Cards                          float64
B365 Home                               

In [7]:
data['Match_ID'] = data.Match_ID.astype(int)

In [8]:
# We would like to run 3 different regressions: one for home wins, one for draws and one for away wins
data = pd.get_dummies(data, columns=['Match Result'])
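As an illustration of what this call produces, the sketch below applies `pd.get_dummies` to a three-row toy frame (invented here, not taken from the dataset). The `dtype=int` argument is our addition: recent pandas versions return boolean indicators by default, and 0/1 integers are easier to read as regression targets.

```python
import pandas as pd

toy = pd.DataFrame({
    "HomeTeam": ["Arsenal", "Everton", "Man United"],
    "Match Result": ["H", "A", "D"],
})

# One indicator column per outcome; the original 'Match Result' column is removed
encoded = pd.get_dummies(toy, columns=["Match Result"], dtype=int)

print(encoded.columns.tolist())
print(encoded)
```

Each new column is named `<original column>_<value>`, which is why the later cells refer to `Match Result_H`.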

In [9]:
# Parse the Date column (day.month.year strings) as datetime
data['Date'] = pd.to_datetime(data['Date'], format="%d.%m.%Y")


In [10]:
# Match statistics are whole numbers stored as floats; convert them to integers
int_cols = ['Home Team Goals', 'Away Team Goals', 'Home Team Shots', 'Away Team Shots',
            'Home Team Shots on Target', 'Away Team Shots on Target',
            'Home Fouls Committed', 'Away Fouls Committed', 'Home Corners', 'Away Cornners',
            'Home Yellow Cards', 'Away Yellow Cards', 'Home Red Cards', 'Away Red Cards']
data[int_cols] = data[int_cols].astype(int)

In [11]:
data.dtypes

Match_ID                                         int32
Date                                    datetime64[ns]
HomeTeam                                        object
AwayTeam                                        object
Home Team Goals                                  int32
Away Team Goals                                  int32
Referee                                         object
Home Team Shots                                  int32
Away Team Shots                                  int32
Home Team Shots on Target                        int32
Away Team Shots on Target                        int32
Home Fouls Committed                             int32
Away Fouls Committed                             int32
Home Corners                                     int32
Away Cornners                                    int32
Home Yellow Cards                                int32
Away Yellow Cards                                int32
Home Red Cards                                   int32
Away Red C

In [13]:
from sklearn import preprocessing
# separate the data from the target attributes
#X = data[['B365 Home','B365 Draw','B365 Away','Bet&Win Home','Bet&Win Draw','Bet&Win Away','Interwetten Home','Iterwetten Draw','Interwetten Away','William Hill Home','William Hill Draw','William Hill Away','VC Bet Home','VC Bet Draw','VC Bet Away']]
# min-max normalisation using the formula (x - x.min()) / (x.max() - x.min())
cols_to_norm = ['B365 Home','B365 Draw','B365 Away','Bet&Win Home','Bet&Win Draw','Bet&Win Away','Interwetten Home','Iterwetten Draw','Interwetten Away','William Hill Home','William Hill Draw','William Hill Away','VC Bet Home','VC Bet Draw','VC Bet Away']
data[cols_to_norm] = data[cols_to_norm].apply(lambda x: (x - x.min()) / (x.max() - x.min())) 
data[cols_to_norm]

Unnamed: 0,B365 Home,B365 Draw,B365 Away,Bet&Win Home,Bet&Win Draw,Bet&Win Away,Interwetten Home,Iterwetten Draw,Interwetten Away,William Hill Home,William Hill Draw,William Hill Away,VC Bet Home,VC Bet Draw,VC Bet Away
0,0.006381,0.250000,0.348044,0.008521,0.234234,0.254373,0.007916,0.248619,0.354376,0.006015,0.264706,0.242424,0.009112,0.280,0.237866
1,0.035096,0.035714,0.084754,0.035088,0.036036,0.093777,0.036939,0.038674,0.110473,0.037594,0.029412,0.064171,0.036446,0.072,0.073406
2,0.042844,0.021429,0.072217,0.040100,0.031532,0.082306,0.050132,0.027624,0.078192,0.045113,0.029412,0.053030,0.045558,0.064,0.057361
3,0.070191,0.014286,0.042126,0.070175,0.022523,0.043590,0.071240,0.027624,0.053085,0.072682,0.009804,0.033422,0.070615,0.072,0.033293
4,0.097539,0.028571,0.028335,0.087719,0.027027,0.033553,0.097625,0.027624,0.038737,0.092732,0.039216,0.022950,0.097950,0.072,0.023265
5,0.202370,0.042857,0.013791,0.197995,0.040541,0.015486,0.182058,0.060773,0.019010,0.223058,0.049020,0.010027,0.202733,0.104,0.010229
6,0.038742,0.028571,0.077232,0.042607,0.027027,0.076570,0.039578,0.038674,0.096126,0.039098,0.039216,0.058601,0.038724,0.080,0.064380
7,0.038742,0.028571,0.080491,0.045113,0.027027,0.070835,0.050132,0.027624,0.078192,0.043108,0.039216,0.053030,0.038724,0.080,0.064380
8,0.012306,0.142857,0.222668,0.012531,0.166667,0.204187,0.013193,0.160221,0.264706,0.012531,0.137255,0.164439,0.012756,0.200,0.197754
9,0.010483,0.178571,0.247743,0.012531,0.166667,0.204187,0.013193,0.160221,0.264706,0.010025,0.166667,0.197861,0.010478,0.240,0.217810
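The normalisation cell above overwrites the raw odds in `data`, so the original decimal odds are no longer available afterwards. The sketch below shows the same min-max formula on a toy frame, with values taken from the first three rows of the Betbrain average columns, and shows how the original values can be recovered if the per-column minimum and range are kept. The names `toy`, `scaled` and `restored` are illustrative.

```python
import pandas as pd

# Toy odds: first three rows of the Betbrain average home and away columns
toy = pd.DataFrame({
    "Home": [1.22, 1.81, 1.98],
    "Away": [13.52, 4.54, 3.86],
})

col_min = toy.min()
col_range = toy.max() - toy.min()

# Min-max scaling, column by column: (x - min) / (max - min)
scaled = (toy - col_min) / col_range

# Inverse transform, using the minimum and range saved from the original data
restored = scaled * col_range + col_min

print(scaled.round(3))
```

After scaling, each column runs from 0 (its smallest odds) to 1 (its largest), so the values no longer say anything about payouts on their own.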


In [14]:
round(data[cols_to_norm],3)

Unnamed: 0,B365 Home,B365 Draw,B365 Away,Bet&Win Home,Bet&Win Draw,Bet&Win Away,Interwetten Home,Iterwetten Draw,Interwetten Away,William Hill Home,William Hill Draw,William Hill Away,VC Bet Home,VC Bet Draw,VC Bet Away
0,0.006,0.250,0.348,0.009,0.234,0.254,0.008,0.249,0.354,0.006,0.265,0.242,0.009,0.280,0.238
1,0.035,0.036,0.085,0.035,0.036,0.094,0.037,0.039,0.110,0.038,0.029,0.064,0.036,0.072,0.073
2,0.043,0.021,0.072,0.040,0.032,0.082,0.050,0.028,0.078,0.045,0.029,0.053,0.046,0.064,0.057
3,0.070,0.014,0.042,0.070,0.023,0.044,0.071,0.028,0.053,0.073,0.010,0.033,0.071,0.072,0.033
4,0.098,0.029,0.028,0.088,0.027,0.034,0.098,0.028,0.039,0.093,0.039,0.023,0.098,0.072,0.023
5,0.202,0.043,0.014,0.198,0.041,0.015,0.182,0.061,0.019,0.223,0.049,0.010,0.203,0.104,0.010
6,0.039,0.029,0.077,0.043,0.027,0.077,0.040,0.039,0.096,0.039,0.039,0.059,0.039,0.080,0.064
7,0.039,0.029,0.080,0.045,0.027,0.071,0.050,0.028,0.078,0.043,0.039,0.053,0.039,0.080,0.064
8,0.012,0.143,0.223,0.013,0.167,0.204,0.013,0.160,0.265,0.013,0.137,0.164,0.013,0.200,0.198
9,0.010,0.179,0.248,0.013,0.167,0.204,0.013,0.160,0.265,0.010,0.167,0.198,0.010,0.240,0.218


In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score


# Scatter plot of the normalised B365 home odds against the home-win indicator
ax = sns.scatterplot(x="B365 Home", y="Match Result_H", data=data)

# For the regression, we can only take one variable at a time
#feature_home_wins = ['B365 Home','Bet&Win Home','Interwetten Home','William Hill Home','VC Bet Home']
#feature_home_wins = ['B365 Home']
#X = data[feature_home_wins]
#y = data['Match Result_H']

# We create the linear model.
#model = LinearRegression(fit_intercept=True)
#model.fit(X, y)

#predicted = model.predict(X)
#print(predicted)

#plt.scatter(X, y)
#plt.plot(X, predicted, color='red') # predicted
#plt.show()

#mae = mean_absolute_error(y, predicted)
#print("MAE = %.2f" % mae)
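The commented-out draft above can be tried on synthetic data first. The sketch below is not run on the real dataset: it generates made-up decimal odds and simulates home wins with probability `1 / odds`, which is an assumption made purely for illustration. It then fits the same single-feature `LinearRegression` and reports the MAE.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Synthetic stand-in for the real columns (assumption: not the actual data)
rng = np.random.default_rng(0)
odds = rng.uniform(1.1, 10.0, size=500)                 # decimal home odds
home_win = (rng.random(500) < 1.0 / odds).astype(int)   # 1 if the home team wins

X = odds.reshape(-1, 1)   # scikit-learn expects a 2-D feature matrix
y = home_win

model = LinearRegression(fit_intercept=True)
model.fit(X, y)
predicted = model.predict(X)

mae = mean_absolute_error(y, predicted)
print("slope = %.3f, intercept = %.3f" % (model.coef_[0], model.intercept_))
print("MAE = %.2f" % mae)
```

Because the target is a 0/1 indicator, the fitted line is only a rough approximation of the win probability and can produce predictions outside the range 0 to 1 for extreme odds.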