# Betting strategies 
## _England Premiere League Results & Odds Dataset_

### I. Introduction

> New to sports betting, or looking to implement a new strategy? This notebook introduces the basics of betting and provides a **betting strategy** that you can apply on your own.

In this report, we take football as our example, the sport that attracts the largest number of bets. More precisely, we concentrate on **English Premier League results**, because the large number of matches lets us build and test a robust strategy.

The dataset we chose contains the results of these matches since 2008, with fields such as the date, the home and away teams, the goals scored, the match result and the referee. It also contains odds on the match result and on the number of goals, collected from several betting sites. For an explanation of the betting vocabulary, see the vocabulary section below.

In [32]:
# Useful starting lines
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import collections as mc
%load_ext autoreload
%autoreload 2
import pandas as pd 
import seaborn as sns
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
sns.set_style("white")



In [33]:
data = pd.read_csv('https://raw.githubusercontent.com/abdul232/DMML_Team_Rolex/master/data/England%202008%202018%20Premiere%20League%20Clean%20DATA.csv', sep=';')
# view the first 10 rows 
data.head(10)

| Unnamed: 0 | Match_ID | Date | HomeTeam | AwayTeam | Home Team Goals | Away Team Goals | Match Result | Referee | Home Team Shots | Away Team Shots | ... | Betbrain Average Home | Betbrain Maximum Draw | Betbrain Average Draw | Betbrain Maximum Away | Betbrain Average Away | Betbrain Numbers of bookmakers Goals | Betbrain Max > 2.5 Goals | Betbrain Average > 2.5 Goals | Betbrain Max < 2.5 Goals | Betbrain Average < 2.5 Goals |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 16.08.2008 | Arsenal | West Brom | 1.0 | 0.0 | H | H Webb | 24.0 | 5.0 | ... | 1.22 | 6.25 | 5.6 | 17.0 | 13.52 | 37.0 | 1.71 | 1.65 | 2.25 | 2.14 |
| 1 | 2.0 | 16.08.2008 | Bolton | Stoke | 3.0 | 1.0 | H | C Foy | 14.0 | 8.0 | ... | 1.81 | 3.51 | 3.35 | 5.3 | 4.54 | 37.0 | 2.33 | 2.16 | 1.7 | 1.64 |
| 2 | 3.0 | 16.08.2008 | Everton | Blackburn | 2.0 | 3.0 | A | A Marriner | 10.0 | 15.0 | ... | 1.98 | 3.39 | 3.25 | 4.21 | 3.86 | 37.0 | 2.34 | 2.17 | 1.7 | 1.63 |
| 3 | 4.0 | 16.08.2008 | Hull | Fulham | 2.0 | 1.0 | H | P Walton | 11.0 | 12.0 | ... | 2.56 | 3.38 | 3.23 | 2.89 | 2.68 | 37.0 | 2.32 | 2.15 | 1.69 | 1.64 |
| 4 | 5.0 | 16.08.2008 | Middlesbrough | Tottenham | 2.0 | 1.0 | H | M Atkinson | 14.0 | 8.0 | ... | 3.19 | 3.55 | 3.31 | 2.3 | 2.2 | 37.0 | 1.9 | 1.81 | 2.12 | 1.92 |
| 5 | 6.0 | 16.08.2008 | Sunderland | Liverpool | 0.0 | 1.0 | A | A Wiley | 6.0 | 14.0 | ... | 5.24 | 3.75 | 3.5 | 1.76 | 1.68 | 37.0 | 2.22 | 2.08 | 1.76 | 1.69 |
| 6 | 7.0 | 16.08.2008 | West Ham | Wigan | 2.0 | 1.0 | H | S Bennett | 15.0 | 22.0 | ... | 1.94 | 3.4 | 3.27 | 4.4 | 3.98 | 37.0 | 2.22 | 2.1 | 1.75 | 1.68 |
| 7 | 8.0 | 17.08.2008 | Aston Villa | Man City | 4.0 | 2.0 | H | P Dowd | 14.0 | 13.0 | ... | 1.84 | 3.55 | 3.32 | 5.05 | 4.36 | 34.0 | 2.33 | 2.12 | 1.73 | 1.66 |
| 8 | 9.0 | 17.08.2008 | Chelsea | Portsmouth | 4.0 | 0.0 | H | M Dean | 18.0 | 12.0 | ... | 1.32 | 5.2 | 4.69 | 12.0 | 10.05 | 36.0 | 2.01 | 1.9 | 1.93 | 1.83 |
| 9 | 10.0 | 17.08.2008 | Man United | Newcastle | 1.0 | 1.0 | D | M Riley | 18.0 | 11.0 | ... | 1.28 | 5.5 | 5.03 | 13.0 | 11.0 | 36.0 | 1.83 | 1.72 | 2.2 | 2.03 |


In [34]:
data.shape

(4182, 47)

In [35]:
# Drop the last two rows because they contain only NaN values
data = data.drop([4180, 4181])

# Check the new shape
data.shape

(4180, 47)

The raw dataset has **4,182** rows and **47** columns. After dropping the two empty trailing rows, **4,180** rows remain.
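Dropping rows by hard-coded index works here, but it breaks silently if the file changes. As a hedged alternative, the sketch below removes rows that are empty in the columns that matter, using `DataFrame.dropna`. The toy frame and the names `toy`, `data_columns` and `cleaned` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data: the last two rows carry no match data.
# 'Unnamed: 0' mimics an index-like column that may still be populated.
toy = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2, 3, 4],
    "Match_ID": [1.0, 2.0, 3.0, np.nan, np.nan],
    "HomeTeam": ["Arsenal", "Bolton", "Everton", np.nan, np.nan],
    "Home Team Goals": [1.0, 3.0, 2.0, np.nan, np.nan],
})

# Only these columns are checked for missing values
data_columns = ["Match_ID", "HomeTeam", "Home Team Goals"]

# how="all" drops a row only if every checked column is missing
cleaned = toy.dropna(how="all", subset=data_columns)

print(toy.shape, "->", cleaned.shape)
```

Restricting the check with `subset` matters when an index-like column is filled in for every row, since `how="all"` on the full frame would then keep the empty rows.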

##### Betting vocabulary
First, let's set up the vocabulary. Every bet comes with _odds_. Odds reflect how likely the bookmaker considers an event to be, and they determine how much money you win if your bet succeeds: the less likely the outcome, the higher the odds and the larger the potential payout.
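To make the link between odds, payout and probability concrete, here is a minimal sketch. It assumes the odds in the dataset are decimal (European) odds, which the values in the table suggest but the source does not state. The stake and the variable names are illustrative.

```python
# Betbrain average odds for Arsenal v West Brom (first row of the table above),
# read here as decimal odds, which is an assumption
odds = {"home": 1.22, "draw": 5.6, "away": 13.52}

stake = 10.0  # hypothetical stake

# With decimal odds, a winning bet returns stake * odds (stake included)
payout = {outcome: stake * o for outcome, o in odds.items()}
profit = {outcome: p - stake for outcome, p in payout.items()}

# The implied probability is the inverse of the decimal odds
implied = {outcome: 1.0 / o for outcome, o in odds.items()}

# Bookmakers build in a margin, so the implied probabilities sum to more than 1
overround = sum(implied.values()) - 1.0

# Normalising removes the margin and gives probabilities that sum to 1
fair = {outcome: p / (1.0 + overround) for outcome, p in implied.items()}

for outcome in odds:
    print(outcome, round(profit[outcome], 2), round(implied[outcome], 3), round(fair[outcome], 3))
print("overround:", round(overround, 3))
```

Dividing by the sum is only one simple way to strip the bookmaker's margin; other normalisation schemes exist.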


You can bet on several kinds of outcome before a match. Here we consider two types of bet: bets on the number of goals, and bets on the match result (home team wins, away team wins, or draw).
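Both bet types can be settled from the final score alone. The sketch below derives the match result (`H`, `D`, `A`) and the over/under 2.5 goals outcome for three matches taken from the table above. The column names `result` and `over_2_5` are hypothetical helpers introduced for illustration.

```python
import pandas as pd

# Three matches from the first rows of the dataset
matches = pd.DataFrame({
    "HomeTeam": ["Arsenal", "Everton", "Man United"],
    "AwayTeam": ["West Brom", "Blackburn", "Newcastle"],
    "Home Team Goals": [1, 2, 1],
    "Away Team Goals": [0, 3, 1],
})

goal_diff = matches["Home Team Goals"] - matches["Away Team Goals"]
total_goals = matches["Home Team Goals"] + matches["Away Team Goals"]

# Match-result bet: H = home win, D = draw, A = away win
matches["result"] = goal_diff.apply(lambda d: "H" if d > 0 else ("A" if d < 0 else "D"))

# Goals bet: True when more than 2.5 goals were scored in total
matches["over_2_5"] = total_goals > 2.5

print(matches[["HomeTeam", "AwayTeam", "result", "over_2_5"]])
```

The 2.5 threshold can never be hit exactly, so every match falls cleanly on one side of the goals bet.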

In [42]:
data.dtypes

Match_ID                                  int64
Date                                     object
HomeTeam                                 object
AwayTeam                                 object
Home Team Goals                         float64
Away Team Goals                         float64
Referee                                  object
Home Team Shots                         float64
Away Team Shots                         float64
Home Team Shots on Target               float64
Away Team Shots on Target               float64
Home Fouls Committed                    float64
Away Fouls Committed                    float64
Home Corners                            float64
Away Cornners                           float64
Home Yellow Cards                       float64
Away Yellow Cards                       float64
Home Red Cards                          float64
Away Red Cards                          float64
B365 Home                               float64
B365 Draw                               

In [37]:
data['Match_ID'] = data.Match_ID.astype(int)

In [41]:
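# We want three separate regressions: one for home wins, one for draws and one for away wins,
# so we one-hot encode the match result.
# The guard keeps this cell safe to re-run: when the column is no longer present
# (for example after a first run), get_dummies raises
# KeyError: "None of [Index(['Match Result'], dtype='object')] are in the [columns]"
if 'Match Result' in data.columns:
    data = pd.get_dummies(data, columns=['Match Result'])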
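The one-hot encoding step above can be sketched on a toy frame to show which columns it produces. The frame `toy` and the target `y_home` are illustrative assumptions; only the `Match Result` column name and its `H`/`D`/`A` values come from the dataset.

```python
import pandas as pd

toy = pd.DataFrame({
    "HomeTeam": ["Arsenal", "Everton", "Man United"],
    "Match Result": ["H", "A", "D"],
})

# Each distinct value of 'Match Result' becomes its own indicator column,
# and the original column is removed
encoded = pd.get_dummies(toy, columns=["Match Result"])
print(encoded.columns.tolist())

# One binary target per regression, for example the home-win indicator
y_home = encoded["Match Result_H"].astype(int)
print(y_home.tolist())
```

Because the original column is removed, running the same call a second time on `encoded` fails with a `KeyError`, which is why a guard on the column name is useful in a notebook.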

In [43]:
# Parse the Date column (day.month.year strings) into datetime values
data['Date'] = pd.to_datetime(data['Date'], format="%d.%m.%Y")
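Passing an explicit `format` matters for dates such as `01.02.2009`, which could be read as either 1 February or 2 January. A small sketch with made-up sample strings in the dataset's `dd.mm.yyyy` layout:

```python
import pandas as pd

# Sample strings in the same day.month.year layout as the Date column
raw_dates = pd.Series(["16.08.2008", "01.02.2009"])

# The explicit format removes any ambiguity between day and month
parsed = pd.to_datetime(raw_dates, format="%d.%m.%Y")

print(parsed.dt.day.tolist(), parsed.dt.month.tolist(), parsed.dt.year.tolist())
```

Once parsed, the column supports date arithmetic and accessors such as `.dt.year`, which is convenient for splitting matches by season.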


In [58]:
# Match statistics that are whole numbers but were loaded as floats
# ('Away Cornners' is spelled this way in the dataset)
count_columns = [
    'Home Team Goals', 'Away Team Goals',
    'Home Team Shots', 'Away Team Shots',
    'Home Team Shots on Target', 'Away Team Shots on Target',
    'Home Fouls Committed', 'Away Fouls Committed',
    'Home Corners', 'Away Cornners',
    'Home Yellow Cards', 'Away Yellow Cards',
    'Home Red Cards', 'Away Red Cards',
]
data[count_columns] = data[count_columns].astype(int)

In [59]:
data.dtypes

Match_ID                                         int64
Date                                    datetime64[ns]
HomeTeam                                        object
AwayTeam                                        object
Home Team Goals                                  int64
Away Team Goals                                  int64
Referee                                         object
Home Team Shots                                  int64
Away Team Shots                                  int64
Home Team Shots on Target                        int64
Away Team Shots on Target                        int64
Home Fouls Committed                             int64
Away Fouls Committed                             int64
Home Corners                                     int64
Away Cornners                                    int64
Home Yellow Cards                                int64
Away Yellow Cards                                int64
Home Red Cards                                   int64
Away Red C