# Predicting the Winning Football Team

Can we design a predictive model capable of accurately predicting if the home team will win a football match? 

![alt text](https://6544-presscdn-0-22-pagely.netdna-ssl.com/wp-content/uploads/2017/04/English-Premier-League.jpg "Logo Title Text 1")

## Steps

1. Clean the dataset
2. Split it into training and testing data (12 features and 1 target: the winning team, Home/Away/Draw)
3. Train 3 different classifiers on the data
   - Logistic Regression
   - Support Vector Machine
   - XGBoost
4. Use the best classifier to predict who will win, given an away team and a home team
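
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual pipeline: the data is synthetic (generated with `make_classification` to have 12 features and 3 classes), the names `classifiers`, `scores` and `best_name` are made up for the example, and XGBoost is left out so the sketch depends on scikit-learn only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the match data (assumption): 12 features, 3 classes.
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6, n_classes=3, random_state=42
)

# Hold out a quarter of the rows so each model is scored on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Candidate models; an XGBoost classifier could be added to this dict as well.
classifiers = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "SVC": SVC(kernel="rbf", random_state=42),
}

# Fit each model on the training split and record its test accuracy.
scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, clf.predict(X_test))

# Keep the name of the model with the highest test accuracy.
best_name = max(scores, key=scores.get)
print(scores, best_name)
```

Accuracy on a single held-out split is only a rough guide. With real match data, cross-validation or a split by season would likely give a more trustworthy comparison.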

## History

Sports betting is a 500 billion dollar market (Sydney Herald)

![alt text](https://static1.squarespace.com/static/506a95bbc4aa0491a951c141/t/51a55d97e4b00f4428967e64/1369791896526/sports-620x349.jpg "Logo Title Text 1")

Kaggle hosts a yearly competition called March Madness

https://www.kaggle.com/c/march-machine-learning-mania-2017/kernels

Several papers have been written on this:

https://arxiv.org/pdf/1511.05837.pdf

"It is possible to predict the winner of English county twenty twenty cricket games in almost two thirds of instances."

https://arxiv.org/pdf/1411.1243.pdf

"Something that becomes clear from the results is that Twitter contains enough information to be useful for
predicting outcomes in the Premier League"

https://qz.com/233830/world-cup-germany-argentina-predictions-microsoft/

For the 2014 World Cup, Bing correctly predicted the outcomes for all of the 15 games in the knockout round.

So the right questions to ask are:

- What model should we use?
- Which features (aspects of a game) matter most for predicting a team win? Does being the home team give a team an advantage?

## Dataset

- Football is played by 250 million players in over 200 countries (the most popular sport globally)
- The English Premier League is the most popular domestic league in the world
- Dataset retrieved from http://football-data.co.uk/data.php

![alt text](http://i.imgur.com/YRIctyo.png "Logo Title Text 1")

- Football is a team sport, and a cheering crowd helps morale
- Familiarity with the pitch and weather conditions helps
- No need to travel (less fatigue)
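
One way to check whether these home-advantage arguments show up in the data is to compare how often each outcome occurs in the `FTR` column. The sketch below is only an illustration: `outcome_shares` is a hypothetical helper, and the five values are the `FTR` entries from this notebook's data preview, far too few to conclude anything. On the full dataset one could pass `data['FTR']` instead.

```python
from collections import Counter


def outcome_shares(results):
    """Return the share of home wins (H), draws (D) and away wins (A)."""
    counts = Counter(results)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("results must contain at least one match")
    return {outcome: counts.get(outcome, 0) / total for outcome in ("H", "D", "A")}


# FTR values of the five rows shown in this notebook's data preview.
preview_ftr = ["A", "H", "H", "A", "H"]
shares = outcome_shares(preview_ftr)
print(shares)
```

A clearly larger share for `H` than for `A` over many seasons would be consistent with a home advantage, although it would not say which of the factors above causes it.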

Acronyms: https://rstudio-pubs-static.s3.amazonaws.com/179121_70eb412bbe6c4a55837f2439e5ae6d4e.html

## Other repositories

- https://github.com/rsibi/epl-prediction-2017 (EPL prediction)
- https://github.com/adeshpande3/March-Madness-2017 (NCAA prediction)

## Import Dependencies

In [1]:
# Data preprocessing
import pandas as pd
# Gradient boosting: builds a prediction model as an ensemble of weak
# prediction models, typically decision trees.
import xgboost as xgb
# Logistic regression is used when the response variable is categorical,
# i.e. the outcome (dependent variable) has only a limited number of possible values.
from sklearn.linear_model import LogisticRegression
# A random forest is a meta estimator that fits a number of decision tree classifiers
# on various sub-samples of the dataset and uses averaging to improve the predictive
# accuracy and control over-fitting.
from sklearn.ensemble import RandomForestClassifier
# A support vector classifier: a discriminative classifier formally defined
# by a separating hyperplane.
from sklearn.svm import SVC
# Displays data
from IPython.display import display
%matplotlib inline

In [3]:
# Read the data.
data = pd.read_csv('dataset.csv')

# Preview data.
display(data.head())


# FTR - Full Time Result (H=Home Win, D=Draw, A=Away Win)
# HTGD - Home team goal difference
# ATGD - Away team goal difference
# HTP - Home team points
# ATP - Away team points
# DiffFormPts - Difference in form points
# DiffLP - Difference in last year's league position

# Input - 12 other features (fouls, shots, goals, misses, corners, red cards, yellow cards)
# Output - Full Time Result (H=Home Win, D=Draw, A=Away Win)

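| Unnamed: 0 | Div | Date | HomeTeam | AwayTeam | FTHG | FTAG | FTR | HTHG | HTAG | HTR | ... | Unnamed: 64 | Unnamed: 65 | Unnamed: 66 | Unnamed: 67 | Unnamed: 68 | Unnamed: 69 | Unnamed: 70 | Unnamed: 71 | Unnamed: 72 | Unnamed: 73 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | E0 | 18/08/01 | Charlton | Everton | 1.0 | 2.0 | A | 0.0 | 0.0 | D | ... |  |  |  |  |  |  |  |  |  |  |
| 1 | E0 | 18/08/01 | Derby | Blackburn | 2.0 | 1.0 | H | 1.0 | 0.0 | H | ... |  |  |  |  |  |  |  |  |  |  |
| 2 | E0 | 18/08/01 | Leeds | Southampton | 2.0 | 0.0 | H | 0.0 | 0.0 | D | ... |  |  |  |  |  |  |  |  |  |  |
| 3 | E0 | 18/08/01 | Leicester | Bolton | 0.0 | 5.0 | A | 0.0 | 4.0 | A | ... |  |  |  |  |  |  |  |  |  |  |
| 4 | E0 | 18/08/01 | Liverpool | West Ham | 2.0 | 1.0 | H | 1.0 | 1.0 | D | ... |  |  |  |  |  |  |  |  |  |  |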
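
The preview shows a run of `Unnamed` columns that contain no values, so a first cleaning step might look like the sketch below. It is an illustration under assumptions, not the notebook's own cleaning code: `csv_text` is a small stand-in for `dataset.csv` built from two preview rows, and the date format `%d/%m/%y` is inferred from values such as `18/08/01`.

```python
import io

import pandas as pd

# Small stand-in for dataset.csv (assumption): two preview rows plus two
# empty "Unnamed" columns. The real file has many more columns.
csv_text = """Div,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR,Unnamed: 64,Unnamed: 65
E0,18/08/01,Charlton,Everton,1.0,2.0,A,,
E0,18/08/01,Derby,Blackburn,2.0,1.0,H,,
"""
raw = pd.read_csv(io.StringIO(csv_text))

# Drop every column in which all values are missing.
cleaned = raw.dropna(axis=1, how="all")

# Parse the day/month/two-digit-year strings into proper timestamps.
cleaned = cleaned.assign(Date=pd.to_datetime(cleaned["Date"], format="%d/%m/%y"))

print(cleaned.columns.tolist())
```

`dropna(axis=1, how="all")` removes only columns that are entirely empty, so a column with even one recorded value is kept and has to be handled separately.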
