# IPL - Win Predictions
In this notebook, I try to predict the winner of an IPL match based mostly on the players in each team. I take two approaches: a stochastic approach, where I use per-player probabilities to simulate a match, and a machine learning approach, where I engineer features for each match. Most of the ideas here are adapted from this paper on predicting tennis matches: http://www.doc.ic.ac.uk/teaching/distinguished-projects/2015/m.sipko.pdf
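To make the stochastic approach concrete, here is a minimal Monte Carlo sketch. It is not the notebook's own code. It assumes each batsman has a dictionary of outcome probabilities (`'0'` to `'6'` runs, or `'out'`), that all 120 balls of an innings are legal, and that strike changes only on odd runs and at the end of each over. The names `simulate_innings` and `win_probability` are hypothetical, and the sketch ignores bowlers, extras, and the chasing side's target.

```python
import random

# Possible outcomes of a legal ball for the striker (an assumed encoding).
OUTCOMES = ['0', '1', '2', '3', '4', '5', '6', 'out']

def simulate_innings(batting_order, probs, balls=120, seed=None):
    """Simulate one innings and return (runs, wickets).

    batting_order: batsman names in batting order.
    probs: dict mapping each batsman to a dict of outcome -> probability.
    """
    rng = random.Random(seed)
    striker, non_striker, next_in = 0, 1, 2   # indices into batting_order
    runs = wickets = 0
    for ball in range(balls):
        p = probs[batting_order[striker]]
        outcome = rng.choices(OUTCOMES, weights=[p.get(o, 0.0) for o in OUTCOMES])[0]
        if outcome == 'out':
            wickets += 1
            if next_in >= len(batting_order):   # no batsmen left: all out
                break
            striker, next_in = next_in, next_in + 1
        else:
            runs += int(outcome)
            if int(outcome) % 2 == 1:           # odd runs: batsmen cross
                striker, non_striker = non_striker, striker
        if ball % 6 == 5:                       # end of over: change ends
            striker, non_striker = non_striker, striker
    return runs, wickets

def win_probability(order_a, order_b, probs, n_sims=1000, seed=0):
    """Estimate P(team A outscores team B); ties count as half a win."""
    rng = random.Random(seed)
    score = 0.0
    for _ in range(n_sims):
        runs_a, _ = simulate_innings(order_a, probs, seed=rng.randrange(2**32))
        runs_b, _ = simulate_innings(order_b, probs, seed=rng.randrange(2**32))
        score += 1.0 if runs_a > runs_b else 0.5 if runs_a == runs_b else 0.0
    return score / n_sims
```

Repeating the simulation many times turns per-ball probabilities into a match-level win probability. Counting a tie as half a win is an arbitrary stand-in for a super over. A pandas table of per-batsman probabilities can be turned into the expected `probs` shape with `DataFrame.to_dict('index')`.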

In [34]:
# First import everything
import time
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
import seaborn as sns
# plotly.plotly moved to the separate chart_studio package in plotly 4.0
import chart_studio.plotly as py
from plotly.graph_objs import *
from scipy.stats import *
from sklearn.neural_network import MLPClassifier
from bs4 import BeautifulSoup  # the 'lxml' parser used below also requires the lxml package
import requests
import wikipedia
pd.options.display.max_columns = None

## Data Reading

In [35]:
matches = pd.read_csv('./matches.csv')
matches.head()

Unnamed: 0,id,season,city,date,team1,team2,toss_winner,toss_decision,result,dl_applied,winner,win_by_runs,win_by_wickets,player_of_match,venue,umpire1,umpire2,umpire3
0,1,2008,Bangalore,4/18/2008,Kolkata Knight Riders,Royal Challengers Bangalore,Royal Challengers Bangalore,field,normal,0,Kolkata Knight Riders,140,0,BB McCullum,M Chinnaswamy Stadium,Asad Rauf,RE Koertzen,
1,2,2008,Chandigarh,4/19/2008,Chennai Super Kings,Kings XI Punjab,Chennai Super Kings,bat,normal,0,Chennai Super Kings,33,0,MEK Hussey,"Punjab Cricket Association Stadium, Mohali",MR Benson,SL Shastri,
2,3,2008,Delhi,4/19/2008,Rajasthan Royals,Delhi Daredevils,Rajasthan Royals,bat,normal,0,Delhi Daredevils,0,9,MF Maharoof,Feroz Shah Kotla,Aleem Dar,GA Pratapkumar,
3,4,2008,Mumbai,4/20/2008,Mumbai Indians,Royal Challengers Bangalore,Mumbai Indians,bat,normal,0,Royal Challengers Bangalore,0,5,MV Boucher,Wankhede Stadium,SJ Davis,DJ Harper,
4,5,2008,Kolkata,4/20/2008,Deccan Chargers,Kolkata Knight Riders,Deccan Chargers,bat,normal,0,Kolkata Knight Riders,0,5,DJ Hussey,Eden Gardens,BF Bowden,K Hariharan,


In [36]:
deliveries = pd.read_csv('./deliveries.csv')
deliveries.head()

Unnamed: 0,match_id,inning,batting_team,bowling_team,over,ball,batsman,non_striker,bowler,is_super_over,wide_runs,bye_runs,legbye_runs,noball_runs,penalty_runs,batsman_runs,extra_runs,total_runs,player_dismissed,dismissal_kind,fielder
0,1,1,Kolkata Knight Riders,Royal Challengers Bangalore,1,1,SC Ganguly,BB McCullum,P Kumar,0,0,0,1,0,0,0,1,1,,,
1,1,1,Kolkata Knight Riders,Royal Challengers Bangalore,1,2,BB McCullum,SC Ganguly,P Kumar,0,0,0,0,0,0,0,0,0,,,
2,1,1,Kolkata Knight Riders,Royal Challengers Bangalore,1,3,BB McCullum,SC Ganguly,P Kumar,0,1,0,0,0,0,0,1,1,,,
3,1,1,Kolkata Knight Riders,Royal Challengers Bangalore,1,4,BB McCullum,SC Ganguly,P Kumar,0,0,0,0,0,0,0,0,0,,,
4,1,1,Kolkata Knight Riders,Royal Challengers Bangalore,1,5,BB McCullum,SC Ganguly,P Kumar,0,0,0,0,0,0,0,0,0,,,


## Data Preparation
The idea is straightforward: build tables of probabilities for batsmen and bowlers. For each batsman, store the probability of scoring 0, 1, 2, 3, 4, 5, or 6 runs off a ball, along with the probability of getting out. For each bowler, store the probability of bowling a given over.
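One way to build these tables from `deliveries.csv` is sketched below, using the column names shown in the preview above (`batsman`, `batsman_runs`, `player_dismissed`, `wide_runs`, `bowler`, `over`, `match_id`). The function names are hypothetical, and two choices are assumptions rather than the notebook's method: wides are excluded from a batsman's balls faced, and a bowler's probability for an over is the fraction of the matches they bowled in where they bowled that over.

```python
import pandas as pd

# Runs off the bat (0-6) plus a dismissal of the striker.
OUTCOMES = ['0', '1', '2', '3', '4', '5', '6', 'out']

def batsman_outcome_probabilities(deliveries):
    """Rows: batsmen. Columns: OUTCOMES. Each row sums to 1."""
    # Assumption: wides do not count as balls faced.
    legal = deliveries[deliveries['wide_runs'] == 0]
    # Label each ball with the runs off the bat, or 'out' if the striker was dismissed.
    outcome = legal['batsman_runs'].astype(str).where(
        legal['player_dismissed'] != legal['batsman'], 'out')
    counts = pd.crosstab(legal['batsman'], outcome).reindex(columns=OUTCOMES, fill_value=0)
    # Normalise each row so a batsman's probabilities sum to 1.
    return counts.div(counts.sum(axis=1), axis=0)

def bowler_over_probabilities(deliveries):
    """Rows: bowlers. Columns: over numbers. Value: share of the bowler's
    matches in which they bowled that over (an assumed definition)."""
    # One row per (match, bowler, over) actually bowled.
    spells = deliveries[['match_id', 'bowler', 'over']].drop_duplicates()
    matches_played = spells.groupby('bowler')['match_id'].nunique()
    counts = spells.groupby(['bowler', 'over']).size().unstack(fill_value=0)
    return counts.div(matches_played, axis=0)
```

With the data loaded above, `batsman_outcome_probabilities(deliveries)` and `bowler_over_probabilities(deliveries)` each return one row per player. Filtering on `is_super_over == 0` first would keep super overs from skewing the counts.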

In [37]:
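# Fetch one match scorecard; its 'Did not bat' lines name players who never appear in the batsman column of deliveries.csv
link = 'http://www.espncricinfo.com/series/8048/scorecard/335983/Kings-XI-Punjab-vs-Chennai-Super-Kings-2nd-match-Indian-Premier-League-2007-08'
r = requests.get(link)
r.raise_for_status()  # fail loudly on an HTTP error instead of parsing an error page
soup = BeautifulSoup(r.text, 'lxml')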

In [38]:
# Print each 'Did not bat' line on the scorecard
for dnb in soup.find_all('div', 'wrap dnb'):
    print(dnb.text)

Did not bat: MS Gony, M Muralitharan, P Amarnath, Joginder Sharma
Did not bat: P Dharmani, B Lee, PP Chawla, WA Mota, S Sreesanth


In [39]:
link = 'https://en.wikipedia.org/wiki/2009_Indian_Premier_League'
r = requests.get(link)
soup = BeautifulSoup(r.text, 'lxml')

In [40]:
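# Follow every 'Scorecard' link on the season page and print each 'Did not bat' list.
# Separate variable names keep `soup` pointing at the Wikipedia page for the cells below.
for item in soup.find_all('small'):
    if item.text == 'Scorecard':
        scorecard_link = item.a['href']
        scorecard_r = requests.get(scorecard_link)
        scorecard = BeautifulSoup(scorecard_r.text, 'lxml')
        for dnb in scorecard.find_all('div', 'wrap dnb'):
            # Drop the 13-character 'Did not bat: ' prefix and split on commas
            print(dnb.text[13:].split(', '))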

['SL Malinga', 'RR Raje']
['MS Gony', 'R Ashwin']
['A Kumble']
['YA Abdulla', 'VS Malik', 'VRV Singh']
['TM Dilshan', 'AB de Villiers', 'MK Tiwary', 'KD Karthik †', 'DP Nannes', 'DL Vettori', 'VY Mahesh', 'AM Salvi', 'PJ Sangwan']
['SB Styris', 'Y Venugopal Rao', 'Harmeet Singh', 'PP Ojha', 'RP Singh', 'DB Ravi Teja', 'FH Edwards']
['S Badrinath', 'MS Gony', 'L Balaji', 'M Muralitharan', 'Joginder Sharma']
['YA Abdulla', 'VRV Singh', 'VS Malik']
['M Kartik', 'Yashpal Singh', 'LR Shukla', 'A Chopra', 'MC Henriques', 'SC Ganguly', 'AB Dinda', 'I Sharma']
['RP Singh', 'Harmeet Singh', 'PP Ojha']
['DW Steyn']
['DP Nannes', 'A Nehra', 'AM Salvi', 'PJ Sangwan']
['M Rawat †', 'Kamran Khan', 'MM Patel']
['BAW Mendis', 'Anureet Singh']
['IK Pathan', 'WA Mota', 'PP Chawla', 'VRV Singh', 'RR Bose', 'YA Abdulla']
['Harmeet Singh']
['SL Malinga', 'DS Kulkarni']
['A Kumble', 'KP Appanna']
['A Mishra', 'DL Vettori', 'DP Nannes', 'PJ Sangwan', 'A Nehra']
['RR Powar', 'YA Abdulla', 'VRV Singh']
['MM Pa
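The names printed above still carry scorecard markers, such as the dagger (†) that cricket scorecards use for the wicketkeeper (for example `'KD Karthik †'`). Before matching these names against the `batsman` and `bowler` columns of `deliveries.csv`, it may help to strip them. Below is a minimal standard-library sketch; the function name `parse_did_not_bat` is hypothetical, and treating `*` and `(c)` as captain markers is an assumption about the scorecard format.

```python
import re

def parse_did_not_bat(text):
    """Split a 'Did not bat: A, B, C' line into clean player names.

    Assumes '†' marks the wicketkeeper and '*' or '(c)' the captain.
    """
    # Drop everything up to and including the first colon, if there is one.
    _, sep, names = text.partition(':')
    if not sep:
        names = text
    cleaned = []
    for name in names.split(','):
        name = re.sub(r'\(c\)|[†*]', '', name)  # remove role markers
        name = ' '.join(name.split())           # collapse leftover whitespace
        if name:
            cleaned.append(name)
    return cleaned
```

Applied to each `div` returned by `find_all('div', 'wrap dnb')`, `parse_did_not_bat(div.text)` gives a list like the ones printed above, minus the markers.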

In [28]:
# Wikipedia heading text includes the '[edit]' link label
fixtures = next(h2 for h2 in soup.find_all('h2') if h2.text == 'Fixtures[edit]')