# IPL Win % Calculator Project

<b>The Aim: </b>We will use Logistic Regression to build a model that predicts each team's win percentage at any given ball in the second innings of a match.<br>
<b>Datasets: </b>We download <i>matches.csv</i>, containing information about every IPL match from 2008 to 2024, and <i>deliveries.csv</i>, containing information about every ball bowled in those matches, from <a href="https://www.kaggle.com/datasets/patrickb1912/ipl-complete-dataset-20082020/data"><u>Kaggle</u></a>.

### 1. Importing libraries and datasets

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
matches = pd.read_csv('./data/matches.csv')
deliveries = pd.read_csv('./data/deliveries.csv')

### 2. Exploring the data

In [3]:
matches.head()

Unnamed: 0,id,season,city,date,match_type,player_of_match,venue,team1,team2,toss_winner,toss_decision,winner,result,result_margin,target_runs,target_overs,super_over,method,umpire1,umpire2
0,335982,2007/08,Bangalore,2008-04-18,League,BB McCullum,M Chinnaswamy Stadium,Royal Challengers Bangalore,Kolkata Knight Riders,Royal Challengers Bangalore,field,Kolkata Knight Riders,runs,140.0,223.0,20.0,N,,Asad Rauf,RE Koertzen
1,335983,2007/08,Chandigarh,2008-04-19,League,MEK Hussey,"Punjab Cricket Association Stadium, Mohali",Kings XI Punjab,Chennai Super Kings,Chennai Super Kings,bat,Chennai Super Kings,runs,33.0,241.0,20.0,N,,MR Benson,SL Shastri
2,335984,2007/08,Delhi,2008-04-19,League,MF Maharoof,Feroz Shah Kotla,Delhi Daredevils,Rajasthan Royals,Rajasthan Royals,bat,Delhi Daredevils,wickets,9.0,130.0,20.0,N,,Aleem Dar,GA Pratapkumar
3,335985,2007/08,Mumbai,2008-04-20,League,MV Boucher,Wankhede Stadium,Mumbai Indians,Royal Challengers Bangalore,Mumbai Indians,bat,Royal Challengers Bangalore,wickets,5.0,166.0,20.0,N,,SJ Davis,DJ Harper
4,335986,2007/08,Kolkata,2008-04-20,League,DJ Hussey,Eden Gardens,Kolkata Knight Riders,Deccan Chargers,Deccan Chargers,bat,Kolkata Knight Riders,wickets,5.0,111.0,20.0,N,,BF Bowden,K Hariharan


In [4]:
deliveries.head()

Unnamed: 0,match_id,inning,batting_team,bowling_team,over,ball,batter,bowler,non_striker,batsman_runs,extra_runs,total_runs,extras_type,is_wicket,player_dismissed,dismissal_kind,fielder
0,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,1,SC Ganguly,P Kumar,BB McCullum,0,1,1,legbyes,0,,,
1,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,2,BB McCullum,P Kumar,SC Ganguly,0,0,0,,0,,,
2,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,3,BB McCullum,P Kumar,SC Ganguly,0,1,1,wides,0,,,
3,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,4,BB McCullum,P Kumar,SC Ganguly,0,0,0,,0,,,
4,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,5,BB McCullum,P Kumar,SC Ganguly,0,0,0,,0,,,


In [5]:
deliveries.shape

(260920, 17)

In [6]:
matches.shape

(1095, 20)

Several IPL teams have changed their names over the years. We need to treat each renamed franchise as a single team so that its records stay together.

In [7]:
matches["team1"].unique()

array(['Royal Challengers Bangalore', 'Kings XI Punjab',
       'Delhi Daredevils', 'Mumbai Indians', 'Kolkata Knight Riders',
       'Rajasthan Royals', 'Deccan Chargers', 'Chennai Super Kings',
       'Kochi Tuskers Kerala', 'Pune Warriors', 'Sunrisers Hyderabad',
       'Gujarat Lions', 'Rising Pune Supergiants',
       'Rising Pune Supergiant', 'Delhi Capitals', 'Punjab Kings',
       'Lucknow Super Giants', 'Gujarat Titans',
       'Royal Challengers Bengaluru'], dtype=object)

In [8]:
matches['team1'] = matches['team1'].str.replace('Delhi Daredevils', 'Delhi Capitals')
matches['team2'] = matches['team2'].str.replace('Delhi Daredevils', 'Delhi Capitals')

matches['team1'] = matches['team1'].str.replace('Deccan Chargers', 'Sunrisers Hyderabad')
matches['team2'] = matches['team2'].str.replace('Deccan Chargers', 'Sunrisers Hyderabad')

matches['team1'] = matches['team1'].str.replace('Kings XI Punjab', 'Punjab Kings')
matches['team2'] = matches['team2'].str.replace('Kings XI Punjab', 'Punjab Kings')

matches['team1'] = matches['team1'].str.replace('Royal Challengers Bengaluru', 'Royal Challengers Bangalore')
matches['team2'] = matches['team2'].str.replace('Royal Challengers Bengaluru', 'Royal Challengers Bangalore')
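The renames above touch only `team1` and `team2` (and, later in the notebook, `batting_team` and `bowling_team`). The `winner` column is not renamed, so it may be worth checking whether the later comparison of `batting_team` with `winner` still matches for renamed franchises. A minimal sketch of a mapping-based alternative follows; the `RENAMES` dict, the `normalise_teams` helper, and the toy rows are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Old franchise name -> current name (same pairs as the replace calls above).
RENAMES = {
    'Delhi Daredevils': 'Delhi Capitals',
    'Deccan Chargers': 'Sunrisers Hyderabad',
    'Kings XI Punjab': 'Punjab Kings',
    'Royal Challengers Bengaluru': 'Royal Challengers Bangalore',
}

def normalise_teams(df, columns):
    """Return a copy of df with old team names replaced in the given columns."""
    out = df.copy()
    for col in columns:
        # Series.replace with a dict swaps whole values and leaves others as-is.
        out[col] = out[col].replace(RENAMES)
    return out

# Toy frame standing in for `matches` (hypothetical rows).
toy = pd.DataFrame({
    'team1': ['Delhi Daredevils', 'Mumbai Indians'],
    'team2': ['Kings XI Punjab', 'Deccan Chargers'],
    'winner': ['Delhi Daredevils', 'Deccan Chargers'],
})
clean = normalise_teams(toy, ['team1', 'team2', 'winner'])
print(clean)
```

Applying the same helper to every team-name column keeps the mapping in one place, so a new rename only has to be added once.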

We also restrict the data to the eight franchises listed below. Several of the other teams no longer play, so there is no need to train the model on them since we would not use it on them anyway.

In [9]:
teams = ['Royal Challengers Bangalore', 'Mumbai Indians', 'Kolkata Knight Riders',
       'Rajasthan Royals', 'Chennai Super Kings', 'Sunrisers Hyderabad',
        'Delhi Capitals', 'Punjab Kings']

In [10]:
matches = matches[matches['team1'].isin(teams)]
matches = matches[matches['team2'].isin(teams)]
matches.shape

(896, 20)

Now we remove the matches decided by the D/L method. The target is revised in those matches, so their ball-by-ball state does not relate to the final result in the usual way.

In [11]:
matches["method"].unique()

array([nan, 'D/L'], dtype=object)

In [12]:
matches = matches[matches["method"] != 'D/L']
matches.shape

(880, 20)

Here we extract the columns we need from the matches dataset, i.e., the <b>match id</b>, the <b>city</b> where the match was held, the <b>target runs</b>, and the <b>winner</b>

In [13]:
match_df = matches[['id', 'winner', 'target_runs', 'city']]

We join match_df and the deliveries dataframe on the 'match_id' column to make delivery_df.<br>
For that, we first rename 'id' to 'match_id' so the key has the same name in both

In [14]:
match_df = match_df.rename(columns={'id': 'match_id'})

In [15]:
delivery_df = match_df.merge(deliveries, on='match_id')
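The merge above uses pandas' default inner join, so deliveries from matches that were filtered out of `match_df` are dropped. A small sketch on toy frames (the frames and values are made up for illustration) shows that behaviour, with the optional `validate` argument added as a guard that each `match_id` appears only once on the left side:

```python
import pandas as pd

# Toy stand-ins for match_df and deliveries (hypothetical values).
match_toy = pd.DataFrame({'match_id': [1, 2], 'winner': ['A', 'B']})
deliv_toy = pd.DataFrame({'match_id': [1, 1, 2, 3],
                          'total_runs': [1, 4, 0, 6]})

# Inner join on match_id; validate raises MergeError if a match_id
# is duplicated in the left frame.
merged = match_toy.merge(deliv_toy, on='match_id', validate='one_to_many')
print(merged)
```

The delivery belonging to match 3 does not appear in `merged`, because match 3 has no row in `match_toy`.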

In [16]:
delivery_df.head()

Unnamed: 0,match_id,winner,target_runs,city,inning,batting_team,bowling_team,over,ball,batter,bowler,non_striker,batsman_runs,extra_runs,total_runs,extras_type,is_wicket,player_dismissed,dismissal_kind,fielder
0,335982,Kolkata Knight Riders,223.0,Bangalore,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,1,SC Ganguly,P Kumar,BB McCullum,0,1,1,legbyes,0,,,
1,335982,Kolkata Knight Riders,223.0,Bangalore,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,2,BB McCullum,P Kumar,SC Ganguly,0,0,0,,0,,,
2,335982,Kolkata Knight Riders,223.0,Bangalore,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,3,BB McCullum,P Kumar,SC Ganguly,0,1,1,wides,0,,,
3,335982,Kolkata Knight Riders,223.0,Bangalore,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,4,BB McCullum,P Kumar,SC Ganguly,0,0,0,,0,,,
4,335982,Kolkata Knight Riders,223.0,Bangalore,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,5,BB McCullum,P Kumar,SC Ganguly,0,0,0,,0,,,


UPDATE: The team columns in delivery_df come from the deliveries data and still carry the old names, so we have to apply the same renames to delivery_df

In [17]:
delivery_df['batting_team'] = delivery_df['batting_team'].str.replace('Delhi Daredevils', 'Delhi Capitals')
delivery_df['bowling_team'] = delivery_df['bowling_team'].str.replace('Delhi Daredevils', 'Delhi Capitals')

delivery_df['batting_team'] = delivery_df['batting_team'].str.replace('Deccan Chargers', 'Sunrisers Hyderabad')
delivery_df['bowling_team'] = delivery_df['bowling_team'].str.replace('Deccan Chargers', 'Sunrisers Hyderabad')

delivery_df['batting_team'] = delivery_df['batting_team'].str.replace('Kings XI Punjab', 'Punjab Kings')
delivery_df['bowling_team'] = delivery_df['bowling_team'].str.replace('Kings XI Punjab', 'Punjab Kings')

delivery_df['batting_team'] = delivery_df['batting_team'].str.replace('Royal Challengers Bengaluru', 'Royal Challengers Bangalore')
delivery_df['bowling_team'] = delivery_df['bowling_team'].str.replace('Royal Challengers Bengaluru', 'Royal Challengers Bangalore')

Since we're only going to predict using balls in the second innings, we can eliminate all the balls of the first innings from the data

In [18]:
delivery_df = delivery_df[delivery_df["inning"] == 2]
delivery_df.tail()

Unnamed: 0,match_id,winner,target_runs,city,inning,batting_team,bowling_team,over,ball,batter,bowler,non_striker,batsman_runs,extra_runs,total_runs,extras_type,is_wicket,player_dismissed,dismissal_kind,fielder
210963,1426312,Kolkata Knight Riders,114.0,Chennai,2,Kolkata Knight Riders,Sunrisers Hyderabad,9,5,SS Iyer,AK Markram,VR Iyer,1,0,1,,0,,,
210964,1426312,Kolkata Knight Riders,114.0,Chennai,2,Kolkata Knight Riders,Sunrisers Hyderabad,9,6,VR Iyer,AK Markram,SS Iyer,1,0,1,,0,,,
210965,1426312,Kolkata Knight Riders,114.0,Chennai,2,Kolkata Knight Riders,Sunrisers Hyderabad,10,1,VR Iyer,Shahbaz Ahmed,SS Iyer,1,0,1,,0,,,
210966,1426312,Kolkata Knight Riders,114.0,Chennai,2,Kolkata Knight Riders,Sunrisers Hyderabad,10,2,SS Iyer,Shahbaz Ahmed,VR Iyer,1,0,1,,0,,,
210967,1426312,Kolkata Knight Riders,114.0,Chennai,2,Kolkata Knight Riders,Sunrisers Hyderabad,10,3,VR Iyer,Shahbaz Ahmed,SS Iyer,1,0,1,,0,,,


In [19]:
delivery_df.shape

(102034, 20)

Here we use <i>cumsum</i> within each match to calculate the <b>runs scored</b> so far at every ball

In [20]:
delivery_df['current_score'] = delivery_df.groupby('match_id')['total_runs'].transform('cumsum')
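As a quick illustration of the grouped running total, here is a sketch on a toy frame with made-up runs. Because only second-innings rows remain at this point, grouping by `match_id` gives one running total per chase.

```python
import pandas as pd

# Two toy matches with a few deliveries each (hypothetical values).
toy = pd.DataFrame({
    'match_id':   [1, 1, 1, 2, 2],
    'total_runs': [1, 0, 4, 6, 1],
})

# The running total restarts at the first row of every match_id.
toy['current_score'] = toy.groupby('match_id')['total_runs'].transform('cumsum')
print(toy)
```

The total resets from 5 to 6 at the first ball of match 2 rather than continuing from match 1.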

In [21]:
delivery_df.head()

Unnamed: 0,match_id,winner,target_runs,city,inning,batting_team,bowling_team,over,ball,batter,...,non_striker,batsman_runs,extra_runs,total_runs,extras_type,is_wicket,player_dismissed,dismissal_kind,fielder,current_score
124,335982,Kolkata Knight Riders,223.0,Bangalore,2,Royal Challengers Bangalore,Kolkata Knight Riders,0,1,R Dravid,...,W Jaffer,1,0,1,,0,,,,1
125,335982,Kolkata Knight Riders,223.0,Bangalore,2,Royal Challengers Bangalore,Kolkata Knight Riders,0,2,W Jaffer,...,R Dravid,0,1,1,wides,0,,,,2
126,335982,Kolkata Knight Riders,223.0,Bangalore,2,Royal Challengers Bangalore,Kolkata Knight Riders,0,3,W Jaffer,...,R Dravid,0,0,0,,0,,,,2
127,335982,Kolkata Knight Riders,223.0,Bangalore,2,Royal Challengers Bangalore,Kolkata Knight Riders,0,4,W Jaffer,...,R Dravid,1,0,1,,0,,,,3
128,335982,Kolkata Knight Riders,223.0,Bangalore,2,Royal Challengers Bangalore,Kolkata Knight Riders,0,5,R Dravid,...,W Jaffer,1,0,1,,0,,,,4


And then calculate <b>runs_left</b> by subtracting current_score from target_runs

In [22]:
delivery_df['runs_left'] = delivery_df['target_runs'] - delivery_df['current_score']
delivery_df.head()

Unnamed: 0,match_id,winner,target_runs,city,inning,batting_team,bowling_team,over,ball,batter,...,batsman_runs,extra_runs,total_runs,extras_type,is_wicket,player_dismissed,dismissal_kind,fielder,current_score,runs_left
124,335982,Kolkata Knight Riders,223.0,Bangalore,2,Royal Challengers Bangalore,Kolkata Knight Riders,0,1,R Dravid,...,1,0,1,,0,,,,1,222.0
125,335982,Kolkata Knight Riders,223.0,Bangalore,2,Royal Challengers Bangalore,Kolkata Knight Riders,0,2,W Jaffer,...,0,1,1,wides,0,,,,2,221.0
126,335982,Kolkata Knight Riders,223.0,Bangalore,2,Royal Challengers Bangalore,Kolkata Knight Riders,0,3,W Jaffer,...,0,0,0,,0,,,,2,221.0
127,335982,Kolkata Knight Riders,223.0,Bangalore,2,Royal Challengers Bangalore,Kolkata Knight Riders,0,4,W Jaffer,...,1,0,1,,0,,,,3,220.0
128,335982,Kolkata Knight Riders,223.0,Bangalore,2,Royal Challengers Bangalore,Kolkata Knight Riders,0,5,R Dravid,...,1,0,1,,0,,,,4,219.0


In [23]:
delivery_df['balls_left'] = 120 - (delivery_df['over']*6 + delivery_df['ball'])
delivery_df.head()

Unnamed: 0,match_id,winner,target_runs,city,inning,batting_team,bowling_team,over,ball,batter,...,extra_runs,total_runs,extras_type,is_wicket,player_dismissed,dismissal_kind,fielder,current_score,runs_left,balls_left
124,335982,Kolkata Knight Riders,223.0,Bangalore,2,Royal Challengers Bangalore,Kolkata Knight Riders,0,1,R Dravid,...,0,1,,0,,,,1,222.0,119
125,335982,Kolkata Knight Riders,223.0,Bangalore,2,Royal Challengers Bangalore,Kolkata Knight Riders,0,2,W Jaffer,...,1,1,wides,0,,,,2,221.0,118
126,335982,Kolkata Knight Riders,223.0,Bangalore,2,Royal Challengers Bangalore,Kolkata Knight Riders,0,3,W Jaffer,...,0,0,,0,,,,2,221.0,117
127,335982,Kolkata Knight Riders,223.0,Bangalore,2,Royal Challengers Bangalore,Kolkata Knight Riders,0,4,W Jaffer,...,0,1,,0,,,,3,220.0,116
128,335982,Kolkata Knight Riders,223.0,Bangalore,2,Royal Challengers Bangalore,Kolkata Knight Riders,0,5,R Dravid,...,0,1,,0,,,,4,219.0,115


We use cumsum on the 'is_wicket' column to count the wickets fallen so far, then subtract that from 10 to get the number of <b>wickets</b> left after every ball

In [25]:
wickets = delivery_df.groupby('match_id')['is_wicket'].transform('cumsum')
delivery_df['wickets_left'] = 10 - wickets
delivery_df.head()

Unnamed: 0,match_id,winner,target_runs,city,inning,batting_team,bowling_team,over,ball,batter,...,total_runs,extras_type,is_wicket,player_dismissed,dismissal_kind,fielder,current_score,runs_left,balls_left,wickets_left
124,335982,Kolkata Knight Riders,223.0,Bangalore,2,Royal Challengers Bangalore,Kolkata Knight Riders,0,1,R Dravid,...,1,,0,,,,1,222.0,119,10
125,335982,Kolkata Knight Riders,223.0,Bangalore,2,Royal Challengers Bangalore,Kolkata Knight Riders,0,2,W Jaffer,...,1,wides,0,,,,2,221.0,118,10
126,335982,Kolkata Knight Riders,223.0,Bangalore,2,Royal Challengers Bangalore,Kolkata Knight Riders,0,3,W Jaffer,...,0,,0,,,,2,221.0,117,10
127,335982,Kolkata Knight Riders,223.0,Bangalore,2,Royal Challengers Bangalore,Kolkata Knight Riders,0,4,W Jaffer,...,1,,0,,,,3,220.0,116,10
128,335982,Kolkata Knight Riders,223.0,Bangalore,2,Royal Challengers Bangalore,Kolkata Knight Riders,0,5,R Dravid,...,1,,0,,,,4,219.0,115,10


Now to calculate the <b>Current Run Rate</b> (crr), we divide the current score by the number of overs bowled so far

In [26]:
delivery_df['crr'] = delivery_df["current_score"] / (delivery_df['over'] + delivery_df['ball']/6)
delivery_df.head()

Unnamed: 0,match_id,winner,target_runs,city,inning,batting_team,bowling_team,over,ball,batter,...,extras_type,is_wicket,player_dismissed,dismissal_kind,fielder,current_score,runs_left,balls_left,wickets_left,crr
124,335982,Kolkata Knight Riders,223.0,Bangalore,2,Royal Challengers Bangalore,Kolkata Knight Riders,0,1,R Dravid,...,,0,,,,1,222.0,119,10,6.0
125,335982,Kolkata Knight Riders,223.0,Bangalore,2,Royal Challengers Bangalore,Kolkata Knight Riders,0,2,W Jaffer,...,wides,0,,,,2,221.0,118,10,6.0
126,335982,Kolkata Knight Riders,223.0,Bangalore,2,Royal Challengers Bangalore,Kolkata Knight Riders,0,3,W Jaffer,...,,0,,,,2,221.0,117,10,4.0
127,335982,Kolkata Knight Riders,223.0,Bangalore,2,Royal Challengers Bangalore,Kolkata Knight Riders,0,4,W Jaffer,...,,0,,,,3,220.0,116,10,4.5
128,335982,Kolkata Knight Riders,223.0,Bangalore,2,Royal Challengers Bangalore,Kolkata Knight Riders,0,5,R Dravid,...,,0,,,,4,219.0,115,10,4.8


For <b>Required Run Rate</b> (rrr) we will divide runs_left by balls_left and multiply by 6

In [27]:
delivery_df['rrr'] = (delivery_df['runs_left']/delivery_df['balls_left']) * 6
delivery_df.head()

Unnamed: 0,match_id,winner,target_runs,city,inning,batting_team,bowling_team,over,ball,batter,...,is_wicket,player_dismissed,dismissal_kind,fielder,current_score,runs_left,balls_left,wickets_left,crr,rrr
124,335982,Kolkata Knight Riders,223.0,Bangalore,2,Royal Challengers Bangalore,Kolkata Knight Riders,0,1,R Dravid,...,0,,,,1,222.0,119,10,6.0,11.193277
125,335982,Kolkata Knight Riders,223.0,Bangalore,2,Royal Challengers Bangalore,Kolkata Knight Riders,0,2,W Jaffer,...,0,,,,2,221.0,118,10,6.0,11.237288
126,335982,Kolkata Knight Riders,223.0,Bangalore,2,Royal Challengers Bangalore,Kolkata Knight Riders,0,3,W Jaffer,...,0,,,,2,221.0,117,10,4.0,11.333333
127,335982,Kolkata Knight Riders,223.0,Bangalore,2,Royal Challengers Bangalore,Kolkata Knight Riders,0,4,W Jaffer,...,0,,,,3,220.0,116,10,4.5,11.37931
128,335982,Kolkata Knight Riders,223.0,Bangalore,2,Royal Challengers Bangalore,Kolkata Knight Riders,0,5,R Dravid,...,0,,,,4,219.0,115,10,4.8,11.426087
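The derived columns can be checked by hand for a single ball. The sketch below recomputes them for the row with index 128 shown above (target 223, score 4, over 0, ball 5, no wickets down). The `chase_features` helper is a hypothetical name introduced only for this check, and it assumes at least one ball has been bowled and at least one ball remains, so neither division hits zero.

```python
def chase_features(target_runs, current_score, over, ball, wickets_fallen):
    """Recompute the per-ball features for one match state.

    `over` is zero-indexed, as in deliveries.csv. Assumes
    0 < over * 6 + ball < 120 so both divisions are defined.
    """
    balls_bowled = over * 6 + ball
    runs_left = target_runs - current_score
    balls_left = 120 - balls_bowled
    return {
        'runs_left': runs_left,
        'balls_left': balls_left,
        'wickets_left': 10 - wickets_fallen,
        # Runs per over so far (same as score / (over + ball / 6)).
        'crr': current_score * 6 / balls_bowled,
        # Runs per over still needed.
        'rrr': runs_left * 6 / balls_left,
    }

state = chase_features(target_runs=223, current_score=4,
                       over=0, ball=5, wickets_fallen=0)
print(state)
```

The values agree with the table: 219 runs left, 115 balls left, a current run rate of 4.8 and a required run rate of about 11.43.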


Now we need to make the <b>target variable</b>, 'result', which says whether the current batting team ultimately won the match. <br>
We will apply a function which will return 1 if the batting team is the winner, else 0

In [28]:
def result(row):
    return 1 if row["batting_team"] == row["winner"] else 0

In [29]:
delivery_df['result'] = delivery_df.apply(result, axis=1)
delivery_df.head()

Unnamed: 0,match_id,winner,target_runs,city,inning,batting_team,bowling_team,over,ball,batter,...,player_dismissed,dismissal_kind,fielder,current_score,runs_left,balls_left,wickets_left,crr,rrr,result
124,335982,Kolkata Knight Riders,223.0,Bangalore,2,Royal Challengers Bangalore,Kolkata Knight Riders,0,1,R Dravid,...,,,,1,222.0,119,10,6.0,11.193277,0
125,335982,Kolkata Knight Riders,223.0,Bangalore,2,Royal Challengers Bangalore,Kolkata Knight Riders,0,2,W Jaffer,...,,,,2,221.0,118,10,6.0,11.237288,0
126,335982,Kolkata Knight Riders,223.0,Bangalore,2,Royal Challengers Bangalore,Kolkata Knight Riders,0,3,W Jaffer,...,,,,2,221.0,117,10,4.0,11.333333,0
127,335982,Kolkata Knight Riders,223.0,Bangalore,2,Royal Challengers Bangalore,Kolkata Knight Riders,0,4,W Jaffer,...,,,,3,220.0,116,10,4.5,11.37931,0
128,335982,Kolkata Knight Riders,223.0,Bangalore,2,Royal Challengers Bangalore,Kolkata Knight Riders,0,5,R Dravid,...,,,,4,219.0,115,10,4.8,11.426087,0
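The row-wise `apply` works, but the same label can be produced with a single column comparison, which is usually faster on a frame of this size. A sketch on toy data (the team labels are placeholders):

```python
import pandas as pd

# Toy rows: the batting team wins only in the first one (hypothetical values).
toy = pd.DataFrame({
    'batting_team': ['A', 'B', 'A'],
    'winner':       ['A', 'A', 'B'],
})

# Element-wise comparison gives booleans; astype(int) turns them into 1/0.
toy['result'] = (toy['batting_team'] == toy['winner']).astype(int)
print(toy)
```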


In [30]:
final_df = delivery_df[['batting_team', 'bowling_team', 'city', 'runs_left', 'balls_left', 'wickets_left', 'crr', 'rrr', 'result']]
final_df

Unnamed: 0,batting_team,bowling_team,city,runs_left,balls_left,wickets_left,crr,rrr,result
124,Royal Challengers Bangalore,Kolkata Knight Riders,Bangalore,222.0,119,10,6.000000,11.193277,0
125,Royal Challengers Bangalore,Kolkata Knight Riders,Bangalore,221.0,118,10,6.000000,11.237288,0
126,Royal Challengers Bangalore,Kolkata Knight Riders,Bangalore,221.0,117,10,4.000000,11.333333,0
127,Royal Challengers Bangalore,Kolkata Knight Riders,Bangalore,220.0,116,10,4.500000,11.379310,0
128,Royal Challengers Bangalore,Kolkata Knight Riders,Bangalore,219.0,115,10,4.800000,11.426087,0
...,...,...,...,...,...,...,...,...,...
210963,Kolkata Knight Riders,Sunrisers Hyderabad,Chennai,4.0,61,8,11.186441,0.393443,1
210964,Kolkata Knight Riders,Sunrisers Hyderabad,Chennai,3.0,60,8,11.100000,0.300000,1
210965,Kolkata Knight Riders,Sunrisers Hyderabad,Chennai,2.0,59,8,11.016393,0.203390,1
210966,Kolkata Knight Riders,Sunrisers Hyderabad,Chennai,1.0,58,8,10.935484,0.103448,1


In [31]:
# Shuffle the rows: sampling every row without replacement reorders the frame
final_df = final_df.sample(final_df.shape[0])

In [32]:
final_df.isnull().sum()

batting_team       0
bowling_team       0
city            6012
runs_left          0
balls_left         0
wickets_left       0
crr                0
rrr               13
result             0
dtype: int64

In [33]:
# Chained indexing (final_df[mask]['rrr'] = 0) would assign to a temporary copy, so use .loc
final_df.loc[final_df['balls_left'] == 0, 'rrr'] = 0

final_df

Unnamed: 0,batting_team,bowling_team,city,runs_left,balls_left,wickets_left,crr,rrr,result
27175,Kolkata Knight Riders,Royal Challengers Bangalore,Kolkata,117.0,102,10,6.333333,6.882353,1
38332,Delhi Capitals,Chennai Super Kings,Chennai,63.0,66,6,5.555556,5.727273,0
119357,Delhi Capitals,Sunrisers Hyderabad,Delhi,29.0,20,6,9.420000,8.700000,0
123051,Royal Challengers Bangalore,Punjab Kings,Bengaluru,155.0,118,9,3.000000,7.881356,1
57434,Chennai Super Kings,Punjab Kings,Chennai,101.0,79,9,8.195122,7.670886,0
...,...,...,...,...,...,...,...,...,...
71244,Mumbai Indians,Kolkata Knight Riders,Kolkata,44.0,30,7,7.733333,8.800000,1
139629,Punjab Kings,Chennai Super Kings,Chennai,53.0,27,8,6.967742,11.777778,0
131914,Chennai Super Kings,Sunrisers Hyderabad,Pune,23.0,23,8,9.711340,6.000000,1
205134,Delhi Capitals,Sunrisers Hyderabad,Delhi,181.0,85,8,14.742857,12.776471,0
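The reason for overwriting `rrr` where `balls_left` is 0 can be seen on a toy frame. In pandas, dividing a float column by zero does not raise: 0/0 gives `NaN` and a non-zero value over zero gives `inf`. Only the `NaN` cases are counted by `isnull()`, which presumably explains why just 13 nulls were reported for `rrr` above. The values below are made up for illustration.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'runs_left':  [0.0, 5.0, 12.0],
    'balls_left': [0, 0, 6],
})

# 0/0 -> NaN, 5/0 -> inf, 12/6*6 -> 12.0
toy['rrr'] = toy['runs_left'] / toy['balls_left'] * 6
raw = toy['rrr'].copy()

# Same fix as in the notebook: one .loc call with a row mask and a column label.
toy.loc[toy['balls_left'] == 0, 'rrr'] = 0
print(raw.tolist(), toy['rrr'].tolist())
```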


In [34]:
final_df.shape

(102034, 9)

In [35]:
final_df.dropna(inplace=True)

In [36]:
final_df.shape

(96022, 9)

In [37]:
from sklearn.model_selection import train_test_split

# Features are every column except the last; the target is the last column ('result')
X = final_df.iloc[:, :-1]
y = final_df.iloc[:, -1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

In [38]:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

trf = ColumnTransformer([
    ('trf', OneHotEncoder(sparse_output=False, drop='first'), ['batting_team', 'bowling_team', 'city'])
], remainder='passthrough', force_int_remainder_cols=False)


In [39]:
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

In [40]:
pipe = Pipeline(steps=[
    ('step1', trf),
    ('step2', LogisticRegression(solver='liblinear'))
])

In [41]:
pipe.fit(X_train, y_train)

In [42]:
y_pred = pipe.predict(X_test)

In [43]:
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)

0.8159333506899245

In [44]:
pipe.predict_proba(X_test)

array([[0.07390382, 0.92609618],
       [0.55945457, 0.44054543],
       [0.99583236, 0.00416764],
       ...,
       [0.75591653, 0.24408347],
       [0.75644457, 0.24355543],
       [0.17622276, 0.82377724]])
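Each row of the `predict_proba` output sums to 1, and the columns follow the order of the fitted model's `classes_` attribute. With labels 0 and 1, the second column is therefore the probability that the batting team wins. A self-contained sketch on synthetic data (the data and the `win_col` name are illustrative assumptions) shows how to pick the column by label rather than by position:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic one-feature problem: label is 1 when the feature is positive.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 1))
y_toy = (X_toy[:, 0] > 0).astype(int)

clf = LogisticRegression(solver='liblinear').fit(X_toy, y_toy)

proba = clf.predict_proba(np.array([[2.0], [-2.0]]))
# Locate the column that belongs to class 1 instead of hard-coding index 1.
win_col = list(clf.classes_).index(1)
print(clf.classes_, proba[:, win_col])
```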

In [45]:
import pickle

# Use a context manager so the file is closed once the pipeline has been written
with open('pipe.pkl', 'wb') as f:
    pickle.dump(pipe, f)

In [46]:
final_df[final_df['balls_left'] == 0]

Unnamed: 0,batting_team,bowling_team,city,runs_left,balls_left,wickets_left,crr,rrr,result
164366,Royal Challengers Bangalore,Mumbai Indians,Chennai,0.0,0,2,8.00,0.0,1
198141,Delhi Capitals,Punjab Kings,Delhi,32.0,0,2,6.80,0.0,0
10063,Mumbai Indians,Punjab Kings,Mumbai,2.0,0,1,9.40,0.0,0
11789,Rajasthan Royals,Mumbai Indians,Jaipur,2.0,0,5,7.20,0.0,1
54369,Kolkata Knight Riders,Punjab Kings,Kolkata,3.0,0,3,6.60,0.0,0
...,...,...,...,...,...,...,...,...,...
7291,Punjab Kings,Chennai Super Kings,Chennai,19.0,0,1,8.15,0.0,0
192776,Sunrisers Hyderabad,Mumbai Indians,Hyderabad,15.0,0,0,8.90,0.0,0
204721,Rajasthan Royals,Kolkata Knight Riders,Kolkata,0.0,0,2,11.20,0.0,1
59197,Sunrisers Hyderabad,Chennai Super Kings,Chennai,11.0,0,5,7.50,0.0,0


The remaining cells are debugging checks and can be ignored.

In [48]:
X_train.isnull().sum()

batting_team    0
bowling_team    0
city            0
runs_left       0
balls_left      0
wickets_left    0
crr             0
rrr             0
dtype: int64

In [49]:
X.isnull().sum()

batting_team    0
bowling_team    0
city            0
runs_left       0
balls_left      0
wickets_left    0
crr             0
rrr             0
dtype: int64

In [50]:
final_df.isnull().sum()

batting_team    0
bowling_team    0
city            0
runs_left       0
balls_left      0
wickets_left    0
crr             0
rrr             0
result          0
dtype: int64

In [51]:
final_df['city'].unique()

array(['Kolkata', 'Chennai', 'Delhi', 'Bengaluru', 'Mumbai',
       'Visakhapatnam', 'Bangalore', 'Cape Town', 'Hyderabad', 'Ranchi',
       'Chandigarh', 'Jaipur', 'Pune', 'Centurion', 'Raipur',
       'Johannesburg', 'Port Elizabeth', 'Sharjah', 'Abu Dhabi',
       'Navi Mumbai', 'Ahmedabad', 'Indore', 'Dubai', 'East London',
       'Mohali', 'Dharamsala', 'Cuttack', 'Guwahati', 'Durban',
       'Kimberley', 'Nagpur', 'Bloemfontein'], dtype=object)

In [52]:
X['city'].unique()

array(['Kolkata', 'Chennai', 'Delhi', 'Bengaluru', 'Mumbai',
       'Visakhapatnam', 'Bangalore', 'Cape Town', 'Hyderabad', 'Ranchi',
       'Chandigarh', 'Jaipur', 'Pune', 'Centurion', 'Raipur',
       'Johannesburg', 'Port Elizabeth', 'Sharjah', 'Abu Dhabi',
       'Navi Mumbai', 'Ahmedabad', 'Indore', 'Dubai', 'East London',
       'Mohali', 'Dharamsala', 'Cuttack', 'Guwahati', 'Durban',
       'Kimberley', 'Nagpur', 'Bloemfontein'], dtype=object)

In [53]:
X[X['city'].isnull()]

Unnamed: 0,batting_team,bowling_team,city,runs_left,balls_left,wickets_left,crr,rrr


In [54]:
X[X['balls_left'] == 0]

Unnamed: 0,batting_team,bowling_team,city,runs_left,balls_left,wickets_left,crr,rrr
164366,Royal Challengers Bangalore,Mumbai Indians,Chennai,0.0,0,2,8.00,0.0
198141,Delhi Capitals,Punjab Kings,Delhi,32.0,0,2,6.80,0.0
10063,Mumbai Indians,Punjab Kings,Mumbai,2.0,0,1,9.40,0.0
11789,Rajasthan Royals,Mumbai Indians,Jaipur,2.0,0,5,7.20,0.0
54369,Kolkata Knight Riders,Punjab Kings,Kolkata,3.0,0,3,6.60,0.0
...,...,...,...,...,...,...,...,...
7291,Punjab Kings,Chennai Super Kings,Chennai,19.0,0,1,8.15,0.0
192776,Sunrisers Hyderabad,Mumbai Indians,Hyderabad,15.0,0,0,8.90,0.0
204721,Rajasthan Royals,Kolkata Knight Riders,Kolkata,0.0,0,2,11.20,0.0
59197,Sunrisers Hyderabad,Chennai Super Kings,Chennai,11.0,0,5,7.50,0.0


In [55]:
final_df['batting_team'].unique()

array(['Kolkata Knight Riders', 'Delhi Capitals',
       'Royal Challengers Bangalore', 'Chennai Super Kings',
       'Mumbai Indians', 'Rajasthan Royals', 'Sunrisers Hyderabad',
       'Punjab Kings'], dtype=object)

In [56]:
matches['team1'].unique()

array(['Royal Challengers Bangalore', 'Punjab Kings', 'Delhi Capitals',
       'Mumbai Indians', 'Kolkata Knight Riders', 'Rajasthan Royals',
       'Sunrisers Hyderabad', 'Chennai Super Kings'], dtype=object)

In [57]:
delivery_df['batting_team'].unique()

array(['Royal Challengers Bangalore', 'Punjab Kings', 'Delhi Capitals',
       'Kolkata Knight Riders', 'Rajasthan Royals', 'Mumbai Indians',
       'Chennai Super Kings', 'Sunrisers Hyderabad'], dtype=object)