# **IPL SCORE PREDICTION**

**PROJECT AIM:**
PREDICT THE TOTAL SCORE OF AN IPL MATCH USING THE IPL DATASET <br>

**APPROACH:**

1. Handle missing values
2. Drop the unnecessary columns
3. Convert the categorical string columns to numerical columns using dummy encoding
4. Perform feature scaling (if necessary).
5. Build a model on the “total” column, using a RandomForestRegressor
6. Calculate the score
7. Predict on a new set of features 

Dataset used : https://www.kaggle.com/malavikarvikraman/ipl2017

# **Importing Libraries**

In [None]:
import pandas as pd
import numpy as np  

# **Importing dataset**



In [None]:
df = pd.read_csv("/content/drive/My Drive/Colab Notebooks/ipl2017.csv")

**Checking Data Set**


In [None]:
print(df.info())
# print(df.dtypes)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 76014 entries, 0 to 76013
Data columns (total 15 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   mid             76014 non-null  int64  
 1   date            76014 non-null  object 
 2   venue           76014 non-null  object 
 3   bat_team        76014 non-null  object 
 4   bowl_team       76014 non-null  object 
 5   batsman         76014 non-null  object 
 6   bowler          76014 non-null  object 
 7   runs            76014 non-null  int64  
 8   wickets         76014 non-null  int64  
 9   overs           76014 non-null  float64
 10  runs_last_5     76014 non-null  int64  
 11  wickets_last_5  76014 non-null  int64  
 12  striker         76014 non-null  int64  
 13  non-striker     76014 non-null  int64  
 14  total           76014 non-null  int64  
dtypes: float64(1), int64(8), object(6)
memory usage: 8.7+ MB
None
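The approach lists handling missing values as its first step. The `df.info()` output above reports 76014 non-null entries for every column, so nothing has to be filled or dropped in this dataset. For reference, a minimal sketch of the usual check follows; the frame `toy` and all of its values are made up for illustration and are not part of the IPL data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the IPL data; all values are invented.
toy = pd.DataFrame({
    "runs": [1, 2, np.nan, 7],
    "wickets": [0, 0, 1, 1],
    "venue": ["Eden Gardens", None, "Newlands", "Newlands"],
})

# Count the missing values in each column.
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# One simple option: drop every row that contains a missing value.
toy_clean = toy.dropna()
print(toy_clean.shape)
```

Running `df.isnull().sum()` on the real frame would be the equivalent check here.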


In [None]:
print(df.shape)

(76014, 15)


In [None]:
print(df.head())

   mid        date                  venue  ... striker non-striker total
0    1  2008-04-18  M Chinnaswamy Stadium  ...       0           0   222
1    1  2008-04-18  M Chinnaswamy Stadium  ...       0           0   222
2    1  2008-04-18  M Chinnaswamy Stadium  ...       0           0   222
3    1  2008-04-18  M Chinnaswamy Stadium  ...       0           0   222
4    1  2008-04-18  M Chinnaswamy Stadium  ...       0           0   222

[5 rows x 15 columns]


# **Dropping unnecessary columns**


In [None]:
df.drop(labels=["mid",'date'],axis=1,inplace=True)
print(df.head())

                   venue               bat_team  ... non-striker total
0  M Chinnaswamy Stadium  Kolkata Knight Riders  ...           0   222
1  M Chinnaswamy Stadium  Kolkata Knight Riders  ...           0   222
2  M Chinnaswamy Stadium  Kolkata Knight Riders  ...           0   222
3  M Chinnaswamy Stadium  Kolkata Knight Riders  ...           0   222
4  M Chinnaswamy Stadium  Kolkata Knight Riders  ...           0   222

[5 rows x 13 columns]


In [None]:
print(df.dtypes)

venue              object
bat_team           object
bowl_team          object
batsman            object
bowler             object
runs                int64
wickets             int64
overs             float64
runs_last_5         int64
wickets_last_5      int64
striker             int64
non-striker         int64
total               int64
dtype: object


# **Step 1 : Initializing Train-Test variables**
X = Features <br>
y = Target 


In [None]:
y = df["total"]
X = df.drop('total',axis=1)

In [None]:
print(y.dtypes)
print(X.dtypes)

int64
venue              object
bat_team           object
bowl_team          object
batsman            object
bowler             object
runs                int64
wickets             int64
overs             float64
runs_last_5         int64
wickets_last_5      int64
striker             int64
non-striker         int64
dtype: object


In [None]:
X = pd.get_dummies(data = X,columns = ['venue','bat_team','bowl_team','batsman','bowler'],drop_first=True)
X.head()
# df.dtypes

Unnamed: 0,runs,wickets,overs,runs_last_5,wickets_last_5,striker,non-striker,venue_Brabourne Stadium,venue_Buffalo Park,venue_De Beers Diamond Oval,venue_Dr DY Patil Sports Academy,venue_Dr. Y.S. Rajasekhara Reddy ACA-VDCA Cricket Stadium,venue_Dubai International Cricket Stadium,venue_Eden Gardens,venue_Feroz Shah Kotla,venue_Green Park,venue_Himachal Pradesh Cricket Association Stadium,venue_Holkar Cricket Stadium,venue_JSCA International Stadium Complex,venue_Kingsmead,venue_M Chinnaswamy Stadium,"venue_MA Chidambaram Stadium, Chepauk",venue_Maharashtra Cricket Association Stadium,venue_Nehru Stadium,venue_New Wanderers Stadium,venue_Newlands,venue_OUTsurance Oval,"venue_Punjab Cricket Association IS Bindra Stadium, Mohali","venue_Punjab Cricket Association Stadium, Mohali","venue_Rajiv Gandhi International Stadium, Uppal","venue_Sardar Patel Stadium, Motera",venue_Saurashtra Cricket Association Stadium,venue_Sawai Mansingh Stadium,venue_Shaheed Veer Narayan Singh International Stadium,venue_Sharjah Cricket Stadium,venue_Sheikh Zayed Stadium,venue_St George's Park,venue_Subrata Roy Sahara Stadium,venue_SuperSport Park,"venue_Vidarbha Cricket Association Stadium, Jamtha",...,bowler_Shivam Sharma,bowler_Shoaib Ahmed,bowler_Shoaib Malik,bowler_Sohail Tanvir,bowler_Sunny Gupta,bowler_Swapnil Singh,bowler_T Henderson,bowler_T Natarajan,bowler_T Shamsi,bowler_T Thushara,bowler_TA Boult,bowler_TG Southee,bowler_TL Suman,bowler_TM Dilshan,bowler_TM Head,bowler_TP Sudhindra,bowler_TS Mills,bowler_UT Yadav,bowler_Umar Gul,bowler_V Kohli,bowler_V Pratap Singh,bowler_V Sehwag,bowler_V Shankar,bowler_VR Aaron,bowler_VRV Singh,bowler_VS Malik,bowler_VS Yeligati,bowler_VY Mahesh,bowler_WA Mota,bowler_WD Parnell,bowler_WPUJC Vaas,bowler_Washington Sundar,bowler_Y Gnaneswara Rao,bowler_Y Nagar,bowler_Y Venugopal Rao,bowler_YA Abdulla,bowler_YK Pathan,bowler_YS Chahal,bowler_Yuvraj Singh,bowler_Z Khan
0,1,0,0.1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,1,0,0.2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,2,0,0.2,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,2,0,0.3,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,2,0,0.4,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
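To make the effect of `drop_first=True` easier to see than in the 805-column frame above, here is a small sketch on a hypothetical frame. The team rows and run values are invented, and `dtype=int` is passed only so the dummies print as 0/1 regardless of the pandas version.

```python
import pandas as pd

# Hypothetical toy frame; the rows are invented for illustration.
toy = pd.DataFrame({
    "bat_team": ["Mumbai Indians", "Chennai Super Kings", "Delhi Daredevils", "Mumbai Indians"],
    "runs": [10, 25, 40, 55],
})

# One dummy column per category, minus the first category in sorted order.
encoded = pd.get_dummies(data=toy, columns=["bat_team"], drop_first=True, dtype=int)
print(encoded)
```

The first category in sorted order ("Chennai Super Kings") gets no column of its own; a row belonging to it is represented by zeros in all remaining `bat_team_*` columns.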


* **Features should be arrays or DataFrames**

In [None]:
print(type(X))

<class 'pandas.core.frame.DataFrame'>


* **Features should be in the form of rows and columns**

In [None]:
print(X.shape)

(76014, 805)


# **Step 2 : Splitting into Training and Testing sets**

In [None]:
from sklearn.model_selection import train_test_split

X_train,X_test,y_train,y_test = train_test_split(X,y,test_size = 0.3,random_state = 38)
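As a quick illustration of what `test_size = 0.3` and `random_state = 38` do, the sketch below splits a placeholder array with the same number of rows as the dataset. The array `rows` is only a stand-in for the real feature matrix.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder row indices standing in for the 76014 rows of the real data.
rows = np.arange(76014)

train_rows, test_rows = train_test_split(rows, test_size=0.3, random_state=38)
print(len(train_rows), len(test_rows))

# Using the same random_state again reproduces exactly the same split.
train_again, test_again = train_test_split(rows, test_size=0.3, random_state=38)
print(np.array_equal(test_rows, test_again))
```

Fixing `random_state` keeps the scores reported later comparable between runs, since every run trains and tests on the same rows.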

**Feature scaling**


In [None]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

**Note: The scaler is fitted on the training data only, NOT on the testing data.**

In [None]:
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
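A minimal sketch of why the two calls differ, using synthetic numbers rather than the IPL features: `fit_transform` learns the mean and standard deviation from the training rows, and `transform` reuses those same statistics on the test rows. The names `X_toy`, `X_tr` and `X_te` are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic features; the values have nothing to do with the IPL data.
rng = np.random.default_rng(0)
X_toy = rng.normal(loc=50.0, scale=10.0, size=(200, 3))

X_tr, X_te = train_test_split(X_toy, test_size=0.3, random_state=38)

scaler = StandardScaler()
X_tr_scaled = scaler.fit_transform(X_tr)  # learns mean_ and scale_ from the training rows
X_te_scaled = scaler.transform(X_te)      # reuses the training statistics

print(X_tr_scaled.mean(axis=0))  # close to zero by construction
print(X_te_scaled.mean(axis=0))  # usually near zero, but not exactly
```

Tree-based models such as the ones used in the next step split on thresholds and are generally insensitive to this kind of rescaling, which is why the approach marks scaling as "if necessary".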

# **Step 3 : Using Regression**
* Regression is performed when the target is continuous<br>
* Classification is performed when the target is discrete


**Performing Random Forest Regression** 

In [None]:
from sklearn.ensemble import RandomForestRegressor

forest = RandomForestRegressor()

**Fitting and Scoring**

In [None]:
forest.fit(X_train,y_train)

forest_testing = forest.score(X_test,y_test)

print(f'RandomForestRegressor Testing Score : {forest_testing}')

RandomForestRegressor Testing Score : 0.9420298884881197


In [None]:
forest_training = forest.score(X_train,y_train)

print(f'RandomForestRegressor Training Score : {forest_training}')

RandomForestRegressor Training Score : 0.9915195669058396
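For scikit-learn regressors, `score` returns the coefficient of determination R², not an accuracy: R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the true values from their mean. The sketch below checks this by hand on synthetic data; the data, the `n_estimators=20` setting and the variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression problem, unrelated to the IPL data.
rng = np.random.default_rng(0)
X_toy = rng.uniform(0, 10, size=(300, 2))
y_toy = 3.0 * X_toy[:, 0] + rng.normal(0, 1.0, size=300)

model = RandomForestRegressor(n_estimators=20, random_state=0)
model.fit(X_toy[:200], y_toy[:200])

# Evaluate on rows the model has not seen.
X_eval, y_eval = X_toy[200:], y_toy[200:]
pred = model.predict(X_eval)

ss_res = np.sum((y_eval - pred) ** 2)           # squared prediction errors
ss_tot = np.sum((y_eval - y_eval.mean()) ** 2)  # squared deviations from the mean
r2_manual = 1.0 - ss_res / ss_tot

print(r2_manual, model.score(X_eval, y_eval))
```

A value of 1.0 means perfect predictions, and 0.0 means the model does no better than always predicting the mean of the target.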


**Performing Decision Tree Regression**

In [None]:
from sklearn.tree import DecisionTreeRegressor

tree = DecisionTreeRegressor()

**Fitting and Scoring**

In [None]:
tree.fit(X_train,y_train)

tree_testing = tree.score(X_test,y_test)

print(f'DecisionTreeRegressor Testing Score : {tree_testing}')

DecisionTreeRegressor Testing Score : 0.8711300054695686


In [None]:
tree_training = tree.score(X_train,y_train)

print(f'DecisionTreeRegressor Training Score : {tree_training}')

DecisionTreeRegressor Training Score : 0.9999925603139226
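The decision tree scores about 0.99999 on the training set but about 0.871 on the testing set, a gap that usually indicates overfitting. One common remedy, which this notebook does not use, is to limit the depth of the tree. The sketch below compares an unconstrained tree with a depth-limited one on synthetic data; the data and the choice `max_depth=4` are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic noisy regression problem, unrelated to the IPL data.
rng = np.random.default_rng(0)
X_toy = rng.uniform(0, 10, size=(500, 3))
y_toy = 2.0 * X_toy[:, 0] + rng.normal(0, 2.0, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.3, random_state=38)

# Unconstrained tree: grows until it fits the training rows almost exactly.
full_tree = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)
# Depth-limited tree: cannot memorise individual training rows.
shallow_tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_tr, y_tr)

for name, model in [("unconstrained", full_tree), ("max_depth=4", shallow_tree)]:
    print(name, model.score(X_tr, y_tr), model.score(X_te, y_te))
```

Whether a depth limit helps on the IPL data would have to be checked by comparing testing scores for a few candidate depths.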


# **Step 4 : Let's Predict the Total Score**

**Creating a data frame**<br>
Since building a data frame with so many columns by hand would be difficult,
I have extracted a few rows from the original data set.

In [None]:
data = X.iloc[[0,150]]
print(data.head())

     runs  wickets  overs  ...  bowler_YS Chahal  bowler_Yuvraj Singh  bowler_Z Khan
0       1        0    0.1  ...                 0                    0              0
150    33        1    4.1  ...                 0                    0              0

[2 rows x 805 columns]


**Random Forest Regressor**

In [None]:
RFR_Pre = forest.predict(data)

print(f'Random Forest Regressor Prediction \n\n{RFR_Pre}')

Random Forest Regressor Prediction 

[196.85 248.1 ]


**Decision Tree Regressor**

In [None]:
DTR_Pre = tree.predict(data)

print(f'Decision Tree Regressor Prediction \n\n{DTR_Pre}')

Decision Tree Regressor Prediction 

[197. 246.]


**Original Total Score**

In [None]:
Original_Score = y.iloc[[0,150]]

print(f'Original Total Score \n\n{Original_Score}')

Original Total Score 

0      222
150    240
Name: total, dtype: int64
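Two points are worth keeping in mind when reading these predictions. First, `data` is sliced from `X`, which was never passed through `scaler`, while both models were fitted on the scaled `X_train`. The predictions above may therefore differ from what the models would return for inputs transformed with `scaler.transform`. Second, rows 0 and 150 come from the full dataset, so they may have been part of the training split. The sketch below shows one way to prepare a genuinely new row: encode it, align its columns to the training columns, scale it, then predict. It runs on a made-up miniature dataset, and every name and value in it is hypothetical.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler

# Hypothetical miniature training data; all values are invented.
train = pd.DataFrame({
    "bat_team": ["A", "B", "C", "A", "B", "C"],
    "runs": [10, 40, 25, 60, 15, 80],
    "overs": [1.2, 5.0, 3.1, 8.4, 2.0, 10.3],
    "total": [150, 170, 160, 180, 140, 200],
})

y_train = train["total"]
X_train = pd.get_dummies(
    train.drop("total", axis=1), columns=["bat_team"], drop_first=True, dtype=int
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

model = RandomForestRegressor(n_estimators=10, random_state=0)
model.fit(X_train_scaled, y_train)

# A new match situation described with the raw, unencoded columns.
new_raw = pd.DataFrame({"bat_team": ["B"], "runs": [33], "overs": [4.1]})

# Encode, then align to the training columns; missing dummy columns become 0.
new_encoded = pd.get_dummies(new_raw, columns=["bat_team"], dtype=int).reindex(
    columns=X_train.columns, fill_value=0
)

# Apply the scaler fitted on the training data before predicting.
new_scaled = scaler.transform(new_encoded)
prediction = model.predict(new_scaled)
print(prediction)
```

The `reindex` call avoids typing out hundreds of dummy columns by hand: only the raw values are supplied, and every dummy column that does not apply is filled with 0.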
