<a href="https://colab.research.google.com/github/anudeepayina/CricketTracker/blob/master/IPL_Seasonal_Predictions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

In [None]:
final = pd.read_csv("finaldata.csv")

# 2019 Prediction

In [None]:
# Hold out the 2019 season as the test set and train on every other season
final_train = final[final["season"] != 2019]
final_test = final[final["season"] == 2019]
x_train = final_train.drop(["season", "winning_team"], axis=1)
x_test = final_test.drop(["season", "winning_team"], axis=1)
y_train = final_train["winning_team"]
y_test = final_test["winning_team"]
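Because the hold-out is a whole season, the same split can be wrapped in a small function and reused for other seasons. This is a sketch only: `split_by_season` and the toy feature column `toss_won` are hypothetical and not part of the original notebook, and the real columns of `finaldata.csv` are not shown here. The `toy_` prefixes keep it from overwriting the notebook's `x_train`/`y_train`.

```python
import pandas as pd


def split_by_season(df, test_season, target="winning_team"):
    """Hypothetical helper: hold out one season for testing, train on all others."""
    is_test = df["season"] == test_season
    train, test = df[~is_test], df[is_test]
    drop_cols = ["season", target]
    return (train.drop(columns=drop_cols), test.drop(columns=drop_cols),
            train[target], test[target])


# Toy frame standing in for finaldata.csv; "toss_won" is a made-up feature column
toy = pd.DataFrame({
    "season": [2017, 2017, 2018, 2018, 2019, 2019],
    "toss_won": [1, 0, 1, 1, 0, 1],
    "winning_team": [1, 0, 1, 0, 0, 1],
})
toy_x_train, toy_x_test, toy_y_train, toy_y_test = split_by_season(toy, 2019)
print(toy_x_train.shape, toy_x_test.shape)  # (4, 1) (2, 1)
```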

# Random Forest Classifier

In [None]:
rf = RandomForestClassifier()

In [None]:
params = {"max_depth": [1,2,3],
          "n_estimators": [5,10,15,20,70],
          "criterion": ["gini","entropy"],
          "class_weight": [{1:w} for w in [1.0,1.1,1.2,1.3]]}
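GridSearchCV tries every combination in the grid, so the number of fits is the product of the list lengths times the number of folds. Each `class_weight` entry such as `{1: 1.3}` upweights class 1 only; classes not listed keep weight 1. A quick count with scikit-learn's `ParameterGrid` (`rf_params` is a copy of the grid above, renamed so it does not overwrite `params`):

```python
from sklearn.model_selection import ParameterGrid

# Copy of the Random Forest grid above (renamed so it does not overwrite `params`)
rf_params = {"max_depth": [1, 2, 3],
             "n_estimators": [5, 10, 15, 20, 70],
             "criterion": ["gini", "entropy"],
             "class_weight": [{1: w} for w in [1.0, 1.1, 1.2, 1.3]]}

n_candidates = len(ParameterGrid(rf_params))  # 3 * 5 * 2 * 4 combinations
n_fits = n_candidates * 3                      # one fit per candidate per fold (cv=3)
print(n_candidates, n_fits)  # 120 360
```

This matches the log line "Fitting 3 folds for each of 120 candidates, totalling 360 fits" printed by the search.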

In [None]:
gridsearch = GridSearchCV(param_grid = params, cv = 3, estimator = rf, scoring = "roc_auc",verbose=1,n_jobs=3)
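With `cv=3`, GridSearchCV builds stratified folds over rows without looking at the `season` column, so a validation fold can contain matches from the same seasons the model trained on. If the aim is to mimic predicting an unseen season, one alternative (not what this notebook does) is to group the folds by season with `GroupKFold`. The sketch uses synthetic data: `X_demo`, `y_demo` and `seasons` are stand-ins for `x_train`, `y_train` and `final_train["season"]`, and the small grid is illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, GroupKFold

# Synthetic stand-in for the training data; `seasons` plays the role of final_train["season"]
X_demo, y_demo = make_classification(n_samples=300, n_features=6, random_state=0)
seasons = np.repeat([2016, 2017, 2018], 100)

search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid={"max_depth": [1, 2, 3], "n_estimators": [10, 20]},
    cv=GroupKFold(n_splits=3),  # with 3 seasons, each fold validates on one whole season
    scoring="roc_auc",
)
search.fit(X_demo, y_demo, groups=seasons)
print(search.best_params_, round(search.best_score_, 3))
```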

In [None]:
gridsearch.fit(x_train,y_train)
print("best parameters are:", gridsearch.best_params_)
print("best score is:", gridsearch.best_score_)

Fitting 3 folds for each of 120 candidates, totalling 360 fits


[Parallel(n_jobs=3)]: Using backend LokyBackend with 3 concurrent workers.
[Parallel(n_jobs=3)]: Done  58 tasks      | elapsed:    3.7s


best parameters are: {'class_weight': {1: 1.3}, 'criterion': 'gini', 'max_depth': 3, 'n_estimators': 20}
best score is: 0.569566424389259


[Parallel(n_jobs=3)]: Done 360 out of 360 | elapsed:   14.0s finished


In [None]:
## Instantiate rf with fixed parameters (note: the grid-search best_params_ above were
## criterion="gini", n_estimators=20, class_weight={1: 1.3})
rf = RandomForestClassifier(criterion="entropy", max_depth=3, n_estimators=15, random_state=123, class_weight={1: 1.1})
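The values passed to `RandomForestClassifier` here do not match the `best_params_` printed by the grid search. Copying by hand is avoidable: with its default `refit=True`, GridSearchCV refits the winning configuration on the full training set and exposes it as `best_estimator_`, and `best_params_` can also be unpacked into a fresh model. A minimal sketch on synthetic data (`X_demo`, `y_demo` and the small grid are stand-ins), assuming the same `random_state=123` used in this cell:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for x_train / y_train
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=1)

search = GridSearchCV(RandomForestClassifier(random_state=123),
                      param_grid={"max_depth": [1, 2, 3], "n_estimators": [5, 15]},
                      cv=3, scoring="roc_auc")
search.fit(X_demo, y_demo)

# Option 1: the refit winner (refit=True is the default)
best_rf = search.best_estimator_

# Option 2: rebuild from best_params_ so the values cannot be mistyped
rebuilt = RandomForestClassifier(random_state=123, **search.best_params_).fit(X_demo, y_demo)

# Same parameters, same seed, same data -> same predictions
print(np.array_equal(best_rf.predict(X_demo), rebuilt.predict(X_demo)))  # True
```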

In [None]:
rf.fit(x_train,y_train)
yhat = rf.predict(x_test)
report_RF = accuracy_score(y_test,yhat)
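The grid search ranks candidates by ROC AUC (`scoring="roc_auc"`), but the final comparison uses accuracy on the 2019 matches. The two can disagree: accuracy only looks at hard 0/1 predictions, while ROC AUC uses the predicted probability of class 1. A sketch that reports both on a held-out split, with synthetic data standing in for `x_train`/`x_test`:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the season-based split used in the notebook
X_demo, y_demo = make_classification(n_samples=300, n_features=6, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=2)

model = RandomForestClassifier(max_depth=3, n_estimators=15, random_state=123).fit(X_tr, y_tr)
acc = accuracy_score(y_te, model.predict(X_te))             # uses hard 0/1 labels
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])  # uses scores for class 1
print(f"accuracy={acc:.3f}  roc_auc={auc:.3f}")
```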

# Decision Tree

In [None]:
dt = DecisionTreeClassifier()

In [None]:
params = {"max_depth": [1,2,3,4],
          "criterion": ["entropy","gini"],
          "splitter": ["best","random"],
          "class_weight": [{1:w} for w in [1.0,1.1,1.2,1.3]]}

In [None]:
gridsearch = GridSearchCV(param_grid=params, estimator=dt, cv=3, scoring="roc_auc", verbose=1, n_jobs=3)

In [None]:
gridsearch.fit(x_train,y_train)
print("best parameters are:", gridsearch.best_params_)
print("best score is:", gridsearch.best_score_)

Fitting 3 folds for each of 64 candidates, totalling 192 fits


[Parallel(n_jobs=3)]: Using backend LokyBackend with 3 concurrent workers.


best parameters are: {'class_weight': {1: 1.2}, 'criterion': 'entropy', 'max_depth': 4, 'splitter': 'random'}
best score is: 0.5289344841510196


[Parallel(n_jobs=3)]: Done 179 tasks      | elapsed:    0.9s
[Parallel(n_jobs=3)]: Done 187 out of 192 | elapsed:    0.9s remaining:    0.0s
[Parallel(n_jobs=3)]: Done 192 out of 192 | elapsed:    1.0s finished
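The winning tree uses `splitter="random"`, which draws split thresholds at random. The estimator passed to the grid search sets no `random_state`, so rerunning the search can produce different trees and possibly a different winner. Fixing `random_state` makes a given configuration repeatable; in the sketch below the data are synthetic and the seed is chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for x_train / y_train
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=3)


def fit_predict(seed):
    # Same settings as the best_params_ reported above, plus an explicit seed
    tree = DecisionTreeClassifier(criterion="entropy", max_depth=4, splitter="random",
                                  class_weight={1: 1.2}, random_state=seed)
    return tree.fit(X_demo, y_demo).predict(X_demo)


# With the same seed, two fits give the same tree and therefore the same predictions
print(np.array_equal(fit_predict(0), fit_predict(0)))  # True
```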


In [None]:
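# Instantiate dt with fixed parameters (note: the grid-search best_params_ above were
# criterion="entropy", splitter="random", class_weight={1: 1.2})
dt = DecisionTreeClassifier(criterion="gini", max_depth=4, splitter="best", class_weight={1: 1.3})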

In [None]:
dt.fit(x_train,y_train)
yhat = dt.predict(x_test)
report_dt = accuracy_score(y_test,yhat)

# XGBClassifier

In [None]:
gbm = XGBClassifier()

In [None]:
param = {"max_depth": [1,2,3],
         "learning_rate": [0.1,0.01,0.001],
         "colsample_bytree": [0.4,0.5],
         "n_estimators": [70],
         "reg_lambda": [4,5],
         "reg_alpha": [1,2,3]}

In [None]:
gridsearch = GridSearchCV(param_grid = param,estimator=gbm, scoring="roc_auc", cv=4,verbose=1,n_jobs=3)

In [None]:
gridsearch.fit(x_train,y_train)
print("Best roc_auc score is", gridsearch.best_score_)
print("Best parameters are", gridsearch.best_params_)

Fitting 4 folds for each of 108 candidates, totalling 432 fits


[Parallel(n_jobs=3)]: Using backend LokyBackend with 3 concurrent workers.
[Parallel(n_jobs=3)]: Done  82 tasks      | elapsed:    2.8s
[Parallel(n_jobs=3)]: Done 382 tasks      | elapsed:   12.9s


Best roc_auc score is 0.6174688273166892
Best parameters are {'colsample_bytree': 0.4, 'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 70, 'reg_alpha': 1, 'reg_lambda': 4}


[Parallel(n_jobs=3)]: Done 432 out of 432 | elapsed:   14.7s finished


In [None]:
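# Instantiate gbm with fixed parameters (note: the grid-search best_params_ above were
# colsample_bytree=0.4, reg_alpha=1, reg_lambda=4); binary:hinge outputs hard 0/1 predictions
gbm = XGBClassifier(colsample_bytree=0.5, learning_rate=0.1, max_depth=3, n_estimators=70,
                    reg_alpha=2, reg_lambda=5, objective="binary:hinge")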

In [None]:
gbm.fit(x_train,y_train)
yhat = gbm.predict(x_test)
report_xgb = accuracy_score(y_test,yhat)

In [None]:
scores = [report_RF, report_dt, report_xgb]

# One row of 2019 test-set accuracies, one column per model
final_results = pd.DataFrame([scores], columns=["Random Forest", "Decision Tree Classifier", "XGBClassifier"])
final_results.head()

|   | Random Forest | Decision Tree Classifier | XGBClassifier |
|---|---|---|---|
| 0 | 0.644068 | 0.627119 | 0.627119 |
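An accuracy of 0.64 is easier to judge next to a trivial baseline. One option (not part of the original notebook) is scikit-learn's `DummyClassifier` with `strategy="most_frequent"`, which always predicts the most common training label. `majority_baseline` below is a hypothetical helper around it, shown on toy labels.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score


def majority_baseline(y_train, y_test):
    """Hypothetical helper: accuracy of always predicting the most frequent training label."""
    dummy = DummyClassifier(strategy="most_frequent")
    # DummyClassifier ignores the feature values, so a single column of zeros is enough
    dummy.fit(np.zeros((len(y_train), 1)), y_train)
    return accuracy_score(y_test, dummy.predict(np.zeros((len(y_test), 1))))


# Toy labels: class 1 is most frequent in training, and 1 of the 3 test labels is 1
print(majority_baseline([1, 1, 0, 1], [1, 0, 0]))  # 0.333...
```

In the notebook, `majority_baseline(y_train, y_test)` would give the figure to set beside the model accuracies in the table.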
