## LOG-LOSS

To define log-loss mathematically, let $p_{ij}$ be the probability the model assigns to label $j$ for record $i$, $N$ the number of records, $M$ the number of class labels, and $y_{ij}$ an indicator variable that is 1 if record $i$ truly belongs to class $j$ and 0 otherwise.

Then log-loss is:

$$\text{logloss} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M} y_{ij}\log p_{ij}$$

In other words, log-loss is the average negative log of the probability the model assigns to each record's true class. Confident wrong predictions are penalized heavily, and a perfect model scores 0.
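
As a quick numerical check of the formula, here is a minimal NumPy sketch. It assumes the labels are integer class indices and that $y_{ij}$ marks the true class of record $i$. The function name `multiclass_log_loss` and the toy data are illustrative and not part of the notebook.

```python
import numpy as np

def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    y_true : (N,) integer class indices in [0, M)
    probs  : (N, M) predicted probabilities, one row per record
    """
    y_true = np.asarray(y_true)
    # Clip so that a predicted probability of exactly 0 does not give log(0).
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1 - eps)
    # y_ij is 1 only for the true class, so the inner sum over j reduces to
    # picking p_ij at the true class of record i.
    true_class_probs = probs[np.arange(len(y_true)), y_true]
    return -np.mean(np.log(true_class_probs))

# Toy example: three records, three classes (made-up numbers).
y_toy = [0, 1, 2]
p_toy = [[0.7, 0.2, 0.1],
         [0.1, 0.8, 0.1],
         [0.2, 0.2, 0.6]]
toy_loss = multiclass_log_loss(y_toy, p_toy)
print(toy_loss)
```

Only the probability given to the true class enters the score, so spreading probability over wrong classes costs exactly as much as the true class loses.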

In [1]:
import re
import time
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
# Explicit imports replace the wildcard imports; StratifiedKFold is used in the CV loops below.
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler, MinMaxScaler

%matplotlib inline

warnings.filterwarnings("ignore")
pd.set_option('display.max_rows', None)

In [2]:
df_train= pd.read_csv("ODI_Train.csv")
print(df_train.shape)

df_test= pd.read_csv("ODI_Test.csv")
print(df_test.shape)

print(df_train.head())
print(df_test.tail())

(2508, 10)
(1075, 9)
   Team1  Team2  Stadium  HostCountry Team1_Venue Team2_Venue Team1_Innings  \
0      5      4       37            4        Home        Away        Second   
1      1     14       84            7     Neutral     Neutral         First   
2      9     15       47            9        Home        Away         First   
3      7      2      102            6        Home        Away         First   
4      6      8       46            5        Home        Away         First   

  Team2_Innings MonthOfMatch  MatchWinner  
0         First          Dec            4  
1        Second          Sep            1  
2        Second          Feb            9  
3        Second          Aug            2  
4        Second          Aug            6  
      Team1  Team2  Stadium  HostCountry Team1_Venue Team2_Venue  \
1070     15      5       64           16        Home        Away   
1071      1     12       95            0        Home        Away   
1072      5     10       43         

In [3]:
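# Drop the target so train and test share the same columns, then stack them
# so the encoding below is applied consistently to both.
df_train1 = df_train.drop('MatchWinner', axis=1)
# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent.
df = pd.concat([df_train1, df_test], ignore_index=True)
print(df.shape)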

(3583, 9)


In [4]:
print(df.head())
print(df.tail())

   Team1  Team2  Stadium  HostCountry Team1_Venue Team2_Venue Team1_Innings  \
0      5      4       37            4        Home        Away        Second   
1      1     14       84            7     Neutral     Neutral         First   
2      9     15       47            9        Home        Away         First   
3      7      2      102            6        Home        Away         First   
4      6      8       46            5        Home        Away         First   

  Team2_Innings MonthOfMatch  
0         First          Dec  
1        Second          Sep  
2        Second          Feb  
3        Second          Aug  
4        Second          Aug  
      Team1  Team2  Stadium  HostCountry Team1_Venue Team2_Venue  \
3578     15      5       64           16        Home        Away   
3579      1     12       95            0        Home        Away   
3580      5     10       43            1     Neutral     Neutral   
3581     10     13      111            0     Neutral     Neutral   

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3583 entries, 0 to 3582
Data columns (total 9 columns):
Team1            3583 non-null int64
Team2            3583 non-null int64
Stadium          3583 non-null int64
HostCountry      3583 non-null int64
Team1_Venue      3583 non-null object
Team2_Venue      3583 non-null object
Team1_Innings    3583 non-null object
Team2_Innings    3583 non-null object
MonthOfMatch     3583 non-null object
dtypes: int64(4), object(5)
memory usage: 252.0+ KB


In [6]:
df.describe()

           Team1     Team2    Stadium  HostCountry
count     3583.0    3583.0     3583.0       3583.0
mean    7.304773  9.164387  72.730394     7.566285
std     4.656641  4.560046  43.588258     5.618924
min          0.0       0.0        0.0          0.0
25%          4.0       5.0       36.0          3.0
50%          7.0      10.0       69.0          9.0
75%         12.0      13.0      111.0         13.0
max         15.0      15.0      151.0         16.0


## CATEGORICAL VARIABLE ENCODING

https://towardsdatascience.com/all-about-categorical-variable-encoding-305f3361fd02

1. Innings: one-hot encoded (OHE).
2. Venue: planned as ordinal, but one-hot encoded in the cells that follow.
3. Month: ordinal (Jan = 1 through Dec = 12).
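
To make the difference between the two encodings concrete, here is a small sketch on a made-up three-row frame. The frame `toy` and the helper names `month_order` and `month_to_int` are illustrative; only the column names are borrowed from the dataset.

```python
import pandas as pd

# Hypothetical three-row sample using the dataset's column names.
toy = pd.DataFrame({
    "Team1_Innings": ["First", "Second", "First"],
    "MonthOfMatch": ["Dec", "Sep", "Feb"],
})

# One-hot encoding: one 0/1 column per category, no order implied.
innings_ohe = pd.get_dummies(toy["Team1_Innings"], prefix="Team_1", dtype=int)

# Ordinal encoding: a single integer column that preserves calendar order.
month_order = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
month_to_int = {name: i for i, name in enumerate(month_order, start=1)}
month_ordinal = toy["MonthOfMatch"].map(month_to_int)

print(innings_ohe)
print(month_ordinal.tolist())
```

One-hot encoding suits categories with no natural order, while an integer mapping lets a model exploit an order that really exists, at the cost of implying equal spacing between levels.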

In [7]:
df_inn_t1= pd.get_dummies(df["Team1_Innings"], prefix='Team_1')
print(df_inn_t1.head())

df_inn_t2= pd.get_dummies(df["Team2_Innings"], prefix='Team_2')
print(df_inn_t2.head())

## Merging innings dummies for both teams on index
df_inn=pd.merge(df_inn_t1,df_inn_t2,left_index=True, right_index=True)
print(df_inn.head())

   Team_1_First  Team_1_Second
0             0              1
1             1              0
2             1              0
3             1              0
4             1              0
   Team_2_First  Team_2_Second
0             1              0
1             0              1
2             0              1
3             0              1
4             0              1
   Team_1_First  Team_1_Second  Team_2_First  Team_2_Second
0             0              1             1              0
1             1              0             0              1
2             1              0             0              1
3             1              0             0              1
4             1              0             0              1


In [8]:
df_venue_t1= pd.get_dummies(df["Team1_Venue"], prefix='Team_1')
print(df_venue_t1.head())

df_venue_t2= pd.get_dummies(df["Team2_Venue"], prefix='Team_2')
print(df_venue_t2.head())

## Merging venues for both teams on index
df_venue=pd.merge(df_venue_t1,df_venue_t2,left_index=True, right_index=True)
print(df_venue.head())

   Team_1_Away  Team_1_Home  Team_1_Neutral
0            0            1               0
1            0            0               1
2            0            1               0
3            0            1               0
4            0            1               0
   Team_2_Away  Team_2_Home  Team_2_Neutral
0            1            0               0
1            0            0               1
2            1            0               0
3            1            0               0
4            1            0               0
   Team_1_Away  Team_1_Home  Team_1_Neutral  Team_2_Away  Team_2_Home  \
0            0            1               0            1            0   
1            0            0               1            0            0   
2            0            1               0            1            0   
3            0            1               0            1            0   
4            0            1               0            1            0   

   Team_2_Neutral  
0            

In [9]:
months = {'Jan':1, 'Feb':2,'Mar':3,'Apr':4,'May':5,'Jun':6,'Jul':7,'Aug':8,'Sep':9,'Oct':10,'Nov':11,'Dec':12}

df['MonthOfMatch']=df['MonthOfMatch'].map(months)
df.head()

   Team1  Team2  Stadium  HostCountry Team1_Venue Team2_Venue Team1_Innings Team2_Innings  MonthOfMatch
0      5      4       37            4        Home        Away        Second         First            12
1      1     14       84            7     Neutral     Neutral         First        Second             9
2      9     15       47            9        Home        Away         First        Second             2
3      7      2      102            6        Home        Away         First        Second             8
4      6      8       46            5        Home        Away         First        Second             8


In [10]:
df_encode=pd.merge(df,df_inn,left_index=True, right_index=True)
df_encoded=pd.merge(df_encode,df_venue,left_index=True, right_index=True)
print(df_encoded.head())

   Team1  Team2  Stadium  HostCountry Team1_Venue Team2_Venue Team1_Innings  \
0      5      4       37            4        Home        Away        Second   
1      1     14       84            7     Neutral     Neutral         First   
2      9     15       47            9        Home        Away         First   
3      7      2      102            6        Home        Away         First   
4      6      8       46            5        Home        Away         First   

  Team2_Innings  MonthOfMatch  Team_1_First  Team_1_Second  Team_2_First  \
0         First            12             0              1             1   
1        Second             9             1              0             0   
2        Second             2             1              0             0   
3        Second             8             1              0             0   
4        Second             8             1              0             0   

   Team_2_Second  Team_1_Away  Team_1_Home  Team_1_Neutral  Team_2_A

In [11]:
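# Drop the raw categorical columns, Stadium, MonthOfMatch, and the Team 2 dummies,
# keeping only the team/host identifiers and the Team 1 dummies.
cols_to_drop = [
    'Team1_Innings', 'Team2_Innings', 'Team1_Venue', 'Team2_Venue',
    'Stadium', 'MonthOfMatch',
    'Team_2_First', 'Team_2_Second',
    'Team_2_Away', 'Team_2_Home', 'Team_2_Neutral',
]
df_encoded.drop(cols_to_drop, axis=1, inplace=True)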
df_encoded.head()

   Team1  Team2  HostCountry  Team_1_First  Team_1_Second  Team_1_Away  Team_1_Home  Team_1_Neutral
0      5      4            4             0              1            0            1               0
1      1     14            7             1              0            0            0               1
2      9     15            9             1              0            0            1               0
3      7      2            6             1              0            0            1               0
4      6      8            5             1              0            0            1               0


In [12]:
print(df_encoded.shape)
print(df_encoded.info())

(3583, 8)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3583 entries, 0 to 3582
Data columns (total 8 columns):
Team1             3583 non-null int64
Team2             3583 non-null int64
HostCountry       3583 non-null int64
Team_1_First      3583 non-null uint8
Team_1_Second     3583 non-null uint8
Team_1_Away       3583 non-null uint8
Team_1_Home       3583 non-null uint8
Team_1_Neutral    3583 non-null uint8
dtypes: int64(3), uint8(5)
memory usage: 101.5 KB
None


In [13]:
# The first len(df_train) rows (2508) of the stacked frame are the training records.
train_encoded = df_encoded.iloc[:len(df_train), :]
train_encoded.shape

#print(train_encoded.tail())
#print(df_train.tail())

(2508, 8)

In [14]:
# The remaining rows (1075) are the test records.
test_encoded = df_encoded.iloc[len(df_train):, :]
test_encoded.shape

#print(test_encoded.head())
#print(df_test.head())

(1075, 8)

In [15]:
X=train_encoded.copy()
y=df_train['MatchWinner']

In [16]:
Xt=test_encoded.copy()
features = [c for c in X.columns]

In [17]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
logis_reg = LogisticRegression(random_state=0, solver='lbfgs',multi_class='multinomial')
logis_reg.fit(X_train,y_train)
y_pred = logis_reg.predict_proba(X_test)

In [18]:
from sklearn.metrics import log_loss
loss= log_loss(y_test,y_pred,labels=logis_reg.classes_)
print(loss)

2.1156726736453795
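
A log-loss value is easier to read next to a naive baseline. A model that predicts every one of $M$ classes with equal probability scores $\ln M$; for 16 classes that is $\ln 16 \approx 2.77$, so the held-out score of about 2.12 is better than uniform guessing. The sketch below illustrates this on synthetic labels. It assumes 16 classes (matching the 16 probability columns used in the later cells), and the names `y_demo`, `X_demo`, `uniform_loss`, and `prior_loss` are illustrative.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import log_loss

n_classes = 16  # assumption: matches the 16 probability columns used later
rng = np.random.default_rng(0)
y_demo = rng.integers(0, n_classes, size=500)  # synthetic labels, not the ODI data
y_demo[:n_classes] = np.arange(n_classes)      # make sure every class appears
X_demo = np.zeros((len(y_demo), 1))            # features are ignored by the dummy model

# Baseline 1: the same probability 1/M for every class.
uniform = np.full((len(y_demo), n_classes), 1.0 / n_classes)
uniform_loss = log_loss(y_demo, uniform, labels=np.arange(n_classes))

# Baseline 2: always predict the empirical class frequencies.
prior = DummyClassifier(strategy="prior").fit(X_demo, y_demo)
prior_loss = log_loss(y_demo, prior.predict_proba(X_demo), labels=prior.classes_)

print(uniform_loss, prior_loss)
```

The class-frequency baseline is never worse than the uniform one on the data it was fitted to, which makes it the stricter reference when classes are imbalanced.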


In [19]:
scores = cross_val_score(logis_reg, X, y, cv=6,scoring='neg_log_loss')
print ("Cross-validated scores:", scores)

Cross-validated scores: [-2.10530812 -2.04733223 -2.05392314 -2.11426896 -2.1221834  -2.12801994]
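
Note that the cross-validation call above receives the unscaled `X`, while the earlier hold-out fit used `StandardScaler`. One way to keep the two consistent is to wrap the scaler and the classifier in a pipeline, so that scaling is re-fitted inside each fold. The sketch below shows this on synthetic data from `make_classification`; the names `X_demo`, `y_demo`, and `scaled_logreg` are illustrative and the scores are not the notebook's.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic 4-class problem standing in for the encoded match data.
X_demo, y_demo = make_classification(
    n_samples=600, n_features=8, n_informative=5, n_classes=4, random_state=0
)

# The scaler is fitted on each training fold only, so no validation-fold
# statistics leak into the model.
scaled_logreg = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=1000, random_state=0),
)

demo_scores = cross_val_score(
    scaled_logreg, X_demo, y_demo, cv=6, scoring="neg_log_loss"
)
print("Mean log-loss:", -demo_scores.mean())
```

The `neg_log_loss` scorer returns the negated loss so that higher is better, which is why the scores printed by the notebook are negative.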


In [131]:
training_start_time = time.time()

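losses = list()
max_iter = 10  # used here as the number of CV folds, not solver iterations
folds = StratifiedKFold(n_splits=max_iter)
oofs = np.zeros((len(X), 16))        # out-of-fold probabilities, 16 class columns
reg_preds = np.zeros((len(Xt), 16))  # test probabilities averaged over the folds


# Note: the folds are stratified on decile bins of y (pd.qcut), not on the class labels themselves.
for fold_, (train_idx, val_idx) in enumerate(folds.split(X, pd.qcut(y, 10, labels=False, duplicates='drop'))):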
    
    print(f'\n---- Fold {fold_} -----\n')
    
    X_train, y_train = X.iloc[train_idx][features], y.iloc[train_idx]
    X_val, y_val = X.iloc[val_idx][features], y.iloc[val_idx]
    X_test = Xt[features]
    
    log_reg = LogisticRegression(random_state=0)
    _ = log_reg.fit(X_train, y_train)
    
    oofs[val_idx] = log_reg.predict_proba(X_val)
    # Predict once on the test set and reuse the result for the fold average.
    current_test_pred = log_reg.predict_proba(X_test)
    reg_preds += current_test_pred / max_iter


---- Fold 0 -----


---- Fold 1 -----


---- Fold 2 -----


---- Fold 3 -----


---- Fold 4 -----


---- Fold 5 -----


---- Fold 6 -----


---- Fold 7 -----


---- Fold 8 -----


---- Fold 9 -----
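
The loop above fills `oofs` but never scores it, and `losses` stays empty. A minimal sketch of how out-of-fold predictions could be scored is shown below on synthetic data. It stratifies directly on the class labels, which is an assumption about the intent rather than what the notebook does, and the names `oof`, `fold_losses`, and `oof_loss` are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import StratifiedKFold

# Synthetic 4-class problem standing in for the encoded match data.
X_demo, y_demo = make_classification(
    n_samples=600, n_features=8, n_informative=5, n_classes=4, random_state=0
)
classes = np.unique(y_demo)

folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
oof = np.zeros((len(X_demo), len(classes)))  # out-of-fold probabilities
fold_losses = []

for train_idx, val_idx in folds.split(X_demo, y_demo):
    model = LogisticRegression(max_iter=1000, random_state=0)
    model.fit(X_demo[train_idx], y_demo[train_idx])
    # In this demo every class appears in each training fold, so the
    # predict_proba columns line up with `classes`.
    oof[val_idx] = model.predict_proba(X_demo[val_idx])
    fold_losses.append(log_loss(y_demo[val_idx], oof[val_idx], labels=classes))

# Each record is predicted exactly once, by a model that never saw it.
oof_loss = log_loss(y_demo, oof, labels=classes)
print(fold_losses, oof_loss)
```

Passing `labels=` matters when a validation fold happens to miss a class, because `log_loss` otherwise infers the label set from the fold alone.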



In [218]:
from xgboost import XGBClassifier


training_start_time = time.time()

losses= list()
max_iter = 10
folds = StratifiedKFold(n_splits = max_iter)
oofs = np.zeros((len(X),16))
xgb_preds = np.zeros((len(Xt),16))


for fold_, (train_idx, val_idx) in enumerate(folds.split(X, pd.qcut(y, 10, labels=False, duplicates='drop'))):
    
    print(f'\n---- Fold {fold_} -----\n')
    
    X_train, y_train = X.iloc[train_idx][features], y.iloc[train_idx]
    X_val, y_val = X.iloc[val_idx][features], y.iloc[val_idx]
    X_test = Xt[features]
    
    # num_leaves and min_child_samples are LightGBM parameter names that XGBoost does not use, so they are omitted here.
    xgb_clf = XGBClassifier(n_estimators=1000, max_depth=1, learning_rate=0.02, colsample_bytree=0.4, reg_alpha=0.5, reg_lambda=2)
    _ = xgb_clf.fit(X_train, y_train,eval_set = [(X_val, y_val)], verbose=100, early_stopping_rounds=100, eval_metric='mlogloss')
    
    oofs[val_idx] = xgb_clf.predict_proba(X_val)
    # Predict once on the test set and reuse the result for the fold average.
    current_test_pred = xgb_clf.predict_proba(X_test)
    xgb_preds += current_test_pred / max_iter


---- Fold 0 -----

[0]	validation_0-mlogloss:2.75342
Will train until validation_0-mlogloss hasn't improved in 100 rounds.
[100]	validation_0-mlogloss:2.02238
[200]	validation_0-mlogloss:1.76399
[300]	validation_0-mlogloss:1.62014
[400]	validation_0-mlogloss:1.51698
[500]	validation_0-mlogloss:1.4347
[600]	validation_0-mlogloss:1.36448
[700]	validation_0-mlogloss:1.30714
[800]	validation_0-mlogloss:1.25438
[900]	validation_0-mlogloss:1.20574
[999]	validation_0-mlogloss:1.16291

---- Fold 1 -----

[0]	validation_0-mlogloss:2.75435
Will train until validation_0-mlogloss hasn't improved in 100 rounds.
[100]	validation_0-mlogloss:2.04697
[200]	validation_0-mlogloss:1.80614
[300]	validation_0-mlogloss:1.66731
[400]	validation_0-mlogloss:1.56637
[500]	validation_0-mlogloss:1.48372
[600]	validation_0-mlogloss:1.41106
[700]	validation_0-mlogloss:1.35152
[800]	validation_0-mlogloss:1.29821
[900]	validation_0-mlogloss:1.24897
[999]	validation_0-mlogloss:1.20629

---- Fold 2 -----

[0]	validatio

In [22]:
from lightgbm import LGBMClassifier


training_start_time = time.time()

losses= list()
max_iter = 10
folds = StratifiedKFold(n_splits = max_iter)
oofs = np.zeros((len(X),16))
lgbm_preds = np.zeros((len(Xt),16))


for fold_, (train_idx, val_idx) in enumerate(folds.split(X, pd.qcut(y, 10, labels=False, duplicates='drop'))):
    
    print(f'\n---- Fold {fold_} -----\n')
    
    X_train, y_train = X.iloc[train_idx][features], y.iloc[train_idx]
    X_val, y_val = X.iloc[val_idx][features], y.iloc[val_idx]
    X_test = Xt[features]
    
    lgbm_clf = LGBMClassifier(n_estimators=1000,objective="multiclass", num_leaves=127, max_depth=1,min_child_samples=4, learning_rate=0.02, colsample_bytree=0.4, reg_alpha=0.5, reg_lambda=2)
    _ = lgbm_clf.fit(X_train, y_train,eval_set = [(X_val, y_val)], verbose=100, early_stopping_rounds=100, eval_metric='logloss')
    
    oofs[val_idx] = lgbm_clf.predict_proba(X_val)
    # Predict once on the test set and reuse the result for the fold average.
    current_test_pred = lgbm_clf.predict_proba(X_test)
    lgbm_preds += current_test_pred / max_iter


---- Fold 0 -----

Training until validation scores don't improve for 100 rounds
[100]	valid_0's multi_logloss: 1.9381
[200]	valid_0's multi_logloss: 1.70858
[300]	valid_0's multi_logloss: 1.55993
[400]	valid_0's multi_logloss: 1.45124
[500]	valid_0's multi_logloss: 1.36359
[600]	valid_0's multi_logloss: 1.29087
[700]	valid_0's multi_logloss: 1.22843
[800]	valid_0's multi_logloss: 1.17438
[900]	valid_0's multi_logloss: 1.12802
[1000]	valid_0's multi_logloss: 1.08843
Did not meet early stopping. Best iteration is:
[1000]	valid_0's multi_logloss: 1.08843

---- Fold 1 -----

Training until validation scores don't improve for 100 rounds
[100]	valid_0's multi_logloss: 1.97265
[200]	valid_0's multi_logloss: 1.75388
[300]	valid_0's multi_logloss: 1.61077
[400]	valid_0's multi_logloss: 1.50227
[500]	valid_0's multi_logloss: 1.41519
[600]	valid_0's multi_logloss: 1.34143
[700]	valid_0's multi_logloss: 1.27952
[800]	valid_0's multi_logloss: 1.22391
[900]	valid_0's multi_logloss: 1.17481
[1000]	

In [20]:
from catboost import CatBoostClassifier


training_start_time = time.time()

losses= list()
max_iter = 10
folds = StratifiedKFold(n_splits = max_iter)
oofs = np.zeros((len(X),16))
cat_preds = np.zeros((len(Xt),16))


for fold_, (train_idx, val_idx) in enumerate(folds.split(X, pd.qcut(y, 10, labels=False, duplicates='drop'))):
    
    print(f'\n---- Fold {fold_} -----\n')
    
    X_train, y_train = X.iloc[train_idx][features], y.iloc[train_idx]
    X_val, y_val = X.iloc[val_idx][features], y.iloc[val_idx]
    X_test = Xt[features]
    
    cat_clf = CatBoostClassifier(n_estimators=2000, learning_rate=0.05, max_depth=9,loss_function='MultiClass')
    _ = cat_clf.fit(X_train, y_train,eval_set = [(X_val, y_val)], verbose=100, early_stopping_rounds=100)
    
    oofs[val_idx] = cat_clf.predict_proba(X_val)
    # Predict once on the test set and reuse the result for the fold average.
    current_test_pred = cat_clf.predict_proba(X_test)
    cat_preds += current_test_pred / max_iter


---- Fold 0 -----

0:	learn: 2.6603544	test: 2.6687994	best: 2.6687994 (0)	total: 472ms	remaining: 15m 43s
100:	learn: 0.9007928	test: 1.0424689	best: 1.0424689 (100)	total: 3.26s	remaining: 1m 1s
200:	learn: 0.6996545	test: 0.8999787	best: 0.8999787 (200)	total: 6.05s	remaining: 54.2s
300:	learn: 0.6177316	test: 0.8475304	best: 0.8475304 (300)	total: 8.83s	remaining: 49.8s
400:	learn: 0.5740887	test: 0.8234416	best: 0.8234416 (400)	total: 11.5s	remaining: 46s
500:	learn: 0.5472158	test: 0.8135146	best: 0.8135146 (500)	total: 14.3s	remaining: 42.8s
600:	learn: 0.5291613	test: 0.8081130	best: 0.8081130 (600)	total: 17.1s	remaining: 39.8s
700:	learn: 0.5165014	test: 0.8058408	best: 0.8058216 (691)	total: 19.9s	remaining: 36.9s
800:	learn: 0.5067722	test: 0.8061321	best: 0.8051066 (748)	total: 22.6s	remaining: 33.9s
Stopped by overfitting detector  (100 iterations wait)

bestTest = 0.8051066342
bestIteration = 748

Shrink model to first 749 iterations.

---- Fold 1 -----

0:	learn: 2.679

Stopped by overfitting detector  (100 iterations wait)

bestTest = 0.9166034363
bestIteration = 453

Shrink model to first 454 iterations.

---- Fold 9 -----

0:	learn: 2.6815041	test: 2.6890848	best: 2.6890848 (0)	total: 87.6ms	remaining: 2m 55s
100:	learn: 0.8940344	test: 1.0608327	best: 1.0608327 (100)	total: 3.18s	remaining: 59.9s
200:	learn: 0.6946512	test: 0.9271851	best: 0.9271851 (200)	total: 6.94s	remaining: 1m 2s
300:	learn: 0.6137450	test: 0.8876944	best: 0.8876944 (300)	total: 10.3s	remaining: 58s
400:	learn: 0.5691447	test: 0.8703564	best: 0.8703564 (400)	total: 13.8s	remaining: 54.8s
500:	learn: 0.5422303	test: 0.8646476	best: 0.8643982 (498)	total: 17.2s	remaining: 51.5s
600:	learn: 0.5245790	test: 0.8651701	best: 0.8642311 (523)	total: 20.9s	remaining: 48.6s
Stopped by overfitting detector  (100 iterations wait)

bestTest = 0.8642311179
bestIteration = 523

Shrink model to first 524 iterations.


In [23]:
from sklearn.ensemble import RandomForestClassifier

training_start_time = time.time()

losses= list()
max_iter = 10
folds = StratifiedKFold(n_splits = max_iter)
oofs = np.zeros((len(X),16))
rfg_preds = np.zeros((len(Xt),16))


for fold_, (train_idx, val_idx) in enumerate(folds.split(X, pd.qcut(y, 10, labels=False, duplicates='drop'))):
    
    print(f'\n---- Fold {fold_} -----\n')
    
    X_train, y_train = X.iloc[train_idx][features], y.iloc[train_idx]
    X_val, y_val = X.iloc[val_idx][features], y.iloc[val_idx]
    X_test = Xt[features]
    
    rfg_clf= RandomForestClassifier(n_estimators=1000,random_state=1234,max_depth=8)
    _ = rfg_clf.fit(X_train, y_train)
    
    oofs[val_idx] = rfg_clf.predict_proba(X_val)
    # Predict once on the test set and reuse the result for the fold average.
    current_test_pred = rfg_clf.predict_proba(X_test)
    rfg_preds += current_test_pred / max_iter


---- Fold 0 -----


---- Fold 1 -----


---- Fold 2 -----


---- Fold 3 -----


---- Fold 4 -----


---- Fold 5 -----


---- Fold 6 -----


---- Fold 7 -----


---- Fold 8 -----


---- Fold 9 -----



In [221]:
from sklearn.ensemble import StackingClassifier


training_start_time = time.time()

losses= list()
max_iter = 2
folds = StratifiedKFold(n_splits = max_iter)
oofs = np.zeros((len(X),16))
stack_preds = np.zeros((len(Xt),16))


for fold_, (train_idx, val_idx) in enumerate(folds.split(X, pd.qcut(y, 10, labels=False, duplicates='drop'))):
    
    print(f'\n---- Fold {fold_} -----\n')
    
    X_train, y_train = X.iloc[train_idx][features], y.iloc[train_idx]
    X_val, y_val = X.iloc[val_idx][features], y.iloc[val_idx]
    X_test = Xt[features]
    
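    # StackingClassifier expects a list of (name, estimator) pairs, one per base model.
    estimators = [('rfg', rfg_clf), ('lgbm', lgbm_clf)]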
    
    stack_clf= StackingClassifier(estimators,final_estimator= cat_clf)
    _ = stack_clf.fit(X_train, y_train)
    
    oofs[val_idx] = stack_clf.predict_proba(X_val)
    # Predict once on the test set and reuse the result for the fold average.
    current_test_pred = stack_clf.predict_proba(X_test)
    stack_preds += current_test_pred / max_iter


---- Fold 0 -----

0:	learn: 2.6842171	total: 2.06s	remaining: 1h 8m 45s
1:	learn: 2.6105911	total: 3.08s	remaining: 51m 20s
2:	learn: 2.5494770	total: 4.01s	remaining: 44m 33s
3:	learn: 2.4504372	total: 5.39s	remaining: 44m 48s
4:	learn: 2.3973939	total: 6.44s	remaining: 42m 50s
5:	learn: 2.3355398	total: 7.43s	remaining: 41m 7s
6:	learn: 2.2835807	total: 8.35s	remaining: 39m 37s
7:	learn: 2.2357648	total: 9.14s	remaining: 37m 56s
8:	learn: 2.1870443	total: 9.95s	remaining: 36m 42s
9:	learn: 2.1253055	total: 10.8s	remaining: 35m 40s
10:	learn: 2.0875649	total: 11.6s	remaining: 34m 57s
11:	learn: 2.0474058	total: 12.6s	remaining: 34m 41s
12:	learn: 2.0053592	total: 13.5s	remaining: 34m 24s
13:	learn: 1.9694641	total: 14.5s	remaining: 34m 14s
14:	learn: 1.9288722	total: 15.4s	remaining: 33m 55s
15:	learn: 1.8838267	total: 16.2s	remaining: 33m 30s
16:	learn: 1.8444289	total: 17s	remaining: 33m
17:	learn: 1.8106803	total: 18s	remaining: 33m 3s
18:	learn: 1.7761422	total: 19s	remaining: 3

KeyboardInterrupt: 
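
For reference, `StackingClassifier` takes its base models as a list of `(name, estimator)` pairs, clones them, and refits them with internal cross-validation before training the final estimator on their out-of-fold predictions. This is part of why the run above is slow with a 2000-iteration CatBoost model as the final estimator. The sketch below shows the expected structure on synthetic data. Small scikit-learn models stand in for the LightGBM and CatBoost models so that it runs quickly, and all names (`base_estimators`, `stack`, `stack_loss`) are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic 4-class problem standing in for the encoded match data.
X_demo, y_demo = make_classification(
    n_samples=600, n_features=8, n_informative=5, n_classes=4, random_state=0
)
X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0, stratify=y_demo
)

# Each base model is a (name, estimator) pair; the names must be unique strings.
base_estimators = [
    ("rf", RandomForestClassifier(n_estimators=50, max_depth=8, random_state=0)),
    ("tree", DecisionTreeClassifier(max_depth=4, random_state=0)),
]

stack = StackingClassifier(
    estimators=base_estimators,
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",  # feed class probabilities to the final model
    cv=3,                          # internal folds used to build the meta-features
)
stack.fit(X_tr, y_tr)

stack_loss = log_loss(y_va, stack.predict_proba(X_va), labels=stack.classes_)
print(stack_loss)
```

A light final estimator such as logistic regression is a common choice, since the meta-features are already class probabilities and the final model only has to weight them.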

In [25]:
from sklearn.ensemble import StackingClassifier
oofs = np.zeros((len(X),16))
stack_preds = np.zeros((len(Xt),16))

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size = 0.25, random_state = 0)
X_test = Xt[features]
    
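# StackingClassifier expects a list of (name, estimator) pairs, one per base model.
estimators = [('rfg', rfg_clf), ('lgbm', lgbm_clf)]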
    
stack_clf= StackingClassifier(estimators,final_estimator= cat_clf)
_ = stack_clf.fit(X_train, y_train)
    


0:	learn: 2.6611649	total: 929ms	remaining: 30m 56s
1:	learn: 2.5438151	total: 1.84s	remaining: 30m 38s
2:	learn: 2.4592876	total: 2.77s	remaining: 30m 40s
3:	learn: 2.3432011	total: 3.67s	remaining: 30m 34s
4:	learn: 2.2525520	total: 4.49s	remaining: 29m 50s
5:	learn: 2.1779341	total: 5.3s	remaining: 29m 22s
6:	learn: 2.1163185	total: 6.07s	remaining: 28m 49s
7:	learn: 2.0506220	total: 7.04s	remaining: 29m 11s
8:	learn: 1.9927623	total: 7.93s	remaining: 29m 15s
9:	learn: 1.9357151	total: 8.78s	remaining: 29m 6s
10:	learn: 1.8948612	total: 9.67s	remaining: 29m 9s
11:	learn: 1.8530157	total: 10.5s	remaining: 28m 58s
12:	learn: 1.8070664	total: 11.4s	remaining: 28m 55s
13:	learn: 1.7619354	total: 12.2s	remaining: 28m 53s
14:	learn: 1.7264599	total: 13s	remaining: 28m 45s
15:	learn: 1.6834963	total: 13.9s	remaining: 28m 49s
16:	learn: 1.6491630	total: 14.7s	remaining: 28m 39s
17:	learn: 1.6173117	total: 15.6s	remaining: 28m 35s
18:	learn: 1.5839971	total: 16.4s	remaining: 28m 32s
19:	lear

154:	learn: 0.5828865	total: 2m 27s	remaining: 29m 16s
155:	learn: 0.5811893	total: 2m 28s	remaining: 29m 15s
156:	learn: 0.5795375	total: 2m 29s	remaining: 29m 15s
157:	learn: 0.5778344	total: 2m 30s	remaining: 29m 16s
158:	learn: 0.5760798	total: 2m 31s	remaining: 29m 15s
159:	learn: 0.5744953	total: 2m 32s	remaining: 29m 14s
160:	learn: 0.5728524	total: 2m 33s	remaining: 29m 12s
161:	learn: 0.5711813	total: 2m 34s	remaining: 29m 12s
162:	learn: 0.5695641	total: 2m 35s	remaining: 29m 13s
163:	learn: 0.5679238	total: 2m 36s	remaining: 29m 13s
164:	learn: 0.5664651	total: 2m 37s	remaining: 29m 14s
165:	learn: 0.5649491	total: 2m 38s	remaining: 29m 14s
166:	learn: 0.5631894	total: 2m 39s	remaining: 29m 12s
167:	learn: 0.5616955	total: 2m 40s	remaining: 29m 11s
168:	learn: 0.5601811	total: 2m 41s	remaining: 29m 14s
169:	learn: 0.5585314	total: 2m 43s	remaining: 29m 14s
170:	learn: 0.5569600	total: 2m 43s	remaining: 29m 14s
171:	learn: 0.5555191	total: 2m 44s	remaining: 29m 12s
172:	learn

305:	learn: 0.4356418	total: 4m 55s	remaining: 27m 17s
306:	learn: 0.4350474	total: 4m 56s	remaining: 27m 15s
307:	learn: 0.4344640	total: 4m 57s	remaining: 27m 14s
308:	learn: 0.4338515	total: 4m 58s	remaining: 27m 13s
309:	learn: 0.4333961	total: 4m 59s	remaining: 27m 12s
310:	learn: 0.4327417	total: 5m	remaining: 27m 10s
311:	learn: 0.4321421	total: 5m 1s	remaining: 27m 9s
312:	learn: 0.4313957	total: 5m 2s	remaining: 27m 8s
313:	learn: 0.4309420	total: 5m 3s	remaining: 27m 7s
314:	learn: 0.4302881	total: 5m 3s	remaining: 27m 5s
315:	learn: 0.4298703	total: 5m 4s	remaining: 27m 4s
316:	learn: 0.4292210	total: 5m 5s	remaining: 27m 3s
317:	learn: 0.4286749	total: 5m 6s	remaining: 27m 3s
318:	learn: 0.4282546	total: 5m 7s	remaining: 27m 2s
319:	learn: 0.4275703	total: 5m 8s	remaining: 27m 1s
320:	learn: 0.4270867	total: 5m 9s	remaining: 27m
321:	learn: 0.4265411	total: 5m 10s	remaining: 26m 59s
322:	learn: 0.4260641	total: 5m 11s	remaining: 26m 57s
323:	learn: 0.4254615	total: 5m 12s	r

456:	learn: 0.3771631	total: 7m 20s	remaining: 24m 47s
457:	learn: 0.3769452	total: 7m 21s	remaining: 24m 46s
458:	learn: 0.3766699	total: 7m 22s	remaining: 24m 45s
459:	learn: 0.3764250	total: 7m 23s	remaining: 24m 44s
460:	learn: 0.3762297	total: 7m 24s	remaining: 24m 43s
461:	learn: 0.3758746	total: 7m 25s	remaining: 24m 42s
462:	learn: 0.3755531	total: 7m 26s	remaining: 24m 41s
463:	learn: 0.3753424	total: 7m 27s	remaining: 24m 40s
464:	learn: 0.3751157	total: 7m 28s	remaining: 24m 39s
465:	learn: 0.3748068	total: 7m 29s	remaining: 24m 38s
466:	learn: 0.3746487	total: 7m 29s	remaining: 24m 37s
467:	learn: 0.3743495	total: 7m 30s	remaining: 24m 35s
468:	learn: 0.3740872	total: 7m 31s	remaining: 24m 34s
469:	learn: 0.3739230	total: 7m 32s	remaining: 24m 33s
470:	learn: 0.3736808	total: 7m 33s	remaining: 24m 32s
471:	learn: 0.3734178	total: 7m 34s	remaining: 24m 31s
472:	learn: 0.3731133	total: 7m 35s	remaining: 24m 30s
473:	learn: 0.3728329	total: 7m 36s	remaining: 24m 29s
474:	learn

606:	learn: 0.3469040	total: 9m 49s	remaining: 22m 31s
607:	learn: 0.3467369	total: 9m 50s	remaining: 22m 31s
608:	learn: 0.3465849	total: 9m 51s	remaining: 22m 30s
609:	learn: 0.3463769	total: 9m 52s	remaining: 22m 29s
610:	learn: 0.3462121	total: 9m 53s	remaining: 22m 28s
611:	learn: 0.3459646	total: 9m 53s	remaining: 22m 27s
612:	learn: 0.3458822	total: 9m 54s	remaining: 22m 26s
613:	learn: 0.3457199	total: 9m 55s	remaining: 22m 25s
614:	learn: 0.3454932	total: 9m 56s	remaining: 22m 24s
615:	learn: 0.3453982	total: 9m 57s	remaining: 22m 23s
616:	learn: 0.3451805	total: 9m 58s	remaining: 22m 22s
617:	learn: 0.3449707	total: 9m 59s	remaining: 22m 21s
618:	learn: 0.3448162	total: 10m	remaining: 22m 20s
619:	learn: 0.3446148	total: 10m 1s	remaining: 22m 19s
620:	learn: 0.3444890	total: 10m 2s	remaining: 22m 18s
621:	learn: 0.3443566	total: 10m 3s	remaining: 22m 17s
622:	learn: 0.3441541	total: 10m 4s	remaining: 22m 16s
623:	learn: 0.3440073	total: 10m 5s	remaining: 22m 15s
624:	learn: 0

754:	learn: 0.3276960	total: 12m 31s	remaining: 20m 38s
755:	learn: 0.3275462	total: 12m 32s	remaining: 20m 38s
756:	learn: 0.3274294	total: 12m 33s	remaining: 20m 37s
757:	learn: 0.3273239	total: 12m 34s	remaining: 20m 36s
758:	learn: 0.3272775	total: 12m 35s	remaining: 20m 35s
759:	learn: 0.3271780	total: 12m 36s	remaining: 20m 34s
760:	learn: 0.3270874	total: 12m 37s	remaining: 20m 33s
761:	learn: 0.3270532	total: 12m 38s	remaining: 20m 33s
762:	learn: 0.3269172	total: 12m 40s	remaining: 20m 32s
763:	learn: 0.3267343	total: 12m 41s	remaining: 20m 31s
764:	learn: 0.3266055	total: 12m 42s	remaining: 20m 30s
765:	learn: 0.3264449	total: 12m 43s	remaining: 20m 29s
766:	learn: 0.3263439	total: 12m 44s	remaining: 20m 28s
767:	learn: 0.3262635	total: 12m 45s	remaining: 20m 27s
768:	learn: 0.3261361	total: 12m 46s	remaining: 20m 26s
769:	learn: 0.3260086	total: 12m 47s	remaining: 20m 25s
770:	learn: 0.3259119	total: 12m 48s	remaining: 20m 25s
771:	learn: 0.3258270	total: 12m 50s	remaining: 

902:	learn: 0.3143374	total: 15m 3s	remaining: 18m 18s
903:	learn: 0.3142548	total: 15m 4s	remaining: 18m 17s
904:	learn: 0.3142250	total: 15m 5s	remaining: 18m 16s
905:	learn: 0.3141190	total: 15m 6s	remaining: 18m 15s
906:	learn: 0.3139955	total: 15m 7s	remaining: 18m 14s
907:	learn: 0.3139421	total: 15m 8s	remaining: 18m 13s
908:	learn: 0.3138765	total: 15m 9s	remaining: 18m 12s
909:	learn: 0.3138331	total: 15m 11s	remaining: 18m 11s
910:	learn: 0.3137643	total: 15m 12s	remaining: 18m 10s
911:	learn: 0.3136997	total: 15m 13s	remaining: 18m 9s
912:	learn: 0.3136237	total: 15m 14s	remaining: 18m 8s
913:	learn: 0.3135729	total: 15m 15s	remaining: 18m 7s
914:	learn: 0.3135059	total: 15m 16s	remaining: 18m 6s
915:	learn: 0.3134495	total: 15m 17s	remaining: 18m 5s
916:	learn: 0.3133486	total: 15m 18s	remaining: 18m 4s
917:	learn: 0.3132558	total: 15m 19s	remaining: 18m 3s
918:	learn: 0.3132073	total: 15m 20s	remaining: 18m 2s
919:	learn: 0.3131418	total: 15m 21s	remaining: 18m 1s
920:	lea

1049:	learn: 0.3047289	total: 17m 52s	remaining: 16m 10s
1050:	learn: 0.3046546	total: 17m 53s	remaining: 16m 9s
1051:	learn: 0.3045949	total: 17m 55s	remaining: 16m 9s
1052:	learn: 0.3045714	total: 17m 56s	remaining: 16m 8s
1053:	learn: 0.3044804	total: 17m 58s	remaining: 16m 7s
1054:	learn: 0.3044231	total: 17m 59s	remaining: 16m 7s
1055:	learn: 0.3043853	total: 18m 1s	remaining: 16m 6s
1056:	learn: 0.3043542	total: 18m 3s	remaining: 16m 6s
1057:	learn: 0.3042892	total: 18m 4s	remaining: 16m 5s
1058:	learn: 0.3042297	total: 18m 6s	remaining: 16m 5s
1059:	learn: 0.3041358	total: 18m 7s	remaining: 16m 4s
1060:	learn: 0.3041006	total: 18m 9s	remaining: 16m 3s
1061:	learn: 0.3040312	total: 18m 10s	remaining: 16m 3s
1062:	learn: 0.3039644	total: 18m 12s	remaining: 16m 2s
1063:	learn: 0.3039257	total: 18m 14s	remaining: 16m 2s
1064:	learn: 0.3038165	total: 18m 15s	remaining: 16m 1s
1065:	learn: 0.3037640	total: 18m 17s	remaining: 16m 1s
1066:	learn: 0.3037206	total: 18m 19s	remaining: 16m 

1195:	learn: 0.2970850	total: 21m 42s	remaining: 14m 35s
1196:	learn: 0.2970586	total: 21m 44s	remaining: 14m 34s
1197:	learn: 0.2970269	total: 21m 45s	remaining: 14m 34s
1198:	learn: 0.2969831	total: 21m 47s	remaining: 14m 33s
1199:	learn: 0.2969412	total: 21m 49s	remaining: 14m 32s
1200:	learn: 0.2969114	total: 21m 51s	remaining: 14m 32s
1201:	learn: 0.2968727	total: 21m 53s	remaining: 14m 32s
1202:	learn: 0.2968282	total: 21m 55s	remaining: 14m 31s
1203:	learn: 0.2967951	total: 21m 56s	remaining: 14m 30s
1204:	learn: 0.2967472	total: 21m 58s	remaining: 14m 29s
1205:	learn: 0.2967024	total: 22m	remaining: 14m 29s
1206:	learn: 0.2966097	total: 22m 1s	remaining: 14m 28s
1207:	learn: 0.2965524	total: 22m 3s	remaining: 14m 27s
1208:	learn: 0.2965256	total: 22m 4s	remaining: 14m 26s
1209:	learn: 0.2964842	total: 22m 6s	remaining: 14m 25s
1210:	learn: 0.2964407	total: 22m 7s	remaining: 14m 25s
1211:	learn: 0.2963915	total: 22m 9s	remaining: 14m 24s
1212:	learn: 0.2963284	total: 22m 10s	rem

1340:	learn: 0.2915698	total: 25m 37s	remaining: 12m 35s
1341:	learn: 0.2915122	total: 25m 38s	remaining: 12m 34s
1342:	learn: 0.2914891	total: 25m 39s	remaining: 12m 33s
1343:	learn: 0.2914573	total: 25m 41s	remaining: 12m 32s
1344:	learn: 0.2914465	total: 25m 43s	remaining: 12m 31s
1345:	learn: 0.2914117	total: 25m 44s	remaining: 12m 30s
1346:	learn: 0.2913863	total: 25m 46s	remaining: 12m 29s
1347:	learn: 0.2913292	total: 25m 47s	remaining: 12m 28s
1348:	learn: 0.2913143	total: 25m 49s	remaining: 12m 27s
1349:	learn: 0.2912989	total: 25m 51s	remaining: 12m 26s
1350:	learn: 0.2912784	total: 25m 52s	remaining: 12m 25s
1351:	learn: 0.2912415	total: 25m 54s	remaining: 12m 24s
1352:	learn: 0.2912020	total: 25m 55s	remaining: 12m 23s
1353:	learn: 0.2911637	total: 25m 57s	remaining: 12m 22s
1354:	learn: 0.2911119	total: 25m 58s	remaining: 12m 21s
1355:	learn: 0.2910911	total: 25m 59s	remaining: 12m 20s
1356:	learn: 0.2910677	total: 26m 1s	remaining: 12m 19s
1357:	learn: 0.2910300	total: 26

1485:	learn: 0.2868612	total: 29m 31s	remaining: 10m 12s
1486:	learn: 0.2868403	total: 29m 32s	remaining: 10m 11s
1487:	learn: 0.2868179	total: 29m 34s	remaining: 10m 10s
1488:	learn: 0.2867876	total: 29m 35s	remaining: 10m 9s
1489:	learn: 0.2867567	total: 29m 37s	remaining: 10m 8s
1490:	learn: 0.2867423	total: 29m 39s	remaining: 10m 7s
1491:	learn: 0.2867117	total: 29m 40s	remaining: 10m 6s
1492:	learn: 0.2866567	total: 29m 42s	remaining: 10m 5s
1493:	learn: 0.2866339	total: 29m 43s	remaining: 10m 4s
1494:	learn: 0.2866192	total: 29m 45s	remaining: 10m 3s
1495:	learn: 0.2865816	total: 29m 47s	remaining: 10m 2s
1496:	learn: 0.2865658	total: 29m 48s	remaining: 10m 1s
1497:	learn: 0.2865376	total: 29m 50s	remaining: 9m 59s
1498:	learn: 0.2864939	total: 29m 51s	remaining: 9m 58s
1499:	learn: 0.2864555	total: 29m 53s	remaining: 9m 57s
1500:	learn: 0.2864143	total: 29m 55s	remaining: 9m 56s
1501:	learn: 0.2863682	total: 29m 56s	remaining: 9m 55s
1502:	learn: 0.2863571	total: 29m 58s
...

1633:	learn: 0.2825501	total: 32m 27s	remaining: 7m 16s
1634:	learn: 0.2825219	total: 32m 28s	remaining: 7m 15s
1635:	learn: 0.2825023	total: 32m 29s	remaining: 7m 13s
1636:	learn: 0.2824773	total: 32m 31s	remaining: 7m 12s
1637:	learn: 0.2824527	total: 32m 32s	remaining: 7m 11s
1638:	learn: 0.2824303	total: 32m 32s	remaining: 7m 10s
1639:	learn: 0.2824106	total: 32m 33s	remaining: 7m 8s
1640:	learn: 0.2823880	total: 32m 35s	remaining: 7m 7s
1641:	learn: 0.2823517	total: 32m 36s	remaining: 7m 6s
1642:	learn: 0.2823380	total: 32m 37s	remaining: 7m 5s
1643:	learn: 0.2822951	total: 32m 38s	remaining: 7m 4s
1644:	learn: 0.2822574	total: 32m 39s	remaining: 7m 2s
1645:	learn: 0.2822367	total: 32m 40s	remaining: 7m 1s
1646:	learn: 0.2822188	total: 32m 41s	remaining: 7m
1647:	learn: 0.2821826	total: 32m 42s	remaining: 6m 59s
1648:	learn: 0.2821532	total: 32m 43s	remaining: 6m 57s
1649:	learn: 0.2821259	total: 32m 44s	remaining: 6m 56s
1650:	learn: 0.2821033	total: 32m 45s	remaining: 6m 55s
...

1781:	learn: 0.2790055	total: 35m 6s	remaining: 4m 17s
1782:	learn: 0.2789939	total: 35m 7s	remaining: 4m 16s
1783:	learn: 0.2789762	total: 35m 8s	remaining: 4m 15s
1784:	learn: 0.2789620	total: 35m 10s	remaining: 4m 14s
1785:	learn: 0.2789468	total: 35m 11s	remaining: 4m 12s
1786:	learn: 0.2789283	total: 35m 12s	remaining: 4m 11s
1787:	learn: 0.2789079	total: 35m 13s	remaining: 4m 10s
1788:	learn: 0.2788927	total: 35m 14s	remaining: 4m 9s
1789:	learn: 0.2788750	total: 35m 15s	remaining: 4m 8s
1790:	learn: 0.2788492	total: 35m 16s	remaining: 4m 7s
1791:	learn: 0.2788157	total: 35m 18s	remaining: 4m 5s
1792:	learn: 0.2787932	total: 35m 19s	remaining: 4m 4s
1793:	learn: 0.2787793	total: 35m 20s	remaining: 4m 3s
1794:	learn: 0.2787590	total: 35m 21s	remaining: 4m 2s
1795:	learn: 0.2787404	total: 35m 22s	remaining: 4m 1s
1796:	learn: 0.2787149	total: 35m 23s	remaining: 3m 59s
1797:	learn: 0.2786923	total: 35m 25s	remaining: 3m 58s
1798:	learn: 0.2786749	total: 35m 26s	remaining: 3m 57s
...

1929:	learn: 0.2760088	total: 38m 10s	remaining: 1m 23s
1930:	learn: 0.2759859	total: 38m 12s	remaining: 1m 21s
1931:	learn: 0.2759644	total: 38m 13s	remaining: 1m 20s
1932:	learn: 0.2759496	total: 38m 15s	remaining: 1m 19s
1933:	learn: 0.2759317	total: 38m 16s	remaining: 1m 18s
1934:	learn: 0.2759052	total: 38m 18s	remaining: 1m 17s
1935:	learn: 0.2758727	total: 38m 20s	remaining: 1m 16s
1936:	learn: 0.2758490	total: 38m 21s	remaining: 1m 14s
1937:	learn: 0.2758161	total: 38m 23s	remaining: 1m 13s
1938:	learn: 0.2758022	total: 38m 25s	remaining: 1m 12s
1939:	learn: 0.2757835	total: 38m 26s	remaining: 1m 11s
1940:	learn: 0.2757678	total: 38m 27s	remaining: 1m 10s
1941:	learn: 0.2757427	total: 38m 29s	remaining: 1m 8s
1942:	learn: 0.2757268	total: 38m 31s	remaining: 1m 7s
1943:	learn: 0.2757111	total: 38m 33s	remaining: 1m 6s
1944:	learn: 0.2756957	total: 38m 34s	remaining: 1m 5s
1945:	learn: 0.2756846	total: 38m 36s	remaining: 1m 4s
1946:	learn: 0.2756635	total: 38m 37s
...
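
The training log above is easier to read as a curve than as raw lines. The following is a minimal sketch, assuming the log has been captured as text and that every complete line follows the `iteration: learn: value ...` shape seen above. The helper name `parse_learn_curve` and the `sample` string are illustrative and not part of the original notebook. Note that `learn` is the loss on the training data, so a falling value does not by itself show better performance on held-out data.

```python
import re

# Matches lines shaped like "1198:<tab>learn: 0.2969831<tab>total: ...".
LOG_LINE = re.compile(r"^(\d+):\s+learn:\s+([0-9.]+)")

def parse_learn_curve(log_text):
    """Return (iteration, learn_loss) pairs; lines that do not match are skipped."""
    curve = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:
            curve.append((int(match.group(1)), float(match.group(2))))
    return curve

# A few lines copied from the log above, including one truncated fragment ("165").
sample = (
    "1198:\tlearn: 0.2969831\ttotal: 21m 47s\tremaining: 14m 33s\n"
    "1199:\tlearn: 0.2969412\ttotal: 21m 49s\tremaining: 14m 32s\n"
    "165\n"
    "1946:\tlearn: 0.2756635\ttotal: 38m 37s\tremaining: 1m 3\n"
)
curve = parse_learn_curve(sample)
first_iter, first_loss = curve[0]
last_iter, last_loss = curve[-1]
print(f"learn loss: {first_loss:.4f} at iteration {first_iter}, "
      f"{last_loss:.4f} at iteration {last_iter}")
```

A line cut off in the middle of the loss value would still match and yield a shortened number, so it is worth spot-checking the parsed values against the log.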

In [26]:
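# Out-of-fold predictions are disabled in this run:
# oofs[val_idx] = stack_clf.predict_proba(X_val)

# Predict class probabilities for the test set with the fitted stacking model.
current_test_pred = stack_clf.predict_proba(X_test)

# Write one row per test record and one column per class.
filename = 'stacknew_13.xlsx'
pd.DataFrame(current_test_pred).to_excel(filename, index=False)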
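
Before writing a probability matrix to disk, it can help to confirm that it is well formed, because log-loss is only meaningful for valid probabilities. The sketch below is one possible check, not part of the original notebook: the helper `check_proba` and the `demo` array are hypothetical, and in the notebook the call would presumably be `check_proba(current_test_pred, n_rows=len(X_test))`.

```python
import numpy as np

def check_proba(proba, n_rows, atol=1e-6):
    """Validate a (records x classes) probability matrix and return it as an array."""
    proba = np.asarray(proba, dtype=float)
    if proba.ndim != 2 or proba.shape[0] != n_rows:
        raise ValueError(f"expected shape ({n_rows}, n_classes), got {proba.shape}")
    if not np.isfinite(proba).all():
        raise ValueError("probabilities contain NaN or infinite values")
    if (proba < 0).any() or (proba > 1).any():
        raise ValueError("probabilities must lie in [0, 1]")
    if not np.allclose(proba.sum(axis=1), 1.0, atol=atol):
        raise ValueError("each row of probabilities must sum to 1")
    return proba

# Toy stand-in for a model's predict_proba output (2 records, 3 classes).
demo = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.1, 0.8]])
checked = check_proba(demo, n_rows=2)
print(checked.shape)
```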

In [27]:
# Save the test-set predictions held in cat_preds (computed in an earlier cell).
filename = 'catnew_12.xlsx'
pd.DataFrame(cat_preds).to_excel(filename, index=False)
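
Since the saved files are scored with log-loss, a small reference implementation of the formula from the start of this notebook can be used to score any probability matrix against known labels, for example on a validation split. This is a sketch under two assumptions: labels are encoded as column indices `0..M-1`, and `y_ij` is 1 when record `i` truly belongs to class `j`. The function name `multiclass_log_loss` and the toy data are illustrative only.

```python
import numpy as np

def multiclass_log_loss(y_true, proba, eps=1e-15):
    """Mean negative log of the probability assigned to each record's true class."""
    # Clip so that a predicted probability of exactly 0 does not produce log(0).
    proba = np.clip(np.asarray(proba, dtype=float), eps, 1 - eps)
    y_true = np.asarray(y_true, dtype=int)
    # Pick p_ij for the true class j of every record i.
    picked = proba[np.arange(len(y_true)), y_true]
    return float(-np.mean(np.log(picked)))

# Toy example: 2 records, 3 classes, true classes 0 and 2.
y_demo = [0, 2]
p_demo = [[0.7, 0.2, 0.1],
          [0.1, 0.1, 0.8]]
loss = multiclass_log_loss(y_demo, p_demo)
print(f"log-loss: {loss:.4f}")
```

Only the probability given to the true class enters the sum, so a confident wrong prediction (true-class probability near 0) is penalised far more heavily than a hesitant one.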