# Problem

The goal is to identify products at risk of backorder before the event occurs, so that the business has time to react.

## What is a Backorder?
Backorders are products that are temporarily out of stock but that a customer is still permitted to order against future inventory.
A backorder generally indicates that customer demand for a product or service exceeds a company’s capacity to supply it. Backorders are both good and bad: strong demand can drive them, but so can suboptimal planning.

## Data description

The data file contains historical data for the 8 weeks prior to the week we are trying to predict. The data were taken as weekly snapshots at the start of each week. Columns are defined as follows:

    sku - Random ID for the product

    national_inv - Current inventory level for the part

    lead_time - Transit time for product (if available)

    in_transit_qty - Amount of product in transit from source

    forecast_3_month - Forecast sales for the next 3 months

    forecast_6_month - Forecast sales for the next 6 months

    forecast_9_month - Forecast sales for the next 9 months

    sales_1_month - Sales quantity for the prior 1 month time period

    sales_3_month - Sales quantity for the prior 3 month time period

    sales_6_month - Sales quantity for the prior 6 month time period

    sales_9_month - Sales quantity for the prior 9 month time period

    min_bank - Minimum recommended amount to stock

    potential_issue - Source issue for part identified

    pieces_past_due - Parts overdue from source

    perf_6_month_avg - Source performance for prior 6 month period

    perf_12_month_avg - Source performance for prior 12 month period

    local_bo_qty - Amount of stock orders overdue

    deck_risk - Part risk flag

    oe_constraint - Part risk flag

    ppap_risk - Part risk flag

    stop_auto_buy - Part risk flag

    rev_stop - Part risk flag

    went_on_backorder - Product actually went on backorder. This is the target value.
    
         Yes or 1 : Product backordered

         No or 0  : Product not backordered
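The risk flags and the target are stored as `Yes`/`No` strings, while the description above also refers to them as 1/0. A minimal sketch of that conversion, using a small hand-made frame (the rows and the names `toy` and `yes_no` are illustrative, not part of the notebook):

```python
import pandas as pd

# Illustrative rows only; the real file has many more columns.
toy = pd.DataFrame({
    "deck_risk": ["No", "Yes", "No"],
    "went_on_backorder": ["No", "No", "Yes"],
})

# Map the Yes/No strings to 1/0, column by column.
yes_no = {"Yes": 1, "No": 0}
encoded = toy.apply(lambda col: col.map(yes_no))

print(encoded)
```

The notebook itself keeps the strings and relies on `LabelEncoder` and `OneHotEncoder` later on, which gives the same 0/1 coding for two-level columns.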

# Loading the required libraries

In [24]:
import os
import numpy as np
import pandas as pd


from sklearn.preprocessing import StandardScaler, OneHotEncoder

from sklearn.impute import SimpleImputer

from sklearn.svm import SVC

from sklearn.metrics import confusion_matrix, accuracy_score, recall_score, precision_score, f1_score

from sklearn.model_selection import GridSearchCV

In [25]:
import warnings
warnings.filterwarnings("ignore")

In [26]:
df=pd.read_csv("BackOrders.csv")

In [27]:
df.head()

Unnamed: 0,sku,national_inv,lead_time,in_transit_qty,forecast_3_month,forecast_6_month,forecast_9_month,sales_1_month,sales_3_month,sales_6_month,...,pieces_past_due,perf_6_month_avg,perf_12_month_avg,local_bo_qty,deck_risk,oe_constraint,ppap_risk,stop_auto_buy,rev_stop,went_on_backorder
0,1888279,117,,0,0,0,0,0,0,15,...,0,-99.0,-99.0,0,No,No,Yes,Yes,No,No
1,1870557,7,2.0,0,0,0,0,0,0,0,...,0,0.5,0.28,0,Yes,No,No,Yes,No,No
2,1475481,258,15.0,10,10,77,184,46,132,256,...,0,0.54,0.7,0,No,No,No,Yes,No,No
3,1758220,46,2.0,0,0,0,0,1,2,6,...,0,0.75,0.9,0,Yes,No,No,Yes,No,No
4,1360312,2,2.0,0,4,6,10,2,2,5,...,0,0.97,0.92,0,No,No,No,Yes,No,No


In [28]:
df.tail()

Unnamed: 0,sku,national_inv,lead_time,in_transit_qty,forecast_3_month,forecast_6_month,forecast_9_month,sales_1_month,sales_3_month,sales_6_month,...,pieces_past_due,perf_6_month_avg,perf_12_month_avg,local_bo_qty,deck_risk,oe_constraint,ppap_risk,stop_auto_buy,rev_stop,went_on_backorder
61584,1397275,6,8.0,0,24,24,24,0,7,9,...,0,0.98,0.98,0,No,No,No,Yes,No,No
61585,3072139,130,2.0,0,40,80,140,18,108,230,...,0,0.51,0.28,0,No,No,No,Yes,No,No
61586,1909363,135,9.0,0,0,0,0,10,40,65,...,0,1.0,0.99,0,No,No,Yes,Yes,No,No
61587,1845783,63,,0,0,0,0,452,1715,3425,...,0,-99.0,-99.0,1,No,No,No,No,No,Yes
61588,1200539,0,2.0,0,8,8,8,0,1,1,...,0,0.79,0.78,0,Yes,No,No,Yes,No,Yes


In [29]:
df.shape

(61589, 23)

In [30]:
df.dtypes

sku                    int64
national_inv           int64
lead_time            float64
in_transit_qty         int64
forecast_3_month       int64
forecast_6_month       int64
forecast_9_month       int64
sales_1_month          int64
sales_3_month          int64
sales_6_month          int64
sales_9_month          int64
min_bank               int64
potential_issue       object
pieces_past_due        int64
perf_6_month_avg     float64
perf_12_month_avg    float64
local_bo_qty           int64
deck_risk             object
oe_constraint         object
ppap_risk             object
stop_auto_buy         object
rev_stop              object
went_on_backorder     object
dtype: object

In [31]:
df.describe()

Unnamed: 0,sku,national_inv,lead_time,in_transit_qty,forecast_3_month,forecast_6_month,forecast_9_month,sales_1_month,sales_3_month,sales_6_month,sales_9_month,min_bank,pieces_past_due,perf_6_month_avg,perf_12_month_avg,local_bo_qty
count,61589.0,61589.0,58186.0,61589.0,61589.0,61589.0,61589.0,61589.0,61589.0,61589.0,61589.0,61589.0,61589.0,61589.0,61589.0,61589.0
mean,2037188.0,287.721882,7.559619,30.192843,169.2728,315.0413,453.576,44.742957,150.732631,283.5465,419.6427,43.087256,1.6054,-6.264182,-5.863664,1.205361
std,656417.8,4233.906931,6.498952,792.869253,5286.742,9774.362,14202.01,1373.805831,5224.959649,8872.27,12698.58,959.614135,42.309229,25.537906,24.844514,29.981155
min,1068628.0,-2999.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-99.0,-99.0,0.0
25%,1498574.0,3.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.62,0.64,0.0
50%,1898033.0,10.0,8.0,0.0,0.0,0.0,0.0,0.0,2.0,4.0,6.0,0.0,0.0,0.82,0.8,0.0
75%,2314826.0,57.0,8.0,0.0,12.0,25.0,36.0,6.0,17.0,34.0,51.0,3.0,0.0,0.96,0.95,0.0
max,3284895.0,673445.0,52.0,170976.0,1126656.0,2094336.0,3062016.0,295197.0,934593.0,1799099.0,2631590.0,192978.0,7392.0,1.0,1.0,2999.0
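The minimum of `perf_6_month_avg` and `perf_12_month_avg` is -99 while their quartiles lie between 0 and 1, which suggests that -99 is a placeholder rather than a real score. The column definitions do not say so, so treat this as an assumption. Under that assumption, the placeholder could be turned into a missing value before any scaling, as in this sketch built on the first four rows shown by `df.head()`:

```python
import numpy as np
import pandas as pd

# The perf_* values of the first four rows printed by df.head().
toy = pd.DataFrame({
    "perf_6_month_avg": [-99.0, 0.50, 0.54, 0.75],
    "perf_12_month_avg": [-99.0, 0.28, 0.70, 0.90],
})
perf_cols = ["perf_6_month_avg", "perf_12_month_avg"]

# Assumption: -99 marks "no score available", so replace it with NaN.
cleaned = toy.copy()
cleaned[perf_cols] = cleaned[perf_cols].replace(-99.0, np.nan)

n_missing = cleaned[perf_cols].isna().sum()
means = cleaned[perf_cols].mean()  # NaN is skipped, so the mean stays in [0, 1]
print(n_missing, means, sep="\n")
```

Left as -99, the placeholder pulls the column mean to about -6 (see the `mean` row above), which also distorts standardization.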


In [32]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 61589 entries, 0 to 61588
Data columns (total 23 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   sku                61589 non-null  int64  
 1   national_inv       61589 non-null  int64  
 2   lead_time          58186 non-null  float64
 3   in_transit_qty     61589 non-null  int64  
 4   forecast_3_month   61589 non-null  int64  
 5   forecast_6_month   61589 non-null  int64  
 6   forecast_9_month   61589 non-null  int64  
 7   sales_1_month      61589 non-null  int64  
 8   sales_3_month      61589 non-null  int64  
 9   sales_6_month      61589 non-null  int64  
 10  sales_9_month      61589 non-null  int64  
 11  min_bank           61589 non-null  int64  
 12  potential_issue    61589 non-null  object 
 13  pieces_past_due    61589 non-null  int64  
 14  perf_6_month_avg   61589 non-null  float64
 15  perf_12_month_avg  61589 non-null  float64
 16  local_bo_qty       615

In [33]:
df.nunique()

sku                  61589
national_inv          2916
lead_time               28
in_transit_qty         908
forecast_3_month      1623
forecast_6_month      2195
forecast_9_month      2664
sales_1_month         1092
sales_3_month         1928
sales_6_month         2679
sales_9_month         3220
min_bank              1098
potential_issue          2
pieces_past_due        190
perf_6_month_avg       102
perf_12_month_avg      102
local_bo_qty           201
deck_risk                2
oe_constraint            2
ppap_risk                2
stop_auto_buy            2
rev_stop                 2
went_on_backorder        2
dtype: int64

In [34]:
df.isna().sum()

sku                     0
national_inv            0
lead_time            3403
in_transit_qty          0
forecast_3_month        0
forecast_6_month        0
forecast_9_month        0
sales_1_month           0
sales_3_month           0
sales_6_month           0
sales_9_month           0
min_bank                0
potential_issue         0
pieces_past_due         0
perf_6_month_avg        0
perf_12_month_avg       0
local_bo_qty            0
deck_risk               0
oe_constraint           0
ppap_risk               0
stop_auto_buy           0
rev_stop                0
went_on_backorder       0
dtype: int64

In [35]:
df["went_on_backorder"].value_counts()

No     50296
Yes    11293
Name: went_on_backorder, dtype: int64
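The classes are imbalanced, which matters when reading accuracy figures later. As a small worked example using only the counts printed above, the share of backordered rows and the accuracy of a trivial model that always predicts `No` are:

```python
# Class counts copied from the value_counts() output above.
counts = {"No": 50296, "Yes": 11293}
total = sum(counts.values())

backorder_rate = counts["Yes"] / total
# Accuracy of a model that always predicts the majority class "No".
majority_baseline_accuracy = counts["No"] / total

print(f"backorder rate: {backorder_rate:.3f}")
print(f"majority baseline accuracy: {majority_baseline_accuracy:.3f}")
```

Any model therefore has to beat roughly 82% accuracy to be better than guessing `No` every time, which is why recall and precision are reported alongside accuracy.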

In [36]:
data=df.copy()

In [37]:
newdf = df.dropna().copy()  # copy so later column assignments do not act on a view of df

In [38]:
newdf.shape

(58186, 23)
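Dropping the rows with a missing `lead_time` removes 3,403 of the 61,589 records. `SimpleImputer` is imported above but not used; one possible alternative, sketched here on made-up values with hypothetical frames `train` and `test`, would be to fill the gaps with the training median instead:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Made-up lead times; NaN marks a missing value.
train = pd.DataFrame({"lead_time": [2.0, 8.0, np.nan, 15.0, 8.0]})
test = pd.DataFrame({"lead_time": [np.nan, 4.0]})

# Learn the median from the training rows only, then reuse it for the test rows.
imputer = SimpleImputer(strategy="median")
train_filled = imputer.fit_transform(train[["lead_time"]])
test_filled = imputer.transform(test[["lead_time"]])

print(imputer.statistics_)  # median learned from the training rows
print(test_filled.ravel())
```

The rest of the notebook continues with the dropped-row version, so the shapes and scores below refer to 58,186 rows.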

In [39]:
newdf.isna().sum()

sku                  0
national_inv         0
lead_time            0
in_transit_qty       0
forecast_3_month     0
forecast_6_month     0
forecast_9_month     0
sales_1_month        0
sales_3_month        0
sales_6_month        0
sales_9_month        0
min_bank             0
potential_issue      0
pieces_past_due      0
perf_6_month_avg     0
perf_12_month_avg    0
local_bo_qty         0
deck_risk            0
oe_constraint        0
ppap_risk            0
stop_auto_buy        0
rev_stop             0
went_on_backorder    0
dtype: int64

In [40]:
newdf.dtypes

sku                    int64
national_inv           int64
lead_time            float64
in_transit_qty         int64
forecast_3_month       int64
forecast_6_month       int64
forecast_9_month       int64
sales_1_month          int64
sales_3_month          int64
sales_6_month          int64
sales_9_month          int64
min_bank               int64
potential_issue       object
pieces_past_due        int64
perf_6_month_avg     float64
perf_12_month_avg    float64
local_bo_qty           int64
deck_risk             object
oe_constraint         object
ppap_risk             object
stop_auto_buy         object
rev_stop              object
went_on_backorder     object
dtype: object

In [41]:
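# Numeric columns to standardize. Note: pieces_past_due, perf_6_month_avg and
# perf_12_month_avg are not listed here, so they are left out of the model inputs built below.
num_cols=["national_inv","lead_time","in_transit_qty","forecast_3_month","forecast_6_month","forecast_9_month","sales_1_month"
         ,"sales_3_month","sales_6_month","sales_9_month","min_bank","local_bo_qty"]
# Categorical columns (sku is only an identifier and is dropped before modelling).
cat_cols=["sku","potential_issue","deck_risk","oe_constraint","ppap_risk","stop_auto_buy","rev_stop","went_on_backorder"]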

In [42]:
newdf[cat_cols] = newdf[cat_cols].astype('category')

In [43]:
newdf.dtypes

sku                  category
national_inv            int64
lead_time             float64
in_transit_qty          int64
forecast_3_month        int64
forecast_6_month        int64
forecast_9_month        int64
sales_1_month           int64
sales_3_month           int64
sales_6_month           int64
sales_9_month           int64
min_bank                int64
potential_issue      category
pieces_past_due         int64
perf_6_month_avg      float64
perf_12_month_avg     float64
local_bo_qty            int64
deck_risk            category
oe_constraint        category
ppap_risk            category
stop_auto_buy        category
rev_stop             category
went_on_backorder    category
dtype: object

In [44]:
newdf.drop(["sku"],axis=1,inplace=True)

In [45]:
newdf.dtypes

national_inv            int64
lead_time             float64
in_transit_qty          int64
forecast_3_month        int64
forecast_6_month        int64
forecast_9_month        int64
sales_1_month           int64
sales_3_month           int64
sales_6_month           int64
sales_9_month           int64
min_bank                int64
potential_issue      category
pieces_past_due         int64
perf_6_month_avg      float64
perf_12_month_avg     float64
local_bo_qty            int64
deck_risk            category
oe_constraint        category
ppap_risk            category
stop_auto_buy        category
rev_stop             category
went_on_backorder    category
dtype: object

In [46]:
X= newdf.drop(["went_on_backorder"], axis = 1)

In [47]:
y=newdf["went_on_backorder"]

In [48]:
print(X.shape, y.shape)

(58186, 21) (58186,)


In [49]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 123, stratify=y)

In [50]:
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

(40730, 21)
(17456, 21)
(40730,)
(17456,)


In [51]:
y_train.value_counts(True)

No     0.81149
Yes    0.18851
Name: went_on_backorder, dtype: float64

In [52]:
y_test.value_counts(True)

No     0.811469
Yes    0.188531
Name: went_on_backorder, dtype: float64

In [53]:
from sklearn.preprocessing import LabelEncoder
le=LabelEncoder()
le.fit(y_train)
y_train=le.transform(y_train)
y_test=le.transform(y_test)
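`LabelEncoder` assigns integer codes to the sorted class labels, so `No` becomes 0 and `Yes` becomes 1. A small self-contained check of that mapping (the name `le_demo` is illustrative, to avoid clashing with `le` above):

```python
from sklearn.preprocessing import LabelEncoder

le_demo = LabelEncoder()
le_demo.fit(["No", "Yes", "No"])

# classes_ holds the labels in sorted order; the position is the integer code.
mapping = {str(label): code for code, label in enumerate(le_demo.classes_)}
decoded = le_demo.inverse_transform([1, 0])

print(mapping)
print(list(decoded))
```

This is why `recall_score` and `precision_score`, which treat label 1 as the positive class by default, report on the backordered products.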

In [54]:
pd.Series(y_train).value_counts(normalize=True) * 100

0    81.14903
1    18.85097
dtype: float64

In [55]:
cat_attr=X_train.select_dtypes(include=["category"]).columns

In [56]:
from sklearn.preprocessing import OneHotEncoder

In [57]:
enc= OneHotEncoder(drop="first")
enc.fit(X_train[cat_attr])

X_train_ohe=enc.transform(X_train[cat_attr]).toarray()

X_test_ohe=enc.transform(X_test[cat_attr]).toarray()

In [58]:
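# The encoder is already fitted on the training data in the previous cell.
# Refitting it on the test set would leak test information, so the call is disabled.
# enc.fit(X_test[cat_attr])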

In [59]:
## Standardizing the numeric columns


In [60]:
scaler = StandardScaler()
scaler.fit(X_train[num_cols])

StandardScaler()

In [61]:
X_train_std = scaler.transform(X_train[num_cols])
X_test_std = scaler.transform(X_test[num_cols])

In [62]:
print(X_train_std.shape)
print(X_test_std.shape)

(40730, 12)
(17456, 12)


In [63]:
X_train_con = np.concatenate([X_train_std, X_train_ohe], axis=1)
X_test_con = np.concatenate([X_test_std, X_test_ohe], axis=1)

In [64]:
print(X_train_con.shape)
print(X_test_con.shape)

(40730, 18)
(17456, 18)
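The scaling, encoding and concatenation steps above can also be expressed as a single `ColumnTransformer`, which keeps the fit-on-train, transform-on-test discipline in one object. This is an alternative sketch on a toy frame, not what the notebook runs; `toy_train`, `toy_test`, `demo_num` and `demo_cat` are illustrative names:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

toy_train = pd.DataFrame({
    "national_inv": [117, 7, 258, 46],
    "lead_time": [8.0, 2.0, 15.0, 2.0],
    "deck_risk": ["No", "Yes", "No", "Yes"],
    "ppap_risk": ["Yes", "No", "No", "No"],
})
toy_test = pd.DataFrame({
    "national_inv": [2],
    "lead_time": [2.0],
    "deck_risk": ["No"],
    "ppap_risk": ["No"],
})
demo_num = ["national_inv", "lead_time"]
demo_cat = ["deck_risk", "ppap_risk"]

# Scale the numeric columns and one-hot encode the flags in one step.
# sparse_threshold=0 forces a dense array, like the .toarray() calls above.
preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), demo_num),
        ("cat", OneHotEncoder(drop="first"), demo_cat),
    ],
    sparse_threshold=0,
)

train_features = preprocess.fit_transform(toy_train)  # fit on training rows only
test_features = preprocess.transform(toy_test)        # reuse the fitted statistics

print(train_features.shape, test_features.shape)
```

With two numeric columns and two binary flags, each flag contributes one column after `drop="first"`, giving four features per row.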


In [65]:
from sklearn.ensemble import RandomForestClassifier
clf1=RandomForestClassifier()

clf1.fit(X_train_con,y_train)

train_preds=clf1.predict(X_train_con)
test_preds=clf1.predict(X_test_con)

In [66]:
from sklearn.metrics import confusion_matrix, accuracy_score, recall_score, precision_score, f1_score


In [67]:
def evaluate_model(act, pred):
    """Print the confusion matrix and the main classification metrics."""
    print("Confusion Matrix \n", confusion_matrix(act, pred))
    print("Accuracy : ", accuracy_score(act, pred))
    print("Recall   : ", recall_score(act, pred))
    print("Precision: ", precision_score(act, pred))
    print("F1_score : ", f1_score(act, pred))

In [68]:
print("---train---")
evaluate_model(y_train,train_preds)
    
print("---test---")
evaluate_model(y_test,test_preds)   

---train---
Confusion Matrix 
 [[32907   145]
 [  233  7445]]
Accuracy :  0.9907193714706605
Recall   :  0.969653555613441
Precision:  0.9808959156785244
F1_score :  0.9752423369138066
---test---
Confusion Matrix 
 [[13529   636]
 [  699  2592]]
Accuracy :  0.9235219981668195
Recall   :  0.7876025524156791
Precision:  0.8029739776951673
F1_score :  0.7952139898757479
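As a worked check, the four scores can be recomputed by hand from the confusion matrix printed for `y_test` (rows are actual classes, columns are predicted classes, in scikit-learn's layout):

```python
import numpy as np

# Confusion matrix printed for y_test by the baseline random forest.
cm = np.array([[13529, 636],
               [699, 2592]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
recall = tp / (tp + fn)        # share of real backorders that were caught
precision = tp / (tp + fp)     # share of predicted backorders that were real
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, recall, precision, f1)
```

Read this way, the baseline misses 699 of the 3,291 backordered products in the test split, which is the figure the later resampling steps try to reduce.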


In [69]:
from imblearn.over_sampling import SMOTE
smote=SMOTE(random_state=123)
X_train_sm,y_train_sm=smote.fit_resample(X_train_con,y_train)

In [121]:
clf2=RandomForestClassifier()
clf2.fit(X_train_sm,y_train_sm)

train_pred_sm=clf2.predict(X_train_sm)
test_pred_sm=clf2.predict(X_test_con)

In [122]:
print("---train---")
evaluate_model(y_train_sm,train_pred_sm)
    
print("---test---")
evaluate_model(y_test,test_pred_sm) 

---train---
Confusion Matrix 
 [[32765   287]
 [  282 32770]]
Accuracy :  0.991392351446206
Recall   :  0.9914679898342007
Precision:  0.9913180264391808
F1_score :  0.9913930024656249
---test---
Confusion Matrix 
 [[13241   924]
 [  501  2790]]
Accuracy :  0.9183661778185152
Recall   :  0.8477666362807658
Precision:  0.7512116316639742
F1_score :  0.7965738758029979
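SMOTE balances the training classes (both have 33,052 rows in the training confusion matrix above) by creating synthetic minority rows between existing minority rows and their nearest neighbours. The function below is a simplified illustration of that interpolation idea, not the `imblearn` implementation; `smote_like` and its parameters are hypothetical names:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating towards a random neighbour."""
    rng = np.random.default_rng(seed)
    # Ask for k + 1 neighbours because each point is its own nearest neighbour.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(minority)
    _, idx = nn.kneighbors(minority)
    synthetic = np.empty((n_new, minority.shape[1]))
    for i in range(n_new):
        base = rng.integers(len(minority))              # random minority point
        neighbour = idx[base, rng.integers(1, k + 1)]   # one of its k neighbours
        gap = rng.random()                              # position along the segment
        synthetic[i] = minority[base] + gap * (minority[neighbour] - minority[base])
    return synthetic

# Four minority points at the corners of the unit square.
minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
new_points = smote_like(minority, n_new=6)
print(new_points)
```

Because every synthetic row lies on a segment between two real minority rows, resampling is applied to the training split only; the test split keeps its original class balance.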


In [72]:
param_grid={"n_estimators":[50,100],
           "max_depth":[1,5],
            "max_features":[3,5],
            "min_samples_leaf":[1,2,3]
           }

In [125]:
clf3=RandomForestClassifier()
from sklearn.model_selection import GridSearchCV
clf_grid=GridSearchCV(clf3,param_grid,cv=2)

clf_grid.fit(X_train_sm,y_train_sm)

train_pred_gs=clf_grid.predict(X_train_sm)
test_pred_gs=clf_grid.predict(X_test_con)

In [126]:
print("---train---")
evaluate_model(y_train_sm,train_pred_gs)
    
print("---test---")
evaluate_model(y_test,test_pred_gs) 

---train---
Confusion Matrix 
 [[27859  5193]
 [ 3482 29570]]
Accuracy :  0.8687673968292388
Recall   :  0.8946508532010166
Precision:  0.8506170353536806
F1_score :  0.8720784487207845
---test---
Confusion Matrix 
 [[11906  2259]
 [  528  2763]]
Accuracy :  0.8403414298808433
Recall   :  0.8395624430264357
Precision:  0.5501792114695341
F1_score :  0.664741970407795
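When no `scoring` argument is given, `GridSearchCV` ranks parameter combinations by the estimator's default score, which is accuracy for classifiers. Since the aim here is to catch backorders, selecting on recall (or F1) may be closer to the goal. A hedged sketch on synthetic data, with an illustrative grid `demo_grid` that is smaller than the one above so it runs quickly:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic, imbalanced stand-in for the training data (roughly 80/20).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, weights=[0.8, 0.2], random_state=123
)

demo_grid = {"n_estimators": [10, 20], "max_depth": [2, 5]}

# scoring="recall" makes the search prefer settings that find more positives.
search = GridSearchCV(
    RandomForestClassifier(random_state=123),
    demo_grid,
    scoring="recall",
    cv=3,
)
search.fit(X_demo, y_demo)

best = search.best_params_
print(best, search.best_score_)
```

The grid used above limits `max_depth` to 1 or 5, so the tuned forest is much shallower than the default one; that restriction probably accounts for its lower train and test scores.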


In [75]:
# Training-set metrics (hard-coded) for: RandomForestClassifier, SMOTE, GridSearchCV
dataframe={
    
    "Accuracy":[0.9906702676160078,0.99140,0.86],
    "Recall" :[0.9716071893722323,0.9915285005445964,0.8930775747307274],
    "precision":[0.9912885662431942,0.9912885662431942,0.8531953637598636],
   "f1_score":[0.991408518877057, 0.991408518877057, 0.8726810448048012]
    }

In [76]:
df0=pd.DataFrame(dataframe)

In [77]:
# Test-set metrics (hard-coded) for the same three models
dataframe2={
    "Accuracy":[0.9237511457378552,0.9177360219981668,0.843263],
    "Recall":[0.7885141294439381,0.8447280461865694,0.838954],
    "precision":[0.8034055727554179,0.7503373819163293,0.5558687],
   "f1_score":[0.7958902008894342,0.7947398513436249,0.668684]
}

In [78]:
df01=pd.DataFrame(dataframe2)

In [79]:
frames=[df0,df01]
result=pd.concat(frames)
display(result)

Unnamed: 0,Accuracy,Recall,precision,f1_score
0,0.99067,0.971607,0.991289,0.991409
1,0.9914,0.991529,0.991289,0.991409
2,0.86,0.893078,0.853195,0.872681
0,0.923751,0.788514,0.803406,0.79589
1,0.917736,0.844728,0.750337,0.79474
2,0.843263,0.838954,0.555869,0.668684


In [80]:
print(pd.DataFrame(dataframe,index=["RandomForestClassifier","smote","GridSearchCV"],))

                        Accuracy    Recall  precision  f1_score
RandomForestClassifier   0.99067  0.971607   0.991289  0.991409
smote                    0.99140  0.991529   0.991289  0.991409
GridSearchCV             0.86000  0.893078   0.853195  0.872681


In [81]:
print(pd.DataFrame(dataframe2,index=["RandomForestClassifier","smote","GridSearchCV"]))

                        Accuracy    Recall  precision  f1_score
RandomForestClassifier  0.923751  0.788514   0.803406  0.795890
smote                   0.917736  0.844728   0.750337  0.794740
GridSearchCV            0.843263  0.838954   0.555869  0.668684


In [82]:
pef_columns=["model name","training accuracy","train precision","train recall","test accuracy","test precision","test recall"]
performance_comparision=pd.DataFrame(columns=pef_columns)

In [111]:
def add_to_perform_compare_df(df,model_name,train_actual,train_predict,test_actual,test_predict):
    """Return df with one extra row of train/test metrics for model_name."""
    from sklearn.metrics import accuracy_score, recall_score, precision_score
    
    train_accuracy=accuracy_score(train_actual,train_predict)
    test_accuracy=accuracy_score(test_actual,test_predict)
    
    train_recall=recall_score(train_actual,train_predict)
    test_recall=recall_score(test_actual,test_predict)
    
    train_precision=precision_score(train_actual,train_predict)
    test_precision=precision_score(test_actual,test_predict)
    
    # DataFrame.append was removed in pandas 2.0; build a one-row frame and concatenate instead.
    row=pd.DataFrame([[model_name,train_accuracy,train_precision,train_recall,test_accuracy,test_precision,test_recall]],
                     columns=df.columns)
    return pd.concat([df,row],ignore_index=True)
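Growing a frame one row at a time makes it easy to end up with duplicate rows when cells are re-run. An alternative sketch, under the assumption that all predictions are available at once, collects one dictionary per model and builds the table in a single step; `metric_row` and the toy labels are illustrative:

```python
import pandas as pd
from sklearn.metrics import accuracy_score, precision_score, recall_score

def metric_row(model_name, train_actual, train_predict, test_actual, test_predict):
    """Return one row of the comparison table as a dictionary."""
    return {
        "model name": model_name,
        "training accuracy": accuracy_score(train_actual, train_predict),
        "train precision": precision_score(train_actual, train_predict),
        "train recall": recall_score(train_actual, train_predict),
        "test accuracy": accuracy_score(test_actual, test_predict),
        "test precision": precision_score(test_actual, test_predict),
        "test recall": recall_score(test_actual, test_predict),
    }

# Toy labels and predictions, only to show the shape of the result.
rows = [
    metric_row("toy model", [0, 1, 1, 0], [0, 1, 0, 0], [1, 0, 1], [1, 0, 0]),
]
comparison = pd.DataFrame(rows)
print(comparison)
```

Rebuilding `comparison` from the `rows` list each time gives the same table no matter how often the cell is executed.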

In [112]:
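# The baseline forest (clf1) was trained on the original, non-resampled training set,
# so it is scored against y_train with the predictions stored in train_preds / test_preds.
performance_comparision=add_to_perform_compare_df(performance_comparision,"Random Forest",y_train,train_preds,y_test,test_preds)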

In [123]:
performance_comparision=add_to_perform_compare_df(performance_comparision,"Upsampling.SMOTE",y_train_sm,train_pred_sm,y_test,test_pred_sm)

In [132]:
performance_comparision

Unnamed: 0,model name,training accuracy,train precision,train recall,test accuracy,test precision,test recall
0,model name,training accuracy,train precision,train recall,test accuracy,test precision,test recall
1,model name,training accuracy,train precision,train recall,test accuracy,test precision,test recall
2,Random Forest,0.872988,0.858039,0.893864,0.847273,0.563867,0.838347
3,Upsampling.SMOTE,0.872988,0.858039,0.893864,0.847273,0.563867,0.838347
4,Hyper_para_RF_SMOTE,0.872988,0.858039,0.893864,0.847273,0.563867,0.838347
5,Upsampling.SMOTE,0.872988,0.858039,0.893864,0.847273,0.563867,0.838347
6,Upsampling.SMOTE,0.991392,0.991318,0.991468,0.918366,0.751212,0.847767
7,Hyper_para_RF_SMOTE,0.868767,0.850617,0.894651,0.840341,0.550179,0.839562
8,Gradientboost_rf_smote,0.900172,0.888409,0.915315,0.873568,0.620284,0.849286


In [127]:
performance_comparision=add_to_perform_compare_df(performance_comparision,"Hyper_para_RF_SMOTE",y_train_sm,train_pred_gs,y_test,test_pred_gs)

In [88]:
y_train.shape

(40730,)

In [89]:
train_pred.shape

(66104,)

In [129]:
from sklearn.ensemble import GradientBoostingClassifier
gbc=GradientBoostingClassifier()
gbc.fit(X_train_sm,y_train_sm)

GradientBoostingClassifier()

In [130]:
train_pred_gbc=gbc.predict(X_train_sm)
test_pred_gbc=gbc.predict(X_test_con)

In [131]:
performance_comparision=add_to_perform_compare_df(performance_comparision,"Gradientboost_rf_smote",y_train_sm,train_pred_gbc,y_test,test_pred_gbc)

In [134]:
from xgboost import XGBClassifier
xgb=XGBClassifier()
xgb.fit(X_train_sm,y_train_sm)



XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
              importance_type='gain', interaction_constraints='',
              learning_rate=0.300000012, max_delta_step=0, max_depth=6,
              min_child_weight=1, missing=nan, monotone_constraints='()',
              n_estimators=100, n_jobs=16, num_parallel_tree=1, random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', validate_parameters=1, verbosity=None)

In [135]:
train_pred_xgb=xgb.predict(X_train_sm)
test_pred_xgb=xgb.predict(X_test_con)

In [136]:
performance_comparision=add_to_perform_compare_df(performance_comparision,"xgbboosting_rf_smote",y_train_sm,train_pred_xgb,y_test,test_pred_xgb)

In [138]:
performance_comparision

Unnamed: 0,model name,training accuracy,train precision,train recall,test accuracy,test precision,test recall
0,model name,training accuracy,train precision,train recall,test accuracy,test precision,test recall
1,model name,training accuracy,train precision,train recall,test accuracy,test precision,test recall
2,Random Forest,0.872988,0.858039,0.893864,0.847273,0.563867,0.838347
3,Upsampling.SMOTE,0.872988,0.858039,0.893864,0.847273,0.563867,0.838347
4,Hyper_para_RF_SMOTE,0.872988,0.858039,0.893864,0.847273,0.563867,0.838347
5,Upsampling.SMOTE,0.872988,0.858039,0.893864,0.847273,0.563867,0.838347
6,Upsampling.SMOTE,0.991392,0.991318,0.991468,0.918366,0.751212,0.847767
7,Hyper_para_RF_SMOTE,0.868767,0.850617,0.894651,0.840341,0.550179,0.839562
8,Gradientboost_rf_smote,0.900172,0.888409,0.915315,0.873568,0.620284,0.849286
9,xgbboosting_rf_smote,0.951092,0.942432,0.96088,0.902326,0.71068,0.812823
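To compare the four SMOTE-based models on one axis, the test precision and recall from rows 6 to 9 of the table above can be combined into an F1 score and ranked. The values are copied from that table; the frame name `summary` is illustrative:

```python
import pandas as pd

# Test precision and recall copied from rows 6-9 of the comparison table.
summary = pd.DataFrame(
    {
        "test precision": [0.751212, 0.550179, 0.620284, 0.710680],
        "test recall": [0.847767, 0.839562, 0.849286, 0.812823],
    },
    index=[
        "Upsampling.SMOTE",
        "Hyper_para_RF_SMOTE",
        "Gradientboost_rf_smote",
        "xgbboosting_rf_smote",
    ],
)

# F1 is the harmonic mean of precision and recall.
p, r = summary["test precision"], summary["test recall"]
summary["test f1"] = 2 * p * r / (p + r)

ranked = summary.sort_values("test recall", ascending=False)
print(ranked)
```

On these figures gradient boosting has the highest test recall by a small margin, while the random forest trained on SMOTE data has the best balance of precision and recall.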
