# Classification of the Titanic Dataset

### Contents
**Perform classification, covering the following steps:**
- Use all of these algorithms: LogisticRegression, SVM, DecisionTree, RandomForest
- Train/test split, model training and evaluation
- Cross-validation, model training and evaluation
- Select the best-performing model using CV
- Fine-tune the best-performing model
- Evaluate again
- Apply AdaBoost to the fine-tuned model

### Import all required libraries and the dataset

In [37]:
import numpy as np
import pandas as pd

from sklearn.preprocessing import LabelEncoder

from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_validate
from sklearn.model_selection import GridSearchCV


from sklearn.linear_model import LogisticRegressionCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import AdaBoostClassifier


from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score


import warnings
warnings.filterwarnings('ignore')


In [2]:
df = pd.read_csv('./train.csv')

In [3]:
df.head(3)

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S


### Data Cleaning 

In [4]:
#Extract the title (Mr, Mrs, Miss, ...): the run of letters immediately before a dot in Name
df['Initial'] = df.Name.str.extract(r'([A-Za-z]+)\.', expand=False)

In [5]:
#Fold rare titles into the common groups (assign back rather than using inplace on a column)
df['Initial'] = df['Initial'].replace(
    ['Mlle','Mme','Ms','Dr','Major','Lady','Countess','Jonkheer','Col','Rev','Capt','Sir','Don'],
    ['Miss','Miss','Miss','Mr','Mr','Mrs','Mrs','Other','Other','Other','Mr','Mr','Mr'])
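The title extraction and grouping can be tried on a few names in isolation. The sketch below is only illustrative: the last two names are hypothetical passengers, and `title_map` is a shortened, dictionary-based stand-in for the two parallel lists used in the notebook.

```python
import pandas as pd

# Toy names in the same "Surname, Title. Given names" layout as the Name column.
names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Heikkinen, Miss. Laina",
    "Smith, Mlle. Anne",   # hypothetical passenger
    "Jones, Rev. Peter",   # hypothetical passenger
])

# The title is the run of letters immediately before a literal dot.
titles = names.str.extract(r'([A-Za-z]+)\.', expand=False)

# Rare titles are folded into common groups; titles not in the map stay unchanged.
title_map = {'Mlle': 'Miss', 'Rev': 'Other'}
grouped = titles.replace(title_map)
print(grouped.tolist())
```

A dictionary keeps each rare title next to its replacement, which may make the mapping easier to audit than two parallel lists.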

In [6]:
df.isnull().sum()

PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age            177
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         2
Initial          0
dtype: int64

**Observation:**
- Age, Cabin and Embarked contain null values (177, 687 and 2 respectively).
- Fill these NaN values with the mean or another suitable method.

In [7]:
#Assign missing ages the ceiling of the mean age of the passenger's title group
df.loc[(df.Age.isnull())&(df.Initial=='Mr'),'Age']=33
df.loc[(df.Age.isnull())&(df.Initial=='Mrs'),'Age']=36
df.loc[(df.Age.isnull())&(df.Initial=='Master'),'Age']=5
df.loc[(df.Age.isnull())&(df.Initial=='Miss'),'Age']=22
df.loc[(df.Age.isnull())&(df.Initial=='Other'),'Age']=46         
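The hard-coded ages above are described as ceilings of per-title mean ages. As a rough sketch of how such values could be computed rather than typed in, the example below uses a small made-up frame (`toy` and its ages are assumptions, not rows from the dataset) and fills each missing age with the rounded-up mean of its group.

```python
import numpy as np
import pandas as pd

# Made-up data: two title groups, each with one missing age.
toy = pd.DataFrame({
    'Initial': ['Mr', 'Mr', 'Mr', 'Miss', 'Miss', 'Miss'],
    'Age':     [30.0, 35.5, np.nan, 20.0, 23.0, np.nan],
})

# Mean age per title group (NaN ignored), broadcast back to every row of the group.
group_mean = toy.groupby('Initial')['Age'].transform('mean')

# Fill only the missing entries, using the ceiling of the group mean.
toy['Age'] = toy['Age'].fillna(np.ceil(group_mean))
print(toy)
```

Computing the fill values from the data avoids having to update the constants by hand if the title grouping changes.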

In [8]:
#Fill the 2 missing Embarked values with 'S'
df['Embarked'] = df['Embarked'].fillna('S')

In [9]:
df.head(3)

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Initial
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,Mr
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,Mrs
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S,Miss


In [10]:
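#Turn Cabin into a boolean flag: True if a cabin is recorded, False if it is missing (NaN is a float)
df['Cabin'] = df['Cabin'].apply(lambda x:not isinstance(x,float))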

In [11]:
#Dropping unnecessary columns
df.drop(['PassengerId','Name','Ticket'], axis=1, inplace=True)

## Label Encoding
- Three columns (Sex, Embarked, Initial) hold categorical values and require encoding.

In [12]:
label_encoder = {}

for x in ['Sex','Embarked','Initial']:
    label_encoder[x] = LabelEncoder()
    df[x] = label_encoder[x].fit_transform(df[x])
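`LabelEncoder` assigns integer codes in sorted label order, which is consistent with the encoded `df.head(3)` output where `male` is 1 and `S` is 2. The short sketch below shows how the mapping can be inspected and reversed; the sample values are illustrative only.

```python
from sklearn.preprocessing import LabelEncoder

enc = LabelEncoder()
codes = enc.fit_transform(['S', 'C', 'S', 'Q'])

# classes_ is sorted, so a label's code is its position in this array.
mapping = {label: code for code, label in enumerate(enc.classes_)}
print(mapping)

# inverse_transform recovers the original labels from the integer codes.
restored = enc.inverse_transform(codes)
print(list(restored))
```

Keeping the fitted encoders in a dictionary, as the notebook does, is what makes this reverse lookup possible later. Integer codes do imply an ordering, which tree-based models tolerate well but which may not suit linear models or SVMs.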

In [13]:
df.head(3)

Unnamed: 0,Survived,Pclass,Sex,Age,SibSp,Parch,Fare,Cabin,Embarked,Initial
0,0,3,1,22.0,1,0,7.25,False,2,2
1,1,1,0,38.0,1,0,71.2833,True,0,3
2,1,3,0,26.0,0,0,7.925,False,2,1


In [14]:
df.isnull().sum()

Survived    0
Pclass      0
Sex         0
Age         0
SibSp       0
Parch       0
Fare        0
Cabin       0
Embarked    0
Initial     0
dtype: int64

In [15]:
X = df[df.columns[1:]].values
y = df['Survived']

## Train Test Split

In [16]:
#split dataset into 80:20 ratio, 80% for training and 20% for testing.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
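The split above is purely random. One optional variation, not used in this notebook, is to pass `stratify` so that both parts keep the same class ratio. The sketch below uses synthetic arrays (`X_toy`, `y_toy` are stand-ins, not the Titanic data) to show the effect.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the features and the Survived label (40% positives).
X_toy = np.arange(200).reshape(100, 2)
y_toy = np.array([1] * 40 + [0] * 60)

# stratify=y_toy keeps the 40/60 class ratio in both the train and the test part.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.20, random_state=42, stratify=y_toy
)
print(len(X_tr), len(X_te))
print(y_tr.mean(), y_te.mean())
```

With a small test set, stratification can make accuracy, precision and recall less sensitive to which rows happen to land in the test part.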

In [17]:
#create a scores function to calculate 5 scores (accuracy, precision, recall, F1, ROC AUC)

def scores(y_true, y_pred):
    results = [accuracy_score(y_true, y_pred),
               precision_score(y_true, y_pred),
               recall_score(y_true, y_pred),
               f1_score(y_true, y_pred),
               roc_auc_score(y_true, y_pred)
              ]
    return [round(x, 2) for x in results]
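Note that `scores` passes hard class predictions to `roc_auc_score`. With 0/1 predictions the ROC curve has a single operating point, so the value equals the balanced accuracy; ranking quality is only measured when scores or probabilities are passed. A small sketch with made-up values (`y_score` is a hypothetical set of predicted probabilities):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, balanced_accuracy_score

y_true = np.array([0, 0, 0, 1, 1, 1])
y_score = np.array([0.1, 0.4, 0.6, 0.45, 0.7, 0.9])  # hypothetical probabilities
y_label = (y_score >= 0.5).astype(int)               # hard predictions at 0.5

# AUC from continuous scores: fraction of (positive, negative) pairs ranked correctly.
auc_from_scores = roc_auc_score(y_true, y_score)

# AUC from hard labels collapses to (TPR + TNR) / 2, i.e. balanced accuracy.
auc_from_labels = roc_auc_score(y_true, y_label)
bal_acc = balanced_accuracy_score(y_true, y_label)
print(auc_from_scores, auc_from_labels, bal_acc)
```

For classifiers that expose `predict_proba` or `decision_function`, passing those outputs instead of `predict` would give the usual ranking-based ROC AUC.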

## Apply all these algorithms
**- LogisticRegressionCV, DecisionTree, SVM and RandomForestClassifier**

In [18]:
#create a function create_models to fit all of the above algorithms and print accuracy, precision, recall and F1
def create_models(X_train, X_test, y_train, y_test):
    models = [
        ('Logistic Regression', LogisticRegressionCV()),
        ('Decision Tree', DecisionTreeClassifier()),
        ('SVM', SVC()),
        ('Random Forest', RandomForestClassifier()),
    ]
    for i, (name, model) in enumerate(models):
        if i > 0:
            print('\n')
        print(name)
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        # results = [accuracy, precision, recall, f1, roc_auc]; the fourth value is F1
        results = scores(y_test, y_pred)
        print('Accuracy: {}, Precision: {}, Recall: {}, F1: {}'.format(
            results[0], results[1], results[2], results[3]))

In [19]:
create_models(X_train, X_test, y_train, y_test)

Logistic Regression
Accuracy: 0.82, Precision: 0.8, Recall: 0.76, F1: 0.78


Decision Tree
Accuracy: 0.82, Precision: 0.8, Recall: 0.76, F1: 0.78


SVM
Accuracy: 0.66, Precision: 0.76, Recall: 0.26, F1: 0.38


Random Forest
Accuracy: 0.84, Precision: 0.81, Recall: 0.8, F1: 0.8


**Observations:**
- On the single 80:20 split, Logistic Regression and Decision Tree each reach an accuracy of 0.82, SVM reaches 0.66, and Random Forest reaches 0.84.
- Random Forest therefore performs best on this split, with the highest accuracy (0.84).

## Cross Validation

In [20]:
print('Logistic Regression')
lr = LogisticRegressionCV(random_state=42)
score1 = cross_validate(lr, X,y, cv=10, n_jobs=-1, verbose=1)
print('Score: ',score1['test_score'].mean())

print('\n')

print('Decision Tree')
dt = DecisionTreeClassifier(random_state=42)
score2 = cross_validate(dt, X,y, cv=10, n_jobs=-1, verbose=1)
print('Score: ',score2['test_score'].mean())
     
print('\n')
    
print('SVM')
svm = SVC(random_state=42)
score3 = cross_validate(svm, X,y, cv=10, n_jobs=-1, verbose=1)
print('Score: ',score3['test_score'].mean())
    
print('\n')

print('Random Forest')
rf = RandomForestClassifier(random_state=42)
score4 = cross_validate(rf, X,y, cv=10, n_jobs=-1, verbose=1)
print('Score: ',score4['test_score'].mean())

Logistic Regression


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

Score:  0.7889762796504369


Decision Tree


[Parallel(n_jobs=-1)]: Done  10 out of  10 | elapsed:    0.2s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.


Score:  0.778976279650437


SVM


[Parallel(n_jobs=-1)]: Done  10 out of  10 | elapsed:    0.2s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.


Score:  0.6779650436953808


Random Forest
Score:  0.8170911360799001


[Parallel(n_jobs=-1)]: Done  10 out of  10 | elapsed:    1.8s finished
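The convergence warning in the output above suggests increasing `max_iter` or scaling the data. One way to try this, sketched here under assumptions, is to wrap the scale-sensitive models in a pipeline so that scaling is fitted inside each CV fold. The data is synthetic (`make_classification`), one column is stretched to mimic a wide-range feature such as Fare, and plain `LogisticRegression` stands in for `LogisticRegressionCV` to keep the example fast.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data; the first feature is stretched to a much wider range.
X_toy, y_toy = make_classification(n_samples=300, n_features=6, random_state=42)
X_toy[:, 0] *= 100

# The scaler is refitted on the training part of every fold, so no test data leaks in.
candidates = {
    'Logistic Regression': make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    'SVM': make_pipeline(StandardScaler(), SVC(random_state=42)),
}

cv_means = {}
for name, pipe in candidates.items():
    result = cross_validate(pipe, X_toy, y_toy, cv=10)
    cv_means[name] = result['test_score'].mean()
    print(name, round(cv_means[name], 3))
```

Whether scaling would change the ranking of models on the Titanic features is not shown here; it is simply a check worth running before ruling out SVM or logistic regression.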


## Select best performing model using CV

**Observations:**
- After 10-fold cross-validation, Random Forest gives the best mean accuracy (0.817), ahead of Logistic Regression (0.789), Decision Tree (0.779) and SVM (0.678).
- So Random Forest is chosen as the best model for further parameter tuning.
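The comparison above rests on mean accuracy alone. As a possible refinement, `cross_validate` accepts several metrics at once, and the spread across folds gives a rough sense of whether a gap between models is meaningful. The sketch below uses synthetic data and a smaller forest (both assumptions made to keep it quick).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Synthetic stand-in for the Titanic features and labels.
X_toy, y_toy = make_classification(n_samples=200, n_features=6, random_state=0)

rf = RandomForestClassifier(n_estimators=50, random_state=42)

# With a list of metrics, results are stored under 'test_<metric>' keys.
result = cross_validate(rf, X_toy, y_toy, cv=5, scoring=['accuracy', 'f1'])

for metric in ('accuracy', 'f1'):
    fold_scores = result['test_' + metric]
    print('{}: mean={:.3f}, std={:.3f}'.format(metric, fold_scores.mean(), fold_scores.std()))
```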

In [21]:
def evaluate(model, X_test, y_test):
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print('Model Performance')
    # Printed as a percentage; the returned value is the raw fraction in [0, 1]
    print('Accuracy: {:0.3f}'.format(accuracy*100))

    return accuracy

In [22]:
#fit model on training dataset
model = RandomForestClassifier(random_state = 42)
model.fit(X_train, y_train)
#evaluate the model and store base score in base_score variable
base_score = evaluate(model, X_test, y_test)

Model Performance
Accuracy: 83.799


**Observation:** - The baseline RandomForestClassifier reaches 83.799% accuracy on the test set; `base_score` holds the same value as a fraction, as returned by `evaluate`.

## Fine tuning - Using GridSearchCV

In [30]:
param_grid = {
    'bootstrap': [False],
    'max_depth': [80, 90, 100, 110, 120, 130],
    'max_features': ['auto'],  # 'auto' meant 'sqrt' for classifiers and was removed in scikit-learn 1.3; use 'sqrt' there
    'min_samples_leaf': [2, 3, 4, 5],
    'min_samples_split': [2, 3, 4, 5, 6],
    'n_estimators': [200, 300, 400, 500, 600]
}


# Create a base model
model = RandomForestClassifier(random_state=42)
# Instantiate the grid search model
grid_search = GridSearchCV(estimator = model, param_grid = param_grid, 
                          cv = 4, n_jobs = -1, verbose = 2)
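Once a `GridSearchCV` object like this has been fitted, the winning combination and the refitted model are available as attributes. The sketch below shows the usual way to read them; it runs on synthetic data with a deliberately tiny grid (`small_grid` is an assumption, not the grid above) so that it finishes quickly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data and an 80:20 split.
X_toy, y_toy = make_classification(n_samples=200, n_features=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.2, random_state=42)

# 2 x 2 = 4 candidates, each evaluated with 4-fold CV on the training part only.
small_grid = {'n_estimators': [20, 50], 'min_samples_leaf': [1, 3]}
search = GridSearchCV(RandomForestClassifier(random_state=42), small_grid, cv=4)
search.fit(X_tr, y_tr)

print(search.best_params_)                # parameter combination with the best mean CV score
print(round(search.best_score_, 3))       # that mean CV accuracy
test_accuracy = search.best_estimator_.score(X_te, y_te)  # model refitted on all of X_tr
print(round(test_accuracy, 3))
```

Because `best_score_` comes from folds of the training data, the held-out test accuracy is the number to compare against the baseline score.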

In [31]:
grid_search.fit(X_train, y_train)

Fitting 4 folds for each of 600 candidates, totalling 2400 fits
[CV] END bootstrap=False, max_depth=130, max_features=auto, min_samples_leaf=2, min_samples_split=3, n_estimators=600; total time=   1.9s
[CV] END bootstrap=False, max_depth=130, max_features=auto, min_samples_leaf=2, min_samples_split=4, n_estimators=300; total time=   1.0s
[CV] END bootstrap=False, max_depth=130, max_features=auto, min_samples_leaf=2, min_samples_split=4, n_estimators=400; total time=   1.2s
[CV] END bootstrap=False, max_depth=130, max_features=auto, min_samples_leaf=2, min_samples_split=4, n_estimators=600; total time=   2.0s
[CV] END bootstrap=False, max_depth=130, max_features=auto, min_samples_leaf=2, min_samples_split=5, n_estimators=300; total time=   1.1s
[CV] END bootstrap=False, max_depth=130, max_features=auto, min_samples_leaf=2, min_samples_split=5, n_estimators=400; total time=   1.7s
[CV] END bootstrap=False, max_depth=130, max_features=auto, min_samples_leaf=2, min_samples_split=5, n_estim

[CV] END bootstrap=False, max_depth=130, max_features=auto, min_samples_leaf=4, min_samples_split=5, n_estimators=300; total time=   1.0s
[CV] END bootstrap=False, max_depth=130, max_features=auto, min_samples_leaf=4, min_samples_split=5, n_estimators=500; total time=   1.5s
[CV] END bootstrap=False, max_depth=130, max_features=auto, min_samples_leaf=4, min_samples_split=5, n_estimators=600; total time=   1.9s
[CV] END bootstrap=False, max_depth=130, max_features=auto, min_samples_leaf=4, min_samples_split=6, n_estimators=400; total time=   1.2s
[CV] END bootstrap=False, max_depth=130, max_features=auto, min_samples_leaf=4, min_samples_split=6, n_estimators=500; total time=   1.6s
[CV] END bootstrap=False, max_depth=130, max_features=auto, min_samples_leaf=4, min_samples_split=6, n_estimators=600; total time=   1.9s
[CV] END bootstrap=False, max_depth=130, max_features=auto, min_samples_leaf=5, min_samples_split=2, n_estimators=400; total time=   1.4s
[CV] END bootstrap=False, max_dept

[CV] END bootstrap=False, max_depth=130, max_features=auto, min_samples_leaf=5, min_samples_split=2, n_estimators=500; total time=   2.0s
[CV] END bootstrap=False, max_depth=130, max_features=auto, min_samples_leaf=5, min_samples_split=3, n_estimators=300; total time=   0.9s
[CV] END bootstrap=False, max_depth=130, max_features=auto, min_samples_leaf=5, min_samples_split=3, n_estimators=300; total time=   0.9s
[CV] END bootstrap=False, max_depth=130, max_features=auto, min_samples_leaf=5, min_samples_split=3, n_estimators=500; total time=   1.6s
[CV] END bootstrap=False, max_depth=130, max_features=auto, min_samples_leaf=5, min_samples_split=3, n_estimators=600; total time=   1.7s
[CV] END bootstrap=False, max_depth=130, max_features=auto, min_samples_leaf=5, min_samples_split=4, n_estimators=400; total time=   1.1s
[CV] END bootstrap=False, max_depth=130, max_features=auto, min_samples_leaf=5, min_samples_split=4, n_estimators=500; total time=   1.3s
[CV] END bootstrap=False, max_dept

[CV] END bootstrap=False, max_depth=80, max_features=auto, min_samples_leaf=3, min_samples_split=4, n_estimators=500; total time=   1.8s
[CV] END bootstrap=False, max_depth=80, max_features=auto, min_samples_leaf=3, min_samples_split=4, n_estimators=600; total time=   2.2s
[CV] END bootstrap=False, max_depth=80, max_features=auto, min_samples_leaf=3, min_samples_split=5, n_estimators=200; total time=   0.7s
[CV] END bootstrap=False, max_depth=80, max_features=auto, min_samples_leaf=3, min_samples_split=5, n_estimators=300; total time=   1.0s
[CV] END bootstrap=False, max_depth=80, max_features=auto, min_samples_leaf=3, min_samples_split=5, n_estimators=400; total time=   1.2s
[CV] END bootstrap=False, max_depth=80, max_features=auto, min_samples_leaf=3, min_samples_split=5, n_estimators=500; total time=   1.6s
[CV] END bootstrap=False, max_depth=80, max_features=auto, min_samples_leaf=3, min_samples_split=5, n_estimators=600; total time=   1.8s
[CV] END bootstrap=False, max_depth=80, m

[CV] END bootstrap=False, max_depth=80, max_features=auto, min_samples_leaf=3, min_samples_split=5, n_estimators=600; total time=   1.8s
[CV] END bootstrap=False, max_depth=80, max_features=auto, min_samples_leaf=3, min_samples_split=6, n_estimators=200; total time=   0.6s
[CV] END bootstrap=False, max_depth=80, max_features=auto, min_samples_leaf=3, min_samples_split=6, n_estimators=300; total time=   0.9s
[CV] END bootstrap=False, max_depth=80, max_features=auto, min_samples_leaf=3, min_samples_split=6, n_estimators=400; total time=   1.2s
[CV] END bootstrap=False, max_depth=80, max_features=auto, min_samples_leaf=3, min_samples_split=6, n_estimators=500; total time=   1.4s
[CV] END bootstrap=False, max_depth=80, max_features=auto, min_samples_leaf=3, min_samples_split=6, n_estimators=600; total time=   2.0s
[CV] END bootstrap=False, max_depth=80, max_features=auto, min_samples_leaf=4, min_samples_split=2, n_estimators=200; total time=   0.7s
[CV] END bootstrap=False, max_depth=80, m

[CV] END bootstrap=False, max_depth=80, max_features=auto, min_samples_leaf=5, min_samples_split=6, n_estimators=300; total time=   1.4s
[CV] END bootstrap=False, max_depth=80, max_features=auto, min_samples_leaf=5, min_samples_split=6, n_estimators=400; total time=   1.6s
[CV] END bootstrap=False, max_depth=80, max_features=auto, min_samples_leaf=5, min_samples_split=6, n_estimators=500; total time=   1.8s
[CV] END bootstrap=False, max_depth=80, max_features=auto, min_samples_leaf=5, min_samples_split=6, n_estimators=600; total time=   2.5s
[CV] END bootstrap=False, max_depth=90, max_features=auto, min_samples_leaf=2, min_samples_split=2, n_estimators=200; total time=   0.7s
[CV] END bootstrap=False, max_depth=90, max_features=auto, min_samples_leaf=2, min_samples_split=2, n_estimators=300; total time=   1.6s
[CV] END bootstrap=False, max_depth=90, max_features=auto, min_samples_leaf=2, min_samples_split=2, n_estimators=400; total time=   1.8s
[CV] END bootstrap=False, max_depth=90, m

[CV] END bootstrap=False, max_depth=90, max_features=auto, min_samples_leaf=2, min_samples_split=2, n_estimators=500; total time=   1.7s
[CV] END bootstrap=False, max_depth=90, max_features=auto, min_samples_leaf=2, min_samples_split=2, n_estimators=600; total time=   1.8s
[CV] END bootstrap=False, max_depth=90, max_features=auto, min_samples_leaf=2, min_samples_split=3, n_estimators=200; total time=   0.6s
[CV] END bootstrap=False, max_depth=90, max_features=auto, min_samples_leaf=2, min_samples_split=3, n_estimators=300; total time=   0.9s
[CV] END bootstrap=False, max_depth=90, max_features=auto, min_samples_leaf=2, min_samples_split=3, n_estimators=400; total time=   1.2s
[CV] END bootstrap=False, max_depth=90, max_features=auto, min_samples_leaf=2, min_samples_split=3, n_estimators=500; total time=   1.9s
[CV] END bootstrap=False, max_depth=90, max_features=auto, min_samples_leaf=2, min_samples_split=3, n_estimators=600; total time=   3.1s
[CV] END bootstrap=False, max_depth=90, m

[CV] END bootstrap=False, max_depth=90, max_features=auto, min_samples_leaf=4, min_samples_split=2, n_estimators=600; total time=   1.8s
[CV] END bootstrap=False, max_depth=90, max_features=auto, min_samples_leaf=4, min_samples_split=3, n_estimators=200; total time=   0.6s
[CV] END bootstrap=False, max_depth=90, max_features=auto, min_samples_leaf=4, min_samples_split=3, n_estimators=300; total time=   0.9s
[CV] END bootstrap=False, max_depth=90, max_features=auto, min_samples_leaf=4, min_samples_split=3, n_estimators=400; total time=   1.3s
[CV] END bootstrap=False, max_depth=90, max_features=auto, min_samples_leaf=4, min_samples_split=3, n_estimators=500; total time=   2.7s
[CV] END bootstrap=False, max_depth=90, max_features=auto, min_samples_leaf=4, min_samples_split=3, n_estimators=600; total time=   1.8s
[CV] END bootstrap=False, max_depth=90, max_features=auto, min_samples_leaf=4, min_samples_split=4, n_estimators=300; total time=   0.9s
[CV] END bootstrap=False, max_depth=90, m

[CV] END bootstrap=False, max_depth=90, max_features=auto, min_samples_leaf=4, min_samples_split=4, n_estimators=200; total time=   0.7s
[CV] END bootstrap=False, max_depth=90, max_features=auto, min_samples_leaf=4, min_samples_split=4, n_estimators=300; total time=   0.8s
[CV] END bootstrap=False, max_depth=90, max_features=auto, min_samples_leaf=4, min_samples_split=4, n_estimators=400; total time=   1.1s
[CV] END bootstrap=False, max_depth=90, max_features=auto, min_samples_leaf=4, min_samples_split=4, n_estimators=500; total time=   1.7s
[CV] END bootstrap=False, max_depth=90, max_features=auto, min_samples_leaf=4, min_samples_split=4, n_estimators=600; total time=   1.7s
[CV] END bootstrap=False, max_depth=90, max_features=auto, min_samples_leaf=4, min_samples_split=5, n_estimators=200; total time=   0.6s
[CV] END bootstrap=False, max_depth=90, max_features=auto, min_samples_leaf=4, min_samples_split=5, n_estimators=300; total time=   0.8s
[CV] END bootstrap=False, max_depth=90, m

[CV] END bootstrap=False, max_depth=100, max_features=auto, min_samples_leaf=2, min_samples_split=4, n_estimators=400; total time=   1.2s
[CV] END bootstrap=False, max_depth=100, max_features=auto, min_samples_leaf=2, min_samples_split=4, n_estimators=500; total time=   1.5s
[CV] END bootstrap=False, max_depth=100, max_features=auto, min_samples_leaf=2, min_samples_split=4, n_estimators=600; total time=   1.7s
[CV] END bootstrap=False, max_depth=100, max_features=auto, min_samples_leaf=2, min_samples_split=5, n_estimators=200; total time=   0.6s
[CV] END bootstrap=False, max_depth=100, max_features=auto, min_samples_leaf=2, min_samples_split=5, n_estimators=300; total time=   0.9s
[CV] END bootstrap=False, max_depth=100, max_features=auto, min_samples_leaf=2, min_samples_split=5, n_estimators=400; total time=   1.2s
[CV] END bootstrap=False, max_depth=100, max_features=auto, min_samples_leaf=2, min_samples_split=5, n_estimators=500; total time=   1.4s
[CV] END bootstrap=False, max_dept

[CV] END bootstrap=False, max_depth=100, max_features=auto, min_samples_leaf=2, min_samples_split=5, n_estimators=600; total time=   1.8s
[CV] END bootstrap=False, max_depth=100, max_features=auto, min_samples_leaf=2, min_samples_split=6, n_estimators=200; total time=   0.6s
[CV] END bootstrap=False, max_depth=100, max_features=auto, min_samples_leaf=2, min_samples_split=6, n_estimators=300; total time=   0.9s
[CV] END bootstrap=False, max_depth=100, max_features=auto, min_samples_leaf=2, min_samples_split=6, n_estimators=400; total time=   1.2s
[CV] END bootstrap=False, max_depth=100, max_features=auto, min_samples_leaf=2, min_samples_split=6, n_estimators=500; total time=   1.5s
[CV] END bootstrap=False, max_depth=100, max_features=auto, min_samples_leaf=2, min_samples_split=6, n_estimators=600; total time=   1.8s
[CV] END bootstrap=False, max_depth=100, max_features=auto, min_samples_leaf=3, min_samples_split=2, n_estimators=200; total time=   0.6s

[CV] END bootstrap=False, max_depth=100, max_features=auto, min_samples_leaf=4, min_samples_split=6, n_estimators=300; total time=   0.8s
[CV] END bootstrap=False, max_depth=100, max_features=auto, min_samples_leaf=4, min_samples_split=6, n_estimators=400; total time=   1.1s
[CV] END bootstrap=False, max_depth=100, max_features=auto, min_samples_leaf=4, min_samples_split=6, n_estimators=500; total time=   1.4s
[CV] END bootstrap=False, max_depth=100, max_features=auto, min_samples_leaf=4, min_samples_split=6, n_estimators=600; total time=   1.6s
[CV] END bootstrap=False, max_depth=100, max_features=auto, min_samples_leaf=5, min_samples_split=2, n_estimators=200; total time=   0.5s
[CV] END bootstrap=False, max_depth=100, max_features=auto, min_samples_leaf=5, min_samples_split=2, n_estimators=200; total time=   0.6s
[CV] END bootstrap=False, max_depth=100, max_features=auto, min_samples_leaf=5, min_samples_split=2, n_estimators=300; total time=   1.3s

[CV] END bootstrap=False, max_depth=100, max_features=auto, min_samples_leaf=5, min_samples_split=2, n_estimators=500; total time=   1.4s
[CV] END bootstrap=False, max_depth=100, max_features=auto, min_samples_leaf=5, min_samples_split=2, n_estimators=600; total time=   1.7s
[CV] END bootstrap=False, max_depth=100, max_features=auto, min_samples_leaf=5, min_samples_split=3, n_estimators=200; total time=   0.6s
[CV] END bootstrap=False, max_depth=100, max_features=auto, min_samples_leaf=5, min_samples_split=3, n_estimators=300; total time=   0.8s
[CV] END bootstrap=False, max_depth=100, max_features=auto, min_samples_leaf=5, min_samples_split=3, n_estimators=400; total time=   1.1s
[CV] END bootstrap=False, max_depth=100, max_features=auto, min_samples_leaf=5, min_samples_split=3, n_estimators=500; total time=   1.4s
[CV] END bootstrap=False, max_depth=100, max_features=auto, min_samples_leaf=5, min_samples_split=3, n_estimators=600; total time=   1.7s

[CV] END bootstrap=False, max_depth=110, max_features=auto, min_samples_leaf=3, min_samples_split=2, n_estimators=600; total time=   2.8s
[CV] END bootstrap=False, max_depth=110, max_features=auto, min_samples_leaf=3, min_samples_split=3, n_estimators=300; total time=   1.1s
[CV] END bootstrap=False, max_depth=110, max_features=auto, min_samples_leaf=3, min_samples_split=3, n_estimators=400; total time=   1.5s
[CV] END bootstrap=False, max_depth=110, max_features=auto, min_samples_leaf=3, min_samples_split=3, n_estimators=500; total time=   2.4s
[CV] END bootstrap=False, max_depth=110, max_features=auto, min_samples_leaf=3, min_samples_split=3, n_estimators=600; total time=   3.7s
[CV] END bootstrap=False, max_depth=110, max_features=auto, min_samples_leaf=3, min_samples_split=4, n_estimators=200; total time=   1.0s
[CV] END bootstrap=False, max_depth=110, max_features=auto, min_samples_leaf=3, min_samples_split=4, n_estimators=300; total time=   1.2s

[CV] END bootstrap=False, max_depth=110, max_features=auto, min_samples_leaf=3, min_samples_split=4, n_estimators=400; total time=   1.2s
[CV] END bootstrap=False, max_depth=110, max_features=auto, min_samples_leaf=3, min_samples_split=4, n_estimators=500; total time=   1.5s
[CV] END bootstrap=False, max_depth=110, max_features=auto, min_samples_leaf=3, min_samples_split=4, n_estimators=600; total time=   1.8s
[CV] END bootstrap=False, max_depth=110, max_features=auto, min_samples_leaf=3, min_samples_split=5, n_estimators=200; total time=   0.8s
[CV] END bootstrap=False, max_depth=110, max_features=auto, min_samples_leaf=3, min_samples_split=5, n_estimators=200; total time=   0.7s
[CV] END bootstrap=False, max_depth=110, max_features=auto, min_samples_leaf=3, min_samples_split=5, n_estimators=300; total time=   0.9s
[CV] END bootstrap=False, max_depth=110, max_features=auto, min_samples_leaf=3, min_samples_split=5, n_estimators=400; total time=   1.3s

[CV] END bootstrap=False, max_depth=110, max_features=auto, min_samples_leaf=5, min_samples_split=4, n_estimators=500; total time=   1.4s
[CV] END bootstrap=False, max_depth=110, max_features=auto, min_samples_leaf=5, min_samples_split=4, n_estimators=600; total time=   1.6s
[CV] END bootstrap=False, max_depth=110, max_features=auto, min_samples_leaf=5, min_samples_split=5, n_estimators=200; total time=   0.5s
[CV] END bootstrap=False, max_depth=110, max_features=auto, min_samples_leaf=5, min_samples_split=5, n_estimators=300; total time=   0.8s
[CV] END bootstrap=False, max_depth=110, max_features=auto, min_samples_leaf=5, min_samples_split=5, n_estimators=400; total time=   1.1s
[CV] END bootstrap=False, max_depth=110, max_features=auto, min_samples_leaf=5, min_samples_split=5, n_estimators=500; total time=   1.4s
[CV] END bootstrap=False, max_depth=110, max_features=auto, min_samples_leaf=5, min_samples_split=5, n_estimators=600; total time=   1.6s

[CV] END bootstrap=False, max_depth=110, max_features=auto, min_samples_leaf=5, min_samples_split=6, n_estimators=200; total time=   0.6s
[CV] END bootstrap=False, max_depth=110, max_features=auto, min_samples_leaf=5, min_samples_split=6, n_estimators=300; total time=   0.8s
[CV] END bootstrap=False, max_depth=110, max_features=auto, min_samples_leaf=5, min_samples_split=6, n_estimators=400; total time=   1.1s
[CV] END bootstrap=False, max_depth=110, max_features=auto, min_samples_leaf=5, min_samples_split=6, n_estimators=500; total time=   1.4s
[CV] END bootstrap=False, max_depth=110, max_features=auto, min_samples_leaf=5, min_samples_split=6, n_estimators=600; total time=   1.7s
[CV] END bootstrap=False, max_depth=120, max_features=auto, min_samples_leaf=2, min_samples_split=2, n_estimators=200; total time=   0.6s
[CV] END bootstrap=False, max_depth=120, max_features=auto, min_samples_leaf=2, min_samples_split=2, n_estimators=300; total time=   0.9s

[CV] END bootstrap=False, max_depth=120, max_features=auto, min_samples_leaf=3, min_samples_split=6, n_estimators=300; total time=   0.9s
[CV] END bootstrap=False, max_depth=120, max_features=auto, min_samples_leaf=3, min_samples_split=6, n_estimators=400; total time=   1.2s
[CV] END bootstrap=False, max_depth=120, max_features=auto, min_samples_leaf=3, min_samples_split=6, n_estimators=500; total time=   1.5s
[CV] END bootstrap=False, max_depth=120, max_features=auto, min_samples_leaf=3, min_samples_split=6, n_estimators=600; total time=   1.7s
[CV] END bootstrap=False, max_depth=120, max_features=auto, min_samples_leaf=4, min_samples_split=2, n_estimators=200; total time=   0.8s
[CV] END bootstrap=False, max_depth=120, max_features=auto, min_samples_leaf=4, min_samples_split=2, n_estimators=300; total time=   1.6s
[CV] END bootstrap=False, max_depth=120, max_features=auto, min_samples_leaf=4, min_samples_split=2, n_estimators=400; total time=   1.8s

[CV] END bootstrap=False, max_depth=120, max_features=auto, min_samples_leaf=4, min_samples_split=2, n_estimators=600; total time=   1.7s
[CV] END bootstrap=False, max_depth=120, max_features=auto, min_samples_leaf=4, min_samples_split=3, n_estimators=200; total time=   0.6s
[CV] END bootstrap=False, max_depth=120, max_features=auto, min_samples_leaf=4, min_samples_split=3, n_estimators=300; total time=   0.9s
[CV] END bootstrap=False, max_depth=120, max_features=auto, min_samples_leaf=4, min_samples_split=3, n_estimators=400; total time=   1.1s
[CV] END bootstrap=False, max_depth=120, max_features=auto, min_samples_leaf=4, min_samples_split=3, n_estimators=500; total time=   1.4s
[CV] END bootstrap=False, max_depth=120, max_features=auto, min_samples_leaf=4, min_samples_split=3, n_estimators=600; total time=   1.7s
[CV] END bootstrap=False, max_depth=120, max_features=auto, min_samples_leaf=4, min_samples_split=4, n_estimators=200; total time=   0.6s

[CV] END bootstrap=False, max_depth=130, max_features=auto, min_samples_leaf=2, min_samples_split=2, n_estimators=600; total time=   1.8s
[CV] END bootstrap=False, max_depth=130, max_features=auto, min_samples_leaf=2, min_samples_split=3, n_estimators=300; total time=   0.9s
[CV] END bootstrap=False, max_depth=130, max_features=auto, min_samples_leaf=2, min_samples_split=3, n_estimators=400; total time=   1.1s
[CV] END bootstrap=False, max_depth=130, max_features=auto, min_samples_leaf=2, min_samples_split=3, n_estimators=500; total time=   1.4s
[CV] END bootstrap=False, max_depth=130, max_features=auto, min_samples_leaf=2, min_samples_split=3, n_estimators=600; total time=   1.7s
[CV] END bootstrap=False, max_depth=130, max_features=auto, min_samples_leaf=2, min_samples_split=4, n_estimators=200; total time=   0.6s
[CV] END bootstrap=False, max_depth=130, max_features=auto, min_samples_leaf=2, min_samples_split=4, n_estimators=200; total time=   0.6s

[CV] END bootstrap=False, max_depth=130, max_features=auto, min_samples_leaf=2, min_samples_split=4, n_estimators=500; total time=   1.4s
[CV] END bootstrap=False, max_depth=130, max_features=auto, min_samples_leaf=2, min_samples_split=4, n_estimators=600; total time=   1.8s
[CV] END bootstrap=False, max_depth=130, max_features=auto, min_samples_leaf=2, min_samples_split=5, n_estimators=200; total time=   0.6s
[CV] END bootstrap=False, max_depth=130, max_features=auto, min_samples_leaf=2, min_samples_split=5, n_estimators=300; total time=   0.9s
[CV] END bootstrap=False, max_depth=130, max_features=auto, min_samples_leaf=2, min_samples_split=5, n_estimators=400; total time=   1.1s
[CV] END bootstrap=False, max_depth=130, max_features=auto, min_samples_leaf=2, min_samples_split=5, n_estimators=500; total time=   1.4s
[CV] END bootstrap=False, max_depth=130, max_features=auto, min_samples_leaf=2, min_samples_split=5, n_estimators=600; total time=   1.7s

GridSearchCV(cv=4, estimator=RandomForestClassifier(random_state=42), n_jobs=-1,
             param_grid={'bootstrap': [False],
                         'max_depth': [80, 90, 100, 110, 120, 130],
                         'max_features': ['auto'],
                         'min_samples_leaf': [2, 3, 4, 5],
                         'min_samples_split': [2, 3, 4, 5, 6],
                         'n_estimators': [200, 300, 400, 500, 600]},
             verbose=2)

In [32]:
# Best parameters found by the grid search
grid_search.best_params_

{'bootstrap': False,
 'max_depth': 80,
 'max_features': 'auto',
 'min_samples_leaf': 5,
 'min_samples_split': 2,
 'n_estimators': 500}
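
To give a sense of how much work the search above does, the sketch below counts the candidates in the grid using only the standard library. It also estimates how many of them are actually distinct, under the assumption (worth checking against the scikit-learn version in use) that the tree builder treats `min_samples_split` as at least `2 * min_samples_leaf`. The helper `effective_key` is hypothetical and exists only for this illustration.

```python
from itertools import product

# Grid copied from the GridSearchCV call above.
param_grid = {
    "bootstrap": [False],
    "max_depth": [80, 90, 100, 110, 120, 130],
    "max_features": ["auto"],
    "min_samples_leaf": [2, 3, 4, 5],
    "min_samples_split": [2, 3, 4, 5, 6],
    "n_estimators": [200, 300, 400, 500, 600],
}
cv_folds = 4  # cv=4 in the call above

# Every combination of values is one candidate model.
candidates = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]
n_candidates = len(candidates)
n_fits = n_candidates * cv_folds  # one fit per candidate per fold

def effective_key(params):
    """Hypothetical helper: collapse settings that would build the same trees.

    Assumption: a split needs at least 2 * min_samples_leaf samples, so a
    smaller min_samples_split has no additional effect.
    """
    split = max(params["min_samples_split"], 2 * params["min_samples_leaf"])
    return (params["max_depth"], params["min_samples_leaf"], split, params["n_estimators"])

n_distinct = len({effective_key(p) for p in candidates})

print(f"candidates: {n_candidates}, fits: {n_fits}")
print(f"distinct under the assumption: {n_distinct}")
```

If the assumption holds, many candidates are ties, which may be why the search reports the smallest `min_samples_split` alongside `min_samples_leaf=5`. A narrower grid would likely reach the same result in less time.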

In [33]:
# Evaluate the accuracy of the best estimator on the test set

best_grid = grid_search.best_estimator_
grid_accuracy = evaluate(best_grid, X_test, y_test)

Model Performance
Accuracy: 83.799


In [34]:
print('Improvement after Hyper-Parameter tuning: {:0.2f}%'.format(100 * (grid_accuracy - base_score)))

Improvement after Hyper-Parameter tuning: 0.00%
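
The figure printed above is a difference in accuracy scaled by 100, which reads as percentage points only if both scores are fractions in [0, 1]. Whether `evaluate` returns a fraction or a percentage is not visible here, so the sketch below states that assumption explicitly. The function `improvement` and the second pair of scores are hypothetical and serve only to show how percentage points differ from relative improvement.

```python
def improvement(base, tuned):
    """Return (percentage-point change, relative change in %) for fractional scores."""
    if not (0.0 <= base <= 1.0 and 0.0 <= tuned <= 1.0):
        raise ValueError("scores are expected as fractions in [0, 1]")
    points = 100 * (tuned - base)
    relative = 100 * (tuned - base) / base if base else float("nan")
    return points, relative

# Test accuracy reported above (83.799%), used for both scores because the
# notebook reports no change after tuning.
same = improvement(0.83799, 0.83799)

# Hypothetical scores, only to show how the two measures differ.
example = improvement(0.80, 0.84)

print("points: {:0.2f}, relative: {:0.2f}%".format(*same))
print("points: {:0.2f}, relative: {:0.2f}%".format(*example))
```

If the scores were already percentages, the extra factor of 100 would have to be dropped.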


**Observations:** After hyperparameter tuning with GridSearchCV, the test accuracy is still the same as `base_score`.

## Apply AdaBoost Classifier 

In [39]:
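# Base learner: a default RandomForest (not the tuned best_grid estimator)
rf = RandomForestClassifier(random_state=42)
# Note: scikit-learn >= 1.2 names this argument `estimator` instead of `base_estimator`
abc = AdaBoostClassifier(n_estimators=50, base_estimator=rf, learning_rate=1)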

In [40]:
# Train the AdaBoost classifier
model = abc.fit(X_train, y_train)

# Predict the response for the test dataset
y_pred = model.predict(X_test)

In [41]:
# Model Accuracy, how often is the classifier correct?
print("Accuracy:", accuracy_score(y_test, y_pred))

Accuracy: 0.8379888268156425
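
This accuracy (about 83.799%) matches the tuned random forest's test accuracy. One possible explanation is that a strong base learner leaves very little weighted training error for boosting to correct. The sketch below shows one round of the classic discrete AdaBoost update for a binary problem. It is a simplified illustration, not necessarily the variant the scikit-learn version used here runs by default, and `adaboost_round` is a hypothetical helper.

```python
import math

def adaboost_round(weights, misclassified, learning_rate=1.0):
    """One round of discrete (binary) AdaBoost reweighting.

    weights: current sample weights, summing to 1
    misclassified: booleans, True where the base learner was wrong
    Returns (alpha, new_weights); alpha is None if there is no useful update.
    """
    err = sum(w for w, wrong in zip(weights, misclassified) if wrong)
    if err <= 0.0 or err >= 0.5:
        # A perfect learner leaves nothing to reweight, and a learner no
        # better than chance cannot help, so the weights stay unchanged.
        return None, list(weights)
    alpha = learning_rate * math.log((1.0 - err) / err)
    # Raise the weight of misclassified samples, then renormalise.
    raised = [w * math.exp(alpha) if wrong else w
              for w, wrong in zip(weights, misclassified)]
    total = sum(raised)
    return alpha, [w / total for w in raised]

# Five equally weighted samples, the last one misclassified.
weights = [0.2] * 5
alpha, new_weights = adaboost_round(weights, [False, False, False, False, True])
print(alpha, new_weights)

# A base learner with no training errors produces no update at all.
perfect_alpha, _ = adaboost_round(weights, [False] * 5)
print(perfect_alpha)
```

In the first call the single misclassified sample ends up carrying half of the total weight, which is what pushes the next learner to focus on it. When the base learner already fits the training data almost perfectly, there is little left to reweight, so boosting a full random forest may add cost without changing predictions much.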
