# Introduction to Neural Networks and Deep Learning course.

# Given a bank customer, can we build a neural-network classifier that predicts whether they will leave the bank?

The points for this case are distributed as follows:
1. Read the dataset
2. Drop the columns which are unique for all users like IDs (2.5 points)
3. Distinguish the feature and target set (2.5 points)
4. Divide the data set into train and test sets
5. Normalize the train and test data (2.5 points)
6. Initialize & build the model (10 points)
7. Optimize the model (5 points)
9. Predict the results using 0.5 as a threshold (5 points) 
10. Print the Accuracy score and confusion matrix (2.5 points)

In [7]:

import numpy as np
import pandas as pd
import tensorflow as tf

# 1. Read the dataset

In [9]:
bank_data=pd.read_csv("bank.csv")

In [10]:
bank_data.head()

,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,1,15634602,Hargrave,619,France,Female,42,2,0.0,1,1,1,101348.88,1
1,2,15647311,Hill,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,3,15619304,Onio,502,France,Female,42,8,159660.8,3,1,0,113931.57,1
3,4,15701354,Boni,699,France,Female,39,1,0.0,2,0,0,93826.63,0
4,5,15737888,Mitchell,850,Spain,Female,43,2,125510.82,1,1,1,79084.1,0


In [11]:
bank_data.shape

(10000, 14)

In [12]:
bank_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
RowNumber          10000 non-null int64
CustomerId         10000 non-null int64
Surname            10000 non-null object
CreditScore        10000 non-null int64
Geography          10000 non-null object
Gender             10000 non-null object
Age                10000 non-null int64
Tenure             10000 non-null int64
Balance            10000 non-null float64
NumOfProducts      10000 non-null int64
HasCrCard          10000 non-null int64
IsActiveMember     10000 non-null int64
EstimatedSalary    10000 non-null float64
Exited             10000 non-null int64
dtypes: float64(2), int64(9), object(3)
memory usage: 1.1+ MB


# Observations:

The dataset has 10000 rows and 14 columns (13 candidate features plus the target).

Some features, such as RowNumber, CustomerId and Surname, will not be useful for prediction: they identify individual customers rather than describing characteristics relevant to whether a customer leaves. These features can be dropped.

Some features (Surname, Geography, Gender) have the object dtype. They should be converted to categories or label-encoded before modelling.

The target feature, Exited, is binary (0 or 1).

The features are measured on very different scales, so they should be normalized before modelling.

# 2. Drop the columns which are unique for all users like IDs (2.5 points)

In [13]:
bank_data.drop(['RowNumber', 'CustomerId', 'Surname'], axis = 1, inplace = True)

In [14]:
print ("Shape of the data: ",bank_data.shape)
bank_data.head()

Shape of the data:  (10000, 11)


,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,619,France,Female,42,2,0.0,1,1,1,101348.88,1
1,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,502,France,Female,42,8,159660.8,3,1,0,113931.57,1
3,699,France,Female,39,1,0.0,2,0,0,93826.63,0
4,850,Spain,Female,43,2,125510.82,1,1,1,79084.1,0


In [15]:
# Let's convert Geography and Gender to the pandas categorical dtype.
bank_data["Geography"] = bank_data["Geography"].astype('category')
bank_data["Gender"] = bank_data["Gender"].astype('category')

In [16]:
bank_data["Geography"] = bank_data["Geography"].cat.codes
bank_data["Gender"] = bank_data["Gender"].cat.codes

In [17]:
bank_data.head()

,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,619,0,0,42,2,0.0,1,1,1,101348.88,1
1,608,2,0,41,1,83807.86,1,0,1,112542.58,0
2,502,0,0,42,8,159660.8,3,1,0,113931.57,1
3,699,0,0,39,1,0.0,2,0,0,93826.63,0
4,850,2,0,43,2,125510.82,1,1,1,79084.1,0


In [18]:
bank_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 11 columns):
CreditScore        10000 non-null int64
Geography          10000 non-null int8
Gender             10000 non-null int8
Age                10000 non-null int64
Tenure             10000 non-null int64
Balance            10000 non-null float64
NumOfProducts      10000 non-null int64
HasCrCard          10000 non-null int64
IsActiveMember     10000 non-null int64
EstimatedSalary    10000 non-null float64
Exited             10000 non-null int64
dtypes: float64(2), int64(7), int8(2)
memory usage: 722.7 KB


# Observations:

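After dropping the identifier columns, we are left with 10 features and the target (11 columns).

We have also label-encoded Gender and Geography as integer category codes. The codes follow the alphabetical order of the categories (e.g. France=0 and Spain=2; Female=0 and Male=1).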
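A minimal sketch on toy data (not the bank dataset) of how `astype('category')` followed by `.cat.codes` assigns the integer codes: pandas sorts the inferred categories, so each code is the position of the value in alphabetical order.

```python
import pandas as pd

# Toy frame with the same categorical values as the bank data (illustrative only)
df = pd.DataFrame({
    "Geography": ["France", "Spain", "Germany", "France"],
    "Gender": ["Female", "Male", "Female", "Male"],
})

for col in ["Geography", "Gender"]:
    df[col] = df[col].astype("category")
    # Inferred categories are sorted, so codes follow alphabetical order
    print(col, dict(enumerate(df[col].cat.categories)))
    df[col] = df[col].cat.codes

print(df)
```

Because these codes are ordinal integers, the network may treat Geography as ordered (France < Germany < Spain); one-hot encoding, for example with `pd.get_dummies`, is a common alternative when no such order exists.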

# 3. Distinguish the feature and the target set (2.5 points)

In [19]:
# Every column except the last is a feature; the last column (Exited) is the target
X = bank_data.iloc[:,:-1]
y = bank_data.iloc[:,-1]


In [20]:
X.head()

,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary
0,619,0,0,42,2,0.0,1,1,1,101348.88
1,608,2,0,41,1,83807.86,1,0,1,112542.58
2,502,0,0,42,8,159660.8,3,1,0,113931.57
3,699,0,0,39,1,0.0,2,0,0,93826.63
4,850,2,0,43,2,125510.82,1,1,1,79084.1


In [21]:
y.head()

0    1
1    0
2    1
3    0
4    0
Name: Exited, dtype: int64

# Observations:

We have split the data into 10 features (X) and the target (y).


# 4. Divide the data set into Train and test sets.

In [24]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 7)

In [25]:
print ("Shape: ", X_train.shape)
X_train.head()

Shape:  (8000, 10)


,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary
4989,629,0,1,40,9,0.0,1,1,0,106.67
4498,570,0,1,30,2,131406.56,1,1,1,47952.45
8876,556,2,1,46,3,131764.96,1,1,1,108500.66
670,624,1,0,48,3,122388.38,2,0,0,30020.09
9552,664,0,0,41,5,0.0,1,1,1,152054.33


In [26]:
print ("Shape: ", X_test.shape)
X_test.head()

Shape:  (2000, 10)


,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary
1977,717,2,1,35,1,0.0,3,0,0,174770.14
3880,677,0,0,72,8,0.0,2,1,1,153604.44
52,788,0,0,33,5,0.0,2,0,0,116978.19
2551,537,0,0,53,3,0.0,1,1,1,91406.62
2246,717,2,0,39,6,0.0,2,1,0,93275.61


In [27]:
print ("Shape: ", y_train.shape)
y_train.head()

Shape:  (8000,)


4989    0
4498    0
8876    1
670     0
9552    0
Name: Exited, dtype: int64

In [28]:
print ("Shape: ", y_test.shape)
y_test.head()

Shape:  (2000,)


1977    1
3880    0
52      0
2551    0
2246    0
Name: Exited, dtype: int64

In [29]:
print ("Unique train labels: ", np.unique(y_train))

Unique train labels:  [0 1]


In [30]:
print ("Unique test labels: ", np.unique(y_test))

Unique test labels:  [0 1]


# Observations:

* Split the data into train (80%) and test (20%) sets: 8000 records with 10 features in the training set and 2000 records with 10 features in the test set.
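The split above is purely random. Because the classes are imbalanced (411 of the 2000 test customers exited), one option is to pass `stratify` to `train_test_split` so both sets keep the same class ratio. This sketch uses synthetic labels, not the bank data, and is an alternative to what the notebook does rather than a change to it.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels: 80 stayers (0) and 20 leavers (1)
X_demo = np.arange(100).reshape(-1, 1)
y_demo = np.array([0] * 80 + [1] * 20)

# stratify=y_demo keeps the 80/20 class ratio in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=7, stratify=y_demo
)

print("train leaver share:", y_tr.mean(), "test leaver share:", y_te.mean())
```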

# 5. Normalize the train and test data (2.5 points)

In [31]:
from scipy import stats

X_train_std = stats.zscore(X_train) 
X_test_std = stats.zscore(X_test)

In [32]:
y_train_cat = tf.keras.utils.to_categorical(y_train)
y_test_cat = tf.keras.utils.to_categorical(y_test)

In [33]:
y_train_cat[:5]

array([[1., 0.],
       [1., 0.],
       [0., 1.],
       [1., 0.],
       [1., 0.]], dtype=float32)
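For 0/1 labels, `to_categorical` produces the rows of a 2x2 identity matrix, as the output above shows. A NumPy-only sketch of the same transformation, using the first five training labels, and of how `argmax` undoes it:

```python
import numpy as np

# First five training labels from the output above
y = np.array([0, 0, 1, 0, 0])

# Row i of the 2x2 identity matrix is the one-hot vector for class i
y_onehot = np.eye(2, dtype="float32")[y]
print(y_onehot)

# argmax along each row recovers the original integer labels
y_back = y_onehot.argmax(axis=1)
print(y_back)
```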

# Observations:

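As the features have very different scales, normalizing them should yield better results. We used `stats.zscore` to standardize each feature (zero mean, unit variance) and converted both the train and test labels into one-hot vectors. Note that `stats.zscore` standardizes each set with its own mean and standard deviation; strictly, the test set should be scaled with the statistics computed on the training set so that no information from the test set leaks into preprocessing.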
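A minimal sketch of the leakage-free alternative, using scikit-learn's `StandardScaler` on toy single-feature data: the mean and standard deviation are fitted on the training set only and then applied to both sets. Like `stats.zscore` with its default `ddof=0`, `StandardScaler` uses the population standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy single-feature data (illustrative only)
X_train_demo = np.array([[0.0], [2.0], [4.0]])
X_test_demo = np.array([[2.0], [6.0]])

# Fit the mean and standard deviation on the training set only...
scaler = StandardScaler().fit(X_train_demo)

# ...then apply those same statistics to both sets
X_train_scaled = scaler.transform(X_train_demo)
X_test_scaled = scaler.transform(X_test_demo)

print("train mean:", scaler.mean_, "train std:", scaler.scale_)
print("scaled test values:", X_test_scaled.ravel())
```

In the notebook this would amount to fitting the scaler on `X_train` and transforming both `X_train` and `X_test` with it.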

# 6. Initialize & build the model (10 points)

In [37]:
# Build a neural network in Keras with a binary cross-entropy loss and the SGD optimizer.
# The output layer has 2 sigmoid units, one per column of the one-hot target.

# Initialize the Sequential model
model1 = tf.keras.models.Sequential()

# Hidden layer: 10 ReLU units; input_dim=10 matches the 10 features
model1.add(tf.keras.layers.Dense(10, input_dim = 10, activation='relu'))

# Output layer: 2 units with sigmoid activation
model1.add(tf.keras.layers.Dense(2, activation='sigmoid'))

# Compile the model
model1.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])

# Execute the model

In [47]:
model1.fit(X_train_std, y_train_cat, 
          validation_data=(X_test_std, y_test_cat), 
          epochs=30,
          batch_size=35)

Train on 8000 samples, validate on 2000 samples
Epoch 1/30
Epoch 30/30


<tensorflow.python.keras.callbacks.History at 0x236cc5c4630>

In [49]:
model1.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_4 (Dense)              (None, 10)                110       
_________________________________________________________________
dense_5 (Dense)              (None, 2)                 22        
Total params: 132
Trainable params: 132
Non-trainable params: 0
_________________________________________________________________
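The `Param #` column follows from the layer sizes: a `Dense` layer has one weight per input for each unit plus one bias per unit, and `Dropout` layers add no parameters. A small worked check:

```python
def dense_params(n_inputs, n_units):
    # Each unit has one weight per input plus one bias
    return (n_inputs + 1) * n_units


def model_params(layer_sizes):
    # layer_sizes = [n_features, units_1, units_2, ...]; Dropout adds no parameters
    return sum(dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))


print(dense_params(10, 10), dense_params(10, 2))  # 110 22
print(model_params([10, 10, 2]))                   # 132
```

The same arithmetic gives 330 + 930 + 62 = 1,322 parameters for a 10-30-30-2 network.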


# Observations:

Since this is a binary classification problem, we use binary cross-entropy as the loss and sigmoid activation in the output layer.

We started with ReLU activation in the hidden layer; the activation function was then selected using grid search.

Similarly, we started with the SGD optimizer; the best optimizer is selected using grid search in the next step.

The accuracy of this baseline model is around 84%.
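Binary cross-entropy compares each target value y (0 or 1) with the predicted probability p as -(y log p + (1 - y) log(1 - p)) and averages over the outputs. A NumPy sketch of that formula follows; the clipping constant `eps=1e-7` is assumed here to mirror the Keras default fuzz factor, and this is an illustration rather than the Keras implementation itself.

```python
import numpy as np


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # Clip probabilities so log(0) never occurs
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    y = np.asarray(y_true, dtype=float)
    # Per-element loss, averaged over all elements
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))


# One-hot target [0, 1] scored against two sigmoid outputs, as in model1
print(binary_crossentropy([0, 1], [0.13, 0.87]))
```

With the one-hot target and two sigmoid units, each unit is effectively its own binary classifier and the loss is the mean over both.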


# 7. Optimize the model (5 points)

In [50]:
from sklearn.model_selection import GridSearchCV
# Standalone Keras 2.x imports; the KerasClassifier scikit-learn wrapper was
# later removed from Keras (the SciKeras package provides a replacement)
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.wrappers.scikit_learn import KerasClassifier
from keras.optimizers import Nadam

Using TensorFlow backend.


**Let's first find the best optimizer among 'SGD', 'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax' and 'Nadam'.**

In [53]:

# Function to create the model, required for KerasClassifier
def create_model(optimizer='adam'):
    # Initialize the Sequential model
    model2 = Sequential()

    # Hidden layer: 10 ReLU units fed by the 10 features
    model2.add(Dense(10, input_dim = 10, activation='relu'))

    # Output layer: 1 unit with sigmoid activation (probability that Exited = 1)
    model2.add(Dense(1, activation='sigmoid'))

    # Compile the model with the optimizer chosen by the grid search
    model2.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])

    return model2

model2 = KerasClassifier(build_fn=create_model, epochs=30, batch_size=35, verbose=0)


# define the grid search parameters
optimizer = ['SGD', 'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam']
param_grid = dict(optimizer=optimizer)

grid = GridSearchCV(estimator=model2, param_grid=param_grid, n_jobs=-1, scoring="accuracy", cv=2)
grid_result = grid.fit(X_train_std, y_train)

# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']

for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))



Best: 0.854375 using {'optimizer': 'Nadam'}
0.816875 (0.004875) with: {'optimizer': 'SGD'}
0.852000 (0.002500) with: {'optimizer': 'RMSprop'}
0.817000 (0.002750) with: {'optimizer': 'Adagrad'}
0.850000 (0.004000) with: {'optimizer': 'Adadelta'}
0.853500 (0.007500) with: {'optimizer': 'Adam'}
0.842625 (0.003875) with: {'optimizer': 'Adamax'}
0.854375 (0.005375) with: {'optimizer': 'Nadam'}


# Observations:

The best optimizer is Nadam, with a mean cross-validated accuracy of 85.44%.

This is about 1 percentage point higher than the baseline model.

Note: scikit-learn and Keras represent class labels differently, so we do not use the one-hot (categorical) target with this grid search. With the one-hot target, scikit-learn's accuracy scoring fails with "ValueError: Classification metrics can't handle a mix of multilabel-indicator and binary targets". We therefore pass the original 0/1 target (`y_train`) to GridSearchCV.
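The mismatch in the note can be reproduced with scikit-learn alone on toy labels (hypothetical values, not the bank data): `accuracy_score` rejects a binary target paired with one-hot predictions, and collapsing the one-hot rows with `argmax` resolves it.

```python
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([0, 1, 1, 0])                             # binary 0/1 labels
y_pred_onehot = np.array([[1, 0], [0, 1], [1, 0], [1, 0]])  # one-hot predictions

raised = False
try:
    # Mixing a binary target with a one-hot (multilabel-indicator) one is rejected
    accuracy_score(y_true, y_pred_onehot)
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Collapsing the one-hot rows back to class indices removes the mismatch
acc = accuracy_score(y_true, y_pred_onehot.argmax(axis=1))
print(acc)  # 0.75
```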

# Best learning rate

In [55]:
# Tune Learning Rate
from keras.optimizers import Nadam

# Function to create the model, required for KerasClassifier
def create_model(learn_rate=0.01):
    # Initialize the Sequential model
    model4 = Sequential()
    # Hidden layer: 10 ReLU units fed by the 10 features
    model4.add(Dense(10, input_dim = 10, activation='relu'))
    # Output layer: 2 sigmoid units, matching the one-hot target y_train_cat
    model4.add(Dense(2, activation='sigmoid'))
    # Compile the model with Nadam at the learning rate under test
    optimizer = Nadam(lr=learn_rate)
    model4.compile(optimizer = optimizer, loss = 'binary_crossentropy', metrics = ['accuracy'])
    return model4

# create model
model4 = KerasClassifier(build_fn=create_model, epochs=30, batch_size=30, verbose=0)

# define the grid search parameters
learn_rate = [0.001, 0.01, 0.1, 0.2, 0.3]
param_grid = dict(learn_rate=learn_rate)

grid = GridSearchCV(estimator=model4, param_grid=param_grid, n_jobs=1, cv=2)
grid_result = grid.fit(X_train_std, y_train_cat)

# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Best: 0.857000 using {'learn_rate': 0.01}
0.847625 (0.002750) with: {'learn_rate': 0.001}
0.857000 (0.004500) with: {'learn_rate': 0.01}
0.849125 (0.004125) with: {'learn_rate': 0.1}
0.831125 (0.011375) with: {'learn_rate': 0.2}
0.800500 (0.002000) with: {'learn_rate': 0.3}


# Observation:

The best learning rate we got is 0.01 and the accuracy is 85.7%.

There is a slight increase in accuracy (from 85.44% to 85.70%).

In [56]:
# Tune Batch Size and Number of Epochs

# Function to create model, required for KerasClassifier
def create_model():
    # Initialize the Sequential model
    model3 = Sequential()

    # Hidden layer: 30 units with softmax activation, fed by the 10 features
    model3.add(Dense(30, input_dim = 10, activation='softmax'))

    # Dropout: randomly zero 20% of the hidden activations during training
    model3.add(Dropout(0.2))

    # Output layer: 1 unit with sigmoid activation
    model3.add(Dense(1, activation='sigmoid'))

    # Compile the model with Nadam at the tuned learning rate
    optimizer = Nadam(lr=0.01)
    model3.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])

    return model3

# create model
model3 = KerasClassifier(build_fn=create_model, verbose=0)

# define the grid search parameters
batch_size = [10, 20, 40, 60, 80, 100]
epochs = [10, 50, 100]
param_grid = dict(batch_size=batch_size, epochs=epochs)

grid = GridSearchCV(estimator=model3, param_grid=param_grid, n_jobs=1, scoring="accuracy", cv=2)
grid_result = grid.fit(X_train_std, y_train)

# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Best: 0.857625 using {'batch_size': 100, 'epochs': 50}
0.856250 (0.002750) with: {'batch_size': 10, 'epochs': 10}
0.849875 (0.001375) with: {'batch_size': 10, 'epochs': 50}
0.851000 (0.007500) with: {'batch_size': 10, 'epochs': 100}
0.854375 (0.004875) with: {'batch_size': 20, 'epochs': 10}
0.852000 (0.004250) with: {'batch_size': 20, 'epochs': 50}
0.852625 (0.003875) with: {'batch_size': 20, 'epochs': 100}
0.855625 (0.003875) with: {'batch_size': 40, 'epochs': 10}
0.855500 (0.007500) with: {'batch_size': 40, 'epochs': 50}
0.851500 (0.005250) with: {'batch_size': 40, 'epochs': 100}
0.856875 (0.004125) with: {'batch_size': 60, 'epochs': 10}
0.855250 (0.005500) with: {'batch_size': 60, 'epochs': 50}
0.856750 (0.005000) with: {'batch_size': 60, 'epochs': 100}
0.855625 (0.003625) with: {'batch_size': 80, 'epochs': 10}
0.857000 (0.006500) with: {'batch_size': 80, 'epochs': 50}
0.852625 (0.004375) with: {'batch_size': 80, 'epochs': 100}
0.854625 (0.003625) with: {'batch_size': 100, 'epochs':

# Observations:

The best combination is a batch size of 100 and 50 epochs, with a mean cross-validated accuracy of 85.76%.
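The cost of this search follows from the grid size: GridSearchCV trains one model per parameter combination per fold. A sketch using scikit-learn's `ParameterGrid`, which enumerates combinations in the same order as the results listed above:

```python
from sklearn.model_selection import ParameterGrid

# Same grid as the batch-size/epochs search above
param_grid = dict(batch_size=[10, 20, 40, 60, 80, 100], epochs=[10, 50, 100])
grid = list(ParameterGrid(param_grid))

cv = 2
print(len(grid), "combinations,", len(grid) * cv, "model fits")  # 18 combinations, 36 model fits
print(grid[0])  # {'batch_size': 10, 'epochs': 10}
```

Since `refit=True` by default, GridSearchCV also retrains the best combination once on the full training set.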

**Now let's build our final model with all the best parameters we have identified.**

# Final Model after tuning...

In [59]:
# Initialize the Sequential model
modelF = Sequential()

# First hidden layer: 30 softmax units fed by the 10 features
modelF.add(Dense(30, input_dim = 10, activation='softmax'))

# Dropout
modelF.add(Dropout(0.2))

# Second hidden layer: 30 softmax units
modelF.add(Dense(30, activation='softmax'))

# Dropout
modelF.add(Dropout(0.2))

# Output layer: 2 sigmoid units, matching the one-hot target
modelF.add(Dense(2, activation='sigmoid'))

# Compile the model with Nadam at the tuned learning rate (0.01)
optimizer = Nadam(lr=0.01)
modelF.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
 
modelF.fit(X_train_std, y_train_cat, 
        validation_data=(X_test_std, y_test_cat), 
        epochs=50,
        batch_size=100)

Train on 8000 samples, validate on 2000 samples
Epoch 1/50
Epoch 50/50


<keras.callbacks.callbacks.History at 0x2378ad4e7b8>

# Review model

In [60]:
modelF.summary()

Model: "sequential_64"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_128 (Dense)            (None, 30)                330       
_________________________________________________________________
dropout_40 (Dropout)         (None, 30)                0         
_________________________________________________________________
dense_129 (Dense)            (None, 30)                930       
_________________________________________________________________
dropout_41 (Dropout)         (None, 30)                0         
_________________________________________________________________
dense_130 (Dense)            (None, 2)                 62        
Total params: 1,322
Trainable params: 1,322
Non-trainable params: 0
_________________________________________________________________


# 9. Predict the results using 0.5 as a threshold (5 points)

In [61]:
# Predict probabilities for the test set; no threshold is applied yet
y_pred = modelF.predict(X_test_std)

In [62]:
print ("Prediction: ", y_pred[:10])

Prediction:  [[0.1295929  0.8699752 ]
 [0.9417225  0.05827078]
 [0.9520912  0.04801086]
 [0.38283974 0.61905515]
 [0.9605429  0.03955212]
 [0.98263466 0.01744935]
 [0.95826375 0.0416472 ]
 [0.98274076 0.01726231]
 [0.9777361  0.02231273]
 [0.07952449 0.9201669 ]]


In [63]:
# Convert the predicted probabilities to class predictions with a 0.5 threshold
y_pred_threshold = (modelF.predict(X_test_std) >= 0.5)

In [64]:
print ("Prediction: ", y_pred_threshold[:10])

Prediction:  [[False  True]
 [ True False]
 [ True False]
 [False  True]
 [ True False]
 [ True False]
 [ True False]
 [ True False]
 [ True False]
 [False  True]]


# Observations:

We have obtained the raw predicted probabilities and the predictions thresholded at 0.5.

Let's check the accuracy score and confusion matrix for both.
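A NumPy sketch of the thresholding step on the first two rows of the predicted probabilities shown above. Because the output layer has two independent sigmoid units, thresholding each column and taking the `argmax` of each row agree on these rows, but could disagree on a row where both outputs are above 0.5 or both are below it.

```python
import numpy as np

# First two rows of the two-column sigmoid output shown above
probs = np.array([[0.1295929, 0.8699752],
                  [0.9417225, 0.05827078]])

# Threshold each column independently at 0.5 (as in y_pred_threshold)
pred_bool = probs >= 0.5

# Single label per customer: threshold the "Exited" column (index 1)
pred_label = (probs[:, 1] >= 0.5).astype(int)

# argmax picks whichever of the two outputs is larger
pred_argmax = probs.argmax(axis=1)

print(pred_bool)
print(pred_label, pred_argmax)
```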


# 10. Print the Accuracy score and confusion matrix (2.5 points)

In [65]:
# Accuracy score: rounding the probabilities acts as a 0.5 threshold
# (np.round maps a value of exactly 0.5 to 0)

from sklearn import metrics
print("Accuracy score for predictions with no specified threshold: ", metrics.accuracy_score(y_test_cat, y_pred.round()))
print("Accuracy score for predictions with specified threshold 0.5: ", metrics.accuracy_score(y_test_cat, y_pred_threshold.astype(int)))

Accuracy score for predictions with no specified threshold:  0.8555
Accuracy score for predictions with specified threshold 0.5:  0.8555


In [66]:
print ("Confusion Matrix for predictions with no specified threshold")
pd.DataFrame(metrics.confusion_matrix(y_test_cat.argmax(axis=1), y_pred.argmax(axis=1)),
                 columns=['pred_neg', 'pred_pos'], index=['neg', 'pos'])

Confusion Matrix for predictions with no specified threshold


,pred_neg,pred_pos
neg,1521,68
pos,221,190


In [67]:
print ("Confusion Matrix for predictions with specified threshold 0.5")
pd.DataFrame(metrics.confusion_matrix(y_test_cat.argmax(axis=1), y_pred_threshold.argmax(axis=1)),
                 columns=['pred_neg', 'pred_pos'], index=['neg', 'pos'])

Confusion Matrix for predictions with specified threshold 0.5


,pred_neg,pred_pos
neg,1521,68
pos,221,190


In [68]:
from sklearn.metrics import classification_report
print ("Classification Report for predictions with no specified threshold")
print(classification_report(y_test_cat, y_pred.round()))

Classification Report for predictions with no specified threshold
              precision    recall  f1-score   support

           0       0.87      0.96      0.91      1589
           1       0.74      0.46      0.57       411

   micro avg       0.86      0.86      0.86      2000
   macro avg       0.80      0.71      0.74      2000
weighted avg       0.85      0.86      0.84      2000
 samples avg       0.86      0.86      0.86      2000



In [69]:
from sklearn.metrics import classification_report
print ("Classification Report for predictions with specified threshold 0.5")
print(classification_report(y_test_cat, y_pred_threshold))

Classification Report for predictions with specified threshold 0.5
              precision    recall  f1-score   support

           0       0.87      0.96      0.91      1589
           1       0.74      0.46      0.57       411

   micro avg       0.86      0.86      0.86      2000
   macro avg       0.80      0.71      0.74      2000
weighted avg       0.85      0.86      0.84      2000
 samples avg       0.86      0.86      0.86      2000



# Observations:

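Rounding the predicted probabilities is equivalent to applying a 0.5 threshold, so the accuracy score, confusion matrix and classification report are identical with and without the explicit threshold. The model reaches 85.55% accuracy on the test set, but its recall for customers who leave (class 1) is only 0.46.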
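The headline numbers in the classification report can be derived directly from the confusion matrix above. A worked sketch:

```python
import numpy as np

# Confusion matrix from the output above: rows = actual (neg, pos), columns = predicted
cm = np.array([[1521, 68],
               [221, 190]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tn + tp) / cm.sum()
precision_1 = tp / (tp + fp)   # of customers predicted to leave, share who actually left
recall_1 = tp / (tp + fn)      # of customers who left, share the model caught
precision_0 = tn / (tn + fn)
recall_0 = tn / (tn + fp)

print(round(accuracy, 4))                           # 0.8555
print(round(precision_1, 2), round(recall_1, 2))    # 0.74 0.46
print(round(precision_0, 2), round(recall_0, 2))    # 0.87 0.96
```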
