# Testing the model

Using your solution so far, test the model on new data.

The new data is located in ‘Bank-data-testing.csv’.

Good luck!

## Import the relevant libraries

In [342]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
import warnings
import seaborn
warnings.filterwarnings("ignore")
seaborn.set()

## Load the data

Load the ‘Bank-data.csv’ dataset.

In [343]:
bank_data = pd.read_csv('Bank-data.csv')

In [344]:
bank_data.head()

Unnamed: 0.1,Unnamed: 0,interest_rate,credit,march,may,previous,duration,y
0,0,1.334,0.0,1.0,0.0,0.0,117.0,no
1,1,0.767,0.0,0.0,2.0,1.0,274.0,yes
2,2,4.858,0.0,1.0,0.0,0.0,167.0,no
3,3,4.12,0.0,0.0,0.0,0.0,686.0,yes
4,4,4.856,0.0,1.0,0.0,0.0,157.0,no


In [345]:
bank_data['y'] = bank_data['y'].map({'yes':1,'no':0})
bank_data = bank_data.drop(['Unnamed: 0'],axis=1)

### Declare the dependent and independent variables

Use 'duration' as the independent variable.

In [346]:
x1 = bank_data['duration']
y = bank_data['y']

### Simple Logistic Regression

Run the regression and graph the scatter plot.

In [347]:
x = sm.add_constant(x1)
reg_log = sm.Logit(y,x)
res = reg_log.fit()

Optimization terminated successfully.
         Current function value: 0.546118
         Iterations 7


## Expand the model

The simple logistic model may omit many causal factors, so we switch to a multivariate logistic regression model. Add the ‘interest_rate’, ‘credit’, ‘march’, ‘may’ and ‘previous’ estimators to the model alongside ‘duration’ and run the regression again.

### Declare the independent variable(s)

In [348]:
estimators=['interest_rate','credit','march','may','previous','duration']
X1 = bank_data[estimators]
y = bank_data['y']

In [349]:
X = sm.add_constant(X1)
Reg_Log = sm.Logit(y,X)
Res = Reg_Log.fit()

Optimization terminated successfully.
         Current function value: 0.335942
         Iterations 7
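
To our understanding, the `Current function value` that statsmodels prints is the average negative log-likelihood per observation, so lower is better. The drop from 0.546118 to 0.335942 therefore suggests that the expanded model fits the training data more closely. The sketch below illustrates that quantity in plain Python. The helper `mean_neg_log_likelihood` and the toy values are assumptions made for illustration and are not the statsmodels implementation.

```python
import math

def mean_neg_log_likelihood(actual, predicted_prob):
    """Average negative log-likelihood of binary outcomes (hypothetical helper)."""
    total = 0.0
    for y_i, p_i in zip(actual, predicted_prob):
        # Each observation contributes log(p) if y = 1 and log(1 - p) if y = 0
        total += y_i * math.log(p_i) + (1 - y_i) * math.log(1 - p_i)
    return -total / len(actual)

# Toy outcomes and two sets of predicted probabilities (illustrative values only)
actual = [0, 0, 1, 1]
uninformative = [0.5, 0.5, 0.5, 0.5]   # always predicts 50%
informative = [0.1, 0.2, 0.8, 0.9]     # leans toward the correct class

loss_uninformative = mean_neg_log_likelihood(actual, uninformative)
loss_informative = mean_neg_log_likelihood(actual, informative)
print(loss_uninformative)  # equals log(2), about 0.693
print(loss_informative)    # smaller, because the predictions are better
```

A model that always predicts 50% scores log(2), which gives a rough baseline against which the two reported values can be read.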


### Confusion Matrix

Find the confusion matrix of the model and estimate its accuracy. 

<i> For convenience we have already provided you with a function that finds the confusion matrix and the model accuracy.</i>

In [350]:
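def confusion_Matrix_Accuracy(xData, actual_values, model):
    # Predicted probabilities from the fitted model
    pred_values = model.predict(xData)
    # Bin edges: [0, 0.5) counts as class 0, [0.5, 1] as class 1
    bins = np.array([0, 0.5, 1])
    # Rows are actual classes, columns are predicted classes
    cm = np.histogram2d(actual_values, pred_values, bins=bins)[0]
    # Accuracy is the share of observations on the main diagonal
    accuracy = (cm[0, 0] + cm[1, 1]) / cm.sum()
    return cm, accuracy

# Call the function once and unpack both results
cm_train, accuracy_train = confusion_Matrix_Accuracy(X, y, Res)
accuracy_train = (accuracy_train * 100).round(3)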
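
The function relies on `np.histogram2d` to count how many observations fall into each combination of actual class and predicted class. A minimal sketch on toy values (made up for illustration, not taken from the bank data) shows how the 0.5 threshold turns probabilities into a 2x2 table:

```python
import numpy as np

# Toy labels and predicted probabilities (illustrative values only)
actual = np.array([0, 0, 1, 1, 1])
predicted_prob = np.array([0.1, 0.7, 0.4, 0.8, 0.9])

# One set of edges is shared by both axes: [0, 0.5) -> class 0, [0.5, 1] -> class 1
bins = np.array([0, 0.5, 1])
cm_toy = np.histogram2d(actual, predicted_prob, bins=bins)[0]

# Rows are actual classes, columns are predicted classes
print(cm_toy)
# [[1. 1.]
#  [1. 2.]]

accuracy_toy = (cm_toy[0, 0] + cm_toy[1, 1]) / cm_toy.sum()
print(accuracy_toy)  # 0.6
```

Note that a predicted probability of exactly 0.5 lands in the upper bin and is therefore counted as class 1.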


## Test the model

Load the test data from the ‘Bank-data-testing.csv’ file provided. (Remember to convert the outcome variable ‘y’ into binary 0/1 values, as was done for the training data.)

### Load new data 

In [351]:
test_data = pd.read_csv('Bank-data-testing.csv')
test_data.head()

Unnamed: 0.1,Unnamed: 0,interest_rate,credit,march,may,previous,duration,y
0,0,1.313,0.0,1.0,0.0,0.0,487.0,no
1,1,4.961,0.0,0.0,0.0,0.0,132.0,no
2,2,4.856,0.0,1.0,0.0,0.0,92.0,no
3,3,4.12,0.0,0.0,0.0,0.0,1468.0,yes
4,4,4.963,0.0,0.0,0.0,0.0,36.0,no


In [352]:
test_data = test_data.drop(['Unnamed: 0'],axis=1)
test_data['y'] = test_data['y'].map({'yes':1,'no':0})
test_data

Unnamed: 0,interest_rate,credit,march,may,previous,duration,y
0,1.313,0.0,1.0,0.0,0.0,487.0,0
1,4.961,0.0,0.0,0.0,0.0,132.0,0
2,4.856,0.0,1.0,0.0,0.0,92.0,0
3,4.120,0.0,0.0,0.0,0.0,1468.0,1
4,4.963,0.0,0.0,0.0,0.0,36.0,0
...,...,...,...,...,...,...,...
217,4.963,0.0,0.0,0.0,0.0,458.0,1
218,1.264,0.0,1.0,1.0,0.0,397.0,1
219,1.281,0.0,1.0,0.0,0.0,34.0,0
220,0.739,0.0,0.0,2.0,0.0,233.0,0


### Declare the dependent and the independent variables

In [353]:
x_test1 = test_data[estimators]
y_test = test_data['y']
x_test1

Unnamed: 0,interest_rate,credit,march,may,previous,duration
0,1.313,0.0,1.0,0.0,0.0,487.0
1,4.961,0.0,0.0,0.0,0.0,132.0
2,4.856,0.0,1.0,0.0,0.0,92.0
3,4.120,0.0,0.0,0.0,0.0,1468.0
4,4.963,0.0,0.0,0.0,0.0,36.0
...,...,...,...,...,...,...
217,4.963,0.0,0.0,0.0,0.0,458.0
218,1.264,0.0,1.0,1.0,0.0,397.0
219,1.281,0.0,1.0,0.0,0.0,34.0
220,0.739,0.0,0.0,2.0,0.0,233.0


In [354]:
X_test = sm.add_constant(x_test1)

Determine the test confusion matrix and the test accuracy and compare them with the train confusion matrix and the train accuracy.

In [355]:
cm_test, accuracy_test = confusion_Matrix_Accuracy(X_test, y_test, Res)
accuracy_test = (accuracy_test * 100).round(3)

In [356]:
print("Train accuracy : ", accuracy_train, "\n")
print("Test accuracy : ", accuracy_test, "\n")
print("Confusion Matrix for train data : ", "\n", cm_train, "\n")
print("Confusion Matrix for test data : ", "\n", cm_test, "\n")

Train accuracy :  86.486 

Test accuracy :  86.937 

Confusion Matrix for train data :  
 [[220.  39.]
 [ 31. 228.]] 

Confusion Matrix for test data :  
 [[94. 17.]
 [12. 99.]] 
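
Accuracy alone hides how the errors are split between the two classes. As a rough sketch, the counts printed above can be turned into sensitivity and specificity as well. The helper `summarize` is a hypothetical name introduced here, and it assumes that rows are actual classes and columns are predicted classes, as in the function provided earlier.

```python
def summarize(cm):
    """Accuracy, sensitivity and specificity from a 2x2 confusion matrix.

    Hypothetical helper; rows are actual classes, columns are predicted classes.
    """
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "sensitivity": tp / (tp + fn),   # share of actual 1s predicted as 1
        "specificity": tn / (tn + fp),   # share of actual 0s predicted as 0
    }

# Counts copied from the confusion matrices printed above
cm_train_counts = [[220, 39], [31, 228]]
cm_test_counts = [[94, 17], [12, 99]]

train_summary = summarize(cm_train_counts)
test_summary = summarize(cm_test_counts)
for name, summary in (("train", train_summary), ("test", test_summary)):
    print(name, {k: round(v * 100, 3) for k, v in summary.items()})
```

Comparing these rates between the two datasets gives a slightly finer check than comparing the two accuracy figures alone.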



### Conclusion

The test accuracy (86.937%) is approximately the same as the training accuracy (86.486%). This suggests that the model generalizes well to new data and shows no sign of overfitting.