<h2 style='color:blue' align="center">Decision Tree Classification</h2>

**We will predict whether a banknote is authentic or fake from four attributes measured on an image of the note.**

The attributes are: <br>
**1 - Variance of the wavelet-transformed image**, <br>
**2 - Curtosis (kurtosis) of the image**, <br>
**3 - Entropy**, and <br>
**4 - Skewness of the image.**

Install pandas and scikit-learn

pip install pandas

pip install scikit-learn

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
dt_df = pd.read_csv("3_bill_authentication.csv")
dt_df.head()

|   | Variance | Skewness | Curtosis | Entropy | Class |
|---|---|---|---|---|---|
| 0 | 3.6216 | 8.6661 | -2.8073 | -0.44699 | 0 |
| 1 | 4.5459 | 8.1674 | -2.4586 | -1.4621 | 0 |
| 2 | 3.866 | -2.6383 | 1.9242 | 0.10645 | 0 |
| 3 | 3.4566 | 9.5228 | -4.0112 | -3.5944 | 0 |
| 4 | 0.32924 | -4.4552 | 4.5718 | -0.9888 | 0 |


**Analyze Shape of Data**

In [3]:
dt_df.shape

(1372, 5)

*This means that our dataset has 1372 records and 5 columns: the four attributes plus the "Class" label.*

**Divide our data into attributes and labels**

In [4]:
dt_X = dt_df.drop('Class', axis=1)
dt_Y = dt_df['Class']

*Here `dt_X` contains every column of the dataset except "Class" and is our attribute set. `dt_Y` contains the values of the "Class" column, which are the corresponding labels.*
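The attribute/label split can be tried on a few rows without the CSV file. The following is a minimal sketch, assuming a hand-built frame (`toy_df`, a hypothetical name) that copies the first three rows printed earlier. It only illustrates that `drop` returns a new frame and leaves the original untouched.

```python
import pandas as pd

# Tiny stand-in frame built from the first rows shown earlier (not the full dataset).
toy_df = pd.DataFrame({
    "Variance": [3.6216, 4.5459, 3.866],
    "Skewness": [8.6661, 8.1674, -2.6383],
    "Curtosis": [-2.8073, -2.4586, 1.9242],
    "Entropy": [-0.44699, -1.4621, 0.10645],
    "Class": [0, 0, 0],
})

toy_X = toy_df.drop("Class", axis=1)  # every column except the label
toy_Y = toy_df["Class"]               # the label column as a Series

print(toy_X.columns.tolist())
print(toy_X.shape, toy_Y.shape)
print("Class" in toy_df.columns)      # the original frame still has the label column
```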

In [5]:
dt_X

|   | Variance | Skewness | Curtosis | Entropy |
|---|---|---|---|---|
| 0 | 3.62160 | 8.66610 | -2.8073 | -0.44699 |
| 1 | 4.54590 | 8.16740 | -2.4586 | -1.46210 |
| 2 | 3.86600 | -2.63830 | 1.9242 | 0.10645 |
| 3 | 3.45660 | 9.52280 | -4.0112 | -3.59440 |
| 4 | 0.32924 | -4.45520 | 4.5718 | -0.98880 |
| ... | ... | ... | ... | ... |
| 1367 | 0.40614 | 1.34920 | -1.4501 | -0.55949 |
| 1368 | -1.38870 | -4.87730 | 6.4774 | 0.34179 |
| 1369 | -3.75030 | -13.45860 | 17.5932 | -2.77710 |
| 1370 | -3.56370 | -8.38270 | 12.3930 | -1.28230 |


In [6]:
dt_Y

0       0
1       0
2       0
3       0
4       0
       ..
1367    1
1368    1
1369    1
1370    1
1371    1
Name: Class, Length: 1372, dtype: int64

**Divide our data into training and test sets.** <br>
*Hold out 20% of the data as the test set and use the remaining 80% for training.*

In [7]:
from sklearn.model_selection import train_test_split
dt_X_train, dt_X_test, dt_Y_train, dt_Y_test = train_test_split(dt_X, dt_Y, test_size=0.20)
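The call above shuffles the rows differently on every run, so the metrics reported later will vary slightly between runs. A possible variation, sketched here on synthetic stand-in data (`demo_X` and `demo_Y` are hypothetical names, not part of the notebook), passes `random_state` to fix the shuffle and `stratify` to keep the class ratio the same in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 4 features, balanced binary labels.
rng = np.random.default_rng(0)
demo_X = rng.normal(size=(100, 4))
demo_Y = np.array([0] * 50 + [1] * 50)

# random_state fixes the shuffle; stratify keeps the class ratio in both parts.
X_tr, X_te, Y_tr, Y_te = train_test_split(
    demo_X, demo_Y, test_size=0.20, random_state=42, stratify=demo_Y
)

print(X_tr.shape, X_te.shape)  # 80 training rows, 20 test rows
print(np.bincount(Y_te))       # class counts in the test part
```

With `test_size=0.20`, scikit-learn rounds the test share up, which is why 1372 records yield a test set of 275 rows.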

**Train the decision tree classifier on the training data.**

In [8]:
from sklearn.tree import DecisionTreeClassifier
dt_classifier = DecisionTreeClassifier()
dt_classifier.fit(dt_X_train, dt_Y_train)

DecisionTreeClassifier()
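Once a tree is fitted, its learned rules can be inspected. The sketch below is an illustration on synthetic data rather than the banknote set: the label is constructed to depend only on the first feature, so the fitted tree should need a single split. The names `demo_tree`, `demo_X` and `demo_Y` are assumptions made for this example, and the banknote column names are reused purely as labels.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in: the label depends only on the first feature.
rng = np.random.default_rng(0)
demo_X = rng.normal(size=(200, 4))
demo_Y = (demo_X[:, 0] > 0).astype(int)

demo_tree = DecisionTreeClassifier(random_state=0)
demo_tree.fit(demo_X, demo_Y)

names = ["Variance", "Skewness", "Curtosis", "Entropy"]
print(export_text(demo_tree, feature_names=names))        # text view of the split rules
print(dict(zip(names, demo_tree.feature_importances_)))   # share of impurity reduction per feature
```

The same two calls could be applied to `dt_classifier` to see which attributes drive its decisions.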

**Make predictions on the test data.**

In [9]:
dt_Y_pred = dt_classifier.predict(dt_X_test)

In [10]:
dt_Y_pred

array([0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1,
       1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1,
       0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1,
       1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0,
       1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0,
       1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1,
       1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1,
       0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1,
       0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1,
       1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1,
       0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0,
       1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0,
       0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1])

In [11]:
dt_Y_test

39      0
4       0
446     0
259     0
52      0
       ..
1163    1
276     0
939     1
1118    1
1359    1
Name: Class, Length: 275, dtype: int64
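Reading the two outputs side by side is hard because the predictions are a plain array while the actual labels keep their shuffled row index. One way to line them up is sketched below, using small hypothetical stand-ins (`y_actual`, `y_predicted`) rather than the real 275 values.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for dt_Y_test (a Series with a shuffled index)
# and dt_Y_pred (a plain array in the same row order).
y_actual = pd.Series([0, 0, 1, 1, 0], index=[39, 4, 1163, 939, 276], name="Class")
y_predicted = np.array([0, 1, 1, 1, 0])

# Put both in one frame that keeps the original row index.
comparison = pd.DataFrame(
    {"Actual": y_actual, "Predicted": y_predicted}, index=y_actual.index
)
mismatches = comparison[comparison["Actual"] != comparison["Predicted"]]

print(comparison)
print(len(mismatches))  # number of rows where the prediction was wrong
```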

**Validate model performance by comparing predicted and actual values of Y for X_test**

**Precision and recall, F1 measure, accuracy and confusion matrix**

In [12]:
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
print(confusion_matrix(dt_Y_test, dt_Y_pred))
print(classification_report(dt_Y_test, dt_Y_pred))
print(accuracy_score(dt_Y_test, dt_Y_pred))

[[152   3]
 [  1 119]]
              precision    recall  f1-score   support

           0       0.99      0.98      0.99       155
           1       0.98      0.99      0.98       120

    accuracy                           0.99       275
   macro avg       0.98      0.99      0.99       275
weighted avg       0.99      0.99      0.99       275

0.9854545454545455
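The figures in the report follow directly from the confusion matrix, whose rows are the actual classes and whose columns are the predicted classes. As a worked check, the sketch below recomputes them with NumPy from the matrix printed above; the variable names are chosen for this example only.

```python
import numpy as np

# Confusion matrix from the output above: rows are actual classes, columns are predicted.
cm = np.array([[152, 3],
               [1, 119]])

true_pos = np.diag(cm)                 # correctly predicted count per class
precision = true_pos / cm.sum(axis=0)  # divide by column totals (predicted counts)
recall = true_pos / cm.sum(axis=1)     # divide by row totals (actual counts)
f1 = 2 * precision * recall / (precision + recall)
accuracy = true_pos.sum() / cm.sum()   # 271 correct out of 275

print(np.round(precision, 2))
print(np.round(recall, 2))
print(np.round(f1, 2))
print(accuracy)
```

For class 0, for example, precision is 152 / (152 + 1) and recall is 152 / (152 + 3), which round to the 0.99 and 0.98 shown in the report.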
