# Bank Note Authentication problem
The data were extracted from images of genuine and forged banknote-like specimens. The specimens were digitized with an industrial camera normally used for print inspection. The final images are 400 x 400 pixels. Given the object lens and the distance to the specimen, the resulting gray-scale pictures have a resolution of about 660 dpi. A wavelet transform tool was used to extract features from the images.

Dataset reference: https://www.kaggle.com/ritesaluja/bank-note-authentication-uci-data?select=BankNote_Authentication.csv

## 1. Data Ingestion and Understanding

In [14]:
#read dataset
import pandas as pd
import numpy as np
from pprint import pprint

df = pd.read_csv("../datasets/BankNote_Authentication.csv")
df.head()

Unnamed: 0,variance,skewness,curtosis,entropy,class
0,3.6216,8.6661,-2.8073,-0.44699,0
1,4.5459,8.1674,-2.4586,-1.4621,0
2,3.866,-2.6383,1.9242,0.10645,0
3,3.4566,9.5228,-4.0112,-3.5944,0
4,0.32924,-4.4552,4.5718,-0.9888,0


In [7]:
#summary statistics (count, mean, std, quartiles) for each column
df.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
variance,1372.0,0.433735,2.842763,-7.0421,-1.773,0.49618,2.821475,6.8248
skewness,1372.0,1.922353,5.869047,-13.7731,-1.7082,2.31965,6.814625,12.9516
curtosis,1372.0,1.397627,4.31003,-5.2861,-1.574975,0.61663,3.17925,17.9274
entropy,1372.0,-1.191657,2.101013,-8.5482,-2.41345,-0.58665,0.39481,2.4495
class,1372.0,0.444606,0.497103,0.0,0.0,0.0,1.0,1.0
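
The `class` mean of 0.444606 over 1372 rows implies that about 610 rows are labelled 1 and about 762 are labelled 0, so the two classes are only mildly imbalanced. The following is a minimal sketch of how that balance could be checked directly. The small `sample` frame is a hypothetical stand-in built from a few rows printed in this notebook, not the full CSV; with the real data, the same calls would be applied to `df`.

```python
import pandas as pd

# Stand-in sample: a few rows copied from the outputs in this notebook.
# With the real data, use the `df` loaded from the CSV instead.
sample = pd.DataFrame(
    {
        "variance": [3.6216, 4.5459, 3.866, -1.3971, 0.39012, -1.6677],
        "skewness": [8.6661, 8.1674, -2.6383, 3.3191, -0.14279, -7.1535],
        "curtosis": [-2.8073, -2.4586, 1.9242, -1.3927, -0.031994, 7.8929],
        "entropy": [-0.44699, -1.4621, 0.10645, -1.9948, 0.35084, 0.96765],
        "class": [0, 0, 0, 1, 1, 1],
    }
)

# Absolute and relative frequency of each label
class_counts = sample["class"].value_counts().sort_index()
class_share = sample["class"].value_counts(normalize=True).sort_index()
print(class_counts)
print(class_share)

# Implied number of class-1 rows in the full dataset, from describe():
# the mean of a 0/1 column multiplied by the row count
implied_class_1 = round(0.444606 * 1372)
print(implied_class_1)
```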


In [9]:
#Size of dataset
print(f"Dataset has {df.shape[0]} rows and {df.shape[1]} columns")

Dataset has 1372 rows and 5 columns


## 2. Data Preparation for Modeling

In [16]:
#Separate out independent and dependent features
X = df.iloc[:,:-1]
y = df.iloc[:,-1]
pprint(X.head())
print()
pprint(y.head())

   variance  skewness  curtosis  entropy
0   3.62160    8.6661   -2.8073 -0.44699
1   4.54590    8.1674   -2.4586 -1.46210
2   3.86600   -2.6383    1.9242  0.10645
3   3.45660    9.5228   -4.0112 -3.59440
4   0.32924   -4.4552    4.5718 -0.98880

0    0
1    0
2    0
3    0
4    0
Name: class, dtype: int64


In [17]:
# Train test split to 70% training and 30% testing
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
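
With `test_size=0.3` on 1372 rows, scikit-learn rounds the test set up, which gives 412 test rows and 960 training rows. The split above is not stratified, so the class ratio in the two parts can drift slightly from the overall ratio. A minimal sketch of a stratified variant follows. It uses placeholder arrays with the same shape as `X` and `y`, and the 762/610 label counts are an assumption inferred from the `class` mean reported by `describe()`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the same shape as the notebook's X and y.
# The 762/610 label counts are an assumption inferred from the class mean.
X_demo = np.zeros((1372, 4))
y_demo = np.array([0] * 762 + [1] * 610)

# stratify=y_demo keeps the class ratio nearly identical in both parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo
)

print(len(X_tr), len(X_te))
print(y_demo.mean(), y_tr.mean(), y_te.mean())
```

Whether stratification changes the result noticeably on this dataset is not something the notebook measures; it mainly removes one source of run-to-run variation.
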

## 3. Model Training, Prediction and Evaluation

In [23]:
#Train, predict and evaluate a random forest classifier
#training
from sklearn.ensemble import RandomForestClassifier
#no random_state is set, so the exact accuracy can vary slightly between runs
classifier = RandomForestClassifier()
classifier.fit(X_train, y_train)

#prediction 
y_predicted = classifier.predict(X_test)

#evaluation
from sklearn.metrics import accuracy_score
score = accuracy_score(y_test, y_predicted)
print(f"Accuracy of Random forest classifier is {score*100} %")

Accuracy of Random forest classifier is 98.7864077669903 %
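
Accuracy alone does not show which class the errors fall into. On a 412-row test split (30% of 1372, rounded up), the reported accuracy corresponds to 407 correct predictions and 5 misclassified notes. A minimal sketch of a per-class breakdown with a confusion matrix follows. It trains on synthetic data from `make_classification` as a stand-in for the banknote features, so its numbers will not match the notebook's; with the real data, `confusion_matrix(y_test, y_predicted)` would be the relevant call.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the banknote features (assumption: not the real data)
X_syn, y_syn = make_classification(
    n_samples=1372, n_features=4, n_informative=4, n_redundant=0, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.3, random_state=0)

# random_state is fixed here only to make the sketch repeatable
model = RandomForestClassifier(random_state=0)
model.fit(X_tr, y_tr)
y_pred = model.predict(X_te)

# Rows are true labels, columns are predicted labels
cm = confusion_matrix(y_te, y_pred)
acc = accuracy_score(y_te, y_pred)
print(cm)
print(classification_report(y_te, y_pred))
```

The diagonal of the matrix holds the correct predictions, so accuracy equals the diagonal sum divided by the total count.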


## 4. Save Trained Model

In [28]:
#save model in pickle format
import pickle
with open("../models/random_forest_classifer_model.pkl", "wb") as pickle_file:
    pickle.dump(classifier, pickle_file)
print("Random forest classifier model for bank note authentication is saved successfully")

Random forest classifier model for bank note authentication is saved successfully
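
A saved model can be read back with `pickle.load` and used for prediction without retraining. The sketch below shows the round trip under stated assumptions: it fits a tiny stand-in model on illustrative values loosely based on rows printed in this notebook, and writes to a temporary path (the hypothetical `demo_model.pkl`) rather than `../models/`. Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Tiny stand-in training set (assumption: illustrative values only)
X_small = np.array([[3.6, 8.7, -2.8, -0.4], [4.5, 8.2, -2.5, -1.5],
                    [-1.4, 3.3, -1.4, -2.0], [-3.8, -12.8, 15.7, -1.3]])
y_small = np.array([0, 0, 1, 1])
model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_small, y_small)

# Write to a temporary path instead of ../models/ so the sketch is self-contained
model_path = os.path.join(tempfile.mkdtemp(), "demo_model.pkl")
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Load the model back and check that it predicts the same labels
with open(model_path, "rb") as f:
    restored = pickle.load(f)

same = np.array_equal(model.predict(X_small), restored.predict(X_small))
print(same)
```

Loading is most reliable with the same scikit-learn version that was used for saving.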


In [31]:
#show only the rows labelled class 1
df[df["class"] == 1]

Unnamed: 0,variance,skewness,curtosis,entropy,class
762,-1.39710,3.31910,-1.392700,-1.99480,1
763,0.39012,-0.14279,-0.031994,0.35084,1
764,-1.66770,-7.15350,7.892900,0.96765,1
765,-3.84830,-12.80470,15.682400,-1.28100,1
766,-3.56810,-8.21300,10.083000,0.96765,1
...,...,...,...,...,...
1367,0.40614,1.34920,-1.450100,-0.55949,1
1368,-1.38870,-4.87730,6.477400,0.34179,1
1369,-3.75030,-13.45860,17.593200,-2.77710,1
1370,-3.56370,-8.38270,12.393000,-1.28230,1
