# Bank Note Authentication with Docker and Flask

The data were extracted from images of genuine and forged banknote-like specimens. The specimens were digitized with an industrial camera normally used for print inspection. The final images are 400 x 400 pixels. Given the object lens and the distance to the specimen, the result was grayscale images with a resolution of about 660 dpi. A wavelet transform tool was used to extract features from the images.

In [1]:
# Dataset link: https://www.kaggle.com/ritesaluja/bank-note-authentication-uci-data

import numpy as np
import pandas as pd

In [12]:
# store the path of the directory where the csv file is located

data_dir = 'C:\\Users\\Drew\\Downloads\\BankNote'

In [15]:
# read the csv into pandas and define it as a dataframe

df = pd.read_csv(data_dir + '\\BankNote_Authentication.csv')

In [17]:
# there are 1371 rows in this dataset
# all values have already been feature engineered/normalized in this dataset

df

Unnamed: 0,variance,skewness,curtosis,entropy,class
0,3.62160,8.66610,-2.8073,-0.44699,0
1,4.54590,8.16740,-2.4586,-1.46210,0
2,3.86600,-2.63830,1.9242,0.10645,0
3,3.45660,9.52280,-4.0112,-3.59440,0
4,0.32924,-4.45520,4.5718,-0.98880,0
...,...,...,...,...,...
1367,0.40614,1.34920,-1.4501,-0.55949,1
1368,-1.38870,-4.87730,6.4774,0.34179,1
1369,-3.75030,-13.45860,17.5932,-2.77710,1
1370,-3.56370,-8.38270,12.3930,-1.28230,1


In [18]:
# separate the independent features (X) from the dependent target (Y):

X = df.iloc[:,:-1]
Y = df.iloc[:,-1]

In [22]:
# X contains the first 4 columns of the dataframe df

X.head(5)

Unnamed: 0,variance,skewness,curtosis,entropy
0,3.6216,8.6661,-2.8073,-0.44699
1,4.5459,8.1674,-2.4586,-1.4621
2,3.866,-2.6383,1.9242,0.10645
3,3.4566,9.5228,-4.0112,-3.5944
4,0.32924,-4.4552,4.5718,-0.9888


In [23]:
# Y contains the very last column of the dataframe df

Y.head(5)

0    0
1    0
2    0
3    0
4    0
Name: class, dtype: int64

Train Test Split (designating X & Y train/test)

In [24]:
# import train_test_split 

from sklearn.model_selection import train_test_split

In [27]:
# designate the split variables
# the test set will be 30% of the dataset

X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size=0.3,random_state=0)
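
As a quick sanity check on what a 70/30 split produces, here is a minimal sketch on made-up stand-in data (the `X_demo`/`Y_demo` arrays are hypothetical, not the banknote data). It also passes `stratify`, which the cell above does not use, to illustrate one way of keeping the class ratio similar in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 100 rows, 4 features, balanced binary target
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 4))
Y_demo = np.array([0, 1] * 50)

# Same test_size and random_state as above; stratify is an added assumption
X_tr, X_te, Y_tr, Y_te = train_test_split(
    X_demo, Y_demo, test_size=0.3, random_state=0, stratify=Y_demo
)

# Inspect the sizes of each part and the share of class 1 in each
print(X_tr.shape, X_te.shape)
print(Y_tr.mean(), Y_te.mean())
```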

The model being used here is the Random Forest Classifier


In [28]:
# import the model

from sklearn.ensemble import RandomForestClassifier

In [29]:
# create model instance 'classifier'

classifier = RandomForestClassifier()

In [30]:
# fit the data to the classifier

classifier.fit(X_train,Y_train)

RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=None, max_features='auto',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=100,
                       n_jobs=None, oob_score=False, random_state=None,
                       verbose=0, warm_start=False)
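
The output above shows `random_state=None`, so each run builds a slightly different forest and the accuracy can shift between runs. A minimal sketch, on hypothetical stand-in data, of how fixing `random_state` makes the fit repeatable (the seed value 0 is an arbitrary choice):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in data, not the banknote dataset
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
Y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)

# Two forests built with the same seed on the same data
model_a = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_demo, Y_demo)
model_b = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_demo, Y_demo)

# With identical seeds the two models should agree on every prediction
identical = np.array_equal(model_a.predict(X_demo), model_b.predict(X_demo))
print(identical)
```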

Predict with the fitted model

In [31]:
# predict on the test set and assign the result to a variable

Y_pred = classifier.predict(X_test)

Check accuracy of predictions

In [32]:
# import the accuracy metric

from sklearn.metrics import accuracy_score

In [34]:
# compare the true test labels to the predicted labels

score = accuracy_score(Y_test, Y_pred)

In [35]:
# display the accuracy results

score

0.9878640776699029
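
Accuracy is a single number, so it can help to see where the errors fall. The following is a minimal sketch on synthetic data from `make_classification` (an assumption, used only so the block runs on its own); with the notebook's variables, the same calls would take `Y_test` and `Y_pred`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 4-feature, 2-class banknote data
X_demo, Y_demo = make_classification(
    n_samples=400, n_features=4, n_informative=3, n_redundant=1, random_state=0
)
X_tr, X_te, Y_tr, Y_te = train_test_split(X_demo, Y_demo, test_size=0.3, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_tr, Y_tr)
pred = model.predict(X_te)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(Y_te, pred)
acc = accuracy_score(Y_te, pred)

print(cm)
print(classification_report(Y_te, pred))  # per-class precision, recall, F1
```

The diagonal of the confusion matrix counts correct predictions, so its sum divided by the total equals the accuracy score.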

**The model needs to be converted to a pickle file so that it can be used within a Flask app** (About Pickle: https://www.youtube.com/watch?v=2Tw39kZIbhs)**:**

**Pickle File:**
- the end result of fitting the classifier in the ML model is an object

- we want to serialize (or pickle) the model into a byte stream so that it can be used later on without retraining the model

- pickling can also speed up reloading a large dataset (whether it came from SQL or a .csv), since the already-parsed object is stored directly
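
To make the idea of a byte stream concrete, here is a minimal sketch of a pickle round trip done in memory with `pickle.dumps` and `pickle.loads`. The small model and the stand-in data are hypothetical; the point is only that the restored object predicts the same way without being refitted.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in data, not the banknote dataset
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 4))
Y_demo = (X_demo[:, 0] > 0).astype(int)

model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_demo, Y_demo)

blob = pickle.dumps(model)     # serialize the fitted model to bytes
restored = pickle.loads(blob)  # rebuild the object; no retraining happens

# The restored model should give the same predictions as the original
same = np.array_equal(model.predict(X_demo), restored.predict(X_demo))
print(type(blob), same)
```

Unpickling can execute arbitrary code, so only load pickle files from sources you trust.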

**Flask:**

- a micro web framework for Python that lets you build web servers and APIs

Create the Pickle file for the ML classifier:

In [36]:
# create pickle file using serialization

import pickle

# open the output file in binary write mode; the with block closes and saves it
with open("classifier.pkl", "wb") as pickle_out:
    pickle.dump(classifier, pickle_out)  # serialize the fitted classifier into the file
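
With `classifier.pkl` on disk, a Flask app could load it and serve predictions. The following is only a rough sketch under stated assumptions: the `/predict` route, the JSON field names, and the helper names `load_model` and `create_app` are all hypothetical, and the feature order is taken from the columns shown earlier.

```python
import pickle

import pandas as pd
from flask import Flask, jsonify, request

# Column names from the dataset above, in the order the model was fitted on
FEATURES = ["variance", "skewness", "curtosis", "entropy"]


def load_model(path="classifier.pkl"):
    """Deserialize a previously pickled model from disk."""
    with open(path, "rb") as pickle_in:
        return pickle.load(pickle_in)


def create_app(model):
    """Build a Flask app around any object with a scikit-learn style predict()."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])  # hypothetical route name
    def predict():
        payload = request.get_json()
        # One-row DataFrame with the same column names used during training
        row = pd.DataFrame([[payload[name] for name in FEATURES]], columns=FEATURES)
        label = int(model.predict(row)[0])  # convert numpy integer to a plain int
        return jsonify({"class": label})

    return app
```

To try it locally, something like `create_app(load_model()).run()` would start the development server. The sketch does no input validation, so a request with a missing field would return a server error.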