## Introduction

This notebook performs binary classification on the banknote authentication dataset from the UCI Machine Learning Repository, using a simple random forest classifier. The aim is not to achieve the highest possible accuracy, but to produce a trained model that can serve as one component of a complete Docker-based implementation.

**Data Source:**
https://archive.ics.uci.edu/ml/datasets/banknote+authentication

## Import Libraries

In [1]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

import pickle

## Load Data

In [2]:
df = pd.read_csv("BankNote_Authentication.csv")

In [3]:
df.head()

|   | variance | skewness | curtosis | entropy | class |
|---|---|---|---|---|---|
| 0 | 3.6216 | 8.6661 | -2.8073 | -0.44699 | 0 |
| 1 | 4.5459 | 8.1674 | -2.4586 | -1.4621 | 0 |
| 2 | 3.866 | -2.6383 | 1.9242 | 0.10645 | 0 |
| 3 | 3.4566 | 9.5228 | -4.0112 | -3.5944 | 0 |
| 4 | 0.32924 | -4.4552 | 4.5718 | -0.9888 | 0 |


## Prepare Data

In [4]:
# Feature and target variables
X = df.iloc[:,:-1]
y = df.iloc[:,-1]

In [5]:
X.head()

|   | variance | skewness | curtosis | entropy |
|---|---|---|---|---|
| 0 | 3.6216 | 8.6661 | -2.8073 | -0.44699 |
| 1 | 4.5459 | 8.1674 | -2.4586 | -1.4621 |
| 2 | 3.866 | -2.6383 | 1.9242 | 0.10645 |
| 3 | 3.4566 | 9.5228 | -4.0112 | -3.5944 |
| 4 | 0.32924 | -4.4552 | 4.5718 | -0.9888 |


In [6]:
y.head()

0    0
1    0
2    0
3    0
4    0
Name: class, dtype: int64

In [7]:
# Split into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

## Model and Evaluation

In [8]:
# Random forest classifier
classifier = RandomForestClassifier()
classifier.fit(X_train, y_train)

RandomForestClassifier()

In [9]:
# Prediction
y_pred = classifier.predict(X_test)

In [10]:
# Classification report
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.99      0.99      0.99       157
           1       0.98      0.99      0.99       118

    accuracy                           0.99       275
   macro avg       0.99      0.99      0.99       275
weighted avg       0.99      0.99      0.99       275
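
To make the columns of the report concrete, the following minimal sketch computes precision, recall, and F1 for one class directly from label lists. The labels are made-up toy values, not the notebook's test set, and `binary_metrics` is a hypothetical helper name introduced only for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Toy labels for illustration only (not taken from the dataset)
toy_true = [0, 0, 1, 1, 1, 0, 1, 0]
toy_pred = [0, 1, 1, 1, 0, 0, 1, 0]
precision, recall, f1 = binary_metrics(toy_true, toy_pred, positive=1)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

With these toy labels there are 3 true positives, 1 false positive, and 1 false negative, so all three scores come out to 0.75.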



In [11]:
# Serialize the trained classifier to a pickle file
with open("classifier.pkl", "wb") as pickle_out:
    pickle.dump(classifier, pickle_out)
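
Since the pickle file is meant to be consumed elsewhere, it can help to check that it loads back correctly. The sketch below shows a save/load round trip using only the standard library. `save_model`, `load_model`, and the file name `demo_model.pkl` are hypothetical names, and a plain dictionary stands in for the trained model so the example runs on its own.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Serialize `model` to `path` using pickle."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Deserialize and return the object stored at `path`.

    Only unpickle files you trust: loading a pickle can execute arbitrary code.
    """
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in object so the sketch runs without a trained model;
# in the notebook this would be `classifier`.
stand_in = {"name": "demo", "params": [1, 2, 3]}

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo_model.pkl"  # hypothetical file name
    save_model(stand_in, demo_path)
    restored = load_model(demo_path)

print(restored == stand_in)
```

In the notebook's setting, the same pattern would presumably be `load_model("classifier.pkl").predict(...)`. Pickled scikit-learn models are generally expected to be loaded with the same library version that created them.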

In [12]:
# Predict new data
classifier.predict([[2,3,4,1]])



array([0])
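
The positional list passed to `predict` relies on the caller remembering the column order used for training (variance, skewness, curtosis, entropy). A small helper can make that order explicit. This is only a sketch: `FEATURE_ORDER` and `to_feature_row` are hypothetical names that do not appear in the notebook.

```python
# Column order of the features used to fit the model (from the X.head() output)
FEATURE_ORDER = ["variance", "skewness", "curtosis", "entropy"]


def to_feature_row(note):
    """Convert a dict of banknote measurements into a 2D list for predict()."""
    missing = [name for name in FEATURE_ORDER if name not in note]
    if missing:
        raise ValueError(f"missing features: {missing}")
    # Wrap in an outer list because predict() expects one row per sample
    return [[float(note[name]) for name in FEATURE_ORDER]]


# Keys may be given in any order; the helper reorders them
row = to_feature_row({"entropy": 1, "variance": 2, "curtosis": 4, "skewness": 3})
print(row)  # [[2.0, 3.0, 4.0, 1.0]]
```

The resulting `row` could then be passed as `classifier.predict(row)` in place of the literal list. Because the model was fit on a DataFrame, recent scikit-learn versions may warn about missing feature names when given a plain list.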

In [13]:
import os
os.getcwd()

'/Users/akkwong/Desktop/DOCKER '