# K-Fold Cross-Validation on Bank Notes

## Description of the data:

The data were extracted from images of genuine and forged banknote-like specimens. The specimens were digitized with an industrial camera of the kind normally used for print inspection. The final images are 400 x 400 pixels. Given the object lens and the distance to the specimen, the result was grayscale images with a resolution of about 660 dpi. A wavelet transform tool was then used to extract features from the images. These features are the `variance`, `skewness`, `curtosis`, and `entropy` columns of the dataset, which sit alongside the `class` label.

## Reading in data

In [1]:
import numpy as np
import os
import pandas as pd
import sys
import sklearn.metrics

# Load the banknote authentication data (the path is specific to the author's machine).
dataset = pd.read_csv("C:/Users/teddy/Downloads/Machine Learning/Perceptron HW/BankNote_Authentication.csv")

# Show the last five rows.
dataset.tail()


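|      | variance | skewness | curtosis | entropy  | class |
|------|----------|----------|----------|----------|-------|
| 1367 | 0.40614  | 1.3492   | -1.4501  | -0.55949 | 1     |
| 1368 | -1.3887  | -4.8773  | 6.4774   | 0.34179  | 1     |
| 1369 | -3.7503  | -13.4586 | 17.5932  | -2.7771  | 1     |
| 1370 | -3.5637  | -8.3827  | 12.393   | -1.2823  | 1     |
| 1371 | -2.5419  | -0.65804 | 2.6842   | 1.1952   | 1     |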


## Divide the data into training and testing sets

In [2]:

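from sklearn.model_selection import train_test_split

# Features: columns 0 and 2 by position; target: column 4 (the class label).
X = dataset.iloc[:, [0, 2]]
y = dataset.iloc[:, 4]

# Hold out 40% of the rows for testing; stratify so both sets keep the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=1, stratify=y)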
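
To illustrate what `stratify=y` is meant to achieve, here is a minimal standard-library sketch of a stratified split. It is not scikit-learn's actual algorithm: the function name `stratified_split`, the per-class rounding rule, and the toy labels are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_size, seed):
    """Return (train_idx, test_idx) with roughly the same class mix in both."""
    rng = random.Random(seed)

    # Group row positions by class label.
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_size)  # per-class test count (assumed rule)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Toy labels (hypothetical): 60 rows of class 0 and 40 rows of class 1.
labels = [0] * 60 + [1] * 40
train_idx, test_idx = stratified_split(labels, test_size=0.4, seed=1)

# Share of class 1 in each part; both should equal the overall share.
train_share = sum(labels[i] for i in train_idx) / len(train_idx)
test_share = sum(labels[i] for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), train_share, test_share)
```

Because each class is split separately, the class proportions in the training and test parts stay close to those of the full dataset, which keeps accuracy comparable between the two.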



## Create pipelines for each of the classifiers

Pipe 1 -> Logistic Regression

Pipe 2 -> Decision Tree

Pipe 3 -> KNN Classifier
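
Each pipeline places a `StandardScaler` in front of the classifier. The sketch below shows the idea behind that step in plain Python: the mean and (population) standard deviation are learned from training values only and then reused on held-out values, which is what fitting a pipeline inside cross-validation does for each fold. The helper names and the sample values are hypothetical.

```python
from statistics import fmean, pstdev

def fit_scaler(column):
    """Learn the mean and population standard deviation from training values."""
    return fmean(column), pstdev(column)

def transform(column, mean, std):
    """Apply z = (x - mean) / std using previously learned statistics."""
    return [(x - mean) / std for x in column]

# Hypothetical single-feature values, for illustration only.
train_values = [2.0, 4.0, 6.0, 8.0]
held_out_values = [5.0, 10.0]

mean, std = fit_scaler(train_values)  # statistics come from training data only
train_scaled = transform(train_values, mean, std)
held_out_scaled = transform(held_out_values, mean, std)

print(train_scaled)
print(held_out_scaled)
```

Scaling matters most for the KNN pipeline, since its distance computation would otherwise be dominated by whichever feature has the largest range.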

In [3]:
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Each pipeline standardizes the features, then fits one classifier.
pipe1 = make_pipeline(StandardScaler(), LogisticRegression(random_state=0, solver="lbfgs"))


pipe2 = make_pipeline(StandardScaler(),DecisionTreeClassifier(max_depth=6,
                                             criterion='entropy',
                                             random_state=0))

pipe3 = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=20,
                                                             p=3,
                                                             metric='minkowski'))

clf_labels = ['LogisticRegression', 'Decision tree', 'KNN']

print('10-fold cross validation:\n')
for clf, label in zip([pipe1, pipe2, pipe3], clf_labels):
    scores = cross_val_score(estimator=clf,
                             X=X_train,
                             y=y_train,
                             cv=10,
                             scoring='accuracy')
    print("Accuracy: " + str(round(scores.mean(), 2)) + 
          " Stdev: " + str(round(scores.std(), 3)) +
          " [" + label + "]")

10-fold cross validation:

Accuracy: 0.87 Stdev: 0.025 [LogisticRegression]
Accuracy: 0.88 Stdev: 0.027 [Decision tree]
Accuracy: 0.89 Stdev: 0.027 [KNN]
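
As a rough picture of what `cv=10` does, the sketch below splits row positions into ten contiguous folds, each serving once as the validation set while the rest are used for training. It is a simplified, unstratified version: for classifiers with an integer `cv`, `cross_val_score` uses stratified folds, which this sketch does not reproduce. The function name `kfold_indices` is hypothetical, and the sample count of 823 assumes 1,372 rows in total minus the 549 test rows reported in this notebook.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        # Spread the remainder over the first `extra` folds.
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

# Assumed training-set size: 1372 total rows - 549 test rows = 823.
folds = list(kfold_indices(823, 10))
fold_sizes = [len(test_idx) for _, test_idx in folds]
print(fold_sizes)
```

The reported accuracy is the mean of the ten per-fold scores, and the reported standard deviation describes how much those scores vary from fold to fold.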


## Ensemble method used -> Voting Classifier
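A voting classifier trains several base estimators and predicts by aggregating their individual predictions. With scikit-learn's default hard voting, the predicted class is the one chosen by the majority of the estimators.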

In [4]:
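from sklearn.ensemble import VotingClassifier

# Combine the three pipelines; the default voting='hard' takes a majority vote on predicted labels.
mv_clf = VotingClassifier(estimators=[('lr', pipe1), ('dt', pipe2), ('kn', pipe3)])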


clf_labels += ['Majority voting']
all_clf = [pipe1, pipe2, pipe3, mv_clf]

for clf, label in zip(all_clf, clf_labels):
    scores = cross_val_score(estimator=clf,
                             X=X_train,
                             y=y_train,
                             cv=10,
                             scoring='accuracy')
    print("Accuracy: " + str(round(scores.mean(), 2)) + 
          " Stdev: " + str(round(scores.std(), 3)) +
          " [" + label + "]")

Accuracy: 0.87 Stdev: 0.025 [LogisticRegression]
Accuracy: 0.88 Stdev: 0.027 [Decision tree]
Accuracy: 0.89 Stdev: 0.027 [KNN]
Accuracy: 0.89 Stdev: 0.027 [Majority voting]
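
The majority-voting idea can be sketched in a few lines of plain Python. This is an illustration of hard voting, not scikit-learn's implementation; the function name `majority_vote`, the tie-breaking rule, and the prediction lists are assumptions made for the example.

```python
from collections import Counter

def majority_vote(per_model_predictions):
    """Hard voting: for each sample, return the label most models predicted.

    Ties go to the smallest label, an assumption made here for determinism.
    """
    voted = []
    for sample_votes in zip(*per_model_predictions):
        counts = Counter(sample_votes)
        top = max(counts.values())
        voted.append(min(label for label, c in counts.items() if c == top))
    return voted

# Hypothetical predictions from three models on five samples.
logreg_pred = [0, 1, 1, 0, 1]
tree_pred = [0, 1, 0, 0, 0]
knn_pred = [1, 1, 0, 0, 1]

ensemble_pred = majority_vote([logreg_pred, tree_pred, knn_pred])
print(ensemble_pred)  # [0, 1, 0, 0, 1]
```

With three estimators and two classes a tie cannot occur, so the ensemble differs from an individual model only on samples where that model is outvoted by the other two.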


## Logistic Regression

In [5]:
pipe1.fit(X_train, y_train)

y_pred = pipe1.predict(X_test)
print('Misclassified test set examples:', (y_test != y_pred).sum())
print('Out of a total of:', y_test.shape[0])
print('Logistic Accuracy:', pipe1.score(X_test, y_test))

Misclassified test set examples: 73
Out of a total of: 549
Logistic Accuracy: 0.8670309653916212
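
The accuracy printed by `score` follows directly from the misclassification count. A small sketch of that arithmetic, using the two counts from the output above (the helper name `accuracy_from_errors` is hypothetical):

```python
def accuracy_from_errors(n_misclassified, n_total):
    """Accuracy is the fraction of examples that were classified correctly."""
    return (n_total - n_misclassified) / n_total

# Counts taken from the logistic regression output above.
logistic_accuracy = accuracy_from_errors(73, 549)
print(round(logistic_accuracy, 4))  # 0.867
```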


## Decision Tree Classifier


In [6]:
pipe2.fit(X_train, y_train)

y_pred = pipe2.predict(X_test)
print('Misclassified test set examples:', (y_test != y_pred).sum())
print('Out of a total of:', y_test.shape[0])
print('Decision Tree Accuracy:', pipe2.score(X_test, y_test))

Misclassified test set examples: 48
Out of a total of: 549
Decision Tree Accuracy: 0.912568306010929


## KNN 

In [7]:
pipe3.fit(X_train, y_train)

y_pred = pipe3.predict(X_test)
print('Misclassified test set examples:', (y_test != y_pred).sum())
print('Out of a total of:', y_test.shape[0])
print('KNN Accuracy:', pipe3.score(X_test, y_test))

Misclassified test set examples: 52
Out of a total of: 549
KNN Accuracy: 0.9052823315118397


## Voting Classifier

In [8]:
mv_clf.fit(X_train, y_train)

y_pred = mv_clf.predict(X_test)
print('Misclassified test set examples:', (y_test != y_pred).sum())
print('Out of a total of:', y_test.shape[0])
print('Voting Classifier Accuracy:', mv_clf.score(X_test, y_test))

Misclassified test set examples: 48
Out of a total of: 549
Voting Classifier Accuracy: 0.912568306010929


## Final Analysis
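The cross-validation accuracy scores broadly match the test-set results in terms of trend: Logistic Regression scores lowest in both, and the Voting Classifier is at or tied for the top. The test set offers a more nuanced evaluation. It breaks the tie between KNN and the Voting Classifier found in cross-validation (both 0.89): on the test set the Voting Classifier reaches 0.913 against 0.905 for KNN. The Decision Tree, which trailed KNN in cross-validation, matches the Voting Classifier on the test set with 48 misclassifications each.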
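
To make the comparison concrete, the sketch below puts the cross-validation and test-set numbers reported in this notebook side by side and ranks the models under each. Only figures from the outputs above are used; the cross-validation values are rounded to two decimals there, so ties at that precision may hide small differences. The `ranking` helper is a hypothetical convenience function.

```python
# Cross-validation accuracies as printed above (rounded to two decimals).
cv_accuracy = {
    "LogisticRegression": 0.87,
    "Decision tree": 0.88,
    "KNN": 0.89,
    "Majority voting": 0.89,
}

# Misclassified test examples as printed above, out of 549.
test_errors = {
    "LogisticRegression": 73,
    "Decision tree": 48,
    "KNN": 52,
    "Majority voting": 48,
}
n_test = 549

test_accuracy = {name: (n_test - err) / n_test for name, err in test_errors.items()}

def ranking(scores):
    """Model names sorted from best to worst; ties keep dictionary order."""
    return sorted(scores, key=scores.get, reverse=True)

for name in cv_accuracy:
    print(f"{name:20s} cv={cv_accuracy[name]:.2f} test={test_accuracy[name]:.3f}")
print("CV ranking:  ", ranking(cv_accuracy))
print("Test ranking:", ranking(test_accuracy))
```

Both rankings place Logistic Regression last, while the order of the Decision Tree and KNN differs between the two evaluations.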