In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import os

Banknote Authentication Dataset
--------------------------------------------

The data were extracted from images of genuine and forged banknote-like specimens. The specimens were digitized with an industrial camera of the kind normally used for print inspection. The final images are 400 x 400 pixels. Because of the object lens and the distance to the specimen, the gray-scale images have a resolution of about 660 dpi. A wavelet transform tool was used to extract features from the images.


Attribute Information:

1. variance of Wavelet Transformed image (continuous)
2. skewness of Wavelet Transformed image (continuous)
3. curtosis (kurtosis) of Wavelet Transformed image (continuous)
4. entropy of image (continuous)
5. class (integer)

This is a copy of the banknote authentication dataset from the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/banknote+authentication
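To make the attribute names concrete, here is a minimal sketch of how variance, skewness, kurtosis, and an entropy could be computed from a list of numbers. The input values are hypothetical stand-ins, and the formulas (population moments, excess kurtosis, Shannon entropy in bits) are assumptions, not the dataset authors' pipeline. In particular, the dataset's entropy column contains negative values, so its definition evidently differs from the non-negative Shannon entropy used here.

```python
import math
from collections import Counter


def moment_features(values):
    """Return (variance, skewness, excess kurtosis) of a sequence of numbers."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3, and 4 (population form, dividing by n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return m2, skewness, excess_kurtosis


def shannon_entropy(values):
    """Shannon entropy in bits of the empirical distribution of discrete values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical stand-ins for wavelet coefficients and pixel intensities.
coefficients = [1.0, 2.0, 3.0, 4.0, 5.0]
pixels = [0, 0, 255, 255]

variance, skewness, kurtosis = moment_features(coefficients)
entropy = shannon_entropy(pixels)
print(variance, skewness, kurtosis, entropy)
```

A symmetric sample such as the one above has zero skewness, and two equally likely pixel values carry one bit of entropy.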

In [16]:
# Read the CSV file into a pandas DataFrame

notes = pd.read_csv('../Resources/data_banknote_authentication.csv', header=None, names=['variance','skewness','curtosis', 'entropy', 'class'])
notes

,variance,skewness,curtosis,entropy,class
0,3.62160,8.66610,-2.8073,-0.44699,0
1,4.54590,8.16740,-2.4586,-1.46210,0
2,3.86600,-2.63830,1.9242,0.10645,0
3,3.45660,9.52280,-4.0112,-3.59440,0
4,0.32924,-4.45520,4.5718,-0.98880,0
...,...,...,...,...,...
1367,0.40614,1.34920,-1.4501,-0.55949,1
1368,-1.38870,-4.87730,6.4774,0.34179,1
1369,-3.75030,-13.45860,17.5932,-2.77710,1
1370,-3.56370,-8.38270,12.3930,-1.28230,1
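Before modeling, it can help to check for missing values and to see how many rows fall in each class. The sketch below runs those checks on only the nine rows displayed above, loaded from an in-memory string so that it is self-contained; the variable names `sample_csv` and `sample` are illustrative. In the notebook the same calls would be applied to `notes`.

```python
import io

import pandas as pd

# The first five and last four rows shown in the output above.
# The notebook itself reads the full CSV file from disk.
sample_csv = """3.62160,8.66610,-2.8073,-0.44699,0
4.54590,8.16740,-2.4586,-1.46210,0
3.86600,-2.63830,1.9242,0.10645,0
3.45660,9.52280,-4.0112,-3.59440,0
0.32924,-4.45520,4.5718,-0.98880,0
0.40614,1.34920,-1.4501,-0.55949,1
-1.38870,-4.87730,6.4774,0.34179,1
-3.75030,-13.45860,17.5932,-2.77710,1
-3.56370,-8.38270,12.3930,-1.28230,1
"""

columns = ['variance', 'skewness', 'curtosis', 'entropy', 'class']
sample = pd.read_csv(io.StringIO(sample_csv), header=None, names=columns)

# Count missing values per column and rows per class label.
missing_per_column = sample.isna().sum()
class_counts = sample['class'].value_counts().sort_index()

print(missing_per_column)
print(class_counts)
```

A heavily imbalanced `class_counts` on the full data would be a reason to read the accuracy score with more caution.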


In [36]:
# Assign the data to X and y

X = notes[["variance", 'skewness', 'curtosis', 'entropy']]
y = notes["class"]

print("Shape: ", X.shape, y.shape)

Shape:  (1372, 4) (1372,)


Split the data into training and testing sets

In [37]:
# Split the data by using train_test_split()

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
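Because no `test_size` is passed, `train_test_split` falls back to its default and holds out 25% of the rows. The sketch below shows the resulting sizes on placeholder data with the same shape as `X`, and also shows the optional `stratify` argument, which keeps the class proportions the same in both parts. The arrays `X_demo` and `y_demo` and their label counts are hypothetical, not the notebook's data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the notebook's shape: 1372 rows, 4 features.
X_demo = np.zeros((1372, 4))
# Hypothetical labels (800 zeros, 572 ones), chosen only for illustration.
y_demo = np.array([0] * 800 + [1] * 572)

# Default split: 25% of the rows go to the test set.
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=42)
print(X_tr.shape, X_te.shape)  # (1029, 4) (343, 4)

# Stratified split: the share of class 1 is preserved in the test set.
X_tr_s, X_te_s, y_tr_s, y_te_s = train_test_split(
    X_demo, y_demo, random_state=42, stratify=y_demo
)
print(y_demo.mean(), y_te_s.mean())
```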

Create a logistic regression model

In [38]:
# Create a logistic regression model

from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression()
classifier

LogisticRegression()
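A logistic regression model turns a weighted sum of the features into a probability for class 1 by passing it through the sigmoid function, then predicts class 1 when that probability reaches 0.5. The sketch below illustrates the idea in plain Python. The weights and bias are hypothetical values picked for illustration, not coefficients learned by this notebook, and `predict_probability` is an illustrative helper, not a scikit-learn function.

```python
import math


def predict_probability(features, weights, bias):
    """Probability of class 1 under a logistic model with the given parameters."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical parameters, one weight per feature
# (variance, skewness, curtosis, entropy).
hypothetical_weights = [-1.0, -0.5, -0.5, 0.0]
hypothetical_bias = 1.0

# Feature values of the first row displayed in the DataFrame output.
row = [3.62160, 8.66610, -2.8073, -0.44699]

probability = predict_probability(row, hypothetical_weights, hypothetical_bias)
label = 1 if probability >= 0.5 else 0
print(probability, label)
```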

Fit (train) our model by using the training data

In [39]:
# Fit the model to the data

classifier.fit(X_train, y_train)

LogisticRegression()
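After fitting, the learned parameters are available as attributes on the model. The sketch below fits a `LogisticRegression` on synthetic four-feature data from `make_classification` (a stand-in, not the banknote data) and inspects `classes_`, `coef_`, and `intercept_`. In the notebook the same attributes could be read from `classifier`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic two-class data with four features, used only as a stand-in.
X_syn, y_syn = make_classification(
    n_samples=200, n_features=4, n_informative=2, n_redundant=0, random_state=0
)

model = LogisticRegression()
model.fit(X_syn, y_syn)

# For a binary problem there is one row of coefficients, one per feature.
print(model.classes_)
print(model.coef_.shape)
print(model.intercept_.shape)
```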

Validate the model by using the test data

In [41]:
# Print the accuracy scores for the training and test data

print(f"Training Data Score: {classifier.score(X_train, y_train)}")
print(f"Testing Data Score: {classifier.score(X_test, y_test)}")

Training Data Score: 0.9912536443148688
Testing Data Score: 0.9883381924198251
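An accuracy score is easier to judge when converted back into a count of misclassified rows. The sketch below does that conversion for the two scores printed above. It assumes the split sizes that follow from 1372 rows and scikit-learn's default 25% test share (343 test rows, 1029 training rows); if the split were different, the counts would differ too.

```python
# Scores copied from the output above.
train_score = 0.9912536443148688
test_score = 0.9883381924198251

# Assumed split sizes: 1372 rows with the default 25% test share.
n_test = 343
n_train = 1372 - n_test

# Accuracy is correct / total, so errors = total - round(score * total).
train_errors = n_train - round(train_score * n_train)
test_errors = n_test - round(test_score * n_test)
print(train_errors, test_errors)  # 9 4
```

Under that assumption, the model misclassifies 9 training rows and 4 test rows, which is consistent with the two scores being close to each other.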


Make predictions

In [50]:
# Make predictions from the X_test data and compare them with the y_test labels
# Print at least 10 predictions vs. their actual labels

print("Classes are either 0 (legitimate?) or 1 (counterfeit?)")
print(f"Actual labels: {list(y_test[:10])}")
print(f"Predicted labels: {list(classifier.predict(X_test[:10]))}")

Classes are either 0 (legitimate?) or 1 (counterfeit?)
Actual labels: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Predicted labels: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
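The first ten test rows all happen to belong to class 0, so they say little about how the model handles class 1. Tallying every (actual, predicted) pair gives a fuller picture. The sketch below does this in plain Python on hypothetical label lists; the values are made up for illustration and are not results from this notebook, and `confusion_counts` is an illustrative helper.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


# Hypothetical labels for illustration only.
actual = [0, 0, 1, 1, 1, 0, 1, 0]
predicted = [0, 0, 1, 0, 1, 0, 1, 1]

counts = confusion_counts(actual, predicted)
for (a, p), n in sorted(counts.items()):
    print(f"actual={a} predicted={p}: {n}")
```

In the notebook, `sklearn.metrics.confusion_matrix(y_test, classifier.predict(X_test))` would produce the same kind of tally for the full test set.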
