<a href="https://colab.research.google.com/github/abitaaugustine/DPhi-Deep-Learning-Bootcamp/blob/master/DPhi_Assignment_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Context
Banknotes are one of the most important assets of a country. Some miscreants introduce fake notes that closely resemble genuine ones, creating discrepancies in the money circulating in the financial market. It is difficult for humans to tell genuine and fake banknotes apart because they share many features.

# Motivation
Although the growth of electronic transactions has reduced the use of currency, cash transactions remain very important in the global market, and banknotes are still used to carry out financial activities. To keep cash transactions running smoothly, forged banknotes must be prevented from entering circulation. There has been a drastic increase in the number of fake notes in the market. Fake money is an imitation of genuine notes and is created illegally for various motives. Fake notes are produced in all denominations, which weakens the country's financial market. Advances in scanners and copy machines have made it easier for miscreants to create copies of banknotes. It is difficult for the human eye to recognize a fake note because it is made with great accuracy to look like a genuine one. Security aspects of banknotes therefore have to be considered, and security features have to be introduced to mitigate fake currency. Hence, banks and ATMs need a system that classifies a note as genuine or fake.

# About the Data
Data were extracted from images of genuine and forged banknote-like specimens, taken for the evaluation of an authentication procedure for banknotes. For digitization, an industrial camera usually used for print inspection was used. The final images have 400 x 400 pixels. Due to the object lens and the distance to the investigated object, grey-scale pictures with a resolution of about 660 dpi were obtained. A wavelet transform tool was used to extract features from the images.
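The four features in this dataset are summary statistics (variance, skewness, kurtosis, and entropy). As a rough illustration of what these statistics measure, the sketch below computes them for two small lists of numbers. It does not reproduce the original wavelet pipeline: the sample values, the helper names `moments` and `shannon_entropy`, the use of population moments, excess kurtosis, and the entropy of the empirical value distribution are all assumptions made for this example.

```python
import math
from collections import Counter

def moments(values):
    """Return (variance, skewness, excess kurtosis) from population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2 - 3.0  # excess kurtosis (0 for a normal distribution)
    return m2, skewness, kurtosis

def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of discrete values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical stand-in for wavelet coefficients (not taken from the dataset)
coefficients = [1, 2, 3, 4, 5]
variance, skewness, kurtosis = moments(coefficients)

# Hypothetical grey levels (not taken from the dataset)
pixels = [0, 0, 1, 1, 2, 2, 3, 3]
entropy = shannon_entropy(pixels)

print(variance, skewness, kurtosis, entropy)
```

A symmetric sample such as this one has zero skewness, and four equally frequent grey levels give an entropy of two bits.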

### Train Dataset 

To load the training data in your Jupyter notebook, use the command below:

import pandas as pd

bank_note_data = pd.read_csv("https://raw.githubusercontent.com/dphi-official/Datasets/master/bank_note_data/training_set_label.csv" )

#### Data Description

- VWTI: Variance of Wavelet Transformed Image
- SWTI: Skewness of Wavelet Transformed Image
- CWTI: Kurtosis of Wavelet Transformed Image
- EI: Entropy of Image
- Class: Class (1: genuine, 0: forged)

### Test Dataset 

Load the test data (name it test_data) using the command below.

test_data = pd.read_csv('https://raw.githubusercontent.com/dphi-official/Datasets/master/bank_note_data/testing_set_label.csv')

The target column is deliberately omitted from the test data because you need to predict it.
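Because the test file carries no target column, predictions on it cannot be scored locally. One common workaround, sketched below under stated assumptions, is to hold out part of the training rows for validation. The helper `split_indices`, the 20% holdout fraction, and the seed are hypothetical choices for illustration; the row count of 1096 is the training shape reported later in this notebook.

```python
import random

def split_indices(n_rows, holdout_fraction=0.2, seed=0):
    """Shuffle row indices and split them into (train, validation) lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_holdout = int(n_rows * holdout_fraction)
    return indices[n_holdout:], indices[:n_holdout]

# 1096 is the number of training rows reported by train.shape
train_idx, valid_idx = split_indices(1096)
print(len(train_idx), len(valid_idx))
```

The resulting index lists could then be used with `DataFrame.iloc` to select the training and validation rows.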

## Acknowledgement
The dataset was downloaded from the UCI Machine Learning Repository.



In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from sklearn import datasets, linear_model
from sklearn.metrics import roc_curve, auc

import statsmodels.api as sm
from statsmodels.formula.api import ols

### Function to create Confusion Matrix

In [50]:
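def confusionMatrix(predicted, actual, threshold):
  """Count [tp, fn, fp, tn] for the scores in predicted against the 0/1 labels in actual."""
  if len(predicted) != len(actual):
    raise ValueError("predicted and actual must have the same length")
  tp = fn = fp = tn = 0.0
  for score, label in zip(predicted, actual):
    # Apply the same rule to both classes: a score at or above the threshold is positive
    predicted_positive = score >= threshold
    if label > 0.5: # label 1.0 denotes a genuine banknote
      if predicted_positive:
        tp += 1.0
      else:
        fn += 1.0
    else: # label 0.0 denotes a forged banknote
      if predicted_positive:
        fp += 1.0
      else:
        tn += 1.0
  return [tp, fn, fp, tn]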
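The function above returns its counts in the order [tp, fn, fp, tn]. As a minimal sketch of how those four numbers turn into the usual summary metrics, the hypothetical helper `summarize` below derives accuracy, error rate, precision, and recall. The example counts are the ones this notebook reports for the training data at threshold 0.0.

```python
def summarize(counts):
    """Derive common metrics from counts given as [tp, fn, fp, tn]."""
    tp, fn, fp, tn = counts
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        # Fall back to 0.0 when a denominator is empty
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Counts reported in this notebook for threshold 0.0 on the training data
metrics = summarize([488.0, 0.0, 364.0, 244.0])
print(metrics)
```

At this threshold every genuine note is caught (recall of 1.0), but many forged notes are also labeled genuine, which is why the error rate is high.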

### Reading the data, fitting an Ordinary Least Squares (OLS) regression, and removing the Class label

In [30]:
train_set = pd.read_csv("https://raw.githubusercontent.com/dphi-official/Datasets/master/bank_note_data/training_set_label.csv")
test_set = pd.read_csv("https://raw.githubusercontent.com/dphi-official/Datasets/master/bank_note_data/testing_set_label.csv")
# Fit OLS of Class on the four features; this needs the Class column, so fit before popping it
banknote_model = ols("Class ~ VWTI + SWTI + CWTI + EI", data=train_set).fit()
banknote_model_summary = banknote_model.summary()
# Split the labels off the training features
train_set_labels = train_set.pop('Class')
# The test set has no Class column, so this holds the test features, not labels
test_set_labels = test_set

In [31]:
train = np.array(train_set)
test = np.array(test_set)
train_labels = np.array(train_set_labels)
test_labels = np.array(test_set_labels)

In [38]:
train.shape

(1096, 4)

In [39]:
test.shape

(275, 4)

In [40]:
train_labels.shape

(1096,)

In [41]:
test_labels.shape

(275, 4)

### Training the model

In [42]:
model = linear_model.LinearRegression()
model.fit(train,train_labels)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
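Linear regression fits the 0/1 labels by least squares, so its outputs are continuous scores rather than probabilities and can fall below 0 or above 1. The sketch below illustrates this on a tiny synthetic data set, using `np.linalg.lstsq` in place of scikit-learn. The data, the single feature, and the variable names are assumptions made for this example and are not taken from the banknote data.

```python
import numpy as np

# Tiny synthetic data set (hypothetical): one feature, labels 0 or 1
X = np.array([[-2.0], [-1.0], [0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# Append a column of ones so the fit includes an intercept
X1 = np.hstack([X, np.ones((X.shape[0], 1))])

# Least-squares solution: minimizes the sum of squared differences between X1 @ coef and y
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
scores = X1 @ coef

print("slope, intercept:", coef)
print("scores:", scores)
print("labels at threshold 0.5:", (scores >= 0.5).astype(int))
```

The lowest and highest scores land outside the [0, 1] range, which is why a threshold is needed to turn scores into class labels.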

In [43]:
trainPredictions = model.predict(train)

In [59]:
df1 = pd.DataFrame({'Actual': train_labels, 'Predicted': trainPredictions.flatten()})
df1.Predicted

0       0.463187
1      -0.023899
2       1.056323
3       0.672195
4       0.173525
          ...   
1091    0.143500
1092    0.065618
1093   -0.103644
1094   -0.090891
1095    0.394616
Name: Predicted, Length: 1096, dtype: float64

In [46]:
t = np.linspace(0,1,50)
t

array([0.        , 0.02040816, 0.04081633, 0.06122449, 0.08163265,
       0.10204082, 0.12244898, 0.14285714, 0.16326531, 0.18367347,
       0.20408163, 0.2244898 , 0.24489796, 0.26530612, 0.28571429,
       0.30612245, 0.32653061, 0.34693878, 0.36734694, 0.3877551 ,
       0.40816327, 0.42857143, 0.44897959, 0.46938776, 0.48979592,
       0.51020408, 0.53061224, 0.55102041, 0.57142857, 0.59183673,
       0.6122449 , 0.63265306, 0.65306122, 0.67346939, 0.69387755,
       0.71428571, 0.73469388, 0.75510204, 0.7755102 , 0.79591837,
       0.81632653, 0.83673469, 0.85714286, 0.87755102, 0.89795918,
       0.91836735, 0.93877551, 0.95918367, 0.97959184, 1.        ])

### Creating the confusion matrix for the training data

In [51]:
for threshold in t:
  # confusionMatrix returns the counts in the order [tp, fn, fp, tn]
  tp, fn, fp, tn = confusionMatrix(trainPredictions, train_labels, threshold)
  print(f"For Threshold value = {threshold}")
  print(f"tp = {tp}\tfn = {fn}\nfp = {fp}\ttn = {tn}")
  print(f"Error Rate = {(fp + fn) / (tp + fn + fp + tn)}\n")

For Threshold value = 0.0
tp = 488.0	fn = 0.0
fp = 364.0	tn = 244.0
Error Rate = 0.33211678832116787

For Threshold value = 0.02040816326530612
tp = 488.0	fn = 0.0
fp = 327.0	tn = 281.0
Error Rate = 0.2983576642335766

For Threshold value = 0.04081632653061224
tp = 488.0	fn = 0.0
fp = 283.0	tn = 325.0
Error Rate = 0.25821167883211676

For Threshold value = 0.061224489795918366
tp = 488.0	fn = 0.0
fp = 251.0	tn = 357.0
Error Rate = 0.229014598540146

For Threshold value = 0.08163265306122448
tp = 488.0	fn = 0.0
fp = 224.0	tn = 384.0
Error Rate = 0.20437956204379562

For Threshold value = 0.1020408163265306
tp = 488.0	fn = 0.0
fp = 202.0	tn = 406.0
Error Rate = 0.1843065693430657

For Threshold value = 0.12244897959183673
tp = 488.0	fn = 0.0
fp = 183.0	tn = 425.0
Error Rate = 0.16697080291970803

For Threshold value = 0.14285714285714285
tp = 488.0	fn = 0.0
fp = 156.0	tn = 452.0
Error Rate = 0.14233576642335766

For Threshold value = 0.16326530612244897
tp = 488.0	fn = 0.0
fp = 131.0	tn 
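The sweep above prints one confusion matrix per threshold. A compact way to pick the threshold with the lowest error rate can be sketched as follows. The scores and labels here are hypothetical, and the helper `error_rate` is introduced only for this example; it treats a score at or above the threshold as class 1.

```python
def error_rate(scores, labels, threshold):
    """Fraction of samples misclassified when score >= threshold means class 1."""
    wrong = sum((s >= threshold) != (y == 1) for s, y in zip(scores, labels))
    return wrong / len(labels)

# Hypothetical regression scores and true labels (not taken from the dataset)
scores = [-0.10, 0.05, 0.20, 0.45, 0.55, 0.70, 0.90, 1.05]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# 50 evenly spaced thresholds from 0 to 1, the same grid as np.linspace(0, 1, 50)
thresholds = [i / 49 for i in range(50)]

# min returns the first threshold that reaches the lowest error rate
best = min(thresholds, key=lambda t: error_rate(scores, labels, t))
print(best, error_rate(scores, labels, best))
```

When several thresholds tie for the lowest error rate, this sketch keeps the smallest one; another tie-breaking rule may be preferable depending on which kind of error is more costly.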

### Testing the model

In [52]:

testPredictions = model.predict(test)

In [54]:
df2 = pd.DataFrame({'Predicted': testPredictions.flatten()})
df2

     Predicted
0     0.866183
1     0.742020
2    -0.341532
3    -0.000983
4    -0.059089
..         ...
270   0.958988
271   0.158454
272   0.221635
273  -0.134136


### Classifying the predicted values as genuine or forged

In [68]:
# A score of 0.5 or more is labeled 1 (genuine); anything lower is labeled 0 (forged)
predictions = [1 if score >= 0.5 else 0 for score in df2.Predicted]

In [69]:
print(predictions)

[1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
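To make the thresholding step concrete, the sketch below applies the same 0.5 cutoff to the first five test scores displayed earlier in this notebook and tallies the resulting classes. The mapping of 1 to "genuine" and 0 to "forged" follows the data description; the variable names are chosen only for this example.

```python
from collections import Counter

# First five test scores as displayed in the notebook output
scores = [0.866183, 0.742020, -0.341532, -0.000983, -0.059089]

# Same rule as the notebook: a score of 0.5 or more is class 1
labels = [1 if s >= 0.5 else 0 for s in scores]

# Class names from the data description (1: genuine, 0: forged)
names = {1: "genuine", 0: "forged"}
counts = Counter(names[label] for label in labels)

print(labels)
print(counts)
```

The resulting labels agree with the first five entries of the prediction list printed above.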
