# Creating a Machine Learning Model with Keras (TensorFlow Backend)

#### We will start with some data on banknotes.

Some of these banknotes were forgeries and others were legitimate.

#### The researchers created a dataset from these banknotes by taking 400x400 pixel images of the notes and then extracting various numerical features based on wavelet transforms of the images.

## Very Important Note:

The data we are working with is not image data (at least for now).

Right now we are focusing on how to use Keras for general machine learning.

#### Once we learn about convolutional neural networks, we can expand on Keras to feed image data (pixel images) into a network.

### Let's start by importing the required modules.

In [74]:
import numpy as np
from numpy import genfromtxt
# genfromtxt - it generates an array from the text file.

In [75]:
data = genfromtxt(r"C:\Users\JERRY\OpenCV Udemy\Computer-Vision-with-Python\DATA\bank_note_data.txt",delimiter = ',')

# delimiter=',' tells genfromtxt that the values (features) on each line of the text file are separated by commas
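As a quick check of what `genfromtxt` does with `delimiter`, here is a minimal sketch that parses two rows copied from the array output below, fed in through an in-memory `io.StringIO` buffer instead of the real file (the buffer and the `sample` name are illustrative only).

```python
import io
import numpy as np

# Two rows in the same layout as bank_note_data.txt:
# four comma-separated feature values followed by a 0/1 class label.
sample_text = io.StringIO(
    "3.6216,8.6661,-2.8073,-0.44699,0\n"
    "-2.5419,-0.65804,2.6842,1.1952,1\n"
)

# delimiter="," splits each line on commas; every value is parsed as a float
sample = np.genfromtxt(sample_text, delimiter=",")

print(sample.shape)  # 2 rows, 5 columns
print(sample[:, 4])  # the last column holds the labels
```

Without `delimiter=','`, `genfromtxt` would split on whitespace instead, which is not how this file is laid out.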

In [76]:
data

# We have four columns of features,
# and the last column holds 0s and 1s indicating whether or not the note is authentic:
# 0 = forgery, 1 = authentic. These are known as the labels (or classes).

# We can build a machine learning model that classifies these banknotes using the 4 feature columns:

# we feed in the features and predict class 0 or 1

array([[  3.6216 ,   8.6661 ,  -2.8073 ,  -0.44699,   0.     ],
       [  4.5459 ,   8.1674 ,  -2.4586 ,  -1.4621 ,   0.     ],
       [  3.866  ,  -2.6383 ,   1.9242 ,   0.10645,   0.     ],
       ...,
       [ -3.7503 , -13.4586 ,  17.5932 ,  -2.7771 ,   1.     ],
       [ -3.5637 ,  -8.3827 ,  12.393  ,  -1.2823 ,   1.     ],
       [ -2.5419 ,  -0.65804,   2.6842 ,   1.1952 ,   1.     ]])

We can start by separating the labels from the actual features...
### Separating Labels:

In [77]:
labels = data[:,4]   # select every row, column index 4 (the fifth column, which holds the labels)

In [78]:
labels

array([0., 0., 0., ..., 1., 1., 1.])

### Separating features:

In [79]:
features = data[:,0:4]  # select every row, column indices 0 to 3 (the stop index 4 is excluded)

In [80]:
features

array([[  3.6216 ,   8.6661 ,  -2.8073 ,  -0.44699],
       [  4.5459 ,   8.1674 ,  -2.4586 ,  -1.4621 ],
       [  3.866  ,  -2.6383 ,   1.9242 ,   0.10645],
       ...,
       [ -3.7503 , -13.4586 ,  17.5932 ,  -2.7771 ],
       [ -3.5637 ,  -8.3827 ,  12.393  ,  -1.2823 ],
       [ -2.5419 ,  -0.65804,   2.6842 ,   1.1952 ]])

In [81]:
# By convention, X is the feature matrix and y is the label vector; this notation is standard in machine learning papers

X = features   # 2-D matrix: one row per note, one column per feature
y = labels     # 1-D vector: one label per note
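To make the slicing above concrete, here is a small sketch on a made-up 3x5 array (a stand-in for `data`, not the real dataset). It shows that `data[:, 4]` returns a 1-D vector and that the stop index in `0:4` is excluded, so `data[:, 0:4]` is the same as `data[:, :-1]` here.

```python
import numpy as np

# A small stand-in for `data`: 3 rows x 5 columns (4 features + 1 label)
data = np.arange(15, dtype=float).reshape(3, 5)

labels = data[:, 4]      # column index 4, i.e. the fifth column -> shape (3,)
features = data[:, 0:4]  # column indices 0, 1, 2, 3 (4 is excluded) -> shape (3, 4)

print(labels.shape, features.shape)
print(labels)
```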

### Now let's split the data into training and test sets

We are going to do that using the scikit-learn (sklearn) library.

In [82]:
from sklearn.model_selection import train_test_split

# This splits the features and the labels into a training set and a test set.

# A nice thing about train_test_split is that it shuffles the data before splitting (shuffle=True by default),
# so we don't need to worry about the labels being stored in sorted order.

In [83]:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# Press Shift + Tab (in Jupyter) to see the docstring and usage examples.

# All we are saying is: pass in your X features and y labels, and choose a test_size, here 33%.
# 33% of the rows go into X_test, and their matching labels go into y_test.

# random_state=42: the data is shuffled before it is split into training and test sets,
# so setting random_state makes sure we get the same shuffle every time.

# 42 is just an arbitrary value; any fixed integer works (the default, None, gives a different shuffle on each run).
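The shuffle-then-split idea can be sketched with NumPy alone. The `shuffle_split` helper below is hypothetical: it is not how scikit-learn implements `train_test_split`, and the rows it picks will differ. It assumes the test count is rounded up, which gives the same 919/453 sizes shown below for 1372 rows. The `X` and `y` here are synthetic, with labels deliberately stored in sorted order.

```python
import numpy as np

def shuffle_split(X, y, test_size=0.33, seed=42):
    """Hypothetical helper: shuffle the rows, then cut off a test portion.

    A simplified sketch of the idea behind train_test_split, not its code.
    """
    n = len(X)
    n_test = int(np.ceil(test_size * n))  # round the test count up
    rng = np.random.default_rng(seed)      # fixed seed -> same shuffle every run
    order = rng.permutation(n)             # one shuffled index array...
    test_idx, train_idx = order[:n_test], order[n_test:]
    # ...applied to both X and y, so each label stays with its feature row
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Synthetic data: row i starts with the value 4*i, labels are sorted (0s then 1s)
X = np.arange(1372 * 4, dtype=float).reshape(1372, 4)
y = (np.arange(1372) >= 686).astype(float)

X_train, X_test, y_train, y_test = shuffle_split(X, y)
print(len(X_train), len(X_test))
```

The key point is that one permutation indexes both `X` and `y`, so every label stays attached to its feature row after shuffling.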

Length of X_train (after split)

In [84]:
X_train

array([[-0.8734  , -0.033118, -0.20165 ,  0.55774 ],
       [ 2.0177  ,  1.7982  , -2.9581  ,  0.2099  ],
       [-0.36038 ,  4.1158  ,  3.1143  , -0.37199 ],
       ...,
       [-7.0364  ,  9.2931  ,  0.16594 , -4.5396  ],
       [-3.4605  ,  2.6901  ,  0.16165 , -1.0224  ],
       [-3.3582  , -7.2404  , 11.4419  , -0.57113 ]])

In [85]:
# y_train

In [86]:
len(X_train)

919

Length of X (before split)

In [87]:
X

array([[  3.6216 ,   8.6661 ,  -2.8073 ,  -0.44699],
       [  4.5459 ,   8.1674 ,  -2.4586 ,  -1.4621 ],
       [  3.866  ,  -2.6383 ,   1.9242 ,   0.10645],
       ...,
       [ -3.7503 , -13.4586 ,  17.5932 ,  -2.7771 ],
       [ -3.5637 ,  -8.3827 ,  12.393  ,  -1.2823 ],
       [ -2.5419 ,  -0.65804,   2.6842 ,   1.1952 ]])

In [88]:
len(X)

1372

That means X_test should contain the remaining rows.

Length of X_test

In [89]:
X_test

array([[ 1.5691  ,  6.3465  , -0.1828  , -2.4099  ],
       [-0.27802 ,  8.1881  , -3.1338  , -2.5276  ],
       [ 0.051979,  7.0521  , -2.0541  , -3.1508  ],
       ...,
       [ 3.5127  ,  2.9073  ,  1.0579  ,  0.40774 ],
       [ 5.504   , 10.3671  , -4.413   , -4.0211  ],
       [-0.2062  ,  9.2207  , -3.7044  , -6.8103  ]])

In [90]:
len(X_test)

453

Checking the range of values

In [91]:
X_test.max()

17.1116

In [92]:
X_test.min()

-13.2869

Conclusion: about 67% of the data (919 of 1372 rows) is now in X_train, and the remaining 33% (453 rows) is in X_test.

Similarly, we can check y_train and y_test.

Note: their entries correspond, index by index, to the rows of X_train and X_test.

# Now typically when working with neural networks, it's a good idea to standardize or scale your data

We can do that with a convenient class from sklearn.

In [93]:
from sklearn.preprocessing import MinMaxScaler

# It rescales all the feature data to fall within a certain range (0 to 1 by default), which can help the neural network
# actually perform better.

In [94]:
# The way it works is that we first create the scaler object

scaler_object = MinMaxScaler()

In [95]:
# Now we fit scaler_object to our training data

scaler_object.fit(X_train)

MinMaxScaler(copy=True, feature_range=(0, 1))

This fit just finds the minimum and maximum value of each feature (column) in the data we pass in. Later, transform rescales whatever array we give it, such as X_train or X_test, based on the min and max it calculated during the fit.

So fit lets the scaler know what the min and max are.
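Per feature, min-max rescaling with the default `feature_range=(0, 1)` is `(x - min) / (max - min)`. Here is a minimal NumPy sketch of the fit and transform steps on a tiny made-up array; it illustrates the formula and is not scikit-learn's code.

```python
import numpy as np

X_train = np.array([[2.0, 10.0],
                    [4.0, 30.0],
                    [6.0, 20.0]])

# "fit": learn one minimum and one maximum per feature (column)
col_min = X_train.min(axis=0)
col_max = X_train.max(axis=0)

# "transform": rescale each column to [0, 1] using the stored values
scaled = (X_train - col_min) / (col_max - col_min)
print(scaled)
```

scikit-learn computes the same mapping in a slightly different arithmetic order (multiplying by a stored scale and adding an offset), which is a plausible reason the maximum printed later is `1.0000000000000002` rather than exactly `1.0`.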

In [96]:
# Now we do the transform
# and this returns the scaled version.
scaled_X_train = scaler_object.transform(X_train)

In [97]:
# we do the same for test data.

scaled_X_test = scaler_object.transform(X_test)

#### A common question here is: why did we only fit to X_train, and not to all the data we have (X)?

That is because we want to make sure that the scaler doesn't get to peek at the test data. We transform X_test using only the min and max learned from X_train.

If you were to fit on the entire dataset, that's known as data leakage, and it is essentially cheating.


Motive:

We usually fit only to our training data and then transform both the training and test sets. Otherwise you are assuming some knowledge of the test data that you won't have in real life.
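A tiny made-up example shows what fitting only on the training data looks like in practice: the min and max come from `X_train` alone, and the same stored values are reused to transform `X_test`. The `transform` function here is an illustrative stand-in for `scaler_object.transform`, not the library's code.

```python
import numpy as np

X_train = np.array([[0.0], [5.0], [10.0]])
X_test = np.array([[2.5], [12.0]])  # 12.0 lies outside the training range

# Fit on the training data only
train_min, train_max = X_train.min(axis=0), X_train.max(axis=0)

def transform(X):
    # The same stored min/max are reused for every array we transform
    return (X - train_min) / (train_max - train_min)

scaled_train = transform(X_train)
scaled_test = transform(X_test)
print(scaled_test.ravel())
```

The scaled test value of 1.2 is not a bug: the scaler only knows the training range, exactly as it would with truly unseen data. Fitting on all of `X` instead would have stretched the range to cover the test rows, quietly leaking information about them into the training features.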

#### So let's check the max and min of our scaled training data

In [98]:
scaled_X_train.max()

1.0000000000000002

In [99]:
scaled_X_train.min()   # so everything is between 0 and 1 (the max of 1.0000000000000002 is floating-point rounding)

0.0

In [100]:
X_train

array([[-0.8734  , -0.033118, -0.20165 ,  0.55774 ],
       [ 2.0177  ,  1.7982  , -2.9581  ,  0.2099  ],
       [-0.36038 ,  4.1158  ,  3.1143  , -0.37199 ],
       ...,
       [-7.0364  ,  9.2931  ,  0.16594 , -4.5396  ],
       [-3.4605  ,  2.6901  ,  0.16165 , -1.0224  ],
       [-3.3582  , -7.2404  , 11.4419  , -0.57113 ]])

In [101]:
scaled_X_train

array([[4.44850688e-01, 5.14130449e-01, 2.18194638e-01, 8.50172258e-01],
       [6.53339968e-01, 5.82655745e-01, 9.93242398e-02, 8.17696322e-01],
       [4.81846700e-01, 6.69377018e-01, 3.61193167e-01, 7.63368407e-01],
       ...,
       [4.11050776e-04, 8.63104170e-01, 2.34046756e-01, 3.74261253e-01],
       [2.58284115e-01, 6.16029366e-01, 2.33861752e-01, 7.02643151e-01],
       [2.65661395e-01, 2.44444278e-01, 7.20316361e-01, 7.44775785e-01]])

### So now we have successfully scaled both the training set and the test set

Now it's time to build a simple network with Keras, which is actually very straightforward. :)

In [102]:
from keras.models import Sequential
from keras.layers import Dense        # Dense is a densely (fully) connected layer

In [103]:
model = Sequential()     # This creates an empty model; in the next steps we add our layers

# Add the first Dense layer.
# It expects 4 features; remember we have 4 columns in our feature array.

# This layer has 4 neurons, the input dimension is 4, and the activation is relu.

model.add(Dense(4,input_dim=4,activation='relu'))   # relu = rectified linear unit


# Let's add another densely connected layer.

# In this layer we can play around with the number of neurons; it sits in the middle of the network.
# It's up to us how many neurons we choose:
# if we go too large, we can end up with bad results;
# if we go too small, the results won't be good either.

# A good idea is to choose somewhere between 1x and 2x the input dimension; let's take 2x.
# We don't need to provide the input dimension here because this is not the first layer; the first one was.

# This is the hidden layer
model.add(Dense(8,activation='relu'))


# And now we add the last layer, the output layer.
# It is a single neuron with a single output: the predicted probability that the class is 1.
# The activation is sigmoid because, as discussed, the sigmoid function squashes its output between 0 and 1.

model.add(Dense(1,activation='sigmoid'))


# Note: there could be more hidden layers if necessary; this problem doesn't need them, so we didn't add any.
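To see what this `Sequential` model computes, here is a NumPy sketch of its forward pass (4 inputs, Dense 4 with ReLU, Dense 8 with ReLU, Dense 1 with sigmoid). The weights are random stand-ins rather than trained values, and the rows of `x` are fake. It also counts parameters: each Dense layer has `n_in * n_out` weights plus `n_out` biases, so Keras's `model.summary()` should report 20 + 40 + 9 = 69 trainable parameters for this model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes of the model above: 4 inputs -> 4 (ReLU) -> 8 (ReLU) -> 1 (sigmoid)
sizes = [4, 4, 8, 1]

# Random stand-in weights; the real values are learned by model.fit
weights = [rng.normal(size=(n_in, n_out)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    h = relu(x @ weights[0] + biases[0])        # Dense(4, activation='relu')
    h = relu(h @ weights[1] + biases[1])        # Dense(8, activation='relu')
    return sigmoid(h @ weights[2] + biases[2])  # Dense(1, activation='sigmoid')

x = rng.uniform(0, 1, size=(5, 4))  # 5 fake scaled rows with 4 features each
probs = forward(x)
print(probs.shape)  # one probability per row

# Each Dense layer has n_in * n_out weights plus n_out biases
n_params = sum(w.size + b.size for w, b in zip(weights, biases))
print(n_params)
```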

# Now we compile the model

To compile the model, we need to choose a loss function, an optimizer, and the metrics we care about during fitting.

In [104]:
model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])
# run this and that should compile the model.
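The `binary_crossentropy` loss chosen here averages `-[y*log(p) + (1-y)*log(1-p)]` over the samples, where `p` is the sigmoid output. A minimal NumPy sketch (the clipping constant `eps` is an assumption added to avoid `log(0)`) shows how it rewards confident correct predictions and heavily punishes confident wrong ones.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip so log() never sees exactly 0 or 1
    y_pred = np.clip(y_pred, eps, 1 - eps)
    # Average over samples of -[y*log(p) + (1-y)*log(1-p)]
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0, 0.0])

confident_right = np.array([0.9, 0.1, 0.9, 0.1])
unsure = np.array([0.5, 0.5, 0.5, 0.5])
confident_wrong = np.array([0.1, 0.9, 0.1, 0.9])

for name, p in [("confident_right", confident_right),
                ("unsure", unsure),
                ("confident_wrong", confident_wrong)]:
    print(name, round(binary_crossentropy(y_true, p), 4))
```

A model that always outputs 0.5 scores `ln 2 ≈ 0.6931`, which is close to the loss at the start of the training log below (0.7159 at epoch 1), consistent with a network that begins close to guessing.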

## Now it's time to fit (train) the model

In [105]:
# We fit to the scaled training data, scaled_X_train,
# and provide the correct labels to train on, y_train.

# Then we choose the number of epochs: one epoch means one full pass through all the training data.
# Let's take 50.
# verbose=2 makes Keras report its progress, printing one line per epoch as it trains.

model.fit(scaled_X_train,y_train,epochs=50,verbose=2)

# After running the cell, we should see epochs 1 to 50 being trained

Epoch 1/50
 - 0s - loss: 0.7159 - acc: 0.4646
Epoch 2/50
 - 0s - loss: 0.6942 - acc: 0.4864
Epoch 3/50
 - 0s - loss: 0.6840 - acc: 0.5528
Epoch 4/50
 - 0s - loss: 0.6779 - acc: 0.5647
Epoch 5/50
 - 0s - loss: 0.6721 - acc: 0.5789
Epoch 6/50
 - 0s - loss: 0.6647 - acc: 0.5963
Epoch 7/50
 - 0s - loss: 0.6565 - acc: 0.6137
Epoch 8/50
 - 0s - loss: 0.6474 - acc: 0.6279
Epoch 9/50
 - 0s - loss: 0.6376 - acc: 0.6409
Epoch 10/50
 - 0s - loss: 0.6269 - acc: 0.6561
Epoch 11/50
 - 0s - loss: 0.6154 - acc: 0.6736
Epoch 12/50
 - 0s - loss: 0.6049 - acc: 0.6779
Epoch 13/50
 - 0s - loss: 0.5941 - acc: 0.6844
Epoch 14/50
 - 0s - loss: 0.5831 - acc: 0.6931
Epoch 15/50
 - 0s - loss: 0.5717 - acc: 0.7073
Epoch 16/50
 - 0s - loss: 0.5600 - acc: 0.7214
Epoch 17/50
 - 0s - loss: 0.5487 - acc: 0.7388
Epoch 18/50
 - 0s - loss: 0.5361 - acc: 0.7541
Epoch 19/50
 - 0s - loss: 0.5234 - acc: 0.7552
Epoch 20/50
 - 0s - loss: 0.5109 - acc: 0.7715
Epoch 21/50
 - 0s - loss: 0.4982 - acc: 0.7780
Epoch 22/50
 - 0s - lo

<keras.callbacks.History at 0x2c4d73c8710>
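Each of the 50 epochs in the log is one full pass over the 919 training rows, but the weights are updated once per mini-batch, not once per epoch. Since the `fit` call above does not pass `batch_size`, Keras falls back to its default of 32 (worth confirming for your installed version). The arithmetic:

```python
import math

n_train = 919    # len(X_train) from the split above
batch_size = 32  # Keras default when model.fit gets no batch_size
epochs = 50

steps_per_epoch = math.ceil(n_train / batch_size)  # the last batch is smaller
total_updates = steps_per_epoch * epochs
last_batch = n_train - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch, total_updates, last_batch)
```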

Training accuracy ended up hovering around 96%; remember, though, that this is the accuracy on the training set.

We still don't know how the model will do when it predicts on data it has not seen before, and that data is our test data.

### Once the model is trained, the next question is:

How do we actually predict on new data? We can predict on the test set, because the model was never trained on it, so it is essentially new data.
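For a model with a single sigmoid output, `predict_classes` in older Keras versions amounts to thresholding the predicted probability at 0.5. In newer TensorFlow/Keras releases, where `predict_classes` no longer exists, the usual replacement is to threshold the output of `model.predict` yourself. A minimal sketch, with made-up probabilities standing in for real `model.predict` output:

```python
import numpy as np

# Made-up sigmoid outputs, shaped (n_samples, 1) like model.predict returns
probs = np.array([[0.02], [0.97], [0.51], [0.49]])

# Class 1 when the predicted probability is above 0.5, else class 0
predicted_classes = (probs > 0.5).astype("int32")
print(predicted_classes.ravel())
```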

In [106]:
# This is our scaled X_test data, which the model hasn't seen before
# scaled_X_test

In [107]:
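model.predict_classes(scaled_X_test)

# Running this produces an array of 0s and 1s: the predicted class for each row of scaled_X_test.
# Note: predict_classes was removed in TensorFlow 2.6; in newer versions use
# (model.predict(scaled_X_test) > 0.5).astype("int32") for a single sigmoid output.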

array([[0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [1],
       [1],
       [0],
       [1],
       [0],
       [0],
       [1],
       [1],
       [1],
       [1],
       [0],
       [0],
       [1],
       [0],
       [1],
       [0],
       [0],
       [1],
       [0],
       [0],
       [1],
       [0],
       [0],
       [1],
       [1],
       [0],
       [1],
       [1],
       [1],
       [0],
       [0],
       [1],
       [1],
       [0],
       [1],
       [1],
       [1],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [1],
       [0],
       [0],
       [0],
       [0],
       [1],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [1],
       [0],
       [1],
       [0],
       [1],
       [0],
       [0],
       [1],
       [1],
       [1],
       [1],
       [0],
       [0],
    

So it's good that we managed to make predictions, but how well did the model do?

Did it actually perform well?

What we are going to do now is evaluate our model.

# Evaluating our model

In [108]:
model.metrics_names

['loss', 'acc']

Now we are going to use scikit-learn's metrics to evaluate the performance.

In [111]:
from sklearn.metrics import confusion_matrix,classification_report

In [None]:
# Now we grab the predictions we made before.

predictions = model.predict_classes(scaled_X_test)

# Now that we have the predictions on the test set, we compare them with the right answers,
# which we already know: they are y_test.

In [112]:
confusion_matrix(y_test,predictions)   

# run this and we get our confusion matrix

array([[254,   3],
       [  9, 187]], dtype=int64)

As we can see, it misidentified only twelve (12) banknotes: the 3 + 9 off-diagonal entries.
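To make the layout of that matrix explicit, here is a hypothetical `confusion_matrix_2x2` helper in plain NumPy, assuming scikit-learn's convention that rows are the true class and columns are the predicted class. The toy labels are made up.

```python
import numpy as np

def confusion_matrix_2x2(y_true, y_pred):
    """Hypothetical helper: rows = true class, columns = predicted class
    (the layout scikit-learn's confusion_matrix uses)."""
    cm = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true.astype(int), y_pred.astype(int).ravel()):
        cm[t, p] += 1
    return cm

y_true = np.array([0, 0, 1, 1, 1, 0])
y_pred = np.array([[0], [1], [1], [0], [1], [0]])  # (n, 1), like predict_classes

cm = confusion_matrix_2x2(y_true, y_pred)
print(cm)
# Off-diagonal cells are the mistakes
print("misclassified:", cm[0, 1] + cm[1, 0])
```

Read with that layout, the matrix above says 3 notes of true class 0 were predicted as class 1 and 9 notes of true class 1 were predicted as class 0: 3 + 9 = 12 errors out of 453 test notes, roughly 97.4% accuracy.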

### If we want metrics like precision, recall and F1-score

we print the classification report:

In [113]:
print(classification_report(y_test,predictions))

             precision    recall  f1-score   support

        0.0       0.97      0.99      0.98       257
        1.0       0.98      0.95      0.97       196

avg / total       0.97      0.97      0.97       453
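Every number in that report can be recomputed from the confusion matrix above. A short sketch, again assuming rows are true classes and columns are predicted classes:

```python
import numpy as np

# Confusion matrix printed above: rows = true class, columns = predicted class
cm = np.array([[254,   3],
               [  9, 187]])

for cls in (0, 1):
    tp = cm[cls, cls]
    precision = tp / cm[:, cls].sum()  # of everything predicted as cls, the fraction that really is cls
    recall = tp / cm[cls, :].sum()     # of everything that really is cls, the fraction that was found
    f1 = 2 * precision * recall / (precision + recall)
    support = cm[cls, :].sum()         # number of true samples of cls
    print(cls, round(precision, 2), round(recall, 2), round(f1, 2), support)

accuracy = np.trace(cm) / cm.sum()
print("accuracy:", round(accuracy, 4))
```

The output reproduces the per-class precision, recall, F1 and support in the report (after rounding to two decimals). Newer scikit-learn versions print `macro avg` and `weighted avg` rows (plus an `accuracy` row) instead of the single `avg / total` row shown here; the old `avg / total` line is the support-weighted average.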



## If we want to save this model

In [114]:
model.save('My_Super_Model.h5')   # and now it's saved.

## If we want to load the model

In [115]:
from keras.models import load_model

In [116]:
new_model = load_model('My_Super_Model.h5')

And now the model is loaded.

In [117]:
# now we can call predict_classes on the loaded model; it gives the same predictions as before
new_model.predict_classes(scaled_X_test)

array([[0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [1],
       [1],
       [0],
       [1],
       [0],
       [0],
       [1],
       [1],
       [1],
       [1],
       [0],
       [0],
       [1],
       [0],
       [1],
       [0],
       [0],
       [1],
       [0],
       [0],
       [1],
       [0],
       [0],
       [1],
       [1],
       [0],
       [1],
       [1],
       [1],
       [0],
       [0],
       [1],
       [1],
       [0],
       [1],
       [1],
       [1],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [1],
       [0],
       [0],
       [0],
       [0],
       [1],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [1],
       [0],
       [1],
       [0],
       [1],
       [0],
       [0],
       [1],
       [1],
       [1],
       [1],
       [0],
       [0],
    