# Introduction

<br></br>
Take me to the [code and Jupyter Notebook](https://github.com/AMoazeni/Machine-Learning-Database-Prediction/blob/master/Jupyter%20Notebook/ML%20-%20Database%20Prediction.ipynb) for Database Prediction!

<br></br>
This article covers the theory and code behind the Artificial Neural Network (ANN), a popular Machine Learning technique and a core Deep Learning algorithm.

<br></br>
You are given a Bank database of 10,000 customers, and you train an ML model to predict how likely each customer is to leave or stay with the bank. The code is flexible and can be adapted to predict many kinds of customer behavior, provided there is enough data.


<br></br>
<br></br>

# Artificial Neural Networks

<br></br>
Let's take a look at the theory behind the Artificial Neural Network algorithm, which was popularized by Geoffrey Hinton in the 1980s and underpins Deep Learning. "Deep" in Deep Learning refers to the multiple hidden layers stacked between the input and output layers.


<br></br>
The input layer observations and the related output refer to ONE row of data. Neural Nets learn by adjusting weights, which determine the strength and importance of the signals that an Activation Function passes along or blocks. The network keeps adjusting weights until the predicted output closely matches the actual output.


<br></br>
<div align="center"><img src="https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Database-Prediction/master/Jupyter%20Notebook/Images/01%20-%20Deep%20Learning.png" alt="Deep-Learning"></div>


<br></br>
Here is a zoomed-in version of the node diagram. Yellow nodes represent inputs, green nodes are the hidden layers, and red nodes are outputs.


<br></br>
<div align="center"><img src="https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Database-Prediction/master/Jupyter%20Notebook/Images/02%20-%20Neuron.png" alt="Neuron"></div>
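
<br></br>
As a rough illustration of what a single node computes, the sketch below multiplies one row of inputs by weights, adds a bias, and passes the sum through an Activation Function. The input values, weights, bias, and the choice of a Sigmoid activation are all hypothetical and chosen only for the example.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs followed by an activation function."""
    weighted_sum = np.dot(inputs, weights) + bias
    return sigmoid(weighted_sum)

# Hypothetical values: one row of data with three input features
inputs = np.array([0.5, -1.2, 3.0])
weights = np.array([0.4, 0.1, -0.2])
bias = 0.1

y_hat = neuron_output(inputs, weights, bias)
print('Neuron output:', y_hat)
```

Training changes only `weights` and `bias`; the inputs stay fixed for a given row.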



<br></br>
<br></br>

# Feature Scaling

<br></br>
Feature Scaling (Standardization or Normalization) is applied to the input variables. It brings input values onto a similar scale, which makes the data easier for Neural Nets to process. Read 'Efficient Back Propagation.pdf' in the research papers section.


<br></br>
<div align="center"><img src="https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Database-Prediction/master/Jupyter%20Notebook/Images/02_1%20-%20Standardized%20Equation.png" alt="Standardize"></div>


<br></br>
<div align="center"><img src="https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Database-Prediction/master/Jupyter%20Notebook/Images/02_2%20-%20Normalized%20Equation.png" alt="Normalize"></div>
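
<br></br>
The two scaling methods can be sketched in a few lines of NumPy. This assumes the usual definitions (standardize: subtract the mean and divide by the standard deviation; normalize: subtract the minimum and divide by the range), and the `ages` array is just a handful of sample values.

```python
import numpy as np

def standardize(x):
    """Rescale to zero mean and unit standard deviation."""
    return (x - x.mean()) / x.std()

def normalize(x):
    """Rescale to the range [0, 1] (min-max scaling)."""
    return (x - x.min()) / (x.max() - x.min())

# A few sample ages used only for illustration
ages = np.array([27.0, 39.0, 42.0, 44.0, 50.0])

standardized_ages = standardize(ages)
normalized_ages = normalize(ages)

print('Standardized:', standardized_ages)
print('Normalized:', normalized_ages)
```

Both functions assume the column is not constant, otherwise the division would be by zero.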


<br></br>
<br></br>

# Activation Function

<br></br>
Here is a list of some Neural Network Activation Functions. Read 'Deep sparse rectifier neural networks.pdf' in the research papers section.


<br></br>
1. Threshold Function - Rigid binary style function
<div align="center"><img src="https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Database-Prediction/master/Jupyter%20Notebook/Images/03%20-%20Threshold.png" width="400" alt="Threshold"></div>

2. Sigmoid Function - Smooth, good for output Layers that predict probability
<div align="center"><img src="https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Database-Prediction/master/Jupyter%20Notebook/Images/04%20-%20Sigmoid.png" width="400" alt="Sigmoid"></div>

3. Rectifier Function - Outputs zero for negative inputs and increases linearly as positive input values increase
<div align="center"><img src="https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Database-Prediction/master/Jupyter%20Notebook/Images/05%20-%20Rectifier.png" width="400" alt="Rectifier"></div>

4. Hyperbolic Tangent Function - Similar to Sigmoid Function but values can go below zero
<div align="center"><img src="https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Database-Prediction/master/Jupyter%20Notebook/Images/06%20-%20Tanh.png" width="400" alt="Tanh"></div>


<br></br>
Different layers of a Neural Net can use different Activation Functions.


<br></br>
<div align="center"><img src="https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Database-Prediction/master/Jupyter%20Notebook/Images/07%20-%20NN%20Activation%20Example.png" width="600" alt="Activation"></div>
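
<br></br>
The four Activation Functions listed above can be written as short NumPy functions. This is a minimal sketch: the function names are chosen for this example, and the Threshold Function is assumed to switch on at exactly zero.

```python
import numpy as np

def threshold(z):
    """Rigid binary output: 1 when the input is at least 0, otherwise 0."""
    return np.where(z >= 0, 1.0, 0.0)

def sigmoid(z):
    """Smooth curve between 0 and 1, useful for predicting probabilities."""
    return 1.0 / (1.0 + np.exp(-z))

def rectifier(z):
    """Zero for negative inputs, equal to the input for positive inputs."""
    return np.maximum(0.0, z)

def hyperbolic_tangent(z):
    """Similar in shape to the sigmoid, but ranges from -1 to 1."""
    return np.tanh(z)

z = np.array([-2.0, 0.0, 2.0])
for activation in (threshold, sigmoid, rectifier, hyperbolic_tangent):
    print(activation.__name__, activation(z))
```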


<br></br>
<br></br>

# Cost Function

<br></br>
The Cost Function measures the difference between the target and the network's output. We minimize it by adjusting weights (Backpropagation) over several epochs, where one epoch is one training cycle over the whole Training Set. Once input information is fed through the network and an output estimate y_hat is found (Forward-propagation), we take the error, go back through the network, and adjust the weights (Backpropagation Algorithm). The most common cost function is the Quadratic (squared error) cost:


<br></br>
$$
Cost = \frac{(\hat y - y)^2}{2} = \frac{(\text{Weighted Estimate} - \text{Actual})^2}{2}
$$
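
<br></br>
As a small worked example of the Quadratic cost, the sketch below sums the formula over several observations. The `y_actual` and `y_hat` values are made up for illustration.

```python
import numpy as np

def quadratic_cost(y_hat, y):
    """Half the squared error of each observation, summed over all observations."""
    return np.sum((y_hat - y) ** 2) / 2.0

# Hypothetical actual outcomes and network estimates for four customers
y_actual = np.array([1.0, 0.0, 1.0, 0.0])
y_hat = np.array([0.8, 0.1, 0.6, 0.3])

cost = quadratic_cost(y_hat, y_actual)
print('Cost:', cost)
```

The cost shrinks toward zero as the estimates move closer to the actual values.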


<br></br>
Read this [Deep Learning Book](http://neuralnetworksanddeeplearning.com/index.html) and this [List of Cost Functions Uses](https://stats.stackexchange.com/questions/154879/a-list-of-cost-functions-used-in-neural-networks-alongside-applications?).


<br></br>
<br></br>

# Batch Gradient Descent

<br></br>
This is a Cost minimization technique that repeatedly steps in the downhill direction of the slope. It is only guaranteed to find the Global Minimum on Convex Cost Functions. The function can have any number of dimensions, but we can only visualize up to three.


### 1D Gradient Descent
<div align="center"><img src="https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Database-Prediction/master/Jupyter%20Notebook/Images/09%20-%20Gradient%20Descent%201D.png" width="600" alt="Gradient-Descent-1D"></div>


### 2D Gradient Descent
<div align="center"><img src="https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Database-Prediction/master/Jupyter%20Notebook/Images/10%20-%20Gradient%20Descent%202D.png" width="300"  alt="Gradient-Descent-2D"></div>


### 3D Gradient Descent
<div align="center"><img src="https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Database-Prediction/master/Jupyter%20Notebook/Images/11%20-%20Gradient%20Descent%203D.png" width="600" alt="Gradient-Descent-3D"></div>
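
<br></br>
To make the downhill idea concrete, here is a minimal sketch of Batch Gradient Descent fitting a straight line to toy data. The data, the learning rate, and the number of epochs are assumptions picked so that this convex problem converges; they are not taken from the Bank example.

```python
import numpy as np

def batch_gradient_descent(x, y, learning_rate=0.1, epochs=500):
    """Fit y = w * x + b by stepping downhill on the mean quadratic cost."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        error = (w * x + b) - y        # error of every row, computed all at once
        grad_w = np.mean(error * x)    # slope of the cost with respect to w
        grad_b = np.mean(error)        # slope of the cost with respect to b
        w -= learning_rate * grad_w    # step against the slope
        b -= learning_rate * grad_b
    return w, b

# Toy data generated from the line y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

w, b = batch_gradient_descent(x, y)
print('w =', w, 'b =', b)
```

Each update uses the whole data set, which is what makes this the "Batch" variant.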



<br></br>
<br></br>

# Stochastic Gradient Descent

<br></br>
This method updates the weights far more often than Batch Gradient Descent, so it often converges faster and is less likely to get stuck in a Local Minimum.


<br></br>
To avoid the Local Minimum trap, we can take noisier steps, which increases the likelihood of finding the Global Minimum. We achieve this by adjusting weights one row at a time (Stochastic Gradient Descent) instead of once per pass over the whole data set (Batch Gradient Descent). Read 'Neural Network in 13 lines of Python.pdf' in the research papers section.


<br></br>
<div align="center"><img src="https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Database-Prediction/master/Jupyter%20Notebook/Images/12%20-%20Local%20Min%20Trap.png" alt="Local-Minimum"></div>


<br></br>
These are the steps for Stochastic Gradient Descent:
1. Initialize weights to small numbers close to 0 (but NOT 0)
2. Input first row of Observation Data into input layer
3. Forward-propagate: Apply weights to inputs to get predicted result 'y_hat'
4. Compute Error = 'y_hat' - 'y_actual'
5. Back-propagate: Update weights according to the Learning Rate and how much they're responsible for the Error.
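6. Repeat steps 2-5 after each observation (Stochastic Gradient Descent), or accumulate the error and update the weights after each batch (Batch Gradient Descent)
7. One Epoch is one full pass of the Training Set through the Artificial Neural Network. More Epochs usually improve results, up to the point where the model starts to overfit.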
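
<br></br>
The steps above can be sketched with a single Sigmoid neuron trained one row at a time. This is an illustration under stated assumptions, not the notebook's method: the toy data (a logical OR), the learning rate, the number of epochs, the random row order, and the use of the Quadratic cost gradient are all choices made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sgd(x, y, learning_rate=0.5, epochs=2000, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: initialize weights to small random numbers close to 0
    weights = rng.uniform(-0.05, 0.05, size=x.shape[1])
    bias = 0.0
    for _ in range(epochs):                    # Step 7: one epoch is one pass over the data
        for i in rng.permutation(len(x)):      # Step 2: feed one row at a time (random order)
            y_hat = sigmoid(x[i] @ weights + bias)   # Step 3: forward-propagate
            error = y_hat - y[i]                     # Step 4: compute the error
            # Step 5: gradient of error^2 / 2 passed back through the sigmoid
            delta = error * y_hat * (1.0 - y_hat)
            weights -= learning_rate * delta * x[i]
            bias -= learning_rate * delta
    return weights, bias

# Toy data: the output is 1 when at least one input is 1
x = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0, 1.0])

weights, bias = train_sgd(x, y)
predictions = sigmoid(x @ weights + bias) > 0.5
print('Predictions:', predictions)
```

Because the weights change after every row, each epoch performs as many updates as there are rows.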


<br></br>
<br></br>

# Evaluating the Artificial Neural Network

<br></br>
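Be careful when measuring the accuracy of a model. Accuracy measured on a single Test Set can vary from one train/test split to the next, depending on the model's Bias and Variance. To get a more reliable estimate, we can use K-Fold Cross Validation, which splits the data into K segments (folds), trains and tests K times using a different fold for testing each time, and averages the resulting accuracies.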


<br></br>
<div align="center"><img src="https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Database-Prediction/master/Jupyter%20Notebook/Images/13%20-%20Bias-Variance%20Tradeoff.png" width="400" alt="Bias-Variance"></div>


<br></br>
<div align="center"><img src="https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Database-Prediction/master/Jupyter%20Notebook/Images/14%20-%20K-Fold%20Cross%20Validation.png" width="400" alt="Cross-Validation"></div>
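
<br></br>
The splitting and averaging behind K-Fold Cross Validation can be sketched by hand in NumPy. Everything here is hypothetical: the toy clusters, the helper names, and the nearest-centroid classifier, which stands in for the Neural Network so the example runs quickly.

```python
import numpy as np

def k_fold_indices(n_rows, k, seed=0):
    """Yield (train_idx, test_idx) pairs; each row lands in exactly one test fold."""
    shuffled = np.random.default_rng(seed).permutation(n_rows)
    folds = np.array_split(shuffled, k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

def nearest_centroid_accuracy(x_train, y_train, x_test, y_test):
    """Stand-in model: predict the class whose training mean is closest."""
    centroids = np.array([x_train[y_train == c].mean(axis=0) for c in (0, 1)])
    distances = np.linalg.norm(x_test[:, None, :] - centroids[None, :, :], axis=2)
    return np.mean(distances.argmin(axis=1) == y_test)

# Toy data: two well-separated clusters of 50 rows each
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(4.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

accuracies = [nearest_centroid_accuracy(x[tr], y[tr], x[te], y[te])
              for tr, te in k_fold_indices(len(x), k=10)]

print('Mean accuracy:', np.mean(accuracies))
print('Accuracy spread (std):', np.std(accuracies))
```

The mean summarizes overall performance, while the spread across folds hints at how much the result depends on the particular split.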



<br></br>
<br></br>

# Overfitting

<br></br>
Overfitting is when your model is over-trained on the Training Set and isn't generalized enough. This reduces performance on Test Set predictions.


<br></br>
Indicators of overfitting:

1. Training and Test Accuracies have a large difference
2. Observing High Accuracy Variance when applying K-Fold Cross Validation


<br></br>
Solve overfitting with "Dropout Regularization", which randomly disables Neurons during each training iteration so they don't grow too dependent on each other. This helps the Neural Network learn several independent correlations from the data.
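
<br></br>
A rough sketch of what Dropout does to a layer's outputs during training is shown below. It assumes the common "inverted dropout" convention, where the surviving activations are scaled up so their expected total stays the same. The `dropout` helper and the all-ones activations are hypothetical.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Zero a random fraction `rate` of the activations and rescale the survivors."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)

# Hypothetical activations: 1,000 rows passing through a hidden layer of 6 neurons
hidden = np.ones((1000, 6))
dropped = dropout(hidden, rate=0.1, rng=rng)

zero_fraction = np.mean(dropped == 0.0)
print('Fraction of disabled neurons:', zero_fraction)
print('Mean activation after dropout:', dropped.mean())
```

Dropout is applied only while training. At prediction time every neuron stays active.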


<br></br>
<br></br>

# Sample Problem - Bank Database Prediction

<br></br>
Let's test our knowledge of Artificial Neural Networks by solving a real-world problem. Take a look at 'Bank_Customer_Data.csv' in the Data folder of this repository. This technique can be applied to any customer-oriented business data set.


<br></br>
### Problem Description:

A Bank (or any business) is trying to improve customer retention. The Bank engineers have put together a table of data about their customers (Name, Age, Location, Income, etc). They also have data on whether customers left the Bank or stayed with them (last column of data).


<br></br>
The Bank is trying to build a Machine Learning model that predicts the likelihood of a customer leaving before it actually happens so they can work on improving customer satisfaction.


<br></br>
<br></br>

### Code

<br></br>
You can run the code online with [Google Colab](https://colab.research.google.com/drive/1fkkPPombnFH7_A8dlOkia2P0SZWMVt7o), a web-based Jupyter Notebook environment that doesn't require any installation.


<br></br>
A better alternative is to download the code and run it with 'Jupyter Notebook', or copy the code into the 'Spyder' IDE found in the [Anaconda Distribution](https://www.anaconda.com/download/). 'Spyder' is similar to MATLAB: it lets you step through the code and examine the 'Variable Explorer' to see exactly how the data is parsed and analyzed. Jupyter Notebook also offers a [Jupyter Variable Explorer Extension](http://volderette.de/jupyter-notebook-variable-explorer/), which is quite useful for keeping track of variables.


<br></br>
```shell
$ git clone https://github.com/AMoazeni/Machine-Learning-Database-Prediction.git
$ cd Machine-Learning-Database-Prediction
```

<br></br>
<br></br>
<br></br>
<br></br>

In [1]:
# Artificial Neural Network
# Part 1 - Import Data and Extract input 'x' and output 'y'

# Pip or Conda Install these libraries in the Terminal
# Install Theano (U. Montreal, GPU or CPU parallel floating-point computation)
# Install Tensorflow (Google, same as above)
# Install Keras (Combines the above 2 libraries with a high-level API)

# Numpy is a high speed Math computation library
import numpy as np

# Matplotlib is used for plotting graphs
import matplotlib.pyplot as plt

# Pandas is a data analysis library for working with tables of data
import pandas as pd

# LabelEncoder is used to encode binary categorical data into numbers (male/female -> 0/1)
# OneHotEncoder is used to encode categorical data with more than 2 possible options (France, Spain, Germany)
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Split the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split

# Feature Scaling eases computation by standardizing input data
from sklearn.preprocessing import StandardScaler


# Your CSV data URL link goes here
data_url = 'https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Database-Prediction/master/Data/Bank_Customer_Data.csv'

# Importing the dataset
dataset = pd.read_csv(data_url)

# Display data to make sure it has been imported
display(dataset)

# Extract input Independent Variable (Matrix of Features and Observations)
# Row number, customer ID, and name are not useful, so they're excluded
x = dataset.iloc[:, 3:13]

# Extract the output Dependent Variable. The last column shows whether a customer left or stayed with the bank
y = dataset.iloc[:, 13]

print('X input values:')
display(x)

print('Y output values:')
display(y)

# Use the '.values' attribute to convert data from Pandas DataFrames to NumPy arrays
x = x.values
y = y.values


| | RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 15634602 | Hargrave | 619 | France | Female | 42 | 2 | 0.00 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 2 | 15647311 | Hill | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 3 | 15619304 | Onio | 502 | France | Female | 42 | 8 | 159660.80 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 4 | 15701354 | Boni | 699 | France | Female | 39 | 1 | 0.00 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 5 | 15737888 | Mitchell | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.10 | 0 |
| 5 | 6 | 15574012 | Chu | 645 | Spain | Male | 44 | 8 | 113755.78 | 2 | 1 | 0 | 149756.71 | 1 |
| 6 | 7 | 15592531 | Bartlett | 822 | France | Male | 50 | 7 | 0.00 | 2 | 1 | 1 | 10062.80 | 0 |
| 7 | 8 | 15656148 | Obinna | 376 | Germany | Female | 29 | 4 | 115046.74 | 4 | 1 | 0 | 119346.88 | 1 |
| 8 | 9 | 15792365 | He | 501 | France | Male | 44 | 4 | 142051.07 | 2 | 0 | 1 | 74940.50 | 0 |
| 9 | 10 | 15592389 | H? | 684 | France | Male | 27 | 2 | 134603.88 | 1 | 1 | 1 | 71725.73 | 0 |


X input values:


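| | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 619 | France | Female | 42 | 2 | 0.00 | 1 | 1 | 1 | 101348.88 |
| 1 | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 |
| 2 | 502 | France | Female | 42 | 8 | 159660.80 | 3 | 1 | 0 | 113931.57 |
| 3 | 699 | France | Female | 39 | 1 | 0.00 | 2 | 0 | 0 | 93826.63 |
| 4 | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.10 |
| 5 | 645 | Spain | Male | 44 | 8 | 113755.78 | 2 | 1 | 0 | 149756.71 |
| 6 | 822 | France | Male | 50 | 7 | 0.00 | 2 | 1 | 1 | 10062.80 |
| 7 | 376 | Germany | Female | 29 | 4 | 115046.74 | 4 | 1 | 0 | 119346.88 |
| 8 | 501 | France | Male | 44 | 4 | 142051.07 | 2 | 0 | 1 | 74940.50 |
| 9 | 684 | France | Male | 27 | 2 | 134603.88 | 1 | 1 | 1 | 71725.73 |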


Y output values:


0       1
1       0
2       1
3       0
4       0
5       1
6       0
7       1
8       0
9       0
10      0
11      0
12      0
13      0
14      0
15      0
16      1
17      0
18      0
19      0
20      0
21      0
22      1
23      0
24      0
25      0
26      0
27      0
28      0
29      0
       ..
9970    0
9971    0
9972    0
9973    0
9974    0
9975    1
9976    0
9977    0
9978    0
9979    0
9980    0
9981    1
9982    1
9983    0
9984    0
9985    0
9986    0
9987    0
9988    0
9989    0
9990    0
9991    1
9992    0
9993    0
9994    0
9995    0
9996    0
9997    1
9998    1
9999    0
Name: Exited, Length: 10000, dtype: int64

In [2]:
# Artificial Neural Network
# Part 2 - Data pre-processing

# In the Bank example: Convert France/Germany/Spain into 0/1/2
labelencoder_x_1 = LabelEncoder()
x[:, 1] = labelencoder_x_1.fit_transform(x[:, 1])

# In the Bank example: Convert Female/Male into 0/1
labelencoder_x_2 = LabelEncoder()
x[:, 2] = labelencoder_x_2.fit_transform(x[:, 2])

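# Country data is categorical but not ordinal (order doesn't matter), create dummy variables with OneHotEncoder
# The 'categorical_features' argument was removed in scikit-learn 0.22, so ColumnTransformer selects the column instead
from sklearn.compose import ColumnTransformer
onehotencoder = ColumnTransformer([('country', OneHotEncoder(), [1])], remainder = 'passthrough', sparse_threshold = 0)

# The dummy columns come first, followed by the remaining columns in their original order
# Make all Independent X values have the same type (Float 64)
x = onehotencoder.fit_transform(x).astype(np.float64)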

# Remove first column to avoid Dummy Variable trap
# Two columns of binary data is enough to describe 3 categories (France, Spain, Germany)
x = x[:, 1:]


# Create Training and Test sets and apply Feature Scaling (standardize)

# Encode the Dependent Variable
# In Bank example we don't need to encode Dependent variables because it's already Binary
# If there are more outputs to examine, uncomment the following lines of code
#labelencoder_y = LabelEncoder()
#y = labelencoder_y.fit_transform(y)

# Test_size = 0.2 means 80% of data for training, 20% test
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state = 0)

# Feature Scaling
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)
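
The encoding steps in this cell can be illustrated on a tiny hand-made column. The sketch below is NumPy-only and is not the notebook's code; the `countries` array is hypothetical. It shows why two dummy columns are enough to describe three countries once the first column is dropped.

```python
import numpy as np

# Hypothetical country column for four customers
countries = np.array(['France', 'Spain', 'Germany', 'France'])

# Label encoding: categories are sorted alphabetically and mapped to 0, 1, 2
categories, codes = np.unique(countries, return_inverse=True)

# One-hot encoding: one column per category with a single 1 in each row
one_hot = np.eye(len(categories))[codes]

# Drop the first column (France) to avoid the Dummy Variable trap
reduced = one_hot[:, 1:]

print('Categories:', categories)
print('One-hot:\n', one_hot)
print('Without first column:\n', reduced)
```

With this layout, France is represented by two zeros, which is why a sample input for a French customer starts with `0.0, 0`.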


In [3]:
# Artificial Neural Network
# Part 3 - Artificial Neural Network Architecture

# Importing the Keras library
import keras

# Sequential is used to initialize NN
from keras.models import Sequential

# Dense is used to build Deep layers
from keras.layers import Dense

# Dropout  is used to prevent overfitting, by using Dropout Regularization
from keras.layers import Dropout

# Initialising the ANN Sequentially (can also initialize as Graph)
# We use Sequential Classifier because we have successive layers
classifier = Sequential()

# Adding the input layer and the first hidden layer
# This step initializes the Weights to small random numbers
# 'units' is the number of neurons in the layer (begin with the average of Input & Output nodes = (11+1)/2 = 6)
# 'kernel_initializer': Initialize weights as small random numbers
# 'input_dim': number of Independent Variables
# 'Activation': Rectifier Activation Function ('relu') for Hidden Layers, Sigmoid Function for Output Layer
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 11))

# Add Dropout Regularization to first layer to prevent overfitting
# 'rate': Fraction of Neurons to drop. Start with 0.1 (10% dropped) and increment by 0.1 until overfitting is solved, don't go over 0.5
classifier.add(Dropout(rate = 0.1))

# Add the second hidden layer
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))
classifier.add(Dropout(rate = 0.1))

# Add the output layer
# No Dropout after the output layer: dropping the single output neuron would corrupt the prediction itself
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))

# Compile the ANN
# 'optimizer': Algorithm used to find the best Weights. 'adam' is a popular variant of Stochastic Gradient Descent
# 'loss' = 'binary_crossentropy' is the logarithmic loss used for Binary Outputs
# 'loss' = 'categorical_crossentropy' is useful for 3+ categorical Outputs
# 'metrics' =  Used to evaluate the ANN, requires list. We use 1 metric called 'accuracy'  
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

# Fitting the ANN to the Training set
# Experiment to find best 'batch_size' and 'epochs'
classifier.fit(x_train, y_train, batch_size = 10, epochs = 10)


Using TensorFlow backend.


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x1a33800be0>

In [7]:
# Artificial Neural Network
# Part 4 - Making predictions and evaluating the model

# Predicting the Test set results
# This gives a vector of probabilities of Customers leaving the bank
# You can rank the probabilities of customers most likely to leave the bank
y_pred = classifier.predict(x_test)


# Choose a threshold of which customers leave or stay (use 50% as a starting threshold)
# This line converts probabilities into True/False
y_pred = (y_pred > 0.5)


# Predicting a single new observation
# Predict if the customer with the following information will leave the bank:
# Geography: France
# Credit Score: 600
# Gender: Male
# Age: 40
# Tenure: 3
# Balance: 60000
# Number of Products: 2
# Has Credit Card: Yes
# Is Active Member: Yes
# Estimated Salary: 50000
# sc.transform applies the same Feature Scaling to the new observation so the model can interpret it
# Writing one element as a float (0.0) makes NumPy store the whole array as float64

# Change this sample input to test a new prediction
sample_input = [0.0, 0, 600, 1, 40, 3, 60000, 2, 1, 1, 50000]

new_prediction = classifier.predict(sc.transform(np.array([sample_input])))
new_prediction = (new_prediction > 0.5)


# Making the Confusion Matrix
# Tells you the number of correct vs. incorrect predictions
# The diagonal entries cm[0,0] + cm[1,1] are Correct Predictions
# The off-diagonal entries cm[0,1] + cm[1,0] are Incorrect Predictions
# Compute accuracy = correct predictions / total predictions
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

# Measure accuracy percentage on the Test Set (rows held out from training)
accuracy = (cm[0,0] + cm[1,1])/cm.sum()*100

print('This model was trained with an accuracy of', accuracy, '%\n')

print('For a new sample input:\n', sample_input, '\n')

print('Prediction - Will this customer leave the Bank?\n', 'Result = ', new_prediction, '\n')


This model was trained with an accuracy of 83.0 %

For a new sample input:
 [0.0, 0, 600, 1, 40, 3, 60000, 2, 1, 1, 50000] 

Prediction - Will this customer leave the Bank?
 Result =  [[False]] 
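
To show how the Confusion Matrix turns into an accuracy figure, here is a small hand-rolled sketch. The labels and predictions are made up, and `confusion_counts` is a hypothetical helper that follows the same layout as scikit-learn's `confusion_matrix` (rows are actual classes, columns are predicted classes).

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Return a 2x2 matrix: rows are actual classes, columns are predicted classes."""
    cm = np.zeros((2, 2), dtype=int)
    for actual, predicted in zip(y_true, y_pred):
        cm[int(actual), int(predicted)] += 1
    return cm

# Hypothetical outcomes for ten customers (1 = left the bank, 0 = stayed)
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
y_pred = np.array([0, 0, 1, 0, 1, 0, 1, 0, 0, 0])

cm = confusion_counts(y_true, y_pred)

# Accuracy: correct predictions (the diagonal) divided by all predictions
accuracy = np.trace(cm) / cm.sum()

# Recall: share of customers who actually left that the model caught
recall = cm[1, 1] / cm[1].sum()

print(cm)
print('Accuracy:', accuracy)
print('Recall:', recall)
```

Recall is worth checking alongside accuracy because most customers stay, so a model can score well on accuracy while missing many of the customers who leave.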



In [8]:
# Artificial Neural Network
# Part 5 - Improve and Tune Hyper-Parameters for the ANN (takes a long time to compute)

# This is the most thorough approach: it searches for the best hyper-parameters and reports their accuracy

# Dropout Regularization to reduce overfitting if needed
# GridSearch tries several combinations of Hyper-Parameters to find the best one

# K-Fold Cross Validation breaks up the data into 'K' chunks
# It then trains 'K' times, holding out a different chunk for testing every time, which gives a more reliable accuracy estimate


import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
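# Note: 'keras.wrappers.scikit_learn' is only available in older Keras releases; newer ones point to the separate SciKeras package instead
from keras.wrappers.scikit_learn import KerasClassifier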
from sklearn.model_selection import GridSearchCV
from keras.models import Sequential
from keras.layers import Dense

# Import data and extract input x and output y
data_url = 'https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Database-Prediction/master/Data/Bank_Customer_Data.csv'
dataset = pd.read_csv(data_url)
x = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values

# Data pre-processing
labelencoder_x_1 = LabelEncoder()
x[:, 1] = labelencoder_x_1.fit_transform(x[:, 1])

labelencoder_x_2 = LabelEncoder()
x[:, 2] = labelencoder_x_2.fit_transform(x[:, 2])

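# The 'categorical_features' argument was removed in scikit-learn 0.22, so ColumnTransformer selects the column instead
from sklearn.compose import ColumnTransformer
onehotencoder = ColumnTransformer([('country', OneHotEncoder(), [1])], remainder = 'passthrough', sparse_threshold = 0)
x = onehotencoder.fit_transform(x).astype(np.float64)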
x = x[:, 1:]

# Prepare training and test sets, also apply feature scaling
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state = 0)
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)


# This function has an input (Optimizer) so we can try different ones
# 'Adam' and 'rmsprop' (also good for RNN) are good optimizers for stochastic gradient descent
def build_classifier(optimizer):
    classifier = Sequential()
    classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 11))
    classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))
    classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
    classifier.compile(optimizer = optimizer, loss = 'binary_crossentropy', metrics = ['accuracy'])
    return classifier


# Build Neural Network Classifier with K-Fold Cross Validation training, tune Hyper-Parameters here
# Try 'epochs': [100, 500] for major improvements to accuracy
# Try 'cv = 10' for increased K-Validation segmentation
classifier = KerasClassifier(build_fn = build_classifier)
parameters = {'batch_size': [25, 32],
              'epochs': [5, 10],
              'optimizer': ['adam', 'rmsprop']}

grid_search = GridSearchCV(estimator = classifier,
                           param_grid = parameters,
                           scoring = 'accuracy',
                           cv = 2)


# Fit the model to the data using grid_search to try the various Hyper-Parameters
grid_search = grid_search.fit(x_train, y_train)
# Output best parameters
best_parameters = grid_search.best_params_
best_accuracy = grid_search.best_score_


print('\n\n\nThis model was trained with an accuracy of %.2f%%\n' % (best_accuracy*100))

print('\nThe best hyper-parameters for this model:\n', best_parameters, '\n')

print('\nUse the above Hyper-Parameters to retrain your model and make improved predictions.')

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
E