<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Artificial-Neural-Network" data-toc-modified-id="Artificial-Neural-Network-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Artificial Neural Network</a></span></li><li><span><a href="#Loss-function" data-toc-modified-id="Loss-function-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Loss function</a></span></li><li><span><a href="#Optimizer" data-toc-modified-id="Optimizer-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Optimizer</a></span></li><li><span><a href="#Weight-Regularization" data-toc-modified-id="Weight-Regularization-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Weight Regularization</a></span></li><li><span><a href="#Dropout" data-toc-modified-id="Dropout-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Dropout</a></span></li><li><span><a href="#Training-a-neural-network" data-toc-modified-id="Training-a-neural-network-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Training a neural network</a></span><ul class="toc-item"><li><span><a href="#Step1:-import-the-data" data-toc-modified-id="Step1:-import-the-data-6.1"><span class="toc-item-num">6.1&nbsp;&nbsp;</span>Step1: import the data</a></span></li><li><span><a href="#Step2:-Transform-the-data" data-toc-modified-id="Step2:-Transform-the-data-6.2"><span class="toc-item-num">6.2&nbsp;&nbsp;</span>Step2: Transform the data</a></span></li><li><span><a href="#Step3:-Construct-the-tensor" data-toc-modified-id="Step3:-Construct-the-tensor-6.3"><span class="toc-item-num">6.3&nbsp;&nbsp;</span>Step3: Construct the tensor</a></span></li><li><span><a href="#Step4:-Model-creation" data-toc-modified-id="Step4:-Model-creation-6.4"><span class="toc-item-num">6.4&nbsp;&nbsp;</span>Step4: Model creation</a></span></li><li><span><a href="#Step5:-Train-and-evaluate-the-model" data-toc-modified-id="Step5:-Train-and-evaluate-the-model-6.5"><span class="toc-item-num">6.5&nbsp;&nbsp;</span>Step5: Train and evaluate the model</a></span></li><li><span><a href="#Step6:-Improve-the-model" data-toc-modified-id="Step6:-Improve-the-model-6.6"><span class="toc-item-num">6.6&nbsp;&nbsp;</span>Step6: Improve the model</a></span></li></ul></li></ul></div>

# Artificial Neural Network

<img src='artifacts/ann.png'/>

Here Features are the input and labels are the ouput.<br>
Here ANN is composed of 4 main components:
- Layers: all the learning occurs in the layers. There are 3 layers 1) Input 2) Hidden and 3) Output
- feature and label: Input data to the network(features) and output from the network (labels)
- loss function: Metric used to estimate the performance of the learning phase
- optimizer: Improve the learning by updating the knowledge in the network

Each layer consists of neurons or nodes. Each neuron has 2 parts:
- input
- activation function

# Loss function
After you have defined the hidden layers and the activation function, you need to specify the loss function and the optimizer.

For binary classification, it is common practice to use a binary cross entropy loss function. In the linear regression, you use the mean square error.

The loss function is an important metric to estimate the performance of the optimizer. During the training, this metric will be minimized. You need to select this quantity carefully depending on the type of problem you are dealing with.

# Optimizer
The loss function is a measure of the model's performance. The optimizer will help improve the weights of the network in order to decrease the loss. There are different optimizers available, but the most common one is the Stochastic Gradient Descent.

The conventional optimizers are:
- Momentum optimization,
- Nesterov Accelerated Gradient,
- AdaGrad,
- Adam optimization

# Weight Regularization
A standard technique to prevent overfitting is to add constraints to the weights of the network. The constraint forces the size of the network to take only small values. The constraint is added to the loss function of the error. There are two kinds of regularization:

L1: Lasso: Cost is proportional to the absolute value of the weight coefficients

L2: Ridge: Cost is proportional to the square of the value of the weight coefficients

# Dropout
Dropout is an odd but useful technique. A network with dropout means that some weights will be randomly set to zero. Imagine you have an array of weights [0.1, 1.7, 0.7, -0.9]. If the neural network has a dropout, it will become [0.1, 0, 0, -0.9] with randomly distributed 0. The parameter that controls the dropout is the dropout rate. The rate defines how many weights to be set to zeroes. Having a rate between 0.2 and 0.5 is common.

# Training a neural network

## Step1: import the data

The MNIST dataset is the commonly used dataset to test new techniques or algorithms. This dataset is a collection of 28x28 pixel image with a handwritten digit from 0 to 9. Currently, the lowest error on the test is 0.27 percent with a committee of 7 convolutional neural networks.

In [1]:
import numpy as np
import tensorflow.compat.v1 as tf
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
tf.disable_eager_execution()
np.random.seed(5)

In [2]:
mnist = fetch_openml('mnist_784', data_home='data/mnist')
print(mnist.data.shape)
print(mnist.target.shape)

(70000, 784)
(70000,)


In [3]:
X_train, X_test, Y_train, Y_test = train_test_split(mnist.data, mnist.target, test_size = 0.2, random_state =42)
print(X_train.shape)  # 56000 examples of 28x28=784 pixel data
print(X_test.shape)   # 14000 examples of 28*28=784 pixel data
Y_train = Y_train.astype(int)
Y_test = Y_test.astype(int)
print(Y_train.shape)
print(Y_test.shape)

(56000, 784)
(14000, 784)
(56000,)
(14000,)


## Step2: Transform the data

we will use min max transformation<br>
$\frac{X- X_{min}}{X_{max} - X_{min}}$

In [4]:
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train.astype(np.float64))
X_test_scaled = scaler.fit_transform(X_test.astype(np.float64))

## Step3: Construct the tensor

In [8]:
feature_column = [tf.feature_column.numeric_column('X', shape=X_train_scaled.shape[1])]

## Step4: Model creation

In [10]:
model = tf.estimator.DNNClassifier(feature_columns=feature_column,
                                  hidden_units= [300, 100],
                                  n_classes=10,
                                  model_dir='logs/11_ann')

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': 'logs/11_ann', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f9754b5df90>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


## Step5: Train and evaluate the model

In [14]:
def get_input_fn(x_ndarray, y_nd_array, num_epochs = None, batch_size = 128, shufle=False):
    return tf.estimator.inputs.numpy_input_fn(x = {'X': x_ndarray},
                                             y = y_nd_array,
                                             batch_size = batch_size,
                                             num_epochs=num_epochs,
                                             shuffle=shufle)

In [15]:
!rm -rf logs/11_ann
model.train(input_fn=get_input_fn(X_train_scaled, Y_train, None, 128, False), steps=1000)

INFO:tensorflow:Calling model_fn.


To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

Instructions for updating:
Use `tf.cast` instead.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
INFO:tensorflow:Saving checkpoints for 0 into logs/11_ann/model.ckpt.
INFO:tensorflow:loss = 302.03247, step = 0
INFO:tensorflow:global_step/sec: 138.782
INFO:tensorflow:loss = 36.20991, step = 100 (0.724 sec)
I

<tensorflow_estimator.python.estimator.canned.dnn.DNNClassifier at 0x7f975440e610>

In [17]:
model.evaluate(input_fn=get_input_fn(X_test_scaled, Y_test, 1, 128, False), steps=1000)

INFO:tensorflow:Calling model_fn.


To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2020-02-19T20:51:24Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from logs/11_ann/model.ckpt-1000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [100/1000]
INFO:tensorflow:Finished evaluation at 2020-02-19-20:51:25
INFO:tensorflow:Saving dict for global step 1000: accuracy = 0.9694286, average_loss = 0.10564723, global_step = 1000, loss = 13.446011
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 1000: logs/11_ann/model.ckpt-1000


{'accuracy': 0.9694286,
 'average_loss': 0.10564723,
 'loss': 13.446011,
 'global_step': 1000}

## Step6: Improve the model

We will use regularization to improve the model.<br>
We will use an Adam optimizer with a dropout rate of 0.3, L1 of X and L2 of y. In TensorFlow, you can control the optimizer using the object train following by the name of the optimizer. TensorFlow is a built-in API for Proximal AdaGrad optimizer.

To add regularization to the deep neural network, you can use tf.train.ProximalAdagradOptimizer with the following parameter

- Learning rate: learning_rate
- L1 regularization: l1_regularization_strength
- L2 regularization: l2_regularization_strength

In [19]:
model_improved = tf.estimator.DNNClassifier(feature_columns=feature_column,
                                           hidden_units=[300,100],
                                           dropout=0.3,
                                           n_classes=10,
                                           optimizer= tf.train.ProximalAdagradOptimizer(learning_rate=0.01,
                                                                                       l1_regularization_strength=0.01,
                                                                                       l2_regularization_strength=0.01),
                                           model_dir='logs/11_ann_improved')

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': 'logs/11_ann_improved', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f972c2bf990>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


In [20]:
!rm -rf logs/11_ann_improved
model_improved.train(input_fn=get_input_fn(X_train_scaled, Y_train, None, 128, False), steps=1000)

INFO:tensorflow:Calling model_fn.


To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into logs/11_ann_improved/model.ckpt.
INFO:tensorflow:loss = 303.69342, step = 0
INFO:tensorflow:global_step/sec: 200.584
INFO:tensorflow:loss = 46.474987, step = 100 (0.499 sec)
INFO:tensorflow:global_step/sec: 261.891
INFO:tensorflow:loss = 15.754684, step = 200 (0.382 sec)
INFO:tensorflow:global_step/sec: 230.001
INFO:tensorflow:loss = 20.9094, step = 300 (0.436 sec)
INFO:tensorflow:global_step/sec: 235.361
INFO:te

<tensorflow_estimator.python.estimator.canned.dnn.DNNClassifier at 0x7f972c2b7710>

In [21]:
model_improved.evaluate(input_fn=get_input_fn(X_test_scaled, Y_test, 1, 128, False), steps=1000)

INFO:tensorflow:Calling model_fn.


To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2020-02-19T21:57:25Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from logs/11_ann_improved/model.ckpt-1000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [100/1000]
INFO:tensorflow:Finished evaluation at 2020-02-19-21:57:27
INFO:tensorflow:Saving dict for global step 1000: accuracy = 0.9582857, average_loss = 0.14352892, global_step = 1000, loss = 18.267319
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 1000: logs/11_ann_improved/model.ckpt-1000


{'accuracy': 0.9582857,
 'average_loss': 0.14352892,
 'loss': 18.267319,
 'global_step': 1000}