#### This is an exercise notebook for myslf to learn CNN.
#### The material I used of this exercise is listed below:
    https://machinelearningmastery.com/introduction-to-regularization-to-reduce-overfitting-and-improve-generalization-error/
    https://machinelearningmastery.com/early-stopping-to-avoid-overtraining-neural-network-models/
    https://machinelearningmastery.com/train-neural-networks-with-noise-to-reduce-overfitting/
    https://machinelearningmastery.com/introduction-to-weight-constraints-to-reduce-generalization-error-in-deep-learning/
    https://machinelearningmastery.com/weight-regularization-to-reduce-overfitting-of-deep-learning-models/
    https://machinelearningmastery.com/dropout-for-regularizing-deep-neural-networks/
    https://machinelearningmastery.com/how-to-reduce-overfitting-with-dropout-regularization-in-keras/
    Hands on ML chapter 11fgh

# 1. Generalization, Underfitting, Overfitting
   
   ## 1.1 Generalization
    The central challenge in machine learning is that we must perform well on new, previously unseen inputs — not just those on which our model was trained. The ability to perform well on previously unobserved inputs is called generalization.
    We use methods like a train/test split or k-fold cross-validation only to estimate the ability of the model to generalize to new data.
   
   ## 1.2 Underfitting & Overfitting
    Underfit Model: A model with too little capacity cannot learn the problem. An underfit model has high bias and low variance.
    Overfit Model: A model with too much capacity can learn it too well and overfit the training dataset. An overfit model has low bias and high variance.The model learns the training data too well and performance varies widely with new unseen examples or even statistical noise added to examples in the training dataset.
    Both cases result in a model that does not generalize well. A model that suitably learns the training dataset and generalizes well to the old out dataset.
<img src="overfitting.png" alt="Drawing" style="width: 400px;"/> 

   ## 1.3 Address
    Address Underfit: Increase capacity, the ability of a model to fit variety of functions, adding more layers.
    Address Underfit: Train the network on more examples OR constrain the complexity of the network. 
                                                            -by changing the network structure (number of weights)
                                                            -by changing the network parameters (values of weights)
    Structure: Grid search to find suitable nodes or layers / Remove directly
    Patameter: More common. Small parameters suggest a less complex and, in turn, more stable model that is less sensitive to statistical fluctuations in the input data. Large weighs tend to cause sharp transitions in the [activation] functions and thus large changes in output for small changes in inputs.

# 2. Regularization
    Regularization is any modification we make to a learning algorithm that is intended to reduce its generalization error but not its training error. Regularization is one of the central concerns of the field of machine learning, rivaled in its importance only by optimization.
    Modern CNN: use early stopping and dropout, in addition to a weight constraint.
    Modern RNN: use early stopping with a backpropagation-through-time-aware version of dropout and a weight constraint.

## 2.1 Weight Regularization (weight decay):
    Penalize the model during training based on the magnitude of the weights.
   

## 2.2 Activity Regularization: 
    Penalize the model during training base on the magnitude of the activations.


## 2.3 Weight Constraint: 
    Constrain the magnitude of weights to be within a range or below a limit.


## 2.4 Dropout: 
    Probabilistically remove inputs during training.

### 2.4.1 Keras
    https://keras.io/layers/core/
    https://machinelearningmastery.com/how-to-reduce-overfitting-with-dropout-regularization-in-keras/
    keras.layers.Dropout(rate, noise_shape=None, seed=None)
    *rate: float between 0 and 1. Fraction of the input units to drop.
    
    keras.layers.SpatialDropout1D(rate) / keras.layers.SpatialDropout2D(rate)
    This version performs the same function as Dropout, however it drops entire 1D/2D feature maps instead of individual elements.

### 2.4.2 Tensorflow
    https://www.tensorflow.org/api_docs/python/tf/nn/dropout
    https://www.tensorflow.org/tutorials/estimators/cnn
    For each element of x, with probability rate, outputs 0, and otherwise scales up the input by 1 / (1-rate). The scaling is such that the expected sum is unchanged.
    tf.nn.dropout(x, keep_prob=None, noise_shape=None, seed=None, name=None, rate=None)
    

In [1]:
# Classify MINIST using CNN
# Architecture would be 1C, 2P, 3C, 4P, 5D, 5D

# import packages
from __future__ import absolute_import, division, print_function, unicode_literals

import tensorflow as tf
import numpy as np

tf.logging.set_verbosity(tf.logging.INFO)

In [2]:
# Define the model, including architectur, predictions, loss, tranining, evaluation
def cnn_model(features, labels, mode):
    # Input layer
    input_layer  = tf.reshape(features['X'], [-1, 28, 28,1]) # -1 here basically means: I do not have time to calculate all the dimensions, so infer the one for me. Because x * 28 * 28 * 1 = 784, so  -1 = 1
    
    # convolutional layer C1
    conv1 = tf.layers.conv2d(
        inputs = input_layer,
        filters = 32,
        kernel_size = [5, 5],
        padding = 'same',
        activation = tf.nn.relu
    )
    
    # pooling layer P2
    pool1 = tf.layers.max_pooling2d(inputs = conv1,
                                    pool_size = [2, 2],
                                    strides = 2
                                   ) # Max pooling layer for 2D inputs (e.g. images). This will be deprecated in future version. Use keras.layers.MaxPooling2D instead
    
    # convolutional layer C3
    conv2 = tf.layers.conv2d(
    inputs = pool1,
    filters = 64,
    kernel_size = [5, 5],
    padding = 'same',
    activation = tf.nn.relu)
    
    # pooling layer P4
    pool2 = tf.layers.max_pooling2d(inputs = conv2,
                                    pool_size = [2, 2],
                                    strides = 2
                                   )
    
    # dense layer
    pool2_flat = tf.reshape(pool2, [-1,7*7*64]) # breakds the spatial structure of the data and transforms tridimensional tensor into a monodimensional tensor(a vector)
    dense = tf.layers.dense(inputs = pool2_flat, 
                            units = 2014, 
                            activation = tf.nn.relu)
    dropout = tf.layers.dropout(inputs = dense, 
                                rate = 0.4, 
                                training=mode == tf.estimator.ModeKeys.TRAIN)
    
    # logits layer
    logits = tf.layers.dense(inputs = dropout, units =10) # 10 classes for MINIST classification
    
    predictions = {
        # generate predictions of the class
        'classes': tf.argmax(input = logits, axis = 1), # Returns the index with the largest value across axes of a tensor
        # print the p of predictions
        'probilities': tf.nn.softmax(logits, name = 'softmax_tensor')
    }
    
    if mode == tf.estimator.ModeKeys.PREDICT: # standard keys are defined: TRAIN, EVAL, PREDICT
        return tf.estimator.EstimatorSpec(mode = mode, 
                                          predictions = predictions)
    
    
    # calculate loss
    loss = tf.losses.sparse_softmax_cross_entropy(labels = labels,
                                                 logits = logits)
    
    # configure the training Op
    if mode == tf.estimator.ModeKeys.TRAIN:
        optimize = tf.train.GradientDescentOptimizer(learning_rate=0.001)
        train_op = optimize.minimize(loss = loss, # training optimization operation, add operations to minimize loss by updating var_list
                                    global_step = tf.train.get_global_step()) # The global step variable
        return tf.estimator.EstimatorSpec(mode = mode, 
                                          loss = loss, 
                                          train_op = train_op)
    
    # add evaluation metrics
    eval_metric_ops = {
        'accuracy': tf.metrics.accuracy(labels = labels,
                                       predictions = predictions['classes'])
    }
    return tf.estimator.EstimatorSpec(mode = mode,
                                     loss = loss,
                                     eval_metric_ops = eval_metric_ops)


In [3]:
# load train and eval data
((train_data, train_labels), (eval_data, eval_labels)) = tf.keras.datasets.mnist.load_data()

# format input data
train_data = train_data/np.float32(225)
train_labels = train_labels.astype(np.int32)

eval_data = eval_data/np.float32(255)
eval_labels = eval_labels.astype(np.int32)

In [4]:
# create estimator
mnist_classifier = tf.estimator.Estimator(model_fn = cnn_model,
                                         model_dir = "/tmp/mnist_convnet_model"
                                         )

I0803 08:32:59.268401 4590163392 estimator.py:1790] Using default config.
I0803 08:32:59.271323 4590163392 estimator.py:209] Using config: {'_model_dir': '/tmp/mnist_convnet_model', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x6300196a0>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


In [5]:
# set up logging to track progress during training
tensors_to_log = {'probilities':'softmax_tensor'}
logging_hook = tf.train.LoggingTensorHook(
    tensors = tensors_to_log, every_n_iter = 50)

In [6]:
# Train the Model
train_input = tf.estimator.inputs.numpy_input_fn(
    x = {'X':train_data},
    y = train_labels,
    batch_size = 100,
    num_epochs = None,
    shuffle = True # Avoid shuffle at prediction time.
) # Returns input function that would feed dict of numpy arrays into the model.

# train one step and display the probabilities
mnist_classifier.train(input_fn = train_input, 
                       steps = 1, 
                       hooks = [logging_hook])

W0803 08:32:59.307542 4590163392 deprecation.py:323] From /Users/feifan/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
W0803 08:32:59.376926 4590163392 deprecation.py:323] From /Users/feifan/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/inputs/queues/feeding_queue_runner.py:62: QueueRunner.__init__ (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
W0803 08:32:59.378787 4590163392 deprecation.py:323] From /Users/feifan/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/inputs/queues/feedin

I0803 08:33:00.901278 4590163392 basic_session_run_hooks.py:262] loss = 1.8128418, step = 1002
I0803 08:33:00.902952 4590163392 basic_session_run_hooks.py:606] Saving checkpoints for 1002 into /tmp/mnist_convnet_model/model.ckpt.
W0803 08:33:00.989573 4590163392 deprecation.py:323] From /Users/feifan/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py:960: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to delete files with this prefix.
I0803 08:33:01.164240 4590163392 estimator.py:368] Loss for final step: 1.8128418.


<tensorflow_estimator.python.estimator.estimator.Estimator at 0x10a5e48d0>

In [7]:
# train the model longer
# Training CNNs is computationally intensive. 
# To increase the accuracy of your model, increase the number of steps passed to train(), like 20,000 steps.
mnist_classifier.train(input_fn=train_input, steps=1000)

I0803 08:33:01.211869 4590163392 estimator.py:1145] Calling model_fn.
I0803 08:33:01.512956 4590163392 estimator.py:1147] Done calling model_fn.
I0803 08:33:01.516400 4590163392 basic_session_run_hooks.py:541] Create CheckpointSaverHook.
I0803 08:33:01.607152 4590163392 monitored_session.py:240] Graph was finalized.
I0803 08:33:01.609923 4590163392 saver.py:1280] Restoring parameters from /tmp/mnist_convnet_model/model.ckpt-1002
I0803 08:33:01.659487 4590163392 session_manager.py:500] Running local_init_op.
I0803 08:33:01.665563 4590163392 session_manager.py:502] Done running local_init_op.
I0803 08:33:01.852787 4590163392 basic_session_run_hooks.py:606] Saving checkpoints for 1002 into /tmp/mnist_convnet_model/model.ckpt.
I0803 08:33:02.188526 4590163392 basic_session_run_hooks.py:262] loss = 1.8116452, step = 1003
I0803 08:33:15.314663 4590163392 basic_session_run_hooks.py:692] global_step/sec: 7.61822
I0803 08:33:15.315727 4590163392 basic_session_run_hooks.py:260] loss = 1.6459488,

<tensorflow_estimator.python.estimator.estimator.Estimator at 0x10a5e48d0>

In [8]:
# evaluate the model, test accuracy on the Mnist test set

eval_input = tf.estimator.inputs.numpy_input_fn(
    x = {'X':eval_data},
    y = eval_labels,
    num_epochs = 1,
    shuffle = False
)

eval_results = mnist_classifier.evaluate(input_fn = eval_input)
print(eval_results)

I0803 08:35:34.846904 4590163392 estimator.py:1145] Calling model_fn.
I0803 08:35:34.966660 4590163392 estimator.py:1147] Done calling model_fn.
I0803 08:35:34.987770 4590163392 evaluation.py:255] Starting evaluation at 2019-08-03T08:35:34Z
I0803 08:35:35.068368 4590163392 monitored_session.py:240] Graph was finalized.
I0803 08:35:35.070278 4590163392 saver.py:1280] Restoring parameters from /tmp/mnist_convnet_model/model.ckpt-2002
I0803 08:35:35.115182 4590163392 session_manager.py:500] Running local_init_op.
I0803 08:35:35.125952 4590163392 session_manager.py:502] Done running local_init_op.
I0803 08:35:38.260951 4590163392 evaluation.py:275] Finished evaluation at 2019-08-03-08:35:38
I0803 08:35:38.261798 4590163392 estimator.py:2039] Saving dict for global step 2002: accuracy = 0.8747, global_step = 2002, loss = 0.5693401
I0803 08:35:38.307658 4590163392 estimator.py:2099] Saving 'checkpoint_path' summary for global step 2002: /tmp/mnist_convnet_model/model.ckpt-2002


{'accuracy': 0.8747, 'loss': 0.5693401, 'global_step': 2002}


## 2.5 Noise: 
    Add statistical noise to inputs during training.


## 2.6 Early Stopping: 
    Monitor model performance on a validation set and stop training when performance degrades.