Details about the definitions below:
------
 - **`keras.models.Sequential`**: A sequence of layers
 - **`keras.layers.Flatten`**: Flattens an ndarray into a 1-dimensional array
 - **`keras.layers.Dense`**: A dense neural network layer
 - **`model.evaluate(test_data)`** vs **`model.predict(test_data)`**: `predict` returns the model's predictions (the estimate of *y* for each input), whereas `evaluate` computes the loss function and any metrics (the output is a list containing the loss followed by these metrics).
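
To make the `predict`/`evaluate` distinction concrete, here is a minimal NumPy sketch that mimics the two calls for a binary classifier. The names `toy_predict` and `toy_evaluate` and the fixed weights are hypothetical and are not part of Keras; the loss is assumed to be binary cross-entropy, with accuracy as the only metric.

```python
import numpy as np

def toy_predict(x, w, b):
    """Mimic model.predict: return the sigmoid score for each row of x."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def toy_evaluate(x, y_true, w, b):
    """Mimic model.evaluate: return [loss, accuracy] instead of predictions."""
    eps = 1e-7  # keep log() away from zero
    p = np.clip(toy_predict(x, w, b), eps, 1.0 - eps)
    loss = -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))
    accuracy = np.mean((p > 0.5) == (y_true == 1))
    return [float(loss), float(accuracy)]

x = np.array([[1.0, 1.0], [-1.0, -1.0], [2.0, 0.5], [-0.5, -2.0]])
y = np.array([1, 0, 1, 0])
w, b = np.array([1.0, 1.0]), 0.0  # hypothetical "trained" weights

scores = toy_predict(x, w, b)         # one score per input row
loss, acc = toy_evaluate(x, y, w, b)  # two summary numbers
print(scores.shape, loss, acc)
```

The point is only the shape of the result: `predict` scales with the number of inputs, while `evaluate` collapses everything into a few summary numbers.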

**Optimizer**:
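 - `'adam'`: Adam (adaptive moment estimation), a gradient-descent variant that adapts the learning rate for each parameter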
 - `'sgd'`: Stochastic Gradient Descent

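**Loss**: The loss function to minimize. Options used in this notebook:
 - `'mean_squared_error'` (like it sounds)
 - `'binary_crossentropy'` (for the two-class signal/background model)
 - `'sparse_categorical_crossentropy'` (for multi-class problems with integer labels)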

**Metrics**: These are quantities that are calculated alongside the evaluation, for your convenience. You can add as many of these as you like. Examples:
 - Probabilistic metrics:
   - XXXCrossEntropy: 
   - KLDivergence:
   - Poisson:
 - Regression metrics:
   - MeanSquaredError (and variations)
   - MeanSquaredLogarithmicError
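
As a rough guide to what some of these metrics compute, the sketch below writes three of them in NumPy. It follows the formulas in the Keras documentation as I understand them: KL divergence `sum(y_true * log(y_true / y_pred))`, Poisson `mean(y_pred - y_true * log(y_pred))`, and MSLE `mean((log(1 + y_true) - log(1 + y_pred))**2)`. The function names are hypothetical, and the small `EPS` guard is an assumption to avoid `log(0)`.

```python
import numpy as np

EPS = 1e-7  # assumed guard value, not taken from the notebook

def kl_divergence(y_true, y_pred):
    """How far the predicted distribution is from the true one (0 if identical)."""
    y_true = np.clip(y_true, EPS, 1.0)
    y_pred = np.clip(y_pred, EPS, 1.0)
    return float(np.sum(y_true * np.log(y_true / y_pred)))

def poisson(y_true, y_pred):
    """Poisson loss, useful when the targets are counts."""
    return float(np.mean(y_pred - y_true * np.log(y_pred + EPS)))

def msle(y_true, y_pred):
    """Mean squared error computed on log(1 + value)."""
    return float(np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2))

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.6, 0.3, 0.1])
print(kl_divergence(p, p), kl_divergence(p, q))
print(poisson(np.array([1.0, 2.0]), np.array([1.0, 2.0])))
print(msle(np.array([0.0, 9.0]), np.array([0.0, 9.0])))
```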
   
**Callbacks**: During model fitting, a callback can run after each epoch, e.g. to stop the training once the desired accuracy is reached.
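
The mechanism can be sketched without Keras at all. The class and function below (`StopAtAccuracy`, `toy_fit`) are hypothetical stand-ins, and the per-epoch accuracies are made up; in Keras the flag lives on the model (`self.model.stop_training`) rather than on the callback.

```python
class StopAtAccuracy:
    """Hypothetical stand-in for a Keras callback (not the real API)."""

    def __init__(self, target):
        self.target = target
        self.stop_training = False

    def on_epoch_end(self, epoch, logs):
        # Called once per epoch with that epoch's metrics
        if logs.get("accuracy", 0.0) >= self.target:
            print(f"Reached {self.target:.0%} accuracy after epoch {epoch}, stopping.")
            self.stop_training = True

def toy_fit(accuracies, callback):
    """Pretend training loop: one made-up accuracy value per epoch."""
    epochs_run = 0
    for epoch, acc in enumerate(accuracies):
        epochs_run += 1
        callback.on_epoch_end(epoch, {"accuracy": acc})
        if callback.stop_training:
            break
    return epochs_run

epochs_run = toy_fit([0.41, 0.55, 0.63, 0.70, 0.74], StopAtAccuracy(0.6))
print(epochs_run)
```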


Front Matter
========

This is a quick notebook to explore the outputs of the Keras model.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

import tensorflow as tf
from tensorflow import keras
print(tf.__version__)

import pandas as pd
import seaborn as sns

%load_ext tensorboard
%matplotlib inline


Make an example dataset
=========

In [None]:
nPoints = 1000
x_sig = 1 * np.random.randn(nPoints) + 1
y_sig = 1 * np.random.randn(nPoints) + 1
x_bkg = 1 * np.random.randn(nPoints*10) - 1
y_bkg = 1 * np.random.randn(nPoints*10) - 1
label = np.append(np.ones(len(x_sig)),np.zeros(len(x_bkg))).astype(int)

In [None]:
df = pd.DataFrame(list(zip(np.append(x_sig,x_bkg),
                           np.append(y_sig,y_bkg),
                           label)),columns=['x','y','label'])

In [None]:
sns.pairplot(df,vars=['x', 'y'],hue='label');

Plot correlations of variables
----------

In [None]:
tmp = sns.heatmap(df[df['label'] == 1][['x','y']].corr(),annot=True,vmin=-1,vmax=1)
b, t = plt.ylim() # discover the values for bottom and top
plt.ylim(b + 0.5, t - 0.5) # update the ylim(bottom, top) values
plt.show()

Make the Keras Model, compile and fit
-------------

In [None]:
model = tf.keras.models.Sequential([
#    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(5, name='first_dense', input_shape=(2,), activation='relu'),
    tf.keras.layers.Dense(5, name='middle_dense', activation='relu'),
#    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1, name='last_dense', activation='sigmoid') # sigmoid for binary; softmax for multi?
    ])

##### TODO:
#probability_model = tf.keras.Sequential([
#  model,
#  tf.keras.layers.Softmax()
#])

In [None]:
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

In [None]:
model.fit(
    df[['x','y']].values,
    df['label'].values,
    batch_size=64,
    epochs=20,
    verbose=2)

Visualize the Model Structure
----------

In [None]:
#import pydot as pyd
#import graphviz
#tf.keras.utils.plot_model(model, to_file='model.png', show_shapes=True, show_layer_names=True,
#    rankdir='TB', expand_nested=True, dpi=96)
#model.summary()

In [None]:
# version 1
# from IPython.display import SVG
# SVG(tf.keras.utils.model_to_dot(model,expand_nested=True).create(prog='dot', format='svg'))
# version 2
#tf.keras.utils.plot_model(model, to_file='model_plot.png', show_shapes=True, show_layer_names=True,expand_nested=True)

Evaluate the performance
===========

In [None]:
print(model.metrics_names)
model.evaluate(df[['x','y']].values,df['label'].values,verbose=2)

In [None]:
# This is the continuous variable (the sigmoid score between 0 and 1)
df['discriminant'] = model.predict(df[['x','y']].values).ravel()
# This sets things as 1 or 0
# based on whether the score is above or below 0.5
# (model.predict_classes was removed in TensorFlow 2.6)
df['binary_prediction'] = (df['discriminant'] > 0.5).astype(int)

Confusion Matrix
-------

In [None]:
from sklearn.metrics import confusion_matrix

In [None]:
# sklearn expects (y_true, y_pred): rows are true labels, columns are predictions
confusion_matrix(df['label'].values,df['binary_prediction'].values)
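
For reference, scikit-learn's `confusion_matrix(y_true, y_pred)` puts the true labels on the rows and the predicted labels on the columns. The sketch below counts the same thing by hand on made-up labels; `toy_confusion_matrix` is a hypothetical helper, not a library function.

```python
import numpy as np

def toy_confusion_matrix(y_true, y_pred, n_classes=2):
    """Entry [i, j] counts samples with true label i and predicted label j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

y_true = [0, 0, 0, 1, 1, 1]  # made-up labels
y_pred = [0, 0, 1, 1, 1, 0]  # made-up predictions
cm = toy_confusion_matrix(y_true, y_pred)

# For the binary case, flattening gives the four familiar counts
tn, fp, fn, tp = cm.ravel()
print(cm)
print(tn, fp, fn, tp)
```

With label 1 as signal, `fp` is background accepted as signal and `fn` is signal rejected as background.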

Plotting the raw discriminants
--------------

In [None]:
plt.hist(df[df['label'] ==0]['discriminant'],bins=20,label='bkg',density=True,histtype='step',lw=2)
plt.hist(df[df['label'] ==1]['discriminant'],bins=20,label='sig',density=True,histtype='step',lw=2)
plt.legend()
plt.yscale('log')

ROC Curve
----------

In [None]:
from sklearn.metrics import roc_curve
fpr_keras, tpr_keras, thresholds_keras = roc_curve(df['label'].values, df['discriminant'].values)
from sklearn.metrics import auc
auc_keras = auc(fpr_keras, tpr_keras)

In [None]:
plt.figure(1)
plt.plot([0, 1], [0, 1], 'k--')
plt.plot(fpr_keras, tpr_keras, label='Keras (area = {:.3f})'.format(auc_keras))
plt.ylabel('signal efficiency (true positive rate)')
plt.xlabel('bkg efficiency (false positive rate)')
plt.legend(loc='best')  # without this the AUC label is never drawn
plt.show()
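
What `roc_curve` and `auc` do can be sketched in a few lines of NumPy: sort by score, lower the threshold one sample at a time, and integrate with the trapezoid rule. This is a simplified illustration on made-up data; `toy_roc` is a hypothetical helper that assumes there are no tied scores, which scikit-learn handles more carefully.

```python
import numpy as np

def toy_roc(y_true, scores):
    """Return (fpr, tpr) arrays from sweeping a threshold over the scores."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    order = np.argsort(-scores)      # highest score first
    y_sorted = y_true[order]
    tps = np.cumsum(y_sorted == 1)   # signal accepted so far
    fps = np.cumsum(y_sorted == 0)   # background accepted so far
    tpr = np.concatenate(([0.0], tps / tps[-1]))
    fpr = np.concatenate(([0.0], fps / fps[-1]))
    return fpr, tpr

y_true = [1, 1, 0, 1, 0, 0]                 # made-up labels
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]     # made-up discriminant values
fpr, tpr = toy_roc(y_true, scores)

# Trapezoid rule for the area under the curve
area = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
print(area)
```

The area equals the fraction of (signal, background) pairs in which the signal event has the higher score, here 8 out of 9.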

2-D Gradients
----------

In [None]:
#Since probability is a continuous value from 0 to 1, we are getting many contours.
#If your visualization is restricted to 2 classes (output is 2D softmax vector) you can use this simple code

def plotModelOut(x,y,model):
    '''
    x,y: 2D MeshGrid input
    model: Keras Model API Object
    '''
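    # Flatten the mesh into an (N, 2) array of (x, y) points in row-major order
    grid = np.stack((x.ravel(), y.ravel()), axis=1)
    outs = model.predict(grid)
    # Reshape back onto the mesh so that y1[i, j] belongs to (x[i, j], y[i, j]);
    # the earlier .T-based reshape returned the transposed map
    y1 = outs[:, 0].reshape(x.shape)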
    plt.contourf(x,y,y1,cmap='binary')
    plt.colorbar()

    #plt.show()

xx, yy = np.meshgrid(np.linspace(-4,4,100), np.linspace(-4,4,100))
plotModelOut(xx,yy,model)

plt.scatter(df['x'][df['label'] == 0],df['y'][df['label'] == 0])
plt.scatter(df['x'][df['label'] == 1],df['y'][df['label'] == 1])

plt.xlim([-4,4])
plt.ylim([-4,4])
plt.show()

The Hello World of Neural Networks
--------

In [None]:
model = tf.keras.models.Sequential([tf.keras.layers.Dense(units=1, input_shape=[1])])

In [None]:
model.compile(optimizer='sgd',loss='mean_squared_error')

In [None]:
xs = np.array([-1.0,  0.0, 1.0, 2.0, 3.0, 4.0],dtype=float)
ys = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0],dtype=float)

In [None]:
model.fit(xs, ys, epochs=500)

In [None]:
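# Pass a (n_samples, 1) array; recent TensorFlow versions reject a plain list of floats
print(model.predict(np.array([[10.0], [11.0]])))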
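
The six training points lie exactly on the line y = 2x - 1, so the single `Dense` unit is just learning a slope and an intercept. As a minimal sketch of what the optimizer is doing, here is plain gradient descent on the mean squared error. It assumes a learning rate of 0.01 (the default for `'sgd'`), full-batch updates, and a start from zero (Keras initializes the weight randomly, so its numbers will differ slightly).

```python
xs = [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
ys = [-3.0, -1.0, 1.0, 3.0, 5.0, 7.0]

w, b = 0.0, 0.0        # assumption: start from zero
learning_rate = 0.01   # assumed to match the 'sgd' default
n = len(xs)

for epoch in range(500):
    # Gradients of mean((w*x + b - y)**2) with respect to w and b
    errors = [w * x + b - y for x, y in zip(xs, ys)]
    grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
    grad_b = 2.0 * sum(errors) / n
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b)            # approaches 2 and -1
print(w * 10.0 + b)    # approaches 19, but does not reach it in 500 steps
```

This is why the network's prediction for 10.0 comes out slightly below 19: after 500 epochs the fit has not fully converged.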

Fashion MNIST Dataset
-------

In [None]:
fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

In [None]:
print(train_images.shape,test_images.shape)

Normalize the data:
-----

In [None]:
train_images = train_images / 255.0
test_images  = test_images / 255.0

In [None]:
plt.imshow(train_images[0])
#print(train_images[0])
#print(train_labels[0])

In [None]:
model = keras.Sequential([keras.layers.Flatten(), # Optional: specifying shape, e.g. input_shape=(28,28)
                          keras.layers.Dense(512,activation=tf.nn.relu), # Try 128, 1024, ...
                          keras.layers.Dense(256,activation=tf.nn.relu), # Try 128, 1024, ...
                          keras.layers.Dense(10,activation=tf.nn.softmax)]) # 10 outputs, one per class

In [None]:
model.compile(optimizer= tf.optimizers.Adam(),
              loss = 'sparse_categorical_crossentropy',
              metrics=['accuracy'])

In [None]:
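# Add a callback to our fitting step in order to stop training once the accuracy exceeds 60%
class myCallback(tf.keras.callbacks.Callback):
  def on_epoch_end(self, epoch, logs=None):
    logs = logs or {}  # avoid a mutable default argument
    if logs.get('accuracy', 0) > 0.6:  # fall back to 0 if the metric is missing
      print("\nReached 60% accuracy so cancelling training!")
      self.model.stop_training = True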

callback = myCallback()

In [None]:
model.fit(train_images,train_labels,epochs=5,callbacks=[callback])

In [None]:
model.evaluate(test_images,test_labels)

In [None]:
# These predictions are the scores for each of the 10 categories, for the first test image.
# As you can see, "9" is the winner.
classifications = model.predict(test_images)
print(classifications[0])
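
To turn a row of scores into a class label, take the index of the largest entry. The sketch below uses a made-up score vector, not actual model output.

```python
import numpy as np

# Made-up softmax output for one image (10 class scores that sum to 1)
scores = np.array([0.01, 0.00, 0.02, 0.00, 0.01, 0.05, 0.01, 0.10, 0.00, 0.80])

predicted_class = int(np.argmax(scores))    # index of the largest score
confidence = float(scores[predicted_class])
print(predicted_class, confidence)
```

For a whole batch, `np.argmax(classifications, axis=1)` gives one label per image, which can then be compared against `test_labels`.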

Handwriting MNIST
---------

In [None]:
# GRADED FUNCTION: train_mnist
def train_mnist():
    # Please write your code only where you are indicated.
    # please do not remove # model fitting inline comments.

    # YOUR CODE SHOULD START HERE
    class myCallback(tf.keras.callbacks.Callback):
      def on_epoch_end(self, epoch, logs={}):
        if(logs.get('accuracy')>=0.99):
          print("\nReached 99% accuracy so cancelling training!")
          self.model.stop_training = True

    callback = myCallback()
    # YOUR CODE SHOULD END HERE

    mnist = tf.keras.datasets.mnist

    (x_train, y_train),(x_test, y_test) = mnist.load_data(path='mnist.npz')
    # YOUR CODE SHOULD START HERE
    x_train = x_train / 255.0 # normalize the data to between 0 and 1
    x_test  = x_test / 255.0 # normalize the data to between 0 and 1
    # YOUR CODE SHOULD END HERE
    model = tf.keras.models.Sequential([
        # YOUR CODE SHOULD START HERE
        keras.layers.Flatten(),
        keras.layers.Dense(128,activation=tf.nn.relu),
        keras.layers.Dense(10,activation=tf.nn.softmax)
        # YOUR CODE SHOULD END HERE
    ])

    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    
    # model fitting
    history = model.fit(# YOUR CODE SHOULD START HERE
        x_train,y_train,epochs=11,callbacks=[callback]
              # YOUR CODE SHOULD END HERE
    )
    # model fitting
    return history.epoch, history.history['accuracy'][-1]

train_mnist()

In [None]:
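# `mnist` was only defined inside train_mnist(), so reference the dataset module directly
(x_train, y_train),(x_test, y_test) = tf.keras.datasets.mnist.load_data(path='mnist.npz')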

In [None]:
print(y_train[0])
plt.imshow(x_train[0])

Convolutional Neural Networks
----------

**Note that the Coursera people wanted you to run this using GPUs in Colab. If you run it locally, your computer's fan starts running.**

 - The `Conv2D` layer runs a bunch of filters (the number of filters is specified by the first argument) on top of the input image.
   - Note that specifying `'relu'` as the activation sets negative values to zero. Interesting.
 - The `MaxPooling2D` layer keeps the maximum value in each window.
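
A minimal NumPy sketch of these three steps, for a single-channel image and a single filter with no bias. The helper names and the vertical-edge kernel are hypothetical; a real `Conv2D` layer learns its kernels and handles many channels and filters at once.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image with no padding ('valid' mode)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Set negative values to zero."""
    return np.maximum(x, 0.0)

def max_pool_2x2(x):
    """Keep the largest value in each non-overlapping 2x2 window."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2  # drop an odd last row/column
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 "image"
edge_kernel = np.array([[-1.0, 0.0, 1.0]] * 3)     # hypothetical vertical-edge filter

feature_map = relu(conv2d_valid(image, edge_kernel))
pooled = max_pool_2x2(feature_map)
print(feature_map.shape, pooled.shape)
```

A 3x3 filter trims one pixel from each border (6x6 becomes 4x4), and the 2x2 pooling then halves each dimension (4x4 becomes 2x2).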
 
An interesting idea I heard was to **start with not so many filters in the first Conv2D layer**, and then as you continue to pool (and therefore reduce the size of the image) you should **add more filters to learn in later Conv2D layers**. That way, the number of parameters does not explode, and you have more opportunities for discovering interesting features in the final convolutional layers.
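
The arithmetic behind this can be checked by hand. The sketch below works through the image sizes and parameter counts for the network used in this section (two `Conv2D` layers with 64 filters of size 3x3, each followed by 2x2 pooling, then `Dense(128)`), assuming 'valid' padding and stride 1. The helper functions are hypothetical.

```python
def conv_output(size, kernel=3):
    return size - kernel + 1   # 'valid' padding, stride 1

def pool_output(size, pool=2):
    return size // pool        # non-overlapping windows

def conv_params(in_channels, filters, kernel=3):
    # One kernel*kernel*in_channels weight block plus one bias per filter
    return (kernel * kernel * in_channels + 1) * filters

size = 28
size = pool_output(conv_output(size))   # after first Conv2D + MaxPooling2D
size = pool_output(conv_output(size))   # after second Conv2D + MaxPooling2D

first_conv = conv_params(in_channels=1, filters=64)
second_conv = conv_params(in_channels=64, filters=64)
dense = (size * size * 64 + 1) * 128    # Flatten -> Dense(128)
print(size, first_conv, second_conv, dense)
```

Most of the parameters sit in the first dense layer, not in the convolutions, so shrinking the image before flattening matters more than the filter count itself.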

In [None]:
mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()

training_images=training_images.reshape(60000, 28, 28, 1)
training_images=training_images / 255.0

test_images = test_images.reshape(10000, 28, 28, 1)
test_images=test_images/255.0

model = tf.keras.models.Sequential([
  tf.keras.layers.Conv2D(64, (3,3), activation='relu', input_shape=(28, 28, 1)),
  tf.keras.layers.MaxPooling2D(2, 2),
  tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
  tf.keras.layers.MaxPooling2D(2,2),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()

model.fit(training_images, training_labels, epochs=5)
# evaluate returns [loss, accuracy] because metrics=['accuracy'] was set in compile
test_loss, test_acc = model.evaluate(test_images, test_labels)

In [None]:
f, axarr = plt.subplots(3,4,figsize=(12,8))

FIRST_IMAGE=0
SECOND_IMAGE=7
THIRD_IMAGE=26
CONVOLUTION_NUMBER = 3

from tensorflow.keras import models

layer_outputs = [layer.output for layer in model.layers]
activation_model = tf.keras.models.Model(inputs = model.input, outputs = layer_outputs)

for row, idx in enumerate([FIRST_IMAGE, SECOND_IMAGE, THIRD_IMAGE]):
    # One forward pass per image returns the activations of every layer
    activations = activation_model.predict(test_images[idx].reshape(1, 28, 28, 1))
    for x in range(0,4):  # first four layers: conv, pool, conv, pool
        axarr[row,x].imshow(activations[x][0, : , :, CONVOLUTION_NUMBER], cmap='inferno')
        axarr[row,x].grid(False)

Assignment: Convolutional Neural Networks and (handwriting) MNIST
----------

Very similar to what you did above. One interesting code snippet, though:

```python
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
```

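Something to look into: `allow_growth` tells TensorFlow to claim GPU memory as it is needed rather than reserving it all up front. Note that this is the TensorFlow 1.x API; in TensorFlow 2.x these classes live under `tf.compat.v1`.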