Details about the definitions below:
========

Glossary of Terms
----------
**Sample, Batch, and Epoch**: A *sample* is defined as one element (event) of a dataset. A *batch* is a set of samples, after which the weights are updated (e.g. after backpropagation is performed). An *epoch* is generally defined as one full pass over the dataset.
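
As a quick illustration of how these three terms relate, the sketch below counts the weight updates implied by a dataset size, a batch size, and a number of epochs. The helper name `updates_per_epoch` is made up for this note. The numbers (11,000 samples, `batch_size=64`, 20 epochs) are taken from the toy dataset and the `model.fit` call later in this notebook, and the sketch assumes that the last, partially filled batch still triggers an update.

```python
import math

def updates_per_epoch(n_samples, batch_size):
    """Number of batches (and hence weight updates) in one full pass over the data."""
    # The final batch may be smaller than batch_size, but it still counts as one update.
    return math.ceil(n_samples / batch_size)

n_samples = 1000 + 10 * 1000   # signal + background events in the toy dataset
steps = updates_per_epoch(n_samples, batch_size=64)
total_updates = steps * 20     # 20 epochs
print(steps, total_updates)
```

So a smaller batch size means more (but noisier) weight updates per epoch.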

**Static vs Dynamic models**: A **static** model is one in which training is performed before deployment. A **dynamic** model is one in which the model is continuously updated while in use.

**tf.GradientTape**: This records operations on tensors so that gradients can be computed later by automatic differentiation. It seems most relevant for low-level TensorFlow programming (e.g. custom training loops), but it is still interesting for understanding how things work.

**Trainable variables**: Setting `trainable=True/False` on a variable determines whether gradients are computed for it and whether it is updated, e.g. during backpropagation. Some observer variables, such as a training step counter, do not need to be trainable.

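**tf.data.Dataset**: The TensorFlow API for building input pipelines. It represents a sequence of elements (e.g. samples or batches) that can be shuffled, batched, and mapped before being fed to a model.

**tf.cast**: Converts a tensor to a new data type, e.g. `tf.cast(x, tf.float32)`.

**tf.newaxis**: Used when indexing a tensor to insert a new dimension of length 1 (like `np.newaxis`), e.g. `x[..., tf.newaxis]`.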

- **`keras.models.Sequential`**: A sequence of layers
 - **`keras.layers.Flatten`**: Flatten an ndarray into a 1-dimensional array
 - **`keras.layers.Dense`**: A dense neural network layer
 - **`model.evaluate(test_data)`** vs **`model.predict(test_data)`**: `predict` returns the model output (e.g. the vector of estimates of *y*), whereas `evaluate` calculates the loss function and any metrics (the output is a list containing the loss followed by these metrics).
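
To make the `predict` vs `evaluate` distinction concrete, here is a plain-Python sketch of a one-neuron sigmoid "model". It is not how Keras implements these methods. The function names mirror the Keras ones only for readability, and the weights, inputs, and labels are made-up values.

```python
import math

def predict(weights, bias, inputs):
    """Return the model output (a sigmoid score) for every input row."""
    scores = []
    for row in inputs:
        z = sum(w * x for w, x in zip(weights, row)) + bias
        scores.append(1.0 / (1.0 + math.exp(-z)))
    return scores

def evaluate(weights, bias, inputs, labels):
    """Return [loss, accuracy]: binary cross-entropy plus one metric."""
    scores = predict(weights, bias, inputs)
    eps = 1e-7  # keeps log() finite if a score is exactly 0 or 1
    loss = -sum(y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps))
                for p, y in zip(scores, labels)) / len(labels)
    accuracy = sum((p > 0.5) == bool(y) for p, y in zip(scores, labels)) / len(labels)
    return [loss, accuracy]

inputs = [[1.0, 1.0], [-1.0, -1.0], [2.0, 0.5], [-0.5, -2.0]]  # made-up points
labels = [1, 0, 1, 0]
scores = predict([1.0, 1.0], 0.0, inputs)               # needs no labels
loss, accuracy = evaluate([1.0, 1.0], 0.0, inputs, labels)  # needs labels
print(scores, loss, accuracy)
```

The practical difference: `predict` only needs inputs, while `evaluate` also needs the true labels so that it can score the predictions.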

**Optimizer**:
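 - `'adam'`: Adaptive Moment Estimation, which combines momentum with per-parameter adaptive learning rates
 - `'sgd'`: Stochastic Gradient Descent
 - `keras.optimizers.RMSprop(learning_rate=0.001)`: The learning rate is adapted for each of the free parameters, based on a rolling average of the magnitudes of recent gradients for that parameter. (Older code spells the argument `lr`, which newer Keras versions deprecate.)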
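
The RMSprop idea (divide each gradient by a rolling average of its recent magnitude) can be sketched in a few lines of plain Python. This is only the basic textbook update rule for a single parameter, not the Keras implementation, which has additional options. The function name `rmsprop_minimize` and the values of `rho` and `epsilon` are assumptions chosen for illustration.

```python
import math

def rmsprop_minimize(grad_fn, w0, learning_rate=0.001, rho=0.9, epsilon=1e-7, steps=100):
    """Minimize a 1-D function, given its gradient, with the basic RMSprop rule."""
    w = w0
    avg_sq_grad = 0.0  # rolling average of squared gradients
    for _ in range(steps):
        g = grad_fn(w)
        avg_sq_grad = rho * avg_sq_grad + (1.0 - rho) * g * g
        # The effective step is the learning rate scaled by the recent gradient size.
        w -= learning_rate * g / (math.sqrt(avg_sq_grad) + epsilon)
    return w

# Toy problem: f(w) = w**2 has gradient 2*w and its minimum at w = 0.
w_final = rmsprop_minimize(lambda w: 2.0 * w, w0=1.0, learning_rate=0.01, steps=500)
print(w_final)
```

Because the gradient is normalized, the step size is set mostly by the learning rate and not by the raw gradient magnitude, which is why the result ends up close to the minimum and not necessarily exactly on it.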

**Loss**: The loss function to minimize. Options are:
 - `'mean_squared_error'` (like it sounds)
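 - Huber loss (`keras.losses.Huber`): quadratic for small errors and linear for large ones, so it is less sensitive to outliers than the mean squared error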
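
For reference, the Huber loss is usually defined as quadratic for residuals up to a threshold `delta` and linear beyond it. Below is a plain-Python sketch of that standard definition. The helper name `huber_loss` and the choice `delta=1.0` are assumptions made for this note, not something taken from the course material.

```python
def huber_loss(y_true, y_pred, delta=1.0):
    """Mean Huber loss: 0.5*r**2 if |r| <= delta, else delta*(|r| - 0.5*delta)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        r = abs(t - p)
        if r <= delta:
            total += 0.5 * r * r              # behaves like (half) the squared error
        else:
            total += delta * (r - 0.5 * delta)  # grows only linearly for outliers
    return total / len(y_true)

small = huber_loss([0.0], [0.5])   # residual 0.5 -> quadratic branch
large = huber_loss([0.0], [3.0])   # residual 3.0 -> linear branch
print(small, large)
```

For comparison, the squared error of the second point would be 9.0, so a single outlier pulls much less on the Huber loss.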

**Metrics**: These are quantities that are calculated alongside the evaluation, for your convenience. You can add as many of these as you like. Examples:
 - Probabilistic metrics:
   - `XXXCrossentropy` (e.g. `BinaryCrossentropy`, `CategoricalCrossentropy`, `SparseCategoricalCrossentropy`):
   - KLDivergence:
   - Poisson:
 - Regression metrics:
   - MeanSquaredError (and variations)
   - MeanSquaredLogarithmicError
   
**Callbacks**: During model fitting, a callback is run at fixed points (e.g. after each epoch) to, for example, stop the training once the desired accuracy is reached.
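
The mechanism can be sketched without TensorFlow: the training loop calls a hook after every epoch and checks a stop flag. Everything here is hypothetical (the `StopAtAccuracy` class, the `fit` stand-in, and the accuracy values). In Keras itself the flag lives on the model (`self.model.stop_training`) and not on the callback.

```python
class StopAtAccuracy:
    """Hypothetical callback: request a stop once accuracy passes a threshold."""
    def __init__(self, threshold):
        self.threshold = threshold
        self.stop_training = False

    def on_epoch_end(self, epoch, logs):
        if logs.get('accuracy', 0.0) >= self.threshold:
            self.stop_training = True

def fit(accuracy_per_epoch, callback):
    """Stand-in for a training loop; returns the number of epochs actually run."""
    epochs_run = 0
    for epoch, acc in enumerate(accuracy_per_epoch):
        epochs_run += 1  # (the real weight updates would happen here)
        callback.on_epoch_end(epoch, {'accuracy': acc})
        if callback.stop_training:
            break
    return epochs_run

# Made-up accuracies for five epochs; the third one crosses the 0.6 threshold.
epochs_run = fit([0.40, 0.55, 0.62, 0.70, 0.75], StopAtAccuracy(0.6))
print(epochs_run)
```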




Front Matter
========

This is a quick notebook to explore the outputs of the Keras model.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

import tensorflow as tf
from tensorflow import keras
print(tf.__version__)

import pandas as pd
import seaborn as sns

%load_ext tensorboard
%matplotlib inline


Make an example dataset
=========

In [None]:
nPoints = 1000
x_sig = 1 * np.random.randn(nPoints) + 1
y_sig = 1 * np.random.randn(nPoints) + 1
x_bkg = 1 * np.random.randn(nPoints*10) - 1
y_bkg = 1 * np.random.randn(nPoints*10) - 1
label = np.append(np.ones(len(x_sig)),np.zeros(len(x_bkg))).astype(int)

In [None]:
df = pd.DataFrame(list(zip(np.append(x_sig,x_bkg),
                           np.append(y_sig,y_bkg),
                           label)),columns=['x','y','label'])

In [None]:
sns.pairplot(df,vars=['x', 'y'],hue='label');

Plot correlations of variables
----------

In [None]:
tmp = sns.heatmap(df[df['label'] == 1][['x','y']].corr(),annot=True,vmin=-1,vmax=1)
b, t = plt.ylim() # discover the values for bottom and top
plt.ylim(b + 0.5, t - 0.5) # update the ylim(bottom, top) values
plt.show()

Make the Keras Model, compile and fit
-------------

In [None]:
model = tf.keras.models.Sequential([
#    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(5, name='first_dense', input_shape=(2,), activation='relu'),
    tf.keras.layers.Dense(5, name='middle_dense', activation='relu'),
#    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1, name='last_dense', activation='sigmoid') # sigmoid for binary; softmax for multi?
    ])

##### TODO:
#probability_model = tf.keras.Sequential([
#  model,
#  tf.keras.layers.Softmax()
#])

In [None]:
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

In [None]:
history = model.fit(df[['x','y']].values,
                    df['label'].values,
                    batch_size=64,
                    epochs=20,
                    verbose=2)  # 0 = silent, 1 = progress bar, 2 = one line per epoch

In [None]:
print(history.params)
print(history.history.keys())
print(list('%.3f'%a for a in history.history['accuracy']))
#dir(history)

Visualize the Model Structure
----------

In [None]:
#import pydot as pyd
#import graphviz
#tf.keras.utils.plot_model(model, to_file='model.png', show_shapes=True, show_layer_names=True,
#    rankdir='TB', expand_nested=True, dpi=96)
#model.summary()

In [None]:
# version 1
# from IPython.display import SVG
# SVG(tf.keras.utils.model_to_dot(model,expand_nested=True).create(prog='dot', format='svg'))
# version 2
#tf.keras.utils.plot_model(model, to_file='model_plot.png', show_shapes=True, show_layer_names=True,expand_nested=True)

Evaluate the performance
===========

In [None]:
print(model.metrics_names)
model.evaluate(df[['x','y']].values,df['label'].values,verbose=2)

In [None]:
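# This is the continuous variable (the sigmoid output, between 0 and 1)
df['discriminant'] = model.predict(df[['x','y']].values).ravel()
# This sets things as 1 or 0
# based on whether the discriminant is above or below 0.5
# (model.predict_classes did this, but it is no longer available in recent TensorFlow releases)
df['binary_prediction'] = (df['discriminant'] > 0.5).astype(int)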

Confusion Matrix
-------

In [None]:
from sklearn.metrics import confusion_matrix

In [None]:
# scikit-learn expects (y_true, y_pred): rows are true labels, columns are predictions
confusion_matrix(df['label'].values, df['binary_prediction'].values)

Plotting the raw discriminants
--------------

In [None]:
plt.hist(df[df['label'] ==0]['discriminant'],bins=20,label='bkg',density=True,histtype='step',lw=2)
plt.hist(df[df['label'] ==1]['discriminant'],bins=20,label='sig',density=True,histtype='step',lw=2)
plt.legend()
plt.yscale('log')

ROC Curve
----------

In [None]:
from sklearn.metrics import roc_curve
fpr_keras, tpr_keras, thresholds_keras = roc_curve(df['label'].values, df['discriminant'].values)
from sklearn.metrics import auc
auc_keras = auc(fpr_keras, tpr_keras)

In [None]:
plt.figure(1)
plt.plot([0, 1], [0, 1], 'k--')
plt.plot(fpr_keras, tpr_keras, label='Keras (area = {:.3f})'.format(auc_keras))
plt.ylabel('signal efficiency (true positive rate)')
plt.xlabel('bkg efficiency (false positive rate)')
plt.legend(loc='best')  # show the AUC label defined above
plt.show()

2-D Gradients
----------

In [None]:
#Since probability is a continuous value from 0 to 1, we are getting many contours.
#If your visualization is restricted to 2 classes (output is 2D softmax vector) you can use this simple code

def plotModelOut(x,y,model):
    '''
    x,y: 2D MeshGrid input
    model: Keras Model API Object
    '''
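    # One (x, y) row per grid point, in the same row-major order as the meshgrid
    grid = np.stack((x.ravel(), y.ravel()), axis=1)
    outs = model.predict(grid)
    # Back to the meshgrid shape, so that y1[i, j] belongs to (x[i, j], y[i, j])
    y1 = outs[:, 0].reshape(x.shape)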
    plt.contourf(x,y,y1,cmap='binary')
    plt.colorbar()

    #plt.show()

xx, yy = np.meshgrid(np.linspace(-4,4,100), np.linspace(-4,4,100))
plotModelOut(xx,yy,model)

plt.scatter(df['x'][df['label'] == 0],df['y'][df['label'] == 0])
plt.scatter(df['x'][df['label'] == 1],df['y'][df['label'] == 1])

plt.xlim([-4,4])
plt.ylim([-4,4])
plt.show()

The Hello World of Neural Networks
--------

In [None]:
model = tf.keras.models.Sequential([tf.keras.layers.Dense(units=1, input_shape=[1])])

In [None]:
model.compile(optimizer='sgd',loss='mean_squared_error')

In [None]:
xs = np.array([-1.0,  0.0, 1.0, 2.0, 3.0, 4.0],dtype=float)
ys = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0],dtype=float)

In [None]:
model.fit(xs, ys, epochs=500)

In [None]:
print(model.predict(np.array([10.0, 11.0])))

Fashion MNIST Dataset
-------

In [None]:
fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

In [None]:
print(train_images.shape,test_images.shape)

Normalize the data:
-----

In [None]:
train_images = train_images / 255.0
test_images  = test_images / 255.0

In [None]:
plt.imshow(train_images[0])
#print(train_images[0])
#print(train_labels[0])

In [None]:
model = keras.Sequential([keras.layers.Flatten(), # Optional: specifying shape, e.g. input_shape=(28,28)
                          keras.layers.Dense(512,activation=tf.nn.relu), # Try 128, 1024, ...
                          keras.layers.Dense(256,activation=tf.nn.relu), # Try 128, 1024, ...
                          keras.layers.Dense(10,activation=tf.nn.softmax)]) # 10 should match the output type

In [None]:
model.compile(optimizer= tf.optimizers.Adam(),
              loss = 'sparse_categorical_crossentropy',
              metrics=['accuracy'])

In [None]:
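# Add a callback to our fitting step in order to stop training once the accuracy threshold is reached
class myCallback(tf.keras.callbacks.Callback):
  def on_epoch_end(self, epoch, logs=None):
    accuracy = (logs or {}).get('accuracy')  # None if the metric was not recorded
    if accuracy is not None and accuracy > 0.6:
      print("\nReached 60% accuracy so cancelling training!")
      self.model.stop_training = True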

callback = myCallback()

In [None]:
model.fit(train_images,train_labels,epochs=5,callbacks=[callback])

In [None]:
model.evaluate(test_images,test_labels)

In [None]:
# These predictions are the scores for each of the 10 categories, for the first test image.
# As you can see, "9" is the winner.
classifications = model.predict(test_images)
print(classifications[0])

Handwriting MNIST
---------

In [None]:
# GRADED FUNCTION: train_mnist
def train_mnist():
    # Please write your code only where you are indicated.
    # please do not remove # model fitting inline comments.

    # YOUR CODE SHOULD START HERE
    class myCallback(tf.keras.callbacks.Callback):
      def on_epoch_end(self, epoch, logs=None):
        accuracy = (logs or {}).get('accuracy')  # None if the metric was not recorded
        if accuracy is not None and accuracy >= 0.99:
          print("\nReached 99% accuracy so cancelling training!")
          self.model.stop_training = True

    callback = myCallback()
    # YOUR CODE SHOULD END HERE

    mnist = tf.keras.datasets.mnist

    (x_train, y_train),(x_test, y_test) = mnist.load_data(path='mnist.npz')
    # YOUR CODE SHOULD START HERE
    x_train = x_train / 255.0 # normalize the data to between 0 and 1
    x_test  = x_test / 255.0 # normalize the data to between 0 and 1
    # YOUR CODE SHOULD END HERE
    model = tf.keras.models.Sequential([
        # YOUR CODE SHOULD START HERE
        keras.layers.Flatten(),
        keras.layers.Dense(128,activation=tf.nn.relu),
        keras.layers.Dense(10,activation=tf.nn.softmax)
        # YOUR CODE SHOULD END HERE
    ])

    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    
    # model fitting
    history = model.fit(# YOUR CODE SHOULD START HERE
        x_train,y_train,epochs=11,callbacks=[callback]
              # YOUR CODE SHOULD END HERE
    )
    # model fitting
    return history.epoch, history.history['accuracy'][-1]

train_mnist()

In [None]:
mnist = tf.keras.datasets.mnist
(x_train, y_train),(x_test, y_test) = mnist.load_data(path='mnist.npz')

In [None]:
print(y_train[0])
plt.imshow(x_train[0])

Convolutional Neural Networks
----------

**Note that the Coursera people wanted you to run this using GPUs in Colab. If you run it locally, your computer's fan starts running.**

 - The `Conv2D` layer runs a bunch of filters (the number of filters is specified by the first argument) over the input image.
   - Note that specifying `'relu'` as the activation sets negative outputs to zero. Interesting.
 - The `MaxPooling2D` layer keeps only the maximum value in each window (2x2 in the examples here).
 
An interesting idea I heard was to **start with not so many filters in the first Conv2D layer**, and then as you continue to pool (and therefore reduce the size of the image), you should **add more filters to learn in later Conv2D layers**. That way, the number of parameters does not explode, and you have more opportunities for discovering interesting features in the final convolutional layers.
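
To see how the image size and the parameter count evolve through such a stack, here is a small bookkeeping sketch in plain Python. It assumes 3x3 convolutions without padding and with stride 1, plus 2x2 pooling, and it uses the layer sizes of the Fashion MNIST network defined in the next cell (two stages of 64 filters, then `Dense(128)` and `Dense(10)`). The helper names are made up for this note.

```python
def conv2d_valid(size, channels_in, filters, kernel=3):
    """Output size and parameter count of an unpadded, stride-1 convolution."""
    out_size = size - kernel + 1
    params = (kernel * kernel * channels_in + 1) * filters  # +1 for each filter's bias
    return out_size, params

def max_pool(size, pool=2):
    """Output size after non-overlapping pooling (a leftover row/column is dropped)."""
    return size // pool

size, channels, total_params = 28, 1, 0
for filters in (64, 64):                  # two Conv2D + MaxPooling2D stages
    size, params = conv2d_valid(size, channels, filters)
    total_params += params
    size = max_pool(size)
    channels = filters

flat = size * size * channels             # input width of the first Dense layer
total_params += (flat + 1) * 128          # Dense(128): weights + biases
total_params += (128 + 1) * 10            # Dense(10): weights + biases
print(size, flat, total_params)
```

The image shrinks from 28 to 26, 13, 11, and finally 5 pixels per side, and most of the parameters sit in the first `Dense` layer, not in the convolutions. The total can be compared against the output of `model.summary()`.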

In [None]:
mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()

training_images=training_images.reshape(60000, 28, 28, 1)
training_images=training_images / 255.0

test_images = test_images.reshape(10000, 28, 28, 1)
test_images=test_images/255.0

model = tf.keras.models.Sequential([
  tf.keras.layers.Conv2D(64, (3,3), activation='relu', input_shape=(28, 28, 1)),
  tf.keras.layers.MaxPooling2D(2, 2),
  tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
  tf.keras.layers.MaxPooling2D(2,2),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()

model.fit(training_images, training_labels, epochs=5)
test_loss = model.evaluate(test_images, test_labels)

In [None]:
f, axarr = plt.subplots(3,4,figsize=(12,8))

FIRST_IMAGE=0
SECOND_IMAGE=7
THIRD_IMAGE=26
CONVOLUTION_NUMBER = 3

from tensorflow.keras import models

layer_outputs = [layer.output for layer in model.layers]
activation_model = tf.keras.models.Model(inputs = model.input, outputs = layer_outputs)

for x in range(0,4):
    f1 = activation_model.predict(test_images[FIRST_IMAGE].reshape(1, 28, 28, 1))[x]
    axarr[0,x].imshow(f1[0, : , :, CONVOLUTION_NUMBER], cmap='inferno')
    axarr[0,x].grid(False)
    f2 = activation_model.predict(test_images[SECOND_IMAGE].reshape(1, 28, 28, 1))[x]
    axarr[1,x].imshow(f2[0, : , :, CONVOLUTION_NUMBER], cmap='inferno')
    axarr[1,x].grid(False)
    f3 = activation_model.predict(test_images[THIRD_IMAGE].reshape(1, 28, 28, 1))[x]
    axarr[2,x].imshow(f3[0, : , :, CONVOLUTION_NUMBER], cmap='inferno')
    axarr[2,x].grid(False)

Assignment: Convolutional Neural Networks and (handwriting) MNIST
----------

Very similar to what you did above. One interesting code snippet, though:

```python
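# TensorFlow 1.x session API; under TensorFlow 2 it lives in tf.compat.v1
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand
sess = tf.compat.v1.Session(config=config)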
```

Something to look into.... interesting.

In [None]:
# GRADED FUNCTION: train_happy_sad_model
def train_happy_sad_model():
    # Please write your code only where you are indicated.
    # please do not remove # model fitting inline comments.

    DESIRED_ACCURACY = 0.999

    class myCallback(tf.keras.callbacks.Callback):
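      def on_epoch_end(self, epoch, logs=None):
        accuracy = (logs or {}).get('accuracy')  # None if the metric was not recorded
        if accuracy is not None and accuracy > DESIRED_ACCURACY: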
          print("\nReached 99.9% accuracy so cancelling training!")
          self.model.stop_training = True
         # Your Code

    callbacks = myCallback()
    
    # This Code Block should Define and Compile the Model. Please assume the images are 150 X 150 in your implementation.
    model = tf.keras.models.Sequential([
        # Your Code Here
        tf.keras.layers.Conv2D(32, (3,3), activation='relu', input_shape=(150, 150, 3)),
        tf.keras.layers.MaxPooling2D(2, 2),
        tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
        tf.keras.layers.MaxPooling2D(2,2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    #model.summary()
    #return
    
    from tensorflow.keras.optimizers import RMSprop

    model.compile(optimizer=RMSprop(learning_rate=0.001),
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
        
    # This code block should create an instance of an ImageDataGenerator called train_datagen 
    # And a train_generator by calling train_datagen.flow_from_directory

    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    train_datagen = ImageDataGenerator(rescale=1/255)

    # Please use a target_size of 150 X 150.
    train_generator = train_datagen.flow_from_directory(
        '/tmp/h-or-s',
        target_size=(150, 150),
        class_mode='binary')
        # Your Code Here)
    # Expected output: 'Found 80 images belonging to 2 classes'
    
    # This code block should call model.fit (which accepts generators directly;
    # model.fit_generator is deprecated in TensorFlow 2) and train for a number of epochs.
    # model fitting
    history = model.fit(
        train_generator,
        steps_per_epoch=10,
        epochs=15,
        verbose=1,
        callbacks=[callbacks])
          # Your Code Here)
    # model fitting
    return history.history['accuracy'][-1]  # the key matches metrics=['accuracy']

Some cool code to visualize intermediate Convolutional Layers:
--------

```python
import os
import random

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.image import img_to_array, load_img

# Let's define a new Model that will take an image as input, and will output
# intermediate representations for all layers in the previous model after
# the first.
successive_outputs = [layer.output for layer in model.layers[1:]]
#visualization_model = Model(img_input, successive_outputs)
visualization_model = tf.keras.models.Model(inputs = model.input, outputs = successive_outputs)
# Let's prepare a random input image from the training set.
horse_img_files = [os.path.join(train_horse_dir, f) for f in train_horse_names]
human_img_files = [os.path.join(train_human_dir, f) for f in train_human_names]
img_path = random.choice(horse_img_files + human_img_files)

# target_size must match the model's input shape (300 x 300 is assumed here)
img = load_img(img_path, target_size=(300, 300))  # this is a PIL image
x = img_to_array(img)  # Numpy array with shape (300, 300, 3)
x = x.reshape((1,) + x.shape)  # Numpy array with shape (1, 300, 300, 3)

# Rescale by 1/255
x /= 255

# Let's run our image through our network, thus obtaining all
# intermediate representations for this image.
successive_feature_maps = visualization_model.predict(x)

# These are the names of the layers, so can have them as part of our plot
layer_names = [layer.name for layer in model.layers[1:]]

# Now let's display our representations
for layer_name, feature_map in zip(layer_names, successive_feature_maps):
    if len(feature_map.shape) == 4:
        # Just do this for the conv / maxpool layers, not the fully-connected layers
        n_features = feature_map.shape[-1]  # number of features in feature map
        # The feature map has shape (1, size, size, n_features)
        size = feature_map.shape[1]
        # We will tile our images in this matrix
        display_grid = np.zeros((size, size * n_features))
        for i in range(n_features):
            # Postprocess the feature to make it visually palatable
            x = feature_map[0, :, :, i]
            x -= x.mean()
            x /= x.std()
            x *= 64
            x += 128
            x = np.clip(x, 0, 255).astype('uint8')
            # We'll tile each filter into this big horizontal grid
            display_grid[:, i * size : (i + 1) * size] = x
        # Display the grid
        scale = 20. / n_features
        plt.figure(figsize=(scale * n_features, scale))
        plt.title(layer_name)
        plt.grid(False)
        plt.imshow(display_grid, aspect='auto', cmap='viridis')
```

Data Augmentation
=========

Image data can be augmented using parameters in the ImageDataGenerator:

```python
# This code has changed. Now instead of the ImageGenerator just rescaling
# the image, we also rotate and do other operations
# Updated to do image augmentation
train_datagen = ImageDataGenerator(
      rescale=1./255,
      rotation_range=40,
      width_shift_range=0.2,
      height_shift_range=0.2,
      shear_range=0.2,
      zoom_range=0.2,
      horizontal_flip=True,
      fill_mode='nearest')
```

Transfer Learning
=========

 - `include_top`: whether to include the fully-connected classification layers at the top (output end) of the network; with `include_top=False`, only the convolutional base is kept.
 - `weights=None` tells Keras not to load its default pre-trained weights; instead, the 'notop' weights file (the weights without the fully-connected top) is loaded manually with `load_weights`.
 - Finally, we set the pre-trained model layers to be non-trainable.

```python
from tensorflow.keras import layers
from tensorflow.keras import Model
!wget --no-check-certificate \
    https://storage.googleapis.com/mledu-datasets/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5 \
    -O /tmp/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5
  
from tensorflow.keras.applications.inception_v3 import InceptionV3

local_weights_file = '/tmp/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5'

pre_trained_model = InceptionV3(input_shape = (150, 150, 3), 
                                include_top = False, 
                                weights = None)

pre_trained_model.load_weights(local_weights_file)

for layer in pre_trained_model.layers:
  layer.trainable = False
  
# pre_trained_model.summary()

last_layer = pre_trained_model.get_layer('mixed7')
print('last layer output shape: ', last_layer.output.shape)
last_output = last_layer.output
```

- **Finally, we add layers, starting from the last output. Notice the chaining: in the Keras functional API, each layer is called on the output of the previous one:**

```python
from tensorflow.keras.optimizers import RMSprop

# Flatten the output layer to 1 dimension
x = layers.Flatten()(last_output)
# Add a fully connected layer with 1,024 hidden units and ReLU activation
x = layers.Dense(1024, activation='relu')(x)
# Add a dropout rate of 0.2
x = layers.Dropout(0.2)(x)                  
# Add a final sigmoid layer for classification
x = layers.Dense  (1, activation='sigmoid')(x)           

model = Model( pre_trained_model.input, x) 

model.compile(optimizer = RMSprop(learning_rate=0.0001),
              loss = 'binary_crossentropy', 
              metrics = ['accuracy'])
```

Plotting Epochs
=========

```python
import matplotlib.pyplot as plt
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

plt.plot(epochs, acc, 'r', label='Training accuracy')
plt.plot(epochs, val_acc, 'b', label='Validation accuracy')
plt.title('Training and validation accuracy')
plt.legend(loc=0)
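plt.figure()  # start a second figure for the loss curves

plt.plot(epochs, loss, 'r', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend(loc=0)

plt.show()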
```

Switching from a Binary to a Multi-class Classifier
======

 - The ImageDataGenerator will need `class_mode` to change from "binary" to "categorical"
 - The final layer of your Model will need to be `Dense(3,activation='softmax')` (3 for the number of categories!)
 - When compiling the model, you should use a loss function of `'categorical_crossentropy'`
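
The reason for these changes can be seen in a small plain-Python sketch: softmax turns one raw score per category into probabilities that sum to 1, and categorical cross-entropy compares them with a one-hot label. The raw scores and the label below are made-up values, and the helper functions are simplified stand-ins for the Keras versions.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)                           # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot, probs, eps=1e-7):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))

probs = softmax([2.0, 1.0, 0.1])              # made-up scores for 3 categories
loss = categorical_crossentropy([1, 0, 0], probs)  # true category is the first one
print(probs, loss)
```

Only the probability assigned to the true category enters the loss, so the loss goes to zero as that probability approaches 1.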

Word-based Encodings
=========

 - Tokenize the words.
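
A minimal sketch of what tokenizing could look like in plain Python: build a word-to-integer index from a few sentences, then encode each sentence as a list of integers. The function names, the lowercasing and whitespace splitting, and the tie-breaking rule are all assumptions made for illustration. Keras' own tokenizer handles punctuation and other details differently.

```python
from collections import Counter

def fit_word_index(sentences):
    """Map each word to an integer, most frequent first (index 0 is left unused)."""
    counts = Counter(word for s in sentences for word in s.lower().split())
    # Sort by descending frequency; ties are broken alphabetically (an arbitrary choice).
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: i for i, word in enumerate(ordered, start=1)}

def texts_to_sequences(sentences, word_index):
    """Encode sentences as lists of integers, skipping unknown words."""
    return [[word_index[w] for w in s.lower().split() if w in word_index]
            for s in sentences]

sentences = ["I love my dog", "I love my cat"]
word_index = fit_word_index(sentences)
sequences = texts_to_sequences(sentences, word_index)
print(word_index)
print(sequences)
```

The two sentences share their first three integers and differ only in the last one, which is the kind of structure a model can then pick up on.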
 