# Module 5: Multilayer Perceptron

As the models you encounter become more complex, so do the libraries used to build them.
Please do not feel like you have to master everything within these notebooks.
Consider them introductory, just the _tip of the iceberg_.
Neural networks are complex and take many months to master.
In subsequent labs you will see higher-level assemblies of networks using alternative APIs.

In this lab you will learn about ...

TensorFlow API reference
+ [tf.truncated_normal](https://www.tensorflow.org/api_docs/python/tf/truncated_normal)
+ [tf.Variable](https://www.tensorflow.org/api_docs/python/tf/Variable)
+ [tf.placeholder](https://www.tensorflow.org/api_docs/python/tf/placeholder)
+ [tf.train.GradientDescentOptimizer](https://www.tensorflow.org/api_docs/python/tf/train/GradientDescentOptimizer)
+ [tf.Session](https://www.tensorflow.org/api_docs/python/tf/Session)


In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

import os, sys
import itertools
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.preprocessing import scale, LabelBinarizer
from sklearn.metrics import f1_score, confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Random seed for numpy
np.random.seed(18937)

## Artificial neuron

We can define parameters inside the model using `tf.Variable()`.
They will be initialized using a ["truncated" normal distribution](https://www.tensorflow.org/api_docs/python/tf/truncated_normal).
We then define a [neuron](https://en.wikipedia.org/wiki/Artificial_neuron) from its mathematical formula, where $x_j$ are the inputs, $w_{kj}$ the weights, $b_k$ the bias, and $\varphi$ the activation function.

$$ y_k = \varphi \left( \sum_{j=0}^{m}{w_{kj}x_j} +b_k \right) $$
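
To make the formula concrete, here is a minimal NumPy sketch of a single neuron, independent of TensorFlow. The helper names (`truncated_normal`, `sigmoid`, `neuron`) and the sample inputs are made up for illustration. The initializer imitates the documented behavior of `tf.truncated_normal`, which re-draws values that fall more than two standard deviations from the mean; it is not the library's implementation.

```python
import numpy as np

def truncated_normal(shape, rng, stddev=1.0):
    """Draw normal samples, re-drawing any value beyond 2 standard deviations."""
    values = rng.normal(0.0, stddev, size=shape)
    out_of_range = np.abs(values) > 2 * stddev
    while out_of_range.any():
        values[out_of_range] = rng.normal(0.0, stddev, size=out_of_range.sum())
        out_of_range = np.abs(values) > 2 * stddev
    return values

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b, activation=sigmoid):
    """Compute activation(sum_j w_j * x_j + b) for each row of x."""
    return activation(x @ w + b)

rng = np.random.default_rng(0)
w = truncated_normal((2,), rng)   # one weight per input feature
b = truncated_normal((1,), rng)   # a single bias term
x = np.array([[0.5, -1.0], [2.0, 0.0]])   # two samples, two features
y_out = neuron(x, w, b)           # one output in (0, 1) per sample
print(y_out)
```

With a sigmoid activation the output always lies strictly between 0 and 1, which is why it can later be thresholded at 0.5 for binary classification.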

The most basic units within tensorflow are [tf.constant](https://www.tensorflow.org/api_docs/python/tf/constant), [tf.Variable](https://www.tensorflow.org/api_docs/python/tf/Variable) and [tf.placeholder](https://www.tensorflow.org/api_docs/python/tf/placeholder).
They are the data structures that hold multidimensional arrays in the TensorFlow framework.

The difference between a `tf.Variable` and a `tf.placeholder` is that a `tf.Variable` holds state inside the model (such as weights and biases) that is updated during training, whereas a `tf.placeholder` is an input slot
whose value is assigned by a "driver" program, e.g. a script that invokes TensorFlow APIs and runs the training.
Placeholders must always be fed external data when the graph is run.

In [None]:
Weights = lambda shape: tf.Variable(tf.truncated_normal(shape, seed = 0x3fe69))
Biases = lambda shape: tf.Variable(tf.truncated_normal(shape, seed = 0xac5b0))

## Generate two-blob data

In [None]:
# make_blobs is imported from sklearn.datasets; the samples_generator module was removed in newer scikit-learn
from sklearn.datasets import make_blobs
X, y = make_blobs(n_samples=600, centers=2, n_features=2, random_state = 76533)
X = scale(X, with_std = False) # Center X
plt.figure(figsize=(7,7))
plt.scatter(X[:,0], X[:,1], c=y)


## Fit one neuron

Here we train one neuron to classify the two-blob data.

This section aims to demonstrate the workflow of building and training a model using TensorFlow. 
Click [here](http://playground.tensorflow.org/#activation=sigmoid&batchSize=10&dataset=gauss&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=0&networkShape=1&seed=0.99526&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false) to view the problem scenario on TensorFlow Playground.
This is the same site that was used for some of the lecture videos.

### Step 1: Input/output

During training, we should use [tf.placeholder](https://www.tensorflow.org/api_docs/python/tf/placeholder) to "feed" information into the neural network model.
In this example, the model will be fed with features and labels for supervised learning.

In [None]:
features = tf.placeholder("float", (None, X.shape[1]))
labels = tf.placeholder("float", (None, ))

### Step 2: Computation graph

This is the part where you describe the dataflow.
A [Graph](https://www.tensorflow.org/api_docs/python/tf/Graph) contains a set of [tf.Operation](https://www.tensorflow.org/api_docs/python/tf/Operation) objects, 
which represent units of computation;
and [tf.Tensor](https://www.tensorflow.org/api_docs/python/tf/Tensor) objects, 
which represent the units of data that flow between operations.

TensorFlow functions such as `tf.matmul` take Tensors and return Tensors, adding the corresponding operations to the graph.
TensorFlow Tensors are lazy data structures representing multidimensional arrays; they hold no values until they are evaluated in a session.

The following cell defines a graph containing one neuron as well as the loss function.
Backpropagation is also part of the graph: it is defined by the **optimizer** and **training** operations.
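
As a rough picture of what one training step does, the NumPy sketch below applies a single gradient-descent update to a sigmoid neuron with mean squared error loss, with the gradient worked out by hand through the chain rule. The function `gradient_step` and the synthetic two-cluster data are hypothetical stand-ins; TensorFlow derives the gradients automatically from the graph, so treat this as an illustration of the idea rather than of the library's internals.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(X, y, w, b, learning_rate=0.5):
    """One gradient-descent update for a sigmoid neuron with MSE loss."""
    p = sigmoid(X @ w + b)                       # forward pass, shape (N,)
    loss = np.mean((p - y) ** 2)
    # Backward pass via the chain rule: dL/dz = dL/dp * dp/dz
    dz = 2.0 * (p - y) / len(y) * p * (1.0 - p)
    grad_w = X.T @ dz                            # dL/dw
    grad_b = dz.sum()                            # dL/db
    return w - learning_rate * grad_w, b - learning_rate * grad_b, loss

# Synthetic, well-separated clusters (illustrative only, not the lab's data)
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y_demo = np.repeat([0.0, 1.0], 50)

w, b = np.zeros(2), 0.0
losses = []
for _ in range(20):
    w, b, loss = gradient_step(X_demo, y_demo, w, b)
    losses.append(loss)
print(losses[0], losses[-1])
```

With zero initial weights every prediction starts at 0.5, so the first recorded loss is 0.25; repeated updates move the parameters against the gradient and the loss falls.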

In [None]:
def one_neuron_graph():
    # Create a single neuron for this model.
    neuron = lambda x: tf.sigmoid(tf.add(tf.matmul(x, Weights((X.shape[1], 1))), Biases((1, ))))

    # Predictions are made by this neuron.
    predictions = neuron(features)

    # Loss function to be optimized. We use mean squared error.
    loss = tf.losses.mean_squared_error(labels, tf.squeeze(predictions))

    # An optimizer defines the operation for updating parameters within the model.
    # Learning rate = 0.5
    optimizer = tf.train.GradientDescentOptimizer(0.5)

    # Training is defined as minimizing the loss function using gradient descent.
    training = optimizer.minimize(loss)
    
    return [training, loss, predictions]

### Step 3: Run computation graph with TensorFlow Session
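
The training and prediction loops in the next cell feed the data to the session in consecutive mini-batches by slicing the arrays. The slicing pattern on its own can be sketched as follows; `iterate_minibatches`, `X_demo` and `y_demo` are hypothetical names used only for this illustration.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size=32):
    """Yield consecutive (features, labels) slices; the last one may be smaller."""
    for start in range(0, X.shape[0], batch_size):
        yield X[start:start + batch_size], y[start:start + batch_size]

X_demo = np.zeros((600, 2))   # same number of samples as the two-blob data
y_demo = np.zeros(600)
batch_sizes = [len(xb) for xb, _ in iterate_minibatches(X_demo, y_demo)]
print(len(batch_sizes), batch_sizes[-1])   # prints: 19 24
```

Because 600 is not a multiple of 32, the final batch holds only 24 samples. The placeholders accept this since their first dimension is declared as `None`.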

In [None]:
class OneNeuron(object):
    def __init__(self, session):
        self.context = [session] + one_neuron_graph()

    def fit(self, X, y, N_BATCH = 32):
        sess, training, loss, _  = self.context
        
        # An array recording training loss
        training_loss = []
        
        # Training loop
        for epoch in range(10):
            epoch_loss = []
            for i in range(0, X.shape[0], N_BATCH):
                _, batch_loss = sess.run([training, loss], feed_dict={
                    features: X[i:i+N_BATCH],
                    labels: y[i:i+N_BATCH]
                })
                epoch_loss.append(batch_loss)
            training_loss.append(np.mean(epoch_loss))
        
        self.training_loss = training_loss
    
    def predict(self, X, N_BATCH = 32):
        sess, _, _, predictions  = self.context
        
        y_pred = []
        for i in range(0, X.shape[0], N_BATCH):
            batch_prediction = sess.run(predictions, feed_dict={
                features: X[i:i+N_BATCH]
            })
            # ravel() flattens (n, 1) to (n,) and, unlike squeeze(), still works for a batch of size 1
            y_pred.extend(batch_prediction.ravel())
        return np.array(y_pred)

In [None]:
with tf.Session() as sess:
    # Create OneNeuron model
    one_neuron = OneNeuron(sess)
    
    # Initialize variables
    sess.run(tf.global_variables_initializer())
    
    # Training
    one_neuron.fit(X, y)
        
    # Evaluation       
    print('accuracy', accuracy_score(y, one_neuron.predict(X)>=0.5))
    
    # Plot training loss
    plt.figure(figsize=(6,4))
    plt.title('loss')
    plt.xticks(range(len(one_neuron.training_loss)))
    plt.plot(range(len(one_neuron.training_loss)), one_neuron.training_loss)
        
    # Plot decision plane
    x1, x2 = np.meshgrid(np.linspace(-6, 6, 120), np.linspace(-6, 6, 120))
    Z = np.array(one_neuron.predict(np.column_stack([x1.ravel(), x2.ravel()]))).reshape(x1.shape)
    plt.figure(figsize=(7,7))
    plt.imshow(Z, interpolation='nearest',
        extent=(x1.min(), x1.max(), x2.min(), x2.max()), vmin=0.0, vmax=1.0,
        aspect='equal', origin='lower', cmap='binary'
    )
    plt.scatter(X[:,0], X[:,1], c=y)

## Two-layer feedforward network

In this section, we will fit a two-layer feedforward neural network on the red wine dataset and deal with some 
practical problems in applying a neural network to a real-world dataset before we move on to more advanced models.

### Load Dataset

Load the dataset from file into a multi-dimensional array.

In [None]:
# Dataset location
DATASET = '/dsa/data/all_datasets/wine-quality/winequality-red.csv'
assert os.path.exists(DATASET)

# Load and shuffle
dataset = pd.read_csv(DATASET, sep=';').sample(frac = 1).reset_index(drop=True)

# Pull features and labels
X = scale(np.array(dataset.iloc[:, :-1]))
y = np.array(dataset.quality)

# Create training/validation split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

print('Class distribution:', {i: np.sum(y==i) for i in np.unique(dataset.quality)})
dataset.describe()

### Dense layers

Any layers that are not the output layer are referred to as hidden layers.
So in this model we will have a hidden layer and an output layer, each containing a few artificial neurons.
In this model, the neurons are fully connected between layers. 
Therefore, these layers are called dense layers.
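
In matrix form, a dense layer multiplies the input batch by a weight matrix, adds a bias vector and applies an activation. The NumPy sketch below stacks two such layers. The helper `dense_forward` is a hypothetical name, and the sizes (11 input features, 10 hidden units, 6 outputs) are chosen to match the wine model built later in this notebook.

```python
import numpy as np

def dense_forward(x, W, b, activation):
    """Every output unit sees every input: activation(x @ W + b)."""
    return activation(x @ W + b)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def identity(z):
    return z

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 11))                        # batch of 4 samples, 11 features
W1, b1 = rng.normal(size=(11, 10)), np.zeros(10)    # hidden layer: 11 -> 10
W2, b2 = rng.normal(size=(10, 6)), np.zeros(6)      # output layer: 10 -> 6

hidden = dense_forward(x, W1, b1, sigmoid)
logits = dense_forward(hidden, W2, b2, identity)
print(hidden.shape, logits.shape)   # (4, 10) (4, 6)
```

The weight matrix of each layer has one row per input and one column per neuron, which is why the number of rows in `W2` must equal the number of hidden units.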

First we define the concept of a dense layer.

In [None]:
def Dense(n, activation):
    return lambda x: activation(
        tf.matmul(x, Weights((x.get_shape().as_list()[1], n))) + Biases((n, )))

In [None]:
class FeedForwardNN(object):
    def __init__(self, session):
        # Two-layer FeedForwardNN
        hidden_layer = Dense(10, tf.sigmoid)
        output_layer = Dense(6, tf.identity)
        predictions = output_layer(hidden_layer(features))

        # Loss function
        loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=predictions))

        # An optimizer defines the operation for updating parameters within the model.
        optimizer = tf.train.GradientDescentOptimizer(0.03)

        # Training is defined as minimizing the loss function using gradient descent.
        training = optimizer.minimize(loss)
        
        self.context = [session, training, loss, predictions]
        
    def fit(self, X_train, y_train, N_BATCH=32):
        sess, training, loss, _  = self.context
        # One-hot encoder for the labels; fitted on the global `y` so that every class is known
        label_encoding = LabelBinarizer()
        label_encoding.fit(y)
        
        training_loss = []
        for epoch in range(200):
            epoch_loss = []
            for i in range(0, X_train.shape[0], N_BATCH):
                _, batch_loss = sess.run([training, loss], feed_dict={
                    features: X_train[i:i+N_BATCH],
                    labels: label_encoding.transform(y_train[i:i+N_BATCH])
                })
                epoch_loss.append(batch_loss)
            training_loss.append(np.mean(epoch_loss))
        self.training_loss = training_loss
        self.label_encoding = label_encoding
        
    def predict(self, X_test, N_BATCH=32):
        sess, _, _, predictions  = self.context
        
        y_pred = []
        for i in range(0, X_test.shape[0], N_BATCH):
            batch_prediction = sess.run(predictions, feed_dict={
                features: X_test[i:i+N_BATCH]
            })
            # exp() keeps the ordering of the logits; inverse_transform picks the highest-scoring class
            predicted_classes = self.label_encoding.inverse_transform(np.exp(batch_prediction))
            y_pred.extend(predicted_classes)
        return np.array(y_pred)
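
The class above relies on three ideas: one-hot encoded labels, a softmax cross-entropy loss computed from raw scores (logits), and choosing the highest-scoring class at prediction time. The NumPy sketch below shows each of them in isolation. The helper names, the class list and the logits are made up for illustration; it mirrors what `LabelBinarizer` and `tf.nn.softmax_cross_entropy_with_logits` compute conceptually and is not their implementation.

```python
import numpy as np

def one_hot(y, classes):
    """Encode labels as rows containing a single 1 in the column of their class."""
    return (y[:, None] == classes[None, :]).astype(float)

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)   # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def softmax_cross_entropy(labels, logits):
    """Per-sample cross-entropy between one-hot labels and softmax(logits)."""
    return -(labels * np.log(softmax(logits))).sum(axis=1)

classes = np.array([3, 4, 5, 6, 7, 8])   # six quality grades, assumed for illustration
y_demo = np.array([5, 6, 8])
logits = np.array([[0.1, 0.2, 2.0, 0.3, 0.0, -1.0],
                   [0.0, 0.0, 0.0, 3.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0, 0.0, 0.0, 0.5]])

loss = softmax_cross_entropy(one_hot(y_demo, classes), logits).mean()
predicted = classes[np.argmax(logits, axis=1)]
print(loss, predicted)
```

Since the exponential is monotonic, taking the largest value of `np.exp(logits)` selects the same class as taking the largest logit, so no normalization is needed to pick a label. In this example the third sample is misclassified as grade 3, which is what drives its loss term up.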

In [None]:
with tf.Session() as sess:
    features = tf.placeholder("float", (None, 11))
    labels = tf.placeholder("float", (None, 6))
    feedforward = FeedForwardNN(sess)
    sess.run(tf.global_variables_initializer())
    feedforward.fit(X_train, y_train)
    
    plt.figure(figsize=(6,4))
    plt.title('loss')
    plt.plot(range(len(feedforward.training_loss)), feedforward.training_loss)
    
    plt.figure(figsize=(4,4))
    y_pred = feedforward.predict(X_test)
    print('accuracy', accuracy_score(y_test, y_pred))
    plt.imshow(confusion_matrix(y_test, y_pred))

## Multilayer perceptron (MLP)

In [None]:
class MultilayerPerceptron(FeedForwardNN):
    def __init__(self, session, features, labels):
        # Three hidden layers followed by a linear output layer
        hidden_layer = tf.layers.dense(features, 10, tf.tanh)
        hidden_layer2 = tf.layers.dense(hidden_layer, 8, tf.tanh)
        hidden_layer3 = tf.layers.dense(hidden_layer2, 8, tf.sigmoid)
        predictions = tf.layers.dense(hidden_layer3, 6)

        # Loss function
        loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=predictions))

        # An optimizer defines the operation for updating parameters within the model.
        optimizer = tf.train.AdamOptimizer()

        # Training is defined as minimizing the loss function using the Adam optimizer.
        training = optimizer.minimize(loss)
        
        self.context = [session, training, loss, predictions]
        
    def fit(self, X_train, y_train, N_BATCH=32):
        sess, training, loss, _  = self.context
        # One-hot encoder for the labels; fitted on the global `y` so that every class is known
        label_encoding = LabelBinarizer()
        label_encoding.fit(y)
        
        training_loss = []
        for epoch in range(500):
            epoch_loss = []
            for i in range(0, X_train.shape[0], N_BATCH):
                _, batch_loss = sess.run([training, loss], feed_dict={
                    features: X_train[i:i+N_BATCH],
                    labels: label_encoding.transform(y_train[i:i+N_BATCH])
                })
                epoch_loss.append(batch_loss)
            training_loss.append(np.mean(epoch_loss))
        self.training_loss = training_loss
        self.label_encoding = label_encoding

In [None]:
with tf.Session() as sess:
    features = tf.placeholder("float", (None, 11))
    labels = tf.placeholder("float", (None, 6))
    mlp = MultilayerPerceptron(sess, features, labels)
    sess.run(tf.global_variables_initializer())
    mlp.fit(X_train, y_train)
    
    plt.figure(figsize=(6,4))
    plt.title('loss')
    plt.plot(range(len(mlp.training_loss)), mlp.training_loss)
    
    plt.figure(figsize=(4,4))
    y_pred = mlp.predict(X_test)
    print('accuracy', accuracy_score(y_test, y_pred))
    plt.imshow(confusion_matrix(y_test, y_pred))


This code may seem overwhelming to take in the first time through.
Please take a break and revisit later to give it another read.

# Save your notebook!