# Non-Linear Classifiers

Today’s class introduces modern neural network models, commonly known as deep learning models. We will learn the concept of a computation graph, a general way of describing complex functions as compositions of simpler functions. We will also learn about backpropagation, a generic solution for gradient-descent-based optimization in computation graphs.
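
As a small illustration of these two ideas, the sketch below treats `sigmoid(w * x + b)` as a two-node computation graph and applies the chain rule from the output back to the parameters. The function names (`forward`, `backward`) and the toy values are assumptions made for this example and are not part of the lxmls toolkit.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(w, b, x):
    """Evaluate the graph node by node, keeping the intermediate value."""
    a = w * x + b        # linear node
    y = sigmoid(a)       # non-linear node
    return a, y

def backward(w, b, x):
    """Chain rule applied from the output back to the parameters."""
    a, y = forward(w, b, x)
    dy_da = y * (1.0 - y)    # derivative of the sigmoid node
    dy_dw = dy_da * x        # da/dw = x
    dy_db = dy_da            # da/db = 1
    return dy_dw, dy_db

# Hypothetical toy values
w, b, x = 0.5, -0.2, 2.0
dy_dw, dy_db = backward(w, b, x)

# Central finite differences as an independent check of the derivatives
eps = 1e-6
numeric_dw = (forward(w + eps, b, x)[1] - forward(w - eps, b, x)[1]) / (2 * eps)
numeric_db = (forward(w, b + eps, x)[1] - forward(w, b - eps, x)[1]) / (2 * eps)
print("dy/dw: analytic %.6f, numeric %.6f" % (dy_dw, numeric_dw))
print("dy/db: analytic %.6f, numeric %.6f" % (dy_db, numeric_db))
```

The same pattern, a forward pass that stores intermediate values followed by a backward pass that reuses them, is what backpropagation automates for larger graphs.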

<b> Exercise 3.1 </b>
<br>
To ease the upcoming implementation exercise, examine and comment on the following implementation of a log-linear model and its gradient update rule. Start by loading the Amazon sentiment corpus used on day 1.

In [13]:
import numpy as np
import lxmls.readers.sentiment_reader as srs 
from lxmls.deep_learning.utils import AmazonData
corpus=srs.SentimentCorpus("books")
data = AmazonData(corpus=corpus)

In [15]:
print(corpus.train_X)

[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 1. 0.]
 [0. 0. 0. ... 0. 0. 0.]]


In [4]:
print(corpus.train_y)

[[0]
 [1]
 [0]
 ...
 [1]
 [0]
 [1]]


In [11]:
print("Number of training instances: %d" % len(corpus.train_y))
print("Number of instances that belong to class 1: %d" % (corpus.train_y==1).sum())
print("Number of instances that belong to class 0: %d" % (corpus.train_y==0).sum())

Number of training instances: 1600
Number of instances that belong to class 1: 800
Number of instances that belong to class 0: 800


> Compare the following numpy implementation of a log-linear model with the derivations seen in the previous sections. Introduce comments on the blocks marked with # relating them to the corresponding algorithm steps.

In [21]:
from lxmls.deep_learning.utils import Model, glorot_weight_init, index2onehot 
import numpy as np
from scipy.special import logsumexp

class NumpyLogLinear(Model):
    def __init__(self, **config):
        # Initialize parameters
        weight_shape = (config['input_size'], config['num_classes'])
        # after Xavier Glorot et al
        self.weight = glorot_weight_init(weight_shape, 'softmax')
        self.bias = np.zeros((1, config['num_classes']))
        self.learning_rate = config['learning_rate']

    def log_forward(self, input=None):
        """Forward pass of the computation graph"""
        
        # Weighted sum of the inputs plus a bias: z = x W^T + b.
        # self.weight has shape (num_classes, input_size), hence the transpose.
        z = np.dot(input, self.weight.T) + self.bias
        
        # Softmax implemented in log domain
        log_tilde_z = z - logsumexp(z, axis=1)[:, None]
        
        return log_tilde_z
    
    def predict(self, input=None):
        """Prediction: most probable class index"""
        return np.argmax(np.exp(self.log_forward(input)), axis=1)

    def update(self, input=None, output=None): 
        """Stochastic Gradient Descent update"""
        
        # Compute class probabilities
        class_probabilities = np.exp(self.log_forward(input))
        batch_size, num_classes = class_probabilities.shape

        # Error: derivative of the cost with respect to z, averaged over the batch
        I = index2onehot(output, num_classes)
        error = (class_probabilities - I) / batch_size

        # Gradient of the cost F with respect to the weights,
        # accumulated one example at a time
        gradient_weight = np.zeros(self.weight.shape)
        for l in range(batch_size):
            gradient_weight += np.outer(error[l, :], input[l, :])

        # Gradient of the cost F with respect to the bias
        gradient_bias = np.sum(error, axis=0, keepdims=True)

        # Gradient descent update: W ← W − η ∇_W F; b ← b − η ∇_b F
        self.weight = self.weight - self.learning_rate * gradient_weight
        self.bias = self.bias - self.learning_rate * gradient_bias
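
The reason `log_forward` computes the softmax in the log domain can be seen with a minimal sketch. Here a manual max-shift stands in for the `logsumexp` call used in the class, and the input scores are made-up values chosen to trigger overflow.

```python
import numpy as np

# Hypothetical scores; the first row is large enough to overflow np.exp
z = np.array([[1000.0, 1001.0],
              [-2.0, 3.0]])

# Naive softmax: exp(1000) overflows to inf and inf / inf gives nan
with np.errstate(over='ignore', invalid='ignore'):
    naive = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)

# Log-domain softmax: subtract the row maximum before exponentiating
row_max = z.max(axis=1, keepdims=True)
log_sum_exp = row_max + np.log(np.exp(z - row_max).sum(axis=1, keepdims=True))
log_p = z - log_sum_exp
p = np.exp(log_p)

print("naive softmax:\n%s" % naive)
print("log-domain softmax:\n%s" % p)
```

The first row of the naive version contains `nan`, while the log-domain version returns finite probabilities that sum to one in every row.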

> Instantiate model and data classes. Check the initial accuracy of the model. This should be close to 50% since we are on a binary prediction task and the model is not trained yet.

In [28]:
# Instantiate model
model = NumpyLogLinear(
    input_size=corpus.nr_features,
    num_classes=2,
    learning_rate=0.05
)
# Define number of epochs and batch size
num_epochs = 10
batch_size = 30
# Instantiate data iterators
train_batches = data.batches('train', batch_size=batch_size)
test_set = data.batches('test', batch_size=None)[0]
# Check initial accuracy
hat_y = model.predict(input=test_set['input'])
accuracy = 100*np.mean(hat_y == test_set['output']) 
print("Initial accuracy %2.2f %%" % accuracy)

Initial accuracy 54.25 %



> Train the model with simple mini-batch stochastic gradient descent. Be sure to understand each of the steps involved, including the code running inside the model class. We will be working on a more complex version of the model in the upcoming exercise.

In [29]:
# Epoch loop
for epoch in range(num_epochs):

    # Batch loop
    for batch in train_batches:
        model.update(input=batch['input'], output=batch['output'])

# Prediction after the final epoch
hat_y = model.predict(input=test_set['input'])

# Evaluation
accuracy = 100*np.mean(hat_y == test_set['output'])
print("Epoch %d: accuracy %2.2f %%" % (epoch+1, accuracy))

Epoch 400: accuracy 84.75 %
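
One way to check your understanding of the `update` step is a numerical gradient check on a toy problem. The sketch below is standalone and does not use the lxmls classes: the random data, the `cost` helper and the use of `np.eye` in place of `index2onehot` are all assumptions made for illustration. It also shows that the per-example loop over outer products gives the same result as a single matrix product.

```python
import numpy as np

rng = np.random.RandomState(0)
batch_size, input_size, num_classes = 5, 4, 3

# Hypothetical toy batch and parameters
x = rng.randn(batch_size, input_size)
y = rng.randint(num_classes, size=batch_size)
weight = 0.1 * rng.randn(num_classes, input_size)
bias = np.zeros((1, num_classes))

def log_softmax(z):
    row_max = z.max(axis=1, keepdims=True)
    return z - row_max - np.log(np.exp(z - row_max).sum(axis=1, keepdims=True))

def cost(weight, bias):
    """Average negative log-likelihood of the correct classes."""
    log_p = log_softmax(np.dot(x, weight.T) + bias)
    return -np.mean(log_p[np.arange(batch_size), y])

# Analytic gradient, following the same steps as update()
p = np.exp(log_softmax(np.dot(x, weight.T) + bias))
one_hot = np.eye(num_classes)[y]       # stand-in for index2onehot
error = (p - one_hot) / batch_size

looped = np.zeros(weight.shape)
for l in range(batch_size):
    looped += np.outer(error[l, :], x[l, :])
vectorized = np.dot(error.T, x)        # same sum as one matrix product

# Central finite difference on a single weight entry
eps = 1e-6
shift = np.zeros(weight.shape)
shift[1, 2] = eps
numeric = (cost(weight + shift, bias) - cost(weight - shift, bias)) / (2 * eps)

print("loop and matrix product agree: %s" % np.allclose(looped, vectorized))
print("analytic %.8f, numeric %.8f" % (vectorized[1, 2], numeric))
```

If the analytic and numeric values differ by more than a small tolerance, the derivation or the implementation of the gradient has an error.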


<b> Exercise 3.2  </b>
<br>Instantiate the feed-forward model class and optimization parameters. This model follows the architecture described in Algorithm 10.

In [31]:
# Model
geometry = [corpus.nr_features, 20, 2]
activation_functions = ['sigmoid', 'softmax']
# Optimization
learning_rate = 0.05
num_epochs = 10
batch_size = 30
# Instantiate model
from lxmls.deep_learning.numpy_models.mlp import NumpyMLP 

model = NumpyMLP(
    geometry=geometry,
    activation_functions=activation_functions,
    learning_rate=learning_rate
)
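
To make the roles of `geometry` and `activation_functions` concrete, here is a minimal sketch of a forward pass for a network with sigmoid hidden layers and a softmax output. It is not the `NumpyMLP` implementation: the function `mlp_log_forward`, the random initialization and the small input size are assumptions made for illustration only.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def log_softmax(z):
    row_max = z.max(axis=1, keepdims=True)
    return z - row_max - np.log(np.exp(z - row_max).sum(axis=1, keepdims=True))

def mlp_log_forward(x, parameters):
    """Sigmoid hidden layers followed by a log-softmax output layer."""
    hidden = x
    for weight, bias in parameters[:-1]:
        hidden = sigmoid(np.dot(hidden, weight.T) + bias)
    weight, bias = parameters[-1]
    return log_softmax(np.dot(hidden, weight.T) + bias)

rng = np.random.RandomState(0)

# Hypothetical geometry: 8 input features, 20 hidden units, 2 classes
toy_geometry = [8, 20, 2]
parameters = []
for n_in, n_out in zip(toy_geometry[:-1], toy_geometry[1:]):
    # One (weight, bias) pair per layer, weight stored as (outputs, inputs)
    parameters.append((0.1 * rng.randn(n_out, n_in), np.zeros((1, n_out))))

x = rng.rand(3, toy_geometry[0])       # a batch of 3 made-up examples
log_p = mlp_log_forward(x, parameters)
print("log-probabilities shape: %s" % (log_p.shape,))
print("predicted classes: %s" % np.argmax(log_p, axis=1))
```

Each consecutive pair of entries in the geometry defines one layer, so a geometry with three entries yields two weight matrices and needs two activation functions.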