## Tutorial 27: Neural networks, deep learning, and keras

In this tutorial, you will get a very basic introduction to neural networks
and how to build them in Python. Let us start by loading all of our standard
modules and scripts.

In [None]:
import wiki
import iplot
import wikitext

import numpy as np
import matplotlib.pyplot as plt
import sklearn

In [None]:
assert wiki.__version__ >= 6
assert wikitext.__version__ >= 2
assert iplot.__version__ >= 3

For today, we will once again take links from the "important publications in
philosophy" page to build a corpus for prediction. We will make a `WikiCorpus`
object to simplify the computation of metrics for the page. Below I have removed
two pages that give our Windows users some trouble.

In [None]:
np.random.seed(0)
links = wikitext.get_internal_links('List_of_important_publications_in_philosophy')['ilinks']
links.remove("What_Is_it_Like_to_Be_a_Bat?")
links.remove("What_is_Life?_(Schrödinger)")
links = np.random.permutation(links)
wcorp = wikitext.WikiCorpus(links, num_clusters=15, num_topics=15)

And, again, we will grab two potential response variables (one continuous
and one categorical). We then stack five numeric page-level features together
into a numpy array `x` to use as predictors.

In [None]:
num_ilinks = wcorp.meta['num_ilinks'].values
lan_version = np.array(['ru' in x for x in wcorp.meta['langs']], dtype=int)

num_sections = wcorp.meta['num_sections'].values
num_images = wcorp.meta['num_images'].values
num_elinks = wcorp.meta['num_elinks'].values
num_langs = wcorp.meta['num_langs'].values
num_chars = np.array([len(x) for x in wcorp.meta['doc'].values])

x = np.stack([num_sections, num_images, num_elinks, num_langs, num_chars], axis=1)

## Neural networks

Neural networks, or deep learning, are often made to sound like a fancy, scary,
impossible-to-understand thing. I try to think of them as just another way of
building a predictive model (albeit an important one). I cannot go into too
much detail given the time constraint, but let's talk about the basic idea of
a small neural network: it's a sequence of linear models chained together.

What's the benefit of putting together multiple linear models? Consider this
very simple network with a single input (x), a single output (y), and a single
"hidden" layer with two "hidden" values (z1 and z2):

<img src="img/nn2.png" alt="drawing" width="740"/>

You'd be correct in thinking this is silly. Any feasible output y could be
described directly without requiring these two hidden values. Visually, we can
see that any combination of two linear models just gives another linear model:

<img src="img/nn1.png" alt="drawing" width="400"/>
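
Here is a minimal numeric sketch of the same point. The coefficients are made
up purely for illustration; any other values would collapse in the same way.

```python
import numpy as np

# Hypothetical coefficients, chosen only for illustration.
a1, b1 = 1.0, 2.0             # z1 = a1 + b1 * x
a2, b2 = -1.0, 0.5            # z2 = a2 + b2 * x
c0, c1, c2 = 0.5, 3.0, -2.0   # y = c0 + c1 * z1 + c2 * z2

x_demo = np.linspace(-2, 2, 9)
z1 = a1 + b1 * x_demo
z2 = a2 + b2 * x_demo
y_chained = c0 + c1 * z1 + c2 * z2

# The same y written as a single linear model in x.
intercept = c0 + c1 * a1 + c2 * a2
slope = c1 * b1 + c2 * b2
y_direct = intercept + slope * x_demo

print(np.allclose(y_chained, y_direct))
```

The chained version and the single linear model agree at every point, so the
hidden values add nothing on their own.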

However, we have one minor modification to make. Rather than using the raw output
of the linear regressions (z1 and z2), we apply a function called a Rectified
Linear Unit, or ReLU. It is a really fancy name for taking the positive part of
the function: negative values are set to zero and positive values are left alone.
If we do this, then we can get a non-linear output y from a chain of linear models:

<img src="img/nn3.png" alt="drawing" width="400"/>
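
As a rough numeric sketch of this idea, here is a tiny network written directly
in numpy. The coefficients are hypothetical, picked only so that the two "kinks"
fall inside the plotted range.

```python
import numpy as np

def relu(v):
    """Positive part of v: negative entries become zero."""
    return np.maximum(v, 0.0)

x_demo = np.linspace(-2, 2, 9)
z1 = relu(1.0 + 2.0 * x_demo)     # switches on once x > -0.5
z2 = relu(-1.0 + 2.0 * x_demo)    # switches on once x > 0.5
y_demo = 0.5 + 3.0 * z1 - 2.0 * z2

# A straight line has the same slope between every pair of neighboring
# points; here the slope changes, so y is not a linear function of x.
slopes = np.diff(y_demo) / np.diff(x_demo)
print(slopes)
```

The output is flat on the left, steep in the middle, and less steep on the
right: a piecewise-linear curve built out of nothing but linear models and ReLU.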

In fact, it turns out that with enough hidden units a neural network is a
universal function approximator. That is, it can approximate any (reasonably
well-behaved, for example continuous) function as closely as we like.

## Building deep learning models

To start actually building a neural network, we need a few functions from keras.
Let's load them in here: 

In [None]:
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import normalize

Now, we will see how to build predictive models using neural networks and the
**keras** module. As a starting point, we need to normalize the data matrix 
x so that each column has unit norm.

In [None]:
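# axis=0 rescales each column (feature) to unit norm; the keras default,
# axis=-1, would instead rescale each row
x = normalize(x, axis=0)
# transpose so that y is a single column
y = normalize(num_ilinks).transpose()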

y_train = y[:325, :]
y_test  = y[325:, :]
x_train = x[:325, :]
x_test  = x[325:, :]
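
To make "each column has unit norm" concrete, here is a small sketch that does
the rescaling by hand in numpy. The matrix `demo` is a made-up example, not our
data, and this is only meant to illustrate the idea rather than to reproduce
what keras does internally.

```python
import numpy as np

# Hypothetical 3x2 data matrix, for illustration only.
demo = np.array([[3.0, 1.0],
                 [4.0, 2.0],
                 [0.0, 2.0]])

# Euclidean (L2) norm of each column; axis=0 runs down the rows.
col_norms = np.linalg.norm(demo, axis=0)
demo_scaled = demo / col_norms

print(col_norms)
print(np.linalg.norm(demo_scaled, axis=0))
```

After dividing by the column norms, every column has length one, which puts
variables measured on very different scales on a comparable footing.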

To build the actual model, we start with an empty sequential model:

In [None]:
model = Sequential()

And then add the first layer. The `input_dim` option tells keras how
many columns are in x, and `units` says how many hidden z's we want in
this layer. Let's just use 2 hidden values like our toy example. Notice
that I set the 'relu' activation function.

In [None]:
model.add(Dense(units=2, activation='relu', input_dim=5))

Finally, I'll add the output layer. Our response value y has only a
single column, so this layer just has one unit. 

In [None]:
model.add(Dense(units=1))

We can see the entire model by printing out the model summary:

In [None]:
model.summary()
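
The parameter counts in the summary can be checked by hand. A dense layer has
one weight for each input-unit pair plus one intercept (bias) per unit, which
is the keras default. The helper `dense_param_count` below is my own name for
this arithmetic, not a keras function.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (one per input per unit) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden = dense_param_count(5, 2)   # 5 columns of x -> 2 hidden values
output = dense_param_count(2, 1)   # 2 hidden values -> 1 output
total = hidden + output
print(hidden, output, total)       # prints: 12 3 15
```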

Before trying to learn the parameters in the model from our training data,
we need to *compile* the model. This step tells keras which loss function
to minimize (here, mean squared error) and which optimizer to use to
minimize it (here, stochastic gradient descent).

In [None]:
model.compile(loss='mse', optimizer='sgd')

Finally, we can fit the model using our training data. Keras allows us
to directly pass the validation data to see how well the model predicts
on data it was not trained on. Note that this model does not have a
closed-form analytic solution; the weights are instead found by an
iterative algorithm. The *epochs* parameter controls how many passes
the algorithm makes over the training data.

In [None]:
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
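
To give a feel for what an epoch is, here is a bare-bones sketch of mini-batch
gradient descent for a one-weight linear model, written in numpy. The data, the
learning rate, and the batch size of 10 are all assumptions made for
illustration; keras does considerably more bookkeeping than this.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: y is roughly 2 * x plus a little noise.
x_demo = rng.normal(size=100)
y_demo = 2.0 * x_demo + 0.1 * rng.normal(size=100)

w = 0.0                 # single weight, started at zero
learning_rate = 0.1
losses = [np.mean((w * x_demo - y_demo) ** 2)]   # loss before training

for epoch in range(5):
    # One epoch = one full pass over the data, in mini-batches of 10.
    for start in range(0, 100, 10):
        xb = x_demo[start:start + 10]
        yb = y_demo[start:start + 10]
        grad = np.mean(2 * (w * xb - yb) * xb)   # derivative of MSE in w
        w -= learning_rate * grad
    losses.append(np.mean((w * x_demo - y_demo) ** 2))

print(w)
print(losses)
```

Each pass nudges the weight a little closer to a good value, which is why the
loss reported by keras typically shrinks from one epoch to the next.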

Prediction works similarly to the sklearn functions.

In [None]:
pred = model.predict(x_train)

And we can see what the actual weights are as follows. Here are the
weights from the first layer:

In [None]:
model.layers[0].get_weights()

And here are the weights from the second layer:

In [None]:
model.layers[1].get_weights()
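
Each call returns a weight matrix and a vector of intercepts. As a sketch of
how these turn into predictions, here is the forward pass for a 5 -> 2 -> 1
network written by hand. The numbers in `W1`, `b1`, `W2`, and `b2` are made up;
only their shapes are meant to match our model.

```python
import numpy as np

# Hypothetical weights with the shapes of a 5 -> 2 -> 1 network.
W1 = np.array([[ 0.5, -0.2],
               [ 0.1,  0.4],
               [-0.3,  0.2],
               [ 0.0,  0.1],
               [ 0.2, -0.5]])      # shape (5, 2)
b1 = np.array([0.1, 0.0])         # shape (2,)
W2 = np.array([[1.0], [-2.0]])    # shape (2, 1)
b2 = np.array([0.5])              # shape (1,)

def forward(x_in):
    """Linear model + ReLU, followed by a second linear model."""
    z = np.maximum(x_in @ W1 + b1, 0.0)
    return z @ W2 + b2

x_fake = np.ones((3, 5))          # three identical fake rows
print(forward(x_fake))
```

The result has one row per observation and a single column, just like the
output of `model.predict`.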

## A deeper model

We can construct much larger and deeper models using keras. Here is
a model with four hidden layers with 32 hidden states in each.

In [None]:
model = Sequential()
model.add(Dense(units=32, activation='relu', input_dim=5))
model.add(Dense(units=32, activation='relu'))
model.add(Dense(units=32, activation='relu'))
model.add(Dense(units=32, activation='relu'))
model.add(Dense(units=1))

In [None]:
model.compile(loss='mse', optimizer='sgd')

In [None]:
model.fit(x_train, y_train, epochs=25, validation_data=(x_test, y_test))

### Neural networks for classification

We can, and more often than not do, build neural networks for classification
tasks. The easiest way to make this happen is by converting a categorical output
to a *one-hot encoding* by building a matrix with one column per category. This
can be done with the `to_categorical` function from keras.

In [None]:
from keras.utils import to_categorical

In [None]:
y = to_categorical(lan_version)
y[:10,]
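
If it helps to see what a one-hot encoding is without keras, here is a minimal
numpy sketch. The `labels` array is a made-up example.

```python
import numpy as np

labels = np.array([0, 1, 1, 0, 1])   # hypothetical 0/1 labels
n_classes = 2

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(n_classes)[labels]
print(one_hot)
```

Each row contains a single 1, sitting in the column of that row's category.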

We can then split this into a training and testing set.

In [None]:
y_train = y[:325, :]
y_test  = y[325:, :]

Now, if we build a neural network we need to make two changes:
first, the final layer needs to have two units (one per category),
and second, the final layer needs a special activation function.
The special activation function is called a "softmax" and ensures
that the two values are positive numbers that add up to one, so
that they can be read as predicted probabilities.

In [None]:
model = Sequential()
model.add(Dense(units=32, activation='relu', input_dim=5))
model.add(Dense(units=2, activation='softmax'))
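
The softmax itself is a short formula: exponentiate each value, then divide by
the row total. Here is a sketch in numpy applied to some made-up raw outputs;
subtracting the row maximum first is a common trick to avoid overflow and does
not change the answer.

```python
import numpy as np

def softmax(v):
    """Exponentiate and rescale each row so that it sums to one."""
    shifted = v - np.max(v, axis=-1, keepdims=True)   # numerical stability
    e = np.exp(shifted)
    return e / np.sum(e, axis=-1, keepdims=True)

raw = np.array([[2.0, -1.0],
                [0.0,  0.0]])     # hypothetical raw outputs
probs = softmax(raw)
print(probs)
print(probs.sum(axis=1))
```

Every entry is positive and each row adds up to one; two equal raw outputs
give a 50/50 split.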

We also use some different parameters in the model compilation function:

In [None]:
model.compile(loss='categorical_crossentropy',
              optimizer='RMSprop',
              metrics=['accuracy'])
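
For the curious, the categorical cross-entropy loss can be sketched in a few
lines of numpy. The labels and predicted probabilities below are invented for
illustration, and this simplified version skips safeguards (such as clipping
probabilities away from zero) that a real implementation would include.

```python
import numpy as np

def categorical_crossentropy(y_true, y_prob):
    """Average over rows of -sum(y_true * log(y_prob))."""
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=1)))

y_true = np.array([[1.0, 0.0], [0.0, 1.0]])   # hypothetical one-hot labels
good = np.array([[0.9, 0.1], [0.2, 0.8]])     # leaning the right way
bad = np.array([[0.4, 0.6], [0.7, 0.3]])      # leaning the wrong way

loss_good = categorical_crossentropy(y_true, good)
loss_bad = categorical_crossentropy(y_true, bad)
print(loss_good, loss_bad)
```

The loss is small when the model puts high probability on the correct category
and grows as probability shifts to the wrong one.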

Fitting the model works exactly the same way.

In [None]:
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))

### Conclusions

I'll admit that neither of these toy problems works very well with neural
networks. These are not the kinds of problems that neural networks are
designed for. I will also admit that we really do not have the kind of
time (nor can I assume the mathematical background) needed to really learn
how to build neural networks in MATH289. I hope, though, that you get
something out of these notes. We will be using neural networks in the next
tutorial and I think you'll find them, in that form, quite accessible.