# Theano XOR Example

Hello world! Let's get started with some **deep learning**. This example is the bottom rung of the ladder to **deep learning**. As with many things, a good way to start understanding **deep learning** is to form a mental model of the problem. I recommend [this blog post](http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/) for an excellent explanation of the core concepts behind **deep learning**.

![Hype train](images/hype_train.png)

First, let's set up our imports.

In [1]:
from __future__ import print_function

import theano
import theano.tensor as T
import numpy as np
import time

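Initialize our input data `X` and target output `y`. Both are Theano shared variables, so the network can read them without passing them in as function arguments.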

In [2]:
X = theano.shared(value=np.asarray([[0, 1], [1, 0], [0, 0], [1, 1]]), name='X')
y = theano.shared(value=np.asarray([[0], [0], [1], [1]]), name='y')
print('X: {}\ny: {}'.format(X.get_value(), y.get_value()))

X: [[0 1]
 [1 0]
 [0 0]
 [1 1]]
y: [[0]
 [0]
 [1]
 [1]]
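As a quick sanity check on the data above, the targets can be compared against the XOR truth table. This is a plain-Python sketch; the names `inputs`, `targets`, `xor`, and `xnor` are mine, not the notebook's. It suggests that the targets as written are the complement of XOR (XNOR), which is just as linearly nonseparable, so the rest of the example is unaffected.

```python
# Same rows as X and y above, as plain Python lists.
inputs = [[0, 1], [1, 0], [0, 0], [1, 1]]
targets = [[0], [0], [1], [1]]

# XOR is 1 when the two bits differ; XNOR is its complement.
xor = [[a ^ b] for a, b in inputs]
xnor = [[1 - (a ^ b)] for a, b in inputs]

print('XOR: ', xor)
print('XNOR:', xnor)
print('targets match XNOR:', targets == xnor)
```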


Seed NumPy's global random number generator, and instantiate a separate `RandomState` that the weight initialization below draws from.

In [3]:
np.random.seed(42)
rng = np.random.RandomState(1234)

A helper function that creates the weight matrix and bias vector (as Theano shared variables) for a single layer can be written as follows.

In [4]:
def layer(*shape):
    # shape is (n_in, n_out) for one fully connected layer
    assert len(shape) == 2
    # Bound of the uniform distribution, scaled by fan-in + fan-out
    mag = 4. * np.sqrt(6. / sum(shape))
    W_value = np.asarray(rng.uniform(low=-mag, high=mag, size=shape), dtype=theano.config.floatX)
    # Biases start at zero, one per output unit
    b_value = np.zeros(shape[1], dtype=theano.config.floatX)
    W = theano.shared(value=W_value, name='W_{}'.format(shape), borrow=True, strict=False)
    b = theano.shared(value=b_value, name='b_{}'.format(shape), borrow=True, strict=False)
    return W, b
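To get a feel for the initialization range, here is a small sketch that evaluates the same `mag` expression for the two layer shapes used in this example. The function name `uniform_bound` is hypothetical. My reading is that this is the Glorot-style uniform bound with the extra factor of 4 that is often suggested for sigmoid units, but the notebook itself does not say so.

```python
import math

def uniform_bound(n_in, n_out):
    # Same expression as `mag` in layer(): 4 * sqrt(6 / (fan_in + fan_out)).
    return 4.0 * math.sqrt(6.0 / (n_in + n_out))

print(uniform_bound(2, 5))  # roughly 3.70, bound for the 2 x 5 hidden layer
print(uniform_bound(5, 1))  # 4.0, bound for the 5 x 1 output layer
```

Every weight is drawn uniformly from the interval between minus the bound and plus the bound.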

Now that we have this helper function, let's generate the weights for a 2-5-1 network and peek at the first weight matrix.

In [5]:
W1, b1 = layer(2, 5)
W2, b2 = layer(5, 1)
print(W1.get_value())

[[-2.28477995  0.90440604 -0.46122329  2.1135257   2.07365784]
 [-1.68430669 -1.65563108  2.23583464  3.39323698  2.78436792]]


With these weights, we can build our network. The hidden layer uses a `relu` activation function, while the output layer uses a `sigmoid`. From the output, we compute the `cost` to minimize: the mean squared error between the network's output and the target. Finally, we can get the network to start **deep learning** by applying our learning rule. As with many neural networks, each parameter (weights and biases alike) takes a step of size 0.1 against its gradient, which is the direction that reduces the cost function.

In [6]:
output = T.nnet.sigmoid(T.dot(T.nnet.relu(T.dot(X, W1) + b1), W2) + b2) # The whole network
cost = T.mean((y - output) ** 2) # Mean squared error
updates = [(p, p - 0.1 * T.grad(cost, p)) for p in [W1, W2, b1, b2]] # Subgradient descent optimizer
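To make explicit what `T.grad` is doing in the cell above, here is a rough NumPy-only sketch of the same forward pass, cost, and gradient step with the gradients written out by hand. All function names (`init_layer`, `forward`, `cost_and_grads`, `train`) are mine, and the relu subgradient at exactly zero is taken to be 0, which may differ from Theano's choice, so this is not expected to reproduce the notebook's numbers digit for digit.

```python
import numpy as np

# Same inputs and targets as the notebook, as float arrays.
X = np.array([[0, 1], [1, 0], [0, 0], [1, 1]], dtype=float)
y = np.array([[0], [0], [1], [1]], dtype=float)

rng = np.random.RandomState(1234)

def init_layer(n_in, n_out):
    # Mirrors layer(): uniform weights within +/- mag, zero biases.
    mag = 4.0 * np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-mag, mag, size=(n_in, n_out)), np.zeros(n_out)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params):
    W1, b1, W2, b2 = params
    pre = X @ W1 + b1                    # hidden pre-activation
    hidden = np.maximum(pre, 0.0)        # relu
    output = sigmoid(hidden @ W2 + b2)   # sigmoid output layer
    return pre, hidden, output

def cost_and_grads(params):
    W1, b1, W2, b2 = params
    pre, hidden, output = forward(params)
    cost = np.mean((y - output) ** 2)
    # Chain rule, from the cost back to each parameter.
    d_out = 2.0 * (output - y) / y.size
    d_z2 = d_out * output * (1.0 - output)   # through the sigmoid
    gW2 = hidden.T @ d_z2
    gb2 = d_z2.sum(axis=0)
    d_pre = (d_z2 @ W2.T) * (pre > 0)        # through the relu (assumed 0 at the kink)
    gW1 = X.T @ d_pre
    gb1 = d_pre.sum(axis=0)
    return cost, [gW1, gb1, gW2, gb2]

def train(params, steps=1000, lr=0.1):
    # Plain gradient descent: p <- p - lr * grad, for every parameter.
    for _ in range(steps):
        _, grads = cost_and_grads(params)
        params = [p - lr * g for p, g in zip(params, grads)]
    return params

params = [*init_layer(2, 5), *init_layer(5, 1)]
initial_cost, _ = cost_and_grads(params)
trained = train(params)
final_cost, _ = cost_and_grads(trained)
print('Cost before:', initial_cost)
print('Cost after:', final_cost)
```

The Theano version gets the same structure for free: the graph for `cost` is differentiated symbolically, so none of the backward pass has to be written by hand.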

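Next, let's construct Theano functions for training and testing. Neither takes any inputs, because `X` and `y` are shared variables that already live in the graph. The training function returns nothing and simply applies the learning updates, while the testing function returns the current value of `cost`.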

In [7]:
train = theano.function(inputs=[], outputs=[], updates=updates)
test = theano.function(inputs=[], outputs=cost)
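As I understand Theano's semantics, the `(variable, new_value)` pairs in `updates` are all evaluated against the values the shared variables held before the call, and only then written back. The following plain-Python analogy illustrates that behavior; `apply_updates` and the `state` dictionary are hypothetical stand-ins, not Theano APIs.

```python
def apply_updates(state, updates):
    # Evaluate every rule against the old state first...
    new_values = {name: rule(state) for name, rule in updates}
    # ...then write all results back in one go.
    state.update(new_values)
    return state

state = {'a': 1.0, 'b': 10.0}
updates = [
    ('a', lambda s: s['a'] + s['b']),  # reads the old b
    ('b', lambda s: s['a'] * 2.0),     # reads the old a, not the freshly updated one
]
apply_updates(state, updates)
print(state)  # {'a': 11.0, 'b': 2.0}
```

This is why each call to `train()` amounts to one consistent gradient step: every gradient is computed from the same snapshot of the parameters.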

Lastly, let's train and evaluate our network; lo and behold, it learns to separate the linearly nonseparable data points!

In [8]:
print('Cost before:', test())
start = time.time()
for i in range(10000):
    train()
end = time.time()
print('Cost after:', test())
print('Time (s):', end - start)

Cost before: 0.475550048714
Cost after: 0.000385103087629
Time (s): 0.422646999359
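To put the printed numbers in perspective, the short sketch below converts the final mean squared error into a root-mean-square error and the total time into a per-step time. The values are copied from the output above; the variable names are mine.

```python
import math

cost_after = 0.000385103087629   # final mean squared error printed above
elapsed = 0.422646999359         # seconds for 10000 training steps
steps = 10000

rms_error = math.sqrt(cost_after)
print('RMS error:', round(rms_error, 4))  # 0.0196
print('Microseconds per step:', round(elapsed / steps * 1e6, 1))
```

An RMS error of about 0.02 means the outputs sit, on average, within a couple of hundredths of their 0 or 1 targets.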
