# Introduction to neural networks


In this session we will start with a simple toy implementation of a neural network and apply it to the XOR problem. In the second part we will learn how to use the [Keras toolkit](https://keras.io/) to define, train and use a practical neural network model.

## XOR

Let's start with the [XOR problem](https://en.wikipedia.org/wiki/XOR_gate). 

In [None]:
import numpy
%pylab inline --no-import-all

### Exercise 7.1
Define the function `xor`, which takes an Nx2 array, where each row is an input to the logical XOR. It outputs an array of size N with the corresponding outputs.

Given `X = numpy.array([[0, 0],      
                 [0, 1],      
                 [1, 0],      
                 [1, 1]])`
                 
`xor(X)` should output `[0, 1, 1, 0]`

In [None]:
def xor(X):
    #.........


In [None]:
X = numpy.array([[0, 0],      # FALSE
                 [0, 1],      # TRUE
                 [1, 0],      # TRUE
                 [1, 1]])     # FALSE
y = xor(X)
print(y)

In [None]:
pylab.scatter(X[:,0], X[:,1], c=y, s=200)

## Neural network
We can define a simple two-layer neural network by hand that solves the XOR classification problem. The network has parameters $\mathbf{W}$ and $\mathbf{U}$, and computes the following:

$$\mathbf{Y} = \sigma(\mathbf{U}\,\sigma(\mathbf{W}\mathbf{X}^T))$$

where $\mathbf{X}$ is the input array, with shape Nx2, $\mathbf{W}$ is a 2x2 matrix, and $\mathbf{U}$ is a 1x2 matrix. The result is a 1xN matrix (i.e. a single row vector) of XOR values.
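
As a sanity check, the formula can be traced by hand for the single input $\mathbf{x} = (0, 1)^T$. This assumes the weights $\mathbf{W}$ and $\mathbf{U}$ given in Exercise 7.3 and the thresholding $\sigma$ of Exercise 7.2 (one if the value is at least 0.5, zero otherwise):

$$\mathbf{W}\mathbf{x} = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} -1 \\ 1 \end{pmatrix}, \qquad \sigma(\mathbf{W}\mathbf{x}) = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \qquad \sigma(\mathbf{U}\,\sigma(\mathbf{W}\mathbf{x})) = \sigma(0 + 1) = 1$$

The output 1 agrees with XOR of 0 and 1.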

### Exercise 7.2

Define function `sigma` which returns one if the input is greater than or equal to 0.5, and zero otherwise.

Given `X = numpy.array([[0.1, 0.3], [0.5, 0.7]])`
        
`sigma(X)` should output 

`[[0. 0.]
[1. 1.]]`

In [None]:
def sigma(X):
    #...............
    

In [None]:
z = numpy.random.uniform(0,1,(3,2))
print(z)
print(sigma(z))

### Exercise 7.3

Define function `nnet` which takes the weight matrices W and U, and the input X, and returns the result Y computed according to the formula above.

Given `X = numpy.array([[0, 0],      
                 [0, 1],      
                 [1, 0],      
                 [1, 1]])`
 
`W = numpy.array([[1,-1],
                 [-1,1]])`
                 
`U = numpy.array([1,1])`

`nnet(W, U, X)` should output `[0, 1, 1, 0]`

In [None]:
def nnet(W,U,X):
    #..........................................


Define the weights:

In [None]:
W = numpy.array([[1,-1],
                 [-1,1]])
U = numpy.array([1,1])

Check what it outputs:

In [None]:
y_pred = nnet(W, U, X)
print(y)
print(y_pred)


And plot the outputs as a function of inputs.

In [None]:
# Create a grid of points for plotting
shape=(20,20)
grid = numpy.array([ [i,j] for i in numpy.linspace(0,1,shape[0]) 
                               for j in numpy.linspace(0,1,shape[1]) ])
# Apply the neural net to all the points
y_pred = nnet(W, U, grid)
pylab.pcolor(y_pred.reshape((20,20)))
pylab.colorbar()
pylab.xticks([])
pylab.yticks([])

## Training XOR NN with Keras

We'll now learn how to build a simple neural network in Keras. 

In [None]:
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import Adam

# If the above lines give you an error because of a newer TensorFlow version,
# you can try using the following imports and comment out the above lines:

#from tensorflow.keras.models import Sequential
#from tensorflow.keras.layers import Dense, Dropout, Activation
#from tensorflow.keras.optimizers import Adam

model = Sequential()
# Add two hidden layers with 4 hidden units each, and the tanh activation.

model.add(Dense(4, input_dim=2, activation='tanh'))
model.add(Dense(4, activation='tanh'))

# The final layer is the output layer with an inverse logit activation function.
model.add(Dense(1, activation='sigmoid'))

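# Use the Adam optimizer. Adam works similarly to regular SGD,
# but with some important improvements: https://arxiv.org/abs/1412.6980
# Note: newer Keras versions call this argument `learning_rate`; if `lr` is
# rejected, use Adam(learning_rate=0.02) instead.
optimizer = Adam(lr=0.02)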
model.compile(optimizer=optimizer, loss='binary_crossentropy')

We can now train the model, specifying the number of epochs, the minibatch size, and whether to print progress information.

In [None]:
model.fit(X, y, epochs=100, batch_size=1, verbose=1)
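
The call above uses `batch_size=1`, so with the four XOR rows each epoch performs four weight updates. The sketch below only illustrates how a dataset is cut into minibatches and how the update count follows from it. The helper `minibatches` and the `_demo` arrays are hypothetical names, not part of Keras, and Keras additionally shuffles the rows each epoch by default.

```python
import numpy

def minibatches(X, y, batch_size):
    """Yield consecutive (inputs, targets) slices of at most batch_size rows."""
    for start in range(0, len(X), batch_size):
        yield X[start:start + batch_size], y[start:start + batch_size]

# Stand-ins for the XOR inputs and targets used above
X_demo = numpy.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_demo = numpy.array([0, 1, 1, 0])

epochs = 100
updates_per_epoch = len(list(minibatches(X_demo, y_demo, batch_size=1)))
total_updates = updates_per_epoch * epochs
print(total_updates)
```

A larger `batch_size` gives fewer, less noisy updates per epoch.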

In [None]:
print("   x1          x2          F(x1, x2)")
print(numpy.hstack([X, model.predict(X)]))

In [None]:
# Apply the neural net to all the points
y_pred = model.predict(grid)
pylab.pcolor(y_pred.reshape((20,20)))
pylab.colorbar()
pylab.xticks([])
pylab.yticks([])

## Regression with NN on iris

We will now define and train a neural network model for regression on the iris data.

### Load data

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
data = load_iris()
# Inputs
X = numpy.array(data.data[:,0:3], dtype='float32')
# Output
y = numpy.array(data.data[:,3], dtype='float32')


X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=1/3, random_state=999)
print(X_train.shape)
print(y_train.shape)

### Exercise 7.4


Define a multilayer perceptron with the following specifications:
- Hidden layer 1: size 16, activation: tanh
- Hidden layer 2: size 16, activation: tanh
- Output layer: size 1, activation: linear

Compile it using the following specifications:
- optimizer: Adam
- loss: mean squared error

Train the network, and try to find a good value of learning rate by monitoring the loss.

Compute the mean absolute error and r-squared on the validation data.
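
For reference, both metrics can be written out directly in numpy. This is a sketch with made-up numbers: `y_true` and `y_hat` are illustrative names, and in the exercise you would use `y_val` and the model's predictions instead.

```python
import numpy

# Made-up targets and predictions, for illustration only
y_true = numpy.array([1.0, 2.0, 3.0, 4.0])
y_hat = numpy.array([1.5, 2.0, 2.5, 4.0])

# Mean absolute error: the average size of the residuals
mae = numpy.mean(numpy.abs(y_true - y_hat))

# R-squared: 1 - (residual sum of squares / total sum of squares)
ss_res = numpy.sum((y_true - y_hat) ** 2)
ss_tot = numpy.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(mae, r2)
```

An r-squared of 1 means perfect predictions, while 0 means the model does no better than always predicting the mean of the targets.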

In [None]:
#..................................



In [None]:
from sklearn.metrics import mean_absolute_error, r2_score
y_pred = model.predict(X_val)
print(mean_absolute_error(y_val, y_pred))
print(r2_score(y_val, y_pred))

## Classification

Let's now do classification. The target is a categorical vector. It will need to be transformed to an array of dummies. This transform is also called one-hot encoding.
This can be done manually, but sklearn.preprocessing has some utilities that make it simple:
- OneHotEncoder
- LabelBinarizer
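
To make the transform concrete, here is a minimal sketch of doing it manually with numpy. The function name `one_hot` is ours, and it assumes the labels are integers in the range 0 to `n_classes - 1`.

```python
import numpy

def one_hot(labels, n_classes):
    """Return an array of shape (len(labels), n_classes) with a single 1 per row."""
    labels = numpy.asarray(labels)
    encoded = numpy.zeros((labels.shape[0], n_classes), dtype='int32')
    # In row i, set the column given by labels[i] to 1
    encoded[numpy.arange(labels.shape[0]), labels] = 1
    return encoded

example = one_hot([0, 2, 1, 2], n_classes=3)
print(example)
```

Each row contains exactly one 1, in the column of the corresponding class.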


In [None]:
# Inputs
X = numpy.array(data.data, dtype='float32')
# Output
y = numpy.array(data.target, dtype='int32')
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=1/3, random_state=999)

# One-hot Indicator array for classes
from sklearn.preprocessing import LabelBinarizer
onehot = LabelBinarizer()
Y_train = onehot.fit_transform(y_train)
Y_val   = onehot.transform(y_val)

print(Y_train[:10,:])

### Exercise 7.5

Define a multilayer perceptron with the following specifications:
- Hidden layer 1: size 16, activation: tanh
- Hidden layer 2: size 16, activation: tanh
- Output layer: size 3, activation: softmax

NB: softmax is a generalization of inverse logit to more than 2 classes. It converts class scores to class probabilities, while making sure that they sum to 1:

```
def softmax(x):
    z = numpy.exp(x)
    return z/numpy.sum(z)
```
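
The version above works for a single vector of scores. For a whole batch (one row of scores per example), a commonly used variant subtracts each row's maximum before exponentiating, which avoids overflow without changing the result. A minimal sketch, where the name `softmax_rows` is ours and not a Keras function:

```python
import numpy

def softmax_rows(scores):
    """Apply softmax independently to each row of a 2D array of scores."""
    scores = numpy.asarray(scores, dtype='float64')
    # Subtracting the row maximum leaves the result unchanged but avoids overflow in exp
    shifted = scores - scores.max(axis=1, keepdims=True)
    z = numpy.exp(shifted)
    return z / z.sum(axis=1, keepdims=True)

probs = softmax_rows([[1.0, 2.0, 3.0],
                      [1000.0, 1000.0, 1000.0]])
print(probs)
```

Every row of `probs` sums to 1, including the second row, whose scores would overflow without the shift.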

Compile it using the following specifications:
- optimizer: Adam
- loss: categorical_crossentropy

Train the network, and try to find a good value of learning rate by monitoring the loss.
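Use the method `.predict_classes` to predict the targets on validation data. (This method was removed in newer TensorFlow/Keras releases; there, take the argmax of the output of `.predict` along the class axis instead.)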
Compute the classification accuracy using `accuracy_score` from `sklearn.metrics` on validation data.
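
Predicting a class amounts to picking the column with the highest probability in each row of the network's output. A minimal numpy sketch with made-up probabilities, where the array `probs` stands in for the output of `model.predict` and `y_true` for the integer validation targets:

```python
import numpy

# Made-up class probabilities for four examples and three classes
probs = numpy.array([[0.7, 0.2, 0.1],
                     [0.1, 0.3, 0.6],
                     [0.2, 0.5, 0.3],
                     [0.4, 0.35, 0.25]])
y_true = numpy.array([0, 2, 1, 1])

# Index of the most probable class in each row
y_hat = numpy.argmax(probs, axis=1)

# Accuracy: the fraction of examples where the prediction matches the target
accuracy = numpy.mean(y_hat == y_true)
print(y_hat, accuracy)
```

Note that accuracy is computed against the integer labels, not against their one-hot encoding.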

In [None]:
#.....................................


In [None]:
#.....................................


### Exercise 7.6


Train a neural network classifier on the handwritten digits dataset. 
This dataset comes with scikit-learn and can be accessed as follows:

In [None]:
from sklearn.datasets import load_digits
digits = load_digits()
images_and_labels = list(zip(digits.images, digits.target))

for index, (image, label) in enumerate(images_and_labels[:10]):
    pylab.subplot(2, 5, index + 1)
    pylab.axis('off')
    pylab.imshow(image, cmap='gray_r')
    pylab.title('%i' % label)

The targets are in `digits.target` and the pixel values flattened into an array are in `digits.data`.

Train a classifier on the first 1000 of the images, and evaluate on the rest. 
Before testing the neural network model, check the classification error rate of a logistic regression classifier as a baseline using `LogisticRegression` from `sklearn.linear_model`.
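
One possible way to set up the baseline is sketched below. The variable names are ours, and `max_iter=1000` is an assumption made to give the solver room to converge on the unscaled pixel values, not a tuned setting.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

digits = load_digits()

# First 1000 images for training, the rest for evaluation
X_tr, y_tr = digits.data[:1000], digits.target[:1000]
X_te, y_te = digits.data[1000:], digits.target[1000:]

# max_iter is raised from the default as an assumption (see the note above)
baseline = LogisticRegression(max_iter=1000)
baseline.fit(X_tr, y_tr)

# Error rate = 1 - accuracy on the held-out images
error_rate = 1 - baseline.score(X_te, y_te)
print(error_rate)
```

The neural network should aim for a lower error rate than this baseline on the same held-out images.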


Remember to convert the targets to the one-hot representation for training the neural network using `LabelBinarizer`.

Some things to try when training a neural network model for this dataset:

- start with two or three hidden layers
- use between 32 and 128 units in each layer
- try different learning rates in the Adam optimizer (lr=0.001, lr=0.0001) and monitor the loss function
- train for at least 100 epochs
- try the `relu` activation function instead of `tanh`



In [None]:
# .....................


In [None]:
#.....................................
