# *Deep Learning* - Chapter 6

## Theory


![](img/OH3gI.png)

<b>Output activation</b><br>
The function applied to the nodes in the output layer. The choice depends on the type of output:
* Binary output (yes/no): sigmoid
* Multi-class output (red, green, blue): softmax
* Gaussian (continuous) output: linear
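
As a rough illustration of the three output activations listed above, here is a minimal sketch in plain Python. The function names and the example scores are made up for illustration; they are not taken from Keras or from the book.

```python
import math

def sigmoid(z):
    """Squash a real-valued score into (0, 1); read it as P(answer = yes)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Turn a list of scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def linear(z):
    """Identity output, used when the network predicts a real value."""
    return z

binary_prob = sigmoid(0.0)              # a score of 0 means "undecided"
class_probs = softmax([2.0, 1.0, 0.1])  # hypothetical scores for red, green, blue
print(binary_prob, class_probs, linear(3.2))
```

The largest score always receives the largest softmax probability, so picking the most likely class is the same as picking the largest raw score.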

<b>Activation functions</b><br>
The function applied to the nodes in the hidden layers. A full list is available at https://en.wikipedia.org/wiki/Activation_function.
Most commonly used:

* sigmoid
* tanh
* Rectified Linear Unit (ReLU)
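
To get a feel for how these three functions differ, the sketch below evaluates each at a few sample inputs. It is plain Python written for illustration, not the implementation Keras uses internally.

```python
import math

def sigmoid(z):
    """Output in (0, 1), centered on 0.5."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Output in (-1, 1), centered on 0."""
    return math.tanh(z)

def relu(z):
    """Zero for negative inputs, identity for positive inputs."""
    return max(0.0, z)

# Compare the three activations on a negative, a zero and a positive input
for z in (-2.0, 0.0, 2.0):
    print(z, round(sigmoid(z), 3), round(tanh(z), 3), relu(z))
```

Sigmoid and tanh both saturate for large positive or negative inputs, while ReLU keeps growing for positive inputs.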

<b>Cost functions</b><br>
The optimization goal. Most commonly used:
* Mean-squared error: continuous output
* Cross-entropy: binary / categorical output
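
The two cost functions can be sketched in a few lines of plain Python. The helper names and the clipping constant `eps` are assumptions made for this example. It scores a constant guess of 0.5 against the XOR targets used later in the practical section.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Average negative log-likelihood of binary targets."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

xor_targets = [0, 1, 1, 0]
constant_guess = [0.5, 0.5, 0.5, 0.5]

mse = mean_squared_error(xor_targets, constant_guess)    # 0.25
bce = binary_cross_entropy(xor_targets, constant_guess)  # log(2), about 0.693
print(mse, bce)
```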

<b>Back-propagation</b><br>
Compute the derivative of the error with respect to every weight by applying the chain rule backwards from the output layer, then update each weight by subtracting that derivative multiplied by the learning rate.
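
The update rule can be illustrated on the smallest possible network: a single sigmoid neuron with one weight and one bias. This is a sketch only. The training example, the starting values, the squared-error loss and the learning rate of 0.5 are arbitrary choices for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, target = 1.0, 1.0   # a single made-up training example
w, b = 0.0, 0.0        # arbitrary starting weights
learning_rate = 0.5    # hypothetical value

losses = []
for step in range(50):
    # Forward pass
    z = w * x + b
    y_hat = sigmoid(z)
    losses.append(0.5 * (y_hat - target) ** 2)

    # Backward pass: chain rule from the loss back to the weights
    dloss_dyhat = y_hat - target
    dyhat_dz = y_hat * (1.0 - y_hat)   # derivative of the sigmoid
    dloss_dz = dloss_dyhat * dyhat_dz
    dloss_dw = dloss_dz * x
    dloss_db = dloss_dz

    # Update: step against the gradient, scaled by the learning rate
    w -= learning_rate * dloss_dw
    b -= learning_rate * dloss_db

print(losses[0], losses[-1])
```

In a network with hidden layers the same chain-rule step is repeated layer by layer, which is what gives back-propagation its name.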





## Practical

This section shows how to use Keras to train a neural network to learn the XOR function. [Keras](https://keras.io) is a high-level neural network API that builds on [TensorFlow](https://www.tensorflow.org/) or [Theano](http://deeplearning.net/software/theano/). Be aware that both TensorFlow and Theano will use your GPU if one is available, so your computer may run quite warm or drain its battery quickly.

First off, let's verify that a linear model cannot learn the XOR function.



In [26]:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from sklearn import linear_model
import keras.callbacks

In [27]:
# Truth table
X = np.array([[0,0],[0,1],[1,0],[1,1]])
y = np.array([[0],[1],[1],[0]])

In [28]:
# Make a linear model w. predictions
reg = linear_model.LinearRegression()
reg.fit(X, y)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

In [29]:
# Should predict everything to be 0.5
reg.predict(X)

array([[ 0.5],
       [ 0.5],
       [ 0.5],
       [ 0.5]])
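
The constant prediction of 0.5 is not an accident of the solver. As a sanity check, the sketch below brute-forces a coarse grid of weights for a linear model `w1*x1 + w2*x2 + b` and picks the combination with the lowest mean-squared error on the truth table. The grid and the helper name `mse` are choices made for this illustration; this is not how `LinearRegression` fits the model internally.

```python
import itertools

X_rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 1, 1, 0]

def mse(w1, w2, b):
    """Mean-squared error of the linear model on the XOR truth table."""
    errors = [(w1 * x1 + w2 * x2 + b - t) ** 2
              for (x1, x2), t in zip(X_rows, targets)]
    return sum(errors) / len(errors)

# Candidate values from -2.0 to 2.0 in steps of 0.25 for each parameter
grid = [i / 4 for i in range(-8, 9)]
best = min(itertools.product(grid, repeat=3), key=lambda params: mse(*params))

best_predictions = [best[0] * x1 + best[1] * x2 + best[2] for x1, x2 in X_rows]
print(best, best_predictions)
```

The best combination on this grid has both weights at zero and a bias of 0.5, so the model ignores its inputs and predicts the mean of the targets.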

In [45]:
# This is the neural network, built with the Keras API

# Just instantiate a new object
model = Sequential() 

# This is the first hidden layer. There are 2 hidden nodes (+1 bias node). The input dimension is two
model.add(Dense(2, input_dim=2, activation='tanh', kernel_initializer='glorot_uniform')) 

# This is the final output layer. There is one binary output
model.add(Dense(1, activation='sigmoid'))

# The model will be optimized using stochastic gradient descent; the target is to minimize the cross-entropy
model.compile(loss='binary_crossentropy', optimizer='sgd')

In [46]:
# Fit the model
model.fit(X, y, batch_size=1, epochs=3000, verbose=0)
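# predict_classes was removed from recent Keras releases; thresholding the sigmoid output at 0.5 is equivalent
print((model.predict(X) > 0.5).astype("int32"))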

[[0]
 [1]
 [1]
 [0]]
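
To see why two tanh hidden nodes are enough, here is a forward pass through the same 2-2-1 architecture in plain Python. The weights below are hand-picked for illustration and are not the values Keras learned; training may well arrive at a different solution. With these weights the first hidden node behaves roughly like OR, the second roughly like AND, and the output node fires when OR is on but AND is off.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hand-picked (hypothetical) parameters, not taken from the trained model
W_hidden = [(4.0, 4.0), (4.0, 4.0)]  # input weights of the two hidden nodes
b_hidden = [-2.0, -6.0]              # hidden biases: OR-like, AND-like
w_out = (4.0, -4.0)                  # output weights
b_out = -4.0                         # output bias

def predict(x1, x2):
    """Forward pass: tanh hidden layer, sigmoid output, threshold at 0.5."""
    hidden = [math.tanh(w1 * x1 + w2 * x2 + b)
              for (w1, w2), b in zip(W_hidden, b_hidden)]
    prob = sigmoid(w_out[0] * hidden[0] + w_out[1] * hidden[1] + b_out)
    return int(prob > 0.5)

predictions = [predict(x1, x2) for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(predictions)
```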
