# Introduction to Deep Learning Using Keras

[Keras](https://keras.io/) is a high-level API for deep learning. It is written in Python and can run on top of [Theano](http://deeplearning.net/software/theano/) or [TensorFlow](https://www.tensorflow.org/), two popular Python libraries for neural networks. In this lab we use the version bundled with TensorFlow (`tensorflow.keras`). Keras lets users implement deep learning models quickly and with minimal effort. In recent years, it has contributed significantly to deep learning research by allowing researchers to go from ideas to results with little delay.

In this part of the lab, we will implement a simple feedforward neural network to classify a synthetic two-class dataset. Your first objective is to create this dataset. It will consist of 200 points in 2-dimensional space ($N = 200$, $d = 2$). Each point belongs either to class 0 or to class 1 (100 points per class) and is drawn from a class-specific Gaussian distribution:

$$ \mathbf{x}_i \sim \mathcal{N}(\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k) $$

for class $k$. For class 0, we have $\boldsymbol{\mu}_0 = [1,1]$ and the covariance matrix is:

$$ \boldsymbol{\Sigma}_0 = \begin{bmatrix} 0.5 & 0 \\ 0 & 0.5 \end{bmatrix} $$

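For class 1, $\boldsymbol{\mu}_1 = [-1,-1]$ and $\boldsymbol{\Sigma}_1 = \boldsymbol{\Sigma}_0$. To generate these values, use the [`randn`](http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.randn.html) function of NumPy, which returns samples from the "standard normal" distribution. Because the covariance matrices are diagonal with variance 0.5 in each dimension, the standard deviation is $\sqrt{0.5} \approx 0.71$, and the samples can be scaled and shifted as follows: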

```python
sd * np.random.randn(...) + mu
```
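
As one possible way to apply this pattern to both classes, the sketch below draws 100 points per class and stacks them into `X` and `y`. The fixed seed and the helper names `n_per_class`, `mu_0`, `mu_1`, `X_0` and `X_1` are assumptions made for illustration, not part of the lab's starter code.

```python
import numpy as np

np.random.seed(0)  # assumption: fixed seed so the sketch is reproducible

n_per_class = 100
sd = np.sqrt(0.5)  # standard deviation for a variance of 0.5

mu_0 = np.array([1.0, 1.0])
mu_1 = np.array([-1.0, -1.0])

# randn draws from N(0, I); scaling by sd and shifting by mu gives N(mu, sd^2 I)
X_0 = sd * np.random.randn(n_per_class, 2) + mu_0
X_1 = sd * np.random.randn(n_per_class, 2) + mu_1

# Stack the two classes and build the matching label vector
X = np.vstack([X_0, X_1])
y = np.concatenate([np.zeros(n_per_class, dtype=np.int64),
                    np.ones(n_per_class, dtype=np.int64)])
```

Since the points are stored class by class, they should be shuffled before training; a shuffled train/test split takes care of this.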

In [None]:
import numpy as np

N = 200
d = 2
num_classes = 2

X = np.zeros((N, d))
y = np.zeros(N, dtype=np.int64)


#your code here

After generating the 200 points, plot them in a 2-dimensional plane using [`scatter`](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.scatter). Use the same color for points belonging to the same class.
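
A minimal sketch of such a plot is given below. It uses small stand-in arrays (`X_demo`, `y_demo`, both hypothetical) so that it runs on its own; with the lab's data you would pass `X` and `y` instead. The colors are an arbitrary choice.

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in data (assumption): two small blobs, only so the sketch runs on its own
points = np.random.randn(20, 2)
X_demo = np.vstack([points[:10] + 1, points[10:] - 1])
y_demo = np.array([0] * 10 + [1] * 10)

fig, ax = plt.subplots()
# One scatter call per class, so each class gets a single color and a legend entry
for label, color in [(0, "tab:blue"), (1, "tab:red")]:
    mask = y_demo == label
    ax.scatter(X_demo[mask, 0], X_demo[mask, 1], color=color, label=f"class {label}")
ax.set_xlabel("$x_1$")
ax.set_ylabel("$x_2$")
ax.legend()
```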

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

#your code here

Then, split the dataset into a training and a test set using the [`train_test_split`](http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) function of scikit-learn. Set the proportion of the dataset to be included in the test set to 0.2.
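
The call might look like the sketch below, shown on stand-in arrays (`X_demo`, `y_demo`, hypothetical) with the same shape as the lab's dataset. Passing `random_state` is an assumption made here only to make the shuffle reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data (assumption) with the same shape as the lab's dataset
X_demo = np.random.randn(200, 2)
y_demo = np.repeat([0, 1], 100)

# test_size=0.2 holds out 20% of the points; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)
print(X_train.shape, X_test.shape)
```

With 200 points, this leaves 160 points for training and 40 for testing.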

In [None]:
from sklearn.model_selection import train_test_split

#your code here

Now you will use Keras to implement a simple feedforward neural network. A central notion in Keras is the model: the data structure upon which the neural network is built. The most common type of model is the Sequential model, which corresponds to a linear stack of layers. We next initialize a Sequential model.

In [None]:
from tensorflow.keras.models import Sequential

model = Sequential()

After creating a Sequential model, we can add layers to it. In this example, we will add a hidden layer and an output layer. The hidden layer will consist of 64 hidden units, and the output of each unit will be passed through the [ReLU](https://en.wikipedia.org/wiki/Rectifier_(neural_networks)) activation function. The output layer will contain 1 neuron, whose output corresponds to the probability that an instance belongs to class 1 of our problem.
Both layers are fully-connected neural network layers and can be implemented using the [Dense](https://keras.io/layers/core/#dense) class of Keras.
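
To make the computation concrete, here is a NumPy sketch of the forward pass through such a network. It is not Keras code: the weights `W1`, `b1`, `W2`, `b2` are hypothetical and randomly initialised, and a sigmoid on the output neuron is assumed so that the output can be read as a probability.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(0)
d, hidden_units = 2, 64

# Hypothetical, randomly initialised parameters (Keras creates and trains its own)
W1 = 0.1 * np.random.randn(d, hidden_units)
b1 = np.zeros(hidden_units)
W2 = 0.1 * np.random.randn(hidden_units, 1)
b2 = np.zeros(1)

def forward(X):
    h = relu(X @ W1 + b1)        # hidden layer: 64 ReLU units
    return sigmoid(h @ W2 + b2)  # output layer: P(class 1 | x)

X_batch = np.array([[1.0, 1.0], [-1.0, -1.0], [0.0, 0.0]])
p = forward(X_batch)  # one probability per input row, shape (3, 1)
```

Each `Dense` layer corresponds to one matrix multiplication, a bias addition, and an activation function.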

In [None]:
from tensorflow.keras.layers import Dense

#your code here

After defining the model, we compile it to configure its learning process. More specifically, we can specify the loss function and the optimizer and its parameters.

In [None]:
from tensorflow.keras.losses import binary_crossentropy
from tensorflow.keras.optimizers import SGD

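# Recent Keras versions name this argument `learning_rate`; the older `lr` alias is deprecated
model.compile(loss=binary_crossentropy, optimizer=SGD(learning_rate=0.01, momentum=0.9))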
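
As a rough illustration of what these two choices mean, the NumPy sketch below computes the binary cross-entropy of a few predictions and performs SGD updates with momentum on a single weight. The function names `binary_crossentropy_np` and `sgd_momentum_step` are hypothetical; the update rule follows the standard (non-Nesterov) momentum formulation and is only meant to mirror what the optimizer does.

```python
import numpy as np

def binary_crossentropy_np(y_true, p, eps=1e-7):
    # Clip to avoid log(0), then average the per-example losses
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def sgd_momentum_step(w, grad, velocity, learning_rate=0.01, momentum=0.9):
    # The velocity accumulates past gradients; the weight moves along it
    velocity = momentum * velocity - learning_rate * grad
    return w + velocity, velocity

y_true = np.array([1, 0, 1, 0])
p = np.array([0.9, 0.1, 0.8, 0.3])
loss = binary_crossentropy_np(y_true, p)

# Two updates with the same (made-up) gradient: the second step is larger
w, v = 1.0, 0.0
w, v = sgd_momentum_step(w, 2.0, v)
w, v = sgd_momentum_step(w, 2.0, v)
```

Confident and correct predictions contribute little to the loss, while confident and wrong ones are penalised heavily.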

Once compiled, we can train the model by iterating on the training data in batches.

In [None]:
model.fit(X_train, y_train, epochs=5, batch_size=16)

Once the model is trained, you can use it to generate predictions on new data. Predictions are real values between 0 and 1. Set predictions larger than 0.5 to 1 and predictions smaller than 0.5 to 0.
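
One way to apply the threshold is a vectorised comparison, sketched here on a hypothetical array of probabilities (`y_prob`) standing in for the output of `model.predict`:

```python
import numpy as np

# Hypothetical probabilities, standing in for the output of model.predict
y_prob = np.array([0.92, 0.13, 0.51, 0.49, 0.07])

# The comparison yields booleans; casting to int gives the 0/1 class labels
y_label = (y_prob > 0.5).astype(int)
print(y_label)  # [1 0 1 0 0]
```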

In [None]:
y_pred = model.predict(X_test, batch_size=16)[:,0]

#your code here

Finally, we will calculate the accuracy of the model by comparing the predictions against the ground truth class labels. Use the [`accuracy_score`](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html) function of scikit-learn to compute the accuracy.
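
Accuracy is simply the fraction of predictions that match the true labels. The NumPy sketch below computes it by hand on two made-up label vectors (`y_true_demo`, `y_hat_demo`); with its default settings, `accuracy_score` reports the same fraction.

```python
import numpy as np

# Made-up labels for illustration: 4 of the 5 predictions are correct
y_true_demo = np.array([1, 0, 1, 1, 0])
y_hat_demo = np.array([1, 0, 0, 1, 0])

accuracy = np.mean(y_true_demo == y_hat_demo)
print(accuracy)  # 0.8
```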

In [None]:
from sklearn.metrics import accuracy_score

#your code here

Using the `pcolormesh` function, we can plot the decision surface of the network with regard to the input space. This is demonstrated as follows. Use your code from above (using `scatter`) to plot the data points over this surface.

In [None]:
plt.figure()
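# Build a 100 x 100 grid covering the data range, padded by 1 on each side
xx1, xx2 = np.meshgrid(
    np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, num=100),
    np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, num=100)
)
# Evaluate the network on every grid point to obtain the predicted probability of class 1
p_y = model.predict(np.c_[xx1.ravel(), xx2.ravel()])
plt.pcolormesh(xx1, xx2, p_y.reshape(xx1.shape), cmap='RdBu')
plt.colorbar()
# Raw string so that the backslash in \mathbf is not treated as an escape sequence
plt.title(r"Surface $P(Y | \mathbf{X})$")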
plt.xlabel("$x_1$")
plt.ylabel("$x_2$")
plt.scatter(X[:,0], X[:,1], c=y)

Next, create an XOR-like problem, in which the two classes cannot be separated by a single straight line. Generate half of the datapoints of class 0 by sampling from a Gaussian with mean $\boldsymbol{\mu}_1 = [1,1]$ and covariance matrix:

$$ \boldsymbol{\Sigma}_1 = \begin{bmatrix} 0.3 & 0 \\ 0 & 0.3 \end{bmatrix} $$

The rest of the datapoints of class 0 are drawn from a Gaussian with mean $\boldsymbol{\mu}_2 = [-1,-1]$ and $\boldsymbol{\Sigma}_2 = \boldsymbol{\Sigma}_1$. For class 1, generate half of the datapoints by sampling from a Gaussian with mean $\boldsymbol{\mu}_3 = [1,-1]$ and $\boldsymbol{\Sigma}_3 = \boldsymbol{\Sigma}_1$, and the rest of the datapoints from a Gaussian with mean $\boldsymbol{\mu}_4 = [-1,1]$ and $\boldsymbol{\Sigma}_4 = \boldsymbol{\Sigma}_1$. Run the code again, and note how the decision surface changes.
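
A possible way to generate this dataset is sketched below. It assumes 200 points in total (50 per Gaussian), which the text does not specify, and the names `blobs`, `X_xor` and `y_xor` are hypothetical.

```python
import numpy as np

np.random.seed(0)    # assumption: fixed seed for reproducibility
n_per_blob = 50      # assumption: 200 points in total, 50 per Gaussian
sd = np.sqrt(0.3)    # standard deviation for a variance of 0.3

# (mean, class label) for the four Gaussians described above
blobs = [([1, 1], 0), ([-1, -1], 0), ([1, -1], 1), ([-1, 1], 1)]

X_xor = np.vstack([sd * np.random.randn(n_per_blob, 2) + np.array(mu)
                   for mu, _ in blobs])
y_xor = np.concatenate([np.full(n_per_blob, label, dtype=np.int64)
                        for _, label in blobs])
```

Class 0 occupies the quadrants where both coordinates have the same sign and class 1 the other two, so a single linear boundary cannot separate them.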

In [None]:
#your code here

You will next implement a simple feedforward neural network and apply it to a multi-class classification problem. More specifically, you will experiment with the Iris dataset, perhaps the best-known dataset in the pattern recognition and machine learning literature. The dataset is relatively small and contains 150 samples in total. Each sample corresponds to an iris plant, and there are 3 classes, which correspond to 3 different types of iris plants (Setosa, Versicolour, and Virginica). There are 4 features, namely Sepal Length, Sepal Width, Petal Length and Petal Width. Note that since this is a multi-class classification problem, the output layer needs one neuron per class and its activation function cannot be the sigmoid function (softmax is the usual choice), while the loss function is the categorical crossentropy.
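
To illustrate the difference from the binary case, the NumPy sketch below applies a softmax to made-up output scores for 3 classes and computes the categorical cross-entropy against one-hot labels. The function names and the `logits` values are hypothetical, and this mirrors the computation rather than the Keras implementation.

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise maximum for numerical stability
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def categorical_crossentropy_np(y_onehot, p, eps=1e-7):
    # Only the probability assigned to the true class contributes to the loss
    p = np.clip(p, eps, 1.0)
    return -np.mean(np.sum(y_onehot * np.log(p), axis=1))

# Made-up scores for two samples and three classes
logits = np.array([[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]])
y_onehot = np.eye(3)[[0, 2]]  # one-hot labels for classes 0 and 2

probs = softmax(logits)
loss = categorical_crossentropy_np(y_onehot, probs)
```

Each row of `probs` sums to 1, so the three outputs can be read as a probability distribution over the classes.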


In [None]:
from sklearn.datasets import load_iris
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.losses import categorical_crossentropy

X, y = load_iris(return_X_y=True)
y = to_categorical(y)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)

#your code here