# Exercise 2 - Introduction to (Deep) Neural Networks

This exercise uses some images and information from https://github.com/YaleATLAS/CERNDeepLearningTutorial

# Table of Contents

* [1 Introduction to Keras](#Introduction-to-Keras)
* [2 Breast cancer dataset](#Breast-cancer-dataset)
    * [Loading the dataset](#Loading-the-dataset)
    * [Plotting the dataset](#Plotting-the-dataset)
    * [Preparing the dataset](#Preparing-the-dataset)
* [3 Training a dense neural network](#Training-a-dense-neural-network)
    * [Neural network model](#Neural-network-model)
    * [Build a simple neural network](#Build-a-simple-neural-network)

# 1 Introduction to Keras

  <a href='http://keras.io'><img src='https://s3.amazonaws.com/keras.io/img/keras-logo-2018-large-1200.png' style="height:100px;"></a>
 *  Modular, powerful and intuitive Deep Learning Python library built on
 <center>
 <a href='http://deeplearning.net/software/theano/'><img src='./images/theano-logo.png' style="height:50px; display:inline;"></a> and
 <a href='https://www.tensorflow.org/'><img src='./images/tf-logo.png' style="height:60px; display:inline"></a> and
 <a href='https://docs.microsoft.com/en-us/cognitive-toolkit/'><img src='https://developer.nvidia.com/sites/default/files/akamai/cuda/images/deeplearning/cntk.png' style="height:80px; display:inline"></a>
 </center>



> <i>Developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research. </i>
<div align="right">
  https://keras.io
</div>

*  Minimalist, user-friendly interface
*  Extremely well documented, lots of <a href='https://github.com/fchollet/keras/tree/master/examples'>working examples</a>
*  Very shallow learning curve $\rightarrow$ by far one of the best tools for both beginners and experts
*  Open-source, developed and maintained by a community of contributors, and publicly hosted on <a href='https://github.com/fchollet/keras'>GitHub</a>
*  Extensible: possibility to customize layers
 
From the Keras website: 
<img src='./images/keras_principles.jpg' style="width:800px;">

# 2 Breast cancer dataset

## Loading the dataset

### Task 1: For this exercise we want to use the breast cancer dataset from scikit-learn. Prepare the dataset in the following way:
* Load the dataset (`load_breast_cancer`), inspect it and create a pandas `DataFrame`.
* How many examples and how many features do we have? What are the names of the classes? How many examples of each class do we have?
* Plot the mean radius and the mean smoothness of the training data in a 2D scatter plot for the two classes
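
As a possible starting point for Task 1, the sketch below loads the dataset and builds the `DataFrame`. It assumes scikit-learn and pandas are installed; the names `breast_cancer` and `df` are chosen to match the cells further down, and the scatter plot is left open.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer

# Load the dataset: a Bunch object holding data, target and metadata
breast_cancer = load_breast_cancer()

# One row per example, one column per feature
df = pd.DataFrame(breast_cancer.data, columns=breast_cancer.feature_names)

n_examples, n_features = df.shape
class_names = list(breast_cancer.target_names)

# Count how many examples carry each class label (0 and 1)
class_counts = np.bincount(breast_cancer.target)

print(n_examples, n_features)
print(dict(zip(class_names, class_counts)))
```

The label `0` corresponds to the first entry of `target_names` and the label `1` to the second.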

## Plotting the dataset

Pandas also has some nice built-in plotting features. For instance, you can plot histograms of the features:

In [None]:
df[breast_cancer.feature_names[0:10]].hist(alpha=0.8, figsize=(10, 10))
plt.show()

What if we are interested in how the distributions differ between the two classes? We could simply add the target to the DataFrame:

In [None]:
df = df.assign(target=breast_cancer.target)
df.keys()

We can then use the `groupby` function to plot a kernel density estimate (KDE) for each class:

In [None]:
df.groupby("target")["mean radius"].plot(kind='kde', figsize=(10, 10))
plt.legend(['malignant', 'benign'], loc='upper right')
plt.xlabel('mean radius')

Similarly, we could also plot the histogram: 

In [None]:
df.groupby("target")["mean radius"].hist(fill=False, figsize=(10, 10))
plt.legend(['malignant', 'benign'], loc='upper right')
plt.xlabel('mean radius')

From a DataFrame you can even plot the full scatter plot matrix:

In [None]:
from pandas.plotting import scatter_matrix
scatter_matrix(df[breast_cancer.feature_names[0:10]], c=breast_cancer.target, alpha=0.8, figsize=(20, 20), s=20)
plt.show()

Some of the input features appear to be highly correlated, so it makes sense to quantify the correlations between them.
We will now use seaborn, a statistical data visualization library, to display the (linear) correlation coefficients between the input features.

https://seaborn.pydata.org/

In [None]:
import seaborn as sns
plt.figure(figsize=(10,10))
sns.heatmap(df[breast_cancer.feature_names[0:10]].corr(), annot=True, square=True, cmap='coolwarm')
plt.show()

## Preparing the dataset

Just like scikit-learn, Keras takes the following objects as inputs:
 *  <h3>Design matrix $X$</h3>
 an `ndarray` of dimensions `[nb_examples, nb_features]` containing the distributions to be used as inputs to the model. Each row is an object to classify, each column corresponds to a specific variable.
 *  <h3>Target vector $y$</h3>
 an `array` of dimensions `[nb_examples]` containing the truth labels indicating the class each object belongs to (for classification), or the continuous target values (for regression).
 *  <h3>Weight vector $w$</h3> 
 (optional) an `array` of dimensions `[nb_examples]` containing the weights to be assigned to each example
 
The indices of these objects must map to the same examples.

### Task 2: Create design matrix and target vector for the first 10 features. Split the data into 70% training data and 30% testing data
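
One way to approach Task 2 is sketched below, assuming scikit-learn's `train_test_split` is used for the split. The `random_state` value is an arbitrary choice that only makes the split reproducible; the variable names match those used in the following cells.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

breast_cancer = load_breast_cancer()

# Design matrix restricted to the first 10 features, plus the target vector
X = breast_cancer.data[:, :10]
y = breast_cancer.target

# Hold out 30% of the examples for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

print(X_train.shape, X_test.shape)
```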

It is common practice to scale the inputs to neural networks so that they have similar ranges. Without this step, some variables may span very different orders of magnitude, which can slow down or destabilize the convergence of the network because the weights attached to them fluctuate strongly during training. To take care of the scaling, we use the `StandardScaler` from `sklearn`, which shifts each feature to zero mean and scales it to unit variance:
<a href='http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html'><img src='./images/standardscaler.jpg' style="width:800px"></a>

In [None]:
from sklearn.preprocessing import StandardScaler

In [None]:
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
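
To make the two calls above more concrete, here is a minimal NumPy sketch of the same idea on made-up toy data. The statistics are estimated on the training set only and then reused for the test set, so no information from the test data enters the preprocessing. This illustrates the principle and is not the scikit-learn implementation.

```python
import numpy as np

# Toy data: two features on very different scales (values are made up)
X_train = np.array([[1.0, 1000.0], [2.0, 2000.0], [3.0, 3000.0]])
X_test = np.array([[2.0, 4000.0]])

# "fit": estimate per-feature mean and standard deviation on the training set
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)

# "transform": apply the training statistics to both sets
X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std

print(X_train_scaled.mean(axis=0))  # approximately zero for each feature
print(X_train_scaled.std(axis=0))   # approximately one for each feature
print(X_test_scaled)
```

Note that the scaled test data does not need to have zero mean or unit variance, since it is transformed with the training statistics.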

# 3 Training a dense neural network

## Neural network model

### Dense layer structure
 * Densely connected layer, where all inputs are connected to all outputs
 *  Linear transformation of the input vector $x \in \mathbb{R}^n$, which can be expressed using the $m \times n$ matrix $W \in \mathbb{R}^{m \times n}$ as:
<center> $u = Wx + b$ </center>
where $b \in \mathbb{R}^m$ is the bias vector and $u \in \mathbb{R}^m$ is the output

 *  All entries in both $W$ and $b$ are trainable
 *  In Keras:
 ```
keras.layers.Dense(
                    units,
                    activation=None,
                    use_bias=True,
                    kernel_initializer='glorot_uniform',
                    bias_initializer='zeros',
                    kernel_regularizer=None,
                    bias_regularizer=None,
                    activity_regularizer=None,
                    kernel_constraint=None,
                    bias_constraint=None
)
```

 *  `input_dim` (or `input_shape`) is a necessary argument for the first layer of the net:

```python
from keras.models import Sequential
from keras.layers import Dense

# as first layer in a sequential model:
model = Sequential()
model.add(Dense(32, input_shape=(16,)))
# now the model will take as input arrays of shape (*, 16)
# and output arrays of shape (*, 32)

# after the first layer, you don't need to specify
# the size of the input anymore:
model.add(Dense(32))
 ```
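
The linear transformation performed by a dense layer can be sketched in plain NumPy. The sketch assumes $W$ is stored as an array of shape `(m, n)` so that the product $Wx$ is defined, and uses random numbers in place of trained weights. Keras itself stores the kernel with shape `(n, m)` and multiplies the input from the left, which is the same map written in transposed form.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 16, 32                  # input and output sizes, as in the example above
W = rng.normal(size=(m, n))    # weight matrix, shape (m, n)
b = np.zeros(m)                # bias vector, shape (m,)

# Single example: u = W x + b
x = rng.normal(size=n)
u = W @ x + b

# Batch of examples stored row-wise: every row is transformed the same way
X = rng.normal(size=(4, n))
U = X @ W.T + b

print(u.shape, U.shape)
```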



### Activation functions
 *  Mathematical way of quantifying the activation state of a node $\rightarrow$ whether it's firing or not
 *  Non-linear activation functions are the key to Deep Learning
 *  Allow NNs to learn complex, non-linear transformations of the inputs
 *  Some popular choices:
 <img src='./images/activations_table.jpg' style='width:700px'>
 <img src='./images/activation.jpg' style='width:800px'>
 Available activations (https://keras.io/activations/#available-activations):
* **softmax**, **elu**, **selu**, **softplus**, **softsign**, **relu**, **tanh**, **sigmoid**, **hard_sigmoid**, **linear**

 Advanced Activation (https://keras.io/layers/advanced-activations/):
* **LeakyReLU**, **PReLU**
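
A few of these activations are easy to write down in NumPy, which can help to build intuition for their shapes. The functions below are illustrative re-implementations, not the Keras code, and the `alpha` value of the leaky ReLU is just an example slope.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # Passes positive values through and sets negative values to zero
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.3):
    # Like ReLU, but keeps a small slope alpha for negative inputs
    return np.where(z > 0, z, alpha * z)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))
print(np.tanh(z))
print(relu(z))
print(leaky_relu(z))
```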


### Loss functions
* Mathematical way of quantifying how much ŷ deviates from y
* Dictates how strongly we penalize certain types of mistakes
* Cost of inaccurately classifying an event
* Many loss functions are available in Keras (https://keras.io/losses/)

<img src="./images/loss.jpg">
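
For binary classification with a single sigmoid output, a common choice is the binary cross-entropy, $L = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right]$. The NumPy sketch below evaluates this formula directly; the clipping constant `eps` is an assumption made here to avoid taking the logarithm of zero.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Keep predictions away from exactly 0 and 1 so the logarithms stay finite
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred)
                    + (1.0 - y_true) * np.log(1.0 - y_pred))

y_true = np.array([1.0, 0.0])

undecided = binary_crossentropy(y_true, np.array([0.5, 0.5]))
good = binary_crossentropy(y_true, np.array([0.9, 0.1]))
bad = binary_crossentropy(y_true, np.array([0.1, 0.9]))

print(undecided, good, bad)
```

Confident wrong predictions are penalized much more strongly than confident correct ones are rewarded.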

## Build a simple neural network

In [None]:
from keras.models import Sequential

model = Sequential()

In [None]:
from keras.layers import Dense

In [None]:
model.add(Dense(units=11, activation='relu', input_dim=10))
model.add(Dense(units=1, activation='sigmoid'))


In [None]:
model.summary()
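
The summary lists the number of trainable parameters per layer. For a dense layer this is one weight per input-output pair plus one bias per unit, which can be checked by hand for the two layers defined above. The helper `dense_params` is a hypothetical name introduced only for this check.

```python
def dense_params(n_inputs, n_units):
    # n_inputs * n_units weights plus one bias per unit
    return n_inputs * n_units + n_units

hidden = dense_params(10, 11)   # Dense(units=11, input_dim=10)
output = dense_params(11, 1)    # Dense(units=1) fed by the 11 hidden units
total = hidden + output

print(hidden, output, total)
```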

Let's visualize our net:

In [None]:
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot
SVG(model_to_dot(model, show_shapes=True).create(prog='dot', format='svg'))

OK, that is a rather simple model, but let's define a loss function, an optimizer, a performance metric and compile it:

In [None]:
model.compile(loss='binary_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])
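
To illustrate what the `'sgd'` optimizer does with the loss, here is a minimal NumPy sketch of mini-batch stochastic gradient descent for a single sigmoid unit (no hidden layer) trained with binary cross-entropy. The toy data, the learning rate and the number of epochs are made-up values; this is a sketch of the update rule, not of the Keras internals.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy, linearly separable data (made up for illustration)
X = rng.normal(size=(64, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w = np.zeros(2)
b = 0.0
lr = 0.1          # learning rate, an arbitrary illustrative value
batch_size = 8

def predict(X, w, b):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

def loss(X, y, w, b):
    p = np.clip(predict(X, w, b), 1e-7, 1 - 1e-7)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

loss_before = loss(X, y, w, b)

for epoch in range(20):
    order = rng.permutation(len(X))          # reshuffle the data each epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        p = predict(X[idx], w, b)
        # Gradient of the mean binary cross-entropy for a sigmoid output
        grad_w = X[idx].T @ (p - y[idx]) / len(idx)
        grad_b = np.mean(p - y[idx])
        # Step against the gradient, scaled by the learning rate
        w -= lr * grad_w
        b -= lr * grad_b

loss_after = loss(X, y, w, b)
print(loss_before, loss_after)
```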

### Training

To train the model, we pass the training data to the `fit` function. With `validation_split=0.3`, Keras sets aside the last 30% of the training data as validation data. The model is not trained on these examples; they are used to monitor the loss and the metrics at the end of each epoch.

In [None]:
# X_train and y_train are NumPy arrays, just like in the scikit-learn API.
history = model.fit(X_train, y_train, validation_split=0.3, epochs=100, batch_size=8)
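
The arithmetic behind `validation_split` and `batch_size` can be sketched as follows. The figure of 398 training examples assumes a 70/30 split of the 569 examples in Task 2, and the helper `split_sizes` is a hypothetical name; the rounding shown here is our understanding of how the split point is chosen, not a copy of the Keras code.

```python
import math

def split_sizes(n_samples, validation_split, batch_size):
    # The first part of the data is used for training, the rest for validation
    n_train = int(n_samples * (1.0 - validation_split))
    n_val = n_samples - n_train
    # Number of weight updates per epoch (the last batch may be smaller)
    steps_per_epoch = math.ceil(n_train / batch_size)
    return n_train, n_val, steps_per_epoch

n_train, n_val, steps_per_epoch = split_sizes(398, 0.3, 8)
print(n_train, n_val, steps_per_epoch)
```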

During the training process we have saved the loss and the accuracy of the training and validation data:

In [None]:
print(history.history.keys())

We can now plot the loss evolution over the training epochs for the training and validation dataset:

In [None]:
# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['training', 'validation'], loc='upper right')
plt.show()

Similarly, we can plot the accuracy:

In [None]:
# summarize history for accuracy
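# the metric is stored under 'acc' in older Keras releases and 'accuracy' in newer ones
acc_key = 'acc' if 'acc' in history.history else 'accuracy'
plt.plot(history.history[acc_key])
plt.plot(history.history['val_' + acc_key])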
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['training', 'validation'], loc='lower right')
plt.show()


### Evaluation

Let's evaluate the loss and accuracy on our test data:

In [None]:
loss_and_metrics = model.evaluate(X_test, y_test, batch_size=8)
print(loss_and_metrics)

Let's predict classes for our test data:

In [None]:
print('Testing...')
y_pred = model.predict(X_test, verbose = True, batch_size=8)

In [None]:
# predictions
y_pred

### Task 3: Plot the output prediction for malignant and benign breast cancer showing the separation between these two classes.

How do we now decide which class a test example should be assigned to, based on our prediction? Intuitively, we could simply convert our predictions into classes by using a threshold of 0.5:

In [None]:
y_cls = np.where(y_pred > 0.5, 1, 0)
print(y_cls)

In [None]:
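# shortcut available on Sequential models in older Keras releases
# (removed in later versions); it applies the same 0.5 threshold
y_cls = model.predict_classes(X_test, batch_size=1)
print(y_cls)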

### Task 4: Use the scikit-learn metrics to evaluate the model

Now, let's also use scikit-learn to plot the ROC curve and calculate the AUC:

In [None]:
from sklearn.metrics import roc_curve, auc
fpr, tpr, thresholds = roc_curve(y_test, y_pred.ravel())
# use a separate name so the imported auc function is not overwritten
roc_auc = auc(fpr, tpr)
plt.figure(1)
plt.plot([0, 1], [0, 1], 'k--')
plt.plot(fpr, tpr, label='NN (area = {:.3f})'.format(roc_auc))
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.title('ROC curve')
plt.legend(loc='best')
plt.show()
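
Each point on the ROC curve corresponds to one choice of threshold on the network output. The NumPy sketch below computes the false positive rate and true positive rate for a few thresholds on made-up labels and scores. The helper `roc_point` is a hypothetical name, and the sketch is meant to show the definition, not to replace `roc_curve`.

```python
import numpy as np

# Made-up labels and scores for illustration
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

def roc_point(y_true, scores, threshold):
    # Everything with a score at or above the threshold is called positive
    y_hat = scores >= threshold
    tp = np.sum(y_hat & (y_true == 1))
    fp = np.sum(y_hat & (y_true == 0))
    tpr = tp / np.sum(y_true == 1)   # true positive rate
    fpr = fp / np.sum(y_true == 0)   # false positive rate
    return fpr, tpr

for t in [0.8, 0.4, 0.35, 0.1]:
    print(t, roc_point(y_true, scores, t))
```

Lowering the threshold moves the point towards the upper right corner: more true positives are found, at the price of more false positives.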

## Task 5 (Bonus): Change the neural network model and study the impact on the performance
* Make the neural network wider
* Make the neural network deeper
* Change the activation function of the hidden nodes
* Change the activation function of the output node
* Change the loss function. Which ones are allowed?
* Which neural network gives the best performance?