### TensorFlow Problem Sheet
In this problem sheet I will be using Keras with TensorFlow to predict the species of an Iris flower from its sepal length and width and its petal length and width.

The aim of this problem sheet is to get a better understanding of how TensorFlow works.

Code for this solution has been adapted from: https://github.com/emerging-technologies/keras-iris

### What is TensorFlow?

TensorFlow™ is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. 

It is an extremely powerful library capable of working with both CPUs and GPUs. Deep neural networks can be designed, trained and run using this library.

In this problem sheet we are going to use the Keras API, which hides much of TensorFlow's lower-level detail behind a simpler interface.

### What is Keras?

Keras is a high-level neural networks API, written in Python and capable of running on top of __TensorFlow__, CNTK, or Theano. It was developed with a focus on enabling fast experimentation. Being able to go __from idea to result with the least possible delay__ is key to doing good research. If you wish to read more about Keras, I'd suggest the following article: [Guide to the Functional API](https://keras.io/getting-started/functional-api-guide/)

In [12]:
import csv
import numpy as np
import keras as kr
from keras.models import load_model

### Iris Dataset - what is it?

The Iris flower data set, or Fisher's Iris data set, is a multivariate data set introduced by the British statistician and biologist Ronald Fisher in his 1936 paper "The use of multiple measurements in taxonomic problems" as an example of linear discriminant analysis. It is sometimes called Anderson's Iris data set because Edgar Anderson collected the data to quantify the morphologic variation of Iris flowers of three related species. Two of the three species were collected in the Gaspé Peninsula "all from the same pasture, and picked on the same day and measured at the same time by the same person with the same apparatus".

The Iris data set contains 50 samples each of 3 different species of flowers:
 - Iris Setosa
 - Iris Virginica
 - Iris Versicolor
 
The data has 4 measurements from each sample:
 - Sepal length
 - Sepal width
 - Petal length
 - Petal width

So let's start by loading the Iris data and converting it into a format that TensorFlow can work with.

1. Load the Iris dataset. (Data obtained from: https://github.com/mwaskom/seaborn-data/blob/master/iris.csv)

2. The inputs are four floats: sepal length, sepal width, petal length, petal width.

3. Outputs are initially individual strings: setosa, versicolor or virginica.

In [13]:
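# Read every row of the CSV file, skipping the header row.
with open('iris.csv') as f:
    iris = list(csv.reader(f))[1:]
# np.float was removed in NumPy 1.24; the built-in float is equivalent.
inputs  = np.array(iris)[:,:4].astype(float)
outputs = np.array(iris)[:,4]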
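
To make the loading step easier to follow, here is a minimal sketch of the same idea using only the standard library. The in-memory `SAMPLE_CSV` string is an assumption standing in for the real `iris.csv` file, and the names `sample_inputs` and `sample_outputs` are illustrative only.

```python
import csv
import io

# Three illustrative rows in the same column layout as iris.csv
# (this in-memory text stands in for the real file).
SAMPLE_CSV = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.3,3.3,6.0,2.5,virginica
"""

# Skip the header row, as the [1:] slice does in the cell above.
rows = list(csv.reader(io.StringIO(SAMPLE_CSV)))[1:]

# The first four columns are the numeric inputs, the fifth is the label.
sample_inputs = [[float(value) for value in row[:4]] for row in rows]
sample_outputs = [row[4] for row in rows]

print(sample_inputs[0])
print(sample_outputs)
```

Every field comes out of the CSV reader as a string, which is why the numeric columns have to be converted to floats explicitly.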

### What are categorical variables?

In many practical Data Science activities, the data set will contain categorical variables. These variables are typically stored as text values which represent various traits. Some examples include color (“Red”, “Yellow”, “Blue”), size (“Small”, “Medium”, “Large”) or geographic designations (State or Country). Regardless of what the value is used for, the challenge is determining how to use this data in the analysis. 

We are going to use Keras to encode integers as binary categorical variables using the __to_categorical__ method.

### to_categorical method

<div class="alert alert-block alert-info">keras.utils.to_categorical(y, num_classes=None)</div>

Converts a class vector (integers) to binary class matrix.

E.g. for use with categorical_crossentropy.

#### Arguments

* y: class vector to be converted into a matrix (integers from 0 to num_classes - 1).
* num_classes: total number of classes.

#### Returns

* A binary matrix representation of the input.
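
As a rough illustration of what this encoding produces, the sketch below re-implements the idea in plain Python. The helper `one_hot` is hypothetical (it is not the Keras function), and it assumes that a missing `num_classes` is inferred as the largest class id plus one.

```python
def one_hot(class_ids, num_classes=None):
    """Return one row per class id, with a single 1.0 in that id's column."""
    if num_classes is None:
        # Assumption: infer the number of classes from the largest id.
        num_classes = max(class_ids) + 1
    return [[1.0 if column == class_id else 0.0 for column in range(num_classes)]
            for class_id in class_ids]

species = ["setosa", "virginica", "versicolor", "setosa"]

# Map each distinct string to an integer id, in sorted order.
names = sorted(set(species))
ids = [names.index(name) for name in species]

encoded = one_hot(ids)
for name, row in zip(species, encoded):
    print(name, row)
```

Each row contains exactly one 1.0, so it can be compared directly against the three outputs of a softmax layer.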


So, first of all we are going to convert the output strings to integers, and then we will encode the category integers as binary categorical variables using the above method. Then, we are going to split the input and output data sets into training and test subsets. 

In [27]:
outputs_vals, outputs_ints = np.unique(outputs, return_inverse=True)
outputs_cats = kr.utils.to_categorical(outputs_ints)
inds = np.random.permutation(len(inputs))
train_inds, test_inds = np.array_split(inds, 2)
inputs_train, outputs_train = inputs[train_inds], outputs_cats[train_inds]
inputs_test,  outputs_test  = inputs[test_inds],  outputs_cats[test_inds]
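
The split above shuffles the row indices and cuts them into two halves, so each sample lands in exactly one subset. The following standard-library sketch shows the same idea; `split_in_half` and its `seed` parameter are hypothetical names introduced only for this illustration.

```python
import random

def split_in_half(n_samples, seed=None):
    """Shuffle the indices 0..n_samples-1 and return two disjoint halves."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    # With an odd count the first half gets the extra element.
    middle = (n_samples + 1) // 2
    return indices[:middle], indices[middle:]

train_idx, test_idx = split_in_half(150, seed=0)
print(len(train_idx), len(test_idx))
```

With 150 samples this gives 75 training rows and 75 test rows. Shuffling first matters because the rows in the file are grouped by species.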

### What is a neural network?

Neural networks have been in the spotlight for quite some time now, and their “deeper” versions are making tremendous breakthroughs in fields such as image recognition, speech and natural language processing. For a more detailed explanation of neural networks and deep learning, read [here](https://www.analyticsvidhya.com/blog/2016/08/evolution-core-concepts-deep-learning-neural-networks/).

The main question that arises is when to apply neural networks and when not to. You have to keep a few things in mind:

__Firstly, neural networks require clear and informative data (and mostly big data) to train.__ Try to imagine a neural network as a child. It first observes how its parent walks. Then it tries to walk on its own, and with every step the child learns how to perform the task. It may fall a few times, but after a few unsuccessful attempts it learns how to walk. If you don’t let it walk, it might never learn. The more exposure you can provide to the child, the better.

__It is prudent to use neural networks for complex problems such as image processing.__ Neural nets belong to a class of algorithms called representation learning algorithms. These algorithms break down complex problems into a simpler form so that they become understandable (or “representable”).

__Use them when you have an appropriate type of neural network for the problem.__ Each problem has its own twists, so the data decides the way you solve the problem. For example, if the problem is one of sequence generation, recurrent neural networks are more suitable, whereas for an image-related problem you would probably be better off using convolutional neural networks.

__Last but not least, hardware requirements are essential for running a deep neural network model.__ Neural nets were “discovered” long ago, but they are shining in recent years mainly because computational resources are better and more powerful. If you want to solve a real-life problem with these networks, get ready to buy some high-end hardware!

![Graph](https://www.analyticsvidhya.com/wp-content/uploads/2016/08/Artificial-Intelligence-Neural-Network-Nodes-670x440.jpg)

Now we will create our own neural network for the Iris dataset. We will start by creating a model with an input layer of 4 nodes and a hidden layer of 16 nodes. We will apply the sigmoid activation function to the hidden layer. But wait, what's sigmoid activation? A sigmoid function is a mathematical function having a characteristic "S"-shaped curve or sigmoid curve. Often, "sigmoid function" refers to the special case of the logistic function, defined by the formula:

![formula image](https://wikimedia.org/api/rest_v1/media/math/render/svg/9537e778e229470d85a68ee0b099c08298a1a3f6)
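
To get a feel for the logistic function, here is a small plain-Python sketch. The function name `sigmoid` is just a label for this example, and the two-branch form is one common way to avoid overflow for large negative inputs; it is not taken from the Keras source.

```python
import math

def sigmoid(x):
    """Logistic function 1 / (1 + e^-x), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the equivalent form e^x / (1 + e^x).
    z = math.exp(x)
    return z / (1.0 + z)

for value in (-6, -2, 0, 2, 6):
    print(value, round(sigmoid(value), 4))
```

The output is always between 0 and 1, equals 0.5 at zero, and flattens out for large positive or negative inputs, which gives the "S" shape.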

We will add another layer, connected to the one with 16 nodes, containing three output nodes, one per species. Finally, we will apply the softmax activation function to that output layer.

What is the softmax function?

The softmax function calculates a probability distribution over ‘n’ different events. In other words, it calculates the probability of each target class over all possible target classes. These probabilities are then used to determine the target class for the given inputs.
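
A minimal plain-Python sketch of softmax may help here. The input scores are made up for illustration, and subtracting the maximum score before exponentiating is a standard numerical-stability trick that does not change the result.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(score - shift) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])
print(sum(probs))
```

The largest score gets the largest probability, and because the outputs sum to 1 they can be read as the network's confidence in each of the three species.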

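What's the difference between the two?

* Softmax: used as the output activation for multi-class classification tasks.
* Sigmoid: used as the output activation for binary classification tasks (here it is used only as the hidden-layer activation).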

![difference](https://i0.wp.com/dataaspirant.com/wp-content/uploads/2017/03/SigmoidVsSoftmax-compressor.jpg?w=700)

In [16]:
model = kr.models.Sequential()
model.add(kr.layers.Dense(16, input_shape=(4,)))
model.add(kr.layers.Activation("sigmoid"))
model.add(kr.layers.Dense(3))
model.add(kr.layers.Activation("softmax"))
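
As a quick sanity check on the size of this network, the sketch below counts its trainable parameters by hand, assuming each dense layer has one weight per input-unit pair plus one bias per unit. The helper `dense_params` is a hypothetical name used only here.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden = dense_params(4, 16)   # 4 inputs feeding 16 hidden nodes
output = dense_params(16, 3)   # 16 hidden nodes feeding 3 output nodes
total = hidden + output

print(hidden, output, total)
```

The activation layers add no parameters of their own, so this small model has only 131 numbers to learn.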

* Next, we configure the model for training.

* We use the Adam optimizer and categorical cross-entropy as the loss function.

* We add accuracy as an extra metric.
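
To show what the loss measures, here is a rough plain-Python sketch of categorical cross-entropy for a single sample. The function name and the small `floor` guard against `log(0)` are assumptions made for this illustration, not the exact Keras implementation.

```python
import math

def categorical_crossentropy(y_true, y_pred, floor=1e-12):
    """Negative log-probability assigned to the true class of one sample."""
    # floor keeps log() away from zero; the value is an arbitrary choice.
    return -sum(t * math.log(max(p, floor)) for t, p in zip(y_true, y_pred))

target = [0.0, 1.0, 0.0]              # one-hot label
confident = [0.1, 0.8, 0.1]           # mostly right
unsure = [0.4, 0.3, 0.3]              # spread out and slightly wrong

print(round(categorical_crossentropy(target, confident), 4))
print(round(categorical_crossentropy(target, unsure), 4))
```

Because the label is one-hot, only the probability given to the correct class matters: the closer it is to 1, the closer the loss is to 0.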

#### What's the Adam optimizer?

Adam is an optimization algorithm that can be used instead of the classical stochastic gradient descent procedure to update network weights iteratively based on training data.

Adam was presented by Diederik Kingma from OpenAI and Jimmy Ba from the University of Toronto in their 2015 ICLR paper (poster) titled “Adam: A Method for Stochastic Optimization”.
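
For intuition, here is a simplified sketch of one Adam update for a single scalar parameter, following the update rule described in the paper. The function `adam_step` and the toy problem (minimising x squared) are assumptions made for illustration; Keras applies the same idea to every weight in the network.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a single scalar parameter."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction, t starts at 1
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# First step on f(x) = x^2 (gradient 2x), starting from x = 1.0.
x_first, _, _ = adam_step(1.0, 2.0, 0.0, 0.0, t=1)
print(x_first)

# A longer run with a larger, arbitrarily chosen learning rate.
x, m, v = 1.0, 0.0, 0.0
for step in range(1, 501):
    x, m, v = adam_step(x, 2 * x, m, v, t=step, lr=0.01)
print(x)
```

On the very first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly one learning rate regardless of how large the gradient is.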

In [17]:
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

* Fit the model using our training data.
* Evaluate the model using the test data set.
* Output the accuracy of the model.

### fit 
Trains the model for a fixed number of epochs (iterations on a dataset).

### evaluate
Returns the loss value & metrics values for the model in test mode.

Computation is done in batches.

In [23]:
model.fit(inputs_train, outputs_train, epochs=100, batch_size=1, verbose=1)
loss, accuracy = model.evaluate(inputs_test, outputs_test, verbose=1)

Epoch 1/100
...
Epoch 100/100

In [24]:
print("\n\nLoss: %6.4f\tAccuracy: %6.4f" % (loss, accuracy))



Loss: 0.1053	Accuracy: 0.9600
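
The accuracy figure is the fraction of test samples whose most probable predicted class matches the true class. The sketch below shows that calculation in plain Python on four made-up predictions; `argmax` and `accuracy` are hypothetical helpers, not Keras functions.

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(y_true, y_pred):
    """Fraction of samples where the predicted class equals the true class."""
    hits = sum(argmax(t) == argmax(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

# Made-up one-hot labels and softmax outputs for four flowers.
labels = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
predictions = [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1], [0.1, 0.3, 0.6], [0.1, 0.4, 0.5]]

print(accuracy(labels, predictions))
```

If the test subset holds 75 samples, as the half-and-half split suggests, an accuracy of 0.9600 corresponds to 72 correct predictions.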


Predict the class of a single flower.

In [25]:
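# Round the softmax output to a 0/1 vector (np.int and np.bool were removed in NumPy 1.24).
prediction = np.around(model.predict(np.expand_dims(inputs_test[0], axis=0))).astype(int)[0]
print("Actual: %s\tEstimated: %s" % (outputs_test[0].astype(int), prediction))
# Use the 0/1 vector as a boolean mask to pick the species name.
print("That means it's a %s" % outputs_vals[prediction.astype(bool)][0])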

Actual: [0 1 0]	Estimated: [0 1 0]
That means it's a versicolor
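
Rounding works when one class has a probability above 0.5, but it yields an all-zero vector when the network is less sure. As an alternative sketch, picking the index of the largest probability always returns a class. The `decode` helper and the `uncertain` values are made up for this example.

```python
def decode(probabilities, class_names):
    """Return the name of the class with the highest probability."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return class_names[best]

class_names = ["setosa", "versicolor", "virginica"]

# A made-up prediction where no class exceeds 0.5.
uncertain = [0.20, 0.45, 0.35]
rounded = [round(p) for p in uncertain]

print(rounded)
print(decode(uncertain, class_names))
```

With NumPy arrays the same idea would be `np.argmax` on the prediction vector.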


Save the model to a file for later use, and try loading it from the saved file.

### Saving models
How to save a Keras model?

Saving/loading whole models (architecture + weights + optimizer state)

It is not recommended to use pickle or cPickle to save a Keras model.

We can use __model.save(filepath)__ to save a Keras model into a single HDF5 file which will contain:

* the architecture of the model, allowing the model to be re-created
* the weights of the model
* the training configuration (loss, optimizer)
* the state of the optimizer, allowing you to resume training exactly where you left off.

You can then use __keras.models.load_model(filepath)__ to reinstantiate your model. load_model will also take care of compiling the model using the saved training configuration (unless the model was never compiled in the first place).

In [28]:
model.save("iris_nn.h5")
model = load_model("iris_nn.h5")

### References used for this problem sheet

1. https://www.analyticsvidhya.com/blog/2016/10/an-introduction-to-implementing-neural-networks-using-tensorflow/
2. https://www.analyticsvidhya.com/blog/2016/08/evolution-core-concepts-deep-learning-neural-networks/
3. https://en.wikipedia.org/wiki/Sigmoid_function
4. http://dataaspirant.com/2017/03/07/difference-between-softmax-function-and-sigmoid-function/
5. https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/
6. https://keras.io/models/model/
7. https://keras.io/getting-started/faq/