# Using the Iris Data Set with TensorFlow

The Iris data set is perhaps the best-known data set in machine learning. It originated in a 1936 research paper by the British statistician and biologist Ronald Fisher, and it is often used for testing machine learning algorithms and visualizations such as scatter plots. The data set contains 150 rows, one per flower sample. Each row records the sepal length, sepal width, petal length, and petal width in centimeters, together with the species of the flower, which is one of three types.
#### References:
- UCI Machine Learning Repository, https://archive.ics.uci.edu/ml/datasets/iris
- Wikipedia, Iris flower Data Set, https://en.wikipedia.org/wiki/Iris_flower_data_set
- TensorFlow, Iris tutorial, https://www.tensorflow.org/get_started/estimator

We will use [Keras](https://keras.io/), a high-level neural networks API running on top of [TensorFlow](https://www.tensorflow.org/), for this problem sheet.

## 1. Use TensorFlow to create a model
Use TensorFlow to create a model that predicts the species of Iris from a flower’s sepal width, sepal length, petal width, and petal length.

In [21]:
#importing python libraries
import numpy as np
import csv
import keras as kr

# Load the data set from the CSV file.
# [1:] skips the header row, so the data starts from the 2nd row.
with open('IRIS.csv') as f:
    iris = list(csv.reader(f))[1:]

# We need to separate the data into 2 arrays, inputs and outputs

# inputs contains sepal length, sepal width, petal length and petal width, converted to floats
# (np.float was removed in NumPy 1.24; the built-in float is equivalent)
inputs = np.array(iris)[:,:4].astype(float) # [:,:4] = all the rows and the first 4 columns of each row

# outputs contains the 3 species as strings: setosa, versicolor and virginica
outputs = np.array(iris)[:,4] # [:,4] selects the last column which is the species

# Converting the output strings to integers.
outputs_vals, outputs_ints = np.unique(outputs, return_inverse=True)
# outputs_vals holds the unique species strings, sorted alphabetically
# outputs_ints holds, for each row, the index of its species in outputs_vals
# The 1st string in outputs_vals corresponds to the integer 0, the 2nd to 1, and so on

# species are represented as integers, with 0 denoting setosa, 
# 1 denoting versicolor and 2 denoting virginica

# Encoding the integers as binary categorical variables (one-hot encoding).
# This creates a binary matrix with one row per sample and one column per species.
# E.g. if the integer in outputs_ints is 0 it is encoded as 1,0,0; if 1 then 0,1,0; if 2 then 0,0,1
outputs_cats = kr.utils.to_categorical(outputs_ints)
# This means that if the output is:
# (1,0,0) = setosa
# (0,1,0) = versicolor
# (0,0,1) = virginica

# Creating model and a neural network
# model is used to organise layers
model = kr.models.Sequential() # using sequential model which is a linear stack of layers

# stacking 4 layers. 
# Add an initial layer with 4 input nodes and a hidden layer with 16 nodes/neurons.
model.add(kr.layers.Dense(16, input_shape=(4,)))
# Applying the sigmoid activation function to that layer.
model.add(kr.layers.Activation("sigmoid"))
# Adding another layer, connected to the layer with 16 nodes/neurons, containing 3 output nodes 
model.add(kr.layers.Dense(3))
# Using the softmax activation function here so the 3 output values lie between 0 and 1 and sum to 1.
model.add(kr.layers.Activation("softmax"))
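
To make the layer structure above concrete, here is a minimal NumPy sketch of the forward pass that a network of this shape computes: 4 inputs, 16 sigmoid hidden units, and 3 softmax outputs. The weights are random placeholders chosen for illustration, not the values Keras would learn, and the helper names `sigmoid` and `softmax` are assumptions of this sketch rather than part of the notebook.

```python
import numpy as np

def sigmoid(z):
    # Squashes each value into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Subtract the row maximum for numerical stability, then normalise
    # so that each row is non-negative and sums to 1.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)

# Placeholder parameters with the same shapes as the two Dense layers.
W1, b1 = rng.normal(size=(4, 16)), np.zeros(16)   # 4 inputs -> 16 hidden units
W2, b2 = rng.normal(size=(16, 3)), np.zeros(3)    # 16 hidden units -> 3 outputs

# One illustrative sample: sepal length, sepal width, petal length, petal width.
x = np.array([[5.1, 3.5, 1.4, 0.2]])

hidden = sigmoid(x @ W1 + b1)       # shape (1, 16)
probs = softmax(hidden @ W2 + b2)   # shape (1, 3)

print(probs.shape, probs.sum(axis=1))
```

Because the weights are random, the probabilities themselves are meaningless here; the point is only that the output has one value per species and that the values sum to 1.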


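The label encoding used in the cell above can also be reproduced without Keras. The following is a minimal sketch, assuming a small hypothetical `labels` array in place of the species column of the CSV. Indexing into `np.eye` is used here as a stand-in for `to_categorical`, not as a description of how Keras implements it.

```python
import numpy as np

# Hypothetical labels standing in for the species column of the CSV.
labels = np.array(["virginica", "setosa", "versicolor", "setosa"])

# np.unique sorts the distinct strings and, with return_inverse=True,
# also returns the index of each label within that sorted array.
label_vals, label_ints = np.unique(labels, return_inverse=True)

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(len(label_vals), dtype=int)[label_ints]

print(label_vals)
print(label_ints)
print(one_hot)
```
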
## 2. Split the data into training and testing
Split the data set into a training set and a testing set.

In [22]:
# Split the input and output data sets into training and test subsets
inds = np.random.permutation(len(inputs)) # Shuffle the row indices into a random order

# Split the array into 2: the first half of the indices goes into train and the second half into test
train_inds, test_inds = np.array_split(inds, 2)

# Organising the data into training and testing groups.
# inputs_train takes in the shuffled train_inds
# outputs_train takes in the shuffled train_inds in the encoded binary matrix array
inputs_train, outputs_train = inputs[train_inds], outputs_cats[train_inds]
# inputs_test takes in the shuffled test_inds
# outputs_test takes in the shuffled test_inds in the encoded binary matrix array
inputs_test,  outputs_test  = inputs[test_inds],  outputs_cats[test_inds]
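
The split above changes on every run because the permutation is unseeded. If a repeatable split is wanted, one option is a seeded generator. This is a sketch under stated assumptions: the seed value 42 is arbitrary, and `n_samples` stands in for `len(inputs)`.

```python
import numpy as np

n_samples = 150                       # number of rows in the Iris data set
rng = np.random.default_rng(seed=42)  # the seed value is an arbitrary choice

# Shuffle the row indices, then cut them into two halves.
inds = rng.permutation(n_samples)
train_inds, test_inds = np.array_split(inds, 2)

# Each half holds 75 indices and no index appears in both halves.
print(len(train_inds), len(test_inds))
print(np.intersect1d(train_inds, test_inds).size)
```

Checking that the two index sets are disjoint confirms that no test sample is also used for training.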



## 3. Train the model
Use the training set to train your model.

In [23]:
# Configure the model for training.
# Uses the adam optimizer and categorical cross entropy as the loss function.
# Add in some extra metrics - accuracy being the only one.
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])


# Fit the model using our training data.
# inputs_train = training data
# outputs_train = target data
# epochs is the number of passes over the training data
# batch_size=1 updates the weights after every single training example
# verbose=1 logs the training progress
model.fit(inputs_train, outputs_train, epochs=100, batch_size=1, verbose=1)



Epoch 1/100
...
Epoch 100/100


<keras.callbacks.History at 0x2191b8eda20>
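
The loss minimised during training is categorical cross-entropy. As a rough illustration of what that number measures, the sketch below computes it by hand for two hypothetical predictions. The function name, the `eps` clipping value, and the sample arrays are assumptions made for this example and are not taken from Keras.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0), then average the per-sample loss
    # -sum(y_true * log(y_pred)) over all samples.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# Hypothetical one-hot targets and predicted probabilities.
y_true = np.array([[1, 0, 0],
                   [0, 0, 1]])
y_pred = np.array([[0.9, 0.05, 0.05],
                   [0.1, 0.2, 0.7]])

loss = categorical_crossentropy(y_true, y_pred)
print("Loss: %6.4f" % loss)
```

Only the probability assigned to the correct class contributes to each sample's loss, so a confident correct prediction (0.9) costs less than a hesitant one (0.7).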

## 4. Test the model
Use the testing set to test your model, clearly calculating and displaying the error rate.

In [24]:
# Evaluate the model using the test data set.
# model.evaluate compares the model's predictions with the expected outputs
# inputs_test = input data
# outputs_test = target (label) data
# verbose=1 shows a progress bar
# Verbosity mode: 0 = silent, 1 = progress bar, 2 = one line per epoch.
loss, accuracy = model.evaluate(inputs_test, outputs_test, verbose=1)

# Output the loss and the accuracy of the model.
print("\n\nLoss: %6.4f\tAccuracy: %6.4f" % (loss, accuracy))




Loss: 0.1364	Accuracy: 0.9867
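
The task asks for the error rate, while the cell above prints the accuracy. The error rate is the fraction of misclassified samples, which equals 1 minus the accuracy. The sketch below shows both routes. The `error_rate` helper and the small arrays are hypothetical; only the accuracy value 0.9867 is taken from the output above.

```python
import numpy as np

def error_rate(y_true_onehot, y_prob):
    # Compare the index of the true class with the index of the
    # highest predicted probability, then average the mismatches.
    true_cls = np.argmax(y_true_onehot, axis=1)
    pred_cls = np.argmax(y_prob, axis=1)
    return float(np.mean(true_cls != pred_cls))

# Hypothetical targets and predictions: the third sample is misclassified.
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]])
y_prob = np.array([[0.80, 0.10, 0.10],
                   [0.20, 0.70, 0.10],
                   [0.10, 0.60, 0.30],
                   [0.05, 0.15, 0.80]])

err = error_rate(y_true, y_prob)
print("Error rate: %6.4f" % err)               # Error rate: 0.2500

# Using the accuracy reported by model.evaluate above.
accuracy = 0.9867
print("Error rate: %6.4f" % (1.0 - accuracy))  # Error rate: 0.0133
```

If the test set holds 75 samples, as it would for a 150-row data set split in half, an error rate of about 0.0133 corresponds to a single misclassified flower.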


## Prediction

In [25]:
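# Predict the class of a single flower using model.predict.
# expand_dims turns the single sample into a batch of size 1, and
# np.around rounds the softmax outputs to 0 or 1.
# (np.int and np.bool were removed in NumPy 1.24; the built-in int and bool are equivalent)
prediction = np.around(model.predict(np.expand_dims(inputs_test[0], axis=0))).astype(int)[0]


print("Actual: %s\tEstimated: %s" % (outputs_test[0].astype(int), prediction))
print("That means it's a %s" % outputs_vals[prediction.astype(bool)][0])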

Actual: [0 0 1]	Estimated: [0 0 1]
That means it's a virginica
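
Rounding the softmax outputs works when one class has a probability above 0.5, but it yields an all-zero vector when no class does, and the species lookup then fails. Taking the index of the largest probability is one alternative. This is a sketch with a hypothetical probability vector rather than real model output.

```python
import numpy as np

species = np.array(["setosa", "versicolor", "virginica"])

# Hypothetical softmax output in which no class exceeds 0.5.
probs = np.array([0.40, 0.35, 0.25])

# Rounding gives no class at all in this case.
rounded = np.around(probs).astype(int)
print(rounded)           # [0 0 0]

# argmax always selects exactly one class: the most probable one.
best = int(np.argmax(probs))
print(species[best])     # setosa
```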


In [26]:
# Save the model to a file for later use,
# so it does not have to be retrained next time.
model.save("iris_nn.h5")
# Load the model again with: model = kr.models.load_model("iris_nn.h5")