# Using Iris Data Set with TensorFlow

Iris is perhaps the best-known dataset in machine learning. It originated in a 1936 research paper by the British statistician and biologist Ronald Fisher. The dataset is often used for testing machine learning algorithms and visualizations such as scatter plots. Each row of the table represents one iris flower: its species (one of three: setosa, versicolor, or virginica) and the length and width of its sepals and petals, in centimeters.
#### References:
- UCI Machine Learning Repository, https://archive.ics.uci.edu/ml/datasets/iris
- Wikipedia, Iris flower Data Set, https://en.wikipedia.org/wiki/Iris_flower_data_set

## 1. Use TensorFlow to create a model
Use TensorFlow to create a model that predicts the species of an iris from the flower’s sepal width, sepal length, petal width, and petal length.

In [5]:
#importing python libraries
import numpy as np
import csv
import keras as kr

# Load the dataset from the CSV file
with open('IRIS.csv', newline='') as f:  # the with-block closes the file after reading
    iris = list(csv.reader(f))[1:] # [1:] = skip the first (header) row and start from the 2nd row

# We need to separate the data into 2 arrays, inputs and outputs

# inputs contains sepal length, sepal width, petal length, petal width converted to floats
inputs = np.array(iris)[:,:4].astype(float) # [:,:4] = all the rows and the first 4 columns of each row

# outputs contains the 3 species as strings: setosa, versicolor and virginica
outputs = np.array(iris)[:,4] # [:,4] selects the last column which is the species

# Converting the output strings to integers.
outputs_vals, outputs_ints = np.unique(outputs, return_inverse=True)
# outputs_vals holds the unique species strings (sorted alphabetically)
# outputs_ints holds, for each row, the index of its species in outputs_vals
# so outputs_vals[outputs_ints] reproduces the original outputs array

# Encoding the integers as binary categorical variables (one-hot encoding).
# This creates a binary matrix with one column per species,
# e.g. 0 is encoded as 1,0,0; 1 as 0,1,0; and 2 as 0,0,1.
outputs_cats = kr.utils.to_categorical(outputs_ints)
# This means that if the output is:
# (1,0,0) = setosa
# (0,1,0) = versicolor
# (0,0,1) = virginica

# Creating model and a neural network
# model is used to organise layers
model = kr.models.Sequential() # using sequential model which is a linear stack of layers

# stacking 4 layers. 
# Add an initial layer with 4 input nodes and a hidden layer with 16 nodes/neurons.
model.add(kr.layers.Dense(16, input_shape=(4,)))
# Applying the sigmoid activation function to that layer.
model.add(kr.layers.Activation("sigmoid"))
# Adding another layer, connected to the layer with 16 nodes/neurons, containing 3 output nodes 
model.add(kr.layers.Dense(3))
# Using the softmax activation function here so the 3 output values each lie between 0 and 1 and sum to 1.
model.add(kr.layers.Activation("softmax"))
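
To make the encoding step concrete, here is a minimal NumPy-only sketch of what `np.unique(..., return_inverse=True)` and one-hot encoding produce on a tiny hand-made label array. The sample labels and the use of `np.eye` in place of `kr.utils.to_categorical` are assumptions for illustration; the notebook itself uses the Keras helper.

```python
import numpy as np

# Hypothetical mini-sample of species labels (not taken from IRIS.csv).
species = np.array(["setosa", "virginica", "versicolor", "setosa"])

# labels: sorted unique strings; ints: index of each row's label in `labels`.
labels, ints = np.unique(species, return_inverse=True)

# One-hot encode by picking rows of the identity matrix (assumed stand-in
# for kr.utils.to_categorical).
one_hot = np.eye(len(labels))[ints]

# Decode again: the position of the 1 in each row indexes into `labels`.
decoded = labels[one_hot.argmax(axis=1)]

print(labels)
print(ints)
print(one_hot)
print(decoded)
```

Decoding with `argmax` is the same step that later turns the network's softmax output back into a species name.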


## 2. Split the data into training and testing
Split the data set into a training set and a testing set. You should investigate the best way to do this, and list any online references used in your notebook. If you wish to, you can write some code to randomly separate the data on the fly.

In [None]:
# Split the input and output data sets into training and test subsets
inds = np.random.permutation(len(inputs)) # Shuffle the row indices into a random order

# Split the shuffled indices in 2: the first half goes into train and the second half into test
train_inds, test_inds = np.array_split(inds, 2)

# Organising the data into training and testing groups.
# inputs_train takes in the shuffled train_inds
# outputs_train takes in the shuffled train_inds in the encoded binary matrix array
inputs_train, outputs_train = inputs[train_inds], outputs_cats[train_inds]
# inputs_test takes in the shuffled test_inds
# outputs_test takes in the shuffled test_inds in the encoded binary matrix array
inputs_test,  outputs_test  = inputs[test_inds],  outputs_cats[test_inds]
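
The 50/50 split above is one option. As a hedged alternative, the sketch below keeps a larger share of the rows for training and fixes the random seed so the split is reproducible. The helper name `split_indices`, the 80/20 ratio, and the seed value are assumptions for illustration, not part of the original notebook.

```python
import numpy as np

def split_indices(n_rows, test_fraction=0.2, seed=0):
    """Return (train_inds, test_inds): a shuffled, disjoint split of range(n_rows)."""
    rng = np.random.default_rng(seed)   # seeded generator for reproducibility
    inds = rng.permutation(n_rows)      # shuffled row indices
    n_test = int(round(n_rows * test_fraction))
    # The first n_test shuffled indices form the test set, the rest the training set.
    return inds[n_test:], inds[:n_test]

# The Iris data set has 150 rows.
train_inds, test_inds = split_indices(150)
print(len(train_inds), len(test_inds))
```

With 150 rows this yields 120 training and 30 test indices, which can be used to index `inputs` and `outputs_cats` exactly as in the cell above.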



## 3. Train the model
Use the training set to train your model.
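
With Keras, training usually means compiling the model with a loss function and an optimizer, then fitting it to the training subset. The sketch below wraps those two calls in a helper; the name `train_model`, the choice of `categorical_crossentropy` with the `adam` optimizer, and the epoch and batch-size values are assumptions for illustration rather than settings taken from this notebook.

```python
def train_model(model, inputs_train, outputs_train, epochs=100, batch_size=10):
    """Compile and fit a Keras model on one-hot encoded targets (illustrative settings)."""
    # Categorical cross-entropy pairs naturally with a softmax output layer.
    model.compile(loss="categorical_crossentropy",
                  optimizer="adam",
                  metrics=["accuracy"])
    # fit() returns a History object; verbose=0 suppresses the per-epoch log.
    return model.fit(inputs_train, outputs_train,
                     epochs=epochs, batch_size=batch_size, verbose=0)

# Intended use with the objects defined earlier in this notebook:
# history = train_model(model, inputs_train, outputs_train)
```

The returned history records the loss and accuracy per epoch, which helps when judging whether the assumed number of epochs is sufficient.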

## 4. Test the model
Use the testing set to test your model, clearly calculating and displaying the error rate.
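
One way to compute the error rate is to compare the most probable predicted class with the true class for each test row. The following is a minimal sketch, assuming the predictions are an array of class probabilities such as `model.predict(inputs_test)` would return for the softmax model above; the helper name `error_rate` and the sample arrays are hypothetical.

```python
import numpy as np

def error_rate(predicted_probs, true_one_hot):
    """Fraction of rows whose most probable class differs from the true class."""
    predicted = np.argmax(predicted_probs, axis=1)  # predicted class index per row
    actual = np.argmax(true_one_hot, axis=1)        # true class index per row
    return float(np.mean(predicted != actual))

# Hypothetical probabilities for four flowers and their one-hot true labels.
probs = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.6, 0.3, 0.1]])
truth = np.array([[1, 0, 0],
                  [0, 1, 0],
                  [0, 0, 1],
                  [0, 1, 0]])
print(error_rate(probs, truth))  # 0.25: one of the four rows is misclassified
```

In the notebook this would be called as `error_rate(model.predict(inputs_test), outputs_test)`.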