# TensorFlow - Using the Iris Data Set
This notebook contains solutions to the TensorFlow problem sheet, completed as part of my coursework for the Emerging Technologies module. For further documentation, see https://github.com/RicardsGraudins/Tensorflow-Iris

## *Exercises:*
## 01. Use TensorFlow to create a model
*Use TensorFlow to create a model to predict the species of Iris from a flower’s sepal width, sepal length, petal width, and petal length.*

In [190]:
import tensorflow as tf

#Placeholders
#Input has 4 features - Sepal width & length, Petal width & length
#Output has to be 1 out of 3 - Setosa / Virginica / Versicolor
x = tf.placeholder(tf.float32,shape=[None,4])
y_ = tf.placeholder(tf.float32,shape=[None, 3])

#Variable weight & bias
#Weight - 4 input features 3 output classes
#Bias - 3 output classes
W = tf.Variable(tf.zeros([4,3]))
b = tf.Variable(tf.zeros([3]))

#Model
#Logits are the unnormalised class scores; softmax turns them into probabilities
logits = tf.matmul(x, W) + b
y = tf.nn.softmax(logits)

#Loss function
#softmax_cross_entropy_with_logits applies softmax internally, so it takes the logits rather than y
#cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))

#Optimiser
train_step = tf.train.AdamOptimizer(0.01).minimize(cross_entropy)
#train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

#Accuracy of the model
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
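
`tf.nn.softmax_cross_entropy_with_logits` applies softmax internally, so it expects unnormalised scores (logits) rather than probabilities. The NumPy sketch below is only an illustration of that point, not the TensorFlow implementation: the helper names and the example scores are hypothetical, and it compares the loss computed from logits with the loss obtained when softmax is applied twice.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy_from_logits(logits, labels):
    """Mean cross-entropy; softmax is applied to the logits internally."""
    probs = softmax(logits)
    return float(np.mean(-np.sum(labels * np.log(probs), axis=1)))

# Hypothetical scores for one flower whose true class is index 0.
logits = np.array([[4.0, 1.0, 0.0]])
labels = np.array([[1.0, 0.0, 0.0]])

# Intended use: pass the raw logits.
loss_from_logits = cross_entropy_from_logits(logits, labels)

# Passing probabilities instead means softmax is applied twice,
# which flattens the distribution and inflates the loss.
loss_double_softmax = cross_entropy_from_logits(softmax(logits), labels)

print(loss_from_logits, loss_double_softmax)
```

In this example the double-softmax loss is noticeably larger even though the prediction is confident and correct, which is why the distinction matters during training.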

## 02. Split the data into training and testing
*Split the data set into a training set and a testing set. You should investigate the best way to do this, and list any online references used in your notebook. If you wish to, you can write some code to randomly separate the data on the fly.*

In [191]:
import numpy as np
import pandas as pd

#Import Iris data set
data = pd.read_csv('iris.csv', names=['col1','col2','col3','col4','col5'])

#Print 5 rows of data
#pd.set_option('display.max_columns', None)
#pd.set_option('display.max_rows', 5)
#data

#Print first 5 rows of data 
data.head(5)

| | col1 | col2 | col3 | col4 | col5 |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |


In [192]:
#Print the number of samples of each species in the data set
data["col5"].value_counts()

setosa        50
versicolor    50
virginica     50
Name: col5, dtype: int64

In [193]:
#Convert col5 strings to one-hot vectors, one position per species
setosa = np.asarray([1,0,0])
versicolor = np.asarray([0,1,0])
virginica = np.asarray([0,0,1])

data['col5'] = data['col5'].map({'setosa': setosa, 'versicolor': versicolor,'virginica':virginica})
data.head(5)

| | col1 | col2 | col3 | col4 | col5 |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | [1, 0, 0] |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | [1, 0, 0] |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | [1, 0, 0] |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | [1, 0, 0] |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | [1, 0, 0] |
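
The same one-hot encoding can also be built from a class index and an identity matrix. The sketch below is an alternative under stated assumptions, not the method used in this notebook: the `SPECIES` list and the `one_hot` helper are hypothetical names, and the class order is chosen to match the mapping above.

```python
import numpy as np

# Class order is an assumption chosen to match the mapping used in the notebook.
SPECIES = ["setosa", "versicolor", "virginica"]

def one_hot(labels, classes=SPECIES):
    """Return a (len(labels), len(classes)) array with a single 1 per row."""
    index = {name: i for i, name in enumerate(classes)}
    ids = np.array([index[name] for name in labels])
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(len(classes), dtype=np.float32)[ids]

encoded = one_hot(["setosa", "virginica", "versicolor"])
print(encoded)
```

This produces a single 2-D float array, which can be fed to a placeholder of shape `[None, 3]` without first unpacking a column of per-row arrays.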


In [194]:
#The first 50 samples are setosa followed by 50 versicolor and lastly 50 virginica
#Need to shuffle the data to have a diverse set of testing & training data
data = data.iloc[np.random.permutation(len(data))]
data.head(5)

| | col1 | col2 | col3 | col4 | col5 |
|---|---|---|---|---|---|
| 49 | 5.0 | 3.3 | 1.4 | 0.2 | [1, 0, 0] |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | [1, 0, 0] |
| 40 | 5.0 | 3.5 | 1.3 | 0.3 | [1, 0, 0] |
| 75 | 6.6 | 3.0 | 4.4 | 1.4 | [0, 1, 0] |
| 96 | 5.7 | 2.9 | 4.2 | 1.3 | [0, 1, 0] |


In [195]:
#The data is now shuffled but need to set a proper index
data = data.reset_index(drop=True)
data.head(5)

| | col1 | col2 | col3 | col4 | col5 |
|---|---|---|---|---|---|
| 0 | 5.0 | 3.3 | 1.4 | 0.2 | [1, 0, 0] |
| 1 | 4.7 | 3.2 | 1.3 | 0.2 | [1, 0, 0] |
| 2 | 5.0 | 3.5 | 1.3 | 0.3 | [1, 0, 0] |
| 3 | 6.6 | 3.0 | 4.4 | 1.4 | [0, 1, 0] |
| 4 | 5.7 | 2.9 | 4.2 | 1.3 | [0, 1, 0] |


In [196]:
#The data is now shuffled with proper index
#Split the data into training & testing

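#Training data - rows 0 to 105 (106 samples, roughly 70% of the data)
#Note that .loc slices include the end label, whereas plain [] slices exclude the end position
x_input = data.loc[0:105,['col1','col2','col3','col4']]
temp = data['col5']
y_input = temp[0:106]

#Testing data - rows 106 to 149 (44 samples, roughly 30% of the data)
x_test = data.loc[106:149,['col1','col2','col3','col4']]
y_test = temp[106:150]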
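
The shuffle and split can also be done in one step by permuting row positions with a fixed seed, which makes the split reproducible between runs. This is a minimal sketch rather than the approach used above: `split_indices`, `test_fraction` and `seed` are hypothetical names, and the 30% test fraction is an assumption.

```python
import numpy as np

def split_indices(n_samples, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx): disjoint, shuffled row positions."""
    rng = np.random.default_rng(seed)      # fixed seed gives a repeatable split
    order = rng.permutation(n_samples)     # shuffled positions 0..n_samples-1
    n_test = int(round(n_samples * test_fraction))
    return order[n_test:], order[:n_test]

# The Iris data set has 150 rows.
train_idx, test_idx = split_indices(150, test_fraction=0.3, seed=42)
print(len(train_idx), len(test_idx))
```

With a pandas DataFrame the positions could then be applied as `data.iloc[train_idx]` and `data.iloc[test_idx]`.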

## 03. Train the model
*Use the training set to train your model.*

In [197]:
#Create the session
sess = tf.InteractiveSession()
#Initialize the variables
tf.global_variables_initializer().run()

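#Set the number of epochs (each step is one full pass over the training data)
epoch = 1000

#Train the model
#Series.as_matrix() was removed in pandas 1.0, so the one-hot labels are passed as a list instead
for step in range(epoch):
    sess.run([train_step,cross_entropy], feed_dict={x: x_input, y_: y_input.tolist()})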
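
To make the training loop less of a black box, the sketch below trains the same kind of model (a single softmax layer with weights `W` and bias `b`) in plain NumPy. It is an illustration under several assumptions: it uses plain gradient descent rather than the Adam optimiser used above, the function names are hypothetical, and the data is synthetic (three Gaussian clusters with four features) rather than the Iris measurements.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_softmax_regression(X, Y, learning_rate=0.1, epochs=1000):
    """Full-batch gradient descent on the mean cross-entropy loss."""
    n_samples, n_features = X.shape
    n_classes = Y.shape[1]
    W = np.zeros((n_features, n_classes))  # same zero initialisation as the notebook
    b = np.zeros(n_classes)
    for _ in range(epochs):
        probs = softmax(X @ W + b)
        # Gradient of the mean cross-entropy with respect to the logits.
        grad_logits = (probs - Y) / n_samples
        W -= learning_rate * (X.T @ grad_logits)
        b -= learning_rate * grad_logits.sum(axis=0)
    return W, b

# Synthetic stand-in data: 3 classes, 30 samples each, 4 features.
rng = np.random.default_rng(0)
centres = np.array([[0, 0, 0, 0], [2, 2, 0, 0], [0, 2, 2, 0]], dtype=float)
class_ids = np.repeat(np.arange(3), 30)
X = centres[class_ids] + rng.normal(scale=0.3, size=(90, 4))
Y = np.eye(3)[class_ids]  # one-hot labels

W, b = train_softmax_regression(X, Y)
train_accuracy = float(np.mean(np.argmax(X @ W + b, axis=1) == class_ids))
print(train_accuracy)
```

Each pass computes the predicted probabilities, measures how far they are from the one-hot labels, and nudges `W` and `b` in the direction that reduces the loss, which is what `train_step` does on every call to `sess.run`.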

## 04. Test the model
*Use the testing set to test your model, clearly calculating and displaying the error rate.*

In [198]:
#Testing on a sample from the testing data @ index 145
a = data.loc[145,['col1','col2','col3','col4']]
#Named sample rather than b so that the bias variable b is not overwritten
sample = a.values.reshape(1,4)

largest = sess.run(tf.argmax(y,1), feed_dict={x: sample})[0]
if largest == 0:
    print("The flower is: Iris-Setosa")
elif largest == 1:
    print("The flower is: Iris-Versicolor")
else:
    print("The flower is: Iris-Virginica")

#Double checking that this is the correct flower by looking at col5 of row 145
data.tail(5)

The flower is: Iris-Virginica


| | col1 | col2 | col3 | col4 | col5 |
|---|---|---|---|---|---|
| 145 | 6.4 | 3.1 | 5.5 | 1.8 | [0, 0, 1] |
| 146 | 5.1 | 2.5 | 3.0 | 1.1 | [0, 1, 0] |
| 147 | 6.1 | 2.6 | 5.6 | 1.4 | [0, 0, 1] |
| 148 | 6.0 | 2.2 | 4.0 | 1.0 | [0, 1, 0] |
| 149 | 6.4 | 2.7 | 5.3 | 1.9 | [0, 0, 1] |
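
The mapping from the predicted class index to a species name can also be written as a list lookup instead of an if/elif chain. A minimal sketch, assuming the class order setosa, versicolor, virginica used for the one-hot labels; `SPECIES` and `species_name` are hypothetical names and the probabilities are made up for illustration.

```python
import numpy as np

# Order must match the positions used in the one-hot labels.
SPECIES = ["Iris-Setosa", "Iris-Versicolor", "Iris-Virginica"]

def species_name(probabilities):
    """Return the species whose predicted probability is largest."""
    return SPECIES[int(np.argmax(probabilities))]

# Hypothetical model output for a single flower.
print("The flower is:", species_name([0.02, 0.08, 0.90]))
```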


In [199]:
#Determine the accuracy using the testing data
#Series.as_matrix() was removed in pandas 1.0, so the one-hot labels are passed as a list instead
print(sess.run(accuracy,feed_dict={x: x_test, y_: y_test.tolist()}))

0.977273
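
The exercise asks for the error rate, which is simply one minus the accuracy. An accuracy of 0.977273 on the 44 test samples corresponds to 43 correct predictions and 1 misclassification, so the error rate is 1/44, or about 2.27%. The sketch below shows the calculation in NumPy; the `error_rate` helper is a hypothetical name and the label arrays are made-up stand-ins with one deliberate mistake, not the model's actual predictions.

```python
import numpy as np

def error_rate(predicted, actual):
    """Fraction of samples whose predicted class differs from the true class."""
    predicted = np.asarray(predicted)
    actual = np.asarray(actual)
    return float(np.mean(predicted != actual))

# Made-up class indices for 44 test samples, with one misclassification.
actual = np.array([0] * 15 + [1] * 15 + [2] * 14)
predicted = actual.copy()
predicted[20] = 2  # a versicolor sample predicted as virginica

rate = error_rate(predicted, actual)
print(f"accuracy = {1 - rate:.6f}, error rate = {rate:.6f}")
```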


# References
https://github.com/emerging-technologies/keras-iris   
https://www.tensorflow.org/get_started/mnist/beginners    
https://www.tensorflow.org/get_started/mnist/pros   
https://pandas.pydata.org/pandas-docs/stable/10min.html