### TensorFlow Problem Sheet
In this problem sheet I will use Keras with TensorFlow to predict the species of an iris flower from its sepal length and width and its petal length and width.


The aim of this problem sheet is to get a better understanding of how TensorFlow works.

### What are TensorFlow and Keras?

TensorFlow is a popular open-source software library for dataflow programming across a range of tasks, developed by the Google Brain Team. It is a symbolic math library and is also used for machine learning applications such as neural networks [1]. I will be using TensorFlow's Python API, but it is available for a range of languages.

Keras is an open-source neural network library written in Python and developed by Google engineer Francois Chollet. Keras acts like a "library on top of a library": it can run on top of MXNet, Deeplearning4j, TensorFlow, CNTK or Theano. Keras takes the functionality in core TensorFlow and adds a higher level of abstraction to it, making it easier to experiment with deep neural networks [2].


### Create the TensorFlow model

I'm using Keras, so instead of importing TensorFlow directly I can import Keras, which uses TensorFlow as its backend.
I also import NumPy for working with arrays and Python's 'csv' module to read the iris CSV dataset.

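```python
# Keras runs on top of TensorFlow here, so importing Keras is enough
import numpy as np  # numerical arrays
import keras as kr  # high-level neural network API
import csv          # reading the iris CSV file
```

```
Using TensorFlow backend.
```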


The iris dataset contains 150 rows of data, and the file I'm using is ordered by class: the first 50 rows are setosa, the next 50 are versicolor and the last 50 are virginica. Each row contains 5 pieces of information about a flower: the sepal length, the sepal width, the petal length, the petal width and finally the iris class (e.g. setosa, virginica etc.).
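
Because the file is ordered, one quick sanity check is that every consecutive block of 50 rows carries a single label. The sketch below assumes the class labels are already in a Python list; `labels_are_blocked` is a hypothetical helper and the short label lists are made up for illustration.

```python
def labels_are_blocked(labels, block_size):
    """Return True if each consecutive chunk of block_size labels holds one value."""
    if len(labels) % block_size != 0:
        return False
    for start in range(0, len(labels), block_size):
        chunk = labels[start:start + block_size]
        # A chunk is "pure" when it contains exactly one distinct label.
        if len(set(chunk)) != 1:
            return False
    return True


# Illustrative labels: 3 classes x 2 rows, ordered like the iris file.
sample = ["Iris-setosa"] * 2 + ["Iris-versicolor"] * 2 + ["Iris-virginica"] * 2
print(labels_are_blocked(sample, 2))                # True
print(labels_are_blocked(["a", "b", "a", "b"], 2))  # False
```

Once the file has been loaded as `iris`, calling `labels_are_blocked([row[4] for row in iris], 50)` should return True if the file is ordered as described. Note that the check only confirms each block is uniform; it does not check that different blocks hold different classes.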

I can use Python's 'csv' module to read the iris dataset into a list of rows, and later convert that list into NumPy arrays to build the model.
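
`csv.reader` yields each row as a list of strings, so the measurements still need converting to numbers later. The sketch below reads from an in-memory string instead of `IRIS_dataset.csv` so that it runs on its own; the three rows are simply samples in the same format as the iris file, and the blank-line filter is an assumption that only matters if a file ends with an empty line.

```python
import csv
import io

# Stand-in for open('IRIS_dataset.csv'): three rows in the same format.
raw = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
    "\n"  # a trailing blank line, which csv.reader returns as []
)

# Keep only non-empty rows so a trailing blank line does not inflate the count.
rows = [row for row in csv.reader(raw) if row]

print(len(rows))                  # 3
print(rows[1])                    # ['7.0', '3.2', '4.7', '1.4', 'Iris-versicolor']
print(type(rows[1][0]).__name__)  # str: the measurements are still text
```

With a real file, the Python documentation recommends opening it with `newline=''` before passing it to `csv.reader`.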

```python
# Read IRIS_dataset.csv into a list of rows, one list of strings per line.
# The with-block closes the file once it has been read.
with open('IRIS_dataset.csv', newline='') as f:
    iris = list(csv.reader(f))

# Expected to be 150
print("Length of the list 'iris':", len(iris))

# Expected to be of class versicolor, with the first 4 fields holding numbers (stored as strings)
print("51st element of list 'iris':", iris[50])
```

```
Length of the list 'iris': 150
51st element of list 'iris': ['7.0', '3.2', '4.7', '1.4', 'Iris-versicolor']
```


Now that the iris dataset has been loaded into a list successfully, I will split the data into inputs and outputs.

Looking at the data above, the first 4 values should be the inputs, because they are the measurements from which the class of iris is predicted. That makes the 5th element (the class of iris) the output.

I can use NumPy to create an array of inputs and an array of outputs.
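
The two slices used for this split behave slightly differently: `[:, :4]` keeps the first four columns, and `[:, 4:]` keeps everything from column 4 onward, which is still a 2D array with one column. Indexing with a plain integer, `[:, 4]`, would instead give a 1D array. A small sketch on two made-up rows:

```python
import numpy as np

# Two made-up rows shaped like the iris data (all strings, as csv.reader returns them).
table = np.array([
    ["5.1", "3.5", "1.4", "0.2", "Iris-setosa"],
    ["7.0", "3.2", "4.7", "1.4", "Iris-versicolor"],
])

features = table[:, :4].astype(float)  # first four columns, converted to floats
labels_2d = table[:, 4:]               # slice: keeps a column axis
labels_1d = table[:, 4]                # integer index: drops the column axis

print(features.shape, labels_2d.shape, labels_1d.shape)  # (2, 4) (2, 1) (2,)
```

This is why `outputs[50]` prints as a one-element array, `['Iris-versicolor']`, rather than a bare string.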

```python
# Convert the list of rows into a 2D NumPy array of strings once, then slice it
iris_array = np.array(iris)

# inputs: the first 4 columns (sepal length/width, petal length/width) as floats.
# [:, :4] means "take all rows and, within each row, the first 4 columns".
# (np.float was removed in NumPy 1.24; the built-in float gives float64.)
inputs = iris_array[:, :4].astype(float)

# outputs: the remaining column, the iris class.
# [:, 4:] means "take all rows and, within each row, every column from index 4 on".
# This keeps a 2D array of shape (150, 1).
outputs = iris_array[:, 4:]

# Expected to be [7.0, 3.2, 4.7, 1.4] & Iris-versicolor, the same as the 51st element above
print("51st element of inputs & outputs:", inputs[50], "&", outputs[50])
```

```
51st element of inputs & outputs: [ 7.   3.2  4.7  1.4] & ['Iris-versicolor']
```


The actual values match the expected ones, so the iris list has been successfully separated into the inputs and outputs arrays.

The outputs array still needs some work: it is just an array of repeated strings, i.e. "Iris-setosa" x 50, "Iris-versicolor" x 50 etc., and a neural network needs numeric targets.

A better representation is to split this array into two: one holding each distinct string once, e.g. ["Iris-setosa","Iris-versicolor","Iris-virginica"], and another holding, for every row, the index of that row's class in the first array, e.g. rows 0..49 (Iris-setosa) all map to 0.

NumPy's 'unique' function, called with return_inverse=True, does exactly that.
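
A plain-Python sketch of what `np.unique(..., return_inverse=True)` computes may help before applying it to the real data: the sorted distinct values, plus, for every element, its position in that sorted list. `unique_with_inverse` is a hypothetical helper written for illustration, not part of NumPy.

```python
def unique_with_inverse(values):
    """Mimic np.unique(values, return_inverse=True) for a flat list."""
    uniques = sorted(set(values))                     # distinct values, sorted like np.unique
    position = {v: i for i, v in enumerate(uniques)}  # value -> index in uniques
    inverse = [position[v] for v in values]           # index of each element's value
    return uniques, inverse


labels = ["Iris-versicolor", "Iris-setosa", "Iris-setosa", "Iris-virginica"]
vals, ints = unique_with_inverse(labels)
print(vals)  # ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
print(ints)  # [1, 0, 0, 2]

# Looking up each index in the unique values rebuilds the original list.
print([vals[i] for i in ints] == labels)  # True
```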

```python
# Explanation found on - https://docs.scipy.org/doc/numpy/reference/generated/numpy.unique.html
# outputs_vals: the sorted unique values that occur in 'outputs'.
# outputs_ints: for each row of 'outputs', the index of its value in outputs_vals.
# outputs has shape (150, 1); ravel() flattens it so outputs_ints is 1D
# (NumPy 2.x may otherwise return the inverse with the input's (150, 1) shape).
outputs_vals, outputs_ints = np.unique(outputs.ravel(), return_inverse=True)

# Expected to print out ["Iris-setosa" "Iris-versicolor" "Iris-virginica"]
print("Unique values in the array 'outputs':", outputs_vals)
# Expected to print out something like [0 0 0 0 0 .. 1 1 1 1 1 .. 2 2 2 2 2 ..]
print("Where the unique values occur in the array 'outputs':\n%s" % (outputs_ints))
# Expected to be Iris-versicolor
print("Class of iris that sits on the 51st index of outputs_ints:", outputs_vals[outputs_ints[50]])
```

```
Unique values in the array 'outputs': ['Iris-setosa' 'Iris-versicolor' 'Iris-virginica']
Where the unique values occur in the array 'outputs':
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]
Class of iris that sits on the 51st index of outputs_ints: Iris-versicolor
```
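
Two quick checks follow naturally from this representation: indexing the unique values with the integer codes should rebuild the original labels, and `np.bincount` should count the rows in each class. The sketch uses a small illustrative stand-in array (with its own names, so it does not overwrite `outputs`) in the same (n, 1) shape as `outputs`.

```python
import numpy as np

# Stand-in for `outputs`: same (n, 1) shape, 3 classes x 2 rows (illustrative only).
sample_outputs = np.array(
    [["Iris-setosa"]] * 2 + [["Iris-versicolor"]] * 2 + [["Iris-virginica"]] * 2
)
flat = sample_outputs.ravel()  # 1D array of the labels

vals, ints = np.unique(flat, return_inverse=True)

# 1) Indexing the unique values with the inverse rebuilds the labels.
print(np.array_equal(vals[ints], flat))  # True

# 2) Counting each integer code gives the number of rows per class.
print(np.bincount(ints))  # [2 2 2]
```

On the real arrays, `np.bincount(outputs_ints)` should give 50 for each of the three classes if the file matches the description above.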


### Citation

[1] https://en.wikipedia.org/wiki/TensorFlow

[2] https://en.wikipedia.org/wiki/Keras

### End