# What is the Iris Dataset?

According to Wikipedia, the Iris dataset is a multivariate data set introduced by the British statistician and biologist Ronald Fisher in his 1936 paper *The use of multiple measurements in taxonomic problems* as an example of linear discriminant analysis.

The dataset includes 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters.

Using a combination of this information, Fisher developed a linear discriminant model to distinguish the species from each other.


## Iris Setosa
![Image of Iris setosa](https://upload.wikimedia.org/wikipedia/commons/thumb/5/56/Kosaciec_szczecinkowaty_Iris_setosa.jpg/330px-Kosaciec_szczecinkowaty_Iris_setosa.jpg)

## Iris Versicolor
![Image of Iris versicolor](https://upload.wikimedia.org/wikipedia/commons/thumb/4/41/Iris_versicolor_3.jpg/330px-Iris_versicolor_3.jpg)

## Iris Virginica
![Image of Iris virginica](https://upload.wikimedia.org/wikipedia/commons/thumb/9/9f/Iris_virginica.jpg/330px-Iris_virginica.jpg)

## Visualising The Data

We can use pandas, a Python library for data manipulation and analysis, to import the dataset and investigate what it looks like.

In [1]:
# Import pandas
import pandas as pd

# Load the iris data set from a URL.
df = pd.read_csv("https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv")

# Print out the contents of the csv file
df

   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
5           5.4          3.9           1.7          0.4  setosa
6           4.6          3.4           1.4          0.3  setosa
7           5.0          3.4           1.5          0.2  setosa
8           4.4          2.9           1.4          0.2  setosa
9           4.9          3.1           1.5          0.1  setosa


We can also use seaborn, a Python library for statistical data visualization, to plot how each feature's values distinguish the iris species from one another.

In [2]:
# Manage Import
import seaborn as sns

# Import matplotlib as well: seaborn is built on top of it, and without it the graphs will not show
import matplotlib.pyplot as plt

sns.pairplot(df, hue="species")



<seaborn.axisgrid.PairGrid at 0x1e165da24a8>

## Using Keras and TensorFlow to train our mini neural network

### What is Keras?

Keras is a high-level neural networks API that runs on top of TensorFlow.

In [3]:
# Imports 

import keras as kr
import pandas as pd
import sklearn.preprocessing as pre 
import sklearn.model_selection as mod

Using TensorFlow backend.


### Next we need to load in our Iris dataset

In [4]:
## Read data
df = pd.read_csv('https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv')

##print(df)

## Select the four measurement columns as the inputs to our neural network
inputs = df[['petal_length','petal_width','sepal_length','sepal_width']]

##print(inputs)


### Next we want to convert the species in the dataset to binary vectors so our neural network can work with them

In [5]:
## LabelBinarizer converts our species to a binary (one-hot) format: each species gets its own column, with a 1 marking that species

## Example: setosa -> 1,0,0  versicolor -> 0,1,0  virginica -> 0,0,1

encoder = pre.LabelBinarizer()
encoder.fit(df['species'])
outputs = encoder.transform(df['species'])

print(outputs) ## As you can see each of our outputs (Species) is now in a binary number format

[[1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 1 0]
 [0 0 1]
 [0 0 1]
 [0 0 1]
 [0 0 1]
 [0 0 1]
 [0 0 1]
 [0 0 1]
 [0 0 1]
 [0 0 1]
 [0 0 1]
 [0 0 1]
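
To make the encoding less of a black box, here is a minimal sketch of the same idea in plain Python. It is not scikit-learn's implementation: the helper name `one_hot` is hypothetical, and it only mirrors the three-class case used here (for two classes `LabelBinarizer` returns a single column instead).

```python
# Minimal one-hot encoder in plain Python (illustrative, not scikit-learn's code).
def one_hot(labels):
    # Sort the distinct labels so each class gets a stable column index.
    classes = sorted(set(labels))
    index = {label: i for i, label in enumerate(classes)}
    # Each row holds a single 1, in the column that belongs to its class.
    rows = [[1 if index[label] == i else 0 for i in range(len(classes))]
            for label in labels]
    return classes, rows

species = ["setosa", "versicolor", "virginica", "setosa"]
classes, encoded = one_hot(species)
print(classes)   # ['setosa', 'versicolor', 'virginica']
print(encoded)   # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

The column order follows the sorted class names, which is why setosa ends up in the first column.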
 

### Building our model

The overall idea for our model is that if we give it the measurements of a flower, e.g. [1.4 (petal_length), 0.2 (petal_width), 5.1 (sepal_length), 3.5 (sepal_width)], we want it to give us an output of [1,0,0] (setosa).
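
As a rough illustration of that mapping, the sketch below pushes one flower through a tiny untrained network in plain Python. It assumes a hidden layer of 64 ReLU units followed by a 3-unit softmax output, matching the model built in this section; the helper names (`dense`, `relu`, `softmax`) and the random starting weights are made up for the example, so the probabilities it prints mean nothing until the network is trained.

```python
import math
import random

def dense(x, weights, biases):
    # One fully connected layer: out[j] = sum_i x[i] * weights[i][j] + biases[j]
    return [sum(xi * w_row[j] for xi, w_row in zip(x, weights)) + biases[j]
            for j in range(len(biases))]

def relu(v):
    # Keep positive values, replace negative ones with zero.
    return [max(0.0, a) for a in v]

def softmax(v):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

random.seed(0)  # fixed seed so the (untrained) weights are reproducible
n_in, n_hidden, n_out = 4, 64, 3
w1 = [[random.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(n_in)]
b1 = [0.0] * n_hidden
w2 = [[random.gauss(0, 0.1) for _ in range(n_out)] for _ in range(n_hidden)]
b2 = [0.0] * n_out

# First row of the dataset, in the column order of `inputs`:
# petal_length, petal_width, sepal_length, sepal_width
x = [1.4, 0.2, 5.1, 3.5]
probs = softmax(dense(relu(dense(x, w1, b1)), w2, b2))
print(probs)        # three probabilities, one per species
print(sum(probs))   # approximately 1.0
```

Training is what moves these three numbers towards [1, 0, 0] for a setosa flower.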

In [6]:
## There are two types of models we can use, Sequential and Functional. Sequential is fine in most cases, unless the model needs:
## multiple different input sources,
## multiple output destinations, or
## re-used (shared) layers.

## A Sequential model lets us build our model layer by layer
model = kr.models.Sequential()

## Hidden layer that receives our 4 inputs; units is the number of neurons in the layer
model.add(kr.layers.Dense(units=64, activation='relu',input_dim=4))

## Layer for our outputs 
## Softmax turns the outputs into a probability distribution: one probability for each of the three species
model.add(kr.layers.Dense(units=3, activation='softmax'))

## Neural networks don't optimise accuracy directly; they optimise a loss function, and the goal is to minimise that loss
model.compile(loss='categorical_crossentropy',optimizer='sgd',metrics=['accuracy'])
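
To see why loss and accuracy are different things, here is a small worked example of categorical cross-entropy for a single sample, written in plain Python. It is a sketch of the formula rather than Keras's own code; the function name and the `eps` guard value are assumptions made for the illustration.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Loss for one sample: -sum_k y_true[k] * log(y_pred[k]).
    # eps guards against log(0); 1e-7 is an assumed value for this sketch.
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

setosa = [1, 0, 0]                 # one-hot target
confident = [0.90, 0.05, 0.05]     # right answer, high confidence
unsure = [0.40, 0.30, 0.30]        # right answer, low confidence
wrong = [0.05, 0.90, 0.05]         # wrong answer

for name, pred in [("confident", confident), ("unsure", unsure), ("wrong", wrong)]:
    print(name, round(categorical_crossentropy(setosa, pred), 3))
# confident 0.105
# unsure 0.916
# wrong 2.996
```

Both the confident and the unsure prediction pick setosa, so they count the same for accuracy, but the loss still rewards the more confident one. That finer signal is what the optimiser follows.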





## Split our data

In [7]:
## We want to split our data into training data and test data:
## we use the training data to fit the model and the test data to evaluate it.

## A common choice is to use 70% of the data for training and 30% for testing.
## A bigger training set usually helps the model, but leaves less data to test it with.
inputs_train, inputs_test, outputs_train, outputs_test = mod.train_test_split(inputs, outputs, test_size=0.30)

## Our training inputs have 105 rows
print(inputs_train)

## Our test inputs have 45 rows
print(inputs_test)




     petal_length  petal_width  sepal_length  sepal_width
96            4.2          1.3           5.7          2.9
64            3.6          1.3           5.6          2.9
78            4.5          1.5           6.0          2.9
35            1.2          0.2           5.0          3.2
19            1.5          0.3           5.1          3.8
16            1.3          0.4           5.4          3.9
26            1.6          0.4           5.0          3.4
107           6.3          1.8           7.3          2.9
148           5.4          2.3           6.2          3.4
50            4.7          1.4           7.0          3.2
149           5.1          1.8           5.9          3.0
32            1.5          0.1           5.2          4.1
41            1.3          0.3           4.5          2.3
1             1.4          0.2           4.9          3.0
112           5.5          2.1           6.8          3.0
4             1.4          0.2           5.0          3.6
99            
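
The shuffled row numbers in the output come from the random split. The sketch below shows the basic idea in plain Python, assuming a simple shuffle-then-cut approach; `split_rows` and the seed value are hypothetical, and scikit-learn's `train_test_split` offers more options than this.

```python
import random

def split_rows(n_rows, test_size=0.30, seed=42):
    # Shuffle the row indices so the split does not depend on file order
    # (the one-hot output earlier shows the rows are grouped by species).
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_size)
    # The first n_test shuffled rows become the test set, the rest the training set.
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = split_rows(150)
print(len(train_idx), len(test_idx))    # 105 45
print(set(train_idx) & set(test_idx))   # set(): no row is in both parts
```

Without the shuffle, a 70/30 cut of this file would train on setosa and versicolor almost exclusively and test only on virginica.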

## Train our model

In [8]:
## We use the model.fit function to train our model,

## We fit the model with our inputs_train and outputs_train

## We also pass the number of epochs, i.e. how many times training goes through the whole training set

## The batch size is the number of samples processed before the weights are updated, e.g. if there were 150 samples
## and the batch_size was 10, it would go through the data in batches of 1-10, 11-20 etc...

## Note: the smaller the batch size, the less memory is used, because you're training on less data at a time
## Also, networks typically train faster with mini-batches

model.fit(inputs_train,outputs_train,epochs=5,batch_size=10)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x1e1690997b8>
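
The arithmetic behind epochs and batches can be checked with a few lines of plain Python. This sketch uses the values from the cell above (105 training rows, a batch size of 10, 5 epochs); `iterate_batches` is a hypothetical helper, and unlike Keras it does not reshuffle the rows between epochs.

```python
def iterate_batches(n_samples, batch_size):
    # Yield (start, stop) index pairs; the final batch may be smaller.
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)

n_train, batch_size, epochs = 105, 10, 5
batches = list(iterate_batches(n_train, batch_size))
print(len(batches))            # 11 batches per epoch
print(batches[-1])             # (100, 105): the last batch holds only 5 samples
print(len(batches) * epochs)   # 55 weight updates in total
```

So five epochs at this batch size give the network 55 chances to adjust its weights.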

## Saving our model

In [9]:
# Saving our model
# (newer Keras versions may require a file name ending in .keras or .h5)
model.save("Iris_Model")

# Opening our model
New_Model = kr.models.load_model("Iris_Model")

## Predictions

In [14]:
## We use model.predict to test our model with some data; note the result is a row of probabilities per sample, not a species name
predictions = model.predict(inputs_test)
print(predictions)


## Change our predictions from probabilities back to their species (the opposite of encoder.transform from earlier)
predictions = encoder.inverse_transform(predictions)
print(predictions)

## Compare our predictions with the real outputs
predictions == encoder.inverse_transform(outputs_test)

## Count the correct predictions: 39/45 right
(predictions == encoder.inverse_transform(outputs_test)).sum()

[[0.1560715  0.43407384 0.40985462]
 [0.05667948 0.3787543  0.56456625]
 [0.09168334 0.4225641  0.48575258]
 [0.60796034 0.22692421 0.16511542]
 [0.635304   0.21213815 0.15255783]
 [0.05204175 0.39022678 0.55773145]
 [0.5929577  0.24133413 0.16570827]
 [0.15165767 0.43544456 0.4128978 ]
 [0.6473714  0.20613404 0.14649452]
 [0.12577198 0.39921296 0.47501504]
 [0.10137441 0.41706803 0.4815575 ]
 [0.654091   0.20507014 0.14083885]
 [0.15938976 0.4263528  0.4142575 ]
 [0.06802257 0.4273109  0.50466657]
 [0.09782738 0.4378166  0.464356  ]
 [0.5698882  0.25165915 0.1784527 ]
 [0.57981354 0.24040902 0.17977744]
 [0.1469108  0.43482336 0.41826582]
 [0.5574356  0.25614533 0.18641907]
 [0.6325835  0.21864952 0.14876701]
 [0.6171568  0.22619891 0.15664427]
 [0.09669524 0.41778788 0.48551688]
 [0.11569262 0.4236899  0.46061754]
 [0.08178679 0.41171157 0.5065016 ]
 [0.0365931  0.40756217 0.55584466]
 [0.65600246 0.20480445 0.13919301]
 [0.2340128  0.4079812  0.35800603]
 [0.06894407 0.38808358 0.54

39
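
To show what turning probabilities back into species involves, the sketch below decodes the first five prediction rows printed above by picking the column with the highest probability. This is an illustration in plain Python, not the encoder's own code; `decode` is a hypothetical helper, and the class order is assumed to be the alphabetical one used by the one-hot encoding.

```python
# Column order of the one-hot encoding (assumed alphabetical).
classes = ["setosa", "versicolor", "virginica"]

def decode(probabilities):
    # Pick the class whose column has the highest predicted probability.
    best = max(range(len(probabilities)), key=lambda k: probabilities[k])
    return classes[best]

# First five rows of the prediction output printed above.
predicted_probs = [
    [0.1560715, 0.43407384, 0.40985462],
    [0.05667948, 0.3787543, 0.56456625],
    [0.09168334, 0.4225641, 0.48575258],
    [0.60796034, 0.22692421, 0.16511542],
    [0.635304, 0.21213815, 0.15255783],
]
labels = [decode(row) for row in predicted_probs]
print(labels)
# ['versicolor', 'virginica', 'virginica', 'setosa', 'setosa']

# Accuracy is the share of correct predictions; this run got 39 of 45 right.
accuracy = 39 / 45
print(round(accuracy, 3))   # 0.867
```

The setosa rows are predicted with clear margins (around 0.6), while versicolor and virginica are often separated by only a few hundredths, which is where the misclassifications are most likely to come from after only 5 epochs.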