# Iris Classification with TensorFlow

In this project, we build and train a model to classify Iris flowers based on sample data. The algorithm we chose to implement is a multilayer perceptron. The Iris data set is available from the University of California at Irvine (UCI) Machine Learning Repository and consists of measurements of 150 Iris flowers. We focus on learning three Iris species: Setosa, Versicolor, and Virginica. The dataset is characterized by five attributes:
    1. Sepal Length in centimeters
    2. Sepal Width in centimeters
    3. Petal Length in centimeters
    4. Petal Width in centimeters
    5. Target class (Setosa, Versicolor, Virginica)

|![setosa](/images/iris-setosa-1.jpeg)|![versicolor](/images/Iris_versicolor_3.jpg)|![virginica](/images/iris_virginica.jpeg)|
|:---|:---:|---:|
|Iris Setosa|Iris Versicolor|Iris Virginica|

## Learning Algorithm

### Multilayer Artificial Neural Network

We designed a neural network with a very simple structure. The network contains four layers: an input layer, two hidden layers, and an output layer. The input layer consists of four nodes and receives a vector of Iris features: Sepal Length, Sepal Width, Petal Length, and Petal Width. Each of the two hidden layers has 10 nodes. We chose the rectified linear unit (ReLU) as the activation function. The predicted probability of each Iris class is sent to the output layer, whose nodes represent the Iris flower classes. The figure below shows the neural network model.

![NeuralNet.png](/images/NeuralNet.png)
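As a quick check on the size of this network, the trainable parameters can be counted by hand. The sketch below assumes the layer sizes described above (4 inputs, two hidden layers of 10 nodes, 3 outputs), dense connections between consecutive layers, and one bias per receiving neuron. The helper name `count_parameters` is ours and is only for illustration.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network."""
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out  # one weight per connection
        total += fan_out           # one bias per neuron in the receiving layer
    return total


# Assumed layer sizes: input, hidden, hidden, output.
layer_sizes = [4, 10, 10, 3]
print(count_parameters(layer_sizes))  # 193
```

The total breaks down as 50 parameters into the first hidden layer, 110 into the second, and 33 into the output layer.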

There are various types of neural networks. In this project, we chose to build a fully connected neural network, also known as a dense network. Each neuron in the input layer is connected to every neuron in the first hidden layer. The same applies to the connections into the second hidden layer and the final output layer. A single bias unit is connected to each neuron in the hidden layers. Each neuron in the hidden layers computes the weighted sum of its inputs to form a scalar net activation. We write the equation as below:

\begin{equation*}
\mathbf{y} = \mathbf{\sum_{i=1}^n x_i w_i} + \mathbf{bias}
\end{equation*}

The subscript i indexes the neurons in the input layer, and w denotes the input-to-hidden weights at the hidden layer neurons. Such weights are also called synapses, and the values of the connections are called synaptic weights. The output y can be thought of as a function of the input feature vector x. When there are k output neurons, we can think of the network as computing k discriminant functions and classifying the input according to which discriminant function is largest.
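The weighted sum and the "largest discriminant wins" rule can be sketched in plain Python. The feature vector, weights, and biases below are made-up values chosen for easy arithmetic, not trained parameters, and the function name `net_activation` is ours.

```python
def net_activation(x, w, bias):
    """Weighted sum of the inputs plus a bias: y = sum_i x_i * w_i + bias."""
    return sum(x_i * w_i for x_i, w_i in zip(x, w)) + bias


# Hypothetical feature vector:
# sepal length, sepal width, petal length, petal width (cm).
x = [5.0, 3.5, 1.5, 0.5]

# Hypothetical weights and biases for k = 3 output neurons (not trained).
weights = [
    [0.5, 1.0, -1.0, -1.0],
    [0.0, 0.0, 0.5, 0.0],
    [-0.5, -1.0, 1.0, 1.0],
]
biases = [0.0, 0.0, 0.0]

# One discriminant value per output neuron.
activations = [net_activation(x, w, b) for w, b in zip(weights, biases)]

# Classify by picking the index of the largest discriminant value.
predicted_class = max(range(len(activations)), key=lambda k: activations[k])

print(activations, predicted_class)  # [4.0, 0.75, -4.0] 0
```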

In this project, we express the neural network as the mathematical function:

\begin{equation*}
\mathbf{output} = \sum_{j=1}^{n} w_{jm}\left(\sum_{i=1}^{n} w_{in}\left(\sum_{q=1}^{n} w_{qo}\, x_q + \mathrm{bias}_q\right) + \mathrm{bias}_n\right) + \mathrm{bias}_m
\end{equation*}

Here w_jm denotes the weights in layer m, and w_in denotes the weights in layer n. The terms bias_q, bias_n, and bias_m denote the bias vectors in layers q, n, and m, respectively. The output is a three-dimensional vector that represents the probabilities of the three classes.
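A forward pass through a network of this shape can be sketched with NumPy. This is an illustration under stated assumptions rather than the project's TensorFlow model: the weights are small random values, the two input rows are illustrative measurements, ReLU is applied after each hidden layer, and a softmax on the output layer (our assumption) turns the three scores into probabilities.

```python
import numpy as np


def relu(a):
    """Rectified linear unit: max(0, a), element-wise."""
    return np.maximum(0.0, a)


def softmax(a):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    shifted = a - a.max(axis=-1, keepdims=True)  # subtract max for stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def forward(x, params):
    """Forward pass: input -> hidden -> hidden -> output."""
    (w1, b1), (w2, b2), (w3, b3) = params
    h1 = relu(x @ w1 + b1)
    h2 = relu(h1 @ w2 + b2)
    return softmax(h2 @ w3 + b3)


# Assumed layer sizes with random, untrained weights and zero biases.
rng = np.random.default_rng(0)
sizes = [4, 10, 10, 3]
params = [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
          for m, n in zip(sizes, sizes[1:])]

# Two illustrative rows of (sepal length, sepal width, petal length, petal width).
batch = np.array([[5.1, 3.5, 1.4, 0.2],
                  [6.4, 2.8, 5.6, 2.2]])
probs = forward(batch, params)
print(probs.shape)  # (2, 3)
```

Each row of `probs` holds one probability per class, so the predicted class for a row is the index of its largest entry.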

### Training Error

The training error for the model that we build and train in this project is the least mean square error, which is the sum over the output neurons of the squared difference between the desired value z and the actual output t. The equation is given below:

\begin{equation*}
\mathbf{E\left(w\right)} = \mathbf{\frac{1}{2}} \mathbf{\sum_{k=1}^n \left(z_k-t_k\right)^2}
\end{equation*}

where z and t are the desired value and the actual network output, respectively. The purpose of applying the least mean square error with gradient descent is to train the model by learning the weights. The weights are initialized to zero and then changed in the direction that reduces the error:
\begin{align}
    \mathbf{\Delta w} = \mathbf{-n} \frac{\partial\mathbf{E}}{\partial\mathbf{w}}
\end{align}

where n is the learning rate, which sets the relative size of the change in the weights. The weight values are updated as follows:

\begin{align}
        \mathbf{W(k+1)}=\mathbf{W(k)} \mathbf{-n} \frac{\partial\mathbf{E}}{\mathbf{\partial w}}
\end{align}

Because the error function is given analytically, it is differentiable. For a single linear output t = w^t x, the partial derivative with respect to w is:

\begin{align}
\frac{\partial{\mathbf{E}}}{\partial\mathbf{w}}=-\mathbf{(z-w^t x)x}
\end{align}

Substituting the result into the weight update equation, the two minus signs cancel and the resulting update is:

\begin{align}
\mathbf{W(k+1)}=\mathbf{W(k)} + \mathbf{n(z-w^tx)x}
\end{align}

where n is the learning rate, in the range 0<n<2 for the learning process to converge. In practice, we iterate the learning process until a specified threshold is reached. The final computation in each hidden-layer neuron is the activation function. We chose the rectified linear unit (ReLU) as the activation function for our network.
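The error and the weight update can be traced by hand for a single linear neuron. The sketch below uses one made-up training sample and a learning rate of 0.1, both assumptions chosen for illustration, and starts the weights at zero as described above. The step direction follows from the gradient: since the partial derivative is -(z - w^t x)x, moving against it adds n(z - w^t x)x to the weights.

```python
def predict(w, x):
    """Linear output w^T x."""
    return sum(w_i * x_i for w_i, x_i in zip(w, x))


def squared_error(w, x, z):
    """E(w) = 1/2 * (z - w^T x)^2 for one sample."""
    return 0.5 * (z - predict(w, x)) ** 2


def lms_update(w, x, z, n):
    """One gradient-descent step: w <- w + n * (z - w^T x) * x."""
    residual = z - predict(w, x)
    return [w_i + n * residual * x_i for w_i, x_i in zip(w, x)]


x = [1.0, 2.0]  # hypothetical input
z = 1.0         # hypothetical desired output
n = 0.1         # learning rate (assumed value)
w = [0.0, 0.0]  # weights start at zero, as in the text

# Record the error before each update to watch it shrink.
errors = []
for step in range(5):
    errors.append(squared_error(w, x, z))
    w = lms_update(w, x, z, n)

print(errors)
```

With these values the residual halves on every step, so the recorded error falls steadily toward zero.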

## Training Dataset Processing

In this project, we download the training dataset from "https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv" and store it on the local hard drive.

We use tf.data.experimental.make_csv_dataset to parse the training dataset. The result is a dataset of (features, label) tuples, where the features are the Iris measurements and the label is the corresponding class. Each element of the dataset is a batch, and the batch size is the number of rows in that batch.

In [None]:
# The training file has 120 rows of Iris features.
# We read the data in batches of 32 rows.

import tensorflow as tf
import matplotlib.pyplot as plt

# Local path to the downloaded training file. The dataset itself is
# created further below, once column_names and label_name are defined.
train_file_fp = "C:/Users/JunnanLu123/iris_training.csv"

In [None]:
# The columns of the data set.

column_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']

# The first four fields are flower features 
# representing flower measurements.

feature_names = column_names[:-1]

# The last column is the flower label, which we
# want the model to predict with high accuracy.

label_name = column_names[-1]

In [None]:
# Each class is associated with a string name such as setosa.
# The integer labels map to the class names by list index:
# 0, Iris setosa
# 1, Iris versicolor
# 2, Iris virginica

class_names = ['Iris setosa', 'Iris versicolor', 'Iris virginica']

In [None]:
# The input data is a file of CSV format. Therefore, we call
# tf.data.experimental.make_csv_dataset to make the training
# dataset. The make_csv_dataset returns a tf.data.Dataset which
# consists of feature and label pairs.

train_dataset = tf.data.experimental.make_csv_dataset(
    train_file_fp,
    batch_size=32,
    column_names=column_names,
    label_name=label_name,
    num_epochs=1)

In [None]:
# The train_dataset object is iterable. Each element is a tuple
# of (features, label), where the label denotes the flower
# class.

features, label = next(iter(train_dataset))

Each row of the sample data corresponds to one entry in each feature array, and the rows are grouped together in a batch. We set the batch size to 32.
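To see how the 120 training rows split into batches of 32, the arithmetic can be sketched as follows. This assumes a single pass over the data in which the final, partial batch is kept rather than dropped. The helper name `batch_sizes` is ours.

```python
def batch_sizes(num_rows, batch_size):
    """Sizes of consecutive batches for one pass over num_rows rows."""
    sizes = []
    for start in range(0, num_rows, batch_size):
        # The last batch holds whatever rows remain.
        sizes.append(min(batch_size, num_rows - start))
    return sizes


print(batch_sizes(120, 32))  # [32, 32, 32, 24]
```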

In [None]:
# We call tf.stack to combine the per-column feature tensors
# along the specified axis, packing the features into a
# single array of shape (batch_size, num_features).

def pack(features, label):
    features = tf.stack(list(features.values()), axis=1)
    return features, label
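The effect of stacking along axis 1 can be seen on a small example. The sketch below uses NumPy's `np.stack` as a stand-in for `tf.stack` (an assumption made so the example runs without TensorFlow) and a hypothetical batch of three rows stored column by column.

```python
import numpy as np

# A hypothetical batch of 3 rows, stored as one array per feature column.
features = {
    'sepal_length': np.array([5.1, 6.4, 5.9]),
    'sepal_width': np.array([3.5, 2.8, 3.0]),
    'petal_length': np.array([1.4, 5.6, 4.2]),
    'petal_width': np.array([0.2, 2.2, 1.5]),
}

# Stacking along axis 1 places the four columns side by side,
# so each row of the result holds the four features of one flower.
packed = np.stack(list(features.values()), axis=1)

print(packed.shape)  # (3, 4)
print(packed[0])     # the four features of the first flower
```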

In [None]:
# Apply pack to every (features, label) pair in the training
# dataset by using tf.data.Dataset.map.

train_dataset = train_dataset.map(pack)

We plot the flower features from one batch to visualize them. Each batch holds 32 rows of feature and label pairs.

In [None]:
# Visualize the flower data from the dataset batch.
# The batch size is 32 rows of feature and label pairs.

plt.scatter(features['petal_length'], features['petal_width'], c=label, cmap='viridis')
plt.xlabel('petal_length')
plt.ylabel('petal_width')
plt.show()

The figure on the left displays the distribution of features for sepal length and sepal width. The figure on the right displays the features for petal length and petal width. 

|![sepal](/images/Feature1.png)|![petal](/images/Feature2.png)|
|:---|---:|
|Sepal Length and Width|Petal Length and Width|