# Iris classification
Classification neural networks (NN) will often have as many output neurons as there are possible classes. The NN will return a confidence for each output as a normalized value from 0 to 1.

In [1]:
import pandas as pd
import tensorflow as tf
import numpy as np 
from sklearn import metrics

### Fetching data
We use pandas to read in a csv file over HTTP.

In [9]:
df = pd.read_csv(
    "https://data.heatonresearch.com/data/t81-558/iris.csv", 
    na_values=['NA', '?']
)
print(df.shape)
df[:5]

(150, 5)


| | sepal_l | sepal_w | petal_l | petal_w | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |


Next we convert our data into something we can pass to the NN. We use pandas `get_dummies` to one-hot encode the species labels: each class becomes its own indicator column, set to 1 for rows of that species and 0 otherwise.

In [26]:
columns = ['sepal_l', 'sepal_w', 'petal_l', 'petal_w']
x = df[columns].values

dummies = pd.get_dummies(df['species'])
y = dummies.values
print(y.shape)

(150, 3)


In [27]:
dummies[:5]

| | Iris-setosa | Iris-versicolor | Iris-virginica |
|---|---|---|---|
| 0 | 1 | 0 | 0 |
| 1 | 1 | 0 | 0 |
| 2 | 1 | 0 | 0 |
| 3 | 1 | 0 | 0 |
| 4 | 1 | 0 | 0 |
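
To make the encoding concrete, here is a minimal sketch of the same idea in plain NumPy. The `labels` array is a hypothetical stand-in for `df['species']`, and this is only an illustration of one-hot encoding, not how `get_dummies` is implemented internally.

```python
import numpy as np

# Hypothetical label column, standing in for df['species']
labels = np.array(
    ["Iris-setosa", "Iris-virginica", "Iris-versicolor", "Iris-setosa"]
)

# Sorted unique class names and, for each row, the index of its class
classes, codes = np.unique(labels, return_inverse=True)

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(len(classes), dtype=np.uint8)[codes]

print(classes)
print(one_hot)
```

Each row contains exactly one 1, in the column of its class, which is the shape of target the 3 neuron output layer is trained against.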


### Building the NN
For our classification NN, we will use a [dense NN](https://heartbeat.fritz.ai/classification-with-tensorflow-and-dense-neural-networks-8299327a818a) with two hidden layers and a 3-neuron output layer using a [softmax](https://en.wikipedia.org/wiki/Softmax_function) activation. Softmax, also known as the *normalized exponential function*, is defined as
$$
\sigma(\vec{z})_i = \frac{e^{z_i}}{\sum^K_{j=1}e^{z_j}}, \ \ \ \  i \in [1, K]\ \ \text{and} \ \ \vec{z} \in \mathbb{R}^K
$$
This function normalizes the vector $\vec{z}$ into a probability distribution of K probabilities, such that $\sigma(\vec{z})_i \in [0, 1]$, and
$$
\sum_{i=1}^K \sigma(\vec{z})_i = 1
$$
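
The two properties above can be checked numerically. The following is a minimal NumPy sketch; the input vector `z` is a hypothetical set of raw outputs, and subtracting the maximum is a common numerical-stability trick that is an addition here, not part of the definition.

```python
import numpy as np

def softmax(z):
    """Normalized exponential of a 1-D array of scores."""
    # Subtracting the maximum leaves the result unchanged but avoids overflow
    shifted = z - np.max(z)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

z = np.array([2.0, 1.0, -1.0])  # hypothetical raw outputs for K = 3 classes
probs = softmax(z)

print(probs)        # each entry lies between 0 and 1
print(probs.sum())  # the entries sum to 1 (up to floating point error)
```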

In physics, this is the [Boltzmann distribution](https://en.wikipedia.org/wiki/Boltzmann_distribution), with the denominator being the partition function $Z$ and the exponents mapping to negative energies, $z_i = -E_i$, scaled by a 'coldness' parameter $\beta = 1 / kT$, such that
$$
e^{z_i} \rightarrow e^{\beta z_i}
$$
A $\beta$ of this kind may also appear in NNs, where it is equivalent to changing the base of the exponential from $e$ to $e^{\beta}$.
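
The link between $\beta$ and the base can be checked with a short sketch. The function names `softmax_beta` and `softmax_base` are hypothetical helpers written for this illustration, and the scores in `z` are made up.

```python
import numpy as np

def softmax_beta(z, beta=1.0):
    """Softmax of beta * z; beta = 1 recovers the usual softmax."""
    scaled = beta * z
    scaled = scaled - scaled.max()  # numerical stability only
    exps = np.exp(scaled)
    return exps / exps.sum()

def softmax_base(z, base):
    """Same normalization, but with base ** z instead of e ** z."""
    powers = base ** z
    return powers / powers.sum()

z = np.array([2.0, 1.0, -1.0])  # hypothetical raw outputs
beta = 3.0

with_beta = softmax_beta(z, beta)
with_base = softmax_base(z, np.exp(beta))

print(with_beta)
print(with_base)
print(softmax_beta(z, 1.0))
```

The first two results agree, since $e^{\beta z} = (e^{\beta})^z$. Compared with $\beta = 1$, the larger $\beta$ concentrates more of the probability on the largest entry.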

We choose softmax for the output layer activation, as it normalizes our output for us implicitly.

*Note*: 'dense' in this context describes a layer in which each neuron is fully connected to every neuron of the next layer.

In [28]:
model = tf.keras.models.Sequential()

model.add( # hidden 1
    tf.keras.layers.Dense(
        50,
        input_dim=x.shape[1],
        activation='relu'
    )
)
model.add( # hidden 2
    tf.keras.layers.Dense(
        25,
        activation='relu'
    )
)
model.add( # output
    tf.keras.layers.Dense(
        y.shape[1],
        activation='softmax'
    )
)
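
To illustrate what 'fully connected' means for the layers above, here is a rough NumPy sketch of a forward pass through layers of the same sizes (4 inputs, then 50, 25 and 3 neurons). The weights are random placeholders rather than trained values, and the helper names are hypothetical, so the output probabilities are meaningless except for their shape and normalization.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(v):
    return np.maximum(v, 0.0)

def softmax(v):
    exps = np.exp(v - v.max())
    return exps / exps.sum()

def dense(inputs, weights, bias, activation):
    # Fully connected: every input feeds every output through the weight matrix
    return activation(inputs @ weights + bias)

# Random placeholder parameters with the layer sizes 4 -> 50 -> 25 -> 3
sizes = [4, 50, 25, 3]
params = [
    (rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(sizes[:-1], sizes[1:])
]

sample = np.array([5.1, 3.5, 1.4, 0.2])  # first row of the iris data
h1 = dense(sample, *params[0], relu)     # hidden 1
h2 = dense(h1, *params[1], relu)         # hidden 2
out = dense(h2, *params[2], softmax)     # output

print(h1.shape, h2.shape, out.shape)
print(out.sum())
```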

We use [categorical cross-entropy](https://en.wikipedia.org/wiki/Cross_entropy) as the loss function. For discrete probability distributions $p$ and $q$, it is defined as
$$
H(p, q) = - \sum_{x \in \chi} p(x) \log q(x)
$$
giving the cross-entropy of distribution $q$ relative to distribution $p$ over a given set (in this case, the support $\chi$ is the set of class indices of the output vector), where $p$ is the true probability and $q$ is the predicted probability. In the two-class case, the predicted probability is modelled with the logistic function
$$
\frac{1}{1 + e^{-z}}
$$
of which the softmax used here is the multi-class generalization.

We use this loss function because it handles an output with any number of classes, and it is well suited for classification: the true probability is usually a one-hot vector, so the sum reduces to $-\log q$ of the correct class and thus measures how good the model's prediction is (discussed a little [here](https://peltarion.com/knowledge-center/documentation/modeling-view/build-an-ai-model/loss-functions/categorical-crossentropy)).
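
As a worked example of the formula for $H(p, q)$, the sketch below evaluates it for a single sample with a one-hot target. The two prediction vectors are hypothetical, and the `eps` clipping is an added guard against $\log 0$, not part of the definition.

```python
import numpy as np

def categorical_cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_x p(x) log q(x) for two discrete distributions."""
    q = np.clip(q, eps, 1.0)  # guard against log(0)
    return -np.sum(p * np.log(q))

p = np.array([1.0, 0.0, 0.0])         # one-hot truth: class 0
q_good = np.array([0.9, 0.08, 0.02])  # hypothetical confident, correct prediction
q_bad = np.array([0.2, 0.5, 0.3])     # hypothetical wrong prediction

loss_good = categorical_cross_entropy(p, q_good)
loss_bad = categorical_cross_entropy(p, q_bad)

print(loss_good, -np.log(0.9))  # with a one-hot p, only the true class contributes
print(loss_bad)                 # a worse prediction gives a larger loss
```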

In [29]:
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.fit(x, y, verbose=0, epochs=100)

<tensorflow.python.keras.callbacks.History at 0x150318ac0>

### Results
We consider the column with the highest probability to be the 'prediction' of the neural network:

In [36]:
# extract species names from raw data 
species = dummies.columns

predictions = model.predict(x)
# round each probability to 0 or 1, then cast to int
pred_as_int = np.around(predictions).astype(np.uint8)
print(f"{predictions[0]} -> {pred_as_int[0]}")

[9.9815530e-01 1.8442589e-03 4.2634110e-07] -> [1 0 0]
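
One caveat on rounding: it only produces a one-hot vector when a single class has a probability above 0.5. A small sketch with a hypothetical, less confident softmax output shows the difference from picking the largest entry:

```python
import numpy as np

probs = np.array([0.40, 0.35, 0.25])  # hypothetical softmax output

rounded = np.around(probs).astype(np.uint8)  # no entry exceeds 0.5
best = int(np.argmax(probs))                 # index of the largest probability

print(rounded)
print(best)
```

Here rounding yields a row of zeros and so names no class at all, while the largest entry still identifies one.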


We can decode our predictions by selecting the maximal column.

In [39]:
pred_classes = np.argmax(predictions, axis=1)
expd_classes = np.argmax(y, axis=1)

# get 5 random examples
for i in np.random.randint(0, len(pred_classes), 5):
    print(f"--- index {i} -------")
    print(f"Prediction:\t{species[pred_classes[i]]}")
    print(f"Expected:\t{species[expd_classes[i]]}\n")

--- index 127 -------
Prediction:	Iris-virginica
Expected:	Iris-virginica

--- index 70 -------
Prediction:	Iris-versicolor
Expected:	Iris-versicolor

--- index 67 -------
Prediction:	Iris-versicolor
Expected:	Iris-versicolor

--- index 35 -------
Prediction:	Iris-setosa
Expected:	Iris-setosa

--- index 13 -------
Prediction:	Iris-setosa
Expected:	Iris-setosa



We can evaluate the accuracy of our model with an [accuracy score](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html), which is the fraction of correct predictions (here measured on the same data the model was trained on):

In [41]:
accuracy = metrics.accuracy_score(
    expd_classes,
    pred_classes
)
print(f"Accuracy = {accuracy}")

Accuracy = 0.98
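
Since the score is just the fraction of matching class indices, it can also be computed by hand. The arrays below are hypothetical, small enough to verify by eye: five of the six predictions are correct.

```python
import numpy as np

expected = np.array([0, 0, 1, 1, 2, 2])   # hypothetical true class indices
predicted = np.array([0, 0, 1, 2, 2, 2])  # hypothetical predictions, one mistake

# Element-wise comparison gives booleans; their mean is the fraction correct
accuracy = np.mean(expected == predicted)

print(f"Accuracy = {accuracy:.3f}")
```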
