# IRIS dataset classification using ANN

In 1936, the British statistician and biologist Ronald Fisher introduced the Iris flower data set in his paper "The use of multiple measurements in taxonomic problems" as an example of linear discriminant analysis.

Two of the three species were collected on the Gaspé Peninsula "all from the same pasture, collected on the same day and measured at the same time by the same person with the same instrument".

The dataset consists of 50 samples from each of the three species of iris.

Four traits were measured for each sample: the length and width of the sepals and petals, in centimetres.

The task is to identify the iris species from the dimensions of its sepals and petals.

- https://en.wikipedia.org/wiki/Iris_flower_data_set
- https://www.kaggle.com/arshid/iris-flower-dataset

# Task

Your goal is to create a classification neural network that distinguishes the different species of iris.

We will use the same data as in the k-means/SVM chapter, so you can build on that work.

As you work through this task, you will complete the following steps in turn:

- Retrieve the data and display the basic statistics
- Use one-hot (binary) encoding to prepare the species labels
- Standardize the input values using StandardScaler
- Split the data into training, validation and test sets

- Create a neural network model, don't be afraid to experiment with the number of hidden layers and their size
- You will probably need Dense layers; you can also try the Dropout layer
- Use ReLU as the activation function in the hidden layers and a Softmax layer for the output

- For training, use the categorical_crossentropy loss function
- Use Adam as the optimization algorithm
- You can also track additional metrics, such as accuracy

- Save the trained model to a file

- Test the trained model, find R2 and accuracy
- Plot the confusion matrix

- Plot the learning history

- Test the model on your own input data (run inference)

# Data retrieval and basic statistics
- data preview
- basic statistics
- NULL values

In [1]:
import pandas as pd
# Forward slashes avoid the invalid "\d" and "\I" escape sequences in the original path
data = pd.read_csv('../dataset/IRIS.csv')
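
The three checks listed above can be sketched with pandas as follows. So that the snippet runs on its own, it builds a small stand-in frame from six rows of the Iris data instead of reading the CSV. The column names and the `Iris-...` label spelling are assumptions about the Kaggle file and should be checked against the real data.

```python
import pandas as pd

# Stand-in for the loaded CSV: six rows of the Iris data, two per species.
# Column names and label spelling are assumptions about the Kaggle file.
data = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["Iris-setosa", "Iris-setosa",
                "Iris-versicolor", "Iris-versicolor",
                "Iris-virginica", "Iris-virginica"],
})

# Data preview
print(data.head())

# Basic statistics of the numeric columns (count, mean, std, quartiles)
summary = data.describe()
print(summary)

# Number of missing values per column
null_counts = data.isnull().sum()
print(null_counts)

# Number of samples per species (should be balanced in the full data set)
class_counts = data["species"].value_counts()
print(class_counts)
```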

# Data visualization
- distribution of values

# Data preparation
- one-hot (binary) encoding of the species labels
- standardization of input variables
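
One possible way to carry out these two steps, together with the split from the task list, is sketched below with pandas and scikit-learn. It runs on a six-row stand-in frame (column names are assumed, as is the label spelling), so the split sizes are given as absolute row counts. On the full 150-row data set, fractions such as `test_size=0.2` and the `stratify` argument would be the more usual choice.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in for the loaded CSV (assumed column names, six real Iris rows).
data = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["Iris-setosa", "Iris-setosa",
                "Iris-versicolor", "Iris-versicolor",
                "Iris-virginica", "Iris-virginica"],
})

feature_cols = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
X = data[feature_cols].to_numpy()

# One-hot encode the species: one column per class, a single 1 in each row.
y = pd.get_dummies(data["species"]).to_numpy(dtype="float32")

# Hold out a test set first, then take a validation set from the remainder.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=1, random_state=0)

# Fit the scaler on the training rows only, then reuse it for the other sets
# so that no information from validation or test data leaks into training.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_val_s = scaler.transform(X_val)
X_test_s = scaler.transform(X_test)

print(X_train_s.shape, X_val_s.shape, X_test_s.shape)
```

The fitted `scaler` has to be kept alongside the model, because any new sample must be transformed with the same means and standard deviations before prediction.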

# Neural network modelling

Neural network model
* Sequential model - data flows from the input layer through hidden layers to the output layer.
* Dense layer connects each neuron from the previous layer to each neuron in the current layer.
* Dropout layer randomly sets a fraction of its input units to 0 at each training step, which helps prevent overfitting. It is inactive during prediction.
* ReLU and Softmax activation functions are used.
* Softmax is used in the output layer because this is a multi-class classification task: it turns the outputs into class probabilities that sum to 1.

* Compilation is the last step in model building.
* The optimizer we use is Adam. Adam is an optimization algorithm that can be used instead of the classical stochastic gradient descent procedure to iteratively update the network weights based on the training data.
* Loss function - categorical_crossentropy, a loss function for multi-class classification with one-hot encoded labels (https://keras.io/api/losses/probabilistic_losses/#categoricalcrossentropy-class).
* Accuracy is tracked as an additional metric during training.
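
A minimal Keras sketch of such a model, assuming TensorFlow with its bundled Keras is installed. The layer sizes are chosen to be consistent with the summary printed below (4 inputs, two hidden layers of 16 units, 3 outputs, 403 parameters). The dropout rate of 0.2 is an assumption, since the summary does not record it.

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(4,)),                      # four measurements per flower
    keras.layers.Dense(16, activation="relu"),    # 4*16 + 16 = 80 parameters
    keras.layers.Dropout(0.2),                    # rate is an assumed value
    keras.layers.Dense(16, activation="relu"),    # 16*16 + 16 = 272 parameters
    keras.layers.Dropout(0.2),                    # rate is an assumed value
    keras.layers.Dense(3, activation="softmax"),  # 16*3 + 3 = 51 parameters
])

# Compilation attaches the optimizer, the loss and the tracked metrics.
model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

model.summary()
```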


Model: "sequential"
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 16)                80                                                                         
 dropout (Dropout)           (None, 16)                0                                                                          
 dense_1 (Dense)             (None, 16)                272                                                                        
 dropout_1 (Dropout)         (None, 16)                0                                                                          
 dense_2 (Dense)             (None, 3)                 51        
                                                                 
Total params: 403
Trainable params: 403
Non-trainable params: 0

![image.png](attachment:image.png)

# Save the model
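
A sketch of saving and reloading a model, assuming a recent TensorFlow/Keras version that supports the `.keras` file format. The small model here is only a stand-in for the trained network, and the file name `iris_model.keras` is hypothetical.

```python
import os
import tempfile

import numpy as np
from tensorflow import keras

# Stand-in model; in the notebook this would be the trained network.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(3, activation="softmax"),
])

# Save architecture and weights to a single file (hypothetical file name).
path = os.path.join(tempfile.mkdtemp(), "iris_model.keras")
model.save(path)

# Reload the model and check that it gives the same predictions.
restored = keras.models.load_model(path)
sample = np.array([[1.0, 2.0, 1.0, 2.0]], dtype="float32")
same = np.allclose(
    model.predict(sample, verbose=0),
    restored.predict(sample, verbose=0),
)
print("Predictions match after reload:", same)
```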

# Show learning history
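
`model.fit` returns a `History` object whose `history` dictionary holds one value per epoch for the loss and for each tracked metric. The sketch below trains a small stand-in model for a few epochs on six rows of the Iris data and plots those curves with matplotlib. Reusing the training rows as validation data is done here only so that the `val_` curves exist; in the notebook the real validation set would be passed instead.

```python
import matplotlib.pyplot as plt
import numpy as np
from tensorflow import keras

# Six rows of the Iris data (two per species), used only as a stand-in.
X = np.array([
    [5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4], [6.4, 3.2, 4.5, 1.5],
    [6.3, 3.3, 6.0, 2.5], [5.8, 2.7, 5.1, 1.9],
], dtype="float32")
y = keras.utils.to_categorical([0, 0, 1, 1, 2, 2], num_classes=3)

model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# validation_data reuses the training rows purely for illustration.
history = model.fit(X, y, epochs=5, validation_data=(X, y), verbose=0)

# Plot loss and accuracy per epoch for training and validation.
fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))
ax_loss.plot(history.history["loss"], label="training")
ax_loss.plot(history.history["val_loss"], label="validation")
ax_loss.set_xlabel("epoch")
ax_loss.set_ylabel("loss")
ax_loss.legend()
ax_acc.plot(history.history["accuracy"], label="training")
ax_acc.plot(history.history["val_accuracy"], label="validation")
ax_acc.set_xlabel("epoch")
ax_acc.set_ylabel("accuracy")
ax_acc.legend()
fig.tight_layout()
```

A validation loss that starts rising while the training loss keeps falling is the usual sign of overfitting.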

# Model Verification
- Prediction of test values
- Accuracy
- Confusion matrix of training and test data
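
The verification steps can be sketched with scikit-learn as shown below. The probability array is a made-up placeholder standing in for the output of `model.predict`, not a result from the trained network. It only illustrates how softmax outputs and one-hot labels are turned back into class indices before scoring.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

class_names = ["Iris setosa", "Iris versicolor", "Iris virginica"]

# One-hot encoded true labels for six test samples.
y_test = np.array([
    [1, 0, 0], [1, 0, 0],
    [0, 1, 0], [0, 1, 0],
    [0, 0, 1], [0, 0, 1],
])

# Placeholder for model.predict(X_test): one row of class probabilities
# per sample. These numbers are illustrative only.
probabilities = np.array([
    [0.90, 0.08, 0.02], [0.85, 0.10, 0.05],
    [0.10, 0.70, 0.20], [0.05, 0.40, 0.55],
    [0.02, 0.18, 0.80], [0.01, 0.09, 0.90],
])

# Convert one-hot labels and probabilities to class indices.
true_classes = y_test.argmax(axis=1)
predicted_classes = probabilities.argmax(axis=1)

accuracy = accuracy_score(true_classes, predicted_classes)
matrix = confusion_matrix(true_classes, predicted_classes, labels=[0, 1, 2])

print(f"Accuracy: {accuracy:.3f}")
print(matrix)  # rows: true class, columns: predicted class
```

Off-diagonal entries of the matrix show which species are confused with each other. Repeating the same calls on the training data gives the second confusion matrix mentioned above.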

# Using the model
For the sample input below, the predicted class should be Iris virginica.

In [None]:
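import numpy as np

# Assumes `model` is the trained network and that the four input values
# are already standardized in the same way as the training data.
sample = np.array([[1, 2, 1, 2]], dtype="float32")
test_predictions = model.predict(sample)
print(f"Iris setosa:     {test_predictions[0][0]}")
print(f"Iris versicolor: {test_predictions[0][1]}")
print(f"Iris virginica:  {test_predictions[0][2]}")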