# Iris - Species Classification

This notebook demonstrates a neural network built entirely from scratch using only NumPy. This example, located in the `examples` folder, shows how to classify flower species in the classic Iris dataset with our custom neural network implementation.

We will walk through loading the dataset, preparing the features and labels, training the network, and making predictions. The example highlights the flexibility and functionality of our self-made neural network.

The Iris dataset was used in R.A. Fisher's classic 1936 paper, "The Use of Multiple Measurements in Taxonomic Problems", and is also available from the UCI Machine Learning Repository. It contains three iris species with 50 samples each, along with four measurements per flower: sepal length, sepal width, petal length, and petal width, all in centimeters. One species is linearly separable from the other two, but those two are not linearly separable from each other.

Kaggle link: https://www.kaggle.com/datasets/uciml/iris 

In [None]:
# Retrieve the modules from the main folder
import sys
sys.path.insert(0, '..')

# Import pandas for loading the dataset and NumPy for array handling
import pandas as pd
import numpy as np

# Import the modules
from neural_network import NeuralNetwork
from utils import StandardScaler, OneHotEncoder, train_test_split, accuracy_score

In [21]:
# Import the dataset
df_iris = pd.read_csv('./datasets/Iris.csv')

df_iris.head(10)

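|   | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|----|---------------|--------------|---------------|--------------|---------|
| 0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| 5 | 6 | 5.4 | 3.9 | 1.7 | 0.4 | Iris-setosa |
| 6 | 7 | 4.6 | 3.4 | 1.4 | 0.3 | Iris-setosa |
| 7 | 8 | 5.0 | 3.4 | 1.5 | 0.2 | Iris-setosa |
| 8 | 9 | 4.4 | 2.9 | 1.4 | 0.2 | Iris-setosa |
| 9 | 10 | 4.9 | 3.1 | 1.5 | 0.1 | Iris-setosa |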


In [22]:
# Define the features and target variables
X = df_iris.drop(['Id', 'Species'], axis=1)
y = df_iris['Species']

# Print the shape
print(f'Shape of X: {X.shape}')
print(f'Shape of y: {y.shape}')

Shape of X: (150, 4)
Shape of y: (150,)
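The import cell brings in `StandardScaler`, but none of the cells shown here call it, so the features reach the network unscaled. For reference, standardization in plain NumPy could look roughly like the sketch below. `StandardScalerSketch` is a hypothetical stand-in, not the class from `utils`, whose interface is not shown in this notebook; the demo rows are copied from the table above.

```python
import numpy as np


class StandardScalerSketch:
    """Hypothetical stand-in: rescale each column to zero mean and unit variance."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        std = X.std(axis=0)
        # Guard against constant columns, which would otherwise divide by zero.
        self.scale_ = np.where(std == 0, 1.0, std)
        return self

    def transform(self, X):
        return (np.asarray(X, dtype=float) - self.mean_) / self.scale_

    def fit_transform(self, X):
        return self.fit(X).transform(X)


# Four rows taken from the head of the dataset (indices 0, 5, 6 and 9).
X_demo = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [5.4, 3.9, 1.7, 0.4],
    [4.6, 3.4, 1.4, 0.3],
    [4.9, 3.1, 1.5, 0.1],
])

scaler = StandardScalerSketch()
X_scaled = scaler.fit_transform(X_demo)
print(X_scaled.mean(axis=0))  # each column mean is close to 0
print(X_scaled.std(axis=0))   # each column standard deviation is close to 1
```

If scaling were applied, the usual practice is to compute the statistics on the training split only and reuse them for the test split.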


In [23]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
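
For readers curious what a split like this involves, here is a minimal sketch in NumPy. `train_test_split_sketch` is a hypothetical name, and both the shuffling method and the rounding rule (ceiling) are assumptions; the `train_test_split` in `utils` may behave differently, so the same `random_state` need not select the same rows.

```python
import math

import numpy as np


def train_test_split_sketch(X, y, test_size=0.33, random_state=None):
    """Hypothetical helper: shuffle the row indices once, then slice them."""
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(random_state)
    indices = rng.permutation(len(X))
    # Assumption: the test set size is rounded up.
    n_test = math.ceil(len(X) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


# Placeholder data with the same shape as the Iris features and labels.
X_demo = np.arange(150 * 4).reshape(150, 4)
y_demo = np.repeat(["Iris-setosa", "Iris-versicolor", "Iris-virginica"], 50)

X_tr, X_te, y_tr, y_te = train_test_split_sketch(
    X_demo, y_demo, test_size=0.33, random_state=42
)
print(X_tr.shape, X_te.shape)  # prints (100, 4) (50, 4)
```

Under these assumptions, 150 rows with `test_size=0.33` yield 100 training rows and 50 test rows.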

In [24]:
# One-hot encode the species labels of the training set
encoder = OneHotEncoder()
y_train = encoder.fit_transform(y_train)
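
One-hot encoding turns each species name into a vector with a single 1 in the position of its class, which is the target format a softmax output layer expects. The sketch below shows one way to encode and decode in NumPy. `OneHotSketch` is a hypothetical class, and the assumption that `decode` takes per-class scores and returns the highest-scoring label is ours; the `OneHotEncoder` in `utils` may work differently.

```python
import numpy as np


class OneHotSketch:
    """Hypothetical stand-in for a label encoder with a matching decoder."""

    def fit_transform(self, labels):
        labels = np.asarray(labels)
        # classes_ holds the sorted unique labels; codes maps each row to a class index.
        self.classes_, codes = np.unique(labels, return_inverse=True)
        return np.eye(len(self.classes_))[codes.reshape(-1)]

    def decode(self, scores):
        # Pick the class with the highest score in each row.
        return self.classes_[np.argmax(np.asarray(scores), axis=1)]


labels = ["Iris-virginica", "Iris-setosa", "Iris-versicolor", "Iris-setosa"]
enc = OneHotSketch()
encoded = enc.fit_transform(labels)
print(encoded[0])                           # prints [0. 0. 1.]
print(enc.decode([[0.1, 0.7, 0.2]]))        # prints ['Iris-versicolor']
```

Because decoding only needs the position of the largest value, it works equally well on exact one-hot rows and on softmax probabilities.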

In [26]:
# Define the neural network
nn = NeuralNetwork(
    layers=[4, 64, 32, 16, 3],
    activation='relu',
    output_activation='softmax',
    loss='cross_entropy',
    random_seed=42,
)

# Train the model
nn.train(X_train, y_train, epochs=10000, learning_rate=0.001)

# Make predictions
y_pred = nn.predict(X_test)

# Print the results
print(f'\nThe accuracy score of the model is: {accuracy_score(y_test, encoder.decode(y_pred)):.5f}')

Epoch: 0, loss: 1.0408009960707987
Epoch: 100, loss: 0.18233432358881585
Epoch: 200, loss: 0.14035904975297822
Epoch: 300, loss: 0.10762410200307433
Epoch: 400, loss: 0.09303426775379472
Epoch: 500, loss: 0.08229653588478401
Epoch: 600, loss: 0.07414433202983722
Epoch: 700, loss: 0.06769209338258604
Epoch: 800, loss: 0.062400637338680086
Epoch: 900, loss: 0.05804743490208458
Epoch: 1000, loss: 0.054407153247596124
Epoch: 1100, loss: 0.05133157846310049
Epoch: 1200, loss: 0.04870167677446249
Epoch: 1300, loss: 0.04642084981988036
Epoch: 1400, loss: 0.044432092278035844
Epoch: 1500, loss: 0.04264918782398259
Epoch: 1600, loss: 0.04105242784170817
Epoch: 1700, loss: 0.03966176108734922
Epoch: 1800, loss: 0.0384318282440061
Epoch: 1900, loss: 0.03732533011940888
Epoch: 2000, loss: 0.03633640162308161
Epoch: 2100, loss: 0.03545108490409183
Epoch: 2200, loss: 0.03465082525943873
Epoch: 2300, loss: 0.03392289488367526
Epoch: 2400, loss: 0.03325783928409135
Epoch: 2500, loss: 0.032648234459321
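
The loss values above come from the softmax output and cross-entropy loss selected in the constructor. As a rough sketch of what a forward pass through the `[4, 64, 32, 16, 3]` architecture computes, assuming plain dense layers with He-style random weights (the actual initialization and internals of `NeuralNetwork` are not shown in this notebook), consider:

```python
import numpy as np


def relu(z):
    return np.maximum(0.0, z)


def softmax(z):
    # Subtract the row maximum for numerical stability before exponentiating.
    shifted = z - z.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def cross_entropy(probs, one_hot, eps=1e-12):
    # Mean negative log-probability assigned to the true class.
    return float(-np.mean(np.sum(one_hot * np.log(probs + eps), axis=1)))


def forward(X, weights, biases):
    a = X
    # ReLU on every hidden layer, softmax on the output layer.
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ W + b)
    return softmax(a @ weights[-1] + biases[-1])


layer_sizes = [4, 64, 32, 16, 3]
rng = np.random.default_rng(42)
# Assumption: He-style initialization; the real network may initialize differently.
weights = [
    rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

X_demo = rng.normal(size=(5, 4))         # placeholder inputs, not Iris rows
y_demo = np.eye(3)[[0, 1, 2, 0, 1]]      # placeholder one-hot targets

probs = forward(X_demo, weights, biases)
loss = cross_entropy(probs, y_demo)
uniform_loss = cross_entropy(np.full((5, 3), 1.0 / 3.0), y_demo)
print(probs.sum(axis=1))  # each row of probabilities sums to 1
print(uniform_loss)       # close to ln(3), about 1.0986
```

A network that assigns equal probability to all three classes has a cross-entropy of ln 3 ≈ 1.0986, which is in the same range as the loss of about 1.04 logged at epoch 0.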

In [27]:
# Predict on the full feature matrix and decode the outputs back to species names.
# Reusing X keeps the cell safe to re-run after 'PredictedSpecies' has been added.
full_prediction = encoder.decode(nn.predict(X))
df_iris['PredictedSpecies'] = full_prediction
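
The accuracy reported in this notebook is the fraction of rows whose predicted label matches the true label. A minimal sketch, assuming `accuracy_score` from `utils` follows that usual definition (`accuracy_sketch` is a hypothetical name, and the mismatches below are made up for illustration):

```python
import numpy as np


def accuracy_sketch(y_true, y_pred):
    """Hypothetical helper: share of positions where the two label arrays agree."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.mean(y_true == y_pred))


y_true = np.repeat(["Iris-setosa", "Iris-versicolor", "Iris-virginica"], 50)
y_pred = y_true.copy()
y_pred[50:55] = "Iris-virginica"  # five hypothetical mismatches

acc = accuracy_sketch(y_true, y_pred)
print(f"{acc:.5f}")  # prints 0.96667
```

For instance, 145 matches out of 150 rows give 145 / 150 ≈ 0.96667.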

In [None]:
# Print the final results.
# Note: the full dataset includes the training rows, so this is not a held-out estimate.
print(f'The accuracy over the full dataset is: {accuracy_score(np.array(df_iris["Species"]), full_prediction):.5f}')

# Show the table
df_iris

The accuracy over the full dataset is: 0.96667


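|   | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species | PredictedSpecies |
|---|----|---------------|--------------|---------------|--------------|---------|------------------|
| 0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa | Iris-setosa |
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa | Iris-setosa |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa | Iris-setosa |
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa | Iris-setosa |
| 4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa | Iris-setosa |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 145 | 146 | 6.7 | 3.0 | 5.2 | 2.3 | Iris-virginica | Iris-virginica |
| 146 | 147 | 6.3 | 2.5 | 5.0 | 1.9 | Iris-virginica | Iris-virginica |
| 147 | 148 | 6.5 | 3.0 | 5.2 | 2.0 | Iris-virginica | Iris-virginica |
| 148 | 149 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica | Iris-virginica |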
