# **Solving the classification of Iris Dataset : Single Layer Perceptron from Scratch**

1.   List item
2.   List item
**bold text**


Task: To predict the class of the iris flower according to the sepal length, sepal width, petal length and petal width

**What is Iris dataset?**

It is a dataset about classifying three different classes (Setosa, Versicolour, Virginica) of iris flower on the basis of four features such as sepal length, sepal width, petal length and petal width.




In our code, we have 3 classes numerically defined as Setosa : 1, Versicolour : 2, Virginica : 3. We have four 2-featured instances as our input. We have 50 rows of each class, i.e. 150 data rows. We need to predict the class of the iris flower by taking a single layer perceptron with four features.

**What does coding a neural network from scratch imply?**

We do not use inbuilt libraries for machine/ deep learning such as those in scikit-learn, keras or tensor flow.  We use only numpy, pandas, and write functions to make the model, train, fit, predict and evaluate it. 

Let's get started!

## STEP 1. Import the necessary libraries/ functions and get the data ready

In [1]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

In [2]:
iris = load_iris()
X = iris.data
y = iris.target

In [3]:
print(iris.DESCR)

.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
    :Attribute Information:
        - sepal length in cm
        - sepal width in cm
        - petal length in cm
        - petal width in cm
        - class:
                - Iris-Setosa
                - Iris-Versicolour
                - Iris-Virginica
                
    :Summary Statistics:

                    Min  Max   Mean    SD   Class Correlation
    sepal length:   4.3  7.9   5.84   0.83    0.7826
    sepal width:    2.0  4.4   3.05   0.43   -0.4194
    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
    petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)

    :Missing Attribute Values: None
    :Class Distribution: 33.3% for each of 3 classes.
    :Creator: R.A. Fisher
    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
    :

In [4]:
num_classes=len(np.unique(y))
a=np.zeros((y.shape[0], num_classes))

In [5]:
for i in range(y.shape[0]):
  if y[i] == 0:
    a[i][0] = 1
  elif y[i] == 1:
    a[i][1] = 1
  else:
    a[i][2] = 1

#one hot encoding 

In [6]:
X= pd.DataFrame(X,columns=["sepal_length", "sepal_width", "petal_length", "petal_width"])
y = pd.DataFrame(a, columns = ["y0","y1","y2"],dtype="int64")

In [7]:
y.head()

Unnamed: 0,y0,y1,y2
0,1,0,0
1,1,0,0
2,1,0,0
3,1,0,0
4,1,0,0


In [8]:
b=np.array([[1,2,3]])
np.argmax(b,axis=1)

array([2])

In [9]:
#In our code, we want to integrate the bias terms into the matrix of input value. 
#This way, our weighted sum becomes the following: WX instead of WX + b, where b #is a separate, bias vector

#Inserting a new column into the pandas dataframe where every elemnt is 1, in the first position
#X.insert(0,"bias", np.ones(X.shape[0]), True)
X

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width
0,5.1,3.5,1.4,0.2
1,4.9,3.0,1.4,0.2
2,4.7,3.2,1.3,0.2
3,4.6,3.1,1.5,0.2
4,5.0,3.6,1.4,0.2
...,...,...,...,...
145,6.7,3.0,5.2,2.3
146,6.3,2.5,5.0,1.9
147,6.5,3.0,5.2,2.0
148,6.2,3.4,5.4,2.3


## STEP 2: Write the feed forward function that predicts the outcome using the weights and inputs

This function is used to predict the activated output at a layer, and is called for each instance of X (corresponds to each row in the X matrix). It thus takes in a row of the input features (= the vector of feature values for that instance) and the updated weights as its parameter.

In [10]:
def softmax(x): 

    """Compute softmax values for each sets of scores in x.""" 

    e_x = np.exp(x - np.max(x)) 

    return e_x / e_x.sum(axis=0) 

In [11]:
def relu(x):
    result = np.maximum(0, x) # ReLU activation
        
    return result


In [12]:
def predict(x,w1,w2,b1,b2 ):
  #Initializing a variable to hold the value of the weighted sum (of input features and the corresponding weights) 
  weighted_1_sum=np.zeros((x.shape[0],w1.shape[1])) 
  weighted_2_sum=np.zeros((150,3)) 
  
  ##print("instance in predict(p1,p2):",instance)
  ##print("weights in predict(p1,p2):",weights)
  
  #calculating the weighted sum of the instance features and the corresponding weights as a dot product
  weighted_1_sum=np.dot(x,w1)+b1
  
  ##print("weighted_sum in predict(p1,p2):",weighted_sum)
  
  #Step activation function - maps non-negative inputs (function inputs, the weighted sum here) to 1 and negative inputs to 0. 
  #Using an if statement for the step activation function
  activation_1 = relu(weighted_1_sum)
  
  #activation_1.insert(0,"bias", np.ones(X.shape[0]), True)
  # activation_1 = np.insert(activation_1,0,1,axis = 1)

  weighted_2_sum=np.dot(activation_1,w2)+b2
  ##print ("softmax of weighted sum =",a)

  activation_2=softmax(weighted_2_sum)
  return weighted_2_sum,activation_1

In [13]:
a=np.array([[1,2,3]])
a.shape

(1, 3)

## STEP 3: Define the function that trains the weight using backpropogation using gradient descent. 

Here, we use stochastic gradient descent which means we we go through all the instances one by one and update the weights after each time an instance is passed in the forward direction. 

**Weight update:**

updated weight = old weight - update in weight

> where:

> update in the weight = learning rate * (desired output - actual (or predicted) output) * input
  
Each of the weights (corresponding to each input feature) gets updated

In [14]:
# from math import log
 
# # calculate categorical cross entropy
# def categorical_cross_entropy(target, predicted):
#   sum_score = 0.0
#   print(target)
#   print(len(target))
#   for j in range(len(target)):
#     sum_score += target[j] * log(1e-15 + predicted[j])
# 	return sum_score
#   #https://machinelearningmastery.com/loss-and-loss-functions-for-training-deep-learning-neural-networks/

In [15]:
from math import log
def categorical_cross_entropy(target, predicted):
  sum_score = 0.0
  
  ##print("predicted in CCE=",predicted)
  ##print(predicted[0])
  
  for j in range(len(target)):
    sum_score += target.iloc[j] * log(1e-15 + predicted[j])
  return sum_score

In [16]:
#np.ones((X.shape[1], y.nunique())).shape
y_total=y.to_numpy()
len(np.unique(y_total,axis=0))

3

In [17]:
#learning_rate and epochs are hyperparameters here, that is, we specify their values (we try different values to see which one helps our model train / converge the best)
def train_weights(x,y, learning_rate, epochs):
  
  #initializing a weight vector (array of 1 dimension here as there is only one neuron) whose length is equal to the number of input features
  w1 = np.random.randn(4,10) #The no. of input features in x is given by the number of columns x has. It can be obtained from the shape. Run x.shape and see what you get.
  b1=np.zeros((1,10))
  w2 = np.random.randn(10,3) #The no. of input features in x is given by the number of columns x has. It can be obtained from the shape. Run x.shape and see what you get.
  b2=np.zeros((1,3))

  #We look at the error in predictions for all instances, and do that for each epoch
  for epoch in range(epochs): #for every epoch
    total_error = 0.0 #storing the total error for that epoch
    prediction_softmax,activation_hidden = predict(x, w1, w2, b1, b2)
    prediction=np.argmax(prediction_softmax,axis=0)
      
      ##print("shape of prediction_softmax and of prediction=",prediction_softmax.shape,prediction.shape)
      ##print("Prediction softmax in train_weights:",prediction_softmax)
    
    error_2 = y-prediction_softmax
      ##print("error shape=",error.shape)
      ##print("type(error)=",type(error))
     #sum of squared error
    error_2/=150
    
    w2_update = np.dot(activation_hidden.T, error_2)
    #b2_update = np.sum(error_2, axis=0, keepdims=True)
    b2_update = np.sum(error_2, axis=0).values.reshape(1,3)
    print(b2_update)  
        # next backprop into hidden layer
    error_1 = np.dot(error_2, w2.T)
        # backprop the ReLU non-linearity
    error_1[activation_hidden <= 0] = 0
        # finally into W,b
    w1_update = np.dot(x.T, error_1)
    b1_update = np.sum(error_1, axis=0)
    
    w1=w1+learning_rate*w1_update
    b1=b1+learning_rate*b1_update
    w2=w2+learning_rate*w2_update
    print("w2 shape",w2.shape)
    print("learning rate*w2 update",(learning_rate*w2_update).shape)
    print("b2 shape",b2.shape)
    print("learning rate*b2 update",(learning_rate*b2_update).shape)
    b2=b2+learning_rate*b2_update
    ##print(weights)
    
    #printing the epoch number, the learning rate and mean of the total error for that epoch
    print(f'Epochs = {epoch}, learning rate = {learning_rate}') #mean categorical cross entropy error = {mean_error}'
  return w1,w2,b1,b2 #these are the final, trained weights after running through all the epochs - used in prediction

In [0]:
# https://stats.stackexchange.com/questions/235528/backpropagation-with-softmax-cross-entrop
# ∂E∂wij=yi(oj−tj)y

## STEP 4: Define the function that serves as the model 

This function will call the training and predict functions, and computes the final predicted outcome after running through all the epochs. 

In [18]:
from sklearn.model_selection import train_test_split

In [19]:
X_train,X_test,a_train,a_test=train_test_split(X, y, test_size=0.2, random_state=2275)
#https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html

In [20]:
y_test=np.argmax(np.array(a_test),axis=1)

In [21]:
#defining the function that calls the training function, and computes the final outcome
def perceptron(x_train, y_train,x_test, learning_rate, epochs):
  #y_preds = list() #initializing a variable to hold the predicted values (one prediction for each instance) as a list
  w1,w2,b1,b2 = train_weights(x_train, y_train, learning_rate, epochs)
 # for i in range(len(x_test)):
    #for each instance/ row of X, getting the prediction and appending it to the list
  prediction_softmax,activation_hidden = predict(x_test,w1,w2,b1,b2)
  print("prediction_softmax in perceptron()",prediction_softmax)
  prediction=np.argmax(prediction_softmax,axis=1)
  print("prediction in perceptron()",prediction)
  return prediction

## STEP 5: Define a function that evaluates the accuracy score

Now that we have predicted the outcome, we need to compute the accuracy of our model. We use a simple accuracy metric in this code, as described below. 

In [22]:
def accuracy_score(y_desired, y_predicted):
  correct = 0
  
  #computing accuracy as the percent correctness of the predictions among the instances
  #Defined here as the %percent of correct predictions out of all the instances the model predicted for
  print("predicted:",y_predicted)
  print("desired:",y_desired)
  for i in range(len(y_desired)):
    if y_desired[i] == y_predicted[i]:
      correct += 1
  accuracy_score=correct*100.0/len(y_desired)
  return accuracy_score

### STEP 6: Tune the hyperparameters, run the model, obtain the accuracy score. Repeat for different values of the hyperparameters and see what values give you the best performing model

In [23]:
#Iitializing the hyperparameters
learning_rate = 0.01
epochs = 30

#calling the model, predicting the outcome and measuring the accuracy of the prediction
predicted = perceptron(X_train, a_train, X_test, learning_rate, epochs)

print("predicted")
accuracy = accuracy_score(y_test, predicted)
print("\n")

print(f"The accuracy is {accuracy}%.")

[[ 6.36906705 -0.5866628  -5.698111  ]]
w2 shape (10, 3)
learning rate*w2 update (10, 3)
b2 shape (1, 3)
learning rate*b2 update (1, 3)
Epochs = 0, learning rate = 0.01
[[-18.17845065 -11.29176809  14.34521235]]
w2 shape (10, 3)
learning rate*w2 update (10, 3)
b2 shape (1, 3)
learning rate*b2 update (1, 3)
Epochs = 1, learning rate = 0.01
[[0.34258893 0.48057701 0.16597085]]
w2 shape (10, 3)
learning rate*w2 update (10, 3)
b2 shape (1, 3)
learning rate*b2 update (1, 3)
Epochs = 2, learning rate = 0.01
[[0.34241228 0.43672297 0.17432494]]
w2 shape (10, 3)
learning rate*w2 update (10, 3)
b2 shape (1, 3)
learning rate*b2 update (1, 3)
Epochs = 3, learning rate = 0.01
[[0.34070898 0.41073895 0.18061817]]
w2 shape (10, 3)
learning rate*w2 update (10, 3)
b2 shape (1, 3)
learning rate*b2 update (1, 3)
Epochs = 4, learning rate = 0.01
[[0.33866499 0.3924056  0.18440102]]
w2 shape (10, 3)
learning rate*w2 update (10, 3)
b2 shape (1, 3)
learning rate*b2 update (1, 3)
Epochs = 5, learning rate = 

In [26]:
final=pd.DataFrame(y_test,columns=["y_test"])
final["y_pred"]=predicted

In [29]:

import matplotlib.pyplot as plt

plt.figure(figsize=(10,10))
plt.xlim(-1,3)
plt.xticks(np.arange(0,3,1))
plt.yticks(np.arange(0,3,1))
plt.ylim(-1,3)
plt.scatter(X[data["y"]==1]["x1"], X[data["y"]==1]["x2"], s=80, c="r", marker=">")

#plt.plot(data[data["y"]==1]["x1"],data[data["y"]==1]["x2"],'o',c="b")
plt.plot(data[data["y"]==0]["x1"],data[data["y"]==0]["x2"],'^',c="r")
plt.title('Graph')
plt.xlabel('x', color='#1C2833')
plt.ylabel('y', color='#1C2833')
plt.legend(loc='upper left')
plt.grid()
plt.show()


NameError: name 'data' is not defined

In [0]:
 # plt.subplot(321)
# plt.scatter(X[data["y"]==1]["x1"], X[data["y"]==1]["x2"], s=80, c="r", marker=">")
# for i in range(len(data)):
#   x1=data.iloc[i,0]
#   x2=data.iloc[i:1]
#   plt.text(x1,x2,f"{x1},{x2}")
#   print("check")