The algorithm uses data from four features to classify flower into one of three types of Iris Flowers. The four features are Sepal Length, Sepal Width, Petal Length, and Petal Width. The three types of Iris Flowers are 'Iris-setosa', 'Iris-versicolor', and 'Iris-virginica'. The discussion on how the algorithm makes the predictions is below

In [1]:
#importing necessary libraries

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split



df = pd.read_csv("Iris.csv") #loading the data into Python
df.head()#checking the contents to manipulate

#Preprocessing the data (making the features and the targets)

X = np.asmatrix(np.copy(df))[:,:5] #getting all the columns of the feature data
X = np.delete(X, 0, axis = 1) #dropping column of index 0 because it is "id".

nameOfTargets = df.Species.unique() #Getting the unique values of the target column for one hot encoding

Y_data = [] #Empty list that will eventually become target data
for i in df.iloc[:,5]: #Iterating through all the targets
    for j in range(nameOfTargets.shape[0]): #for j from 0 to N, where N is the number of items in nameOfTargets
        if i == nameOfTargets[j]:       
            Y_data.append(j)
    #This code segment iterates through the targets (which are Strings), and converts it into 0, 1 or 2.
        #The index number of the item in nameOfTargets is how they will be represented 
        #in the target data. I.E if the value of the target is equal to the FIRST 
        #item of nameOfTargets, the value is represented by the item's INDEX (0).
        
N = len(Y_data) #Getting the number of items in the dataset
Y = np.zeros(N*nameOfTargets.shape[0]).reshape(N,nameOfTargets.shape[0]) 
    #Making the target matrix. The number of rows = number of subjects, number of columns = number of unique targets


for i in range(N): #One Hot Encoding. After the loop finishes, Y will be the final target matrix.
    t = Y_data[i]
    Y[i,t] = 1
    
    
#Standardizing values in the feature matrix X
for i in range(X.shape[1]):
    X[:,i] = (X[:,i].astype(float) - np.mean(X[:,i].astype(float)))/np.std(X[:,i].astype(float))
    
X_new = np.asmatrix(np.copy(df))[:,:5]  
X_new = np.delete(X_new, 0, axis = 1)
#Making a copy of the feature data that is not standardized. This data will be used later in the algorithm's classifier
    #to standardize the input data.
    

X_train, X_test, y_train, y_test = train_test_split(X,Y, test_size = 0.4, random_state = 10) #Splitting the data into testing data and training data



#Deep Learning

np.random.seed(1) #making sure the weights are the same every time the cell is rerun (still random)
N,D = X_train.shape #N = num subjects, D = num features
M = 100 #num hidden nodes of the hidden layer
K = nameOfTargets.shape[0] #number of outputs

iteration_num = 1000 #Number of times gradient descent will be performed
a = 0.02 #learning rate

#creating the weights (randomly)
W = np.random.randn(D*M).reshape(D,M)
V = np.random.randn(M*K).reshape(M,K)


#creating the biased terms
b = np.random.randn(M).reshape(1,M) #Generating biased terms for hidden nodes
b_ones = np.ones(N).reshape(N,1) #Making an Nx1 matrix to multiply with the biased terms.
b = np.dot(b_ones,b) #After multiplying, NxM matrix (each row is a subject, each column is a node). The biased terms of each column are the same (biased term of each node does not change for the subject).

c = np.random.randn(K).reshape(1,K) #Repeating the process for the biased terms of the output layer. NxK matrix once completed.
c_ones = np.ones(N).reshape(N,1)
c = np.dot(c_ones, c)

for j in range(iteration_num): #Back Propagation
    
    #feed forward
    z = np.dot(X_train,W) + b
    z = 1/(1 + np.exp(-z.astype(float)))
    predictions = np.exp(np.dot(z,V) + c)

    #softmax
    for i in range(predictions.shape[0]):
        predictions[i,:] = predictions[i,:]/np.sum(predictions[i,:])


    #gradient descent (all from formula)
    
        #Calculating all the partial derivatives of cost
    dV = np.dot(z.T,(y_train - predictions)) 
    dZ = np.dot(np.dot(np.dot((y_train - predictions), V.T).T, z),(1-z.T)) #will be used to calculate dW
    dW = np.dot(X_train.T,dZ.T) 
    db = np.dot(np.dot(np.dot((y_train - predictions), V.T).T, z), (1-z.T)).T.sum(axis = 0) 
    dc = (y_train - predictions).sum(axis = 0) 
    
    
    W += a*dW.astype(float) #Gradient Descent
    V += a*dV.astype(float) 
    b += a*db.astype(float)
    c += a*dc.astype(float)
    
    if j%100 == 0: #Every 100 iterations, print out the cost and accuracy
        total = -np.dot(y_train.T, np.log(predictions))
        cost = total.sum() #Cost of the model
        Accuracy = np.mean(np.round(predictions) == y_train) #Accuracy of the model
        print(cost, Accuracy)
        
        
print(" ")
print("Final Cost and Accuracy of training data: ")
print(cost, Accuracy)
        
#Applying the model to the test data. The X_test data must be put through the softmax function and compared to y_test

#feed forward
z = np.dot(X_test,W) + b[0] #using only b causes a dimension error. Using b[0] gets all the relevant biased terms and makes use of how numpy array addition works.
z = 1/(1 + np.exp(-z.astype(float)))
test_predictions = np.exp(np.dot(z,V) + c[0]) #same thing with c[0]

#softmax
for i in range(test_predictions.shape[0]):
    test_predictions[i,:] = test_predictions[i,:]/np.sum(test_predictions[i,:])
    
test_Acc = np.mean(np.round(test_predictions) == y_test)
test_total = -np.dot(y_test.T, np.log(test_predictions))
test_cost = total.sum()

print(" ")
print("Cost and Accuracy of testing data: ")
print(test_cost, test_Acc)

#Function to classify the flower. Parameters are the Sepal Length, Sepal Width, Petal Length and Petal Width. (Classifier)
def classify(SLen, SWid, PLen, PWid):
    
    data = np.matrix([SLen,SWid, PLen, PWid]) #Converting inputs into matrix
    
    for i in range(X_new.shape[1]):
        data[:,i] = (data[:,i].astype(float) - np.mean(X_new[:,i].astype(float)))/np.std(X_new[:,i].astype(float)) 
        #Standardizing inputs using X_new, which was created earlier in the algorithm
    
    #Passing the inputs through feed forward
    z = np.dot(data,W) + b[0]
    z = 1/(1 + np.exp(-z.astype(float)))
    test_predictions = np.exp(np.dot(z,V) + c[0])

    #softmax
    for i in range(test_predictions.shape[0]):
        test_predictions[i,:] = test_predictions[i,:]/np.sum(test_predictions[i,:])
    
    test_predictions = np.round(test_predictions) #rounding the predictions to only get 1 and 0
    j = np.where(test_predictions == 1)[1][0] #Getting the index of which target we classified the data as (which target node has a value of 1)
    
    
    return nameOfTargets[j]#Returning the name of the target

print(" ")
print("Testing the algorithm's classifier: ") #The classifier should return the following.
print(classify(5.6,3,4.5,1.5)) #Versicolor
print(classify(6.7,3.3,5.7,2.5)) #Virginica
print(classify(5.1,3.5,1.4,0.2)) #Setosa


print("")
print("Weights of input layer and hidden layer: ")
print(W)

print("")
print("Biased Terms of the first hidden layer:")
print(b[0])

print("")
print("Weights of hidden layer and output layer:")
print(V)

print("")
print("Biased Terms of the output layer:")
print(c[0])

2444.5871389198182 0.5703703703703704
1622.4090943649549 0.9481481481481482
1856.3319365267225 0.9555555555555556
1792.9960091449575 0.9629629629629629
1906.4546639762107 0.9481481481481482
1850.0546451676003 0.9703703703703703
1712.7049218295235 0.9333333333333333
1938.9666561312604 0.9703703703703703
1904.7311487906939 0.9555555555555556
1953.0744634253265 0.9555555555555556
 
Final Cost and Accuracy of training data: 
1953.0744634253265 0.9555555555555556
 
Cost and Accuracy of testing data: 
1953.0744634253265 0.9444444444444444
 
Testing the algorithm's classifier: 
Iris-versicolor
Iris-virginica
Iris-setosa

Weights of input layer and hidden layer: 
[[ -4269.26426116  -8419.98151452 -13548.78898352   1993.66912437
  -13471.40036771 -12947.26553449 -11204.66468377  25322.69167123
   25976.19513477  17427.26294486  -7035.94503569  23112.30343406
   20050.62461375  13697.22619701  -3453.66009847  -6133.42453313
    8749.09394242   8257.41960608  17914.92991823  19569.7719509
   2298

How the algorithm makes predictions:
- First, the algorithm is trained using existing data. The Iris Flower data set is split into testing and training data, and the training data is standardized and inputted into the model. 
- The model, which has 1 hidden layer containing 100 nodes (M), finds the number of subjects (N), the number of features (D), and the number of outputs (K). All the weights and biased terms between the layers are then randomly generated according to these numbers.
- Backpropagation is now performed. First, the feed forward function. The algorithm multiplies the training data with the weights between the input layer and the hidden layer (W), and then adds the bias terms of each node (b) to the computed values (z). These computed values are passed through the sigmoid function (Z). 
- These Z values are then multipled by the weights between the hidden layer and the output layer (V), then the biased terms of the output layer are added (c). These values are then put through the softmax function. 
- The cost derivatives (dW, dV)are then calculated and gradient descent is performed on all the weights and biased terms in the model with a learning rate of 0.02. 
- This entire process of backpropagation is repeated for 1000 iterations. Once completed, the weights and biased terms can be considered ideal. The model has finished training.
- To predict using the model, pass the values of the sepal length, sepal width, petal length and petal width respectively into the classifier. The algorithm will use feed forward on the data and classify it into one of the three targets using the ideal weights and biased terms. Of the three target nodes, the node with the highest value will be what we classify the flower as. 