# Problem: Binary Classification of Iris Flowers

1. Dataset: Use the Iris dataset, a classic machine-learning benchmark with 150 samples (50 per species). Each sample has four features (sepal length, sepal width, petal length, and petal width) and a species label: Setosa, Versicolor, or Virginica.

2. Objective: Simplify the problem to binary classification by keeping only two classes, Setosa and Versicolor, and train a perceptron to distinguish between them based on the provided features.

3. Steps:

    a) Load the Iris dataset.

    b) Preprocess the data: since only Setosa and Versicolor are considered, select the corresponding rows and, for simplicity, use only the first two features (sepal length and sepal width).
    
    c) Implement the perceptron algorithm to learn a decision boundary that separates the two classes, and train it on a portion of the dataset.
    
    d) Test the perceptron on the remaining portion of the dataset and evaluate its performance (e.g., accuracy).
    
    e) Evaluation: measure performance with accuracy, the number of correctly classified test instances divided by the total number of test instances.

4. Extension: Once you have a basic perceptron working, you can experiment with different aspects such as learning rate, number of epochs, feature scaling, or even extend it to handle multiple classes using techniques like one-vs-all or one-vs-one.

5. This exercise will allow you to test your perceptron algorithm in a simple binary classification task and assess its performance.
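Steps (d) and (e) rely on two small conventions: each class name maps to a numeric label (+1 for Versicolor and -1 for Setosa, matching the decision rule used in the training section), and accuracy is the fraction of test samples whose predicted label equals the true one. A minimal sketch of both; `LABELS` and `accuracy` are hypothetical names, not part of the notebook below:

```python
# Hypothetical label mapping: Versicolor -> +1, Setosa -> -1.
LABELS = {'Iris-versicolor': 1, 'Iris-setosa': -1}


def accuracy(true_labels, predicted_labels):
    """Fraction of positions where the prediction equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError('label lists must have the same length')
    if not true_labels:
        raise ValueError('cannot compute accuracy of an empty set')
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)


# Usage: 3 of the 4 predictions are right.
y_true = [LABELS['Iris-versicolor'], LABELS['Iris-setosa'], 1, -1]
y_pred = [1, 1, 1, -1]
print(accuracy(y_true, y_pred))  # 0.75
```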

## Loading and Preprocessing the Dataset

```python
def load_label_info(path, label, column):
    """Return the rows of a comma-separated file whose field `column` equals `label`."""
    labeled_list = []
    with open(path, 'r') as file:
        for line in file:
            # Strip the trailing newline so the label compares cleanly.
            info = line.strip().split(',')
            # Skip blank or malformed lines that lack the label column.
            if len(info) > column and info[column] == label:
                labeled_list.append(info)
    return labeled_list


def create_labeled_lists(label1, label2, path, column):
    """Load the rows of two classes from the same file."""
    labeled_list1 = load_label_info(path, label1, column)
    labeled_list2 = load_label_info(path, label2, column)
    return labeled_list1, labeled_list2


def split_80_20(rows):
    """Turn rows into samples [1, sepal_length, sepal_width] and split them 80/20.

    The leading 1 is a constant bias input, so weight[0] acts as the bias.
    The split follows file order (first 80% for training); it is not shuffled.
    """
    samples = [[1.0, float(row[0]), float(row[1])] for row in rows]
    split = int(0.8 * len(samples))
    return samples[:split], samples[split:]


# Column 4 holds the species name: list1 = Versicolor, list2 = Setosa.
labeled_list1, labeled_list2 = create_labeled_lists(
    'Iris-versicolor', 'Iris-setosa', 'iris/iris.data', 4)

labeled_list1_80, labeled_list1_20 = split_80_20(labeled_list1)
labeled_list2_80, labeled_list2_20 = split_80_20(labeled_list2)
```
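The cell above keeps the first 80% of each class in file order. If the rows within a class happened to be ordered in some systematic way, that split could be biased; a common alternative is to shuffle each class with a fixed seed before splitting. A minimal sketch under that assumption (`shuffled_split` and the seed value are illustrative, not part of the original notebook):

```python
import random


def shuffled_split(samples, train_fraction=0.8, seed=0):
    """Shuffle a copy of `samples` with a fixed seed, then split it.

    Returns (train, test). The fixed seed makes the split reproducible.
    """
    shuffled = list(samples)              # copy, so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    split = int(train_fraction * len(shuffled))
    return shuffled[:split], shuffled[split:]


# Usage with 50 dummy samples shaped like [bias, sepal_length, sepal_width].
dummy = [[1.0, float(i), float(i)] for i in range(50)]
train, test = shuffled_split(dummy)
print(len(train), len(test))  # 40 10
```

Applying it to each class separately keeps the same 40/10 split per class that the original code produces.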

## Initial Weight Vector

```python
import numpy as np


def average(rows, column):
    """Mean of one numeric column over a list of rows of strings."""
    total = 0.0
    for row in rows:
        total += float(row[column])
    return total / len(rows)


# Column 0 = sepal length, column 1 = sepal width;
# list1 = Versicolor, list2 = Setosa.
print('list1 (0) average:', average(labeled_list1, 0))
print('list1 (1) average:', average(labeled_list1, 1))
print('list2 (0) average:', average(labeled_list2, 0))
print('list2 (1) average:', average(labeled_list2, 1))

# Random initial weights [bias, w_sepal_length, w_sepal_width], as a plain list.
num_features = 3
weight = np.random.randn(num_features).tolist()

# weight = [0.23, -0.75, 0.12]   # fixed weights, e.g. for repeatable runs
```

```
list1 (0) average: 5.936
list1 (1) average: 2.7700000000000005
list2 (0) average: 5.005999999999999
list2 (1) average: 3.4180000000000006
```
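The averages above are printed but the initial weights are then drawn at random. One possible way to use them, offered as an assumption rather than what the notebook does, is to start from the perpendicular bisector of the two class means. With $\mu_v$ and $\mu_s$ the Versicolor and Setosa mean vectors, take $w = \mu_v - \mu_s$ and bias $b = -w \cdot (\mu_v + \mu_s)/2$; then $b + w \cdot x \geq 0$ exactly when $x$ is at least as close to $\mu_v$ as to $\mu_s$. The helper names `midpoint_weights` and `score` are hypothetical:

```python
def midpoint_weights(mean_pos, mean_neg):
    """Initial [bias, w1, w2] for the perpendicular bisector of two class means.

    Samples closer to `mean_pos` get a non-negative score (the +1 class).
    """
    w = [p - n for p, n in zip(mean_pos, mean_neg)]
    midpoint = [(p + n) / 2 for p, n in zip(mean_pos, mean_neg)]
    bias = -sum(wi * mi for wi, mi in zip(w, midpoint))
    return [bias] + w


def score(weight, x):
    """Dot product of weight and a sample x = [1, sepal_length, sepal_width]."""
    return sum(wi * xi for wi, xi in zip(weight, x))


# Class averages printed above: list1 = Versicolor (+1), list2 = Setosa (-1).
versicolor_mean = [5.936, 2.770]
setosa_mean = [5.006, 3.418]
weight0 = midpoint_weights(versicolor_mean, setosa_mean)
print(weight0)
print(score(weight0, [1.0] + versicolor_mean) > 0)  # True
print(score(weight0, [1.0] + setosa_mean) < 0)      # True
```

Using `weight0` in place of the random initialisation would only change the starting point; the perceptron updates still determine the final boundary.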


## Perceptron Learning Algorithm - Training
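We use the following decision rule, where each sample is $x = (1, x_1, x_2)$, with a constant bias input of 1 followed by the sepal length $x_1$ and sepal width $x_2$:
 - if the dot product $w \cdot x \geq 0$, then $y$ is Versicolor ($+1$)
 - if the dot product $w \cdot x < 0$, then $y$ is Setosa ($-1$)

Training makes repeated passes (epochs) over the training set. Whenever a sample is misclassified, the weights are moved toward the correct side with the perceptron update $w \leftarrow w + y\,x$; training stops after a pass with no mistakes or when an epoch limit is reached.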
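To make the rule concrete, the sketch below uses the fixed weights from the commented-out line in the previous cell, $w = (0.23, -0.75, 0.12)$, and an illustrative Versicolor sample with sepal length 7.0 and sepal width 3.2. Its score $0.23 + (-0.75)(7.0) + (0.12)(3.2) = -4.636$ is negative, so the sample is misclassified as Setosa; the perceptron update with $y = +1$ moves the weights to $(1.23, 6.25, 3.32)$, after which the score is positive. The names `dot`, `predict`, and the `*_example` variables are hypothetical:

```python
def dot(weight, x):
    """Dot product w . x of two equal-length lists."""
    return sum(w_i * x_i for w_i, x_i in zip(weight, x))


def predict(weight, x):
    """+1 (Versicolor) if w . x >= 0, else -1 (Setosa)."""
    return 1 if dot(weight, x) >= 0 else -1


w_example = [0.23, -0.75, 0.12]   # fixed weights from the commented-out line
x_example = [1.0, 7.0, 3.2]       # illustrative sample: [bias input, sepal length, sepal width]
y_example = 1                     # its true label: Versicolor

print(round(dot(w_example, x_example), 3), predict(w_example, x_example))  # -4.636 -1

# The sample is misclassified, so apply the update w <- w + y * x.
if predict(w_example, x_example) != y_example:
    w_example = [w_i + y_example * x_i for w_i, x_i in zip(w_example, x_example)]

print([round(w_i, 2) for w_i in w_example], predict(w_example, x_example))  # [1.23, 6.25, 3.32] 1
```

The update is additive: it adds $y\,x$ to the weights rather than multiplying them, so a single mistake shifts the boundary by a bounded amount.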

```python
# Label convention: Versicolor -> +1, Setosa -> -1.
training_set = [(x, 1) for x in labeled_list1_80] + [(x, -1) for x in labeled_list2_80]

max_epochs = 1000  # chosen upper bound on passes over the training data
for epoch in range(1, max_epochs + 1):
    errors = 0
    for x, y in training_set:
        dot_product = 0
        for i in range(len(weight)):
            dot_product += weight[i]*x[i]
        prediction = 1 if dot_product >= 0 else -1

        # Perceptron update for a misclassified sample: w <- w + y*x
        if prediction != y:
            weight = [weight[i] + y*x[i] for i in range(len(weight))]
            errors += 1

    if errors == 0:  # a full pass without mistakes: training has converged
        break

print('epochs:', epoch, 'errors in last epoch:', errors)
print(weight)
```
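The Extension item suggests experimenting with the learning rate, the number of epochs, and feature scaling. Below is a vectorised NumPy sketch exposing all three; `standardize`, `with_bias`, `train_perceptron`, `predict_batch`, and their parameters are hypothetical names, and the synthetic clusters in the usage lines stand in for the Iris rows. Note that with the update $w \leftarrow w + \eta\,y\,x$ and zero initial weights, the learning rate $\eta$ only rescales $w$ and leaves every prediction unchanged; it matters here because the weights start from random values.

```python
import numpy as np


def standardize(X_train, X_test):
    """Scale features to zero mean and unit variance using training statistics only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0                      # avoid dividing by zero for constant features
    return (X_train - mean) / std, (X_test - mean) / std


def with_bias(X):
    """Prepend a column of ones so that w[0] acts as the bias."""
    return np.hstack([np.ones((X.shape[0], 1)), X])


def train_perceptron(X, y, learning_rate=0.1, max_epochs=100, seed=0):
    """Perceptron with labels y in {-1, +1}; returns (weights, epochs_run)."""
    Xb = with_bias(X)
    w = np.random.default_rng(seed).normal(size=Xb.shape[1])   # random start
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for xi, yi in zip(Xb, y):
            prediction = 1 if xi @ w >= 0 else -1
            if prediction != yi:
                w += learning_rate * yi * xi         # w <- w + eta * y * x
                errors += 1
        if errors == 0:                              # converged: a clean pass
            break
    return w, epoch


def predict_batch(w, X):
    """Apply the w . x >= 0 -> +1 rule to every row of X."""
    return np.where(with_bias(X) @ w >= 0, 1, -1)


# Usage on synthetic, well-separated clusters standing in for the two classes.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([6.0, 2.8], 0.15, size=(50, 2)),    # synthetic +1 class
               rng.normal([5.0, 3.6], 0.15, size=(50, 2))])   # synthetic -1 class
y = np.array([1] * 50 + [-1] * 50)
is_test = np.arange(100) % 5 == 0                              # every 5th sample -> test
X_train, X_test = standardize(X[~is_test], X[is_test])
w, epochs = train_perceptron(X_train, y[~is_test], learning_rate=0.1, max_epochs=1000)
print('epochs run:', epochs)
print('train accuracy:', (predict_batch(w, X_train) == y[~is_test]).mean())
print('test accuracy:', (predict_batch(w, X_test) == y[is_test]).mean())
```

Standardising with training statistics only keeps test data from influencing the scaling; on raw sepal measurements, all samples sit far from the origin, which can make the bias weight slow to adjust.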

## Perceptron Learning Algorithm - Validation

```python
def dot(weight, x):
    """Dot product w . x of two equal-length lists."""
    return sum(w_i * x_i for w_i, x_i in zip(weight, x))


correct = 0
for x in labeled_list1_20:      # Versicolor: correct if w . x >= 0
    if dot(weight, x) >= 0:
        correct += 1

for x in labeled_list2_20:      # Setosa: correct if w . x < 0
    if dot(weight, x) < 0:
        correct += 1

total = len(labeled_list1_20) + len(labeled_list2_20)
accuracy = correct / total      # proportion of correctly classified test samples
print('correct:', correct, 'of', total)
print('accuracy:', accuracy)
```
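The Extension item also mentions handling all three species with one-vs-all (one-vs-rest): train one binary perceptron per class, treating that class as +1 and every other class as -1, and predict the class whose perceptron gives the largest score $w_k \cdot x$. A minimal NumPy sketch, assuming zero initial weights, a fixed epoch budget, and the hypothetical helper names below; the tiny synthetic data set only illustrates the call pattern. On the real data the classes may not all be linearly separable from one another (Versicolor and Virginica overlap in the sepal features), so a per-class perceptron can hit the epoch limit without converging.

```python
import numpy as np


def with_bias(X):
    """Prepend a column of ones so that w[0] acts as the bias."""
    return np.hstack([np.ones((len(X), 1)), np.asarray(X, dtype=float)])


def train_binary(Xb, y, epochs=100):
    """Perceptron (zero start, unit step) on bias-augmented Xb with labels +1/-1."""
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(Xb, y):
            prediction = 1 if xi @ w >= 0 else -1   # same w . x >= 0 rule as above
            if prediction != yi:
                w += yi * xi
                errors += 1
        if errors == 0:
            break
    return w


def train_one_vs_all(X, labels, epochs=100):
    """One perceptron per class: that class is +1, all other classes are -1."""
    Xb = with_bias(X)
    labels = np.asarray(labels)
    classes = sorted(set(labels.tolist()))
    W = np.array([train_binary(Xb, np.where(labels == c, 1, -1), epochs)
                  for c in classes])
    return classes, W


def predict_one_vs_all(classes, W, X):
    """Predict the class whose perceptron gives the largest score w_k . x."""
    scores = with_bias(X) @ W.T               # shape (n_samples, n_classes)
    return [classes[k] for k in np.argmax(scores, axis=1)]


# Usage on a tiny synthetic 3-class set (stand-in for the three species).
X = [[0.0, 0.0], [0.2, 0.1], [5.0, 0.0], [5.1, 0.2], [0.0, 5.0], [0.1, 5.2]]
labels = ['a', 'a', 'b', 'b', 'c', 'c']
classes, W = train_one_vs_all(X, labels)
print(predict_one_vs_all(classes, W, X))   # ['a', 'a', 'b', 'b', 'c', 'c']
```

Taking the argmax of raw scores, rather than requiring exactly one perceptron to fire, gives every sample a prediction even when several (or none) of the binary classifiers report a positive score.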
