# Perceptron Model with Iris Dataset


This notebook shows how to build and train a Perceptron model using the Iris dataset. The Iris dataset is a classic dataset in machine learning and statistics. It consists of 150 samples of iris flowers, each with four features (sepal length, sepal width, petal length, and petal width), split across three classes (Iris-setosa, Iris-versicolor, and Iris-virginica).

In this notebook, we will do the following:
* Load and prepare the Iris dataset.
* Split the dataset into training and testing sets.
* Standardize the features.
* Train a Perceptron model.
* Evaluate the model's performance.


# Import libraries

In [None]:
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
from sklearn.linear_model import Perceptron

# Load the Iris dataset

In [None]:
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"

column_names = ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)', 'Species']

df = pd.read_csv(url, names=column_names)
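
If the UCI URL is unreachable, the copy of Iris bundled with scikit-learn can stand in for it. The sketch below is an assumption rather than part of the original workflow: the name `df_local` is hypothetical, the species labels come out without the `Iris-` prefix, and a few values in the bundled copy may differ slightly from the UCI file.

```python
from sklearn.datasets import load_iris

# Load the bundled dataset as a pandas DataFrame (features plus a 'target' column).
iris = load_iris(as_frame=True)
df_local = iris.frame.copy()

# Replace the integer target (0, 1, 2) with the species names.
species_by_code = dict(enumerate(iris.target_names))
df_local["Species"] = df_local["target"].map(species_by_code)
df_local = df_local.drop(columns="target")

print(df_local.shape)
print(df_local["Species"].unique())
```

The feature columns carry the same names as `column_names` above, so the rest of the notebook should run on `df_local` with only the label strings changing.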

In [None]:
# Separate the four feature columns (X) from the target column (y)
X = df.drop('Species', axis=1)
y = df['Species']

# EDA

In [None]:
df.sample(10)

| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | Species |
|---|---|---|---|---|---|
| 6 | 4.6 | 3.4 | 1.4 | 0.3 | Iris-setosa |
| 93 | 5.0 | 2.3 | 3.3 | 1.0 | Iris-versicolor |
| 54 | 6.5 | 2.8 | 4.6 | 1.5 | Iris-versicolor |
| 92 | 5.8 | 2.6 | 4.0 | 1.2 | Iris-versicolor |
| 110 | 6.5 | 3.2 | 5.1 | 2.0 | Iris-virginica |
| 94 | 5.6 | 2.7 | 4.2 | 1.3 | Iris-versicolor |
| 49 | 5.0 | 3.3 | 1.4 | 0.2 | Iris-setosa |
| 62 | 6.0 | 2.2 | 4.0 | 1.0 | Iris-versicolor |
| 58 | 6.6 | 2.9 | 4.6 | 1.3 | Iris-versicolor |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   sepal length (cm)  150 non-null    float64
 1   sepal width (cm)   150 non-null    float64
 2   petal length (cm)  150 non-null    float64
 3   petal width (cm)   150 non-null    float64
 4   Species            150 non-null    object 
dtypes: float64(4), object(1)
memory usage: 6.0+ KB


In [None]:
df.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| sepal length (cm) | 150.0 | 5.843333 | 0.828066 | 4.3 | 5.1 | 5.8 | 6.4 | 7.9 |
| sepal width (cm) | 150.0 | 3.054 | 0.433594 | 2.0 | 2.8 | 3.0 | 3.3 | 4.4 |
| petal length (cm) | 150.0 | 3.758667 | 1.76442 | 1.0 | 1.6 | 4.35 | 5.1 | 6.9 |
| petal width (cm) | 150.0 | 1.198667 | 0.763161 | 0.1 | 0.3 | 1.3 | 1.8 | 2.5 |


In [None]:
df.duplicated().sum()

3

In [None]:
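# Drop duplicate rows from df in place.
# Note: X and y were extracted from df earlier, so they are unaffected and still hold all 150 rows.
df.drop_duplicates(inplace=True)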

In [None]:
df.duplicated().sum()

0
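
One detail worth checking here: `X` and `y` were created before the duplicates were dropped, and `df.drop(...)` returns a new DataFrame, so cleaning `df` afterwards does not touch them. The toy example below illustrates this with hypothetical names (`df_demo`, `X_demo`, `X_clean`); it is a sketch of the behavior, not a change to the notebook's results.

```python
import pandas as pd

# A tiny frame with one duplicated row.
df_demo = pd.DataFrame({"a": [1, 1, 2], "label": ["x", "x", "y"]})

# Extract features BEFORE cleaning, as the notebook does.
X_demo = df_demo.drop("label", axis=1)

# Clean the original frame in place.
df_demo.drop_duplicates(inplace=True)

print(len(df_demo))  # rows left in the cleaned frame
print(len(X_demo))   # rows in the earlier extract, still including the duplicate

# To train on the cleaned data, features would have to be re-extracted afterwards.
X_clean = df_demo.drop("label", axis=1)
print(len(X_clean))
```

If the deduplicated rows are meant to be used for training, re-deriving `X` and `y` after `drop_duplicates` would be the way to do it; the accuracy reported later would then likely change.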

In [None]:
df.isna().sum().sum()

0

# Split the dataset into training and testing sets

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
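
With 150 rows and `test_size=0.2`, the split above holds out 30 samples, but it does not guarantee that each species is equally represented in them. As an optional variation (not used in this notebook), `train_test_split` accepts a `stratify` argument that preserves class proportions. The sketch below uses synthetic stand-in arrays (`X_demo`, `y_demo`) with the same shape of problem: 150 rows, three balanced classes.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins: 150 rows, 3 classes with 50 samples each.
X_demo = np.arange(300).reshape(150, 2)
y_demo = np.repeat([0, 1, 2], 50)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(len(X_tr), len(X_te))  # 120 training rows, 30 test rows
print(np.bincount(y_te))     # per-class counts in the test set
```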

# Standardize the features

In [None]:
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
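
To make the scaling step concrete: `StandardScaler` subtracts each column's mean and divides by its (population) standard deviation, both learned from the data passed to `fit`. That is why the test set is only transformed, never fitted. The small check below uses a made-up array `X_demo` to compare the scaler's output with the formula computed by hand.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: two features on very different scales.
X_demo = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])

scaled = StandardScaler().fit_transform(X_demo)

# Same computation by hand: z = (x - mean) / std, with std using ddof=0.
manual = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

print(np.allclose(scaled, manual))
print(scaled.mean(axis=0))  # approximately 0 for each column
print(scaled.std(axis=0))   # approximately 1 for each column
```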

# Initialize and train the Perceptron model

In [None]:
model = Perceptron()
model.fit(X_train, y_train)
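
For intuition about what `fit` is doing, here is a minimal sketch of the textbook perceptron learning rule for two classes, written in NumPy. It is not scikit-learn's implementation, which also has to handle three classes and adds options such as shuffling and stopping criteria. The function name `train_perceptron` and the toy data are hypothetical.

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=1.0):
    """Textbook perceptron rule; labels y must be -1 or +1."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            # A sample is misclassified when its signed score is not positive.
            if yi * (xi @ w + b) <= 0:
                # Nudge the boundary toward classifying this sample correctly.
                w += lr * yi * xi
                b += lr * yi
                errors += 1
        if errors == 0:
            # A full pass with no mistakes: the data is separated.
            break
    return w, b

# Linearly separable toy data.
X_toy = np.array([[2.0, 1.0], [3.0, 2.0], [-1.0, -1.5], [-2.0, -1.0]])
y_toy = np.array([1, 1, -1, -1])

w, b = train_perceptron(X_toy, y_toy)
pred = np.where(X_toy @ w + b > 0, 1, -1)
print(w, b)
print(pred)
```

The weights only change on misclassified samples, which is why the perceptron converges on linearly separable data and can keep oscillating when classes overlap.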

# Predict the labels of the test set

In [None]:
y_pred = model.predict(X_test)

# Calculate the accuracy of the model

In [None]:
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")

Accuracy: 0.8666666666666667


# Display the first few predictions

In [None]:
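# y_test keeps the row labels of the original frame; reset them so that
# y_test[i] lines up with y_pred[i], which is indexed by position.
y_test = y_test.reset_index(drop=True)

print("First few predictions:")
for i in range(5):
    print(f"Predicted: {y_pred[i]}, Actual: {y_test[i]}")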

First few predictions:
Predicted: Iris-versicolor, Actual: Iris-versicolor
Predicted: Iris-setosa, Actual: Iris-setosa
Predicted: Iris-virginica, Actual: Iris-virginica
Predicted: Iris-virginica, Actual: Iris-versicolor
Predicted: Iris-versicolor, Actual: Iris-versicolor
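
Accuracy is simply the fraction of predictions that match the true label. As a small worked example, the five pairs printed above can be scored by hand and compared with `accuracy_score`. The variable names here are hypothetical, and five samples are far too few to say anything about the model itself.

```python
from sklearn.metrics import accuracy_score

# (predicted, actual) pairs copied from the output above.
pairs = [
    ("Iris-versicolor", "Iris-versicolor"),
    ("Iris-setosa", "Iris-setosa"),
    ("Iris-virginica", "Iris-virginica"),
    ("Iris-virginica", "Iris-versicolor"),
    ("Iris-versicolor", "Iris-versicolor"),
]
predicted = [p for p, _ in pairs]
actual = [a for _, a in pairs]

# Count matches and divide by the number of samples.
correct = sum(p == a for p, a in pairs)
manual_accuracy = correct / len(pairs)

# accuracy_score takes the true labels first, then the predictions.
sk_accuracy = accuracy_score(actual, predicted)

print(correct, len(pairs))
print(manual_accuracy, sk_accuracy)
```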
