IRIS DATASET

The Iris dataset is one of the most famous datasets used for machine learning and data analysis. It was introduced by the British statistician and biologist Ronald Fisher in 1936. With only 150 samples and four numeric features, it is small, easy to inspect, and widely used to demonstrate classification.

In [None]:
# Key Features of the Iris Dataset

# Data Description:
# The dataset contains 150 samples of iris flowers.
# It includes measurements for four features of the flowers:
#   Sepal length (in cm)
#   Sepal width (in cm)
#   Petal length (in cm)
#   Petal width (in cm)

# Classes (Target Variable):
# There are three species of iris flowers in the dataset:
#   Iris-setosa
#   Iris-versicolor
#   Iris-virginica
# Each species has 50 samples.

# Type of Problem:
# It is a multiclass classification problem: the goal is to classify the iris flower species based on the given features.

# Balanced Dataset:
# The dataset is balanced: each class has the same number of samples.
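
The counts described above can be checked directly. The following is a minimal sketch, assuming the copy of the dataset bundled with scikit-learn (which names the classes `setosa`, `versicolor`, and `virginica` without the `Iris-` prefix); the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# iris.data has shape (n_samples, n_features)
n_samples, n_features = iris.data.shape

# Count how many samples carry each integer class label (0, 1, 2)
class_counts = np.bincount(iris.target)

print("Samples:", n_samples, "Features:", n_features)
for name, count in zip(iris.target_names, class_counts):
    print(f"{name}: {count}")
```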

In [19]:
# Code Example (Using Python):
# Here's an example of how to load and explore the Iris dataset using Python's scikit-learn library:

from sklearn.datasets import load_iris
import pandas as pd

# Load the Iris dataset
iris = load_iris()

# Create a DataFrame for easier visualization
data = pd.DataFrame(iris.data, columns=iris.feature_names)
data['species'] = iris.target

# Mapping numeric species to actual names
data['species'] = data['species'].map({0: 'setosa', 1: 'versicolor', 2: 'virginica'})

# Display the first 5 rows
print(data.head())


   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
0                5.1               3.5                1.4               0.2   
1                4.9               3.0                1.4               0.2   
2                4.7               3.2                1.3               0.2   
3                4.6               3.1                1.5               0.2   
4                5.0               3.6                1.4               0.2   

  species  
0  setosa  
1  setosa  
2  setosa  
3  setosa  
4  setosa  
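
Beyond the first five rows, a per-species summary gives a quick feel for how the features differ between classes. This is a sketch rather than part of the original walkthrough; it rebuilds the DataFrame and takes the species names from `iris.target_names` instead of a hand-written mapping.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
data = pd.DataFrame(iris.data, columns=iris.feature_names)

# Index the array of names with the integer labels to get one name per row
data["species"] = iris.target_names[iris.target]

# Mean of each measurement within each species
species_means = data.groupby("species").mean()
print(species_means.round(2))
```

Petal length and petal width tend to separate the species much more clearly than the sepal measurements do.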


Let's build a simple machine learning model using the Iris dataset

Step 1: Import Libraries

In [6]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

Step 2: Load and Explore Data

For this example, we’ll use the Iris dataset, a classic dataset for classification.

In [7]:
from sklearn.datasets import load_iris

# Load the dataset
iris = load_iris()
X = iris.data    # features
Y = iris.target  # labels

# Check the dataset
print("Feature Names: ", iris.feature_names)
print("Target Names: ", iris.target_names)
print("First 5 Rows of Data:\n", X[:5])
print("Target Labels:\n", Y[:5])


Feature Names:  ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Target Names:  ['setosa' 'versicolor' 'virginica']
First 5 Rows of Data:
 [[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]]
Target Labels:
 [0 0 0 0 0]


Step 3: Split the Data

We split the dataset into training and testing sets to evaluate the model’s performance on unseen data.


In [12]:
# Split into training (70%) and testing (30%) sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=42)

print("Training Set Size: ", len(X_train))
print("Testing Set Size: ", len(X_test))

Training Set Size:  105
Testing Set Size:  45
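
With a plain random split, the test set need not preserve the 50/50/50 class balance of the full dataset. One common option, shown here as a sketch and not as the split used in this notebook, is to pass `stratify=Y` so that each class keeps the same proportion in both sets.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, Y = iris.data, iris.target

# stratify=Y keeps the class proportions equal in the train and test sets
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=42, stratify=Y
)

print("Train class counts:", np.bincount(Y_train))
print("Test class counts:", np.bincount(Y_test))
```

Stratifying matters more for small or imbalanced datasets, where a random split can leave a class under-represented in the test set.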


Step 4: Train the Model

We’ll use a Logistic Regression model for this classification problem.

In [13]:
# max_iter is raised from the default of 100 to give the solver more room to converge
model = LogisticRegression(max_iter=200)
model.fit(X_train, Y_train)
print("Model Trained!")

Model Trained!
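
Once fitted, the model can classify a single new flower. The sketch below repeats the split and training so that it runs on its own; `new_flower` is a hypothetical measurement whose values match the first row of the data shown earlier, which is a setosa.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, Y_train, Y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)
model = LogisticRegression(max_iter=200).fit(X_train, Y_train)

# Hypothetical measurements in cm, in the order given by iris.feature_names
new_flower = np.array([[5.1, 3.5, 1.4, 0.2]])

predicted_label = model.predict(new_flower)[0]
predicted_name = iris.target_names[predicted_label]

# predict_proba returns one probability per class; each row sums to 1
probabilities = model.predict_proba(new_flower)[0]

print("Predicted species:", predicted_name)
for name, p in zip(iris.target_names, probabilities):
    print(f"  P({name}) = {p:.3f}")
```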


Step 5: Make Predictions

Predict on the test set and check how well the model performs.

In [14]:
# Make predictions on the test set
Y_pred = model.predict(X_test)

print("Predicted Labels: ", Y_pred[:10])
print("Actual Labels: ", Y_test[:10])

Predicted Labels:  [1 0 2 1 1 0 1 2 1 1]
Actual Labels:  [1 0 2 1 1 0 1 2 1 1]
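
The integer labels are easier to read once they are mapped back to species names. A minimal, self-contained sketch (it repeats the split and training from the earlier steps) could look like this:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, Y_train, Y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)
model = LogisticRegression(max_iter=200).fit(X_train, Y_train)
Y_pred = model.predict(X_test)

# Indexing target_names with integer labels yields the species names
predicted_names = iris.target_names[Y_pred]
actual_names = iris.target_names[Y_test]

# Count the test samples where prediction and truth disagree
n_wrong = int((Y_pred != Y_test).sum())

print("Predicted:", predicted_names[:5])
print("Actual:   ", actual_names[:5])
print("Misclassified:", n_wrong, "of", len(Y_test))
```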


Step 6: Evaluate the Model

We’ll evaluate the model with accuracy, a confusion matrix, and a classification report.

In [17]:
# Accuracy
accuracy = accuracy_score(Y_test, Y_pred)
print("\nAccuracy: ", accuracy)

# Confusion matrix
cm = confusion_matrix(Y_test, Y_pred)
print("\nConfusion Matrix:\n", cm)

# Classification report
report = classification_report(Y_test, Y_pred, target_names=iris.target_names)
print("\nClassification Report:\n", report)


Accuracy:  1.0

Confusion Matrix:
 [[19  0  0]
 [ 0 13  0]
 [ 0  0 13]]

Classification Report:
               precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        19
  versicolor       1.00      1.00      1.00        13
   virginica       1.00      1.00      1.00        13

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45
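
A perfect score on 45 test samples comes from a single split, so it is worth checking how stable the result is. One possible check, sketched here with 5-fold stratified cross-validation (the fold count and seed are arbitrary choices, not taken from the steps above), trains and scores the same model on several different splits:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

iris = load_iris()

# Shuffle before splitting because the samples are ordered by class
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# One accuracy value per fold; each fold serves as the test set once
scores = cross_val_score(
    LogisticRegression(max_iter=200), iris.data, iris.target, cv=cv
)

print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```

The mean across folds is usually a more reliable estimate than the accuracy from any one split.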

