# Task 1: Iris Flower Classification

Use the Iris dataset available on Kaggle to build a machine learning model that classifies iris flowers by species from four features: petal length, petal width, sepal length, and sepal width. Your task is to preprocess the data, train the model, and evaluate its accuracy.

# Downloading the Dataset from Kaggle

In [11]:
!pip install kaggle

In [14]:
import os
import shutil

# Create the .kaggle folder in the user's home directory
kaggle_dir = os.path.join(os.path.expanduser("~"), ".kaggle")
os.makedirs(kaggle_dir, exist_ok=True)

# Move kaggle.json (the API token) into that directory
shutil.move("kaggle.json", os.path.join(kaggle_dir, "kaggle.json"))

# Optional: restrict the token to owner read/write (octal 0o600);
# on Windows, os.chmod only toggles the read-only flag
os.chmod(os.path.join(kaggle_dir, "kaggle.json"), 0o600)


In [7]:
!kaggle datasets download -d uciml/iris


Dataset URL: https://www.kaggle.com/datasets/uciml/iris


  0%|          | 0.00/3.60k [00:00<?, ?B/s]
100%|##########| 3.60k/3.60k [00:00<00:00, 1.24MB/s]



License(s): CC0-1.0
Downloading iris.zip to C:\Users\dell



In [28]:
# Unzip the downloaded archive into the iris_data folder
import zipfile

with zipfile.ZipFile("iris.zip", "r") as zip_ref:
    zip_ref.extractall("iris_data")


# Importing Libraries

In [36]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report


# Load the Dataset

In [62]:
iris_df = pd.read_csv("iris_data/Iris.csv")
iris_df.head()

   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
3   4            4.6           3.1            1.5           0.2  Iris-setosa
4   5            5.0           3.6            1.4           0.2  Iris-setosa


# Explore and Preprocess the Data

In [72]:
iris_df.info()
iris_df.isnull().sum()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Id             150 non-null    int64  
 1   SepalLengthCm  150 non-null    float64
 2   SepalWidthCm   150 non-null    float64
 3   PetalLengthCm  150 non-null    float64
 4   PetalWidthCm   150 non-null    float64
 5   Species        150 non-null    object 
dtypes: float64(4), int64(1), object(1)
memory usage: 7.2+ KB


Id               0
SepalLengthCm    0
SepalWidthCm     0
PetalLengthCm    0
PetalWidthCm     0
Species          0
dtype: int64

In [73]:
# Drop the Id column (a row identifier, not a predictive feature)
iris_df.drop('Id', axis=1, inplace=True)

In [74]:
# Encode species labels as integers
le = LabelEncoder()
iris_df['Species'] = le.fit_transform(iris_df['Species'])
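
To see what this encoding produces: `LabelEncoder` sorts the distinct labels and numbers them from 0. The plain-Python sketch below mimics that mapping for the three species names. The names `species_to_code`, `encode`, and `decode` are illustrative helpers and not part of scikit-learn.

```python
# Illustrative stand-in for LabelEncoder: sort the distinct labels,
# then number them 0, 1, 2, ...
species = ["Iris-virginica", "Iris-setosa", "Iris-versicolor", "Iris-setosa"]

classes = sorted(set(species))  # plays the role of le.classes_
species_to_code = {name: code for code, name in enumerate(classes)}


def encode(labels):
    """Map each label to its integer code (hypothetical helper)."""
    return [species_to_code[label] for label in labels]


def decode(codes):
    """Inverse mapping, analogous to le.inverse_transform."""
    return [classes[code] for code in codes]


encoded = encode(species)
print(species_to_code)
print(encoded)
```

Because the codes follow alphabetical order, 0 corresponds to Iris-setosa, 1 to Iris-versicolor, and 2 to Iris-virginica.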

# Split the Data

In [75]:
iris_X = iris_df.drop('Species', axis=1)
iris_y = iris_df['Species']

iris_X_train, iris_X_test, iris_y_train, iris_y_test = train_test_split(
    iris_X, iris_y, test_size=0.2, random_state=42)
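
With only 150 rows, a purely random 80/20 split can leave the three species unevenly represented in the test set. One option, not used in this notebook, is to pass `stratify` to `train_test_split`. The sketch below assumes scikit-learn's bundled copy of Iris (`load_iris`) as a stand-in for the Kaggle CSV so that it runs on its own.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: scikit-learn's bundled Iris data stands in for iris_data/Iris.csv
X, y = load_iris(return_X_y=True)

# stratify=y keeps the class proportions the same in both partitions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Count how many test samples fall in each class
test_counts = Counter(y_test.tolist())
print(test_counts)
```

Since each species has 50 rows, a stratified 20% test set should contain the same number of samples from every class.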

# Train the Model

In [76]:
iris_model = RandomForestClassifier(n_estimators=100, random_state=42)
iris_model.fit(iris_X_train, iris_y_train)

RandomForestClassifier(random_state=42)

# Evaluate the Model

In [77]:
iris_preds = iris_model.predict(iris_X_test)
print("Iris Accuracy:", accuracy_score(iris_y_test, iris_preds))

Iris Accuracy: 1.0
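
A single accuracy figure hides which species get confused with which. The plain-Python sketch below shows how accuracy and a confusion matrix are computed from true and predicted labels. The label lists are made-up illustrative values, not this notebook's test set, and `confusion_counts` is a hypothetical helper.

```python
# Made-up labels for illustration only (0 = setosa, 1 = versicolor, 2 = virginica)
y_true = [0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 2, 1, 2, 2, 1]


def confusion_counts(true_labels, predicted_labels, n_classes):
    """Return a matrix where row = true class and column = predicted class."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


matrix = confusion_counts(y_true, y_pred, n_classes=3)

# Accuracy is the diagonal (correct predictions) divided by the total
correct = sum(matrix[i][i] for i in range(3))
accuracy = correct / len(y_true)

for row in matrix:
    print(row)
print("Accuracy:", accuracy)
```

scikit-learn's `confusion_matrix`, imported earlier, uses the same convention of true classes in rows and predicted classes in columns, so it could be called on `iris_y_test` and `iris_preds` to get the equivalent table for the trained model.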


# Make Predictions on New Data

In [78]:
iris_sample = pd.DataFrame([{
    'SepalLengthCm': 5.1,
    'SepalWidthCm': 3.5,
    'PetalLengthCm': 1.4,
    'PetalWidthCm': 0.2
}])
print("Predicted Species:", le.inverse_transform(iris_model.predict(iris_sample))[0])

Predicted Species: Iris-setosa
