**Task 1**
Iris Flower Classification

● Use measurements of Iris flowers (setosa, versicolor, virginica) as input data.
● Train a machine learning model to classify the species based on these measurements.
● Use libraries like Scikit-learn for easy dataset access and model building.
● Evaluate the model’s accuracy and performance using test data.
● Understand basic classification concepts in machine learning.


In [None]:
# 1. Use measurements of Iris flowers (setosa, versicolor, virginica) as input data.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("/content/Iris.csv")

if "Id" in df.columns:
    df = df.drop("Id", axis=1)

X = df.drop("Species", axis=1)
y = df["Species"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = DecisionTreeClassifier(random_state=42)  # fixed seed so results are reproducible
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

print("Decision Tree Accuracy:", accuracy_score(y_test, y_pred))


Decision Tree Accuracy: 1.0
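Before training, it helps to look at the measurements themselves. The cell below is a sketch rather than part of the original notebook: it loads scikit-learn's bundled copy of the data with `load_iris(as_frame=True)` instead of `/content/Iris.csv`, so the column names (e.g. `petal length (cm)`) differ from the CSV's `PetalLengthCm`, and the added `species` column is an illustrative assumption.

```python
from sklearn.datasets import load_iris

# Load scikit-learn's bundled Iris data as a pandas DataFrame
# (assumption: used instead of /content/Iris.csv so the sketch runs anywhere).
iris = load_iris(as_frame=True)
df = iris.frame  # four measurement columns plus an integer "target" column

# Map the integer targets 0/1/2 to readable species names.
df["species"] = iris.target_names[iris.target]

print(df.shape)                      # rows x columns
print(df["species"].value_counts())  # samples per species

# Mean of each measurement per species.
means = df.groupby("species").mean(numeric_only=True).round(2)
print(means)
```

The per-species means show that setosa has much smaller petals than the other two species, which is why it is the easiest class to separate.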


In [None]:
# 2. Train a machine learning model to classify the species based on these measurements.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report

df = pd.read_csv("/content/Iris.csv")

X = df[["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]]
y = df["Species"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))

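sample = pd.DataFrame([[5.1, 3.5, 1.4, 0.2]], columns=X.columns)  # same column names as the training data, avoids a feature-name warning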
print("Predicted species:", model.predict(sample)[0])




Accuracy: 1.0

Classification Report:
                  precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        10
Iris-versicolor       1.00      1.00      1.00         9
 Iris-virginica       1.00      1.00      1.00        11

       accuracy                           1.00        30
      macro avg       1.00      1.00      1.00        30
   weighted avg       1.00      1.00      1.00        30

Predicted species: Iris-setosa
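A fitted decision tree can be inspected to see which thresholds it learned. As a hedged sketch, scikit-learn's `sklearn.tree.export_text` prints the rules as text. This example uses the bundled `load_iris` data (an assumption, so feature names differ from the CSV) and limits the tree with `max_depth=3` purely to keep the printout short; the notebook's own tree has no depth limit.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# max_depth=3 is an illustrative choice to keep the printed rules readable.
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X_train, y_train)

# Print the learned if/else rules with human-readable feature names.
rules = export_text(model, feature_names=list(iris.feature_names))
print(rules)

# Classify the same sample measurements used in the notebook.
sample = [[5.1, 3.5, 1.4, 0.2]]
predicted = iris.target_names[model.predict(sample)[0]]
print("Predicted species:", predicted)
```

The first rule printed splits on a petal measurement, which isolates setosa from the other two species.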




In [None]:
# 3. Use libraries like Scikit-learn for easy dataset access and model building.


import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report

df = pd.read_csv("/content/Iris.csv")

X = df[["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]]
y = df["Species"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))

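sample = pd.DataFrame([[5.1, 3.5, 1.4, 0.2]], columns=X.columns)  # same column names as the training data, avoids a feature-name warning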
print("Predicted Species:", model.predict(sample)[0])


Accuracy: 1.0

Classification Report:
                  precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        10
Iris-versicolor       1.00      1.00      1.00         9
 Iris-virginica       1.00      1.00      1.00        11

       accuracy                           1.00        30
      macro avg       1.00      1.00      1.00        30
   weighted avg       1.00      1.00      1.00        30

Predicted Species: Iris-setosa
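The cells above read a CSV file. Scikit-learn also ships the Iris data, so a version that needs no file at all might look like the sketch below. `load_iris(return_X_y=True, as_frame=True)` returns the features as a DataFrame and the labels as integers (0, 1, 2) in a Series; the rest mirrors the notebook's split and model settings.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load features as a DataFrame and integer labels as a Series;
# no CSV file is required because the data ships with scikit-learn.
X, y = load_iris(return_X_y=True, as_frame=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print("Accuracy:", accuracy)
```

Because the labels are integers here, `load_iris().target_names` is needed to turn predictions back into species names.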




In [None]:
# 4. Evaluate the model’s accuracy and performance using test data.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

df = pd.read_csv("/content/Iris.csv")

X = df[["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]]
y = df["Species"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))



Accuracy: 1.0

Classification Report:
                  precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        10
Iris-versicolor       1.00      1.00      1.00         9
 Iris-virginica       1.00      1.00      1.00        11

       accuracy                           1.00        30
      macro avg       1.00      1.00      1.00        30
   weighted avg       1.00      1.00      1.00        30


Confusion Matrix:
 [[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]
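With only 30 test samples, a single split gives a noisy estimate: one misclassified flower moves accuracy by about 3.3 percentage points. A common complement, not used in the original notebook, is k-fold cross-validation. The sketch below uses `cross_val_score` with 5 stratified folds on the bundled `load_iris` data; the fold count and shuffling seed are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 5 folds that each keep the equal class balance; shuffled with a fixed seed
# so the result is reproducible.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(
    DecisionTreeClassifier(random_state=42), X, y, cv=cv, scoring="accuracy"
)

print("Fold accuracies:", scores.round(3))
print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```

Every flower is used for testing exactly once, so the mean and spread across folds say more about expected performance than a single 1.0.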


# 5. Understand basic classification concepts in machine learning.

What is Classification?

Classification is a type of supervised machine learning.

Objective: predict discrete labels (categories) from input features.

Example: from flower measurements (sepal and petal length and width), predict the species (Setosa, Versicolor, Virginica).

1. Features (X)

Input variables the model uses to make a prediction.

Example (Iris dataset):

SepalLengthCm, SepalWidthCm, PetalLengthCm, PetalWidthCm.

2. Target / Labels (y)

The output category the model must predict.

Example: Species (Setosa, Versicolor, Virginica).

3. Training Data vs Test Data

Training Data → Used to fit the model, i.e., to learn patterns from labeled examples.

Test Data → Held out during training and used to evaluate how well the model performs on unseen data.
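The split between training and test data can be sketched as follows. It uses the bundled `load_iris` data as an assumption; the `stratify=y` argument is an addition (the notebook cells do not use it) that keeps each species in the same proportion in both parts.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 20% of the rows for testing; stratify=y keeps one third of
# each set per species.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print("Training rows:", len(X_train))             # used to fit the model
print("Test rows:", len(X_test))                  # never seen during training
print("Test class counts:", np.bincount(y_test))  # samples per species in the test set
```

Without `stratify`, the notebook's split ends up with 10, 9 and 11 test flowers per species, as its classification report shows.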

4. Model

An algorithm that learns a mapping from input features to labels.

Examples:

Decision Tree

Logistic Regression

Random Forest

Support Vector Machine (SVM)

Neural Networks
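Because scikit-learn estimators share the same `fit`/`predict`/`score` interface, several of the models listed above can be compared with the same loop. A minimal sketch follows; the hyperparameters (`max_iter=1000`, `n_estimators=100`) and the use of `StandardScaler` are illustrative assumptions, and neural networks are left out to keep it short.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Any object with fit/predict works here; scaling is added for logistic
# regression and the SVM, which are sensitive to feature scale.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "SVM": make_pipeline(StandardScaler(), SVC()),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # mean accuracy on the test set
    print(f"{name}: {scores[name]:.3f}")
```

On a dataset this small and well separated, all four usually score similarly, so the choice matters more on harder problems.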

5. Predictions

After training, the model predicts the class of new input data.

Example: [5.1, 3.5, 1.4, 0.2] → Setosa.
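Making a prediction for a new flower can be sketched as below. Besides `predict`, many scikit-learn classifiers expose `predict_proba`; for a decision tree it is the class distribution of the training samples in the leaf the flower lands in. The bundled `load_iris` data and fitting on all 150 flowers are assumptions for the sketch; wrapping the sample in a DataFrame with the training column names avoids scikit-learn's feature-name warning.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris(as_frame=True)
X, y = iris.data, iris.target

model = DecisionTreeClassifier(random_state=42).fit(X, y)

# New measurements: sepal length, sepal width, petal length, petal width (cm).
sample = pd.DataFrame([[5.1, 3.5, 1.4, 0.2]], columns=X.columns)

label = model.predict(sample)[0]        # integer class index
proba = model.predict_proba(sample)[0]  # one probability per class

print("Predicted species:", iris.target_names[label])
for name, p in zip(iris.target_names, proba):
    print(f"  P({name}) = {p:.2f}")
```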

Evaluation Metrics

1. Accuracy

The fraction of predictions that are correct, usually reported as a percentage.

Example: Out of 100 predictions, 90 correct → Accuracy = 90%.
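The same arithmetic on a smaller scale: with hypothetical labels for 10 flowers where 9 predictions are correct, the manual count and `sklearn.metrics.accuracy_score` should agree on 0.9.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical true and predicted labels for 10 flowers; only the last one is wrong.
y_true = np.array(["setosa"] * 4 + ["versicolor"] * 3 + ["virginica"] * 3)
y_pred = np.array(["setosa"] * 4 + ["versicolor"] * 3 + ["virginica"] * 2 + ["versicolor"])

manual = np.mean(y_true == y_pred)  # fraction of positions where the labels agree
print("Manual accuracy:", manual)
print("accuracy_score:", accuracy_score(y_true, y_pred))
```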

2. Confusion Matrix

A table showing how the predicted labels compare with the actual labels.

Rows = actual class, columns = predicted class (scikit-learn's convention); the diagonal counts correct predictions.
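The row/column layout can be checked on a tiny hand-made example with hypothetical labels. Passing `labels=` to `confusion_matrix` fixes the row and column order; without it, scikit-learn uses the sorted set of labels.

```python
from sklearn.metrics import confusion_matrix

labels = ["setosa", "versicolor", "virginica"]
# Hypothetical labels: one virginica flower is mistaken for versicolor.
y_true = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
y_pred = ["setosa", "setosa", "versicolor", "versicolor", "versicolor", "virginica"]

cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)
# Row i = actual class labels[i], column j = predicted class labels[j];
# cm[2, 1] counts virginica flowers predicted as versicolor.
```

In the notebook's Iris output every off-diagonal entry is 0, which is another way of reading the 100% accuracy.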

3. Precision

Of all predicted positives, how many were truly positive?

Important when false positives are costly (e.g., spam filtering, where a legitimate email sent to the spam folder may go unread).

4. Recall

Of all actual positives, how many did the model detect?

Important when false negatives are costly (e.g., cancer screening, where a missed case is dangerous).

5. F1-score

Harmonic mean of precision and recall; it is high only when both are high.
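Precision, recall, and F1 follow directly from counts of true positives (TP), false positives (FP), and false negatives (FN): precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 · precision · recall / (precision + recall). The sketch below computes them by hand for hypothetical binary labels and compares with scikit-learn. In the Iris classification report, each species is treated in turn as the "positive" class.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical binary labels: 1 = positive (e.g. spam), 0 = negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # correctly flagged positives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # negatives wrongly flagged
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # positives that were missed

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"precision={precision:.3f}  sklearn={precision_score(y_true, y_pred):.3f}")
print(f"recall={recall:.3f}     sklearn={recall_score(y_true, y_pred):.3f}")
print(f"f1={f1:.3f}         sklearn={f1_score(y_true, y_pred):.3f}")
```

Here TP = 3, FP = 2 and FN = 1, so precision is 3/5, recall is 3/4, and F1 lands between them at 2/3.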

Iris Example (simple)

Input: Measurements of flowers.

Model: Decision Tree.

Output: Classification of species.

Accuracy: 100% on the 30-flower test split used above; other random splits typically give roughly 95–100%.

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

df = pd.read_csv("/content/Iris.csv")

X = df[["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]]
y = df["Species"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = DecisionTreeClassifier(random_state=42)

model.fit(X_train, y_train)

y_pred = model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))

sample = pd.DataFrame([[5.1, 3.5, 1.4, 0.2]], columns=X.columns)  # Example flower measurements, with the training column names
print("\nPredicted Species for sample:", model.predict(sample)[0])
