## Week 2 — Supervised Learning Classifier (Iris)
Goal: turn the Week 1 dot-product “score” idea into a real classifier by training a model that learns weights from data.

In [1]:
import sys
from pathlib import Path
sys.path.append(str(Path("..").resolve()))

import numpy as np
import pandas as pd
from IPython.display import display

from src.iris_utils import load_iris_df, IRIS_FEATURES

df = load_iris_df(add_species_name=True)
X = df[IRIS_FEATURES].to_numpy()
y = df["species"].to_numpy()

df.head()


| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | species | species_name |
|---|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 | setosa |


#### Train/Test Split
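We hold out 20% of the samples as a test set so we can measure how well the classifier generalizes to unseen data. Passing `stratify=y` keeps the three species in the same proportions in both splits, and `random_state=42` makes the split reproducible.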

In [2]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

X_train.shape, X_test.shape


((120, 4), (30, 4))
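
To check what `stratify=y` does, one option is to count the classes in each split. The sketch below assumes scikit-learn's bundled Iris data (`load_iris`) as a stand-in for the course helper `load_iris_df`, which is not reproduced here; the names `train_counts` and `test_counts` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: sklearn's bundled Iris data stands in for load_iris_df().
X, y = load_iris(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Count how many samples of each class (0, 1, 2) landed in each split.
train_counts = np.bincount(y_train)
test_counts = np.bincount(y_test)

print("Train class counts:", train_counts)
print("Test class counts:", test_counts)
```

Because Iris has 50 flowers per species, a stratified 80/20 split should put 40 of each class in the training set and 10 of each in the test set.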

#### Train a Classifier: Logistic Regression
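Logistic regression learns one weight per feature per class, plus one bias per class. For each flower it computes a linear score for every class, converts the scores into class probabilities, and predicts the class with the highest probability.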


In [3]:
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

print("Train accuracy:", clf.score(X_train, y_train))
print("Test accuracy:", clf.score(X_test, y_test))


Train accuracy: 0.975
Test accuracy: 0.9666666666666667
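
The step from scores to predictions can be sketched by hand. This example assumes a recent scikit-learn, where `LogisticRegression` fits a multinomial (softmax) model by default for multiclass targets; with a one-vs-rest model the probabilities would be computed differently. It again uses `load_iris` as a stand-in for the course helper, and `manual_pred` is an illustrative name.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumption: sklearn's bundled Iris data stands in for load_iris_df().
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Raw linear scores: one row per test sample, one column per class.
scores = clf.decision_function(X_test)

# Softmax: shift each row for numerical stability, exponentiate, normalize.
shifted = scores - scores.max(axis=1, keepdims=True)
exp_scores = np.exp(shifted)
probs = exp_scores / exp_scores.sum(axis=1, keepdims=True)

# The predicted class is the one with the highest probability.
manual_pred = clf.classes_[probs.argmax(axis=1)]

print("Matches predict_proba:", np.allclose(probs, clf.predict_proba(X_test)))
print("Matches predict:", np.array_equal(manual_pred, clf.predict(X_test)))
```

Since softmax preserves the ordering of the scores, taking the argmax of the raw scores would give the same predictions; the probabilities matter when you want a confidence estimate.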


#### Evaluation
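We evaluate the test-set predictions with a confusion matrix (rows are true classes, columns are predicted classes) and a classification report (per-class precision, recall, and F1). Here the only error is one versicolor (class 1) flower predicted as virginica (class 2).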


In [4]:
from sklearn.metrics import classification_report, confusion_matrix

y_pred = clf.predict(X_test)

print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
print("\nClassification report:\n", classification_report(y_test, y_pred))


Confusion matrix:
 [[10  0  0]
 [ 0  9  1]
 [ 0  0 10]]

Classification report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      0.90      0.95        10
           2       0.91      1.00      0.95        10

    accuracy                           0.97        30
   macro avg       0.97      0.97      0.97        30
weighted avg       0.97      0.97      0.97        30
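
The numbers in the report can be recomputed directly from the confusion matrix. The sketch below copies the matrix from the output above and derives precision (correct predictions divided by column totals), recall (correct predictions divided by row totals), F1, and accuracy; the variable names are illustrative.

```python
import numpy as np

# Confusion matrix copied from the output above:
# rows = true class, columns = predicted class.
cm = np.array([
    [10, 0, 0],
    [0, 9, 1],
    [0, 0, 10],
])

correct = np.diag(cm)                 # correctly classified samples per class
precision = correct / cm.sum(axis=0)  # divide by how often each class was predicted
recall = correct / cm.sum(axis=1)     # divide by how often each class truly occurs
f1 = 2 * precision * recall / (precision + recall)
accuracy = correct.sum() / cm.sum()

print("Precision:", np.round(precision, 2))
print("Recall:   ", np.round(recall, 2))
print("F1:       ", np.round(f1, 2))
print("Accuracy: ", round(accuracy, 2))
```

For example, virginica (class 2) was predicted 11 times but only 10 of those were correct, which gives the precision of 10/11 ≈ 0.91 shown in the report.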



#### Connection to Week 1 Dot Product
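The model learned a set of weights (one per feature per class) and one intercept per class. In Week 1 we manually chose weights; in Week 2 the algorithm learns them from data. The prediction itself is still a dot product: each class's score is its row of weights dotted with the flower's feature vector, plus that class's intercept.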


In [5]:
import pandas as pd
from IPython.display import display

# Each row corresponds to a class (0, 1, 2)
# Each column corresponds to a feature
weights = clf.coef_
bias = clf.intercept_

weights_df = pd.DataFrame(
    weights,
    columns=IRIS_FEATURES,
    index=["setosa (0)", "versicolor (1)", "virginica (2)"]
)

display(weights_df)
print("Intercepts (bias terms):", bias)

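# Save the test-set features with true and predicted labels for later analysis
pred_df = pd.DataFrame(X_test, columns=IRIS_FEATURES)
pred_df["y_true"] = y_test
pred_df["y_pred"] = y_pred

# Make sure the output folder exists before writing (Path was imported in In [1])
Path("../data").mkdir(parents=True, exist_ok=True)
pred_df.to_csv("../data/iris_predictions_week2.csv", index=False)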


| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| setosa (0) | -0.529663 | 0.827414 | -2.3472 | -0.993506 |
| versicolor (1) | 0.529739 | -0.30495 | -0.170974 | -0.856209 |
| virginica (2) | -7.6e-05 | -0.522463 | 2.518174 | 1.849716 |


Intercepts (bias terms): [ 10.12535658   1.79828599 -11.92364257]
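
To make the link to Week 1 concrete, the learned weights can be applied to a single flower with a plain dot product. This is a minimal sketch that refits the same model on scikit-learn's bundled Iris data (an assumed stand-in for `load_iris_df`); `x`, `scores`, and `manual_class` are illustrative names.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumption: sklearn's bundled Iris data stands in for load_iris_df().
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Take one test flower: a vector of 4 feature values.
x = X_test[0]

# Week 1 idea: score = weights . features + bias, computed once per class.
# coef_ has shape (3, 4), so the matrix-vector product yields 3 class scores.
scores = clf.coef_ @ x + clf.intercept_

# The class with the highest score is the prediction.
manual_class = clf.classes_[np.argmax(scores)]

print("Scores per class:", scores)
print("Manual prediction:", manual_class)
print("clf.predict:      ", clf.predict(x.reshape(1, -1))[0])
```

The manual scores should agree with `clf.decision_function`, so the only thing the library adds on top of the Week 1 calculation is the training step that finds the weights.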
