# Module 6 — SVM Practice (Student Notebook)
Goal: practice linear and non-linear SVMs on a shared dataset.
Write your own code. Only the imports and the dataset cell are provided.

In [None]:
# Step 0 — Imports (you may add more if needed)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, f1_score, classification_report, ConfusionMatrixDisplay

np.random.seed(42)

## Step 1 — Shared dataset
Two intuitive features: `Hours` (drawn uniformly from 1 to 10) and `Attendance` (drawn uniformly from 50 to 100).  
Target: `Passed ∈ {0, 1}`.  
Do **not** modify this cell, so that everyone uses the same data.

In [5]:
# Provided: build the shared dataset
n = 120
hours = np.random.uniform(1, 10, n)
attendance = np.random.uniform(50, 100, n)

# Simple latent score with noise
score = 0.5 * hours + 0.05 * attendance + np.random.normal(0, 1.5, n)
passed = (score > 8).astype(int)

df = pd.DataFrame({"Hours": hours, "Attendance": attendance, "Passed": passed})
display(df.head())
print("Class balance (mean of Passed):", df["Passed"].mean().round(3))

|   | Hours | Attendance | Passed |
|---|----------|-----------|---|
| 0 | 4.370861 | 90.372008 | 0 |
| 1 | 9.556429 | 94.804565 | 1 |
| 2 | 7.587945 | 65.900174 | 0 |
| 3 | 6.387926 | 55.502596 | 0 |
| 4 | 2.404168 | 61.396758 | 0 |


Class balance (mean of Passed): 0.233


## Step 2 — Quick visualization
Make a scatter plot of Hours vs Attendance, colored by Passed.  
The plot helps you judge whether a straight-line split between the classes might work.

In [None]:
# TODO:
# 1) Make a scatter plot of Hours vs Attendance, color by Passed.
# 2) Label axes and title clearly.
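
One possible shape for this plot is sketched below. It rebuilds the Step 1 dataset so it runs on its own (in the notebook you would reuse `df`). The colors, legend text, and figure size are illustrative choices, not requirements.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Rebuild the Step 1 dataset so this sketch runs on its own.
np.random.seed(42)
n = 120
hours = np.random.uniform(1, 10, n)
attendance = np.random.uniform(50, 100, n)
score = 0.5 * hours + 0.05 * attendance + np.random.normal(0, 1.5, n)
df = pd.DataFrame({"Hours": hours, "Attendance": attendance,
                   "Passed": (score > 8).astype(int)})

fig, ax = plt.subplots(figsize=(6, 4))
# One scatter call per class gives a readable legend entry for each.
for label, color, name in [(0, "tab:red", "Failed (0)"),
                           (1, "tab:blue", "Passed (1)")]:
    subset = df[df["Passed"] == label]
    ax.scatter(subset["Hours"], subset["Attendance"],
               c=color, label=name, alpha=0.7, edgecolor="k")
ax.set_xlabel("Hours")
ax.set_ylabel("Attendance")
ax.set_title("Hours vs Attendance, colored by Passed")
ax.legend()
```

In a notebook the figure renders when the cell finishes; in a plain script, add `plt.show()` at the end.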


## Step 3 — Train/test split + scaling
Split the data with stratification.  
Scale features using `StandardScaler` fitted on the train set only.

In [None]:
# TODO:
# 1) X = df[["Hours", "Attendance"]]; y = df["Passed"]
# 2) train_test_split with stratify=y, random_state=42, test_size=0.25
# 3) Fit scaler on X_train only, transform X_train and X_test
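
A minimal sketch of the three steps above, assuming the Step 1 dataset (rebuilt here so the block is self-contained). The point to notice is that the scaler only ever sees the training rows when it computes means and standard deviations; the test rows are transformed with those same statistics.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Rebuild the Step 1 dataset so this sketch runs on its own.
np.random.seed(42)
n = 120
hours = np.random.uniform(1, 10, n)
attendance = np.random.uniform(50, 100, n)
score = 0.5 * hours + 0.05 * attendance + np.random.normal(0, 1.5, n)
df = pd.DataFrame({"Hours": hours, "Attendance": attendance,
                   "Passed": (score > 8).astype(int)})

X = df[["Hours", "Attendance"]]
y = df["Passed"]

# Stratified split keeps the pass rate similar in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# Fit on the training set only, then apply the same transform to both sets.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print("Train shape:", X_train_scaled.shape, "Test shape:", X_test_scaled.shape)
print("Pass rate train/test:", round(y_train.mean(), 3), round(y_test.mean(), 3))
```

The scaled training features have mean 0 and standard deviation 1 per column by construction; the scaled test features will be close to that but not exact, which is expected.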


## Step 4 — Linear SVM
Train a linear SVM on scaled features. Report accuracy, F1, and a classification report.

In [None]:
# TODO:
# 1) Create SVC(kernel="linear", C=1.0, random_state=42)
# 2) Fit on X_train_scaled, y_train
# 3) Predict on X_test_scaled
# 4) Print accuracy and F1 with three decimals
# 5) Print classification report
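
The steps above could look roughly like the sketch below. It repeats the data and split setup so it runs standalone; in the notebook you would reuse your own `X_train_scaled`, `X_test_scaled`, `y_train`, and `y_test`. Passing `zero_division=0` is an added assumption that keeps the metrics quiet if the model happens to predict no positives.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, f1_score, classification_report

# Rebuild the Step 1 dataset so this sketch runs on its own.
np.random.seed(42)
n = 120
hours = np.random.uniform(1, 10, n)
attendance = np.random.uniform(50, 100, n)
score = 0.5 * hours + 0.05 * attendance + np.random.normal(0, 1.5, n)
df = pd.DataFrame({"Hours": hours, "Attendance": attendance,
                   "Passed": (score > 8).astype(int)})

# Step 3: stratified split, scaler fitted on the training rows only.
X, y = df[["Hours", "Attendance"]], df["Passed"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Step 4: linear SVM, evaluated on the held-out test set.
linear_svm = SVC(kernel="linear", C=1.0, random_state=42)
linear_svm.fit(X_train_scaled, y_train)
y_pred = linear_svm.predict(X_test_scaled)

acc = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred, zero_division=0)
print(f"Accuracy: {acc:.3f}")
print(f"F1:       {f1:.3f}")
print(classification_report(y_test, y_pred, zero_division=0))
```

Because only about a quarter of the rows are positive, read F1 and the per-class rows of the report alongside accuracy.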


## Step 5 — Visualize linear decision boundary
Make a helper that plots the decision regions and the training points in scaled space. Then call it for the linear model.

In [None]:
# TODO:
# 1) Build a function plot_boundary(model, X_scaled, y, title)
#    - make a meshgrid
#    - predict over the grid
#    - contourf for regions
#    - scatter the points
# 2) Call it on the training set with your linear SVM
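
One way to write the helper is sketched below, under the assumption that the model was trained on exactly two scaled features. The grid resolution (200 points per axis), the 0.5 padding, and the colormap are arbitrary illustrative choices. The setup code is repeated so the block runs on its own.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Rebuild the Step 1 dataset so this sketch runs on its own.
np.random.seed(42)
n = 120
hours = np.random.uniform(1, 10, n)
attendance = np.random.uniform(50, 100, n)
score = 0.5 * hours + 0.05 * attendance + np.random.normal(0, 1.5, n)
df = pd.DataFrame({"Hours": hours, "Attendance": attendance,
                   "Passed": (score > 8).astype(int)})

X, y = df[["Hours", "Attendance"]], df["Passed"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)


def plot_boundary(model, X_scaled, y, title):
    """Shade the predicted class over a grid and overlay the given points."""
    X_scaled = np.asarray(X_scaled)
    x_min, x_max = X_scaled[:, 0].min() - 0.5, X_scaled[:, 0].max() + 0.5
    y_min, y_max = X_scaled[:, 1].min() - 0.5, X_scaled[:, 1].max() + 0.5
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                         np.linspace(y_min, y_max, 200))
    # Predict a class for every grid point, then reshape back to the grid.
    zz = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

    fig, ax = plt.subplots(figsize=(6, 4))
    # Explicit levels separate class 0 from class 1 even if one is absent.
    ax.contourf(xx, yy, zz, levels=[-0.5, 0.5, 1.5], alpha=0.25, cmap="coolwarm")
    ax.scatter(X_scaled[:, 0], X_scaled[:, 1], c=np.asarray(y),
               cmap="coolwarm", edgecolor="k")
    ax.set_xlabel("Hours (scaled)")
    ax.set_ylabel("Attendance (scaled)")
    ax.set_title(title)
    return ax


linear_svm = SVC(kernel="linear", C=1.0, random_state=42)
linear_svm.fit(X_train_scaled, y_train)
ax = plot_boundary(linear_svm, X_train_scaled, y_train,
                   "Linear SVM (training set, scaled space)")
```

Returning the `Axes` object lets you add annotations afterwards, for example marking the support vectors.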


## Step 6 — Try non-linear kernels (Polynomial and RBF)
Train `SVC` with `kernel="poly"` (degree 3) and `kernel="rbf"`. Compare accuracy and F1 to the linear model.

In [None]:
# TODO:
# 1) Train poly SVM (degree=3). Evaluate on test.
# 2) Train rbf SVM. Evaluate on test.
# 3) Print a small table or clear prints for all three models.
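
A compact way to run the comparison is to keep the three models in a dictionary and collect the metrics in a small DataFrame, as sketched below. The dictionary keys and the choice to leave `C` and `gamma` at their defaults are assumptions for illustration. Setup is repeated so the block is self-contained.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, f1_score

# Rebuild the Step 1 dataset so this sketch runs on its own.
np.random.seed(42)
n = 120
hours = np.random.uniform(1, 10, n)
attendance = np.random.uniform(50, 100, n)
score = 0.5 * hours + 0.05 * attendance + np.random.normal(0, 1.5, n)
df = pd.DataFrame({"Hours": hours, "Attendance": attendance,
                   "Passed": (score > 8).astype(int)})

X, y = df[["Hours", "Attendance"]], df["Passed"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Same C for all three so the kernel is the only thing that changes.
models = {
    "linear": SVC(kernel="linear", C=1.0, random_state=42),
    "poly (degree=3)": SVC(kernel="poly", degree=3, C=1.0, random_state=42),
    "rbf": SVC(kernel="rbf", C=1.0, random_state=42),
}

rows = []
for name, model in models.items():
    model.fit(X_train_scaled, y_train)
    pred = model.predict(X_test_scaled)
    rows.append({
        "model": name,
        "accuracy": accuracy_score(y_test, pred),
        "f1": f1_score(y_test, pred, zero_division=0),
    })

results = pd.DataFrame(rows).set_index("model").round(3)
print(results)
```

With only 30 test rows, small differences between the models may not be meaningful, so treat the table as a rough guide.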


## Step 7 — Boundary plots for poly and RBF
Use your plot function to visualize their decision surfaces.

In [None]:
# TODO:
# 1) Plot boundary for poly SVM
# 2) Plot boundary for rbf SVM


## Step 8 — Confusion matrix
Pick your best model and show the confusion matrix on the test set.

In [None]:
# TODO:
# 1) Compute predictions for the best model
# 2) Use ConfusionMatrixDisplay.from_predictions(...)
# 3) Add a title


## Step 9 — Support vectors (inspection)
For each trained model, inspect how many support vectors it used.

In [None]:
# TODO:
# 1) Access the .support_vectors_ attribute of each model (.n_support_ gives per-class counts)
# 2) Print counts and a brief comparison
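
A sketch of the inspection, assuming the same three models as in Step 6 (refitted here so the block runs standalone). The row count of `support_vectors_` is the total number of support vectors, and `n_support_` splits that total by class.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Rebuild the Step 1 dataset so this sketch runs on its own.
np.random.seed(42)
n = 120
hours = np.random.uniform(1, 10, n)
attendance = np.random.uniform(50, 100, n)
score = 0.5 * hours + 0.05 * attendance + np.random.normal(0, 1.5, n)
df = pd.DataFrame({"Hours": hours, "Attendance": attendance,
                   "Passed": (score > 8).astype(int)})

X, y = df[["Hours", "Attendance"]], df["Passed"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)
X_train_scaled = StandardScaler().fit_transform(X_train)

models = {
    "linear": SVC(kernel="linear", C=1.0, random_state=42),
    "poly (degree=3)": SVC(kernel="poly", degree=3, C=1.0, random_state=42),
    "rbf": SVC(kernel="rbf", C=1.0, random_state=42),
}

sv_counts = {}
for name, model in models.items():
    model.fit(X_train_scaled, y_train)
    # One row per support vector, so the row count is the total.
    total = model.support_vectors_.shape[0]
    sv_counts[name] = total
    share = total / len(X_train_scaled)
    print(f"{name:16s} total={total:3d}  per class={model.n_support_}  "
          f"({share:.0%} of training rows)")
```

A model that needs a large share of the training rows as support vectors is usually dealing with overlapping classes or a wide margin, which is worth mentioning in your comparison.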


## Step 10 — Short interpretation
Three short paragraphs:
1) Which model performed best and why that makes sense given the scatter plot.  
2) What the boundary plots suggest about the relationship between Hours and Attendance.  
3) What you would try next to improve performance (feature ideas or hyperparameters like C and gamma).

In [None]:
# TODO:
# Write your answer to the three points above (3–5 sentences each is enough) in a markdown cell, or print it here as a string.
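
If you choose hyperparameter tuning as your next step, one common approach is a cross-validated grid search over `C` and `gamma`. The sketch below is only one possible setup: the grid values, the 5-fold setting, and F1 as the selection metric are all assumptions, not part of the assignment. Wrapping the scaler and the SVM in a pipeline means the scaler is refitted inside each fold, so no validation rows leak into the scaling.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Rebuild the Step 1 dataset so this sketch runs on its own.
np.random.seed(42)
n = 120
hours = np.random.uniform(1, 10, n)
attendance = np.random.uniform(50, 100, n)
score = 0.5 * hours + 0.05 * attendance + np.random.normal(0, 1.5, n)
df = pd.DataFrame({"Hours": hours, "Attendance": attendance,
                   "Passed": (score > 8).astype(int)})

X, y = df[["Hours", "Attendance"]], df["Passed"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# make_pipeline names the steps after their classes: "standardscaler", "svc".
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Hypothetical grid; widen or narrow it based on what you observe.
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": ["scale", 0.01, 0.1, 1],
}

search = GridSearchCV(pipe, param_grid, scoring="f1", cv=5)
search.fit(X_train, y_train)  # raw features; the pipeline handles scaling

print("Best params:", search.best_params_)
print("Best CV F1: ", round(search.best_score_, 3))
print("Test F1:    ", round(search.score(X_test, y_test), 3))
```

Keep the test set out of the search itself and score on it once at the end, so the reported test F1 stays an honest estimate.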
