
# JobInterviewGuide_Workshop — **Solutions Key**

This notebook provides worked solutions and example outputs corresponding to each section of the workshop.
Use it to check your work after attempting the exercises yourself.



## 1) Logistic Regression & the Sigmoid Output — Solution

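The sigmoid maps any real-valued score z to a value in (0, 1), which logistic regression interprets as the **probability** of the positive class. Thresholding that probability at 0.5 (equivalently, thresholding z at 0) yields class predictions.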


In [None]:

import math
z_values = [-4, -1, 0, 1, 3]
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))
probs = [sigmoid(z) for z in z_values]
threshold = 0.5
preds = [1 if p >= threshold else 0 for p in probs]
print("z:", z_values)
print("sigmoid(z):", [round(p,4) for p in probs])
print("class predictions @0.5:", preds)
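
To see how the choice of threshold shapes the predictions, the sketch below reuses the same `z_values` with a few different cut-offs. The helper name `classify` and the thresholds 0.25 and 0.8 are illustrative assumptions, not part of the workshop.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def classify(z_values, threshold):
    """Hypothetical helper: label 1 when sigmoid(z) >= threshold, else 0."""
    return [1 if sigmoid(z) >= threshold else 0 for z in z_values]

z_values = [-4, -1, 0, 1, 3]
for threshold in (0.25, 0.5, 0.8):
    print(f"threshold={threshold}:", classify(z_values, threshold))
```

A lower threshold labels more points as positive, and a higher one labels fewer, so the threshold is a trade-off to choose rather than a fixed property of the model.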



## 2) Cross-Entropy (Log-Loss) vs Regularization — Solution

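Cross-entropy measures how well the predicted probabilities match the true labels; L2 regularization adds a penalty proportional to the sum of squared weights, which discourages large weights. The regularized objective is the sum of the two.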


In [None]:

import math
y_true = [1, 0, 1, 1, 0]
y_hat  = [0.9, 0.2, 0.6, 0.8, 0.1]
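def binary_cross_entropy(y, p):
    eps = 1e-12  # keeps probabilities away from 0 and 1 so log() stays finite
    loss = 0.0
    for yt, pt in zip(y, p):
        pt = min(max(pt, eps), 1.0 - eps)
        # -log(pt) when yt == 1, -log(1 - pt) when yt == 0
        loss += -(yt*math.log(pt) + (1-yt)*math.log(1-pt))
    return loss/len(y)  # mean loss over all samples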
ce = binary_cross_entropy(y_true, y_hat)
w = [0.8, -0.3, 0.5]
lam = 0.1
l2 = lam * sum(wi*wi for wi in w)
total = ce + l2
print("Cross-Entropy Loss:", round(ce, 4))
print("L2 penalty:", round(l2, 4))
print("Total (CE + L2):", round(total, 4))
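
To see where the loss comes from, here is a minimal sketch that breaks the same cross-entropy into per-sample contributions. It reuses `y_true` and `y_hat` from the solution; the name `per_sample` is an illustrative assumption, and clipping is skipped because every probability here is strictly between 0 and 1.

```python
import math

y_true = [1, 0, 1, 1, 0]
y_hat  = [0.9, 0.2, 0.6, 0.8, 0.1]

# Loss for one sample: -log(p) if the label is 1, -log(1 - p) if it is 0
per_sample = [-math.log(p) if y == 1 else -math.log(1 - p)
              for y, p in zip(y_true, y_hat)]

for y, p, loss in zip(y_true, y_hat, per_sample):
    print(f"y={y}  p={p:.1f}  loss={loss:.4f}")

mean_ce = sum(per_sample) / len(per_sample)
print("mean cross-entropy:", round(mean_ce, 4))
```

The least confident correct prediction (label 1 with p = 0.6) contributes the largest share of the loss.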



## 3) Decision Trees — Leaf Nodes = Final Predictions — Solution

Each leaf node stores the final prediction: the majority class of the training samples that reach it (or their mean target value for regression).


In [None]:

from sklearn.tree import DecisionTreeClassifier, export_text
X = [[0],[1],[2],[3],[4],[5]]
y = [0,0,0,1,1,1]
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(X,y)
print(export_text(clf, feature_names=["x"]))
print("Pred(1.5) =", clf.predict([[1.5]])[0])
print("Pred(3.2) =", clf.predict([[3.2]])[0])
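
One way to confirm that predictions come from leaves is to look up which leaf each query lands in. The sketch below assumes scikit-learn's `apply` method and the fitted `tree_` attribute; the `queries` list is an illustrative name.

```python
from sklearn.tree import DecisionTreeClassifier

X = [[0], [1], [2], [3], [4], [5]]
y = [0, 0, 0, 1, 1, 1]
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

queries = [[1.5], [3.2]]
leaf_ids = clf.apply(queries)  # index of the node each sample ends up in
tree = clf.tree_

for q, leaf in zip(queries, leaf_ids):
    # A node is a leaf when it has no children (children_left == -1)
    is_leaf = tree.children_left[leaf] == -1
    # tree.value[leaf] holds per-class counts or fractions (depends on the version)
    leaf_class = clf.classes_[tree.value[leaf][0].argmax()]
    print(f"x={q[0]}: node {leaf}, is_leaf={is_leaf}, class stored in leaf={leaf_class}")
```

The class read from each leaf should match what `clf.predict` returns for the same query.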



## 4) Classification Metrics vs Regression Metrics — Solution

F1 (like accuracy, precision, and recall) is a classification metric; **R²** is a regression metric and does not apply to class labels.


In [None]:

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
y_true = [1,0,1,1,0,1]
y_pred = [1,0,0,1,0,1]
# Rows are true classes, columns are predicted classes
print("Confusion Matrix:\n", confusion_matrix(y_true, y_pred))
print("Accuracy:", round(accuracy_score(y_true,y_pred),3))
print("Precision:", round(precision_score(y_true,y_pred),3))
print("Recall:", round(recall_score(y_true,y_pred),3))
print("F1:", round(f1_score(y_true,y_pred),3))
print("\nNote: R^2 is a regression metric, not shown here.")
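
To make the contrast concrete, the sketch below computes F1 by hand from the same labels and then computes R² on a small regression example. The regression values `r_true` and `r_pred` are made up for illustration and do not come from the workshop.

```python
# Classification: F1 from confusion-matrix counts
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print("F1:", round(f1, 3))

# Regression: R^2 = 1 - SS_res / SS_tot on made-up continuous targets
r_true = [3.0, -0.5, 2.0, 7.0]
r_pred = [2.5, 0.0, 2.0, 8.0]
mean_true = sum(r_true) / len(r_true)
ss_res = sum((t - p) ** 2 for t, p in zip(r_true, r_pred))
ss_tot = sum((t - mean_true) ** 2 for t in r_true)
r2 = 1 - ss_res / ss_tot
print("R^2:", round(r2, 3))
```

F1 is built from counts of correct and incorrect labels, while R² compares squared errors against the variance of a continuous target, which is why the two are not interchangeable.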



## 5) Parametric vs Non-Parametric — Mini Experiment — Solution

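On a non-linear signal, non-parametric KNN typically fits better than linear regression. Both MSE values in the cell below are computed on the training data, which tends to flatter KNN, so the gap likely overstates its advantage on unseen data.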


In [None]:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = np.linspace(-3, 3, 60).reshape(-1,1)
y = np.sin(X).ravel() + 0.1*rng.randn(60)

lin = LinearRegression().fit(X,y)
mse_lin = mean_squared_error(y, lin.predict(X))

knn = KNeighborsRegressor(n_neighbors=5).fit(X,y)
mse_knn = mean_squared_error(y, knn.predict(X))

print("MSE Linear Regression (parametric):", round(mse_lin,4))
print("MSE KNN Regressor (non-parametric):", round(mse_knn,4))
print("Observation: Lower MSE indicates better fit; KNN often wins on non-linear patterns.")
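
The solution scores both models on the data they were fit to. As a hedged variant, the sketch below repeats the experiment with a held-out split; the 30% test size and `random_state=0` are arbitrary assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = np.linspace(-3, 3, 60).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * rng.randn(60)

# Hold out 30% of the points so neither model is scored on data it was fit to
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

lin = LinearRegression().fit(X_tr, y_tr)
knn = KNeighborsRegressor(n_neighbors=5).fit(X_tr, y_tr)

mse_lin_test = mean_squared_error(y_te, lin.predict(X_te))
mse_knn_test = mean_squared_error(y_te, knn.predict(X_te))
print("Test MSE Linear Regression:", round(mse_lin_test, 4))
print("Test MSE KNN Regressor:", round(mse_knn_test, 4))
```

If KNN still has the lower error on the held-out points, the advantage is not just an artifact of memorizing the training set.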



## 6) Feature Engineering — Purpose — Solution

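Adding informative transforms (e.g., polynomial features) can improve separability and accuracy. Here the label depends on x², so no single threshold on x separates the classes; adding x² as a feature lets a linear model draw the boundary.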


In [None]:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import accuracy_score

rng = np.random.RandomState(42)
X = rng.uniform(-2,2,(300,1))
y = (X[:,0]**2 + rng.normal(0,0.3,300) > 1.0).astype(int)

# Baseline
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
lr = LogisticRegression().fit(X_tr, y_tr)
base_acc = accuracy_score(y_te, lr.predict(X_te))

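# With polynomial feature (adds x^2 as a second column).
# Reusing random_state=0 reproduces the same train/test rows as the baseline.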
poly = PolynomialFeatures(degree=2, include_bias=False)
X2 = poly.fit_transform(X)
X2_tr, X2_te, y_tr, y_te = train_test_split(X2, y, test_size=0.3, random_state=0)
lr2 = LogisticRegression().fit(X2_tr, y_tr)
poly_acc = accuracy_score(y_te, lr2.predict(X2_te))

print("Baseline acc (raw):", round(base_acc,3))
print("With polynomial feature acc:", round(poly_acc,3))
print("Observation: Higher accuracy with engineered features indicates improved separability.")
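
An alternative arrangement, sketched below, wraps the transform and the classifier in a single pipeline so that the data is split once and the features are generated inside the model. It assumes scikit-learn's `make_pipeline`; the name `poly_model` is illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(42)
X = rng.uniform(-2, 2, (300, 1))
y = (X[:, 0]**2 + rng.normal(0, 0.3, 300) > 1.0).astype(int)

# Split once; both models see exactly the same rows
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

base = LogisticRegression().fit(X_tr, y_tr)
poly_model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),  # adds the x^2 column
    LogisticRegression(),
).fit(X_tr, y_tr)

base_acc = base.score(X_te, y_te)
poly_acc = poly_model.score(X_te, y_te)
print("Baseline acc (raw):", round(base_acc, 3))
print("Pipeline acc (with x^2):", round(poly_acc, 3))
```

For a stateless transform like this one the result should match the solution; the pipeline matters more for transforms that learn from data, such as scalers, where fitting on the full dataset would leak test information.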



## 7) Train / Validation / Test — Role of Validation — Solution

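The validation set (or the held-out folds in cross-validation) is used for **hyperparameter tuning** and model selection; the test set is reserved for one final evaluation. The cell below demonstrates only the tuning step, running 5-fold cross-validation over all of the data.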


In [None]:

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, n_features=6, n_informative=4, random_state=0)
param_grid = {"n_neighbors":[1,3,5,7,9]}
grid = GridSearchCV(KNeighborsClassifier(), param_grid=param_grid, cv=5, scoring="accuracy")
grid.fit(X,y)
print("Best params via CV (validation):", grid.best_params_)
print("Best CV accuracy:", round(grid.best_score_,3))
print("Note: The test set should only be used once for final, unbiased evaluation.")
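
To show the full train / validation / test workflow, the sketch below sets a test split aside before tuning and scores it exactly once at the end. The 20% test size, the stratified split, and the names `X_dev` and `X_test` are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, n_features=6, n_informative=4, random_state=0)

# Set the test set aside before any tuning happens
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7, 9]},
    cv=5,
    scoring="accuracy",
)
grid.fit(X_dev, y_dev)  # cross-validation folds act as the validation data

# By default the best model is refit on all of X_dev; score it once on the test set
test_acc = grid.score(X_test, y_test)
print("Best params via CV:", grid.best_params_)
print("Held-out test accuracy:", round(test_acc, 3))
```

The cross-validated score guides the choice of `n_neighbors`, and the held-out accuracy is the number to report.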



## 8) Gradient Descent — Optimization — Solution

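We minimize MSE for the model y = wx + b with vanilla (full-batch) gradient descent. With a learning rate of 0.5 and 200 steps, the estimates converge close to the true parameters (w = 2.0, b = -0.5).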


In [None]:

import numpy as np
rng = np.random.RandomState(1)
X = rng.rand(200,1)
true_w, true_b = 2.0, -0.5
y = true_w*X[:,0] + true_b + rng.normal(0,0.05,200)

w, b = 0.0, 0.0
lr = 0.5
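for step in range(200):
    y_hat = w*X[:,0] + b
    # Gradients of MSE = mean((y_hat - y)^2) with respect to w and b
    dw = (2/len(X)) * np.sum((y_hat - y) * X[:,0])
    db = (2/len(X)) * np.sum((y_hat - y))
    w -= lr*dw
    b -= lr*db
    if step % 50 == 0:
        # MSE uses y_hat from before this step's update; w and b are post-update
        mse = np.mean((y_hat - y)**2)
        print(f"step={step:3d}  w={w:.3f}  b={b:.3f}  MSE={mse:.4f}")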
print("Estimated w,b:", round(w,3), round(b,3), "  (true:", true_w, true_b, ")")
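
As a sanity check on convergence, the sketch below reruns the same gradient descent loop and compares the result with the closed-form least-squares line. Using `np.polyfit` for the reference fit is an assumption made here; the workshop itself only uses gradient descent.

```python
import numpy as np

rng = np.random.RandomState(1)
X = rng.rand(200, 1)
y = 2.0 * X[:, 0] - 0.5 + rng.normal(0, 0.05, 200)

# Same gradient descent updates as in the solution
w, b, lr = 0.0, 0.0, 0.5
for _ in range(200):
    err = w * X[:, 0] + b - y
    dw = 2 * np.mean(err * X[:, 0])
    db = 2 * np.mean(err)
    w -= lr * dw
    b -= lr * db

# Closed-form least-squares fit; polyfit returns [slope, intercept] for deg=1
w_ls, b_ls = np.polyfit(X[:, 0], y, deg=1)

print("GD:           w=%.4f  b=%.4f" % (w, b))
print("Least squares: w=%.4f  b=%.4f" % (w_ls, b_ls))
```

Gradient descent converges to the least-squares solution for the noisy sample, which is close to, but not exactly, the true parameters.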
