# **Machine Learning Specialist Interview Study Notebook**

---
### **Overview**
This notebook helps you review and practice core **Machine Learning** concepts for your upcoming **ML Specialist interview**. It pairs short **Markdown explanations** with **Python code scaffolding** so you can test, explore, and visualize each concept, in the same style as your previous workshop notebooks.

---
## **1. Setup and Imports**

In this section, we import the common Python libraries used for Machine Learning experiments.

```python
# Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, accuracy_score, f1_score, log_loss
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
```

---
## **2. Non-Parametric Models Practice: KNN vs Decision Tree**

**Concept:**  
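Non-parametric models do not assume a fixed functional form with a fixed number of parameters; their effective complexity grows with the training data. That flexibility lets them adapt to data patterns, but they overfit easily when complexity is left unchecked (a small `n_neighbors` in KNN, a large `max_depth` in a tree). Here, we compare **KNN** and **Decision Tree** models.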

```python
# Create a simple synthetic dataset
np.random.seed(42)
data = pd.DataFrame({
    'Feature1': np.linspace(0, 10, 100),
    'Feature2': np.sin(np.linspace(0, 10, 100)) + np.random.normal(0, 0.2, 100),
})
data['Target'] = (data['Feature2'] > 0).astype(int)

# Split dataset
X = data[['Feature1', 'Feature2']]
y = data['Target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train KNN model
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
knn_pred = knn.predict(X_test)

# Train Decision Tree model
tree = DecisionTreeClassifier(max_depth=4, random_state=42)
tree.fit(X_train, y_train)
tree_pred = tree.predict(X_test)

print(f"KNN Accuracy: {accuracy_score(y_test, knn_pred):.3f}")
print(f"Decision Tree Accuracy: {accuracy_score(y_test, tree_pred):.3f}")
```

**Try it yourself:**  
- Change `n_neighbors` in KNN (try 1, 3, 7) and observe how test accuracy changes.  
- Change `max_depth` in the Decision Tree and compare training accuracy with test accuracy; a widening gap indicates overfitting.
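
One way to run the experiments above is to sweep each hyperparameter and record training and test accuracy side by side. The sketch below rebuilds the synthetic dataset so it runs on its own; the helper `train_test_accuracy`, the variable names, and the value grids are illustrative assumptions rather than part of the original notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Rebuild the synthetic dataset with the same recipe as the cell above
rng = np.random.RandomState(42)
x = np.linspace(0, 10, 100)
demo = pd.DataFrame({
    'Feature1': x,
    'Feature2': np.sin(x) + rng.normal(0, 0.2, 100),
})
demo['Target'] = (demo['Feature2'] > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    demo[['Feature1', 'Feature2']], demo['Target'],
    test_size=0.3, random_state=42,
)

def train_test_accuracy(model):
    """Hypothetical helper: fit on the training split, return (train_acc, test_acc)."""
    model.fit(X_tr, y_tr)
    return model.score(X_tr, y_tr), model.score(X_te, y_te)

# Illustrative grids; max_depth=None lets the tree grow until every leaf is pure
knn_results = {
    k: train_test_accuracy(KNeighborsClassifier(n_neighbors=k))
    for k in [1, 3, 5, 7]
}
tree_results = {
    d: train_test_accuracy(DecisionTreeClassifier(max_depth=d, random_state=42))
    for d in [1, 2, 4, None]
}

for k, (train_acc, test_acc) in knn_results.items():
    print(f"KNN  n_neighbors={k}: train={train_acc:.3f}  test={test_acc:.3f}")
for d, (train_acc, test_acc) in tree_results.items():
    print(f"Tree max_depth={d}: train={train_acc:.3f}  test={test_acc:.3f}")
```

With `n_neighbors=1` or an unrestricted tree, every training point is memorized, so training accuracy reaches 1.0. A training score well above the test score suggests overfitting. Because `Target` is a deterministic threshold on `Feature2`, the gap may be small on this particular dataset.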

---
## **3. Cross-Entropy (Log Loss) Understanding**

**Concept:**  
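Cross-entropy (log loss) is the standard loss for probabilistic classifiers such as Logistic Regression. Each prediction is penalized by the negative log of the probability it assigned to the true class, so a confident wrong prediction costs far more than a hesitant one.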

```python
# True labels and predicted probabilities
y_true = np.array([1, 0, 1, 1, 0])
y_pred_prob = np.array([0.9, 0.2, 0.8, 0.4, 0.1])

loss = log_loss(y_true, y_pred_prob)
print(f"Cross-Entropy Loss: {loss:.4f}")
```

**Exercise:** Modify `y_pred_prob` to include higher confidence errors (e.g., change `0.1` to `0.95`) and observe how the loss increases.
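
To see where the number comes from, the binary cross-entropy can be computed by hand as the mean of `-[y*log(p) + (1-y)*log(1-p)]`. The sketch below is a minimal illustration: the function name `binary_cross_entropy` and the `eps` clipping are assumptions added here to avoid `log(0)`, and the second probability vector applies the change suggested in the exercise.

```python
import numpy as np
from sklearn.metrics import log_loss

def binary_cross_entropy(y_true, p, eps=1e-15):
    """Hypothetical helper: return (mean loss, per-sample losses) for binary labels."""
    p = np.clip(p, eps, 1 - eps)  # assumption: clip to keep log() finite
    per_sample = -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    return per_sample.mean(), per_sample

y_true = np.array([1, 0, 1, 1, 0])
p_original = np.array([0.9, 0.2, 0.8, 0.4, 0.1])
# Same predictions, but the last sample (true label 0) is now confidently wrong
p_confident_error = np.array([0.9, 0.2, 0.8, 0.4, 0.95])

loss_original, per_sample_original = binary_cross_entropy(y_true, p_original)
loss_error, per_sample_error = binary_cross_entropy(y_true, p_confident_error)

print(f"Manual loss:            {loss_original:.4f}")
print(f"sklearn log_loss:       {log_loss(y_true, p_original):.4f}")
print(f"With a confident error: {loss_error:.4f}")
print("Per-sample losses:", np.round(per_sample_error, 3))
```

The per-sample losses show that a single confident mistake dominates the average, which is why the loss rises so sharply in the exercise.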

---
## **4. Model Evaluation: MSE vs MAE**

**Concept:**  
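Regression models are evaluated using metrics like **MSE** (Mean Squared Error) and **MAE** (Mean Absolute Error). Because MSE squares each residual, it penalizes large errors more heavily and is more sensitive to outliers; MAE weights all errors linearly and stays in the units of the target.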

```python
# Synthetic regression dataset: y = 3x + Gaussian noise
np.random.seed(0)
X_reg = np.linspace(0, 10, 50)
y_reg = 3*X_reg + np.random.normal(0, 2, 50)

# Use distinct names so the classification split from Section 2 is not overwritten
X_reg_train, X_reg_test, y_reg_train, y_reg_test = train_test_split(
    X_reg.reshape(-1, 1), y_reg, test_size=0.3, random_state=42
)

model = LinearRegression()
model.fit(X_reg_train, y_reg_train)
y_reg_pred = model.predict(X_reg_test)

print(f"MSE: {mean_squared_error(y_reg_test, y_reg_pred):.3f}")
print(f"MAE: {mean_absolute_error(y_reg_test, y_reg_pred):.3f}")
```

**Try this:** Add an outlier (e.g., `y_reg_test[0] += 25`), recompute both metrics, and compare how much each one grows.
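
The outlier experiment can also be worked through on a tiny hand-made example, where the arithmetic is easy to check. The arrays below are hypothetical values chosen so that every error is exactly 0.5 before the outlier is added; `mse` and `mae` are small helper functions written for this sketch.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of squared errors."""
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    """Mean of absolute errors."""
    return np.mean(np.abs(y_true - y_pred))

# Hypothetical targets and predictions; every error is exactly 0.5
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 6.5, 9.5])

# Copy the targets and shift the first one by 25 to create a single outlier
y_outlier = y_true.copy()
y_outlier[0] += 25

mse_clean, mae_clean = mse(y_true, y_pred), mae(y_true, y_pred)
mse_outlier, mae_outlier = mse(y_outlier, y_pred), mae(y_outlier, y_pred)

print(f"Clean:        MSE={mse_clean:.2f}  MAE={mae_clean:.2f}")
print(f"With outlier: MSE={mse_outlier:.2f}  MAE={mae_outlier:.2f}")
print(f"Growth factor: MSE x{mse_outlier / mse_clean:.1f}, MAE x{mae_outlier / mae_clean:.1f}")
```

One error of 25.5 raises MAE from 0.5 to 6.75 but raises MSE from 0.25 to 162.75, because the squared term contributes 650.25 on its own.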

---
## **5. Bias-Variance Tradeoff Visualization**

**Concept:**  
A key tradeoff in ML:  
- High bias → underfitting  
- High variance → overfitting  

We visualize this using polynomial fits of increasing complexity.

```python
def plot_bias_variance_demo():
    np.random.seed(10)
    X = np.linspace(0, 6, 30)
    y_true = np.sin(X)
    y_noisy = y_true + np.random.normal(0, 0.2, X.shape)

    # Evaluate fits on a dense grid so wiggles between sample points are visible
    X_grid = np.linspace(0, 6, 200)

    plt.figure(figsize=(10, 6))
    degrees = [1, 3, 9]
    for d in degrees:
        coeffs = np.polyfit(X, y_noisy, d)  # least-squares polynomial fit of degree d
        plt.plot(X_grid, np.polyval(coeffs, X_grid), label=f'Degree {d}')

    plt.scatter(X, y_noisy, color='black', s=20, label='Data')
    plt.plot(X_grid, np.sin(X_grid), 'g--', label='True Function')
    plt.xlabel('X')
    plt.ylabel('y')
    plt.title('Bias-Variance Tradeoff: Model Complexity')
    plt.legend()
    plt.show()

plot_bias_variance_demo()
```

**Reflect:**  
- Low-degree (1) = High bias, Low variance  
- High-degree (9) = Low bias, High variance  
- The goal is a **balanced complexity**.
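
The plot shows the fits qualitatively; the same idea can be quantified by comparing error on the points used for fitting with error on held-out points. The sketch below is one possible way to do that. The split (holding out every third point) and the names `train_mse` and `val_mse` are assumptions made for illustration.

```python
import numpy as np

# Same data recipe as the plotting demo above
rng = np.random.RandomState(10)
X = np.linspace(0, 6, 30)
y_noisy = np.sin(X) + rng.normal(0, 0.2, X.shape)

# Illustrative split: hold out every third point (indices 1, 4, 7, ...) for validation
val_mask = np.arange(X.size) % 3 == 1
X_tr, y_tr = X[~val_mask], y_noisy[~val_mask]
X_val, y_val = X[val_mask], y_noisy[val_mask]

train_mse, val_mse = {}, {}
for d in [1, 3, 9]:
    coeffs = np.polyfit(X_tr, y_tr, d)  # fit on the training points only
    train_mse[d] = np.mean((np.polyval(coeffs, X_tr) - y_tr) ** 2)
    val_mse[d] = np.mean((np.polyval(coeffs, X_val) - y_val) ** 2)
    print(f"Degree {d}: train MSE={train_mse[d]:.4f}  validation MSE={val_mse[d]:.4f}")
```

Training error can only go down as the degree increases, since each higher-degree polynomial contains the lower-degree ones. Validation error is the quantity to watch: a degree whose validation error is much larger than its training error is fitting noise.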

---
## **6. Self-Practice Tasks**

Use these open-ended exercises to deepen your understanding:

1. Adjust `n_neighbors` (K) in KNN and observe how training and test accuracy change.
2. Increase Decision Tree `max_depth` and visualize overfitting.
3. Add Gaussian noise to regression targets and compare MSE vs MAE.
4. Compute **Precision, Recall, and F1-score** for KNN predictions.
5. Replot the bias–variance chart using more polynomial degrees.
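
For task 4, a possible starting point is sketched below. It uses small hypothetical label arrays so the counts can be verified by hand; to apply it to the notebook, swap in the classification test labels and `knn_pred` from Section 2.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Hypothetical true labels and predictions (not taken from the KNN model above)
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_hat = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# For binary labels {0, 1}, ravel() yields the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

precision_manual = tp / (tp + fp)  # of the predicted positives, how many are correct
recall_manual = tp / (tp + fn)     # of the actual positives, how many were found
f1_manual = 2 * precision_manual * recall_manual / (precision_manual + recall_manual)

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"Precision: manual={precision_manual:.2f}  sklearn={precision_score(y_true, y_hat):.2f}")
print(f"Recall:    manual={recall_manual:.2f}  sklearn={recall_score(y_true, y_hat):.2f}")
print(f"F1:        manual={f1_manual:.2f}  sklearn={f1_score(y_true, y_hat):.2f}")
```

Computing the metrics from the confusion matrix first makes it easier to explain in an interview why precision and recall can diverge when false positives and false negatives are unbalanced.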

---
### **Next Steps**
After completing this notebook, review the **Study Guide** topics: Supervised vs Unsupervised Learning, Regression Analysis, Logistic Regression, KNN, Decision Trees, and Evaluation Metrics.

Like your shared workshop notebooks, this notebook combines conceptual explanations with practical code you can run and modify freely.
