
## 1. Core ML Knowledge
- [ ] Linear & Logistic Regression: interpretation, regularization (L1/L2)
- [ ] Decision Trees, Random Forests, XGBoost: pros/cons, overfitting control
- [ ] K-Means, PCA, t-SNE: dimensionality reduction & clustering
- [ ] Loss functions: MSE, cross-entropy, hinge, contrastive loss
- [ ] Evaluation metrics: accuracy, AUC, precision/recall, log loss
- [ ] Model selection: cross-validation, early stopping, learning curves
- [ ] Overfitting & bias-variance tradeoff
- [ ] Feature importance: SHAP, gain, permutation
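
The evaluation metrics in the checklist above are often asked as from-scratch coding questions. The following is a minimal sketch in plain Python, not a replacement for a library implementation; the function names `precision_recall` and `roc_auc` and the toy labels and scores are illustrative assumptions. AUC is computed here via its pairwise-ranking interpretation (the fraction of positive/negative pairs ranked correctly), which is quadratic in the sample size and only meant for small examples.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. Cost is O(P * N), fine for toy data only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): labels and model scores for six examples.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5

precision, recall = precision_recall(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"precision={precision:.3f} recall={recall:.3f} auc={auc:.3f}")
```

Precision and recall depend on the 0.5 threshold, while AUC depends only on how the scores rank the examples, which is why the two can disagree about which model is better.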

### Linear & Logistic Regression

# Logistic regression with L2 regularization

import torch
import torch.nn.functional as F

class LogisticRegressionL2(torch.nn.Module):
    def __init__(self, input_dim, l2_lambda=0.01):
        super().__init__()
        self.weights = torch.nn.Parameter(torch.randn(input_dim, 1))  # shape: (D, 1)
        self.bias = torch.nn.Parameter(torch.zeros(1))
        self.l2_lambda = l2_lambda

    def forward(self, X):
        logits = X @ self.weights + self.bias  # shape: (N, 1)
        return torch.sigmoid(logits)  # probability output

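    def loss(self, X, y):
        preds = self.forward(X).squeeze(-1)  # (N, 1) -> (N,), safe even when N == 1
        bce = F.binary_cross_entropy(preds, y)  # y: float tensor of 0/1 labels, shape (N,)
        l2_penalty = self.l2_lambda * torch.sum(self.weights ** 2)  # bias is not penalized
        return bce + l2_penalty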


# Example data (binary classification)
torch.manual_seed(42)
X = torch.randn(100, 5)  # 100 samples, 5 features
true_w = torch.tensor([[1.], [-1.], [0.5], [0.], [2.]])
y_prob = torch.sigmoid(X @ true_w).squeeze()
y = (y_prob > 0.5).float()  # binary labels

# Model + optimizer
model = LogisticRegressionL2(input_dim=5, l2_lambda=0.1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Train
for epoch in range(1000):
    optimizer.zero_grad()
    loss = model.loss(X, y)
    loss.backward()
    optimizer.step()

    if epoch % 100 == 0:
        print(f"Epoch {epoch}, Loss: {loss.item():.4f}")


Epoch 0, Loss: 1.0018
Epoch 100, Loss: 0.5242
Epoch 200, Loss: 0.5239
Epoch 300, Loss: 0.5239
Epoch 400, Loss: 0.5239
Epoch 500, Loss: 0.5239
Epoch 600, Loss: 0.5239
Epoch 700, Loss: 0.5239
Epoch 800, Loss: 0.5239
Epoch 900, Loss: 0.5239


### Decision Tree, Random Forest, XGBoost

# 🌳 Decision Trees vs. Random Forests vs. XGBoost

## 1. Decision Trees

- **Concept**: Tree-based model that splits data based on feature thresholds to minimize impurity (e.g., Gini or entropy).
- **Pros**:
  - Easy to interpret
  - Fast to train
  - Handles non-linearities
  - Requires minimal data prep
- **Cons**:
  - Overfits easily
  - Unstable (small data change → different tree)
  - Low predictive power as a single model
- **Use Case**: Quick baseline, interpretable decision rules
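
To make the impurity-minimizing split concrete, here is a minimal sketch of how a single threshold could be chosen on one numeric feature using Gini impurity. The helper names `gini` and `best_split` and the toy data are assumptions for illustration; a real tree repeats this search over every feature and recurses on each side.

```python
def gini(labels):
    """Gini impurity: 1 - sum of squared class fractions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(xs, ys):
    """Return (threshold, weighted_gini) for the best split `x <= threshold`.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns (None, gini(ys)) if no split improves on the unsplit node.
    """
    n = len(xs)
    values = sorted(set(xs))
    best = (None, gini(ys))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        # Impurity of the split = size-weighted average of the child impurities.
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (thr, score)
    return best


# Toy data (hypothetical): one feature, two well-separated classes.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(xs, ys)
print(threshold, impurity)
```

Because the split is chosen greedily from the training data, moving a single point can change the chosen threshold, which is one source of the instability listed above.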

---

## 2. Random Forests

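- **Concept**: Ensemble of many decision trees, each trained on a bootstrap sample of the rows with a random subset of features considered at each split (bagging plus feature subsampling). The final output is a majority vote (classification) or an average (regression).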
- **Pros**:
  - Reduces overfitting
  - High accuracy
  - Robust to noise
  - Fairly robust to outliers; missing-value handling depends on the implementation
- **Cons**:
  - Slower than a single tree
  - Less interpretable
  - Can be memory-intensive
- **Use Case**: General-purpose, good first choice for tabular data
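
The bagging idea above can be sketched in a few lines. This is a simplified illustration, not how any particular library implements Random Forests: the base learner is a one-split stump on a single feature, so the random feature subsetting is omitted, and the names `fit_stump`, `fit_bagged_stumps`, and `predict_majority` are hypothetical.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Fit a one-split classifier on a single feature; returns a predict function."""
    majority = Counter(ys).most_common(1)[0][0]
    values = sorted(set(xs))
    best = None  # (errors, threshold, left_label, right_label)
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, thr, left_label, right_label)
    if best is None:  # all x values identical: predict the majority class
        return lambda x: majority
    _, thr, left_label, right_label = best
    return lambda x: left_label if x <= thr else right_label


def fit_bagged_stumps(xs, ys, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees


def predict_majority(trees, x):
    """Classification output: the label most trees vote for."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): one feature, two well-separated classes.
xs = [1.0, 2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 13.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_bagged_stumps(xs, ys)
print(predict_majority(forest, 2.5), predict_majority(forest, 11.5))
```

Each stump sees a slightly different sample and so picks a slightly different threshold; averaging their votes is what reduces the variance of a single tree.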

---

## 3. XGBoost (Extreme Gradient Boosting)

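- **Concept**: Gradient boosting framework that builds trees sequentially. Each new tree is fit to the gradient of the loss with respect to the current ensemble's predictions (for squared error, the residuals), so it corrects the errors of the trees before it.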
- **Pros**:
  - State-of-the-art performance on tabular data
  - Regularization (L1/L2)
  - Missing value handling
  - Fast and scalable
- **Cons**:
  - More complex to tune (learning rate, depth, etc.)
  - Trees are built sequentially, so training cannot be parallelized across trees the way Random Forests can
- **Use Case**: Competitions, production models needing best performance
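
The sequential error-correcting idea can be illustrated with plain gradient boosting on squared loss. This sketch is not XGBoost's actual algorithm (which additionally uses second-order information and regularized split scoring); it only shows the core loop of fitting each new weak learner to the current residuals. The function names and the toy data are assumptions for illustration.

```python
def fit_regression_stump(xs, residuals):
    """Best single split on one feature, minimizing squared error."""
    values = sorted(set(xs))
    mean_all = sum(residuals) / len(residuals)
    best = None  # (sse, threshold, left_value, right_value)
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lv, rv)
    if best is None:  # no valid split: predict the mean residual
        return lambda x: mean_all
    _, thr, lv, rv = best
    return lambda x: lv if x <= thr else rv


def fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.3):
    """Additive model: base prediction plus a shrunken sum of stumps."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss 0.5 * (y - f)^2 the negative gradient is the residual y - f.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_regression_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy regression data (hypothetical): y = x^2 on eight points.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [x * x for x in xs]
weak = fit_gradient_boosting(xs, ys, n_rounds=1)
strong = fit_gradient_boosting(xs, ys, n_rounds=50)
print(f"train MSE after 1 round: {mse(weak, xs, ys):.3f}, after 50 rounds: {mse(strong, xs, ys):.3f}")
```

Training error keeps falling as rounds are added, which is why the learning rate, tree depth, and number of rounds need tuning (typically with early stopping on a validation set) to avoid overfitting.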

---

## 🔍 Summary Comparison Table

| Model         | Overfitting Risk       | Accuracy   | Interpretability | Training Speed | Feature Importance    |
|---------------|------------------------|------------|------------------|----------------|-----------------------|
| Decision Tree | High                   | Low–Medium | High             | Very Fast      | ✅ Yes                |
| Random Forest | Low                    | High       | Medium           | Moderate       | ✅ Yes (avg)          |
| XGBoost       | Low (when regularized) | Very High  | Low              | Slower         | ✅ Yes (boosted gain) |


## 2. Deep Learning Engineering
- [ ] Neural nets: architecture, forward/backward pass
- [ ] CNNs: convolution, pooling, receptive field
- [ ] RNNs/LSTMs/GRUs: sequence modeling, time steps
- [ ] Transformers: attention, multi-head, positional encoding
- [ ] Model debugging: vanishing gradients, dead neurons, exploding gradients
- [ ] Training tricks: batch norm, dropout, LR schedules, warmup

## 3. PyTorch Coding Proficiency
- [ ] `nn.Module`, `forward()`, `register_buffer`
- [ ] Custom loss functions, metrics
- [ ] Data pipeline with `Dataset` & `DataLoader`
- [ ] Training loop: batching, optimizer steps, validation
- [ ] Save/load models with `state_dict`
- [ ] TorchScript, quantization, `torch.compile` (for deployment)

## 4. ML System Design
- [ ] Offline vs Online inference: batch scoring vs real-time predictions
- [ ] Model training pipeline: from raw data → features → training
- [ ] Feature store: freshness, reuse, consistency with production
- [ ] Model serving: REST/gRPC endpoint, latency/batching tradeoffs
- [ ] Retraining strategies: time-based, performance-based, online learning
- [ ] Monitoring: latency, throughput, drift, A/B testing metrics
- [ ] Model versioning, rollback, shadow deployment
- [ ] GPU vs CPU serving trade-offs (e.g. large embedding tables)
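
For the drift item in the monitoring bullet above, one commonly used statistic is the Population Stability Index (PSI), which compares the binned distribution of a feature in live traffic against a reference sample. The sketch below is one possible formulation under stated assumptions: equal-width bins derived from the reference sample, a small `eps` floor to avoid dividing by zero, and hypothetical names and toy data throughout.

```python
import math


def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI between a reference sample and a live sample of one numeric feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant reference feature

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            # Clamp so live values outside the reference range land in the edge bins.
            i = int((v - lo) / width)
            counts[min(max(i, 0), n_bins - 1)] += 1
        # Floor each fraction at eps so the log ratio below is always defined.
        return [max(c / len(values), eps) for c in counts]

    e = bin_fractions(expected)
    a = bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


# Toy samples (hypothetical): a reference window and two live windows.
reference = [i / 100 for i in range(100)]      # spread evenly over [0, 1)
same = [i / 100 for i in range(100)]           # identical distribution
shifted = [0.5 + i / 200 for i in range(100)]  # all mass moved to the upper half

psi_same = population_stability_index(reference, same)
psi_shifted = population_stability_index(reference, shifted)
print(f"PSI (no drift): {psi_same:.4f}, PSI (shifted): {psi_shifted:.4f}")
```

Every term in the sum is non-negative, so PSI is zero only when the binned distributions match and grows as they diverge. The alerting threshold is a per-feature judgment call.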

## 5. Real-World ML Problem-Solving
- [ ] Build an end-to-end ML solution: ingestion, features, model, metrics
- [ ] Tradeoff discussion: model complexity vs latency, accuracy vs robustness
- [ ] Production bugs: feature drift, label leakage, data inconsistency
- [ ] Model impact analysis: lift in CTR, retention, engagement

## 6. Coding Interviews (Data Structures + ML Coding)
- [ ] Arrays, strings, hashmaps, sliding window, two pointers
- [ ] Matrix operations: multiplication, diagonal, transpose
- [ ] Graph algorithms (BFS/DFS)
- [ ] Binary search, heap, queue
- [ ] Implement basic models: logistic regression, dot-product similarity
- [ ] Vectorized PyTorch/NumPy code for mini ML problems

## 7. Recommender/NLP (if relevant)
- [ ] Pointwise vs Pairwise vs Listwise ranking
- [ ] Implicit feedback, negative sampling
- [ ] Collaborative filtering vs content-based filtering
- [ ] Embedding generation, cosine similarity, ANN search
- [ ] Tokenization, embeddings (word2vec/BERT), transformer fine-tuning
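
As a baseline for the embedding retrieval item above, exact nearest-neighbour search by cosine similarity can be written in plain Python. This is a brute-force sketch whose cost grows linearly with the number of items; approximate nearest-neighbour (ANN) indexes exist to avoid that cost at scale. The item names and two-dimensional vectors are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def top_k(query, items, k=2):
    """Exact nearest neighbours by cosine similarity, O(N * D) per query."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in items.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


# Toy embeddings (hypothetical).
items = {
    "item_a": [1.0, 0.0],
    "item_b": [0.9, 0.1],
    "item_c": [0.0, 1.0],
    "item_d": [-1.0, 0.0],
}
neighbours = top_k([1.0, 0.0], items, k=2)
print(neighbours)
```

If the embeddings are L2-normalized ahead of time, cosine similarity reduces to a dot product, which is the form a vectorized implementation would usually compute.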

## 8. Behavioral & Communication
- [ ] Use STAR framework to talk through ML projects
- [ ] Metrics-focused results (e.g., latency reduced 40%, AUC +0.03)
- [ ] Talk about tradeoffs: e.g., 'We chose X over Y because latency mattered more than accuracy'
- [ ] How you handled bad data, retraining failures, rollout problems

## Bonus for Staff-Level Candidates
- [ ] Leading project roadmap: how you align with PM, DS, infra
- [ ] Design doc writing: include assumptions, tradeoffs, metrics
- [ ] Mentoring or setting coding/design standards
- [ ] Influencing architecture across teams (e.g. shared embedding service)