# 📚 Week 2: Machine Learning Foundations – Notes

## 🔹 Day 8: ML Basics
- **Key Concepts**
  - **Features (X):** Input variables used for prediction.
  - **Labels (y):** Target/output variable.
  - **Overfitting:** Model fits training data too well, performs poorly on unseen data.
  - **Underfitting:** Model is too simple, misses important patterns.
- **General Workflow**
  1. Define problem
  2. Prepare data
  3. Choose model
  4. Train
  5. Evaluate
  6. Improve
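
To make overfitting and underfitting concrete, here is a minimal sketch on made-up data. The decision tree is only a convenient stand-in because `max_depth` is an easy complexity knob; the synthetic sine data, the depths, and the helper name `fit_and_score` are all assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (made up for illustration): a noisy sine curve.
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

def fit_and_score(max_depth):
    """Hypothetical helper: return (train MSE, test MSE) for one tree depth."""
    model = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    return train_mse, test_mse

# Depth 1 is very simple, depth None lets the tree grow until it memorizes the data.
results = {depth: fit_and_score(depth) for depth in (1, 4, None)}
for depth, (train_mse, test_mse) in results.items():
    print(f"max_depth={depth}: train MSE={train_mse:.3f}, test MSE={test_mse:.3f}")
```

A very shallow tree tends to show high error on both splits (underfitting), while the unrestricted tree drives training error to zero yet still makes errors on the test split (overfitting).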

---

## 🔹 Day 9: Train/Test Split
- **Purpose:** Evaluate model performance on unseen data.
- **Sklearn Syntax**
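  ```python
  from sklearn.model_selection import train_test_split

  # X: feature matrix, y: labels. test_size=0.2 holds out 20% of the rows;
  # random_state fixes the shuffle so the split is reproducible.
  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
  ```
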
- **Baseline Models:** Trivial predictors that any real model should beat.
  - **Regression:** Predict the mean of the training target.
  - **Classification:** Predict the majority class.
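
The two baselines can be sketched with scikit-learn's dummy estimators. The tiny arrays below are made up for illustration, and the variable names are assumptions.

```python
import numpy as np
from sklearn.dummy import DummyClassifier, DummyRegressor

# Dummy models ignore the features, so a placeholder matrix is enough here.
X = np.zeros((6, 1))

# Regression baseline: always predict the mean of the training target.
y_reg = np.array([10.0, 12.0, 14.0, 16.0, 18.0, 20.0])
reg_baseline = DummyRegressor(strategy="mean").fit(X, y_reg)
mean_prediction = reg_baseline.predict(X[:1])[0]

# Classification baseline: always predict the most frequent class.
y_clf = np.array([0, 0, 0, 0, 1, 1])
clf_baseline = DummyClassifier(strategy="most_frequent").fit(X, y_clf)
majority_prediction = clf_baseline.predict(X[:1])[0]
baseline_accuracy = clf_baseline.score(X, y_clf)

print(mean_prediction, majority_prediction, baseline_accuracy)
```

With four of six labels in class 0, the majority-class baseline already scores about 67% accuracy, which is the number a trained classifier has to beat on this toy data.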

## 🔹 Day 10: Linear Regression

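- **Use Case:** Predict continuous values (e.g., housing prices).
- **Formula:** ŷ = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ
- **Evaluation Metrics**
  - **R²:** Proportion of the target's variance explained by the model (1.0 is a perfect fit).
  - **MAE (Mean Absolute Error):** Average absolute difference between predicted and actual values.
  - **MSE (Mean Squared Error):** Average squared difference; penalizes large errors more heavily.
- **Visualization Idea:** Plot predicted vs. actual values.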
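
A minimal sketch of fitting a linear regression and computing the three metrics. The data is synthetic, generated from an assumed relationship (y = 3 + 2·x₁ − 1·x₂ plus noise), so the fitted coefficients can be compared against known values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (made up): y = 3 + 2*x1 - 1*x2 + Gaussian noise.
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# intercept_ estimates β₀; coef_ holds the estimates of β₁ and β₂.
print("intercept:", model.intercept_, "coefficients:", model.coef_)

r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
print(f"R²={r2:.3f}  MAE={mae:.3f}  MSE={mse:.3f}")
```

Passing `y_test` and `y_pred` to a scatter plot gives the predicted-vs-actual view; points close to the diagonal indicate good predictions.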

## 🔹 Day 11: Logistic Regression

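- **Use Case:** Binary classification (e.g., Titanic survival).
- **Output:** Probability between 0 and 1, mapped to a class (0/1) by a threshold (0.5 in sklearn's `predict`).
- **Interpretation**
  - `coef_` holds one coefficient per feature: the change in the log-odds of class 1 for a one-unit increase in that feature.
  - Positive coefficient → increases the probability of class 1; negative coefficient → decreases it.
- **Evaluation:** Confusion matrix + metrics (see Day 12).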
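
A short sketch of reading logistic regression output, assuming synthetic data where the first feature raises the odds of class 1 and the second lowers them. The data-generating rule and variable names are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data (made up): true log-odds = 1.5*x1 - 1.0*x2.
rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 2))
log_odds = 1.5 * X[:, 0] - 1.0 * X[:, 1]
p = 1 / (1 + np.exp(-log_odds))
y = (rng.uniform(size=n) < p).astype(int)

model = LogisticRegression().fit(X, y)

# predict_proba returns one column per class; column 1 is P(class 1).
proba = model.predict_proba(X)[:, 1]
# predict maps those probabilities to 0/1 labels.
pred = model.predict(X)

# coef_ is on the log-odds scale; exponentiating gives odds ratios.
odds_ratios = np.exp(model.coef_[0])
print("coefficients:", model.coef_[0])
print("odds ratios:", odds_ratios)
```

An odds ratio above 1 means a one-unit increase in the feature multiplies the odds of class 1 by that factor; below 1 means it shrinks them.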

## 🔹 Day 12: Evaluation Metrics

- **Classification Metrics**
  - Accuracy = (TP + TN) / Total
  - Precision = TP / (TP + FP)
  - Recall = TP / (TP + FN)
  - F1 = 2 × (Precision × Recall) / (Precision + Recall)
  - ROC-AUC: Probability that the model ranks a random positive higher than a random negative.
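
As a worked check of the formulas above, the sketch below computes each metric by hand from the confusion-matrix counts. The ten labels and scores are made up for illustration; ROC-AUC is computed directly from its ranking definition and compared with `roc_auc_score`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

# Made-up example: 4 positives, 6 negatives, with predicted scores.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_score = np.array([0.9, 0.8, 0.4, 0.3, 0.7, 0.2, 0.1, 0.35, 0.05, 0.15])
y_pred = (y_score >= 0.5).astype(int)  # threshold scores at 0.5

# For binary labels, ravel() yields the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)           # (2 + 5) / 10
precision = tp / (tp + fp)                           # 2 / 3
recall = tp / (tp + fn)                              # 2 / 4
f1 = 2 * precision * recall / (precision + recall)   # 4 / 7

# ROC-AUC as a ranking probability: the fraction of (positive, negative)
# pairs in which the positive receives the higher score (no ties here).
pos = y_score[y_true == 1]
neg = y_score[y_true == 0]
pairwise_auc = np.mean(pos[:, None] > neg[None, :])  # 21 of 24 pairs

print(accuracy, precision, recall, f1, pairwise_auc)
print(roc_auc_score(y_true, y_score))
```

Accuracy is 0.7 here even though half of the positives are missed (recall 0.5), which is why a single metric can mislead.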

**Tip:** Always compare multiple metrics, not just accuracy.

**Custom Function Example**

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

def evaluate_classification(y_true, y_pred, y_proba=None):
    """Return a dict of binary classification metrics.

    y_pred holds predicted class labels; y_proba (optional) holds the
    predicted probability of the positive class and is only used for ROC-AUC.
    """
    metrics = {
        "Accuracy": accuracy_score(y_true, y_pred),
        "Precision": precision_score(y_true, y_pred),
        "Recall": recall_score(y_true, y_pred),
        "F1": f1_score(y_true, y_pred)
    }
    if y_proba is not None:
        metrics["ROC-AUC"] = roc_auc_score(y_true, y_proba)
    return metrics
```

## 🔹 Day 13: Project – Titanic Survival Prediction

- **Steps**
  1. Load dataset
  2. Clean data (handle missing values, drop irrelevant columns)
  3. Feature engineering (e.g., convert categorical → numeric)
  4. Train Logistic Regression model
  5. Evaluate performance (accuracy, precision, recall, F1, ROC-AUC)
  6. Interpret results (important features)
- **Deliverable:** Clear explanation + visualizations.
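
The six steps can be sketched end to end as follows. This is not the real Titanic dataset: the rows are synthetic, with Titanic-like column names (`Pclass`, `Sex`, `Age`, `Survived`) and a made-up survival rule, so that the example runs without a data file. In the actual project, step 1 would be a `pd.read_csv` call on the downloaded file.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

# 1. Load dataset (stand-in: synthetic rows, NOT real Titanic data).
rng = np.random.default_rng(0)
n = 600
df = pd.DataFrame({
    "Name": [f"Passenger {i}" for i in range(n)],
    "Pclass": rng.integers(1, 4, size=n),
    "Sex": rng.choice(["male", "female"], size=n),
    "Age": rng.uniform(1, 70, size=n).round(),
})
# Made-up survival rule, only so the model has a signal to learn.
log_odds = (2.0 * (df["Sex"] == "female") - 0.8 * (df["Pclass"] - 2) - 0.5).to_numpy()
df["Survived"] = (rng.uniform(size=n) < 1 / (1 + np.exp(-log_odds))).astype(int)
df.loc[rng.choice(n, size=60, replace=False), "Age"] = np.nan  # simulate missing ages

# 2. Clean data: drop an irrelevant column, fill missing ages with the median.
#    (For brevity the median uses all rows; a stricter version would compute
#    it on the training split only.)
df = df.drop(columns=["Name"])
df["Age"] = df["Age"].fillna(df["Age"].median())

# 3. Feature engineering: convert the categorical column to numeric.
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})

X = df[["Pclass", "Sex", "Age"]]
y = df["Survived"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# 4. Train the logistic regression model.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# 5. Evaluate performance on the held-out split.
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)[:, 1]
metrics = {
    "Accuracy": accuracy_score(y_test, y_pred),
    "Precision": precision_score(y_test, y_pred),
    "Recall": recall_score(y_test, y_pred),
    "F1": f1_score(y_test, y_pred),
    "ROC-AUC": roc_auc_score(y_test, y_proba),
}
print(metrics)

# 6. Interpret results: coefficients on the log-odds scale, per feature.
coefficients = pd.Series(model.coef_[0], index=X.columns).sort_values()
print(coefficients)
```

Because the features are on different scales (`Age` spans decades, `Sex` is 0/1), coefficient magnitudes are not directly comparable; standardizing the features first is one common way to make them so.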