# Regression with Multiple Features — Level 3: Real-World Application

In this level, you will build **production-style Python code** to solve a real problem:

**Task**: Predict diabetes disease progression using the Diabetes dataset.

### Dataset: Diabetes

| Property | Value |
|----------|-------|
| Samples | 442 |
| Features | 10 |
| Target | Quantitative measure of disease progression one year after baseline |

**Features:**
- age — Age of the patient
- sex — Sex of the patient
- bmi — Body mass index
- bp — Average blood pressure
- s1–s6 — Six blood serum measurements
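
The table and feature list above match the Diabetes dataset bundled with scikit-learn. As a minimal sketch, assuming that is the copy used here (the source does not say where `load_data()` reads from), the shape and column names can be checked like this:

```python
from sklearn.datasets import load_diabetes

# Assumption: the course uses scikit-learn's bundled copy of the dataset.
data = load_diabetes()
X, y = data.data, data.target

print(X.shape)             # (n_samples, n_features)
print(data.feature_names)  # column names, in column order
print(y.min(), y.max())    # range of the progression score
```

In scikit-learn's copy the ten features come already mean-centered and scaled, so a column such as `age` does not hold values in years.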

### What You'll Implement

Your code lives in `src/multiple_regression/`:

| File | What to implement |
|------|-------------------|
| `model.py` | `fit()`, `predict()`, `evaluate()`, `get_coefficients()` |
| `pipeline.py` | `load_data()`, `explore_data()`, `preprocess_data()`, `split_data()` |
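
To make the four model methods concrete, here is a rough sketch of one way such a class could look. The class name `LinearModelSketch`, the attribute names, and the return formats are all assumptions for illustration; the signatures in `model.py` may differ, so follow the docstrings there.

```python
import numpy as np


class LinearModelSketch:
    """Hypothetical stand-in for the class in model.py; real signatures may differ."""

    def __init__(self):
        self.weights = None  # one coefficient per feature
        self.bias = None     # intercept term

    def fit(self, X, y):
        # Append a column of ones so the intercept is learned with the weights.
        X_aug = np.column_stack([X, np.ones(len(X))])
        # Least-squares solution of X_aug @ theta ~= y.
        theta, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
        self.weights, self.bias = theta[:-1], theta[-1]
        return self

    def predict(self, X):
        # Vectorized prediction for all rows at once.
        return X @ self.weights + self.bias

    def evaluate(self, X, y):
        # Returns MSE and R^2 in a dict (an assumed return format).
        residuals = y - self.predict(X)
        ss_res = float(np.sum(residuals ** 2))
        ss_tot = float(np.sum((y - y.mean()) ** 2))
        return {"mse": ss_res / len(y), "r2": 1.0 - ss_res / ss_tot}

    def get_coefficients(self):
        return {"weights": self.weights, "bias": self.bias}


# Tiny synthetic check: y = 2*x0 - 3*x1 + 5 with no noise.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 2))
y_demo = 2 * X_demo[:, 0] - 3 * X_demo[:, 1] + 5

model = LinearModelSketch().fit(X_demo, y_demo)
print(model.get_coefficients())
print(model.evaluate(X_demo, y_demo))
```

Because the synthetic target has no noise, the fitted weights should recover the generating values, which makes this a convenient sanity check before moving to real data.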

Tests are in `tests/test_multiple_regression.py`.

---

## Step 1: Implement the Code

Open these files and complete all the `TODO` sections:

1. `src/multiple_regression/pipeline.py` — Start here (data loading, preprocessing, splitting)
2. `src/multiple_regression/model.py` — Then implement the model class

Each function has docstrings and step-by-step hints.
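
As a rough illustration of the splitting and preprocessing steps, the sketch below uses plain NumPy. The function names, default arguments, and the choice to split before scaling are assumptions, not the repository's actual design; the hints in `pipeline.py` take precedence.

```python
import numpy as np


def split_data_sketch(X, y, test_size=0.2, seed=42):
    """Shuffle row indices and hold out a `test_size` fraction as the test set."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


def preprocess_data_sketch(X_train, X_test):
    """Standardize both splits using statistics from the training split only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (X_train - mean) / std, (X_test - mean) / std


# Synthetic data standing in for the real features and target.
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=50.0, scale=10.0, size=(100, 3))
y_demo = rng.normal(size=100)

X_tr, X_te, y_tr, y_te = split_data_sketch(X_demo, y_demo)
X_tr_s, X_te_s = preprocess_data_sketch(X_tr, X_te)

print(X_tr_s.shape, X_te_s.shape)
print(X_tr_s.mean(axis=0).round(6), X_tr_s.std(axis=0).round(6))
```

Computing the mean and standard deviation on the training rows only keeps information about the test set out of the fitted model, which is why this sketch splits first and scales second.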

## Step 2: Run the Tests

Run the test suite to validate your implementation. All tests should pass.

In [None]:
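import os

# Change to the project root so that pytest can find tests/ and src/.
# Assumes "ml_playground" is a sibling of the "notebooks" directory in the current path.
os.chdir(
    os.path.join(os.path.dirname(os.getcwd().split("notebooks")[0]), "ml_playground")
)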

!pytest tests/test_multiple_regression.py -v

## Step 3: Run the Full Pipeline

Once all tests pass, run the end-to-end pipeline to train on real data and see results.

In [None]:
from src.multiple_regression.pipeline import run_pipeline

run_pipeline()

## Step 4: Explore Further

After your pipeline runs successfully, use the cell below to investigate these questions:

1. Which features have the strongest influence on diabetes progression?
2. What is the R² on the test set? Compare it with the training R²: a large gap suggests overfitting, while low scores on both suggest underfitting.
3. Plot predicted vs actual values — are there patterns the model misses?
4. Try removing features with low importance — does the model performance change?
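
One possible starting point for questions 1, 2, and 4 is sketched below. It uses scikit-learn directly rather than your own `model.py`, whose interface is not shown here, and it assumes scikit-learn's bundled copy of the dataset, an 80/20 split, and a cutoff of five features; all of these are illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

reg = LinearRegression().fit(X_train_s, y_train)

# Question 1: rank features by absolute standardized coefficient.
order = np.argsort(-np.abs(reg.coef_))
for i in order:
    print(f"{data.feature_names[i]:>4}: {reg.coef_[i]:+.2f}")

# Question 2: compare train and test R^2.
train_r2 = reg.score(X_train_s, y_train)
test_r2 = reg.score(X_test_s, y_test)
print("train R^2:", train_r2)
print("test  R^2:", test_r2)

# Question 4: refit using only the five largest-coefficient features.
keep = order[:5]
reg_small = LinearRegression().fit(X_train_s[:, keep], y_train)
small_r2 = reg_small.score(X_test_s[:, keep], y_test)
print("test R^2 (top 5):", small_r2)
```

Several of the serum measurements are strongly correlated with each other, so treat the coefficient ranking as a rough guide rather than a definitive ordering.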

In [None]:
# Your exploration code here...

---

## Level 3 Complete!

You have now implemented regression with multiple features at three levels:
- **Level 1**: From scratch with NumPy (vectorized math)
- **Level 2**: With scikit-learn (industry tools & StandardScaler)
- **Level 3**: On real data with proper project structure (production skills)

### Key Takeaways from Week 2

- Vectorization (`X @ w` instead of loops) is essential for efficiency
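- Feature scaling matters when features have different ranges: it helps gradient descent converge and puts coefficients on a comparable scale
- Coefficients fit on standardized features indicate relative feature importance, though correlated features can distort the ranking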
- More features don't always mean better performance
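
The vectorization point can be checked with a small sketch. The data here is random and the function names are made up for illustration; the two functions compute the same predictions, one with explicit loops and one with `X @ w`.

```python
import time

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))  # 1000 samples, 10 features
w = rng.normal(size=10)          # one weight per feature
b = 0.5                          # intercept


def predict_loop(X, w, b):
    """Compute predictions one multiplication at a time."""
    preds = np.empty(len(X))
    for i in range(len(X)):
        total = b
        for j in range(len(w)):
            total += X[i, j] * w[j]
        preds[i] = total
    return preds


def predict_vectorized(X, w, b):
    """Compute all predictions with a single matrix-vector product."""
    return X @ w + b


start = time.perf_counter()
loop_preds = predict_loop(X, w, b)
loop_time = time.perf_counter() - start

start = time.perf_counter()
vec_preds = predict_vectorized(X, w, b)
vec_time = time.perf_counter() - start

print("same result:", np.allclose(loop_preds, vec_preds))
print(f"loop: {loop_time:.6f}s, vectorized: {vec_time:.6f}s")
```

Exact timings depend on the machine, but the results of the two functions should agree to floating-point tolerance.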

Ready for the next topic? Move on to **Week 3**!