# Lesson 1: Configuration Management (Hydra)

**Module 4b: Advanced Tooling**  
**Estimated Time**: 1 hour  
**Difficulty**: Beginner

---

## 🎯 Learning Objectives

By the end of this lesson, you will:

✅ Understand why large `argparse`-driven scripts become hard to maintain and reproduce  
✅ Learn **Hydra** for hierarchical configuration  
✅ Implement config composition (overriding defaults via CLI)  
✅ Answer interview questions on reproducible configuration  

---

## 📚 Table of Contents

1. [The Problem: Argparse Hell](#1-problem)
2. [The Solution: Hydra & OmegaConf](#2-hydra)
3. [Hands-On: Switching to Hydra](#3-hands-on)
4. [Interview Preparation](#4-interview-questions)

---

## 1. The Problem: Argparse Hell

In MLOps, a single experiment can involve dozens or even hundreds of parameters:
- Model params (layers, hidden size)
- Training params (lr, batch size, optimizer)
- Data params (path, normalization)

Passing 50 arguments via CLI is error-prone:
`python train.py --lr 0.01 --batch 64 --model resnet --layers 50 --dropout 0.3 ...`

Once runs are launched this way, it is hard to track **which config produced which result**.
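To make the problem concrete, here is a minimal sketch of the flat `argparse` style behind a command like the one above. The flag names come from that command; the default values and the `build_parser` helper are illustrative assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Flat CLI in the style of the command above; defaults are illustrative."""
    parser = argparse.ArgumentParser(description="Toy training script")
    parser.add_argument("--lr", type=float, default=0.01)
    parser.add_argument("--batch", type=int, default=64)
    parser.add_argument("--model", default="resnet")
    parser.add_argument("--layers", type=int, default=50)
    parser.add_argument("--dropout", type=float, default=0.3)
    return parser


# Parse an explicit list instead of sys.argv so this also runs in a notebook.
args = build_parser().parse_args(["--lr", "0.01", "--dropout", "0.5"])

# Everything lands in one flat namespace: model, training, and data settings
# are mixed together, and nothing records which values this run used.
print(vars(args))
```

Every new parameter means another `add_argument` call, and the defaults live in code rather than in a file that can be saved next to the results.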

## 2. The Solution: Hydra & OmegaConf

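**Hydra** (an open-source framework from Meta) is built on **OmegaConf**, a library for hierarchical, YAML-backed configuration objects. Together they allow you to:
1. Define configs in YAML files.
2. Group configs hierarchically (`conf/model/resnet.yaml`, `conf/model/bert.yaml`).
3. Override any value, or swap a whole group, from the CLI (`python train.py model=bert training.lr=0.001`).
4. Automatically record the config used for every run: Hydra writes the composed config to `.hydra/config.yaml` inside each run's output directory.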
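The idea behind group selection and dotted-key overrides can be sketched with the standard library alone. This is a toy illustration, not Hydra's implementation (Hydra's real override grammar is much richer). `GROUPS`, `BASE_CONFIG`, `parse_scalar`, `apply_overrides`, and the `bert` values are all hypothetical names and data invented for this sketch.

```python
import copy

# Hypothetical stand-ins for conf/model/resnet.yaml and conf/model/bert.yaml.
GROUPS = {
    "model": {
        "resnet": {"name": "resnet18", "depth": 18},
        "bert": {"name": "bert-base", "layers": 12},
    }
}

# Hypothetical stand-in for the config composed from the defaults.
BASE_CONFIG = {
    "model": GROUPS["model"]["resnet"],
    "training": {"epochs": 10, "lr": 0.01},
}


def parse_scalar(raw):
    """Convert a CLI string to int or float when possible, else keep the string."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw


def apply_overrides(base, groups, overrides):
    """Return a copy of `base` with `key=value` overrides applied."""
    cfg = copy.deepcopy(base)
    for item in overrides:
        key, _, raw = item.partition("=")
        if key in groups:
            # Group selection, e.g. "model=bert": swap in a whole sub-config.
            cfg[key] = copy.deepcopy(groups[key][raw])
            continue
        # Dotted path, e.g. "training.lr=0.001": walk down to the parent node.
        *parents, leaf = key.split(".")
        node = cfg
        for part in parents:
            node = node[part]
        node[leaf] = parse_scalar(raw)
    return cfg


cfg = apply_overrides(BASE_CONFIG, GROUPS, ["model=bert", "training.lr=0.001"])
print(cfg["model"]["name"], cfg["training"]["lr"])  # prints: bert-base 0.001
```

The base config is copied rather than mutated, so the defaults stay intact and each run's effective config can be saved separately.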

## 3. Hands-On: Switching to Hydra

Requires `pip install hydra-core` (OmegaConf is installed as a dependency).

```python
# NOTE: Hydra normally runs as a script decorator (@hydra.main).
# In Jupyter, we use the Compose API or simulate the file structure.

import os

# 1. Create a simplified config structure
os.makedirs("conf/model", exist_ok=True)
os.makedirs("conf/dataset", exist_ok=True)

# config.yaml (main entry point)
config_yaml = """
defaults:
  - model: resnet
  - dataset: cifar10

training:
  epochs: 10
  lr: 0.01
"""
with open("conf/config.yaml", "w") as f:
    f.write(config_yaml)

# model/resnet.yaml
with open("conf/model/resnet.yaml", "w") as f:
    f.write("name: resnet18\ndepth: 18\n")

# dataset/cifar10.yaml (placeholder contents, assumed for this demo so that
# every entry in the defaults list has a matching file)
with open("conf/dataset/cifar10.yaml", "w") as f:
    f.write("name: cifar10\n")

print("Created config files.")

# 2. Simulate Hydra loading (using OmegaConf directly for the notebook demo)
from omegaconf import OmegaConf

# Load main config
conf = OmegaConf.load("conf/config.yaml")

# Manually mimic Hydra's defaults list: load each group file under its
# group key (model -> conf.model) and drop the `defaults` key itself.
for entry in conf.pop("defaults"):
    for group, option in entry.items():
        conf[group] = OmegaConf.load(f"conf/{group}/{option}.yaml")

print("\n--- Loaded Config ---")
print(OmegaConf.to_yaml(conf))

print("\n--- Accessing Values ---")
print(f"Learning Rate: {conf.training.lr}")
print(f"Model Name: {conf.model.name}")
```
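The manual loading above can also be handed to Hydra itself through its Compose API, which works inside a notebook. The following is a minimal sketch, assuming Hydra 1.2 or newer (for the `version_base` argument). `CONFIG_FILES`, `write_demo_configs`, and `compose_demo` are hypothetical helper names, and the dataset file contents are a placeholder.

```python
from pathlib import Path

# Config files mirroring the ones used in this lesson.
# The dataset file contents are a placeholder assumed for this sketch.
CONFIG_FILES = {
    "config.yaml": (
        "defaults:\n"
        "  - model: resnet\n"
        "  - dataset: cifar10\n"
        "\n"
        "training:\n"
        "  epochs: 10\n"
        "  lr: 0.01\n"
    ),
    "model/resnet.yaml": "name: resnet18\ndepth: 18\n",
    "dataset/cifar10.yaml": "name: cifar10\n",
}


def write_demo_configs(root: Path) -> Path:
    """Write the demo config tree under root/conf and return its absolute path."""
    conf_dir = (root / "conf").resolve()
    for rel_path, text in CONFIG_FILES.items():
        target = conf_dir / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text)
    return conf_dir


def compose_demo(conf_dir: Path, overrides: list[str]):
    """Compose the config with Hydra, applying CLI-style overrides."""
    # Imported here so write_demo_configs stays usable without Hydra installed.
    from hydra import compose, initialize_config_dir

    # initialize_config_dir expects an absolute path.
    with initialize_config_dir(config_dir=str(conf_dir), version_base=None):
        return compose(config_name="config", overrides=overrides)
```

For example, `cfg = compose_demo(write_demo_configs(Path(".")), ["training.lr=0.001"])` should yield a config where `cfg.training.lr` carries the overridden value and the model settings sit under `cfg.model`, the same layout a `@hydra.main` script would receive.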

## 4. Interview Preparation

### Common Questions

#### Q1: "Why use YAML configs instead of Python constants?"
**Answer**: "YAML separates code from configuration. This allows me to change hyperparameters without touching the code, which is safer for reproducibility. It also allows automated hyperparameter sweep tools to inject values easily."

#### Q2: "How do you manage secrets (API Keys) in configuration?"
**Answer**: "Never commit secrets to YAML. I use **environment variable** interpolation instead. In Hydra/OmegaConf, I can write `${oc.env:MY_API_KEY}` in the config, and the value is pulled from the environment at runtime."
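A short sketch of that answer in practice, assuming OmegaConf 2.1 or newer (where the `oc.env` resolver is available). The `service.api_key` key and the `secret_demo` helper are hypothetical names chosen for illustration.

```python
import os


def secret_demo():
    """Return (resolved key, YAML dump) for a config that references MY_API_KEY."""
    # Imported lazily; requires `pip install omegaconf`.
    from omegaconf import OmegaConf

    if "MY_API_KEY" not in os.environ:
        raise RuntimeError("Set MY_API_KEY before loading the config")

    cfg = OmegaConf.create({"service": {"api_key": "${oc.env:MY_API_KEY}"}})
    resolved = cfg.service.api_key   # looked up in the environment on access
    dumped = OmegaConf.to_yaml(cfg)  # resolve=False by default: keeps the placeholder
    return resolved, dumped
```

With `MY_API_KEY` exported in the shell, the first return value is the real key, while the YAML dump still contains the `${oc.env:MY_API_KEY}` placeholder, so the secret itself does not leak when the config is dumped this way.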