# Hydra Configurations Tutorial

## Introduction

CMB-ML manages a complex pipeline that processes data across multiple stages. Each stage produces outputs that need to be tracked, reused, and processed in later stages. Without a clear framework, this can lead to disorganized code, redundant logic, and errors.

The CMB-ML library offers a set of tools to manage these pipelines in a modular and scalable way. At the core of this approach is configuration management, which cleanly separates the logic of the process from its parameters. This separation ensures that the code remains streamlined while the details stay isolated and easy to manage.

This notebook introduces Hydra, an open-source framework developed by Meta for managing the configuration of complex programs.

## Contents
View this notebook with [nbviewer](https://nbviewer.org/github/CMB-ML/cmb-ml/tree/main/demonstrations/A_hydra_tutorial.ipynb#Introduction) (or in your IDE) to enable these links.

- [Simple configurations](#Simple-configurations)
- [Nested configurations](#Nested-configurations)
- [The defaults list](#The-defaults-list)
- [Config initialization](#Initializing-the-config)
- [Next steps](#Next-steps)

# Simple configurations

In [1]:
import hydra
from hydra import compose, initialize
from omegaconf import DictConfig, OmegaConf

Hydra simplifies loading configuration files.

Consider the Hydra configuration `tutorial_configs/simple.yaml`:

``` yaml
some_string: abc
some_number: 3
```

We can read strings and numbers directly from the configuration. The `cfg` object supports both dict-style indexing and dot notation:

In [2]:
with initialize(version_base=None, config_path="tutorial_configs"):
    cfg = compose(config_name='simple')
    n_repeats = cfg['some_number']
    my_text = cfg.some_string
    for i in range(n_repeats):
        print(my_text)

abc
abc
abc


# Nested configurations

Consider the Hydra configuration `tutorial_configs/simple2.yaml`:

``` yaml
shapes:
  - icon1:
    shape: square
    color: blue
  - icon2:
    shape: circle
    color: red
```

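This file shows how dot notation reaches into nested configurations: each entry of the `shapes` list is a mapping whose `shape` and `color` keys can be read as attributes.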

In [3]:
icon_mapping = {
    ('square', 'blue'): '🟦',
    ('circle', 'red'): '🔴',
    ('square', 'red'): '🟥',
    ('circle', 'blue'): '🔵'
}
with initialize(version_base=None, config_path="tutorial_configs"):
    cfg = compose(config_name='simple2')

    for icon in cfg.shapes:
        print(icon_mapping[icon.shape, icon.color])

🟦
🔴
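
It may help to see the structure that `simple2.yaml` parses to. The sketch below writes it out as plain Python data, assuming standard YAML parsing rules; plain dicts stand in for the OmegaConf objects, and the names `shapes` and `icons` are illustrative.

```python
# Plain-Python equivalent of the parsed simple2.yaml (a sketch, assuming
# standard YAML rules). `icon1` sits at the same indentation as `shape` and
# `color`, so it becomes a sibling key with an empty (None) value.
shapes = [
    {"icon1": None, "shape": "square", "color": "blue"},
    {"icon2": None, "shape": "circle", "color": "red"},
]

icon_mapping = {
    ("square", "blue"): "🟦",
    ("circle", "red"): "🔴",
}

# Same lookup as the notebook cell, with dict indexing in place of dot notation.
icons = [icon_mapping[entry["shape"], entry["color"]] for entry in shapes]
print(icons)
```

Because `icon1` and `icon2` are only empty sibling keys, the loop never needs to reference them by name.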


# The defaults list

Hydra can also compose configurations, using a defaults list.

Consider `tutorial_configs/sample_cfg.yaml`:

```yaml
defaults:
  - scenario: scenario_512
  - splits: all
  - _self_

preset_strings : ["d9", "s4", "f1"]
```

The `tutorial_configs` directory has the following structure:
```
├─ tutorial_configs
│  ├─ scenario
│  │   ├─ scenario_128.yaml
│  │   └─ scenario_512.yaml
│  ├─ splits
│  │   ├─ 1-1.yaml
│  │   └─ all.yaml
│  └─ sample_cfg.yaml
└─ tutorial notebooks here
```

When we compose the `sample_cfg` configuration, Hydra follows the defaults list and assembles the following:

In [4]:
with initialize(version_base=None, config_path="tutorial_configs"):
    cfg = compose(config_name='sample_cfg')
    print(OmegaConf.to_yaml(cfg))

scenario:
  nside: 512
  map_fields: IQU
  precision: float
  units: uK_CMB
splits:
  name: '1450'
  Train:
    n_sims: 1000
  Valid:
    n_sims: 250
  Test:
    n_sims: 200
preset_strings:
- d9
- s4
- f1
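
Conceptually, composition behaves like an ordered merge in which later entries of the defaults list override earlier ones, and `_self_` marks where the primary file's own keys are merged. The sketch below imitates that with plain dictionaries. It is an illustration under those assumptions, not Hydra's implementation, and `deep_merge` is a hypothetical helper.

```python
from copy import deepcopy


def deep_merge(base: dict, update: dict) -> dict:
    """Return a new dict in which values from `update` override `base`."""
    merged = deepcopy(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Stand-ins for the three YAML files (values copied from the output above).
scenario_512 = {"nside": 512, "map_fields": "IQU",
                "precision": "float", "units": "uK_CMB"}
splits_all = {"name": "1450", "Train": {"n_sims": 1000},
              "Valid": {"n_sims": 250}, "Test": {"n_sims": 200}}
primary = {"preset_strings": ["d9", "s4", "f1"]}  # sample_cfg.yaml minus its defaults list

# Merge in defaults-list order: scenario, splits, then _self_ (the primary file).
# Each group's content is nested under the group name.
composed: dict = {}
for layer in ({"scenario": scenario_512}, {"splits": splits_all}, primary):
    composed = deep_merge(composed, layer)

print(list(composed))
```

Placing `_self_` last means that, if the primary file defined a key that a group file also defines, the primary file's value would win.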



We can override these defaults at compose time to select a different file from each group directory.

In [5]:
with initialize(version_base=None, config_path="tutorial_configs"):
    cfg = compose(config_name='sample_cfg',
                  overrides=['scenario=scenario_128', 'splits="1-1"'])
    print(OmegaConf.to_yaml(cfg))

scenario:
  nside: 128
  map_fields: I
  precision: float
  units: uK_CMB
splits:
  name: 1-1
  Test:
    n_sims: 1
preset_strings:
- d9
- s4
- f1
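
Each override string has the form `group=option` and replaces the file chosen for that group in the defaults list. The sketch below models only that selection step with plain Python. `select_options` is a hypothetical helper, and Hydra's real override grammar is considerably richer than this.

```python
def select_options(defaults: list, overrides: list[str]) -> dict[str, str]:
    """Return the option (YAML file stem) chosen for each config group."""
    selected: dict[str, str] = {}
    for entry in defaults:
        if isinstance(entry, dict):  # skip the plain "_self_" marker
            selected.update(entry)
    for override in overrides:
        group, _, option = override.partition("=")
        if group not in selected:
            # Simplification: only groups already in the defaults list can be replaced.
            raise KeyError(f"group {group!r} is not in the defaults list")
        selected[group] = option.strip("\"'")  # drop optional quotes around the value
    return selected


defaults = [{"scenario": "scenario_512"}, {"splits": "all"}, "_self_"]
chosen = select_options(defaults, ["scenario=scenario_128", 'splits="1-1"'])
print(chosen)
```

The resulting mapping names the files `scenario/scenario_128.yaml` and `splits/1-1.yaml`, which matches the composed output shown above.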



# Initializing the config

There are many ways to initialize Hydra configs.

In **Python modules**, we use a method similar to the one above. Decorating the `main()` entry point with `@hydra.main` supplies the Hydra configuration and also lets Hydra manage logging. The top-level scripts (`main_<x>.py`) use this pattern. See [this Python module](./B_hydra_script_tutorial.py) for an example. Generally, it looks like:

```python
@hydra.main(version_base=None, config_path="tutorial_configs", config_name="sample_cfg")
def main(cfg: DictConfig) -> None:
    do_something(cfg)
```

In the remaining **Jupyter notebooks**, we initialize Hydra without a `with` block so that the configuration is available globally across cells.

In [6]:
# Clear the global Hydra instance first, in case `initialize` was already called
hydra.core.global_hydra.GlobalHydra.instance().clear()

initialize(version_base=None, config_path="tutorial_configs")

cfg = compose(config_name='sample_cfg')

print(OmegaConf.to_yaml(cfg))

scenario:
  nside: 512
  map_fields: IQU
  precision: float
  units: uK_CMB
splits:
  name: '1450'
  Train:
    n_sims: 1000
  Valid:
    n_sims: 250
  Test:
    n_sims: 200
preset_strings:
- d9
- s4
- f1



# Next steps

It may seem strange to begin the tutorial here, but we will be using Hydra for the automated scripts.

For more information on how we use Hydra configs, refer to:
- [Hydra documentation](https://hydra.cc/docs/intro/)
- [The top level configs README](../cfg/README.md)
- [The pipeline configs README](../cfg/pipeline/README.md)

Continue with [setting up your local system](./C_setting_up_local.ipynb).