# Setting Up Your Local System

## Introduction

CMB-ML manages a complex pipeline that processes data across multiple stages. Each stage produces outputs that need to be tracked, reused, and processed in later stages. Without a clear framework, this can lead to disorganized code, redundant logic, and errors.

The CMB-ML library provides a set of tools to manage these pipelines in a modular way. A few resources are external; this notebook will help set those up.

## Contents

View this notebook with [nbviewer](https://nbviewer.org/github/CMB-ML/cmb-ml/tree/main/demonstrations/C_setting_up_local.ipynb#Introduction) to enable these links.

- [Setting up the configuration](#Setting-the-local_system-configuration-file)
- [Setting up PyILC](#Setting-up-PyILC)
- [Download external science assets](#Getting-science-assets)
- [Next steps](#Next-steps)

# Setting the local_system configuration file

First you'll create a configuration file.

I suggest using [mine](../cfg/local_system/generic_lab.yaml) as an example. Open that file and take a look. It has two keys.

`datasets_root` is where the datasets themselves are written. At first, each dataset's folder will contain only the simulation and the logs generated while producing that simulation. As more of the pipeline is run, later stages will create their own folders alongside the simulation.

The `assets_dir` is only for the science assets (maps used for noise, instrument parameters, cosmological parameter distributions). It is used once.

Set those according to your local system.

If more granularity of file storage is needed (e.g., you want to store models on a faster drive or analysis results on a slower drive), this can also be done in the pipeline yamls.
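
Before moving on, it can help to confirm that both directories exist and are writable. The following is a minimal sketch using only the standard library; `check_local_system` is a hypothetical helper, not part of CMB-ML, and it assumes you pass in the two values from your own yaml by hand.

```python
import os
from pathlib import Path


def check_local_system(paths: dict) -> dict:
    """Report, per key, whether the configured directory exists and is writable.

    Hypothetical helper for illustration; CMB-ML does not provide it.
    """
    report = {}
    for key in ("datasets_root", "assets_dir"):
        directory = Path(paths[key]).expanduser()
        report[key] = directory.is_dir() and os.access(directory, os.W_OK)
    return report


# Example values; substitute the ones from your own local_system yaml.
example = {
    "datasets_root": "/data/generic_user/CMB_Data/Datasets/",
    "assets_dir": "/data/generic_user/CMB_Data/Assets/",
}
for key, ok in check_local_system(example).items():
    print(f"{key}: {'ok' if ok else 'missing or not writable'}")
```

A `False` entry usually just means the directory has not been created yet.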

## Setting the top level configuration file

We also need to let your system know where that yaml is. This information goes in the top-level configurations, e.g. [config_setup.yaml](../cfg/config_setup.yaml), which look like:

```yaml
defaults:
  - local_system: ${oc.env:CMB_ML_LOCAL_SYSTEM}
  - file_system : common_fs
  - override hydra/job_logging: custom_log
  - _self_
```

For `local_system`, either change the value to the name of your local system yaml file, e.g.:

```yaml
  - local_system: generic_lab.yaml
```

Or add an environment variable to your system. On Linux, when working with Python scripts, add the command `export CMB_ML_LOCAL_SYSTEM=generic_lab.yaml` to your shell startup script. In Jupyter notebooks, set it through the `os` library. When debugging in VS Code, add it to your `launch.json` as an environment variable.
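
If you go the environment-variable route, a quick check before composing the configuration can give a clearer message than a failed interpolation. This is a sketch only; `local_system_name` is a hypothetical helper and not something CMB-ML ships.

```python
import os


def local_system_name(env_var: str = "CMB_ML_LOCAL_SYSTEM") -> str:
    """Return the local system name from the environment, or raise a clear error.

    Hypothetical helper for illustration; CMB-ML does not provide it.
    """
    name = os.environ.get(env_var, "").strip()
    if not name:
        raise RuntimeError(
            f"{env_var} is not set. Export it in your shell startup script, "
            "set it through os.environ in a notebook, or add it to launch.json."
        )
    return name
```

Calling `local_system_name()` at the top of a script fails early if the variable was never exported in the current shell.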

## Checking the configuration

Set this up now for both your local system configuration and [config_setup.yaml](../cfg/config_setup.yaml). Let's see how it looks:

```python
import os
import hydra
from hydra import compose, initialize
from omegaconf import OmegaConf

# Set the environment variable, only effective for this notebook.
os.environ['CMB_ML_LOCAL_SYSTEM'] = 'generic_lab'
```

```python
# The following line is simply in case you run this cell twice.
# This will clear the global hydra instance.
hydra.core.global_hydra.GlobalHydra.instance().clear()

# Get hydra ready
initialize(version_base=None, config_path="../cfg")

# Load the config
cfg = compose(config_name='config_setup.yaml')

# Print the config
print(OmegaConf.to_yaml(cfg))
```

```yaml
local_system:
  datasets_root: /data/generic_user/CMB_Data/Datasets/
  assets_dir: /data/generic_user/CMB_Data/Assets/
file_system:
  sim_folder_prefix: sim
  sim_str_num_digits: 4
  dataset_template_str: '{root}/{dataset}/'
  default_dataset_template_str: '{root}/{dataset}/{stage}/{split}/{sim}'
  working_dataset_template_str: '{root}/{dataset}/{working}{stage}/{split}/{sim}'
  subdir_for_log_scripts: scripts
  log_dataset_template_str: '{root}/{dataset}/{hydra_run_dir}'
  log_stage_template_str: '{root}/{dataset}/{working}{stage}/{hydra_run_dir}'
  top_level_work_template_str: '{root}/{dataset}/{stage}/{hydra_run_dir}'
  wmap_chains_dir: WMAP/wmap_lcdm_mnu_wmap9_chains_v5
```

Those look good to me.
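
The `file_system` templates in that output read like Python format strings. As a rough illustration of how such a template could be filled in, here is a sketch. It is an assumption about usage, not CMB-ML's actual path handling, and the dataset, stage, and split names are hypothetical.

```python
import posixpath

# Values copied from the printed config.
file_system = {
    "sim_folder_prefix": "sim",
    "sim_str_num_digits": 4,
    "default_dataset_template_str": "{root}/{dataset}/{stage}/{split}/{sim}",
}


def sim_folder_name(sim_num: int, fs: dict) -> str:
    """Build a zero-padded simulation folder name from the prefix and digit count."""
    width = fs["sim_str_num_digits"]
    return f"{fs['sim_folder_prefix']}{sim_num:0{width}d}"


def default_path(root: str, dataset: str, stage: str, split: str,
                 sim_num: int, fs: dict) -> str:
    """Fill the default template; all argument values are supplied by the caller."""
    filled = fs["default_dataset_template_str"].format(
        root=root, dataset=dataset, stage=stage, split=split,
        sim=sim_folder_name(sim_num, fs),
    )
    # normpath collapses the double slash left by a trailing '/' on root.
    return posixpath.normpath(filled)


# Hypothetical dataset, stage, and split names, for illustration only.
print(default_path("/data/generic_user/CMB_Data/Datasets/",
                   "ExampleDataset", "Simulation", "Test", 3, file_system))
```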

# Setting up PyILC

I've solved this in a way that may not be ideal. Feedback is welcomed.

PyILC isn't structured as an installable library. I've settled on a workaround of importing the necessary elements in a CMB-ML module. This allows the use of PyILC without any modification of PyILC code, and without unnecessary duplication of effort.

First, you'll need to get the code (those comfortable with all this should feel free to skip this paragraph). I don't recommend adding one git-tracked repository inside another, so clone PyILC next to CMB-ML rather than inside it. For instance, if you have `/home/a_bunch_of_repos/cmb-ml/`, run `cd /home/a_bunch_of_repos` and clone PyILC at this location, using `git clone https://github.com/jcolinhill/pyilc.git` (or some equivalent).

Next, within the CMB-ML repository, open [cmbml/pyilc_redir/__init__.py](./cmbml/pyilc_redir/__init__.py). Edit the path to match the location where you've cloned PyILC, specifically the directory that contains `input.py` and `wavelets.py`.

It should look like:
```python
import sys
sys.path.append('/absolute/path/to/pyilc/pyilc')

from input import ILCInfo
from wavelets import Wavelets, wavelet_ILC, harmonic_ILC
```

Voilà! I do not recommend this practice in general, as it may cause security vulnerabilities or other terrible things.
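
If the imports fail, the usual cause is a path that points at the repository root rather than the inner `pyilc` directory. One way to catch that early is sketched below. `add_pyilc_to_path` is a hypothetical helper, not part of CMB-ML or PyILC, and it only checks for the two files named above.

```python
import sys
from pathlib import Path

# The two PyILC files that the redirect module imports from.
REQUIRED_MODULES = ("input.py", "wavelets.py")


def add_pyilc_to_path(pyilc_dir: str) -> Path:
    """Append pyilc_dir to sys.path after checking the expected files are present.

    Hypothetical helper for illustration; not part of CMB-ML or PyILC.
    """
    directory = Path(pyilc_dir).expanduser().resolve()
    missing = [name for name in REQUIRED_MODULES if not (directory / name).is_file()]
    if missing:
        raise FileNotFoundError(f"{directory} is missing: {', '.join(missing)}")
    if str(directory) not in sys.path:
        sys.path.append(str(directory))
    return directory
```

With this in place, a wrong path raises a `FileNotFoundError` naming the missing files instead of an `ImportError` further down.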

# Getting science assets

<!-- We now need to get either:
- All science assets for running simulations
- Just the asset containing the mask used for analysis -->

## All Science Assets

The simplest method is to run the [get_data/get_assets.py](../get_data/get_assets.py) script. This will download from ESA's Planck Legacy Archive and from NASA's LAMBDA Archive. Downloads may be slow.

<!-- There is also a CMB-ML data mirror for these files, but links are not currently available. Please contact us through the GitHub repository and they will be re-enabled. -->

## Assorted Assets

Individual files are available from the source.

Planck's observation maps are used for noise generation. The NILC-cleaned map is used for the mask. The WMAP9 chains are used to simulate realistic CMB. The bandpass tables define instrumentation parameters.

- Planck Maps
    - [Planck Collaboration Observation at 30 GHz](https://irsa.ipac.caltech.edu/data/Planck/release_3/all-sky-maps/maps/LFI_SkyMap_030-BPassCorrected_1024_R3.00_full.fits)
    - [Planck Collaboration Observation at 44 GHz](https://irsa.ipac.caltech.edu/data/Planck/release_3/all-sky-maps/maps/LFI_SkyMap_044-BPassCorrected_1024_R3.00_full.fits)
    - [Planck Collaboration Observation at 70 GHz](https://irsa.ipac.caltech.edu/data/Planck/release_3/all-sky-maps/maps/LFI_SkyMap_070-BPassCorrected_1024_R3.00_full.fits)
    - [Planck Collaboration Observation at 100 GHz](https://irsa.ipac.caltech.edu/data/Planck/release_3/all-sky-maps/maps/HFI_SkyMap_100_2048_R3.01_full.fits)
    - [Planck Collaboration Observation at 143 GHz](https://irsa.ipac.caltech.edu/data/Planck/release_3/all-sky-maps/maps/HFI_SkyMap_143_2048_R3.01_full.fits)
    - [Planck Collaboration Observation at 217 GHz](https://irsa.ipac.caltech.edu/data/Planck/release_3/all-sky-maps/maps/HFI_SkyMap_217_2048_R3.01_full.fits)
    - [Planck Collaboration Observation at 353 GHz](https://irsa.ipac.caltech.edu/data/Planck/release_3/all-sky-maps/maps/HFI_SkyMap_353-psb_2048_R3.01_full.fits)
    - [Planck Collaboration Observation at 545 GHz](https://irsa.ipac.caltech.edu/data/Planck/release_3/all-sky-maps/maps/HFI_SkyMap_545_2048_R3.01_full.fits)
    - [Planck Collaboration Observation at 857 GHz](https://irsa.ipac.caltech.edu/data/Planck/release_3/all-sky-maps/maps/HFI_SkyMap_857_2048_R3.01_full.fits)
    - [Planck Collaboration NILC-cleaned Map](https://irsa.ipac.caltech.edu/data/Planck/release_3/all-sky-maps/maps/component-maps/cmb/COM_CMB_IQU-nilc_2048_R3.00_full.fits)
- Others
    - [WMAP9 Chains, direct download](https://lambda.gsfc.nasa.gov/data/map/dr5/dcp/chains/wmap_lcdm_mnu_wmap9_chains_v5.tar.gz)
    - [Planck delta bandpass table, from Simons Observatory](https://github.com/galsci/mapsims/raw/main/mapsims/data/planck_deltabandpass/planck_deltabandpass.tbl)
    - [Original delta bandpass table, from Simons Observatory](assets/delta_bandpasses/CMB-ML/cmb-ml_deltabandpass.tbl)
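
To fetch one of these files by hand, something like the following sketch could work. The helper names are hypothetical, and the subfolder layout that CMB-ML expects inside the assets directory is not described here, so treat the `get_assets.py` script as the reference for where files should end up.

```python
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def destination_for(url: str, target_dir: str) -> Path:
    """Return target_dir joined with the file name at the end of the URL."""
    return Path(target_dir) / Path(urlparse(url).path).name


def download_if_missing(url: str, target_dir: str) -> Path:
    """Download url into target_dir unless a file of that name already exists.

    Hypothetical helper; target_dir is whatever folder you choose, which may
    not match the layout that CMB-ML expects.
    """
    destination = destination_for(url, target_dir)
    if not destination.exists():
        destination.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(destination))
    return destination
```

The existence check skips files that are already present, which matters for the larger maps. It does not detect a partially downloaded file.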

Last, move the CMB-ML directory contained in [assets/](../assets/CMB-ML) to your local_system assets folder (as defined in, e.g., [your local_system config](../cfg/local_system/generic_lab.yaml)). This contains the modified instrument information and links to download the simulations.
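
This step can also be scripted. The sketch below copies the directory rather than moving it, so the repository checkout stays unchanged. `copy_cmbml_assets` is a hypothetical helper, and both paths are ones you supply yourself.

```python
import shutil
from pathlib import Path


def copy_cmbml_assets(repo_assets_dir: str, assets_dir: str) -> Path:
    """Copy <repo_assets_dir>/CMB-ML into <assets_dir>/CMB-ML and return the target.

    Hypothetical helper for illustration; it copies instead of moving.
    """
    source = Path(repo_assets_dir) / "CMB-ML"
    target = Path(assets_dir) / "CMB-ML"
    if not source.is_dir():
        raise FileNotFoundError(f"Expected a CMB-ML directory at {source}")
    # dirs_exist_ok (Python 3.8+) lets the copy be re-run without failing.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target
```

For example, `copy_cmbml_assets("../assets", "/data/generic_user/CMB_Data/Assets/")` would use the `assets_dir` value from the example configuration.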

# Next steps

Your system is now set up to use CMB-ML.

Next, we'll look at a couple of simulations to better understand the data, in [the next demonstration notebook](./D_first_look_at_sims.ipynb).

When you're ready, either [download simulations](../get_data/get_dataset.py) or [create simulations](../main_sims.py).

There are also optional demonstration notebooks if you intend to write code using CMB-ML, starting with [a description of the CMB-ML framework](./E_CMB_ML_framework.ipynb).