
Split run identity from configuration #122

Merged
frazane merged 2 commits into main from feat/split-run-identity-from-config
Apr 1, 2026


@frazane (Contributor) commented Mar 26, 2026

Summary

Separates environment identity from run configuration to allow inference environments to be reused across configuration changes, eliminating unnecessary rebuilds of venv and squashfs images. Closes #111

Changes

  • Add ENV_FIELDS and HASH_EXCLUDE ClassVars to RunConfig documenting the identity contract
  • Split hashing logic: env_entry_hash() for environment-level changes, run_specific_hash() for configuration changes
  • Refactor register_run() to compute both env_id and run_id with nested directory structure: data/runs/{env_id}/{config_hash}/
  • Update inference rules to use {env_id} wildcard for environment artifacts (in data/runs/{env_id}/) and {run_id} for run outputs
  • Add ENV_CONFIGS global dict and collect_all_envs() function
  • Add 5 unit tests for identity separation
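A minimal sketch of the split hashing, assuming each run entry is a plain dict and that hashes are short SHA-256 hex prefixes. The digest function, the 8-character length, and the `config`/`steps` keys are illustrative assumptions, not evalml's actual code. The point is that editing run-specific fields leaves the environment hash untouched:

```python
import hashlib
import json

# Field groups as described in this PR; treat the exact names as illustrative.
ENV_FIELDS = ("checkpoint", "extra_requirements", "disable_local_eccodes_definitions")
HASH_EXCLUDE = ("label", "inference_resources")


def _short_hash(payload: dict, length: int = 8) -> str:
    """Stable short hex digest of a JSON-serialisable dict (length is an assumption)."""
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:length]


def env_entry_hash(entry: dict) -> str:
    """Hash only the fields that determine the inference environment."""
    return _short_hash({k: entry.get(k) for k in ENV_FIELDS})


def run_specific_hash(entry: dict) -> str:
    """Hash the remaining fields (e.g. config YAML, steps), skipping env-level and excluded ones."""
    skip = set(ENV_FIELDS) | set(HASH_EXCLUDE)
    return _short_hash({k: v for k, v in entry.items() if k not in skip})


base = {
    "checkpoint": "ckpt/a",  # hypothetical values
    "extra_requirements": [],
    "disable_local_eccodes_definitions": False,
    "config": "forecaster.yaml",
    "steps": "0/120/6",
    "label": "A",
}
edited = {**base, "steps": "0/240/6", "label": "B"}
same_env = env_entry_hash(base) == env_entry_hash(edited)  # True: only steps and label changed
```

Because `env_entry_hash` only looks at the environment fields, editing the inference YAML or steps yields the same environment hash, and therefore reuses the same venv and squashfs image.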

Benefits

  • Reuses environments across config changes (no squashfs rebuild)
  • Reduces disk I/O burden on shared filesystems
  • Clear separation of concerns: environment identity vs. run configuration
  • Nested directory structure aligns with the design proposed in the linked issue

Testing

  • All existing tests pass
  • 5 new tests verify identity separation behavior

Separates environment identity (env_id) from run configuration (run_id) to
allow inference environments to be reused across configuration changes. This
prevents unnecessary rebuilding of venv and squashfs images when only the
inference config YAML or steps are modified.

Changes:

src/evalml/config.py:
- Add RunConfig.ENV_FIELDS ClassVar documenting fields that determine the
  inference environment (checkpoint, extra_requirements, disable_local_eccodes_definitions)
- Add RunConfig.HASH_EXCLUDE ClassVar for fields never included in hashing
  (label, inference_resources)
- Export module-level constants RUN_ENV_FIELDS and RUN_HASH_EXCLUDE
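Declaring the identity contract as `ClassVar`s keeps it on the class without turning it into per-run configuration. A sketch, assuming a dataclass-style `RunConfig`; the real class may instead be a pydantic model, which likewise does not treat `ClassVar` annotations as fields. The `config` and `steps` fields, their defaults, and the `run_specific_fields()` helper are hypothetical:

```python
from dataclasses import dataclass, field, fields
from typing import ClassVar


@dataclass
class RunConfig:
    """Hypothetical, simplified stand-in for evalml's RunConfig."""

    checkpoint: str
    config: str  # path to the inference config YAML (assumed field name)
    steps: str = "0/120/6"  # assumed format
    extra_requirements: list[str] = field(default_factory=list)
    disable_local_eccodes_definitions: bool = False
    label: str = ""
    inference_resources: dict = field(default_factory=dict)

    # ClassVars are skipped by @dataclass, so they document the identity
    # contract without becoming per-instance configuration.
    ENV_FIELDS: ClassVar[tuple[str, ...]] = (
        "checkpoint",
        "extra_requirements",
        "disable_local_eccodes_definitions",
    )
    HASH_EXCLUDE: ClassVar[tuple[str, ...]] = ("label", "inference_resources")


# Module-level aliases, as the commit message describes.
RUN_ENV_FIELDS = RunConfig.ENV_FIELDS
RUN_HASH_EXCLUDE = RunConfig.HASH_EXCLUDE


def run_specific_fields() -> list[str]:
    """Fields that feed the run hash: everything not env-level or excluded."""
    skip = set(RUN_ENV_FIELDS) | set(RUN_HASH_EXCLUDE)
    return [f.name for f in fields(RunConfig) if f.name not in skip]
```

With this split, anything added to `RunConfig` later falls into the run hash by default unless it is explicitly listed as environment-determining or excluded.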

workflow/rules/common.smk:
- Add ENV_HASH_FIELDS and RUN_HASH_EXCLUDE constants
- Split hashing logic into two functions:
  - env_entry_hash(): hashes only environment-determining fields
  - run_specific_hash(): hashes run-specific fields (config YAML, steps)
- Refactor register_run() to compute and store both env_id and run_id in
  each run config entry. Format: run_id = {env_id}/{config_hash}
- Add collect_all_envs() function and ENV_CONFIGS global dict
- Update master_hash() to hash both env and run components separately
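A simplified sketch of how `register_run()`, `ENV_CONFIGS`, and `collect_all_envs()` could fit together, assuming run entries are dicts and IDs are truncated SHA-256 digests. `RUN_CONFIGS`, `_digest`, and the 8-character length are hypothetical. The real env IDs in this PR also carry a readable prefix (for example `forecaster-b30a-d2c9`), which this sketch omits:

```python
import hashlib
import json

ENV_HASH_FIELDS = ("checkpoint", "extra_requirements", "disable_local_eccodes_definitions")
RUN_HASH_EXCLUDE = ("label", "inference_resources")

RUN_CONFIGS: dict[str, dict] = {}  # run_id -> entry (hypothetical name)
ENV_CONFIGS: dict[str, dict] = {}  # env_id -> environment-level fields


def _digest(payload: dict, length: int = 8) -> str:
    """Short, order-independent hash of a JSON-serialisable dict."""
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:length]


def register_run(entry: dict) -> str:
    """Attach env_id and run_id to a run entry and record both (simplified)."""
    env_part = {k: entry.get(k) for k in ENV_HASH_FIELDS}
    skip = set(ENV_HASH_FIELDS) | set(RUN_HASH_EXCLUDE)
    run_part = {k: v for k, v in entry.items() if k not in skip}
    env_id = _digest(env_part)
    run_id = f"{env_id}/{_digest(run_part)}"  # nested: {env_id}/{config_hash}
    ENV_CONFIGS.setdefault(env_id, env_part)  # one entry per distinct environment
    RUN_CONFIGS[run_id] = {**entry, "env_id": env_id, "run_id": run_id}
    return run_id


def collect_all_envs() -> list[str]:
    """Distinct environments required by all registered runs."""
    return sorted(ENV_CONFIGS)


def master_hash() -> str:
    """Combine the env and run components, each hashed separately (illustrative)."""
    return _digest({"envs": sorted(ENV_CONFIGS), "runs": sorted(RUN_CONFIGS)})
```

Two runs that differ only in steps then share one environment entry, so only one venv and squashfs image needs to be built for both.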

workflow/rules/inference.smk:
- Rules using {env_id} wildcard (outputs in data/envs/{env_id}/):
  - prepare_checkpoint
  - extract_checkpoint_requirements
  - create_inference_venv
  - make_squashfs_image
- Rules using {run_id} wildcard with nested config directories:
  - prepare_inference_forecaster
  - prepare_inference_interpolator
  - execute_inference (references env via lookup)
  - create_inference_sandbox

Directory structure change:
- Environment artifacts: data/envs/{env_id}/
- Run-specific outputs: data/runs/{env_id}/{config_hash}/{init_time}/
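The run-to-environment lookup used by `execute_inference` can be sketched as a Snakemake-style input function that receives a wildcards object. This assumes a `RUN_CONFIGS` registry keyed by `run_id` (a hypothetical name), uses the directory roots from the list above, and borrows the IDs from the example tree later in this thread. Returning a dict lets a rule consume it through Snakemake's `unpack()`:

```python
from pathlib import PurePosixPath
from types import SimpleNamespace

# Directory roots as listed in this commit message (assumed constants).
ENV_ROOT = PurePosixPath("data/envs")
RUN_ROOT = PurePosixPath("data/runs")

# Hypothetical registry filled by register_run(): run_id -> entry.
RUN_CONFIGS = {
    "forecaster-b30a-d2c9/3180": {"env_id": "forecaster-b30a-d2c9"},
}


def env_dir(env_id: str) -> PurePosixPath:
    """Where environment artifacts (checkpoint, venv, squashfs) would live."""
    return ENV_ROOT / env_id


def run_dir(run_id: str, init_time: str) -> PurePosixPath:
    """run_id already reads '{env_id}/{config_hash}', so outputs nest automatically."""
    return RUN_ROOT / run_id / init_time


def execute_inference_inputs(wildcards) -> dict[str, str]:
    """Input function for a {run_id} rule: resolve the shared environment via lookup."""
    env_id = RUN_CONFIGS[wildcards.run_id]["env_id"]
    return {"image": str(env_dir(env_id) / "venv.squashfs")}


# SimpleNamespace stands in for the wildcards object Snakemake passes in.
wc = SimpleNamespace(run_id="forecaster-b30a-d2c9/3180", init_time="202501010000")
inputs = execute_inference_inputs(wc)
# {'image': 'data/envs/forecaster-b30a-d2c9/venv.squashfs'}
```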

Benefits:
- Reuses environments across config changes (no squashfs rebuild)
- Reduces disk I/O on shared filesystems
- Documents identity contract via ClassVars
- Nested directory structure clearly separates concerns

Tests:
- Add test_run_identity.py with 5 tests validating identity separation
- All existing tests pass

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
frazane requested a review from dnerini March 26, 2026 09:20
frazane changed the title from "Split run identity from configuration (issue #111)" to "Split run identity from configuration" Mar 26, 2026
frazane requested a review from Louis-Frey March 31, 2026 08:05
@dnerini (Member) commented Mar 31, 2026

This is looking really good! Thanks!

One question I have: do I understand correctly that now, although we distinguish between model envs and actual model runs, we stack both together under data/runs? Wouldn't it perhaps make more sense to use something like data/envs/{env_id} and data/runs/{run_id} instead?

@frazane (Contributor, author) commented Apr 1, 2026

> One question I have: do I understand correctly that now, although we distinguish between model envs and actual model runs, we stack both together under data/runs? Wouldn't it perhaps make more sense to use something like data/envs/{env_id} and data/runs/{run_id} instead?

Used to be exactly this, but I changed it in c689c99. I like that by organizing hierarchically we see immediately which models use which envs. What advantage do you see in separating them?

@dnerini (Member) commented Apr 1, 2026

> Used to be exactly this, but I changed it in c689c99. I like that by organizing hierarchically we see immediately which models use which envs. What advantage do you see in separating them?

mmmh, maybe I don't fully understand it. Can you paste an example here of what it would look like?

@Louis-Frey (Contributor) commented Apr 1, 2026

All configs except forecasters-co1e run without error. forecasters-co1e failing is OK: it was already failing before, and I will still fix it. Good to go from my side!

@frazane (Contributor, author) commented Apr 1, 2026

@dnerini

./output/data/runs
└── forecaster-b30a-d2c9
    ├── 3180
    │   ├── 202501010000
    │   ├── 202502010600
    │   └── 202503011200
    ├── 8772
    │   ├── 202501010000
    │   ├── 202502010600
    │   └── 202503011200
    ├── anemoi.json
    ├── inference-last.ckpt
    ├── requirements.txt
    └── venv.squashfs
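frazane's argument for the hierarchical layout, that you can see at a glance which configurations use which environment, can be made concrete: grouping is just a matter of reading the first two path components under the root. A minimal sketch using only `pathlib` path arithmetic (no filesystem access); the function name is hypothetical and the layout `{root}/{env_id}/{config_hash}/{init_time}` is taken from the tree above:

```python
from collections import defaultdict
from pathlib import PurePosixPath


def configs_per_env(paths: list[str], root: str = "output/data/runs") -> dict[str, list[str]]:
    """Map each env_id to the config hashes nested under it.

    Paths with only two components below the root (e.g. venv.squashfs,
    requirements.txt) are environment artifacts and are skipped.
    """
    grouped: dict[str, set[str]] = defaultdict(set)
    for p in paths:
        parts = PurePosixPath(p).relative_to(root).parts
        if len(parts) >= 3:  # env_id / config_hash / init_time[/...]
            grouped[parts[0]].add(parts[1])
    return {env: sorted(hashes) for env, hashes in grouped.items()}


example = [
    "output/data/runs/forecaster-b30a-d2c9/3180/202501010000",
    "output/data/runs/forecaster-b30a-d2c9/8772/202501010000",
    "output/data/runs/forecaster-b30a-d2c9/venv.squashfs",
]
# configs_per_env(example) -> {'forecaster-b30a-d2c9': ['3180', '8772']}
```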

@Louis-Frey (Contributor) left a comment

Looks good from my side, good to go!

frazane merged commit 847d38d into main Apr 1, 2026
4 checks passed
frazane pushed a commit that referenced this pull request Apr 1, 2026
Fix a regression introduced by #122 when running the showcase workflow:
```
InputFunctionException in rule make_forecast_animation in file "/users/ned/src/evalml/workflow/rules/plot.smk", line 125:
Error:
  KeyError: '88a3'
Wildcards:
  showcase=20260401_forecasters-ich1_75e9/forecaster-233b-098c
  run_id=88a3
  init_time=202406010000
  param=T_2M
  region=globe
Traceback:
  File "/users/ned/src/evalml/workflow/rules/plot.smk", line 130, in <lambda>
  File "/users/ned/src/evalml/workflow/rules/plot.smk", line 118, in get_leadtimes (rule make_forecast_animation, line 207, /users/ned/src/evalml/workflow/rules/plot.smk)
```
Because run_id now contains "/" (format: "{env_id}/{r_hash}"), Snakemake wildcards greedily absorbed part of run_id into {showcase} when matching paths of the form results/{showcase}/{run_id}/... Fix: constrain showcase to a single path component.
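The failure can be reproduced with Python's `re` module, since Snakemake's default regex for a wildcard is `.+`, which also matches `/`. This sketch mimics that matching on the path from the traceback, with a hypothetical `animation.gif` suffix and without the rule's `param` and `region` wildcards. In a Snakefile, the corresponding fix is a `wildcard_constraints` entry such as `showcase="[^/]+"`:

```python
import re

# Path from the traceback; the file name at the end is a hypothetical stand-in.
PATH = "results/20260401_forecasters-ich1_75e9/forecaster-233b-098c/88a3/202406010000/animation.gif"


def match(showcase_re: str) -> dict[str, str]:
    """Split PATH into wildcards, using Snakemake's default '.+' for all but showcase."""
    pattern = (
        rf"results/(?P<showcase>{showcase_re})/(?P<run_id>.+)"
        r"/(?P<init_time>.+)/animation\.gif$"
    )
    m = re.match(pattern, PATH)
    if m is None:
        raise ValueError("path does not match pattern")
    return m.groupdict()


greedy = match(".+")    # default: showcase swallows the env_id half of run_id
fixed = match("[^/]+")  # constrained: showcase is a single path component
```

With the default pattern, `run_id` comes out as `88a3`, the same value that triggered the `KeyError` above; constraining `showcase` restores the full `forecaster-233b-098c/88a3`.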
dnerini added a commit that referenced this pull request Apr 15, 2026
frazane deleted the feat/split-run-identity-from-config branch April 16, 2026 13:29

Development

Successfully merging this pull request may close these issues.

New inference environment is computed whenever inference config is updated

3 participants