Skip to content

refactor(core): convert no-priors operator to a RandomWalk sampler#877

Merged
michael-johnston merged 26 commits into
mainfrom
dl_738_migrate_no_priors_to_sampler
Apr 27, 2026
Merged

refactor(core): convert no-priors operator to a RandomWalk sampler#877
michael-johnston merged 26 commits into
mainfrom
dl_738_migrate_no_priors_to_sampler

Conversation

@danielelotito
Copy link
Copy Markdown
Collaborator

As described in #738 .

… walk

Operation Creation: ❌ FAILED with recursion error

Command: uv run ado create operation -f examples/trim/example_yamls/op_pressure.yaml --use-latest space
Exit code: 133
The operation started and displayed the discovery space details correctly
Ray cluster initialized successfully
Failure Details
Immediate Failure Symptom: RecursionError: maximum recursion depth exceeded

Precise Location: orchestrator/modules/operators/_orchestrate_core.py:43 in log_space_details()

Call Stack:

File "orchestrator/modules/operators/_orchestrate_core.py", line 117, in _run_operation_harness
    operation_output: OperationOutput | None = run_closure()
File "orchestrator/modules/operators/_general_orchestration.py", line 32, in _run_general_operation_core
    return operation_function(
File "orchestrator/modules/operators/collections.py", line 153, in wrapper
    return orchestrate_general_operation(
File "orchestrator/modules/operators/_general_orchestration.py", line 101, in orchestrate_general_operation
    log_space_details(discovery_space)
File "orchestrator/modules/operators/_orchestrate_core.py", line 43, in log_space_details
    console.print(discovery_space)

Root Cause: The recursion occurs in the Rich library's rendering chain when attempting to print the discovery_space object. The stack trace shows infinite recursion through:

rich/console.py → rich/panel.py → rich/padding.py → rich/pretty.py
Specifically in pretty.py:489 where repr_str = "".join(str(line) for line in lines) creates a circular reference
Additional Observations:

The operation created multiple nested sub-operations (visible in the deeply nested error message showing operation identifiers like operation-trim-1.7.1.dev72+gb804c1e11.d20260420-1e3afcbc, operation-trim-1.7.1.dev72+gb804c1e11.d20260420-9a0c5225, etc.)
Each sub-operation encountered the same recursion error when trying to log space details
The error cascaded through multiple operation levels before the final SIGTRAP signal
Conclusion
The previous recursion failure still reproduces exactly. The issue is not intermittent—it consistently occurs at the same location (log_space_details()) when the TRIM operator attempts to print the discovery space object using Rich's console rendering.
@danielelotito danielelotito added the ci Enables CI integration label Apr 21, 2026
@danielelotito
Copy link
Copy Markdown
Collaborator Author

Atm I am encountering this bug, also in the main branch

[Bug] RecursionError in Rich rendering when running TRIM examples on main branch


Summary

The recursion issue exists on the main branch with both Python 3.10.17 and Python 3.12.7. The TRIM example fails with a RecursionError when running the operation creation command. This is a pre-existing bug on main, not a regression, related to how the DiscoverySpace object is rendered via the Rich library.


Test Environment

Branch: main (commit: d1e3664)
Python versions tested: 3.10.17, 3.12.7
Context: local SQLite metastore


Steps to Reproduce

git checkout main
uv sync --reinstall --group test --group dev
uv pip install -e plugins/operators/trim
uv pip install -e examples/trim/custom_experiments
uv run ado context local
uv run ado create space -f examples/trim/example_yamls/space_pressure.yaml --new-sample-store
uv run ado create operation -f examples/trim/example_yamls/op_pressure.yaml --use-latest space


Error Details

Location: orchestrator/modules/operators/_orchestrate_core.py:43 in log_space_details()
Root cause: console.print(discovery_space) triggers infinite recursion in Rich library's rendering code.
Observable pattern: Discovery Space details are printed repeatedly in a recursive loop before the process is killed.


Investigation Points

  1. DiscoverySpace circular references: The DiscoverySpace pydantic model likely contains circular references that cause Rich's pretty-printer to recurse infinitely.
  2. Rich rendering implementation: Check if DiscoverySpace or related classes have custom rich_console methods that create rendering cycles.
  3. TRIM-specific trigger: The issue manifests specifically with TRIM operations, suggesting the problem may be in how TRIM's parameters (e.g., nested NoPriorsParameters) are structured, but when I have done PR related to TRIM the issue was not there

Recommended Fix

Modify log_space_details() in orchestrator/modules/operators/_orchestrate_core.py to use a different serialization approach (e.g., YAML dump) instead of direct Rich console printing, or implement a custom rich_console method that breaks the circular reference chain.

@AlessandroPomponio
Copy link
Copy Markdown
Member

@danielelotito it doesn't look like it's a Rich issue, Trim is recursively launching operations:

=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-216e5aea ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-b6e4309e ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-7e0c84c3 ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-d2fb8218 ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-f3dbf74d ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-49b0564a ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-461dfb03 ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-38efb1cd ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-40cf3af9 ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-35fc2352 ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-8ccbfdbb ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-fe4e49e9 ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-7571d7c7 ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-faa5940f ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-71fcb3f3 ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-a65297d0 ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-d04face8 ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-88c33004 ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-002cb14e ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-c1e545ba ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-b730426f ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-3f7130c7 ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-791c1263 ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-2b368a78 ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-249b8b47 ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-57eab0a8 ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-2d632777 ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-d2c83ff2 ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-0000490c ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-734df52e ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-edbffe35 ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-be444151 ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-2c527acb ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-5da40209 ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-f92452af ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-6f9ab8d0 ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-0e4ee81d ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-25d873b6 ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-e499e4be ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-24f04a07 ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-78337a2c ===========
=========== Starting Operation operation-trim-1.7.1.dev72+g8d7ce1158-cb99fa6f ===========

Please open an issue about it

@michael-johnston
Copy link
Copy Markdown
Member

Something is recursively calling trim.operator.trim() or orchestrator.operators.collections.explore.trim()

@michael-johnston
Copy link
Copy Markdown
Member

I have found the error. Will open a fix in another branch.

@michael-johnston
Copy link
Copy Markdown
Member

Fix is in #886

@AlessandroPomponio
Copy link
Copy Markdown
Member

@michael-johnston should we put no priors under orchestrator/core/discoveryspace/? @danielelotito correctly pointed out that we have samplers there, but it does feel a bit weird. This is compounded by the fact that the no priors utilities, sampler, etc, are fairly chunky files

- Resolved modify/delete conflicts: kept no-priors-characterization plugin deleted
- Resolved content conflict in trim/operator.py: accepted incoming changes
- Includes commit 77e5b90 and all changes from main
@danielelotito
Copy link
Copy Markdown
Collaborator Author

Also, Sobol sampler needs scipy as a dependency, where do we put it?

@AlessandroPomponio
Copy link
Copy Markdown
Member

Also, Sobol sampler needs scipy as a dependency, where do we put it?

in the main pyproject. uv add scipy

@danielelotito danielelotito linked an issue Apr 22, 2026 that may be closed by this pull request
@michael-johnston
Copy link
Copy Markdown
Member

If it's a lot to add, and also brings additional heavy dependencies to ado-core, we can keep it in Trim operator

Then the trim docs say it provides a no-prior custom sampler for randomwalk and describes how to use it and the example contains an example of the YAML

I'm thinking the ultimate solution here may be to extract randomwalk to its own operator plugin and then advanced capabilities for it can live in that plugin.

@danielelotito
Copy link
Copy Markdown
Collaborator Author

If it's a lot to add, and also brings additional heavy dependencies to ado-core, we can keep it in Trim operator

just scipy.

The branch now is working as expected, you can have a look @michael-johnston .

I was also thinking about having a plugin that makes these sampling strategies available, even without bringing new operators

@danielelotito danielelotito marked this pull request as ready for review April 22, 2026 12:24
@AlessandroPomponio AlessandroPomponio changed the title refactor: no-priors operator as a RandomWalk sampler refactor(core): convert no-priors operator to a RandomWalk sampler Apr 22, 2026
- Move no_priors_parameters.py, no_priors_sampler.py, no_priors_utils.py from orchestrator/core/discoveryspace/ to plugins/operators/trim/src/trim/samplers/
- Update all imports in trim plugin to reference new location
- Update module name in operator.py from orchestrator.core.discoveryspace.no_priors_sampler to trim.samplers.no_priors_sampler
- Delete old files from orchestrator/core/discoveryspace/
- Delete corresponding test file from tests/core/discoveryspace/

This change encapsulates no_priors functionality within the trim plugin where it belongs.
Update imports in trim operator source files to reference new location:
- operator.py: update module name and import
- trim_pydantic.py: update NoPriorsParameters import
- trim_sampler.py: update no_priors_utils imports
- utils/order.py: update get_sampling_indices_multi_dimensional import
Update test imports to reference new module location:
- test_high_dimensional_sampling.py: update concatenated_latin_hypercube_sampling import
- test_sampling.py: update get_index_list_van_der_corput import
danielelotito and others added 3 commits April 23, 2026 11:32
- Remove old no_priors files from orchestrator/core/discoveryspace/
- Documentation in random-walk.md already references correct new location
- Fix markdown line length issues
Signed-off-by: Daniele Lotito <99284466+danielelotito@users.noreply.github.com>
@michael-johnston
Copy link
Copy Markdown
Member

@danielelotito FYI The merge with main to fix recursion bug has left a problem in the tests.

tests/operators/test_trim_example_integration.py:16: in <module>
    from orchestrator.core.discoveryspace.no_priors_parameters import NoPriorsParameters
E   ModuleNotFoundError: No module named 'orchestrator.core.discoveryspace.no_priors_parameters'

@danielelotito
Copy link
Copy Markdown
Collaborator Author

danielelotito commented Apr 24, 2026

@michael-johnston , I pushed a test fix bcs I forgott to change (refactor) an import in trim tests. So the error you shared disappeared.

The merge with main to fix recursion bug has left a problem in the tests.

Do you refer to the following, I do

uv sync --reinstall --group test --group dev
pytest tests/operators     

or

 export TOX_ENV=py310-locked-macos
tox --colored yes --stderr-color RESET -r -e "$TOX_ENV" -vvv)

returns something similar to

=========================== short test summary info ============================
ERROR tests/operators/test_discovery_space_manager.py::test_internal_state_direct_init[mysql]
ERROR tests/operators/test_discovery_space_manager.py::test_internal_state_direct_init[sqlite]
ERROR tests/operators/test_discovery_space_manager.py::test_internal_state_conf_init[mysql]
ERROR tests/operators/test_discovery_space_manager.py::test_internal_state_conf_init[sqlite]
ERROR tests/operators/test_operators.py::test_run_random_walk_operation[all-mysql]
ERROR tests/operators/test_operators.py::test_run_random_walk_operation[all-sqlite]
ERROR tests/operators/test_operators.py::test_run_random_walk_operation[value-mysql]
ERROR tests/operators/test_operators.py::test_run_random_walk_operation[value-sqlite]
ERROR tests/operators/test_operators.py::test_random_walk_fail_invalid_config[mysql]
ERROR tests/operators/test_operators.py::test_random_walk_fail_invalid_config[sqlite]
ERROR tests/operators/test_operators.py::test_run_ray_tune_operation[mysql]
ERROR tests/operators/test_operators.py::test_run_ray_tune_operation[sqlite]
ERROR tests/operators/test_trim_example_integration.py::test_trim_example_operation_succeeds[mysql]
ERROR tests/operators/test_trim_example_integration.py::test_trim_example_operation_succeeds[sqlite]
================== 71 passed, 1 warning, 14 errors in 15.43s ===================

note
I see that the automatic checks on this pr pass

@michael-johnston
Copy link
Copy Markdown
Member

On the tests can you see the ERROR reason?

My guess is that the container runtime is not configured - see tests/README.md , Checking the container runtime section

@danielelotito
Copy link
Copy Markdown
Collaborator Author

My guess is that the container runtime is not configured - see tests/README.md , Checking the container runtime section

You are right!
Testcontainers requires a container runtime that is compatible with the Docker API. Which I do not have, however tests pass in the CI

Reorder sections and header levels.
@DRL-NextGen
Copy link
Copy Markdown
Member

DRL-NextGen commented Apr 25, 2026

No vulnerabilities found.

michael-johnston and others added 2 commits April 25, 2026 22:21
Added ternary check: operationInfo.actuatorConfigurationIdentifiers if operationInfo else []
Mirrors the guard already present in the no-priors block (lines 122-126)
@michael-johnston michael-johnston added this pull request to the merge queue Apr 27, 2026
Merged via the queue into main with commit 6f6703a Apr 27, 2026
19 checks passed
@michael-johnston michael-johnston deleted the dl_738_migrate_no_priors_to_sampler branch April 27, 2026 11:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci Enables CI integration

Projects

None yet

Development

Successfully merging this pull request may close these issues.

refactor: no-priors operator as a RandomWalk sampler

4 participants