refactor(core): convert no-priors operator to a RandomWalk sampler #877
… is integrated in random walk
… walk
Operation Creation: ❌ FAILED with recursion error
Command: uv run ado create operation -f examples/trim/example_yamls/op_pressure.yaml --use-latest space
Exit code: 133
The operation started and displayed the discovery space details correctly
Ray cluster initialized successfully
Failure Details
Immediate Failure Symptom: RecursionError: maximum recursion depth exceeded
Precise Location: orchestrator/modules/operators/_orchestrate_core.py:43 in log_space_details()
Call Stack:
File "orchestrator/modules/operators/_orchestrate_core.py", line 117, in _run_operation_harness
operation_output: OperationOutput | None = run_closure()
File "orchestrator/modules/operators/_general_orchestration.py", line 32, in _run_general_operation_core
return operation_function(
File "orchestrator/modules/operators/collections.py", line 153, in wrapper
return orchestrate_general_operation(
File "orchestrator/modules/operators/_general_orchestration.py", line 101, in orchestrate_general_operation
log_space_details(discovery_space)
File "orchestrator/modules/operators/_orchestrate_core.py", line 43, in log_space_details
console.print(discovery_space)
Root Cause: The recursion occurs in the Rich library's rendering chain when attempting to print the discovery_space object. The stack trace shows infinite recursion through:
rich/console.py → rich/panel.py → rich/padding.py → rich/pretty.py
Specifically in pretty.py:489 where repr_str = "".join(str(line) for line in lines) creates a circular reference
Additional Observations:
The operation created multiple nested sub-operations (visible in the deeply nested error message showing operation identifiers like operation-trim-1.7.1.dev72+gb804c1e11.d20260420-1e3afcbc, operation-trim-1.7.1.dev72+gb804c1e11.d20260420-9a0c5225, etc.)
Each sub-operation encountered the same recursion error when trying to log space details
The error cascaded through multiple operation levels before the final SIGTRAP signal
Conclusion
The previous recursion failure still reproduces exactly. The issue is not intermittent—it consistently occurs at the same location (log_space_details()) when the TRIM operator attempts to print the discovery space object using Rich's console rendering.
|
At the moment I am encountering this bug on the main branch as well.
[Bug] RecursionError in Rich rendering when running TRIM examples on main branch
Summary
The recursion issue exists on the main branch with both Python 3.10.17 and Python 3.12.7. The TRIM example fails with a RecursionError when running the operation creation command. This is a pre-existing bug on main, not a regression, related to how the DiscoverySpace object is rendered via the Rich library.
Test Environment
Branch: main (commit: d1e3664)
Steps to Reproduce
git checkout main
Error Details
Location: orchestrator/modules/operators/_orchestrate_core.py:43 in log_space_details()
Investigation Points
Recommended Fix
Modify log_space_details() in orchestrator/modules/operators/_orchestrate_core.py to use a different serialization approach (e.g., YAML dump) instead of direct Rich console printing, or implement a custom __rich_console__ method that breaks the circular reference chain. |
|
@danielelotito it doesn't look like a Rich issue; Trim is recursively launching operations. Please open an issue about it. |
|
Something is recursively calling trim.operator.trim() or orchestrator.operators.collections.explore.trim() |
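One way to confirm which function is re-entering is to count how often each function name appears in the RecursionError traceback. The following is a minimal sketch using only the standard library; `launch_operation` is a hypothetical stand-in for whatever ends up calling trim again, not a function from this repository.

```python
import traceback
from collections import Counter


def most_repeated_frames(exc: BaseException, top: int = 3) -> list:
    """Return the function names that occur most often in the traceback of exc."""
    frames = traceback.extract_tb(exc.__traceback__)
    return Counter(frame.name for frame in frames).most_common(top)


def launch_operation(depth: int = 0) -> int:
    # Hypothetical stand-in: an operation that launches itself again without a stop condition.
    return launch_operation(depth + 1)


repeated = []
try:
    launch_operation()
except RecursionError as err:
    repeated = most_repeated_frames(err)

# The name at the top of the list is the function that keeps re-entering.
print(repeated[0][0])
```

With a real failure, the same helper could be called inside an `except RecursionError` block around the operation launch, which would show whether the repeated frames belong to the operator or to Rich.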
|
I have found the error. Will open a fix in another branch. |
|
Fix is in #886 |
|
@michael-johnston should we put no priors under |
- Resolved modify/delete conflicts: kept no-priors-characterization plugin deleted - Resolved content conflict in trim/operator.py: accepted incoming changes - Includes commit 77e5b90 and all changes from main
|
Also, Sobol sampler needs scipy as a dependency, where do we put it? |
in the main pyproject. |
|
If it's a lot to add, and also brings additional heavy dependencies to ado-core, we can keep it in the Trim operator. Then the Trim docs say that it provides a no-priors custom sampler for randomwalk and describe how to use it, and the example contains an example of the YAML. I'm thinking the ultimate solution here may be to extract randomwalk to its own operator plugin, and then advanced capabilities for it can live in that plugin. |
Just scipy. The branch is now working as expected; you can have a look, @michael-johnston. I was also thinking about having a plugin that makes these sampling strategies available, even without bringing in new operators. |
- Move no_priors_parameters.py, no_priors_sampler.py, no_priors_utils.py from orchestrator/core/discoveryspace/ to plugins/operators/trim/src/trim/samplers/ - Update all imports in trim plugin to reference new location - Update module name in operator.py from orchestrator.core.discoveryspace.no_priors_sampler to trim.samplers.no_priors_sampler - Delete old files from orchestrator/core/discoveryspace/ - Delete corresponding test file from tests/core/discoveryspace/ This change encapsulates no_priors functionality within the trim plugin where it belongs.
Update imports in trim operator source files to reference new location: - operator.py: update module name and import - trim_pydantic.py: update NoPriorsParameters import - trim_sampler.py: update no_priors_utils imports - utils/order.py: update get_sampling_indices_multi_dimensional import
Update test imports to reference new module location: - test_high_dimensional_sampling.py: update concatenated_latin_hypercube_sampling import - test_sampling.py: update get_index_list_van_der_corput import
- Remove old no_priors files from orchestrator/core/discoveryspace/ - Documentation in random-walk.md already references correct new location - Fix markdown line length issues
Signed-off-by: Daniele Lotito <99284466+danielelotito@users.noreply.github.com>
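The commit messages above mention a helper named get_index_list_van_der_corput, whose implementation is not shown in this conversation. As background only, here is a minimal sketch of the standard van der Corput radical-inverse sequence, the low-discrepancy sequence such a helper is presumably built on. The function name and signature below are illustrative assumptions, not the plugin's API.

```python
def van_der_corput(n: int, base: int = 2) -> float:
    """Radical inverse of n in the given base: mirror the digits of n around the radix point."""
    value, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)  # peel off the least significant digit
        denom *= base
        value += digit / denom      # place it at the next fractional position
    return value


# The first four points in base 2 fill the unit interval evenly rather than in order.
points = [van_der_corput(i) for i in range(1, 5)]
print(points)  # [0.5, 0.25, 0.75, 0.125]
```

Each new point lands in the largest remaining gap, which is why sequences of this kind are attractive for sampling a space when no prior information is available.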
|
@danielelotito FYI The merge with main to fix recursion bug has left a problem in the tests. |
|
@michael-johnston, I pushed a test fix because I forgot to change (refactor) an import in the trim tests. So the error you shared has disappeared.
Are you referring to the following? Running
uv sync --reinstall --group test --group dev
pytest tests/operators
or
export TOX_ENV=py310-locked-macos
tox --colored yes --stderr-color RESET -r -e "$TOX_ENV" -vvv
returns something similar to
=========================== short test summary info ============================
ERROR tests/operators/test_discovery_space_manager.py::test_internal_state_direct_init[mysql]
ERROR tests/operators/test_discovery_space_manager.py::test_internal_state_direct_init[sqlite]
ERROR tests/operators/test_discovery_space_manager.py::test_internal_state_conf_init[mysql]
ERROR tests/operators/test_discovery_space_manager.py::test_internal_state_conf_init[sqlite]
ERROR tests/operators/test_operators.py::test_run_random_walk_operation[all-mysql]
ERROR tests/operators/test_operators.py::test_run_random_walk_operation[all-sqlite]
ERROR tests/operators/test_operators.py::test_run_random_walk_operation[value-mysql]
ERROR tests/operators/test_operators.py::test_run_random_walk_operation[value-sqlite]
ERROR tests/operators/test_operators.py::test_random_walk_fail_invalid_config[mysql]
ERROR tests/operators/test_operators.py::test_random_walk_fail_invalid_config[sqlite]
ERROR tests/operators/test_operators.py::test_run_ray_tune_operation[mysql]
ERROR tests/operators/test_operators.py::test_run_ray_tune_operation[sqlite]
ERROR tests/operators/test_trim_example_integration.py::test_trim_example_operation_succeeds[mysql]
ERROR tests/operators/test_trim_example_integration.py::test_trim_example_operation_succeeds[sqlite]
================== 71 passed, 1 warning, 14 errors in 15.43s ===================
note |
|
On the tests, can you see the ERROR reason? My guess is that the container runtime is not configured; see the "Checking the container runtime" section in tests/README.md. |
You are right! |
Reorder sections and header levels.
|
No vulnerabilities found. |
Added ternary check: operationInfo.actuatorConfigurationIdentifiers if operationInfo else []. Mirrors the guard already present in the no-priors block (lines 122-126).
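For readers unfamiliar with the pattern, the guard can be sketched in isolation as follows. The `OperationInfo` dataclass and the wrapper function here are hypothetical stand-ins for illustration; only the conditional expression itself is taken from the commit message.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OperationInfo:
    # Hypothetical stand-in for the real operation info object.
    actuatorConfigurationIdentifiers: List[str] = field(default_factory=list)


def actuator_configuration_identifiers(operationInfo: Optional[OperationInfo]) -> List[str]:
    # Fall back to an empty list when no operation info was supplied,
    # instead of raising AttributeError on None.
    return operationInfo.actuatorConfigurationIdentifiers if operationInfo else []


print(actuator_configuration_identifiers(None))                      # []
print(actuator_configuration_identifiers(OperationInfo(["cfg-1"])))  # ['cfg-1']
```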
As described in #738.