state_collapser is a Python package for inducing hierarchical reinforcement-learning structure on RL problems that do not arrive with an obvious subtask decomposition or a natural human-readable hierarchy.
WHAT: Without assuming that any hierarchy is visible in the problem specification, the package constructs a quotient-style tower structure by recursively contracting discovered state/action graph structure. The result is a research-oriented runtime layer for experimenting with coarse-to-fine control, quotient tiers, and abstract intermediate states that need not be directly executable as long as they admit a lift back to executable behavior.
WHY: The expected payoff is a reduction in effective search and training burden. Rather than repeatedly learning over the full fine-scale path space, the agent can learn first over more collapsed quotient structure, then return to finer tiers only for the additional correction that the coarser tiers could not supply. In the best case, this replaces broad flat exploration with staged coarse-to-fine control over a much smaller effective path space.
WHERE:
For an engineer used to existing RL tooling, state_collapser sits near
frameworks such as RLlib and Stable-Baselines3, but it is not trying to be
either. RLlib says "Give me an env; I will run scalable RL algorithms on it."
Stable-Baselines3 says "Give me an env; I will run standard reliable RL algorithms on it." state_collapser says:
"Give me an env or discovered transition system; I will construct a better hierarchical/quotient decision structure around it."
state_collapser is a structural layer that can sit before or beside a learner:

    Gymnasium env
      -> state_collapser discovers graph/tower/quotient structure
      -> policy learner trains using tower-aware decision inputs

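The first arrow in that pipeline, discovering graph structure from an environment, can be illustrated with a minimal sketch. It does not use the state_collapser API: `ToyChainEnv` and `discover_graph` are hypothetical names, and the toy class only mimics the Gymnasium `reset`/`step` return convention so that the example runs without extra dependencies.

```python
import random
from collections import defaultdict


class ToyChainEnv:
    """Hypothetical stand-in that follows the Gymnasium reset/step convention."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self, seed=None):
        self.state = 0
        return self.state, {}

    def step(self, action):
        # Action 1 moves right, any other action moves left; clip to the chain.
        delta = 1 if action == 1 else -1
        self.state = min(max(self.state + delta, 0), self.length - 1)
        terminated = self.state == self.length - 1
        reward = 1.0 if terminated else 0.0
        return self.state, reward, terminated, False, {}


def discover_graph(env, episodes=50, max_steps=20, seed=0):
    """Record every observed state -> (action, next_state) edge under random rollouts."""
    rng = random.Random(seed)
    edges = defaultdict(set)
    for _ in range(episodes):
        state, _ = env.reset()
        for _ in range(max_steps):
            action = rng.choice([0, 1])
            next_state, _, terminated, truncated, _ = env.step(action)
            edges[state].add((action, next_state))
            state = next_state
            if terminated or truncated:
                break
    return dict(edges)


explored = discover_graph(ToyChainEnv())
for state in sorted(explored):
    print(state, sorted(explored[state]))
```

The resulting dictionary is an explored transition graph in the loosest sense: only edges that were actually traversed appear in it, which is the kind of object a contraction step can then act on.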
The same quotient/tower layer can also support non-RL graph dataflow. The first
downstream application is
HGraphML, which treats a known
graph as already discovered, builds a state_collapser partition tower around
it, runs message passing on a coarse tier, and lifts messages back over node and
edge fibers.
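As a rough illustration of that dataflow, the sketch below pools node features over the blocks of a partition, runs one round of sum-aggregation message passing on the coarse graph, and lifts each block's message back to the nodes in its fiber. It is not the HGraphML or state_collapser API: `coarse_message_pass` and its arguments are hypothetical, the partition is given by hand rather than built as a tower, and edge fibers are omitted for brevity.

```python
def coarse_message_pass(features, edges, partition):
    """Pool node features per block, pass messages between blocks, lift back."""
    # 1. Pool: average the features of the nodes in each block (node fiber).
    blocks = {}
    for node, block in partition.items():
        blocks.setdefault(block, []).append(node)
    pooled = {b: sum(features[n] for n in ns) / len(ns) for b, ns in blocks.items()}

    # 2. Coarse graph: one edge per ordered pair of distinct blocks
    #    that are joined by at least one fine edge.
    coarse_edges = {
        (partition[a], partition[b])
        for a, b in edges
        if partition[a] != partition[b]
    }

    # 3. One round of sum-aggregation message passing on the coarse tier.
    messages = {b: 0.0 for b in blocks}
    for src, dst in coarse_edges:
        messages[dst] += pooled[src]

    # 4. Lift: every fine node receives the message of its block.
    return {node: messages[partition[node]] for node in partition}


# A 4-node path graph, stored as directed edges in both directions.
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]
features = {0: 1.0, 1: 3.0, 2: 5.0, 3: 7.0}
partition = {0: "A", 1: "A", 2: "B", 3: "B"}

lifted = coarse_message_pass(features, edges, partition)
print(lifted)
```

The point of the design is that the message-passing cost scales with the number of coarse edges rather than fine edges; the lift step is what returns the result to the original node set.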
HOW:
If the underlying mathematical model and its log speed-up theorems would best justify this package for you, I'd start with the companion research paper, which explains why such a package should exist in the first place. If benchmarks demonstrating the speed-up on coordination-constrained RL problems would be more convincing, I'd start with the evaluation document.
Python 3.11 or 3.12 is required for the current release line.
The current public-release target is a lightweight GitHub research release. PyPI publication is intentionally deferred until the serious benchmark track is complete, so install from source for now.
Install from a local checkout:

    pip install -e .

Install from a public GitHub tag once the repository is public:

    pip install "state-collapser @ git+https://github.com/TYLERSFOSTER/state_collapser.git@v0.6.0"

Install a local checkout with development tooling:

    pip install -e ".[dev]"

Install a local checkout with the current RL and ML extras:

    pip install -e ".[dev,rl,ml]"

The top-level package surface is intentionally small right now:

    import state_collapser
    print(state_collapser.__version__)

Most current entry points live in explicit subpackages.
Tower-aware training on PlateSupportEnv:

    from state_collapser.examples.plate_support_env import (
        PlateSupportEnv,
        TowerTrainingConfig,
        run_tower_training,
    )

    result = run_tower_training(
        env=PlateSupportEnv(),
        config=TowerTrainingConfig(
            episodes=20,
            max_steps_per_episode=50,
            alpha=0.5,
            gamma=0.95,
            epsilon=0.2,
            seed=0,
        ),
    )

    print("episodes:", len(result.episodes))
    print("successes:", sum(1 for ep in result.episodes if ep.success))

Tower-aware training on RlCounterpointEnv:

    from state_collapser.examples.rl_counterpoint_v3 import (
        RlCounterpointEnv,
        TowerTrainingConfig,
        run_tower_training,
    )

    result = run_tower_training(
        env=RlCounterpointEnv(),
        config=TowerTrainingConfig(
            episodes=5,
            max_steps_per_episode=16,
            alpha=0.5,
            gamma=0.95,
            epsilon=0.2,
            seed=0,
        ),
    )

    print("episodes:", len(result.episodes))
    print("q_table_states:", len(result.q_table))
    print("successes:", sum(1 for ep in result.episodes if ep.success))

Exploit/explore training on PlateSupportEnv:

    from state_collapser.examples.plate_support_env import (
        ExploitExploreTrainingConfig,
        PlateSupportEnv,
        run_exploit_explore_training,
    )

    result = run_exploit_explore_training(
        env=PlateSupportEnv(),
        config=ExploitExploreTrainingConfig(
            episodes=10,
            max_control_steps_per_episode=20,
            alpha=0.5,
            gamma=0.95,
            seed=0,
        ),
    )

    print("episodes:", len(result.episodes))
    print("successes:", sum(1 for ep in result.episodes if ep.success))

To probe tower depth on these two example environments with their default contraction schemas:

    .venv/bin/python -m state_collapser.examples.tower_depth_probe plate_support_env rl_counterpoint_v3 --schema-mode default

To compare against an explicit flat partition baseline:

    .venv/bin/python -m state_collapser.examples.tower_depth_probe plate_support_env rl_counterpoint_v3 --schema-mode none

This utility is useful when you want to inspect:
- how deep the dynamically constructed tower gets
- whether deeper tiers are materializing in a given example
- whether schema-driven partition contraction is scheduling discovered edges
- whether a change in contraction/runtime behavior materially changes tower growth
In the current partition-backed runtime, ContractionSchema is the tower
contraction schedule. ContractionPolicy remains available for legacy,
local-star, and vista-facing compatibility, but it is not the source of
partition-tower coarsening. Example environments that exercise hierarchy provide
default smoke schemas; pass NoContractionSchema() in Python, or
--schema-mode none in the probe, when you want an explicit flat baseline.
- Hidden, explored, and vista graph layers for RL state/action structure.
- Persistent nested state/action partition towers with quotient-tier compatibility readouts.
- Full-graph partition-tower construction usable by downstream graph-dataflow packages such as HGraphML.
- Tower runtime snapshots and tower-aware training support.
- An initial internal state_collapser.training package with reusable training-facing surfaces.
- A first exploit/explore active-tier controller.
- Example environments and runnable example training paths.
- Strong design-document support for the mathematical and architectural model.
Many HRL methods work best when an RL problem already comes with meaningful subtasks, a clean parameter reduction, or an obvious hierarchical task decomposition. But many important RL problems, such as constrained robotics and coordinated control problems, do not present themselves in that way.
In such settings, the reachable state/action structure may live on a constrained subset of a larger ambient space, and the natural coarse structure may be hidden rather than explicit. state_collapser is aimed at this harder case: it induces a hierarchical learning structure on problems that have no canonical one.
The package is built around two core ideas:
- Hierarchy can be induced by recursive contractions of discovered state/action graph structure.
- Intermediate HRL states can be pure abstractions, provided they admit a lift back to executable behavior.
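Both ideas can be sketched on a toy graph. The code below is not the package's implementation: `contract`, `greedy_matching`, `build_tower`, and `fiber` are hypothetical names, and the greedy edge matching is a stand-in for whatever contraction schedule the runtime actually uses. It repeatedly contracts a set of edges to form coarser quotient graphs, then recovers the fiber of fine states that sits over an abstract coarse state.

```python
def contract(nodes, edges, merge_pairs):
    """Merge each pair in merge_pairs; return (projection, quotient_edges)."""
    parent = {n: n for n in nodes}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in merge_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # smallest label represents the coset

    projection = {n: find(n) for n in nodes}
    quotient_edges = {
        (projection[a], projection[b])
        for a, b in edges
        if projection[a] != projection[b]  # drop edges collapsed into a coset
    }
    return projection, quotient_edges


def greedy_matching(edges):
    """Stand-in schedule: pick pairwise-disjoint edges in sorted order."""
    used, picked = set(), []
    for a, b in sorted(edges):
        if a != b and a not in used and b not in used:
            picked.append((a, b))
            used.update((a, b))
    return picked


def build_tower(nodes, edges):
    """Contract recursively until the top tier has no edges left."""
    tiers = [(set(nodes), set(edges))]
    projections = []
    while tiers[-1][1]:
        cur_nodes, cur_edges = tiers[-1]
        projection, q_edges = contract(cur_nodes, cur_edges, greedy_matching(cur_edges))
        projections.append(projection)
        tiers.append((set(projection.values()), q_edges))
    return tiers, projections


def fiber(projections, coarse_node, tier):
    """Bottom-tier states that project to coarse_node at the given tier."""
    members = {coarse_node}
    for projection in reversed(projections[:tier]):
        members = {n for n, rep in projection.items() if rep in members}
    return members


# An 8-state chain 0 -> 1 -> ... -> 7.
tiers, projections = build_tower(range(8), [(i, i + 1) for i in range(7)])
print([len(tier_nodes) for tier_nodes, _ in tiers])
print(fiber(projections, 2, tier=1))
```

On this chain each contraction halves the number of states, so the tower has tiers of 8, 4, 2, and 1 states. A coarse state such as node 2 at tier 1 is not an executable state in its own right; its fiber `{2, 3}` is what connects it back to executable fine-tier behavior.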
The package currently contains real code for:
- state_collapser.core
  - states, actions, edges, rewards, labels, and annotations
- state_collapser.graph
  - hidden graphs, explored graphs, vista graphs, and local-star structure
- state_collapser.contract
  - contraction-policy and selection surfaces
- state_collapser.quotient
  - projections, cosets, and tier views
- state_collapser.tower
  - schema-driven partition-backed tower runtime, LiveRuntimeView, serializable RuntimeSnapshot, lazy compatibility readouts, trustworthiness, and exploit/explore control
- state_collapser.training
  - internal reusable decision inputs, action masks, continuation-aware transitions, collectors, learners, metrics, reference loops, and fiber-conditioned training surfaces
- state_collapser.adapters
  - StateCollapserGymWrapper hook surfaces and legacy/toy adapter examples
- state_collapser.benchmarks
  - lightweight runtime benchmark smoke tooling for hot-path/readout comparisons
- state_collapser.examples
  - reference example environments and runtime integrations
The most developed example packages right now are:
- state_collapser.examples.plate_support_env
- state_collapser.examples.rl_counterpoint_v3
The example suite also now includes:
- state_collapser.examples.articulated_loop_env
- state_collapser.examples.cable_parallel_env
- state_collapser.examples.dual_arm_manipulation_env
- state_collapser.examples.parallelogram_singularity_env
The current evaluation examples expose default contraction schemas for the
partition-backed runtime, plus explicit flat-baseline behavior through
NoContractionSchema.
plate_support_env still contains:
- the environment
- a runtime adapter
- the older tower-aware training path
- the newer exploit/explore training path
rl_counterpoint_v3 now serves as the first real migration target for the new training-surface package:
- the environment
- a runtime adapter
- a tower-aware training path built on the new reusable training components
This project is pre-alpha.
What is solid enough to rely on:
- the package layout
- CI, linting, typing, and test workflow
- the first vertical slice of graph / quotient / tower runtime machinery
- the example environment integrations
- the existence of both old and new training paths for PlateSupportEnv
- the first internal state_collapser.training component layer
- the first FrozenQuotientBehavior -> PathFiber -> FiberConditionedStage bridge
- the migrated rl_counterpoint_v3 training path as a first training-surface reality check
What should still be treated as unstable:
- the broad public API
- the public shape of state_collapser.training
- exploit/explore control tuning and behavior
- long-term naming of some modules and surfaces
- future instrumentation and benchmark interfaces
Current top-level source layout:

    src/state_collapser/
      core/
      graph/
      contract/
      quotient/
      tower/
      training/
      adapters/
      benchmarks/
      examples/
      instrumentation/

The instrumentation area is intended to support future work on:
- path-space metrics
- tower-growth metrics
- training-run visualization
Where to go next:
- New to the package: start with docs/usage/01_001_what_state_collapser_is.md.
- Trying to understand the tower runtime: read docs/usage/01_002_tower_runtime_mental_model.md.
- Trying to train with your own learner: read docs/usage/01_003_training_surface_quickstart.md and docs/usage/01_004_fiber_conditioned_training.md.
- Looking for exact implemented surfaces: use docs/api_notes.
- Looking for downstream applications: read docs/usage/01_009_downstream_applications.md.
- Planning to contribute: read CONTRIBUTING.md.
- Looking for vulnerability/reporting expectations: read SECURITY.md.
General package docs:
Mathematical and design docs:
- docs/design/mathematical_model.pdf
- docs/design/reward_locality_for_quotient_training.md
- docs/design/module_design_desiderata.md
- docs/design/package_best_practices_proposal.md
- docs/design/model_train_surfaces/01_001_model_and_training_surface_architecture.md
- docs/design/model_train_surfaces/01_002_model_and_training_surface_blueprint.md
- docs/design/RL_framework_maturity/01_001_rl_framework_maturity_and_tower_training_spine_discussion.md
- docs/design/RL_framework_maturity/01_002_fiber_conditioned_training_spine_blueprint.md
- docs/design/Young_tableaux_refactor/01_001_young_tableaux_runtime_refactor_blueprint.md
Major implementation docs:
- docs/design/final_initial/final_initial_blueprint.md
- docs/design/final_initial/final_initial_implementation_gameplan.md
- docs/design/HRL_exploit-explore/01_013_exploit_explore_algorithm_blueprint.md
- docs/design/HRL_exploit-explore/01_014_exploit_explore_algorithm_implementation_gameplan.md
- docs/design/model_train_surfaces/01_003_model_and_training_surface_implementation_gameplan.md
- docs/design/Young_tableaux_refactor/01_002_young_tableaux_runtime_refactor_implementation_gameplan.md
- docs/design/Young_tableaux_refactor/01_003_young_tableaux_runtime_refactor_implementation_log.md
- docs/design/test_design/post_young_audit/01_003_post_young_diagram_evaluation_environment_repair_implementation_gameplan.md
- docs/design/RL_framework_maturity/01_006_fiber_conditioned_training_spine_paired_implementation_log.md
Continuity / project history:
- docs/engineer_continuity/2026/05/15/01_007_exploit_explore_design_implementation_and_merge.md
- docs/engineer_continuity/2026/05/20/01_009_evaluation_family_counterpoint_and_training_surface_consolidation.md
- docs/engineer_continuity/2026/05/23/01_010_package_readiness_and_loghrl_research_document_consolidation.md
- docs/engineer_continuity/2026/05/24/01_011_young_tableaux_runtime_review_release_and_synthetic_blow_revisions.md
Common local checks:

    .venv/bin/python -m pytest tests
    .venv/bin/python -m ruff check .
    .venv/bin/python -m mypy src

The project expects:
- typed Python
- tests for new runtime behavior
- alignment with the authoritative design documents
- care with the package’s mathematical vocabulary
Evaluation and benchmarking guidance: EVALUATION.md
This project is released under the MIT License.