
restructure codebase #117

Merged
clemsgrs merged 11 commits into main from codex/package-cleanup-runtime-split
Apr 18, 2026


@clemsgrs clemsgrs commented Apr 18, 2026

Summary

This PR continues the package-cleanup effort by splitting slide2vec.inference into workflow-scoped runtime modules while preserving behavior and current public API usage.

Major steps shipped

  1. Refactor inference into runtime workflow modules
  • Added slide2vec.runtime package with:
    • batching.py
    • hierarchical.py
    • progress_bridge.py
    • serialization.py
    • types.py
  • Rewired slide2vec.inference to consume these helpers.
  2. Extract distributed inference helpers into runtime module
  • Added slide2vec.runtime.distributed for:
    • torchrun orchestration/log capture/progress streaming
    • distributed coordination-dir lifecycle
    • rank assignment
    • shard merge/load helpers
  • Kept compatibility shims in slide2vec.inference for existing monkeypatch/test patterns.
  3. Extract pipeline artifact persistence from inference
  • Added slide2vec.runtime.persistence for:
    • artifact discovery/loading
    • process-list status updates after embedding
  • Reduced orchestration clutter in slide2vec.inference.
  4. Extract tiling/embedding helper domains + architecture boundaries
  • Added slide2vec.runtime.tiling for pure tiling config/backend/archive/coordinate helpers.
  • Added slide2vec.runtime.embedding for metadata + artifact writer helpers.
  • Added a runtime import-boundary guardrail that prevents runtime modules from depending on the CLI or the package facade.
  5. Declutter root package module surface
  • Moved internal root modules to clearer homes:
    • slide2vec/runtime_types.py → slide2vec/runtime/types.py (LoadedModel)
    • slide2vec/model_settings.py → slide2vec/runtime/model_settings.py
    • slide2vec/registry.py → slide2vec/runtime/registry.py
    • slide2vec/resources.py → slide2vec/configs/resources.py
  • Updated internal and test imports accordingly.
  • Added a guardrail test that enforces a curated, minimal set of root-level package modules.
  6. Trim secondary tests to keep suite focused
  • Removed low-signal/secondary tests:
    • tests/test_docs.py
    • tests/test_packaging_metadata.py
    • tests/test_batch_collator_timing.py
    • tests/test_output_consistency.py
  • Kept core regression, model, progress, registry, tile-store, and architecture suites.
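
A note on the compatibility shims kept in slide2vec.inference for existing monkeypatch/test patterns: the sketch below shows one way such a shim can work. It is not the code from this PR. The module objects, the `_assign_rank_slides` alias, and the round-robin logic are all hypothetical stand-ins. The only point is that the facade keeps the old name and resolves it at call time, so tests that patch the old location still take effect.

```python
import types

# Hypothetical stand-in for the extracted module (e.g. slide2vec.runtime.distributed).
runtime_distributed = types.ModuleType("demo_runtime_distributed")


def assign_rank_slides(slides, world_size):
    """Round-robin slides across ranks (illustrative logic, not the real helper)."""
    return {rank: slides[rank::world_size] for rank in range(world_size)}


runtime_distributed.assign_rank_slides = assign_rank_slides

# Hypothetical stand-in for the facade (e.g. slide2vec.inference).
inference = types.ModuleType("demo_inference")

# Compatibility shim: keep the old private name pointing at the new home.
inference._assign_rank_slides = runtime_distributed.assign_rank_slides


def run_distributed(slides, world_size):
    # Resolve the helper through the facade at call time, so a test that
    # replaces inference._assign_rank_slides still changes behavior here.
    return inference._assign_rank_slides(slides, world_size)


inference.run_distributed = run_distributed
```

If the facade had called `runtime_distributed.assign_rank_slides` directly, a test patching the old name on the facade would silently stop having any effect, which is the failure mode a shim like this is meant to avoid.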

Guardrails + docs

  • Added/extended architecture guardrail test: tests/test_architecture_runtime_split.py
  • Updated docs log in docs/documentation.md.
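
For readers unfamiliar with import-boundary guardrails, here is a minimal sketch of the idea using the standard-library `ast` module. It does not reproduce tests/test_architecture_runtime_split.py. The forbidden prefixes (`slide2vec.cli`, `slide2vec.api`) and the function names are assumptions chosen for illustration.

```python
import ast

# Assumed forbidden prefixes; the real guardrail's list is not shown in this PR.
FORBIDDEN_PREFIXES = ("slide2vec.cli", "slide2vec.api")


def imported_modules(source: str) -> set[str]:
    """Collect absolute module names referenced by import statements in source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module)
            # Also record "pkg.name" so `from pkg import name` is caught
            # when `name` is itself a forbidden submodule.
            names.update(f"{node.module}.{alias.name}" for alias in node.names)
    return names


def boundary_violations(source: str, forbidden=FORBIDDEN_PREFIXES) -> list[str]:
    """Return the sorted imports in source that fall under a forbidden prefix."""
    return sorted(
        name
        for name in imported_modules(source)
        if any(name == prefix or name.startswith(prefix + ".") for prefix in forbidden)
    )
```

A test built on this would read each file under the runtime package and assert that `boundary_violations` returns an empty list. Relative imports (`level > 0`) are ignored in this sketch and would need resolving against the file's package to be checked.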

Outcomes

  • slide2vec/inference.py reduced from ~3753 lines to ~981 lines.
  • Root package Python files are now curated to a smaller, intentional set.
  • Runtime concerns are split into focused modules by domain.
  • Test suite is sharper and more focused on core behaviors.
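
The "curated, intentional set" of root-level modules can be enforced with an allowlist check along these lines. This is a sketch under stated assumptions: the PR does not enumerate the curated set, so `ALLOWED_ROOT_MODULES` is hypothetical, and the `demo` function builds a throwaway package in a temporary directory rather than inspecting slide2vec itself.

```python
import tempfile
from pathlib import Path

# Hypothetical allowlist; the actual curated set is not listed in this PR.
ALLOWED_ROOT_MODULES = {"__init__.py", "api.py", "cli.py", "inference.py"}


def unexpected_root_modules(package_dir: Path, allowed=ALLOWED_ROOT_MODULES) -> list[str]:
    """Return top-level .py files in package_dir that are not in the allowlist."""
    return sorted(p.name for p in package_dir.glob("*.py") if p.name not in allowed)


def demo() -> list[str]:
    """Build a throwaway package layout and report its unexpected root modules."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "pkg"
        (pkg / "runtime").mkdir(parents=True)
        for name in ("__init__.py", "inference.py", "registry.py"):
            (pkg / name).write_text("")
        # Files inside subpackages are not matched by the non-recursive glob.
        (pkg / "runtime" / "types.py").write_text("")
        return unexpected_root_modules(pkg)
```

In a real test the assertion would be that the returned list is empty, so adding a new root-level module requires a deliberate change to the allowlist.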

@clemsgrs clemsgrs marked this pull request as ready for review April 18, 2026 19:50
@clemsgrs clemsgrs changed the title from "Refactor inference into workflow-scoped runtime modules" to "restructure codebase" Apr 18, 2026
@clemsgrs clemsgrs merged commit 32707b2 into main Apr 18, 2026
2 of 3 checks passed
@clemsgrs clemsgrs deleted the codex/package-cleanup-runtime-split branch April 18, 2026 20:43