Phase 1: provider abstraction + scalable.yaml manifest foundation #20
Merged
Creates the additive Phase 1 package structure off of version/2.0.0: manifest/, providers/, session/, planning/, cli/. Each new package ships with a docstring describing its Phase 1 role and its hooks for later phases (telemetry, AI assistants, Kubernetes/cloud providers, ML advisor).

scalable/manifest/schema.py defines the frozen v1 schema dataclasses (ManifestModel, ProjectConfig, TargetConfig, ComponentConfig, TaskConfig) and SCHEMA_VERSION = 1. The schema is intentionally implemented with stdlib dataclasses so manifest validation works without the optional [ai] extra (resolves Phase 1 plan section 9 open question #1).

scalable/manifest/errors.py declares the ManifestError hierarchy used by the parser, validator, and Phase 4 AI migration assistant.

scalable/cli/main.py is a Phase 1 stub for the [project.scripts] entry point; the real validate / plan --dry-run wiring lands in WU-10.

pyproject.toml: version bumped to 2.0.0a1, pyyaml pinned explicitly, empty placeholder extras for ai/cloud/kubernetes registered so pip install scalable[ai] resolves cleanly from day one, scalable console script registered, packages.find used so the new sub-packages are picked up by setuptools.

Verified: existing 73 unit tests pass unchanged; ruff clean on all new modules. No public API removed or renamed. Refs plans/v2.0.0_phase1_plan.md WU-1.
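As a rough illustration of what a stdlib-only schema can look like, the sketch below uses `dataclasses` with `frozen=True` and a version check. The class names `ManifestModel` and `TargetConfig` and the constant `SCHEMA_VERSION` come from the commit message; every field name, the `ValueError`, and the assumption that "frozen" also means `frozen=True` are hypothetical and may not match `scalable/manifest/schema.py`.

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, dataclass, field

SCHEMA_VERSION = 1  # value stated in the commit message


@dataclass(frozen=True)
class TargetConfig:
    # Hypothetical fields, chosen for illustration only.
    provider: str
    options: dict = field(default_factory=dict)


@dataclass(frozen=True)
class ManifestModel:
    # Hypothetical fields; the real schema also covers projects, components, tasks.
    version: int
    targets: dict[str, TargetConfig]

    def __post_init__(self) -> None:
        # Reject manifests written for a different schema version.
        # (The real code presumably raises a ManifestError subclass instead.)
        if self.version != SCHEMA_VERSION:
            raise ValueError(f"unsupported schema version: {self.version}")


manifest = ManifestModel(
    version=1,
    targets={"local": TargetConfig(provider="local")},
)
```

Because only the standard library is imported, a module written this way can be loaded and validated without any optional extra installed, which is the property the commit message calls out.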
This was referenced May 19, 2026
Phase 1: Provider Abstraction + `scalable.yaml` Manifest Foundation

Summary

This PR delivers Phase 1 of the v2.0.0 roadmap defined in `plans/v2.0.0_phase1_plan.md`: a provider-neutral execution seam, a declarative manifest layer, a deterministic dry-run planner, the public `ScalableSession` API, and the `scalable validate` / `scalable plan --dry-run` CLI commands. All legacy v1.1.0 imperative APIs (`SlurmCluster`, `add_container`, `add_workers`, `ScalableClient`) remain functional. Phase 1 is strictly additive plus one targeted deprecation warning.

Scope (delivered work units)
- `pyproject.toml` bump to `2.0.0a1`, console script registration, optional extras placeholders
- `${VAR}` / `${VAR:-default}` expansion + version check
- `ValidationReport` / `ValidationIssue`
- `DeploymentProvider` protocol, `DeploymentSpec`, `ScalePlan`, `ResourceRequest`, `ClusterHandle`, registry with entry-point discovery
- `LocalProvider` over Dask `LocalCluster` + unit + integration coverage
- `SlurmProvider` translation layer over existing `SlurmCluster` + mocked unit tests
- `ModelConfig` deprecation warning gated by adapter context
- `ScalableSession` lifecycle API (`from_yaml`, `validate`, `plan`, `start`, `close`, context manager, reserved Phase 4 kwargs)
- `compute_manifest_lock()`
- `scalable validate` and `scalable plan --dry-run` CLI commands + Phase 1 reserved-namespace stubs
- `scalable/__init__.py`
- `manifest.rst`, `providers.rst`, `README` v2 quickstart, `CHANGELOG.md`
- `validate-example-manifests` job, macOS matrix, version-branch triggers

Architectural changes
New package layout
No existing modules are removed. Existing v1.1.0 imports continue to work unchanged.
Public API surface (additive)
`scalable/__init__.py` now also re-exports:

- `ScalableSession`
- `DeploymentProvider`
- `LocalProvider`
- `SlurmProvider`

Legacy exports (`SlurmCluster`, `JobQueueCluster`, `ScalableClient`, `cacheable`, `SEED`, `settings`, etc.) are preserved.

Schema and provider contracts (frozen for Phase 1)
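To make the provider contract and registry named in this section more concrete, here is a minimal sketch using `typing.Protocol` and a dictionary-backed registry. Only the names `DeploymentProvider` and `DeploymentSpec` come from this PR; the fields, the `describe` method, `register_provider`, `get_provider`, and `EchoProvider` are hypothetical and are not claimed to match `scalable/providers/base.py` or `scalable/providers/registry.py`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Protocol


@dataclass(frozen=True)
class DeploymentSpec:
    # Hypothetical fields; the real dataclass may differ.
    target: str
    workers: int = 1


class DeploymentProvider(Protocol):
    # Hypothetical method; the PR text does not list the protocol's methods.
    def describe(self, spec: DeploymentSpec) -> str: ...


# Provider name -> zero-argument factory, so construction stays lazy.
_REGISTRY: Dict[str, Callable[[], DeploymentProvider]] = {}


def register_provider(name: str, factory: Callable[[], DeploymentProvider]) -> None:
    """Runtime registration of a provider factory under a name."""
    _REGISTRY[name] = factory


def get_provider(name: str) -> DeploymentProvider:
    """Instantiate the provider registered under `name`."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise KeyError(f"unknown provider: {name!r}") from None


class EchoProvider:
    """Toy provider used only to exercise the registry."""

    def describe(self, spec: DeploymentSpec) -> str:
        return f"{spec.target}: {spec.workers} worker(s)"


register_provider("echo", EchoProvider)
summary = get_provider("echo").describe(DeploymentSpec(target="local", workers=2))
```

`EchoProvider` satisfies the protocol structurally, without inheriting from it. Entry-point discovery could plausibly populate the same dictionary from installed packages (for example through `importlib.metadata.entry_points` with the `scalable.providers` group), but that wiring is an assumption and is left out of the sketch.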
- `scalable.yaml` schema `version: 1` enforced by `scalable/manifest/parser.py`.
- `scalable/providers/base.py`: `DeploymentSpec`, `ResourceRequest`, `ScalePlan`, `ClusterHandle`, and the `DeploymentProvider` protocol.
- `scalable/providers/registry.py` supports runtime registration and entry-point discovery under group `scalable.providers` for future Phase 3 third-party providers.

Session API (Phase 1 minimal form)
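The reserved-keyword pattern used by the session API can be sketched with a stand-in class. `SessionSketch` is hypothetical and is not `ScalableSession`; the manifest shape and the choice to raise only when a reserved keyword is actually supplied are assumptions made for illustration.

```python
from typing import Any, Optional


class SessionSketch:
    """Hypothetical stand-in for ScalableSession; not the library's class."""

    def __init__(self, manifest: dict) -> None:
        self.manifest = manifest

    def plan(
        self,
        target: str,
        *,
        objective: Optional[Any] = None,
        policy: Optional[Any] = None,
    ) -> dict:
        # The keywords exist in the signature so the API shape is fixed,
        # but supplying either one is rejected for now.
        if objective is not None or policy is not None:
            raise NotImplementedError(
                "objective= and policy= are reserved for a later phase"
            )
        # Deterministic: the result depends only on the manifest and target.
        return {"target": target, "options": self.manifest["targets"][target]}


session = SessionSketch({"targets": {"local": {"n_workers": 1}}})
first = session.plan("local")
second = session.plan("local")
```

Keeping the keywords in the signature means callers written against the future form fail loudly today instead of silently ignoring the argument.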
`ScalableSession.plan(...)` is purely deterministic in Phase 1. The `objective=` and `policy=` keyword arguments named in the v2 north-star are reserved on the public surface and currently raise `NotImplementedError`, locking the API shape for the Phase 4 AI planner without committing behavior.

CLI
- `scalable validate <manifest>`: exits 0 on a valid manifest and non-zero otherwise; emits a structured JSON report.
- `scalable plan <manifest> --target <name> --dry-run --output plan.json`: writes `plan.json` and `manifest.lock` and prints the plan to stdout.
- Reserved commands (`run`, `diagnose`, `explain`, `init-component`, `compose`, `report`) are registered as namespace stubs that exit 2 with a phase-pointer message, so the UX namespace cannot be hijacked by third-party packages.

manifest_lock

SHA-256 over canonicalized JSON of the post-env-expansion manifest (sorted keys, UTF-8, compact separators). Designed to be stable so Phase 2 telemetry and Phase 4 AI assistants can durably reference manifests.
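The fingerprint recipe above (sorted keys, UTF-8, compact separators, SHA-256) can be sketched with the standard library alone. `sketch_manifest_lock` is a hypothetical name, not the PR's `compute_manifest_lock()`, and the `ensure_ascii=False` setting is an assumption about how non-ASCII text is canonicalized.

```python
import hashlib
import json


def sketch_manifest_lock(manifest: dict) -> str:
    """Return a hex SHA-256 digest of a canonical JSON rendering of `manifest`."""
    canonical = json.dumps(
        manifest,
        sort_keys=True,         # key order must not affect the digest
        separators=(",", ":"),  # compact form, no incidental whitespace
        ensure_ascii=False,     # assumption: hash the raw UTF-8 text
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = sketch_manifest_lock({"version": 1, "targets": {"local": {"n_workers": 1}}})
b = sketch_manifest_lock({"targets": {"local": {"n_workers": 1}}, "version": 1})
c = sketch_manifest_lock({"version": 1, "targets": {"local": {"n_workers": 2}}})
```

Here `a` and `b` are equal because only the key order differs, while `c` differs because a value changed. That is the stability property a durable reference needs.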
Deprecations
`ModelConfig.__init__` emits `DeprecationWarning` when invoked outside the manifest adapter context. This is the path slated for replacement by the manifest. Behavior is unchanged; the warning only surfaces when the legacy auto-discovery is used directly. Suppression is provided by `model_config_adapter_context()` for adapter-internal callers.

No other public API is deprecated in this phase.
Configuration surface
- `version = "2.0.0a1"` in `pyproject.toml`.
- `[project.scripts] scalable = "scalable.cli.main:main"`.
- Empty placeholder extras: `ai`, `cloud`, `kubernetes`.
- Added `pyyaml >= 6.0` to core dependencies.
- Environment variables mapped in `scalable/common.py`:
  - `SCALABLE_MANIFEST` → `Settings.manifest_path` (default `./scalable.yaml`)
  - `SCALABLE_TARGET` → `Settings.target`

Documentation
- `docs/manifest.rst`: schema v1 reference, validation/plan commands, env vars, migration note, links to examples.
- `docs/providers.rst`: provider contract, built-in providers, registry/discovery hook.
- `docs/index.rst`: both pages added under the API section.
- `docs/getting_started.rst`: cross-link to manifest/providers.
- `README.md`: manifest-first quickstart with v2 session example.
- `CHANGELOG.md`: `2.0.0a1` entry under Keep-a-Changelog conventions.
- Examples: `docs/examples/scalable.minimal.yaml`, `docs/examples/scalable.gcam_stitches.yaml`.

CI updates
- `.github/workflows/tests.yml`: `push` and `pull_request` triggers now include `version/**`.
- `validate-example-manifests` job runs `scalable validate` and `scalable plan --dry-run` against the docs examples to lock the documented manifest grammar against drift.
- `lint` (ruff + mypy) job retained; no rule changes.

Test coverage
Full unit suite is green locally (`156 passed`). `ruff check scalable tests` is also clean.

New test modules added in this phase:
- `tests/unit/test_manifest_parser.py`: env expansion, schema rejections, version check
- `tests/unit/test_manifest_validate.py`: cross-field rules + multi-error accumulation
- `tests/unit/test_providers_base_registry.py`: registry, lazy builtin lookup
- `tests/unit/test_providers_local.py` + `tests/integration/test_local_provider_end_to_end.py`
- `tests/unit/test_providers_slurm.py`: mocked Slurm translation
- `tests/unit/test_manifest_adapter.py`
- `tests/unit/test_modelconfig_deprecation.py`
- `tests/unit/test_planning_dryrun.py`: deterministic `manifest_lock`
- `tests/unit/test_session.py`
- `tests/unit/test_cli_validate.py`
- `tests/unit/test_cli_plan.py`
- `tests/unit/test_public_api_exports.py`

CLI smoke checks performed manually against the new docs examples.
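For readers unfamiliar with the `${VAR}` / `${VAR:-default}` grammar that the parser tests cover, the sketch below shows one way such expansion could work. It is not the parser's code: `expand_env` is a hypothetical helper, the shell-style rule that an empty value also triggers the default is an assumption, and so is raising `KeyError` for an undefined variable with no default.

```python
import re

# ${NAME} or ${NAME:-default}; the default may be empty but cannot contain "}".
_PATTERN = re.compile(
    r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?::-(?P<default>[^}]*))?\}"
)


def expand_env(text: str, env: dict) -> str:
    """Expand ${VAR} and ${VAR:-default} references in `text` using `env`."""

    def _substitute(match: "re.Match[str]") -> str:
        name = match.group("name")
        default = match.group("default")  # None when no ":-" part is present
        value = env.get(name)
        if default is not None:
            # Shell-style: fall back when the variable is unset or empty.
            return value if value else default
        if value is None:
            raise KeyError(f"undefined variable: {name}")
        return value

    return _PATTERN.sub(_substitute, text)


expanded = expand_env("${ROOT}/logs/${LEVEL:-info}", {"ROOT": "/srv/app"})
```

Passing the environment as an explicit dictionary, instead of reading `os.environ` inside the function, keeps the behavior easy to unit test.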
Backward compatibility
- The v1.1.0 imperative path (`SlurmCluster(...)` → `add_container(...)` → `add_workers(...)` → `ScalableClient(cluster)`) is unchanged and remains tested.
- `ModelConfig` Dockerfile auto-discovery now warns but still functions.
- `version = "2.0.0a1"` is alpha-tagged so downstreams pinning `<2.0.0` are unaffected.

Phase 1 success criteria checklist (from `plans/v2.0.0_phase1_plan.md`)

- `from scalable import ScalableSession` works; `ScalableSession.from_yaml(..., target="local")` produces a working `LocalCluster` + `ScalableClient` capable of tagged `submit(...)`.
- `ScalableSession.from_yaml(..., target="slurm")` configures `SlurmCluster` from manifest with no functional regression vs. v1.1.0 imperative path.
- `ModelConfig` Dockerfile path emits `DeprecationWarning`.
- `scalable validate` exits 0 on valid manifest, non-zero on invalid, with structured error report.
- `scalable plan --dry-run --target <name>` writes `plan.json` + `manifest.lock` without instantiating a scheduler.
- `LocalProvider` has integration coverage; existing tests remain green.
- `CHANGELOG.md` entry, README + docs updates, migration note for `ModelConfig` users.

Groundwork explicitly enabling later phases
- `DeploymentProvider` protocol with provider-neutral `DeploymentSpec` / `ScalePlan`
- `entry_points` discovery
- `manifest_lock` SHA-256 fingerprint
- `targets[*].options: dict[str, Any]` passthrough + unknown-key warnings
- `ScalableSession.plan(objective=, policy=)` reserved kwargs raising `NotImplementedError`
- Reserved CLI namespace stubs (`run`, `diagnose`, `explain`, `init-component`, `compose`, `report`)
- Optional extras (`ai`, `cloud`, `kubernetes`) declared empty
- `Settings.manifest_path` / `Settings.target` + env vars

Risk and rollback
- Rollback: `git revert` of the squash/merge commit on `version/2.0.0`, without affecting `develop` / `master`.
- `processes=False, n_workers=1` is used to keep CI fast and macOS-stable.

How to review
- `plans/v2.0.0_phase1_plan.md` for the architectural intent.
- `scalable/providers/base.py`: the contracts here are frozen for Phase 1.
- `scalable/manifest/schema.py` and `scalable/manifest/parser.py`.
- `ScalableSession` in `scalable/session/session.py`: note the auto-target heuristic and reserved kwargs.
- `scalable/cli/main.py`, `scalable/cli/cmd_validate.py`, and `scalable/cli/cmd_plan.py`.
- `scalable/__init__.py` re-exports.

Phase 1 architecture
flowchart LR
  subgraph User
    U[scalable.yaml]
    CLI[scalable CLI]
    PY[ScalableSession.from_yaml]
  end
  subgraph Manifest_Layer
    P[parser]
    V[validate]
    S[schema v1]
    A[adapter]
    L[manifest_lock]
  end
  subgraph Provider_Layer
    B[DeploymentProvider Protocol]
    R[registry]
    LP[LocalProvider]
    SP[SlurmProvider]
  end
  subgraph Existing_v1_1_0
    JC[JobQueueCluster]
    SC[SlurmCluster]
    CC[ScalableClient]
  end
  U --> P --> V --> S
  P --> L
  CLI --> P
  CLI --> V
  CLI --> A
  PY --> P
  PY --> A
  A --> B
  B --> R
  R --> LP
  R --> SP
  LP --> CC
  SP --> SC --> JC

Merge target: `version/2.0.0`
Source branch: `version/2.0.0-phase1-provider-manifest`
Tracking PR: #20