
Version/2.0.0 #26

Merged: crvernon merged 44 commits into develop from version/2.0.0 on May 20, 2026


@crvernon (Member)

This pull request introduces significant new features and improvements across the codebase, focusing on advanced ML optimization, model emulation, expanded AI assistant capabilities, and enhanced CI coverage. It also updates documentation, environment configuration, and the test matrix to support new functionality and platforms.

Major feature additions and improvements:

Machine Learning Optimization and Emulation:

  • Introduces a comprehensive ML optimization subsystem (scalable.ml) with learned resource prediction, adaptive scaling, feature extraction, model wrappers, distributed hyperparameter search, and model quality assessment. Adds a model emulation subsystem (scalable.emulation) for surrogate modeling, uncertainty-aware dispatch, and active learning strategies. Public APIs and CLI commands are provided for ML-backed recommendations and emulation. ([CHANGELOG.mdR8-R285])

AI Assistant and Cloud/Kubernetes Support:

  • Expands the AI assistant subsystem with pluggable LLM backends, onboarding, diagnosis, plan explanation, workflow composition, and migration assistants. Adds support for Kubernetes, AWS, and GCP providers, cloud cost estimation, artifact storage, and manifest overlays. CLI and API are extended for these new capabilities. ([CHANGELOG.mdR8-R285])
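Cloud cost estimation from static tables boils down to hourly rate times worker count times expected runtime. The sketch below assumes a hypothetical rate table keyed by (provider, instance size). The rates are placeholders, not real cloud prices. `RunCostEstimate` and `estimate_run_cost` are illustrative names and are not the `scalable` cost API.

```python
from dataclasses import dataclass

# Placeholder USD-per-hour rates keyed by (provider, instance size).
# These values are made up for illustration; they are NOT real prices.
HOURLY_RATES_USD: dict[tuple[str, str], float] = {
    ("aws", "small"): 0.10,
    ("aws", "large"): 0.40,
    ("gcp", "small"): 0.09,
}


@dataclass(frozen=True)
class RunCostEstimate:
    provider: str
    instance_type: str
    workers: int
    hours: float
    usd: float


def estimate_run_cost(
    provider: str, instance_type: str, workers: int, hours: float
) -> RunCostEstimate:
    """Estimate cost as rate * workers * hours from the static table (hypothetical)."""
    try:
        rate = HOURLY_RATES_USD[(provider, instance_type)]
    except KeyError:
        raise ValueError(f"no rate for {provider}/{instance_type}") from None
    if workers < 0 or hours < 0:
        raise ValueError("workers and hours must be non-negative")
    usd = round(rate * workers * hours, 2)  # cents precision is enough for a preview
    return RunCostEstimate(provider, instance_type, workers, hours, usd)
```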

Testing and CI Enhancements:

  • Updates the test matrix to include macOS and Python 3.13, and adds a dedicated job for validating and planning example manifests in CI. ([1], [2], [3])

Documentation and Configuration:

  • Updates CHANGELOG.md with detailed release notes covering all new features, breaking changes, and tests. Updates the README to reflect new capabilities, optional dependency groups, and system requirements. Adds a .env.example for OpenAI configuration. ([1], [2], [3], [4], [5])

Notable grouped changes:

1. ML Optimization & Emulation

  • Adds scalable.ml and scalable.emulation subsystems: learned resource prediction, adaptive scaling, surrogate modeling, uncertainty calibration, and CLI integration. ([CHANGELOG.mdR8-R285])
  • New optional dependency group [ml] for ML features. ([CHANGELOG.mdR8-R285])
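In its simplest form, adaptive scaling sizes the worker pool from the backlog: enough workers to drain pending tasks within a target time, clamped to configured bounds. The function below is a generic heuristic under those assumptions. `target_workers` and its parameters are hypothetical, and the PR's `AdaptiveScaler` may instead rely on a learned model.

```python
import math


def target_workers(
    pending_tasks: int,
    tasks_per_worker_per_min: float,
    target_minutes: float,
    min_workers: int = 1,
    max_workers: int = 64,
) -> int:
    """Return a worker count that drains the backlog in about `target_minutes`.

    Hypothetical heuristic: ceil(backlog / per-worker capacity), clamped to
    [min_workers, max_workers].
    """
    if tasks_per_worker_per_min <= 0 or target_minutes <= 0:
        raise ValueError("throughput and target time must be positive")
    capacity_per_worker = tasks_per_worker_per_min * target_minutes
    needed = math.ceil(pending_tasks / capacity_per_worker)
    return max(min_workers, min(max_workers, needed))
```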

2. AI Assistants & Cloud/Kubernetes

  • Implements AI assistant commands (init-component, diagnose, explain, compose, migrate) with LLM backend support and heuristic fallback. ([CHANGELOG.mdR8-R285])
  • Adds Kubernetes, AWS, and GCP provider support, cloud cost estimation, artifact storage, overlays, and related CLI/API changes. ([CHANGELOG.mdR8-R285])
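A heuristic fallback for `diagnose` can be pictured as a table of known failure signatures mapped to suggestions, consulted without any LLM. The patterns and advice below are illustrative assumptions, not the rules shipped in `scalable.ai`. (`KilledWorker` is the error Dask's distributed scheduler raises when a task is implicated in repeated worker deaths.)

```python
import re

# Hypothetical signature table: (regex over the log text, suggested remedy).
SIGNATURES: list[tuple[re.Pattern[str], str]] = [
    (re.compile(r"MemoryError|Cannot allocate memory"), "Increase the memory request."),
    (re.compile(r"KilledWorker"), "Check worker memory limits and worker logs."),
    (re.compile(r"No such file or directory"), "Verify input paths and volume mounts."),
]


def diagnose(log_text: str) -> list[str]:
    """Return every suggestion whose pattern appears in the log, in table order."""
    return [advice for pattern, advice in SIGNATURES if pattern.search(log_text)]
```

Because the table is deterministic, the same log always yields the same advice, which makes it a natural baseline that an optional LLM backend can refine.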

3. CI and Testing

  • Expands the CI test matrix to include macOS and Python 3.13; adds validation and dry-run planning for example manifests in CI. ([1], [2], [3])
  • 431 passing unit tests as of the final commit, including extensive new coverage for the ML, emulation, and AI features. ([CHANGELOG.mdR8-R285])

4. Documentation & Config

  • Major updates to CHANGELOG.md and README.md for new features, usage, and requirements. ([1], [2], [3], [4])
  • Adds .env.example for OpenAI credentials and model configuration. ([.env.exampleR1-R13])
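A `.env.example` file is typically copied to `.env` and read as `KEY=VALUE` lines. The sketch below parses that format with the standard library only. The variable names used in the example (`OPENAI_API_KEY`, `OPENAI_MODEL`) are assumptions, because the PR does not list the file's contents. Libraries such as python-dotenv handle more edge cases (an `export` prefix, variable expansion).

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments (minimal .env reader)."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed line: {raw!r}")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


sample = '# OpenAI settings (hypothetical keys)\nOPENAI_API_KEY=sk-placeholder\nOPENAI_MODEL="example-model"\n'
settings = parse_env(sample)
```

Keep the filled-in `.env` out of version control; only the example file with placeholder values should be committed.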

5. Maintenance & Versioning

  • Updates versioning, export lists, and documentation links for new modules and features. ([1], [2])

Together, these changes bring Phases 1 through 5 of the v2.0.0 roadmap into develop, adding ML, emulation, AI, and cloud-native capabilities backed by 431 passing unit tests and updated documentation.

sash19 and others added 30 commits February 17, 2025 14:25
Creates the additive Phase 1 package structure off of version/2.0.0:
manifest/, providers/, session/, planning/, cli/. Each new package ships
with a docstring describing its Phase 1 role and its hooks for later
phases (telemetry, AI assistants, Kubernetes/cloud providers, ML
advisor).

scalable/manifest/schema.py defines the frozen v1 schema dataclasses
(ManifestModel, ProjectConfig, TargetConfig, ComponentConfig, TaskConfig)
and SCHEMA_VERSION = 1. The schema is intentionally implemented with
stdlib dataclasses so manifest validation works without the optional
[ai] extra (resolves Phase 1 plan section 9 open question #1).
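A frozen, stdlib-only schema along these lines might look like the sketch below. Only the class names and `SCHEMA_VERSION = 1` come from the commit message. Every field, the `from_dict` helper, and the version check are hypothetical. `ComponentConfig` and `TaskConfig` would follow the same pattern and are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Any

SCHEMA_VERSION = 1


@dataclass(frozen=True)
class ProjectConfig:
    name: str  # hypothetical field


@dataclass(frozen=True)
class TargetConfig:
    name: str
    provider: str  # illustrative values: "local", "slurm", "kubernetes"


@dataclass(frozen=True)
class ManifestModel:
    schema_version: int
    project: ProjectConfig
    targets: tuple[TargetConfig, ...] = ()  # tuples keep the frozen model immutable

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "ManifestModel":
        """Build a model from parsed YAML, rejecting unknown schema versions."""
        version = data.get("schema_version")
        if version != SCHEMA_VERSION:
            raise ValueError(f"unsupported schema_version: {version!r}")
        project = ProjectConfig(name=data["project"]["name"])
        targets = tuple(TargetConfig(**t) for t in data.get("targets", []))
        return cls(schema_version=version, project=project, targets=targets)
```

Using `frozen=True` means a validated manifest cannot be mutated after parsing, and the only runtime dependency is the standard library.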

scalable/manifest/errors.py declares the ManifestError hierarchy used by
the parser, validator, and Phase 4 AI migration assistant.
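A shared base exception lets callers of the parser, validator, and migration assistant handle every manifest problem with a single `except ManifestError`. Only `ManifestError` is named in the commit. The subclasses, the `path` attribute, and `check_schema_version` below are hypothetical.

```python
class ManifestError(Exception):
    """Base class for all manifest problems."""


class ManifestParseError(ManifestError):
    """Raised when manifest text cannot be parsed (hypothetical subclass)."""


class ManifestValidationError(ManifestError):
    """Raised when a parsed manifest violates the schema (hypothetical subclass)."""

    def __init__(self, path: str, message: str) -> None:
        super().__init__(f"{path}: {message}")
        self.path = path  # dotted location of the offending field


def check_schema_version(data: dict) -> None:
    """Hypothetical validator step that raises a subclass of ManifestError."""
    if data.get("schema_version") != 1:
        raise ManifestValidationError("schema_version", "expected 1")
```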

scalable/cli/main.py is a Phase 1 stub for the [project.scripts] entry
point; the real validate / plan --dry-run wiring lands in WU-10.
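A console-script entry point of this kind is usually a `main(argv)` function that dispatches subcommands and returns an exit code. The argparse sketch below wires hypothetical `validate` and `plan --dry-run` verbs. The positional `manifest` argument, its `scalable.yaml` default, and the return codes are assumptions, and the real `scalable` CLI may differ.

```python
import argparse
from typing import Optional, Sequence


def main(argv: Optional[Sequence[str]] = None) -> int:
    parser = argparse.ArgumentParser(prog="scalable")
    sub = parser.add_subparsers(dest="command", required=True)

    validate = sub.add_parser("validate", help="check a manifest against the schema")
    validate.add_argument("manifest", nargs="?", default="scalable.yaml")

    plan = sub.add_parser("plan", help="show what would run")
    plan.add_argument("manifest", nargs="?", default="scalable.yaml")
    plan.add_argument("--dry-run", action="store_true")

    args = parser.parse_args(argv)
    if args.command == "validate":
        print(f"validate {args.manifest}")  # placeholder for real validation
        return 0
    # Only the "plan" command reaches this point.
    if not args.dry_run:
        print("only --dry-run planning is available in this stub")
        return 2
    print(f"plan (dry run) {args.manifest}")
    return 0
```

A `[project.scripts]` entry pointing at a function like this lets the generated `scalable` launcher pass its return value to `sys.exit`.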

pyproject.toml: version bumped to 2.0.0a1, pyyaml pinned explicitly,
empty placeholder extras for ai/cloud/kubernetes registered so
pip install scalable[ai] resolves cleanly from day one, scalable
console script registered, packages.find used so the new sub-packages
are picked up by setuptools.

Verified: existing 73 unit tests pass unchanged; ruff clean on all new
modules. No public API removed or renamed.

Refs plans/v2.0.0_phase1_plan.md WU-1.
Phase 1: provider abstraction + scalable.yaml manifest foundation
…sing

phase 2 progress towards telemetry and deterministic advising
Implements Phase 3 of the v2.0.0 roadmap:

- KubernetesProvider over Dask Kubernetes Operator
- AWSBatchProvider over dask-cloudprovider (Fargate/EC2)
- GCPProvider scaffold (validation only; build_cluster deferred)
- ArtifactStore protocol with local and fsspec backends
- RemoteCacheBackend for opt-in remote cache (SCALABLE_CACHE_REMOTE)
- Manifest overlays (overlays: block + targets[*].overlay)
- CostEstimate primitives and static cost tables
- scalable run CLI verb
- Settings: cache_remote_uri, default_storage, runs_dir_remote
- Telemetry: CostEvent, cost.jsonl stream, cost in report
- Provider protocol: optional estimate_cost() method
- Public API: Phase 3 exports with optional-dep guards
- Docs: cloud.rst, kubernetes.rst, artifacts.rst, overlays.rst, cost.rst
- Example manifests: gke, aws, overlays
- 238 unit tests passing, ruff clean
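Manifest overlays are often applied as a recursive merge of a named overlay onto the base manifest, with overlay values winning on conflicts. The sketch below assumes those semantics: nested mappings merge, while everything else (including lists) is replaced. The PR does not say how `scalable` resolves conflicts, so `apply_overlay` and its rules are illustrative.

```python
import copy
from typing import Any


def apply_overlay(base: dict[str, Any], overlay: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict with `overlay` merged into `base` recursively.

    Nested dicts are merged key by key; any other value (scalars, lists)
    from the overlay replaces the base value outright. Inputs are not mutated.
    """
    merged = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_overlay(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


base = {"resources": {"cpus": 2, "memory": "4GiB"}, "image": "base:1"}
large = {"resources": {"memory": "16GiB"}}
merged = apply_overlay(base, large)
```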

Version bumped to 2.0.0a3.
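The optional-dependency guards on the Phase 3 exports are commonly implemented by checking whether a package is installed before importing code that needs it. Below is a minimal sketch using `importlib.util.find_spec`. The `require` helper and its install hint are hypothetical. The extra names it might be called with (`cloud`, `kubernetes`) are the ones registered in the Phase 1 commit above.

```python
import importlib
import importlib.util
from types import ModuleType


def require(module: str, extra: str) -> ModuleType:
    """Import a top-level `module`, or raise ImportError naming the pip extra.

    Hypothetical helper: `extra` is only used to build the error message.
    """
    if importlib.util.find_spec(module) is None:
        raise ImportError(
            f"{module!r} is not installed; try: pip install 'scalable[{extra}]'"
        )
    return importlib.import_module(module)
```

Raising at use time, rather than at `import scalable`, keeps the core package importable on machines that never install the cloud or Kubernetes extras.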
Phase 3: cloud + Kubernetes execution, artifact stores, overlays, cost
Implements the Phase 4 deliverables from the v2.0.0 development plan:

- AI assistant subsystem (scalable.ai) with pluggable LLM backend
  protocol and heuristic-only fallback mode
- Component onboarding assistant (scalable init-component)
- Failure diagnosis assistant (scalable diagnose)
- Plan explanation assistant (scalable explain)
- Workflow composition assistant (scalable compose)
- Manifest migration assistant (scalable migrate)
- ScalableSession.plan(objective=, policy=) now functional with
  heuristic-based resource/worker adjustments
- Prompt template system for all assistants
- Settings: SCALABLE_AI_BACKEND, SCALABLE_AI_MODEL, SCALABLE_AI_ENDPOINT
- Populated [project.optional-dependencies] ai extra
- Version bumped to 2.0.0a4
- 356 unit tests passing, ruff clean
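To illustrate "heuristic-based resource/worker adjustments" driven by an objective, consider a rule that trades workers against cost. The objective names, the doubling and halving factors, `PlanSketch`, and `adjust_plan` are all hypothetical. The commit only states that `ScalableSession.plan(objective=, policy=)` applies heuristics, and the `policy` argument is omitted here for brevity.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PlanSketch:
    workers: int
    memory_gb: float


def adjust_plan(base: PlanSketch, objective: str, max_workers: int = 32) -> PlanSketch:
    """Apply a simple objective-driven rule to a baseline plan (illustrative only)."""
    if objective == "fastest":
        # Trade cost for speed: double the workers, up to the cap.
        return replace(base, workers=min(max_workers, base.workers * 2))
    if objective == "cheapest":
        # Trade speed for cost: halve the workers, keeping at least one.
        return replace(base, workers=max(1, base.workers // 2))
    if objective == "balanced":
        return base
    raise ValueError(f"unknown objective: {objective!r}")
```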

All AI features work without an LLM backend via deterministic heuristic
fallbacks. LLM enhancement is opt-in. All outputs are reviewable
artifacts - never auto-executed.
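The pluggable-backend-with-fallback design described above can be expressed with a `typing.Protocol`: any object with a matching `complete` method counts as a backend. When no backend is configured, or the backend fails, the assistant returns its deterministic heuristic text. `LLMBackend`, its `complete` signature, and `explain_plan` are hypothetical names, not the `scalable.ai` API. As in the commit, the result is text for review and is never executed.

```python
from typing import Optional, Protocol


class LLMBackend(Protocol):
    def complete(self, prompt: str) -> str: ...


def heuristic_explanation(workers: int, memory_gb: float) -> str:
    # Deterministic text that needs no model at all.
    return f"Plan uses {workers} workers with {memory_gb:g} GB each."


def explain_plan(
    workers: int, memory_gb: float, backend: Optional[LLMBackend] = None
) -> str:
    """Return an explanation; use the LLM only if one is configured and succeeds."""
    baseline = heuristic_explanation(workers, memory_gb)
    if backend is None:
        return baseline
    try:
        return backend.complete(f"Explain this resource plan for a user: {baseline}")
    except Exception:
        # Any backend failure degrades to the heuristic answer.
        return baseline
```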

Ref: plans/v2.0.0_phase4_plan.md
Co-authored-by: crvernon <3947069+crvernon@users.noreply.github.com>
[WIP] Fix failing GitHub Actions job 'ruff + mypy'
Agent-Logs-Url: https://github.com/JGCRI/scalable/sessions/fe9e5b5a-f73f-4999-8e77-194af9b7b931

Co-authored-by: crvernon <3947069+crvernon@users.noreply.github.com>
crvernon added 14 commits May 19, 2026 20:17
- Add scalable.ml package: LearnedAdvisor, AdaptiveScaler, FeatureExtractor,
  ResourceModel, HyperparameterSearch, cross_validate_advisor
- Add scalable.emulation package: @emulatable decorator, EmulatorRegistry,
  EmulatorDispatch, ActiveLearner, GradientBoostingEmulator,
  RandomForestEmulator, uncertainty calibration
- Add scalable advise CLI command with ML-backed recommendations
- Add EmulationEvent to telemetry events
- Add Phase 5 settings (ML cache, emulator registry, enable flags)
- Add [ml] optional dependency extra (scikit-learn, dask-ml, joblib)
- Bump version to 2.0.0a5
- 75 new unit tests, 431 total passing
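Uncertainty-aware dispatch, as described for `@emulatable` and `EmulatorDispatch`, means answering from a cheap surrogate only when it is confident, and calling the real model otherwise. The decorator below is a hypothetical stand-in: `emulatable_sketch`, the `(prediction, std)` surrogate interface, and the `max_std` threshold are assumptions. It deliberately avoids scikit-learn so it runs without the [ml] extra.

```python
import functools
from typing import Callable, Tuple

Surrogate = Callable[[float], Tuple[float, float]]  # x -> (prediction, std dev)


def emulatable_sketch(surrogate: Surrogate, max_std: float) -> Callable:
    """Route calls to `surrogate` when its reported std dev is <= max_std."""

    def decorator(func: Callable[[float], float]) -> Callable[[float], float]:
        @functools.wraps(func)
        def wrapper(x: float) -> float:
            prediction, std = surrogate(x)
            if std <= max_std:
                wrapper.emulated_calls += 1
                return prediction
            wrapper.real_calls += 1
            return func(x)  # too uncertain: run the real model

        wrapper.emulated_calls = 0
        wrapper.real_calls = 0
        return wrapper

    return decorator


def toy_surrogate(x: float) -> tuple[float, float]:
    # Pretend the emulator was trained on 0 <= x <= 10: confident inside, not outside.
    return x * x, (0.01 if 0 <= x <= 10 else 5.0)


@emulatable_sketch(toy_surrogate, max_std=0.1)
def expensive_model(x: float) -> float:
    return x * x  # stands in for a slow simulation
```

In an active-learning loop, the inputs that fell back to the real model are natural candidates for retraining the surrogate.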
crvernon merged commit 7c2b2bd into develop on May 20, 2026
10 checks passed

3 participants