Version/2.0.0 by crvernon · Pull Request #26 · JGCRI/scalable

crvernon · 2026-05-20T03:42:34Z

This pull request introduces significant new features and improvements across the codebase, focusing on advanced ML optimization, model emulation, expanded AI assistant capabilities, and enhanced CI coverage. It also updates documentation, environment configuration, and the test matrix to support new functionality and platforms.

Major feature additions and improvements:

Machine Learning Optimization and Emulation:

Introduces a comprehensive ML optimization subsystem (scalable.ml) with learned resource prediction, adaptive scaling, feature extraction, model wrappers, distributed hyperparameter search, and model quality assessment. Adds a model emulation subsystem (scalable.emulation) for surrogate modeling, uncertainty-aware dispatch, and active learning strategies. Public APIs and CLI commands are provided for ML-backed recommendations and emulation. ([CHANGELOG.mdR8-R285])

AI Assistant and Cloud/Kubernetes Support:

Expands the AI assistant subsystem with pluggable LLM backends, onboarding, diagnosis, plan explanation, workflow composition, and migration assistants. Adds support for Kubernetes, AWS, and GCP providers, cloud cost estimation, artifact storage, and manifest overlays. CLI and API are extended for these new capabilities. ([CHANGELOG.mdR8-R285])

Testing and CI Enhancements:

Updates the test matrix to include macOS and Python 3.13, and adds a dedicated job for validating and planning example manifests in CI.

Documentation and Configuration:

Updates the CHANGELOG.md with detailed release notes for all new features, breaking changes, and tests. Updates the README to reflect new capabilities, optional dependency groups, and system requirements. Adds an .env.example for OpenAI configuration.

Notable grouped changes:

1. ML Optimization & Emulation

Adds scalable.ml and scalable.emulation subsystems: learned resource prediction, adaptive scaling, surrogate modeling, uncertainty calibration, and CLI integration. ([CHANGELOG.mdR8-R285])
New optional dependency group [ml] for ML features. ([CHANGELOG.mdR8-R285])
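
An optional [ml] extra usually means ML imports are guarded so the core package still loads without them. The following is a minimal sketch of that guard pattern using only the standard library. `has_module` and `require_extra` are hypothetical helpers written for illustration and are not taken from scalable's code.

```python
import importlib
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the top-level module `name` is importable here."""
    return importlib.util.find_spec(name) is not None


def require_extra(module: str, extra: str):
    """Import `module`, or raise an ImportError naming the pip extra.

    Hypothetical helper: the real package may guard imports differently.
    """
    if not has_module(module):
        raise ImportError(
            f"{module!r} is not installed; try: pip install 'scalable[{extra}]'"
        )
    return importlib.import_module(module)
```

A caller would wrap ML-only code paths with something like `sklearn = require_extra("sklearn", "ml")`, so users without the extra see an actionable message instead of a bare import failure.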

2. AI Assistants & Cloud/Kubernetes

Implements AI assistant commands (init-component, diagnose, explain, compose, migrate) with LLM backend support and heuristic fallback. ([CHANGELOG.mdR8-R285])
Adds Kubernetes, AWS, and GCP provider support, cloud cost estimation, artifact storage, overlays, and related CLI/API changes. ([CHANGELOG.mdR8-R285])
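
The PR does not spell out how manifest overlays are merged. One common approach, shown here as an assumption rather than the project's actual behavior, is a recursive dictionary merge in which overlay values win and nested mappings are merged key by key. `apply_overlay` is a hypothetical name.

```python
from copy import deepcopy
from typing import Any


def apply_overlay(base: dict[str, Any], overlay: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict with `overlay` merged on top of `base`.

    Assumed semantics: nested dicts merge recursively; any other value
    in the overlay replaces the base value. Inputs are not mutated.
    """
    merged = deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_overlay(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged
```

With a base of `{"resources": {"cpus": 2, "memory": "4GB"}}` and an overlay of `{"resources": {"cpus": 8}}`, this sketch keeps the memory setting and changes only the CPU count.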

3. CI and Testing

Expands CI test matrix to include macOS and Python 3.13; adds validation and dry-run planning for example manifests in CI.
Over 400 passing unit tests, including extensive new coverage for ML and AI features. ([CHANGELOG.mdR8-R285])

4. Documentation & Config

Major updates to CHANGELOG.md and README.md for new features, usage, and requirements.
Adds .env.example for OpenAI credentials and model configuration. ([.env.exampleR1-R13])

5. Maintenance & Versioning

Updates versioning, export lists, and documentation links for new modules and features.

These changes collectively advance the project to a new phase with robust ML, emulation, AI, and cloud-native capabilities, while ensuring strong test coverage and clear documentation.

Creates the additive Phase 1 package structure off of version/2.0.0: manifest/, providers/, session/, planning/, cli/. Each new package ships with a docstring describing its Phase 1 role and its hooks for later phases (telemetry, AI assistants, Kubernetes/cloud providers, ML advisor).

- scalable/manifest/schema.py defines the frozen v1 schema dataclasses (ManifestModel, ProjectConfig, TargetConfig, ComponentConfig, TaskConfig) and SCHEMA_VERSION = 1. The schema is intentionally implemented with stdlib dataclasses so manifest validation works without the optional [ai] extra (resolves Phase 1 plan section 9 open question #1).
- scalable/manifest/errors.py declares the ManifestError hierarchy used by the parser, validator, and Phase 4 AI migration assistant.
- scalable/cli/main.py is a Phase 1 stub for the [project.scripts] entry point; the real validate / plan --dry-run wiring lands in WU-10.
- pyproject.toml: version bumped to 2.0.0a1, pyyaml pinned explicitly, empty placeholder extras for ai/cloud/kubernetes registered so pip install scalable[ai] resolves cleanly from day one, scalable console script registered, packages.find used so the new sub-packages are picked up by setuptools.

Verified: existing 73 unit tests pass unchanged; ruff clean on all new modules. No public API removed or renamed.

Refs plans/v2.0.0_phase1_plan.md WU-1.
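
To illustrate what a frozen, stdlib-only schema can look like, here is a minimal sketch. The class names and SCHEMA_VERSION = 1 come from the commit message; every field name, the version check, and the `ManifestVersionError` subclass are assumptions made for illustration, not the contents of scalable/manifest/schema.py.

```python
from dataclasses import dataclass

SCHEMA_VERSION = 1  # value stated in the commit message


class ManifestError(Exception):
    """Base error; the real hierarchy lives in scalable/manifest/errors.py."""


class ManifestVersionError(ManifestError):
    """Hypothetical subclass for an unsupported schema version."""


@dataclass(frozen=True)
class ProjectConfig:
    name: str  # assumed field


@dataclass(frozen=True)
class ManifestModel:
    project: ProjectConfig
    version: int = SCHEMA_VERSION  # assumed field
    targets: tuple[str, ...] = ()  # assumed field; a tuple stays immutable

    def __post_init__(self) -> None:
        # Reading attributes is allowed on a frozen dataclass; only
        # assignment after construction raises FrozenInstanceError.
        if self.version != SCHEMA_VERSION:
            raise ManifestVersionError(
                f"unsupported schema version {self.version}; "
                f"expected {SCHEMA_VERSION}"
            )
```

Because the classes are frozen, a validated manifest cannot be mutated by later pipeline stages, and nothing beyond the standard library is needed to construct or check one.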

Implements Phase 3 of the v2.0.0 roadmap:

- KubernetesProvider over Dask Kubernetes Operator
- AWSBatchProvider over dask-cloudprovider (Fargate/EC2)
- GCPProvider scaffold (validation only; build_cluster deferred)
- ArtifactStore protocol with local and fsspec backends
- RemoteCacheBackend for opt-in remote cache (SCALABLE_CACHE_REMOTE)
- Manifest overlays (overlays: block + targets[*].overlay)
- CostEstimate primitives and static cost tables
- scalable run CLI verb
- Settings: cache_remote_uri, default_storage, runs_dir_remote
- Telemetry: CostEvent, cost.jsonl stream, cost in report
- Provider protocol: optional estimate_cost() method
- Public API: Phase 3 exports with optional-dep guards
- Docs: cloud.rst, kubernetes.rst, artifacts.rst, overlays.rst, cost.rst
- Example manifests: gke, aws, overlays
- 238 unit tests passing, ruff clean

Version bumped to 2.0.0a3.
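
The "optional estimate_cost() method" item can be pictured as a structural protocol where callers probe for the method before using it. This is a sketch under stated assumptions: the provider classes, the CostEstimate fields, the rate, and the `maybe_estimate_cost` helper are all hypothetical, and only the names CostEstimate and estimate_cost appear in the commit message.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class CostEstimate:
    hourly_usd: float  # assumed field
    hours: float       # assumed field

    @property
    def total_usd(self) -> float:
        return self.hourly_usd * self.hours


class Provider(Protocol):
    """Required surface only; estimate_cost is deliberately not listed."""

    name: str

    def validate(self) -> None: ...


class NoCostProvider:
    """Hypothetical provider that does not implement estimate_cost."""

    name = "no-cost"

    def validate(self) -> None:
        return None


class FlatRateProvider:
    """Hypothetical provider priced from a static per-worker rate."""

    name = "flat-rate"
    usd_per_worker_hour = 0.10  # illustrative number, not a real price

    def validate(self) -> None:
        return None

    def estimate_cost(self, workers: int, hours: float) -> CostEstimate:
        return CostEstimate(self.usd_per_worker_hour * workers, hours)


def maybe_estimate_cost(
    provider: Provider, workers: int, hours: float
) -> Optional[CostEstimate]:
    """Return a cost estimate if the provider offers one, else None."""
    estimator = getattr(provider, "estimate_cost", None)
    if estimator is None:
        return None
    return estimator(workers, hours)
```

Keeping the method optional lets local or HPC-style providers skip pricing entirely while cloud providers opt in.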

Implements the Phase 4 deliverables from the v2.0.0 development plan:

- AI assistant subsystem (scalable.ai) with pluggable LLM backend protocol and heuristic-only fallback mode
- Component onboarding assistant (scalable init-component)
- Failure diagnosis assistant (scalable diagnose)
- Plan explanation assistant (scalable explain)
- Workflow composition assistant (scalable compose)
- Manifest migration assistant (scalable migrate)
- ScalableSession.plan(objective=, policy=) now functional with heuristic-based resource/worker adjustments
- Prompt template system for all assistants
- Settings: SCALABLE_AI_BACKEND, SCALABLE_AI_MODEL, SCALABLE_AI_ENDPOINT
- Populated [project.optional-dependencies] ai extra
- Version bumped to 2.0.0a4
- 356 unit tests passing, ruff clean

All AI features work without an LLM backend via deterministic heuristic fallbacks. LLM enhancement is opt-in. All outputs are reviewable artifacts, never auto-executed.

Ref: plans/v2.0.0_phase4_plan.md
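
The "pluggable backend with heuristic fallback" design can be sketched as follows. This is an illustration only: `LLMBackend`, `heuristic_diagnosis`, and `diagnose` are hypothetical names with assumed signatures, and the matching rules are toy examples rather than the heuristics shipped in scalable.ai.

```python
from typing import Optional, Protocol


class LLMBackend(Protocol):
    """Assumed minimal backend surface: text prompt in, text out."""

    def complete(self, prompt: str) -> str: ...


def heuristic_diagnosis(log_text: str) -> str:
    """Deterministic fallback based on simple substring matches."""
    lowered = log_text.lower()
    if "out of memory" in lowered or "memoryerror" in lowered:
        return "Likely memory exhaustion; consider raising the memory request."
    if "timed out" in lowered or "timeout" in lowered:
        return "Likely time limit reached; consider a longer walltime."
    return "No known failure pattern matched; inspect the log manually."


def diagnose(log_text: str, backend: Optional[LLMBackend] = None) -> str:
    """Use the LLM backend when one is supplied, else the heuristic.

    Any backend failure falls back to the heuristic answer, so the
    function always returns a reviewable suggestion and never acts on it.
    """
    baseline = heuristic_diagnosis(log_text)
    if backend is None:
        return baseline
    prompt = (
        "Diagnose this failure log.\n"
        f"Heuristic hint: {baseline}\n"
        f"Log:\n{log_text}"
    )
    try:
        return backend.complete(prompt)
    except Exception:
        return baseline
```

Computing the heuristic answer first keeps behavior deterministic when no backend is configured and gives the LLM path a safe result to fall back on.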

- Add scalable.ml package: LearnedAdvisor, AdaptiveScaler, FeatureExtractor, ResourceModel, HyperparameterSearch, cross_validate_advisor
- Add scalable.emulation package: @emulatable decorator, EmulatorRegistry, EmulatorDispatch, ActiveLearner, GradientBoostingEmulator, RandomForestEmulator, uncertainty calibration
- Add scalable advise CLI command with ML-backed recommendations
- Add EmulationEvent to telemetry events
- Add Phase 5 settings (ML cache, emulator registry, enable flags)
- Add [ml] optional dependency extra (scikit-learn, dask-ml, joblib)
- Bump version to 2.0.0a5
- 75 new unit tests, 431 total passing
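
Uncertainty-aware dispatch generally means answering from a surrogate only when its predicted uncertainty is low, and running the real model otherwise. The sketch below shows that idea with a toy surrogate. `emulatable_sketch` is a hypothetical stand-in and does not reproduce the signature or behavior of the real @emulatable decorator.

```python
from functools import wraps
from typing import Callable, Tuple

# A surrogate maps an input to (predicted mean, predicted std).
Surrogate = Callable[[float], Tuple[float, float]]


def emulatable_sketch(surrogate: Surrogate, max_std: float):
    """Decorator: use the surrogate when its std is at most `max_std`."""

    def decorator(func: Callable[[float], float]) -> Callable[[float], float]:
        @wraps(func)
        def wrapper(x: float) -> float:
            mean, std = surrogate(x)
            if std <= max_std:
                wrapper.emulated_calls += 1
                return mean
            wrapper.real_calls += 1
            return func(x)

        # Counters make the dispatch decisions observable.
        wrapper.emulated_calls = 0
        wrapper.real_calls = 0
        return wrapper

    return decorator


def toy_surrogate(x: float) -> Tuple[float, float]:
    """Pretend surrogate: confident inside [0, 1], uncertain outside."""
    std = 0.01 if 0.0 <= x <= 1.0 else 1.0
    return x * x, std


@emulatable_sketch(toy_surrogate, max_std=0.1)
def expensive_model(x: float) -> float:
    """Stand-in for a costly simulation."""
    return x * x
```

Inputs where the real model is called are natural candidates for new training points, which is the link to the active learning strategies mentioned in the PR description.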

sash19 and others added 30 commits February 17, 2025 14:25

Merge pull request #17 from JGCRI/develop

efc9261

Releasing v1.0

Merge pull request #18 from JGCRI/develop

ec1d603

Merging Develop

ignore roo

1bbd5b7

Merge pull request #19 from JGCRI/develop

4266aaf

Release: 1.1.0

resize logo

38da71d

resize logo

0ed8671

resize logo

65d539e

WU-2: add scalable.yaml parser with env expansion + schema checks

2f6c7fd

WU-3: add manifest semantic validator + validation report tests

5b6b7b9

WU-4: add provider protocol, deployment spec, and registry

a2eccad

WU-5: add LocalProvider with tagged local execution and tests

609b9e0

WU-6: add SlurmProvider translation layer with mocked tests

738a0b5

WU-7: add manifest-to-legacy adapter and ModelConfig deprecation gate

1bbac8f

feat(v2-phase1): add session+dryrun APIs, CLI commands, docs, and CI validation

cdbc51d

ignore env files

abecbeb

add env example file

78ca831

update changelog

f9b5642

Merge pull request #20 from JGCRI/version/2.0.0-phase1-provider-manifest

fc0a8e8

Phase 1: provider abstraction + scalable.yaml manifest foundation

phase 2 progress towards telemetry and deterministic advising

5cf2d97

Merge pull request #21 from JGCRI/version/2.0.0-phase2-telemetry-advi…

1ae928c

…sing phase 2 progress towards telemetry and deterministic advising

Merge pull request #22 from JGCRI/version/2.0.0-phase3-cloud-kubernetes

6c55dde

Phase 3: cloud + Kubernetes execution, artifact stores, overlays, cost

Initial plan

78b12cb

Merge version/2.0.0-phase4-ai-assistants into fix branch

468ff67

Co-authored-by: crvernon <3947069+crvernon@users.noreply.github.com>

Fix lint violations in session and AI planning tests

6e18a73

Co-authored-by: crvernon <3947069+crvernon@users.noreply.github.com>

Add explicit GitHub Actions token permissions in tests workflow

cc28345

Agent-Logs-Url: https://github.com/JGCRI/scalable/sessions/b7e62493-29e0-4a5f-9bdb-28a778012e68 Co-authored-by: crvernon <3947069+crvernon@users.noreply.github.com>

Merge pull request #24 from JGCRI/copilot/fix-ruff-mypy-job-failure

10f57eb

[WIP] Fix failing GitHub Actions job 'ruff + mypy'

Rollback branch content to commit 1460fff

3dc3683

Agent-Logs-Url: https://github.com/JGCRI/scalable/sessions/fe9e5b5a-f73f-4999-8e77-194af9b7b931 Co-authored-by: crvernon <3947069+crvernon@users.noreply.github.com>

crvernon added 14 commits May 19, 2026 20:17

ruff fixes

3b4bd09

ruff fixes

d911ef8

Merge pull request #23 from JGCRI/version/2.0.0-phase4-ai-assistants

a864a60

Phase 4: AI assistant features

Add Phase 5 implementation plan

60dfd39

Merge Phase 4: AI assistant features into version/2.0.0

bc321c9

Merge pull request #25 from JGCRI/version/2.0.0-phase5-ml-emulation

a3b68a6

Version/2.0.0 phase5 ml emulation

update docs

dcb8a2d

how-to tutorials in docs

b3b2151

jupyter notebook tutorials

6112929

pydanticai transition

ea975f4

support tests failure for ai

2dc4988

ruff adjustments

c4021e1

formatting for ruff

d0df09f

crvernon merged commit 7c2b2bd into develop May 20, 2026
10 checks passed

crvernon merged 44 commits into develop from version/2.0.0


3 participants
