Version/2.0.0 phase5 ml emulation by crvernon · Pull Request #25 · JGCRI/scalable

crvernon · 2026-05-20T00:48:22Z

Phase 5: ML Optimization and Emulation

Branch: version/2.0.0-phase5-ml-emulation → version/2.0.0
Plan: plans/v2.0.0_phase5_plan.md
Prior phases: Phase 1 (#20), Phase 2 (#21), Phase 3 (#22), Phase 4 (#23)
Version: 2.0.0a5

Summary

Phase 5 is the capstone phase of the v2.0.0 roadmap. It adds ML-backed resource prediction and scientific model emulation as first-class capabilities, completing Scalable's evolution from a Slurm-oriented launcher into a portable scientific-model execution control plane with optional ML services.

All ML features degrade gracefully to Phase 2 heuristics when training data is insufficient or scalable[ml] is not installed. Emulation is opt-in (off by default) and uncertainty-aware — predictions are never silently substituted for full model runs.
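The fallback rule above can be sketched as follows. This is a minimal illustration rather than Scalable's implementation: the function names, the `MIN_TRAINING_SAMPLES` threshold, and the padded 95th-percentile heuristic are assumptions chosen for the example.

```python
import importlib.util
import statistics

# Assumed threshold for "enough" telemetry; the PR does not state a number.
MIN_TRAINING_SAMPLES = 30


def ml_available() -> bool:
    """Return True if scikit-learn can be imported in this environment."""
    return importlib.util.find_spec("sklearn") is not None


def heuristic_memory_gb(history: list[float], headroom: float = 1.2) -> float:
    """Hypothetical heuristic: pad the 95th percentile of past peak usage."""
    if not history:
        raise ValueError("history must contain at least one observation")
    if len(history) == 1:
        return history[0] * headroom
    # quantiles(n=20) yields 19 cut points; the last one is the 95th percentile.
    p95 = statistics.quantiles(history, n=20, method="inclusive")[-1]
    return p95 * headroom


def choose_advisor(history: list[float]) -> str:
    """Pick the ML path only when the extra is installed and data suffices."""
    if ml_available() and len(history) >= MIN_TRAINING_SAMPLES:
        return "ml"
    return "heuristic"
```

With only a handful of telemetry records, `choose_advisor` returns `"heuristic"` whether or not scikit-learn is installed, which is the behavior the summary describes.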

What's new

ML Optimization (`scalable.ml`)

LearnedAdvisor — ML-backed resource recommendations using gradient boosting, random forest, or quantile regression trained on telemetry history. Returns the same ResourceRecommendation payload as Phase 2's heuristic advisor.
AdaptiveScaler — real-time adaptive worker scaling with configurable queue-depth thresholds, min/max worker bounds, and cooldown periods to prevent thrashing.
FeatureExtractor — engineers features from telemetry records (rolling aggregates, task identity hashing) and user-provided input features for ML consumption.
ResourceModel — unified sklearn wrapper with fit/predict/confidence intervals, model persistence via joblib, and percentile fallback when sklearn is unavailable.
HyperparameterSearch — Dask-ML distributed tuning (hyperband, successive halving, random) with sklearn RandomizedSearchCV fallback.
cross_validate_advisor() — model quality assessment with MAE, RMSE, R², and coverage metrics.
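To make the AdaptiveScaler description concrete, here is a small sketch of threshold-based scaling with worker bounds and a cooldown. The class name `ScalerSketch`, its fields, the doubling and halving rule, and all default values are hypothetical; the PR names the concepts but not the algorithm.

```python
from dataclasses import dataclass


@dataclass
class ScalerSketch:
    """Hypothetical stand-in for an adaptive scaler; not Scalable's API."""

    min_workers: int = 1
    max_workers: int = 16
    scale_up_depth: int = 20    # queued tasks per worker that triggers growth
    scale_down_depth: int = 2   # queued tasks per worker that triggers shrink
    cooldown_s: float = 60.0    # minimum seconds between scaling changes
    _last_change: float = float("-inf")

    def decide(self, workers: int, queue_depth: int, now: float) -> int:
        """Return the target worker count for the current observation."""
        if now - self._last_change < self.cooldown_s:
            return workers  # inside the cooldown window: hold steady
        per_worker = queue_depth / max(workers, 1)
        target = workers
        if per_worker > self.scale_up_depth:
            target = workers * 2
        elif per_worker < self.scale_down_depth:
            target = workers // 2
        # Clamp to the configured bounds before committing to a change.
        target = max(self.min_workers, min(self.max_workers, target))
        if target != workers:
            self._last_change = now
        return target
```

The cooldown is what prevents thrashing: a scaling change at time `t` suppresses further changes until `t + cooldown_s`, even if the queue depth keeps crossing a threshold.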

Model Emulation (`scalable.emulation`)

@emulatable decorator — marks functions as emulation-capable with declared inputs, outputs, domain bounds, uncertainty requirements, and confidence thresholds.
EmulatorRegistry — versioned emulator management with filesystem persistence, domain validation, and joblib serialization.
EmulatorDispatch — confidence-gated routing between emulator and full model with full provenance recording of every dispatch decision.
ActiveLearner — intelligent scenario selection using expected improvement, maximum uncertainty, or random acquisition strategies.
GradientBoostingEmulator and RandomForestEmulator — surrogate model implementations with tree-based uncertainty estimation.
calibrate_emulator() — uncertainty calibration assessment (coverage, sharpness).
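Confidence-gated routing, as described for EmulatorDispatch, might look roughly like the sketch below. The function names, the provenance keys, and the assumption that an emulator returns a `(prediction, confidence)` pair are all illustrative and not taken from the Scalable codebase.

```python
from typing import Callable

Bounds = dict[str, tuple[float, float]]
Inputs = dict[str, float]


def in_domain(inputs: Inputs, bounds: Bounds) -> bool:
    """True when every bounded input is present and inside its interval."""
    return all(
        name in inputs and lo <= inputs[name] <= hi
        for name, (lo, hi) in bounds.items()
    )


def dispatch(
    inputs: Inputs,
    emulator: Callable[[Inputs], tuple[float, float]],
    full_model: Callable[[Inputs], float],
    bounds: Bounds,
    threshold: float = 0.9,
) -> tuple[float, dict]:
    """Return (result, provenance); fall back to the full model when unsure."""
    provenance = {
        "source": "full_model",
        "confidence": None,
        "domain_valid": in_domain(inputs, bounds),
        "fallback_reason": None,
    }
    if not provenance["domain_valid"]:
        provenance["fallback_reason"] = "out_of_domain"
        return full_model(inputs), provenance
    prediction, confidence = emulator(inputs)
    provenance["confidence"] = confidence
    if confidence < threshold:
        provenance["fallback_reason"] = "low_confidence"
        return full_model(inputs), provenance
    provenance["source"] = "emulator"
    return prediction, provenance
```

Because the provenance record is built on every call, a caller can always tell whether a result came from the emulator or the full model, and why a fallback happened.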

CLI

scalable advise: ML-backed resource recommendations from the command line. Supports --task, --target, --model-type, --confidence, and --format (text/json). Degrades to the heuristic advisor when ML is unavailable.
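For readers who want to picture the flag surface, the following `argparse` sketch mirrors the options listed above. Only the flag names and the text/json choices come from this PR; the defaults, the required `--task`, and the `gradient_boosting` value are assumptions, and the real command may be built differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented `scalable advise` flags."""
    parser = argparse.ArgumentParser(prog="scalable")
    sub = parser.add_subparsers(dest="command", required=True)
    advise = sub.add_parser("advise", help="resource recommendations")
    advise.add_argument("--task", required=True)            # assumed required
    advise.add_argument("--target", default="memory")       # assumed default
    advise.add_argument("--model-type", default="gradient_boosting")  # assumed
    advise.add_argument("--confidence", type=float, default=0.9)      # assumed
    advise.add_argument("--format", choices=["text", "json"], default="text")
    return parser
```

Parsing `["advise", "--task", "run_model", "--format", "json"]` with this parser yields a namespace whose `format` is `"json"`; `run_model` is a placeholder task name.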

Telemetry

EmulationEvent — new event type tracking emulator dispatch decisions (source, confidence, fallback reason, domain validity).
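A record carrying the four tracked attributes could be modeled as a frozen dataclass, as sketched here. The class name `EmulationEventSketch`, the field names, and the JSON serialization are assumptions for illustration; the actual `EmulationEvent` schema is not shown in this PR description.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class EmulationEventSketch:
    """Hypothetical shape for an emulator dispatch telemetry record."""

    source: str                     # "emulator" or "full_model"
    confidence: Optional[float]     # None when the emulator was never called
    fallback_reason: Optional[str]  # None when the emulator result was used
    domain_valid: bool

    def to_json(self) -> str:
        """Serialize with stable key order so records diff cleanly."""
        return json.dumps(asdict(self), sort_keys=True)
```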

Configuration

SCALABLE_ML_CACHE_DIR — trained ML model cache location
SCALABLE_EMULATOR_DIR — emulator registry storage
SCALABLE_ML — enable/disable ML features (default: enabled)
SCALABLE_EMULATION — enable/disable emulation (default: disabled)
SCALABLE_EMULATION_CONFIDENCE — default confidence threshold (0.9)
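One way to read these settings with the documented defaults (ML enabled, emulation disabled, confidence 0.9) is sketched below. The variable names come from the list above; the `Phase5Settings` class, the accepted truthy and falsy strings, and the use of `None` for unset directories are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Parse a boolean environment variable, falling back to `default`."""
    raw = env.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"{name} must be a boolean-like value, got {raw!r}")


@dataclass(frozen=True)
class Phase5Settings:
    ml_cache_dir: Optional[str]
    emulator_dir: Optional[str]
    ml_enabled: bool
    emulation_enabled: bool
    emulation_confidence: float


def load_settings(env: Mapping[str, str] = os.environ) -> Phase5Settings:
    """Build settings from an environment mapping (hypothetical helper)."""
    confidence = float(env.get("SCALABLE_EMULATION_CONFIDENCE", "0.9"))
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("SCALABLE_EMULATION_CONFIDENCE must be in [0, 1]")
    return Phase5Settings(
        ml_cache_dir=env.get("SCALABLE_ML_CACHE_DIR"),
        emulator_dir=env.get("SCALABLE_EMULATOR_DIR"),
        ml_enabled=_flag(env, "SCALABLE_ML", default=True),
        emulation_enabled=_flag(env, "SCALABLE_EMULATION", default=False),
        emulation_confidence=confidence,
    )
```

Passing an explicit mapping, for example `load_settings({"SCALABLE_EMULATION": "1"})`, keeps the helper easy to test without mutating the process environment.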

Dependencies

New [project.optional-dependencies] ml extra: scikit-learn >= 1.3, dask-ml >= 2023.3.24, joblib >= 1.3

Stats

26 files changed (+4,214 lines)
14 new source modules across scalable/ml/ and scalable/emulation/
1 new CLI command (scalable advise)
75 new unit tests — all passing
431 total unit tests — zero regressions from Phases 1–4
ruff lint clean on all new modules

Design principles honored

| Principle | How it's satisfied |
| --- | --- |
| AI proposes; Scalable disposes | ML predictions validated by deterministic policy before execution |
| Every plan is inspectable | Recommendations include feature importances, confidence intervals, training provenance |
| Manual overrides always win | Users can pin resources, disable ML, or force full-model execution |
| Emulators are opt-in and labeled | Default disabled; every result records whether it was emulated or full-model |
| Offline-compatible | All models trained/cached locally; no remote inference required |
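The first and third principles can be illustrated with a small resolution step in which a deterministic policy caps ML recommendations and user pins take precedence. The `Policy` class, the `resolve` function, and the `memory_gb`/`cores` keys are hypothetical names for this sketch, not Scalable's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Policy:
    """Hypothetical deterministic limits applied to every ML recommendation."""

    max_memory_gb: float
    max_cores: int


def resolve(recommended: dict, policy: Policy, pinned: Optional[dict] = None) -> dict:
    """Combine an ML recommendation, policy limits, and user pins into a plan.

    Each entry records its origin so the resulting plan stays inspectable.
    """
    pinned = pinned or {}
    limits = {"memory_gb": policy.max_memory_gb, "cores": policy.max_cores}
    plan = {}
    for key, limit in limits.items():
        if key in pinned:
            # Manual overrides win: the pinned value is used as given.
            plan[key] = {"value": pinned[key], "origin": "pinned"}
        elif recommended[key] > limit:
            # The ML proposal exceeds policy, so the policy limit is used.
            plan[key] = {"value": limit, "origin": "clamped"}
        else:
            plan[key] = {"value": recommended[key], "origin": "ml"}
    return plan
```

Recording an `origin` per field is one simple way to keep the final plan auditable after policy and overrides have been applied.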

After this merge

The version/2.0.0 branch will contain the complete v2.0.0 feature set across all 5 phases and can proceed toward release candidate stabilization.

- Add scalable.ml package: LearnedAdvisor, AdaptiveScaler, FeatureExtractor, ResourceModel, HyperparameterSearch, cross_validate_advisor
- Add scalable.emulation package: @emulatable decorator, EmulatorRegistry, EmulatorDispatch, ActiveLearner, GradientBoostingEmulator, RandomForestEmulator, uncertainty calibration
- Add scalable advise CLI command with ML-backed recommendations
- Add EmulationEvent to telemetry events
- Add Phase 5 settings (ML cache, emulator registry, enable flags)
- Add [ml] optional dependency extra (scikit-learn, dask-ml, joblib)
- Bump version to 2.0.0a5
- 75 new unit tests, 431 total passing

crvernon added 4 commits May 19, 2026 20:17

ruff fixes

3b4bd09

Add Phase 5 implementation plan

60dfd39

Merge Phase 4: AI assistant features into version/2.0.0

bc321c9

crvernon force-pushed the version/2.0.0-phase5-ml-emulation branch from 0777b09 to 2efbe9d Compare May 20, 2026 00:50

crvernon merged commit a3b68a6 into version/2.0.0 May 20, 2026
12 checks passed
