Minor Changes
-
8fefe5c: Adds Phase 1 of the Local Model Lifecycle Manager: hardware detection.
The new `HardwareDetector` returns a `HardwareProfile` on macOS (Apple Silicon), Linux/Windows with NVIDIA, and CPU-only hosts. The dispatcher honors an operator override ahead of autodetection, caches results for 24h by default to match the spec's refresh cadence, and falls through to a CPU profile with a structured warning when a platform-specific probe fails; it never throws (S3).
- `detectMacOS` parses `system_profiler SPDisplaysDataType -json` + `sysctl` and maps Apple Silicon chips (M1 through M4 Max) to their published unified-memory bandwidths.
- `detectNVIDIA` parses `nvidia-smi --query-gpu=name,memory.total` and maps NVIDIA GPUs (Ada, Ampere, Hopper) to their published memory bandwidths. Multi-GPU hosts pick the highest-VRAM card and warn.
- `detectCPU` derives a conservative bandwidth heuristic by regex-matching the CPU brand string against known DDR4/DDR5 desktop and DDR5/DDR4 server families.
- Shell-outs are dependency-injected via a `ShellRunner` interface so unit tests stay deterministic across CI hosts.
No orchestrator wiring yet: the detector will be consumed by the ranker (Phase 2), scheduler (Phase 6), and HTTP/dashboard surfaces (Phases 7–8). LMLM remains opt-in and disabled by default per Phase 0.
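The "never throws" fallthrough and the injected shell seam described in this entry could look roughly like the sketch below. It is an illustration only: the `HardwareProfile` fields, the synchronous `ShellRunner` signature, the `--format=csv,noheader,nounits` flag, and the warning strings are all assumptions, and the 24h cache and bandwidth tables are left out.

```typescript
// Hypothetical, simplified shapes: the real HardwareProfile and ShellRunner
// are not spelled out in this changelog.
interface HardwareProfile {
  kind: 'nvidia' | 'cpu';
  name: string;
  vramGb: number;
  warnings: string[];
}
type ShellRunner = (command: string) => string; // synchronous for brevity

// Parses rows such as "NVIDIA GeForce RTX 4090, 24576" (MiB), assuming the
// probe ran nvidia-smi with --format=csv,noheader,nounits.
function parseNvidiaCsv(output: string): HardwareProfile {
  const cards = output
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => {
      const parts = line.split(',').map((part) => part.trim());
      return { name: parts[0] ?? 'unknown', vramGb: Number(parts[1]) / 1024 };
    })
    .filter((card) => Number.isFinite(card.vramGb));
  if (cards.length === 0) throw new Error('no GPUs parsed');
  // Multi-GPU hosts: keep the highest-VRAM card and record a warning.
  const best = cards.reduce((a, b) => (b.vramGb > a.vramGb ? b : a));
  const warnings = cards.length > 1 ? [`multi_gpu: using ${best.name}`] : [];
  return { kind: 'nvidia', name: best.name, vramGb: best.vramGb, warnings };
}

// Override first, then the probe; any probe failure degrades to a CPU profile.
function detectHardwareSketch(run: ShellRunner, override?: HardwareProfile): HardwareProfile {
  if (override) return override;
  try {
    return parseNvidiaCsv(
      run('nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits'),
    );
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return { kind: 'cpu', name: 'cpu-fallback', vramGb: 0, warnings: [`probe_failed: ${reason}`] };
  }
}

// Fake runners stand in for the real shell, as a unit test would.
const fakeTwoGpus: ShellRunner = () => 'NVIDIA RTX A4000, 16384\nNVIDIA GeForce RTX 4090, 24576\n';
const brokenShell: ShellRunner = () => {
  throw new Error('nvidia-smi not found');
};
const multiGpuProfile = detectHardwareSketch(fakeTwoGpus);
const fallbackProfile = detectHardwareSketch(brokenShell);
```

Because the shell is a plain function argument, both the multi-GPU path and the failure path can be exercised without a GPU on the CI host.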
-
0a90f37: Adds Phase 2a of the Local Model Lifecycle Manager: the HuggingFace data plane and the frozen benchmark snapshot.
- `HuggingFaceClient` is a typed wrapper over the public HF REST endpoints (`/api/models`, `/api/models/:repo`). Every failure mode maps to a stable `HuggingFaceClientError` code (`HF_NOT_FOUND`, `HF_UNAUTHORIZED`, `HF_UNAVAILABLE`, `HF_NETWORK`, `HF_PARSE`) so the cache and the future ranker can branch deterministically.
- `HuggingFaceCache` is a versioned in-memory + on-disk cache for HF responses. The on-disk file at `~/.harness/local-models/cache/huggingface.json` is written atomically via tmp + rename (mirrors the proposal's O2 invariant). Missing, malformed, or schema-mismatched files reset to an empty cache and emit a structured warning instead of throwing.
- `loadFrozenSnapshot` returns the bundled benchmark snapshot the orchestrator falls back to when HF and the live leaderboard sources are unreachable (S4). The loader is intentionally lenient: malformed or schema-invalid input yields a typed warning and an empty snapshot, never a throw.
- A seed `snapshot.json` ships three placeholder models across Qwen / DeepSeek / Llama so Phase 2c has something to merge against on its first run.
- The HF `fetcher` and the cache filesystem are injected through narrow interfaces, so unit tests stay fully deterministic without touching the network or the real `~/.harness` directory.
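The tmp + rename write and the lenient load described for the cache can be sketched as follows. The `CacheFilesystem` method names, the envelope shape, the version number, and the warning codes are assumptions made for illustration; only the overall pattern comes from the entry above.

```typescript
// Hypothetical port: the changelog says the filesystem is injected through a
// narrow interface but does not list its methods.
interface CacheFilesystem {
  readFile(path: string): string | undefined;
  writeFile(path: string, data: string): void;
  rename(from: string, to: string): void;
}

const CACHE_VERSION = 1; // assumed value
interface CacheFile {
  version: number;
  entries: Record<string, unknown>;
}

// Write to a sibling tmp file, then rename over the target, so a reader sees
// either the previous file or the complete new one.
function saveCache(fs: CacheFilesystem, path: string, entries: Record<string, unknown>): void {
  const tmp = `${path}.tmp`;
  fs.writeFile(tmp, JSON.stringify({ version: CACHE_VERSION, entries }));
  fs.rename(tmp, path);
}

// Degraded files reset to an empty cache and report a warning; nothing throws.
function loadCache(
  fs: CacheFilesystem,
  path: string,
  warn: (code: string) => void,
): Record<string, unknown> {
  const raw = fs.readFile(path);
  if (raw === undefined) {
    warn('cache_missing');
    return {};
  }
  try {
    const parsed = JSON.parse(raw) as Partial<CacheFile>;
    if (
      parsed.version !== CACHE_VERSION ||
      typeof parsed.entries !== 'object' ||
      parsed.entries === null
    ) {
      warn('cache_schema_mismatch');
      return {};
    }
    return parsed.entries;
  } catch {
    warn('cache_malformed');
    return {};
  }
}

// In-memory filesystem of the kind a unit test would inject.
function memoryFs(): CacheFilesystem {
  const files = new Map<string, string>();
  return {
    readFile: (path) => files.get(path),
    writeFile: (path, data) => {
      files.set(path, data);
    },
    rename: (from, to) => {
      const data = files.get(from);
      if (data === undefined) throw new Error(`missing ${from}`);
      files.set(to, data);
      files.delete(from);
    },
  };
}

const memFs = memoryFs();
const cacheWarnings: string[] = [];
saveCache(memFs, 'huggingface.json', { 'Qwen/Qwen3-32B': { downloads: 1 } });
const roundTrip = loadCache(memFs, 'huggingface.json', (code) => cacheWarnings.push(code));
memFs.writeFile('huggingface.json', '{not json');
const afterCorruption = loadCache(memFs, 'huggingface.json', (code) => cacheWarnings.push(code));
```

On a real filesystem the rename step is what provides atomicity; the in-memory stand-in only mimics the call sequence.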
No orchestrator, CLI, dashboard, or HTTP wiring yet. VRAM/speed math (Phase 2b), evidence + recency grading (Phase 2c), the merge algorithm (Phase 2c), the `RankedModel` orchestrator (Phase 2d), and the parity tests against the whichllm reference outputs (Phase 2d) land in subsequent slices. LMLM remains opt-in and disabled by default per Phase 0.
-
24d0bd5: Adds Phase 2b of the Local Model Lifecycle Manager: the VRAM and speed estimators the ranker (Phase 2c–d) will compose.
- `normalizeQuantId` resolves any GGUF / MLX quant string the HF ecosystem actually emits (canonical keys, case variants, common aliases like `'q4_k_m'`, `'mlx-q4'`, `'fp16'`, `'q4'`) to a canonical `{ canonical, known, bitsPerWeight }` record. Unknown ids fall through to a conservative 8-bit fallback and surface as `known: false` so downstream callers can flag the estimate.
- `estimateVram({ sizeB, activeB?, quant, contextTokens?, kvCacheQuant? })` returns the four-term decomposition the dashboard's "why this won't fit" tooltip will eventually show (weights, KV cache, activations, framework overhead), pre-summed into `totalGb`. Weights are sized off the total params (MoE keeps all weights resident); KV cache scales linearly with `contextTokens` and respects the kv-cache quantization multiplier.
- `estimateSpeed({ sizeB, activeB?, quant, hardware, vramEstimate, backend? })` returns the bandwidth-bound token throughput projection plus enough provenance for the ranker's justification text (`effectiveBandwidthGbps`, `partialOffloadFraction`, `activeWeightsGb`, `backend`, `confidence`). MoE active params drive throughput, partial-offload blends GPU bandwidth with a conservative CPU floor, and `tokPerSec` short-circuits to 0 with `confidence: 'low'` when the model won't fit at all; the estimator never throws.
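As a rough illustration of the arithmetic behind these estimators, the sketch below sizes weights as params × bits / 8 and projects a bandwidth-bound throughput as bandwidth divided by the active weights. The bits-per-weight values and alias table are placeholders, not the project's `QUANT_BITS_PER_WEIGHT` table, and the sketch deliberately ignores KV cache, activations, framework overhead, backend efficiency, and partial offload.

```typescript
// Placeholder numbers for illustration only.
const SKETCH_BITS_PER_WEIGHT: Record<string, number> = { Q4_K_M: 4.5, Q8_0: 8.5, FP16: 16 };
const SKETCH_ALIASES: Record<string, string> = { Q4: 'Q4_K_M', 'MLX-Q4': 'Q4_K_M' };

// Case-insensitive alias resolution with the 8-bit fallback for unknown ids.
function normalizeQuantSketch(raw: string): { canonical: string; known: boolean; bitsPerWeight: number } {
  const upper = raw.trim().toUpperCase();
  const canonical = SKETCH_ALIASES[upper] ?? upper;
  const bits = SKETCH_BITS_PER_WEIGHT[canonical];
  return bits === undefined
    ? { canonical, known: false, bitsPerWeight: 8 }
    : { canonical, known: true, bitsPerWeight: bits };
}

// Billions of params × bits per weight / 8 gives decimal gigabytes.
function weightsGb(paramsB: number, quant: string): number {
  return (paramsB * normalizeQuantSketch(quant).bitsPerWeight) / 8;
}

// Bandwidth-bound projection: all weights must be resident, but only the
// active (MoE) weights are read per token. bandwidthGBps is in GB/s.
function speedSketch(opts: {
  sizeB: number;
  activeB?: number;
  quant: string;
  bandwidthGBps: number;
  vramGb: number;
}): { tokPerSec: number; confidence: 'high' | 'low' } {
  const residentGb = weightsGb(opts.sizeB, opts.quant);
  if (residentGb > opts.vramGb) return { tokPerSec: 0, confidence: 'low' };
  const activeGb = weightsGb(opts.activeB ?? opts.sizeB, opts.quant);
  return { tokPerSec: opts.bandwidthGBps / activeGb, confidence: 'high' };
}

// Dense 32B at the placeholder 4.5 bits: 32 × 4.5 / 8 = 18 GB of weights.
const dense32b = speedSketch({ sizeB: 32, quant: 'q4_k_m', bandwidthGBps: 900, vramGb: 24 });
// MoE with 30B total but 3B active: same residency rule, much less read per token.
const moe30b = speedSketch({ sizeB: 30, activeB: 3, quant: 'q4', bandwidthGBps: 900, vramGb: 24 });
// 70B does not fit in 24 GB at this quant, so the projection short-circuits.
const unfit70b = speedSketch({ sizeB: 70, quant: 'Q4_K_M', bandwidthGBps: 900, vramGb: 24 });
const unknownQuant = normalizeQuantSketch('weird-quant');
```

With these placeholder inputs the dense model projects to 900 / 18 = 50 tokens per second, while the MoE model projects higher because only its active parameters drive the per-token read.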
The canonical `QUANT_BITS_PER_WEIGHT` table, `BACKEND_EFFICIENCY` table, and `CPU_BANDWIDTH_FLOOR_GBPS` live as named constants in one place so Phase 2d's parity fixtures can retune them without touching call sites.
No orchestrator, CLI, dashboard, or HTTP wiring yet. Evidence + recency grading (Phase 2c), the cross-source benchmark merge (Phase 2c), the `RankedModel` orchestrator (Phase 2d), and the parity tests against the whichllm reference outputs (Phase 2d) land in subsequent slices. LMLM remains opt-in and disabled by default per Phase 0.
-
1efe708: Adds Phase 2c of the Local Model Lifecycle Manager: the evidence grader, the lineage-aware recency demotion, two seed benchmark source adapters, and the cross-source merge the ranker (Phase 2d) will compose with Phase 2b's VRAM/speed math.
- `gradeEvidence({ observationModel, observationQuant, targetModel, targetQuant, lineagePosition?, observationEvidence? })` returns one of `'direct' | 'variant' | 'base' | 'interpolated' | 'self-reported'` with its calibrated confidence multiplier from `EVIDENCE_CONFIDENCE`. `'self-reported'` is an absorbing tag (no upgrade); `-GGUF` / `-MLX` / `-AWQ` / `-GPTQ` mirror suffixes are stripped before model comparison so `Qwen/Qwen3-32B-GGUF` and `Qwen/Qwen3-32B` collapse to one identity. Quant alias resolution goes through `normalizeQuantId` so `'q4_k_m'` and `'Q4_K_M'` match.
- `applyRecencyDecay({ observedAt, snapshotDate, lineagePosition? })` ages observations on an exponential curve with `HALFLIFE_MONTHS = 9` and applies a per-generation `LINEAGE_STEP_PENALTY = 0.6` multiplier on top. The final weight clamps to `MIN_RECENCY_WEIGHT = 0.05` so no observation is fully zeroed out. Future-dated observations and malformed ISO strings degrade safely to age-zero rather than throwing.
- `openLlmLeaderboardSource` and `huggingFacePopularitySource` implement the new `BenchmarkSource` interface. Both take an injected `Fetcher` so CI mocks the wire and the live network is not touched during tests. Every failure path (network, schema, parse) surfaces a structured `SourceWarning` rather than throwing, the same discipline as Phase 2a's frozen snapshot loader. The leaderboard adapter emits `direct` observations across the leaderboard's benchmark slugs; the popularity adapter emits a synthetic `'hf-popularity'` benchmark from `downloads + likes × LIKE_WEIGHT`, normalised against the per-fetch maximum.
- `mergeBenchmarks({ observations, target, snapshotDate, sourceWeights? })` weights each observation by `evidenceConfidence × recencyWeight × sourceWeight`, normalises source-native scales into `[0, 1]`, and emits `{ score (0–100), confidence: 'high' | 'medium' | 'low', contributions[] }`. `'high'` requires at least one `direct` observation with `recencyWeight ≥ 0.8`; `'low'` applies when no observation graded above `interpolated` or every combined weight is `< 0.3`. `DEFAULT_SOURCE_WEIGHTS` defaults popularity to a quarter of a leaderboard score; callers override via `sourceWeights`. Unknown sources fall back to `DEFAULT_UNKNOWN_SOURCE_WEIGHT = 0.5`.
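The recency decay described above can be sketched with the three constants the entry names. The month-length conversion, the reading of `lineagePosition` as "generations behind, with 0 meaning current", and the function name are assumptions; the real `applyRecencyDecay` may differ in those details.

```typescript
// Constants as stated in the changelog entry.
const HALFLIFE_MONTHS = 9;
const LINEAGE_STEP_PENALTY = 0.6;
const MIN_RECENCY_WEIGHT = 0.05;
// Assumption: convert milliseconds to months using the mean month length.
const MS_PER_MONTH = 30.4375 * 24 * 60 * 60 * 1000;

function recencyWeightSketch(opts: {
  observedAt: string;
  snapshotDate: string;
  lineagePosition?: number; // assumed: 0 = current generation
}): number {
  const rawAgeMonths =
    (Date.parse(opts.snapshotDate) - Date.parse(opts.observedAt)) / MS_PER_MONTH;
  // Malformed dates produce NaN and future dates a negative age; both
  // degrade to age zero instead of throwing.
  const ageMonths = Number.isFinite(rawAgeMonths) && rawAgeMonths > 0 ? rawAgeMonths : 0;
  const decay = Math.pow(0.5, ageMonths / HALFLIFE_MONTHS);
  const lineage = Math.pow(LINEAGE_STEP_PENALTY, opts.lineagePosition ?? 0);
  // The floor keeps very old observations from being zeroed out entirely.
  return Math.max(MIN_RECENCY_WEIGHT, decay * lineage);
}

const freshWeight = recencyWeightSketch({ observedAt: '2025-01-01', snapshotDate: '2025-01-01' });
const nineMonthWeight = recencyWeightSketch({ observedAt: '2024-01-01', snapshotDate: '2024-10-01' });
const previousGenWeight = recencyWeightSketch({
  observedAt: '2025-01-01',
  snapshotDate: '2025-01-01',
  lineagePosition: 1,
});
const ancientWeight = recencyWeightSketch({ observedAt: '2015-01-01', snapshotDate: '2025-01-01' });
const malformedWeight = recencyWeightSketch({ observedAt: 'not-a-date', snapshotDate: '2025-01-01' });
```

An observation exactly one half-life old lands near 0.5, one generation of lineage distance multiplies by 0.6, and a decade-old observation is held at the 0.05 floor.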
No orchestrator, CLI, dashboard, or HTTP wiring yet. Phase 2d composes Phase 2c's merge output with Phase 2b's VRAM/speed math into the `RankedModel` orchestrator and adds the whichllm parity fixtures (Q1, Q2). LMLM remains opt-in and disabled by default per Phase 0.
-
e4f070a: Adds Phase 2d of the Local Model Lifecycle Manager: the `RankedModel` orchestrator (`rankModels`) and the two parity fixtures called out in spec success criteria Q1 + Q2.
- `rankModels(input: RankInput): RankResult` composes the Phase 2b math (`estimateVram`, `estimateSpeed`) and the Phase 2c fusion (`mergeBenchmarks`) into a single hardware-aware ranking. The orchestrator is pure: candidates in, ranked models out, no I/O. Won't-fit candidates are filtered from the default result so callers conform to F3 / Q3 without an extra step; `options.includeUnfit: true` keeps them at the bottom with `score: 0` so a dashboard can explain why a popular model is missing. Sorting is deterministic across runs and locales: `score` desc → `estimatedTokPerSec` desc → `hfRepoId` ascending code-point order (we deliberately avoid `localeCompare` because case-sensitivity flips silently between CI environments and would corrupt the parity fixtures).
- `RankedModel` matches the spec's Core types block (proposal.md lines 124–138) and adds the full per-contributor breakdown (`vramEstimate`, `speedEstimate`, `benchmarkScore`) so the dashboard's "why this score?" tooltip and the Phase 5b proposal-justification renderer can show provenance without re-running the math. The row's `evidence` field is the weakest grade among the merged contributions (operators read it as "how trustworthy is the supporting evidence?"), so a single self-reported observation flags the row even when a direct observation also contributes.
- `LiveObservation = BenchmarkObservation & { hfRepoId }` carries the model anchor the Phase 2c `BenchmarkObservation` shape deliberately omits. The orchestrator filters live observations by `hfRepoId` before handing the per-candidate slice to `mergeBenchmarks`; Phase 6's scheduler will refactor the source adapters to emit `LiveObservation[]` directly so the model dimension stops being re-stitched at call time.
- `scaleScore` folds the merge's `confidence` label and the speed estimator's `confidence` band into the orchestrator-level score (`BENCHMARK_CONFIDENCE_MULTIPLIER = { high: 1, medium: 0.85, low: 0.6 }`, `SPEED_CONFIDENCE_MULTIPLIER = { high: 1, medium: 0.9, low: 0.75 }`). This is what makes Q4 / Q5 hold at the `RankedModel.score` boundary: the merge's weighted-mean math collapses to the same raw score for a single-observation direct vs self-reported, so the confidence label is the only signal that distinguishes them in that case, and the orchestrator's score field has to carry the signal forward for downstream ranking.
- Parity fixtures `tests/ranker/parity/m3-max-36gb.json` and `tests/ranker/parity/rtx-4090-24gb.json` pin the top-1 model id (`deepseek-ai/DeepSeek-R1-Distill-Qwen-32B-GGUF`) and a `[scoreMin, scoreMax]` band `[55, 60]` for the two hardware profiles the spec names. The bundled seed benchmark snapshot is the source of truth; CI never invokes whichllm. Refreshing the fixtures is a manual maintenance task tied to each v1.x release; the file format is intentionally small so the diff review is fast.
- `RankerWarning` surfaces degraded paths the algorithm took. `snapshot_unavailable` fires when the caller passes the empty-snapshot fallback envelope (S4) and is emitted once per call, not per candidate, so the orchestrator's invariant "never throws" composes cleanly with the merge's invariant "empty input → low confidence, score 0".
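The three-key ordering described for `rankModels` can be illustrated with a comparator that never consults the locale. The `Row` shape is a trimmed, hypothetical stand-in for `RankedModel`, and the repo ids are made up for the example.

```typescript
// Hypothetical, trimmed-down row: only the three sort keys.
interface Row {
  hfRepoId: string;
  score: number;
  estimatedTokPerSec: number;
}

// Compares by Unicode code point, independent of locale and ICU data.
// Array.from iterates a string by code point rather than by UTF-16 unit.
function compareCodePoints(a: string, b: string): number {
  const left = Array.from(a);
  const right = Array.from(b);
  const shared = Math.min(left.length, right.length);
  for (let i = 0; i < shared; i++) {
    const diff = ((left[i] ?? '').codePointAt(0) ?? 0) - ((right[i] ?? '').codePointAt(0) ?? 0);
    if (diff !== 0) return diff;
  }
  return left.length - right.length;
}

// score desc, then estimatedTokPerSec desc, then hfRepoId ascending.
function rankOrder(a: Row, b: Row): number {
  return (
    b.score - a.score ||
    b.estimatedTokPerSec - a.estimatedTokPerSec ||
    compareCodePoints(a.hfRepoId, b.hfRepoId)
  );
}

// Made-up repo ids, chosen so that every tie-break level is exercised.
const unsortedRows: Row[] = [
  { hfRepoId: 'qwen/model-b', score: 57, estimatedTokPerSec: 30 },
  { hfRepoId: 'Qwen/model-a', score: 57, estimatedTokPerSec: 30 },
  { hfRepoId: 'meta/model-c', score: 57, estimatedTokPerSec: 42 },
  { hfRepoId: 'acme/model-d', score: 80, estimatedTokPerSec: 5 },
];
const rankedIds = [...unsortedRows].sort(rankOrder).map((row) => row.hfRepoId);
```

In code-point order an uppercase `Q` (U+0051) sorts before a lowercase `q` (U+0071) on every host, which is the property a locale-aware comparison does not guarantee.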
The `PoolManager` orchestrator that combines this ranking layer with the install adapter, allowlist enforcement, and budget-driven eviction lands in Phase 3c. The `LocalModelResolver` integration, proposal engine + schema generalization, background scheduler, and HTTP / WS / CLI / dashboard surfaces ship in Phases 4–9 per the spec. LMLM remains opt-in and disabled by default per Phase 0; nothing in this slice changes the orchestrator's behavior on a config without a `localModels` block (N4).
-
7eacc57: Adds Phase 3a of the Local Model Lifecycle Manager: the pool-state persistence primitive and the lowest-score-LRU eviction planner the Phase 3b Ollama installer + `PoolManager` will compose.
- `PoolStateStore` atomically persists the pool record to `~/.harness/local-models/pool.json` via the same tmp + rename pattern the HuggingFace cache uses (O2 in the spec). `load()` tolerates missing (no warning: fresh install), malformed-JSON, schema-version-mismatched, and shape-mismatched files by resetting to `EmptyPoolState()` and emitting a single structured warning; the store never throws on a degraded file. The persisted envelope is versioned (`POOL_STATE_VERSION = 1`) so a Phase 3+ format change resets safely instead of silently consuming stale data.
- `update(mutator)` is the single mutation path. After every call the store recomputes `diskUsedGb` from the entry sum so a caller cannot drift the field away from `entries.sizeOnDiskGb`; the "derived data" invariant lives in the store, not at each call site. `snapshot()` returns a structured clone so reads cannot leak references back into the authoritative record.
- `planEviction({ state, freeBudgetGb })` returns an `EvictionPlan` whose `evict[]` is ordered lowest-`currentScore` first, with ties broken by oldest `lastUsedAt` (treating `null` as oldest so unused fresh installs evict before recently-resolved entries at the same score) and then oldest `installedAt`. `freedGb` is the cumulative `sizeOnDiskGb` of the selection; `remainingNeededGb` is the shortfall when the pool cannot satisfy the budget. The function is pure: no I/O, no mutation, no throws on negative / zero / oversized budgets.
- A `PoolFilesystem` port mirrors the cache's `CacheFilesystem` so tests substitute an in-memory implementation without touching the disk; the two ports stay decoupled so neither module forces a shape on the other.
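The eviction ordering and the `freedGb` / `remainingNeededGb` bookkeeping can be sketched as a pure function. The entry shape is trimmed to the fields the entry mentions, the positional signature differs from the real `planEviction({ state, freeBudgetGb })`, and treating `freeBudgetGb` as "gigabytes that must be freed" is an assumption.

```typescript
// Hypothetical, trimmed-down shapes based on the field names in the entry.
interface PoolEntrySketch {
  name: string;
  currentScore: number;
  sizeOnDiskGb: number;
  lastUsedAt: string | null;
  installedAt: string;
}
interface EvictionPlanSketch {
  evict: PoolEntrySketch[];
  freedGb: number;
  remainingNeededGb: number;
}

const compareNumbers = (x: number, y: number): number => (x < y ? -1 : x > y ? 1 : 0);
// null sorts as the oldest possible timestamp.
const lastUsedMs = (entry: PoolEntrySketch): number =>
  entry.lastUsedAt === null ? -Infinity : Date.parse(entry.lastUsedAt);

function planEvictionSketch(entries: PoolEntrySketch[], freeBudgetGb: number): EvictionPlanSketch {
  // Negative and NaN budgets mean nothing needs to be freed.
  const neededGb = Number.isNaN(freeBudgetGb) ? 0 : Math.max(0, freeBudgetGb);
  // Copy before sorting so the caller's array is never mutated.
  const ordered = [...entries].sort(
    (a, b) =>
      compareNumbers(a.currentScore, b.currentScore) ||
      compareNumbers(lastUsedMs(a), lastUsedMs(b)) ||
      compareNumbers(Date.parse(a.installedAt), Date.parse(b.installedAt)),
  );
  const evict: PoolEntrySketch[] = [];
  let freedGb = 0;
  for (const entry of ordered) {
    if (freedGb >= neededGb) break;
    evict.push(entry);
    freedGb += entry.sizeOnDiskGb;
  }
  return { evict, freedGb, remainingNeededGb: Math.max(0, neededGb - freedGb) };
}

const samplePool: PoolEntrySketch[] = [
  { name: 'a', currentScore: 10, sizeOnDiskGb: 5, lastUsedAt: '2025-01-02', installedAt: '2025-01-01' },
  { name: 'b', currentScore: 10, sizeOnDiskGb: 4, lastUsedAt: null, installedAt: '2025-01-01' },
  { name: 'c', currentScore: 50, sizeOnDiskGb: 20, lastUsedAt: null, installedAt: '2025-01-01' },
];
const planFor8Gb = planEvictionSketch(samplePool, 8);
const planFor100Gb = planEvictionSketch(samplePool, 100);
const planForNegative = planEvictionSketch(samplePool, -5);
```

Here `b` is evicted before `a` because both share the lowest score and `b` has never been used; asking for 100 GB evicts everything and reports the 71 GB shortfall rather than throwing.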
The installer interface, the `PoolManager` orchestration layer that combines this store with allowlist enforcement and an Ollama install adapter, the `LocalModelResolver` integration that consumes the pool state as the resolver's candidate list, the proposal engine + schema generalization, the background scheduler, and the HTTP / WS / CLI / dashboard surfaces all land in Phases 3b–8. LMLM remains opt-in and disabled by default per Phase 0; nothing in this slice changes the orchestrator's behavior on a config without a `localModels` block.
-
2a236ba: Adds Phase 3b of the Local Model Lifecycle Manager: the install-adapter layer the Phase 3c `PoolManager` orchestrator will compose with Phase 3a's `PoolStateStore` + `planEviction`.
- `InstallAdapter` contract (`install`, `evict`, `list`, `inspect`) is transport-agnostic. In-band failures of `install` / `evict` (target missing, `install_failed`, `not_in_pool`) resolve to `InstallResult` with `status: 'error'` so the manager can `switch (result.code)` cleanly; out-of-band failures (`parse_failed`, `advisory_only`) throw `InstallError` with the same stable `InstallErrorCode` taxonomy higher layers branch on (`advisory_only`, `failed_target_missing` (D13), `installer_unavailable` (S6), `install_failed` (S7), `not_in_pool` (D12), `parse_failed`).
- `OllamaInstallAdapter` speaks `/api/pull` (NDJSON-streamed progress decoded into typed `InstallEvent`s: `pulling | progress | success | error`), `/api/delete`, `/api/tags`, and `/api/show` against a configurable endpoint (default `http://localhost:11434`). Mid-stream cancellation via `AbortSignal` resolves to `install_failed` so the manager (S7) can decide whether to invoke `evict` for partial-byte cleanup. Network rejects map to `installer_unavailable`; 404s map to `failed_target_missing`; malformed NDJSON lines are logged via `onWarn` and skipped without breaking the stream. A faulty `onEvent` consumer is caught and logged so it cannot strand an in-flight install.
- `AdvisoryInstallAdapter` covers LM Studio / vLLM / llama.cpp (D4). `install` and `evict` reject with `InstallError('advisory_only', …)` carrying the copy-paste command (`lms get …`, `vllm serve …`, `llama-server -m …`) the operator runs manually; `list` returns `[]` (the resolver probe loop is authoritative for advisory backends); `inspect` rejects with `advisory_only`. Names are shell-quoted on render so a hostile-looking model id cannot break the rendered command.
- `nullInstallAdapter()` ships as the manager's default when LMLM is disabled and as a test seam for scenarios that don't exercise the install path; every method rejects with `installer_unavailable` so an accidental invocation surfaces structurally instead of as an `undefined` method call.
- `InstallError.toJSON()` preserves the `code` discriminant across the structured-logger boundary so an operator reading `~/.harness/logs/orchestrator.jsonl` keeps the error taxonomy that `JSON.stringify` would otherwise drop.
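The NDJSON decoding behaviour described for the Ollama adapter (typed events, malformed lines warned about and skipped) might be sketched as below. The event payload fields and the rules for mapping a record to an event type are assumptions for illustration; the sketch works on an already-buffered string rather than a live stream.

```typescript
// Hypothetical event payloads: the entry names the four event kinds but not
// their fields.
type InstallEventSketch =
  | { type: 'pulling'; status: string }
  | { type: 'progress'; completed: number; total: number }
  | { type: 'success' }
  | { type: 'error'; message: string };

function decodePullStream(body: string, onWarn: (message: string) => void): InstallEventSketch[] {
  const events: InstallEventSketch[] = [];
  for (const line of body.split('\n')) {
    if (line.trim() === '') continue;
    let record: Record<string, unknown>;
    try {
      const parsed: unknown = JSON.parse(line);
      if (typeof parsed !== 'object' || parsed === null) throw new Error('not an object');
      record = parsed as Record<string, unknown>;
    } catch {
      // A bad line is reported and skipped; later lines still decode.
      onWarn(`skipping malformed NDJSON line: ${line.slice(0, 40)}`);
      continue;
    }
    const { error, status, completed, total } = record;
    if (typeof error === 'string') {
      events.push({ type: 'error', message: error });
    } else if (status === 'success') {
      events.push({ type: 'success' });
    } else if (typeof completed === 'number' && typeof total === 'number') {
      events.push({ type: 'progress', completed, total });
    } else if (typeof status === 'string') {
      events.push({ type: 'pulling', status });
    } else {
      onWarn('skipping NDJSON line with no recognised fields');
    }
  }
  return events;
}

const pullWarnings: string[] = [];
const sampleStream = [
  '{"status":"pulling manifest"}',
  '{"status":"pulling layer","total":100,"completed":40}',
  'not-json',
  '{"status":"success"}',
].join('\n');
const decodedEvents = decodePullStream(sampleStream, (message) => pullWarnings.push(message));
```

The malformed third line produces one warning and no event, and the trailing `success` record is still decoded.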
The `PoolManager` orchestrator that combines this layer with allowlist enforcement and budget-driven eviction lands in Phase 3c. The `LocalModelResolver` integration, proposal engine + schema generalization, background scheduler, and HTTP / WS / CLI / dashboard surfaces ship in Phases 4–9 per the spec. LMLM remains opt-in and disabled by default per Phase 0; nothing in this slice changes the orchestrator's behavior on a config without a `localModels` block.
-
3588062: Adds Phase 3c of the Local Model Lifecycle Manager: the `PoolManager` orchestrator that composes Phase 3a's `PoolStateStore` + `planEviction` with Phase 3b's `InstallAdapter` into the single high-level API later phases consume.
- `PoolManager.install` runs the full slot pipeline in one call: allowlist gate against `allowedOrgs` / `allowedFamilies` (D1, F8) → idempotent short-circuit when the entry already exists → `installer.inspect` to resolve the disk footprint (skipped when the caller passes `sizeOnDiskGb`, the path Phase 5b's proposal engine prefers) → capacity check against `diskBudgetGb - diskUsedGb` → pre-commit `installer.evict` per `planEviction` plan in lowest-score-LRU order (F5) → `installer.install` → append `PoolEntry` + persist atomically (O2 via Phase 3a). Returns a discriminated `InstallPoolResult` whose `evicted: PoolEntry[]` lists exactly what changed.
- Budget enforcement is hard at the engine layer (S5): an install whose target exceeds even the fully-evicted pool resolves to `{ status: 'error', code: 'budget_exceeded' }` without invoking the installer. Every Phase 3b error code (`advisory_only`, `failed_target_missing`, `installer_unavailable`, `install_failed`, `not_in_pool`, `parse_failed`) propagates unchanged so the proposal engine (Phase 5b) and scheduler (Phase 6) branch on the same taxonomy.
- `install_failed` triggers a best-effort `installer.evict` cleanup of partial bytes (S7); `installer_unavailable` does not (the installer is down, so cleanup wouldn't reach it; S6); `failed_target_missing` does not (nothing was downloaded; D13).
- `PoolManager.evict` invokes `installer.evict`, removes the entry from pool state, and persists once. An installer reply of `not_in_pool` is treated as silent D12 drift reconciliation: the entry is dropped from pool state and the result carries `reconciled: true`. An `installer_unavailable` reply preserves pool state (S6: keep the operator's record until we can confirm the install backend agrees).
- `PoolManager.reconcile()` lists the installer's models and prunes pool entries the installer no longer reports (D12, F10 primitive). Auto-import is not done, since that would cross the autonomy boundary (D1). A transport failure leaves pool state untouched and emits `onWarn` so the scheduler's next tick can retry.
- `PoolManager.markUsed(name)` and `PoolManager.updateScores(updates)` are the bookkeeping seams Phase 4's `LocalModelResolver` and Phase 6's scheduler will call after each resolved dispatch / re-rank tick. Both persist once per call and silently no-op when no update applies.
- `PoolManager.configurePool(partial)` is the Phase 7 CLI seam for `pool {set-budget, allow-org, allow-family}`. Updates only the supplied fields; existing entries are preserved.
- `PoolManager.snapshot()` and `PoolManager.isAllowed({ hfRepoId, family? })` are the read-only seams every consumer (resolver, proposal engine, dashboard, CLI) uses. Org matching is case-sensitive (HF registry truth); family matching is case-insensitive on the operator-typed slug.
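Two of the decisions in this pipeline are simple enough to sketch as pure functions: the hard budget check that runs before the installer is touched, and the rule for when a failed install triggers partial-byte cleanup. The function names and the `mustFreeGb` field are hypothetical; the thresholds follow the wording of the entry above.

```typescript
type CapacityDecision =
  | { status: 'ok'; mustFreeGb: number }
  | { status: 'error'; code: 'budget_exceeded' };

// A target larger than the whole budget cannot fit even after evicting every
// entry, so it is rejected up front. Otherwise report how much must be freed.
function checkCapacitySketch(opts: {
  diskBudgetGb: number;
  diskUsedGb: number;
  targetGb: number;
}): CapacityDecision {
  if (opts.targetGb > opts.diskBudgetGb) return { status: 'error', code: 'budget_exceeded' };
  const freeGb = opts.diskBudgetGb - opts.diskUsedGb;
  return { status: 'ok', mustFreeGb: Math.max(0, opts.targetGb - freeGb) };
}

// Only install_failed leaves partial bytes that a reachable installer can
// clean up; the other codes either downloaded nothing or cannot be reached.
function needsPartialCleanup(code: string): boolean {
  return code === 'install_failed';
}

const tightFit = checkCapacitySketch({ diskBudgetGb: 100, diskUsedGb: 90, targetGb: 25 });
const roomyFit = checkCapacitySketch({ diskBudgetGb: 100, diskUsedGb: 10, targetGb: 25 });
const tooLarge = checkCapacitySketch({ diskBudgetGb: 100, diskUsedGb: 0, targetGb: 120 });
```

With a 100 GB budget and 90 GB in use, a 25 GB target needs 15 GB freed before the install starts; a 120 GB target is refused outright because no eviction plan could make room for it.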
The CLI subcommands, `LocalModelResolver` integration, proposal engine + schema generalization, background scheduler, and HTTP / WS / dashboard surfaces ship in Phases 4–9 per the spec. LMLM remains opt-in and disabled by default per Phase 0; nothing in this slice changes the orchestrator's behavior on a config without a `localModels` block.
Patch Changes
-
5f9ed8c: Scaffolds the Local Model Lifecycle Manager (LMLM), Phase 0.
- New package `@harness-engineering/local-models` (empty barrel, no business logic yet).
- New types in `@harness-engineering/types`: `LocalModelsConfig`, `LocalModelsPoolConfig`, `LocalModelsRefreshConfig`, `LocalModelsInstallerConfig`, `LocalModelsHardwareOverride`, plus platform/installer unions.
- New optional `localModels` block on `HarnessConfigSchema` in the CLI, with Zod defaults that match the spec (24h refresh, 100GB budget, Ollama installer, opt-in disabled by default).
Disabled by default; `harness validate` on existing configs remains green. Hardware detection, ranking, pool management, installer, proposal lifecycle, scheduler, HTTP/WS surfaces, CLI commands, and dashboard panel land in subsequent phases per `docs/changes/local-model-lifecycle-manager/proposal.md`.
-
Updated dependencies [5f9ed8c]
-
Updated dependencies [318b878]
- @harness-engineering/types@0.16.0