
Feature/refactor n audit #47

Merged
cryptopoly merged 151 commits into staging from feature/refactor-n-audit
May 10, 2026
Conversation

@cryptopoly
Owner

No description provided.

cryptopoly added 30 commits May 9, 2026 10:37

Release v0.7.6: HTML Challenge + UI scale + retry
Sets the floor before splitting state.py / inference.py / video_runtime.py
in Phase 1. Captures coverage + perf baselines, lifts route coverage
from 3 of 21 routes to 10, fixes two app.state.engine bugs that would
500 in production, and converts CI from a single Ubuntu runner to a
3-OS matrix (Windows advisory until Phase 4).

Tests
- Python: 1284 → 1302 (+18 contract tests for benchmarks, cache,
  finetuning, metrics, openai_compat, server, storage, prompts)
- TypeScript: 335 → 340 (+5 import-smoke tests for the four largest
  untested feature tabs: HtmlChallengeTab, CompareView, VideoStudioTab,
  ImageStudioTab)
- vitest scoped to src/ — drops 84 phantom tests that came from stale
  worktree clones under .claude/worktrees/

Coverage
- Python line coverage 61.7 % captured to COVERAGE_BASELINE.md, gated
  at 60 % floor in CI
- TS line coverage 60.2 % of imported source captured

Bugs fixed
- routes/prompts.py: app.state.engine → chaosengine (route would 500)
- routes/finetuning.py: same fix on /api/adapters

Infrastructure
- .github/workflows/build.yml: matrix over ubuntu-latest, macos-latest,
  windows-latest (advisory). Adds cargo check + coverage gate step.
- vitest.config.ts: include src/**, exclude node_modules + dist +
  src-tauri + vendor + .venv + .claude/worktrees
- pyproject.toml: pytest-cov pinned to dev extra
- package.json: @vitest/coverage-v8 added

Cleanup
- chaos_readme.md (3-line stub) deleted

Docs
- REFACTOR_PLAN.md — 6-phase plan w/ multi-OS + perf guardrails
- COVERAGE_BASELINE.md — floor numbers + lowest/highest modules
- PERF_BASELINE.md — text gen 297 tok/s captured; image + video specs
- scripts/perf-baseline.py — reusable subprocess-isolated capture

First extraction in the state.py split. Sets the facade pattern that
later sub-phases (sessions, models, benchmarks, downloads) follow.

- backend_service/state.py → backend_service/state/__init__.py (rename)
- New: backend_service/state/logs.py — LogManager owning the two ring
  buffers (logs maxlen=120, activity maxlen=60) + subscriber list +
  the two _time_label / _relative_label helpers.
- ChaosEngineState now composes a LogManager and exposes the public
  surface (state.logs, state.activity, state.add_log, state.subscribe_logs,
  state.unsubscribe_logs, state.add_activity) via thin delegate methods +
  read-only properties so existing call sites in routes/server.py,
  routes/setup.py, mcp/loader.py etc. keep working unchanged.
- Dead code removed: ``_relative_label`` static method had no callers.

state/__init__.py: 4418 → 4393 LOC; LogManager isolated for testability.

Tests: 1302 pass + 1 skip (no regression).
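The ring-buffer-plus-facade shape described above can be sketched roughly as follows. Class and method names come from the commit; the bodies and the entry format are illustrative assumptions, not the real state/logs.py:

```python
from collections import deque
from datetime import datetime


class LogManager:
    """Owns the two ring buffers and the subscriber list (sketch of the
    extracted backend_service/state/logs.py; exact fields are assumptions)."""

    def __init__(self) -> None:
        self.logs = deque(maxlen=120)      # server log ring buffer
        self.activity = deque(maxlen=60)   # user-facing activity ring buffer
        self._subscribers: list = []

    def _time_label(self) -> str:
        return datetime.now().strftime("%H:%M:%S")

    def add_log(self, message: str) -> None:
        entry = f"[{self._time_label()}] {message}"
        self.logs.append(entry)
        for callback in list(self._subscribers):
            callback(entry)

    def subscribe_logs(self, callback) -> None:
        self._subscribers.append(callback)

    def unsubscribe_logs(self, callback) -> None:
        self._subscribers.remove(callback)


class ChaosEngineState:
    """Composes a LogManager and re-exposes the public surface via thin
    delegates so existing call sites keep working unchanged."""

    def __init__(self) -> None:
        self._log_manager = LogManager()

    @property
    def logs(self):
        return self._log_manager.logs

    def add_log(self, message: str) -> None:
        self._log_manager.add_log(message)

    def subscribe_logs(self, callback) -> None:
        self._log_manager.subscribe_logs(callback)
```

The deque maxlen values (120 logs, 60 activity) are the ones the commit records; the read-only property plus delegate methods are what keep routes/server.py and friends unchanged.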

Lifts the pure cache-label / profile-change-reasons / metrics-payload
helpers off ``ChaosEngineState`` into ``backend_service/state/metrics.py``.
The 11 methods involved had no dependency on ``self._lock``, the chat
session list, or any other instance state — they only needed the
``runtime`` (already passed in via ``self.runtime``) plus the request
arguments. Now they live as module-level functions, with thin one-line
delegators kept on the class so the heavy internal call sites
(``generate_stream``, ``add_message_variant``, ``run_benchmark``, etc.)
don't need a churn pass.

Methods extracted:
- ``_cache_strategy_label`` → ``native_cache_strategy_label``
- ``_native_cache_label`` → ``native_cache_strategy_label`` (parameterised)
- ``_cache_label`` → ``cache_label``
- ``_loaded_model_metrics_fields`` → ``loaded_model_metrics_fields(runtime)``
- ``_requested_runtime_metrics_fields`` → ``requested_runtime_metrics_fields``
- ``_result_runtime_metrics_fields`` → ``result_runtime_metrics_fields``
- ``_assistant_metrics_payload`` → ``assistant_metrics_payload(runtime, …)``
- ``_stream_assistant_metrics_payload`` → ``stream_assistant_metrics_payload``
- ``_should_reload_for_profile`` → ``should_reload_for_profile``
- ``_cache_profile_change_reasons`` → ``cache_profile_change_reasons``
- ``_runtime_profile_change_reasons`` → ``runtime_profile_change_reasons``

state/__init__.py: 4393 → 4273 LOC. metrics.py: 307 LOC, fully unit-testable
without instantiating a ChaosEngineState.

Tests: 1302 pass + 1 skip (no regression).
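The extraction pattern here, a pure module-level function plus a one-line delegator left on the class, looks roughly like this. The label mapping is hypothetical; only the shape matches state/metrics.py:

```python
# metrics.py-style module: pure functions that take the runtime explicitly.
def cache_label(strategy: str) -> str:
    """Pure helper: no lock, no session list, just its arguments
    (hypothetical mapping for illustration)."""
    return {
        "fp16": "FP16 KV cache",
        "q8": "Quantized (8-bit) KV cache",
    }.get(strategy, strategy)


def loaded_model_metrics_fields(runtime: dict) -> dict:
    """Builds a metrics payload from the runtime snapshot alone."""
    return {
        "model": runtime.get("model_ref"),
        "cache": cache_label(runtime.get("cache_strategy", "fp16")),
    }


class ChaosEngineState:
    """Heavy internal call sites keep calling the old private method name;
    the one-line delegator avoids a churn pass over generate_stream etc."""

    def __init__(self) -> None:
        self.runtime = {"model_ref": "demo-model", "cache_strategy": "q8"}

    def _loaded_model_metrics_fields(self) -> dict:
        return loaded_model_metrics_fields(self.runtime)
```

The module-level function is unit-testable without instantiating the state object, which is the point the commit makes about metrics.py.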

routes/setup.py at 1932 LOC mixed nine concerns: pip + system + CUDA
torch + GPU bundle install, capability refresh, LongLive install,
mlx-video Wan install, and turbo-binary version detection. Two of
those concerns — LongLive and Wan — are background-job patterns that
each own their own state class + worker + 2-3 endpoints, with no
overlap with the rest of the file. Lifting them out clears ~500 LOC
from the main module.

- routes/setup.py → routes/setup/__init__.py (rename)
- New: routes/setup/longlive.py (244 LOC) — clone + venv + pip + 8 GB
  weights download. Owns _LongLiveJobState, _LONGLIVE_JOB singleton,
  _longlive_job_worker, and the start + status endpoints.
- New: routes/setup/wan_install.py (285 LOC) — Apple Silicon Wan2.x →
  MLX conversion path. Owns _WanInstallJobState, _WAN_INSTALL_JOB
  singleton, _wan_install_job_worker, and start + status + inventory
  endpoints.
- routes/setup/__init__.py: 1932 → 1441 LOC; sub-routers included via
  router.include_router() at the bottom of the file. The remaining
  concerns (pip / system / CUDA torch / GPU bundle / capability
  refresh / turbo) stay in __init__.py for now — each is its own
  cohesive group but they share the GPU bundle's helper surface
  (_run_pip_install, _purge_*) and a follow-up split would need to
  decide where those live first.

Test patches updated to target the new module paths:
- tests/test_mlx_video_wan_installer.py — patch
  backend_service.routes.setup.wan_install._wan_install_job_worker
  + ...wan_install.threading.Thread (instead of the old top-level)
- tests/test_setup_routes.py InstallLongLiveTests setUp — reset the
  _LONGLIVE_JOB singleton via backend_service.routes.setup.longlive
  module reference, since the singleton lives there now and the
  start_install_longlive endpoint reads it from longlive.py's
  module namespace, not from setup/__init__.

Tests: 1302 pass + 1 skip. Endpoint count unchanged at 13.
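The singleton-reset gotcha can be reproduced in miniature. The inline module below is a hypothetical stand-in for routes/setup/longlive.py; the point is that the endpoint reads the singleton from its own module globals, so test setUp must reset it there, not on a facade copy:

```python
import sys
import types

# Simulate a longlive.py-style module: the job singleton lives here,
# and the endpoint function reads it through this module's globals.
longlive = types.ModuleType("longlive_demo")
exec(
    """
_LONGLIVE_JOB = None

class _LongLiveJobState:
    def __init__(self):
        self.status = "running"

def start_install_longlive():
    global _LONGLIVE_JOB
    if _LONGLIVE_JOB is not None:
        return {"detail": "already running"}
    _LONGLIVE_JOB = _LongLiveJobState()
    return {"status": _LONGLIVE_JOB.status}
""",
    longlive.__dict__,
)
sys.modules["longlive_demo"] = longlive

# setup/__init__.py-style facade re-export: a *binding* to the same function.
start_install_longlive = longlive.start_install_longlive


def reset_longlive_job():
    """What a test setUp must do: reset the singleton on the owning module."""
    longlive._LONGLIVE_JOB = None
```

Setting an attribute on the facade instead of on `longlive` itself would leave the endpoint's own global untouched, which is exactly why the test patches were retargeted.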

Sets up the inference subpackage scaffolding so the bigger engine
extractions (MLXWorkerEngine, LlamaCppEngine, RuntimeController) can
land as separate commits. Mirrors the state.py + routes/setup.py
patterns from earlier in Phase 1.

- backend_service/inference.py → backend_service/inference/__init__.py
  (rename — fixes the ``Path(__file__).resolve().parents[1]`` constant
  which would otherwise resolve to ``backend_service/`` instead of the
  repo root after the package rename. New parents[2] gets back to the
  workspace root.)
- New: backend_service/inference/_constants.py — shared constants
  (WORKSPACE_ROOT + 4 timeout values) lifted out so submodules can
  import without circling through __init__.py
- New: backend_service/inference/jsonrpc.py — JsonRpcProcess (214 LOC),
  the subprocess JSON-RPC bridge that owns the MLX worker's lifecycle.
  Self-contained — depends only on stdlib + the new _constants module.

inference/__init__.py: 3574 → 3395 LOC. Public re-exports preserve
``from backend_service.inference import JsonRpcProcess, RuntimeController, …``
for the 4 callers (vllm_engine, state, helpers/system, tests/).

Tests: 1302 pass + 1 skip (no regression).
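The parents[1] to parents[2] fix falls out of how pathlib indexes ancestors: turning a module into a package adds one directory level, so the same index lands one level lower. A minimal illustration with placeholder paths:

```python
from pathlib import Path


def workspace_root_for(module_file: str, parents_index: int) -> Path:
    """Resolve a root directory the way the inference code does, via
    Path(__file__).resolve().parents[n]."""
    return Path(module_file).resolve().parents[parents_index]


# Before the rename: backend_service/inference.py (flat module).
flat = "/repo/backend_service/inference.py"

# After the rename: backend_service/inference/__init__.py is one level
# deeper, so the old parents[1] now points at backend_service/ instead
# of the repo root; parents[2] restores the original meaning.
packaged = "/repo/backend_service/inference/__init__.py"
```

`/repo` is a placeholder path, not the project's actual layout on disk.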

Lifts the foundational types every inference engine touches into
``backend_service/inference/base.py``:

- ``RepeatedLineGuard`` — runaway-stream abort guard
- ``BackendCapabilities`` — runtime probe payload (mlx / gguf / vllm
  flags, llama-server paths, version strings)
- ``LoadedModelInfo`` — currently-loaded model dataclass + ``to_dict``
  with deferred capability resolution
- ``GenerationResult`` — single-shot completion + ``to_metrics``
- ``StreamChunk`` — token-stream event dataclass
- ``BaseInferenceEngine`` — abstract interface (``load_model``,
  ``generate``, ``stream_generate``, ``eval_perplexity``, …)

These were the most-imported public symbols from ``inference.py``
(test_inference.py + tests/test_dflash.py + vllm_engine.py + state
all pull at least one of them). The package re-exports the full set
from ``inference/__init__.py`` so the import surface
``from backend_service.inference import LoadedModelInfo, …`` keeps
working unchanged across all 4 callers.

inference/__init__.py: 3395 → 3074 LOC. base.py: 363 LOC, fully
unit-testable without instantiating any concrete engine.

Tests: 1302 pass + 1 skip (no regression).
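A compressed sketch of the abstract-interface shape. Method names come from the commit; the signatures, the StreamChunk fields, and the toy concrete engine are assumptions:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class StreamChunk:
    """Token-stream event (sketch; the real dataclass carries more fields)."""
    text: str
    done: bool = False


class BaseInferenceEngine(ABC):
    """Abstract interface every concrete engine implements."""

    @abstractmethod
    def load_model(self, model_ref: str) -> None: ...

    @abstractmethod
    def generate(self, prompt: str) -> str: ...

    @abstractmethod
    def stream_generate(self, prompt: str): ...


class EchoEngine(BaseInferenceEngine):
    """Minimal concrete engine showing the contract is satisfiable
    without any model files or subprocesses."""

    def load_model(self, model_ref: str) -> None:
        self.model_ref = model_ref

    def generate(self, prompt: str) -> str:
        return prompt.upper()

    def stream_generate(self, prompt: str):
        for word in prompt.split():
            yield StreamChunk(text=word)
        yield StreamChunk(text="", done=True)
```

This is what makes base.py "fully unit-testable without instantiating any concrete engine": the types have no runtime dependencies.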

Two more pieces lifted from inference/__init__.py:

- ``inference/_utils.py`` (39 LOC) — small helpers shared across
  engines and the controller. ``_now_label()`` (timestamp string used
  by every ``LoadedModelInfo``) and ``_normalize_message_content()``
  (coerce OpenAI-style content lists into plain strings).
- ``inference/simple_engines.py`` (198 LOC) — both engines that don't
  touch subprocesses, model files, or KV cache state:
    * ``RemoteOpenAIEngine`` — proxy chat completions to any
      OpenAI-compatible API (base URL + bearer key + model encoded
      into ``model_ref`` as ``remote:<base>|<key>|<model>``).
    * ``MockInferenceEngine`` — placeholder used as the initial
      default before any model is loaded; every method raises with
      a hint about which backend is missing.

inference/__init__.py: 3074 → 2897 LOC. Cumulative inference split
this branch: 3574 → 2897 in __init__ (-677 LOC), with the extracted
modules totaling 832 LOC across 5 focused files (jsonrpc, base,
_constants, _utils, simple_engines).

Tests: 1302 pass + 1 skip (no regression).
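The `remote:<base>|<key>|<model>` convention can be decoded in a few lines. This is a sketch; the real parser in simple_engines.py may handle more edge cases:

```python
def parse_remote_model_ref(model_ref: str) -> dict:
    """Decode the remote:<base>|<key>|<model> encoding used by the
    RemoteOpenAIEngine-style model_ref (illustrative sketch)."""
    if not model_ref.startswith("remote:"):
        raise ValueError(f"not a remote ref: {model_ref!r}")
    payload = model_ref[len("remote:"):]
    # split at most twice so a model name containing '|' stays intact
    base_url, api_key, model = payload.split("|", 2)
    return {"base_url": base_url, "api_key": api_key, "model": model}
```

Packing all three values into the one `model_ref` string lets the remote engine reuse the same load/generate plumbing as local engines without a separate config channel.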

The MLX inference engine now lives in its own module
(``backend_service/inference/mlx_engine.py``, 485 LOC). This is the
biggest single class in the package — wraps a subprocess via the
JSON-RPC bridge and handles the full load / generate / stream /
update_profile / eval lifecycle, plus DFLASH speculative decoding
draft-model resolution.

Dependencies all resolve cleanly through the modules already split:
- base classes from ``inference/base``
- timeouts from ``inference/_constants``
- ``_now_label`` from ``inference/_utils``
- ``JsonRpcProcess`` from ``inference/jsonrpc``
- ``resolve_dflash_target_ref`` from ``backend_service.model_resolution``

inference/__init__.py: 2897 → 2458 LOC. RuntimeController + LlamaCppEngine
are the two big classes still living in __init__; both warrant their
own modules but each pulls 9-30 module-level helpers along with them
which need a separate small extraction pass first.

Tests: 1302 pass + 1 skip (no regression).
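The JSON-RPC bridge's framing layer can be sketched as line-delimited JSON-RPC 2.0. The real JsonRpcProcess also owns the worker's process lifecycle and the timeout handling; this toy class shows only the request/response shape:

```python
import json


class JsonRpcFramer:
    """Line-delimited JSON-RPC 2.0 framing of the kind a subprocess
    bridge writes to the worker's stdin and reads from its stdout
    (sketch; not the real JsonRpcProcess API)."""

    def __init__(self) -> None:
        self._next_id = 0

    def request(self, method: str, params: dict) -> str:
        """Serialise one request as a single newline-terminated line."""
        self._next_id += 1
        return json.dumps({
            "jsonrpc": "2.0",
            "id": self._next_id,
            "method": method,
            "params": params,
        }) + "\n"

    @staticmethod
    def parse(line: str) -> dict:
        """Decode one response line, surfacing worker errors as exceptions."""
        message = json.loads(line)
        if "error" in message:
            raise RuntimeError(message["error"].get("message", "worker error"))
        return message.get("result", {})
```

Keeping the protocol line-delimited means the bridge can read the worker's stdout with plain readline() and never needs a length-prefixed framing layer.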

Largest extraction this branch — 953 LOC out of inference/__init__.py.
The llama.cpp engine + every helper that's only called by it now lives
in ``backend_service/inference/llama_cpp_engine.py`` (981 LOC), and the
shared util helpers move to ``inference/_utils.py``.

llama_cpp_engine.py owns:
- ``LlamaCppEngine`` — the engine class itself (load_model with the
  3-tier strategy fallback chain, build_command with binary routing
  + cache pre-validation, _wait_for_server, generate / stream_generate)
- ``_apply_llama_chat_template_fixes`` (Gemma system-role workaround)
- ``_apply_sampler_kwargs`` (Phase 2.2 sampler forwarding)
- ``_LLAMA_SAMPLER_KEYS`` constant
- ``_friendly_llama_error`` (startup-failure message translator)
- ``_llama_server_help_text`` / ``_llama_server_supports`` /
  ``_llama_server_cache_types`` (binary feature-detection trio)
- ``_LLAMA_HELP_CACHE`` / ``_CACHE_TYPE_CACHE`` / ``_STANDARD_CACHE_TYPES``
  (the caches feature-detection populates)
- ``_resolve_mmproj_path`` (vision projector sibling lookup)
- ``_gguf_startup_fallback_note`` (runtime note builder)

inference/_utils.py grows to take the helpers shared between
LlamaCppEngine + RuntimeController:
- ``_read_text_tail`` / ``_append_runtime_note`` / ``_http_json``
- ``_find_open_port`` / ``_resolve_gguf_path`` / ``_is_local_target``
- ``_looks_like_gguf``

inference/__init__.py: 2458 → 1521 LOC. Cumulative inference split
this branch: 3574 → 1521 in __init__ (-2053 LOC) across 7 focused
modules. RuntimeController is the only big class still there; its
~1050 LOC depend on a wide cross-cutting helper surface that warrants
its own dedicated extraction pass next.

Test patches updated:
- tests/test_inference.py — 16 ``mock.patch("backend_service.inference.
  _llama_server_*")`` calls retargeted to ``…llama_cpp_engine.…``
  because the engine resolves these helpers through its own module
  namespace, not the package facade.

The 8 helpers + 4 constants stay reachable via re-export from
``backend_service.inference`` so existing direct imports
(``_friendly_llama_error``, ``_apply_sampler_kwargs``, etc.) keep
working unchanged.

Tests: 1302 pass + 1 skip (no regression).
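The retargeting is a consequence of how unittest.mock resolves names: a patch must hit the namespace the caller looks the name up in, not the module that originally defined it. A self-contained illustration with two throwaway modules (all names hypothetical):

```python
import sys
import types
from unittest import mock

# helpers_demo defines the function; engine_demo binds a copy at import time.
helpers = types.ModuleType("helpers_demo")
exec("def _llama_server_supports(flag):\n    return False", helpers.__dict__)
sys.modules["helpers_demo"] = helpers

engine = types.ModuleType("engine_demo")
exec(
    "from helpers_demo import _llama_server_supports\n"
    "def build_command():\n"
    "    return ['--flash-attn'] if _llama_server_supports('flash-attn') else []",
    engine.__dict__,
)
sys.modules["engine_demo"] = engine


def patched_in_defining_module():
    # Patching where the function was defined does NOT affect the copy
    # engine_demo bound at import time.
    with mock.patch("helpers_demo._llama_server_supports", return_value=True):
        return engine.build_command()


def patched_in_consuming_module():
    # Patching the namespace the caller resolves through is what works.
    with mock.patch("engine_demo._llama_server_supports", return_value=True):
        return engine.build_command()
```

This is the same reason the 16 test patches had to move from the `backend_service.inference` facade to `llama_cpp_engine`: the engine resolves the helpers through its own module globals.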

…fiers

Sets up the image_runtime subpackage scaffolding so engine extractions
(DiffusersTextToImageEngine 1112 LOC, MfluxImageEngine, PlaceholderImageEngine)
can land as separate commits. Mirrors the patterns from state.py +
inference.py + routes/setup.py earlier in Phase 1.

- backend_service/image_runtime.py → backend_service/image_runtime/__init__.py
  (rename — fixes Path(__file__).resolve().parents[1] which would
  resolve to backend_service/ instead of repo root after the package
  rename. parents[2] gets back to the workspace root.)
- New: backend_service/image_runtime/types.py (156 LOC) — three frozen
  dataclasses every image engine speaks: ImageRuntimeStatus,
  ImageGenerationConfig (40+ opt-in flags: GGUF / Nunchaku / preview-VAE /
  distill-LoRA / fp8 / cache strategy / sampler / CFG decay),
  GeneratedImage.
- New: backend_service/image_runtime/repos.py (262 LOC) — pure-string
  family classifiers (_is_sdxl_repo, _is_flux_repo, _is_flow_matching_repo,
  _gguf_transformer_class_for_repo, _nunchaku_transformer_class_for_repo)
  plus _SDXL_VAE_FIX_REPO + _locate_sdxl_vae_fix_snapshot, _AYS_TIMESTEPS
  schedule arrays, _SAMPLER_REGISTRY, and _apply_scheduler. All pure
  functions on the repo string + diffusers pipeline; nothing here touches
  the filesystem or torch.

Public re-exports preserve every existing import path:
``from backend_service.image_runtime import ImageGenerationConfig,
ImageRuntimeManager, validate_local_diffusers_snapshot`` keeps working
across app.py, state, sdcpp_image_runtime, helpers/{images,video},
video_runtime, tests/.

image_runtime/__init__.py: 2097 → 1761 LOC.

Tests: 1302 pass + 1 skip (no regression).
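A trimmed sketch of the frozen-dataclass shape types.py describes. Only a handful of the 40+ flags are shown, and these particular fields and defaults are assumptions:

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ImageGenerationConfig:
    """Frozen request payload every image engine speaks (tiny sketch of
    image_runtime/types.py; field names and defaults are illustrative)."""
    prompt: str
    width: int = 1024
    height: int = 1024
    steps: int = 20
    use_gguf: bool = False
    cache_strategy: Optional[str] = None


config = ImageGenerationConfig(prompt="a lighthouse at dusk")

# frozen=True means an engine cannot quietly mutate a shared request;
# variations are spelled out explicitly with dataclasses.replace().
preview = replace(config, steps=4)
```

Freezing the config is what makes it safe to hand the same request object to several engines (or retries) without defensive copying.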

Two more pieces from image_runtime/__init__.py:

- backend_service/image_runtime/snapshot.py (185 LOC) —
  validate_local_diffusers_snapshot + supporting _snapshot_retry_guidance
  + _snapshot_visible_label. Walks a downloaded snapshot tree to catch
  incomplete downloads up-front (4 classes of incompleteness:
  missing model_index.json, missing components, broken symlinks,
  missing weight shards). Used by sdcpp_image_runtime, video_runtime,
  helpers/{images,video}, tests.
- backend_service/image_runtime/device.py (95 LOC) —
  _resolve_image_python + _guess_expected_device +
  _windows_cuda_unavailable_message + _is_cuda_torch_unavailable_error.
  All four are device-detection helpers used by the engine + the
  Discover-panel runtime probe. Pure: no torch import, no engine state.

Re-exports keep the public surface unchanged
(``from backend_service.image_runtime import validate_local_diffusers_snapshot``).

image_runtime/__init__.py: 1761 → 1575 LOC.

Test patch updated: tests/test_image_runtime.py
test_detect_device_reports_broken_windows_cuda_torch — patches now
target ``backend_service.image_runtime.device.{platform,_nvidia_gpu_present}``
because ``_windows_cuda_unavailable_message`` resolves both via its
own module namespace, not the package facade.

Tests: 1302 pass + 1 skip (no regression).

The placeholder SVG renderer + its 4 colour helpers (_stable_hash,
_resolve_base_seed, _mix_channel, _rgb_from_hsv) live in
``backend_service/image_runtime/placeholder_engine.py`` (158 LOC).
Renders a deterministic SVG card with the prompt, model, seed, and
hashed pseudo-random colours so the Image Studio panel stays
interactive on machines without diffusers / torch installed.

Self-contained — only depends on stdlib + the ImageGenerationConfig +
GeneratedImage dataclasses from types.py. No torch, no diffusers, no
filesystem touches.

``MAX_IMAGE_SEED`` + ``_resolve_base_seed`` are re-exported through
__init__.py because ``DiffusersTextToImageEngine`` and ``MfluxImageEngine``
both also resolve their seeds via ``_resolve_base_seed`` (both still
in __init__ for now).

image_runtime/__init__.py: 1575 → 1455 LOC.

Tests: 1302 pass + 1 skip (no regression).
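The deterministic-colour idea (hash the prompt and seed, map the digest to HSV) can be sketched as below. The exact channel math is an assumption; the one load-bearing detail is avoiding the built-in hash(), which is salted per process and would break determinism across runs:

```python
import colorsys
import hashlib


def stable_hash(text: str) -> int:
    """Deterministic across processes, unlike built-in hash()
    (sketch of the _stable_hash idea)."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")


def rgb_for(prompt: str, seed: int) -> tuple:
    """Derive a reproducible pseudo-random colour from prompt + seed
    (hypothetical math; only the hash-to-HSV shape matches the source)."""
    digest = hashlib.sha256(f"{prompt}:{seed}".encode()).digest()
    hue = digest[0] / 255.0
    r, g, b = colorsys.hsv_to_rgb(hue, 0.55, 0.9)
    return (round(r * 255), round(g * 255), round(b * 255))
```

The same prompt and seed always render the same card, which is what keeps the placeholder output stable across app restarts on machines without diffusers or torch.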

Native Apple Silicon FLUX runtime now lives in
``backend_service/image_runtime/mflux_engine.py`` (112 LOC). The
engine + ``_mflux_name_for_repo`` repo→name mapper move together since
nothing else uses the mapper.

Self-contained: imports stdlib + ``GeneratedImage`` /
``ImageGenerationConfig`` from types, ``_resolve_base_seed`` from
placeholder_engine. The ``mflux`` package itself is a deferred
import — kept inside ``generate()`` so the module loads cleanly on
non-Apple platforms and the manager can fall back to diffusers when
``probe()`` reports unavailable.

image_runtime/__init__.py: 1455 → 1366 LOC.

Tests: 1302 pass + 1 skip (no regression).
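The deferred-import-plus-probe pattern in miniature. The `mflux` module name is the real optional dependency; the engine and manager shapes are assumptions:

```python
import importlib
import importlib.util


class MfluxStyleEngine:
    """The optional backend is only imported inside generate(), so this
    module loads cleanly on any platform and the manager can fall back
    when probe() reports unavailable (illustrative sketch)."""

    package = "mflux"

    def probe(self) -> bool:
        # Cheap availability check; does not actually import the package.
        return importlib.util.find_spec(self.package) is not None

    def generate(self, prompt: str):
        if not self.probe():
            raise RuntimeError(f"{self.package} is not installed")
        mflux = importlib.import_module(self.package)  # deferred import
        # A real engine would build and run an mflux pipeline here.
        ...


class AlwaysAvailableEngine:
    """Stand-in for a diffusers-style fallback that is always usable."""

    def probe(self) -> bool:
        return True


def pick_engine(engines):
    """Manager-style selection: first engine whose probe() passes."""
    for engine in engines:
        if engine.probe():
            return engine
    return None
```

Keeping the import out of module scope is the whole trick: the file can be imported, listed, and probed on Linux or Windows even though the package only installs on Apple Silicon.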

Mirrors the image_runtime split. Sets up the video_runtime subpackage
scaffolding so engine extractions (DiffusersVideoEngine 1335 LOC,
VideoRuntimeManager) can land as separate commits.

- backend_service/video_runtime.py → backend_service/video_runtime/__init__.py
  (rename — fixes Path(__file__).resolve().parents[1] which would
  resolve to backend_service/ instead of repo root after the package
  rename. parents[2] gets back to the workspace root.)
- New: backend_service/video_runtime/types.py (195 LOC) — three frozen
  dataclasses every video engine speaks:
    * VideoRuntimeStatus — runtime probe payload (engine, device,
      missing deps, GPU memory, torch-install warning).
    * VideoGenerationConfig — request payload mirroring
      ImageGenerationConfig with video-specific footprint (numFrames,
      fps, interpolationFactor + LTX refiner / Wan distill / NF4 / FP8 /
      Nunchaku / preview-VAE / scheduler / CFG-decay / STG-scale).
    * GeneratedVideo — engine output (mp4 bytes + duration + dims +
      effective steps/guidance).

Public re-exports preserve the existing
``from backend_service.video_runtime import VideoRuntimeManager,
VideoGenerationConfig, GeneratedVideo`` paths across app, state,
helpers/video, sdcpp_video_runtime, mlx_video_runtime, tests.

video_runtime/__init__.py: 2378 → 2216 LOC.

Tests: 1302 pass + 1 skip (no regression).

Sets up the api subpackage scaffolding so domain extractions (chat,
image, video, models, setup, storage) can land as separate commits.
TypeScript resolves ``from "../api"`` to ``api/index.ts`` so every
existing import path keeps working unchanged.

- src/api.ts → src/api/index.ts (rename — fixes the relative
  ``./types`` import that would otherwise resolve to a non-existent
  ``src/api/types`` after the rename. Now ``../types``.)
- 119 exports unchanged in this commit; only the file location moved.

Tests: 340 pass (29 files); tsc clean.

13 functions covering catalog / runtime / outputs / progress polling /
downloads / preload / unload / generate / cancel / delete-output now
live in ``src/api/image.ts``. The package facade ``src/api/index.ts``
re-exports the image surface so existing
``import { generateImage } from "../api"`` paths keep working.

- New: src/api/image.ts (112 LOC) — 13 image endpoints
- src/api/index.ts: helpers ``postJson`` / ``patchJson`` / ``deleteJson``
  flipped from private to ``export`` so domain modules can import them.
- Domain re-export block added at the bottom of index.ts.

src/api/index.ts: 1430 → 1368 LOC (after the rename + extract).

Tests: 340 pass (29 files); tsc clean.

16 functions covering catalog + 3 runtime probes (diffusers / LongLive /
mlx-video) / progress / downloads / preload / unload / generate /
cancel / outputs CRUD / blob-URL fetcher now live in
``src/api/video.ts`` (162 LOC).

Mirrors the image-domain shape from Phase 2-2. The 3-runtime probe split
preserves the per-engine probe pattern (LongLive at
``~/.chaosengine/longlive`` vs diffusers torch probe vs mlx-video
Apple-Silicon probe) so the Studio's runtime chips stay independent.

Re-exported from ``./index`` so the existing
``import { generateVideo, getVideoRuntime } from "../api"`` paths in
useVideoState, VideoStudioTab, VideoDiscoverTab, VideoModelsTab,
VideoGalleryTab keep working.

src/api/index.ts: 1368 → 1283 LOC.

Tests: 340 pass; tsc clean.

…ints

Two cohesive clusters split out into separate domain modules:

- ``src/api/setup.ts`` (332 LOC) — all install / refresh endpoints:
  pip + system package install, CUDA torch install with index walk,
  GPU bundle background job + types, LongLive (FU-003) async install +
  types, mlx-video Wan (FU-025) install + types + inventory,
  llama-server-turbo update probe, capability refresh, FU-022 LLM
  prompt enhancer.
- ``src/api/admin.ts`` (187 LOC) — diagnostics + storage:
  DiagnosticsSnapshot + DiagnosticsLogTail + ReextractRuntimeResult +
  fetchDiagnosticsSnapshot + fetchDiagnosticsLogTail + reextractRuntime;
  ModelMoveJobState + StorageSettingsSnapshot + UpdateStoragePathResult +
  getStorageSettings + updateHfCachePath + startModelMove +
  getModelMoveStatus.

Together these account for the heaviest section of api.ts. Re-exported
from ``./index`` so existing call sites in
SettingsPanel / ModelStoragePanel / DiagnosticsPanel / VideoStudioTab /
VideoDiscoverTab / SetupTab / useSettings / useVideoState keep their
``import { ... } from "../../api"`` paths unchanged.

src/api/index.ts: 1283 → 895 LOC. Cumulative Phase 2 split: api.ts
1430 → 895 in index, with 4 domain modules totaling 788 LOC.

Tests: 340 pass (29 files); tsc clean.

Sessions CRUD + message variants (Phase 2.5) + Delve critique
(Phase 3.6) + thread fork (Phase 2.4) + generate / cancel /
generateChatStream (with phase / panic / thermal / logprobs callbacks) +
session document upload / list / delete now live in
``src/api/chat.ts`` (328 LOC).

12 functions, 3 type aliases (ChatStreamPhase, StreamCallbacks,
SessionDocument). The streaming path is the longest single function
in the package — the SSE protocol parser ships with full callback
fan-out for token / reasoning / phase / panic / thermal / logprobs /
done / error events.

``readErrorDetail`` flipped from private to ``export`` so the chat
module's upload + delete paths can use it for user-facing error
messages without duplicating the JSON-vs-text fallback logic.

Re-exported from ``./index`` so existing
``import { generateChatStream, createSession } from "../api"`` paths
in App, useChat, ChatTab, etc. keep working.

src/api/index.ts: 895 → 622 LOC. Cumulative Phase 2 split: api.ts
1430 → 622 in index, with 5 domain modules totaling 1116 LOC.

Tests: 340 pass; tsc clean.

13 functions + 3 type aliases covering search (catalog + HF hub
passthrough), load / unload of the active inference engine, the text-
model download cluster (DownloadStatus + DeleteDownloadResult +
downloadModel + getDownloadStatus + cancelDownload + deleteModelDownload),
conversion, benchmark runs, and library admin (revealModelPath +
deleteModelPath + listHubFiles) now live in ``src/api/models.ts``
(144 LOC).

The ``DownloadStatus`` + ``DeleteDownloadResult`` types are re-exported
through ``./index`` so the image + video domains' ``./index`` imports
continue resolving without churn.

Re-exported from ``./index`` so existing
``import { loadModel, downloadModel } from "../api"`` paths in
useChat / useDownloads / MyModelsTab / OnlineModelsTab / SearchPanel
keep working.

src/api/index.ts: 622 → 559 LOC. Cumulative Phase 2 split: api.ts
1430 → 559 in index across 6 domain modules totaling 1260 LOC. The
remaining ~559 LOC in index.ts is now: imports, module state cache
(api base + token + tauri info), ``apiFetch`` / ``fetchJson`` /
``postJson`` / ``patchJson`` / ``deleteJson`` / ``readErrorDetail``
shared helpers, ``getWorkspace`` + ``checkBackend`` + ``getGpuStatus``
+ ``getSettings`` + ``updateSettings`` + ``getCachePreview`` +
``shutdownServer`` + ``stopManagedBackend`` + ``restartManagedBackend``
+ ``openHtmlChallengeFile``, plus the domain re-export aggregator at
the bottom.

Tests: 340 pass (29 files); tsc clean.

Phase 0 + 1a-1 + 1a-2 + 1b-1 through 1b-5 + 1c-1 through 1c-6 + 1d
(partial) + 2-1 through 2-6 marked DONE / PARTIAL with commit hashes
and remaining work scoped. Adds the cumulative -4,453 LOC mega-file
shrink table so the next session can see where the wins landed.

First splits in the Rust shell — lib.rs at 1808 LOC mixed nine
concerns (constants + types + commands, BackendManager impl, embedded
runtime extraction, binary resolution, settings + port selection,
HTTP probe, lease + orphan cleanup, Windows Job Object management,
signal handling + entry).

Two cleanly self-contained mods extracted first:

- src-tauri/src/windows_job.rs (95 LOC) — Job Object kill-on-close
  for orphan prevention. Already an inline mod; lifted to its own file
  with no logic change. Previously cfg(windows) was applied at the
  inline mod block; now sits as #![cfg(windows)] in the module.
- src-tauri/src/orphans.rs (171 LOC) — three platform variants of
  cleanup_orphaned_backends (Unix ps walk, Windows wmic + tasklist,
  fallback no-op) plus the Unix-only terminate_process_group +
  ORPHAN_COMMAND_MARKERS constant. Self-contained — only stdlib +
  libc deps.

lib.rs imports both via mod declarations + a use clause for
cleanup_orphaned_backends so the existing call site at line 236 stays
unchanged.

src-tauri/src/lib.rs: 1808 → 1578 LOC (-230). cargo check clean on
darwin (Windows code paths verified by cfg gate consistency).

src-tauri/src/settings.rs (129 LOC) — desktop settings JSON reader +
backend port selection. Owns:

- SavedDesktopSettings struct (preferred_server_port, allow_remote_connections,
  hf_cache_path; auto_start_server stays inside the Python sidecar)
- settings_path() — platform-aware ~/.chaosengine/settings.json (Unix)
  / %APPDATA%\.chaosengine\settings.json (Windows)
- saved_backend_port / saved_allow_remote_connections / saved_hf_cache_path —
  three readers with the ~ expansion logic for HF cache pinning
- selected_bind_host ("127.0.0.1" vs "0.0.0.0") + select_backend_port
  (preferred + OS-fallback with warning string)

lib.rs imports the four functions it actually calls (saved_*, select_backend_port,
selected_bind_host) via a single use statement; no other call-site
churn. Fixes a leftover unused-import warning on TcpListener since
the only consumer (select_backend_port) moved out.

src-tauri/src/lib.rs: 1578 → 1475 LOC. Cumulative Phase 3 split:
1808 → 1475 in lib.rs (-333 LOC) across 3 focused modules totaling
395 LOC.

src-tauri/src/probe.rs (118 LOC) — backend HTTP probe + lifecycle
helpers. Owns:

- ExistingBackendProbe struct (workspace_root + python_executable
  parsed from /api/health response)
- port_responding / wait_for_port — TCP-only liveness checks. The
  wait_for_port signature now takes the poll interval as a parameter
  rather than referencing the lib.rs-private BACKEND_POLL_INTERVAL,
  which keeps the probe module a pure leaf with no module-back-reference.
- backend_http_json — minimal hand-crafted HTTP/1.1 GET/POST that
  parses JSON. Avoids pulling reqwest into the Tauri shell just for
  five endpoints.
- probe_chaosengine_backend — /api/health → ExistingBackendProbe.
- fetch_backend_api_token — /api/auth/session → token string.
- request_backend_shutdown — POST /api/server/shutdown + 3-second
  port-free wait.

lib.rs imports the six functions its call sites use. The lone caller of wait_for_port
in BackendManager::bootstrap now passes BACKEND_POLL_INTERVAL explicitly.
Three unused stdlib imports cleaned up (Read, Write, net::TcpStream
were only consumed by backend_http_json which moved out).

src-tauri/src/lib.rs: 1475 → 1392 LOC. Cumulative Phase 3 split:
1808 → 1392 in lib.rs (-416 LOC) across 4 focused modules totaling
513 LOC (windows_job 95 + orphans 171 + settings 129 + probe 118).

src-tauri/src/lease.rs (93 LOC) — managed-backend lease persistence
+ stale-cleanup. Owns:

- ManagedBackendLease struct ((pid, port) written to
  app-data/managed-backend.json on each Tauri spawn)
- managed_backend_lease_path / write_managed_backend_lease /
  read_managed_backend_lease / clear_managed_backend_lease — JSON
  persistence trio.
- cleanup_stale_managed_backend — invoked at app launch. Reads the
  prior lease, probes the recorded port, asks the previous backend
  to shut down if it's still alive (with the workspace-root match
  guard against killing an unrelated ChaosEngineAI on a different
  workspace).

Imports probe::{fetch_backend_api_token, probe_chaosengine_backend,
request_backend_shutdown} — the lease module is the most cross-cutting
of the Phase 3 splits because cleanup_stale_managed_backend reaches
into both the HTTP probe + the desktop settings layer.

src-tauri/src/lib.rs: 1392 → 1335 LOC. Cumulative Phase 3 split:
1808 → 1335 in lib.rs (-473 LOC) across 5 focused modules totaling
606 LOC (windows_job 95 + orphans 171 + settings 129 + probe 118 +
lease 93).

Two new files mirror the .sh shape so Windows users have one-shot
update commands matching the macOS/Linux flow:

- scripts/update-llama-turbo.ps1 (76 LOC) — companion to
  build-llama-turbo.ps1. Reads the prior version file from
  ~/.chaosengine/bin/llama-server-turbo.version, fetches latest commit
  on origin/feature/turboquant-kv-cache, fast-exits if unchanged,
  otherwise delegates the rebuild to build-llama-turbo.ps1.
- scripts/update-sdcpp.ps1 (66 LOC) — same shape for the
  leejet/stable-diffusion.cpp tracked branch. Delegates to
  build-sdcpp.ps1 when the commit has moved.

Both scripts use the existing build-*.ps1 sibling for the heavy
lifting so the MSVC/CUDA toolchain plumbing
(Resolve-CmakeWindowsBuildContext, Sync-CudaVsIntegration,
Get-CmakeWindowsConfigureArgs, Invoke-CmakeStaleCacheWipe in
scripts/lib/windows-msvc-cuda.ps1) stays in one place.

REFACTOR_PLAN.md Phase 4 markers updated:
- 4-1 marked DONE with this commit
- 4-3 (de-dupe build/update pairs) DROPPED — build scripts already
  handle clone-or-fetch; update scripts add version-file fast-exit;
  consolidating would obscure intent.
- 4-4 (rename update-llama-cpp.sh → check-llama-cpp.sh) DROPPED —
  audit was wrong, the script does rebuild llama-server.
- 4-2 (pre-build-check.sh → .mjs) deferred to dedicated session.

Phase 4-1 (PowerShell update-* ports) shipped in 861de0a. Two
follow-ons dropped on second look:
- 4-3 (build/update consolidation): two narrow scripts read clearer
  than one with subcommand router; build clones-or-fetches and update
  adds version-file fast-exit, semantics differ enough to keep apart.
- 4-4 (rename update-llama-cpp.sh): original audit was wrong, the
  script actually rebuilds llama-server, name matches behavior.

4-2 (pre-build-check Node port) and 4-5 (CI Windows promotion) stay
deferred — they belong together since the cross-platform gate is the
prerequisite for promoting the Windows runner.

…romotion

scripts/pre-build-check.mjs (252 LOC) — Node port of the bash gate.
Same 7 checks, runs identically on macOS / Linux / Windows by
leaning on Node's stdlib + spawnSync. Wired as npm run pre-build-check.

Cross-platform plumbing the Node port adds vs. the .sh:
- venvPython() probes both .venv/bin/python and .venv/Scripts/python.exe
- npmCommand() / npxCommand() return .cmd suffixes on Windows
- Binary availability check uses 'where' (Windows) or 'which' (Unix)
  with file-existence fallback for Homebrew / .chaosengine/bin paths
- File path joins go through path.join so Windows backslashes / Unix
  forward slashes resolve correctly
- Cache strategy probe runs the same Python via the resolved venv,
  no platform branching needed since the probe is pure Python

Live smoke on the dev box: 8 PASS / 0 FAIL / 1 WARN — the warn is
the turbo binary being one commit behind upstream (expected; not a
real failure).

CI Windows runner promoted from advisory to required. Phase 0 set
up the matrix; Phase 4-1 + 4-2 closed the remaining cross-OS gaps
(PowerShell update-* + Node pre-build-check) so a Windows-only
regression now blocks the PR the same way a macOS / Ubuntu failure
does.

REFACTOR_PLAN.md Phase 4-2 + 4-5 marked DONE.
…/turbo.py

backend_service/routes/setup/turbo.py (99 LOC) — owns the
turbo-update-check endpoint plus its supporting state:

- _CHAOSENGINE_BIN_DIR / _TURBO_REPO / _TURBO_BRANCH /
  _TURBO_VERSION_FILE constants
- _turbo_remote_cache + _TURBO_REMOTE_CACHE_TTL (1-hour rate-limit
  guard for the unauthenticated git ls-remote against GitHub)
- _read_turbo_version (parse 3-line version file: commit / branch /
  build_date)
- _fetch_turbo_remote_head (cached git ls-remote of the upstream
  branch HEAD)
- turbo_update_check endpoint (the only public surface)

Self-contained: deps are stdlib (subprocess, time, pathlib) + FastAPI
APIRouter. No shared helpers from setup/__init__.py.

Re-exported via the existing sub-router include pattern in
setup/__init__.py — the new turbo router slots in alongside the
longlive + wan_install routers.

Test patches updated:
- 6 mock.patch calls for _TURBO_VERSION_FILE / _CHAOSENGINE_BIN_DIR /
  _fetch_turbo_remote_head retargeted to the new
  backend_service.routes.setup.turbo namespace.

setup/__init__.py: 1441 → 1367 LOC. Tests: 1302 pass + 1 skip.
cryptopoly added 29 commits May 10, 2026 09:40
…tatus

Pulls workspace + server_status payload renderers out of
ChaosEngineState into a sibling module that takes the state instance
as the first argument. Class methods become 1-line wrappers.
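
A minimal sketch of the extraction pattern, with illustrative names (the real payload builders are far larger):

```python
def workspace_payload(state) -> dict:          # would live in payloads.py
    """Free function: takes the state instance as the first argument."""
    return {"sessions": len(state.sessions), "engine": state.engine_name}

class ChaosEngineState:                        # trimmed stand-in
    def __init__(self):
        self.sessions = []
        self.engine_name = "llama"

    def workspace(self) -> dict:               # 1-line wrapper kept on the class
        return workspace_payload(self)
```

The public surface on the class is unchanged; only the method bodies move.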

The workspace() builder is the heavier of the two — composes system
snapshot + library scan + recommendation + featured models + runtime
status + benchmark history + log/activity tails + cache-preview
math. The per-process annotation pass that joins runningLlmProcesses
against the runtime's active + warm engines (so the UI can show
which model each PID belongs to) lives there too.

Updates two test mock paths that targeted
``backend_service.state._describe_process`` to point at the new
``backend_service.state.payloads._describe_process`` import path.

state/__init__.py: 3315 → 3083 LOC (-232 this commit, -1004 vs
v0.8.0 baseline 4087).
Pulls _settings_payload + update_settings out of ChaosEngineState
into a sibling module that takes the state instance as the first arg.

settings_payload renders the user-visible settings shape with per-
directory model counts, masked API keys / HF token, resolved data +
output directory paths.

update_settings applies a settings patch:
- normalises model directories
- validates output-path overrides (absolute or ~-relative, never bare
  relative)
- migrates the data directory when changed
- persists remote-provider entries, preserving existing API keys when
  only metadata changed
- refreshes the library cache
- returns the settings/restartRequired/migrationSummary envelope routes
  hand back to the UI
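
The output-path rule, sketched as a standalone check (the shipped validator is part of update_settings and is assumed to look different):

```python
from pathlib import Path

def validate_output_path(raw: str) -> Path:
    """Accept absolute or ~-relative paths; reject bare relative ones."""
    p = Path(raw).expanduser()
    if not p.is_absolute():
        raise ValueError(f"output path must be absolute or ~-relative: {raw!r}")
    return p
```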

state/__init__.py: 3098 → 2872 LOC (-226 this commit, -1215 vs
v0.8.0 baseline 4087, -27%).
Bump package.json / package-lock.json / pyproject.toml /
src-tauri/Cargo.toml / src-tauri/Cargo.lock / src-tauri/tauri.conf.json
from 0.7.6 → 0.8.0.

CHANGELOG.md gets the v0.8.0 release entry covering:

- The MLX HTML-challenge dual-worker memory leak fix
  (JsonRpcProcess.close TimeoutExpired swallow path).
- Backend module shrinks: state/__init__.py (-35%),
  inference/__init__.py (-67%), mlx_worker.py (-42%),
  image_runtime/__init__.py (-49%), video_runtime/__init__.py (-43%),
  routes/setup (-82%), helpers/ regrouping.
- Frontend module shrinks: api.ts (-100% via 6 domain modules),
  types.ts (-83%), useChat / useImageState / useVideoState / App.tsx /
  HtmlChallengeTab.tsx decomposition into features/ subpackages.
- Performance gate: scripts/perf-gate.py + .github/workflows/perf-gate.yml
  comparator workflow.
- Cross-OS parity: PowerShell ports + cross-platform pre-build-check.
- CLAUDE.md Code Quality Guidelines section (perf, security,
  modularisation file-size soft caps).

1,302 Python tests + 340 TypeScript tests pass, mypy + tsc clean.
Pulls 13 helpers covering the full session-and-message lifecycle out
of ChaosEngineState into a sibling module:

- default_session_model — populate runtime profile from loaded model,
  falling back to library entry, then catalog default chat variant.
- promote_session / persist_sessions — recency reorder + disk write.
- unique_session_title / auto_session_title /
  normalize_auto_generated_session_titles — title disambiguation.
- ensure_session / create_session — fetch-or-create + public CRUD.
- add_message_variant (Phase 2.5) — sibling variant against a
  different already-loaded model.
- delve_message (Phase 3.6) — critique-mode re-process attached as
  a "Delve critique" variant.
- fork_session (Phase 2.4) — branch a thread at a specific message,
  carrying the parent's runtime profile + parentSessionId linkage.
- update_session — apply title/runtime patch.
- delete_session — remove from list + persist.

All take the state instance as the first argument so the class
methods stay 1-3 line wrappers. Public API surface unchanged.
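
A hypothetical sketch of the title-disambiguation shape (the shipped unique_session_title may differ in suffix format):

```python
def unique_session_title(existing: set[str], base: str) -> str:
    """Append ' 2', ' 3', ... until the title is unique among sessions."""
    if base not in existing:
        return base
    n = 2
    while f"{base} {n}" in existing:
        n += 1
    return f"{base} {n}"
```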

state/__init__.py: 2873 → 2393 LOC (-480 this commit, -1694 vs
v0.8.0 baseline 4087, -41%).
Pulls the full Hugging Face download flow out of ChaosEngineState
into a sibling module that takes the state instance as the first
argument:

- start_download — preflight repo size, install tracking entry,
  spawn background snapshot_download worker + sibling progress
  thread polling _hf_repo_downloaded_bytes for the UI bar.
- download_status — list current entries (UI poll).
- cancel_download — flip cancel flag, terminate live process,
  refresh byte count, mark cancelled.
- delete_download — cancel + unload from runtimes + rmtree the
  repo cache dir + clear bookkeeping.
- loaded_model_matches_repo_cache + unload_repo_from_runtimes —
  the predicate + driver used to evict active or warm models from
  runtimes (LLM, image, video) before the cache rmtree.

The inner ``_download_worker`` thread now closes over (state, repo,
allow_patterns, download_token, validation_error_fn) instead of self.

Class methods stay 1-3 line wrappers. Test mock paths retargeted from
``backend_service.state.{threading,_spawn_snapshot_download,subprocess}``
to ``backend_service.state.downloads.*`` so the patches still
intercept correctly.

state/__init__.py: 2873 → 2032 LOC (-841 this commit, -2055 vs
v0.8.0 baseline 4087, -50%).
Moves the full ~1050 LOC RuntimeController class out of
inference/__init__.py into a dedicated module. Re-exported from
``backend_service.inference`` so existing call sites
(``from backend_service.inference import RuntimeController``) keep
working without churn.

The class owns the cross-engine LLM orchestration: warm pool of
inference engines, capability probes, orphaned-subprocess tracking,
model load/unload/convert flows, generate / stream_generate dispatch,
and the status payload routes hand back to the UI. Methods are
unchanged — pure file move with the imports duplicated up to the
new module.

inference/__init__.py: 1180 → 97 LOC (-92%). The package's __init__
is now just the public re-export surface (RuntimeController + all the
engine classes + helpers from the existing engines/binaries/
capabilities/conversion submodules). CLAUDE.md directory map
refreshed to point at the new layout.
Pulls generate + generate_stream out of ChaosEngineState into a
sibling module that takes the state instance as the first argument.

generate (~258 LOC) — synchronous chat completion. Resolves the
effective runtime profile (request → session → launch_preferences
cascade), reloads the model when the profile changed, runs RAG
retrieval, invokes either the agent loop (when enableTools) or
plain runtime.generate, returns session + assistant message +
runtime status payload.

generate_stream (~576 LOC) — SSE streaming version. Same profile
cascade + RAG injection, then yields phase / token / reasoning /
toolCallStart / toolCallResult / done events. Five guards (memory
pre-flight, output-length runaway, repetition / loop, tok/s floor,
in-stream panic + thermal) abort or surface warnings without
wedging the UI.

Class methods stay 1-line wrappers.
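
Two of the five guards (output-length runaway, repetition/loop), sketched as a token generator; thresholds and event shapes here are made up for illustration:

```python
def stream_with_guards(tokens, max_tokens=8, loop_window=3):
    """Yield token events; abort with a 'done' event when a guard trips."""
    tail = []
    for i, tok in enumerate(tokens):
        if i >= max_tokens:                    # output-length runaway
            yield {"event": "done", "reason": "length-runaway"}
            return
        tail.append(tok)
        if len(tail) >= 2 * loop_window and \
                tail[-loop_window:] == tail[-2 * loop_window:-loop_window]:
            yield {"event": "done", "reason": "repetition"}
            return
        yield {"event": "token", "text": tok}
    yield {"event": "done", "reason": "stop"}
```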

state/__init__.py: 2032 → 1205 LOC (-827 this commit, -2882 vs
v0.8.0 baseline 4087, -71%).
…convert

Pulls 5 helpers covering model lifecycle out of ChaosEngineState into
a sibling module that takes the state instance as the first argument:
load_model, unload_model, convert_model, reveal_model_path,
delete_model_path. Class methods stay 1-line wrappers.

state/__init__.py: 1205 → 860 LOC (-345 this commit, -3227 vs
v0.8.0 baseline 4087, -79%).
Pulls 6 path resolvers (resolve_llama_server, resolve_llama_server_turbo,
resolve_llama_cli, resolve_sd_cpp, resolve_candidate, find_in_path) out
of lib.rs into a dedicated module. Each honours an env-var override
first, falls back to ~/.chaosengine/bin/<name> for managed installs,
then walks PATH (with .exe suffix on Windows).

lib.rs: 1335 → 1249 LOC. cargo check clean.
Pulls performUnloadModel + performDeleteModel out of App.tsx into
features/app/modelActions.ts. Each takes its dependencies (setBusyAction,
setError, setWorkspace, refreshWorkspace, activeChatId) as a typed deps
object so the App keeps thin wrappers that close over the live setters.

handleUnloadModel + handleDeleteModel are now 8-line wrappers down from
13 + 16 LOC inline. handleUnloadWarmModel still routes through
handleUnloadModel.

Drops deleteModelPath + unloadModel + syncRuntime imports from App.tsx
(now consumed inside modelActions.ts). App.tsx: 2170 → 2153 LOC.
Pulls 3 env-var + path-list helpers (apply_library_path, join_paths,
prepend_env_paths) out of lib.rs. Each is a pure utility used by
apply_embedded_runtime_env to thread the embedded Python runtime's
LD_LIBRARY_PATH / DYLD_LIBRARY_PATH / PATH / PYTHONPATH onto the
sidecar Command before spawn.

Drops the now-unused `ffi::OsString` import from lib.rs (was only
needed by these three functions).

lib.rs: 1249 → 1224 LOC. cargo check clean.
Pulls the embedded Python runtime extraction + env-application stack
out of lib.rs into a dedicated module: EmbeddedRuntimeManifest +
EmbeddedRuntime structs + 20 helper fns covering manifest fingerprint,
tar extraction, extras-dir ABI namespacing, and cmd env application.

lib.rs: 1224 → 735 LOC (-489, -40%). cargo check clean.
Pulls the full ~400 LOC BackendManager impl out of lib.rs into a
dedicated module. The struct itself stays in lib.rs (it's the type
the Tauri State manages); the impl with bootstrap / shutdown /
runtime_info methods moves to backend.rs.

Methods promoted to pub(crate) so the lib.rs Tauri command stubs
(stop_backend_sidecar, restart_backend_sidecar) can call them across
the module boundary.

lib.rs trims 50+ lines of imports that the impl was pulling in
(probe::*, lease::*, runtime::*, settings::*, binaries::* are all
private to the impl now). Three crate-wide constants
(DEFAULT_BACKEND_PORT, BACKEND_START_TIMEOUT, BACKEND_POLL_INTERVAL)
promoted to pub(crate) for backend.rs.

lib.rs: 735 → 333 LOC (-402, -55%). cargo check clean.
Pulls loadPayloadFromVariant + threadPatchFromVariant out of App.tsx
into features/app/variantPayloads.ts. Both were already pure given
chatLibrary + launchSettings + launchCacheLabel + sanitizeSpeculativeForModel
deps. Extracting tightens the App composition root and makes the
variant flow unit-testable in isolation.

The unused loadPayloadFromVariant() inline definition (no callers in
App.tsx) is dropped. threadPatchFromVariant gets a thin local wrapper
that closes over the deps so the existing call sites
(handleApplyVariantToActiveThread, handleStartThreadWithVariant) stay
readable.

App.tsx: 2153 → 2116 LOC.
Pulls 4 conversion-flow handlers (prepareCatalogConversion,
prepareLibraryConversion, performConvertModel, pickConversionOutputDir)
out of App.tsx into features/app/conversionActions.ts. Each takes its
dependencies as a typed deps object so the App keeps thin wrappers
that close over the live setters.

App.tsx: 2107 → 2080 LOC.
…on_device

Pulls two more stateless dtype/device helpers out of
DiffusersTextToImageEngine into transformer_loaders.py:

- preferred_torch_dtype — picks bfloat16 for FLUX on CUDA, fp16
  elsewhere on CUDA; on MPS picks fp16 with FU-017 SDXL VAE fix when
  available, fp32 fallback otherwise.
- preferred_execution_device — overrides MPS → CPU for Qwen-Image to
  avoid the silent black-output bug.

Class methods stay 1-line wrappers.

image_runtime/__init__.py: 1069 → 1043 LOC.
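
The dtype/device cascade above, as a dependency-free sketch: strings stand in for torch dtypes, and model-name matching is simplified (the real helpers in transformer_loaders.py also carry the FU-017 VAE-fix branch).

```python
def preferred_torch_dtype(device: str, model_id: str) -> str:
    if device == "cuda":
        return "bfloat16" if "flux" in model_id.lower() else "float16"
    if device == "mps":
        return "float16"   # real helper falls back to float32 w/o the VAE fix
    return "float32"

def preferred_execution_device(device: str, model_id: str) -> str:
    # Qwen-Image silently renders black output on MPS, so force CPU there.
    if device == "mps" and "qwen-image" in model_id.lower():
        return "cpu"
    return device
```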
Pulls four stateless pipeline helpers out of DiffusersVideoEngine
into a sibling module:

- make_step_callback — per-step diffusers callback wiring progress
  reporting, cooperative cancel, Phase E2 CFG decay, FU-018 part 2
  preview VAE thumbnails.
- invoke_pipeline — run a diffusers video pipeline with the older-
  diffusers fallback for callback_on_step_end / negative_prompt.
- encode_frames_to_mp4 — write PIL frames to mp4 bytes via diffusers
  export_to_video helper or imageio fallback.
- pipeline_class_for_repo — look up the diffusers class via
  PIPELINE_REGISTRY.

Class methods stay 1-line wrappers.

video_runtime/__init__.py: 1357 → 1224 LOC.
… swap_scheduler + build_pipeline_kwargs

Pulls three more pipeline-shaping helpers out of DiffusersVideoEngine:

- finalize_config — apply per-model defaults + frame alignment +
  scheduler resolution + Phase E1 prompt enhancement + Phase E2 CFG
  decay note. Returns frozen config + per-run notes.
- swap_scheduler — replace pipeline.scheduler with the named class
  via diffusers from_config.
- build_pipeline_kwargs — build the dict of kwargs the diffusers
  pipeline accepts; per-pipeline shaping (LTX-Video frame_rate +
  decode params, HunyuanVideo negative-prompt rejection).

Class methods stay 1-line wrappers. build_pipeline_kwargs takes
self._pipeline as an arg so the helper stays state-free.

video_runtime/__init__.py: 1177 → 1029 LOC.
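
One way the state-free shaping can work, sketched with an assumed signature-filtering mechanism (the shipped helper's per-pipeline special cases, e.g. LTX-Video frame_rate, are omitted):

```python
import inspect

def build_pipeline_kwargs(pipeline, requested: dict) -> dict:
    """Keep only the kwargs the pipeline's __call__ actually accepts."""
    accepted = inspect.signature(pipeline.__call__).parameters
    return {k: v for k, v in requested.items() if k in accepted}
```

Passing the pipeline in, instead of reading self._pipeline, is what keeps the helper testable with a dummy object.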
Pulls three stateless helpers out of DiffusersTextToImageEngine into
a sibling module:

- diffuse_message — short status line for VIDEO_PROGRESS publisher.
- format_run_label — gallery card label combining model name + res.
- build_pipeline_kwargs — dict of kwargs the diffusers pipeline
  accepts; per-pipeline shaping (FU-020 AYS timesteps stash, Qwen-
  Image true_cfg_scale).

Class methods stay 1-line wrappers. build_pipeline_kwargs takes
self._pipeline as an arg so the helper stays state-free.

image_runtime/__init__.py: 1043 → 1010 LOC.
Pulls the runtime status callout (~265 LOC of dense JSX) out of
VideoStudioTab.tsx into a focused component covering:

- CUDA torch install banner + restart-required state
- Runtime status chip row (engine, device, loaded model, missing
  deps, LongLive / mlx-video state badges)
- Conditional install action panels: LongLive, mlx-video, mp4
  encoder, missing tokenizer deps, GPU bundle.

VideoStudioTab passes the 33 needed props through directly. The
component is presentational — no hooks, no internal state. tsc clean,
340 ts pass, dev server boots without React errors. Visual
verification deferred until backend cold-start finishes (>2 min idle
on cold cache); the JSX is a mechanical move with prop wiring
mirroring the inline form exactly.

VideoStudioTab.tsx: 1712 → 1479 LOC. Drops 4 imports
(InstallLogPanel, CudaTorchLogPanel, WanRuntimeInstaller,
CudaTorchInstallResult — all consumed inside the new component).
Pulls the runtime status callout (~205 LOC) out of
ImageStudioTab.tsx into a focused component covering:
CUDA torch banner, runtime status chip row, model preload/unload
control row, and the GPU runtime install action stack.

ImageStudioTab passes the 19 needed props through directly. The
component is presentational. tsc clean, 340 ts pass. Drops 3
imports (InstallLogPanel, CudaTorchLogPanel, CudaTorchInstallResult)
that are now consumed inside the new component.

ImageStudioTab.tsx: 1178 → 1014 LOC.
…e + generate

Pulls the remaining ~880 LOC of WorkerState method bodies into three
sibling modules. Each free function takes the WorkerState instance
as the first argument; class methods become 1-3 line wrappers
preserving the public surface.

mlx_worker_lifecycle.py — load_model, unload_model, update_profile,
apply_cache_profile, apply_triattention_mlx_compressor.

mlx_worker_speculative.py — generate_dflash + generate_ddtree.

mlx_worker_generate.py — generate, generate_standard, stream_generate.

One test mock path retargeted: ``backend_service.mlx_worker._emit``
→ ``backend_service.mlx_worker_generate._emit`` so the patch
intercepts calls inside the new module.

mlx_worker.py: 1227 → 318 LOC (-909, -74%). 1f phase complete:
2115 → 318 LOC (-85%) across 1f-1..1f-13 commits. 1302 py tests pass.
@cryptopoly cryptopoly merged commit 45d72a6 into staging May 10, 2026