release: development → main — Phase 22 wizard modernisation + site honesty pass-5#42
Conversation
Closes the parity gap between `forgelm --wizard` (CLI) and the
in-browser `site/js/wizard.js` documented in
`docs/analysis/code_reviews/2026-05-07_cli_wizard_ux_analysis.md`
+ `-2.md`. 20 original findings (G1-G20) and 5 independent
findings (I1-I5) were verified against current code; this PR
addresses the actionable subset.
What's covered (CLI):
- **Step reorder (I1):** `compliance.risk_classification` is now
collected BEFORE `evaluation.safety` so the F-compliance-110
strict-tier auto-coercion can fire reliably. The v0.5.5 wizard
asked safety first, then risk — silently producing high-risk
configs that failed to load.
- **Trainer-specific hyperparameters (G1):** `dpo_beta`,
`simpo_beta` / `simpo_gamma`, `kto_beta`, `orpo_beta`,
`grpo_num_generations`, `grpo_max_completion_length`,
`grpo_reward_model` are now surfaced per `trainer_type`. SFT
short-circuits.
- **PEFT method breadth (G2):** the strategy step now offers all
four schema-supported `lora.method` values (`lora`, `dora`,
`pissa`, `rslora`) plus GaLore as a separate axis (6 cards
total).
- **GaLore optimiser variants (G9):** all six `galore_optim`
Literal values surfaced, including the three `_layerwise`
siblings.
- **RoPE scaling types (G10):** `longrope` joined `linear` /
`dynamic` / `yarn` (4-of-4 schema coverage).
- **LoRA `r` schema parity (G11):** `DEFAULT_LORA_R` lowered from
`16` → `8` to match `LoraConfigModel.r`.
- **Compliance depth (G5):** `compliance` (Article 11 + Annex IV
§1) plus optional `risk_assessment` (Article 9),
`data.governance` (Article 10), `retention` (GDPR Article
5(1)(e) + 17), `monitoring` (Article 12+17),
`evaluation.benchmark`, `evaluation.llm_judge`, `synthetic`
blocks now configurable.
- **High-risk auto-coercion (G8):** `risk_classification ∈
{high-risk, unacceptable}` automatically enables
`evaluation.safety` + `evaluation.require_human_approval` with a
visible operator notice. Front-stops the schema-side
`F-compliance-110` `ConfigError`.
- **Webhook URL parsing (G15):** single prompt accepts a literal
URL or `env:VAR_NAME` reference; `urlparse`-validated; HTTPS
recommended (HTTP accepted with warning).
- **Safety probe path (G16):** bundled probe set resolved through
`importlib.resources.files("forgelm.safety_prompts")`, fixing
the `pip install` regression where
`configs/safety_prompts/general_safety.jsonl` was the wizard
default but never shipped in the wheel.
- **Configuration summary (G17):** `_print_wizard_summary` now
dumps the full YAML alongside the labelled headline.
- **Persistence (G6):** state snapshot at
`$XDG_CACHE_HOME/forgelm/wizard_state.yaml` (or
`~/.cache/forgelm/wizard_state.yaml`) saved after every
completed step. Schema versioned (`v: 1`); cleared on
successful completion.
- **Step-diff preview (G7):** each completed step prints `+
key.path: value` / `~ key.path: before → after`.
- **Beginner / expert toggle (G13):** beginner mode prefixes each
step with a 2-3-line tutorial paragraph.
- **Use-case integration (G12):** the curated quickstart-template
list is also offered as Step 2 of the full flow, seeding
defaults for later steps.
- **`back` / `reset` navigation (G3):** `back` / `b` returns to
the previous step; `reset` / `r` clears in-memory state.
- **`POPULAR_MODELS` alignment with web (G14):** CLI list now
matches `site/js/wizard.js`'s preset cards.
What's covered (web):
- **Header comment (I2):** `site/js/wizard.js` opening doc
block updated from "7 steps" to "9 steps" with correct list.
- **Use-case key alignment (I4):** `USE_CASE_PRESETS` keys
renamed (`code-copilot` → `code-assistant`, `medical-tr` →
`medical-qa-tr`) so `forgelm/quickstart.py::TEMPLATES` is the
single source of truth.
What's covered (docs):
- **Design doc (I5):** `docs/design/wizard_mode.md` rewritten
to describe the actual 9-step flow, sub-package layout, and
Phase 22 modernisation status banner.
- **Architecture reference:** `docs/reference/architecture.md`
+ `-tr.md` updated for the `forgelm/wizard/` sub-package.
- **Quickstart guide:** `docs/guides/quickstart.md` + `-tr.md`
updated with the new flow description.
- **CHANGELOG.md:** comprehensive Phase 22 entry under
`[Unreleased]` / `### Added`.
- **Roadmap link fix:** `docs/roadmap/phase-11-5-backlog.md`
now points at `forgelm/wizard/_byod.py` rather than the old
monolith.
Architecture: `forgelm/wizard.py` (976 LOC monolith) split into
the `forgelm/wizard/` sub-package per
`docs/standards/architecture.md`'s 1000-line ceiling rule:
- `_io.py` (213 LOC) — primitive prompts + navigation tokens +
hardware detection.
- `_state.py` (290 LOC) — `_WizardState`, persistence, summary,
schema-default constants.
- `_collectors.py` (688 LOC) — every `_collect_*` helper +
strategy choice + use-case preset registry.
- `_byod.py` (333 LOC) — Phase 11.5 / 12.5 BYOD inline ingest +
audit helpers + quickstart prelude.
- `_orchestrator.py` (566 LOC) — step definitions,
`_drive_wizard_steps`, `_apply_strict_tier_coercion`,
`run_wizard`.
- `__init__.py` (176 LOC) — re-export shim for backward
compatibility.
Tests: 55 new tests in `tests/test_wizard_phase22.py` cover
`_parse_webhook_value`, `_default_safety_probes_path`,
`_collect_trainer_hyperparameters`, `_check_navigation_token`,
`_apply_strict_tier_coercion`, persistence helpers,
`_print_step_diff`, `_collect_webhook_config`, use-case preset
registry parity with `quickstart.py::TEMPLATES`, strategy choice
shape, GaLore optimiser variants, and schema-default parity for
LoRA r / alpha / epochs / batch / LR. Existing
`tests/test_wizard_phase11.py` test_print_wizard_summary cases
updated to use the new dict-based API.
Validation: 8/8 doc CI guards green (bilingual_parity,
anchor_resolution, cli_help_consistency, yaml_snippets,
audit_event_catalog, library_api_doc, doc_numerical_claims,
bilingual_code_blocks); ruff check + format clean across the
package; module-size guard reports 0 over the fail-threshold;
`forgelm --config config_template.yaml --dry-run` passes;
strict-tier auto-coercion produces a config that
`ForgeConfig.model_validate` accepts.
i18n decision: CLI wizard remains English-only per operator
explicit instruction. Web wizard's 6-language `wizard.*`
catalogue (en/tr/de/fr/es/zh) is unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses inline review comments + verified user findings on the
Phase 22 wizard modernisation PR. Critical / major / minor mix:
Critical (2):
- _io.py: torch.cuda.get_device_properties().total_mem → total_memory
(the v0.5.5 bug raised AttributeError on real CUDA hardware).
- _orchestrator.py: WizardReset re-loops with a fresh _WizardState
instead of returning early; returning let _run_full_wizard treat
the reset as a completed run and try to save an empty config.
Major (6):
- _orchestrator.py: WizardBack now restores the prev_config snapshot
so a half-completed step's mutations don't leak into the previous
step's prompts.
- _orchestrator.py: deep copy now uses copy.deepcopy instead of
json.loads(json.dumps(...)) — the JSON path was lossy for tuples
and sets that may sneak in via collector helpers.
- _orchestrator.py: _step_model preserves a use-case preselect that
isn't in POPULAR_MODELS (defaults to the Custom slot pre-filled
with the operator's earlier choice).
- _byod.py: _prompt_dataset_path_with_ingest_offer validates that
the typed value is a directory, JSONL/JSON file, or HF Hub ID;
bare typos no longer silently become data.dataset_name_or_path.
- _collectors.py: Article 9 strict-tier branch enforces non-empty
intended_use / foreseeable_misuse / mitigation_measures.
- _collectors.py: Article 10 strict-tier branch enforces non-empty
collection_method / annotation_process / known_biases.
Medium (4):
- _collectors.py: notify_on_start default flips to False (web wizard
parity — start pings are noisy).
- _collectors.py: _collect_compliance_metadata collects
risk_classification first so strict-tier hint shows up before
Article 11 fields and downstream Article 9/10 branches see the
correct context.
- _state.py: _save_config_to_file uses yaml.safe_dump + UTF-8 +
allow_unicode (yaml.dump could emit !!python/object tags that
ForgeConfig then rejects). Same treatment for the YAML preview.
- _state.py: chmod 0o600 on wizard_state.yaml (best-effort) —
compliance metadata can be sensitive.
- _io.py: _prompt_choice guards negative idx and out-of-range int
alongside the existing IndexError fallback.
Minor (3):
- _byod.py: shlex.quote() the path in BYOD shell-hint output so
paths with spaces survive copy-paste into a real shell.
- docs/design/wizard_mode.md: MD030 — single space after ordered
list markers (was double).
- docs/reference/architecture{,-tr}.md: heading wizard.py → wizard/
(the module became a sub-package in Phase 22).
Coverage (HIGH):
- tests/test_wizard_phase22.py adds end-to-end coverage for
_drive_wizard_steps (back-restores-config, reset-re-loops,
back-at-first-step is no-op, persist-after-each-step) plus
_maybe_resume_state (resume-from-snapshot, decline-clears) and
_step_welcome (backend-hint + beginner branch). Orchestrator
module coverage rises from 14% → 30% on this file alone.
All checks green:
- ruff check + format clean
- 1544 passed, 14 skipped (full suite)
- 8 doc guards: bilingual_parity, anchor_resolution,
cli_help_consistency, yaml_snippets, audit_event_catalog,
library_api_doc, doc_numerical_claims, bilingual_code_blocks
- module_size: no new warnings
- forgelm --config config_template.yaml --dry-run
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CHANGELOG entry for the c0dd57a fix sweep — Critical / Major / Medium / Minor classifications + the orchestrator coverage bump that landed alongside. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 22 review-cycle 2: addresses 17 newly-discovered parity / UX gaps surfaced during the post-PR-#40 audit (P1-P18 + B1-B5 + C1-C2 + D1-D2 + E1+E4). All five HIGH-severity findings closed; both wizards (CLI sub-package + site/js/wizard.js) tightened in lockstep. ## Critical bug fixes - B-NEW-1: ``_print_step_diff`` strips internal ``_wizard_meta.*`` keys before showing the operator the per-step diff (was leaking ``_wizard_meta.use_galore`` from the strategy step). - B-NEW-2: ``_step_compliance`` skips the Article 10 governance re-prompt when step 6 already populated it under non-strict tier (was silently overwriting earlier answers). - B-NEW-3: ``_save_wizard_state`` now writes via ``tempfile`` + ``fsync`` + ``os.replace`` so a SIGKILL / power loss / concurrent wizard process can never leave a half-written state file. - B5: strict-tier coercion fires at the end of ``_step_compliance`` too (not just ``_step_evaluation``), so a Ctrl-C between steps 8 and 9 still produces a loadable YAML if the operator manually saves. - A1: ``_parse_webhook_value`` SSRF preflight — rejects loopback / RFC1918 / link-local destinations up front by reusing ``forgelm._http._is_private_destination``. ## UX guardrails - B1/E1: ``_validate_generated_config`` runs ``ForgeConfig.model_validate`` on the saved YAML before exit. Schema rejections surface inline. - B2: ``_prompt_unique_filename`` + ``_next_free_filename`` — overwrite confirmation with auto-suffix (no more silent clobbers). - B3: ``run_wizard_full`` refuses to launch on non-tty stdin and points operators at ``forgelm quickstart <template>`` for scripted runs (closes the silent-empty-config CI hole). - E4: ``_print_preflight_checklist`` — GPU/VRAM/dataset/risk-tier signals before the configuration summary. ## CLI ↔ web parity - P1+P18: safety-block field union — CLI now emits ``classifier`` + ``max_safety_regression`` (and ``min_classifier_confidence`` under confidence_weighted scoring); web wizard now emits ``scoring`` + ``track_categories`` + ``severity_thresholds``. Both surfaces produce structurally equivalent ``evaluation.safety`` blocks. - P2: judge ``min_score`` default 6.5 → 5.0 (schema parity). - P3: web wizard surfaces ``model.max_length`` as a number input on the training step (was silently capped at 2048). - P9: monitoring collector accepts ``env:VAR_NAME`` like webhook. - P13: HF Hub format hint when custom model name isn't ``<org>/<name>``. - P14: ``auto_revert.max_acceptable_loss`` default 2.0 emission. - P16: CLI emits ``bnb_4bit_quant_type`` + ``bnb_4bit_compute_dtype`` when QLoRA is selected (web parity). - C1/I3: web ``learningRate: '2e-5'``, ``batchSize: 4`` (schema parity); ``2e-5`` added to LR option list. - C2: web wizard cross-tab ``storage`` event listener. ## Distinct exit codes - D2: new ``EXIT_WIZARD_CANCELLED = 5`` constant; new ``WizardOutcome`` dataclass + ``run_wizard_full()`` entry point. Genuine cancels (Ctrl-C, non-tty, decline-to-save) now exit 5 instead of 0. Saved-and-deferred remains exit 0. - D1: best-effort ``import readline`` for arrow-key + history on Linux/macOS (Windows unaffected). ## Tests + isolation - B10: new autouse ``_isolate_wizard_state`` fixture in ``tests/conftest.py`` redirects ``XDG_CACHE_HOME`` for any ``tests/test_wizard_*`` module. - 28 new tests across the new behaviour (146 wizard tests passing in total; full suite 1573 passed / 14 skipped). - Existing ``test_wizard_still_works`` updated for the new ``WizardOutcome`` contract; new ``test_wizard_cancelled_exits_5``. ## Documentation - ``CHANGELOG.md`` — review-cycle 2 entry under [Unreleased]. - ``README.md`` — directory listing ``wizard.py`` → ``wizard/``. - ``docs/standards/architecture.md`` — Mermaid graph + table updated. - ``docs/design/wizard_mode.md`` — review-cycle 2 status banner. - ``docs/reference/usage{,-tr}.md`` — exit-code 5 row. - ``docs/usermanuals/{en,tr}/reference/exit-codes.md`` — full contract row + compatibility note (six integers, was five). - ``docs/guides/quickstart{,-tr}.md`` — operator-guardrail paragraph. ## Validation - ruff check + format clean (184 files). - 8 doc guards pass (bilingual_parity, anchor_resolution, cli_help_consistency, yaml_snippets, audit_event_catalog, library_api_doc, doc_numerical_claims, bilingual_code_blocks). - module_size: no new warnings. - ``forgelm --config config_template.yaml --dry-run`` green. - ``pytest tests/`` — 1573 passed, 14 skipped. ## Deferred (roadmap entries, NOT in this PR) - F1: schema-driven single-source-of-truth (``wizard_defaults.json`` build-time generation) — multi-PR refactor, frames every future default drift. - B4: ``--wizard-from-yaml answers.yaml`` mode — surface change pending policy decision on scripted-config UX. - E2: per-prompt rationale catalog — needs ~50 hand-written strings. - E3: ``--wizard --start-from <yaml>`` idempotent re-run — pairs with F1. - F3: state-versioning migration logic. - C3: web wizard mobile-CSS overflow review. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the 11 actionable findings from the review-cycle 2 audit.
Low-risk batch — no architectural changes. PR-B (F1 schema-driven
defaults SOT) follows separately.
## Critical / correctness
- A3: ``_save_wizard_state`` unlinks the temp file when ``os.replace``
fails (cross-device home, EACCES, EXDEV) so the
``.wizard_state.*.tmp`` files stop accumulating. Defensive
``finally``-branch sweep also covers the case where
``yaml.safe_dump`` itself raises before the rename runs.
- A4: ``_apply_strict_tier_coercion`` notice prints once per wizard
run instead of twice (compliance-step + evaluation-step idempotent
re-call). Tracked via the ``_wizard_meta.strict_tier_announced``
flag that ``_strip_internal_meta`` removes before save.
- F3: ``_STATE_MIGRATIONS`` registry skeleton in ``_state.py`` ready
for future ``_STATE_VERSION`` bumps. Empty by design; first real
bump only adds ``{1: migrator}``, no plumbing rewrite.
## Performance
- C16: ``_detect_hardware()`` cached on ``_WizardState.hardware``;
``_step_welcome`` + ``_print_preflight_checklist`` share the
result instead of paying a second torch import + CUDA enumeration
(~50–200 ms saved per wizard run).
## Standards / observability
- G30: ``_validate_generated_config`` now emits ``logger.warning(...)``
in addition to the inline ``_print`` notice — CI / log pipelines
see "wizard output failed schema validation" without scraping
stdout (per ``docs/standards/error-handling.md`` rule 2).
- B11: web wizard's ``storage`` event listener ``console.warn``s on
a sync failure instead of silently swallowing the error.
## UX
- C3: expanded YAML accordion on ``<900px`` viewports now caps at
``38vh; overflow:auto`` so long configs no longer push the
Copy/Download/Back/Next footer off-screen.
- E2 (hedefli): inline rationale on five high-impact prompts
(``lora_r``, ``lora_alpha``, ``num_train_epochs``, ``batch_size``,
``max_length``, ``risk_classification``) so first-time operators
see the operational hint without flipping into beginner mode.
## Tests
- E22-24: new ``test_wizard_start_training_routes_through_dispatcher``
covers the previously-uncovered branch in ``_maybe_run_wizard``
where ``outcome.start_training=True`` mutates ``args.config`` and
lets the trainer pipeline run.
- 20+ additional tests: atomic-write cleanup (success / replace-fail
/ dump-fail), announce-once flag, migration registry (unknown
version / newer version / chain runs / non-advancing migrator),
hardware cache reuse, validate-warning log line.
## Documentation
- A9: ``docs/product_strategy{,-tr}.md``, ``docs/design/gdpr_erasure.md``,
``docs/design/data_audit_cli_split.md`` — "0/1/2/3/4 contract"
claims now acknowledge ``EXIT_WIZARD_CANCELLED = 5``.
- F27: ``docs/usermanuals/{en,tr}/compliance/human-approval-gate.md``
cross-references exit code 5 alongside the existing exit-4
discussion (high-risk operators using ``--wizard`` need both
signals).
- B10 doc-only: ``_parse_webhook_value`` comment now notes the SSRF
preflight catches "wrong URL at config time", not "guaranteed-
public destination at runtime" — runtime TOCTOU tracked at #14.
- CHANGELOG entry under [Unreleased] / Added.
## Roadmap reshuffle
- B4 (``--wizard-from-yaml``) removed from consideration: project
identity ("config-driven") + B3 (non-tty refusal + ``quickstart``
hand-off) closed the gap.
- E3 (``--wizard --start-from <yaml>``) explicitly bound to F1:
reverse-mapping duplicates F1's schema-driven SOT work; deferred
until F1 lands.
## Validation
- ruff check + format clean (184 files).
- 8 doc guards pass (bilingual_parity, anchor_resolution,
cli_help_consistency, yaml_snippets, audit_event_catalog,
library_api_doc, doc_numerical_claims, bilingual_code_blocks).
- module_size: no new warnings.
- 181 wizard-scope tests pass (``test_wizard_*`` + ``test_phase12_5``
+ ``test_cli_phase10``).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the long-running drift surface tracked through P1-P17 across
review-cycles 1, 2, and 3. A schema field-default change without a
matching wizard update was the single biggest source of CLI ↔ web
parity gaps; this PR turns the schema into the canonical source.
## Pydantic field metadata flag
8 wizard-relevant fields in ``forgelm/config.py`` now carry
``json_schema_extra={"wizard": True}``:
- ``ModelConfig.max_length`` (2048)
- ``LoraConfigModel.r`` / ``alpha`` / ``dropout`` (8 / 16 / 0.1)
- ``TrainingConfig.num_train_epochs`` / ``per_device_train_batch_size``
/ ``gradient_accumulation_steps`` / ``learning_rate``
(3 / 4 / 2 / 2e-5)
The flag is a no-op for ``ForgeConfig.model_validate`` — it only
surfaces in ``model_json_schema()`` output, where the generator picks
it up.
## Generator + CI guard
``tools/generate_wizard_defaults.py`` walks every nested ``BaseModel``
reachable from ``ForgeConfig``, finds wizard-flagged fields, and
writes two artefacts:
- ``forgelm/wizard/_defaults.json`` — package data (shipped via
``[tool.setuptools.package-data]``).
- ``site/js/wizard_defaults.js`` — exposes ``window.WIZARD_DEFAULTS``
for the web wizard.
``tools/check_wizard_defaults_sync.py`` re-runs the generator into a
temp comparison and fails the CI if the shipped artefacts drift from
the schema. Operators get a clear "regenerate via …" hint with the
first 5 differing lines so the fix is one command away.
## Python consumer
``forgelm/wizard/_state.py`` now reads the ``DEFAULT_*`` constants
from the shipped JSON via ``importlib.resources``. A new
``_load_defaults()`` + ``_default(section, key, fallback)`` helper
pair handles missing-file / corrupt-JSON cases by returning ``{}``,
so the hardcoded fallbacks below still keep the wizard usable on a
broken pip install.
## JS consumer
``site/js/wizard.js`` ``defaultState()`` now uses a new
``wd(section, key, fallback)`` helper to read schema-derived values
from ``window.WIZARD_DEFAULTS``. ``site/quickstart.html`` loads
``wizard_defaults.js`` BEFORE ``wizard.js`` so the global is set in
time. Special-cases ``learning_rate``: arrives as a number from
JSON (2e-5), uses ``Number.toExponential()`` to canonicalise to
the dropdown's ``'2e-5'`` string format (rather than ``String()``
which would yield ``'0.00002'`` and fail to match).
## Tests
New ``TestSchemaDrivenDefaultsSOT`` class pins the contract:
- ``test_module_constants_match_schema_via_json`` — every
``DEFAULT_*`` equals its ``Field.default``.
- ``test_defaults_json_shipped_and_loadable`` — package data lives
on disk + parses cleanly + has expected sections.
- ``test_default_helper_returns_fallback_when_section_missing`` —
graceful degradation.
- ``test_load_defaults_returns_empty_on_missing_file`` — slim install
scenario.
Full suite green: 1590 passed, 14 skipped.
## Documentation
- ``CHANGELOG.md`` — F1 entry under [Unreleased] / Added with B4
removed-from-roadmap + E3-bound-to-F1 notes.
- ``CLAUDE.md`` — self-review chain extended to 8 commands; tools/
listing names ``generate_wizard_defaults`` + ``check_wizard_defaults_sync``.
- ``pyproject.toml`` — ``[tool.setuptools.package-data]`` ships
``forgelm/wizard/_defaults.json``.
## Why a generator + JSON, not live-import?
- Web wizard is a static site — no Python runtime to query schema
defaults at page-load; JSON-as-data ships with the assets.
- Python side benefits too: low import-cost path, JSON visible in
git diffs (auditable), and round-tripping forces wizard-relevant
defaults to be JSON-serialisable primitives.
## Roadmap reshuffle
- B4 (``--wizard-from-yaml``) confirmed REMOVED from consideration.
- E3 (``--wizard --start-from <yaml>``) explicitly bound to F1:
reverse-mapping reuses this PR's metadata flags + JSON shape
(deferred until needed; no scope rush).
## Validation
- ruff check + format clean (186 files including the two new tools).
- 9 doc guards pass: bilingual_parity, anchor_resolution,
cli_help_consistency, yaml_snippets, audit_event_catalog,
library_api_doc, doc_numerical_claims, bilingual_code_blocks,
**wizard_defaults_sync** (new).
- module_size: ``forgelm/config.py`` crossed 1000 LOC (now 1011); it
was already grandfathered.
- 1590 passed, 14 skipped (full suite).
- ``forgelm --config config_template.yaml --dry-run`` green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…d test Closes the actionable findings from the independent review of cycle-3 + F1 commits (5667885 + 91d5726). Low-risk batch — internal hardening + test coverage; no operator-visible behaviour change. ## Generator robustness (PR-B follow-up) - B1: ``_walk_model`` now unwraps ``Optional[BaseModel]`` sub-blocks via the new ``_unwrap_basemodel()`` helper (uses ``typing.get_args``). Pre-fix the walk only matched bare-class annotations, so a wizard flag on a field inside an ``Optional[XxxConfig]`` sub-model was silently skipped. 10 of 14 ForgeConfig top-level fields are ``Optional[...]`` — today no flags exist on them, but a future contributor would have hit the silent- skip trap with no error from the generator and no failure from the CI guard. - B2: ``default is PydanticUndefined`` identity check (imported from ``pydantic_core``) replaces the brittle ``repr(default).startswith("PydanticUndefined")`` substring match. Stable across Pydantic v2 minor releases. - C2: module-level header documenting the JSON-serialisable contract for wizard-flagged fields (``int / float / bool / str / None / list of primitives / dict of primitives`` only). Future flags on Path / datetime / callable defaults will fail loudly at generator run; the contract is now explicit so contributors don't get surprised. ## State module hardening (PR-B follow-up) - B5: ``_default()`` now warns via ``logger.warning`` when the shipped JSON value disagrees with the hardcoded fallback literal. The JSON is canonical (returned), but drift between schema and fallback now surfaces in CI logs — eliminates the second-source-of-truth silently-stale risk. ## CI guard test coverage (PR-A follow-up) - G2: new ``tests/test_check_wizard_defaults_sync.py`` (7 tests): - corrupt-JSON triggers failure + diff message + regeneration hint. - missing-target triggers clean failure (no crash). - diff truncation works at 5 lines. - generator unwraps ``Optional[BaseModel]`` (B1 contract pinned). - required (no-default) wizard fields skipped (B2 contract pinned). - ``check_wizard_defaults_sync.py``: new ``_display_path()`` helper falls back to absolute path when target is outside ``_REPO_ROOT`` (prevents ``Path.relative_to`` crash when tests monkeypatch the target to a tmp dir). ## Test hardening (PR-A follow-up) - G3: ``test_wizard_start_training_routes_through_dispatcher`` now asserts ``ForgeConfig.model_validate(config_data)`` on the test fixture YAML before mocking the trainer. Pre-fix, a future change to ``minimal_config()`` that produced an invalid YAML would still pass this test (because the trainer pipeline is mocked BEFORE validation runs in ``_dispatch.main``), giving false-positive coverage. ## Documentation + cosmetic - A3: hardware-cache freshness caveat documented in ``_cached_hardware()`` docstring (GPU hot-plug mid-session is an acceptable edge case — operator can exit-and-resume to refresh). - F3: README directory tree now mentions ``_defaults.json`` as schema-derived defaults SOT shipped under ``forgelm/wizard/``. - B7: ``defer`` attribute on ``wizard_defaults.js`` script tag. Both wizard scripts now defer; HTML5 spec guarantees execution order (document order) so ``WIZARD_DEFAULTS`` still lands before ``defaultState()`` runs, but neither blocks HTML parsing. ## Skipped (review noted as "soft / acceptable") - C3 (``_forgelm_wizard:`` extra-key prefix): ``model_json_schema()`` is not called anywhere in the codebase — the ``wizard:true`` extra cannot leak into operator-facing artefacts today. Cost > value. - E1, E2, E3, H2: explicit refactor / defer in the review. - B4 (``--wizard-from-yaml``) confirmed REMOVED from roadmap. - E3 (``--start-from <yaml>``) bound to F1; deferred until needed. ## Validation - ruff check + format clean (187 files including new test). - 9 doc guards pass: bilingual_parity, anchor_resolution, cli_help_consistency, yaml_snippets, audit_event_catalog, library_api_doc, doc_numerical_claims, bilingual_code_blocks, wizard_defaults_sync. - module_size: no new warnings. - 1597 passed, 14 skipped (full suite). - ``forgelm --config config_template.yaml --dry-run`` green. - bandit -ll: 0 HIGH / 19 MEDIUM (all pre-existing B615 unpinned HF Hub downloads — project-wide, intentional, not in PR-C scope). ## PR-comment status - 13 inline comments from coderabbitai on commit ``52b113d`` (Phase 22 origin) — ALL addressed in cycle-2 (``c0dd57a``) + cycle-3 (``5667885``) commits. Threads attached to old commits; will auto-resolve on PR merge. - 6 inline comments from gemini-code-assist on commit ``52b113d`` — ALL addressed. - Codacy + SonarQube hotspot summaries from ``52b113d`` — current HEAD (``91d5726`` + this commit) addresses everything they flagged in scope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
External scanners (Codacy / SonarCloud) attached to PR #40 reported findings on the original ``52b113d`` Phase 22 commit. Verified each against the current HEAD; the remaining noise is genuinely non-actionable. Shrinks the public quality-gate to its real signal. ## Codacy 7 HIGH severity — all false positives, suppressed in-tree - **Prospector F401 (5×)** in ``forgelm/wizard/__init__.py`` — the re-export shim already carried ``# ruff: noqa: F401`` at the top but Codacy's Prospector doesn't honor ruff directives. Per-line ``# noqa: F401 -- re-export for tests (pyflakes/Prospector compat)`` markers added to the 5 flagged underscore-prefixed symbols (``_AUDIT_LARGE_FILE_THRESHOLD_BYTES``, ``_GALORE_OPTIMIZERS``, ``_BACK_TOKENS``, ``_STEPS``, ``_STATE_VERSION``). These are intentional re-exports for the wizard test suite, not dead imports. - **Bandit B105** in ``forgelm/wizard/_collectors.py:673`` — ``backend_token == "api"`` flagged as "possible hardcoded password". False positive: ``"api"`` is an enum-style discrim- inator (one of ``"api"``/``"local"``/``"file"``). Suppressed with ``# nosec B105 -- enum-style discriminator, not a secret``. - **PMD InaccurateNumericLiteral** in ``site/js/wizard_defaults.js`` on the ``2e-05`` literal — the file is a build-time artefact emitted by ``tools/generate_wizard_defaults.py`` from Pydantic ``learning_rate`` default. PMD's heuristic flags scientific notation as imprecise; the JS consumer canonicalises with ``Number.toExponential()`` so precision is never operator-visible. New ``.codacy.yaml`` excludes the generated artefact (and its Python sibling ``forgelm/wizard/_defaults.json``) from analysis. ## Codacy 29 INFO severity - **markdownlint MD013 (line-length, 17×)** — ForgeLM bilingual reference + design docs intentionally carry long lines (paragraph- style explanations, table cells, prose with long inline links). The 80-col limit is a code-style holdover that hurts readability for documentation; the project's ``bilingual_parity`` guard already enforces structural alignment. Disabled via ``.markdownlint.json``: ``"MD013": false`` with a rationale comment. - **markdownlint MD030 (9×)** in ``docs/design/wizard_mode.md`` — cycle-2 already fixed the list-marker spacing in this file; the Codacy result was stale (older commit snapshot). No code change needed; will clear on next Codacy re-scan. - **markdownlint MD028 (1×)**, **Semgrep pytorch-detection (2×)** — informational only, no action. ## SonarCloud 8 Security Hotspots — all LOW, all in test fixtures Hotspot review (no code changes — test fixtures are intentional): - ``tests/test_wizard_phase22.py:90`` ``ftp://example.com/hook`` — literal in ``test_unknown_scheme_rejected``; verifies the wizard rejects unsupported schemes. - ``tests/test_wizard_phase22.py:93,94`` ``http://hooks.example.com/x`` — literal in ``test_http_warning_but_accepted``; verifies the HTTP-with-warning behaviour mirrors ``forgelm/webhook.py``. - ``tests/test_wizard_phase22.py:830`` IMDS IP ``169.254.169.254`` / ``:834`` RFC1918 ``10.0.0.5`` — literals in ``TestWebhookSSRFPreflight``; verifies the SSRF preflight rejects loopback / RFC1918 / link-local destinations. - ``tests/test_wizard_phase22.py:924,929`` ``tmp_path`` / ``tests/test_cli_phase10.py:383`` — pytest's standard temp- directory fixture pattern. These hotspots are the ENTIRE point of the tests they appear in — suppressing them would require either disabling the SonarCloud rule project-wide (loses signal on production code) or adding inline comments that confuse the test-source-as-spec pattern. Decision: mark as "Safe / Reviewed" via SonarCloud's UI, no in-tree changes. ## Validation - ruff check + format clean (187 files). - 9 doc guards pass. - 139 wizard-scope tests pass (test_wizard_phase22 + test_check_wizard_defaults_sync + test_cli_phase10). - ``check_wizard_defaults_sync`` green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes E3 from the cycle-2 + cycle-3 review backlog. Operators can
now iterate on an existing config without losing prior answers — the
long-asked-for ergonomic improvement that the previous review cycles
all flagged ("wizard always starts from scratch") and bound to F1
(landed in PR-B `91d5726`) so the per-step value-honoring reuses
the schema-derived defaults plumbing.
## CLI surface
- New flag ``--wizard-start-from PATH`` registered in
``forgelm/cli/_parser.py``; ``--wizard-start-from existing.yaml``
alongside ``--wizard`` triggers the new flow.
- ``forgelm/cli/_wizard.py``: ``_maybe_run_wizard`` reads
``args.wizard_start_from`` (via ``getattr`` for back-compat) and
threads it into ``run_wizard_full(start_from=...)``.
## Orchestrator
- New ``_load_initial_state_from_yaml(path)`` helper:
reads YAML, validates it against ``ForgeConfig``, seeds a
:class:`_WizardState` with ``experience="expert"`` +
``use_case=_MANUAL_USE_CASE`` + ``config=copy.deepcopy(data)``.
Three error categories surface to the operator: ``FileNotFoundError``
for missing path, ``ValueError`` for parse failure / non-mapping
root / schema rejection. All caught by ``run_wizard_full`` and
rendered as a cancelled outcome with a printed warning — no
traceback for the operator.
- ``run_wizard``, ``run_wizard_full``, ``_run_full_wizard_outcome``,
and ``_run_full_wizard`` now accept ``start_from: Optional[str] =
None`` (back-compat preserved — None falls back to today's flow).
- When ``start_from`` is supplied the quickstart-template prelude is
skipped — the operator's intent is "edit this file", not "pick a
fresh template".
## Per-step value honouring
- ``_step_strategy``: ``lora_r`` and ``lora_alpha`` prompts default
to ``state.config["lora"]`` values when present (falls back to
schema defaults).
- ``_step_training_params``: ``num_train_epochs`` /
``per_device_train_batch_size`` / ``model.max_length`` /
``output_dir`` prompts honour existing ``state.config`` values.
- ``_step_dataset``: when an existing dataset path is present, the
step prints "Existing dataset: <path>" and offers a "Keep this
dataset?" yes/no early-out so the operator doesn't re-trigger
ingestion / audit on an unchanged path.
Other steps (welcome / use-case / model / trainer / compliance /
evaluation) already honoured existing values via
``state.config.get(...)`` defaults — no change needed.
## Save flow
- ``_run_full_wizard_outcome``: when ``start_from`` is supplied the
save prompt's default filename becomes the start-from path. The
existing ``_prompt_unique_filename`` overwrite confirmation still
fires before clobbering.
## Tests
10 new tests under ``tests/test_wizard_phase22.py``:
- ``TestStartFromYAMLLoad``: valid YAML pre-populates state (with
``copy.deepcopy`` isolation), missing path raises
``FileNotFoundError``, malformed YAML raises ``ValueError``,
non-mapping root raises, schema-invalid YAML raises.
- ``TestRunWizardFullStartFrom``: missing / invalid YAML returns a
cancelled ``WizardOutcome`` instead of crashing.
- ``TestStepHonorsExistingValues``: strategy / training-params /
dataset steps each preserve existing values when the operator
presses Enter at every prompt.
Total wizard-scope tests: 202 (was 192).
## Documentation
- ``CHANGELOG.md`` — E3 / PR-D entry under [Unreleased] / Added.
- ``docs/design/wizard_mode.md`` — PR-D additions block.
- ``docs/reference/usage{,-tr}.md`` — new flag in the cheat-sheet.
- ``docs/usermanuals/{en,tr}/reference/cli.md`` — flag table row.
- ``docs/guides/quickstart{,-tr}.md`` — "idempotent re-run"
paragraph after the existing operator-guardrail block.
## Validation
- ruff check + format clean (187 files).
- 9 doc guards pass: bilingual_parity, anchor_resolution,
cli_help_consistency (439 invocations), yaml_snippets,
audit_event_catalog, library_api_doc, doc_numerical_claims,
bilingual_code_blocks, wizard_defaults_sync.
- ``forgelm --config config_template.yaml --dry-run`` green.
- 1607 passed, 14 skipped (full suite, was 1597).
## Roadmap status — fully on-plan
- B4 (``--wizard-from-yaml`` answer-file mode): REMOVED from
consideration earlier (B3 closed the gap).
- E3: SHIPPED in this PR.
- F1 (schema-driven SOT): SHIPPED in PR-B.
- F3 (state-version migration skeleton): SHIPPED in PR-A.
- C3 (mobile YAML clamp): SHIPPED in PR-A.
- E2 (rationale catalog): targeted version SHIPPED in PR-A; full
catalog explicitly deferred (no scope rush).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eview
Independent code review of PR-D (`2ac9df2`) flagged 3 HIGH-severity
bugs where pressing Enter at certain prompts silently regressed the
operator's loaded YAML to wizard defaults — directly violating the
"press Enter to keep" contract documented in PR-D's CHANGELOG entry
and the quickstart guide. Plus 4 MEDIUM/LOW issues. All closed
here.
## Critical: contract violations on `--wizard-start-from`
- **PR-D-A1: ``_step_strategy`` regressed `lora.method` /
`target_modules` / `dropout` / `bias` / `task_type` on Enter.**
``_select_strategy()`` now accepts ``existing_method`` /
``existing_load_in_4bit`` / ``existing_galore`` kwargs and derives
the prompt's ``default_idx`` from the loaded YAML. Target-modules
prompt detects the loaded list shape against
``TARGET_MODULE_PRESETS`` and shifts its default accordingly.
Non-prompted fields (dropout / bias / task_type) now use
``setdefault``.
- **PR-D-A2: ``_step_evaluation`` clobbered existing benchmark /
llm_judge / safety blocks.** Function now seeds ``evaluation``
from ``state.config.get("evaluation")`` via ``copy.deepcopy``,
defaults the ``auto_revert`` gate based on existing state, and
preserves prior blocks when the operator declines a re-prompt.
- **PR-D-A3: ``_step_use_case`` overwrote the operator's existing
model + trainer choices** with the first template's preset (default
index = 1). Combined with save-default-to-``start_from``, this
silently corrupted the original YAML. Step now early-returns
with a clear notice when ``state.use_case == _MANUAL_USE_CASE`` and
``model.name_or_path`` / ``training.trainer_type`` are populated.
Greenfield runs now use ``setdefault`` for the preset application.
## Medium / hardening
- **PR-D-A4: ``_step_welcome`` ``model.backend``** → chained
``setdefault`` instead of the nested-dict-replace pattern. An
operator's explicit ``model.backend: transformers`` no longer
flips silently to ``unsloth`` on a Linux+GPU box.
- **PR-D-A5: ``_load_initial_state_from_yaml`` split try/except**
for the ``ForgeConfig`` import vs. the ``model_validate`` call. A
downstream ``ImportError`` from inside a Pydantic validator can no
longer silently bypass schema validation.
- **PR-D-A6: warns before discarding pre-existing wizard resume
state** when ``--wizard-start-from`` is supplied alongside an
in-progress ``$XDG_CACHE_HOME/forgelm/wizard_state.yaml``. Pre-fix
the snapshot was silently destroyed at save time.
- **PR-D-A7: ``docs/guides/quickstart{,-tr}.md``** scopes the
"press Enter to keep" claim to numeric / text / choice prompts;
documents the use-case skip and the ``setdefault`` pattern for
non-prompted fields.
## UX / test coverage
- **PR-D-B1: new CLI dispatcher integration test**
(``test_wizard_start_from_threads_through_to_run_wizard_full``)
pins parser → dispatcher → ``run_wizard_full`` thread-through.
- **PR-D-B3: ``--wizard-start-from`` without ``--wizard``** now
prints a one-line warning instead of silently no-op'ing.
- **PR-D-B5: ``_collect_data_governance(existing=...)``** kwarg
threads existing Article 10 free-text values through as prompt
defaults. Strict-tier operators iterating from a populated YAML
no longer re-type collection_method / annotation_process /
known_biases.
- **PR-D-B6: ``_canonical_start_from()``** helper applies
``Path(...).expanduser()`` once at entry. Pre-fix
``--wizard-start-from ~/configs/x.yaml`` created a literal
``~/configs/x.yaml`` directory on save because the load did
``expanduser`` but the save flow's overwrite check did not.
## Tests
13 new tests under ``tests/test_wizard_phase22.py`` covering every
fix above:
- ``TestPRDA1StrategyHonorsExisting`` (3): DoRA / target_modules /
GaLore preserved on Enter.
- ``TestPRDA2EvaluationHonorsExisting`` (2): benchmark + judge
preserved when declining re-prompt; auto-revert default reflects
existing state.
- ``TestPRDA3UseCaseSkipsWhenExisting`` (1): early-return with
notice when start-from path + populated model/trainer.
- ``TestPRDA4WelcomeBackendSetdefault`` (1): existing backend
survives Linux+GPU branch.
- ``TestPRDA5ImportErrorBypass`` (1): valid YAML happy path still
validates.
- ``TestPRDA6ResumeStateWarning`` (1): in-progress snapshot warning
emitted.
- ``TestPRDB3StartFromWithoutWizardWarns`` (1): silent no-op
warning.
- ``TestPRDB5GovernanceHonorsExisting`` (1): existing free-text
values preserved on Enter.
- ``TestPRDB6ExpanduserCanonicalisation`` (1): ``~`` expansion at
entry.
Plus 1 dispatcher integration test
(``test_wizard_start_from_threads_through_to_run_wizard_full``)
under ``tests/test_cli_phase10.py``.
## Validation
- ruff check + format clean (187 files).
- 9 doc guards pass: bilingual_parity, anchor_resolution,
cli_help_consistency (439 invocations), yaml_snippets,
audit_event_catalog, library_api_doc, doc_numerical_claims,
bilingual_code_blocks, wizard_defaults_sync.
- module_size: no new warnings.
- ``forgelm --config config_template.yaml --dry-run`` green.
- 1620 passed, 14 skipped (full suite, was 1607).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two-track cleanup landed atomically:
## Track 1 — Working-memory directories now strictly gitignored
``docs/marketing/`` and ``docs/analysis/`` are operator-local research
+ audit + draft notes that never appear in fresh clones. Pre-fix, 11
``docs/analysis/code_reviews/*.md`` files were exempted via inverse
``.gitignore`` rules and referenced from the public tree (CHANGELOG,
standards, roadmap, design docs, tests). Every such reference rotted
into a 404 the moment the maintainer touched the local working-memory
tree.
- **`.gitignore` simplified.** Dropped 11 ``!docs/analysis/code_reviews/<file>``
exception lines that re-exposed specific closure-plan and master-
review files; the directory-level ignore now applies uniformly.
- **``git rm --cached -r docs/analysis/``** untracked the 11 leaked
files (kept on disk locally; just no longer versioned in this repo).
## Track 2 — Public-tree reference cleanup
19 references removed / restructured. Where the citation was the
load-bearing anchor of a sentence, the surrounding text was rewritten
to convey the same intent without the file pointer.
- **`CHANGELOG.md`**: 6 references removed (review-cycle 2 audit
citation, Phase 22 cycle-1 review citation, closure-plan twice,
wave2a fix-summary, master-review-opus, two `Documentation`
links) — replaced with PR-thread / roadmap pointers.
- **`CLAUDE.md`**: 2 references rewritten — the "Common pitfalls"
block dropped its QKV-Core / Trion citations; the memory section
rewrote the bullet to forbid future references with a clear rule.
- **`docs/standards/coding.md`**: master-review-opus citation
removed; "anti-patterns" intro rewritten as project lessons.
- **`docs/standards/documentation.md`**: the "researcher" audience
row dropped; new top-level "Working-memory directories" section
spells out the no-references rule with rationale.
- **`docs/standards/localization.md`**: the ``docs/analysis/**``
bilingual-policy row dropped (no public-tree reference == no
bilingual concern).
- **`docs/design/iso27001_soc2_alignment.md`**: removed the
marketing/strategy citation and a self-referential "this file
lives under ``docs/analysis/``" note (it lives under
``docs/design/``).
- **`docs/roadmap.md`** / **``phase-12-6-closure-cycle.md``** /
**``phase-12-data-curation-maturity.md``**: closure-plan and
marketing/strategy citations rewritten to point at public roadmap
/ CHANGELOG / CLAUDE.md anchors instead.
- **`tests/test_wizard_phase22.py`** + **``test_check_bilingual_parity.py``**:
test docstrings softened to describe the directory category
(working-memory) rather than naming specific analysis files.
- **`tools/check_anchor_resolution.py`** + **``check_yaml_snippets.py``**:
comments + docstrings softened. Functional path-string filters
(``_SKIP_PATH_FRAGMENTS = ("docs/analysis/", "docs/marketing/")``,
``--exclude analysis`` default) are KEPT as path filters (they're
not citations; they're directory exclusions).
- **`.claude/skills/{cut-release,sync-bilingual-docs}/SKILL.md`**:
marketing/strategy template + content-strategy citations rewritten
to reference local-only working memory rather than specific paths.
## Track 3 — Standard + CI guard preventing regression
- **New "Working-memory directories" rule** in
`docs/standards/documentation.md` codifies the ban with examples
+ path-string-filter exemption.
- **New CI guard `tools/check_no_analysis_refs.py`** scans every
git-tracked file (uses ``git ls-files``, so the gitignored dirs
are excluded by construction) and fails the run on any citation
into the working-memory tree. Per-file ``_EXEMPT`` allowlist for
policy statements (``CLAUDE.md``, ``docs/standards/documentation.md``,
``.gitignore``, the skill that warns about the dirs) and
functional path filters; each entry carries a written
justification comment.
- **CLAUDE.md self-review chain** extended to 9 commands; tools/
listing names ``check_no_analysis_refs``.
## Validation
- ruff check + format clean (188 files including the new guard).
- 10 doc guards pass: bilingual_parity, anchor_resolution,
cli_help_consistency (439 invocations), yaml_snippets,
audit_event_catalog, library_api_doc, doc_numerical_claims,
bilingual_code_blocks, wizard_defaults_sync,
**check_no_analysis_refs** (new — clean).
- module_size: no new warnings.
- ``forgelm --config config_template.yaml --dry-run`` green.
- 1620 passed, 14 skipped (full suite).
## Site verification (read-only side-effect)
The audit pass produced a 270-line site verification report
(3 CRITICAL + 5 HIGH + 6 MEDIUM + 3 LOW findings on ``site/``
claims vs. live code). The report itself lives in the now-
gitignored ``docs/analysis/`` working-memory tree per the new
policy; follow-up site fixes will be tracked as separate PRs
referencing the live code, not the report.
Top three site findings to address in a follow-up PR:
1. ``forgelm.api`` import path advertised on ``index.html`` /
``features.html`` does not exist (real surface is
``from forgelm import ForgeTrainer, audit_dataset, ...``).
2. ``forgelm quickstart byod`` is shown on ``index.html`` but
isn't a real template (BYOD is a wizard sub-flow, not a
quickstart key).
3. ``compliance.html`` cites ``forgelm/data_audit.py`` as a
single-file module; it became a sub-package in Phase 14.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
feat(wizard): Phase 22 — CLI wizard parity with web + sub-package split
Closes 11 CRITICAL + 10 HIGH + 9 MEDIUM + 7 LOW findings from pass-4 site-vs-code audit. Every site claim now matches the live forgelm/ surface; the four non-en/tr locales achieve full key parity. CRITICAL — schema drift killed: - Y1-Y6: index.html YAML demo rewritten to use real Pydantic field names (num_train_epochs, per_device_train_batch_size, evaluation.benchmark.min_score, evaluation.safety.classifier, evaluation.require_human_approval, top-level webhook block). - T1-T5: compliance.html artefact tree redrawn (compliance/ + final_model/ instead of artifacts/, audit_log.jsonl at checkpoint root, config_snapshot.yaml ghost dropped). - X1: forgelm verify-audit example points at audit_log.jsonl, not the directory. - F1: compliance.config_hash ghost replaced with forgelm verify-audit chain integrity claim (HTML + en+tr i18n). - F2: compliance.human_approval ghost YAML replaced with evaluation.require_human_approval (HTML + 6 i18n + usermanuals source markdown). - C1: forgelm.api -> "from forgelm import ..." with real forgelm.__all__ symbols. - C2: forgelm quickstart byod -> forgelm quickstart domain-expert. - C3: forgelm/data_audit.py -> forgelm/data_audit/ (Phase 14 package). HIGH — overclaims softened to match live behaviour: - H4: exit codes 0/1/2/3/4 -> 0-5. - H5: Annex IV "eight" -> "nine required §1-9 sections" matching _ANNEX_IV_REQUIRED_FIELDS. - H6: 168 i18n strings (42 keys * 4 langs) backfilled into de/fr/es/zh — every locale now at 731 keys (was 689). - H7: chat "safety routing" -> "planned for upcoming Pro CLI release" (chat.py has no Llama Guard hook in v0.5.5). - H8: wizard step counter 7 -> 9; fr/es/zh "~ 3 minutes" semantic drift swapped for step-count parity. - R1-R5: auto-revert wording, verify-annex-iv audit-chain claim, safety-eval input formats, --model-card ghost flag, purge data- provenance manifest claim — all reworded to match real behaviour. - W1: wizard.js USE_CASE_PRESETS synced with quickstart.py::TEMPLATES. MEDIUM + LOW: reverse-pii wording (M10), Mermaid jsDelivr disclosure on privacy (M11), audit-event scope (M14), QMS SOP names, PII regex specifics, docker-compose.yaml, doctor pass/warn/fail, Discord webhook, dpia-template ghost link, customer-support/code-assistant body softening, quickstart template parentheticals, JSON-LD operatingSystem + Windows on 8 pages, Last updated 2026-04-28 -> 2026-05-09. M9 wizard step-name divergence + O1 STATE_VERSION drift documented inline in _orchestrator.py + _state.py + wizard.js. CHANGELOG.md added to check_no_analysis_refs.py exempt list (the review-cycle 5 entry that announces this very policy quotes the dir names verbatim — that is the documentation, not a content reference). CI gauntlet (14 guards + ruff + dry-run + 92 guard tests + 139 wizard tests) verified clean post-fix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes 3 of 4 inline review comments on PR #41; skips the 4th with a brief reason. Accepted (3): - coderabbit @ features.html:239 — safety-eval output filename was wrong: forgelm/safety.py:417 writes safety_results.json, not safety_evaluation.json. Fixed in HTML + all 6 i18n locales. - gemini @ index.html:542 — YAML demo comment "lora / dora / pissa / rslora" updated to "lora / DoRA / PiSSA / rsLoRA" to match the cross-surface academic-prose convention used in the wizard tutorial (forgelm/wizard/_orchestrator.py:339). The YAML *value* "dora" stays lowercase because the schema's Literal accepts lowercase. - gemini @ compliance.html:199 — Article 10 PII evidence row mixed a directory path (forgelm/data_audit/) with a Python symbol (forgelm.mask_pii); aligned with the file-path convention used by every other Article card. Gemini's literal suggestion (forgelm/data_audit/masking.py) does not exist — the real file is forgelm/data_audit/_pii_regex.py:127, so the evidence cell now points there. Skipped (1): - coderabbit @ quickstart.html:154 — wizard STEPS reconciliation between site/js/wizard.js and forgelm/wizard/_orchestrator.py. The divergence is by design and already documented inline at _orchestrator.py:719: both surfaces produce the same generated YAML through different UX flows (different stores: localStorage vs XDG cache; different preset-driven step ordering). Reconciling step IDs would force one surface's UX onto the other with no functional benefit; the user-visible "9 guided steps" counter accurately reflects the JS wizard's STEPS.length. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reviewer flagged pre-existing prose drift on two strings where the
en/tr versions describe a thicker mechanism than the four other
locales. Token-level claim parity now restored.
compliance.art14.body — de/fr/es/zh previously only described the
notification side-effect ("blocks model promotion until a human signs.
Webhook to your incident tool…") and never named the load-bearing CLI
surface (forgelm approve / forgelm reject + forgelm approvals + the
staging-directory mechanism). Rewritten to mirror en's wording — the
three CLI commands plus the staging-directory move now appear in
every locale. Token check across all 6 langs:
forgelm approve / reject / approvals — present.
privacy.body — de/fr/es/zh + (zh especially) were missing the entire
"Data-subject rights you can ship in your own pipeline" paragraph
that names forgelm reverse-pii (Article 15), forgelm purge
(Article 17), and the ISO 27001 / SOC 2 Type II alignment claim. The
paragraph was inserted into all four locales as a new <h2> section
between "The toolkit itself" and "Changes," matching en/tr's
information architecture. Token check across all 6 langs:
forgelm purge / reverse-pii / ISO 27001 — present.
This addresses the only "notable drift to flag" item in the PR-41
review verdict. Pre-existing prose drift, not introduced by this PR;
the wave-3 i18n batch never re-synced these two strings when the en/tr
versions were widened.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fix(site): pass-5 closure — site-vs-code honesty sweep + i18n parity
ⓘ You've reached your Qodo monthly free-tier limit. Reviews pause until next month — upgrade your plan to continue now, or link your paid account if you already have one. |
There was a problem hiding this comment.
Sorry @cemililik, your pull request is larger than the review limit of 150000 diff characters
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (5)
📝 WalkthroughWalkthroughPhase 22 CLI wizard modernization: the monolithic ChangesPhase 22 CLI Wizard Refactor & Modernization
Sequence Diagram(s)sequenceDiagram
participant User as User (tty)
participant CLI as forgelm CLI
participant Orch as wizard Orchestrator
participant State as XDG Wizard State
participant Trainer as Training Pipeline
User->>CLI: forgelm --wizard [--wizard-start-from path]
CLI->>Orch: run_wizard_full(start_from=path)
Orch->>State: _load_wizard_state() / load start_from YAML
Orch->>User: prompt steps (welcome → ... → evaluation)
User->>Orch: answers / navigation tokens
Orch->>State: _save_wizard_state() after each step (atomic write)
alt cancelled / non-tty / decline-to-save
Orch-->>CLI: WizardOutcome(cancelled=True)
CLI->>User: exit with EXIT_WIZARD_CANCELLED (5)
else produced YAML (maybe start_training)
Orch->>State: _save_config_to_file()
Orch-->>CLI: WizardOutcome(config_path, start_training)
CLI->>Trainer: if start_training True, set args.config and run pipeline
CLI->>User: exit with EXIT_SUCCESS (0)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~55 minutes Possibly related PRs
✨ Finishing Touches🧪 Generate unit tests (beta)
|
Not up to standards ⛔🔴 Issues
|
| Category | Results |
|---|---|
| BestPractice | 1 minor |
| ErrorProne | 5 high |
| Security | 2 medium 1 minor |
🟢 Metrics 709 complexity · 2 duplication
Metric Results Complexity 709 Duplication 2
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
There was a problem hiding this comment.
Code Review
This pull request modernizes the interactive configuration wizard by splitting it into a sub-package, aligning it with the web-based wizard, and introducing an idempotent re-run feature via the --wizard-start-from flag. It also implements strict CI guards to prevent documentation drift and references to gitignored working-memory directories. Feedback identifies a logic flaw in the idempotent re-run flow where previously enabled features, such as auto-revert or safety evaluation, cannot be disabled through the prompts. Improvements are also suggested for ensuring atomic writes of the final configuration file and gating the training start prompt on successful schema validation of the generated YAML.
| existing_evaluation = state.config.get("evaluation") or {} | ||
| evaluation: Dict[str, Any] = copy.deepcopy(existing_evaluation) | ||
| auto_revert_default = bool(existing_evaluation.get("auto_revert", False)) | ||
| if _prompt_yes_no( | ||
| "Enable auto-revert (discard model if quality drops)?", | ||
| default=auto_revert_default, | ||
| ): | ||
| evaluation["auto_revert"] = True | ||
| existing_max_loss = existing_evaluation.get("max_acceptable_loss") | ||
| # P14: when auto-revert is on but no explicit threshold is given, | ||
| # emit the web wizard's safe ``2.0`` default rather than leaving | ||
| # the field unset. Mirrors ``site/js/wizard.js:163`` so the | ||
| # two surfaces produce equivalent YAMLs for the same answers. | ||
| prompt_default = str(existing_max_loss) if existing_max_loss is not None else "" | ||
| max_loss = _prompt("Max acceptable loss (leave empty for default 2.0)", prompt_default) | ||
| if max_loss.strip(): | ||
| try: | ||
| evaluation["max_acceptable_loss"] = float(max_loss) | ||
| except ValueError: | ||
| _print(f" '{max_loss}' is not a number; using default 2.0.") | ||
| evaluation["max_acceptable_loss"] = 2.0 | ||
| else: | ||
| evaluation["max_acceptable_loss"] = 2.0 | ||
| elif "auto_revert" in existing_evaluation and not auto_revert_default: | ||
| # No-op branch: operator left auto-revert disabled (matches | ||
| # existing state). Nothing to update. | ||
| pass |
There was a problem hiding this comment.
The logic for auto_revert in an idempotent re-run is flawed. If auto_revert was previously enabled (auto_revert_default is True) and the operator answers "no" to the prompt, the if block is skipped and the elif block is also skipped (because not auto_revert_default is False). Since evaluation is a deepcopy of the existing config, auto_revert remains True, ignoring the operator's choice to disable it. To fix this, explicitly remove the keys if the user answers "no".
if _prompt_yes_no(
"Enable auto-revert (discard model if quality drops)?",
default=auto_revert_default,
):
evaluation["auto_revert"] = True
existing_max_loss = existing_evaluation.get("max_acceptable_loss")
# P14: when auto-revert is on but no explicit threshold is given,
# emit the web wizard's safe ``2.0`` default rather than leaving
# the field unset. Mirrors ``site/js/wizard.js:163`` so the
# two surfaces produce equivalent YAMLs for the same answers.
prompt_default = str(existing_max_loss) if existing_max_loss is not None else ""
max_loss = _prompt("Max acceptable loss (leave empty for default 2.0)", prompt_default)
if max_loss.strip():
try:
evaluation["max_acceptable_loss"] = float(max_loss)
except ValueError:
_print(f" '{max_loss}' is not a number; using default 2.0.")
evaluation["max_acceptable_loss"] = 2.0
else:
evaluation["max_acceptable_loss"] = 2.0
else:
evaluation.pop("auto_revert", None)
evaluation.pop("max_acceptable_loss", None)| if safety: | ||
| evaluation["safety"] = safety |
There was a problem hiding this comment.
As noted above, if the operator answers "no" to the safety evaluation prompt, the existing safety block should be removed from the configuration to allow disabling the feature during an idempotent re-run.
| if safety: | |
| evaluation["safety"] = safety | |
| if safety: | |
| evaluation["safety"] = safety | |
| else: | |
| evaluation.pop("safety", None) |
| if benchmark: | ||
| evaluation["benchmark"] = benchmark |
There was a problem hiding this comment.
As noted above, if the operator answers "no" to the benchmark evaluation prompt, the existing benchmark block should be removed from the configuration.
| if benchmark: | |
| evaluation["benchmark"] = benchmark | |
| if benchmark: | |
| evaluation["benchmark"] = benchmark | |
| else: | |
| evaluation.pop("benchmark", None) |
| evaluation["llm_judge"] = judge | ||
| if evaluation: |
There was a problem hiding this comment.
| yaml.safe_dump(config, f, default_flow_style=False, sort_keys=False, allow_unicode=True) | ||
| _print(f"\n Config saved to: {requested_filename}") |
There was a problem hiding this comment.
The final configuration file is saved using a direct open(..., "w") call, which is not atomic. If the wizard is overwriting an existing config (supported via --wizard-start-from) and the write fails (e.g., disk full, power loss), the original file will be corrupted or lost. Consider using the same atomic write pattern (temp file + os.replace) implemented in _save_wizard_state to ensure the integrity of the operator's configuration files.
| } | ||
|
|
||
|
|
||
| def _collect_monitoring() -> Optional[Dict[str, Any]]: |
There was a problem hiding this comment.
Most section collectors (like _collect_monitoring, _collect_retention, _collect_benchmark, etc.) do not accept existing values from the loaded configuration. When an operator iterates on an existing config via --wizard-start-from and chooses to reconfigure a section, they are forced to re-type every field from scratch because the prompts use hardcoded defaults. To improve the UX of idempotent re-runs, these collectors should accept an existing dictionary and use its values as prompt defaults, mirroring the pattern used in _collect_data_governance.
| _validate_generated_config(config_filename) | ||
| _clear_wizard_state() | ||
| if _prompt_yes_no("Start training now?", default=False): |
There was a problem hiding this comment.
The "Start training now?" prompt is offered even if the generated configuration fails schema validation. Since ForgeConfig validation failures are fatal for the trainer, starting a run with an invalid config is guaranteed to fail. Consider making _validate_generated_config return a boolean and using it to gate the training start prompt.
| _validate_generated_config(config_filename) | |
| _clear_wizard_state() | |
| if _prompt_yes_no("Start training now?", default=False): | |
| valid = _validate_generated_config(config_filename) | |
| _clear_wizard_state() | |
| if valid and _prompt_yes_no("Start training now?", default=False): |
| counter += 1 | ||
|
|
||
|
|
||
| def _validate_generated_config(config_filename: str) -> None: |
| _print(" ✓ Schema validation passed.") | ||
|
|
Two scopes bundled together since both are release-prep housekeeping on `development` ahead of the next PyPI tag: 1. Issue #37 — nightly pip-audit gate - `transformers` CVE-2026-1839 was failing the Supply-chain security workflow; published fix lives in 5.0.0rc3 (release candidate) and ForgeLM's pyproject pins `<5.0.0`. - Added `--ignore-vuln CVE-2026-1839` to `.github/workflows/nightly.yml` with multi-line rationale + remove-after condition. Per the policy in `tools/check_pip_audit.py` each ignore must carry a written justification; this one is reviewed at every release. 2. Release-prep doc refresh - CHANGELOG `[Unreleased]` rewritten in public-facing style (grouped by user-visible capability, no internal review-cycle / PR-letter / round-N terminology). Added entries for Phase 22 wizard parity-with-web, the site documentation correction sweep (PR #41), the working-memory directory policy, and the issue #37 stop-gap. - `docs/roadmap.md` + `docs/roadmap-tr.md` add Phase 22 + site sweep rows in the "Status at a glance" table; "Current state" sentence updated from the stale "release imminent" framing to reflect that v0.5.5 was paper-merged on main but the PyPI tag was deferred so the upcoming release can fold in the post- merge work. - `docs/roadmap/releases.md` marks v0.5.5 as "Merged on main; PyPI tag deferred"; new "Next PyPI release" section documents the upcoming release scope and surfaces the version-slot decision (`0.5.6` vs `0.6.0`) for the maintainer with rationale on each side. - `tools/check_no_analysis_refs.py` `_EXEMPT` list extended with `docs/roadmap/releases.md` (parallel to the CHANGELOG.md exempt for the policy-announcement entry that names the gitignored dirs verbatim). - `docs/roadmap/phase-12-6-closure-cycle.md` link to v0.5.5 section anchor updated to match the new heading. CI gauntlet (anchor resolution / bilingual parity / no-analysis-refs / yaml snippets / audit catalog / library API / numerical claims / wizard defaults / site claims / cli help / ruff / dry-run) verified clean. No code path or test changes; pure docs + workflow. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 16
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
docs/usermanuals/en/reference/cli.md (1)
205-214:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMissing exit code 5 documentation — bilingual parity violation.
The Turkish CLI reference (
docs/usermanuals/tr/reference/cli.mdline 210) documents exit code5(wizard cancelled), but this English version does not include it in the exit codes table. As per coding guidelines, user-facing docs must be bilingual mirrors.➕ Proposed fix: Add exit code 5 to the table
| 3 | Auto-revert / regression | | 4 | Awaiting human approval (training pipeline) | +| 5 | Wizard cancelled | `forgelm --wizard` exited without writing YAML (Ctrl+C, non-tty rejection, save refusal) — distinct from `0` so CI can distinguish "wizard completed" from "wizard wrote nothing" | | 130 | User interrupted (Ctrl+C) |🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/usermanuals/en/reference/cli.md` around lines 205 - 214, The English CLI exit codes table is missing exit code 5 present in the Turkish version; update the table in docs/usermanuals/en/reference/cli.md to include a new row for "5 | Wizard cancelled" (matching the Turkish description) so both language docs remain in parity; ensure the new entry follows the same table formatting and update the "See [Exit Codes]" reference only if needed.docs/usermanuals/tr/reference/exit-codes.md (1)
113-113:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winBilingual parity issue — compatibility range not updated.
The compatibility guarantee still states "Exit kodları 0-4" (exit codes 0-4) but should be "0-5" to match the English version and include the newly documented exit code 5. The English version (line 113 in
docs/usermanuals/en/reference/exit-codes.md) correctly states "Exit codes 0-5 are stable."📝 Proposed fix
-Exit kodları 0-4 sürümler arası kararlıdır. Yeni kodlar eklenebilir (5, 6, …) ama mevcutların semantiği değişmez. +Exit kodları 0-5 sürümler arası kararlıdır. Yeni kodlar eklenebilir (6, 7, …) ama mevcutların semantiği değişmez.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/usermanuals/tr/reference/exit-codes.md` at line 113, The Turkish text still reads "Exit kodları 0-4" but must match the English doc and include the new stable code 5; update the sentence containing "Exit kodları 0-4" to "Exit kodları 0-5" (preserve the rest of the sentence and punctuation) so the bilingual parity and the documented compatibility range align with the English version.
🧹 Nitpick comments (1)
tools/check_no_analysis_refs.py (1)
144-144: 💤 Low valueOptional: simplify set comprehension.
The comprehension
{p for p in files}can be simplified toset(files)for better readability, as flagged by SonarCloud.♻️ Proposed refactor
- return sorted({p for p in files}) + return sorted(set(files))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/check_no_analysis_refs.py` at line 144, Replace the set comprehension in the return of the function that currently uses "sorted({p for p in files})" with "sorted(set(files))" to improve readability; locate the return statement in tools/check_no_analysis_refs.py (the line returning sorted({p for p in files})) and change it to use set(files) instead of the comprehension.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@CHANGELOG.md`:
- Around line 9-16: Update the CHANGELOG entry to soften the claim about CLI/web
parity: replace "same 9-step flow as the in-browser wizard" with wording like
"same nine user-visible stages and same generated YAML" (or similar) and note
that internal step IDs/trainer–model order may differ as documented in
forgelm/wizard/_orchestrator.py; ensure the sentence about the wizard module
split remains but remove the stronger parity wording.
- Around line 108-119: Replace the explicit mentions of the gitignored paths
'docs/marketing/' and 'docs/analysis/' in the CHANGELOG entry with the generic
term "working-memory directories"; update the sentence that currently reads
those paths to instead read something like "working-memory directories
(gitignored) are now strictly ignored..." and keep the rest of the paragraph
intact, ensuring any subsequent references (e.g.,
`docs/standards/documentation.md`, `tools/check_no_analysis_refs.py`, `_EXEMPT`)
remain unchanged.
In `@docs/design/wizard_mode.md`:
- Around line 15-16: The doc introduces a new public exit code symbol
EXIT_WIZARD_CANCELLED = 5 which violates the repository's public exit-code
contract (0–4); either remove or mark EXIT_WIZARD_CANCELLED as internal/private
(not part of the public contract) and revert public documentation to the
approved contract, or update the central contract docs in
docs/standards/error-handling.md to include the new code and ensure all
wrappers/CI are updated; locate usages/definitions of EXIT_WIZARD_CANCELLED and
the wizard_mode.md text and either change the symbol to an internal-only
constant or coordinate a contract update in the standards document to keep the
public API consistent.
- Around line 47-105: The documentation's implemented flow for
forgelm/wizard::run_wizard is out of sync with site/js/wizard.js; update the
steps in docs/design/wizard_mode.md to match the browser order (welcome →
use-case → trainer → model → dataset → training → compliance → operations →
review), remove the standalone Strategy step or explicitly note its relocation
into Model/Training UI, and add the missing Review step and any moved Trainer
details so the text and the listed step numbers/names align exactly with
site/js/wizard.js (reference run_wizard, Strategy, Trainer, and the
Review/Operations step names when editing).
In `@docs/reference/architecture-tr.md`:
- Around line 148-149: The heading "### `wizard/`" is missing the required blank
line after it; insert a single blank line immediately after the `### `wizard/``
heading so the following paragraph (starting with "Etkileşimli CLI
sihirbazı...") is separated as per docs/standards and markdownlint MD022
expectations.
In `@docs/reference/architecture.md`:
- Around line 148-149: Missing blank line before the "### `wizard/`" heading;
add a single empty line immediately above the "### `wizard/`" heading in
docs/reference/architecture.md so the markdown linter recognizes the heading
separation and preserves readability (locate the "### `wizard/`" heading text
and insert one blank line directly before it).
In `@forgelm/cli/_exit_codes.py`:
- Around line 15-31: Remove EXIT_WIZARD_CANCELLED from the public contract: keep
the constant (EXIT_WIZARD_CANCELLED) for internal use but do not include it in
the _PUBLIC_EXIT_CODES frozenset so the public contract remains 0–4; update the
nearby comment to note that EXIT_WIZARD_CANCELLED is internal-only and not part
of the exported contract, and ensure only EXIT_SUCCESS, EXIT_CONFIG_ERROR,
EXIT_TRAINING_ERROR, EXIT_EVAL_FAILURE, and EXIT_AWAITING_APPROVAL remain in
_PUBLIC_EXIT_CODES.
In `@forgelm/wizard/_byod.py`:
- Around line 351-355: The printed follow-up shell commands use quickstart_path
unquoted which will break when the path contains spaces; update the two _print
calls that display the suggested commands (the ones referencing quickstart_path
and the "Running: forgelm --config ..." / "forgelm --config ..." messages) to
wrap/quote the quickstart_path (e.g., with single quotes) so the printed
commands are valid shell invocations even when quickstart_path contains spaces.
In `@forgelm/wizard/_collectors.py`:
- Around line 343-367: Collectors like _collect_rope_scaling currently overwrite
any existing advanced settings by always recomputing defaults; change them to
preserve loaded values unless the user explicitly chooses to recompute: in
_collect_rope_scaling check for an existing rope_scaling value (or accept an
optional existing parameter), if present show it and prompt "Keep existing RoPE
scaling?" (default yes) and return the existing dict if the user keeps it,
otherwise only then compute new rope_factor and type; apply the same pattern to
the GaLore-related collectors referenced at 394-412 (check for existing galore_*
tuning, prompt to keep vs recompute, and only compute defaults when creating new
or when user confirms recompute).
- Around line 258-335: The _collect_trainer_hyperparameters helper currently
always uses schema literals as prompt defaults, which causes reruns started from
an existing config to overwrite prior per-trainer knobs; change
_collect_trainer_hyperparameters to accept an optional existing TrainingConfig
(or plain dict) and use its trainer-specific fields as the prompt defaults
(e.g., use existing.dpo_beta, simpo_gamma, kto_beta, orpo_beta,
grpo_num_generations, grpo_max_completion_length, grpo_reward_model) instead of
the hard-coded numbers/strings, and update the caller (_step_trainer) to pass
the loaded config when available so Enter preserves prior values.
- Around line 200-250: The helper _collect_safety_config currently rebuilds the
safety block with hardcoded defaults and overwrites loaded settings; change it
to accept an optional existing_safety: Optional[Dict[str, Any]] parameter
(seeded from _step_evaluation) and use existing_safety values as the defaults
for prompts (e.g., classifier default, test_prompts default, scoring default,
max_safety_regression default, min_safety_score/min_classifier_confidence, and
severity_thresholds), and when a prompt returns empty/unchanged preserve the
existing field instead of replacing it; ensure you only add new keys when the
user explicitly provides new values and keep existing
track_categories/severity_thresholds if not modified.
In `@forgelm/wizard/_orchestrator.py`:
- Around line 1015-1018: The call to _validate_generated_config(config_filename)
currently only logs errors so the flow still prompts to start training even when
validation failed; change _validate_generated_config to return a boolean success
flag (True on valid, False on invalid) and update the caller in the block around
_validate_generated_config, _clear_wizard_state, and _prompt_yes_no so it checks
the returned flag and skips the prompt/short-circuits the immediate training
branch when validation returned False (i.e., only call _prompt_yes_no("Start
training now?", default=False) and print the forgelm command when
_validate_generated_config(...) returned success).
In `@site/js/wizard.js`:
- Around line 2026-2030: The new wizard row uses hard-coded English strings for
the label and hint (created in maxLengthRow via el(..., text: 'Max sequence
length') and el(..., text: 'Tokens per training example...')), so wrap those
strings with the translation helper (tr(...)) like the other wizard fields and
add corresponding keys/entries to the locales file(s) used by the wizard
training UI; update the label and hint usages (referencing maxLengthRow and
maxLengthInput) to use the new locale keys to ensure proper translation.
- Around line 1244-1278: The storage handler must treat e.newValue === null
(remote reset/delete) as an explicit reset: inside the
window.addEventListener('storage', ...) callback (the block that checks e.key
!== STORAGE_KEY and uses loadState(), state, and render()), detect if e.newValue
=== null and in that case clear the in-memory state object (delete all keys from
state), set any derived state to its empty/default form if applicable, then call
render() so the visible wizard reflects the reset; keep the existing try/catch
and console.warn behavior for other errors and still return early when modal has
'hidden'.
In `@tests/test_wizard_phase22.py`:
- Around line 1592-1611: The test test_auto_revert_default_reflects_existing
currently uses a non-specific assertion that will always pass; update it to
assert the exact expected outcome after operator explicitly answers "no" to
auto-revert: replace the loose check on
state.config["evaluation"].get("auto_revert") in (True, False) with a precise
assertion that the value is False (e.g., assert
state.config["evaluation"].get("auto_revert") is False) so that the behavior of
_WizardState/config after wizard._orchestrator._step_evaluation is properly
verified.
In `@tools/check_anchor_resolution.py`:
- Around line 327-333: The help text promises that passing "--exclude ''" will
scan everything but _resolve_excludes currently filters out empty values and
thus falls back to the default excludes; update _resolve_excludes so that it
distinguishes "no --exclude given" (keep defaults) from " --exclude given with
an explicit empty string" (treat as an explicit override to use an empty exclude
set), i.e. stop dropping empty-string entries when the user explicitly provided
excludes and return an empty set in that case; key symbol: _resolve_excludes.
---
Outside diff comments:
In `@docs/usermanuals/en/reference/cli.md`:
- Around line 205-214: The English CLI exit codes table is missing exit code 5
present in the Turkish version; update the table in
docs/usermanuals/en/reference/cli.md to include a new row for "5 | Wizard
cancelled" (matching the Turkish description) so both language docs remain in
parity; ensure the new entry follows the same table formatting and update the
"See [Exit Codes]" reference only if needed.
In `@docs/usermanuals/tr/reference/exit-codes.md`:
- Line 113: The Turkish text still reads "Exit kodları 0-4" but must match the
English doc and include the new stable code 5; update the sentence containing
"Exit kodları 0-4" to "Exit kodları 0-5" (preserve the rest of the sentence and
punctuation) so the bilingual parity and the documented compatibility range
align with the English version.
---
Nitpick comments:
In `@tools/check_no_analysis_refs.py`:
- Line 144: Replace the set comprehension in the return of the function that
currently uses "sorted({p for p in files})" with "sorted(set(files))" to improve
readability; locate the return statement in tools/check_no_analysis_refs.py (the
line returning sorted({p for p in files})) and change it to use set(files)
instead of the comprehension.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5ea8c55e-ca8f-4526-b5b7-c65350c8e168
📒 Files selected for processing (88)
.claude/skills/cut-release/SKILL.md.claude/skills/sync-bilingual-docs/SKILL.md.codacy.yaml.github/workflows/nightly.yml.gitignore.markdownlint.jsonCHANGELOG.mdCLAUDE.mdREADME.mddocs/analysis/code_reviews/closure-plan-202604300906.mddocs/analysis/code_reviews/foundation-pr-review-prompt.mddocs/analysis/code_reviews/ghost-features-analysis-20260502.mddocs/analysis/code_reviews/master-review-opus-202604300906.mddocs/analysis/code_reviews/wave2a-round2-review-prompt.mddocs/analysis/code_reviews/wave2b-final-merge-review-prompt.mddocs/analysis/code_reviews/wave3-followup-review-prompt.mddocs/analysis/code_reviews/wave3-review-prompt.mddocs/analysis/code_reviews/wave4-followup-review-prompt.mddocs/analysis/code_reviews/wave4-review-prompt.mddocs/analysis/code_reviews/wave5-review-prompt.mddocs/design/data_audit_cli_split.mddocs/design/gdpr_erasure.mddocs/design/iso27001_soc2_alignment.mddocs/design/wizard_mode.mddocs/guides/quickstart-tr.mddocs/guides/quickstart.mddocs/product_strategy-tr.mddocs/product_strategy.mddocs/reference/architecture-tr.mddocs/reference/architecture.mddocs/reference/usage-tr.mddocs/reference/usage.mddocs/roadmap-tr.mddocs/roadmap.mddocs/roadmap/phase-11-5-backlog.mddocs/roadmap/phase-12-6-closure-cycle.mddocs/roadmap/phase-12-data-curation-maturity.mddocs/roadmap/releases.mddocs/standards/architecture.mddocs/standards/coding.mddocs/standards/documentation.mddocs/standards/localization.mddocs/usermanuals/en/compliance/gdpr.mddocs/usermanuals/en/compliance/human-approval-gate.mddocs/usermanuals/en/reference/cli.mddocs/usermanuals/en/reference/exit-codes.mddocs/usermanuals/tr/compliance/gdpr.mddocs/usermanuals/tr/compliance/human-approval-gate.mddocs/usermanuals/tr/reference/cli.mddocs/usermanuals/tr/reference/exit-codes.mdforgelm/cli/__init__.pyforgelm/cli/_exit_codes.pyforgelm/cli/_parser.pyforgelm/cli/_wizard.pyforgelm/config.pyforgelm/wizard.pyforgelm/wizard/__init__.pyforgelm/wizard/_byod.pyforgelm/wizard/_collectors.pyforgelm/wizard/_defaults.jsonforgelm/wizard/_io.pyforgelm/wizard/_orchestrator.pyforgelm/wizard/_state.pypyproject.tomlsite/compliance.htmlsite/contact.htmlsite/css/style.csssite/features.htmlsite/guide.htmlsite/index.htmlsite/js/translations.jssite/js/wizard.jssite/js/wizard_defaults.jssite/privacy.htmlsite/quickstart.htmlsite/terms.htmltests/conftest.pytests/test_check_bilingual_parity.pytests/test_check_wizard_defaults_sync.pytests/test_cli_phase10.pytests/test_phase12_5.pytests/test_wizard_phase11.pytests/test_wizard_phase22.pytools/check_anchor_resolution.pytools/check_no_analysis_refs.pytools/check_wizard_defaults_sync.pytools/check_yaml_snippets.pytools/generate_wizard_defaults.py
💤 Files with no reviewable changes (12)
- docs/analysis/code_reviews/wave3-followup-review-prompt.md
- docs/analysis/code_reviews/ghost-features-analysis-20260502.md
- docs/analysis/code_reviews/wave4-followup-review-prompt.md
- docs/analysis/code_reviews/wave2a-round2-review-prompt.md
- docs/analysis/code_reviews/foundation-pr-review-prompt.md
- docs/analysis/code_reviews/master-review-opus-202604300906.md
- docs/standards/localization.md
- docs/analysis/code_reviews/wave3-review-prompt.md
- docs/analysis/code_reviews/wave5-review-prompt.md
- forgelm/wizard.py
- docs/analysis/code_reviews/wave2b-final-merge-review-prompt.md
- docs/analysis/code_reviews/wave4-review-prompt.md
| > cross-tab `storage` sync, distinct `EXIT_WIZARD_CANCELLED = 5` | ||
| > exit code, best-effort `readline` integration. Schema-default |
There was a problem hiding this comment.
Don't introduce a new public exit code here without updating the contract.
This doc advertises EXIT_WIZARD_CANCELLED = 5, but the repo’s public exit-code contract is still 0/1/2/3/4. Unless that contract changed in the same release, this is a compatibility break for wrappers and CI that only handle the documented range. As per coding guidelines, "Error paths must follow docs/standards/error-handling.md with exit codes 0/1/2/3/4 as a public contract".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/design/wizard_mode.md` around lines 15 - 16, The doc introduces a new
public exit code symbol EXIT_WIZARD_CANCELLED = 5 which violates the
repository's public exit-code contract (0–4); either remove or mark
EXIT_WIZARD_CANCELLED as internal/private (not part of the public contract) and
revert public documentation to the approved contract, or update the central
contract docs in docs/standards/error-handling.md to include the new code and
ensure all wrappers/CI are updated; locate usages/definitions of
EXIT_WIZARD_CANCELLED and the wizard_mode.md text and either change the symbol
to an internal-only constant or coordinate a contract update in the standards
document to keep the public API consistent.
|
Both review verdicts (APPROVE with NITs) absorbed. Single concrete action taken: §1 / §5 NIT — PR description undercount (16 → 17 commits, 4th bucket missing). PR description updated to reflect:
Other findings — no action needed (confirmed by both reviewers as non-blocking):
Both reviewers explicitly recommend APPROVE / merge. No blockers. |
External-scanner verdict — both Quality Gates fail on pre-existing intentional fixturesCross-checked the failing scanner outputs against the silencing baseline established in SonarCloud — all 10 hotspots verified pre-existing false positivesFetched via the SonarCloud hotspot API. Every entry is LOW severity and lives in test files where the literal IS the test's load-bearing assertion: These are the exact 8 patterns documented as Safe / Reviewed in The Safe / Reviewed marks made on PR #40's SonarCloud view do not transfer automatically to PR #42's view — that's a SonarCloud cross-PR artefact, not a code regression. Suggested action: re-mark as Safe on the PR #42 dashboard (one-click per hotspot) or set a per-folder rule exemption for Codacy — 10 issues (5 high · 2 medium · 3 minor)Codacy doesn't expose an unauthenticated PR-issue API. From the categorical breakdown (
PR #42's If any of the 10 Codacy issues is a genuinely new pattern (i.e. something that landed in a commit other than Verdict
No code-level action recommended. Maintainer call: re-mark Safe on SonarCloud's PR #42 view, or merge with the failing Quality Gate on the rationale that the underlying findings are documented + intentional. |
Closes 25 inline review comments (16 CodeRabbit + 9 Gemini) plus 2 outside-diff items and 1 nitpick on PR #42. Two reviewer findings explicitly skipped — see "Skipped" below. ## Wizard idempotent re-run — preserves prior answers + lets operator disable previously-enabled gates The pre-fix ``_step_evaluation`` deepcopy of the loaded YAML left ``auto_revert`` / ``safety`` / ``benchmark`` / ``llm_judge`` blocks intact even when the operator answered "no" to a gate prompt — making it impossible to disable a feature on a ``--wizard-start-from`` rerun. Fixed across all four gates with explicit pop-on-decline semantics: - **auto_revert:** explicit "no" sets ``auto_revert = False`` and pops ``max_acceptable_loss`` so the rebuild reflects the answer. - **safety / benchmark / llm_judge:** when the collector returns ``None`` (operator declined the gate), the existing block is popped. Paired with a ``default_enabled`` plumbing pass on the gate prompts so a bare-Enter rerun of a previously-enabled gate keeps the block intact: - ``_collect_safety_config(existing=...)`` — uses the loaded ``classifier``, ``test_prompts``, ``scoring``, ``max_safety_regression``, ``min_safety_score``, ``min_classifier_confidence``, ``track_categories`` / ``severity_thresholds`` as prompt defaults. - ``_collect_trainer_hyperparameters(trainer_type, existing=...)`` — uses the loaded ``dpo_beta`` / ``simpo_beta+gamma`` / ``kto_beta`` / ``orpo_beta`` / ``grpo_*`` as prompt defaults; no more schema-literal regression on a rerun. - ``_collect_rope_scaling(max_length, existing=...)`` — offers "Keep existing RoPE scaling?" when a prior block is loaded. - ``_collect_galore_config(use_galore, existing=...)`` — same "Keep existing GaLore tuning?" prompt. - ``_collect_benchmark(existing=...)`` and ``_collect_judge(existing=...)`` — gate prompt defaults to "yes" when the loaded block has the inner contract field set, so a bare Enter preserves; explicit "n" still drops. Tests updated to match the new semantics: - ``test_existing_benchmark_and_judge_preserved_on_enter`` (renamed) pins the bare-Enter preserve path. - ``test_existing_benchmark_and_judge_dropped_on_explicit_no`` (new) pins the explicit-disable contract. - ``test_auto_revert_default_reflects_existing`` tightened to ``is False`` (was ``in (True, False)`` — non-specific). ## Wizard validation gates "Start training now?" prompt ``_validate_generated_config`` now returns ``bool``; the orchestrator short-circuits the "Start training now?" prompt and the immediate- train branch when validation fails. Pre-fix the prompt was offered even after the operator was told the YAML would fail to load. ## Wizard saves config atomically ``_save_config_to_file`` and the fallback save path now route through new ``_atomic_yaml_write`` helper (``tempfile`` + ``fsync`` + ``os.replace``). An interrupted save during a ``--wizard-start-from`` rerun can no longer corrupt the operator's pre-existing config. ## Exit-code contract sync ``EXIT_WIZARD_CANCELLED = 5`` was added to the contract during Phase 22 review-cycle 2 but the standards / user-manual docs were not all updated in lockstep. Closed: - ``docs/standards/error-handling.md`` — adds the constant + table row. - ``docs/usermanuals/en/reference/cli.md`` — adds row "5 | Wizard cancelled". - ``docs/usermanuals/tr/reference/cli.md`` — same TR mirror. - ``docs/usermanuals/tr/reference/exit-codes.md`` — "Exit kodları 0-4" → "Exit kodları 0-5" matching the EN file. ## Site / JS - ``site/js/wizard.js`` storage-event handler now treats ``e.newValue === null`` (cross-tab reset / delete) as an explicit reset: clears in-memory state, restores defaults, and re-renders. Pre-fix the current tab kept editing a config that no longer existed. - Max-sequence-length wizard row now uses ``tr('wizard.training. max_length.{label,hint}')`` instead of hard-coded English; new locale entries added across all 6 languages so the field localises with the rest of the wizard UI. ## Smaller polish - ``forgelm/wizard/_byod.py`` — ``_finalize_quickstart_path`` quotes the printed ``--config <path>`` via ``shlex.quote`` so suggested shell commands are runnable even when the path contains spaces. Same fix applied in ``_orchestrator.py``'s post-save messages. - ``tools/check_anchor_resolution.py`` — ``--exclude`` help text now honestly describes that empty values are dropped (and points at ``--exclude __none__`` as the no-exclude path) instead of promising a "scan everything" path the code does not execute. - ``tools/check_no_analysis_refs.py`` — replace ``sorted({p for p in files})`` with ``sorted(set(files))`` (readability nit). ## Docs - ``CHANGELOG.md`` ``[Unreleased]`` Phase 22 entry softened to "covers the same nine user-visible stages as the in-browser wizard and produces the same generated YAML" with a cross-reference to the inline NOTE about by-design step-ID divergence. Working-memory cleanup paragraph rewritten to use the generic "working-memory directories" term so the policy-announcement no longer names the gitignored paths verbatim. Same generic rewrite applied to ``docs/roadmap/releases.md``; the corresponding ``CHANGELOG.md`` / ``docs/roadmap/releases.md`` entries removed from ``check_no_analysis_refs.py::_EXEMPT`` because they no longer need the carve-out. - ``docs/design/wizard_mode.md`` — prelude to the step list cross- references the web wizard's different IDs / ``trainer ↔ model`` order swap so readers don't expect lockstep between the two surfaces. - ``docs/reference/architecture{,-tr}.md`` — blank line inserted after the ``### `wizard/`` heading to satisfy markdownlint MD022. ## Skipped (with reason) - **Revert ``EXIT_WIZARD_CANCELLED = 5`` from the public contract** (CodeRabbit @ ``docs/design/wizard_mode.md:16`` and ``forgelm/cli/_exit_codes.py:31``). The 0–5 extension was a deliberate Phase 22 review-cycle 2 decision, captured in the CHANGELOG entry under the "Distinct exit code for wizard cancel" bullet. The right action for the documented-contract gap was to update the standards / user-manual docs (Group A above), not to walk the contract back. ## Validation - ruff check + format clean (178 files). - 14 doc / parity / consistency CI guards green. - ``forgelm --config config_template.yaml --dry-run`` valid. - 159 wizard tests pass; 191 check/cli tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (4)
docs/design/wizard_mode.md (1)
8-8: 💤 Low valueRemove blank lines inside blockquotes to fix markdown lint warnings.
Multiple linters flag blank lines inside blockquotes (MD028). While this doesn't affect functionality, it breaks markdown formatting conventions.
📝 Proposed fix
Remove the blank line at line 8 (between the two blockquote paragraphs in the status note) and line 19 (between the review-cycle 2 and PR-D blockquote paragraphs), or break them into separate blockquote blocks if the visual separation is intentional.
Also applies to: 19-19
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/design/wizard_mode.md` at line 8, Remove the blank lines inside blockquotes that trigger MD028: in the "status note" blockquote (between the two quote paragraphs) and in the blockquote containing "review-cycle 2" and "PR-D" (the blank line between those quote paragraphs); either delete those empty lines so the paragraphs remain in a single blockquote or split them into separate blockquote blocks if intentional spacing is desired, ensuring the blockquote markers (">") directly precede each paragraph line.forgelm/wizard/_collectors.py (1)
619-637: 💤 Low valueExtract Article 10 prompt strings to reduce duplication.
The same prompt strings appear three times each in the strict vs non-strict branches. Extracting them as module-level constants improves maintainability.
♻️ Proposed extraction
+_ARTICLE_10_COLLECTION_METHOD = "Article 10(2)(b): how was data collected?" +_ARTICLE_10_ANNOTATION = "Article 10(2)(b): annotation methodology" +_ARTICLE_10_BIASES = "Article 10(2)(f): known_biases" + def _collect_data_governance( ... if mandatory: ... collection_method = ( - _prompt("Article 10(2)(b): how was data collected?", existing["collection_method"]) + _prompt(_ARTICLE_10_COLLECTION_METHOD, existing["collection_method"]) if existing.get("collection_method") - else _prompt_required("Article 10(2)(b): how was data collected?") + else _prompt_required(_ARTICLE_10_COLLECTION_METHOD) ) # ... similar for annotation_process and known_biases else: - collection_method = _prompt("Article 10(2)(b): how was data collected?", existing.get("collection_method", "")) + collection_method = _prompt(_ARTICLE_10_COLLECTION_METHOD, existing.get("collection_method", ""))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@forgelm/wizard/_collectors.py` around lines 619 - 637, The prompt text literals for Article 10 are duplicated across the strict/non-strict branches; extract them into module-level constants (e.g., ARTICLE10_COLLECTION_PROMPT, ARTICLE10_ANNOTATION_PROMPT, ARTICLE10_KNOWN_BIASES_PROMPT) and replace the inline strings used with _prompt and _prompt_required in the collection_method, annotation_process, and known_biases assignments so both branches reference the same constants; keep existing calls to _prompt and _prompt_required unchanged, only substitute the literal strings with the new constants and update any imports/exports if needed.forgelm/wizard/_byod.py (1)
112-122: 💤 Low valueRedundant exception classes in chained except.
PermissionErrorandIsADirectoryErrorare subclasses ofOSError, so catching them separately alongsideOSErroris redundant — theOSErrorhandler already covers them. The same pattern appears at line 175. This is harmless but slightly confusing.♻️ Proposed simplification
- except (FileNotFoundError, ValueError) as exc: - _print(f" Ingest failed: {exc}") - return None - except (PermissionError, IsADirectoryError, OSError) as exc: - _print(f" Ingest failed due to filesystem error: {exc} — check permissions or output path.") - return None + except (FileNotFoundError, ValueError) as exc: + _print(f" Ingest failed: {exc}") + return None + except OSError as exc: + _print(f" Ingest failed due to filesystem error: {exc} — check permissions or output path.") + return None🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@forgelm/wizard/_byod.py` around lines 112 - 122, The except clause currently catches (PermissionError, IsADirectoryError, OSError) which is redundant because PermissionError and IsADirectoryError are subclasses of OSError; change the tuple to just (OSError,) to simplify the handler while keeping behavior the same, and apply the same simplification to the repeated handler later (the similar except tuple at line 175) so both places only catch OSError as exc.forgelm/wizard/_orchestrator.py (1)
982-990: 💤 Low valueConsider renaming the fallback variable to avoid shadowing the class name.
The pattern
ForgeConfig = Nonewhen the import fails works but triggers static analysis warnings about naming conventions. A clearer alternative uses a distinct name.♻️ Alternative pattern
try: from ..config import ForgeConfig + _forge_config_cls = ForgeConfig except ImportError: # pragma: no cover — config always present - ForgeConfig = None # type: ignore[assignment] - if ForgeConfig is not None: + _forge_config_cls = None + if _forge_config_cls is not None: try: - ForgeConfig.model_validate(data) + _forge_config_cls.model_validate(data)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@forgelm/wizard/_orchestrator.py` around lines 982 - 990, The code currently assigns ForgeConfig = None on import failure which shadows the imported class name and triggers static-analysis warnings; change the fallback variable to a distinct identifier (e.g., _ForgeConfig or ForgeConfigModule) and update the subsequent conditional and validation call accordingly so you check the new identifier (e.g., if _ForgeConfig is not None) and call its model_validate method name (e.g., _ForgeConfig.model_validate(data)) to preserve behavior without shadowing the class name.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tools/check_no_analysis_refs.py`:
- Around line 49-51: The current _PROHIBITED_RE only matches
docs/(analysis|marketing)/ or relative refs when followed by a hardcoded set of
subfolders, so paths like "analysis/new_topic/..." slip through; update
_PROHIBITED_RE to match any reference to analysis or marketing subtrees (e.g.,
match docs/(analysis|marketing)/ and also (?:^|[^A-Za-z./])(analysis|marketing)/
followed by any path segment) so it flags arbitrary subfolders instead of just
the fixed list—modify the regex in tools/check_no_analysis_refs.py where
_PROHIBITED_RE is defined to broaden the second alternative to allow any
following path component.
- Around line 151-154: The current except block around
path.read_text(encoding="utf-8") swallows OSError/UnicodeDecodeError and returns
no findings (false-green); change it to fail-closed by recording the error into
the findings list instead of silently returning: catch (OSError,
UnicodeDecodeError) and append a finding that includes the path (path) and the
exception message (str(e)) or otherwise raise the exception so CI fails; update
the handler around path.read_text to use the findings list (findings) or
re-raise to ensure unreadable files produce a visible failure.
---
Nitpick comments:
In `@docs/design/wizard_mode.md`:
- Line 8: Remove the blank lines inside blockquotes that trigger MD028: in the
"status note" blockquote (between the two quote paragraphs) and in the
blockquote containing "review-cycle 2" and "PR-D" (the blank line between those
quote paragraphs); either delete those empty lines so the paragraphs remain in a
single blockquote or split them into separate blockquote blocks if intentional
spacing is desired, ensuring the blockquote markers (">") directly precede each
paragraph line.
In `@forgelm/wizard/_byod.py`:
- Around line 112-122: The except clause currently catches (PermissionError,
IsADirectoryError, OSError) which is redundant because PermissionError and
IsADirectoryError are subclasses of OSError; change the tuple to just (OSError,)
to simplify the handler while keeping behavior the same, and apply the same
simplification to the repeated handler later (the similar except tuple at line
175) so both places only catch OSError as exc.
In `@forgelm/wizard/_collectors.py`:
- Around line 619-637: The prompt text literals for Article 10 are duplicated
across the strict/non-strict branches; extract them into module-level constants
(e.g., ARTICLE10_COLLECTION_PROMPT, ARTICLE10_ANNOTATION_PROMPT,
ARTICLE10_KNOWN_BIASES_PROMPT) and replace the inline strings used with _prompt
and _prompt_required in the collection_method, annotation_process, and
known_biases assignments so both branches reference the same constants; keep
existing calls to _prompt and _prompt_required unchanged, only substitute the
literal strings with the new constants and update any imports/exports if needed.
In `@forgelm/wizard/_orchestrator.py`:
- Around line 982-990: The code currently assigns ForgeConfig = None on import
failure which shadows the imported class name and triggers static-analysis
warnings; change the fallback variable to a distinct identifier (e.g.,
_ForgeConfig or ForgeConfigModule) and update the subsequent conditional and
validation call accordingly so you check the new identifier (e.g., if
_ForgeConfig is not None) and call its model_validate method name (e.g.,
_ForgeConfig.model_validate(data)) to preserve behavior without shadowing the
class name.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a7b694a1-f100-44b5-beaf-6c88d3390125
📒 Files selected for processing (18)
CHANGELOG.mddocs/design/wizard_mode.mddocs/reference/architecture-tr.mddocs/reference/architecture.mddocs/roadmap/releases.mddocs/standards/error-handling.mddocs/usermanuals/en/reference/cli.mddocs/usermanuals/tr/reference/cli.mddocs/usermanuals/tr/reference/exit-codes.mdforgelm/wizard/_byod.pyforgelm/wizard/_collectors.pyforgelm/wizard/_orchestrator.pyforgelm/wizard/_state.pysite/js/translations.jssite/js/wizard.jstests/test_wizard_phase22.pytools/check_anchor_resolution.pytools/check_no_analysis_refs.py
✅ Files skipped from review due to trivial changes (6)
- docs/standards/error-handling.md
- docs/reference/architecture.md
- docs/reference/architecture-tr.md
- docs/roadmap/releases.md
- tools/check_anchor_resolution.py
- CHANGELOG.md
🚧 Files skipped from review as they are similar to previous changes (4)
- docs/usermanuals/en/reference/cli.md
- docs/usermanuals/tr/reference/exit-codes.md
- site/js/wizard.js
- forgelm/wizard/_state.py
…its) Closes 2 valid issues + 4 nitpicks from PR #42's second review pass. ## tools/check_no_analysis_refs.py — guard hardening - **_PROHIBITED_RE broadened to catch arbitrary working-memory subfolders.** The pre-fix regex hardcoded a fixed list of subfolder names (``code_reviews|QKV-Core|Trion|ART|...``); any newly-created folder under ``analysis/`` or ``marketing/`` slipped past the guard. The relative-form alternative now matches ``(analysis|marketing)/<segment>`` followed by either another path separator or a known file extension (``.md`` / ``.py`` / ``.html`` / ``.js`` / ``.css`` / ``.yaml`` / ``.yml`` / ``.json`` / ``.txt`` / ``.jsonl`` / ``.sh`` / ``.cfg`` / ``.toml`` / ``.ini``). The continuation-or-extension constraint preserves the existing no-false-positive-on-prose property: phrases like "marketing/product copy" and "analysis/report results" stay clean (slash-as-"or" in English prose), while real path references (``marketing/strategy/``, ``analysis/code_reviews/x.md``, ``marketing/some-future-folder/z.md``) are flagged. Self-exempt entry extended with the two example patterns the guard's own docstring uses to document this behaviour. - **_check_file fail-closed on read errors.** Pre-fix, an unreadable file (``OSError``) or a file with non-UTF-8 bytes (``UnicodeDecodeError``) silently returned an empty findings list, so CI went green on a stale or malformed input. Now records the exception as a finding (with class + message) so the failure surfaces in the same channel as a real prohibited reference. ## docs/design/wizard_mode.md — markdownlint MD028 - Three distinct status-note blockquotes (``Status``, ``Review-cycle 2 additions``, ``PR-D additions``) had bare blank lines between them, which markdown renderers treat as a single blockquote with a gap and which markdownlint MD028 flags. Split with explicit ``<!-- separator -->`` HTML comments so each block stays semantically distinct without violating the rule. ## forgelm/wizard/_byod.py — except-tuple simplification - ``except (PermissionError, IsADirectoryError, OSError)`` → ``except OSError``. Both ``PermissionError`` and ``IsADirectoryError`` subclass ``OSError``; the bare catch handles every filesystem failure mode. - ``except (FileNotFoundError, ValueError, OSError)`` → ``except (OSError, ValueError)``. ``FileNotFoundError`` subclasses ``OSError``; ``ValueError`` is independent and intentional (audit raises it on malformed JSONL), so it stays in the tuple — this is a deviation from the reviewer's literal "OSError only" suggestion to preserve behaviour. ## forgelm/wizard/_collectors.py — Article 10 prompt constants - Three prompt strings (collection method / annotation methodology / known biases) were duplicated across the strict-tier and non-strict-tier branches of ``_collect_data_governance``. Hoisted to module-level ``_ARTICLE10_{COLLECTION,ANNOTATION,KNOWN_BIASES}_ PROMPT`` constants so a future EU-AI-Act wording tweak only changes one site instead of six. ## forgelm/wizard/_orchestrator.py — class-name shadow - ``from ..config import ForgeConfig`` followed by ``ForgeConfig = None`` on the ``ImportError`` path shadowed the imported class name (static analyzers flag the rebind). Renamed both occurrences to a module-private ``_ForgeConfig`` identifier so the import- failure fallback no longer collides with the public class name. ## Validation - ruff check + format clean (178 files). - 14 doc / parity / consistency CI guards green. - ``forgelm --config config_template.yaml --dry-run`` valid. - 132 wizard-phase22 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Promotes the [Unreleased] block into the [0.5.5] section under a new "Post-merge follow-up" subsection that captures the three PRs landed on top of the paper-merged closure cycle: - PR #40 (2026-05-08) — Phase 22 CLI wizard parity-with-web - PR #41 (2026-05-09) — Site documentation correction sweep + i18n parity backfill - PR #42 (2026-05-10) — Release-prep + nightly pip-audit gate fix + two review-absorption rounds The CHANGELOG [0.5.5] header date is bumped to 2026-05-10 (the actual PyPI release date) and a fresh empty [Unreleased] block is added at the top. __api_version__ stays at 1.0.0 — Phase 22 is a CLI/wizard surface change, not a Python library API addition; the 30-symbol forgelm.__all__ surface that this release anchors was already added during the Phase 12.6 closure cycle (paper-merged as v0.5.5). Roadmap (en + tr) marks v0.5.5 as released; "Latest release on PyPI" flipped from v0.5.0 to v0.5.5; "Current state" sentence simplified (no more "merged but tag deferred" framing). releases.md drops the "Next PyPI release (slot pending)" section since v0.5.5 now bundles everything that was staged in [Unreleased]. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>




Summary
Promotes 17 commits from
developmenttomain, covering two merged feature PRs, the working-memory documentation policy, and a release-preparation pass.forgelm/wizard/), schema-driven defaults SOT (tools/generate_wizard_defaults.py+_defaults.json), idempotent re-run via--wizard-start-from <yaml>, and 5 review absorption cycles (PR-A through PR-E + 18 verified findings sweep).compliance.config_hash,compliance.human_approval), ghost CLI flags (--model-card), and stale paths (forgelm/data_audit.py).46d2518) —docs/marketing/anddocs/analysis/are now strictly gitignored with zero exceptions; newtools/check_no_analysis_refs.pyCI guard prevents public-tree citations into either directory.ecf3dcc) — added after the initial PR description: nightly pip-audit--ignore-vuln CVE-2026-1839stop-gap with documented rationale + remove-after condition (closes #37); CHANGELOG[Unreleased]rewritten in public-facing style (grouped by user-visible capability, no internal review-cycle / wave-N / round-N terminology);docs/roadmap.md+roadmap-tr.mdadd Phase 22 + site sweep rows;docs/roadmap/releases.mdmarks v0.5.5 as "Merged on main; PyPI tag deferred" and adds an "upcoming PyPI release" section that surfaces the0.5.6vs0.6.0version-slot decision for the maintainer.What changed (high level)
docs/analysis/)check_no_analysis_refs.py,check_wizard_defaults_sync.py,generate_wizard_defaults.py(generator),check_anchor_resolution.pyhardeningtests/test_wizard_phase22.py(1759 new lines), updates totest_cli_phase10.py,test_phase12_5.py,test_wizard_phase11.pyforgelm/wizard/package split (orchestrator, state, collectors, BYOD, IO, defaults JSON)--wizard-start-from, refreshed_exit_codes.py(EXIT_WIZARD_CANCELLED = 5), updated_parser.pypyproject.toml: package data inclusion forforgelm.wizard/_defaults.json.github/workflows/nightly.yml:--ignore-vuln CVE-2026-1839flag added to the pip-audit stepTest plan
ecf3dccre-run)ruff check + formatclean across forgelm/ tools/ tests/forgelm --config config_template.yaml --dry-run✓developmentafter 5 review cycles + independent verification0.5.6vs0.6.0) before tag pushNotes
pyproject.tomlstays at0.5.5. The version-slot decision for the upcoming tagged release is documented indocs/roadmap/releases.md("Next PyPI release") with rationale on each side.[Unreleased]section is the staging area for the upcoming PyPI release. Once the version-slot decision is made, the[Unreleased]header gets promoted to the chosen version (## [0.5.6]or## [0.6.0]) with the release date, the version is bumped inpyproject.toml, and the tag push triggerspublish.ymlto PyPI.🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
--wizard-start-from, resumable state persistence across sessions, and 9-step guided flow with parity to in-browser wizard.Improvements
Documentation
--wizard-start-fromflag and exit codes.