feat(wizard): Phase 22 — CLI wizard parity with web + sub-package split by cemililik · Pull Request #40 · HodeTech/ForgeLM

cemililik · 2026-05-08T21:16:33Z

Summary

Closes the parity gap between forgelm --wizard (CLI) and the in-browser site/js/wizard.js documented in docs/analysis/code_reviews/2026-05-07_cli_wizard_ux_analysis.md + -2.md. 20 original findings (G1-G20) and 5 independent findings (I1-I5) verified against current code; this PR addresses the actionable subset.

What's in this PR

CLI wizard (`forgelm/wizard/`)

Adım sırası refactor (I1): compliance.risk_classification artık evaluation.safety'den önce soruluyor. v0.5.5'te tersine sıralı olduğu için F-compliance-110 strict-tier auto-coercion'ı yapısal olarak imkansızdı.
Trainer hyperparameter'ları (G1): DPO / SimPO / KTO / ORPO / GRPO için per-trainer knob'lar (dpo_beta, simpo_beta+gamma, kto_beta, orpo_beta, grpo_*).
PEFT method genişliği (G2): 6 strateji kartı (QLoRA / LoRA / DoRA / PiSSA / rsLoRA / GaLore).
GaLore varyantları (G9): 6 optimizer (3 base + 3 _layerwise).
RoPE tipleri (G10): linear / dynamic / yarn + longrope (4-of-4).
LoRA r schema parite (G11): DEFAULT_LORA_R 16 → 8.
Compliance derinliği (G5): EU AI Act Madde 9 (risk_assessment), Madde 10 (data.governance), Madde 11+Annex IV §1, Madde 12+17 (monitoring), GDPR Madde 5(1)(e)+17 (retention), evaluation.benchmark, evaluation.llm_judge, synthetic blokları.
High-risk auto-coercion (G8): strict tier seçildiğinde safety.enabled + require_human_approval otomatik True; F-compliance-110 schema-side gate'in front-stop'u.
Webhook tek-prompt (G15): literal URL veya env:VAR_NAME; urlparse validation; HTTP uyarıyla kabul.
Safety probe path fix (G16): importlib.resources.files("forgelm.safety_prompts") ile bundled probe set — pip install regression'u kapatıyor.
Persistence (G6): \$XDG_CACHE_HOME/forgelm/wizard_state.yaml snapshot (schema-versioned).
Step-diff preview (G7): her adım sonrası + key: value / ~ key: before → after.
Beginner / expert toggle (G13): beginner mode tutorial paragrafları.
Use-case integration (G12): quickstart-template listesi full wizard Step 2'de.
Navigation (G3): back/b + reset/r token'ları.
POPULAR_MODELS parite (G14): web wizard preset kartlarıyla aynı.
Full YAML preview summary (G17): _print_wizard_summary artık tam YAML dump ediyor.

Web wizard

I2: site/js/wizard.js header comment "7 step" → "9 step" + correct list.
I4: USE_CASE_PRESETS keys aligned with forgelm/quickstart.py::TEMPLATES (code-copilot → code-assistant, medical-tr → medical-qa-tr).

Mimari

forgelm/wizard.py (976 LOC monolit) → forgelm/wizard/ sub-package:

Submodule	LOC	Sorumluluk
`_io.py`	213	primitive prompt'lar + nav token'ları + hardware detection
`_state.py`	290	`_WizardState`, persistence, summary, schema-default constant'lar
`_collectors.py`	688	tüm `_collect_*` + strategy + use-case preset registry
`_byod.py`	333	Phase 11.5/12.5 BYOD + quickstart prelude
`_orchestrator.py`	566	step defs + driver + coercion + `run_wizard`
`__init__.py`	176	re-export shim (backward compat)

Hiçbir submodule warn-threshold'u (1000 LOC) aşmıyor.

Docs

docs/design/wizard_mode.md — 9-adım flow rewrite (I5).
docs/reference/architecture.md + -tr.md — sub-package referansı.
docs/guides/quickstart.md + -tr.md — wizard akış güncellemesi.
docs/roadmap/phase-11-5-backlog.md — link fix (_byod.py).
CHANGELOG.md — [Unreleased] ### Added Phase 22 entry.

Tests

Yeni: tests/test_wizard_phase22.py — 55 test (webhook parsing, safety probe path, trainer hyperparams, navigation, strict-tier coercion, persistence, step diff, use-case parity, schema-default parity).

Güncellenen: tests/test_wizard_phase11.py (3 _print_wizard_summary test'i yeni dict API), tests/test_phase12_5.py (3 patch hedefi sub-package binding'ine).

i18n kararı

CLI wizard English-only kalıyor (operatör explicit talebi). Web wizard'ın 6-dil katalogu korundu.

Test plan

pytest tests/ --ignore=tests/test_compliance.py — 1487 pass, 14 skip, 0 fail (5:49)
pytest tests/test_wizard_*.py tests/test_phase12_5.py — 108 pass, 0 fail
8/8 doc CI guards green (bilingual_parity, anchor_resolution, cli_help_consistency, yaml_snippets, audit_event_catalog, library_api_doc, doc_numerical_claims, bilingual_code_blocks)
ruff check forgelm/wizard/ + ruff format --check — temiz
tools/check_module_size.py — 0 over fail-threshold; 5 grandfathered (unchanged)
forgelm --config config_template.yaml --dry-run — başarılı
End-to-end smoke: SFT/DPO/GRPO YAML'larının ForgeConfig.model_validate()'i geçtiği doğrulandı; high-risk auto-coerced config schema'ya uyum sağlıyor
Notebooks: 0 etkilenen (grep -rn "from forgelm import wizard")
CLI dispatcher (forgelm/cli/_wizard.py) ve pyproject.toml packages.find glob'u sub-package'ı otomatik kapsıyor

Reviewing

Independent review prompt prepared at /tmp/review-agent-prompt.md (içerik PR commit message'ında özetlenmiş).

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

New Features
- Enhanced interactive configuration wizard with 9-step flow, keyboard navigation (back/reset controls), webhook URL parsing, safety probe resolution, and XDG-aware state persistence.
- Added compliance metadata and strict-tier auto-gating configuration in wizard.
- Introduced step-diff previewing and beginner/expert verbosity toggling.
Changed
- Minimum required torch version bumped from 2.1.0 to 2.3.0.
Documentation
- Updated wizard architecture and design documentation to reflect new 9-step flow and modernized capabilities.
Tests
- Added comprehensive test coverage for Phase 22 wizard modernization.

Closes the parity gap between `forgelm --wizard` (CLI) and the in-browser `site/js/wizard.js` documented in `docs/analysis/code_reviews/2026-05-07_cli_wizard_ux_analysis.md` + `-2.md`. 20 original findings (G1-G20) and 5 independent findings (I1-I5) were verified against current code; this PR addresses the actionable subset. What's covered (CLI): - **Step reorder (I1):** `compliance.risk_classification` is now collected BEFORE `evaluation.safety` so the F-compliance-110 strict-tier auto-coercion can fire reliably. The v0.5.5 wizard asked safety first, then risk — silently producing high-risk configs that failed to load. - **Trainer-specific hyperparameters (G1):** `dpo_beta`, `simpo_beta` / `simpo_gamma`, `kto_beta`, `orpo_beta`, `grpo_num_generations`, `grpo_max_completion_length`, `grpo_reward_model` are now surfaced per `trainer_type`. SFT short-circuits. - **PEFT method breadth (G2):** the strategy step now offers all four schema-supported `lora.method` values (`lora`, `dora`, `pissa`, `rslora`) plus GaLore as a separate axis (6 cards total). - **GaLore optimiser variants (G9):** all six `galore_optim` Literal values surfaced, including the three `_layerwise` siblings. - **RoPE scaling types (G10):** `longrope` joined `linear` / `dynamic` / `yarn` (4-of-4 schema coverage). - **LoRA `r` schema parity (G11):** `DEFAULT_LORA_R` lowered from `16` → `8` to match `LoraConfigModel.r`. - **Compliance depth (G5):** `compliance` (Article 11 + Annex IV §1) plus optional `risk_assessment` (Article 9), `data.governance` (Article 10), `retention` (GDPR Article 5(1)(e) + 17), `monitoring` (Article 12+17), `evaluation.benchmark`, `evaluation.llm_judge`, `synthetic` blocks now configurable. - **High-risk auto-coercion (G8):** `risk_classification ∈ {high-risk, unacceptable}` automatically enables `evaluation.safety` + `evaluation.require_human_approval` with a visible operator notice. Front-stops the schema-side `F-compliance-110` `ConfigError`. - **Webhook URL parsing (G15):** single prompt accepts a literal URL or `env:VAR_NAME` reference; `urlparse`-validated; HTTPS recommended (HTTP accepted with warning). - **Safety probe path (G16):** bundled probe set resolved through `importlib.resources.files("forgelm.safety_prompts")`, fixing the `pip install` regression where `configs/safety_prompts/general_safety.jsonl` was the wizard default but never shipped in the wheel. - **Configuration summary (G17):** `_print_wizard_summary` now dumps the full YAML alongside the labelled headline. - **Persistence (G6):** state snapshot at `$XDG_CACHE_HOME/forgelm/wizard_state.yaml` (or `~/.cache/forgelm/wizard_state.yaml`) saved after every completed step. Schema versioned (`v: 1`); cleared on successful completion. - **Step-diff preview (G7):** each completed step prints `+ key.path: value` / `~ key.path: before → after`. - **Beginner / expert toggle (G13):** beginner mode prefixes each step with a 2-3-line tutorial paragraph. - **Use-case integration (G12):** the curated quickstart-template list is also offered as Step 2 of the full flow, seeding defaults for later steps. - **`back` / `reset` navigation (G3):** `back` / `b` returns to the previous step; `reset` / `r` clears in-memory state. - **`POPULAR_MODELS` alignment with web (G14):** CLI list now matches `site/js/wizard.js`'s preset cards. What's covered (web): - **Header comment (I2):** `site/js/wizard.js` opening doc block updated from "7 steps" to "9 steps" with correct list. - **Use-case key alignment (I4):** `USE_CASE_PRESETS` keys renamed (`code-copilot` → `code-assistant`, `medical-tr` → `medical-qa-tr`) so `forgelm/quickstart.py::TEMPLATES` is the single source of truth. What's covered (docs): - **Design doc (I5):** `docs/design/wizard_mode.md` rewritten to describe the actual 9-step flow, sub-package layout, and Phase 22 modernisation status banner. - **Architecture reference:** `docs/reference/architecture.md` + `-tr.md` updated for the `forgelm/wizard/` sub-package. - **Quickstart guide:** `docs/guides/quickstart.md` + `-tr.md` updated with the new flow description. - **CHANGELOG.md:** comprehensive Phase 22 entry under `[Unreleased]` / `### Added`. - **Roadmap link fix:** `docs/roadmap/phase-11-5-backlog.md` now points at `forgelm/wizard/_byod.py` rather than the old monolith. Architecture: `forgelm/wizard.py` (976 LOC monolith) split into the `forgelm/wizard/` sub-package per `docs/standards/architecture.md`'s 1000-line ceiling rule: - `_io.py` (213 LOC) — primitive prompts + navigation tokens + hardware detection. - `_state.py` (290 LOC) — `_WizardState`, persistence, summary, schema-default constants. - `_collectors.py` (688 LOC) — every `_collect_*` helper + strategy choice + use-case preset registry. - `_byod.py` (333 LOC) — Phase 11.5 / 12.5 BYOD inline ingest + audit helpers + quickstart prelude. - `_orchestrator.py` (566 LOC) — step definitions, `_drive_wizard_steps`, `_apply_strict_tier_coercion`, `run_wizard`. - `__init__.py` (176 LOC) — re-export shim for backward compatibility. Tests: 55 new tests in `tests/test_wizard_phase22.py` cover `_parse_webhook_value`, `_default_safety_probes_path`, `_collect_trainer_hyperparameters`, `_check_navigation_token`, `_apply_strict_tier_coercion`, persistence helpers, `_print_step_diff`, `_collect_webhook_config`, use-case preset registry parity with `quickstart.py::TEMPLATES`, strategy choice shape, GaLore optimiser variants, and schema-default parity for LoRA r / alpha / epochs / batch / LR. Existing `tests/test_wizard_phase11.py` test_print_wizard_summary cases updated to use the new dict-based API. Validation: 8/8 doc CI guards green (bilingual_parity, anchor_resolution, cli_help_consistency, yaml_snippets, audit_event_catalog, library_api_doc, doc_numerical_claims, bilingual_code_blocks); ruff check + format clean across the package; module-size guard reports 0 over the fail-threshold; `forgelm --config config_template.yaml --dry-run` passes; strict-tier auto-coercion produces a config that `ForgeConfig.model_validate` accepts. i18n decision: CLI wizard remains English-only per operator explicit instruction. Web wizard's 6-language `wizard.*` catalogue (en/tr/de/fr/es/zh) is unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

qodo-code-review · 2026-05-08T21:16:37Z

ⓘ You've reached your Qodo monthly free-tier limit. Reviews pause until next month — upgrade your plan to continue now, or link your paid account if you already have one.

sourcery-ai

Sorry @cemililik, you have reached your weekly rate limit of 500000 diff characters.

Please try again later or upgrade to continue using Sourcery

coderabbitai · 2026-05-08T21:16:51Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 56db7ed8-e686-4fd0-8afc-afff9b9b7f2b

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/phase22-wizard-modernisation

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codacy-production · 2026-05-08T21:18:00Z

Not up to standards ⛔

🔴 Issues 5 high · 2 medium · 3 minor

Alerts:
⚠ 10 issues (≤ 0 issues of at least minor severity)

Results:
10 new issues

Category Results

BestPractice 1 minor

ErrorProne 5 high

Security 2 medium
1 minor

Comprehensibility 1 minor

View in Codacy

🟢 Metrics 671 complexity · 2 duplication

Metric Results

Complexity 671

Duplication 2

View in Codacy

_{NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer}
_{TIP This summary will be updated as you push new changes.}

gemini-code-assist

Code Review

This pull request refactors the forgelm/wizard.py monolith into a modular sub-package forgelm/wizard/ and modernizes the CLI wizard to achieve parity with the web wizard's 9-step flow. It introduces support for advanced training configurations, compliance metadata, and improved dataset ingestion and auditing. The review feedback highlights opportunities to enforce mandatory compliance fields, fix index validation in menu prompts, correct default index logic for model selection, and ensure consistent UTF-8 encoding for file operations.

gemini-code-assist · 2026-05-08T21:23:07Z

+    intended_use = (
+        _prompt("Article 9(2)(a): intended_use", "")
+        if is_strict
+        else _prompt("intended_use (Article 9(2)(a)) — optional", "")
+    )


Under the EU AI Act, Article 9 risk management evidence is mandatory for high-risk and unacceptable systems. The intended_use field should be enforced as non-empty using _prompt_required when is_strict is true, rather than allowing an empty default.

References

EU AI Act Article 9(2)(a) requires documentation of the intended purpose of the system. ^(link)

High-risk AI systems must have a risk management system that includes documentation of intended use and foreseeable misuse.

gemini-code-assist · 2026-05-08T21:23:07Z

+        "collection_method": _prompt("Article 10(2)(b): how was data collected?", ""),
+        "annotation_process": _prompt("Article 10(2)(b): annotation methodology", ""),
+        "known_biases": _prompt("Article 10(2)(f): known_biases", ""),


Article 10 data governance evidence is mandatory for strict risk tiers. When mandatory is true, the collection_method, annotation_process, and known_biases fields should be enforced as non-empty using _prompt_required.

References

EU AI Act Article 10 requires documentation of data collection, annotation, and bias assessment. ^(link)

High-risk AI systems must use high-quality datasets and have appropriate data governance and management practices.

gemini-code-assist · 2026-05-08T21:23:07Z

+def _collect_compliance_metadata() -> Dict[str, Any]:
+    """Article 11 + Annex IV §1: provider + system metadata."""
+    return {
+        "provider_name": _prompt("Organization (legal-entity) name", ""),
+        "provider_contact": _prompt("Provider regulatory contact (email or phone)", ""),
+        "system_name": _prompt("Human-readable system name", ""),
+        "intended_purpose": _prompt("Intended purpose of the system", ""),
+        "known_limitations": _prompt("Known limitations operator wants documented (free-text)", ""),
+        "system_version": _prompt("System version string", "v0.1.0"),
+        "risk_classification": _prompt_choice(
+            "Risk classification (mirrored at risk_assessment.risk_category):",
+            list(_RISK_TIERS),
+            default=2,  # ``minimal-risk`` — same default as the Pydantic field.
+        ),
+    }


The _collect_compliance_metadata function collects the risk classification at the end of the block. For high-risk or unacceptable tiers, Article 11 and Annex IV §1 require specific provider and system metadata. To ensure compliance and provide a better user experience (matching the web wizard's behavior), the risk classification should be prompted first. This allows the function to use _prompt_required for mandatory fields when a strict tier is selected.

gemini-code-assist · 2026-05-08T21:23:07Z

+    _check_navigation_token(choice)
+    try:
+        idx = int(choice) if choice else default
+        return options[idx - 1]


The _prompt_choice function does not validate if the user-provided index is within the positive bounds of the options list. If a user enters 0, Python's negative indexing will return the last element of the list (options[-1]), which is counter-intuitive for a 1-based menu and may lead to unintended configuration choices.

Suggested change

return options[idx - 1]

if 1 <= idx <= len(options):

return options[idx - 1]

return options[default - 1]

gemini-code-assist · 2026-05-08T21:23:07Z

+    if preset_default and preset_default in POPULAR_MODELS:
+        default_idx = POPULAR_MODELS.index(preset_default) + 1
+    else:
+        default_idx = 1


The logic for determining default_idx in the model selection step clobbers the user's current choice if it is not present in the POPULAR_MODELS list (e.g., a model preselected by a use-case template like code-assistant). It should default to the 'Custom' option index when a non-popular model is already set in the state.

Suggested change

if preset_default and preset_default in POPULAR_MODELS:

default_idx = POPULAR_MODELS.index(preset_default) + 1

else:

default_idx = 1

if preset_default:

if preset_default in POPULAR_MODELS:

default_idx = POPULAR_MODELS.index(preset_default) + 1

else:

default_idx = len(options) # Default to 'Custom'

else:

default_idx = 1

gemini-code-assist · 2026-05-08T21:23:07Z

+
+def _save_config_to_file(config: Dict[str, Any], requested_filename: str) -> str:
+    """Write *config* as YAML; falls back to a unique filename on OSError."""
+    try:


The configuration file should be written using UTF-8 encoding to ensure portability and correct handling of non-ASCII characters (e.g., in provider names or intended purposes), consistent with how the wizard state is saved.

Suggested change

try:

try:

with open(requested_filename, "w", encoding="utf-8") as f:

coderabbitai

Actionable comments posted: 13

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/design/wizard_mode.md`:
- Around line 33-79: The ordered list entries (markers "0." through "9.") in the
wizard flow block use two spaces after the list marker which triggers MD030;
update each ordered-list line to use a single space after the numeric marker
(e.g., change "0.  **Quickstart..." to "0. **Quickstart...") so the markdownlint
warning is resolved while keeping the exact text and formatting of the list
items (refer to the numbered entries in the wizard flow block).

In `@docs/reference/architecture.md`:
- Line 149: Update the heading and corresponding section text in
docs/reference/architecture.md so the wizard component reads and is titled
"wizard/" (reflecting the forgelm/wizard/ sub-package) instead of "wizard.py",
and make the identical change in docs/reference/architecture-tr.md to keep EN/TR
mirrors consistent; also scan the wizard section for any remaining monolithic
phrasing (e.g., references to a single-file implementation) and adjust wording
to reference the package layout and functions like _offer_ingest_for_directory
and _offer_audit_for_jsonl where the doc currently implies a monolith.

In `@forgelm/wizard/_byod.py`:
- Around line 80-83: The printed shell hints use raw interpolated paths
(variable resolved) inside _print calls, which breaks for paths with
spaces/metacharacters; update each occurrence (the _print call around the first
snippet and the other hints at the locations mentioned) to wrap the path with
shlex.quote(resolved) (import shlex if not present) so the displayed commands
are safely quoted and copy-pastable; ensure every formatted string that inserts
resolved (and any similar path variables) uses shlex.quote(...) in the f-strings
for the three occurrences referenced.
- Around line 194-209: The prompt currently accepts any non-directory string as
valid; change _prompt_dataset_path_with_ingest_offer to reuse the quickstart
validation flow: after reading raw, set candidate = Path(raw).expanduser(); if
candidate.is_dir() keep the _offer_ingest_for_directory path; else if
candidate.exists() and candidate.is_file() and candidate.suffix.lower() in
(".jsonl", ".json") call _offer_audit_for_jsonl(candidate.resolve()) and return
the resolved path; else if candidate.exists() but is not a valid JSONL, surface
an error and continue; if candidate does not exist, call the existing
_validate_local_jsonl(...) (or the project’s HF-id fallback helper) to validate
whether raw is a valid local JSONL or a Hugging Face dataset id and only return
raw when that validation succeeds, otherwise show an error and loop. Ensure you
reference and reuse _offer_ingest_for_directory, _offer_audit_for_jsonl, and
_validate_local_jsonl in the updated logic.

In `@forgelm/wizard/_collectors.py`:
- Around line 151-153: The default for webhook start notifications is wrong;
change the call that sets the default for "notify_on_start" so it matches the
web wizard (set to False) instead of True; specifically update the line that
calls section.setdefault("notify_on_start", True) to
section.setdefault("notify_on_start", False) while leaving the other keys
("notify_on_success", "notify_on_failure") unchanged to keep parity across
surfaces.
- Around line 412-424: When mandatory=True, _collect_data_governance currently
still allows empty strings for "collection_method", "annotation_process", and
"known_biases"; change the prompts so those three fields are required when
mandatory is True by re-prompting until a non-empty response (e.g., loop on
_prompt for "collection_method", "annotation_process", and "known_biases" when
mandatory) or by using a required/validate parameter if _prompt supports it;
ensure the rest of the function returns the same keys ("collection_method",
"annotation_process", "known_biases", "personal_data_included",
"dpia_completed") but with non-empty values for the three text fields when
mandatory is set.
- Around line 389-405: The strict-tier branch currently treats Article 9 fields
as optional — ensure when is_strict is True that intended_use,
foreseeable_misuse and mitigation_measures cannot be empty by re-prompting or
validating input: for intended_use, replace the single _prompt call with logic
that loops (or validates) until a non-empty string is returned when is_strict is
True (still allow empty when False); for foreseeable_misuse and
mitigation_measures, use _prompt_optional_list but enforce at least one item
when is_strict is True by re-prompting until the returned list is non-empty;
keep vulnerable_groups_considered behavior unchanged. Target the calls to
_prompt("Article 9(2)(a): intended_use", ""), _prompt_optional_list("Article
9(2)(b): foreseeable_misuse — list at least one realistic misuse"), and
_prompt_optional_list("Article 9(2)(c): mitigation_measures the deployer
applies") and add the minimal loop/validation around them keyed to is_strict.

In `@forgelm/wizard/_io.py`:
- Around line 191-193: The CUDA VRAM lookup uses
torch.cuda.get_device_properties(0).total_mem which doesn't exist and raises an
AttributeError; update the VRAM calculation in the block that sets
info["vram_gb"] (the code using torch.cuda.get_device_properties in
forgelm/wizard/_io.py) to use the correct attribute total_memory and keep the
same rounding/GB conversion so info["vram_gb"] =
round(torch.cuda.get_device_properties(0).total_memory / (1024**3), 1).

In `@forgelm/wizard/_orchestrator.py`:
- Around line 500-511: The WizardBack handler fails to restore the prior config
snapshot so partial mutations leak into persisted state; when catching
WizardBack in the block that calls step.runner, restore state.config from the
previously saved prev_config (deep copy) before decrementing state.current_step
and before calling _persist_state, and ensure any changes to
state.completed_steps still reflect the rolled-back step (use prev_config to
decide if completed_steps needs adjustment); update the exception branch around
step.runner, prev_config, state.config, WizardBack, state.current_step,
state.completed_steps and _persist_state accordingly.
- Around line 512-515: The except block handling WizardReset should not return a
new _WizardState() because that makes _run_full_wizard() treat the reset as a
completed run; instead after calling _clear_wizard_state() re-raise WizardReset
so the reset propagates to the caller and triggers the restart flow. Change the
except WizardReset block to call _clear_wizard_state() and then "raise" (not
"return _WizardState()") so _run_full_wizard and surrounding logic can handle
restarting correctly.

In `@tests/test_wizard_phase22.py`:
- Around line 299-304: The current test
test_load_returns_none_on_corrupt_snapshot uses a string that PyYAML can parse,
so update the written snapshot content to an unambiguously invalid YAML string
(e.g., an unbalanced bracket or a lone colon line) so the parser throws
yaml.YAMLError; keep using wizard._wizard_state_path() to create the file and
assert wizard._load_wizard_state() is None so the corrupt-YAML except branch is
exercised.
- Around line 88-96: The tests are inconsistent: wizard._parse_webhook_value
currently raises ValueError with "must use `https://`" for unknown schemes while
http:// is accepted with a warning; update the ValueError text in
wizard._parse_webhook_value to state the real rule (e.g., "must use http:// or
https://") and then update the test test_unknown_scheme_rejected to expect that
new message so the rejection for ftp:// matches the acceptance behavior for
http://; references: function wizard._parse_webhook_value and test
test_unknown_scheme_rejected.
- Around line 104-120: Tests TestDefaultSafetyProbesPath
(test_resolves_to_packaged_jsonl and test_path_is_absolute_or_resolvable) make
unconditional on-disk assertions against the path returned by
wizard._default_safety_probes_path, which will fail in source-only checkouts;
either mock importlib.resources.files (or the resource lookup used by
wizard._default_safety_probes_path / pkgutil.get_data) to return a temporary
file path or gate the tests with pytest.mark.skipif that checks the resource
exists before asserting Path(...).is_file() so the tests do not assume
package-data is present in non-installed/dev environments.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 09191bc6-c98c-4257-a2a8-64ad91848f51

📥 Commits

Reviewing files that changed from the base of the PR and between 755eaf7 and 52b113d.

📒 Files selected for processing (18)

CHANGELOG.md
docs/design/wizard_mode.md
docs/guides/quickstart-tr.md
docs/guides/quickstart.md
docs/reference/architecture-tr.md
docs/reference/architecture.md
docs/roadmap/phase-11-5-backlog.md
forgelm/wizard.py
forgelm/wizard/__init__.py
forgelm/wizard/_byod.py
forgelm/wizard/_collectors.py
forgelm/wizard/_io.py
forgelm/wizard/_orchestrator.py
forgelm/wizard/_state.py
site/js/wizard.js
tests/test_phase12_5.py
tests/test_wizard_phase11.py
tests/test_wizard_phase22.py

💤 Files with no reviewable changes (1)

forgelm/wizard.py

coderabbitai · 2026-05-08T21:29:25Z

+    def test_unknown_scheme_rejected(self):
+        with pytest.raises(ValueError, match="must use `https://`"):
+            wizard._parse_webhook_value("ftp://example.com/hook")
+
+    def test_http_warning_but_accepted(self, capsys):
+        section = wizard._parse_webhook_value("http://hooks.example.com/x")
+        assert section == {"url": "http://hooks.example.com/x"}
+        captured = capsys.readouterr().out
+        assert "uses HTTP, not HTTPS" in captured


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Error message "must use \https://`"` is inconsistent with HTTP being accepted.

test_unknown_scheme_rejected (Line 89) asserts that ftp:// raises ValueError with the message "must use \https://`", but test_http_warning_but_accepted(Lines 92–96) showshttp://is silently accepted with only a warning. An operator who hits theftp://rejection would read the error as "only HTTPS is allowed", then be surprised to findhttp://works. The error message should reflect the actual acceptance rule — e.g.,"must use https:// or http://"`.

🐛 Suggested test correction to match the intended error message fix

def test_unknown_scheme_rejected(self): - with pytest.raises(ValueError, match="must use `https://`"): + with pytest.raises(ValueError, match="must use `https://` or `http://`"): wizard._parse_webhook_value("ftp://example.com/hook")

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def test_unknown_scheme_rejected(self):

with pytest.raises(ValueError, match="must use `https://`"):

wizard._parse_webhook_value("ftp://example.com/hook")

def test_http_warning_but_accepted(self, capsys):

section = wizard._parse_webhook_value("http://hooks.example.com/x")

assert section == {"url": "http://hooks.example.com/x"}

captured = capsys.readouterr().out

assert "uses HTTP, not HTTPS" in captured

def test_unknown_scheme_rejected(self):

with pytest.raises(ValueError, match="must use `https://` or `http://`"):

wizard._parse_webhook_value("ftp://example.com/hook")

def test_http_warning_but_accepted(self, capsys):

section = wizard._parse_webhook_value("http://hooks.example.com/x")

assert section == {"url": "http://hooks.example.com/x"}

captured = capsys.readouterr().out

assert "uses HTTP, not HTTPS" in captured

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_wizard_phase22.py` around lines 88 - 96, The tests are inconsistent: wizard._parse_webhook_value currently raises ValueError with "must use `https://`" for unknown schemes while http:// is accepted with a warning; update the ValueError text in wizard._parse_webhook_value to state the real rule (e.g., "must use http:// or https://") and then update the test test_unknown_scheme_rejected to expect that new message so the rejection for ftp:// matches the acceptance behavior for http://; references: function wizard._parse_webhook_value and test test_unknown_scheme_rejected.

coderabbitai · 2026-05-08T21:29:25Z

+    def test_load_returns_none_on_corrupt_snapshot(self, isolated_state_dir):
+        path = wizard._wizard_state_path()
+        path.parent.mkdir(parents=True, exist_ok=True)
+        # Invalid YAML — must not raise; just silently miss.
+        path.write_text("not: valid: yaml: at: all", encoding="utf-8")
+        assert wizard._load_wizard_state() is None


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

"not: valid: yaml: at: all" may be parsed as valid YAML, silently exercising version-mismatch rather than corrupt-data handling.

PyYAML parses not: valid: yaml: at: all as {"not": "valid: yaml: at: all"} (the first : starts a block mapping; the remainder is a scalar value). If the parser succeeds, _load_wizard_state returns None due to a version-mismatch check, not because the except yaml.YAMLError branch was reached. The corrupt-YAML code path goes untested.

Use a string that is unambiguously unparseable:

🛡️ Proposed fix

- path.write_text("not: valid: yaml: at: all", encoding="utf-8") + # Tabs in block mappings are illegal in YAML; this guarantees a ScannerError. + path.write_text("key:\n\t- value", encoding="utf-8")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_wizard_phase22.py` around lines 299 - 304, The current test test_load_returns_none_on_corrupt_snapshot uses a string that PyYAML can parse, so update the written snapshot content to an unambiguously invalid YAML string (e.g., an unbalanced bracket or a lone colon line) so the parser throws yaml.YAMLError; keep using wizard._wizard_state_path() to create the file and assert wizard._load_wizard_state() is None so the corrupt-YAML except branch is exercised.

Addresses inline review comments + verified user findings on the Phase 22 wizard modernisation PR. Critical / major / minor mix: Critical (2): - _io.py: torch.cuda.get_device_properties().total_mem → total_memory (the v0.5.5 bug raised AttributeError on real CUDA hardware). - _orchestrator.py: WizardReset re-loops with a fresh _WizardState instead of returning early; returning let _run_full_wizard treat the reset as a completed run and try to save an empty config. Major (6): - _orchestrator.py: WizardBack now restores the prev_config snapshot so a half-completed step's mutations don't leak into the previous step's prompts. - _orchestrator.py: deep copy now uses copy.deepcopy instead of json.loads(json.dumps(...)) — the JSON path was lossy for tuples and sets that may sneak in via collector helpers. - _orchestrator.py: _step_model preserves a use-case preselect that isn't in POPULAR_MODELS (defaults to the Custom slot pre-filled with the operator's earlier choice). - _byod.py: _prompt_dataset_path_with_ingest_offer validates that the typed value is a directory, JSONL/JSON file, or HF Hub ID; bare typos no longer silently become data.dataset_name_or_path. - _collectors.py: Article 9 strict-tier branch enforces non-empty intended_use / foreseeable_misuse / mitigation_measures. - _collectors.py: Article 10 strict-tier branch enforces non-empty collection_method / annotation_process / known_biases. Medium (4): - _collectors.py: notify_on_start default flips to False (web wizard parity — start pings are noisy). - _collectors.py: _collect_compliance_metadata collects risk_classification first so strict-tier hint shows up before Article 11 fields and downstream Article 9/10 branches see the correct context. - _state.py: _save_config_to_file uses yaml.safe_dump + UTF-8 + allow_unicode (yaml.dump could emit !!python/object tags that ForgeConfig then rejects). Same treatment for the YAML preview. - _state.py: chmod 0o600 on wizard_state.yaml (best-effort) — compliance metadata can be sensitive. - _io.py: _prompt_choice guards negative idx and out-of-range int alongside the existing IndexError fallback. Minor (3): - _byod.py: shlex.quote() the path in BYOD shell-hint output so paths with spaces survive copy-paste into a real shell. - docs/design/wizard_mode.md: MD030 — single space after ordered list markers (was double). - docs/reference/architecture{,-tr}.md: heading wizard.py → wizard/ (the module became a sub-package in Phase 22). Coverage (HIGH): - tests/test_wizard_phase22.py adds end-to-end coverage for _drive_wizard_steps (back-restores-config, reset-re-loops, back-at-first-step is no-op, persist-after-each-step) plus _maybe_resume_state (resume-from-snapshot, decline-clears) and _step_welcome (backend-hint + beginner branch). Orchestrator module coverage rises from 14% → 30% on this file alone. All checks green: - ruff check + format clean - 1544 passed, 14 skipped (full suite) - 8 doc guards: bilingual_parity, anchor_resolution, cli_help_consistency, yaml_snippets, audit_event_catalog, library_api_doc, doc_numerical_claims, bilingual_code_blocks - module_size: no new warnings - forgelm --config config_template.yaml --dry-run Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CHANGELOG entry for the c0dd57a fix sweep — Critical / Major / Medium / Minor classifications + the orchestrator coverage bump that landed alongside. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Phase 22 review-cycle 2: addresses 17 newly-discovered parity / UX gaps surfaced during the post-PR-#40 audit (P1-P18 + B1-B5 + C1-C2 + D1-D2 + E1+E4). All five HIGH-severity findings closed; both wizards (CLI sub-package + site/js/wizard.js) tightened in lockstep. ## Critical bug fixes - B-NEW-1: ``_print_step_diff`` strips internal ``_wizard_meta.*`` keys before showing the operator the per-step diff (was leaking ``_wizard_meta.use_galore`` from the strategy step). - B-NEW-2: ``_step_compliance`` skips the Article 10 governance re-prompt when step 6 already populated it under non-strict tier (was silently overwriting earlier answers). - B-NEW-3: ``_save_wizard_state`` now writes via ``tempfile`` + ``fsync`` + ``os.replace`` so a SIGKILL / power loss / concurrent wizard process can never leave a half-written state file. - B5: strict-tier coercion fires at the end of ``_step_compliance`` too (not just ``_step_evaluation``), so a Ctrl-C between steps 8 and 9 still produces a loadable YAML if the operator manually saves. - A1: ``_parse_webhook_value`` SSRF preflight — rejects loopback / RFC1918 / link-local destinations up front by reusing ``forgelm._http._is_private_destination``. ## UX guardrails - B1/E1: ``_validate_generated_config`` runs ``ForgeConfig.model_validate`` on the saved YAML before exit. Schema rejections surface inline. - B2: ``_prompt_unique_filename`` + ``_next_free_filename`` — overwrite confirmation with auto-suffix (no more silent clobbers). - B3: ``run_wizard_full`` refuses to launch on non-tty stdin and points operators at ``forgelm quickstart <template>`` for scripted runs (closes the silent-empty-config CI hole). - E4: ``_print_preflight_checklist`` — GPU/VRAM/dataset/risk-tier signals before the configuration summary. ## CLI ↔ web parity - P1+P18: safety-block field union — CLI now emits ``classifier`` + ``max_safety_regression`` (and ``min_classifier_confidence`` under confidence_weighted scoring); web wizard now emits ``scoring`` + ``track_categories`` + ``severity_thresholds``. Both surfaces produce structurally equivalent ``evaluation.safety`` blocks. - P2: judge ``min_score`` default 6.5 → 5.0 (schema parity). - P3: web wizard surfaces ``model.max_length`` as a number input on the training step (was silently capped at 2048). - P9: monitoring collector accepts ``env:VAR_NAME`` like webhook. - P13: HF Hub format hint when custom model name isn't ``<org>/<name>``. - P14: ``auto_revert.max_acceptable_loss`` default 2.0 emission. - P16: CLI emits ``bnb_4bit_quant_type`` + ``bnb_4bit_compute_dtype`` when QLoRA is selected (web parity). - C1/I3: web ``learningRate: '2e-5'``, ``batchSize: 4`` (schema parity); ``2e-5`` added to LR option list. - C2: web wizard cross-tab ``storage`` event listener. ## Distinct exit codes - D2: new ``EXIT_WIZARD_CANCELLED = 5`` constant; new ``WizardOutcome`` dataclass + ``run_wizard_full()`` entry point. Genuine cancels (Ctrl-C, non-tty, decline-to-save) now exit 5 instead of 0. Saved-and-deferred remains exit 0. - D1: best-effort ``import readline`` for arrow-key + history on Linux/macOS (Windows unaffected). ## Tests + isolation - B10: new autouse ``_isolate_wizard_state`` fixture in ``tests/conftest.py`` redirects ``XDG_CACHE_HOME`` for any ``tests/test_wizard_*`` module. - 28 new tests across the new behaviour (146 wizard tests passing in total; full suite 1573 passed / 14 skipped). - Existing ``test_wizard_still_works`` updated for the new ``WizardOutcome`` contract; new ``test_wizard_cancelled_exits_5``. ## Documentation - ``CHANGELOG.md`` — review-cycle 2 entry under [Unreleased]. - ``README.md`` — directory listing ``wizard.py`` → ``wizard/``. - ``docs/standards/architecture.md`` — Mermaid graph + table updated. - ``docs/design/wizard_mode.md`` — review-cycle 2 status banner. - ``docs/reference/usage{,-tr}.md`` — exit-code 5 row. - ``docs/usermanuals/{en,tr}/reference/exit-codes.md`` — full contract row + compatibility note (six integers, was five). - ``docs/guides/quickstart{,-tr}.md`` — operator-guardrail paragraph. ## Validation - ruff check + format clean (184 files). - 8 doc guards pass (bilingual_parity, anchor_resolution, cli_help_consistency, yaml_snippets, audit_event_catalog, library_api_doc, doc_numerical_claims, bilingual_code_blocks). - module_size: no new warnings. - ``forgelm --config config_template.yaml --dry-run`` green. - ``pytest tests/`` — 1573 passed, 14 skipped. ## Deferred (roadmap entries, NOT in this PR) - F1: schema-driven single-source-of-truth (``wizard_defaults.json`` build-time generation) — multi-PR refactor, frames every future default drift. - B4: ``--wizard-from-yaml answers.yaml`` mode — surface change pending policy decision on scripted-config UX. - E2: per-prompt rationale catalog — needs ~50 hand-written strings. - E3: ``--wizard --start-from <yaml>`` idempotent re-run — pairs with F1. - F3: state-versioning migration logic. - C3: web wizard mobile-CSS overflow review. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Closes the 11 actionable findings from the review-cycle 2 audit. Low-risk batch — no architectural changes. PR-B (F1 schema-driven defaults SOT) follows separately. ## Critical / correctness - A3: ``_save_wizard_state`` unlinks the temp file when ``os.replace`` fails (cross-device home, EACCES, EXDEV) so the ``.wizard_state.*.tmp`` files stop accumulating. Defensive ``finally``-branch sweep also covers the case where ``yaml.safe_dump`` itself raises before the rename runs. - A4: ``_apply_strict_tier_coercion`` notice prints once per wizard run instead of twice (compliance-step + evaluation-step idempotent re-call). Tracked via the ``_wizard_meta.strict_tier_announced`` flag that ``_strip_internal_meta`` removes before save. - F3: ``_STATE_MIGRATIONS`` registry skeleton in ``_state.py`` ready for future ``_STATE_VERSION`` bumps. Empty by design; first real bump only adds ``{1: migrator}``, no plumbing rewrite. ## Performance - C16: ``_detect_hardware()`` cached on ``_WizardState.hardware``; ``_step_welcome`` + ``_print_preflight_checklist`` share the result instead of paying a second torch import + CUDA enumeration (~50–200 ms saved per wizard run). ## Standards / observability - G30: ``_validate_generated_config`` now emits ``logger.warning(...)`` in addition to the inline ``_print`` notice — CI / log pipelines see "wizard output failed schema validation" without scraping stdout (per ``docs/standards/error-handling.md`` rule 2). - B11: web wizard's ``storage`` event listener ``console.warn``s on a sync failure instead of silently swallowing the error. ## UX - C3: expanded YAML accordion on ``<900px`` viewports now caps at ``38vh; overflow:auto`` so long configs no longer push the Copy/Download/Back/Next footer off-screen. - E2 (hedefli): inline rationale on five high-impact prompts (``lora_r``, ``lora_alpha``, ``num_train_epochs``, ``batch_size``, ``max_length``, ``risk_classification``) so first-time operators see the operational hint without flipping into beginner mode. ## Tests - E22-24: new ``test_wizard_start_training_routes_through_dispatcher`` covers the previously-uncovered branch in ``_maybe_run_wizard`` where ``outcome.start_training=True`` mutates ``args.config`` and lets the trainer pipeline run. - 20+ additional tests: atomic-write cleanup (success / replace-fail / dump-fail), announce-once flag, migration registry (unknown version / newer version / chain runs / non-advancing migrator), hardware cache reuse, validate-warning log line. ## Documentation - A9: ``docs/product_strategy{,-tr}.md``, ``docs/design/gdpr_erasure.md``, ``docs/design/data_audit_cli_split.md`` — "0/1/2/3/4 contract" claims now acknowledge ``EXIT_WIZARD_CANCELLED = 5``. - F27: ``docs/usermanuals/{en,tr}/compliance/human-approval-gate.md`` cross-references exit code 5 alongside the existing exit-4 discussion (high-risk operators using ``--wizard`` need both signals). - B10 doc-only: ``_parse_webhook_value`` comment now notes the SSRF preflight catches "wrong URL at config time", not "guaranteed- public destination at runtime" — runtime TOCTOU tracked at #14. - CHANGELOG entry under [Unreleased] / Added. ## Roadmap reshuffle - B4 (``--wizard-from-yaml``) removed from consideration: project identity ("config-driven") + B3 (non-tty refusal + ``quickstart`` hand-off) closed the gap. - E3 (``--wizard --start-from <yaml>``) explicitly bound to F1: reverse-mapping duplicates F1's schema-driven SOT work; deferred until F1 lands. ## Validation - ruff check + format clean (184 files). - 8 doc guards pass (bilingual_parity, anchor_resolution, cli_help_consistency, yaml_snippets, audit_event_catalog, library_api_doc, doc_numerical_claims, bilingual_code_blocks). - module_size: no new warnings. - 181 wizard-scope tests pass (``test_wizard_*`` + ``test_phase12_5`` + ``test_cli_phase10``). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Closes the long-running drift surface tracked through P1-P17 across review-cycles 1, 2, and 3. A schema field-default change without a matching wizard update was the single biggest source of CLI ↔ web parity gaps; this PR turns the schema into the canonical source. ## Pydantic field metadata flag 8 wizard-relevant fields in ``forgelm/config.py`` now carry ``json_schema_extra={"wizard": True}``: - ``ModelConfig.max_length`` (2048) - ``LoraConfigModel.r`` / ``alpha`` / ``dropout`` (8 / 16 / 0.1) - ``TrainingConfig.num_train_epochs`` / ``per_device_train_batch_size`` / ``gradient_accumulation_steps`` / ``learning_rate`` (3 / 4 / 2 / 2e-5) The flag is a no-op for ``ForgeConfig.model_validate`` — it only surfaces in ``model_json_schema()`` output, where the generator picks it up. ## Generator + CI guard ``tools/generate_wizard_defaults.py`` walks every nested ``BaseModel`` reachable from ``ForgeConfig``, finds wizard-flagged fields, and writes two artefacts: - ``forgelm/wizard/_defaults.json`` — package data (shipped via ``[tool.setuptools.package-data]``). - ``site/js/wizard_defaults.js`` — exposes ``window.WIZARD_DEFAULTS`` for the web wizard. ``tools/check_wizard_defaults_sync.py`` re-runs the generator into a temp comparison and fails the CI if the shipped artefacts drift from the schema. Operators get a clear "regenerate via …" hint with the first 5 differing lines so the fix is one command away. ## Python consumer ``forgelm/wizard/_state.py`` now reads the ``DEFAULT_*`` constants from the shipped JSON via ``importlib.resources``. A new ``_load_defaults()`` + ``_default(section, key, fallback)`` helper pair handles missing-file / corrupt-JSON cases by returning ``{}``, so the hardcoded fallbacks below still keep the wizard usable on a broken pip install. ## JS consumer ``site/js/wizard.js`` ``defaultState()`` now uses a new ``wd(section, key, fallback)`` helper to read schema-derived values from ``window.WIZARD_DEFAULTS``. ``site/quickstart.html`` loads ``wizard_defaults.js`` BEFORE ``wizard.js`` so the global is set in time. Special-cases ``learning_rate``: arrives as a number from JSON (2e-5), uses ``Number.toExponential()`` to canonicalise to the dropdown's ``'2e-5'`` string format (rather than ``String()`` which would yield ``'0.00002'`` and fail to match). ## Tests New ``TestSchemaDrivenDefaultsSOT`` class pins the contract: - ``test_module_constants_match_schema_via_json`` — every ``DEFAULT_*`` equals its ``Field.default``. - ``test_defaults_json_shipped_and_loadable`` — package data lives on disk + parses cleanly + has expected sections. - ``test_default_helper_returns_fallback_when_section_missing`` — graceful degradation. - ``test_load_defaults_returns_empty_on_missing_file`` — slim install scenario. Full suite green: 1590 passed, 14 skipped. ## Documentation - ``CHANGELOG.md`` — F1 entry under [Unreleased] / Added with B4 removed-from-roadmap + E3-bound-to-F1 notes. - ``CLAUDE.md`` — self-review chain extended to 8 commands; tools/ listing names ``generate_wizard_defaults`` + ``check_wizard_defaults_sync``. - ``pyproject.toml`` — ``[tool.setuptools.package-data]`` ships ``forgelm/wizard/_defaults.json``. ## Why a generator + JSON, not live-import? - Web wizard is a static site — no Python runtime to query schema defaults at page-load; JSON-as-data ships with the assets. - Python side benefits too: low import-cost path, JSON visible in git diffs (auditable), and round-tripping forces wizard-relevant defaults to be JSON-serialisable primitives. ## Roadmap reshuffle - B4 (``--wizard-from-yaml``) confirmed REMOVED from consideration. - E3 (``--wizard --start-from <yaml>``) explicitly bound to F1: reverse-mapping reuses this PR's metadata flags + JSON shape (deferred until needed; no scope rush). ## Validation - ruff check + format clean (186 files including the two new tools). - 9 doc guards pass: bilingual_parity, anchor_resolution, cli_help_consistency, yaml_snippets, audit_event_catalog, library_api_doc, doc_numerical_claims, bilingual_code_blocks, **wizard_defaults_sync** (new). - module_size: ``forgelm/config.py`` crossed 1000 LOC (now 1011); it was already grandfathered. - 1590 passed, 14 skipped (full suite). - ``forgelm --config config_template.yaml --dry-run`` green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…d test Closes the actionable findings from the independent review of cycle-3 + F1 commits (5667885 + 91d5726). Low-risk batch — internal hardening + test coverage; no operator-visible behaviour change. ## Generator robustness (PR-B follow-up) - B1: ``_walk_model`` now unwraps ``Optional[BaseModel]`` sub-blocks via the new ``_unwrap_basemodel()`` helper (uses ``typing.get_args``). Pre-fix the walk only matched bare-class annotations, so a wizard flag on a field inside an ``Optional[XxxConfig]`` sub-model was silently skipped. 10 of 14 ForgeConfig top-level fields are ``Optional[...]`` — today no flags exist on them, but a future contributor would have hit the silent- skip trap with no error from the generator and no failure from the CI guard. - B2: ``default is PydanticUndefined`` identity check (imported from ``pydantic_core``) replaces the brittle ``repr(default).startswith("PydanticUndefined")`` substring match. Stable across Pydantic v2 minor releases. - C2: module-level header documenting the JSON-serialisable contract for wizard-flagged fields (``int / float / bool / str / None / list of primitives / dict of primitives`` only). Future flags on Path / datetime / callable defaults will fail loudly at generator run; the contract is now explicit so contributors don't get surprised. ## State module hardening (PR-B follow-up) - B5: ``_default()`` now warns via ``logger.warning`` when the shipped JSON value disagrees with the hardcoded fallback literal. The JSON is canonical (returned), but drift between schema and fallback now surfaces in CI logs — eliminates the second-source-of-truth silently-stale risk. ## CI guard test coverage (PR-A follow-up) - G2: new ``tests/test_check_wizard_defaults_sync.py`` (7 tests): - corrupt-JSON triggers failure + diff message + regeneration hint. - missing-target triggers clean failure (no crash). - diff truncation works at 5 lines. - generator unwraps ``Optional[BaseModel]`` (B1 contract pinned). - required (no-default) wizard fields skipped (B2 contract pinned). - ``check_wizard_defaults_sync.py``: new ``_display_path()`` helper falls back to absolute path when target is outside ``_REPO_ROOT`` (prevents ``Path.relative_to`` crash when tests monkeypatch the target to a tmp dir). ## Test hardening (PR-A follow-up) - G3: ``test_wizard_start_training_routes_through_dispatcher`` now asserts ``ForgeConfig.model_validate(config_data)`` on the test fixture YAML before mocking the trainer. Pre-fix, a future change to ``minimal_config()`` that produced an invalid YAML would still pass this test (because the trainer pipeline is mocked BEFORE validation runs in ``_dispatch.main``), giving false-positive coverage. ## Documentation + cosmetic - A3: hardware-cache freshness caveat documented in ``_cached_hardware()`` docstring (GPU hot-plug mid-session is an acceptable edge case — operator can exit-and-resume to refresh). - F3: README directory tree now mentions ``_defaults.json`` as schema-derived defaults SOT shipped under ``forgelm/wizard/``. - B7: ``defer`` attribute on ``wizard_defaults.js`` script tag. Both wizard scripts now defer; HTML5 spec guarantees execution order (document order) so ``WIZARD_DEFAULTS`` still lands before ``defaultState()`` runs, but neither blocks HTML parsing. ## Skipped (review noted as "soft / acceptable") - C3 (``_forgelm_wizard:`` extra-key prefix): ``model_json_schema()`` is not called anywhere in the codebase — the ``wizard:true`` extra cannot leak into operator-facing artefacts today. Cost > value. - E1, E2, E3, H2: explicit refactor / defer in the review. - B4 (``--wizard-from-yaml``) confirmed REMOVED from roadmap. - E3 (``--start-from <yaml>``) bound to F1; deferred until needed. ## Validation - ruff check + format clean (187 files including new test). - 9 doc guards pass: bilingual_parity, anchor_resolution, cli_help_consistency, yaml_snippets, audit_event_catalog, library_api_doc, doc_numerical_claims, bilingual_code_blocks, wizard_defaults_sync. - module_size: no new warnings. - 1597 passed, 14 skipped (full suite). - ``forgelm --config config_template.yaml --dry-run`` green. - bandit -ll: 0 HIGH / 19 MEDIUM (all pre-existing B615 unpinned HF Hub downloads — project-wide, intentional, not in PR-C scope). ## PR-comment status - 13 inline comments from coderabbitai on commit ``52b113d`` (Phase 22 origin) — ALL addressed in cycle-2 (``c0dd57a``) + cycle-3 (``5667885``) commits. Threads attached to old commits; will auto-resolve on PR merge. - 6 inline comments from gemini-code-assist on commit ``52b113d`` — ALL addressed. - Codacy + SonarQube hotspot summaries from ``52b113d`` — current HEAD (``91d5726`` + this commit) addresses everything they flagged in scope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cemililik · 2026-05-09T08:08:22Z

Reviewer findings — resolution summary (HEAD: `9fe85df`)

This thread accumulated automated review feedback from coderabbitai, gemini-code-assist, codacy, and sonarqube on the original Phase 22 commit (52b113d). The branch has since landed three follow-up commits that address every actionable finding plus the Independent Review of cycle-3 + F1.

Commit chain on this PR

Commit	Scope	Validation
`52b113d`	Phase 22 — CLI wizard parity-with-web modernisation (sub-package split + 25 review findings G1-G20 + I1-I5)	65 wizard tests
`c0dd57a`	Phase 22 review-cycle 1 — 18 PR-#40 review findings (commit msg + cycle-1 audit)	1544 passed
`7d08bba`	docs(changelog): cycle-1 documentation	(docs only)
`9c32fe9`	Phase 22 review-cycle 2 — 17 newly-discovered parity / UX gaps (P1-P18, B1-B3, C1-C2, D1-D2, E1, E4)	1573 passed
`5667885`	review-cycle 3 — 11 cycle-2 audit findings (A3, A4, F3, C16, G30, B11, C3, E2, E22-24, A9, F27, B10)	1573 passed
`91d5726`	F1 — schema-driven wizard defaults SOT (8 fields, generator + CI guard, Python + JS consumers)	1590 passed
`9fe85df`	review-cycle 3 follow-up — generator hardening (B1, B2, C2), state parity warning (B5), CI guard fail-path tests (G2), dispatcher test hardening (G3), README + docs (F3, A3, B7)	1597 passed

Per-comment status

coderabbitai (13 inline on 52b113d) — all addressed:

MD030 list markers (docs/design/wizard_mode.md) → c0dd57a
architecture.md wizard.py → wizard/ heading → c0dd57a
shlex.quote BYOD shell hints → c0dd57a
BYOD path validation → c0dd57a
notify_on_start web-parity default → c0dd57a
Article 9 strict-tier non-empty enforcement → c0dd57a
Article 10 strict-tier non-empty enforcement → c0dd57a
total_mem → total_memory → c0dd57a (note: cycle-2 audit verified the field was always total_memory; the "fix" was overcorrection — no functional bug existed)
WizardBack restore prev_config → c0dd57a
WizardReset re-loop instead of return → c0dd57a
HTTP/HTTPS error message consistency → c0dd57a
Safety probe path source-only env → resolved (probes file IS shipped at forgelm/safety_prompts/default_probes.jsonl)
Corrupt YAML test → resolved (PyYAML actually raises ScannerError on the test string; cycle-2 verified)

gemini-code-assist (6 inline on 52b113d) — all addressed:

Article 9 + 10 mandatory-field enforcement → c0dd57a
risk_classification collected first → c0dd57a
_prompt_choice negative-idx guard → c0dd57a
Model-selection default_idx preservation → c0dd57a
UTF-8 encoding for config files → c0dd57a

codacy + sonarqube summaries on 52b113d — current HEAD addresses everything they flagged in scope. Repo-wide bandit shows 0 HIGH severity in wizard sub-package; pre-existing B615 (HF Hub unpinned downloads) is project-wide and intentional, not in this PR's scope.

Independent Review of cycle-3 + F1 — top-5 follow-ups all in 9fe85df:

B1: generator unwraps Optional[BaseModel]
B2: PydanticUndefined identity comparison
C2: JSON-serialisable contract documented
B5: schema↔fallback drift WARNING
G2: CI guard fail-path test (7 tests)
G3: dispatcher test asserts model_validate
A3, B7, F3: doc + cosmetic

Roadmap reshuffle (explicit)

B4 (--wizard-from-yaml) — REMOVED from consideration. Politik çelişki ("config-driven is the identity") + B3 (non-tty refusal + quickstart hand-off) closed the gap.
E3 (--wizard --start-from <yaml>) — bound to F1's metadata flags + JSON shape; deferred until needed.

Validation at HEAD `9fe85df`

ruff check + format clean (187 files)
9 doc guards green: bilingual_parity, anchor_resolution, cli_help_consistency, yaml_snippets, audit_event_catalog, library_api_doc, doc_numerical_claims, bilingual_code_blocks, wizard_defaults_sync (new)
module_size: 6 grandfathered modules over warn-threshold; 0 over fail-threshold (1500)
1597 passed, 14 skipped (full pytest)
bandit -ll repo-wide: 0 HIGH / 19 MEDIUM (all pre-existing B615 HF Hub unpinned, project-wide, intentional)
forgelm --config config_template.yaml --dry-run green

Ready for human review.

External scanners (Codacy / SonarCloud) attached to PR #40 reported findings on the original ``52b113d`` Phase 22 commit. Verified each against the current HEAD; the remaining noise is genuinely non-actionable. Shrinks the public quality-gate to its real signal. ## Codacy 7 HIGH severity — all false positives, suppressed in-tree - **Prospector F401 (5×)** in ``forgelm/wizard/__init__.py`` — the re-export shim already carried ``# ruff: noqa: F401`` at the top but Codacy's Prospector doesn't honor ruff directives. Per-line ``# noqa: F401 -- re-export for tests (pyflakes/Prospector compat)`` markers added to the 5 flagged underscore-prefixed symbols (``_AUDIT_LARGE_FILE_THRESHOLD_BYTES``, ``_GALORE_OPTIMIZERS``, ``_BACK_TOKENS``, ``_STEPS``, ``_STATE_VERSION``). These are intentional re-exports for the wizard test suite, not dead imports. - **Bandit B105** in ``forgelm/wizard/_collectors.py:673`` — ``backend_token == "api"`` flagged as "possible hardcoded password". False positive: ``"api"`` is an enum-style discrim- inator (one of ``"api"``/``"local"``/``"file"``). Suppressed with ``# nosec B105 -- enum-style discriminator, not a secret``. - **PMD InaccurateNumericLiteral** in ``site/js/wizard_defaults.js`` on the ``2e-05`` literal — the file is a build-time artefact emitted by ``tools/generate_wizard_defaults.py`` from Pydantic ``learning_rate`` default. PMD's heuristic flags scientific notation as imprecise; the JS consumer canonicalises with ``Number.toExponential()`` so precision is never operator-visible. New ``.codacy.yaml`` excludes the generated artefact (and its Python sibling ``forgelm/wizard/_defaults.json``) from analysis. ## Codacy 29 INFO severity - **markdownlint MD013 (line-length, 17×)** — ForgeLM bilingual reference + design docs intentionally carry long lines (paragraph- style explanations, table cells, prose with long inline links). The 80-col limit is a code-style holdover that hurts readability for documentation; the project's ``bilingual_parity`` guard already enforces structural alignment. Disabled via ``.markdownlint.json``: ``"MD013": false`` with a rationale comment. - **markdownlint MD030 (9×)** in ``docs/design/wizard_mode.md`` — cycle-2 already fixed the list-marker spacing in this file; the Codacy result was stale (older commit snapshot). No code change needed; will clear on next Codacy re-scan. - **markdownlint MD028 (1×)**, **Semgrep pytorch-detection (2×)** — informational only, no action. ## SonarCloud 8 Security Hotspots — all LOW, all in test fixtures Hotspot review (no code changes — test fixtures are intentional): - ``tests/test_wizard_phase22.py:90`` ``ftp://example.com/hook`` — literal in ``test_unknown_scheme_rejected``; verifies the wizard rejects unsupported schemes. - ``tests/test_wizard_phase22.py:93,94`` ``http://hooks.example.com/x`` — literal in ``test_http_warning_but_accepted``; verifies the HTTP-with-warning behaviour mirrors ``forgelm/webhook.py``. - ``tests/test_wizard_phase22.py:830`` IMDS IP ``169.254.169.254`` / ``:834`` RFC1918 ``10.0.0.5`` — literals in ``TestWebhookSSRFPreflight``; verifies the SSRF preflight rejects loopback / RFC1918 / link-local destinations. - ``tests/test_wizard_phase22.py:924,929`` ``tmp_path`` / ``tests/test_cli_phase10.py:383`` — pytest's standard temp- directory fixture pattern. These hotspots are the ENTIRE point of the tests they appear in — suppressing them would require either disabling the SonarCloud rule project-wide (loses signal on production code) or adding inline comments that confuse the test-source-as-spec pattern. Decision: mark as "Safe / Reviewed" via SonarCloud's UI, no in-tree changes. ## Validation - ruff check + format clean (187 files). - 9 doc guards pass. - 139 wizard-scope tests pass (test_wizard_phase22 + test_check_wizard_defaults_sync + test_cli_phase10). - ``check_wizard_defaults_sync`` green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Closes E3 from the cycle-2 + cycle-3 review backlog. Operators can now iterate on an existing config without losing prior answers — the long-asked-for ergonomic improvement that the previous review cycles all flagged ("wizard always starts from scratch") and bound to F1 (landed in PR-B `91d5726`) so the per-step value-honoring reuses the schema-derived defaults plumbing. ## CLI surface - New flag ``--wizard-start-from PATH`` registered in ``forgelm/cli/_parser.py``; ``--wizard-start-from existing.yaml`` alongside ``--wizard`` triggers the new flow. - ``forgelm/cli/_wizard.py``: ``_maybe_run_wizard`` reads ``args.wizard_start_from`` (via ``getattr`` for back-compat) and threads it into ``run_wizard_full(start_from=...)``. ## Orchestrator - New ``_load_initial_state_from_yaml(path)`` helper: reads YAML, validates it against ``ForgeConfig``, seeds a :class:`_WizardState` with ``experience="expert"`` + ``use_case=_MANUAL_USE_CASE`` + ``config=copy.deepcopy(data)``. Three error categories surface to the operator: ``FileNotFoundError`` for missing path, ``ValueError`` for parse failure / non-mapping root / schema rejection. All caught by ``run_wizard_full`` and rendered as a cancelled outcome with a printed warning — no traceback for the operator. - ``run_wizard``, ``run_wizard_full``, ``_run_full_wizard_outcome``, and ``_run_full_wizard`` now accept ``start_from: Optional[str] = None`` (back-compat preserved — None falls back to today's flow). - When ``start_from`` is supplied the quickstart-template prelude is skipped — the operator's intent is "edit this file", not "pick a fresh template". ## Per-step value honouring - ``_step_strategy``: ``lora_r`` and ``lora_alpha`` prompts default to ``state.config["lora"]`` values when present (falls back to schema defaults). - ``_step_training_params``: ``num_train_epochs`` / ``per_device_train_batch_size`` / ``model.max_length`` / ``output_dir`` prompts honour existing ``state.config`` values. - ``_step_dataset``: when an existing dataset path is present, the step prints "Existing dataset: <path>" and offers a "Keep this dataset?" yes/no early-out so the operator doesn't re-trigger ingestion / audit on an unchanged path. Other steps (welcome / use-case / model / trainer / compliance / evaluation) already honoured existing values via ``state.config.get(...)`` defaults — no change needed. ## Save flow - ``_run_full_wizard_outcome``: when ``start_from`` is supplied the save prompt's default filename becomes the start-from path. The existing ``_prompt_unique_filename`` overwrite confirmation still fires before clobbering. ## Tests 10 new tests under ``tests/test_wizard_phase22.py``: - ``TestStartFromYAMLLoad``: valid YAML pre-populates state (with ``copy.deepcopy`` isolation), missing path raises ``FileNotFoundError``, malformed YAML raises ``ValueError``, non-mapping root raises, schema-invalid YAML raises. - ``TestRunWizardFullStartFrom``: missing / invalid YAML returns a cancelled ``WizardOutcome`` instead of crashing. - ``TestStepHonorsExistingValues``: strategy / training-params / dataset steps each preserve existing values when the operator presses Enter at every prompt. Total wizard-scope tests: 202 (was 192). ## Documentation - ``CHANGELOG.md`` — E3 / PR-D entry under [Unreleased] / Added. - ``docs/design/wizard_mode.md`` — PR-D additions block. - ``docs/reference/usage{,-tr}.md`` — new flag in the cheat-sheet. - ``docs/usermanuals/{en,tr}/reference/cli.md`` — flag table row. - ``docs/guides/quickstart{,-tr}.md`` — "idempotent re-run" paragraph after the existing operator-guardrail block. ## Validation - ruff check + format clean (187 files). - 9 doc guards pass: bilingual_parity, anchor_resolution, cli_help_consistency (439 invocations), yaml_snippets, audit_event_catalog, library_api_doc, doc_numerical_claims, bilingual_code_blocks, wizard_defaults_sync. - ``forgelm --config config_template.yaml --dry-run`` green. - 1607 passed, 14 skipped (full suite, was 1597). ## Roadmap status — fully on-plan - B4 (``--wizard-from-yaml`` answer-file mode): REMOVED from consideration earlier (B3 closed the gap). - E3: SHIPPED in this PR. - F1 (schema-driven SOT): SHIPPED in PR-B. - F3 (state-version migration skeleton): SHIPPED in PR-A. - C3 (mobile YAML clamp): SHIPPED in PR-A. - E2 (rationale catalog): targeted version SHIPPED in PR-A; full catalog explicitly deferred (no scope rush). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…eview Independent code review of PR-D (`2ac9df2`) flagged 3 HIGH-severity bugs where pressing Enter at certain prompts silently regressed the operator's loaded YAML to wizard defaults — directly violating the "press Enter to keep" contract documented in PR-D's CHANGELOG entry and the quickstart guide. Plus 4 MEDIUM/LOW issues. All closed here. ## Critical: contract violations on `--wizard-start-from` - **PR-D-A1: ``_step_strategy`` regressed `lora.method` / `target_modules` / `dropout` / `bias` / `task_type` on Enter.** ``_select_strategy()`` now accepts ``existing_method`` / ``existing_load_in_4bit`` / ``existing_galore`` kwargs and derives the prompt's ``default_idx`` from the loaded YAML. Target-modules prompt detects the loaded list shape against ``TARGET_MODULE_PRESETS`` and shifts its default accordingly. Non-prompted fields (dropout / bias / task_type) now use ``setdefault``. - **PR-D-A2: ``_step_evaluation`` clobbered existing benchmark / llm_judge / safety blocks.** Function now seeds ``evaluation`` from ``state.config.get("evaluation")`` via ``copy.deepcopy``, defaults the ``auto_revert`` gate based on existing state, and preserves prior blocks when the operator declines a re-prompt. - **PR-D-A3: ``_step_use_case`` overwrote the operator's existing model + trainer choices** with the first template's preset (default index = 1). Combined with save-default-to-``start_from``, this silently corrupted the original YAML. Step now early-returns with a clear notice when ``state.use_case == _MANUAL_USE_CASE`` and ``model.name_or_path`` / ``training.trainer_type`` are populated. Greenfield runs now use ``setdefault`` for the preset application. ## Medium / hardening - **PR-D-A4: ``_step_welcome`` ``model.backend``** → chained ``setdefault`` instead of the nested-dict-replace pattern. An operator's explicit ``model.backend: transformers`` no longer flips silently to ``unsloth`` on a Linux+GPU box. - **PR-D-A5: ``_load_initial_state_from_yaml`` split try/except** for the ``ForgeConfig`` import vs. the ``model_validate`` call. A downstream ``ImportError`` from inside a Pydantic validator can no longer silently bypass schema validation. - **PR-D-A6: warns before discarding pre-existing wizard resume state** when ``--wizard-start-from`` is supplied alongside an in-progress ``$XDG_CACHE_HOME/forgelm/wizard_state.yaml``. Pre-fix the snapshot was silently destroyed at save time. - **PR-D-A7: ``docs/guides/quickstart{,-tr}.md``** scopes the "press Enter to keep" claim to numeric / text / choice prompts; documents the use-case skip and the ``setdefault`` pattern for non-prompted fields. ## UX / test coverage - **PR-D-B1: new CLI dispatcher integration test** (``test_wizard_start_from_threads_through_to_run_wizard_full``) pins parser → dispatcher → ``run_wizard_full`` thread-through. - **PR-D-B3: ``--wizard-start-from`` without ``--wizard``** now prints a one-line warning instead of silently no-op'ing. - **PR-D-B5: ``_collect_data_governance(existing=...)``** kwarg threads existing Article 10 free-text values through as prompt defaults. Strict-tier operators iterating from a populated YAML no longer re-type collection_method / annotation_process / known_biases. - **PR-D-B6: ``_canonical_start_from()``** helper applies ``Path(...).expanduser()`` once at entry. Pre-fix ``--wizard-start-from ~/configs/x.yaml`` created a literal ``~/configs/x.yaml`` directory on save because the load did ``expanduser`` but the save flow's overwrite check did not. ## Tests 13 new tests under ``tests/test_wizard_phase22.py`` covering every fix above: - ``TestPRDA1StrategyHonorsExisting`` (3): DoRA / target_modules / GaLore preserved on Enter. - ``TestPRDA2EvaluationHonorsExisting`` (2): benchmark + judge preserved when declining re-prompt; auto-revert default reflects existing state. - ``TestPRDA3UseCaseSkipsWhenExisting`` (1): early-return with notice when start-from path + populated model/trainer. - ``TestPRDA4WelcomeBackendSetdefault`` (1): existing backend survives Linux+GPU branch. - ``TestPRDA5ImportErrorBypass`` (1): valid YAML happy path still validates. - ``TestPRDA6ResumeStateWarning`` (1): in-progress snapshot warning emitted. - ``TestPRDB3StartFromWithoutWizardWarns`` (1): silent no-op warning. - ``TestPRDB5GovernanceHonorsExisting`` (1): existing free-text values preserved on Enter. - ``TestPRDB6ExpanduserCanonicalisation`` (1): ``~`` expansion at entry. Plus 1 dispatcher integration test (``test_wizard_start_from_threads_through_to_run_wizard_full``) under ``tests/test_cli_phase10.py``. ## Validation - ruff check + format clean (187 files). - 9 doc guards pass: bilingual_parity, anchor_resolution, cli_help_consistency (439 invocations), yaml_snippets, audit_event_catalog, library_api_doc, doc_numerical_claims, bilingual_code_blocks, wizard_defaults_sync. - module_size: no new warnings. - ``forgelm --config config_template.yaml --dry-run`` green. - 1620 passed, 14 skipped (full suite, was 1607). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two-track cleanup landed atomically: ## Track 1 — Working-memory directories now strictly gitignored ``docs/marketing/`` and ``docs/analysis/`` are operator-local research + audit + draft notes that never appear in fresh clones. Pre-fix, 11 ``docs/analysis/code_reviews/*.md`` files were exempted via inverse ``.gitignore`` rules and referenced from the public tree (CHANGELOG, standards, roadmap, design docs, tests). Every such reference rotted into a 404 the moment the maintainer touched the local working-memory tree. - **`.gitignore` simplified.** Dropped 11 ``!docs/analysis/code_reviews/<file>`` exception lines that re-exposed specific closure-plan and master- review files; the directory-level ignore now applies uniformly. - **``git rm --cached -r docs/analysis/``** untracked the 11 leaked files (kept on disk locally; just no longer versioned in this repo). ## Track 2 — Public-tree reference cleanup 19 references removed / restructured. Where the citation was the load-bearing anchor of a sentence, the surrounding text was rewritten to convey the same intent without the file pointer. - **`CHANGELOG.md`**: 6 references removed (review-cycle 2 audit citation, Phase 22 cycle-1 review citation, closure-plan twice, wave2a fix-summary, master-review-opus, two `Documentation` links) — replaced with PR-thread / roadmap pointers. - **`CLAUDE.md`**: 2 references rewritten — the "Common pitfalls" block dropped its QKV-Core / Trion citations; the memory section rewrote the bullet to forbid future references with a clear rule. - **`docs/standards/coding.md`**: master-review-opus citation removed; "anti-patterns" intro rewritten as project lessons. - **`docs/standards/documentation.md`**: the "researcher" audience row dropped; new top-level "Working-memory directories" section spells out the no-references rule with rationale. - **`docs/standards/localization.md`**: the ``docs/analysis/**`` bilingual-policy row dropped (no public-tree reference == no bilingual concern). - **`docs/design/iso27001_soc2_alignment.md`**: removed the marketing/strategy citation and a self-referential "this file lives under ``docs/analysis/``" note (it lives under ``docs/design/``). - **`docs/roadmap.md`** / **``phase-12-6-closure-cycle.md``** / **``phase-12-data-curation-maturity.md``**: closure-plan and marketing/strategy citations rewritten to point at public roadmap / CHANGELOG / CLAUDE.md anchors instead. - **`tests/test_wizard_phase22.py`** + **``test_check_bilingual_parity.py``**: test docstrings softened to describe the directory category (working-memory) rather than naming specific analysis files. - **`tools/check_anchor_resolution.py`** + **``check_yaml_snippets.py``**: comments + docstrings softened. Functional path-string filters (``_SKIP_PATH_FRAGMENTS = ("docs/analysis/", "docs/marketing/")``, ``--exclude analysis`` default) are KEPT as path filters (they're not citations; they're directory exclusions). - **`.claude/skills/{cut-release,sync-bilingual-docs}/SKILL.md`**: marketing/strategy template + content-strategy citations rewritten to reference local-only working memory rather than specific paths. ## Track 3 — Standard + CI guard preventing regression - **New "Working-memory directories" rule** in `docs/standards/documentation.md` codifies the ban with examples + path-string-filter exemption. - **New CI guard `tools/check_no_analysis_refs.py`** scans every git-tracked file (uses ``git ls-files``, so the gitignored dirs are excluded by construction) and fails the run on any citation into the working-memory tree. Per-file ``_EXEMPT`` allowlist for policy statements (``CLAUDE.md``, ``docs/standards/documentation.md``, ``.gitignore``, the skill that warns about the dirs) and functional path filters; each entry carries a written justification comment. - **CLAUDE.md self-review chain** extended to 9 commands; tools/ listing names ``check_no_analysis_refs``. ## Validation - ruff check + format clean (188 files including the new guard). - 10 doc guards pass: bilingual_parity, anchor_resolution, cli_help_consistency (439 invocations), yaml_snippets, audit_event_catalog, library_api_doc, doc_numerical_claims, bilingual_code_blocks, wizard_defaults_sync, **check_no_analysis_refs** (new — clean). - module_size: no new warnings. - ``forgelm --config config_template.yaml --dry-run`` green. - 1620 passed, 14 skipped (full suite). ## Site verification (read-only side-effect) The audit pass produced a 270-line site verification report (3 CRITICAL + 5 HIGH + 6 MEDIUM + 3 LOW findings on ``site/`` claims vs. live code). The report itself lives in the now- gitignored ``docs/analysis/`` working-memory tree per the new policy; follow-up site fixes will be tracked as separate PRs referencing the live code, not the report. Top three site findings to address in a follow-up PR: 1. ``forgelm.api`` import path advertised on ``index.html`` / ``features.html`` does not exist (real surface is ``from forgelm import ForgeTrainer, audit_dataset, ...``). 2. ``forgelm quickstart byod`` is shown on ``index.html`` but isn't a real template (BYOD is a wizard sub-flow, not a quickstart key). 3. ``compliance.html`` cites ``forgelm/data_audit.py`` as a single-file module; it became a sub-package in Phase 14. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

sonarqubecloud · 2026-05-09T10:08:44Z

Quality Gate failed

Failed conditions
10 Security Hotspots
C Reliability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

Promotes the [Unreleased] block into the [0.5.5] section under a new "Post-merge follow-up" subsection that captures the three PRs landed on top of the paper-merged closure cycle: - PR #40 (2026-05-08) — Phase 22 CLI wizard parity-with-web - PR #41 (2026-05-09) — Site documentation correction sweep + i18n parity backfill - PR #42 (2026-05-10) — Release-prep + nightly pip-audit gate fix + two review-absorption rounds The CHANGELOG [0.5.5] header date is bumped to 2026-05-10 (the actual PyPI release date) and a fresh empty [Unreleased] block is added at the top. __api_version__ stays at 1.0.0 — Phase 22 is a CLI/wizard surface change, not a Python library API addition; the 30-symbol forgelm.__all__ surface that this release anchors was already added during the Phase 12.6 closure cycle (paper-merged as v0.5.5). Roadmap (en + tr) marks v0.5.5 as released; "Latest release on PyPI" flipped from v0.5.0 to v0.5.5; "Current state" sentence simplified (no more "merged but tag deferred" framing). releases.md drops the "Next PyPI release (slot pending)" section since v0.5.5 now bundles everything that was staged in [Unreleased]. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

sourcery-ai Bot reviewed May 8, 2026

View reviewed changes

gemini-code-assist Bot reviewed May 8, 2026

View reviewed changes

coderabbitai Bot reviewed May 8, 2026

View reviewed changes

cemililik and others added 6 commits May 9, 2026 01:58

docs(changelog): Phase 22 wizard PR #40 review-cycle 1

7d08bba

CHANGELOG entry for the c0dd57a fix sweep — Critical / Major / Medium / Minor classifications + the orchestrator coverage bump that landed alongside. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cemililik and others added 4 commits May 9, 2026 11:42

cemililik merged commit 1aa457b into development May 9, 2026
2 of 4 checks passed

coderabbitai Bot mentioned this pull request May 9, 2026

fix(site): pass-5 closure — site-vs-code honesty sweep + i18n parity #41

Merged

17 tasks

cemililik mentioned this pull request May 9, 2026

release: development → main — Phase 22 wizard modernisation + site honesty pass-5 #42

Merged

8 tasks

cemililik deleted the feat/phase22-wizard-modernisation branch May 10, 2026 11:36

	try:
	try:
	with open(requested_filename, "w", encoding="utf-8") as f:

Conversation

cemililik commented May 8, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What's in this PR

CLI wizard (forgelm/wizard/)

Web wizard

Mimari

Docs

Tests

i18n kararı

Test plan

Reviewing

Summary by CodeRabbit

Release Notes

Uh oh!

qodo-code-review Bot commented May 8, 2026

Uh oh!

sourcery-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

codacy-production Bot commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Not up to standards ⛔

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

cemililik commented May 9, 2026

Reviewer findings — resolution summary (HEAD: 9fe85df)

Commit chain on this PR

Per-comment status

Roadmap reshuffle (explicit)

Validation at HEAD 9fe85df

Uh oh!

sonarqubecloud Bot commented May 9, 2026

Quality Gate failed

Uh oh!

Uh oh!

Reviewers

cemililik commented May 8, 2026 •

edited by coderabbitai Bot

Loading

CLI wizard (`forgelm/wizard/`)

coderabbitai Bot commented May 8, 2026 •

edited

Loading

codacy-production Bot commented May 8, 2026 •

edited

Loading

Reviewer findings — resolution summary (HEAD: `9fe85df`)

Validation at HEAD `9fe85df`