feat(wizard): Phase 22 — CLI wizard parity with web + sub-package split#40
Conversation
Closes the parity gap between `forgelm --wizard` (CLI) and the
in-browser `site/js/wizard.js` documented in
`docs/analysis/code_reviews/2026-05-07_cli_wizard_ux_analysis.md`
+ `-2.md`. 20 original findings (G1-G20) and 5 independent
findings (I1-I5) were verified against current code; this PR
addresses the actionable subset.
What's covered (CLI):
- **Step reorder (I1):** `compliance.risk_classification` is now
collected BEFORE `evaluation.safety` so the F-compliance-110
strict-tier auto-coercion can fire reliably. The v0.5.5 wizard
asked safety first, then risk — silently producing high-risk
configs that failed to load.
- **Trainer-specific hyperparameters (G1):** `dpo_beta`,
`simpo_beta` / `simpo_gamma`, `kto_beta`, `orpo_beta`,
`grpo_num_generations`, `grpo_max_completion_length`,
`grpo_reward_model` are now surfaced per `trainer_type`. SFT
short-circuits.
- **PEFT method breadth (G2):** the strategy step now offers all
four schema-supported `lora.method` values (`lora`, `dora`,
`pissa`, `rslora`) plus GaLore as a separate axis (6 cards
total).
- **GaLore optimiser variants (G9):** all six `galore_optim`
Literal values surfaced, including the three `_layerwise`
siblings.
- **RoPE scaling types (G10):** `longrope` joined `linear` /
`dynamic` / `yarn` (4-of-4 schema coverage).
- **LoRA `r` schema parity (G11):** `DEFAULT_LORA_R` lowered from
`16` → `8` to match `LoraConfigModel.r`.
- **Compliance depth (G5):** `compliance` (Article 11 + Annex IV
§1) plus optional `risk_assessment` (Article 9),
`data.governance` (Article 10), `retention` (GDPR Article
5(1)(e) + 17), `monitoring` (Article 12+17),
`evaluation.benchmark`, `evaluation.llm_judge`, `synthetic`
blocks now configurable.
- **High-risk auto-coercion (G8):** `risk_classification ∈
{high-risk, unacceptable}` automatically enables
`evaluation.safety` + `evaluation.require_human_approval` with a
visible operator notice. Front-stops the schema-side
`F-compliance-110` `ConfigError`.
- **Webhook URL parsing (G15):** single prompt accepts a literal
URL or `env:VAR_NAME` reference; `urlparse`-validated; HTTPS
recommended (HTTP accepted with warning).
- **Safety probe path (G16):** bundled probe set resolved through
`importlib.resources.files("forgelm.safety_prompts")`, fixing
the `pip install` regression where
`configs/safety_prompts/general_safety.jsonl` was the wizard
default but never shipped in the wheel.
- **Configuration summary (G17):** `_print_wizard_summary` now
dumps the full YAML alongside the labelled headline.
- **Persistence (G6):** state snapshot at
`$XDG_CACHE_HOME/forgelm/wizard_state.yaml` (or
`~/.cache/forgelm/wizard_state.yaml`) saved after every
completed step. Schema versioned (`v: 1`); cleared on
successful completion.
- **Step-diff preview (G7):** each completed step prints `+
key.path: value` / `~ key.path: before → after`.
- **Beginner / expert toggle (G13):** beginner mode prefixes each
step with a 2-3-line tutorial paragraph.
- **Use-case integration (G12):** the curated quickstart-template
list is also offered as Step 2 of the full flow, seeding
defaults for later steps.
- **`back` / `reset` navigation (G3):** `back` / `b` returns to
the previous step; `reset` / `r` clears in-memory state.
- **`POPULAR_MODELS` alignment with web (G14):** CLI list now
matches `site/js/wizard.js`'s preset cards.
What's covered (web):
- **Header comment (I2):** `site/js/wizard.js` opening doc
block updated from "7 steps" to "9 steps" with correct list.
- **Use-case key alignment (I4):** `USE_CASE_PRESETS` keys
renamed (`code-copilot` → `code-assistant`, `medical-tr` →
`medical-qa-tr`) so `forgelm/quickstart.py::TEMPLATES` is the
single source of truth.
What's covered (docs):
- **Design doc (I5):** `docs/design/wizard_mode.md` rewritten
to describe the actual 9-step flow, sub-package layout, and
Phase 22 modernisation status banner.
- **Architecture reference:** `docs/reference/architecture.md`
+ `-tr.md` updated for the `forgelm/wizard/` sub-package.
- **Quickstart guide:** `docs/guides/quickstart.md` + `-tr.md`
updated with the new flow description.
- **CHANGELOG.md:** comprehensive Phase 22 entry under
`[Unreleased]` / `### Added`.
- **Roadmap link fix:** `docs/roadmap/phase-11-5-backlog.md`
now points at `forgelm/wizard/_byod.py` rather than the old
monolith.
Architecture: `forgelm/wizard.py` (976 LOC monolith) split into
the `forgelm/wizard/` sub-package per
`docs/standards/architecture.md`'s 1000-line ceiling rule:
- `_io.py` (213 LOC) — primitive prompts + navigation tokens +
hardware detection.
- `_state.py` (290 LOC) — `_WizardState`, persistence, summary,
schema-default constants.
- `_collectors.py` (688 LOC) — every `_collect_*` helper +
strategy choice + use-case preset registry.
- `_byod.py` (333 LOC) — Phase 11.5 / 12.5 BYOD inline ingest +
audit helpers + quickstart prelude.
- `_orchestrator.py` (566 LOC) — step definitions,
`_drive_wizard_steps`, `_apply_strict_tier_coercion`,
`run_wizard`.
- `__init__.py` (176 LOC) — re-export shim for backward
compatibility.
Tests: 55 new tests in `tests/test_wizard_phase22.py` cover
`_parse_webhook_value`, `_default_safety_probes_path`,
`_collect_trainer_hyperparameters`, `_check_navigation_token`,
`_apply_strict_tier_coercion`, persistence helpers,
`_print_step_diff`, `_collect_webhook_config`, use-case preset
registry parity with `quickstart.py::TEMPLATES`, strategy choice
shape, GaLore optimiser variants, and schema-default parity for
LoRA r / alpha / epochs / batch / LR. Existing
`tests/test_wizard_phase11.py` test_print_wizard_summary cases
updated to use the new dict-based API.
Validation: 8/8 doc CI guards green (bilingual_parity,
anchor_resolution, cli_help_consistency, yaml_snippets,
audit_event_catalog, library_api_doc, doc_numerical_claims,
bilingual_code_blocks); ruff check + format clean across the
package; module-size guard reports 0 over the fail-threshold;
`forgelm --config config_template.yaml --dry-run` passes;
strict-tier auto-coercion produces a config that
`ForgeConfig.model_validate` accepts.
i18n decision: CLI wizard remains English-only per operator
explicit instruction. Web wizard's 6-language `wizard.*`
catalogue (en/tr/de/fr/es/zh) is unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ⓘ You've reached your Qodo monthly free-tier limit. Reviews pause until next month — upgrade your plan to continue now, or link your paid account if you already have one. |
There was a problem hiding this comment.
Sorry @cemililik, you have reached your weekly rate limit of 500000 diff characters.
Please try again later or upgrade to continue using Sourcery
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Not up to standards ⛔🔴 Issues
|
| Category | Results |
|---|---|
| BestPractice | 1 minor |
| ErrorProne | 5 high |
| Security | 2 medium 1 minor |
| Comprehensibility | 1 minor |
🟢 Metrics 671 complexity · 2 duplication
Metric Results Complexity 671 Duplication 2
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
There was a problem hiding this comment.
Code Review
This pull request refactors the forgelm/wizard.py monolith into a modular sub-package forgelm/wizard/ and modernizes the CLI wizard to achieve parity with the web wizard's 9-step flow. It introduces support for advanced training configurations, compliance metadata, and improved dataset ingestion and auditing. The review feedback highlights opportunities to enforce mandatory compliance fields, fix index validation in menu prompts, correct default index logic for model selection, and ensure consistent UTF-8 encoding for file operations.
| intended_use = ( | ||
| _prompt("Article 9(2)(a): intended_use", "") | ||
| if is_strict | ||
| else _prompt("intended_use (Article 9(2)(a)) — optional", "") | ||
| ) |
There was a problem hiding this comment.
Under the EU AI Act, Article 9 risk management evidence is mandatory for high-risk and unacceptable systems. The intended_use field should be enforced as non-empty using _prompt_required when is_strict is true, rather than allowing an empty default.
References
- EU AI Act Article 9(2)(a) requires documentation of the intended purpose of the system. (link)
- High-risk AI systems must have a risk management system that includes documentation of intended use and foreseeable misuse.
| "collection_method": _prompt("Article 10(2)(b): how was data collected?", ""), | ||
| "annotation_process": _prompt("Article 10(2)(b): annotation methodology", ""), | ||
| "known_biases": _prompt("Article 10(2)(f): known_biases", ""), |
There was a problem hiding this comment.
Article 10 data governance evidence is mandatory for strict risk tiers. When mandatory is true, the collection_method, annotation_process, and known_biases fields should be enforced as non-empty using _prompt_required.
References
- EU AI Act Article 10 requires documentation of data collection, annotation, and bias assessment. (link)
- High-risk AI systems must use high-quality datasets and have appropriate data governance and management practices.
| def _collect_compliance_metadata() -> Dict[str, Any]: | ||
| """Article 11 + Annex IV §1: provider + system metadata.""" | ||
| return { | ||
| "provider_name": _prompt("Organization (legal-entity) name", ""), | ||
| "provider_contact": _prompt("Provider regulatory contact (email or phone)", ""), | ||
| "system_name": _prompt("Human-readable system name", ""), | ||
| "intended_purpose": _prompt("Intended purpose of the system", ""), | ||
| "known_limitations": _prompt("Known limitations operator wants documented (free-text)", ""), | ||
| "system_version": _prompt("System version string", "v0.1.0"), | ||
| "risk_classification": _prompt_choice( | ||
| "Risk classification (mirrored at risk_assessment.risk_category):", | ||
| list(_RISK_TIERS), | ||
| default=2, # ``minimal-risk`` — same default as the Pydantic field. | ||
| ), | ||
| } |
There was a problem hiding this comment.
The _collect_compliance_metadata function collects the risk classification at the end of the block. For high-risk or unacceptable tiers, Article 11 and Annex IV §1 require specific provider and system metadata. To ensure compliance and provide a better user experience (matching the web wizard's behavior), the risk classification should be prompted first. This allows the function to use _prompt_required for mandatory fields when a strict tier is selected.
| _check_navigation_token(choice) | ||
| try: | ||
| idx = int(choice) if choice else default | ||
| return options[idx - 1] |
There was a problem hiding this comment.
The _prompt_choice function does not validate if the user-provided index is within the positive bounds of the options list. If a user enters 0, Python's negative indexing will return the last element of the list (options[-1]), which is counter-intuitive for a 1-based menu and may lead to unintended configuration choices.
| return options[idx - 1] | |
| if 1 <= idx <= len(options): | |
| return options[idx - 1] | |
| return options[default - 1] |
| if preset_default and preset_default in POPULAR_MODELS: | ||
| default_idx = POPULAR_MODELS.index(preset_default) + 1 | ||
| else: | ||
| default_idx = 1 |
There was a problem hiding this comment.
The logic for determining default_idx in the model selection step clobbers the user's current choice if it is not present in the POPULAR_MODELS list (e.g., a model preselected by a use-case template like code-assistant). It should default to the 'Custom' option index when a non-popular model is already set in the state.
| if preset_default and preset_default in POPULAR_MODELS: | |
| default_idx = POPULAR_MODELS.index(preset_default) + 1 | |
| else: | |
| default_idx = 1 | |
| if preset_default: | |
| if preset_default in POPULAR_MODELS: | |
| default_idx = POPULAR_MODELS.index(preset_default) + 1 | |
| else: | |
| default_idx = len(options) # Default to 'Custom' | |
| else: | |
| default_idx = 1 |
|
|
||
| def _save_config_to_file(config: Dict[str, Any], requested_filename: str) -> str: | ||
| """Write *config* as YAML; falls back to a unique filename on OSError.""" | ||
| try: |
There was a problem hiding this comment.
The configuration file should be written using UTF-8 encoding to ensure portability and correct handling of non-ASCII characters (e.g., in provider names or intended purposes), consistent with how the wizard state is saved.
| try: | |
| try: | |
| with open(requested_filename, "w", encoding="utf-8") as f: |
There was a problem hiding this comment.
Actionable comments posted: 13
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/design/wizard_mode.md`:
- Around line 33-79: The ordered list entries (markers "0." through "9.") in the
wizard flow block use two spaces after the list marker which triggers MD030;
update each ordered-list line to use a single space after the numeric marker
(e.g., change "0. **Quickstart..." to "0. **Quickstart...") so the markdownlint
warning is resolved while keeping the exact text and formatting of the list
items (refer to the numbered entries in the wizard flow block).
In `@docs/reference/architecture.md`:
- Line 149: Update the heading and corresponding section text in
docs/reference/architecture.md so the wizard component reads and is titled
"wizard/" (reflecting the forgelm/wizard/ sub-package) instead of "wizard.py",
and make the identical change in docs/reference/architecture-tr.md to keep EN/TR
mirrors consistent; also scan the wizard section for any remaining monolithic
phrasing (e.g., references to a single-file implementation) and adjust wording
to reference the package layout and functions like _offer_ingest_for_directory
and _offer_audit_for_jsonl where the doc currently implies a monolith.
In `@forgelm/wizard/_byod.py`:
- Around line 80-83: The printed shell hints use raw interpolated paths
(variable resolved) inside _print calls, which breaks for paths with
spaces/metacharacters; update each occurrence (the _print call around the first
snippet and the other hints at the locations mentioned) to wrap the path with
shlex.quote(resolved) (import shlex if not present) so the displayed commands
are safely quoted and copy-pastable; ensure every formatted string that inserts
resolved (and any similar path variables) uses shlex.quote(...) in the f-strings
for the three occurrences referenced.
- Around line 194-209: The prompt currently accepts any non-directory string as
valid; change _prompt_dataset_path_with_ingest_offer to reuse the quickstart
validation flow: after reading raw, set candidate = Path(raw).expanduser(); if
candidate.is_dir() keep the _offer_ingest_for_directory path; else if
candidate.exists() and candidate.is_file() and candidate.suffix.lower() in
(".jsonl", ".json") call _offer_audit_for_jsonl(candidate.resolve()) and return
the resolved path; else if candidate.exists() but is not a valid JSONL, surface
an error and continue; if candidate does not exist, call the existing
_validate_local_jsonl(...) (or the project’s HF-id fallback helper) to validate
whether raw is a valid local JSONL or a Hugging Face dataset id and only return
raw when that validation succeeds, otherwise show an error and loop. Ensure you
reference and reuse _offer_ingest_for_directory, _offer_audit_for_jsonl, and
_validate_local_jsonl in the updated logic.
In `@forgelm/wizard/_collectors.py`:
- Around line 151-153: The default for webhook start notifications is wrong;
change the call that sets the default for "notify_on_start" so it matches the
web wizard (set to False) instead of True; specifically update the line that
calls section.setdefault("notify_on_start", True) to
section.setdefault("notify_on_start", False) while leaving the other keys
("notify_on_success", "notify_on_failure") unchanged to keep parity across
surfaces.
- Around line 412-424: When mandatory=True, _collect_data_governance currently
still allows empty strings for "collection_method", "annotation_process", and
"known_biases"; change the prompts so those three fields are required when
mandatory is True by re-prompting until a non-empty response (e.g., loop on
_prompt for "collection_method", "annotation_process", and "known_biases" when
mandatory) or by using a required/validate parameter if _prompt supports it;
ensure the rest of the function returns the same keys ("collection_method",
"annotation_process", "known_biases", "personal_data_included",
"dpia_completed") but with non-empty values for the three text fields when
mandatory is set.
- Around line 389-405: The strict-tier branch currently treats Article 9 fields
as optional — ensure when is_strict is True that intended_use,
foreseeable_misuse and mitigation_measures cannot be empty by re-prompting or
validating input: for intended_use, replace the single _prompt call with logic
that loops (or validates) until a non-empty string is returned when is_strict is
True (still allow empty when False); for foreseeable_misuse and
mitigation_measures, use _prompt_optional_list but enforce at least one item
when is_strict is True by re-prompting until the returned list is non-empty;
keep vulnerable_groups_considered behavior unchanged. Target the calls to
_prompt("Article 9(2)(a): intended_use", ""), _prompt_optional_list("Article
9(2)(b): foreseeable_misuse — list at least one realistic misuse"), and
_prompt_optional_list("Article 9(2)(c): mitigation_measures the deployer
applies") and add the minimal loop/validation around them keyed to is_strict.
In `@forgelm/wizard/_io.py`:
- Around line 191-193: The CUDA VRAM lookup uses
torch.cuda.get_device_properties(0).total_mem which doesn't exist and raises an
AttributeError; update the VRAM calculation in the block that sets
info["vram_gb"] (the code using torch.cuda.get_device_properties in
forgelm/wizard/_io.py) to use the correct attribute total_memory and keep the
same rounding/GB conversion so info["vram_gb"] =
round(torch.cuda.get_device_properties(0).total_memory / (1024**3), 1).
In `@forgelm/wizard/_orchestrator.py`:
- Around line 500-511: The WizardBack handler fails to restore the prior config
snapshot so partial mutations leak into persisted state; when catching
WizardBack in the block that calls step.runner, restore state.config from the
previously saved prev_config (deep copy) before decrementing state.current_step
and before calling _persist_state, and ensure any changes to
state.completed_steps still reflect the rolled-back step (use prev_config to
decide if completed_steps needs adjustment); update the exception branch around
step.runner, prev_config, state.config, WizardBack, state.current_step,
state.completed_steps and _persist_state accordingly.
- Around line 512-515: The except block handling WizardReset should not return a
new _WizardState() because that makes _run_full_wizard() treat the reset as a
completed run; instead after calling _clear_wizard_state() re-raise WizardReset
so the reset propagates to the caller and triggers the restart flow. Change the
except WizardReset block to call _clear_wizard_state() and then "raise" (not
"return _WizardState()") so _run_full_wizard and surrounding logic can handle
restarting correctly.
In `@tests/test_wizard_phase22.py`:
- Around line 299-304: The current test
test_load_returns_none_on_corrupt_snapshot uses a string that PyYAML can parse,
so update the written snapshot content to an unambiguously invalid YAML string
(e.g., an unbalanced bracket or a lone colon line) so the parser throws
yaml.YAMLError; keep using wizard._wizard_state_path() to create the file and
assert wizard._load_wizard_state() is None so the corrupt-YAML except branch is
exercised.
- Around line 88-96: The tests are inconsistent: wizard._parse_webhook_value
currently raises ValueError with "must use `https://`" for unknown schemes while
http:// is accepted with a warning; update the ValueError text in
wizard._parse_webhook_value to state the real rule (e.g., "must use http:// or
https://") and then update the test test_unknown_scheme_rejected to expect that
new message so the rejection for ftp:// matches the acceptance behavior for
http://; references: function wizard._parse_webhook_value and test
test_unknown_scheme_rejected.
- Around line 104-120: Tests TestDefaultSafetyProbesPath
(test_resolves_to_packaged_jsonl and test_path_is_absolute_or_resolvable) make
unconditional on-disk assertions against the path returned by
wizard._default_safety_probes_path, which will fail in source-only checkouts;
either mock importlib.resources.files (or the resource lookup used by
wizard._default_safety_probes_path / pkgutil.get_data) to return a temporary
file path or gate the tests with pytest.mark.skipif that checks the resource
exists before asserting Path(...).is_file() so the tests do not assume
package-data is present in non-installed/dev environments.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 09191bc6-c98c-4257-a2a8-64ad91848f51
📒 Files selected for processing (18)
CHANGELOG.mddocs/design/wizard_mode.mddocs/guides/quickstart-tr.mddocs/guides/quickstart.mddocs/reference/architecture-tr.mddocs/reference/architecture.mddocs/roadmap/phase-11-5-backlog.mdforgelm/wizard.pyforgelm/wizard/__init__.pyforgelm/wizard/_byod.pyforgelm/wizard/_collectors.pyforgelm/wizard/_io.pyforgelm/wizard/_orchestrator.pyforgelm/wizard/_state.pysite/js/wizard.jstests/test_phase12_5.pytests/test_wizard_phase11.pytests/test_wizard_phase22.py
💤 Files with no reviewable changes (1)
- forgelm/wizard.py
| def test_unknown_scheme_rejected(self): | ||
| with pytest.raises(ValueError, match="must use `https://`"): | ||
| wizard._parse_webhook_value("ftp://example.com/hook") | ||
|
|
||
| def test_http_warning_but_accepted(self, capsys): | ||
| section = wizard._parse_webhook_value("http://hooks.example.com/x") | ||
| assert section == {"url": "http://hooks.example.com/x"} | ||
| captured = capsys.readouterr().out | ||
| assert "uses HTTP, not HTTPS" in captured |
There was a problem hiding this comment.
Error message "must use \https://`"` is inconsistent with HTTP being accepted.
test_unknown_scheme_rejected (Line 89) asserts that ftp:// raises ValueError with the message "must use \https://`", but test_http_warning_but_accepted(Lines 92–96) showshttp://is silently accepted with only a warning. An operator who hits theftp://rejection would read the error as "only HTTPS is allowed", then be surprised to findhttp://works. The error message should reflect the actual acceptance rule — e.g.,"must use https:// or http://"`.
🐛 Suggested test correction to match the intended error message fix
def test_unknown_scheme_rejected(self):
- with pytest.raises(ValueError, match="must use `https://`"):
+ with pytest.raises(ValueError, match="must use `https://` or `http://`"):
wizard._parse_webhook_value("ftp://example.com/hook")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def test_unknown_scheme_rejected(self): | |
| with pytest.raises(ValueError, match="must use `https://`"): | |
| wizard._parse_webhook_value("ftp://example.com/hook") | |
| def test_http_warning_but_accepted(self, capsys): | |
| section = wizard._parse_webhook_value("http://hooks.example.com/x") | |
| assert section == {"url": "http://hooks.example.com/x"} | |
| captured = capsys.readouterr().out | |
| assert "uses HTTP, not HTTPS" in captured | |
| def test_unknown_scheme_rejected(self): | |
| with pytest.raises(ValueError, match="must use `https://` or `http://`"): | |
| wizard._parse_webhook_value("ftp://example.com/hook") | |
| def test_http_warning_but_accepted(self, capsys): | |
| section = wizard._parse_webhook_value("http://hooks.example.com/x") | |
| assert section == {"url": "http://hooks.example.com/x"} | |
| captured = capsys.readouterr().out | |
| assert "uses HTTP, not HTTPS" in captured |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/test_wizard_phase22.py` around lines 88 - 96, The tests are
inconsistent: wizard._parse_webhook_value currently raises ValueError with "must
use `https://`" for unknown schemes while http:// is accepted with a warning;
update the ValueError text in wizard._parse_webhook_value to state the real rule
(e.g., "must use http:// or https://") and then update the test
test_unknown_scheme_rejected to expect that new message so the rejection for
ftp:// matches the acceptance behavior for http://; references: function
wizard._parse_webhook_value and test test_unknown_scheme_rejected.
| def test_load_returns_none_on_corrupt_snapshot(self, isolated_state_dir): | ||
| path = wizard._wizard_state_path() | ||
| path.parent.mkdir(parents=True, exist_ok=True) | ||
| # Invalid YAML — must not raise; just silently miss. | ||
| path.write_text("not: valid: yaml: at: all", encoding="utf-8") | ||
| assert wizard._load_wizard_state() is None |
There was a problem hiding this comment.
"not: valid: yaml: at: all" may be parsed as valid YAML, silently exercising version-mismatch rather than corrupt-data handling.
PyYAML parses not: valid: yaml: at: all as {"not": "valid: yaml: at: all"} (the first : starts a block mapping; the remainder is a scalar value). If the parser succeeds, _load_wizard_state returns None due to a version-mismatch check, not because the except yaml.YAMLError branch was reached. The corrupt-YAML code path goes untested.
Use a string that is unambiguously unparseable:
🛡️ Proposed fix
- path.write_text("not: valid: yaml: at: all", encoding="utf-8")
+ # Tabs in block mappings are illegal in YAML; this guarantees a ScannerError.
+ path.write_text("key:\n\t- value", encoding="utf-8")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/test_wizard_phase22.py` around lines 299 - 304, The current test
test_load_returns_none_on_corrupt_snapshot uses a string that PyYAML can parse,
so update the written snapshot content to an unambiguously invalid YAML string
(e.g., an unbalanced bracket or a lone colon line) so the parser throws
yaml.YAMLError; keep using wizard._wizard_state_path() to create the file and
assert wizard._load_wizard_state() is None so the corrupt-YAML except branch is
exercised.
Addresses inline review comments + verified user findings on the
Phase 22 wizard modernisation PR. Critical / major / minor mix:
Critical (2):
- _io.py: torch.cuda.get_device_properties().total_mem → total_memory
(the v0.5.5 bug raised AttributeError on real CUDA hardware).
- _orchestrator.py: WizardReset re-loops with a fresh _WizardState
instead of returning early; returning let _run_full_wizard treat
the reset as a completed run and try to save an empty config.
Major (6):
- _orchestrator.py: WizardBack now restores the prev_config snapshot
so a half-completed step's mutations don't leak into the previous
step's prompts.
- _orchestrator.py: deep copy now uses copy.deepcopy instead of
json.loads(json.dumps(...)) — the JSON path was lossy for tuples
and sets that may sneak in via collector helpers.
- _orchestrator.py: _step_model preserves a use-case preselect that
isn't in POPULAR_MODELS (defaults to the Custom slot pre-filled
with the operator's earlier choice).
- _byod.py: _prompt_dataset_path_with_ingest_offer validates that
the typed value is a directory, JSONL/JSON file, or HF Hub ID;
bare typos no longer silently become data.dataset_name_or_path.
- _collectors.py: Article 9 strict-tier branch enforces non-empty
intended_use / foreseeable_misuse / mitigation_measures.
- _collectors.py: Article 10 strict-tier branch enforces non-empty
collection_method / annotation_process / known_biases.
Medium (4):
- _collectors.py: notify_on_start default flips to False (web wizard
parity — start pings are noisy).
- _collectors.py: _collect_compliance_metadata collects
risk_classification first so strict-tier hint shows up before
Article 11 fields and downstream Article 9/10 branches see the
correct context.
- _state.py: _save_config_to_file uses yaml.safe_dump + UTF-8 +
allow_unicode (yaml.dump could emit !!python/object tags that
ForgeConfig then rejects). Same treatment for the YAML preview.
- _state.py: chmod 0o600 on wizard_state.yaml (best-effort) —
compliance metadata can be sensitive.
- _io.py: _prompt_choice guards negative idx and out-of-range int
alongside the existing IndexError fallback.
Minor (3):
- _byod.py: shlex.quote() the path in BYOD shell-hint output so
paths with spaces survive copy-paste into a real shell.
- docs/design/wizard_mode.md: MD030 — single space after ordered
list markers (was double).
- docs/reference/architecture{,-tr}.md: heading wizard.py → wizard/
(the module became a sub-package in Phase 22).
Coverage (HIGH):
- tests/test_wizard_phase22.py adds end-to-end coverage for
_drive_wizard_steps (back-restores-config, reset-re-loops,
back-at-first-step is no-op, persist-after-each-step) plus
_maybe_resume_state (resume-from-snapshot, decline-clears) and
_step_welcome (backend-hint + beginner branch). Orchestrator
module coverage rises from 14% → 30% on this file alone.
All checks green:
- ruff check + format clean
- 1544 passed, 14 skipped (full suite)
- 8 doc guards: bilingual_parity, anchor_resolution,
cli_help_consistency, yaml_snippets, audit_event_catalog,
library_api_doc, doc_numerical_claims, bilingual_code_blocks
- module_size: no new warnings
- forgelm --config config_template.yaml --dry-run
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CHANGELOG entry for the c0dd57a fix sweep — Critical / Major / Medium / Minor classifications + the orchestrator coverage bump that landed alongside. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 22 review-cycle 2: addresses 17 newly-discovered parity / UX gaps surfaced during the post-PR-#40 audit (P1-P18 + B1-B5 + C1-C2 + D1-D2 + E1+E4). All five HIGH-severity findings closed; both wizards (CLI sub-package + site/js/wizard.js) tightened in lockstep. ## Critical bug fixes - B-NEW-1: ``_print_step_diff`` strips internal ``_wizard_meta.*`` keys before showing the operator the per-step diff (was leaking ``_wizard_meta.use_galore`` from the strategy step). - B-NEW-2: ``_step_compliance`` skips the Article 10 governance re-prompt when step 6 already populated it under non-strict tier (was silently overwriting earlier answers). - B-NEW-3: ``_save_wizard_state`` now writes via ``tempfile`` + ``fsync`` + ``os.replace`` so a SIGKILL / power loss / concurrent wizard process can never leave a half-written state file. - B5: strict-tier coercion fires at the end of ``_step_compliance`` too (not just ``_step_evaluation``), so a Ctrl-C between steps 8 and 9 still produces a loadable YAML if the operator manually saves. - A1: ``_parse_webhook_value`` SSRF preflight — rejects loopback / RFC1918 / link-local destinations up front by reusing ``forgelm._http._is_private_destination``. ## UX guardrails - B1/E1: ``_validate_generated_config`` runs ``ForgeConfig.model_validate`` on the saved YAML before exit. Schema rejections surface inline. - B2: ``_prompt_unique_filename`` + ``_next_free_filename`` — overwrite confirmation with auto-suffix (no more silent clobbers). - B3: ``run_wizard_full`` refuses to launch on non-tty stdin and points operators at ``forgelm quickstart <template>`` for scripted runs (closes the silent-empty-config CI hole). - E4: ``_print_preflight_checklist`` — GPU/VRAM/dataset/risk-tier signals before the configuration summary. ## CLI ↔ web parity - P1+P18: safety-block field union — CLI now emits ``classifier`` + ``max_safety_regression`` (and ``min_classifier_confidence`` under confidence_weighted scoring); web wizard now emits ``scoring`` + ``track_categories`` + ``severity_thresholds``. Both surfaces produce structurally equivalent ``evaluation.safety`` blocks. - P2: judge ``min_score`` default 6.5 → 5.0 (schema parity). - P3: web wizard surfaces ``model.max_length`` as a number input on the training step (was silently capped at 2048). - P9: monitoring collector accepts ``env:VAR_NAME`` like webhook. - P13: HF Hub format hint when custom model name isn't ``<org>/<name>``. - P14: ``auto_revert.max_acceptable_loss`` default 2.0 emission. - P16: CLI emits ``bnb_4bit_quant_type`` + ``bnb_4bit_compute_dtype`` when QLoRA is selected (web parity). - C1/I3: web ``learningRate: '2e-5'``, ``batchSize: 4`` (schema parity); ``2e-5`` added to LR option list. - C2: web wizard cross-tab ``storage`` event listener. ## Distinct exit codes - D2: new ``EXIT_WIZARD_CANCELLED = 5`` constant; new ``WizardOutcome`` dataclass + ``run_wizard_full()`` entry point. Genuine cancels (Ctrl-C, non-tty, decline-to-save) now exit 5 instead of 0. Saved-and-deferred remains exit 0. - D1: best-effort ``import readline`` for arrow-key + history on Linux/macOS (Windows unaffected). ## Tests + isolation - B10: new autouse ``_isolate_wizard_state`` fixture in ``tests/conftest.py`` redirects ``XDG_CACHE_HOME`` for any ``tests/test_wizard_*`` module. - 28 new tests across the new behaviour (146 wizard tests passing in total; full suite 1573 passed / 14 skipped). - Existing ``test_wizard_still_works`` updated for the new ``WizardOutcome`` contract; new ``test_wizard_cancelled_exits_5``. ## Documentation - ``CHANGELOG.md`` — review-cycle 2 entry under [Unreleased]. - ``README.md`` — directory listing ``wizard.py`` → ``wizard/``. - ``docs/standards/architecture.md`` — Mermaid graph + table updated. - ``docs/design/wizard_mode.md`` — review-cycle 2 status banner. - ``docs/reference/usage{,-tr}.md`` — exit-code 5 row. - ``docs/usermanuals/{en,tr}/reference/exit-codes.md`` — full contract row + compatibility note (six integers, was five). - ``docs/guides/quickstart{,-tr}.md`` — operator-guardrail paragraph. ## Validation - ruff check + format clean (184 files). - 8 doc guards pass (bilingual_parity, anchor_resolution, cli_help_consistency, yaml_snippets, audit_event_catalog, library_api_doc, doc_numerical_claims, bilingual_code_blocks). - module_size: no new warnings. - ``forgelm --config config_template.yaml --dry-run`` green. - ``pytest tests/`` — 1573 passed, 14 skipped. ## Deferred (roadmap entries, NOT in this PR) - F1: schema-driven single-source-of-truth (``wizard_defaults.json`` build-time generation) — multi-PR refactor, frames every future default drift. - B4: ``--wizard-from-yaml answers.yaml`` mode — surface change pending policy decision on scripted-config UX. - E2: per-prompt rationale catalog — needs ~50 hand-written strings. - E3: ``--wizard --start-from <yaml>`` idempotent re-run — pairs with F1. - F3: state-versioning migration logic. - C3: web wizard mobile-CSS overflow review. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the 11 actionable findings from the review-cycle 2 audit.
Low-risk batch — no architectural changes. PR-B (F1 schema-driven
defaults SOT) follows separately.
## Critical / correctness
- A3: ``_save_wizard_state`` unlinks the temp file when ``os.replace``
fails (cross-device home, EACCES, EXDEV) so the
``.wizard_state.*.tmp`` files stop accumulating. Defensive
``finally``-branch sweep also covers the case where
``yaml.safe_dump`` itself raises before the rename runs.
- A4: ``_apply_strict_tier_coercion`` notice prints once per wizard
run instead of twice (compliance-step + evaluation-step idempotent
re-call). Tracked via the ``_wizard_meta.strict_tier_announced``
flag that ``_strip_internal_meta`` removes before save.
- F3: ``_STATE_MIGRATIONS`` registry skeleton in ``_state.py`` ready
for future ``_STATE_VERSION`` bumps. Empty by design; first real
bump only adds ``{1: migrator}``, no plumbing rewrite.
## Performance
- C16: ``_detect_hardware()`` cached on ``_WizardState.hardware``;
``_step_welcome`` + ``_print_preflight_checklist`` share the
result instead of paying a second torch import + CUDA enumeration
(~50–200 ms saved per wizard run).
## Standards / observability
- G30: ``_validate_generated_config`` now emits ``logger.warning(...)``
in addition to the inline ``_print`` notice — CI / log pipelines
see "wizard output failed schema validation" without scraping
stdout (per ``docs/standards/error-handling.md`` rule 2).
- B11: web wizard's ``storage`` event listener ``console.warn``s on
a sync failure instead of silently swallowing the error.
## UX
- C3: expanded YAML accordion on ``<900px`` viewports now caps at
``38vh; overflow:auto`` so long configs no longer push the
Copy/Download/Back/Next footer off-screen.
- E2 (hedefli): inline rationale on five high-impact prompts
(``lora_r``, ``lora_alpha``, ``num_train_epochs``, ``batch_size``,
``max_length``, ``risk_classification``) so first-time operators
see the operational hint without flipping into beginner mode.
## Tests
- E22-24: new ``test_wizard_start_training_routes_through_dispatcher``
covers the previously-uncovered branch in ``_maybe_run_wizard``
where ``outcome.start_training=True`` mutates ``args.config`` and
lets the trainer pipeline run.
- 20+ additional tests: atomic-write cleanup (success / replace-fail
/ dump-fail), announce-once flag, migration registry (unknown
version / newer version / chain runs / non-advancing migrator),
hardware cache reuse, validate-warning log line.
## Documentation
- A9: ``docs/product_strategy{,-tr}.md``, ``docs/design/gdpr_erasure.md``,
``docs/design/data_audit_cli_split.md`` — "0/1/2/3/4 contract"
claims now acknowledge ``EXIT_WIZARD_CANCELLED = 5``.
- F27: ``docs/usermanuals/{en,tr}/compliance/human-approval-gate.md``
cross-references exit code 5 alongside the existing exit-4
discussion (high-risk operators using ``--wizard`` need both
signals).
- B10 doc-only: ``_parse_webhook_value`` comment now notes the SSRF
preflight catches "wrong URL at config time", not "guaranteed-
public destination at runtime" — runtime TOCTOU tracked at #14.
- CHANGELOG entry under [Unreleased] / Added.
## Roadmap reshuffle
- B4 (``--wizard-from-yaml``) removed from consideration: project
identity ("config-driven") + B3 (non-tty refusal + ``quickstart``
hand-off) closed the gap.
- E3 (``--wizard --start-from <yaml>``) explicitly bound to F1:
reverse-mapping duplicates F1's schema-driven SOT work; deferred
until F1 lands.
## Validation
- ruff check + format clean (184 files).
- 8 doc guards pass (bilingual_parity, anchor_resolution,
cli_help_consistency, yaml_snippets, audit_event_catalog,
library_api_doc, doc_numerical_claims, bilingual_code_blocks).
- module_size: no new warnings.
- 181 wizard-scope tests pass (``test_wizard_*`` + ``test_phase12_5``
+ ``test_cli_phase10``).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the long-running drift surface tracked through P1-P17 across
review-cycles 1, 2, and 3. A schema field-default change without a
matching wizard update was the single biggest source of CLI ↔ web
parity gaps; this PR turns the schema into the canonical source.
## Pydantic field metadata flag
8 wizard-relevant fields in ``forgelm/config.py`` now carry
``json_schema_extra={"wizard": True}``:
- ``ModelConfig.max_length`` (2048)
- ``LoraConfigModel.r`` / ``alpha`` / ``dropout`` (8 / 16 / 0.1)
- ``TrainingConfig.num_train_epochs`` / ``per_device_train_batch_size``
/ ``gradient_accumulation_steps`` / ``learning_rate``
(3 / 4 / 2 / 2e-5)
The flag is a no-op for ``ForgeConfig.model_validate`` — it only
surfaces in ``model_json_schema()`` output, where the generator picks
it up.
## Generator + CI guard
``tools/generate_wizard_defaults.py`` walks every nested ``BaseModel``
reachable from ``ForgeConfig``, finds wizard-flagged fields, and
writes two artefacts:
- ``forgelm/wizard/_defaults.json`` — package data (shipped via
``[tool.setuptools.package-data]``).
- ``site/js/wizard_defaults.js`` — exposes ``window.WIZARD_DEFAULTS``
for the web wizard.
``tools/check_wizard_defaults_sync.py`` re-runs the generator into a
temp comparison and fails the CI if the shipped artefacts drift from
the schema. Operators get a clear "regenerate via …" hint with the
first 5 differing lines so the fix is one command away.
## Python consumer
``forgelm/wizard/_state.py`` now reads the ``DEFAULT_*`` constants
from the shipped JSON via ``importlib.resources``. A new
``_load_defaults()`` + ``_default(section, key, fallback)`` helper
pair handles missing-file / corrupt-JSON cases by returning ``{}``,
so the hardcoded fallbacks below still keep the wizard usable on a
broken pip install.
## JS consumer
``site/js/wizard.js`` ``defaultState()`` now uses a new
``wd(section, key, fallback)`` helper to read schema-derived values
from ``window.WIZARD_DEFAULTS``. ``site/quickstart.html`` loads
``wizard_defaults.js`` BEFORE ``wizard.js`` so the global is set in
time. Special-cases ``learning_rate``: arrives as a number from
JSON (2e-5), uses ``Number.toExponential()`` to canonicalise to
the dropdown's ``'2e-5'`` string format (rather than ``String()``
which would yield ``'0.00002'`` and fail to match).
## Tests
New ``TestSchemaDrivenDefaultsSOT`` class pins the contract:
- ``test_module_constants_match_schema_via_json`` — every
``DEFAULT_*`` equals its ``Field.default``.
- ``test_defaults_json_shipped_and_loadable`` — package data lives
on disk + parses cleanly + has expected sections.
- ``test_default_helper_returns_fallback_when_section_missing`` —
graceful degradation.
- ``test_load_defaults_returns_empty_on_missing_file`` — slim install
scenario.
Full suite green: 1590 passed, 14 skipped.
## Documentation
- ``CHANGELOG.md`` — F1 entry under [Unreleased] / Added with B4
removed-from-roadmap + E3-bound-to-F1 notes.
- ``CLAUDE.md`` — self-review chain extended to 8 commands; tools/
listing names ``generate_wizard_defaults`` + ``check_wizard_defaults_sync``.
- ``pyproject.toml`` — ``[tool.setuptools.package-data]`` ships
``forgelm/wizard/_defaults.json``.
## Why a generator + JSON, not live-import?
- Web wizard is a static site — no Python runtime to query schema
defaults at page-load; JSON-as-data ships with the assets.
- Python side benefits too: low import-cost path, JSON visible in
git diffs (auditable), and round-tripping forces wizard-relevant
defaults to be JSON-serialisable primitives.
## Roadmap reshuffle
- B4 (``--wizard-from-yaml``) confirmed REMOVED from consideration.
- E3 (``--wizard --start-from <yaml>``) explicitly bound to F1:
reverse-mapping reuses this PR's metadata flags + JSON shape
(deferred until needed; no scope rush).
## Validation
- ruff check + format clean (186 files including the two new tools).
- 9 doc guards pass: bilingual_parity, anchor_resolution,
cli_help_consistency, yaml_snippets, audit_event_catalog,
library_api_doc, doc_numerical_claims, bilingual_code_blocks,
**wizard_defaults_sync** (new).
- module_size: ``forgelm/config.py`` crossed 1000 LOC (now 1011); it
was already grandfathered.
- 1590 passed, 14 skipped (full suite).
- ``forgelm --config config_template.yaml --dry-run`` green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…d test Closes the actionable findings from the independent review of cycle-3 + F1 commits (5667885 + 91d5726). Low-risk batch — internal hardening + test coverage; no operator-visible behaviour change. ## Generator robustness (PR-B follow-up) - B1: ``_walk_model`` now unwraps ``Optional[BaseModel]`` sub-blocks via the new ``_unwrap_basemodel()`` helper (uses ``typing.get_args``). Pre-fix the walk only matched bare-class annotations, so a wizard flag on a field inside an ``Optional[XxxConfig]`` sub-model was silently skipped. 10 of 14 ForgeConfig top-level fields are ``Optional[...]`` — today no flags exist on them, but a future contributor would have hit the silent- skip trap with no error from the generator and no failure from the CI guard. - B2: ``default is PydanticUndefined`` identity check (imported from ``pydantic_core``) replaces the brittle ``repr(default).startswith("PydanticUndefined")`` substring match. Stable across Pydantic v2 minor releases. - C2: module-level header documenting the JSON-serialisable contract for wizard-flagged fields (``int / float / bool / str / None / list of primitives / dict of primitives`` only). Future flags on Path / datetime / callable defaults will fail loudly at generator run; the contract is now explicit so contributors don't get surprised. ## State module hardening (PR-B follow-up) - B5: ``_default()`` now warns via ``logger.warning`` when the shipped JSON value disagrees with the hardcoded fallback literal. The JSON is canonical (returned), but drift between schema and fallback now surfaces in CI logs — eliminates the second-source-of-truth silently-stale risk. ## CI guard test coverage (PR-A follow-up) - G2: new ``tests/test_check_wizard_defaults_sync.py`` (7 tests): - corrupt-JSON triggers failure + diff message + regeneration hint. - missing-target triggers clean failure (no crash). - diff truncation works at 5 lines. - generator unwraps ``Optional[BaseModel]`` (B1 contract pinned). - required (no-default) wizard fields skipped (B2 contract pinned). - ``check_wizard_defaults_sync.py``: new ``_display_path()`` helper falls back to absolute path when target is outside ``_REPO_ROOT`` (prevents ``Path.relative_to`` crash when tests monkeypatch the target to a tmp dir). ## Test hardening (PR-A follow-up) - G3: ``test_wizard_start_training_routes_through_dispatcher`` now asserts ``ForgeConfig.model_validate(config_data)`` on the test fixture YAML before mocking the trainer. Pre-fix, a future change to ``minimal_config()`` that produced an invalid YAML would still pass this test (because the trainer pipeline is mocked BEFORE validation runs in ``_dispatch.main``), giving false-positive coverage. ## Documentation + cosmetic - A3: hardware-cache freshness caveat documented in ``_cached_hardware()`` docstring (GPU hot-plug mid-session is an acceptable edge case — operator can exit-and-resume to refresh). - F3: README directory tree now mentions ``_defaults.json`` as schema-derived defaults SOT shipped under ``forgelm/wizard/``. - B7: ``defer`` attribute on ``wizard_defaults.js`` script tag. Both wizard scripts now defer; HTML5 spec guarantees execution order (document order) so ``WIZARD_DEFAULTS`` still lands before ``defaultState()`` runs, but neither blocks HTML parsing. ## Skipped (review noted as "soft / acceptable") - C3 (``_forgelm_wizard:`` extra-key prefix): ``model_json_schema()`` is not called anywhere in the codebase — the ``wizard:true`` extra cannot leak into operator-facing artefacts today. Cost > value. - E1, E2, E3, H2: explicit refactor / defer in the review. - B4 (``--wizard-from-yaml``) confirmed REMOVED from roadmap. - E3 (``--start-from <yaml>``) bound to F1; deferred until needed. ## Validation - ruff check + format clean (187 files including new test). - 9 doc guards pass: bilingual_parity, anchor_resolution, cli_help_consistency, yaml_snippets, audit_event_catalog, library_api_doc, doc_numerical_claims, bilingual_code_blocks, wizard_defaults_sync. - module_size: no new warnings. - 1597 passed, 14 skipped (full suite). - ``forgelm --config config_template.yaml --dry-run`` green. - bandit -ll: 0 HIGH / 19 MEDIUM (all pre-existing B615 unpinned HF Hub downloads — project-wide, intentional, not in PR-C scope). ## PR-comment status - 13 inline comments from coderabbitai on commit ``52b113d`` (Phase 22 origin) — ALL addressed in cycle-2 (``c0dd57a``) + cycle-3 (``5667885``) commits. Threads attached to old commits; will auto-resolve on PR merge. - 6 inline comments from gemini-code-assist on commit ``52b113d`` — ALL addressed. - Codacy + SonarQube hotspot summaries from ``52b113d`` — current HEAD (``91d5726`` + this commit) addresses everything they flagged in scope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reviewer findings — resolution summary (HEAD:
|
| Commit | Scope | Validation |
|---|---|---|
52b113d |
Phase 22 — CLI wizard parity-with-web modernisation (sub-package split + 25 review findings G1-G20 + I1-I5) | 65 wizard tests |
c0dd57a |
Phase 22 review-cycle 1 — 18 PR-#40 review findings (commit msg + cycle-1 audit) | 1544 passed |
7d08bba |
docs(changelog): cycle-1 documentation | (docs only) |
9c32fe9 |
Phase 22 review-cycle 2 — 17 newly-discovered parity / UX gaps (P1-P18, B1-B3, C1-C2, D1-D2, E1, E4) | 1573 passed |
5667885 |
review-cycle 3 — 11 cycle-2 audit findings (A3, A4, F3, C16, G30, B11, C3, E2, E22-24, A9, F27, B10) | 1573 passed |
91d5726 |
F1 — schema-driven wizard defaults SOT (8 fields, generator + CI guard, Python + JS consumers) | 1590 passed |
9fe85df |
review-cycle 3 follow-up — generator hardening (B1, B2, C2), state parity warning (B5), CI guard fail-path tests (G2), dispatcher test hardening (G3), README + docs (F3, A3, B7) | 1597 passed |
Per-comment status
coderabbitai (13 inline on 52b113d) — all addressed:
- MD030 list markers (
docs/design/wizard_mode.md) →c0dd57a architecture.mdwizard.py→wizard/heading →c0dd57ashlex.quoteBYOD shell hints →c0dd57a- BYOD path validation →
c0dd57a notify_on_startweb-parity default →c0dd57a- Article 9 strict-tier non-empty enforcement →
c0dd57a - Article 10 strict-tier non-empty enforcement →
c0dd57a total_mem→total_memory→c0dd57a(note: cycle-2 audit verified the field was alwaystotal_memory; the "fix" was overcorrection — no functional bug existed)WizardBackrestoreprev_config→c0dd57aWizardResetre-loop instead of return →c0dd57a- HTTP/HTTPS error message consistency →
c0dd57a - Safety probe path source-only env → resolved (probes file IS shipped at
forgelm/safety_prompts/default_probes.jsonl) - Corrupt YAML test → resolved (PyYAML actually raises
ScannerErroron the test string; cycle-2 verified)
gemini-code-assist (6 inline on 52b113d) — all addressed:
- Article 9 + 10 mandatory-field enforcement →
c0dd57a risk_classificationcollected first →c0dd57a_prompt_choicenegative-idx guard →c0dd57a- Model-selection
default_idxpreservation →c0dd57a - UTF-8 encoding for config files →
c0dd57a
codacy + sonarqube summaries on 52b113d — current HEAD addresses everything they flagged in scope. Repo-wide bandit shows 0 HIGH severity in wizard sub-package; pre-existing B615 (HF Hub unpinned downloads) is project-wide and intentional, not in this PR's scope.
Independent Review of cycle-3 + F1 — top-5 follow-ups all in 9fe85df:
- B1: generator unwraps
Optional[BaseModel] - B2: PydanticUndefined identity comparison
- C2: JSON-serialisable contract documented
- B5: schema↔fallback drift WARNING
- G2: CI guard fail-path test (7 tests)
- G3: dispatcher test asserts
model_validate - A3, B7, F3: doc + cosmetic
Roadmap reshuffle (explicit)
- B4 (
--wizard-from-yaml) — REMOVED from consideration. Politik çelişki ("config-driven is the identity") + B3 (non-tty refusal +quickstarthand-off) closed the gap. - E3 (
--wizard --start-from <yaml>) — bound to F1's metadata flags + JSON shape; deferred until needed.
Validation at HEAD 9fe85df
- ruff check + format clean (187 files)
- 9 doc guards green: bilingual_parity, anchor_resolution, cli_help_consistency, yaml_snippets, audit_event_catalog, library_api_doc, doc_numerical_claims, bilingual_code_blocks, wizard_defaults_sync (new)
- module_size: 6 grandfathered modules over warn-threshold; 0 over fail-threshold (1500)
- 1597 passed, 14 skipped (full pytest)
- bandit
-llrepo-wide: 0 HIGH / 19 MEDIUM (all pre-existing B615 HF Hub unpinned, project-wide, intentional) forgelm --config config_template.yaml --dry-rungreen
Ready for human review.
External scanners (Codacy / SonarCloud) attached to PR #40 reported findings on the original ``52b113d`` Phase 22 commit. Verified each against the current HEAD; the remaining noise is genuinely non-actionable. Shrinks the public quality-gate to its real signal. ## Codacy 7 HIGH severity — all false positives, suppressed in-tree - **Prospector F401 (5×)** in ``forgelm/wizard/__init__.py`` — the re-export shim already carried ``# ruff: noqa: F401`` at the top but Codacy's Prospector doesn't honor ruff directives. Per-line ``# noqa: F401 -- re-export for tests (pyflakes/Prospector compat)`` markers added to the 5 flagged underscore-prefixed symbols (``_AUDIT_LARGE_FILE_THRESHOLD_BYTES``, ``_GALORE_OPTIMIZERS``, ``_BACK_TOKENS``, ``_STEPS``, ``_STATE_VERSION``). These are intentional re-exports for the wizard test suite, not dead imports. - **Bandit B105** in ``forgelm/wizard/_collectors.py:673`` — ``backend_token == "api"`` flagged as "possible hardcoded password". False positive: ``"api"`` is an enum-style discrim- inator (one of ``"api"``/``"local"``/``"file"``). Suppressed with ``# nosec B105 -- enum-style discriminator, not a secret``. - **PMD InaccurateNumericLiteral** in ``site/js/wizard_defaults.js`` on the ``2e-05`` literal — the file is a build-time artefact emitted by ``tools/generate_wizard_defaults.py`` from Pydantic ``learning_rate`` default. PMD's heuristic flags scientific notation as imprecise; the JS consumer canonicalises with ``Number.toExponential()`` so precision is never operator-visible. New ``.codacy.yaml`` excludes the generated artefact (and its Python sibling ``forgelm/wizard/_defaults.json``) from analysis. ## Codacy 29 INFO severity - **markdownlint MD013 (line-length, 17×)** — ForgeLM bilingual reference + design docs intentionally carry long lines (paragraph- style explanations, table cells, prose with long inline links). The 80-col limit is a code-style holdover that hurts readability for documentation; the project's ``bilingual_parity`` guard already enforces structural alignment. Disabled via ``.markdownlint.json``: ``"MD013": false`` with a rationale comment. - **markdownlint MD030 (9×)** in ``docs/design/wizard_mode.md`` — cycle-2 already fixed the list-marker spacing in this file; the Codacy result was stale (older commit snapshot). No code change needed; will clear on next Codacy re-scan. - **markdownlint MD028 (1×)**, **Semgrep pytorch-detection (2×)** — informational only, no action. ## SonarCloud 8 Security Hotspots — all LOW, all in test fixtures Hotspot review (no code changes — test fixtures are intentional): - ``tests/test_wizard_phase22.py:90`` ``ftp://example.com/hook`` — literal in ``test_unknown_scheme_rejected``; verifies the wizard rejects unsupported schemes. - ``tests/test_wizard_phase22.py:93,94`` ``http://hooks.example.com/x`` — literal in ``test_http_warning_but_accepted``; verifies the HTTP-with-warning behaviour mirrors ``forgelm/webhook.py``. - ``tests/test_wizard_phase22.py:830`` IMDS IP ``169.254.169.254`` / ``:834`` RFC1918 ``10.0.0.5`` — literals in ``TestWebhookSSRFPreflight``; verifies the SSRF preflight rejects loopback / RFC1918 / link-local destinations. - ``tests/test_wizard_phase22.py:924,929`` ``tmp_path`` / ``tests/test_cli_phase10.py:383`` — pytest's standard temp- directory fixture pattern. These hotspots are the ENTIRE point of the tests they appear in — suppressing them would require either disabling the SonarCloud rule project-wide (loses signal on production code) or adding inline comments that confuse the test-source-as-spec pattern. Decision: mark as "Safe / Reviewed" via SonarCloud's UI, no in-tree changes. ## Validation - ruff check + format clean (187 files). - 9 doc guards pass. - 139 wizard-scope tests pass (test_wizard_phase22 + test_check_wizard_defaults_sync + test_cli_phase10). - ``check_wizard_defaults_sync`` green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes E3 from the cycle-2 + cycle-3 review backlog. Operators can
now iterate on an existing config without losing prior answers — the
long-asked-for ergonomic improvement that the previous review cycles
all flagged ("wizard always starts from scratch") and bound to F1
(landed in PR-B `91d5726`) so the per-step value-honoring reuses
the schema-derived defaults plumbing.
## CLI surface
- New flag ``--wizard-start-from PATH`` registered in
``forgelm/cli/_parser.py``; ``--wizard-start-from existing.yaml``
alongside ``--wizard`` triggers the new flow.
- ``forgelm/cli/_wizard.py``: ``_maybe_run_wizard`` reads
``args.wizard_start_from`` (via ``getattr`` for back-compat) and
threads it into ``run_wizard_full(start_from=...)``.
## Orchestrator
- New ``_load_initial_state_from_yaml(path)`` helper:
reads YAML, validates it against ``ForgeConfig``, seeds a
:class:`_WizardState` with ``experience="expert"`` +
``use_case=_MANUAL_USE_CASE`` + ``config=copy.deepcopy(data)``.
Three error categories surface to the operator: ``FileNotFoundError``
for missing path, ``ValueError`` for parse failure / non-mapping
root / schema rejection. All caught by ``run_wizard_full`` and
rendered as a cancelled outcome with a printed warning — no
traceback for the operator.
- ``run_wizard``, ``run_wizard_full``, ``_run_full_wizard_outcome``,
and ``_run_full_wizard`` now accept ``start_from: Optional[str] =
None`` (back-compat preserved — None falls back to today's flow).
- When ``start_from`` is supplied the quickstart-template prelude is
skipped — the operator's intent is "edit this file", not "pick a
fresh template".
## Per-step value honouring
- ``_step_strategy``: ``lora_r`` and ``lora_alpha`` prompts default
to ``state.config["lora"]`` values when present (falls back to
schema defaults).
- ``_step_training_params``: ``num_train_epochs`` /
``per_device_train_batch_size`` / ``model.max_length`` /
``output_dir`` prompts honour existing ``state.config`` values.
- ``_step_dataset``: when an existing dataset path is present, the
step prints "Existing dataset: <path>" and offers a "Keep this
dataset?" yes/no early-out so the operator doesn't re-trigger
ingestion / audit on an unchanged path.
Other steps (welcome / use-case / model / trainer / compliance /
evaluation) already honoured existing values via
``state.config.get(...)`` defaults — no change needed.
## Save flow
- ``_run_full_wizard_outcome``: when ``start_from`` is supplied the
save prompt's default filename becomes the start-from path. The
existing ``_prompt_unique_filename`` overwrite confirmation still
fires before clobbering.
## Tests
10 new tests under ``tests/test_wizard_phase22.py``:
- ``TestStartFromYAMLLoad``: valid YAML pre-populates state (with
``copy.deepcopy`` isolation), missing path raises
``FileNotFoundError``, malformed YAML raises ``ValueError``,
non-mapping root raises, schema-invalid YAML raises.
- ``TestRunWizardFullStartFrom``: missing / invalid YAML returns a
cancelled ``WizardOutcome`` instead of crashing.
- ``TestStepHonorsExistingValues``: strategy / training-params /
dataset steps each preserve existing values when the operator
presses Enter at every prompt.
Total wizard-scope tests: 202 (was 192).
## Documentation
- ``CHANGELOG.md`` — E3 / PR-D entry under [Unreleased] / Added.
- ``docs/design/wizard_mode.md`` — PR-D additions block.
- ``docs/reference/usage{,-tr}.md`` — new flag in the cheat-sheet.
- ``docs/usermanuals/{en,tr}/reference/cli.md`` — flag table row.
- ``docs/guides/quickstart{,-tr}.md`` — "idempotent re-run"
paragraph after the existing operator-guardrail block.
## Validation
- ruff check + format clean (187 files).
- 9 doc guards pass: bilingual_parity, anchor_resolution,
cli_help_consistency (439 invocations), yaml_snippets,
audit_event_catalog, library_api_doc, doc_numerical_claims,
bilingual_code_blocks, wizard_defaults_sync.
- ``forgelm --config config_template.yaml --dry-run`` green.
- 1607 passed, 14 skipped (full suite, was 1597).
## Roadmap status — fully on-plan
- B4 (``--wizard-from-yaml`` answer-file mode): REMOVED from
consideration earlier (B3 closed the gap).
- E3: SHIPPED in this PR.
- F1 (schema-driven SOT): SHIPPED in PR-B.
- F3 (state-version migration skeleton): SHIPPED in PR-A.
- C3 (mobile YAML clamp): SHIPPED in PR-A.
- E2 (rationale catalog): targeted version SHIPPED in PR-A; full
catalog explicitly deferred (no scope rush).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eview
Independent code review of PR-D (`2ac9df2`) flagged 3 HIGH-severity
bugs where pressing Enter at certain prompts silently regressed the
operator's loaded YAML to wizard defaults — directly violating the
"press Enter to keep" contract documented in PR-D's CHANGELOG entry
and the quickstart guide. Plus 4 MEDIUM/LOW issues. All closed
here.
## Critical: contract violations on `--wizard-start-from`
- **PR-D-A1: ``_step_strategy`` regressed `lora.method` /
`target_modules` / `dropout` / `bias` / `task_type` on Enter.**
``_select_strategy()`` now accepts ``existing_method`` /
``existing_load_in_4bit`` / ``existing_galore`` kwargs and derives
the prompt's ``default_idx`` from the loaded YAML. Target-modules
prompt detects the loaded list shape against
``TARGET_MODULE_PRESETS`` and shifts its default accordingly.
Non-prompted fields (dropout / bias / task_type) now use
``setdefault``.
- **PR-D-A2: ``_step_evaluation`` clobbered existing benchmark /
llm_judge / safety blocks.** Function now seeds ``evaluation``
from ``state.config.get("evaluation")`` via ``copy.deepcopy``,
defaults the ``auto_revert`` gate based on existing state, and
preserves prior blocks when the operator declines a re-prompt.
- **PR-D-A3: ``_step_use_case`` overwrote the operator's existing
model + trainer choices** with the first template's preset (default
index = 1). Combined with save-default-to-``start_from``, this
silently corrupted the original YAML. Step now early-returns
with a clear notice when ``state.use_case == _MANUAL_USE_CASE`` and
``model.name_or_path`` / ``training.trainer_type`` are populated.
Greenfield runs now use ``setdefault`` for the preset application.
## Medium / hardening
- **PR-D-A4: ``_step_welcome`` ``model.backend``** → chained
``setdefault`` instead of the nested-dict-replace pattern. An
operator's explicit ``model.backend: transformers`` no longer
flips silently to ``unsloth`` on a Linux+GPU box.
- **PR-D-A5: ``_load_initial_state_from_yaml`` split try/except**
for the ``ForgeConfig`` import vs. the ``model_validate`` call. A
downstream ``ImportError`` from inside a Pydantic validator can no
longer silently bypass schema validation.
- **PR-D-A6: warns before discarding pre-existing wizard resume
state** when ``--wizard-start-from`` is supplied alongside an
in-progress ``$XDG_CACHE_HOME/forgelm/wizard_state.yaml``. Pre-fix
the snapshot was silently destroyed at save time.
- **PR-D-A7: ``docs/guides/quickstart{,-tr}.md``** scopes the
"press Enter to keep" claim to numeric / text / choice prompts;
documents the use-case skip and the ``setdefault`` pattern for
non-prompted fields.
## UX / test coverage
- **PR-D-B1: new CLI dispatcher integration test**
(``test_wizard_start_from_threads_through_to_run_wizard_full``)
pins parser → dispatcher → ``run_wizard_full`` thread-through.
- **PR-D-B3: ``--wizard-start-from`` without ``--wizard``** now
prints a one-line warning instead of silently no-op'ing.
- **PR-D-B5: ``_collect_data_governance(existing=...)``** kwarg
threads existing Article 10 free-text values through as prompt
defaults. Strict-tier operators iterating from a populated YAML
no longer re-type collection_method / annotation_process /
known_biases.
- **PR-D-B6: ``_canonical_start_from()``** helper applies
``Path(...).expanduser()`` once at entry. Pre-fix
``--wizard-start-from ~/configs/x.yaml`` created a literal
``~/configs/x.yaml`` directory on save because the load did
``expanduser`` but the save flow's overwrite check did not.
## Tests
13 new tests under ``tests/test_wizard_phase22.py`` covering every
fix above:
- ``TestPRDA1StrategyHonorsExisting`` (3): DoRA / target_modules /
GaLore preserved on Enter.
- ``TestPRDA2EvaluationHonorsExisting`` (2): benchmark + judge
preserved when declining re-prompt; auto-revert default reflects
existing state.
- ``TestPRDA3UseCaseSkipsWhenExisting`` (1): early-return with
notice when start-from path + populated model/trainer.
- ``TestPRDA4WelcomeBackendSetdefault`` (1): existing backend
survives Linux+GPU branch.
- ``TestPRDA5ImportErrorBypass`` (1): valid YAML happy path still
validates.
- ``TestPRDA6ResumeStateWarning`` (1): in-progress snapshot warning
emitted.
- ``TestPRDB3StartFromWithoutWizardWarns`` (1): silent no-op
warning.
- ``TestPRDB5GovernanceHonorsExisting`` (1): existing free-text
values preserved on Enter.
- ``TestPRDB6ExpanduserCanonicalisation`` (1): ``~`` expansion at
entry.
Plus 1 dispatcher integration test
(``test_wizard_start_from_threads_through_to_run_wizard_full``)
under ``tests/test_cli_phase10.py``.
## Validation
- ruff check + format clean (187 files).
- 9 doc guards pass: bilingual_parity, anchor_resolution,
cli_help_consistency (439 invocations), yaml_snippets,
audit_event_catalog, library_api_doc, doc_numerical_claims,
bilingual_code_blocks, wizard_defaults_sync.
- module_size: no new warnings.
- ``forgelm --config config_template.yaml --dry-run`` green.
- 1620 passed, 14 skipped (full suite, was 1607).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two-track cleanup landed atomically:
## Track 1 — Working-memory directories now strictly gitignored
``docs/marketing/`` and ``docs/analysis/`` are operator-local research
+ audit + draft notes that never appear in fresh clones. Pre-fix, 11
``docs/analysis/code_reviews/*.md`` files were exempted via inverse
``.gitignore`` rules and referenced from the public tree (CHANGELOG,
standards, roadmap, design docs, tests). Every such reference rotted
into a 404 the moment the maintainer touched the local working-memory
tree.
- **`.gitignore` simplified.** Dropped 11 ``!docs/analysis/code_reviews/<file>``
exception lines that re-exposed specific closure-plan and master-
review files; the directory-level ignore now applies uniformly.
- **``git rm --cached -r docs/analysis/``** untracked the 11 leaked
files (kept on disk locally; just no longer versioned in this repo).
## Track 2 — Public-tree reference cleanup
19 references removed / restructured. Where the citation was the
load-bearing anchor of a sentence, the surrounding text was rewritten
to convey the same intent without the file pointer.
- **`CHANGELOG.md`**: 6 references removed (review-cycle 2 audit
citation, Phase 22 cycle-1 review citation, closure-plan twice,
wave2a fix-summary, master-review-opus, two `Documentation`
links) — replaced with PR-thread / roadmap pointers.
- **`CLAUDE.md`**: 2 references rewritten — the "Common pitfalls"
block dropped its QKV-Core / Trion citations; the memory section
rewrote the bullet to forbid future references with a clear rule.
- **`docs/standards/coding.md`**: master-review-opus citation
removed; "anti-patterns" intro rewritten as project lessons.
- **`docs/standards/documentation.md`**: the "researcher" audience
row dropped; new top-level "Working-memory directories" section
spells out the no-references rule with rationale.
- **`docs/standards/localization.md`**: the ``docs/analysis/**``
bilingual-policy row dropped (no public-tree reference == no
bilingual concern).
- **`docs/design/iso27001_soc2_alignment.md`**: removed the
marketing/strategy citation and a self-referential "this file
lives under ``docs/analysis/``" note (it lives under
``docs/design/``).
- **`docs/roadmap.md`** / **``phase-12-6-closure-cycle.md``** /
**``phase-12-data-curation-maturity.md``**: closure-plan and
marketing/strategy citations rewritten to point at public roadmap
/ CHANGELOG / CLAUDE.md anchors instead.
- **`tests/test_wizard_phase22.py`** + **``test_check_bilingual_parity.py``**:
test docstrings softened to describe the directory category
(working-memory) rather than naming specific analysis files.
- **`tools/check_anchor_resolution.py`** + **``check_yaml_snippets.py``**:
comments + docstrings softened. Functional path-string filters
(``_SKIP_PATH_FRAGMENTS = ("docs/analysis/", "docs/marketing/")``,
``--exclude analysis`` default) are KEPT as path filters (they're
not citations; they're directory exclusions).
- **`.claude/skills/{cut-release,sync-bilingual-docs}/SKILL.md`**:
marketing/strategy template + content-strategy citations rewritten
to reference local-only working memory rather than specific paths.
## Track 3 — Standard + CI guard preventing regression
- **New "Working-memory directories" rule** in
`docs/standards/documentation.md` codifies the ban with examples
+ path-string-filter exemption.
- **New CI guard `tools/check_no_analysis_refs.py`** scans every
git-tracked file (uses ``git ls-files``, so the gitignored dirs
are excluded by construction) and fails the run on any citation
into the working-memory tree. Per-file ``_EXEMPT`` allowlist for
policy statements (``CLAUDE.md``, ``docs/standards/documentation.md``,
``.gitignore``, the skill that warns about the dirs) and
functional path filters; each entry carries a written
justification comment.
- **CLAUDE.md self-review chain** extended to 9 commands; tools/
listing names ``check_no_analysis_refs``.
## Validation
- ruff check + format clean (188 files including the new guard).
- 10 doc guards pass: bilingual_parity, anchor_resolution,
cli_help_consistency (439 invocations), yaml_snippets,
audit_event_catalog, library_api_doc, doc_numerical_claims,
bilingual_code_blocks, wizard_defaults_sync,
**check_no_analysis_refs** (new — clean).
- module_size: no new warnings.
- ``forgelm --config config_template.yaml --dry-run`` green.
- 1620 passed, 14 skipped (full suite).
## Site verification (read-only side-effect)
The audit pass produced a 270-line site verification report
(3 CRITICAL + 5 HIGH + 6 MEDIUM + 3 LOW findings on ``site/``
claims vs. live code). The report itself lives in the now-
gitignored ``docs/analysis/`` working-memory tree per the new
policy; follow-up site fixes will be tracked as separate PRs
referencing the live code, not the report.
Top three site findings to address in a follow-up PR:
1. ``forgelm.api`` import path advertised on ``index.html`` /
``features.html`` does not exist (real surface is
``from forgelm import ForgeTrainer, audit_dataset, ...``).
2. ``forgelm quickstart byod`` is shown on ``index.html`` but
isn't a real template (BYOD is a wizard sub-flow, not a
quickstart key).
3. ``compliance.html`` cites ``forgelm/data_audit.py`` as a
single-file module; it became a sub-package in Phase 14.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Promotes the [Unreleased] block into the [0.5.5] section under a new "Post-merge follow-up" subsection that captures the three PRs landed on top of the paper-merged closure cycle: - PR #40 (2026-05-08) — Phase 22 CLI wizard parity-with-web - PR #41 (2026-05-09) — Site documentation correction sweep + i18n parity backfill - PR #42 (2026-05-10) — Release-prep + nightly pip-audit gate fix + two review-absorption rounds The CHANGELOG [0.5.5] header date is bumped to 2026-05-10 (the actual PyPI release date) and a fresh empty [Unreleased] block is added at the top. __api_version__ stays at 1.0.0 — Phase 22 is a CLI/wizard surface change, not a Python library API addition; the 30-symbol forgelm.__all__ surface that this release anchors was already added during the Phase 12.6 closure cycle (paper-merged as v0.5.5). Roadmap (en + tr) marks v0.5.5 as released; "Latest release on PyPI" flipped from v0.5.0 to v0.5.5; "Current state" sentence simplified (no more "merged but tag deferred" framing). releases.md drops the "Next PyPI release (slot pending)" section since v0.5.5 now bundles everything that was staged in [Unreleased]. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>




Summary
Closes the parity gap between
forgelm --wizard(CLI) and the in-browsersite/js/wizard.jsdocumented indocs/analysis/code_reviews/2026-05-07_cli_wizard_ux_analysis.md+-2.md. 20 original findings (G1-G20) and 5 independent findings (I1-I5) verified against current code; this PR addresses the actionable subset.What's in this PR
CLI wizard (
forgelm/wizard/)compliance.risk_classificationartıkevaluation.safety'den önce soruluyor. v0.5.5'te tersine sıralı olduğu için F-compliance-110 strict-tier auto-coercion'ı yapısal olarak imkansızdı.dpo_beta,simpo_beta+gamma,kto_beta,orpo_beta,grpo_*)._layerwise).linear/dynamic/yarn+longrope(4-of-4).rschema parite (G11):DEFAULT_LORA_R16 → 8.risk_assessment), Madde 10 (data.governance), Madde 11+Annex IV §1, Madde 12+17 (monitoring), GDPR Madde 5(1)(e)+17 (retention),evaluation.benchmark,evaluation.llm_judge,syntheticblokları.safety.enabled+require_human_approvalotomatik True; F-compliance-110 schema-side gate'in front-stop'u.env:VAR_NAME;urlparsevalidation; HTTP uyarıyla kabul.importlib.resources.files("forgelm.safety_prompts")ile bundled probe set —pip installregression'u kapatıyor.\$XDG_CACHE_HOME/forgelm/wizard_state.yamlsnapshot (schema-versioned).+ key: value/~ key: before → after.back/b+reset/rtoken'ları.POPULAR_MODELSparite (G14): web wizard preset kartlarıyla aynı._print_wizard_summaryartık tam YAML dump ediyor.Web wizard
site/js/wizard.jsheader comment "7 step" → "9 step" + correct list.USE_CASE_PRESETSkeys aligned withforgelm/quickstart.py::TEMPLATES(code-copilot→code-assistant,medical-tr→medical-qa-tr).Mimari
forgelm/wizard.py(976 LOC monolit) →forgelm/wizard/sub-package:_io.py_state.py_WizardState, persistence, summary, schema-default constant'lar_collectors.py_collect_*+ strategy + use-case preset registry_byod.py_orchestrator.pyrun_wizard__init__.pyHiçbir submodule warn-threshold'u (1000 LOC) aşmıyor.
Docs
docs/design/wizard_mode.md— 9-adım flow rewrite (I5).docs/reference/architecture.md+-tr.md— sub-package referansı.docs/guides/quickstart.md+-tr.md— wizard akış güncellemesi.docs/roadmap/phase-11-5-backlog.md— link fix (_byod.py).CHANGELOG.md—[Unreleased]### AddedPhase 22 entry.Tests
Yeni:
tests/test_wizard_phase22.py— 55 test (webhook parsing, safety probe path, trainer hyperparams, navigation, strict-tier coercion, persistence, step diff, use-case parity, schema-default parity).Güncellenen:
tests/test_wizard_phase11.py(3_print_wizard_summarytest'i yeni dict API),tests/test_phase12_5.py(3 patch hedefi sub-package binding'ine).i18n kararı
CLI wizard English-only kalıyor (operatör explicit talebi). Web wizard'ın 6-dil katalogu korundu.
Test plan
pytest tests/ --ignore=tests/test_compliance.py— 1487 pass, 14 skip, 0 fail (5:49)pytest tests/test_wizard_*.py tests/test_phase12_5.py— 108 pass, 0 failruff check forgelm/wizard/+ruff format --check— temiztools/check_module_size.py— 0 over fail-threshold; 5 grandfathered (unchanged)forgelm --config config_template.yaml --dry-run— başarılıForgeConfig.model_validate()'i geçtiği doğrulandı; high-risk auto-coerced config schema'ya uyum sağlıyorgrep -rn "from forgelm import wizard")forgelm/cli/_wizard.py) vepyproject.tomlpackages.findglob'u sub-package'ı otomatik kapsıyorReviewing
Independent review prompt prepared at
/tmp/review-agent-prompt.md(içerik PR commit message'ında özetlenmiş).🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
New Features
back/resetcontrols), webhook URL parsing, safety probe resolution, and XDG-aware state persistence.Changed
torchversion bumped from 2.1.0 to 2.3.0.Documentation
Tests