Releases: HodeTech/ForgeLM

v0.7.0 — Phase 14 multi-stage pipelines + SSRF hardening

15 May 08:35
v0.7.0
24de388

Headline: Phase 14 — multi-stage training pipelines. One YAML, one
CLI invocation, one Annex IV manifest covering SFT → DPO → GRPO (or
any sequence of supported trainers). Auto-chained model paths,
per-stage gates (auto-revert / human-approval / safety), crash-safe
state, 7 new pipeline-scoped audit events all sharing a single
top-level run_id for SIEM-style grouping.
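
As a rough illustration of auto-chained model paths and a shared run_id, the sketch below feeds each stage's output directory to the next stage as its input model. The `Stage` dataclass, the `run_stage` callback, the directory layout, and the event names are all hypothetical; none of them are taken from ForgeLM's orchestrator.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Stage:
    """Hypothetical stage description: a name and a trainer type."""
    name: str
    trainer: str  # e.g. "sft", "dpo", "grpo"


def run_pipeline(
    stages: List[Stage],
    base_model: str,
    run_stage: Callable[[Stage, str, str], bool],
    output_root: str = "runs",
) -> List[Dict[str, str]]:
    """Run stages in order; each stage's output becomes the next stage's model."""
    run_id = uuid.uuid4().hex  # one id shared by every event of this run
    events: List[Dict[str, str]] = []
    model_path = base_model
    for stage in stages:
        out_dir = f"{output_root}/{run_id}/{stage.name}"
        passed = run_stage(stage, model_path, out_dir)  # True when the gate passes
        events.append({
            "run_id": run_id,
            "stage": stage.name,
            "event": "stage_completed" if passed else "stage_gate_failed",
            "input_model": model_path,
        })
        if not passed:
            break  # later stages never see a model that failed its gate
        model_path = out_dir  # auto-chain: output of this stage feeds the next
    return events
```

Because every event carries the same `run_id`, a log consumer can group a whole SFT → DPO → GRPO run with a single filter.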

Security: DNS-rebinding TOCTOU SSRF hardening on the webhook / judge
/ synthetic outbound paths (issue #14). requests-toolbelt is now a
hard dependency.
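
The general idea behind DNS-rebinding TOCTOU hardening is to resolve the hostname once, validate the resolved addresses, and then connect to exactly those addresses instead of letting the HTTP client resolve the name a second time. The sketch below shows only the resolve-and-validate half using the standard library. The function names are hypothetical, and how ForgeLM wires the pinned address into requests-toolbelt is not described in these notes.

```python
import ipaddress
import socket
from typing import List


def is_public_address(ip: str) -> bool:
    """Return True only for globally routable, non-multicast addresses."""
    addr = ipaddress.ip_address(ip)
    return addr.is_global and not addr.is_multicast


def resolve_and_pin(host: str, port: int) -> List[str]:
    """Resolve once and refuse the request if any record is non-public.

    The caller should connect to one of the returned IPs directly, so a
    second DNS lookup cannot swap in a private address after validation.
    """
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    ips = sorted({info[4][0] for info in infos})
    blocked = [ip for ip in ips if not is_public_address(ip)]
    if blocked:
        raise ValueError(f"refusing outbound request to non-public address(es): {blocked!r}")
    return ips
```

Rejecting the whole lookup when any record is private is a deliberately strict choice for this sketch; a mixed answer is a common rebinding signature.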

Backward compatibility: single-stage configs (no pipeline: block)
reach forgelm/trainer.py byte-identical to v0.6.0; the orchestrator
module is never imported on the single-stage path.

See CHANGELOG.md [0.7.0] — 2026-05-14 for the full entry.

v0.6.0 — Phase 15: Ingestion Pipeline Reliability

11 May 18:38
v0.6.0
7f967f1

ForgeLM v0.6.0 closes the silent-failure gap that the 2026-05-11 pilot exposed across PDF / DOCX / EPUB / TXT / MD ingestion plus the user-facing notebooks/ingestion_playground.ipynb. Phase 15 ships Wave 1 (Turkish glyph normalisation, language-aware script-sanity layer, end-of-run quality pre-signal, DOCX header/footer extraction, EPUB spine-order + skip-list, TXT BOM + MD frontmatter, paragraph-packed dedup, window-based page-line dedup) and Wave 2 (operator-controlled regex stripping with ReDoS guard + SIGALRM budget, PDF page-range slicing, front-matter / back-matter heuristic, URL handling modes, multi-column layout warning). Five review-absorption rounds (Gemini + CodeRabbit + Sonar + Codacy + independent self-review) land in the same release.

ℹ️ The v0.6.0 git tag was retagged at commit 7f967f1 to include a Windows-only test encoding fix (encoding="utf-8" on two Phase 15 regression tests) for a failure that the original tag's cross-OS publish matrix exposed. The published wheel + sdist behaviour is unchanged from the originally intended v0.6.0; only test fixtures were touched.

Highlights

  • Turkish glyph normalisation (forgelm/_pypdf_normalise.py): maps the audit-measured pypdf font-fallback artefacts (ø Õ ú ÷ ࡟ → İ ı ş ğ •) back to their correct Turkish characters at chunk-write time. Opt-in via --language-hint tr; --normalise-profile none for explicit opt-out.
  • Language-aware script-sanity layer (forgelm/_script_sanity.py): tr / en / de / fr / es / it / pt Unicode-block sanity check with calibrated 1.5 % threshold; catches both pypdf font corruption and TXT encoding mis-routing. Emits a structured script_sanity_summary block.
  • End-of-run quality pre-signal (Task 4) — three cheap row-level checks (alpha-ratio, weird-char ratio, repeated-line ratio) emit [WARN] N/M chunks below ingestion quality threshold + structured quality_presignal in notes_structured. Skips chunks below 80 non-whitespace chars to suppress false positives on small clean corpora.
  • forgelm audit --quality-filter now default-ON — operators wanting the pre-v0.6.0 opt-in semantics pass --no-quality-filter.
  • DOCX explicit header / footer extraction (Task 6): _extract_docx reads doc.sections[i].header.paragraphs + .footer.paragraphs and subtracts those lines from the body before chunking.
  • EPUB spine-order + skip-list (Task 7) — iterates book.spine (reading order) instead of file order; default skip-list (nav / cover / copyright / colophon / titlepage / frontmatter); whole-token match so recovery.xhtml no longer matches cover. Opt-out via --epub-no-skip-frontmatter.
  • TXT BOM strip + MD frontmatter detection (Task 8): encoding="utf-8-sig" BOM strip; ---\n...\n---\n YAML frontmatter detected and dropped. Opt back in with --keep-md-frontmatter.
  • Window-based page-line dedup (Task 1) — replaces the pre-Phase-15 outermost-row-only iteration; catches variable-outer-line + constant-deeper-line corpora. A second pass runs after paragraph packing to mop up survivor headers via strip_paragraph_packed_headers.
  • Operator regex stripping with ReDoS guard (--strip-pattern, Task 11) — structural validation at CLI-parse time (rejects nested unbounded quantifiers, .*? + back-reference under DOTALL per SonarCloud python:S5852), 5-second per-pattern SIGALRM budget on POSIX, alarm clamped to min(timeout_s, previous_remaining) so nested calls cannot extend an outer caller's deadline. Opt out of the timeout via --strip-pattern-no-timeout.
  • PDF page-range slicing (--page-range START-END, Task 12) — 1-indexed inclusive contiguous slice. Validation failures abort with EXIT_CONFIG_ERROR via a new IngestParameterError(ValueError) that bypasses the per-file soft-fail catch.
  • PDF front-matter / back-matter heuristic (Task 13, default-ON) — three-signal AND filter (alpha < 0.30 + underscore-or-dot-leader > 0.10 + ≥ 5 inline page-number matches) drops up to 12 leading + 12 trailing pages; emits a WARNING + frontmatter_pages_dropped field. Opt out via --keep-frontmatter.
  • URL handling modes (--strip-urls {keep,mask,strip}, Task 14) — independent of --all-mask (URL handling is a content-shape decision, not a GDPR redaction).
  • PDF multi-column layout warning (Task 15) — samples first 3 pages via pypdf's visitor_text, WARNs on > 30 %-of-page-width two-cluster gaps. Operator pre-processes externally (no auto-fix).
  • forgelm doctor pypdf_normalise.turkish probe — confirms the glyph-normalisation table loaded and round-trips without running a test ingest.
  • IngestionResult schema (additive) — five new fields: pdf_paragraph_packed_lines_stripped, script_sanity_triggered, strip_pattern_substitutions, urls_handled, frontmatter_pages_dropped. No pre-Phase-15 key renamed.
  • 36 regression tests in tests/test_ingestion_reliability.py locking the Wave 1 + Wave 2 behaviour across PDF / DOCX / EPUB / TXT / MD fixtures.
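
The SIGALRM budget with deadline clamping mentioned under --strip-pattern can be sketched as a context manager. This is a minimal POSIX-only illustration, assuming it runs in the main thread; `alarm_budget` and `PatternTimeout` are hypothetical names, not ForgeLM's. The key step is clamping the new timer to `min(timeout_s, previous_remaining)` so an inner call can shorten but never extend an outer caller's deadline.

```python
import signal
import time
from contextlib import contextmanager


class PatternTimeout(Exception):
    """Raised when the time budget expires (hypothetical name)."""


def _on_alarm(signum, frame):
    raise PatternTimeout("pattern exceeded its time budget")


@contextmanager
def alarm_budget(timeout_s: float):
    """Arm SIGALRM for at most timeout_s, never beyond an outer deadline."""
    previous_handler = signal.signal(signal.SIGALRM, _on_alarm)
    # Disarm any outer timer and learn how much time it had left.
    previous_remaining, _ = signal.setitimer(signal.ITIMER_REAL, 0)
    if previous_remaining > 0:
        effective = min(timeout_s, previous_remaining)
    else:
        effective = timeout_s
    started = time.monotonic()
    signal.setitimer(signal.ITIMER_REAL, effective)
    try:
        yield effective
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous_handler)
        if previous_remaining > 0:
            # Re-arm the outer timer with whatever is left of its budget.
            left = previous_remaining - (time.monotonic() - started)
            signal.setitimer(signal.ITIMER_REAL, max(left, 0.001))
```

A caller would wrap each substitution, for example `with alarm_budget(5.0): text = compiled.sub("", text)`, and treat `PatternTimeout` as a rejected pattern.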

Version contract

  • __version__: 0.5.7 → 0.6.0 (MINOR, CLI / behavioural contract: 12 new forgelm ingest flags + audit --quality-filter default-on).
  • __api_version__: stays at 1.0.0 — no new symbol added to forgelm.__all__.

Operator-string injection defence

IngestParameterError, decrypt-fail, and Could not open PDF messages now repr-escape paths via {path!r} so ANSI escape sequences / control chars / embedded quotes cannot leak into rendered error output.
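
A small sketch of why `{path!r}` helps: `repr()` turns control characters into visible escape text, so a hostile filename cannot drive the terminal. The function name and message wording here are illustrative assumptions, not ForgeLM's actual strings.

```python
def format_open_error(path: str) -> str:
    """Build an error message that is safe to print to a terminal."""
    # !r applies repr(), which escapes ESC (\x1b) and other control chars.
    return f"Could not open PDF {path!r}"


# A filename carrying ANSI "clear screen" and "red text" sequences.
hostile = "report\x1b[2J\x1b[31m.pdf"
message = format_open_error(hostile)
print(message)
```

The raw ESC byte never reaches the output; the reader sees the literal text `\x1b[2J` inside the quoted path instead.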

Installation

```bash
pip install -U forgelm==0.6.0
```

For new ingestion features:
```bash
pip install -U "forgelm[ingestion]==0.6.0"
```

For the optional MinHash near-duplicate detector:
```bash
pip install -U "forgelm[ingestion,ingestion-scale]==0.6.0"
```

Full changelog

See CHANGELOG.md [0.6.0] for the complete list of additions, changes, fixes, and the per-round review-absorption details.

v0.5.7 — SFT trainer trl modernisation + Intel Mac NumPy 2 ABI fix

11 May 08:17
v0.5.7
2f9f746

Two production blockers + one operator UX gap, plus five rounds of
review absorption against PRs #44 / #45:

  • SFT trainer TypeError on modern trl (0.13+, 1.x) — SFTConfig.max_seq_length
    → max_length signature-detected at runtime.
  • Intel Mac (x86_64) NumPy 2 / torch 2.2 binary-ABI mismatch — PEP 508
    marker pin (numpy<2 on darwin x86_64) + new forgelm doctor
    'numpy.torch_abi' probe + training-pipeline early-fail preflight,
    so the cryptic _C-not-defined failure surfaces as a single-line
    'pip install numpy<2' remediation instead.
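
Runtime signature detection of the kind described for SFTConfig can be sketched with `inspect.signature`. The two config classes below are stand-ins so the example runs without trl installed, and `seq_length_kwargs` is a hypothetical helper name; ForgeLM's actual detection code is not shown in these notes.

```python
import inspect
from typing import Any, Dict


def seq_length_kwargs(config_cls: type, value: int) -> Dict[str, Any]:
    """Return the sequence-length keyword that config_cls actually accepts."""
    params = inspect.signature(config_cls.__init__).parameters
    if "max_length" in params:  # newer naming
        return {"max_length": value}
    if "max_seq_length" in params:  # older naming
        return {"max_seq_length": value}
    raise TypeError(f"{config_cls.__name__} accepts neither max_length nor max_seq_length")


# Stand-ins for an older and a newer config class (not the real trl classes).
class OldStyleConfig:
    def __init__(self, output_dir: str, max_seq_length: int = 1024):
        self.output_dir = output_dir
        self.max_seq_length = max_seq_length


class NewStyleConfig:
    def __init__(self, output_dir: str, max_length: int = 1024):
        self.output_dir = output_dir
        self.max_length = max_length


old_cfg = OldStyleConfig("out", **seq_length_kwargs(OldStyleConfig, 2048))
new_cfg = NewStyleConfig("out", **seq_length_kwargs(NewStyleConfig, 2048))
```

Inspecting the signature avoids comparing version strings, which tends to break on pre-releases and forks.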

Internal: shared forgelm/cli/_abi_check.py helper, sys.modules-pollution
cascade fix (3 pre-existing tests + new CI guard), pytest-randomly
adoption, docs/usermanuals JSON-envelope documentation for the new
preflight error path. Full CHANGELOG entry: https://github.com/cemililik/ForgeLM/blob/main/CHANGELOG.md#057--2026-05-11

v0.5.6 — Intel Mac install fix

10 May 12:51
v0.5.6
1603cb5

Patch release — Intel Mac (x86_64) installability restored

v0.5.6 reverts the v0.5.5 torch>=2.3.0 minimum back to torch>=2.2.0. The 2.3 floor was inaccurate (no v2.3-specific PyTorch API is referenced in production code) and made pip install forgelm silently downgrade existing users to v0.5.0 on Intel Mac (x86_64) hosts, where PyPI has no torch>=2.3 wheel.

No code change beyond the dependency floor and version metadata. Every v0.5.5 feature (Library API, GDPR purge / reverse-pii, ISO/SOC 2 alignment, operational subcommands, CLI wizard parity-with-web) is unchanged in v0.5.6.

What changed

  • pyproject.toml: torch>=2.3.0,<3.0.0 → torch>=2.2.0,<3.0.0
  • forgelm/_version.py: rationale comment updated; __api_version__ stays at 1.0.0 (no Python API surface change)
  • Notebook install pins bumped 0.5.5 → 0.5.6 across all 10 example notebooks
  • README.md: prerequisite block clarifying torch wheel availability per platform

Why

The v0.5.5 torch>=2.3 requirement was justified in the CHANGELOG by torch.distributed.fsdp.FSDPModule being "referenced by tests/test_grpo_reward.py and runtime GRPO paths." Verification:

  • Production code: forgelm/trainer.py uses FSDP only as a string-option delegation to transformers.TrainingArguments — never imports torch.distributed.fsdp.FSDPModule directly.
  • Test code: tests/test_grpo_reward.py:22 mentions FSDPModule only inside a comment explaining why the test skips gracefully when trl.GRPOTrainer's lazy import fails on a torch/trl version mismatch. The skip pattern (pytest.mark.skipif(not grpo_patchable, ...)) already handles that case across torch 2.2 + 2.3.
  • Transitive dependencies: trl 1.4, transformers 5.x, peft 0.19 all accept torch>=1.13 or have no torch pin at all.

PyTorch Foundation stopped publishing torch>=2.3 wheels for Intel Mac (x86_64) — only Apple Silicon / Linux / Windows have 2.3+. This caused pip's resolver to silently fall back to v0.5.0 on Intel Mac, hiding every v0.5.5 feature behind a year-old install.

v0.5.6 lowers the floor to torch>=2.2.0, the highest minor available across all supported platforms.

Installation

```bash
# All platforms
pip install forgelm==0.5.6

# Or upgrade from any previous version
pip install -U forgelm
```

Intel Mac users on v0.5.0 can now upgrade with pip install -U forgelm and reach v0.5.6 cleanly.

Full changelog

See CHANGELOG.md [0.5.6].

v0.5.5 — Closure Cycle Bundle + Phase 22 Wizard + Site Documentation Sweep

10 May 11:39
v0.5.5
b78bb64

v0.5.5 promotes ForgeLM from a CLI fine-tuning tool to a complete enterprise pipeline. The release ships a stable Python library API for downstream embedders, GDPR Article 15 + 17 tooling, an environment / supply-chain / verification toolbelt of operational subcommands, the ISO 27001 / SOC 2 Type II alignment artefacts, a CLI wizard surface that reaches parity with the in-browser counterpart, and a tag-driven cross-OS release pipeline with per-combo CycloneDX SBOM. Every claim on forgelm.dev was re-validated against the live code; the forgelm/cli.py and forgelm/data_audit.py monoliths were split into focused sub-packages while preserving their public import surface.

⚠️ Intel Mac (x86_64) users: install v0.5.6 instead of v0.5.5 — pip install -U forgelm on v0.5.5 silently downgrades to v0.5.0 because PyPI has no torch>=2.3 wheel for Intel Mac. v0.5.6 is a dependency-only patch that lowers the floor to torch>=2.2.0 and restores Intel Mac installability. Apple Silicon / Linux / Windows users can install either v0.5.5 or v0.5.6.

Highlights

  • Library API (forgelm.__all__) — every CLI surface has a stable Python entry point with PEP 561 typing (py.typed), lazy-import facade (import forgelm does not pull torch), and __api_version__ decoupled from the CLI __version__.
  • GDPR Article 17 (forgelm purge) — three-mode dispatcher (row erasure / run-scoped artefact / read-only policy report) with per-output-dir-salted SHA-256 audit events; RetentionConfig Pydantic block with four configurable horizons.
  • GDPR Article 15 (forgelm reverse-pii) — locate identifier matches across JSONL artefacts; literal / email / phone / regional-id / regex modes; identifier salted-and-hashed before audit emission.
  • Operational subcommands: forgelm doctor (env / GPU / CUDA / extras pre-flight + JSON envelope), cache-models + cache-tasks (air-gap pre-cache for HF Hub + lm-eval), safety-eval (standalone Llama Guard with bundled 50-prompt × 14-category default probes), verify-audit / verify-annex-iv / verify-gguf (compliance + artefact integrity toolbelt), approve / reject / approvals (Article 14 staging-gate management).
  • CLI wizard parity-with-web — same 9-step flow as forgelm.dev/quickstart, schema-driven defaults shared between the two surfaces (CI guard fails on drift), idempotent re-run via --wizard-start-from <yaml>, distinct EXIT_WIZARD_CANCELLED = 5 exit code (additive; public surface now 0–5).
  • ISO 27001 / SOC 2 Type II alignment — 93-control deployer cookbook, 4 new QMS docs (encryption at rest, access control, risk treatment plan, statement of applicability) with 10 new TR mirrors, 2 new reference tables.
  • Supply-chain security — CycloneDX 1.5 SBOM per release-tag matrix combo, pip-audit + bandit nightly + on-tag (HIGH/CRITICAL → exit 1, MEDIUM → warning), opt-in gitleaks pre-commit, new [security] extra.
  • Cross-OS release-tag matrix: publish.yml runs Linux + macOS + Windows × Python 3.10 / 3.11 / 3.12 / 3.13 = 12 combos before PyPI publish; OIDC trusted publishing.
  • forgelm/cli/ + forgelm/data_audit/ package splits — legacy 2300-line + 3098-line monoliths decomposed into 24-module + 14-module sub-packages while preserving public import surface.
  • Site documentation correction sweep — every visible YAML / artefact-path / CLI / schema claim on site/*.html validated against the live forgelm/ surface; i18n parity at 731 keys per locale across EN + TR + DE + FR + ES + ZH.
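
The per-output-dir-salted SHA-256 hashing used by the purge and reverse-pii audit events could look roughly like the sketch below. The salt filename, salt length, and function names are assumptions made for illustration; the notes only state that identifiers are salted and hashed before audit emission.

```python
import hashlib
import secrets
from pathlib import Path


def load_or_create_salt(output_dir: Path) -> bytes:
    """Return the salt stored in output_dir, creating it on first use."""
    salt_file = output_dir / ".audit_salt"  # hypothetical filename
    if salt_file.exists():
        return salt_file.read_bytes()
    output_dir.mkdir(parents=True, exist_ok=True)
    salt = secrets.token_bytes(32)
    salt_file.write_bytes(salt)
    return salt


def hash_identifier(identifier: str, salt: bytes) -> str:
    """Salted SHA-256 digest, so the raw identifier never enters the audit log."""
    return hashlib.sha256(salt + identifier.encode("utf-8")).hexdigest()
```

With one salt per output directory, the same identifier hashes identically within a run's audit trail but cannot be correlated across unrelated output directories.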

Breaking changes (deliberate)

  • High-risk / unacceptable risk_classification combined with evaluation.safety.enabled=false now raises ConfigError at config-load time (was a warning). EU AI Act Article 9 risk-management evidence cannot be derived from a disabled safety eval.
  • WebhookConfig.timeout default raised from 5s to 10s.
  • --data-audit flag fully removed (was deprecated in v0.5.0). Use the forgelm audit subcommand instead.
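
The first breaking change amounts to a cross-field check at config-load time. The sketch below shows the shape of such a check in plain Python; the `ConfigError` stand-in, the exact risk-class spellings, and the function name are assumptions, and ForgeLM's real validator may be structured differently.

```python
class ConfigError(ValueError):
    """Stand-in for ForgeLM's configuration error type."""


# Assumed spellings of the two risk classes named in the release notes.
BLOCKING_RISK_CLASSES = {"high-risk", "unacceptable"}


def validate_risk_vs_safety(risk_classification: str, safety_enabled: bool) -> None:
    """Fail fast when a high-risk config disables the safety evaluation."""
    if risk_classification in BLOCKING_RISK_CLASSES and not safety_enabled:
        raise ConfigError(
            f"risk_classification={risk_classification!r} requires "
            "evaluation.safety.enabled=true"
        )
```

Raising at load time means a misconfigured run stops before any GPU time is spent.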

Installation

```bash
# Apple Silicon / Linux / Windows
pip install forgelm==0.5.5

# Intel Mac (x86_64): install v0.5.6 instead
pip install forgelm==0.5.6
```

If pip install -U forgelm resolves to an older version, run pip show torch to check the installed torch — pip's resolver silently downgrades ForgeLM when the local torch can't satisfy the dependency floor (v0.5.5 requires torch>=2.3.0; v0.5.6 lowers this to torch>=2.2.0).

Full changelog

See CHANGELOG.md [0.5.5] for the complete list of additions, changes, fixes, deprecations, and removals.

v0.5.0 — Document Ingestion + Data Curation Pipeline

29 Apr 21:08
v0.5.0
e5ba1d9

Theme: "Document Ingestion + Data Curation Pipeline" — Phases 11, 11.5, 12, and 12.5 ship as one comprehensive release.

Note on consolidation. Originally planned as four sequential PyPI tags (v0.5.0 / v0.5.1 / v0.5.2 / v0.5.3) but consolidated into a single v0.5.0 because the four phases form one coherent surface (ingest → polish → mature → polish) that's hard to use in parts. Git history retains the four phases as separate commit batches; the changelog collapses them into the user-facing release notes.

Highlights

  • Phase 11: forgelm ingest (PDF / DOCX / EPUB / TXT / Markdown → SFT-ready JSONL) + forgelm audit (length / language / near-duplicate / cross-split leakage / PII) + EU AI Act Article 10 governance integration.
  • Phase 11.5 (operational polish): LSH banding, streaming reader, token-aware chunking, PDF header/footer dedup, PII severity tiers, atomic audit writes.
  • Phase 12 (data curation maturity): MinHash LSH dedup option, markdown-aware splitter, code/secrets leakage tagger, heuristic quality filter, DOCX table preservation.
  • Phase 12.5 (additive polish): --all-mask shorthand, Croissant 1.0 dataset card emission, optional Presidio ML-NER PII adapter (--pii-ml [--pii-ml-language LANG]; en/de/es/fr/it/ja/ko/nl/pl/pt/ru/zh plus xx multilingual fallback), wizard "audit first" entry point.

CI / docs / standards bookkeeping accompanying every phase is folded into "Cross-cutting review hardening" in the full changelog (rounds 1–12 of review-cycle fixes applied across the four phases above).

Install / upgrade

```bash
pip install --upgrade forgelm
# Optional extras:
pip install 'forgelm[ingestion]'        # PDF / DOCX / EPUB / Markdown
pip install 'forgelm[ingestion-scale]'  # MinHash LSH (datasketch)
pip install 'forgelm[ingestion-pii-ml]' # Presidio ML-NER + python -m spacy download en_core_web_lg
```

Full changelog

CHANGELOG.md @ v0.5.0

🤖 Tag signed; PyPI publish via OIDC trusted publishing on release event.

v0.4.5 — Quickstart Layer (Phase 10.5)

26 Apr 19:31
v0.4.5
45713dc

Added

Quickstart Layer (Phase 10.5) — One-command bundled templates with opinionated defaults. Primary community-growth driver: closes the gap between "I just installed ForgeLM" and "I have a fine-tuned model running locally."

  • forgelm/quickstart.py — Template registry + orchestrator:

    • Template (frozen dataclass) — name, title, description, primary_model, fallback_model, trainer_type, estimated_minutes, min_vram_for_primary_gb, bundled_dataset, license_note.
    • TEMPLATES: Dict[str, Template] — 5 entries: customer-support, code-assistant, domain-expert, medical-qa-tr, grpo-math.
    • auto_select_model(template, available_vram_gb) — picks primary model when VRAM ≥ threshold (10–12 GB), fallback otherwise; explicit no-gpu-detected reason when CUDA is absent.
    • _detect_available_vram_gb() — wraps torch.cuda.mem_get_info(); returns None when no GPU (test mock point).
    • run_quickstart(template_name, *, model_override, dataset_override, output_path, dry_run, available_vram_gb) → QuickstartResult: copies the seed dataset, substitutes model.name_or_path and data.dataset_name_or_path, and writes configs/<template>-YYYYMMDDHHMMSS.yaml. The generated YAML is identical in shape to a hand-written one (same trainer, same schema).
    • format_template_list(), summarize_result(result) — text/JSON renderers for CLI use.
  • forgelm quickstart <template> CLI subcommand (in forgelm/cli.py):

    • --list — prints the registry; honors top-level --output-format json for CI.
    • --model <id> — override auto-selected model.
    • --dataset <path> — override the bundled seed dataset (required for domain-expert).
    • --output <path> — custom YAML output path (default: ./configs/<template>-<timestamp>.yaml).
    • --dry-run — generate config only; skip training and chat.
    • --no-chat — train but skip the post-training chat REPL.
    • On a successful run, subprocess-invokes forgelm --config <out> and then forgelm chat <output_dir> (unless --no-chat).
  • Wizard integration: forgelm --wizard now opens with "Start from a template?":

    • Yes → routes to the quickstart selector; the wizard becomes a thin shell over run_quickstart().
    • No → falls through to the existing 8-step interactive flow.
    • No bifurcation: identical code paths and YAML schema downstream.
  • 5 bundled templates under forgelm/templates/:

    • customer-support/ — Qwen2.5-7B-Instruct primary, SmolLM2-1.7B-Instruct fallback. SFT trainer. 58-example seed JSONL in {"messages": [...]} format.
    • code-assistant/ — Qwen2.5-Coder-7B-Instruct primary, Qwen2.5-Coder-1.5B-Instruct fallback (code-tuned smaller variant, not generic SmolLM2). SFT. 59-example Python/programming Q&A.
    • domain-expert/ — Qwen2.5-7B-Instruct primary, SmolLM2-1.7B-Instruct fallback. BYOD; empty data with a README explaining how to pair with forgelm ingest (Phase 11) or a custom JSONL.
    • medical-qa-tr/ — Qwen2.5-7B-Instruct primary, Qwen2.5-1.5B-Instruct fallback (Turkish-capable, not English-only SmolLM2). SFT, 49 Turkish Q&A; every answer ends with "Tıbbi acil durumlarda 112'yi arayın..." (medical-disclaimer guardrail).
    • grpo-math/ — Qwen2.5-Math-7B-Instruct primary, Qwen2.5-Math-1.5B-Instruct fallback. GRPO trainer (grpo_num_generations: 4). 40 grade-school math word problems in prompt-only format, each carrying a gold_answer field for the built-in regex correctness reward.
  • Conservative defaults in every template config:

    • QLoRA 4-bit NF4, LoRA rank=8, per_device_train_batch_size=1, gradient checkpointing on, safety eval / compliance artifacts opt-in only.
    • Designed so the smallest fallback model + the bundled seed dataset run end-to-end on a 12 GB consumer GPU.
  • forgelm/templates/LICENSES.md — Full attribution for bundled seed datasets (CC-BY-SA 4.0, author-original); contributing guide for new templates; medical-disclaimer note for medical-qa-tr.

  • pyproject.toml [tool.setuptools.package-data] — bundles *.yaml, *.jsonl, *.md under forgelm.templates into the wheel so pip install forgelm users get the templates without a source checkout.

  • GRPO baseline reward: forgelm/grpo_rewards.py ships a default reward bundle so prompt-only datasets don't crash inside trl.GRPOTrainer. When grpo_reward_model is unset the trainer wires combined_format_length_reward (0.8 × format-match + 0.2 × length-shaping); if the dataset additionally carries a gold_answer field (the bundled grpo-math seed does), _math_reward_fn is appended so TRL sums correctness on top of format teaching.

  • Tests — All GPU-independent via TRL/torch FSDP-aware skip-if pattern:

    • tests/test_quickstart.py — registry consistency, bundled-asset shape, auto_select_model primary/fallback/no-gpu, end-to-end run_quickstart, CLI dispatch, regression test that loads every generated YAML through load_config (strongest guard against template drift).
    • tests/test_quickstart_hardening.py — PR review hardening (path validation, model override edges, dry-run wiring).
    • tests/test_grpo_math_reward.py — pure-Python unit tests for _normalize_answer, _answers_match, _math_reward_fn, _dataset_has_gold_answers.
    • tests/test_grpo_format_reward.py: format_match_reward, length_shaping_reward, combined_format_length_reward, plus trainer integration.
    • tests/test_wizard_byod.py — wizard BYOD dataset path validation (existence, directory, malformed JSONL, valid JSONL, HF Hub IDs, ~ expansion).
    • tests/test_cli_quickstart_wiring.py: --offline propagation, separate chat inheritance, chat exit-code 0/130 handling.
    • tests/test_packaging.py — wheel package_data smoke (catches editable-install-only template paths).
    • tests/test_grpo_reward.py — extended with no-reward-model + gold-answer wiring assertions.
  • CI: .github/workflows/nightly.yml:

    • Per-template quickstart smoke (4 of 5 — domain-expert is BYOD and covered by pytest).
    • New wheel-install-smoke job: builds the wheel, installs it into a fresh venv from /tmp (so the source tree is off sys.path), and reruns quickstart --list + quickstart --dry-run to catch broken package_data globs that editable installs hide.
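
The 0.8 × format-match + 0.2 × length-shaping weighting described for the GRPO baseline reward can be illustrated with a toy implementation. Only the weights come from the notes above. The answer format, the length target, and the function names are stand-ins chosen for this sketch, not the contents of forgelm/grpo_rewards.py.

```python
import re
from typing import List


def format_match(completion: str) -> float:
    """1.0 when the completion ends with 'Answer: <number>' (assumed format)."""
    return 1.0 if re.search(r"Answer:\s*-?\d+(\.\d+)?\s*$", completion) else 0.0


def length_shaping(completion: str, target_chars: int = 200) -> float:
    """1.0 up to target_chars, then a linear decay to 0.0 at twice the target."""
    n = len(completion)
    if n <= target_chars:
        return 1.0
    return max(0.0, 1.0 - (n - target_chars) / target_chars)


def combined_reward(completions: List[str]) -> List[float]:
    """Weighted sum using the 0.8 / 0.2 split from the release notes."""
    return [0.8 * format_match(c) + 0.2 * length_shaping(c) for c in completions]
```

Weighting format far above length means a short but malformed completion still scores low, which matches the stated goal of teaching the output format first.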

Documentation

  • New "Option 0: One-Command Quickstart Template" section at the top of docs/guides/quickstart.md.
  • docs/roadmap.md, docs/roadmap-tr.md, docs/roadmap/phase-12-quickstart.md, docs/roadmap/releases.md updated to mark Phase 10.5 as Done.
  • README.md quickstart section updated to lead with forgelm quickstart.
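
For readers who want to see the auto_select_model rule from the forgelm/quickstart.py notes above in code form, here is a minimal sketch. The `Template` fields are a subset of those listed in the notes. The tuple return shape, the 12 GB threshold used in the example, and every reason string except no-gpu-detected are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Template:
    """Subset of the Template fields named in the release notes."""
    name: str
    primary_model: str
    fallback_model: str
    min_vram_for_primary_gb: float


def auto_select_model(template: Template, available_vram_gb: Optional[float]) -> Tuple[str, str]:
    """Return (model, reason); the reason strings other than no-gpu-detected are assumed."""
    if available_vram_gb is None:  # no CUDA device found
        return template.fallback_model, "no-gpu-detected"
    if available_vram_gb >= template.min_vram_for_primary_gb:
        return template.primary_model, "vram-sufficient"
    return template.fallback_model, "vram-below-threshold"


support = Template(
    name="customer-support",
    primary_model="Qwen2.5-7B-Instruct",
    fallback_model="SmolLM2-1.7B-Instruct",
    min_vram_for_primary_gb=12.0,  # assumed; the notes give a 10–12 GB range
)
```

Passing `available_vram_gb` explicitly, rather than probing the GPU inside the function, is what makes the selection testable without hardware.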

v0.4.0 — Post-Training Completion

26 Apr 04:12
v0.4.0
7b3b2cd

v0.3.1-rc1 — TestPyPI publish test

28 Mar 10:10

Pre-release for TestPyPI publishing test. Not for production use.

v0.3.0 — GaLore, Long-Context, Synthetic Data Pipeline

28 Mar 00:30

What's New

GaLore Optimizer Integration

Full-parameter training via gradient low-rank projection — an alternative to LoRA that updates all weights while using less optimizer memory. 6 optimizer variants supported.

```yaml
training:
  galore_enabled: true
  galore_optim: "galore_adamw_8bit"
  galore_rank: 128
```

Long-Context Optimizations

Train on sequences >4K tokens with RoPE scaling, NEFTune noise injection, sliding window attention, and sample packing.

```yaml
training:
  rope_scaling: { type: "yarn", factor: 4.0 }
  neftune_noise_alpha: 5.0
```

Synthetic Data Pipeline

Generate training data from a teacher model (API or local) with --generate-data.

```bash
forgelm --config my_config.yaml --generate-data
```

GPU Cost Estimation

Auto-detection for 18 GPU models with per-run cost tracking.

Additional

  • PyPI: pip install forgelm now works
  • Adversarial Prompts: 140 prompts across 6 categories (was 50/3)
  • Nightly CI: Automated compatibility testing against latest deps
  • Wizard: GaLore, long-context, NEFTune options added
  • 19 code review fixes: Security (API key redaction, webhook URL masking), correctness (SLERP normalization, eos_token crash), consistency (version fallback, success definitions)

Stats

  • 297 tests (up from 242), 0 failures
  • 17 Python modules, ~3,500 lines
  • 47 files lint-clean (ruff check + format)

Full Changelog: https://github.com/cemililik/ForgeLM/blob/main/CHANGELOG.md