v0.3.0

@github-actions github-actions released this 11 Mar 03:27
· 642 commits to main since this release

[0.3.0] - 2026-03-10

Added

  • TUI Control Center (pmetal tui): Full terminal interface with nine tabs: Dashboard, Device, Models, Datasets, Training, Distillation, GRPO, Inference, and Jobs. Built on an async event loop with crossterm/ratatui, a modal system (confirm, text input, model picker, dataset picker, error, progress), and reusable form field widgets
  • Live job integration: Training, distillation, and GRPO tabs spawn pmetal subprocesses and stream metrics in real time via CommandRunner + JSONL polling
  • LoRA fuse command (pmetal fuse): Merges LoRA adapter weights into the base model, with an optional fuse-then-quantize pipeline
  • Chat template support for Llama 4, DeepSeek, and Cohere: Full template formatting, Jinja detection, model name heuristics, stop tokens, and inference formatting for all three model families
  • Llama 4 template: <|header_start|>/<|header_end|>/<|eot|> tokens (distinct from Llama 3's <|start_header_id|>/<|end_header_id|>/<|eot_id|>)
  • DeepSeek template: Full-width unicode tokens (<|begin▁of▁sentence|>, <|User|>, <|Assistant|>) with thinking mode support (<think>/</think> prefill)
  • Cohere Command R template: <|START_OF_TURN_TOKEN|>, <|USER_TOKEN|>, <|CHATBOT_TOKEN|>, <|END_OF_TURN_TOKEN|> tokens
  • Comprehensive stop token collection: collect_all_stop_tokens() now probes 11 well-known special tokens across all model families (added <|eot|>, <|end|>, <|return|>, <|END_OF_TURN_TOKEN|>, <|end▁of▁sentence|>)
  • LoRA inference auto-chat detection: Probes vocabulary for <|im_end|>/<|eot_id|> to auto-enable chat mode on base models fine-tuned with LoRA
  • Streaming generation support: GenerationConfig streaming extensions in pmetal-models
  • Epoch/total_steps in StepMetrics: Training progress now flows through entire pipeline (training loop → JSONL callback → TUI) showing step X/Y and epoch M/N
  • Hardware support documentation: Apple Silicon hardware matrix and tuning reference (docs/hardware-support.md)
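The stop token collection described above can be pictured as a vocabulary probe. The sketch below is illustrative only and is not pmetal's implementation: `probe_stop_tokens`, the `HashMap` vocabulary, and the `eos_id` parameter are assumptions, and the candidate list contains only the tokens quoted in these notes (spelled as they appear here), not the full set of 11.

```rust
use std::collections::HashMap;

/// Special tokens quoted in these release notes. The real probe list has
/// 11 entries; the remaining ones are not named in the notes and are omitted.
const CANDIDATE_STOP_TOKENS: &[&str] = &[
    "<|im_end|>",
    "<|eot_id|>",
    "<|eot|>",
    "<|end|>",
    "<|return|>",
    "<|END_OF_TURN_TOKEN|>",
    "<|end▁of▁sentence|>",
];

/// Hypothetical helper: start from the tokenizer's EOS id, then add the id of
/// every candidate token that exists in the vocabulary, skipping duplicates.
pub fn probe_stop_tokens(vocab: &HashMap<String, u32>, eos_id: u32) -> Vec<u32> {
    let mut ids = vec![eos_id];
    for token in CANDIDATE_STOP_TOKENS {
        if let Some(&id) = vocab.get(*token) {
            if !ids.contains(&id) {
                ids.push(id);
            }
        }
    }
    ids
}
```

Probing by token string rather than hard-coding ids means one routine can serve every model family: tokens that a given tokenizer does not define are simply skipped.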

Fixed

  • TUI inference word wrap: Model output now wraps correctly within the terminal width instead of clipping off-screen; normalize_code_fences() preprocessor ensures ``` markers always appear on their own line even when the model emits text without newlines
  • TUI inference code block rendering: Fenced code blocks (```python, etc.) now render properly with distinct styling even when the token stream lacks explicit newline characters
  • TUI UTF-8 safe text handling: Word wrap and code block truncation now use char-count width instead of byte length, preventing panics on multi-byte characters
  • GRPO accuracy reward (last-occurrence extraction): AccuracyReward now uses rfind() for <answer> tags and \boxed{}, correctly grabbing the final answer when the model retries within chain-of-thought
  • GRPO accuracy reward (broken fallback): Old code compared the entire completion (including reasoning) against the answer when no <answer> tags were found; it now falls back to the last non-empty line
  • GRPO accuracy reward (whitespace normalization): Answer comparison now collapses internal whitespace runs to a single space, preventing false negatives from formatting differences
  • LoRA inference stop tokens: run_inference_with_lora now uses the full chat template plus comprehensive stop token collection instead of only the tokenizer EOS token, which fixes infinite generation on chat-finetuned models
  • LoRA inference missing parameters: All sampling parameters (top_k, top_p, min_p, penalties, seed) are now passed through to the LoRA inference path
  • Llama 4 misdetection: Model name heuristic now correctly routes llama-4/llama4 to Llama 4 template (was incorrectly using Llama 3 tokens)
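The three GRPO accuracy reward fixes above combine into one extraction routine. The following is a minimal sketch under stated assumptions, not the actual AccuracyReward code: the function names are hypothetical, and the handling of a missing closing tag (take everything to the end of the completion) is a guess.

```rust
/// Collapse every run of whitespace to a single space and trim both ends.
fn normalize_ws(s: &str) -> String {
    s.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Hypothetical helper: pull the final answer out of a model completion.
pub fn extract_answer(completion: &str) -> String {
    const OPEN: &str = "<answer>";
    const CLOSE: &str = "</answer>";

    // rfind selects the LAST opening tag, so a retried answer wins over
    // earlier attempts made during chain-of-thought.
    if let Some(start) = completion.rfind(OPEN) {
        let rest = &completion[start + OPEN.len()..];
        // Assumption: if the closing tag is missing, use the rest of the text.
        let end = rest.find(CLOSE).unwrap_or(rest.len());
        return normalize_ws(&rest[..end]);
    }

    // Fallback: the last non-empty line, never the whole completion.
    completion
        .lines()
        .rev()
        .find(|line| !line.trim().is_empty())
        .map(normalize_ws)
        .unwrap_or_default()
}

/// Compare the extracted answer with the reference after normalization.
pub fn is_correct(completion: &str, expected: &str) -> bool {
    extract_answer(completion) == normalize_ws(expected)
}
```

With this shape, a completion that contains two `<answer>` blocks is scored on the second one, and a completion with no tags is scored on its final line rather than on the full reasoning text.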

Added

  • GRPO \boxed{} answer extraction: AccuracyReward now extracts answers from LaTeX \boxed{...} expressions with brace-depth tracking, standard for math GRPO (DeepSeek-R1 style)
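Brace-depth tracking matters because boxed answers often contain nested braces, as in `\boxed{\frac{1}{2}}`. The sketch below shows one way to do it; `extract_last_boxed` is a hypothetical name and the code is not taken from pmetal.

```rust
/// Hypothetical helper: return the contents of the last `\boxed{...}` in
/// `text`, or None if there is no marker or the braces never balance.
pub fn extract_last_boxed(text: &str) -> Option<&str> {
    const MARKER: &str = "\\boxed{";
    // Last occurrence, matching the rfind() behaviour described in the notes.
    let start = text.rfind(MARKER)? + MARKER.len();

    // The marker's own opening brace counts as depth 1.
    let mut depth = 1usize;
    for (offset, ch) in text[start..].char_indices() {
        match ch {
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    // Matching close found: return everything in between.
                    return Some(&text[start..start + offset]);
                }
            }
            _ => {}
        }
    }
    None // unbalanced braces
}
```

A plain search for the first `}` would return `\frac{1` for the example above; counting depth returns the full `\frac{1}{2}`.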

Improved

  • TUI replaces legacy dashboard: pmetal tui provides full control center; legacy pmetal dashboard retained for simple metrics monitoring
  • Chat template Jinja detection: Ordered detection ensures DeepSeek (full-width unicode), Cohere, Llama 4 are matched before generic patterns
  • EOS token stripping: strip_eos_tokens() now handles all model-family EOS tokens
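As a rough illustration of EOS stripping across model families, the sketch below removes any trailing end-of-sequence markers from generated text. It assumes stripping applies only at the end of the output, which the notes do not state; `strip_trailing_eos` and its parameters are hypothetical and do not reflect the real `strip_eos_tokens()` signature.

```rust
/// Hypothetical helper: remove trailing whitespace and trailing EOS markers,
/// repeating until the text no longer ends with any of them.
pub fn strip_trailing_eos<'a>(text: &'a str, eos_tokens: &[&str]) -> &'a str {
    let mut out = text.trim_end();
    loop {
        let before = out.len();
        for token in eos_tokens {
            if let Some(stripped) = out.strip_suffix(*token) {
                out = stripped.trim_end();
            }
        }
        // Stop once a full pass removes nothing.
        if out.len() == before {
            return out;
        }
    }
}
```

Looping until nothing changes handles outputs that end with several different markers in a row, for example `<|eot|>` followed by `<|eot_id|>`.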

Full Changelog: v0.2.1...v0.3.0