This commit introduces CrucibleTrain, a machine learning training
orchestration framework for the BEAM ecosystem. CrucibleTrain provides
platform-agnostic infrastructure for supervised fine-tuning, reinforcement
learning, direct preference optimization (DPO), and model distillation.
Key features included in this release:
1. Training Orchestration
- Supervised fine-tuning (SFT) loop with configurable optimizers and scheduling.
- Reinforcement Learning (RL) loop supporting environment rollouts and PPO.
- Direct Preference Optimization (DPO) loop for preference learning.
- Distillation support for on-policy teacher-student training.
2. Model Rendering System
- Renderer architecture for converting chat messages to token sequences.
- Support for major model families: Llama 3, Qwen 2.5/3, DeepSeek V3,
  Kimi K2, and GPT-OSS.
- Handling of special tokens, chat templates, and tool calls.
- Configurable training targets (e.g., train on last assistant message only).
3. Data Management
- Unified type system including Datum, ModelInput, and TensorData.
- Dataset abstractions for memory-efficient batching and shuffling.
- Support for multimodal inputs (text and image chunks).
4. Infrastructure and Integration
- Ports and Adapters architecture for swappable backends (TrainingClient,
  VectorStore, BlobStore, HubClient).
- Integration with the Crucible pipeline framework via Stage implementations.
- Multiplexed logging system supporting JSONL, console, and custom backends.
5. Utilities and QA
- Deterministic PCG64 PRNG for NumPy-compatible reproducibility.
- Parity instrumentation tools to verify behavior against Python reference
  implementations.
- Test suite including mock tokenizers and renderers.
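To illustrate the configurable training targets listed under the rendering
system, the sketch below builds a per-token weight mask that trains only on
the last assistant message. It is written in Python for brevity and does not
show CrucibleTrain's actual API: `RenderedMessage` and
`last_assistant_weights` are hypothetical names, and the token IDs are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RenderedMessage:
    """Hypothetical stand-in for one chat message after tokenization."""
    role: str
    tokens: List[int]


def last_assistant_weights(messages: List[RenderedMessage]) -> List[float]:
    """Return one loss weight per token.

    Tokens of the final assistant message get weight 1.0; every other
    token gets 0.0, so only that message contributes to the loss.
    """
    # Index of the last assistant message, or None if there is none.
    assistant_indices = [i for i, m in enumerate(messages) if m.role == "assistant"]
    last = assistant_indices[-1] if assistant_indices else None

    weights: List[float] = []
    for i, message in enumerate(messages):
        weight = 1.0 if i == last else 0.0
        weights.extend([weight] * len(message.tokens))
    return weights


conversation = [
    RenderedMessage("user", [1, 2]),
    RenderedMessage("assistant", [3]),
    RenderedMessage("user", [4, 5]),
    RenderedMessage("assistant", [6, 7]),
]
print(last_assistant_weights(conversation))
# prints [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0]
```

A mask like this keeps the full conversation as model input while restricting
the gradient signal to the tokens the model is meant to learn to produce.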
This foundation enables building complex, distributed ML workflows in Elixir
while maintaining compatibility with existing tensor frameworks and training
backends.
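The Ports and Adapters idea behind the swappable backends can be sketched as
follows. This is a minimal Python illustration under stated assumptions, not
the library's Elixir interface: the `put`/`get` methods, the
`InMemoryBlobStore` adapter, and the `save_checkpoint` helper are hypothetical.
Only the port name `BlobStore` comes from the list above.

```python
from typing import Dict, Protocol


class BlobStore(Protocol):
    """Port: the only storage surface the training code depends on.

    The method names are assumptions made for this sketch.
    """

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryBlobStore:
    """Adapter: a dictionary-backed store, useful in tests."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def save_checkpoint(store: BlobStore, step: int, payload: bytes) -> str:
    """Write a checkpoint through the port and return its key.

    The caller decides which adapter to pass in; this function never
    names a concrete backend.
    """
    key = f"checkpoints/step-{step:06d}"
    store.put(key, payload)
    return key


store = InMemoryBlobStore()
key = save_checkpoint(store, 42, b"weights")
print(key)  # prints checkpoints/step-000042
```

Because `save_checkpoint` depends only on the port, replacing the in-memory
adapter with one backed by a filesystem or object store would not require
changing the training code.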