Fine-tuning pipeline for the Ailiance domain-expert LLM family — 10 hardware/embedded domains on Qwen 2.5-32B.
Part of the Ailiance platform. Upstream sibling: ailiance-mac-tuner (MLX toolkit for Apple Silicon). Downstream consumer: micro-kiki (MoE-LoRA routing runtime).
- Live demo & cockpit: https://www.ailiance.fr
- Status dashboard: https://home.saillant.cc
- HuggingFace IP source-of-truth: https://huggingface.co/electron-rare
- HuggingFace product distribution: https://huggingface.co/Ailiance-fr
- Audit-grade bench validators: https://github.com/ailiance/iact-bench
- Benchmark results: https://github.com/ailiance/ailiance-bench
Ailiance is the EU-sovereign LLM serving stack of L'Electron Rare, a French SME. Multi-model, audit-grade, EU AI Act Art. 13/15/52/53 transparency.
- Builds chat-format datasets (ShareGPT + Hugging Face merges) for 10 hardware domains.
- Trains QLoRA NF4 4-bit adapters on top of
Qwen/Qwen2.5-32B-Instruct. - Evaluates adapters via token-overlap against held-out samples per domain.
- Publishes adapters to the Hugging Face Hub with autogenerated model cards.
- Maintains a JSON model registry (
artifacts/model_registry.json) tracking version, metrics, base model.
10 domains, one recipe shape per domain:
| Category | Domains |
|---|---|
| Firmware / MCU | stm32, embedded, platformio, iot |
| EDA / Hardware | kicad, spice, emc, power |
| Signal / CAD | dsp, freecad |
Plus espidf available via datasets/builders/expand_espidf.py. Each domain has a dedicated seed set, system prompt, and YAML recipe.
Training target: KXKM-AI — RTX 4090 24 GB only. Qwen 2.5-32B QLoRA fits via:
max_memory = {0: "22GiB", "cpu": "50GiB"}
llm_int8_enable_fp32_cpu_offload = TrueGrosMac / Tower cannot train 32B — use them for dataset building, eval, and publishing only. See parent monorepo ../CLAUDE.md for SSH setup. Ask before launching a multi-hour job on KXKM-AI.
datasets/builders/build_<domain>_dataset.py seed + HF merge → JSONL
→ scripts/validate_dataset.py role/content schema
→ scripts/train_sft.py --config configs/... QLoRA on 4090
→ outputs/sft-<domain>/adapter_model.safetensors
→ scripts/eval_adapters.py token-overlap, 5 samples/domain
→ scripts/publish_adapters.py HF Hub + model card
→ src/ailiance_tuning/registry.py JSON registry
# Dependencies (Python via uv, parent F4L workspace uses 3.14)
uv sync
# Tests (CPU only, fast)
uv run python -m pytest tests/ -v
# Build all domain datasets (seeds only, no HF download)
./scripts/build_all_datasets.sh
# With Hugging Face enrichment
./scripts/build_all_datasets.sh --with-hf
# Validate dataset schema
python scripts/validate_dataset.py datasets/processed/*.jsonl
# Train one domain (on KXKM-AI, via SSH)
python scripts/train_sft.py \
--base-model Qwen/Qwen2.5-32B-Instruct \
--dataset datasets/processed/stm32_train.jsonl \
--output-dir outputs/sft-stm32
# Evaluate all adapters
python scripts/eval_adapters.py --samples 5
# Publish all adapters to HF
python scripts/publish_adapters.py --org clemsailsrc/ailiance_tuning/ Thin lib — config dataclasses, registry, validator
scripts/ Real entry points (train / eval / publish / build)
datasets/builders/ One builder per domain (seed + HF merge)
datasets/processed/ Built JSONL outputs (gitignored)
configs/ YAML recipes, one per domain
outputs/ Trained adapters (gitignored)
artifacts/ Model registry JSON (gitignored)
tests/ Config validation, dataset schema checks
| Repo | Role |
|---|---|
| mascarade | LLM orchestration — loads adapters at inference time |
| micro-kiki | Downstream runtime — 35-domain routing + cognitive layer |
| ailiance-mac-tuner | Sibling — MLX fine-tuning for Mac Studio (distillation target = teacher) |
- Base-model mismatch:
train_sft.pydefaults toQwen2.5-32B-Instructbuteval_adapters.py/publish_adapters.pyhistorically hardcodedQwen3-8B. Align all three when switching base; adapters are base-specific. - Adapter paths: training writes to
outputs/sft-<domain>-qwen25-32b/, eval expectsoutputs/sft-<domain>/. Rename or symlink. - ShareGPT vs OpenAI format: builders emit ShareGPT (
conversations/from/value), validator enforces OpenAI (messages/role/content). Always runsharegpt_to_openai()before writing JSONL. - HF streaming datasets can hang silently —
--max-samplesis mandatory.
More in CLAUDE.md (Hardware reality + Gotchas sections).
MIT. See LICENSE.