# al-train


Fine-tune LLMs on AL (Business Central) code using training data from al-corpus.

## Overview

| Item | Detail |
|---|---|
| Language | Python 3.11+ |
| Build | hatchling via `pyproject.toml` |
| Core deps | anthropic, click, pyyaml, sacrebleu |
| Optional deps | unsloth, trl, transformers, torch (training) |
| Data source | SShadowS/al-corpus |
| Model target | LoRA adapter via Unsloth QLoRA |

## Pipeline

```text
al-corpus pairs     →  pairs.jsonl
        ↓
al-train describe   →  described_pairs.jsonl    (Claude Haiku)
        ↓
al-train format     →  train.jsonl + eval.jsonl (ChatML)
        ↓
al-train train      →  LoRA adapter             (Unsloth QLoRA)
        ↓
al-train eval       →  report                   (5 metrics)
```
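The `format` step turns each described pair into a ChatML-style record. A minimal sketch of that conversion, assuming the input fields are named `description` and `code` and using a placeholder system prompt (neither is confirmed by the repo):

```python
import json

# Assumed system prompt -- al-train's actual prompt may differ.
SYSTEM_PROMPT = "You are an expert AL (Business Central) developer."

def to_chatml(pair: dict) -> dict:
    """Turn one described pair into a ChatML-style training record."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": pair["description"]},
            {"role": "assistant", "content": pair["code"]},
        ]
    }

record = to_chatml({
    "description": "Create a codeunit that posts a sales invoice.",
    "code": "codeunit 50100 PostInvoice { }",
})
print(json.dumps(record, indent=2))
```

One JSON object of this shape per line is what `train.jsonl` and `eval.jsonl` hold.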

## Setup

```bash
python -m venv .venv
source .venv/Scripts/activate   # Windows Git Bash (use .venv/bin/activate on Linux/macOS)
pip install -e ".[train]"
```

## Quick Start

```bash
# 1. Generate pairs with al-corpus
al-corpus pairs ./my-al-project -o pairs.jsonl

# 2. Generate descriptions (test with 100 first)
al-train describe pairs.jsonl -o described.jsonl -n 100

# 3. Full corpus via batch API
al-train describe pairs.jsonl -o described.jsonl --batch
al-train describe --poll described.jsonl.batch_id -o described.jsonl

# 4. Format for training
al-train format described.jsonl -o train.jsonl --eval eval.jsonl

# 5. Train
al-train train train.jsonl --eval eval.jsonl

# 6. Evaluate
al-train eval ./output/al-coder-lora --eval-set eval.jsonl
```
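Step 4 also splits the described pairs into separate train and eval files. A deterministic split can be sketched as follows; the actual ratio and shuffling method `al-train format` uses are assumptions:

```python
import json
import random

def split_records(records, eval_fraction=0.05, seed=42):
    """Shuffle deterministically, then carve off an eval slice."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, int(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]  # (train, eval)

def write_jsonl(path, records):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")

train, eval_set = split_records([{"id": i} for i in range(100)])
print(len(train), len(eval_set))  # 95 5
```

Fixing the seed keeps the split reproducible, so re-running `format` never leaks eval pairs into the training set.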

## CLI Commands

| Command | Input | Output | Description |
|---|---|---|---|
| `al-train describe` | pairs.jsonl | described_pairs.jsonl | Generate natural-language descriptions via Claude Haiku |
| `al-train describe --batch` | pairs.jsonl | described_pairs.jsonl | Submit as an Anthropic batch job |
| `al-train describe --poll` | .batch_id file | described_pairs.jsonl | Poll and retrieve a completed batch |
| `al-train format` | described_pairs.jsonl | train.jsonl, eval.jsonl | Convert to ChatML format and split train/eval |
| `al-train train` | train.jsonl | LoRA adapter | Fine-tune with Unsloth QLoRA |
| `al-train eval` | LoRA adapter + eval.jsonl | Report | Evaluate across 5 metrics |
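With `--batch`, each pair becomes one request in an Anthropic Message Batches submission. A sketch of building those request payloads as plain dicts; the model id, prompt wording, and `custom_id` scheme here are illustrative assumptions, not the tool's actual values:

```python
def batch_requests(pairs, model="claude-3-5-haiku-latest"):
    """Build one Message Batches request per code pair."""
    requests = []
    for i, pair in enumerate(pairs):
        requests.append({
            "custom_id": f"pair-{i}",  # used to match results back to pairs
            "params": {
                "model": model,
                "max_tokens": 300,
                "messages": [{
                    "role": "user",
                    "content": "Describe what this AL code does in one sentence:\n\n"
                               + pair["code"],
                }],
            },
        })
    return requests

reqs = batch_requests([{"code": "codeunit 50100 Demo { }"}])
# Submit via the SDK: anthropic.Anthropic().messages.batches.create(requests=reqs)
```

Batch jobs are asynchronous, which is why the separate `--poll` invocation exists to collect results once the batch finishes.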

## Requirements

| Requirement | Notes |
|---|---|
| Python 3.11+ | |
| `ANTHROPIC_API_KEY` | Required for description generation |
| NVIDIA GPU, 24 GB+ VRAM | Required for training (`.[train]` extras) |
| `al-corpus` on PATH | Required for pair generation and evaluation |
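It can help to verify these prerequisites before starting a long pipeline run. An illustrative pre-flight check, not part of al-train itself:

```python
import os
import shutil

def missing_requirements(env=os.environ, which=shutil.which):
    """Return human-readable names of missing prerequisites."""
    missing = []
    if not env.get("ANTHROPIC_API_KEY"):
        missing.append("ANTHROPIC_API_KEY (needed by `al-train describe`)")
    if which("al-corpus") is None:
        missing.append("al-corpus on PATH (needed for pair generation)")
    return missing

for item in missing_requirements():
    print("missing:", item)
```

The GPU requirement is harder to probe portably, so it is left out of the sketch; `torch.cuda.is_available()` covers it when the training extras are installed.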

## Key Files

| File | Purpose |
|---|---|
| `pyproject.toml` | Package metadata, dependencies, entry points |
| `pairs.jsonl` | Raw AL code pairs from al-corpus |
| `described_pairs.jsonl` | Pairs enriched with Claude-generated descriptions |
| `train.jsonl` / `eval.jsonl` | ChatML-formatted split ready for training |
| `output/al-coder-lora/` | Trained LoRA adapter output directory |
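Every intermediate file is JSON Lines: one JSON object per line, so large corpora can be streamed record by record. A minimal reader sketch:

```python
import json
from pathlib import Path

def read_jsonl(path):
    """Yield one record per non-empty line without loading the whole file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

# Round-trip demo on a throwaway file.
demo = Path("demo_pairs.jsonl")
demo.write_text('{"code": "codeunit 50100 Demo { }"}\n', encoding="utf-8")
records = list(read_jsonl(demo))
demo.unlink()
print(records[0]["code"])
```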

Author: Torben Leth <sshadows@sshadows.dk> — License: MIT
