A complete GUI application for training transformer language models and exporting them to GGUF or ONNX format, covering everything from training to interactive chat testing in one place.
This tool provides an end-to-end workflow: select a model, train it on your dataset, export to your preferred format (GGUF or ONNX), and immediately test it in an interactive chat, all without leaving the application.
- Python 3.8+ (3.10+ recommended)
- Git (for cloning llama.cpp)
git clone https://github.com/avsDeveloper/ONNX-Model-Trainer.git
cd ONNX-Model-Trainer
pip install torch transformers datasets numpy psutil accelerate
pip install onnx onnxruntime optimum
pip install onnxruntime-genai  # For ONNX model inference
For GPU support (NVIDIA):
pip install torch --index-url https://download.pytorch.org/whl/cu118
pip install onnxruntime-gpu
# Clone llama.cpp into the project directory
git clone https://github.com/ggerganov/llama.cpp.git
# Build llama.cpp
cd llama.cpp
make -j$(nproc) # Linux/macOS
# or: cmake -B build && cmake --build build --config Release # Windows
cd ..
python trainer.py
sudo apt install python3-tk  # If tkinter is missing (Linux)
brew install python-tk  # macOS
- Python from python.org includes tkinter by default
- For llama.cpp, use Visual Studio or MinGW to build
- What it is: A binary format designed for efficient CPU and GPU inference with llama.cpp
- Best for: Local deployment, edge devices, CPU inference, privacy-focused applications
- Pros: Small file sizes with quantization, fast inference, no Python required for deployment
- Quantization options: F16 (full precision), Q8_0, Q6_K, Q5_K_M, Q4_K_M, Q3_K_M, Q2_K (smallest)
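As a rough back-of-the-envelope check (not the tool's own logic), a quantized GGUF file's size is approximately parameter count × bits per weight / 8. The bit widths below are approximations, since K-quants mix block scales and keep some tensors at higher precision:

```python
# Approximate effective bits per weight for common GGUF quantizations.
# These are rough figures; real files add metadata and mixed-precision layers.
APPROX_BITS = {"F16": 16, "Q8_0": 8.5, "Q6_K": 6.6, "Q5_K_M": 5.5,
               "Q4_K_M": 4.8, "Q3_K_M": 3.9, "Q2_K": 3.35}

def estimate_gguf_size_gb(n_params: float, quant: str) -> float:
    """Estimate on-disk size in GiB: params * bits / 8, converted to GiB."""
    return n_params * APPROX_BITS[quant] / 8 / 2**30

for q in ("F16", "Q8_0", "Q4_K_M", "Q2_K"):
    print(f"1.1B model @ {q}: ~{estimate_gguf_size_gb(1.1e9, q):.2f} GiB")
```

This makes the trade-off concrete: Q4_K_M cuts a 1.1B model from roughly 2 GiB (F16) to around 0.6 GiB with modest quality loss.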
- What it is: An open format for representing machine learning models
- Best for: Cross-platform deployment, cloud services, ONNX Runtime integration
- Pros: Wide ecosystem support, hardware acceleration, framework interoperability
- Quantization options: QInt8, QUInt8 with Dynamic quantization
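Dynamic quantization computes the integer scale from each tensor's value range at export time. Here is a minimal sketch of the symmetric QInt8 mapping (simplified compared to what ONNX Runtime actually does, which also handles zero points and per-channel scales):

```python
def quantize_qint8(values):
    """Symmetric per-tensor QInt8: scale by max |x| / 127, round to int8."""
    scale = max(abs(v) for v in values) / 127 or 1.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the int8 representation."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.03, 1.0]
q, scale = quantize_qint8(weights)
approx = dequantize(q, scale)  # close to the originals, at 1/4 the storage
```

Each weight now fits in one byte instead of four, at the cost of small rounding errors bounded by half the scale.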
| Parameter | Description | Default |
|---|---|---|
| Epochs | Number of complete passes through the dataset | 3 |
| Batch Size | Samples processed before updating weights | 4 |
| Learning Rate | Step size for weight updates | 5e-5 |
| Max Length | Maximum token sequence length | 128 |
| Save Steps | Checkpoint save frequency | 500 |
| Warmup Steps | Gradual learning rate increase steps | 100 |
| Scheduler | Learning rate schedule (linear, cosine, constant) | linear |
| Gradient Norm | Maximum gradient magnitude for clipping | 1.0 |
| Weight Decay | L2 regularization strength | 0.01 |
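With these defaults, the linear scheduler ramps the learning rate from 0 up to 5e-5 over the first 100 warmup steps, then decays it linearly back to 0 by the end of training. A small illustrative sketch of that schedule (not the tool's internal code):

```python
def lr_at_step(step, total_steps, base_lr=5e-5, warmup_steps=100):
    """Linear warmup to base_lr, then linear decay to 0 (the defaults above)."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = total_steps - warmup_steps
    return base_lr * max(0.0, (total_steps - step) / remaining)

total = 1000
print(lr_at_step(50, total))    # mid-warmup: 2.5e-05
print(lr_at_step(100, total))   # peak: 5e-05
print(lr_at_step(1000, total))  # end of training: 0.0
```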
- Quick Test: Fast iteration for testing (1 epoch, small batch)
- Balanced: Good balance of speed and quality
- Quality Focus: Better results, longer training
- Memory Saver: For limited VRAM/RAM systems
- Large Dataset: Optimized for big datasets
- Fine-tune: Gentle training for pre-trained models
| Parameter | Description |
|---|---|
| Quantization Type | Compression level (F16 to Q2_K) |
| Auto-fix EOS | Automatically fix end-of-sequence token for chat models |
| Parameter | Description |
|---|---|
| ONNX Opset | ONNX operation set version (11-17) |
| Quant Format | Quantization format (QInt8/QUInt8) |
| Quant Method | Quantization method (Dynamic) |
| Per-Channel | Quantize each weight channel separately for better accuracy |
| Reduce Range | Use a reduced (7-bit) integer range for broader hardware compatibility |
| Parameter | Description | Range |
|---|---|---|
| Max Tokens | Maximum response length | 1-2048 |
| Temperature | Randomness (lower = focused) | 0.0-2.0 |
| Top-P | Nucleus sampling threshold | 0.0-1.0 |
| Top-K | Number of top tokens to consider | 1-100 |
| Repetition Penalty | Penalize repeated tokens | 1.0-2.0 |
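To see how these parameters interact, here is a minimal sketch of temperature plus top-k sampling in plain Python (illustrative only; the app relies on the model runtime's own sampler). Lower temperatures sharpen the distribution, and top-k keeps only the k most likely tokens before sampling:

```python
import math, random

def sample_next(logits, temperature=0.8, top_k=3, rng=random.Random(0)):
    """Temperature scaling + top-k filtering, then sample one token index."""
    scaled = [l / temperature for l in logits]
    # Keep only the top_k indices with the highest scaled logits.
    keep = sorted(range(len(scaled)), key=lambda i: scaled[i])[-top_k:]
    exps = {i: math.exp(scaled[i]) for i in keep}
    total = sum(exps.values())
    # Sample from the renormalized distribution over the kept tokens.
    r, acc = rng.random(), 0.0
    for i, e in exps.items():
        acc += e / total
        if r <= acc:
            return i
    return keep[-1]

# top_k=1 is greedy decoding: always picks the highest logit.
print(sample_next([2.0, 1.0, 0.1, -1.0], top_k=1))  # 0
```

Top-P (nucleus) sampling works the same way, except the cutoff is a cumulative probability threshold rather than a fixed token count.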
- Select Base Model: Choose from supported models (GPT-2, DialoGPT, Qwen, Phi, etc.)
- Choose Actions:
- ☑️ Train: Fine-tune on your dataset
- ☑️ Export: Convert to GGUF/ONNX
- ☑️ Quantize: Compress the model
- Configure Training: Select preset or customize parameters
- Browse Dataset: Select your JSON dataset file
- Start Training: Click "Start Training" or "Convert & Export"
- Select Format: Choose GGUF or ONNX
- Quick Select: Pick from your exported models
- Choose Mode: Chat, Q&A, Text Generation, etc.
- Chat: Enter prompts and interact with your model
Create a JSON file with conversation pairs:
[
{"input": "Hello!", "output": "Hi there! How can I help you?"},
{"input": "What's the weather?", "output": "I don't have weather data, but I can help with other questions."}
]

output/
└── ModelName/
    ├── 1_trained/              # Trained model files
    │   ├── model.safetensors
    │   ├── config.json
    │   └── tokenizer.json
    └── 2_converted/            # Exported models
        ├── model.gguf          # GGUF format
        └── model.onnx          # ONNX format
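Before training, the JSON dataset shown earlier can be sanity-checked with a short script (illustrative; `validate_dataset` is not part of the app, and it assumes the `input`/`output` keys from the example above):

```python
import json

def validate_dataset(path):
    """Check the file is a non-empty JSON list of input/output string pairs."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    assert isinstance(data, list) and data, "dataset must be a non-empty JSON list"
    for i, pair in enumerate(data):
        for key in ("input", "output"):
            value = pair.get(key) if isinstance(pair, dict) else None
            assert isinstance(value, str) and value.strip(), \
                f"entry {i}: missing or empty '{key}' field"
    return len(data)
```

Running it on a malformed file fails fast with the offending entry's index, which is much easier to debug than a mid-training crash.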
| Model | Size | GGUF | ONNX | Notes |
|---|---|---|---|---|
| SmolLM 135M/360M/1.7B Instruct | 135M-1.7B | ✅ | ✅ | HuggingFace compact chat models |
| SmolLM2 135M/360M/1.7B Instruct | 135M-1.7B | ✅ | ✅ | Improved 2nd generation |
| Qwen2 0.5B/1.5B | 0.5B-1.5B | ✅ | ✅ | Alibaba's efficient models |
| Qwen2.5 0.5B/1.5B/3B Instruct | 0.5B-3B | ✅ | ✅ | Latest Qwen, excellent quality |
| TinyLlama 1.1B Chat | 1.1B | ✅ | ✅ | Compact Llama-based chat |
| StableLM 2 1.6B / Zephyr 1.6B | 1.6B | ✅ | ✅ | Stability AI chat models |
| StableLM Zephyr 3B | 3B | ✅ | ✅ | Larger StableLM variant |
| Phi-1/1.5/2 | 1.3B-2.7B | ✅ | ✅ | Microsoft code-focused |
| Phi-3 Mini 4K Instruct | 3.8B | ✅ | ✅ | Microsoft's latest compact |
| MiniCPM 2B | 2B | ✅ | ✅ | OpenBMB compact chat |
| Model | Size | GGUF | ONNX | Notes |
|---|---|---|---|---|
| GPT-2 / DistilGPT-2 | 82M-1.5B | ✅ | ✅ | Great for learning |
| DialoGPT Small/Medium/Large | 117M-774M | ✅ | ✅ | Conversational |
| GPT-Neo 125M/1.3B | 125M-1.3B | ✅ | ✅ | Open source GPT |
| OPT 125M/350M | 125M-350M | ✅ | ✅ | Meta's open models |
| Model | Size | GGUF | ONNX | Notes |
|---|---|---|---|---|
| Gemma 2B/7B | 2B-7B | ✅ | ✅ | Google's open models |
| Llama 2 7B | 7B | ✅ | | Requires authentication |
| Mistral 7B | 7B | ✅ | | High quality 7B |
Build llama.cpp in the project directory:
cd llama.cpp && make -j$(nproc)
- Reduce batch size
- Use "Memory Saver" preset
- Enable gradient checkpointing
- Ensure proper chat template is applied
- Check tokenizer files are preserved
- Try different generation parameters
MIT License. Use at your own risk.
This app was written primarily with AI assistance and is intended to train AI models on AI-generated datasets; those models will in turn be used by AI-based apps. 🤖