Fine-tune LLMs to translate natural language chaos engineering requests into precise krknctl CLI commands.
KRKN-LLM is a complete pipeline for:
- **Dataset Generation**: using a local LLM to generate synthetic training data (natural language → CLI commands)
- **Model Fine-tuning**: training a Qwen model with LoRA adapters
- **Inference**: testing the fine-tuned model with natural language queries
```bash
# 1. Setup environment
make setup
cp .env.example .env
# Edit .env and add your HuggingFace token

# 2. Generate training dataset (requires local LLM server)
# Run these three steps in order:
make generate-dataset   # Step 1: Generate via LLM
make merge-dataset      # Step 2: Combine scenario files
make prepare-dataset    # Step 3: Convert to HuggingFace format

# 3. Train model
make train

# 4. Test model
make inference          # Fine-tuned model
make inference-base     # Base model (comparison)
```

```
krkn-llm/
├── config/          # All configuration
│   ├── scenarios/   # 20 chaos scenario definitions
│   ├── prompts/     # System prompts & question types
│   └── global/      # Global CLI flags
├── src/
│   ├── dataset/     # Dataset generation scripts
│   └── model/       # Training & inference scripts
├── outputs/         # Generated files (gitignored)
│   ├── dataset/     # Generated datasets
│   ├── models/      # Trained model adapters
│   └── logs/        # Generation logs
└── docs/            # Documentation
```
The pipeline is designed for zero-code configuration changes:
- Add a new scenario: drop a JSON file in `config/scenarios/`
- Add a question type: edit `config/prompts/question_types.json`
- Add a global flag: edit `config/global/commands.json`
See Configuration Guide for details.
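The automatic discovery that makes zero-code configuration possible can be sketched as follows. This is an illustrative sketch, not the project's actual code; the function name and any schema handling are assumptions.

```python
import json
from pathlib import Path

def discover_scenarios(scenario_dir="config/scenarios"):
    """Load every scenario definition found in the directory.

    Illustrative sketch: any *.json file dropped into the directory
    is picked up automatically, with no code changes.
    """
    scenarios = []
    for path in sorted(Path(scenario_dir).glob("*.json")):
        with open(path) as f:
            scenarios.append(json.load(f))
    return scenarios
```

Because the loop globs the directory at runtime, adding a scenario is just adding a file.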
- Configuration Guide - How to add scenarios, questions, and flags
- Architecture - System design and pipeline flow
- Examples - Usage examples and customization
- Python 3.8+
- Local LLM server (for dataset generation) - e.g., LM Studio
- HuggingFace account & token (for model download)
- GPU recommended (for training and inference)
**Dataset generation** (`make generate-dataset`) uses a local LLM to generate training examples:
- Reads scenario definitions from `config/scenarios/`
- Applies question templates from `config/prompts/question_types.json`
- Generates natural language → CLI command pairs
- Outputs one JSONL file per scenario
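The generation step can be sketched as pairing question templates with a scenario's command and writing the results as JSONL. The field names (`description`, `command`, `input`, `output`) and the template placeholder are assumptions for illustration; the real generator asks the local LLM to phrase each request naturally.

```python
import json

def build_examples(scenario, question_templates):
    """Pair each question template with the scenario's CLI command.

    Illustrative sketch: the real pipeline delegates the phrasing of
    the natural-language side to a local LLM.
    """
    examples = []
    for template in question_templates:
        question = template.format(scenario=scenario["description"])
        examples.append({"input": question, "output": scenario["command"]})
    return examples

def write_jsonl(examples, path):
    """Write one JSON object per line (the JSONL convention)."""
    with open(path, "w") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")
```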
**Dataset merging** (`make merge-dataset`):
- Combines all per-scenario JSONL files
- Shuffles examples for training diversity
- Produces a single dataset file
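The merge step can be sketched in a few lines; the seeded shuffle is an assumption added here so the merge is reproducible, and file layout details are illustrative.

```python
import json
import random
from pathlib import Path

def merge_jsonl(input_dir, output_path, seed=42):
    """Combine per-scenario JSONL files into one shuffled dataset file.

    Illustrative sketch; a fixed seed makes the shuffle deterministic.
    """
    examples = []
    for path in sorted(Path(input_dir).glob("*.jsonl")):
        with open(path) as f:
            examples.extend(json.loads(line) for line in f if line.strip())
    random.Random(seed).shuffle(examples)
    with open(output_path, "w") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")
    return len(examples)
```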
**Dataset preparation** (`make prepare-dataset`):
- Converts to HuggingFace dataset format
- Applies the chat template
- Tokenizes for training
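The conversion into chat format can be sketched in plain Python. The messages structure below is the generic shape that HuggingFace chat templates consume; the system prompt is a placeholder, not the project's actual prompt, and the real pipeline applies the tokenizer's own chat template.

```python
def to_chat_format(example, system_prompt="Translate the request into a krknctl command."):
    """Wrap an (input, output) pair in a chat-style messages list.

    Illustrative sketch: the system prompt is a placeholder assumption.
    """
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": example["input"]},
            {"role": "assistant", "content": example["output"]},
        ]
    }
```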
**Training** (`make train`):
- Fine-tunes a Qwen model with LoRA
- Saves only the adapters for efficient storage
- Logs training metrics
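Why adapters are so small can be illustrated with the LoRA math on toy matrices: instead of updating a frozen weight `W`, LoRA trains two low-rank matrices `A` and `B` and applies `W·x + (alpha/r)·B·(A·x)`, so only `A` and `B` need to be stored. This is a conceptual sketch on plain lists, not the training code.

```python
def lora_forward(x, W, A, B, alpha, r):
    """Apply a frozen weight W plus a scaled low-rank LoRA update.

    W: out x in (frozen), A: r x in, B: out x r (trained adapters).
    Conceptual sketch of the LoRA forward pass on nested lists.
    """
    def matvec(M, v):
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]
    base = matvec(W, x)                 # frozen base projection
    low_rank = matvec(B, matvec(A, x))  # r-dimensional bottleneck
    scale = alpha / r
    return [b + scale * l for b, l in zip(base, low_rank)]
```

With rank `r` much smaller than the weight dimensions, the adapters hold a tiny fraction of the model's parameters.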
**Inference** (`make inference`):
- Loads the base model plus LoRA adapters
- Translates natural language into krknctl CLI commands
- Provides an interactive testing mode
Run `make help` to see all available commands.
To add support for new chaos scenarios:
- Create a JSON file in `config/scenarios/`
- Follow the schema format of the existing scenarios
- Run `make generate-dataset`; your scenario is automatically discovered

No code changes required!
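As an illustration only, a new scenario definition might look like the dictionary below. The field names here are hypothetical assumptions; copy the schema from an existing file in `config/scenarios/` rather than from this sketch, and check the `krknctl` command against its own documentation.

```python
import json

# Hypothetical scenario definition; field names are illustrative,
# not the project's actual schema.
scenario = {
    "name": "pod-delete-example",
    "description": "Delete a random pod in a target namespace",
    "command": "krknctl run pod-scenarios",  # illustrative command
    "parameters": {
        "namespace": "default",
        "pod-label": "app=web",
    },
}

print(json.dumps(scenario, indent=2))
```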
[Add your license here]
[Add authors here]