ddjain/krkn-command-assistant

KRKN-LLM: Natural Language to CLI Fine-Tuning

Fine-tune LLMs to translate natural language chaos engineering requests into precise krknctl CLI commands.

Overview

KRKN-LLM is a complete pipeline for:

  1. Dataset Generation - Using a local LLM to generate synthetic training data (natural language → CLI commands)
  2. Model Fine-tuning - Training a Qwen model with LoRA adapters
  3. Inference - Testing the fine-tuned model with natural language queries

Quick Start

# 1. Setup environment
make setup
cp .env.example .env
# Edit .env and add your HuggingFace token

# 2. Generate training dataset (requires local LLM server)
# Run these three steps in order:
make generate-dataset  # Step 1: Generate via LLM
make merge-dataset     # Step 2: Combine scenario files
make prepare-dataset   # Step 3: Convert to HuggingFace format

# 3. Train model
make train

# 4. Test model
make inference         # Fine-tuned model
make inference-base    # Base model (comparison)

Project Structure

krkn-llm/
├── config/              # All configuration
│   ├── scenarios/       # 20 chaos scenario definitions
│   ├── prompts/         # System prompts & question types
│   └── global/          # Global CLI flags
├── src/
│   ├── dataset/         # Dataset generation scripts
│   └── model/           # Training & inference scripts
├── outputs/             # Generated files (gitignored)
│   ├── dataset/         # Generated datasets
│   ├── models/          # Trained model adapters
│   └── logs/            # Generation logs
└── docs/                # Documentation

Configuration

The pipeline is designed for zero-code configuration changes:

  • Add new scenario: Drop a JSON file in config/scenarios/
  • Add question type: Edit config/prompts/question_types.json
  • Add global flag: Edit config/global/commands.json

See Configuration Guide for details.
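To illustrate the drop-in scenario workflow, here is a minimal sketch of what a scenario definition might look like. The field names and flags below are assumptions for illustration only; mirror the schema of the existing files in config/scenarios/ when adding a real one.

```python
import json
from pathlib import Path

# Hypothetical scenario definition -- field names and flags are
# illustrative, not the repo's actual schema.
scenario = {
    "name": "pod-scenarios",
    "description": "Kill pods matching a label selector",
    "command": "krknctl run pod-scenarios",
    "flags": {
        "--namespace": "Target namespace",
        "--pod-label": "Label selector for pods to kill",
    },
}

# Inside the repo you would write this next to the existing scenarios:
path = Path("config/scenarios") / "pod-scenarios.json"
# path.write_text(json.dumps(scenario, indent=2))
print(json.dumps(scenario, indent=2))
```

Once the file is in place, the generation step picks it up automatically; no registration code is needed.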

Requirements

  • Python 3.8+
  • Local LLM server (for dataset generation) - e.g., LM Studio
  • HuggingFace account & token (for model download)
  • GPU recommended (for training and inference)

Pipeline Stages

1. Dataset Generation

Uses a local LLM to generate training examples:

  • Reads scenario definitions from config/scenarios/
  • Applies question templates from config/prompts/question_types.json
  • Generates natural language → CLI command pairs
  • Outputs JSONL files per scenario

2. Dataset Merging

  • Combines all scenario JSONL files
  • Shuffles for training diversity
  • Creates single dataset file
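The merge step above can be sketched in a few lines of stdlib Python: read every per-scenario JSONL file, concatenate the rows, and shuffle with a fixed seed so the merge is reproducible. File names here are illustrative.

```python
import json
import os
import random
import tempfile

def merge_jsonl(paths, seed=42):
    """Combine JSONL files into one list of rows, shuffled deterministically."""
    rows = []
    for path in paths:
        with open(path) as f:
            rows.extend(json.loads(line) for line in f if line.strip())
    random.Random(seed).shuffle(rows)
    return rows

# Demo with two tiny per-scenario files in a temp directory.
with tempfile.TemporaryDirectory() as d:
    for name, ids in [("a.jsonl", [1, 2]), ("b.jsonl", [3])]:
        with open(os.path.join(d, name), "w") as f:
            f.writelines(json.dumps({"id": i}) + "\n" for i in ids)
    merged = merge_jsonl(sorted(os.path.join(d, n) for n in os.listdir(d)))

print(len(merged))  # 3
```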

3. Dataset Preparation

  • Converts to HuggingFace dataset format
  • Applies chat template
  • Tokenizes for training
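The chat-template step can be pictured as follows. In the real pipeline this would be handled by the tokenizer's apply_chat_template method from transformers; the hand-rolled ChatML rendering below only illustrates the turn structure Qwen-style models expect, and the system prompt text is an assumption.

```python
# Illustration of ChatML formatting (the format used by Qwen chat models).
def to_chatml(system: str, user: str, assistant: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n{assistant}<|im_end|>\n"
    )

text = to_chatml(
    "Translate chaos engineering requests into krknctl commands.",
    "Kill two pods in the payments namespace",
    "krknctl run pod-scenarios --namespace payments --kill-count 2",
)
print(text)
```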

4. Model Training

  • Fine-tunes a Qwen model with LoRA
  • Saves only the lightweight LoRA adapter weights, not a full model copy
  • Logs training metrics
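A LoRA setup with the peft library might look like the sketch below. The rank, alpha, dropout, and target modules are illustrative defaults, not the values this repo uses; see the training script in src/model/ for the actual configuration.

```python
# Hypothetical LoRA configuration -- all hyperparameter values here
# are illustrative assumptions, not the repo's actual settings.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                  # adapter rank
    lora_alpha=32,         # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```

Passing such a config to peft's get_peft_model wraps the base model so that only the small adapter matrices are trained and saved.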

5. Inference

  • Loads base model + LoRA adapters
  • Translates natural language to CLI commands
  • Interactive testing mode
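During interactive testing, the raw model output still carries the chat-template markers, so the command has to be extracted from the assistant turn. The sketch below assumes ChatML-style tags (as used by Qwen chat models); the exact post-processing in src/model/ may differ.

```python
# Sketch: pull the generated CLI command out of a ChatML-style response.
def extract_command(generated: str) -> str:
    # Keep only the last assistant turn and drop the end-of-turn marker.
    tail = generated.rsplit("<|im_start|>assistant\n", 1)[-1]
    return tail.split("<|im_end|>", 1)[0].strip()

raw = (
    "<|im_start|>user\nKill two pods in the payments namespace<|im_end|>\n"
    "<|im_start|>assistant\n"
    "krknctl run pod-scenarios --namespace payments --kill-count 2<|im_end|>"
)
print(extract_command(raw))
```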

Make Targets

Run make help to see all available commands.

Contributing

To add support for new chaos scenarios:

  1. Create a scenario definition JSON file in config/scenarios/
  2. Follow the format of the existing scenario files
  3. Run make generate-dataset; your scenario is discovered automatically

No code changes required!

License

[Add your license here]

Authors

[Add authors here]
