<a href="https://colab.research.google.com/github/ProfSynapse/Toolset-Training/blob/main/kto_tool_calling_notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# KTO Training for Tool Calling - Claudesidian Vault Tools

This notebook trains a language model using KTO (Kahneman-Tversky Optimization) to internalize tool calling for the Claudesidian vault application.

**Dataset**: syngen_tools_11.14.25.jsonl (4,652 examples)
- Desirable examples: Correct tool usage with proper parameters
- Undesirable examples: Incorrect tool usage (wrong params, missing required fields)

**Goal**: Train the model to recognize and use the correct tools with correct parameters for vault operations.

## Installation
Fast installation using `--no-deps` to skip pip's dependency resolution (takes 2-3 minutes)

In [None]:
# Fast installation for Colab - uses --no-deps to avoid dependency resolution delays
print("Installing packages (this may take 2-3 minutes)...")
print("=" * 60)

# Step 1: Install PyTorch 2.4.1 with CUDA 12.1
print("\n[1/10] Installing PyTorch 2.4.1 + CUDA 12.1...")
!pip install -q torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu121
print("✓ PyTorch 2.4.1 installed")

# Step 2: Install core dependencies without resolving conflicts (--no-deps)
print("\n[2/10] Installing core dependencies...")
!pip install --no-deps bitsandbytes accelerate peft triton cut_cross_entropy unsloth_zoo
print("✓ Core dependencies installed")

# Step 3: Install supporting libraries with version constraints
print("\n[3/10] Installing supporting libraries...")
!pip install sentencepiece protobuf "datasets>=2.14.0,<4.0.0" "huggingface_hub>=0.20.0"
print("✓ Supporting libraries installed")

# Step 4: Install specific versions of transformers and trl
print("\n[4/10] Installing transformers and trl...")
!pip install transformers==4.56.2
!pip install --no-deps trl==0.22.2
print("✓ Transformers and TRL installed")

# Step 5: Install tyro and msgspec (required by unsloth)
print("\n[5/10] Installing tyro and msgspec...")
!pip install tyro msgspec
print("✓ Tyro and msgspec installed")

# Step 6: Install xformers (required by unsloth for fast attention)
print("\n[6/10] Installing xformers...")
!pip install --no-deps xformers
print("✓ xformers installed")

# Step 7: Install unsloth without dependencies
print("\n[7/10] Installing unsloth...")
!pip install --no-deps unsloth
print("✓ Unsloth installed")

# Step 8: Ensure numpy compatibility
print("\n[8/10] Ensuring numpy compatibility...")
!pip install "numpy>=1.24.0,<2.0"
print("✓ NumPy configured")

# Step 9: Force PyTorch back to 2.4.1 (in case xformers upgraded it)
print("\n[9/10] Re-confirming PyTorch 2.4.1 version...")
!pip install -q --force-reinstall torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu121
import torch
print(f"✓ PyTorch version locked at: {torch.__version__}")

# Step 10: Install Flash Attention if GPU supports it
print("\n[10/10] Installing Flash Attention (if GPU supports it)...")
# Flash Attention 2 needs an Ampere-or-newer GPU (compute capability >= 8.0)
if torch.cuda.is_available() and torch.cuda.get_device_capability()[0] >= 8:
    !pip install ninja packaging
    !pip install "flash-attn>=2.5.0" --no-build-isolation
    print("✓ Flash Attention installed")
else:
    print("⚠ No GPU with Flash Attention 2 support detected (skipping)")

print("\n" + "=" * 60)
print("✓ INSTALLATION COMPLETE!")
print("=" * 60)
print("\n⚠️  IMPORTANT: Restart the runtime now (Runtime → Restart runtime)")
print("Then re-run this cell - it will be much faster the second time!")
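
Because several packages are installed with `--no-deps`, pip will not report version conflicts among them. After restarting the runtime, comparing the installed versions against the pins used in the install cell can catch an environment that has drifted. The sketch below is a minimal example using only the standard library; the `PINS` dictionary and the `check_pins` helper are illustrative names, not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the install cell; extend as needed.
PINS = {"torch": "2.4.1", "transformers": "4.56.2", "trl": "0.22.2"}


def check_pins(pins, lookup=version):
    """Return {package: (expected, found)} for every package that misses its pin."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = lookup(name)
        except PackageNotFoundError:
            found = None
        # Local build tags such as "+cu121" are ignored in the comparison.
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


for name, (expected, found) in check_pins(PINS).items():
    print(f"{name}: expected {expected}, found {found}")
```

An empty result means every pinned package matches. The `lookup` argument exists only so the check can be exercised without touching the real environment.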

In [None]:
# Import libraries
print("\nImporting libraries...")
from unsloth import FastLanguageModel, is_bfloat16_supported
import torch
import os
import json
from datasets import Dataset
from trl import KTOConfig, KTOTrainer

print("\n✓ All imports successful!")
print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

## Model Loading
Load a pre-trained model suitable for tool calling tasks

In [None]:
# Model configuration
max_seq_length = 4096
dtype = None  # Auto-detect: Float16 for older GPUs, Bfloat16 for Ampere+
load_in_4bit = True  # Use 4-bit quantization for memory efficiency

# Load model and tokenizer
# Options:
# - "unsloth/Qwen2.5-Coder-1.5B-Instruct" (small, fast)
# - "unsloth/Qwen2.5-7B-Instruct" (medium)
# - "unsloth/Llama-3.2-3B-Instruct" (small, good for tool calling)

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-Coder-1.5B-Instruct",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)

print(f"✓ Model loaded: {model.config.model_type}")
print(f"✓ Tokenizer vocab size: {len(tokenizer)}")

## Dataset Loading and Processing
Load the tool calling dataset and convert to KTO format

In [None]:
# Load the dataset from HuggingFace
from datasets import load_dataset

raw_dataset = load_dataset(
    "professorsynapse/claudesidian-synthetic-dataset",
    data_files="syngen_tools_11.14.25.jsonl"
)

print(f"✓ Dataset loaded: {len(raw_dataset['train'])} examples")

In [None]:
# Convert ChatML format to KTO format
def convert_to_kto_format(example):
    """
    Convert tool calling conversations to KTO format.
    
    Input format:
    {
      "conversations": [
        {"role": "user", "content": "..."},
        {"role": "assistant", "content": "tool_call: ...\narguments: ...\n\nResult: ...\n\n..."}
      ],
      "label": true/false
    }
    
    Output format:
    {
      "prompt": "user message",
      "completion": "assistant tool call and response",
      "label": true/false
    }
    """
    conversations = example["conversations"]
    
    # Extract user and assistant messages (in multi-turn data, the last of each is kept)
    user_msg = None
    assistant_msg = None
    
    for msg in conversations:
        if msg["role"] == "user":
            user_msg = msg["content"]
        elif msg["role"] == "assistant":
            assistant_msg = msg["content"]
    
    if user_msg is None or assistant_msg is None:
        return None
    
    return {
        "prompt": user_msg,
        "completion": assistant_msg,
        "label": example["label"]
    }

# Process the dataset
kto_data = []
for example in raw_dataset["train"]:
    processed = convert_to_kto_format(example)
    if processed:
        kto_data.append(processed)

# Create HuggingFace Dataset
train_dataset = Dataset.from_dict({
    "prompt": [ex["prompt"] for ex in kto_data],
    "completion": [ex["completion"] for ex in kto_data],
    "label": [ex["label"] for ex in kto_data],
})

# Show statistics
desirable = sum(train_dataset["label"])
undesirable = len(train_dataset) - desirable

print("\n✓ KTO Dataset prepared:")
print(f"  Total examples: {len(train_dataset)}")
print(f"  Desirable (correct tool use): {desirable}")
print(f"  Undesirable (incorrect tool use): {undesirable}")
# Guard against division by zero when one class is missing
if undesirable:
    print(f"  Ratio: {desirable/undesirable:.2f}:1")

# Show example
print(f"\n📝 Example (desirable):")
desirable_ex = [ex for ex in kto_data if ex["label"]][0]
print(f"Prompt: {desirable_ex['prompt'][:100]}...")
print(f"Completion: {desirable_ex['completion'][:150]}...")

print(f"\n📝 Example (undesirable):")
undesirable_ex = [ex for ex in kto_data if not ex["label"]][0]
print(f"Prompt: {undesirable_ex['prompt'][:100]}...")
print(f"Completion: {undesirable_ex['completion'][:150]}...")
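
The conversion keeps exactly one prompt and one completion per record, so it helps to see how it treats records that are not a simple user/assistant pair. The sketch below re-implements the same logic on in-memory records. The sample records and the `to_kto` name are made up for illustration, and the real dataset may never contain these shapes.

```python
def to_kto(example):
    """Same logic as convert_to_kto_format: keep the last user and assistant messages."""
    user_msg = assistant_msg = None
    for msg in example["conversations"]:
        if msg["role"] == "user":
            user_msg = msg["content"]
        elif msg["role"] == "assistant":
            assistant_msg = msg["content"]
    if user_msg is None or assistant_msg is None:
        return None
    return {"prompt": user_msg, "completion": assistant_msg, "label": example["label"]}


# Hypothetical records, not taken from the dataset.
samples = [
    {"conversations": [{"role": "user", "content": "Open my roadmap."},
                       {"role": "assistant", "content": "tool_call: ..."}],
     "label": True},
    # Multi-turn: earlier turns are overwritten, only the last pair survives.
    {"conversations": [{"role": "user", "content": "first"},
                       {"role": "assistant", "content": "reply 1"},
                       {"role": "user", "content": "second"},
                       {"role": "assistant", "content": "reply 2"}],
     "label": False},
    # No assistant turn: the record is dropped.
    {"conversations": [{"role": "user", "content": "hello"}], "label": True},
]

converted = [to_kto(s) for s in samples]
kept = [c for c in converted if c is not None]
print(f"kept {len(kept)} of {len(samples)} records")
```

If the dataset does contain multi-turn conversations, the earlier turns are silently discarded, so the prompt the model trains on may lack context the assistant reply depended on.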

## LoRA Configuration
Configure LoRA adapters for efficient fine-tuning

In [None]:
# Apply LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=64,  # LoRA rank (higher = more parameters)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=128,  # LoRA scaling factor
    lora_dropout=0.05,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)

print("✓ LoRA adapters configured")
print(f"  Rank: 64")
print(f"  Alpha: 128")
print(f"  Trainable parameters: {sum(p.numel() for p in model.parameters() if p.requires_grad):,}")
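
The trainable-parameter count printed by the cell can be sanity-checked by hand: a LoRA adapter on a linear layer with `d_in` inputs and `d_out` outputs adds two matrices holding `r * (d_in + d_out)` weights in total, and its update is scaled by `lora_alpha / r`. The sketch below applies that arithmetic. The layer shapes in `SHAPES` and the layer count are illustrative placeholders rather than values read from the model config, so substitute the real ones before comparing.

```python
def lora_params(d_in, d_out, r):
    """Weights added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


# Placeholder (d_in, d_out) shapes per target module; NOT read from the model.
SHAPES = {
    "q_proj": (1024, 1024),
    "k_proj": (1024, 256),
    "v_proj": (1024, 256),
    "o_proj": (1024, 1024),
    "gate_proj": (1024, 4096),
    "up_proj": (1024, 4096),
    "down_proj": (4096, 1024),
}


def total_lora_params(shapes, r, num_layers):
    """Sum adapter weights over all target modules and all transformer layers."""
    per_layer = sum(lora_params(d_in, d_out, r) for d_in, d_out in shapes.values())
    return per_layer * num_layers


r, lora_alpha = 64, 128  # values from the LoRA cell
num_layers = 24          # placeholder layer count
print(f"scaling factor alpha/r = {lora_alpha / r}")
print(f"trainable LoRA weights: {total_lora_params(SHAPES, r, num_layers):,}")
```

The count grows linearly with `r`, which is why halving the rank roughly halves adapter memory.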

## KTO Training Configuration
Set up KTO trainer to learn correct vs incorrect tool usage
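
To make the role of `beta` and the two class weights concrete, the sketch below computes a simplified per-example KTO loss in plain Python. It follows the published KTO formulation, where `logratio` is the log-probability ratio between the policy and the reference model for a completion and `kl` is a batch-level baseline. The function name and the numbers are illustrative; TRL's actual implementation works on batched tensors and estimates the baseline itself.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def kto_loss(logratio, kl, label, beta=0.1, desirable_weight=1.0, undesirable_weight=1.0):
    """Simplified per-example KTO loss.

    logratio: log pi_policy(y|x) - log pi_ref(y|x) for the completion
    kl:       non-negative baseline (estimated KL between policy and reference)
    label:    True for a desirable completion, False for an undesirable one
    """
    if label:
        # Desirable: loss falls as the policy raises this completion above the baseline.
        return desirable_weight * (1.0 - sigmoid(beta * (logratio - kl)))
    # Undesirable: loss falls as the policy pushes this completion below the baseline.
    return undesirable_weight * (1.0 - sigmoid(beta * (kl - logratio)))


for logratio in (-5.0, 0.0, 5.0):
    good = kto_loss(logratio, kl=0.0, label=True)
    bad = kto_loss(logratio, kl=0.0, label=False)
    print(f"logratio={logratio:+.1f}  desirable={good:.3f}  undesirable={bad:.3f}")
```

Each example contributes on its own, which is why KTO can train on unpaired desirable and undesirable records rather than matched preference pairs.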

In [None]:
# KTO Training Arguments
training_args = KTOConfig(
    output_dir="./kto_claudesidian_tools",
    
    # Batch size configuration
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,  # Effective batch size = 32
    
    # KTO-specific parameters
    beta=0.1,  # KTO beta: higher values keep the policy closer to the reference model
    desirable_weight=1.0,
    undesirable_weight=1.0,
    
    # Learning rate
    learning_rate=5e-6,
    max_grad_norm=1.0,
    
    # Sequence lengths
    max_length=4096,
    max_prompt_length=2048,
    
    # Memory optimizations
    gradient_checkpointing=True,
    optim="adamw_8bit",
    fp16=not is_bfloat16_supported(),
    bf16=is_bfloat16_supported(),
    
    # Training schedule
    num_train_epochs=3,
    warmup_ratio=0.1,
    
    # Logging and saving
    logging_steps=10,
    save_steps=250,
    save_total_limit=2,
    
    # Performance
    dataloader_num_workers=2,
    report_to="none",  # Change to "wandb" for experiment tracking
)

# Initialize KTO Trainer
kto_trainer = KTOTrainer(
    model=model,
    args=training_args,
    processing_class=tokenizer,
    train_dataset=train_dataset,
)

print("✓ KTO trainer initialized")
print(f"  Dataset: {len(train_dataset)} examples")
print(f"  Batch size: {training_args.per_device_train_batch_size}")
print(f"  Gradient accumulation: {training_args.gradient_accumulation_steps}")
print(f"  Effective batch size: {training_args.per_device_train_batch_size * training_args.gradient_accumulation_steps}")
print(f"  Learning rate: {training_args.learning_rate}")
print(f"  KTO beta: {training_args.beta}")
print(f"  Epochs: {training_args.num_train_epochs}")
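
The configuration leaves `desirable_weight` and `undesirable_weight` at 1.0, which is reasonable when the two classes are roughly balanced. The TRL documentation recommends keeping the ratio of `desirable_weight × n_desirable` to `undesirable_weight × n_undesirable` between 1:1 and 4:3. The sketch below computes that ratio and suggests a weight when it falls outside the range. `weighted_ratio` and `suggest_undesirable_weight` are hypothetical helpers, and the counts in the example are invented.

```python
def weighted_ratio(n_desirable, n_undesirable, desirable_weight=1.0, undesirable_weight=1.0):
    """(desirable_weight * n_desirable) / (undesirable_weight * n_undesirable)."""
    return (desirable_weight * n_desirable) / (undesirable_weight * n_undesirable)


def suggest_undesirable_weight(n_desirable, n_undesirable, target_ratio=1.0):
    """Undesirable weight that yields target_ratio when desirable_weight stays at 1.0."""
    return n_desirable / (n_undesirable * target_ratio)


# Invented counts for illustration; use the `desirable` and `undesirable`
# values printed by the dataset cell instead.
n_d, n_u = 3000, 1000
ratio = weighted_ratio(n_d, n_u)
if not 1.0 <= ratio <= 4 / 3:
    w = suggest_undesirable_weight(n_d, n_u)
    print(f"ratio {ratio:.2f} is outside [1.00, 1.33]; try undesirable_weight={w:.2f}")
```

Raising the weight of the minority class is one option among several; downsampling the majority class would reach the same ratio at the cost of discarding data.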

## Training Execution
Train the model to internalize Claudesidian vault tools
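
Before starting the run, it is worth estimating how many optimizer steps the configuration implies, since `warmup_ratio` and `save_steps` are both interpreted against that total. The sketch below does the arithmetic for the 4,652-example dataset and the settings from the training configuration. It assumes a single GPU and that no examples were dropped during conversion, and the trainer's own rounding may differ slightly, so treat the numbers as approximate.

```python
import math


def estimate_schedule(n_examples, per_device_batch, grad_accum, epochs, warmup_ratio, save_steps):
    """Approximate optimizer-step counts for a single-GPU run."""
    effective_batch = per_device_batch * grad_accum
    steps_per_epoch = math.ceil(n_examples / effective_batch)
    total_steps = steps_per_epoch * epochs
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": math.ceil(total_steps * warmup_ratio),
        "step_based_saves": total_steps // save_steps,
    }


plan = estimate_schedule(
    n_examples=4652, per_device_batch=4, grad_accum=8,
    epochs=3, warmup_ratio=0.1, save_steps=250,
)
for key, value in plan.items():
    print(f"{key}: {value}")
```

Under these assumptions `save_steps=250` is reached only once during the run, so lowering it may be worthwhile if intermediate checkpoints matter.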

In [None]:
# Show memory stats before training
if torch.cuda.is_available():
    gpu_stats = torch.cuda.get_device_properties(0)
    start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
    max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
    print(f"GPU: {gpu_stats.name}")
    print(f"Max memory: {max_memory} GB")
    print(f"Memory reserved: {start_gpu_memory} GB\n")

In [None]:
# Start training
print("Starting KTO training for tool calling...")
print("=" * 60)

try:
    trainer_output = kto_trainer.train()
    print("\n✓ Training completed successfully!")
    print(f"Final loss: {trainer_output.training_loss:.4f}")
except Exception as e:
    print(f"\n✗ Training failed: {type(e).__name__}")
    print(f"Error: {e}")
    raise

## Save Model
Save the trained model and adapters

In [None]:
# Save LoRA adapters locally
model.save_pretrained("claudesidian_tool_lora")
tokenizer.save_pretrained("claudesidian_tool_lora")

print("✓ Model saved to ./claudesidian_tool_lora")

# Optional: Upload to HuggingFace Hub
# Uncomment and configure:
# HF_USERNAME = "your_username"
# MODEL_NAME = "claudesidian-tool-calling-qwen-1.5b"
# HF_TOKEN = "hf_..."
# 
# model.push_to_hub_merged(
#     f"{HF_USERNAME}/{MODEL_NAME}",
#     tokenizer,
#     save_method="merged_16bit",
#     token=HF_TOKEN
# )

## Inference Testing
Test the trained model with tool calling examples

In [None]:
from unsloth.chat_templates import get_chat_template
from transformers import TextStreamer

# Set up for inference
FastLanguageModel.for_inference(model)

tokenizer = get_chat_template(
    tokenizer,
    chat_template="chatml",
)

def test_tool_calling(user_message):
    """Generate tool call for a user request."""
    print("\n" + "="*60)
    print("USER REQUEST:")
    print("="*60)
    print(user_message)
    print("\n" + "-"*60)
    print("MODEL RESPONSE:")
    print("-"*60)
    
    messages = [{"role": "user", "content": user_message}]
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt"
    ).to("cuda")
    
    text_streamer = TextStreamer(tokenizer, skip_special_tokens=True, skip_prompt=True)
    outputs = model.generate(
        input_ids=inputs,
        streamer=text_streamer,
        temperature=0.1,
        max_new_tokens=512,
        use_cache=True
    )
    print("\n")

print("✓ Inference setup complete")

In [None]:
# Test cases covering different Claudesidian tools
test_cases = [
    # Content reading
    "Show me the contents of my project roadmap file.",
    
    # Content modification
    "Add a header to my meeting notes saying 'Q1 2025 Planning'.",
    
    # File operations
    "Delete the old draft file called 'temp-notes.md'.",
    
    # Workspace operations
    "Switch to my 'Personal' workspace.",
    
    # Agent operations
    "Turn on my Research Assistant agent.",
    
    # Search operations
    "Find all notes that mention 'product launch'.",
    
    # Folder operations
    "Create a new folder called 'Archive-2024'.",
]

print("Testing tool calling with trained model...\n")
for test_case in test_cases:
    test_tool_calling(test_case)
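
Reading streamed output by eye does not scale beyond a handful of prompts, so it can help to parse each response into a tool name and an argument dictionary that can be compared against expectations. The sketch below assumes the `tool_call: ...` / `arguments: ...` layout shown in the conversion docstring, and further assumes that the arguments are a JSON object on a single line. That second assumption is not confirmed by the notebook, and `parse_tool_call` and the sample response are hypothetical.

```python
import json
import re


def parse_tool_call(text):
    """Extract (tool_name, arguments) from a response, or None if it does not match.

    Assumes 'tool_call: <name>' and 'arguments: <JSON object>' each sit on one line;
    adjust the patterns to the real output format.
    """
    name_match = re.search(r"^tool_call:\s*(\S+)", text, flags=re.MULTILINE)
    args_match = re.search(r"^arguments:\s*(\{.*\})\s*$", text, flags=re.MULTILINE)
    if not name_match or not args_match:
        return None
    try:
        arguments = json.loads(args_match.group(1))
    except json.JSONDecodeError:
        return None
    return name_match.group(1), arguments


# Hypothetical response text, written by hand for this example.
response = (
    "tool_call: contentManager_readContent\n"
    'arguments: {"filePath": "roadmap.md"}\n'
    "\n"
    "Result: ..."
)
print(parse_tool_call(response))
```

To use it with the test loop, `test_tool_calling` would need to return the decoded text (for example by decoding `outputs` with the tokenizer) instead of only streaming it.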

## Summary

This notebook trained a model using KTO to internalize Claudesidian vault tools:

**Tools covered:**
- `contentManager_readContent` - Read file contents
- `contentManager_prependContent` - Add content to file start
- `contentManager_appendContent` - Add content to file end
- `vaultManager_deleteNote` - Delete files
- `workspaceManager_switchWorkspace` - Switch workspaces
- `agentManager_toggleAgent` - Enable/disable agents
- `searchManager_search` - Search for notes
- `folderManager_createFolder` - Create folders

**Training approach:**
- KTO learns from desirable (correct) vs undesirable (incorrect) tool usage
- Model learns to use correct parameter names (e.g., `filePath` not `file`)
- Model learns to include all required parameters
- Model learns when to use which tool

**Next steps:**
1. Test the model with your actual Claudesidian application
2. Collect more examples of edge cases
3. Iterate and retrain for better performance
4. Consider larger models (7B, 14B) for production use