# Dialogue Fine-tuning with Axolotl

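This notebook demonstrates how to fine-tune a dialogue model with Axolotl. Axolotl is a YAML-driven fine-tuning framework built on Hugging Face Transformers and PEFT. For dialogue tasks it takes care of much of the boilerplate, including:

- Dialogue formatting (chat templates such as ChatML)
- Context window management (`sequence_len`, sample packing)
- Multi-turn conversation handling
- Memory-efficient training settings (gradient accumulation, mixed precision, flash attention)
- QLoRA integration (a 4-bit quantized base model with trainable LoRA adapters)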
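To make "dialogue formatting" concrete: this notebook uses the ChatML layout, in which every turn is wrapped in `<|im_start|>role` ... `<|im_end|>` markers. The sketch below renders a multi-turn conversation in that layout. `to_chatml` is a hypothetical helper written for illustration; during training Axolotl applies its own chat template, so treat this as a picture of the target format rather than Axolotl's implementation.

```python
def to_chatml(messages, add_generation_prompt=False):
    """Render a list of {"role": ..., "content": ...} dicts as a ChatML string."""
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here at inference time
        text += "<|im_start|>assistant\n"
    return text


example = [
    {"role": "user", "content": "Hi!"},
    {"role": "assistant", "content": "Hello! How can I help?"},
    {"role": "user", "content": "Recommend a book."},
]
print(to_chatml(example, add_generation_prompt=True))
```

Each earlier turn stays in the prompt, which is how a chat model "remembers" context across turns.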

## Setup and Installation

In [None]:
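# Install Axolotl and dependencies
!pip install -q packaging ninja
!pip install -q git+https://github.com/OpenAccess-AI-Collective/axolotl
!pip install -q accelerate bitsandbytes wandb
# flash-attn is required because the config below sets flash_attention: true
!pip install -q flash-attn --no-build-isolation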

## Create Axolotl Configuration

Axolotl uses YAML configuration files. Let's create one for our dialogue fine-tuning task.
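IPython's `%%writefile` magic does not create missing parent directories, so the cell below fails if `../config/model_configs` does not exist yet. A minimal sketch that creates it first, assuming the notebook runs from a directory whose parent should hold `config/`, as the relative paths throughout this notebook imply:

```python
import os

# Create the config directory (and any missing parents); no-op if it already exists
os.makedirs("../config/model_configs", exist_ok=True)
print(os.path.isdir("../config/model_configs"))
```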

In [None]:
%%writefile ../config/model_configs/axolotl_dialogue_config.yml
base_model: mistralai/Mistral-7B-v0.1
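# Axolotl reads these as top-level keys; flash attention is enabled by flash_attention below
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code: true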

# Multi-turn data in the role/content "messages" layout, rendered with the ChatML template
datasets:
  - path: ../data/processed/dialogue_format.jsonl
    ds_type: json
    type: chat_template
    field_messages: messages
chat_template: chatml

sequence_len: 2048
sample_packing: true
pad_to_sequence_len: true

adapter: qlora
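# output_dir is where the trained adapter is saved; lora_model_dir would instead load an existing adapter
output_dir: ../models/axolotl_dialogue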

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - v_proj
  - k_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj

load_in_4bit: true
bf16: true
flash_attention: true

micro_batch_size: 4
gradient_accumulation_steps: 4
num_epochs: 3
learning_rate: 2e-4
warmup_steps: 100
save_steps: 100
logging_steps: 10
weight_decay: 0.001

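# Hold out 5% of the data for evaluation; eval_steps only applies when an eval split exists
val_set_size: 0.05
eval_steps: 50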
save_total_limit: 3
optimizer: adamw_torch

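Axolotl's `chat_template` dataset type reads each record as a list of `{"role": ..., "content": ...}` messages (under the key `messages` by default). If your processed file instead stores conversations as `{"turns": [{"user": ..., "assistant": ...}, ...]}`, which is an assumption based on the `turns`/`user`/`assistant` keys this notebook refers to, a small conversion pass like the sketch below produces the messages layout. `convert_turns_to_messages` is a hypothetical helper, not part of Axolotl.

```python
import json


def convert_turns_to_messages(src_path, dst_path):
    """Rewrite {"turns": [{"user": ..., "assistant": ...}, ...]} JSONL records as
    {"messages": [{"role": ..., "content": ...}, ...]} JSONL records.

    Returns the number of conversations written.
    """
    n = 0
    with open(src_path, encoding="utf-8") as src, open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            messages = []
            for turn in record["turns"]:
                # Each turn becomes one user message followed by one assistant message
                messages.append({"role": "user", "content": turn["user"]})
                messages.append({"role": "assistant", "content": turn["assistant"]})
            dst.write(json.dumps({"messages": messages}, ensure_ascii=False) + "\n")
            n += 1
    return n
```

For example, `convert_turns_to_messages("../data/processed/dialogue_turns.jsonl", "../data/processed/dialogue_format.jsonl")`, where the source filename is hypothetical and the destination is the dataset path in the config.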
## Load Required Libraries

In [None]:
# Only the libraries the configuration cell below actually uses
import yaml
from axolotl.utils.dict import DictDefault

## Load and Verify Configuration

In [None]:
# Load configuration
config_path = "../config/model_configs/axolotl_dialogue_config.yml"
with open(config_path, 'r') as f:
    cfg = yaml.safe_load(f)
    
cfg = DictDefault(cfg)
print("Configuration loaded successfully!")
print(f"Base model: {cfg.base_model}")
print(f"Dataset path: {cfg.datasets[0]['path']}")
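The batching and LoRA settings combine in ways that are easy to misread. The sketch below derives the effective batch size (`micro_batch_size × gradient_accumulation_steps × number of GPUs`), the standard LoRA scaling factor `lora_alpha / lora_r`, and the token slots per optimizer step when every row is padded or packed to `sequence_len`. `training_summary` is a hypothetical helper, not part of Axolotl, and `num_gpus=1` is an assumption about your hardware.

```python
def training_summary(cfg, num_gpus=1):
    """Derive a few quantities implied by the config (cfg is any dict-like)."""
    effective_batch = cfg["micro_batch_size"] * cfg["gradient_accumulation_steps"] * num_gpus
    return {
        # Sequences contributing to each optimizer step across all GPUs
        "effective_batch_size": effective_batch,
        # Standard (non-rsLoRA) LoRA scales the adapter update by alpha / r
        "lora_scaling": cfg["lora_alpha"] / cfg["lora_r"],
        # Upper bound on tokens per step when rows are padded/packed to sequence_len
        "max_tokens_per_step": effective_batch * cfg["sequence_len"],
    }


# Values copied from the YAML config above
example_cfg = {"micro_batch_size": 4, "gradient_accumulation_steps": 4,
               "lora_r": 32, "lora_alpha": 16, "sequence_len": 2048}
print(training_summary(example_cfg))
```

Passing the `cfg` loaded above works the same way, since `DictDefault` supports item access.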

## Start Training

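Axolotl is normally driven from its command-line interface. The cell below calls that CLI from the notebook through `accelerate launch`, which applies your Accelerate launch settings.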

In [None]:
# Run Axolotl training
!accelerate launch -m axolotl.cli.train ../config/model_configs/axolotl_dialogue_config.yml
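PEFT saves a LoRA adapter as `adapter_config.json` plus weights in `adapter_model.safetensors` (or `adapter_model.bin` with older versions). Before loading, a quick check that the training run actually wrote those files can save a confusing error later. The sketch below assumes the adapter directory is `../models/axolotl_dialogue`, the path the loading cell uses; `find_adapter_files` is a hypothetical helper.

```python
import os


def find_adapter_files(adapter_dir):
    """Return which standard PEFT adapter files exist in adapter_dir."""
    expected = ["adapter_config.json", "adapter_model.safetensors", "adapter_model.bin"]
    if not os.path.isdir(adapter_dir):
        return []
    return [name for name in expected if os.path.isfile(os.path.join(adapter_dir, name))]


found = find_adapter_files("../models/axolotl_dialogue")
print(found or "No adapter files found; check the training log and output directory.")
```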

## Load and Test the Fine-tuned Model

After training, we can load and test our fine-tuned model.

In [None]:
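import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

def load_model(base_model_path, adapter_path):
    # Quantize the base model to 4-bit NF4, matching the QLoRA training setup
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        base_model_path,
        device_map="auto",
        quantization_config=bnb_config,
    )

    # Attach the trained LoRA adapter on top of the quantized base model
    model = PeftModel.from_pretrained(model, adapter_path)
    model.eval()

    tokenizer = AutoTokenizer.from_pretrained(base_model_path)
    # Mistral's tokenizer has no pad token; reuse EOS for padding
    tokenizer.pad_token = tokenizer.eos_token

    return model, tokenizer

# Base model and the directory the trained adapter was saved to
model_path = "mistralai/Mistral-7B-v0.1"
adapter_path = "../models/axolotl_dialogue"
model, tokenizer = load_model(model_path, adapter_path)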

In [None]:
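def generate_response(prompt, max_new_tokens=200):
    # Wrap the user message in the ChatML layout used during training
    chat_prompt = f"<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n"

    inputs = tokenizer(chat_prompt, return_tensors="pt").to(model.device)

    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,  # budget for the reply only, not the prompt
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

    # Decode only the newly generated tokens, then cut at the end-of-turn marker
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    response = tokenizer.decode(new_tokens, skip_special_tokens=True)
    return response.split("<|im_end|>")[0].strip()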

# Test the model
test_prompt = "What's your favorite book and why?"
response = generate_response(test_prompt)
print(f"User: {test_prompt}")
print(f"Assistant: {response}")

## Multi-turn Conversation Test

Let's test the model with a multi-turn conversation to see how it handles context.

In [None]:
def chat_conversation(history, user_input, max_new_tokens=256):
    """Append user_input to history (a list of (role, text) pairs), generate a reply, and return it."""
    history.append(("user", user_input))

    # Rebuild the full ChatML prompt from the structured history on every turn,
    # so role markers are never lost when generated text is decoded
    prompt = "".join(f"<|im_start|>{role}\n{text}<|im_end|>\n" for role, text in history)
    prompt += "<|im_start|>assistant\n"

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

    # Keep only the newly generated tokens and stop at the end-of-turn marker
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    reply = tokenizer.decode(new_tokens, skip_special_tokens=True)
    reply = reply.split("<|im_end|>")[0].strip()

    history.append(("assistant", reply))
    return reply

# Test multi-turn conversation
conversation = []
turns = [
    "Hi! Can you help me learn about machine learning?",
    "What should I learn first: supervised or unsupervised learning?",
    "Can you give me a simple example of supervised learning?"
]

for turn in turns:
    print(f"\nUser: {turn}")
    reply = chat_conversation(conversation, turn)
    print(f"Assistant: {reply}")
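The conversation loop above sends the entire history back to the model on every turn, so a long chat will eventually exceed the 2048-token `sequence_len` the model was fine-tuned with. One simple policy, sketched below, is to drop the oldest exchanges until the prompt fits a budget. `trim_history` and the word-count stand-in are hypothetical helpers; in practice `count_tokens` would tokenize the rendered ChatML prompt with the notebook's `tokenizer`, and the budget should leave room for the reply's `max_new_tokens`.

```python
def trim_history(history, count_tokens, max_tokens):
    """Drop the oldest messages until count_tokens(history) <= max_tokens.

    history: list of (role, text) pairs, oldest first, alternating user/assistant.
    count_tokens: callable mapping a history list to a token count.
    Always keeps at least the most recent message; the input list is not modified.
    """
    trimmed = list(history)
    while len(trimmed) > 1 and count_tokens(trimmed) > max_tokens:
        # Remove the oldest user/assistant exchange (two messages) when possible
        drop = 2 if len(trimmed) > 2 else 1
        trimmed = trimmed[drop:]
    return trimmed


def word_count(history):
    # Crude stand-in for a real tokenizer: counts whitespace-separated words
    return sum(len(text.split()) for _, text in history)


history = [("user", "one two three"), ("assistant", "four five"), ("user", "six seven")]
print(trim_history(history, word_count, max_tokens=5))
```

Dropping whole exchanges keeps the trimmed history starting with a user turn, which matches the layout the model saw in training.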