### Fine-Tuning Gemma-3-4B Using Low-Rank Adaptation (LoRA):

This notebook fine-tunes a small language model (SLM) on Kusto Query Language (KQL) queries. The training data is a ground-truth dataset that is separate from the NL2KQL evaluation. We try two types of fine-tuning:

- Supervised Fine-Tuning
- Chain-of-Thought (CoT) Fine-Tuning

The notebook walks through both and shows how to apply each one to the Gemma-3-4B-IT model.

In [None]:
import os
import torch
import yaml
import json
import re
import pickle

import numpy as np

# Huggingface Imports:
import transformers
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments, Trainer
from sklearn.model_selection import train_test_split
from huggingface_hub import login

from datasets import load_dataset, DatasetDict
import pandas as pd

import peft
from peft import LoraConfig, PeftModel, get_peft_model  # PEFT: Parameter-Efficient Fine-Tuning
from trl import SFTTrainer, SFTConfig

with open("../config.yaml", "r") as f:
    token_config = yaml.safe_load(f)

In [None]:
TOKEN = token_config['huggingface']['token']

# Two Options: "Supervised", or "CoT" (for Chain-of-Thought Fine-Tuning)
mode = "CoT"
torch.cuda.set_device(0)

**NOTE:** Fine-tuning may fail if you are not logged in to Hugging Face. To avoid this, run the following cell and enter your Hugging Face token when prompted.

In [None]:
login()

### Defining Default Variables:

First, we define the base model, the dataset to fine-tune it on, and the name under which the fine-tuned model is saved. For this exercise we do not use 4-bit quantization, because we have sufficient resources to fine-tune the model without it. If your resources are limited, consider adding 4-bit quantization.
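
If memory is tight, one possible way to add 4-bit quantization is through the `BitsAndBytesConfig` class that this notebook already imports. The sketch below is an assumption about how that could look, not part of the original experiments: the helper name `build_4bit_config`, the NF4 quantization type, and the bfloat16 compute dtype are illustrative choices, and the approach needs the `bitsandbytes` package and a CUDA GPU.

```python
# Illustrative settings only; they were not used for the runs in this notebook.
FOUR_BIT_SETTINGS = {
    "load_in_4bit": True,               # store the base weights in 4-bit
    "bnb_4bit_quant_type": "nf4",       # NormalFloat4 quantization
    "bnb_4bit_use_double_quant": True,  # also quantize the quantization constants
}

def build_4bit_config():
    """Hypothetical helper: build a BitsAndBytesConfig from the settings above."""
    # Imported lazily so the settings can be inspected without the GPU stack installed.
    import torch
    from transformers import BitsAndBytesConfig

    return BitsAndBytesConfig(bnb_4bit_compute_dtype=torch.bfloat16, **FOUR_BIT_SETTINGS)

# Possible usage when loading the base model (not executed here):
# model = AutoModelForCausalLM.from_pretrained(
#     base_model, token=TOKEN,
#     quantization_config=build_4bit_config(), device_map={"": 0})
```

With a quantized base model, only the LoRA adapter weights are trained, so the rest of the training loop can in principle stay the same.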

In [None]:
# Use HuggingFace paths ONLY:

base_model = ""

### Preprocessing the Dataset:

We preprocess the original ground-truth dataset into an instruction format similar to the NL2KQL format, so that the prompt structure stays consistent between training and inference. This gives the model a fair chance to learn the same prompt structure that NL2KQL uses.

Because we do not know in advance which schemas, values, and examples NL2KQL will choose, we leave these features out of the fine-tuning prompts.
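
To make the prompt structure concrete, here is a minimal sketch of how one training example becomes a chat-style record. The template text and the sample row are made up for illustration (the real template lives in `deepseek-prompt-good.txt`, whose contents are not shown here); only the `USER_REQUEST_PLACEHOLDER` field and the `messages` layout come from this notebook.

```python
# Hypothetical stand-in for the prompt file; the real template text is not shown here.
template = (
    "Write a KQL query for the request below.\n"
    "Request: {USER_REQUEST_PLACEHOLDER}\n"
)

# Made-up sample row using the same column names as the training CSV.
sample = {
    "nlq": "Count sign-in events per user over the last day",
    "kql": "SigninLogs | where TimeGenerated > ago(1d) | summarize count() by UserPrincipalName",
}

def to_messages(sample, template):
    """Build one user/assistant pair in the chat 'messages' format."""
    prompt = template.format(USER_REQUEST_PLACEHOLDER=sample["nlq"])
    return {"messages": [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": sample["kql"]},
    ]}

record = to_messages(sample, template)
print(record["messages"][0]["content"])
```

Because the template is filled with `str.format`, any literal `{` or `}` in the prompt file has to be doubled (`{{`, `}}`); otherwise `format` treats it as a replacement field.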

In [None]:
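df = pd.read_csv("training_data.csv")
df = df[0:1000]  # keep the first 1,000 rows

# Build the CoT target: the explanation followed by the KQL query in a kusto code block.
lst = []
for idx, row in df.iterrows():
    lst.append(f"{row['Explanation']}. Therefore, the answer is ```kusto\n{row['kql']}```")

df['kql_formatted'] = lst
df = df[['tables', 'theme', 'nlq', 'kql',
       'similarity_score', 'syntax', 'semantic', 'Explanation', 'Description',
       'kql_formatted']]

# Note: this overwrites the input file. index=False avoids writing an extra index column.
df.to_csv('training_data.csv', index=False)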

In [None]:
# User Message:
user_message = ""
with open("deepseek-prompt-good.txt", "r") as f:
    user_message = f.read()
    
def create_conversation(sample):
    # Pick the assistant target for the selected fine-tuning mode.
    if mode == "Supervised":
        target = sample["kql"]            # the bare KQL query
    elif mode == "CoT":
        target = sample["kql_formatted"]  # explanation followed by the KQL query
    else:
        raise ValueError(f"Unknown mode: {mode!r}")

    return {"messages": [
        {"role": "user", "content": user_message.format(USER_REQUEST_PLACEHOLDER = sample['nlq'])},
        {"role": "assistant", "content": target}
        ]
      }
    
dataset = load_dataset("csv", data_files = "training_data.csv", split = "train")

# Convert dataset to OAI messages:
dataset = dataset.map(create_conversation, remove_columns=dataset.features,batched=False)

# Split Dataset into 800 training samples and 200 test samples:
dataset = dataset.train_test_split(test_size=200/1000)

In [None]:
alpha_ranges = [2, 4, 8]
rank_ranges = [1, 2, 4]
lora_dropout = [0.1, 0.2]

for alpha in alpha_ranges:
    for rank in rank_ranges:
        for dropout in lora_dropout:

            model = AutoModelForCausalLM.from_pretrained(base_model, 
                                             token=TOKEN, 
                                             torch_dtype="auto", device_map={"": 0})

            tokenizer = AutoTokenizer.from_pretrained(base_model, token=TOKEN)

            if mode == "Supervised":
                new_model = f"Supervised_GemmaFineTune_{alpha}_{rank}_{dropout}"
            elif mode == "CoT":
                new_model = f"CoT_GemmaFineTune_{alpha}_{rank}_{dropout}"
            
            # Low-Rank Adaptation (LoRA) configuration:
            peft_parameters = LoraConfig(
                lora_alpha = alpha,
                lora_dropout = dropout, 
                r = rank, 
                bias="none", 
                task_type="CAUSAL_LM",
                target_modules=["q_proj", "k_proj"]
            )

            # SFTTrainer applies the LoRA adapter itself through `peft_config` below,
            # so the model is not also wrapped with get_peft_model() here.

            trainer = SFTTrainer(
                model = model,
                train_dataset = dataset["train"],
                eval_dataset = dataset['test'],
                peft_config=peft_parameters,
                processing_class = tokenizer,
                args = SFTConfig(
                    num_train_epochs = 1,
                    per_device_train_batch_size=5,
                    gradient_accumulation_steps=1,
                    optim="paged_adamw_32bit",
                    remove_unused_columns=False,
                    save_steps=25,
                    logging_steps=25,
                    learning_rate=2e-4,
                    weight_decay=0.001,
                    fp16=False,
                    bf16=False,
                    max_grad_norm=0.3,
                    max_steps=-1,
                    warmup_ratio=0.03,
                    group_by_length=True,
                    lr_scheduler_type="constant"),
            )

            # Train the model:
            train_result = trainer.train()

            # Average training loss over the run:
            avg_train_loss = train_result.training_loss

            # Evaluate the model:
            metrics = trainer.evaluate()
            metrics['avg_train_loss'] = avg_train_loss

            # Save the fine-tuned model and the tokenizer:
            trainer.model.save_pretrained(new_model)
            tokenizer.save_pretrained(new_model)

            # Output metrics to .pkl
            with open(f"{new_model}/metrics.pkl", "wb") as f:
                pickle.dump(metrics, f)

            # Move the weights off the GPU and release cached memory before the next run:
            model = model.cpu()
            torch.cuda.empty_cache()
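
The nested loops above train one adapter per combination of `alpha`, `rank`, and `dropout`. As a rough sketch of what that grid covers, the snippet below enumerates the combinations and computes two standard LoRA quantities: the scaling factor `lora_alpha / r` applied to the low-rank update, and the number of parameters an adapter adds to one linear layer, `r * (d_in + d_out)`. The layer dimensions are placeholder values chosen for illustration, not Gemma's actual sizes, and the helper name `lora_added_params` is hypothetical.

```python
from itertools import product

alpha_ranges = [2, 4, 8]
rank_ranges = [1, 2, 4]
lora_dropout = [0.1, 0.2]

def lora_added_params(d_in, d_out, r):
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)

# Placeholder layer shape for illustration only (not taken from the Gemma config).
D_IN, D_OUT = 1024, 1024

grid = []
for alpha, rank, dropout in product(alpha_ranges, rank_ranges, lora_dropout):
    grid.append({
        "name": f"CoT_GemmaFineTune_{alpha}_{rank}_{dropout}",
        "scaling": alpha / rank,  # multiplier on the low-rank update B @ A
        "added_params_per_layer": lora_added_params(D_IN, D_OUT, rank),
    })

print(len(grid), "configurations")
for cfg in grid[:3]:
    print(cfg)
```

Several combinations share the same `alpha / rank` ratio (for example 2/1, 4/2, and 8/4), so they differ in adapter capacity but not in update scale.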

### Choosing the Best Model:

From here, we can choose the best fine-tuned model based on its evaluation loss. For each model produced, we stored a ```metrics.pkl``` file that records the evaluation loss. To proceed, do the following:

1. Create two folders: ```Supervised_Fine_Tuning``` and ```CoT_Fine_Tuning```
2. Move each fine-tuned model into its respective folder, and set the ```main_folder``` variable to either ```Supervised_Fine_Tuning``` or ```CoT_Fine_Tuning```.
3. Run the first block below to find the fine-tuned model with the lowest evaluation loss.
4. Run the final block below to list all models and get the ```alpha```, ```rank```, and ```LoRA Dropout``` values that minimize evaluation loss.
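
The steps above can be sketched end to end on dummy data. The snippet below builds a temporary folder with fake `metrics.pkl` files (the loss values are invented for the demonstration), parses the hyperparameters out of each folder name, and picks the entry with the lowest `eval_loss`. The helper name `pick_best` and the stricter numeric regex are assumptions made for this sketch, not the notebook's own code.

```python
import os
import pickle
import re
import tempfile

# Matches <mode>_GemmaFineTune_<alpha>_<rank>_<dropout> with numeric fields.
PATTERN = re.compile(r"(Supervised|CoT)_GemmaFineTune_(\d+)_(\d+)_(\d*\.?\d+)")

def pick_best(main_folder):
    """Return (folder, eval_loss, alpha, rank, dropout) with the lowest eval_loss."""
    best = None
    for folder in sorted(os.listdir(main_folder)):
        match = PATTERN.fullmatch(folder)
        if match is None:
            continue  # ignore anything that is not a fine-tuned model folder
        with open(os.path.join(main_folder, folder, "metrics.pkl"), "rb") as f:
            metrics = pickle.load(f)
        candidate = (folder, metrics["eval_loss"], int(match.group(2)),
                     int(match.group(3)), float(match.group(4)))
        if best is None or candidate[1] < best[1]:
            best = candidate
    return best

# Demonstration with invented loss values.
fake_losses = {
    "CoT_GemmaFineTune_2_1_0.1": 1.30,
    "CoT_GemmaFineTune_8_4_0.2": 0.95,
    "CoT_GemmaFineTune_4_2_0.1": 1.10,
}
with tempfile.TemporaryDirectory() as main_folder:
    os.mkdir(os.path.join(main_folder, ".ipynb_checkpoints"))  # stray entry to skip
    for name, loss in fake_losses.items():
        os.mkdir(os.path.join(main_folder, name))
        with open(os.path.join(main_folder, name, "metrics.pkl"), "wb") as f:
            pickle.dump({"eval_loss": loss}, f)
    best = pick_best(main_folder)

print(best)
```

Converting the captured groups to `int` and `float` keeps the hyperparameters usable for sorting or plotting later, rather than leaving them as strings.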

In [None]:
main_folder = ''

In [None]:
gemma_folders = [entry for entry in os.listdir(main_folder)]

# Folder names follow <mode>_GemmaFineTune_<alpha>_<rank>_<dropout>:
regex = rf'{mode}_GemmaFineTune_(.*)_(.*)_(.*)'

eval_loss = np.inf
alpha_best = -1
rank_best = -1
dropout_best = -1
best_folder = None

for folder in sorted(gemma_folders):

    # Skip entries that are not fine-tuned model folders for this mode:
    match = re.fullmatch(regex, folder)
    if match is None:
        continue

    with open(f"{main_folder}/{folder}/metrics.pkl", "rb") as f:
        metrics_file = pickle.load(f)

    alpha = match.group(1)
    rank = match.group(2)
    dropout = match.group(3)

    # Keep the configuration with the lowest evaluation loss seen so far:
    if metrics_file["eval_loss"] < eval_loss:
        eval_loss = metrics_file["eval_loss"]
        alpha_best = alpha
        rank_best = rank
        dropout_best = dropout
        best_folder = folder

print("Best model:")
print(f"  folder:      {best_folder}")
print(f"  eval_loss:   {eval_loss}")
print(f"  alpha_best:  {alpha_best}")
print(f"  rank_best:   {rank_best}")
print(f"  dropout_best:{dropout_best}")


In [None]:
import pandas as pd

gemma_folders = [entry for entry in os.listdir(main_folder)]
all_metrics = []

for folder in sorted(gemma_folders):
    if mode == "Supervised":
        regex = r'Supervised_GemmaFineTune_(.*)_(.*)_(.*)'
    elif mode == "CoT":
        regex = r'CoT_GemmaFineTune_(.*)_(.*)_(.*)'
        
    try:
        with open(f"{main_folder}/{folder}/metrics.pkl", "rb") as f:
            metrics_file = pickle.load(f)
        
        match = re.search(regex, folder, flags=re.DOTALL)
        alpha = match.group(1)
        rank = match.group(2)
        dropout = match.group(3)
        
        # Store all metrics
        all_metrics.append({
            'folder': folder,
            'eval_loss': metrics_file["eval_loss"],
            'alpha': alpha,
            'rank': rank,
            'dropout': dropout
        })
    except Exception as e:
        print(f"Error processing {folder}: {e}")

# Convert to DataFrame for easy viewing and analysis
metrics_df = pd.DataFrame(all_metrics)

# Sort by eval_loss to see best models first
metrics_df = metrics_df.sort_values('eval_loss')

print("All models:")
print(metrics_df.to_string(index=False))

# Still get the best model
best_model = metrics_df.iloc[0]
print("\nBest model:")
print(f"  folder:      {best_model['folder']}")
print(f"  eval_loss:   {best_model['eval_loss']}")
print(f"  alpha_best:  {best_model['alpha']}")
print(f"  rank_best:   {best_model['rank']}")
print(f"  dropout_best:{best_model['dropout']}")