# Fine-Tuning a model for Function-Calling

> From [Hugging Face Agents Course](https://huggingface.co/learn/agents-course/bonus-unit1/fine-tuning).

## Import the libraries

In [1]:
from enum import Enum
from functools import partial
import pandas as pd
import torch
import json

from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer
from peft import LoraConfig, TaskType

seed = 42
set_seed(seed)

import os

## Processing the dataset into inputs

In order to train the model, we need to **format the inputs into what we want the model to learn**.

For this tutorial, I enhanced a popular function-calling dataset, `NousResearch/hermes-function-calling-v1`, by adding a new **thinking** step computed with **deepseek-ai/DeepSeek-R1-Distill-Qwen-32B**.

But in order for the model to learn, we need to **format the conversation correctly**. Going from a list of messages to a prompt is handled by the **chat_template**. The default chat_template of gemma-2-2B does not contain tool calls, so we will need to modify it!

This is the role of our preprocess function: to go from a list of messages to a prompt that the model can understand.

In [None]:
# Set device
device = (
    "cuda" if torch.cuda.is_available() else "cpu"
)

print(f"Using device: {device}")

In [5]:
model_name = "google/gemma-2-2b-it"
dataset_name = "Jofthomas/hermes-function-calling-thinking-V1"

In [5]:
dataset = load_dataset(dataset_name)
dataset = dataset.rename_column("conversations", "messages")

### Let's Modify the Tokenizer

The tokenizer splits text into sub-words by default. This is not what we want for our new special tokens!

While we segmented our examples (as we will see) using `<think>`, `<tool_call>`, and `<tool_response>`, the tokenizer does not yet treat them as whole tokens—it still tries to break them down into smaller pieces. To ensure the model correctly interprets our new format, **we must add these tokens to our tokenizer**.
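To make the splitting problem concrete, here is a toy sketch. It does not use the Gemma tokenizer, which is a learned subword model and far more elaborate. `greedy_tokenize` and `toy_vocab` are hypothetical names for a longest-match tokenizer over a tiny hand-written vocabulary. The only point is to show why a tag that is missing from the vocabulary is broken into pieces, and why registering it yields a single token.

```python
def greedy_tokenize(text, vocab):
    """Toy longest-match-first tokenizer (illustration only, not Gemma's algorithm).

    Characters that match nothing in `vocab` are emitted as single-character tokens.
    """
    tokens = []
    max_len = max(len(piece) for piece in vocab)
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab:
                tokens.append(piece)
                i += size
                break
        else:
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical vocabulary that knows the sub-pieces but not the full tag.
toy_vocab = {"<", ">", "/", "_", "think", "tool", "call"}

print(greedy_tokenize("<think>", toy_vocab))                # tag is split into pieces
print(greedy_tokenize("<think>", toy_vocab | {"<think>"}))  # tag stays whole
```

With the real tokenizer, the equivalent of "adding `<think>` to the vocabulary" is passing the new tags as special tokens when the tokenizer is loaded.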

Additionally, since our preprocess function relies on the chat_template to format conversations as messages within a prompt, **we also need to modify the chat_template in the tokenizer** so that it supports our new format.



In [3]:
class ChatmlSpecialTokens(str, Enum):
    tools = "<tools>"
    eotools = "</tools>"
    think = "<think>"
    eothink = "</think>"
    tool_call = "<tool_call>"
    eotool_call = "</tool_call>"
    tool_response = "<tool_response>"
    eotool_response = "</tool_response>"
    pad_token = "<pad>"
    eos_token = "<eos>"
    
    @classmethod
    def list(cls):
        return [c.value for c in cls]

In [6]:
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    pad_token=ChatmlSpecialTokens.pad_token.value,
    additional_special_tokens=ChatmlSpecialTokens.list()
)
print("From: ", tokenizer.chat_template)
tokenizer.chat_template = "{{ bos_token }}{% if messages[0]['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% for message in messages %}{{ '<start_of_turn>' + message['role'] + '\n' + message['content'] | trim + '<end_of_turn><eos>\n' }}{% endfor %}{% if add_generation_prompt %}{{'<start_of_turn>model\n'}}{% endif %}"
print("\n\nTo: ", tokenizer.chat_template)

From:  {{ bos_token }}{% if messages[0]['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if (message['role'] == 'assistant') %}{% set role = 'model' %}{% else %}{% set role = message['role'] %}{% endif %}{{ '<start_of_turn>' + role + '
' + message['content'] | trim + '<end_of_turn>
' }}{% endfor %}{% if add_generation_prompt %}{{'<start_of_turn>model
'}}{% endif %}


To:  {{ bos_token }}{% if messages[0]['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% for message in messages %}{{ '<start_of_turn>' + message['role'] + '
' + message['content'] | trim + '<end_of_turn><eos>
' }}{% endfor %}{% if add_generation_prompt %}{{'<start_of_turn>model
'}}{% endif %}
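For readers less familiar with Jinja, the modified template printed above can be mirrored in a few lines of plain Python. This is a sketch for illustration only: `render_prompt` is a hypothetical helper, not part of `transformers`, and the tokenizer's own `apply_chat_template` remains the source of truth.

```python
def render_prompt(messages, add_generation_prompt=False, bos_token="<bos>"):
    """Plain-Python imitation of the modified chat template (illustration only)."""
    if messages and messages[0]["role"] == "system":
        # The template refuses system messages, so we do the same.
        raise ValueError("System role not supported")

    parts = [bos_token]
    for message in messages:
        # The role is used as-is ("human", "model", "tool", ...); only the
        # content is trimmed, matching `message['content'] | trim`.
        parts.append(
            f"<start_of_turn>{message['role']}\n"
            f"{message['content'].strip()}<end_of_turn><eos>\n"
        )
    if add_generation_prompt:
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


print(render_prompt([{"role": "human", "content": " Hello! "}], add_generation_prompt=True))
```

Compared with the default Gemma template, this version no longer enforces alternating user/assistant roles and closes every turn with `<eos>`.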


In [None]:
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    attn_implementation='eager',
    device_map=device
)
model.resize_token_embeddings(len(tokenizer))
model.to(torch.bfloat16)
print("Model loaded")

### Think

Recent work, such as the DeepSeek models or the "Test-Time Compute" paper, suggests that giving an LLM time to “think” before it answers (or in this case, before taking an action) can significantly improve model performance.

The [`NousResearch/hermes-function-calling-v1`](https://huggingface.co/datasets/NousResearch/hermes-function-calling-v1) dataset is considered a reference when it comes to function-calling datasets. The dataset used in this notebook ([`Jofthomas/hermes-function-calling-thinking-V1`](https://huggingface.co/datasets/Jofthomas/hermes-function-calling-thinking-V1)) takes the original conversations and passes them through deepseek-ai/DeepSeek-R1-Distill-Qwen-32B to generate thinking tokens (`<think>`) before any function call.

In [9]:
def preprocess(sample):
    messages = sample["messages"]
    first_message = messages[0]

    # Instead of adding a system message, we merge the content into the first user message
    if first_message["role"] == "system":
        system_message_content = first_message["content"]
        # Merge system content with the first user message
        messages[1]["content"] = (
            f"{system_message_content} " 
            "Also, before making a call to a function take the time to plan the function to take. "
            "Make that thinking process between <think>{your thoughts}</think>\n\n"
            f"{messages[1]['content']}"
        )
        # Remove the system message from the conversation
        messages.pop(0)

    # Apply the chat template
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}
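The merge step can be tried in isolation, without a tokenizer. The sketch below is a hypothetical, non-mutating variant of the same logic (`merge_system_message` and `THINK_INSTRUCTION` are names introduced here, not part of the notebook). It returns a new list instead of editing the sample in place, which makes it easier to check on a small example.

```python
THINK_INSTRUCTION = (
    "Also, before making a call to a function take the time to plan the function to take. "
    "Make that thinking process between <think>{your thoughts}</think>\n\n"
)


def merge_system_message(messages):
    """Return a copy of `messages` with a leading system message folded into the next turn.

    Assumption: a system message is always followed by at least one other message.
    """
    if not messages or messages[0]["role"] != "system":
        return [dict(m) for m in messages]

    system, first, *rest = messages
    merged = dict(first)
    merged["content"] = f"{system['content']} {THINK_INSTRUCTION}{first['content']}"
    return [merged] + [dict(m) for m in rest]


conversation = [
    {"role": "system", "content": "You are a function calling AI model."},
    {"role": "human", "content": "Hi, I need to convert 500 USD to Euros."},
]
print(merge_system_message(conversation)[0]["content"])
```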

In [10]:
dataset = dataset.map(preprocess, remove_columns="messages")
dataset = dataset["train"].train_test_split(0.1)
print(dataset)

DatasetDict({
    train: Dataset({
        features: ['text'],
        num_rows: 3213
    })
    test: Dataset({
        features: ['text'],
        num_rows: 357
    })
})


### Checking the inputs

Let's manually look at what an input looks like!

In this example we have:

1. A User message containing the necessary information: the list of available tools between `<tools></tools>`, then the user query, here: "Can you get me the latest news headlines for the United States?"

2. An Assistant message, here called "model" to fit the conventions of Gemma models, containing two new phases: a "thinking" phase contained in `<think></think>` and an "Act" phase contained in `<tool_call></tool_call>`.

3. If the model's response contains a `<tool_call>`, we append the result of this action in a new "Tool" message containing a `<tool_response></tool_response>` with the answer from the tool.
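The three phases above suggest a small post-processing step at inference time: pull the thoughts and the tool call out of the generated text, then wrap the tool's result for the next turn. The following is a minimal sketch under a few assumptions. `parse_model_turn` and `tool_response_message` are hypothetical helpers, the `"tool"` role name is assumed, and the fallback to `ast.literal_eval` assumes the tool-call payload may be a Python-style dict literal rather than strict JSON.

```python
import ast
import json
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_model_turn(text):
    """Split a generated model turn into its thoughts and its tool calls."""
    think = THINK_RE.search(text)
    calls = []
    for raw in TOOL_CALL_RE.findall(text):
        raw = raw.strip()
        try:
            calls.append(json.loads(raw))
        except json.JSONDecodeError:
            # Assumption: single-quoted dict literals, which json.loads rejects.
            calls.append(ast.literal_eval(raw))
    return {
        "thoughts": think.group(1).strip() if think else None,
        "tool_calls": calls,
    }


def tool_response_message(result):
    """Wrap a tool result in the tags used for the follow-up tool message."""
    return {
        "role": "tool",  # assumed role name
        "content": f"<tool_response>\n{json.dumps(result)}\n</tool_response>",
    }


generated = (
    "<think>The user wants 500 USD in EUR.</think>\n"
    "<tool_call>\n"
    "{'name': 'convert_currency', 'arguments': "
    "{'amount': 500, 'from_currency': 'USD', 'to_currency': 'EUR'}}\n"
    "</tool_call>"
)
print(parse_model_turn(generated))
```

Malformed payloads will still raise from `ast.literal_eval`; a production parser would want to catch that and decide how to recover.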

In [11]:
# Let's look at how we formatted the dataset
print(dataset["train"][8]["text"])

<bos><start_of_turn>human
You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags.You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.Here are the available tools:<tools> [{'type': 'function', 'function': {'name': 'get_news_headlines', 'description': 'Get the latest news headlines', 'parameters': {'type': 'object', 'properties': {'country': {'type': 'string', 'description': 'The country for which headlines are needed'}}, 'required': ['country']}}}, {'type': 'function', 'function': {'name': 'search_recipes', 'description': 'Search for recipes based on ingredients', 'parameters': {'type': 'object', 'properties': {'ingredients': {'type': 'array', 'items': {'type': 'string'}, 'description': 'The list of ingredients'}}, 'required': ['ingredients']}}}] </tools>Use the following pydantic model json schema for each tool call you will make: {'title': 'FunctionCall

In [12]:
# Sanity check
print(tokenizer.pad_token)
print(tokenizer.eos_token)

<pad>
<eos>


## Training

### Testing the model before training

In [9]:
# This prompt is taken from one of the test set examples. We append "<think>" after the generation prompt so that the model starts its turn by thinking.
example_prompt="""You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags.You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.Here are the available tools:<tools> [{'type': 'function', 'function': {'name': 'convert_currency', 'description': 'Convert from one currency to another', 'parameters': {'type': 'object', 'properties': {'amount': {'type': 'number', 'description': 'The amount to convert'}, 'from_currency': {'type': 'string', 'description': 'The currency to convert from'}, 'to_currency': {'type': 'string', 'description': 'The currency to convert to'}}, 'required': ['amount', 'from_currency', 'to_currency']}}}, {'type': 'function', 'function': {'name': 'calculate_distance', 'description': 'Calculate the distance between two locations', 'parameters': {'type': 'object', 'properties': {'start_location': {'type': 'string', 'description': 'The starting location'}, 'end_location': {'type': 'string', 'description': 'The ending location'}}, 'required': ['start_location', 'end_location']}}}] </tools>Use the following pydantic model json schema for each tool call you will make: {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']}For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
<tool_call>
{tool_call}
</tool_call>Also, before making a call to a function take the time to plan the function to take. Make that thinking process between <think>{your thoughts}</think>

Hi, I need to convert 500 USD to Euros. Can you help me with that?"""
example_prompt = tokenizer.apply_chat_template([
    {"role": "human", "content": example_prompt},
], tokenize=False, add_generation_prompt=True) + "<think>"  # Add the thinking prompt
print(example_prompt)

<bos><start_of_turn>human
You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags.You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.Here are the available tools:<tools> [{'type': 'function', 'function': {'name': 'convert_currency', 'description': 'Convert from one currency to another', 'parameters': {'type': 'object', 'properties': {'amount': {'type': 'number', 'description': 'The amount to convert'}, 'from_currency': {'type': 'string', 'description': 'The currency to convert from'}, 'to_currency': {'type': 'string', 'description': 'The currency to convert to'}}, 'required': ['amount', 'from_currency', 'to_currency']}}}, {'type': 'function', 'function': {'name': 'calculate_distance', 'description': 'Calculate the distance between two locations', 'parameters': {'type': 'object', 'properties': {'start_location': {'type': 'string', 'description': 'The star

In [10]:
inputs = tokenizer(example_prompt, return_tensors="pt").to(device)

In [None]:
outputs = model.generate(
    **inputs,
    max_new_tokens=300,  # Adapt as necessary
    do_sample=True,
    top_p=0.95,
    temperature=0.01,
    repetition_penalty=1.0,
    eos_token_id=tokenizer.eos_token_id
)

In [12]:
print(tokenizer.decode(outputs[0]))

<bos><bos><start_of_turn>human
You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags.You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.Here are the available tools:<tools> [{'type': 'function', 'function': {'name': 'convert_currency', 'description': 'Convert from one currency to another', 'parameters': {'type': 'object', 'properties': {'amount': {'type': 'number', 'description': 'The amount to convert'}, 'from_currency': {'type': 'string', 'description': 'The currency to convert from'}, 'to_currency': {'type': 'string', 'description': 'The currency to convert to'}}, 'required': ['amount', 'from_currency', 'to_currency']}}}, {'type': 'function', 'function': {'name': 'calculate_distance', 'description': 'Calculate the distance between two locations', 'parameters': {'type': 'object', 'properties': {'start_location': {'type': 'string', 'description': 'The

### Let's configure the LoRA

Here we define the parameters of our adapter. These are the most important parameters in LoRA, as they define the size and importance of the adapters we are training.

In [14]:
# We can configure LoRA parameters
# r: rank dimension for LoRA update matrices (smaller = more compression)
rank_dimension = 16
# lora_alpha: scaling factor for LoRA layers (higher = stronger adaptation)
lora_alpha = 64
# lora_dropout: dropout probability for LoRA layers (helps prevent overfitting)
lora_dropout = 0.05

peft_config = LoraConfig(
    r=rank_dimension,
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    target_modules=[  # which layers of the transformer do we target?
        "gate_proj","q_proj", "o_proj","k_proj", 
        "down_proj","up_proj","v_proj",
        "lm_head", "embed_tokens"
    ],
    task_type=TaskType.CAUSAL_LM
)
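To get a feel for what `r` and `lora_alpha` control, recall that LoRA adds a low-rank update `B @ A` to a frozen weight matrix and scales it by `lora_alpha / r`. The sketch below works through the arithmetic for the values used here. The helper names and the 2048 × 2048 layer shape are illustrative assumptions, not the actual Gemma dimensions.

```python
def lora_scaling(r, lora_alpha):
    """Factor applied to the low-rank update B @ A in standard LoRA."""
    return lora_alpha / r


def lora_param_count(d_in, d_out, r):
    """Trainable parameters for one adapted matrix: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


r, alpha = 16, 64
d_in = d_out = 2048  # illustrative layer shape, not taken from the Gemma config

full = d_in * d_out
adapter = lora_param_count(d_in, d_out, r)

print(f"scaling factor: {lora_scaling(r, alpha)}")
print(f"adapter params: {adapter} vs. full matrix: {full} ({adapter / full:.2%})")
```

Raising `r` grows the adapter linearly, while raising `lora_alpha` at a fixed `r` only strengthens the update without adding parameters.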

### Let's define the Trainer and the Fine-Tuning hyperparameters

In [13]:
# The directory where the trained model checkpoints, logs, and other artifacts will be saved.
output_dir = "checkpoints"
exp_name = "gemma-2-2B-it-thinking-function_calling-V0"
username = "marioparreno"

In [None]:
per_device_train_batch_size = 1
per_device_eval_batch_size = 1
gradient_accumulation_steps = 4
logging_steps = 5
learning_rate = 1e-4 # The initial learning rate for the optimizer.

max_grad_norm = 1.0
num_train_epochs = 1
warmup_ratio = 0.1
lr_scheduler_type = "cosine"
max_seq_length = 1500

training_arguments = SFTConfig(
    output_dir=f"{output_dir}/{exp_name}",
    per_device_train_batch_size=per_device_train_batch_size,
    per_device_eval_batch_size=per_device_eval_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    save_strategy="no",
    eval_strategy="epoch",
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    max_grad_norm=max_grad_norm,
    weight_decay=0.1,
    warmup_ratio=warmup_ratio,
    lr_scheduler_type=lr_scheduler_type,
    report_to="tensorboard",
    bf16=True,
    hub_private_repo=False,
    push_to_hub=False,
    num_train_epochs=num_train_epochs,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
    packing=True,
    max_seq_length=max_seq_length,
)
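Two of these settings are easier to reason about with numbers: the effective batch size (per-device batch × gradient accumulation × devices) and the shape of a cosine schedule with warmup. The sketch below is a standalone approximation of the usual linear-warmup, cosine-decay curve, not the trainer's own scheduler. `cosine_lr` is a hypothetical helper, and `total_steps=100` is an arbitrary value, since the real step count depends on how packing groups the examples.

```python
import math


def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Number of sequences contributing to each optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices


def cosine_lr(step, total_steps, base_lr=1e-4, warmup_ratio=0.1):
    """Linear warmup to `base_lr`, then half-cosine decay to zero (sketch)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size:", effective_batch_size(1, 4))
for step in (0, 5, 10, 55, 100):
    print(step, f"{cosine_lr(step, total_steps=100):.2e}")
```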

As the trainer, we use `SFTTrainer`, TRL's Supervised Fine-Tuning trainer.

In [None]:
trainer = SFTTrainer(
    model=model,
    args=training_arguments,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    processing_class=tokenizer,
    peft_config=peft_config,
)

Here, we launch the training 🔥. Perfect time for you to pause and grab a coffee ☕.



In [17]:
trainer.train()
trainer.save_model()

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.


| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 0 | 0.2797 | 0.293311 |




### Let's push the Model and the Tokenizer to the Hub
Let's push our model and our tokenizer to the Hub!

In [None]:
trainer.push_to_hub(f"{username}/{exp_name}")

Since we also modified the chat_template, which is contained in the tokenizer, let's also push the tokenizer with the model.

In [None]:
tokenizer.eos_token = "<eos>"
tokenizer.push_to_hub(f"{username}/{exp_name}", token=True)

## Inference

We will:

1. Load the adapter from the Hub!
2. Load the base model, `google/gemma-2-2b-it`, from the Hub
3. Resize the model's embeddings to account for the new tokens we introduced!

In [14]:
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

In [None]:
peft_model_id = f"{username}/{exp_name}"  # Replace with your trained adapter
config = PeftConfig.from_pretrained(peft_model_id)

model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
)

tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
model.resize_token_embeddings(len(tokenizer))
model = PeftModel.from_pretrained(model, peft_model_id)

model.to(device)  # Move model to the selected device
model.eval()

print(f"Model loaded on {device}")

In [17]:
# This prompt is taken from one of the test set examples. We append "<think>" after the generation prompt so that the model starts its turn by thinking.
example_prompt="""You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags.You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.Here are the available tools:<tools> [{'type': 'function', 'function': {'name': 'convert_currency', 'description': 'Convert from one currency to another', 'parameters': {'type': 'object', 'properties': {'amount': {'type': 'number', 'description': 'The amount to convert'}, 'from_currency': {'type': 'string', 'description': 'The currency to convert from'}, 'to_currency': {'type': 'string', 'description': 'The currency to convert to'}}, 'required': ['amount', 'from_currency', 'to_currency']}}}, {'type': 'function', 'function': {'name': 'calculate_distance', 'description': 'Calculate the distance between two locations', 'parameters': {'type': 'object', 'properties': {'start_location': {'type': 'string', 'description': 'The starting location'}, 'end_location': {'type': 'string', 'description': 'The ending location'}}, 'required': ['start_location', 'end_location']}}}] </tools>Use the following pydantic model json schema for each tool call you will make: {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']}For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
<tool_call>
{tool_call}
</tool_call>Also, before making a call to a function take the time to plan the function to take. Make that thinking process between <think>{your thoughts}</think>

Hi, I need to convert 500 USD to Euros. Can you help me with that?"""
example_prompt = tokenizer.apply_chat_template([
    {"role": "human", "content": example_prompt},
], tokenize=False, add_generation_prompt=True) + "<think>"  # Add the thinking prompt
print(example_prompt)

<bos><start_of_turn>human
You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags.You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.Here are the available tools:<tools> [{'type': 'function', 'function': {'name': 'convert_currency', 'description': 'Convert from one currency to another', 'parameters': {'type': 'object', 'properties': {'amount': {'type': 'number', 'description': 'The amount to convert'}, 'from_currency': {'type': 'string', 'description': 'The currency to convert from'}, 'to_currency': {'type': 'string', 'description': 'The currency to convert to'}}, 'required': ['amount', 'from_currency', 'to_currency']}}}, {'type': 'function', 'function': {'name': 'calculate_distance', 'description': 'Calculate the distance between two locations', 'parameters': {'type': 'object', 'properties': {'start_location': {'type': 'string', 'description': 'The star

In [18]:
inputs = tokenizer(example_prompt, return_tensors="pt").to(device)

In [None]:
outputs = model.generate(
    **inputs,
    max_new_tokens=512,  # Adapt as necessary
    do_sample=True,
    top_p=0.95,
    temperature=0.01,
    repetition_penalty=1.0,
    eos_token_id=tokenizer.eos_token_id
)

In [20]:
print(tokenizer.decode(outputs[0]))

<bos><bos><start_of_turn>human
You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags.You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.Here are the available tools:<tools> [{'type': 'function', 'function': {'name': 'convert_currency', 'description': 'Convert from one currency to another', 'parameters': {'type': 'object', 'properties': {'amount': {'type': 'number', 'description': 'The amount to convert'}, 'from_currency': {'type': 'string', 'description': 'The currency to convert from'}, 'to_currency': {'type': 'string', 'description': 'The currency to convert to'}}, 'required': ['amount', 'from_currency', 'to_currency']}}}, {'type': 'function', 'function': {'name': 'calculate_distance', 'description': 'Calculate the distance between two locations', 'parameters': {'type': 'object', 'properties': {'start_location': {'type': 'string', 'description': 'The