# Unit 1: 5 - Creating a Tool Calling Agent with SmolLM2 from Scratch

**Collaborators**:
* Roberto Rodriguez ([@Cyb3rWard0g](https://x.com/Cyb3rWard0g))

## Overview

This notebook demonstrates how to build a **Tool-Calling Agent** with SmolLM2. For each query, the agent decides whether a tool call is needed, executes the requested tool, and passes the result back to the model for a final answer. Along the way we look at how SmolLM2 formats tool calls and build a small, extensible system for executing them.

### Install Required Libraries

In [None]:
# !pip install transformers torch jinja2

## Define LM Client

In [1]:
from typing import List, Dict

class LMClient:
    """
    Handles communication with SmolLM2 for generating responses.
    """

    def __init__(self, model, tokenizer, device, max_new_tokens = 512):
        self.model = model
        self.tokenizer = tokenizer
        self.device = device
        self.max_new_tokens = max_new_tokens

    def generate(self, messages: List[Dict]) -> str:
        """
        Generates a response from SmolLM2 given a conversation history.

        Args:
            messages (List[Dict]): The list of messages in the chat.

        Returns:
            str: The generated response.
        """
        # Convert messages into model-compatible format
        input_text = self.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

        # Encode input with attention mask
        encoded_input = self.tokenizer(input_text, return_tensors="pt").to(self.device)
        input_ids = encoded_input["input_ids"]
        attention_mask = encoded_input["attention_mask"]

        # Generate response
        outputs = self.model.generate(
            input_ids,
            attention_mask=attention_mask,
            max_new_tokens=self.max_new_tokens,
            eos_token_id=self.tokenizer.eos_token_id
        )

        # Decode assistant response
        generated_tokens = outputs[0][input_ids.shape[1]:]
        return self.tokenizer.decode(generated_tokens, skip_special_tokens=True)
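
Code that uses `LMClient` only depends on its `generate(messages)` method, so the surrounding logic can be tried out without loading SmolLM2. The sketch below is a hypothetical stand-in (`ScriptedLMClient` is an illustrative name, not part of this notebook or of `transformers`) that replays canned replies in order. The replies shown are assumptions chosen to resemble a tool call followed by a natural-language answer.

```python
from typing import Dict, List


class ScriptedLMClient:
    """Hypothetical stand-in for LMClient: replays canned replies in order."""

    def __init__(self, replies: List[str]):
        self._replies = list(replies)
        self.calls: List[List[Dict]] = []  # every message list passed to generate()

    def generate(self, messages: List[Dict]) -> str:
        # Record a shallow copy of the input so it can be inspected later
        self.calls.append(list(messages))
        if not self._replies:
            raise RuntimeError("No scripted replies left")
        return self._replies.pop(0)


client = ScriptedLMClient([
    '<tool_call>[{"name": "get_current_time", "arguments": {}}]</tool_call>',
    "The current time is 17:47:04.",
])
first = client.generate([{"role": "user", "content": "What is the current time?"}])
second = client.generate([{"role": "user", "content": "Summarize the tool results."}])
print(first)
print(second)
```

Swapping a client like this in for the real one keeps experiments fast and deterministic, at the cost of not exercising the model's actual behavior.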

## Define Tool Class and Decorator

In [2]:
import inspect
from typing import Callable

class Tool:
    """
    Represents an AI-registered tool.
    """

    def __init__(self, name: str, description: str, func: Callable):
        self.name = name
        self.description = description
        self.func = func
        # Inspect the signature once and reuse it for arguments and return type
        signature = inspect.signature(func)
        self.arguments = signature.parameters
        self.outputs = signature.return_annotation
    
    def to_string(self) -> str:
        """
        Returns a structured representation of the tool.
        """
        args_str = ", ".join([f"{arg}: {param.annotation}" for arg, param in self.arguments.items()])
        return f"Tool Name: {self.name}, Description: {self.description}, Arguments: {args_str}, Outputs: {self.outputs}"

    def __call__(self, *args, **kwargs):
        """Invoke the tool."""
        return self.func(*args, **kwargs)


def tool(func):
    """
    Decorator to register a function as a tool.
    """
    return Tool(func.__name__, func.__doc__, func)
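
The `Tool` class relies on `inspect.signature` to read parameter names and annotations. As a small illustration of what that call exposes, the sketch below renders a signature with short type names (`int` rather than `<class 'int'>`). `describe_signature` and `add` are hypothetical helpers for this example only, and falling back to `Any` for unannotated parameters is an assumption.

```python
import inspect


def describe_signature(func) -> str:
    """Render a function's parameters and return type using short type names."""
    sig = inspect.signature(func)

    def type_name(annotation) -> str:
        # Missing annotations are reported as the inspect "empty" sentinel
        if annotation is inspect.Signature.empty:
            return "Any"
        return getattr(annotation, "__name__", str(annotation))

    args = ", ".join(
        f"{name}: {type_name(param.annotation)}"
        for name, param in sig.parameters.items()
    )
    return f"({args}) -> {type_name(sig.return_annotation)}"


def add(a: int, b: int) -> int:
    """Adds two integers."""
    return a + b


print(describe_signature(add))  # (a: int, b: int) -> int
```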

## Define Agent Class

### Define Logging

In [3]:
import logging

# Configure logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
logger = logging.getLogger(__name__)

### Define System Prompt Template

In [4]:
from jinja2 import Template

SYSTEM_PROMPT = Template("""
You are an AI assistant that can use tools when needed. 
You will decide when to call a tool based on the user's query. 

If a tool is needed, return output in this format:
<tool_call>[
{"name": "func_name1", "arguments": {"argument1": "value1", "argument2": "value2"}},
... (more tool calls as required)
]</tool_call>

Otherwise, respond naturally.
You have access to the following tools:
<tools>{{ tools }}</tools>
""")

### Defining Tool Calling Workflow

In [67]:
from typing import List, Dict, Any
import json
import re
import random

class ToolCallingAgent:
    """
    Implements a tool-calling agent for SmolLM2, integrating tool execution
    and structured response parsing.
    """

    def __init__(self, model: LMClient):
        self.model = model
        self.system_prompt = SYSTEM_PROMPT
        self.tools = {}
        self.history = []

    def register_tool(self, tool: Tool):
        """
        Registers a tool for function calling.

        Args:
            tool (Tool): The tool instance.
        """
        if not isinstance(tool, Tool):
            raise TypeError(f"Expected Tool instance, got {type(tool)}")
        self.tools[tool.name] = tool
        logger.info(f"Registered tool: {tool.name}")

    def prepare_messages(self, query: str) -> List[Dict[str, str]]:
        """
        Prepares structured messages including system instructions.

        Args:
            query (str): The user query.

        Returns:
            List[Dict[str, str]]: Formatted conversation messages.
        """
        tool_descriptions = "\n".join([t.to_string() for t in self.tools.values()])
        rendered_prompt = self.system_prompt.render(tools=tool_descriptions)
        
        system_message = {"role": "system", "content": rendered_prompt}
        self.history.append({"role": "user", "content": query})
        messages = [system_message] + self.history

        return messages

    def parse_response(self, text: str) -> Any:
        """
        Parses SmolLM2 response, extracting tool calls if present.

        Args:
            text (str): The model-generated response.

        Returns:
            Any: Parsed tool calls or direct assistant response.
        """
        logger.info(f"Received response from model: {text}")

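        pattern = r"<tool_call>(.*?)</tool_call>"
        match = re.search(pattern, text, re.DOTALL)
        if match:
            try:
                tool_calls = json.loads(match.group(1))
            except json.JSONDecodeError as exc:
                # Malformed JSON inside the tags: fall back to the raw text
                logger.warning(f"Could not parse tool calls: {exc}")
                return text
            logger.info(f"Extracted tool calls: {tool_calls}")
            return tool_calls

        logger.info("No tool calls detected, returning direct response.")
        return text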

    def _execute_tool_calls(self, tool_calls: List[Dict[str, Any]]) -> List[Dict[str, str]]:
        """
        Executes the requested tool functions and formats assistant-user message pairs.

        Args:
            tool_calls (List[Dict[str, Any]]): List of tool calls.

        Returns:
            List[Dict[str, str]]: Formatted assistant-user message pairs.
        """
        tool_history = []

        for tool_call in tool_calls:
            tool_call_id = "".join(random.choices("0123456789", k=5))
            tool_name = tool_call["name"]
            tool_args = tool_call["arguments"]

            logger.info(f"Executing tool: {tool_name} with arguments {tool_args}")

            if tool_name in self.tools:
                tool_result = self.tools[tool_name](**tool_args)
            else:
                tool_result = f"Error: Unknown tool {tool_name}"
                logger.error(tool_result)

            # Assistant tool call message
            assistant_tool_message = {
                "role": "assistant",
                "content": f"Tool Chosen (id: {tool_call_id}) -> {tool_name}"
            }

            # User tool response message
            user_tool_response = {
                "role": "user",
                "content": f"Tool Execution Results (id: {tool_call_id}) -> {json.dumps(tool_result)}"
            }

            tool_history.append(assistant_tool_message)
            tool_history.append(user_tool_response)

        return tool_history

    def run(self, query: str) -> Any:
        """
        Processes a user query, generates a response, and executes tools if needed.

        Args:
            query (str): User query.

        Returns:
            Any: The final natural response from the assistant.
        """
        logger.info(f"User query received: {query}")
        messages = self.prepare_messages(query)

        # Generate response
        response_text = self.model.generate(messages)

        # Parse response
        parsed_response = self.parse_response(response_text)

        # If tool calls were made, execute them
        if isinstance(parsed_response, list):
            tool_execution_messages = self._execute_tool_calls(parsed_response)

            # Add tool execution messages to conversation
            messages.extend(tool_execution_messages)

            logger.info("Tool execution completed. Sending tool results back to model for final response.")

            # Ask model to summarize the final response based on the query and tool results
            messages.append({
                "role": "user",
                "content": f"Based on the original question: '{query}', and the tool execution results, provide a clear and natural response."
            })

            # Generate final assistant response
            final_response_text = self.model.generate(messages)

            # Store only the final response in history
            self.history.append({"role": "assistant", "content": final_response_text})

            logger.info(f"Final response from assistant: {final_response_text}")
            return final_response_text

        # Otherwise, return natural response
        self.history.append({"role": "assistant", "content": parsed_response})
        logger.info(f"Returning direct response: {parsed_response}")
        return parsed_response
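
A small model can request a tool that does not exist or supply the wrong argument names. One possible way to keep the agent loop running in those cases is to turn failures into error strings that can be sent back to the model. The sketch below is an assumption about how that might look. `safe_call` and the plain-dict `registry` are hypothetical and are not part of `ToolCallingAgent`.

```python
from typing import Any, Callable, Dict


def safe_call(registry: Dict[str, Callable[..., Any]], name: str, arguments: Dict[str, Any]) -> Any:
    """Call a registered tool, returning an error string instead of raising."""
    func = registry.get(name)
    if func is None:
        return f"Error: Unknown tool {name}"
    try:
        return func(**arguments)
    except TypeError as exc:
        # Typically raised for missing or unexpected argument names
        return f"Error: Bad arguments for {name}: {exc}"
    except Exception as exc:
        # Any other failure inside the tool itself
        return f"Error: {name} failed: {exc}"


def add(a: int, b: int) -> int:
    return a + b


registry = {"add": add}
print(safe_call(registry, "add", {"a": 2, "b": 3}))  # 5
print(safe_call(registry, "add", {"x": 2}))          # an error string about bad arguments
print(safe_call(registry, "multiply", {"a": 2}))     # Error: Unknown tool multiply
```

Returning the error as text, rather than raising, lets the model see what went wrong in the follow-up generation step.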


## Initializing SmolLM2 Agent

### Loading SmolLM2 Efficiently

To avoid downloading the model every time (**~3.42 GB**), we first check if it exists locally before loading:

In [6]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import os

MODEL_NAME = "HuggingFaceTB/SmolLM2-1.7B-Instruct"
MODEL_DIR = "data/smollm2"

def load_model():
    if os.path.exists(MODEL_DIR):
        print("Loading model from local directory.")
        model = AutoModelForCausalLM.from_pretrained(MODEL_DIR)
    else:
        print("Downloading model...")
        model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
        model.save_pretrained(MODEL_DIR)
    return model

device = torch.device("cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = load_model().to(device)

Loading model from local directory.



### Initializing Language Model Client

In [7]:
model = LMClient(model, tokenizer, device)

### Defining Tools

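Tools allow the model to execute external functions when needed. We define them as Python functions and wrap them with the `@tool` decorator, which captures each function's name, docstring, and type-annotated signature. `to_string()` then renders that information as a plain-text description for the system prompt, so SmolLM2 knows what each tool does and which arguments it takes.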

In [68]:
import datetime
import random

@tool
def get_current_time() -> str:
    """Returns the current time in HH:MM:SS format."""
    return datetime.datetime.now().strftime("%H:%M:%S")

@tool
def get_random_number(min: int, max: int) -> int:
    """Returns a random number between min and max."""
    return random.randint(min, max)
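
The tool descriptions in this notebook are rendered as plain text by `Tool.to_string()`. For comparison, a JSON-schema-style description could be derived from the same function signature. The sketch below is one possible approach and not the notebook's method: `to_json_schema`, the `JSON_TYPES` mapping, and the fallback to `"string"` for unmapped annotations are all assumptions made for illustration.

```python
import inspect
import json
from typing import Any, Callable, Dict

# Assumed mapping from Python annotations to JSON Schema type names
JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def to_json_schema(func: Callable[..., Any]) -> Dict[str, Any]:
    """Build a JSON-schema-style description of a function's parameters."""
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        # Unmapped or missing annotations fall back to "string" (an assumption)
        properties[name] = {"type": JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": (func.__doc__ or "").strip(),
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def get_random_number(min: int, max: int) -> int:
    """Returns a random number between min and max."""
    return min  # placeholder body; only the signature matters here


schema = to_json_schema(get_random_number)
print(json.dumps(schema, indent=2))
```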

In [69]:
get_current_time.to_string()

"Tool Name: get_current_time, Description: Returns the current time in HH:MM:SS format., Arguments: , Outputs: <class 'str'>"

In [70]:
get_random_number.to_string()

"Tool Name: get_random_number, Description: Returns a random number between min and max., Arguments: min: <class 'int'>, max: <class 'int'>, Outputs: <class 'int'>"

### Initializing Agent

In [71]:
agent = ToolCallingAgent(model=model)

### Registering Tools

In [72]:
agent.register_tool(get_current_time)
agent.register_tool(get_random_number)

2025-02-17 17:47:00,521 - INFO - Registered tool: get_current_time
2025-02-17 17:47:00,522 - INFO - Registered tool: get_random_number


In [73]:
agent.tools

{'get_current_time': <__main__.Tool at 0x146aebcb0>,
 'get_random_number': <__main__.Tool at 0x146aeb690>}

### Basic One-Step Examples

In [None]:
# Reset Memory
agent.history = []

In [75]:
response = agent.run("What is the current time?")
response


2025-02-17 17:47:02,200 - INFO - User query received: What is the current time?
2025-02-17 17:47:04,746 - INFO - Received response from model: <tool_call>[{"name": "get_current_time", "arguments": {}}]</tool_call>
2025-02-17 17:47:04,746 - INFO - Extracted tool calls: [{'name': 'get_current_time', 'arguments': {}}]
2025-02-17 17:47:04,746 - INFO - Executing tool: get_current_time with arguments {}
2025-02-17 17:47:04,746 - INFO - Tool execution completed. Sending tool results back to model for final response.
2025-02-17 17:47:05,952 - INFO - Final response from assistant: The current time is 17:47:04.


'The current time is 17:47:04.'

In [76]:
agent.history

[{'role': 'user', 'content': 'What is the current time?'},
 {'role': 'assistant', 'content': 'The current time is 17:47:04.'}]

In [77]:
response = agent.run("Give me a random number between 1 and 10.")
response

2025-02-17 17:47:08,796 - INFO - User query received: Give me a random number between 1 and 10.
2025-02-17 17:47:09,948 - INFO - Received response from model: The random number between 1 and 10 is 7.
2025-02-17 17:47:09,949 - INFO - No tool calls detected, returning direct response.
2025-02-17 17:47:09,949 - INFO - Returning direct response: The random number between 1 and 10 is 7.


'The random number between 1 and 10 is 7.'

In [78]:
response = agent.run("What is the capital of France?")
response

2025-02-17 17:47:11,282 - INFO - User query received: What is the capital of France?
2025-02-17 17:47:12,029 - INFO - Received response from model: The capital of France is Paris.
2025-02-17 17:47:12,029 - INFO - No tool calls detected, returning direct response.
2025-02-17 17:47:12,030 - INFO - Returning direct response: The capital of France is Paris.


'The capital of France is Paris.'

### Basic Multi-Step Example

In [79]:
# Reset Memory
agent.history = []

# New question
response = agent.run("Tell me the current time and give me a random number from 10 to 20")
response

2025-02-17 17:47:13,567 - INFO - User query received: Tell me the current time and give me a random number from 10 to 20
2025-02-17 17:47:17,122 - INFO - Received response from model: <tool_call>[{"name": "get_current_time", "arguments": {}}, {"name": "get_random_number", "arguments": {"min": 10, "max": 20}}]</tool_call>
2025-02-17 17:47:17,122 - INFO - Extracted tool calls: [{'name': 'get_current_time', 'arguments': {}}, {'name': 'get_random_number', 'arguments': {'min': 10, 'max': 20}}]
2025-02-17 17:47:17,123 - INFO - Executing tool: get_current_time with arguments {}
2025-02-17 17:47:17,123 - INFO - Executing tool: get_random_number with arguments {'min': 10, 'max': 20}
2025-02-17 17:47:17,123 - INFO - Tool execution completed. Sending tool results back to model for final response.
2025-02-17 17:47:20,682 - INFO - Final response from assistant: Based on the original question and the tool execution results, I can provide a clear and natural response.

The current time is "17:47:17". A random number between 10 and 20 is 19.

'Based on the original question and the tool execution results, I can provide a clear and natural response.\n\nThe current time is "17:47:17". A random number between 10 and 20 is 19.'
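
To make the multi-step case concrete, the snippet below takes the raw model response from the log above and extracts the two tool calls with the same regular expression and `json.loads` approach used in `parse_response`. It is a standalone sketch, and the variable names are illustrative.

```python
import json
import re

# Raw model output copied from the log above
raw = (
    '<tool_call>[{"name": "get_current_time", "arguments": {}}, '
    '{"name": "get_random_number", "arguments": {"min": 10, "max": 20}}]</tool_call>'
)

match = re.search(r"<tool_call>(.*?)</tool_call>", raw, re.DOTALL)
calls = json.loads(match.group(1))

# Each entry names one tool and the keyword arguments to pass to it
for call in calls:
    print(call["name"], call["arguments"])
```

A single `<tool_call>` block can therefore carry several calls, which the agent executes one after another before asking the model for a final answer.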

In [None]:
agent.history