# Run a "small" reasoning model with tool calling

In the previous notebook, you became familiar with Nanbeige.

Reasoning models are often used in agentic scenarios because their step-by-step reasoning
tends to produce more reliable conclusions. Tool calling extends this: instead of going in
circles over facts it cannot know, the model can ask for an external function to be called.

This notebook shows how to tell the model which tools are available and how it signals
that one of them should be called. We don't perform the actual call here, but adding it
would be straightforward.

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer

In [None]:
model_name = "Nanbeige/Nanbeige4.1-3B"

In [None]:
# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(
    model_name, 
    use_fast=False,
    trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True
)

In [None]:
!nvidia-smi

In [None]:
messages = [
    {'role': 'user',  'content': 'I want to find a course about reasoning language models!'}
]

This is the list of tools available to the LLM. The format is not specific to this
model; it follows a more or less standardized, JSON-Schema-like function description.

Don't worry: even if the function mentioned here exists, the LLM won't call it
automatically. Instead, it works out (by reasoning) whether the function could help
solve the problem. If so, it suggests calling the function *including its arguments*.

In [None]:
tools = [{'type': 'function',
  'function': {'name': 'SearchOReillyCourses',
   'description': 'Find O\'Reilly courses about a certain topic.',
   'parameters': {'type': 'dict',
    'properties': {'topic': {'type': 'string',
      'description': 'A course topic for searching O\'Reilly\'s online library.'}},
    # 'required' sits next to 'properties', not inside it
    'required': ['topic']}}}]
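
Nested dictionaries like this are easy to get wrong, so a quick sanity check before building the prompt can help. The following is a minimal sketch: `check_tool_schema` and `example_tool` are hypothetical names introduced only for illustration, and the checks cover only the layout used in this notebook, not the full JSON Schema specification.

```python
def check_tool_schema(tool):
    """Return a list of problems found in one tool definition (empty if none)."""
    problems = []
    function = tool.get("function", {})
    if not function.get("name"):
        problems.append("missing function name")
    parameters = function.get("parameters", {})
    properties = parameters.get("properties", {})
    # Every required argument should be described under 'properties'.
    for name in parameters.get("required", []):
        if name not in properties:
            problems.append(f"required argument {name!r} has no entry in 'properties'")
    # Every entry under 'properties' should be a dict describing one argument.
    for name, spec in properties.items():
        if not isinstance(spec, dict):
            problems.append(f"property {name!r} is not a dict; wrong nesting level?")
    return problems


# Hypothetical example in the same layout as the notebook's tool list.
example_tool = {
    "type": "function",
    "function": {
        "name": "SearchOReillyCourses",
        "description": "Find O'Reilly courses about a certain topic.",
        "parameters": {
            "type": "dict",
            "properties": {
                "topic": {"type": "string", "description": "A course topic."}
            },
            "required": ["topic"],
        },
    },
}

print(check_tool_schema(example_tool))
```

In the notebook, the same check could be run over every entry of `tools`; an empty list means none of the problems above were found.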

You have to perform the actual function call on your own, so a predictable output
format is crucial. The chat template takes care of this: it writes the tool definitions
into the prompt, and the model answers with a structured tool call that is easy to parse.
We request the prompt as plain text (`tokenize=False`) so that we can inspect it first.
`return_dict` is not passed because it only applies to tokenized output.

In [None]:
text = tokenizer.apply_chat_template(
    messages,
    tools=tools,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True
)

Note the Chinese system prompt, which we could change!

In [None]:
text

In [None]:
model_inputs = tokenizer(text, return_tensors="pt").to(model.device)

In [None]:
%%time 
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)

In [None]:
# only read output, skip input
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() 

In [None]:
len(output_ids)

Within the `<tool_call>` container you can see what the LLM suggested:

In [None]:
tokenizer.decode(output_ids)
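
To work with the suggestion programmatically, the `<tool_call>` container can be pulled out of the decoded text. The sketch below assumes that the payload inside the container is a JSON object with `name` and `arguments` keys, which is a common convention but should be checked against the actual output above. `extract_tool_calls` is a hypothetical helper, and `sample_output` is a hand-written stand-in, not real model output.

```python
import json
import re

# Non-greedy match so that several tool calls in one answer are found separately.
TOOL_CALL_PATTERN = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text):
    """Return the JSON payloads found inside <tool_call>...</tool_call> blocks."""
    calls = []
    for payload in TOOL_CALL_PATTERN.findall(text):
        try:
            calls.append(json.loads(payload))
        except json.JSONDecodeError:
            # Skip payloads that are not valid JSON instead of failing.
            continue
    return calls


# Hand-written stand-in for the decoded model output (assumed format).
sample_output = (
    "<think>The user wants a course, so the search tool fits.</think>\n"
    "<tool_call>\n"
    '{"name": "SearchOReillyCourses", "arguments": {"topic": "reasoning language models"}}\n'
    "</tool_call>"
)

calls = extract_tool_calls(sample_output)
print(calls)
```

In the notebook, `extract_tool_calls(tokenizer.decode(output_ids))` would be the corresponding call, provided the model's output follows the assumed format.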

In [None]:
from IPython.display import display, Markdown
display(Markdown(tokenizer.decode(output_ids)))
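
The step this notebook leaves out, performing the call, could look roughly like the following sketch. Everything here is an assumption made for illustration: `search_oreilly_courses` is a placeholder that returns dummy data instead of querying a real catalogue, `TOOL_REGISTRY` and `run_tool_call` are hypothetical names, and the `tool` message layout follows a common chat convention that Nanbeige's chat template may or may not accept in exactly this form.

```python
import json


def search_oreilly_courses(topic):
    """Hypothetical stand-in: a real version would query a course catalogue."""
    return [f"Placeholder course about {topic}"]


# Map the tool names offered to the model to local Python callables.
TOOL_REGISTRY = {"SearchOReillyCourses": search_oreilly_courses}


def run_tool_call(call):
    """Execute one parsed tool call and wrap the result in a 'tool' message."""
    name = call.get("name")
    arguments = call.get("arguments", {})
    if name not in TOOL_REGISTRY:
        # Never call anything the model invents; report the problem instead.
        result = {"error": f"unknown tool: {name}"}
    else:
        result = TOOL_REGISTRY[name](**arguments)
    return {"role": "tool", "name": name, "content": json.dumps(result)}


# A parsed call in the assumed {"name": ..., "arguments": ...} layout.
parsed_call = {
    "name": "SearchOReillyCourses",
    "arguments": {"topic": "reasoning language models"},
}
tool_message = run_tool_call(parsed_call)
print(tool_message)
```

Looking names up in an explicit registry means the model can only trigger functions that were deliberately exposed. For a second round, the assistant's answer and `tool_message` would be appended to `messages` and the chat template applied again, so the model can phrase a final reply based on the tool result.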