##### Copyright 2024 Google LLC.

In [None]:
# @title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Gemma 3 Function Calling

Google released Gemma 3, a new Gemma variant that adds multimodality, multilingual support for up to 140 languages, a 128K-token context window, and agentic capabilities such as structured responses and function calling. This notebook shows how to use function calling with the open Gemma 3 4B instruction-tuned model (`google/gemma-3-4b-it`), loaded from Hugging Face.

<table align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/google-gemini/gemma-cookbook/blob/main/Gemma/[Gemma_3]Function_Calling_with_HF.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
</table>

## Setup

### Select the Colab runtime
To complete this tutorial, you'll need to have a Colab runtime with sufficient resources to run the Gemma model. In this case, you can use a T4 GPU:

1. In the upper-right of the Colab window, select **▾ (Additional connection options)**.
2. Select **Change runtime type**.
3. Under **Hardware accelerator**, select **T4 GPU**.

### Gemma-3 setup

To complete this tutorial, you'll first need to complete the setup instructions at [Gemma setup](https://ai.google.dev/gemma/docs/setup). The Gemma setup instructions show you how to do the following:

* Get access to Gemma on kaggle.com.
* Select a Colab runtime with sufficient resources to run
  the Gemma 3 4B-it model.
* Generate and configure a Kaggle username and an API key as Colab secrets.

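After you've completed the Gemma setup, move on to the next section, where you'll install the dependencies. The model-loading cell later in this notebook reads a Hugging Face access token from a Colab secret named `HF_API_KEY`, so add that secret as well.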


### Install dependencies

Run the cells below to install the required dependencies:

- Transformers: the latest version from GitHub, which supports Gemma 3
- Accelerate
- DuckDuckGo Search: used as the tool for function calling

In [1]:
!pip install git+https://github.com/huggingface/transformers.git

Collecting git+https://github.com/huggingface/transformers.git
  Resolved https://github.com/huggingface/transformers.git to commit 44715225e3169a5b57645c3a81f8d791cf67154b
  Created wheel for transformers: filename=transformers-4.51.0.dev0-py3-none-any.whl size=11067326 sha256=8cf2a34d40ce8bba53d775f1701f8ee4dc41faabee4454bc8f527a04e22fc176

In [2]:
!pip install accelerate


In [3]:
!pip install duckduckgo-search

Collecting duckduckgo-search
  Downloading duckduckgo_search-7.5.4-py3-none-any.whl.metadata (17 kB)
Installing collected packages: primp, duckduckgo-search
Successfully installed duckduckgo-search-7.5.4 primp-0.14.0


# Gemma 3

Gemma 3 is the successor to Google's earlier Gemma models, with a focus on openly available weights and multimodal (vision) capabilities. It is designed to handle complex tasks with long contexts, multiple languages, and image input, which makes it suitable for a wide range of applications.

Gemma 3 has gained attention for ranking above notable models such as DeepSeek V3, o3-mini, and Llama 3 405B in preliminary human-preference evaluations on the LMArena leaderboard. Its key features are:

- Context length extended from 32K to 128K tokens by raising the RoPE base frequency from 10K to 1M on the global self-attention layers.
- Available in 1B, 4B, 12B, and 27B sizes. All sizes except 1B include vision capability, with bidirectional attention over image tokens.
- Supports 140 languages.
- Supports function calling, where the model generates code or instructions to call external functions, and structured outputs, where it produces data in specific formats such as JSON or tables.

## Load Model weights

In [9]:
import torch
from transformers import AutoProcessor, Gemma3ForConditionalGeneration
from google.colab import userdata


In [5]:
model_id = "google/gemma-3-4b-it"

In [10]:
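HF_API_KEY = userdata.get('HF_API_KEY')  # Hugging Face access token stored as a Colab secret

# Load the model weights and switch to inference mode
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, device_map="auto", token=HF_API_KEY
).eval()

# The processor bundles the tokenizer and the image preprocessor
processor = AutoProcessor.from_pretrained(model_id, token=HF_API_KEY)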


In [11]:
torch.cuda.empty_cache()

## Function Calling with Gemma

Function calling is a feature that allows large language models to interact with external systems, APIs, and tools in a structured way. This feature enables LLMs to not just generate text responses but to invoke specific functions when appropriate based on user queries. At its core, function calling involves describing functions to the LLM and having it intelligently return the function to call along with necessary arguments for particular tasks. For example, an LLM can be configured to retrieve stock price data by calling a specific function when a user asks about the current price of a particular stock.
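As a rough illustration of the stock-price example, the sketch below shows the application side of function calling: a registry of callable tools and a dispatcher that runs whichever function the model names. The `get_stock_price` function, its hardcoded prices, and the JSON call format are all assumptions made up for this example. They are not part of Gemma or any library.

```python
import json

# Hypothetical tool: a real application would query a market-data API here.
_FAKE_PRICES = {"GOOG": 172.5, "AAPL": 210.0}  # made-up values for illustration


def get_stock_price(symbol: str) -> float:
    """Return the latest price for a ticker symbol (stubbed with fake data)."""
    return _FAKE_PRICES[symbol.upper()]


# Registry of functions the model is allowed to call, keyed by name.
TOOLS = {"get_stock_price": get_stock_price}


def dispatch(call_json: str):
    """Run the function named in a JSON-encoded call and return its result."""
    call = json.loads(call_json)
    func = TOOLS[call["name"]]  # KeyError for functions that were never registered
    return func(**call["arguments"])


# Pretend the model answered a stock-price question with this structured call.
model_reply = '{"name": "get_stock_price", "arguments": {"symbol": "goog"}}'
print(dispatch(model_reply))
```

The model only names the function and its arguments. The application decides whether and how to run it.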

In [12]:
from duckduckgo_search import DDGS

def search(query: str):
  """
  Search results to the user query

  Args:
      query: user prompt to fetch search results
  """
  # Fetch up to four text results from DuckDuckGo
  req = DDGS()
  response = req.text(query, max_results=4)
  # Concatenate the result snippets into a single context string
  context = ""
  for result in response:
    context += result['body']
  return context

In [13]:
tools = [search]

In [14]:
query = "which teams played Cricket ICC Champions Trophy 2025 finals and who won?" # This is a real-time query: the event occurred on March 9, 2025

## Define prompt for function calling

Function Calling follows these steps:

- Your application sends a prompt to the LLM along with function definitions
- The LLM analyzes the prompt and decides whether to respond directly or use defined functions
- If using functions, the LLM generates structured arguments for the function call
- Your application receives the function call details and executes the actual function
- The function results are sent back to the LLM
- The LLM provides a final response incorporating the function results
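The steps above can be sketched end to end with a stand-in for the model. Everything here is hypothetical: `fake_llm` is a scripted stub rather than a real LLM, `get_time` returns a canned string, and the `CALL name("arg")` format is invented for this example. The only purpose is to show the order of the six steps.

```python
import re


def get_time(city: str) -> str:
    """Hypothetical tool; returns a canned answer instead of calling a real service."""
    return f"It is 10:00 in {city}"


TOOLS = {"get_time": get_time}


def fake_llm(messages):
    """Stand-in for a model: requests a tool first, then answers from the tool output."""
    last = messages[-1]["content"]
    if "tool_output" in last:
        result = last.split("tool_output:", 1)[1].strip()
        return f"Final answer: {result}."
    return 'CALL get_time("Paris")'


def run(user_prompt: str) -> str:
    messages = [{"role": "user", "content": user_prompt}]    # 1. send the prompt
    reply = fake_llm(messages)                                # 2. model decides
    match = re.fullmatch(r'CALL (\w+)\("(.*)"\)', reply)      # 3. structured call
    if match is None:
        return reply                                          # direct answer, no tool
    name, arg = match.groups()
    result = TOOLS[name](arg)                                 # 4. application runs it
    messages.append({"role": "assistant", "content": reply})
    messages.append({"role": "user", "content": f"tool_output: {result}"})  # 5. send back
    return fake_llm(messages)                                 # 6. final response


print(run("What time is it in Paris?"))
```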

In [15]:
conversation = [
    {
        "role": "system",
        "content": [{"type": "text", "text": """
        You are an expert search assistant. Use `search` to get the most accurate details.
        At each turn, if you decide to invoke any of the function(s), it should be wrapped with ```tool_code```. The python methods described below are imported and available,
        you can only use defined methods. The generated code should be readable and efficient. The response to a method will be wrapped in ```tool_output``` use it to call more tools or generate a helpful, friendly response.
        When using a ```tool_code``` call, think step by step why and how it should be used.

        The following Python methods are available:
        ```python
        def search(query:str):
          "
          search results to the user query

          Args:
             query: user prompt to fetch search results
          "
        ```
        """}]
    },
    {
        "role": "user",
        "content": [
            {"type": "text", "text": query}
        ]
    }
]
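The system prompt above repeats the signature and docstring of `search` by hand, so the two can drift apart. One possible alternative is to generate that part of the prompt from the function objects with the standard `inspect` module. The helper names `describe_tool` and `build_tool_section` are hypothetical, and the `search` stub below exists only so the example runs on its own.

```python
import inspect


def search(query: str):
    """
    Search results to the user query

    Args:
        query: user prompt to fetch search results
    """
    raise NotImplementedError("stub: only the signature and docstring are used here")


def describe_tool(func) -> str:
    """Render a function as a Python stub (signature plus docstring) for a prompt."""
    signature = inspect.signature(func)
    doc = inspect.getdoc(func) or ""
    doc_lines = "\n".join(f"    {line}" if line else "" for line in doc.splitlines())
    return f'def {func.__name__}{signature}:\n    """\n{doc_lines}\n    """'


def build_tool_section(tools) -> str:
    """Join all tool stubs into one fenced block for the system prompt."""
    stubs = "\n\n".join(describe_tool(t) for t in tools)
    fence = "`" * 3  # built at runtime to keep this example's own fence intact
    return f"The following Python methods are available:\n{fence}python\n{stubs}\n{fence}"


print(build_tool_section([search]))
```

The returned string could then be appended to the instruction text before it is placed in the system message.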

In [16]:
inputs = processor.apply_chat_template(
            conversation,
            tools=tools,
            add_generation_prompt=True,
            return_dict=True,
            tokenize=True,
            return_tensors="pt",
).to(model.device, dtype=torch.bfloat16)


In [17]:
outputs = model.generate(**inputs, max_new_tokens=512)

In [18]:
output = processor.decode(outputs[0], skip_special_tokens=True)

In [19]:
# Keep only the text after the last "model" turn marker, which is the model's reply
response = output.split("model")[-1]
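Splitting on the bare word "model" cuts the reply short whenever the reply itself contains that word. The sketch below shows a slightly stricter variant. It assumes that, once special tokens are skipped, the decoded text carries the role name on a line of its own, as in the printed outputs in this notebook. The `extract_reply` helper is hypothetical.

```python
def extract_reply(decoded: str, marker: str = "\nmodel\n") -> str:
    """Return the text after the last turn marker, or the whole string if absent."""
    return decoded.rsplit(marker, 1)[-1].strip()


# A made-up decoded transcript in which the word "model" also appears in the text.
decoded = "user\nWhich model is best?\nmodel\nThe best model depends on your task."

print(repr(decoded.split("model")[-1]))  # naive split keeps only the tail of the reply
print(repr(extract_reply(decoded)))      # marker-based split keeps the full reply
```

Another common approach, which avoids string matching altogether, is to drop the prompt tokens before decoding, for example by slicing `outputs[0]` from `inputs["input_ids"].shape[-1]` onward.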

In [20]:
print(response)


I need to find out which teams played in the final of the ICC Champions Trophy 2025 and who won. Let's use the search tool to get this information.
```tool_code
search(query="ICC Champions Trophy 2025 final teams and winner")
```


## Execute the function calling code

In [23]:
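import io
import re
from contextlib import redirect_stdout

def extract_tool_call(text):
    """Run the first tool_code block found in `text` and return the result as a tool_output block."""
    pattern = r"```tool_code\s*(.*?)\s*```"
    match = re.search(pattern, text, re.DOTALL)
    if match:
        code = match.group(1).strip()
        # Capture anything the call prints to stdout in a string buffer
        f = io.StringIO()
        with redirect_stdout(f):
            # WARNING: eval executes model-generated code; use it only in a trusted sandbox
            result = eval(code)
        output = f.getvalue()
        # Prefer printed output if there is any, otherwise use the return value
        r = result if output == '' else output
        return f'```tool_output\n{r}\n```'
    return None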
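Because `eval` runs whatever code the model emits, a more defensive option is to parse the call with the standard `ast` module and execute it only if it names an allowed function with literal arguments. This is a minimal sketch under those assumptions, not a complete sandbox. The names `run_tool_call` and `fake_search` are hypothetical.

```python
import ast
import re


def run_tool_call(text: str, allowed: dict):
    """Parse one `name(arg=literal, ...)` call from a tool_code block and run it."""
    fence = "`" * 3
    match = re.search(rf"{fence}tool_code\s*(.*?)\s*{fence}", text, re.DOTALL)
    if match is None:
        return None
    node = ast.parse(match.group(1), mode="eval").body
    if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
        raise ValueError("expected a single function call")
    if node.func.id not in allowed:
        raise ValueError(f"function {node.func.id!r} is not allowed")
    # literal_eval accepts only literals, so the arguments cannot execute code
    args = [ast.literal_eval(a) for a in node.args]
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return allowed[node.func.id](*args, **kwargs)


def fake_search(query: str) -> str:
    """Stand-in for the real search tool so the example runs offline."""
    return f"results for: {query}"


fence = "`" * 3
reply = f'Let me search.\n{fence}tool_code\nsearch(query="ICC final")\n{fence}'
print(run_tool_call(reply, {"search": fake_search}))
```

The trade-off is that the model can only make single calls with constant arguments, which is all the `search` tool in this notebook needs.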

In [24]:
keyword = extract_tool_call(response)

In [25]:
keyword

"```tool_output\nOfficial ICC Champions Trophy, 2025 Cricket website - live matches, scores, news, highlights, commentary, rankings, videos and fixtures from the International Cricket Council. ... ICC Men's Player of the Month winner for February 2025 named 12 March, 2025. ICC Champions Trophy, 2025 ... Kohli stars as India march into final | Champions Trophy ...The 2025 ICC Champions Trophy final was a One Day International (ODI) cricket match played at the Dubai International Cricket Stadium on 9 March 2025 to determine the winner of 2025 ICC Champions Trophy.It was played between India and New Zealand.It was the second time that India and New Zealand played a Champions Trophy final against each other, after the 2000 final.The ninth edition of the Champions Trophy 2025 saw India being crowned as the winners on 9th March 2025 after they overcame New Zealand in the final. Several exceptional performers lit up the tournament with the bat and ball. The best of them made it to the Team of

In [26]:
prompt = [{
        "role": "user",
        "content": [
            {"type": "text", "text": f"Results from the search:{keyword} User_query : {query}"}
        ]
    }
]
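The prompt above starts a fresh conversation that holds only the search results and the query. An alternative, sketched below, keeps the earlier turns and appends the model's tool call and the tool output as new messages. The `build_followup` helper is hypothetical, the message texts are placeholders, and the sketch assumes the chat template accepts an `assistant` role for the model's earlier reply.

```python
def build_followup(conversation, model_reply: str, tool_output: str):
    """Return a new message list: prior turns, the model's tool call, then the tool output."""

    def text_message(role, text):
        return {"role": role, "content": [{"type": "text", "text": text}]}

    # Copy the list so the original conversation is left unchanged
    return list(conversation) + [
        text_message("assistant", model_reply),
        text_message("user", tool_output),
    ]


fence = "`" * 3
conversation = [
    {"role": "user", "content": [{"type": "text", "text": "Who won the final?"}]}
]
followup = build_followup(
    conversation,
    model_reply=f'{fence}tool_code\nsearch(query="final winner")\n{fence}',
    tool_output=f"{fence}tool_output\nPlaceholder search result.\n{fence}",
)
for message in followup:
    print(message["role"], "->", message["content"][0]["text"][:30])
```

The resulting list has the same structure as `conversation`, so it can be passed to `processor.apply_chat_template` in the same way.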

In [27]:
final_prompt = processor.apply_chat_template(
            prompt,
            tools=tools,
            add_generation_prompt=True,
            return_dict=True,
            tokenize=True,
            return_tensors="pt",
).to(model.device, dtype=torch.bfloat16)

In [28]:
response = model.generate(**final_prompt, max_new_tokens=512)

In [29]:
print(processor.decode(response[0], skip_special_tokens=True))

user
Results from the search:```tool_output
Official ICC Champions Trophy, 2025 Cricket website - live matches, scores, news, highlights, commentary, rankings, videos and fixtures from the International Cricket Council. ... ICC Men's Player of the Month winner for February 2025 named 12 March, 2025. ICC Champions Trophy, 2025 ... Kohli stars as India march into final | Champions Trophy ...The 2025 ICC Champions Trophy final was a One Day International (ODI) cricket match played at the Dubai International Cricket Stadium on 9 March 2025 to determine the winner of 2025 ICC Champions Trophy.It was played between India and New Zealand.It was the second time that India and New Zealand played a Champions Trophy final against each other, after the 2000 final.The ninth edition of the Champions Trophy 2025 saw India being crowned as the winners on 9th March 2025 after they overcame New Zealand in the final. Several exceptional performers lit up the tournament with the bat and ball. The best of 