# **Function Calling with Llama 3.1 Models**
Welcome to this notebook exploring function calling capabilities with Meta's Llama 3.1 models. Llama 3.1, the latest iteration in the Llama family, introduces native support for function calling, marking a significant advancement in the model's capabilities and potential applications.

![Llamas](imgs/llama-pic.jpeg)

### **What's New in Llama 3.1?**
Llama 3.1 brings several exciting improvements:
- Native Function Calling: Built-in support for generating structured JSON outputs that can be used with various APIs.
- Multilingual Support: Expanded language understanding across 8 languages: English, French, German, Hindi, Italian, Portuguese, Spanish, and Thai.
- Improved Performance: Meta reports benchmark results that are competitive with GPT-4 and Claude 3.5 Sonnet.
- Increased Context Window: The 8B and 70B models now support a 128,000 token context.

### **Function Calling Overview**
Function calling, also known as tool use, allows Llama 3.1 to interact with external tools or APIs by generating structured outputs. This capability enables more complex, multi-step interactions between the model and available tools, enhancing its problem-solving abilities and practical applications.
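
As a minimal sketch of this idea, assume a hypothetical `get_time` tool and a model that emits a JSON object with `name` and `parameters` keys (both are illustrative assumptions, not part of this notebook). The application, not the model, runs the function: it parses the structured output and dispatches to the matching Python callable.

```python
import json

# Hypothetical tool used only for illustration.
def get_time(city: str) -> str:
    return f"It is 12:00 in {city}"

# Registry mapping tool names to Python callables.
TOOLS = {"get_time": get_time}

def dispatch(model_output: str) -> str:
    """Parse a structured tool call and run the matching function."""
    call = json.loads(model_output)
    func = TOOLS[call["name"]]
    return func(**call["parameters"])

# A structured output of the kind a model might produce (assumed shape).
print(dispatch('{"name": "get_time", "parameters": {"city": "Lima"}}'))
```

The result of `dispatch` would then be passed back to the model so that it can compose its final answer.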

### **Amazon Bedrock**
![bedrock](imgs/bedrock-img.png)

Amazon Bedrock is a fully managed service that provides access to a wide range of powerful foundation models (FMs) through a unified API. It offers models from leading AI companies like Mistral, Anthropic, AI21 Labs, Cohere, Stability AI, and Amazon's own Titan models.

- **Unified API**: Bedrock provides a single API endpoint to access different models, simplifying integration and allowing developers to experiment with or switch between models with minimal code changes.
- **Serverless and Fully Managed**: As a fully managed service, Bedrock eliminates the need for users to handle infrastructure management, making it easier to build and deploy generative AI applications.
- **Model Customization**: Users can customize models with their own data using techniques such as fine-tuning and Retrieval Augmented Generation (RAG).
- **Security and Privacy**: Bedrock offers built-in security features, ensuring data privacy and compliance with various standards. It follows best practices like encrypting all data and preventing third parties from accessing user data.
- **Agents for Complex Tasks**: Bedrock allows the creation of agents that can plan and execute multi-step tasks using enterprise systems and data sources.
- **Integration with AWS Services**: Bedrock can be easily integrated with other AWS services and existing applications.
- **Model Evaluation**: A new capability that helps customers assess, compare, and select the best model for their application.
- **Guardrails**: Bedrock provides tools to implement safeguards tailored to application needs and aligned with responsible AI policies.
- **Custom Model Import**: A new feature that allows customers to import and access their own custom models as a fully managed API in Bedrock.
- **Playground Environment**: Bedrock offers Playgrounds (text, chat, and image) to compare models and experiment with different options.

## Preview in the Playground

We can use the Amazon Bedrock chat playground with the Llama 3.1 70B Instruct model as a quick way to demonstrate function calling.  There are two ways of invoking function calling: the ```Python``` method and the ```JSON``` method.  In the ```Python``` method, the model calls one of a small number of functions it has been trained to recognize natively: ```wolfram_alpha```, ```brave_search``` and ```code_interpreter```.  In other words, the models have been trained to recognize when and how to call functions for computation, internet search and code execution.  In the ```JSON``` method, the model responds in a **zero shot** fashion to tool definitions it has not seen before.  Both methods are demonstrated in this notebook.

### Python

![Python function calling](imgs/python-tool-calling-chat-playground.png)
Note the use of the system prompt and the parsimonious nature of the declaration to make the model aware of the tools it can use.  Note also the format of the response, including the new special tag ```<|python_tag|>``` which signifies a tool call.

### JSON

![JSON function calling](imgs/json-tool-calling-chat-playground.png)
Note the use of the user prompt to make the model aware of the tools it can use.  The tool declaration can be placed in the system prompt too and developers will have to test which is most effective.  Note also the format of the response.

At the time of developing this notebook the chat playgrounds used the ```ConverseStream``` API.
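
To make the difference between the two reply styles concrete, here is a rough sketch of how an application might tell them apart. The reply shapes are assumptions: a ```Python``` style call is taken to start with ```<|python_tag|>```, and a ```JSON``` style call is taken to be a JSON object with a `name` key. The helper `classify_response` is hypothetical.

```python
import json

PYTHON_TAG = "<|python_tag|>"

def classify_response(text: str) -> str:
    """Return 'python', 'json' or 'text' for a raw model reply."""
    stripped = text.strip()
    if stripped.startswith(PYTHON_TAG):
        return "python"
    try:
        parsed = json.loads(stripped)
    except json.JSONDecodeError:
        # Not JSON at all, so treat it as an ordinary answer.
        return "text"
    if isinstance(parsed, dict) and "name" in parsed:
        return "json"
    return "text"

print(classify_response('<|python_tag|>brave_search.call(query="weather")'))
print(classify_response('{"name": "get_weather", "parameters": {"city": "Paris"}}'))
print(classify_response("Paris is the capital of France."))
```

Depending on the system prompt, a model may also prefix a JSON call with the tag; this sketch would report that case as `python`, so a real implementation would need to inspect what follows the tag.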

## Brave Search

Brave Search is an independent search engine that does not track user activity.  The Brave Search API has free tiers, although you must provide payment details. To send requests to the API, we need to create an account and obtain an API token.

1) To get an API key, first [register for an account](https://api.search.brave.com/register).
2) After the account is created, navigate to the [Subscriptions page](https://api.search.brave.com/app/subscriptions/subscribe?tab=ai) and select the **Free AI** subscription.  You are required to enter payment details.
3) Once the subscription is created, navigate to the [API Keys](https://api.search.brave.com/app/keys) page and generate an API key, selecting the **Free AI** subscription you just created. This gives you the token that you will use in this notebook.

To begin with, let's check that we can call the Brave Search API and familiarise ourselves with the structure of the response.

In [None]:
%pip install boto3==1.35.11 colorama==0.4.6 brave-search==0.1.8 --quiet

In [None]:
import io
import os
import re
import json
import gzip
from typing import List


import boto3
import getpass
import requests
import urllib.parse
from PIL import Image
from colorama import Fore
from datetime import datetime
from botocore.exceptions import ClientError
from brave import Brave


session = boto3.Session()
region = 'us-west-2'

Before you continue, please add your Brave Search API token below:

In [None]:
api_token = getpass.getpass('Please enter your brave-search-api-token')

In [None]:
# We intend to ask the model a question where the knowledge cannot be in the model's training data.
# In other words, something that has changed recently.
# We will use that question, un-modified, as the query to the search engine.
# You will notice when we integrate with Bedrock that the model will decide the exact search engine query text.
question = "What are the names of the candidates who will contest the US Presidential Election on Tuesday November 5 2024?"

print(f"{Fore.YELLOW}Question: {question}")

query = urllib.parse.quote_plus(question)
safesearch = urllib.parse.quote_plus('strict')
limit_results = 2
query_url = f"https://api.search.brave.com/res/v1/web/search?q={query}&count={limit_results}&safesearch={safesearch}"
headers = {
        "Accept": "application/json",
        "Accept-Encoding": "gzip",
        "X-Subscription-Token": api_token
    }

request = requests.get(query_url, headers=headers)
content = request.content

# Check if the content starts with the gzip magic number
if content.startswith(b'\x1f\x8b'):
    try:
        decompressed_response = gzip.decompress(content)
    except gzip.BadGzipFile:
        # If decompression fails, use the original content
        decompressed_response = content
else:
    # If it's not gzip, use the original content
    decompressed_response = content

# Parse the JSON response
try:
    search_results = json.loads(decompressed_response)
except json.JSONDecodeError:
    print("Failed to parse JSON response")
    search_results = None

# Check if we have valid results
if search_results and 'web' in search_results and 'results' in search_results['web']:
    # Iterate through the search results
    for result in search_results['web']['results']:
        # Access different fields of each result
        title = result.get('title', 'No title')
        url = result.get('url', 'No URL')
        description = result.get('description', 'No description')

        # Print or process the data as needed
        print(f"{Fore.LIGHTBLUE_EX}==================")
        print(f"Title: {title}")
        print(f"URL: {url}")
        print(f"Description: {description}")

else:
    print(f"{Fore.LIGHTRED_EX}===============")
    print("No valid search results found in the response")


There is also a Python package for the Brave Search API which helps to make the request and parse the results.  The following code snippet shows the use of that library to display the same information as above.

In [None]:
brave = Brave(api_token)

question = "What are the names of the candidates who will contest the US Presidential Election on Tuesday November 5 2024?"

print(f"{Fore.YELLOW}Question: {question}")

limit_results = 2

search_results = brave.search(q=question, count=limit_results)

for result in search_results.web_results:
    # Access different fields of each result
    title = result["title"]
    url = result["url"]
    description = result["description"]

    # Print or process the data as needed
    print(f"{Fore.LIGHTBLUE_EX}==================")
    print(f"Title: {title}")
    print(f"URL: {url}")
    print(f"Description: {description}")


Documentation for the Web Search can be found here: https://api.search.brave.com/app/documentation/web-search/get-started.

For the avoidance of doubt, Brave Search returns a set of results from which we extract the title, URL and description.  This is not a coherent natural language response to our question but rather information which will augment the foundation model's answer.  Let's integrate the use of Brave Search into Amazon Bedrock to see how the model uses the tool.
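
As a small illustration of how such results could be turned into text for the model, the sketch below renders a list of result dictionaries as a numbered block. The helper `format_results` and the sample data are hypothetical; the notebook itself simply passes `str(tool_response)` to the model.

```python
from typing import Dict, List

def format_results(results: List[Dict[str, str]], max_chars: int = 200) -> str:
    """Render search results as a numbered plain-text block for a prompt."""
    blocks = []
    for i, r in enumerate(results, start=1):
        # Truncate long descriptions to keep the prompt small.
        description = r.get("description", "")[:max_chars]
        header = f"[{i}] {r.get('title', 'No title')} ({r.get('url', 'No URL')})"
        blocks.append(f"{header}\n{description}")
    return "\n\n".join(blocks)

# Placeholder data, not real search output.
sample = [{"title": "Example", "url": "https://example.com",
           "description": "A placeholder description."}]
print(format_results(sample))
```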

## Integrating into Bedrock

Now that we know how to use the Brave Search API we can integrate it into Amazon Bedrock using the [Amazon Bedrock Converse API](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Converse.html). We will do this in 2 ways. Firstly, we will use the ```function``` calling capability inherent in the Llama 3.1 models.  We've previously referred to this as the ```Python``` method.  Secondly, we will use the Bedrock ```tool``` calling method and rely on the **zero shot** capability of the model.  We've previously referred to this as the ```JSON``` method.

The essential difference is how the LLM notifies the executor that it would like to call a function and then how the executor threads the function call completion back into the conversation.
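
The difference can be sketched as the two follow-up message shapes used later in this notebook. The helper names and the `tooluse_example` identifier are hypothetical; the dictionary layouts follow the Converse API message format used in the cells below.

```python
def python_method_followup(tool_output: str, role: str = "user") -> dict:
    """Python method: the tool output goes back as an ordinary text block."""
    return {"role": role, "content": [{"text": tool_output}]}

def json_method_followup(tool_use_id: str, tool_output: str) -> dict:
    """JSON method: the tool output goes back as a toolResult block that
    references the toolUseId of the model's request."""
    return {
        "role": "user",
        "content": [
            {"toolResult": {"toolUseId": tool_use_id,
                            "content": [{"text": tool_output}]}}
        ],
    }

print(python_method_followup("search results here"))
print(json_method_followup("tooluse_example", "search results here"))
```

In the first case nothing in the message itself marks it as a tool result, whereas in the second the `toolUseId` links the result to a specific request.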

Let's go ahead and set up the conversation with the model in Amazon Bedrock.  Currently the Llama 3.1 models are only available in the 'us-west-2' region.

In [None]:
modelId = 'meta.llama3-1-70b-instruct-v1:0'
# modelId = 'meta.llama3-1-405b-instruct-v1:0'
# modelId = 'meta.llama3-1-8b-instruct-v1:0'

bedrock_client = boto3.client(
    service_name='bedrock-runtime',
    region_name='us-west-2'
)

### Llama Function Calling - Python Method


In [None]:
question = "What are the names of the candidates who will contest the US Presidential Election on Tuesday November 5 2024?"
# Use real newline escapes; "/r/n" would be sent to the model as literal text
system_prompt = "Environment: ipython\n" \
                "Tools: brave_search\n" \
                "Cutting Knowledge Date: December 2023\n" \
                "Today Date: 23 July 2024\n" \
                "You are a helpful assistant."
    
system = [{"text": system_prompt}]
messages = [{"role": "user", "content": [{"text": question}]}]

converse_api_params = {
    "modelId": modelId,
    "system": system,
    "messages": messages,
    "inferenceConfig": {"temperature": 0.0, "maxTokens": 2048},
}

response = bedrock_client.converse(**converse_api_params)
print(f"{Fore.GREEN}Response: {response['output']['message']['content'][0]['text']}")

We can parse the text of the response, recognize the ```<|python_tag|>``` and forward the call to the ```executor```.  Let's write a simple ```executor``` first.

### 1. Writing the executor
The ```executor``` will parse the string response from the LLM which represents a function call and map it to the internal executable representation which will make the call and return the results. 

In [None]:
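class BraveSearchPython:
    """Executor for the Python method: parses a call such as
    brave_search.call(query="...") and runs it against Brave Search."""

    def __init__(self, token, function_call):
        self.api_token = token
        # Per-instance dict; a class-level dict would be shared by every instance
        self.args = {}
        # Take everything between the first "(" and the last ")" so that
        # parentheses inside the query string do not truncate the arguments
        string = function_call[function_call.find("(") + 1:function_call.rfind(")")]
        for match in re.findall(r"(\w+)=(?:'((?:[^'\\]|\\.)*)'"
                                r"|\"((?:[^\"\\]|\\.)*)\"|(\d+))", string):
            key = match[0]
            if match[3]:
                # Unquoted integer argument, e.g. count=5
                value = int(match[3])
            else:
                # Quoted string argument; un-escape embedded quotes
                value = match[1] or match[2]
                value = value.replace("\\'", "'").replace('\\"', '"')
            self.args[key] = value
        # Always limit the number of results for this demonstration
        self.args["count"] = 2

    def call(self, query, count) -> List[dict]:
        brave = Brave(self.api_token)
        search_results = brave.search(q=query, count=count)
        results = []

        for result in search_results.web_results:
            # Truncate descriptions to keep the tool response short
            results.append({'title': result["title"], 'url': result["url"], 'description': result["description"][:200]})

        return results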


### 2. Initialising the executor and calling Amazon Bedrock
If the response contains ```<|python_tag|>``` then extract the function call that follows it and pass it to the ```executor```.

In [None]:
function_invocation_response = response['output']['message']['content'][0]['text']
messages.append(response['output']['message'])

# The tag may be preceded by whitespace, so search for it rather than
# assuming a fixed offset from the start of the text
python_tag = "<|python_tag|>"
if python_tag in function_invocation_response:
    function_call = function_invocation_response.split(python_tag, 1)[1].strip()
else:
    function_call = None

if function_call:
    pythonsearch = BraveSearchPython(api_token, function_call)
    tool_response = getattr(pythonsearch, 'call')(**pythonsearch.args)
    # note the use of the user role here when it is actually the assistant
    messages.append({"role": "user", "content": [{"text": str(tool_response)}]})


print(f"{Fore.BLUE}Messages:\n")
for m in messages:
    print(f"{m}")

response = bedrock_client.converse(**converse_api_params)

print(f"{Fore.GREEN}Response: {response['output']['message']['content'][0]['text']}")

You can see that some implementation details of this method are sub-optimal.  Firstly, parsing the ```<|python_tag|>``` is inelegant and the code is difficult to understand.  Secondly, the turn choreography forces us either to mis-represent the tool results as having been provided by the ```user```, or to leave the ```function_invocation_response``` message out of the messages collection, which mis-represents the flow of the conversation.  Neither is ideal.  We can try again by injecting Llama 3.1 special tokens into the messages.  In particular we will use the ```<|eom_id|>``` [end of message](https://llama.meta.com/docs/model-cards-and-prompt-formats/llama3_1/#built-in-python-based-tool-calling) tag, which indicates that the ```assistant``` is engaged in multi-step reasoning and can therefore support multiple contiguous ```assistant``` messages.
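
One way to make the tag parsing less fragile, offered here only as a sketch, is to let Python's own `ast` module read the call instead of a hand-written regular expression. The helper `parse_python_tool_call` is hypothetical, assumes the reply has the form `<|python_tag|>brave_search.call(query="...")`, and needs Python 3.9 or later for `ast.unparse`. Only literal keyword arguments are read; nothing in the reply is executed.

```python
import ast

PYTHON_TAG = "<|python_tag|>"

def parse_python_tool_call(text: str):
    """Return (function_name, kwargs) for a tagged call, or None if no tag."""
    if PYTHON_TAG not in text:
        return None
    # Keep what follows the tag and drop a trailing end-of-message token.
    source = text.split(PYTHON_TAG, 1)[1].replace("<|eom_id|>", "").strip()
    node = ast.parse(source, mode="eval").body
    if not isinstance(node, ast.Call):
        raise ValueError("expected a single function call")
    name = ast.unparse(node.func)
    # literal_eval accepts only literals such as strings and numbers.
    kwargs = {kw.arg: ast.literal_eval(kw.value)
              for kw in node.keywords if kw.arg is not None}
    return name, kwargs

reply = '\n\n<|python_tag|>brave_search.call(query="US election (2024) candidates", count=3)'
print(parse_python_tool_call(reply))
```

Because the arguments are parsed as Python syntax, quotes and parentheses inside the query string are handled without special cases.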

### 3. Threading the Llama 3.1 Special Tokens
We will clear out all but the original ```user``` message from the message collection and add an edited first ```assistant``` message: the message invoking the tool.  We'll edit that by adding the ```<|eom_id|>``` tag and then play the conversation forward, this time faithfully adding the result of calling the tool as an ```assistant``` message.

In [None]:
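# Keep only the original user message. Slicing creates a new list, so the
# request parameters must be pointed at it or the old messages would be sent.
messages = messages[:1]
converse_api_params["messages"] = messages
special_message = function_invocation_response + '<|eom_id|>'
messages.append({"role": "assistant", "content": [{"text": special_message}]})

# Extract the call that follows the tag, wherever the tag appears in the text
function_call = function_invocation_response.split("<|python_tag|>", 1)[-1].strip()
pythonsearch = BraveSearchPython(api_token, function_call)
tool_response = getattr(pythonsearch, 'call')(**pythonsearch.args)
messages.append({"role": "assistant", "content": [{"text": str(tool_response)}]})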

print(f"{Fore.BLUE}Messages:\n")
for m in messages:
    print(f"{m}")

response = bedrock_client.converse(**converse_api_params)

print(f"{Fore.GREEN}Response: {response['output']['message']['content'][0]['text']}")

### Llama Function Calling - JSON Method

The other way to use function calling in Llama 3.1 is the JSON method.  When we previewed in the console we provided a JSON structure in the user message which described the tool use.  The Amazon Bedrock native way of doing this is to provide the JSON definition in the ```toolConfig``` and make use of the [Amazon Bedrock Tool Use capability](https://docs.aws.amazon.com/bedrock/latest/userguide/tool-use.html) in the Amazon Bedrock Converse API.  That is what we will do now.

### 1. Writing the tool definition
Amazon Bedrock tool use requires a JSON Schema definition which describes the function that can be called and its arguments.  In this definition there is a single required argument and a single optional argument.  Note that there are a lot of similarities between the [JSON that Amazon Bedrock requires](https://docs.aws.amazon.com/bedrock/latest/userguide/tool-use-inference-call.html) to recognize the tool and the [JSON that Llama 3.1 models require](https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_1/#json-based-tool-calling), but they are not exactly the same.

In [None]:
toolConfig = {
  "tools": [
    {
      "toolSpec": {
        "name": "brave_search",
        "description": "search the internet using the brave web search api",
        "inputSchema": {
          "json": {
            "type": "object",
            "properties": {
              "query": {
                "type": "string",
                "description": "User's question"
              },
              "count": {
                "type": "number",
                "description": "the maximum number of results to return"
              }
            },
            "required": ["query"]
          }
        }
      }
    }
  ]
}
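
To illustrate how close the two formats are, the sketch below converts a Llama-style function definition into a Bedrock `toolSpec` entry. The input shape (`type`, `function`, `parameters`) is an assumption based on the Llama 3.1 prompt format documentation linked above, and the helper `llama_to_bedrock` is hypothetical.

```python
def llama_to_bedrock(tool: dict) -> dict:
    """Wrap a Llama-style function definition as a Bedrock toolSpec entry."""
    fn = tool["function"]
    return {
        "toolSpec": {
            "name": fn["name"],
            "description": fn["description"],
            # Bedrock nests the JSON Schema under inputSchema.json
            "inputSchema": {"json": fn["parameters"]},
        }
    }

# Assumed Llama-style definition of the same search tool.
llama_tool = {
    "type": "function",
    "function": {
        "name": "brave_search",
        "description": "search the internet using the brave web search api",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string", "description": "User's question"}},
            "required": ["query"],
        },
    },
}

print(llama_to_bedrock(llama_tool))
```

The JSON Schema for the arguments carries over unchanged; only the wrapper around it differs.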

### 2. Passing the tool definition to Amazon Bedrock
Passing the tool to Amazon Bedrock is a matter of including the ```toolConfig``` in the parameters for the Converse API.  In the presence of the tool, the model formulates a response to the question that is effectively metadata which will be converted to use of the tool by the executor.  Note that the query passed to the tool is not the question the user asked the foundation model.  The model has extracted the salient details for the search engine.

In [None]:
question = 'What are the names of the candidates who will contest the US Presidential Election on Tuesday November 5 2024?'

# Use real newline escapes; "/r/n" would be sent to the model as literal text
system_prompt = "Environment: ipython\n" \
                "Cutting Knowledge Date: December 2023\n" \
                "Today Date: 22 September 2024\n" \
                "You are a helpful assistant. " \
                "If you need to consult external sources please limit yourself to 5"
    
system = [{"text": system_prompt}]
messages = [{"role": "user", "content": [{"text": question}]}]

converse_api_params = {
    "modelId": modelId,
    "system": system,
    "messages": messages,
    "inferenceConfig": {"temperature": 0.0, "maxTokens": 2048},
    "toolConfig": toolConfig
}

response = bedrock_client.converse(**converse_api_params)
print(f"{Fore.GREEN}Response: {response['output']['message']['content'][0]}")

### 3. Writing the executor
The ```executor``` will parse the JSON response from the LLM which represents a function call and map it to the internal executable representation which will make the call and return the results. We've already written one for the ```Python``` method so we are familiar with the process.

In [None]:
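class BraveSearchJson:
    """Executor for the JSON method: takes the toolUse block returned by
    the Converse API and runs the requested search."""

    def __init__(self, token, function_call):
        self.api_token = token
        # Identifier that ties the tool result back to the model's request
        self.toolUseId = function_call['toolUseId']
        # Arguments chosen by the model, already parsed into a dict
        self.args = function_call['input']

    def call(self, query, count=2) -> List[dict]:
        # `count` is optional in the tool definition, so give it a default
        brave = Brave(self.api_token)
        search_results = brave.search(q=query, count=int(count))
        results = []

        for result in search_results.web_results:
            # Truncate descriptions to keep the tool response short
            results.append({'title': result["title"], 'url': result["url"], 'description': result["description"][:200]})

        return results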


### 4. Initialising the executor and calling Amazon Bedrock
If the response is of type ```toolUse``` then gather the function call and pass it to the ```executor```. 

In [None]:
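# The toolUse block is not guaranteed to be the first content block,
# so look for it and fall back to None when the model did not call a tool
content_blocks = response['output']['message']['content']
tool_invocation_response = next((block['toolUse'] for block in content_blocks if 'toolUse' in block), None)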
messages.append(response['output']['message'])

print(tool_invocation_response)
if tool_invocation_response:
    jsonsearch = BraveSearchJson(api_token, tool_invocation_response)
else:
    jsonsearch = None

if jsonsearch:    
    tool_response = getattr(jsonsearch, 'call')(**jsonsearch.args)
    #note the use of the user role to return the tool use result
    messages.append(
                    {
                        "role": "user",
                        "content": [
                            {
                                'toolResult': {
                                    'toolUseId': jsonsearch.toolUseId,
                                    'content': [
                                        {
                                            "text": str(tool_response)
                                        }
                                    ]
                                }
                            }
                        ]
                    }
                )


print(f"{Fore.BLUE}Messages:\n")
for m in messages:
    print(f"{m}")

response = bedrock_client.converse(**converse_api_params)

print(f"{Fore.GREEN}Response: {response['output']['message']['content'][0]['text']}")

There are two things to note here.  Firstly, there appears to be the same problem with turn choreography as we encountered before.  Unfortunately it is not possible to add the ```<|eom_id|>``` tag in the ```toolUse``` response, so we must live with the mis-representation of the ```toolResult``` as coming from the user.  Secondly, we seem to be able to specify the number of results we would like back from Brave Search by using the system prompt.
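
The cells above handle a single tool call by hand. As a sketch of how this might generalise, the loop below keeps answering ```toolUse``` blocks until the response's `stopReason` is no longer `tool_use`. The function `run_tool_loop`, the `tools` registry and the `fake_converse` stand-in are all hypothetical; in the notebook, `bedrock_client.converse` would take the place of `fake_converse`.

```python
from typing import Callable, Dict

def run_tool_loop(converse: Callable[..., dict], params: dict,
                  tools: Dict[str, Callable[..., object]],
                  max_turns: int = 5) -> dict:
    """Call `converse` and answer tool requests until the model stops asking."""
    response = converse(**params)
    for _ in range(max_turns):
        if response.get("stopReason") != "tool_use":
            break
        message = response["output"]["message"]
        params["messages"].append(message)
        results = []
        for block in message["content"]:
            if "toolUse" not in block:
                continue
            use = block["toolUse"]
            output = tools[use["name"]](**use["input"])
            results.append({"toolResult": {"toolUseId": use["toolUseId"],
                                           "content": [{"text": str(output)}]}})
        # Tool results are returned with the user role, as in the cell above.
        params["messages"].append({"role": "user", "content": results})
        response = converse(**params)
    return response

def fake_converse(**kwargs) -> dict:
    """Stand-in for a model: asks for a tool once, then finishes."""
    last = kwargs["messages"][-1]
    if any("toolResult" in b for b in last["content"]):
        return {"stopReason": "end_turn",
                "output": {"message": {"role": "assistant", "content": [{"text": "done"}]}}}
    return {"stopReason": "tool_use",
            "output": {"message": {"role": "assistant", "content": [
                {"toolUse": {"toolUseId": "t1", "name": "echo", "input": {"query": "hi"}}}]}}}

demo_params = {"messages": [{"role": "user", "content": [{"text": "question"}]}]}
final = run_tool_loop(fake_converse, demo_params, {"echo": lambda query: query.upper()})
print(final["output"]["message"]["content"][0]["text"])
```

The `max_turns` limit is a design choice to stop a model that keeps requesting tools from looping indefinitely.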