# Llama 3.1 Tool Calling 

![](./assets-resources/llama-tool-calling-flow.png)

In [15]:
!pip install -q -U transformers accelerate bitsandbytes huggingface

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


^C


In [1]:
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

In [2]:
PROMPT = """ <|begin_of_text|><|start_header_id|>system<|end_header_id|> You are an information extraction tool. Extract all names and dates mentioned in the following text. <|eot_id|> <|start_header_id|>user<|end_header_id|> Text: "John Doe was born on January 1, 1990. Jane Smith graduated on June 15, 2010." <|eot_id|> <|start_header_id|>assistant<|end_header_id|> """

In [3]:
input_ids = tokenizer(PROMPT, return_tensors="pt")
response = model.generate(**input_ids, max_length=512)
extracted_text = tokenizer.decode(response[0], skip_special_tokens=True)
print(extracted_text)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


 system You are an information extraction tool. Extract all names and dates mentioned in the following text.  user Text: "John Doe was born on January 1, 1990. Jane Smith graduated on June 15, 2010."  assistant  Here are the extracted names and dates:

**Names:**

1. John Doe
2. Jane Smith

**Dates:**

1. January 1, 1990
2. June 15, 2010


# Tool-Calling with Llama 3.1

Llama 3.1 can also use tool-calling capabilities to execute specific functions. For instance, you can create a function to execute Python code within a Jupyter Notebook environment. This can be useful for running more complex extraction logic or data processing scripts directly.

In [4]:
SYSTEM_PROMPT = """you are a python data scientist. You run python code to solve tasks. Execute the code in Jupyter Notebook cells."""
tools = [     {         "type": "function",         "function": {             "name": "execute_python",             "description": "Execute python code in a Jupyter notebook cell and returns any result, stdout, stderr, display_data, and error.",             "parameters": {                 "type": "object",                 "properties": {                     "code": {                         "type": "string",                         "description": "The python code to execute in a single cell.",                     }                 },                 "required": ["code"],             },         },     } ]

In [12]:
# !pip install e2b_code_interpreter

In [8]:
SYSTEM_PROMPT = """ <|begin_of_text|><|start_header_id|>system<|end_header_id|> You are an AI assistant with the ability to call external functions to get real-time data. The following functions are available for you to use:  - `get_weather`: Gets the current weather for a given location.   Parameters:   - `location`: The city to get the weather for.  - `get_time`: Gets the current time for a given location.   Parameters:   - `location`: The city to get the time for.  To call a function, use the following syntax: <function=function_name>{"parameter1": "value1", "parameter2": "value2"}</function> <|eot_id|> """
USER_PROMPT = """ <|start_header_id|>user<|end_header_id|> What is the weather in San Francisco? <|eot_id|> """

PROMPT = SYSTEM_PROMPT + USER_PROMPT

In [9]:
def get_weather(location):
    return f"The weather in {location} is sunny."

In [10]:
input_ids = tokenizer(PROMPT, return_tensors="pt")
response = model.generate(**input_ids, max_length=512)
extracted_text = tokenizer.decode(response[0])
print(extracted_text)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<|begin_of_text|> <|begin_of_text|><|start_header_id|>system<|end_header_id|> You are an AI assistant with the ability to call external functions to get real-time data. The following functions are available for you to use:  - `get_weather`: Gets the current weather for a given location.   Parameters:   - `location`: The city to get the weather for.  - `get_time`: Gets the current time for a given location.   Parameters:   - `location`: The city to get the time for.  To call a function, use the following syntax: <function=function_name>{"parameter1": "value1", "parameter2": "value2"}</function> <|eot_id|>  <|start_header_id|>user<|end_header_id|> What is the weather in San Francisco? <|eot_id|> assistant<|end_header_id|>

{get_weather:{"location": "San Francisco"}}<|eot_id|>


## Integrating into a FastAPI app

In [None]:
from fastapi import FastAPI, Request
import uvicorn

app = FastAPI()

@app.post("/chat")
async def chat(request: Request):
    data = await request.json()
    user_message = data['message']
    prompt = f"<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>{user_message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
    input_ids = tokenizer(prompt, return_tensors="pt").to("cuda")
    response = model.generate(**input_ids, max_length=512)
    generated_text = tokenizer.decode(response[0], skip_special_tokens=True)
    return {"response": generated_text}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)