# Llama 3.1 Tool Calling

![](./assets-resources/llama-tool-calling-flow.png)

In [15]:
!pip install -q -U huggingface_hub transformers accelerate bitsandbytes


In [2]:
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

In [3]:
PROMPT = """
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
Environment: ipython
Tools: brave_search
Cutting Knowledge Date: December 2023
Today Date: 25 Jul 2024

You are a helpful assistant<|eot_id|>
<|start_header_id|>user<|end_header_id|>

Who won the T20 World Cup?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""

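The prompt above is written by hand. As a minimal sketch, the same layout can be assembled from parameters; `build_prompt` and its arguments are hypothetical names, not part of the original notebook, and the blank line after each header is an assumption based on Meta's published prompt layout (the hand-written prompt uses a single newline in places).

```python
def build_prompt(user_message, tools=("brave_search",),
                 system_message="You are a helpful assistant",
                 today="25 Jul 2024"):
    """Assemble a Llama 3.1 prompt that enables built-in tools (hypothetical helper)."""
    # "Environment: ipython" plus a "Tools:" line is what switches on
    # built-in tool calling in the system turn.
    system_block = (
        "Environment: ipython\n"
        f"Tools: {', '.join(tools)}\n"
        "Cutting Knowledge Date: December 2023\n"
        f"Today Date: {today}\n\n"
        f"{system_message}"
    )
    # The prompt ends with an open assistant header so the model writes the next turn.
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_block}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

print(build_prompt("Who won the T20 World Cup?"))
```

Because this string already starts with `<|begin_of_text|>`, passing `add_special_tokens=False` to the tokenizer should keep it from adding a second one.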
In [6]:
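# The tokenizer prepends its own <|begin_of_text|>, so the decoded output
# below shows that token twice (PROMPT already contains one).
inputs = tokenizer(PROMPT, return_tensors="pt")
response = model.generate(**inputs, max_length=512)
# Decode without skipping special tokens so <|python_tag|> and <|eom_id|> stay visible.
extracted_text = tokenizer.decode(response[0])
print("************")
print(extracted_text)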

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


************
<|begin_of_text|>
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
Environment: ipython
Tools: brave_search
Cutting Knowledge Date: December 2023
Today Date: 25 Jul 2024

You are a helpful assistant<|eot_id|>
<|start_header_id|>user<|end_header_id|>

Who won the T20 World Cup?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
<|python_tag|>brave_search.call(query="T20 World Cup winner")<|eom_id|>


In [7]:
def extract_query(input_string):
    # Assumes the first '=' and the first ')' in the text belong to the tool call.
    start_index = input_string.find('=') + 1
    end_index = input_string.find(')')
    query = input_string[start_index:end_index]
    return query.strip('"')

input_string = '<|python_tag|>brave_search.call(query="T20 World Cup winner")'
print(extract_query(input_string))

T20 World Cup winner
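`extract_query` relies on the first `=` and `)` in the text, so a query that itself contains either character would be cut short. A slightly more defensive sketch uses a regular expression keyed on the `<|python_tag|>` marker and also returns the tool name. `parse_tool_call` and `TOOL_CALL_RE` are hypothetical names, and the pattern only covers the single-argument `tool.call(query="...")` shape seen in the output above.

```python
import re

# Matches the built-in tool-call syntax seen in the model output:
#   <|python_tag|>brave_search.call(query="...")
TOOL_CALL_RE = re.compile(r'<\|python_tag\|>(\w+)\.call\(query="(.*?)"\)', re.DOTALL)

def parse_tool_call(text):
    """Return (tool_name, query) for the last tool call in text, or None if there is none."""
    matches = TOOL_CALL_RE.findall(text)
    return matches[-1] if matches else None

sample = '<|python_tag|>brave_search.call(query="T20 World Cup winner")<|eom_id|>'
print(parse_tool_call(sample))  # ('brave_search', 'T20 World Cup winner')
```

Returning `None` when nothing matches lets the caller treat the output as an ordinary answer rather than a tool request.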


In [9]:
# !pip install duckduckgo-search
from duckduckgo_search import DDGS
import json

tool_args = extract_query(extracted_text)
web_tool_response = json.dumps(DDGS().text(tool_args, max_results=5))
print(web_tool_response)

[{"title": "T20 World Cup Winners List from 2007-2024 - Sportskeeda", "href": "https://www.sportskeeda.com/cricket/t20-world-cup-winners", "body": "Find out which teams have won the T20 World Cup since 2007 and how they achieved their glory. See the details of each edition, including the final results, margins, players, venues, and hosts."}, {"title": "ICC Men's T20 World Cup - Wikipedia", "href": "https://en.wikipedia.org/wiki/ICC_Men's_T20_World_Cup", "body": "The ICC Men's T20 World Cup (formerly the ICC World Twenty20) is a biennial Twenty20 International cricket tournament, ... For the first time, a host nation competed in the final of the ICC World Twenty20. There were 12 participants for the title including Ireland and Afghanistan as 2012 ICC World Twenty20 Qualifier. It was the first time the ..."}, {"title": "India wins men's T20 World Cup, defeating South Africa in dramatic final", "href": "https://www.cnn.com/2024/06/29/sport/india-south-africa-t20-mens-world-cup-final-spt-i

# Putting it all together
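The cells above stop once the search results come back. One way to close the loop is to append the model's tool call and the tool output to the prompt and generate again. The sketch below assumes tool output is returned in an `ipython` turn, as in Meta's Llama 3.1 prompt format; `build_followup_prompt` is a hypothetical helper, and the tool response shown is placeholder data, not real search output.

```python
import json

def build_followup_prompt(prompt, tool_call, tool_response):
    """Append the tool call and its result so the model can write a final answer.

    Hypothetical helper: `prompt` is the text first sent to the model,
    `tool_call` is what the model generated (without <|eom_id|>), and
    `tool_response` is the serialized tool output.
    """
    return (
        f"{prompt}{tool_call}<|eom_id|>"
        "<|start_header_id|>ipython<|end_header_id|>\n\n"
        f"{tool_response}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

# Shortened stand-ins for the notebook's PROMPT, model output, and search results.
prompt = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "Who won the T20 World Cup?<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)
tool_call = '<|python_tag|>brave_search.call(query="T20 World Cup winner")'
tool_response = json.dumps([{"title": "placeholder title", "body": "placeholder body"}])

followup = build_followup_prompt(prompt, tool_call, tool_response)
print(followup)

# With the notebook's tokenizer and model, the second pass might look like:
# inputs = tokenizer(followup, return_tensors="pt", add_special_tokens=False)
# output = model.generate(**inputs, max_new_tokens=256)
# print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

In the notebook, `web_tool_response` would take the place of the placeholder `tool_response`.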

**Optional Exercise:**

Integrating into a FastAPI app?

In [None]:
# from fastapi import FastAPI, Request
# import uvicorn

# app = FastAPI()

# @app.post("/chat")
# async def chat(request: Request):
#     data = await request.json()
#     user_message = data['message']
#     prompt = f"<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>{user_message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
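#     # Keep the inputs on the same device as the model instead of hard-coding "cuda".
#     inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
#     response = model.generate(**inputs, max_length=512)
#     # Decode only the newly generated tokens so the prompt is not echoed back.
#     new_tokens = response[0][inputs["input_ids"].shape[1]:]
#     generated_text = tokenizer.decode(new_tokens, skip_special_tokens=True)
#     return {"response": generated_text}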

# if __name__ == "__main__":
#     uvicorn.run(app, host="0.0.0.0", port=8000)