# Local AI Agent from Scratch

This notebook demonstrates how to build a simple function-calling AI agent from scratch using the OpenAI-compatible API format and a local vLLM inference server. It shows how to structure prompts, parse model outputs, and route function calls.

> **NOTE:** Before starting, launch a vLLM server in a separate terminal using your model of choice: `vllm serve <model>`

I am using [Salesforce/xLAM-2-3b-fc-r](https://huggingface.co/Salesforce/xLAM-2-3b-fc-r) due to its small size and high ranking on the [Berkeley Function-Calling Leaderboard](https://gorilla.cs.berkeley.edu/leaderboard.html).  The model is running locally on my Nvidia RTX 3060.

Launch command: `vllm serve Salesforce/xLAM-2-3b-fc-r --enable-auto-tool-choice --tool-parser-plugin ./xlam_tool_call_parser.py --tool-call-parser xlam --tensor-parallel-size 1 --dtype float16 --gpu-memory-utilization 0.8`

The full vLLM launch instructions for this particular model can be found in the [_**Using vLLM for Inference**_ section](https://huggingface.co/Salesforce/xLAM-2-3b-fc-r#using-vllm-for-inference) on the model's Hugging Face page.

In [1]:
import json
import os

from openai import OpenAI
import requests

### Set vLLM endpoint URL and create client

In [2]:
client = OpenAI(base_url="http://localhost:8000/v1", api_key="null")
models = client.models.list()
model_id = models.data[0].id
print(model_id)

Salesforce/xLAM-2-3b-fc-r


### Test chat completion

In [3]:
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "List three NBA teams"},
]

chat_completion = client.chat.completions.create(
    messages=messages,
    model=model_id,
    stream=True
)

for chunk in chat_completion:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)

Here are three NBA teams:
1. Los Angeles Lakers
2. Boston Celtics
3. Golden State Warriors

# Control Flow Overview
**1.** The user submits a query to the LLM.

**2.** The LLM determines whether a tool call is necessary. If so, it decides which function to call and specifies the input parameters.

- The model only returns this information; it does not call the function itself.
    - For example, if the query is `"What's the weather like in SF?"`, the LLM should request the function named `get_current_weather` with the parameters `{"city": "San Francisco", "state": "CA"}`.

**3.** Our code parses the JSON function call and runs the actual `get_current_weather` function with the arguments the LLM provided.

**4.** The tool call and its output are passed back to the LLM as context so it can generate a final response.
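Steps 2 to 4 can be sketched without a running server. In the minimal example below, `model_function_call` is a hand-written stand-in for what the model would return, and `registry` is a hypothetical name for the lookup table that maps tool names to Python functions; neither comes from the OpenAI or vLLM APIs.

```python
import json

# Stand-in for the tool the model can request (same dummy behaviour as this notebook).
def get_current_weather(city: str, state: str) -> str:
    return f"It is 97 degrees Fahrenheit in {city}."

# Step 2: the model only *describes* the call as a name plus JSON-encoded arguments.
# This dict is written by hand for illustration; it is not a real model response.
model_function_call = {
    "name": "get_current_weather",
    "arguments": '{"city": "San Francisco", "state": "CA"}',
}

# Step 3: our code looks up the function by name and runs it with the parsed arguments.
registry = {"get_current_weather": get_current_weather}
func = registry[model_function_call["name"]]
tool_output = func(**json.loads(model_function_call["arguments"]))

# Step 4: this string is what would be sent back to the model as context.
print(tool_output)
```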

### Define tools in OpenAI function calling schema

Two function schemas are defined: `get_current_weather` and `send_email`. Each schema tells the model what the function does and which input arguments it expects. More details on function schemas can be found in the [OpenAI API docs](https://platform.openai.com/docs/guides/function-calling#defining-functions).

In [4]:
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city to find the weather for, e.g. 'San Francisco'",
                    },
                    "state": {
                        "type": "string",
                        "description": "the two-letter abbreviation for the state that the city is"
                        " in, e.g. 'CA' which would mean 'California'",
                    }
                },
                "required": ["city", "state"],
                "additionalProperties": False
            },
            "strict": True
        },
    },
    {
        "type": "function",
        "function": {
            "name": "send_email",
            "description": "Send an email to a given recipient with a subject and message.",
            "parameters": {
                "type": "object",
                "properties": {
                    "to": {
                        "type": "string",
                        "description": "The recipient email address."
                    },
                    "subject": {
                        "type": "string",
                        "description": "Email subject line."
                    },
                    "body": {
                        "type": "string",
                        "description": "Body of the email message."
                    }
                },
                "required": [
                    "to",
                    "subject",
                    "body"
                ],
                "additionalProperties": False
            },
            "strict": True
        }
    }
]
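This notebook does not verify whether the local server enforces `strict`, so a small check on our side before running a tool can be a cheap safeguard. The sketch below is one possible way to do that with the `required`, `additionalProperties`, and `type` fields of a schema. `validate_arguments` and `weather_parameters` are hypothetical names introduced for illustration, and only the `string` type is handled because that is all the schemas above use.

```python
def validate_arguments(parameters: dict, args: dict) -> list[str]:
    """Return a list of problems found in args; an empty list means none were found."""
    problems = []
    properties = parameters.get("properties", {})
    # Every name listed under "required" must be present.
    for key in parameters.get("required", []):
        if key not in args:
            problems.append(f"missing required argument: {key}")
    # With additionalProperties set to False, unknown names are rejected.
    if parameters.get("additionalProperties") is False:
        for key in args:
            if key not in properties:
                problems.append(f"unexpected argument: {key}")
    # Only the "string" type is checked in this sketch.
    for key, value in args.items():
        if properties.get(key, {}).get("type") == "string" and not isinstance(value, str):
            problems.append(f"argument {key} should be a string")
    return problems


# A trimmed copy of the get_current_weather parameters, without descriptions.
weather_parameters = {
    "type": "object",
    "properties": {"city": {"type": "string"}, "state": {"type": "string"}},
    "required": ["city", "state"],
    "additionalProperties": False,
}

print(validate_arguments(weather_parameters, {"city": "San Francisco", "state": "CA"}))
# → []
print(validate_arguments(weather_parameters, {"city": "San Francisco", "zip": 94103}))
# → ['missing required argument: state', 'unexpected argument: zip']
```

A full JSON Schema validator would cover far more cases; this version only illustrates what the three fields mean.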

### Python function definitions for tool execution

The Python functions that perform the tasks described in the tool specifications (fetching current weather data and sending an email) are defined here. The model never runs them itself: our code calls them whenever the model returns a tool call that names one of them.

_These are only dummy functions for simplicity in this example, but in a real use-case these functions would call an API or other method to actually perform an external task._


In [5]:
def get_current_weather(city: str, state: str) -> str:
    # call weather API...
    return f"It is 97 degrees Fahrenheit in {city}."

In [6]:
def send_email(to: str, subject: str, body: str) -> str:
    # actually send email...
    return f"Successfully sent the following email: To: {to}\nSubject: {subject}\n{body}"

### Create system message and user query
Messages are defined as dictionaries: `role` indicates who is speaking (e.g. system, user, assistant), and `content` contains what they say.

The system message will instruct the LLM on how to act and respond.

In [7]:
system_message = {
    "role": "system", 
    "content": "You are a helpful assistant."
}

user_message = {
    "role": "user", 
    "content": "Should I dress for hot or cold weather in SF today?"
}

messages = [
    system_message,
    user_message
]

### Call completion to get tool call parameters

In [8]:
tool_call_response = client.chat.completions.create(
    model=model_id,
    messages=messages,
    tools=tools,
    tool_choice="auto"
)
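With `tool_choice="auto"` the model may also answer directly, in which case the returned message carries text in `content` and no tool calls. A minimal sketch of that check follows. `wants_tool` is a hypothetical helper, and the `SimpleNamespace` objects are hand-built stand-ins that only mimic the shape of `tool_call_response.choices[0].message`, so the example runs without a server.

```python
from types import SimpleNamespace


def wants_tool(message) -> bool:
    """True when the assistant message contains at least one tool call."""
    # Covers both a missing/None field and an empty list.
    return bool(getattr(message, "tool_calls", None))


# Hand-built stand-ins for an assistant message, for illustration only.
with_tool = SimpleNamespace(content=None, tool_calls=[SimpleNamespace(id="call_0")])
without_tool = SimpleNamespace(content="Hello! How can I help?", tool_calls=None)

for message in (with_tool, without_tool):
    if wants_tool(message):
        print("tool calls requested:", len(message.tool_calls))
    else:
        print("plain answer:", message.content)
```

Checking this before iterating over `tool_calls` avoids a `TypeError` when the field is `None`.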

### Run function with tool call parameters and give tool call + tool output back to model as context

In [9]:
messages.append({
    "role": "assistant",
    "tool_calls": tool_call_response.choices[0].message.tool_calls
})

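# Map tool names to the Python functions defined above.
# A lookup table avoids calling eval() on text produced by the model.
available_functions = {
    "get_current_weather": get_current_weather,
    "send_email": send_email,
}

for tool_call in tool_call_response.choices[0].message.tool_calls:
    name = tool_call.function.name
    args = json.loads(tool_call.function.arguments)

    result = available_functions[name](**args)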

    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": str(result)
    })

for m in messages:
    print(m)

{'role': 'system', 'content': 'You are a helpful assistant.'}
{'role': 'user', 'content': 'Should I dress for hot or cold weather in SF today?'}
{'role': 'assistant', 'tool_calls': [ChatCompletionMessageToolCall(id='call_0_f5beb8df42944c9abf139b7f59b452fa', function=Function(arguments='{"city": "San Francisco", "state": "CA"}', name='get_current_weather'), type='function')]}
{'role': 'tool', 'tool_call_id': 'call_0_f5beb8df42944c9abf139b7f59b452fa', 'content': 'It is 97 degrees Fahrenheit in San Francisco.'}


### Run chat completion for final response

In [10]:
stream = client.chat.completions.create(
    model=model_id,
    messages=messages,
    stream=True
)

for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)

Based on the current temperature of 97 degrees Fahrenheit in San Francisco, you should dress for warm weather.
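The cells above handle a single tool round trip. One way to wrap the same steps into a reusable loop is sketched below. `run_agent`, `registry`, and `max_turns` are hypothetical names introduced here, and the function assumes `client` behaves like the `OpenAI` client used in this notebook (a `chat.completions.create` method returning `choices[0].message` with `content` and `tool_calls`). It mirrors the notebook's message format rather than prescribing one.

```python
import json


def run_agent(client, model_id, messages, tools, registry, max_turns=5):
    """Call the model until it answers without requesting a tool; return the text."""
    for _ in range(max_turns):
        response = client.chat.completions.create(
            model=model_id, messages=messages, tools=tools, tool_choice="auto"
        )
        message = response.choices[0].message
        if not message.tool_calls:
            return message.content  # no tool requested: treat this as the final answer

        # Record the assistant's tool request, then run each tool and record its output.
        messages.append({"role": "assistant", "tool_calls": message.tool_calls})
        for tool_call in message.tool_calls:
            func = registry.get(tool_call.function.name)
            if func is None:
                # Report unknown tool names back to the model instead of raising.
                result = f"Unknown tool: {tool_call.function.name}"
            else:
                result = func(**json.loads(tool_call.function.arguments))
            messages.append(
                {"role": "tool", "tool_call_id": tool_call.id, "content": str(result)}
            )
    raise RuntimeError("model kept requesting tools after max_turns")
```

With the objects defined earlier, a call could look like `run_agent(client, model_id, [system_message, user_message], tools, {"get_current_weather": get_current_weather, "send_email": send_email})`. The `max_turns` cap is a design choice to stop a model that keeps requesting tools indefinitely.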

#### References

- https://docs.vllm.ai/en/stable/examples/online_serving/openai_chat_completion_client_with_tools.html
- https://platform.openai.com/docs/guides/function-calling
- https://huggingface.co/Salesforce/xLAM-2-3b-fc-r