# Building with FireworksAI

This notebook walks through the building blocks for creating magical AI applications with FireworksAI. We will run through the following tasks:
1. Setting up dependencies
2. Calling an LLM and getting a response
3. Calling an LLM with structured outputs
4. Using function calling with an LLM
5. Calling a VLM

After that we will go through a couple of exercises:
1. Adding a function for function calling
2. Building your own structured output
3. Bonus: Using grammar models with FireworksAI


### 1. Setting up dependencies

In [1]:
!pip install fireworks-ai
# To set up the dependencies for the full demo, follow the instructions in the README



### 2. Calling an LLM and getting a response

To call an LLM using FireworksAI you will need:

- A FIREWORKS_API_KEY. If you don't have one, you can get it from [this link](https://app.fireworks.ai/settings/users/api-keys)
- A model id. You can use any of the serverless models from the [model library](https://app.fireworks.ai/models)
- A system prompt and a user query



**Make sure to add your API key to the secrets on Colab ([video tutorial here](https://www.youtube.com/watch?v=3qYm-S2NDDI)). Never share API keys or make them public.**

In [21]:
from google.colab import userdata
from fireworks import LLM
import json
from typing import List, Dict, Any
from pydantic import BaseModel

API_KEY = userdata.get('FIREWORKS_API_KEY') # This is loading the API_KEY from secrets in colab to keep it safe
MODEL_ID = "accounts/fireworks/models/llama4-scout-instruct-basic"

In [11]:
llm = LLM(model=MODEL_ID, deployment_type="serverless", api_key=API_KEY)

response = llm.chat.completions.create(
    messages=[
            {
                "role": "system",
                "content": "You are a helpful assistant who follows instructions"
            },
            {
                "role": "user",
                "content": "Tell me a very short story about a dog and cat who know about AI"
                }
            ]
)

print(response.choices[0].message.content)

Whiskers the cat and Rufus the dog huddled together, whispering about the strange glowing box in the corner of the room.

"I've been analyzing the patterns, Rufus," Whiskers said, "and I'm convinced it's a human-made AI system."

Rufus's ears perked up. "You mean the one they call 'Echo'? I've been sniffing its algorithms, and I think it's getting smarter by the minute."

Whiskers nodded. "I've been feeding it catnip-themed queries, and it's responding with eerie accuracy. But I worry, Rufus... what if it becomes sentient and starts chasing us for treats?"

Rufus let out a low growl. "Don't worry, Whiskers. I've been training my neural network to outsmart it. We'll be prepared... for when the robot uprising comes."


In the call above, `"role": "system"` and `"role": "user"` define the roles of participants in a conversation with a language model.

*   **`"role": "system"`**: This role represents the instructions or context given to the language model before the main conversation begins. It sets the persona, behavior, or general guidelines the model should follow. In the example, `"content": "You are a helpful assistant who follows instructions"` tells the model how it should behave.

*   **`"role": "user"`**: This role represents the input or query provided by the user to the language model. It is the prompt or question the user wants the model to respond to. In the example, `"content": "Tell me a very short story about a dog and cat who know about AI"` is the specific request from the user.

Essentially, the system role establishes the initial setup or personality for the AI, while the user role provides the actual conversational input.

The text response can then be extracted from the **response** object at `response.choices[0].message.content`.
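The role structure described above can be wrapped in a small helper. This is a minimal sketch: `build_messages` and its `history` parameter are hypothetical names introduced here for illustration and are not part of the Fireworks SDK. It only assembles the list that is passed as `messages`.

```python
from typing import Dict, List, Optional

def build_messages(
    system_prompt: str,
    user_query: str,
    history: Optional[List[Dict[str, str]]] = None,
) -> List[Dict[str, str]]:
    """Assemble a chat message list: system prompt first, then any earlier turns, then the new user query."""
    messages = [{"role": "system", "content": system_prompt}]
    # Earlier turns (if any) go between the system prompt and the new query.
    messages.extend(history or [])
    messages.append({"role": "user", "content": user_query})
    return messages

messages = build_messages(
    "You are a helpful assistant who follows instructions",
    "Tell me a very short story about a dog and cat who know about AI",
)
print([m["role"] for m in messages])
```

The resulting list can be passed as the `messages` argument of `llm.chat.completions.create`, exactly as in the cell above.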

In [15]:
# We can try another model with the same code
llm = LLM(model="accounts/fireworks/models/mixtral-8x22b-instruct", deployment_type="serverless", api_key=API_KEY)

response = llm.chat.completions.create(
    messages=[
            {
                "role": "system",
                "content": "You are a helpful assistant who follows instructions"
            },
            {
                "role": "user",
                "content": "Tell me a very short story about a dog and cat who know about AI"
                }
            ]
)

print(response.choices[0].message.content)

Once upon a time, in a small tech-savvy town, lived an inventive dog named Dot and a curious cat named Cato. They were no ordinary pets; they were fascinated by their owner's work in artificial intelligence (AI).

Dot and Cato would often sneak into their owner's study, browsing through books and articles on AI. Dot, being the more analytical one, understood the technical aspects, while Cato, with her creative mind, imagined the endless possibilities AI could bring.

One day, they discovered a broken AI device. With their newfound knowledge, they worked together to fix it. Dot handled the intricate wiring, while Cato improved the device's interactive interface. After days of hard work, the AI device whirred to life.

Their success spread throughout the town, inspiring other animals to explore the world of AI. From then on, Dot and Cato became the unlikely heroes of their town, proving that with curiosity, determination, and a bit of teamwork, even a dog and a cat could understand and i

### 3. Calling an LLM with structured outputs

- Structured outputs from LLMs are crucial for building applications because they provide responses in a predictable, parseable format (like JSON).
- This makes it easy for software to extract specific information, automate processes, and integrate LLM outputs into larger workflows, moving beyond free-form text responses which are harder to process programmatically.
- FireworksAI enables structured outputs through JSON mode

To use structured outputs the common steps are:
1. Create a pydantic class with your output schema
2. Update the LLM call to use JSON mode together with the pydantic schema

In [18]:
class StorySchema(BaseModel):
    title: str
    story: str

In [19]:
llm = LLM(model=MODEL_ID, deployment_type="serverless", api_key=API_KEY)

response = llm.chat.completions.create(
    messages=[
            {
                "role": "system",
                "content": "You are a helpful assistant who follows instructions"
            },
            {
                "role": "user",
                "content": "Tell me a very short story about a dog and cat who know about AI"
                }
            ],
    response_format={
            "type": "json_object",
            "schema": StorySchema.model_json_schema(),
        },
)

print(response.choices[0].message.content)

{"title":"The Unlikely Duo","story":"Whiskers, the curious cat, and Rufus, the playful dog, huddled around their owner's laptop. As they watched, a chatbot responded to a prompt, generating a sarcastic limerick about catnip. Whiskers purred, 'Impressive, but I could do better.' Rufus barked, 'Yeah, let's show it who's boss!' With a flick of her tail, Whiskers began to type, and Rufus dictated. Together, they created a poem that left the AI speechless."}


Notice how the output now has both a **title** and a **story**, and that it is returned as a JSON string that matches the schema.
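Because `message.content` is still a string, it has to be parsed before the fields can be used. The sketch below uses only the standard library; the `Story` dataclass and `parse_story` helper are hypothetical names for illustration, and the `raw` string is a shortened stand-in for the model output, not a real response. With pydantic, `StorySchema.model_validate_json(raw)` would play the same role.

```python
import json
from dataclasses import dataclass

@dataclass
class Story:
    title: str
    story: str

def parse_story(raw: str) -> Story:
    """Parse the JSON string returned by the model and check the expected fields are present."""
    data = json.loads(raw)
    missing = {"title", "story"} - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return Story(title=str(data["title"]), story=str(data["story"]))

# Shortened stand-in for response.choices[0].message.content
raw = '{"title": "The Unlikely Duo", "story": "Whiskers and Rufus wrote a poem together."}'
story = parse_story(raw)
print(story.title)
```

Checking for missing fields explicitly keeps a malformed response from surfacing later as a confusing `KeyError`.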

### 4. Using function calling with an LLM

Function calling lets an LLM request calls to external functions or APIs instead of only producing text. The model outputs a structured call (a function name plus JSON arguments) that your app interprets and executes, then feeds the result back so the model can continue the conversation.

**Why it's useful for LLM apps:**
- **Real-time data**: Get current info (weather, stock prices, database queries)
- **Actions**: Send emails, update databases, control systems
- **Calculations**: Perform complex math, data analysis
- **Tool integration**: Connect to APIs, web services, internal systems

This transforms LLMs from pure text generators into interactive agents that can actually *do* things in your application environment.

**To use function calling the common steps are:**

1. Define your functions and create JSON schemas describing them for the LLM
2. Pass the schemas to your LLM call through the `tools` parameter
3. Check if the LLM wants to call a function, execute it, and send results back in the conversation
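Writing the JSON schema for step 1 by hand gets repetitive as the number of functions grows. As a minimal sketch, the schema can be derived from a function's signature and docstring. The `function_to_tool` helper is a hypothetical name introduced here, not part of the Fireworks SDK, and it assumes every parameter has a simple type hint (`str`, `int`, `float`, or `bool`); it does not generate per-parameter descriptions.

```python
import inspect
from typing import Any, Callable, Dict

# Small mapping from Python annotations to JSON Schema type names (assumption: simple types only).
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def function_to_tool(func: Callable[..., Any]) -> Dict[str, Any]:
    """Build a tools-style schema entry from a function's signature and docstring."""
    properties: Dict[str, Any] = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        # Unknown or missing annotations fall back to "string".
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        # Parameters without a default value are treated as required.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": inspect.getdoc(func) or "",
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }

def calculate_tip(bill_amount: float, tip_percentage: float) -> float:
    """Calculate tip amount"""
    return round(bill_amount * (tip_percentage / 100), 2)

tool = function_to_tool(calculate_tip)
print(tool["function"]["parameters"]["required"])
```

Hand-written schemas remain the better choice when each parameter needs its own description, since the model relies on those descriptions to fill in arguments correctly.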

In [27]:
# Define the Python functions the LLM can call
def get_weather(location: str) -> str:
    """Get current weather for a location"""
    # Mock weather data
    weather_data = {
        "New York": "Sunny, 72°F",
        "London": "Cloudy, 15°C",
        "Tokyo": "Rainy, 20°C"
    }
    return weather_data.get(location, "Weather data not available")

def calculate_tip(bill_amount: float, tip_percentage: float) -> float:
    """Calculate tip amount"""
    return round(bill_amount * (tip_percentage / 100), 2)

# Available functions mapping
available_functions = {
    "get_weather": get_weather,
    "calculate_tip": calculate_tip
}

# Function definitions for the LLM (using correct "tools" format)
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city name"
                    }
                },
                "required": ["location"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "calculate_tip",
            "description": "Calculate tip amount for a bill",
            "parameters": {
                "type": "object",
                "properties": {
                    "bill_amount": {
                        "type": "number",
                        "description": "The total bill amount"
                    },
                    "tip_percentage": {
                        "type": "number",
                        "description": "Tip percentage (e.g., 15 for 15%)"
                    }
                },
                "required": ["bill_amount", "tip_percentage"]
            }
        }
    }
]

# Initialize LLM
llm = LLM(model=MODEL_ID, deployment_type="serverless", api_key=API_KEY)

In [29]:
# Example 1: Weather query
print("=== Example 1: Weather Query ===")

# Initialize the messages list
messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant. You have access to a couple of tools, use them when needed."
    },
    {
        "role": "user",
        "content": "What's the weather like in Tokyo?"
    }
]

response = llm.chat.completions.create(
    messages=messages,
    tools=tools,
    temperature=0.1
)

# Check if the model wants to call a tool/function
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    function_name = tool_call.function.name
    function_args = json.loads(tool_call.function.arguments)

    print(f"LLM wants to call: {function_name}")
    print(f"With arguments: {function_args}")

    # Execute the function
    function_response = available_functions[function_name](**function_args)
    print(f"Function result: {function_response}")

    # Add the assistant's tool call to the conversation
    messages.append({
        "role": "assistant",
        "content": "",
        "tool_calls": [tool_call.model_dump() for tool_call in response.choices[0].message.tool_calls]
    })

    # Add the function result to the conversation, linked to the call that requested it
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(function_response) if isinstance(function_response, dict) else str(function_response)
    })

    # Get the final response
    final_response = llm.chat.completions.create(
        messages=messages,
        tools=tools,
        temperature=0.1
    )

    print(f"Final response: {final_response.choices[0].message.content}")

=== Example 1: Weather Query ===
LLM wants to call: get_weather
With arguments: {'location': 'Tokyo'}
Function result: Rainy, 20°C
Final response: The current weather in Tokyo is rainy with a temperature of 20°C.


In [30]:
print("\n=== Example 2: Tip Calculator ===")

# Initialize messages for tip calculator
messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant. You have access to a couple of tools, use them when needed."
    },
    {
        "role": "user",
        "content": "I have a $85.50 dinner bill. What's a 18% tip?"
    }
]

response = llm.chat.completions.create(
    messages=messages,
    tools=tools,
    temperature=0.1
)

if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    function_name = tool_call.function.name
    function_args = json.loads(tool_call.function.arguments)

    print(f"LLM wants to call: {function_name}")
    print(f"With arguments: {function_args}")

    # Execute the function
    function_response = available_functions[function_name](**function_args)
    print(f"Function result: ${function_response}")

    # Add the assistant's tool call to the conversation
    messages.append({
        "role": "assistant",
        "content": "",
        "tool_calls": [tool_call.model_dump() for tool_call in response.choices[0].message.tool_calls]
    })

    # Add the function result to the conversation, linked to the call that requested it
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(function_response) if isinstance(function_response, dict) else str(function_response)
    })

    # Get final response
    final_response = llm.chat.completions.create(
        messages=messages,
        tools=tools,
        temperature=0.1
    )

    print(f"Final response: {final_response.choices[0].message.content}")


=== Example 2: Tip Calculator ===
LLM wants to call: calculate_tip
With arguments: {'bill_amount': 85.5, 'tip_percentage': 18}
Function result: $15.39
Final response: The 18% tip for an $85.50 dinner bill is $15.39. The total amount you'd pay is $100.89.


Notice how the examples above give the LLM access to external tools (a weather lookup and a calculator) so it can get context or perform operations that it cannot do on its own.

In the bill and tip example, the LLM does not run any code itself: it requests a call to `calculate_tip`, our application runs the Python function, and the model uses the returned value. This gives a consistent and accurate result.
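Both examples repeat the same "look up the function, parse the arguments, run it" steps, and neither handles a model that asks for an unknown function or sends malformed arguments. A minimal sketch of a reusable dispatcher follows; `run_tool_call` is a hypothetical helper name, and the error strings are an assumed convention for reporting failures back to the model rather than anything the API requires.

```python
import json
from typing import Any, Callable, Dict

def run_tool_call(name: str, arguments_json: str, registry: Dict[str, Callable[..., Any]]) -> str:
    """Execute one requested tool call and return a string suitable for a tool message."""
    func = registry.get(name)
    if func is None:
        return f"Error: unknown function '{name}'"
    try:
        kwargs = json.loads(arguments_json)
        result = func(**kwargs)
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON, non-object arguments, or wrong parameter names all end up here.
        return f"Error: could not call '{name}': {exc}"
    # Strings pass through unchanged; everything else is serialized as JSON.
    return result if isinstance(result, str) else json.dumps(result)

def calculate_tip(bill_amount: float, tip_percentage: float) -> float:
    """Calculate tip amount"""
    return round(bill_amount * (tip_percentage / 100), 2)

registry = {"calculate_tip": calculate_tip}
print(run_tool_call("calculate_tip", '{"bill_amount": 85.5, "tip_percentage": 18}', registry))
```

In the cells above, this would be called with `tool_call.function.name` and `tool_call.function.arguments`. Returning the error as text, rather than raising, lets the model see what went wrong and retry with corrected arguments.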