# Streaming with LangChain

In this example, we introduce LangChain's async streaming, which lets us receive and view tokens as they are generated by the LLM. Streaming is typical in conversational interfaces and can provide a more natural experience for users.

In [2]:
import os
from getpass import getpass

os.environ['LANGSMITH_TRACING'] = 'true'
os.environ['LANGSMITH_ENDPOINT'] = "https://eu.api.smith.langchain.com"
os.environ['LANGSMITH_API_KEY'] =  os.getenv('LANGSMITH_API_KEY') or getpass('Enter your LangSmith API Key: ')
os.environ['LANGSMITH_PROJECT'] = 'LangChain-Streaming'

In [3]:
os.environ["GOOGLE_API_KEY"] = os.getenv("GOOGLE_API_KEY") or getpass(
    "Enter GOOGLE API Key: "
)

from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",       # choose appropriate model name
    temperature=0.0,
    streaming=True
)


## Streaming with `astream`

We will start by creating an async stream from our LLM. We do this within an `async for` loop, which lets us iterate through the chunks of data and use them as soon as the async `astream` method returns them. By printing a pipe character `|` after each chunk, we can see where one chunk ends and the next begins. We set `flush` to `True` to force immediate output to the console, resulting in smoother streaming.

In [4]:
tokens = []

async for token in llm.astream("What is NLP?"):
    tokens.append(token)
    print(token.content, end='|', flush=True)

**NLP stands for Natural Language Processing.**

It's a field of artificial intelligence (AI) that focuses on enabling computers to understand, interpret, and generate human language in a way that is valuable and meaningful. Think of it as the bridge between human| communication and computer understanding.

### The Core Goal of NLP

The primary goal of NLP is to make computers capable of processing and analyzing large amounts of natural language data. This includes both written text and spoken words. Ultimately, it aims to allow computers| to "read" and "comprehend" text and speech in a similar way humans do, and to respond intelligently.

### How it Works (Simplified)

NLP combines computational linguistics (rule-based modeling of human language) with machine learning and deep| learning models. It involves breaking down language into smaller, manageable pieces and then applying algorithms to extract meaning, identify patterns, and make predictions.

### Key Tasks and Components of NLP

Since we appended each token to the `tokens` list, we can also inspect the contents of every token.

Without streaming, we have to wait for the entire output to complete before we see anything:

In [5]:
llm.invoke("What is NLP?")

AIMessage(content='**Natural Language Processing (NLP)** is a subfield of artificial intelligence (AI) that focuses on enabling computers to **understand, interpret, and generate human language** in a way that is both meaningful and useful.\n\nIn simpler terms, NLP is about teaching computers to "read," "understand," and "write" human languages (like English, Spanish, Chinese, etc.) just like we do.\n\nHere\'s a breakdown of what that entails:\n\n1.  **Understanding Human Language:** This is the core challenge. Human language is incredibly complex, full of nuances, ambiguities, context, slang, sarcasm, and grammatical rules that aren\'t always explicit. NLP aims to break down this complexity so computers can:\n    *   **Identify words and their meanings:** Even words with multiple meanings (e.g., "bank" – river bank vs. financial institution).\n    *   **Understand sentence structure (syntax):** How words are arranged to form grammatically correct sentences.\n    *   **Grasp the overal
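The difference between streaming and waiting for the full response can be made concrete with a small timing sketch. It does not call a real model: `fake_astream` is a hypothetical stand-in for `llm.astream`, and the chunk texts and delay are made up for illustration. In a notebook, where an event loop is already running, `await compare()` would replace `asyncio.run(compare())`.

```python
import asyncio
import time


async def fake_astream(chunks, delay=0.05):
    """Hypothetical stand-in for `llm.astream`: yields one chunk per `delay` seconds."""
    for chunk in chunks:
        await asyncio.sleep(delay)  # simulates generation latency
        yield chunk


async def time_to_first_chunk(chunks):
    """Streaming: return the seconds elapsed until the first chunk is available."""
    start = time.perf_counter()
    async for _ in fake_astream(chunks):
        return time.perf_counter() - start


async def time_to_full_response(chunks):
    """Non-streaming: return the seconds elapsed until every chunk is collected."""
    start = time.perf_counter()
    collected = [chunk async for chunk in fake_astream(chunks)]
    return time.perf_counter() - start, "".join(collected)


async def compare():
    chunks = ["NLP ", "stands ", "for ", "Natural ", "Language ", "Processing."]
    first = await time_to_first_chunk(chunks)
    full, text = await time_to_full_response(chunks)
    return first, full, text


first, full, text = asyncio.run(compare())
print(f"first chunk after {first:.2f}s, full response after {full:.2f}s")
```

The total generation time is the same in both cases; streaming only changes how soon the user sees the first piece of output.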

In [6]:
tokens[0]

AIMessageChunk(content="**NLP stands for Natural Language Processing.**\n\nIt's a field of artificial intelligence (AI) that focuses on enabling computers to understand, interpret, and generate human language in a way that is valuable and meaningful. Think of it as the bridge between human", additional_kwargs={}, response_metadata={'safety_ratings': [], 'grounding_metadata': {}, 'model_provider': 'google_genai'}, id='lc_run--302253a8-79eb-4c1f-8181-5b05165a2170', usage_metadata={'input_tokens': 5, 'output_tokens': 1355, 'total_tokens': 1360, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 1304}})

In [7]:
tokens[1]

AIMessageChunk(content=' communication and computer understanding.\n\n### The Core Goal of NLP\n\nThe primary goal of NLP is to make computers capable of processing and analyzing large amounts of natural language data. This includes both written text and spoken words. Ultimately, it aims to allow computers', additional_kwargs={}, response_metadata={'safety_ratings': [], 'grounding_metadata': {}, 'model_provider': 'google_genai'}, id='lc_run--302253a8-79eb-4c1f-8181-5b05165a2170', usage_metadata={'output_tokens': 50, 'input_tokens': 0, 'total_tokens': 50, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 0}})

We can also merge multiple `AIMessageChunk` objects together with the `+` operator, creating a larger chunk:

In [8]:
tokens[0] + tokens[1]

AIMessageChunk(content="**NLP stands for Natural Language Processing.**\n\nIt's a field of artificial intelligence (AI) that focuses on enabling computers to understand, interpret, and generate human language in a way that is valuable and meaningful. Think of it as the bridge between human communication and computer understanding.\n\n### The Core Goal of NLP\n\nThe primary goal of NLP is to make computers capable of processing and analyzing large amounts of natural language data. This includes both written text and spoken words. Ultimately, it aims to allow computers", additional_kwargs={}, response_metadata={'safety_ratings': [], 'grounding_metadata': {}, 'model_provider': 'google_genai'}, id='lc_run--302253a8-79eb-4c1f-8181-5b05165a2170', usage_metadata={'input_tokens': 5, 'output_tokens': 1405, 'total_tokens': 1410, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 1304}})

A word of caution: nothing prevents you from merging tokens in the wrong order, so take care not to serve up any token omelettes:

In [9]:
tokens[4] + tokens[3] + tokens[2] + tokens[1] + tokens[0]

AIMessageChunk(content='   **Tokenization:** Breaking text into individual words or sentences.\n*   **Part-of-Speech Tagging (POS Tagging):** Identifying the grammatical role of each word (noun, verb, adjective, etc.).\n*   **Named learning models. It involves breaking down language into smaller, manageable pieces and then applying algorithms to extract meaning, identify patterns, and make predictions.\n\n### Key Tasks and Components of NLP\n\nNLP encompasses a wide range of tasks, including:\n\n* to "read" and "comprehend" text and speech in a similar way humans do, and to respond intelligently.\n\n### How it Works (Simplified)\n\nNLP combines computational linguistics (rule-based modeling of human language) with machine learning and deep communication and computer understanding.\n\n### The Core Goal of NLP\n\nThe primary goal of NLP is to make computers capable of processing and analyzing large amounts of natural language data. This includes both written text and spoken words. Ultima
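One way to avoid scrambling is to fold the list from left to right rather than writing the additions by hand. The sketch below illustrates the idea with `FakeChunk`, a hypothetical stand-in for `AIMessageChunk` that only models content concatenation; it is not LangChain's class.

```python
from dataclasses import dataclass
from functools import reduce
from operator import add


@dataclass
class FakeChunk:
    """Hypothetical stand-in for `AIMessageChunk`: `+` concatenates content."""
    content: str

    def __add__(self, other: "FakeChunk") -> "FakeChunk":
        return FakeChunk(self.content + other.content)


chunks = [FakeChunk("Natural "), FakeChunk("Language "), FakeChunk("Processing")]

# Folding left to right preserves the order in which the chunks arrived
in_order = reduce(add, chunks)

# Folding the reversed list reproduces the "token omelette"
scrambled = reduce(add, reversed(chunks))

print(repr(in_order.content))
print(repr(scrambled.content))
```

Since the real chunks support `+` as shown above, the same `reduce(add, tokens)` call should work on the `tokens` list, merging the chunks in the order they were appended.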

## Streaming with Agents


Streaming with agents, particularly the custom agent executor, is a little more complex. Let's begin by constructing a simple agent executor matching what we built in the agentsIntro.

To construct the agent executor we need:

* Tools
* `ChatPromptTemplate`
* Our LLM (already defined with `llm`)
* An agent
* Finally, the agent executor

Let's start defining each.

### Tools

Now we will define a few tools to be used by an async agent executor. Our goals for tool use with regard to streaming are:

* The tool-use steps will be streamed in one big chunk, i.e. we do not return the tool-use information token-by-token; instead, it streams message-by-message.

* The final LLM output _will_ be streamed token-by-token, as we saw above.

For this, we need to define a few math tools and our final answer tool.

In [11]:
from langchain_core.tools import tool

@tool
def add(x: float, y: float) -> float:
    """Add 'x' and 'y'."""
    return x + y

@tool
def multiply(x: float, y: float) -> float:
    """Multiply 'x' and 'y'."""
    return x * y

@tool
def exponentiate(x: float, y: float) -> float:
    """Raise 'x' to the power of 'y'."""
    return x ** y

@tool
def subtract(x: float, y: float) -> float:
    """Subtract 'x' from 'y'."""
    return y - x

@tool
def final_answer(answer: str, tools_used: list[str]) -> dict:
    """Use this tool to provide a final answer to the user.
    The answer should be in natural language as this will be provided
    to the user directly. The tools_used must include a list of tool
    names that were used within the `scratchpad`. You MUST use this tool
    to conclude the interaction.
    """
    return {"answer": answer, "tools_used": tools_used}

We'll need all of our tools in a list when defining our `agent` and `agent_executor`.

In [12]:
tools = [add, multiply, exponentiate, subtract, final_answer]

### `ChatPromptTemplate`

We will create our `ChatPromptTemplate`, using a system message, chat history, user input, and a scratchpad for intermediate steps.

In [13]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages([
    ("system", (
        "You're a helpful assistant. When answering a user's question "
        "you should first use one of the tools provided. After using a "
        "tool the tool output will be provided back to you. You MUST "
        "then use the final_answer tool to provide a final answer to the user. "
        "DO NOT use the same tool more than once."
    )),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

### Agent

As before, we will define our `agent` with LCEL.

In [14]:
from langchain_core.runnables.base import RunnableSerializable

tools = [add, subtract, multiply, exponentiate, final_answer]

# define the agent runnable
agent: RunnableSerializable = (
    {
        "input": lambda x: x["input"],
        "chat_history": lambda x: x["chat_history"],
        "agent_scratchpad": lambda x: x.get("agent_scratchpad", [])
    }
    | prompt
    | llm.bind_tools(tools, tool_choice="any")
)
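To build some intuition for what the `|` operator does in the chain above, here is a toy sketch of pipe-style composition. `Step` is a hypothetical class written for this illustration and is not how LangChain implements runnables; the three stages are made-up placeholders for the input mapping, the prompt, and the model.

```python
class Step:
    """Toy runnable: wraps a function and supports `|` composition."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # `a | b` returns a new step that feeds a's output into b
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value):
        return self.func(value)


# Hypothetical stages mirroring "input mapping | prompt | model"
select_inputs = Step(
    lambda x: {"input": x["input"], "chat_history": x.get("chat_history", [])}
)
format_prompt = Step(
    lambda x: f"history={len(x['chat_history'])} question={x['input']}"
)
fake_model = Step(lambda prompt_text: prompt_text.upper())

chain = select_inputs | format_prompt | fake_model
result = chain.invoke({"input": "What is 10 + 10?"})
print(result)
```

In the real chain, the leading dictionary plays the role of `select_inputs`: each key is computed from the incoming input before the result is handed to `prompt`.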

### Agent Executor

Finally, we will create the agent executor.

In [15]:
import json
from langchain_core.messages import BaseMessage, HumanMessage, AIMessage, ToolMessage

# create tool name to function mapping
name2tool = {tool.name: tool.func for tool in tools}

class CustomAgentExecutor:
    chat_history: list[BaseMessage]

    def __init__(self, max_iterations: int = 3):
        self.max_iterations = max_iterations
        self.chat_history = []
        self.agent: RunnableSerializable = agent

    def invoke(self, input: str) -> str:
        # invoke the agent iteratively in a loop until we reach a
        # final answer, which is returned as a JSON string

        count = 0
        agent_scratchpad = []
        while count < self.max_iterations:
            # invoke a step for the agent to generate a tool call
            tool_call = self.agent.invoke({
                    "input": input,
                    "chat_history": self.chat_history,
                    "agent_scratchpad": agent_scratchpad
                })

            agent_scratchpad.append(tool_call) # add tool call to scratchpad

            # execute the tool and add its output to the agent scratchpad
            tool_name = tool_call.tool_calls[0]["name"]
            tool_args = tool_call.tool_calls[0]["args"]
            tool_call_id = tool_call.tool_calls[0]["id"]

            tool_execution_output = name2tool[tool_name](**tool_args)
        
            # add the tool output to the agent scratchpad (as a string)
            tool_message = ToolMessage(
                content=str(tool_execution_output),
                tool_call_id=tool_call_id
            )
            agent_scratchpad.append(tool_message)
            # add a print so we can see intermediate steps
            print(f"{count}: {tool_name}({tool_args}) => {tool_execution_output}")
            count += 1

            # if the tool call is the final answer tool, we stop
            if tool_call.tool_calls[0]["name"] == "final_answer":
                break

        # add the final output to the chat history
        final_output = tool_execution_output['answer']
        self.chat_history.extend([
            HumanMessage(content=input),
            AIMessage(content=final_output)
        ])

        # return the final answer as a JSON string
        return json.dumps(tool_execution_output)



Our `agent_executor` is now ready to use. Let's quickly test it before adding streaming.

In [16]:
agent_executor = CustomAgentExecutor()
response = agent_executor.invoke(
    "What is 10 + 10?"
)

0: add({'y': 10, 'x': 10}) => 20
1: final_answer({'tools_used': ['add'], 'answer': '10 + 10 = 20'}) => {'answer': '10 + 10 = 20', 'tools_used': ['add']}


In [17]:
response

'{"answer": "10 + 10 = 20", "tools_used": ["add"]}'
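The control flow of the executor can be hard to see among the LangChain types, so the following sketch reduces it to plain Python. Everything here is a simplification for illustration: `scripted_agent` is a hypothetical stand-in for the LLM that returns hard-coded tool calls, and the scratchpad holds plain dictionaries rather than messages.

```python
def add(x: float, y: float) -> float:
    return x + y


def final_answer(answer: str, tools_used: list[str]) -> dict:
    return {"answer": answer, "tools_used": tools_used}


name2tool = {"add": add, "final_answer": final_answer}


def scripted_agent(scratchpad: list) -> dict:
    """Hypothetical stand-in for the LLM: picks the next tool call."""
    if not scratchpad:
        return {"name": "add", "args": {"x": 10, "y": 10}}
    last_result = scratchpad[-1]["output"]
    return {
        "name": "final_answer",
        "args": {"answer": f"10 + 10 = {last_result}", "tools_used": ["add"]},
    }


def run_loop(max_iterations: int = 3):
    scratchpad = []
    output = None
    for _ in range(max_iterations):
        call = scripted_agent(scratchpad)                 # ask for a tool call
        output = name2tool[call["name"]](**call["args"])  # execute the tool
        scratchpad.append({"call": call, "output": output})  # record the result
        if call["name"] == "final_answer":                # stop on final_answer
            break
    return output


result = run_loop()
print(result)
```

The `max_iterations` bound matters because nothing else guarantees that the model will ever choose `final_answer`.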

Let's modify our `agent_executor` to use streaming and parse the streamed output into a format that we can more easily work with.

First, when streaming with our custom agent executor, we need to pass our callback handler to the agent on every new invocation. To make this simpler, we make `callbacks` a configurable field. We can then use the `with_config` method to attach the callback handler to the agent each time we invoke it.

In [18]:
from langchain_core.runnables import ConfigurableField

llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    temperature=0.0,
    streaming=True
).configurable_fields(
    callbacks=ConfigurableField(
        id="callbacks",
        name="callbacks",
        description="A list of callbacks to use for streaming"
    )
)

Callback:
* A callback is a function (or an object with a method) that you pass to other code so that it can "call back" later when something happens. It's a way to customize what happens when an event occurs, and it is usually passed as a parameter.

Callback handler:
* In practice, a callback handler is any function or object that listens for these callbacks and does the actual work when they are triggered (for example, printing, logging, updating a UI, or streaming output).

Everyday analogy:
* You give someone your phone number (callback). When dinner is ready (event), they call you (handle the callback) so you can come eat (react to the event).

The code above makes the `callbacks` field of the Gemini LLM configurable, so you (or others) can plug in custom actions for model events such as streaming output or tracing.
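The distinction between a callback and a callback handler can be sketched in a few lines of plain Python. The names `generate` and `CollectingHandler` are hypothetical and exist only for this illustration; LangChain's own handler interface has more methods and is asynchronous.

```python
from typing import Callable


def generate(words: list[str], on_token: Callable[[str], None]) -> None:
    """Hypothetical producer: invokes the callback once per generated word."""
    for word in words:
        on_token(word)  # the "call back" happens here


class CollectingHandler:
    """Callback handler: an object whose method reacts to each event."""

    def __init__(self) -> None:
        self.seen: list[str] = []

    def on_token(self, token: str) -> None:
        self.seen.append(token)


handler = CollectingHandler()
generate(["dinner", "is", "ready"], on_token=handler.on_token)
print(handler.seen)
```

The producer knows nothing about what the handler does with each token, which is what makes it possible to swap printing for queueing without touching the model code.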

We reinitialize our `agent`; the definition itself is unchanged:

In [19]:
# define the agent runnable
agent: RunnableSerializable = (
    {
        "input": lambda x: x["input"],
        "chat_history": lambda x: x["chat_history"],
        "agent_scratchpad": lambda x: x.get("agent_scratchpad", [])
    }
    | prompt
    | llm.bind_tools(tools, tool_choice="any")
)

Now, we will define our _custom_ callback handler. This will be a queue callback handler that streams the agent's output through an `asyncio.Queue` object, so that tokens can be yielded elsewhere in our code as they are generated.

#### asyncio
`asyncio` is Python's built-in framework for writing concurrent programs using the `async`/`await` syntax, optimized for I/O-bound tasks like network calls, file/socket I/O, and subprocess management within a single thread via an event loop.

* Coroutines: Define asynchronous functions with `async def` and suspend/resume them with `await`, allowing other coroutines to run while one is waiting on I/O.
* Task creation: `asyncio.create_task(coro)` schedules a coroutine to run concurrently; use `await` on the task or `asyncio.gather` to await multiple tasks.
* Non-blocking waits: Use `await asyncio.sleep(n)` as a placeholder for I/O waits; never use `time.sleep` in async code, as it blocks the event loop.
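The three points above can be seen together in a short, self-contained sketch. The `fetch` coroutine and its delays are made up for illustration. In a notebook, where an event loop is already running, `await main()` would replace `asyncio.run(main())`.

```python
import asyncio
import time


async def fetch(name: str, seconds: float) -> str:
    """Simulated I/O call; `asyncio.sleep` yields control to the event loop."""
    await asyncio.sleep(seconds)
    return f"{name} done"


async def main() -> tuple[list[str], float]:
    start = time.perf_counter()
    # create_task schedules the coroutine on the event loop
    background = asyncio.create_task(fetch("task", 0.2))
    # gather runs both coroutines concurrently and preserves argument order
    results = await asyncio.gather(fetch("a", 0.2), fetch("b", 0.2))
    results.append(await background)
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(main())
print(results, f"{elapsed:.2f}s")
```

All three waits overlap, so the elapsed time is close to a single 0.2 second wait rather than the 0.6 seconds that sequential calls would take.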


In [21]:
import asyncio
from langchain_core.callbacks.base import AsyncCallbackHandler


class QueueCallbackHandler(AsyncCallbackHandler):
    """Callback handler that puts tokens into a queue."""

    def __init__(self, queue: asyncio.Queue):
        self.queue = queue
        self.final_answer_seen = False

    async def __aiter__(self):
        while True:
            if self.queue.empty():
                await asyncio.sleep(0.1)
                continue
            token_or_done = await self.queue.get()

            if token_or_done == "<<DONE>>":
                # this means we're done
                return
            if token_or_done == "<<STEP_END>>":
                yield "<<STEP_END>>"
                continue
            if token_or_done:
                yield token_or_done


    async def on_llm_new_token(self, *args, **kwargs) -> None:
        """Put new token in the queue."""
        #print(f"on_llm_new_token: {args}, {kwargs}")
        chunk = kwargs.get("chunk")
        if chunk:
            # check for a final_answer tool call; we read the normalized
            # `tool_call_chunks` field because Gemini reports tool calls under
            # `additional_kwargs["function_call"]`, not `"tool_calls"`
            tool_call_chunks = getattr(chunk.message, "tool_call_chunks", None)
            if tool_call_chunks and tool_call_chunks[0].get("name") == "final_answer":
                # this will allow the stream to end on the next `on_llm_end` call
                self.final_answer_seen = True
        self.queue.put_nowait(chunk)
        return

    async def on_llm_end(self, *args, **kwargs) -> None:
        """Put Done in the queue to signal completion."""
        #print(f"on_llm_end: {args}, {kwargs}")
        # this should only be used at the end of our agent execution, however LangChain
        # will call this at the end of every tool call, not just the final tool call
        # so we must only send the "done" signal if we have already seen the final_answer
        # tool call
        if self.final_answer_seen:
            self.queue.put_nowait("<<DONE>>")
        else:
            self.queue.put_nowait("<<STEP_END>>")
        return
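The queue-and-sentinel pattern behind this handler can be tried in isolation, without a model. In the sketch below, `producer` is a hypothetical stand-in for the callback methods that fill the queue, and the items are made up. Note one deliberate difference from the handler above: the consumer awaits `queue.get()` directly, which suspends until an item arrives, instead of polling `queue.empty()` with a sleep.

```python
import asyncio

DONE = "<<DONE>>"
STEP_END = "<<STEP_END>>"


async def producer(queue: asyncio.Queue) -> None:
    """Hypothetical stand-in for the callback methods filling the queue."""
    for item in ["Calling add", STEP_END, "10 + 10 ", "= 20", DONE]:
        await asyncio.sleep(0.01)  # simulates tokens arriving over time
        queue.put_nowait(item)


async def consume(queue: asyncio.Queue):
    """Yield items until the DONE sentinel arrives."""
    while True:
        item = await queue.get()  # waits for the next item without polling
        if item == DONE:
            return
        yield item


async def main() -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    task = asyncio.create_task(producer(queue))
    received = [item async for item in consume(queue)]
    await task
    return received


received = asyncio.run(main())
print(received)
```

The `<<STEP_END>>` marker is passed through to the consumer, while `<<DONE>>` ends the iteration and is never yielded, matching the behavior of `__aiter__` above.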

We can see how this works together in our `agent` invocation:

In [22]:
queue = asyncio.Queue()
streamer = QueueCallbackHandler(queue)

tokens = []

async def stream(query: str):
    response = agent.with_config(
        callbacks=[streamer]
    )
    async for token in response.astream({
        "input": query,
        "chat_history": [],
        "agent_scratchpad": []
    }):
        tokens.append(token)
        print(token, flush=True)

await stream("What is 10 + 10")

content='' additional_kwargs={'function_call': {'name': 'add', 'arguments': '{"y": 10, "x": 10}'}} response_metadata={'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash', 'safety_ratings': [], 'grounding_metadata': {}, 'model_provider': 'google_genai'} id='lc_run--b8720420-4f39-4c94-9170-93f7c23025cf' tool_calls=[{'name': 'add', 'args': {'y': 10, 'x': 10}, 'id': '3e8510e2-1a95-4d3a-9a36-e1e69d3499ca', 'type': 'tool_call'}] usage_metadata={'input_tokens': 392, 'output_tokens': 77, 'total_tokens': 469, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 57}} tool_call_chunks=[{'name': 'add', 'args': '{"y": 10, "x": 10}', 'id': '3e8510e2-1a95-4d3a-9a36-e1e69d3499ca', 'index': None, 'type': 'tool_call_chunk'}]
content='' additional_kwargs={} response_metadata={} id='lc_run--b8720420-4f39-4c94-9170-93f7c23025cf' chunk_position='last'


In [23]:
tokens[0].tool_calls

[{'name': 'add',
  'args': {'y': 10, 'x': 10},
  'id': '3e8510e2-1a95-4d3a-9a36-e1e69d3499ca',
  'type': 'tool_call'}]

In [24]:
tk = tokens[0]

for token in tokens[1:]:
    tk += token

tk

AIMessageChunk(content='', additional_kwargs={'function_call': {'name': 'add', 'arguments': '{"y": 10, "x": 10}'}}, response_metadata={'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash', 'safety_ratings': [], 'grounding_metadata': {}, 'model_provider': 'google_genai'}, id='lc_run--b8720420-4f39-4c94-9170-93f7c23025cf', tool_calls=[{'name': 'add', 'args': {'y': 10, 'x': 10}, 'id': '3e8510e2-1a95-4d3a-9a36-e1e69d3499ca', 'type': 'tool_call'}], usage_metadata={'input_tokens': 392, 'output_tokens': 77, 'total_tokens': 469, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 57}}, tool_call_chunks=[{'name': 'add', 'args': '{"y": 10, "x": 10}', 'id': '3e8510e2-1a95-4d3a-9a36-e1e69d3499ca', 'index': None, 'type': 'tool_call_chunk'}], chunk_position='last')

The output now arrives as a stream of chunks. Because we're being streamed a tool call, the `content` field is empty. Instead, the tool call appears in the `tool_calls` field (with `name`, `args`, and `id`) and in `additional_kwargs['function_call']`. In this run, Gemini delivered the whole tool call in a single chunk, followed by an empty closing chunk.

In [25]:
from langchain_core.messages import BaseMessage, HumanMessage, AIMessage, ToolMessage
from langchain_core.runnables import RunnableSerializable
import json
import asyncio

class CustomAgentExecutor:
    chat_history: list[BaseMessage]

    def __init__(self, max_iterations: int = 3):
        self.chat_history = []
        self.max_iterations = max_iterations
        self.agent: RunnableSerializable = (
            {
                "input": lambda x: x["input"],
                "chat_history": lambda x: x["chat_history"],
                "agent_scratchpad": lambda x: x.get("agent_scratchpad", [])
            }
            | prompt
            | llm.bind_tools(tools, tool_choice="any")
        )

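    async def invoke(self, input: str, streamer: QueueCallbackHandler, verbose: bool = False):
        # async generator that yields chunks as they stream; the `streamer`
        # argument is accepted but not used, since each step creates its own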
        count = 0
        agent_scratchpad = []
        final_answer = None
        tool_args_final = {}

        while count < self.max_iterations:
            # Create a new queue and streamer for each agent step
            step_queue = asyncio.Queue()
            step_streamer = QueueCallbackHandler(step_queue)

            response = self.agent.with_config(callbacks=[step_streamer])

            current_tool_call = None
            # Consume tokens directly from the astream generator
            async for token in response.astream({
                "input": input,
                "chat_history": self.chat_history,
                "agent_scratchpad": agent_scratchpad
            }):
                # `astream` yields message chunks, so the two sentinel checks
                # below are purely defensive: the "<<DONE>>" and "<<STEP_END>>"
                # strings are only ever placed on the callback handler's queue
                if token == "<<DONE>>":
                    break
                if token == "<<STEP_END>>":
                    yield "<<STEP_END>>"
                    continue  # move on to the next chunk in this step

                yield token  # pass the chunk on to the consumer

                # Accumulate tool-call chunks to process once this step's stream ends
                if hasattr(token, "tool_calls") and token.tool_calls:
                    if current_tool_call is None:
                        current_tool_call = token
                    else:
                        current_tool_call += token

            # Process the accumulated tool call after the step's stream has finished
            if current_tool_call and current_tool_call.tool_calls:
                # `tool_call_id` belongs on `ToolMessage`, so it is not passed to
                # `AIMessage`; this assumes one tool call per step for simplicity
                tool_call_message = AIMessage(
                    content=current_tool_call.content,
                    tool_calls=current_tool_call.tool_calls,
                )
                agent_scratchpad.append(tool_call_message)

                # Extract tool info
                tool_name = tool_call_message.tool_calls[0]["name"]
                tool_args = tool_call_message.tool_calls[0]["args"]
                tool_call_id = tool_call_message.tool_calls[0]["id"]
                tool_args_final = tool_args  # keep track of the most recent tool args

                if verbose:
                    print(f"Executing tool: {tool_name} with args: {tool_args}", flush=True)

                # If this is final_answer, we are done
                if tool_name == "final_answer":
                    final_answer = tool_args.get("answer", "")
                    # The final answer was already streamed as part of this step,
                    # so we skip executing final_answer as a Python function and
                    # only update the chat history
                    self.chat_history.extend([
                        HumanMessage(content=input),
                        AIMessage(content=final_answer)
                    ])
                    return  # end the generator

                # Execute the real tool
                if tool_name not in name2tool:
                    tool_out = f"Error: Unknown tool {tool_name}"
                else:
                    try:
                        tool_out = name2tool[tool_name](**tool_args)
                    except Exception as e:
                        tool_out = f"Error executing {tool_name}: {str(e)}"

                # Add the tool result to the scratchpad
                tool_exec = ToolMessage(
                    content=str(tool_out),
                    tool_call_id=tool_call_id
                )
                agent_scratchpad.append(tool_exec)
                count += 1
            else:
                # If no tool call was made in a step (shouldn't happen with tool_choice="any")
                # Treat the content as the final answer or an error
                final_answer = current_tool_call.content if current_tool_call else "Agent did not produce a tool call."
                self.chat_history.extend([
                    HumanMessage(content=input),
                    AIMessage(content=final_answer)
                ])
                yield final_answer # Yield the final content
                return # End the generator


        # If loop ended without final_answer (max iterations reached)
        if final_answer is None:
            final_answer = "Max iterations reached without final answer"
            self.chat_history.extend([
                HumanMessage(content=input),
                AIMessage(content=final_answer)
            ])
            yield final_answer # Yield the final content

        # No explicit return needed for generator end

agent_executor = CustomAgentExecutor(max_iterations=5)

Although this seems like a lot of work, we're now streaming tokens in a way that allows us to pass these tokens on to other parts of our code - such as through a websocket, streamed API response, or some downstream processing.
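As one illustration of passing tokens downstream, the sketch below wraps a token stream in Server-Sent Events frames, a common format for streamed API responses. It is only a sketch under stated assumptions: `fake_token_stream` is a hypothetical stand-in for the executor's async generator, the JSON payload shape is made up, and the closing `[DONE]` frame is a convention chosen here rather than something the code above requires.

```python
import asyncio
import json


async def fake_token_stream():
    """Hypothetical stand-in for the executor's async generator."""
    for token in ["10 + 10 ", "is ", "20"]:
        await asyncio.sleep(0.01)
        yield token


async def to_sse(tokens):
    """Wrap each token in an SSE frame: a `data:` line followed by a blank line."""
    async for token in tokens:
        yield f"data: {json.dumps({'token': token})}\n\n"
    yield "data: [DONE]\n\n"  # end-of-stream marker (assumed convention)


async def main() -> list[str]:
    return [frame async for frame in to_sse(fake_token_stream())]


frames = asyncio.run(main())
print("".join(frames), end="")
```

Because `to_sse` is itself an async generator, it can be handed to any web framework that accepts an async iterator as a streaming response body.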

Let's try this out. We'll put together some simple post-processing to format the streamed output from our agent more nicely.

In [26]:
agent_executor = CustomAgentExecutor(max_iterations=5) # Re-initialize after modification

async for token in agent_executor.invoke("What is 10 + 10 multiplied by 5 with power 2", QueueCallbackHandler(asyncio.Queue())):
    if token == "<<STEP_END>>":
        print("\n<<STEP_END>>\n", flush=True)
        continue

    # Check if token has tool_calls (LangChain normalized format)
    if hasattr(token, "tool_calls") and token.tool_calls:
        tool_call = token.tool_calls[0]
        # if we have a tool name, we'll print it
        if tool_name := tool_call.get("name"):
            print(f"Calling {tool_name}...", flush=True)
        # if we have arguments, we print them
        if tool_args := tool_call.get("args"):
            print(f"{tool_args}", flush=True)
    else:
        # If not a tool call, it's likely the final answer content
        print(token.content, end="", flush=True)

Calling exponentiate...
{'y': 2, 'x': 5}
Calling multiply...
{'y': 25, 'x': 10}
Calling add...
{'y': 250, 'x': 10}
Calling final_answer...
{'tools_used': ['exponentiate', 'multiply', 'add'], 'answer': '10 + 10 multiplied by 5 with power 2 is 260.'}
