# Streaming

Streaming is an important UX consideration for LLM apps, and agents are no exception. Streaming with agents is more complicated because you may want to stream not only the tokens of the final answer, but also the intermediate steps the agent takes.

In this notebook, we'll cover the `stream`/`astream` and `astream_events` methods for streaming.

Our agent will use a tools API to invoke the following tools:

1. `where_cat_is_hiding`: Returns a location where the cat is hiding
2. `get_items`: Lists items that can be found in a particular place

These tools will allow us to explore streaming in a more interesting situation where the agent will have to use both tools to answer some questions (e.g., to answer the question `what items are located where the cat is hiding?`).

We've designed `get_items` to use an LLM as part of its implementation to show how to achieve **token by token** streaming even from code nested within tools.

Ready?🏎️

In [1]:
from langchain import hub
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain.prompts import ChatPromptTemplate
from langchain.tools import tool
from langchain_core.callbacks import Callbacks
from langchain_openai import ChatOpenAI

## Create the model

**Attention** 

* For older versions of LangChain, we must set `streaming=True` on the LLM.

In [2]:
model = ChatOpenAI(temperature=0, streaming=True)

## Tools

We define two tools, one of which (`get_items`) relies on a chat model to generate its output!

**Attention**

1. We invoke the model using `astream()` to force the output to stream (unfortunately, for older LangChain versions you should still set `streaming=True` on the model). Do not use `.ainvoke` if you need token by token streaming.
2. We attach a `run_name` and `tags` to the model inside the tool. This will allow us to later filter based on either name or tags when using `astream_events`.
3. Pay attention to the callback propagation! We have to propagate callbacks if we want our callback handlers (including the one that powers `astream_events`) to be passed to components. This is what allows the chat models to send token by token updates.

In [3]:
import random

In [4]:
@tool
async def where_cat_is_hiding() -> str:
    """Where is the cat hiding right now?"""
    return random.choice(
        ["under the bed", "on the shelf", "in the corner", "in the bathroom"]
    )


# And a tool that uses an LLM under the hood
@tool
async def get_items(place: str, callbacks: Callbacks) -> str:  # <--- Accept callbacks
    """Use this tool to look up which items are in the given place."""
    template = ChatPromptTemplate.from_messages(
        [
            (
                "human",
                "Can you tell me what kind of items i might find in the following place: '{place}'. "
                "List at least 3 such items separating them by a comma. And include a brief description of each item..",
            )
        ]
    )
    chain = template | model.with_config(
        {
            "run_name": "Get Items LLM",
            "tags": ["tool_llm"],
            "callbacks": callbacks,
        }  # <-- Propagate callbacks
    )
    chunks = [chunk async for chunk in chain.astream({"place": place})]
    return "".join(chunk.content for chunk in chunks)

In [5]:
await where_cat_is_hiding.ainvoke({})

'under the bed'

In [6]:
await get_items.ainvoke({"place": "on a table"})

'On a table, you might find a few common items such as:\n\n1. A book: A book is a written or printed work consisting of pages glued or sewn together along one side and bound in covers. It could be a novel, a textbook, or any other type of reading material.\n\n2. A coffee mug: A coffee mug is a cylindrical-shaped cup typically used for drinking hot beverages like coffee or tea. It usually has a handle for easy gripping and can be made of various materials such as ceramic, glass, or stainless steel.\n\n3. A vase with flowers: A vase is a decorative container, often made of glass or ceramic, used to hold flowers or other ornamental plants. It adds a touch of beauty and freshness to the surroundings, and the flowers inside can vary depending on personal preference or occasion.'

## Initialize the agent

Here, we'll initialize an OpenAI tools agent.

In [7]:
# Get the prompt to use - you can modify this!
prompt = hub.pull("hwchase17/openai-tools-agent")
# print(prompt.messages) -- to see the prompt
tools = [get_items, where_cat_is_hiding]
agent = create_openai_tools_agent(
    model.with_config({"tags": ["agent_llm"]}), tools, prompt
)
agent_executor = AgentExecutor(agent=agent, tools=tools).with_config(
    {"run_name": "Agent"}
)

## Stream Intermediate Steps

We'll use the `.stream` method of the AgentExecutor (or its async counterpart, `.astream`, as in the code below) to stream the agent's intermediate steps.

The output from `.stream` alternates between actions and observations, finally concluding with the answer if the agent achieved its objective.

It'll look like this:

1. actions output
2. observations output
3. actions output
4. observations output

**... (continue until goal is reached) ...**

Then, if the final goal is reached, the agent will output the **final answer**.


The contents of these outputs are summarized here:

| Output           | Contents |
|------------------|----------|
| **Actions**      | `actions`: `AgentAction` or a subclass; `messages`: chat messages corresponding to the action invocation |
| **Observations** | `steps`: history of what the agent did so far, including the current action and its observation; `messages`: chat message with function invocation results (aka observations) |
| **Final answer** | `output`: `AgentFinish`; `messages`: chat messages with the final output |

In [8]:
# Note: We use `pprint` to print only to depth 1, it makes it easier to see the output from a high level, before digging in.
import pprint

chunks = []

async for chunk in agent_executor.astream(
    {"input": "what items are located where the cat is hiding?"}
):
    chunks.append(chunk)
    print("------")
    pprint.pprint(chunk, depth=1)

------
{'actions': [...], 'messages': [...]}
------
{'messages': [...], 'steps': [...]}
------
{'actions': [...], 'messages': [...]}
------
{'messages': [...], 'steps': [...]}
------
{'messages': [...],
 'output': 'The items located where the cat is hiding in the bathroom are:\n'
           '\n'
           '1. Toilet paper\n'
           '2. Toothbrush\n'
           '3. Soap'}


### Using Messages

You can access the underlying `messages` from the outputs. Using messages can be nice when working with chat applications - because everything is a message!

In [9]:
chunks[0]["actions"]

[OpenAIToolAgentAction(tool='where_cat_is_hiding', tool_input={}, log='\nInvoking: `where_cat_is_hiding` with `{}`\n\n\n', message_log=[AIMessageChunk(content='', additional_kwargs={'tool_calls': [{'index': 0, 'id': 'call_onVxOwNvzVYzzvk4nK6GjipD', 'function': {'arguments': '{}', 'name': 'where_cat_is_hiding'}, 'type': 'function'}]})], tool_call_id='call_onVxOwNvzVYzzvk4nK6GjipD')]

In [10]:
for chunk in chunks:
    print(chunk["messages"])

[AIMessageChunk(content='', additional_kwargs={'tool_calls': [{'index': 0, 'id': 'call_onVxOwNvzVYzzvk4nK6GjipD', 'function': {'arguments': '{}', 'name': 'where_cat_is_hiding'}, 'type': 'function'}]})]
[FunctionMessage(content='in the bathroom', name='where_cat_is_hiding')]
[AIMessageChunk(content='', additional_kwargs={'tool_calls': [{'index': 0, 'id': 'call_J9qR0NxVBuEECA4O9RC6KFEv', 'function': {'arguments': '{\n  "place": "bathroom"\n}', 'name': 'get_items'}, 'type': 'function'}]})]
[FunctionMessage(content='In a bathroom, you might find the following items: \n\n1. Toilet paper: A roll of soft paper used for personal hygiene after using the toilet. It is typically placed on a holder or a wall-mounted dispenser.\n\n2. Toothbrush: A small handheld brush with bristles used for cleaning teeth. It usually has a handle and a head with bristles of varying lengths to reach different areas of the mouth.\n\n3. Soap: A cleansing agent used for washing hands and body. It comes in various forms

In addition, they contain full logging information (`actions` and `steps`) which may be easier to process for rendering purposes.
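
As a rough sketch of the "everything is a message" idea, the per-chunk `messages` lists can be flattened into a single transcript that a chat UI could render. The `Msg` dataclass, the `flatten_messages` helper, and the sample chunks below are hypothetical stand-ins for illustration; they are not LangChain classes or real output.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Msg:
    """Hypothetical stand-in for a LangChain message (role + content only)."""

    role: str
    content: str


def flatten_messages(chunks: List[Dict[str, Any]]) -> List[Msg]:
    """Concatenate the `messages` list of every chunk, preserving order."""
    transcript: List[Msg] = []
    for chunk in chunks:
        transcript.extend(chunk.get("messages", []))
    return transcript


# Made-up chunks that mimic the action / observation / final-answer shape.
sample_chunks = [
    {"actions": ["..."], "messages": [Msg("ai", "call where_cat_is_hiding")]},
    {"steps": ["..."], "messages": [Msg("function", "in the bathroom")]},
    {"output": "In the bathroom.", "messages": [Msg("ai", "In the bathroom.")]},
]

transcript = flatten_messages(sample_chunks)
for msg in transcript:
    print(f"{msg.role}: {msg.content}")
```

With the real `chunks` collected above, the same loop would yield the `AIMessageChunk` and `FunctionMessage` objects in the order they were produced.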

### Using AgentAction/Observation

The outputs also contain richer structured information inside of `actions` and `steps`, which could be useful in some situations, but can also be harder to parse.

**Attention** `AgentFinish` is not available as part of the `stream` method. If this is something you'd like to be added, please start a discussion on GitHub and explain why it's needed.

In [11]:
async for chunk in agent_executor.astream(
    {"input": "what items are located where the cat is hiding?"}
):
    # Agent Action
    if "actions" in chunk:
        for action in chunk["actions"]:
            print(f"Calling Tool: `{action.tool}` with input `{action.tool_input}`")
    # Observation
    elif "steps" in chunk:
        for step in chunk["steps"]:
            print(f"Tool Result: `{step.observation}`")
    # Final result
    elif "output" in chunk:
        print(f'Final Output: {chunk["output"]}')
    else:
        raise ValueError()
    print("---")

Calling Tool: `where_cat_is_hiding` with input `{}`
---
Tool Result: `in the bathroom`
---
Calling Tool: `get_items` with input `{'place': 'bathroom'}`
---
Tool Result: `In a bathroom, you might find the following items: 

1. Toilet paper: A soft, absorbent paper used for personal hygiene after using the toilet. It is typically found on a roll and is essential for maintaining cleanliness and comfort in the bathroom.

2. Toothbrush: A small, handheld tool with bristles used for cleaning teeth. It is commonly used with toothpaste to remove plaque and maintain oral hygiene. Toothbrushes come in various sizes and designs, with bristles that are gentle on the gums.

3. Soap: A cleansing agent used for washing hands and body. Soap comes in different forms such as bars, liquid, or foam. It helps to remove dirt, bacteria, and oils from the skin, leaving it clean and refreshed.`
---
Final Output: The items that are located where the cat is hiding in the bathroom are:

1. Toilet paper
2. Toothbr

## Custom Streaming

The default implementation of `stream` may not be appropriate for some applications; e.g., if you want to stream individual tokens, or surface intermediate steps occurring **within** tools.

Let's take a look at two different ways to customize what the agent is streaming.

1. `astream_events`: **beta** API, introduced in newer LangChain versions. This is the **recommended** approach.
2. `callbacks`: This can be useful if you're on older versions of LangChain and cannot upgrade. This is **NOT** recommended, as for most applications you'll need to set up a queue and send the callbacks to another worker (i.e., there's hidden complexity!). `astream_events` does this under the hood!

**ATTENTION**

* If you want to see *token by token* streaming, set the LLM to `streaming=True`
* Use `async` code throughout (whether it's callbacks, tools, etc.); we will try to lift this restriction in the future.

**NOTE** You can also use the [astream_log](https://python.langchain.com/docs/expression_language/interface#async-stream-intermediate-steps) API. This API produces a granular log of all events that occur during execution. The log format is based on the [JSONPatch](https://jsonpatch.com/) standard. It's granular, but requires effort to parse. For this reason, we created the `astream_events` API instead.
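
To give a feel for the parsing effort involved, here is a minimal sketch of replaying JSONPatch operations into a state dict. The `apply_ops` helper and the `patches` data are hypothetical and hand-written for illustration (they are not captured `astream_log` output), and only the `add` and `replace` operations are handled, without JSON Pointer escape sequences.

```python
from typing import Any, Dict, List


def apply_ops(state: Dict[str, Any], ops: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Apply JSONPatch `add`/`replace` ops to nested dicts and lists, in place."""
    for op in ops:
        if op["op"] not in {"add", "replace"}:
            raise ValueError(f"unsupported op: {op['op']}")
        # Split the JSON Pointer path, e.g. "/logs/llm/tokens/-" -> keys.
        *parents, last = op["path"].lstrip("/").split("/")
        target: Any = state
        for key in parents:
            target = target[int(key)] if isinstance(target, list) else target[key]
        if isinstance(target, list):
            if last == "-":  # "-" means "append to the end of the array"
                target.append(op["value"])
            elif op["op"] == "add":
                target.insert(int(last), op["value"])
            else:
                target[int(last)] = op["value"]
        else:
            target[last] = op["value"]
    return state


# Made-up patches: each inner list is one batch of operations.
patches = [
    [{"op": "add", "path": "/logs", "value": {}}],
    [{"op": "add", "path": "/logs/llm", "value": {"tokens": []}}],
    [{"op": "add", "path": "/logs/llm/tokens/-", "value": "In"}],
    [{"op": "add", "path": "/logs/llm/tokens/-", "value": " a"}],
    [{"op": "replace", "path": "/final_output", "value": "done"}],
]

state: Dict[str, Any] = {"final_output": None}
for ops in patches:
    apply_ops(state, ops)
print(state)
```

Every consumer has to replay patches like this before it can answer a simple question such as "what tokens has the tool's LLM produced so far?", which is the kind of effort `astream_events` is meant to save.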

### Using `astream_events` (Beta)

In [12]:
async for event in agent_executor.astream_events(
    {"input": "where is the cat hiding? what items are in that location?"},
    version="v1",
):
    kind = event["event"]
    if kind == "on_chat_model_stream":
        content = event["data"]["chunk"].content
        # Empty content in the context of OpenAI means
        # that the model is asking for a tool to be invoked.
        if content:
            print(content, end="|")
    elif (
        kind in {"on_chat_model_start", "on_chat_model_end"}
        and "tool_llm" in event["tags"]
    ):
        print()
        print()
    elif kind == "on_tool_start":
        print("--")
        print(
            f"Starting tool: {event['name']} with inputs: {event['data'].get('input')}"
        )
    elif kind == "on_tool_end":
        print(f"Done tool: {event['name']}")
        print(f"Tool output was: {event['data'].get('output')}")
        print("--")
    elif kind == "on_chain_start":
        if event["name"] == "Agent":
            print(
                f"Starting agent: {event['name']} with input: {event['data'].get('input')}"
            )
    elif kind == "on_chain_end":
        if event["name"] == "Agent":
            print()
            print("--")
            print(
                f"Done agent: {event['name']} with output: {event['data'].get('output')['output']}"
            )

Starting agent: Agent with input: {'input': 'where is the cat hiding? what items are in that location?'}
--
Starting tool: where_cat_is_hiding with inputs: {}
Done tool: where_cat_is_hiding
Tool output was: in the corner
--
--
Starting tool: get_items with inputs: {'place': 'corner'}


In| a| corner|,| you| might| find|:

|1|.| Lamp|:| A| small| table| lamp| that| provides| soft| and| warm| lighting|.| It| is| usually| placed| on| a| corner| table| or| shelf|,| adding| both| functionality| and| ambiance| to| the| space|.

|2|.| Book|shelf|:| A| tall| and| narrow| book|shelf| that| fits| perfectly| in| a| corner|.| It| is| designed| to| hold| books|,| magazines|,| or| decorative| items|,| providing| storage| and| displaying| opportunities| while| optimizing| the| use| of| space|.

|3|.| Plant|:| A| p|otted| plant| placed| in| a| corner| can| bring| life| and| freshness| to| the| area|.| It| could| be| a| small| indoor| plant| or| a| larger| p|otted| tree|,| adding| a| touch| of| nature|
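
Because each event carries its `event` type, `name`, and `tags`, filtering can be factored into a small helper. The sketch below is one possible way to do that, run against made-up event dicts; `filter_events` and `sample_events` are hypothetical, and the `chunk` values are plain strings here, whereas real events carry message chunks with a `.content` attribute.

```python
from typing import Any, Dict, Iterable, Iterator, Optional, Set


def filter_events(
    events: Iterable[Dict[str, Any]],
    *,
    kinds: Optional[Set[str]] = None,
    tags: Optional[Set[str]] = None,
    names: Optional[Set[str]] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield events matching every filter that is set (None = no filter)."""
    for event in events:
        if kinds is not None and event["event"] not in kinds:
            continue
        # Keep the event if it shares at least one tag with the filter.
        if tags is not None and not tags & set(event.get("tags", [])):
            continue
        if names is not None and event.get("name") not in names:
            continue
        yield event


# Made-up events mimicking the shape used in the loop above.
sample_events = [
    {"event": "on_chat_model_stream", "name": "ChatOpenAI",
     "tags": ["agent_llm"], "data": {"chunk": ""}},
    {"event": "on_tool_start", "name": "get_items",
     "tags": [], "data": {"input": {"place": "corner"}}},
    {"event": "on_chat_model_stream", "name": "Get Items LLM",
     "tags": ["tool_llm"], "data": {"chunk": "In"}},
    {"event": "on_chat_model_stream", "name": "Get Items LLM",
     "tags": ["tool_llm"], "data": {"chunk": " a"}},
]

tool_tokens = [
    e["data"]["chunk"]
    for e in filter_events(
        sample_events, kinds={"on_chat_model_stream"}, tags={"tool_llm"}
    )
]
print("".join(tool_tokens))
```

To our knowledge, `astream_events` also accepts filtering arguments such as `include_tags` and `include_names`; check the API reference for your installed version before relying on them.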

### Using Callbacks (Legacy)

Another approach to streaming is using callbacks. This may be useful if you're still on an older version of LangChain and cannot upgrade.

Generally, this is **NOT** a recommended approach because:

1. For most applications, you'll need to create two workers: one writing the callbacks to a queue and another reading from the queue (i.e., there's hidden complexity to make this work).
2. **End** events may be missing some metadata (e.g., the run name). So if you need the additional metadata, you should inherit from `BaseTracer` instead of `AsyncCallbackHandler` to pick up the relevant information from the runs (aka traces), or else implement the aggregation logic yourself based on the `run_id`.
3. The callbacks behave inconsistently (e.g., in how inputs and outputs are encoded) depending on the callback type, which you'll need to work around.
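
The two-worker queue pattern from point 1 can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: `QueueTokenHandler`, `fake_llm`, and `consume` are hypothetical names, the handler is a plain class rather than an `AsyncCallbackHandler`, and `fake_llm` stands in for the agent run.

```python
import asyncio
from typing import List, Optional


class QueueTokenHandler:
    """Hypothetical handler: instead of printing, it enqueues each token."""

    def __init__(self, queue: "asyncio.Queue[Optional[str]]") -> None:
        self.queue = queue

    async def on_llm_new_token(self, token: str) -> None:
        await self.queue.put(token)

    async def on_llm_end(self) -> None:
        await self.queue.put(None)  # sentinel: no more tokens


async def fake_llm(handler: QueueTokenHandler) -> None:
    """Stand-in for the agent run; it just emits a few tokens."""
    for token in ["In", " a", " corner"]:
        await handler.on_llm_new_token(token)
        await asyncio.sleep(0)  # yield control, as real network I/O would
    await handler.on_llm_end()


async def consume(queue: "asyncio.Queue[Optional[str]]") -> List[str]:
    """Second worker: drain the queue until the sentinel arrives."""
    received: List[str] = []
    while (token := await queue.get()) is not None:
        received.append(token)  # e.g. forward to a streaming HTTP response
    return received


async def main() -> List[str]:
    queue: "asyncio.Queue[Optional[str]]" = asyncio.Queue()
    producer = asyncio.create_task(fake_llm(QueueTokenHandler(queue)))
    tokens = await consume(queue)
    await producer
    return tokens


# In a notebook with a running event loop, use `tokens = await main()` instead.
tokens = asyncio.run(main())
print(tokens)
```

The sentinel value is one simple way to signal completion; a real handler would also need to handle errors and multiple concurrent runs, which is part of the hidden complexity mentioned above.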

For illustration purposes, we implement a callback below that shows how to get *token by token* streaming. Feel free to implement other callbacks based on your application needs.

But `astream_events` does all of this for you under the hood, so you don't have to!

In [13]:
from typing import Any, Dict, List, Optional, Union
from uuid import UUID

from langchain_core.callbacks.base import AsyncCallbackHandler
from langchain_core.messages import BaseMessage
from langchain_core.outputs import ChatGenerationChunk, GenerationChunk, LLMResult


# Here is a custom handler that will print the tokens to stdout.
# Instead of printing to stdout you can send the data elsewhere; e.g., to a streaming API response
class TokenByTokenHandler(AsyncCallbackHandler):
    def __init__(self, tags_of_interest: List[str]) -> None:
        """A custom callback handler.

        Args:
            tags_of_interest: Only LLM tokens from models with these tags will be
                              printed.
        """
        self.tags_of_interest = tags_of_interest

    async def on_chat_model_start(
        self,
        serialized: Dict[str, Any],
        messages: List[List[BaseMessage]],
        *,
        run_id: UUID,
        parent_run_id: Optional[UUID] = None,
        tags: Optional[List[str]] = None,
        metadata: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ) -> Any:
        """Run when a chat model starts running."""
        overlap_tags = self.get_overlap_tags(tags)

        if overlap_tags:
            print(",".join(overlap_tags), end=": ", flush=True)

    async def on_llm_end(
        self,
        response: LLMResult,
        *,
        run_id: UUID,
        parent_run_id: Optional[UUID] = None,
        tags: Optional[List[str]] = None,
        **kwargs: Any,
    ) -> None:
        """Run when LLM ends running."""
        overlap_tags = self.get_overlap_tags(tags)

        if overlap_tags:
            # Who can argue with beauty?
            print()
            print()

    def get_overlap_tags(self, tags: Optional[List[str]]) -> List[str]:
        """Check for overlap with filtered tags."""
        if not tags:
            return []
        return sorted(set(tags or []) & set(self.tags_of_interest or []))

    async def on_llm_new_token(
        self,
        token: str,
        *,
        chunk: Optional[Union[GenerationChunk, ChatGenerationChunk]] = None,
        run_id: UUID,
        parent_run_id: Optional[UUID] = None,
        tags: Optional[List[str]] = None,
        **kwargs: Any,
    ) -> None:
        """Run on new LLM token. Only available when streaming is enabled."""
        overlap_tags = self.get_overlap_tags(tags)

        if token and overlap_tags:
            print(token, end="|", flush=True)

In [14]:
handler = TokenByTokenHandler(tags_of_interest=["tool_llm", "agent_llm"])

result = await agent_executor.ainvoke(
    {"input": "where is the cat hiding and what items can be found there?"},
    {"callbacks": [handler]},
)

agent_llm: 

agent_llm: 

tool_llm: In| a| bathroom|,| you| might| find| the| following| items|:| 

|1|.| Toilet| paper|:| A| soft|,| absorb|ent| paper| used| for| personal| hygiene| after| using| the| toilet|.| It| is| typically| found| on| a| roll| and| is| essential| for| maintaining| cleanliness| and| comfort| in| the| bathroom|.

|2|.| Tooth|brush|:| A| small|,| handheld| tool| with| br|istles| used| for| cleaning| teeth|.| It| is| commonly| used| with| tooth|paste| to| remove| plaque| and| maintain| oral| hygiene|.| Tooth|brush|es| come| in| various| sizes| and| designs|,| and| are| usually| stored| near| the| sink| or| in| a| tooth|brush| holder|.

|3|.| Soap|:| A| cleansing| agent| used| for| washing| hands| and| body|.| Soap| comes| in| different| forms| such| as| bars|,| liquid|,| or| foam|,| and| is| typically| used| with| water| to| create| l|ather|.| It| helps| remove| dirt|,| bacteria|,| and| other| imp|urities|,| leaving| the| skin| feeling| clean| and| refreshed|.| Soap