# Streaming With LangChain

Streaming is critical in making applications based on LLMs feel responsive to end-users.

Important LangChain primitives like LLMs, parsers, prompts, retrievers, and agents implement the LangChain [Runnable Interface](/docs/expression_language/interface).

This interface provides two general approaches to stream content:

1. sync `stream` and async `astream`: a **default implementation** of streaming that streams the **final output** from the chain.
2. async `astream_events` and async `astream_log`: these provide a way to stream both **intermediate steps** and **final output** from the chain.

Here, we'll take a look at both approaches, and try to understand a bit of what's happening under the hood. 🥷

## Using Stream

All `Runnable` objects implement a sync method called `stream` and an async variant called `astream`. 

These methods are designed to stream the final output in chunks, yielding each chunk as soon as it is available.

Streaming is only possible if every step in the program knows how to operate on an **input stream**; that is, to process the input one chunk at a time and yield a corresponding output chunk.

Such processing logic can range from being trivial (e.g., output the tokens generated by the LLM) to fairly difficult (e.g., streaming partial JSON results before the full JSON is available).
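
As a rough illustration of what operating on an input stream means, here is a plain-Python sketch that does not use LangChain at all. The names `fake_llm`, `upper_case` and `stream_pipeline` are hypothetical stand-ins invented for this example; the only point is that each step consumes one chunk and yields one chunk, so nothing waits for the full output.

```python
from typing import Iterable, Iterator


def fake_llm(prompt: str) -> Iterator[str]:
    """Hypothetical stand-in for a model: yields one token at a time."""
    for token in ["Hello", ",", " world", "!"]:
        yield token


def upper_case(chunks: Iterable[str]) -> Iterator[str]:
    """A step that operates on an input stream: one chunk in, one chunk out."""
    for chunk in chunks:
        yield chunk.upper()


def stream_pipeline(prompt: str) -> Iterator[str]:
    """Compose the steps lazily so that no step waits for the full output."""
    return upper_case(fake_llm(prompt))


streamed = list(stream_pipeline("hi"))
print("|".join(streamed))
```

Because both steps are generators, the first upper-cased token is available as soon as the toy model has produced its first token.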

The best place to start exploring streaming is with the single most important component in LLM apps -- the LLMs themselves!

### LLMs and Chat Models

Large language models and their chat variants are the primary bottleneck in LLM-based apps. 🙊

Large language models may take up to a **few seconds** to generate a complete response to a query.

This is far longer than the **~200-300 ms** threshold at which an application still feels responsive to an end user.

The primary solution to this problem is to stream the output from the model **token by token**.

In [1]:
from langchain.chat_models import ChatAnthropic

model = ChatAnthropic()

chunks = []
async for chunk in model.astream("hello. tell me something about yourself"):
    chunks.append(chunk)
    print(chunk.content, end="|")

 Hello|!| My| name| is| Claude|.| I|'m| an| artificial| intelligence| assistant| created| by| An|throp|ic| to| be| helpful|,| harmless|,| and| honest|.||

Let's inspect one of the chunks:

In [2]:
chunks[0]

AIMessageChunk(content=' Hello')

We got back something called an `AIMessageChunk`. This chunk represents a part of an `AIMessage`.

Message chunks are additive by design -- one can simply add them up to get the state of the response so far!

In [3]:
chunks[0] + chunks[1] + chunks[2] + chunks[3] + chunks[4]

AIMessageChunk(content=' Hello! My name is')
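
To make the idea of additive chunks concrete without calling a model, the following sketch defines a toy chunk type. `ToyChunk` is a hypothetical class written for this example and is not LangChain's `AIMessageChunk`; it only assumes that adding two chunks concatenates their `content`.

```python
from dataclasses import dataclass
from functools import reduce
from operator import add


@dataclass(frozen=True)
class ToyChunk:
    """Hypothetical stand-in for a message chunk; not LangChain's class."""

    content: str

    def __add__(self, other: "ToyChunk") -> "ToyChunk":
        # Adding two chunks concatenates their content.
        return ToyChunk(self.content + other.content)


toy_chunks = [ToyChunk(t) for t in [" Hello", "!", " My", " name", " is"]]

# Fold the chunks with `+` to get the state of the response so far.
so_far = reduce(add, toy_chunks)
print(repr(so_far.content))
```

Folding with `reduce` is just a compact way of writing `a + b + c + ...` for a list of any length.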

### Chains

Virtually all LLM applications involve more steps than just a call to a language model.

Let's build a simple chain using `LangChain Expression Language` (`LCEL`) that combines a prompt, a model, and a parser, and verify that streaming works.

We will use `StrOutputParser` to parse the output from the model. This is a simple parser that extracts the `content` field from an `AIMessageChunk`, giving us the `token` returned by the model.

:::{.callout-tip}
LCEL provides a *declarative* way to specify a "program" by chaining together LangChain primitives. Chains created using LCEL benefit from an automatic implementation of `stream` and `astream`, allowing streaming of the final output. In fact, chains created with LCEL implement
the entire standard Runnable interface.
:::

In [4]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template("tell me a joke about {topic}")
parser = StrOutputParser()
chain = prompt | model | parser

async for chunk in chain.astream({"topic": "parrot"}):
    print(chunk, end="|")

 Here|'s| a| silly| joke| about| par|rots|:|

What| do| you| call| a| par|rot| that| flew| away|?| |
A| polygon|!|

Get| it|?| Because| if| a| par|rot| flies| far| away|,| eventually| it| becomes| a| dot| on| the| horizon|,| which| looks| like| a| point|,| which| sounds| like| "|polygon|"| when| said| out| loud|!| It|'s| a| very| cor|ny| joke| playing| on| words|,| but| I| hope| it| made| you| chuck|le| a| bit|.| Par|rots| can| be| pretty| funny| birds|,| so| there|'s| lots| of| potential| for| bird|-|b|rained| humor| about| them|!||

:::{.callout-info}
You do not have to use the `LangChain Expression Language` to use LangChain and can instead rely on a standard **imperative** programming approach by
calling `invoke`, `batch` or `stream` on each component individually, assigning the results to variables and then using them downstream as you see fit.

If that works for your needs, then that's fine by us 👌!
:::

### Working with Input Streams

What if you wanted to stream JSON from the output as it was being generated?

If you were to rely on `json.loads` to parse the partial JSON, the parsing would fail because the partial JSON wouldn't be valid JSON.

You'd likely be at a complete loss as to what to do and claim that it wasn't possible to stream JSON.

Well, it turns out there is a way to do it -- the parser needs to operate on the **input stream**, and attempt to extract valid JSON
from partial model output.

Let's see such a parser in action to understand what this means.

In [5]:
from langchain_core.output_parsers import JsonOutputParser
from langchain_openai.chat_models import ChatOpenAI

model = ChatOpenAI()

chain = model | JsonOutputParser()  # This parser only works with OpenAI right now
async for text in chain.astream(
    'output a list of the countries france, spain and japan and their populations in JSON format. Use a dict with an outer key of "countries" which contains a list of countries. Each country should have the key `name` and `population`'
):
    print(text)

{}
{'countries': []}
{'countries': [{}]}
{'countries': [{'name': ''}]}
{'countries': [{'name': 'France'}]}
{'countries': [{'name': 'France', 'population': ''}]}
{'countries': [{'name': 'France', 'population': '67'}]}
{'countries': [{'name': 'France', 'population': '67,'}]}
{'countries': [{'name': 'France', 'population': '67,081'}]}
{'countries': [{'name': 'France', 'population': '67,081,'}]}
{'countries': [{'name': 'France', 'population': '67,081,000'}]}
{'countries': [{'name': 'France', 'population': '67,081,000'}, {}]}
{'countries': [{'name': 'France', 'population': '67,081,000'}, {'name': ''}]}
{'countries': [{'name': 'France', 'population': '67,081,000'}, {'name': 'Spain'}]}
{'countries': [{'name': 'France', 'population': '67,081,000'}, {'name': 'Spain', 'population': ''}]}
{'countries': [{'name': 'France', 'population': '67,081,000'}, {'name': 'Spain', 'population': '46'}]}
{'countries': [{'name': 'France', 'population': '67,081,000'}, {'name': 'Spain', 'population': '46,'}]}
{'co
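
To give a feel for how a parser can work on partial output, here is a minimal stdlib-only sketch. It is not LangChain's `JsonOutputParser` implementation; `parse_partial_json` and `stream_partial_json` are hypothetical helpers that simply close any open string, list or object and then try `json.loads`. This naive strategy is an assumption of the sketch, and it emits fewer intermediate states than the output shown above.

```python
import json
from typing import Any, Iterable, Iterator, Optional


def parse_partial_json(text: str) -> Optional[Any]:
    """Best-effort parse of a JSON prefix; returns None if it cannot be repaired."""
    closers = []  # closing brackets still owed, innermost last
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            closers.append("}")
        elif ch == "[":
            closers.append("]")
        elif ch in "}]" and closers:
            closers.pop()
    # Close an unterminated string, then every open container.
    candidate = text + ('"' if in_string else "") + "".join(reversed(closers))
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return None


def stream_partial_json(chunks: Iterable[str]) -> Iterator[Any]:
    """Yield a new value each time the accumulated text parses to something new."""
    buffer, last = "", None
    for chunk in chunks:
        buffer += chunk
        parsed = parse_partial_json(buffer)
        if parsed is not None and parsed != last:
            last = parsed
            yield parsed


model_chunks = ['{"countries": [', '{"name": "Fra', 'nce"}', "]}"]
partial_results = list(stream_partial_json(model_chunks))
for result in partial_results:
    print(result)
```

Prefixes that cannot be repaired this way (for example, text ending right after a key's colon) are skipped until more input arrives.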

:::{.callout-warning}
Any steps in the chain that operate on **full inputs** rather than on **input streams** can break streaming functionality via `stream` or `astream`.
:::

:::{.callout-tip}
Later, we will discuss the `astream_events` API which streams results from intermediate steps. This API will stream results from intermediate steps even if the chain contains steps that only operate on **finalized inputs**.
:::

Let's use the previous example and add a bit of logic on top that breaks streaming: a function that extracts the country names from the finalized JSON.

In [6]:
from langchain_core.output_parsers import JsonOutputParser


def _extract_country_names(inputs):
    """A function that does not operate on input streams and breaks streaming."""
    if not isinstance(inputs, dict):
        return ""

    if "countries" not in inputs:
        return ""

    countries = inputs["countries"]

    if not isinstance(countries, list):
        return ""

    country_names = [
        country.get("name") for country in countries if isinstance(country, dict)
    ]
    return country_names


chain = model | JsonOutputParser() | _extract_country_names

async for text in chain.astream(
    'output a list of the countries france, spain and japan and their populations in JSON format. Use a dict with an outer key of "countries"'
):
    print(text, end="|")

[None, None, None]|

:::{.callout-tip}
Using a generator function (a function that uses `yield`) allows writing code that operates on **input streams**.
:::

In [7]:
from langchain_core.output_parsers import JsonOutputParser


async def _extract_country_names(input_stream):
    """A function that operates on input streams."""
    country_names_so_far = set()

    async for input in input_stream:
        if not isinstance(input, dict):
            continue

        if "countries" not in input:
            continue

        countries = input["countries"]

        if not isinstance(countries, list):
            continue

        for country in countries:
            name = country.get("name")
            if not name:
                continue
            if name not in country_names_so_far:
                yield name
                country_names_so_far.add(name)


chain = model | JsonOutputParser() | _extract_country_names

async for text in chain.astream(
    'output a list of the countries france, spain and japan and their populations in JSON format. Use a dict with an outer key of "countries"'
):
    print(text, end="|")

France|Spain|Japan|

### Non-streaming components

Some built-in components like Retrievers do not offer any `streaming`. What happens if we try to `stream` them? 🤨

In [10]:
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import OpenAIEmbeddings

template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

vectorstore = FAISS.from_texts(
    ["harrison worked at kensho", "harrison likes spicy food"],
    embedding=OpenAIEmbeddings(),
)
retriever = vectorstore.as_retriever()

chunks = [chunk for chunk in retriever.stream("where did harrison work?")]
chunks

[[Document(page_content='harrison worked at kensho'),
  Document(page_content='harrison likes spicy food')]]

Stream just yielded the final result from that component. This is OK 🥹! 

Not all components have to implement streaming; in some cases it is unnecessary, difficult, or simply doesn't make sense.

:::{.callout-info}
An LCEL chain constructed using non-streaming components will still be able to stream in a lot of cases, with streaming of partial output starting after the last non-streaming step in the chain.
:::

In [11]:
retrieval_chain = (
    {
        "context": retriever.with_config(run_name="Docs"),
        "question": RunnablePassthrough(),
    }
    | prompt
    | model
    | StrOutputParser()
)

In [12]:
for chunk in retrieval_chain.stream(
    "Where did harrison work? Write 3 made up sentences about this place."
):
    print(chunk, end="|")

|H|arrison| worked| at| Kens|ho|,| a| renowned| technology| company| specializing| in| artificial| intelligence| and| data| analytics|.| Located| in| the| heart| of| Silicon| Valley|,| Kens|ho| offers| a| vibrant| and| innovative| work| environment|,| fostering| collaboration| and| creativity| among| its| employees|.| With| state|-of|-the|-art| facilities| and| cutting|-edge| technology|,| Harrison| was| part| of| a| dynamic| team| that| revolution|ized| the| way| businesses| approach| data| analysis|.| At| Kens|ho|,| Harrison| had| the| opportunity| to| work| with| industry|-leading| experts| and| contribute| to| groundbreaking| projects| that| shaped| the| future| of| AI| technology|.||
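
The behaviour above can be mimicked with plain generators. In this sketch, `retrieve`, `generate` and `toy_chain` are hypothetical stand-ins (no LangChain involved): the first step returns its whole result at once, and streaming only starts with the step that follows it.

```python
from typing import Iterator, List

log: List[str] = []


def retrieve(question: str) -> List[str]:
    """Non-streaming step: returns the complete result in one go."""
    log.append("retrieve:done")
    return ["harrison worked at kensho"]


def generate(context: List[str]) -> Iterator[str]:
    """Streaming step: yields tokens one at a time."""
    for token in ["Harrison", " worked", " at", " Kensho"]:
        log.append(f"generate:{token}")
        yield token


def toy_chain(question: str) -> Iterator[str]:
    context = retrieve(question)  # blocks until the full result is ready
    yield from generate(context)  # streaming starts after that step


for token in toy_chain("where did harrison work?"):
    log.append(f"consumer:{token}")

print(log)
```

The log shows the consumer receiving each token right after it is generated, but only once the non-streaming step has finished.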

Now that we've seen how `stream` and `astream` work, let's bravely venture into the world of streaming events. 🏞️

## Using Stream Events

Event Streaming is a **beta** API, and may change a bit based on feedback. 

:::{.callout-info}
Introduced in langchain-core **0.1.14**.
:::

When using the `astream_events` API, for everything to work properly, please:

* Use `async` throughout the code (e.g., async tools)
* Propagate callbacks if defining custom functions / runnables
* Whenever using runnables without LCEL, make sure to call `.astream()` on LLMs rather than `.ainvoke` to force the LLM to stream tokens.

### Event Reference

Below is a reference table that shows some events that might be emitted by the various Runnable objects.

⚠️ When streaming, the inputs for the runnable will not be available until the input stream has been entirely consumed. This means that the inputs will be available for the corresponding `end` event rather than the `start` event.


| event                | name             | chunk                           | input                                         | output                                          |
|----------------------|------------------|---------------------------------|-----------------------------------------------|-------------------------------------------------|
| on_chat_model_start  | [model name]     |                                 | {"messages": [[SystemMessage, HumanMessage]]} |                                                 |
| on_chat_model_stream | [model name]     | AIMessageChunk(content="hello") |                                               |                                                 |
| on_chat_model_end    | [model name]     |                                 | {"messages": [[SystemMessage, HumanMessage]]} | {"generations": [...], "llm_output": None, ...} |
| on_llm_start         | [model name]     |                                 | {'input': 'hello'}                            |                                                 |
| on_llm_stream        | [model name]     | 'Hello'                         |                                               |                                                 |
| on_llm_end           | [model name]     |                                 | 'Hello human!'                                |                                                 |
| on_chain_start       | format_docs      |                                 |                                               |                                                 |
| on_chain_stream      | format_docs      | "hello world!, goodbye world!"  |                                               |                                                 |
| on_chain_end         | format_docs      |                                 | [Document(...)]                               | "hello world!, goodbye world!"                  |
| on_tool_start        | some_tool        |                                 | {"x": 1, "y": "2"}                            |                                                 |
| on_tool_stream       | some_tool        | {"x": 1, "y": "2"}              |                                               |                                                 |
| on_tool_end          | some_tool        |                                 |                                               | {"x": 1, "y": "2"}                              |
| on_retriever_start   | [retriever name] |                                 | {"query": "hello"}                            |                                                 |
| on_retriever_chunk   | [retriever name] | {documents: [...]}              |                                               |                                                 |
| on_retriever_end     | [retriever name] |                                 | {"query": "hello"}                            | {documents: [...]}                              |
| on_prompt_start      | [template_name]  |                                 | {"question": "hello"}                         |                                                 |
| on_prompt_end        | [template_name]  |                                 | {"question": "hello"}                         | ChatPromptValue(messages: [SystemMessage, ...]) |
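
One way to consume events shaped like the rows above is to split the event name into a component type and a phase, then branch on the phase. The sketch below is only an illustration: `split_event_name` and `describe` are hypothetical helpers, and the sample dictionaries are hand-written to mirror the `event`, `name` and `data` keys from the table.

```python
from typing import Any, Dict, Tuple


def split_event_name(event_name: str) -> Tuple[str, str]:
    """Split e.g. 'on_chat_model_stream' into ('chat_model', 'stream')."""
    body = event_name[len("on_"):] if event_name.startswith("on_") else event_name
    component, _, phase = body.rpartition("_")
    return component, phase


def describe(event: Dict[str, Any]) -> str:
    """Return a one-line, human-readable summary of an event dictionary."""
    component, phase = split_event_name(event["event"])
    data = event.get("data", {})
    if phase == "start":
        return f"{event['name']} ({component}) started with input {data.get('input')!r}"
    if phase in ("stream", "chunk"):
        return f"{event['name']} ({component}) streamed chunk {data.get('chunk')!r}"
    if phase == "end":
        return f"{event['name']} ({component}) finished with output {data.get('output')!r}"
    return f"{event['name']}: unrecognised event {event['event']!r}"


sample_events = [
    {"event": "on_tool_start", "name": "my_func", "data": {"input": "hello"}},
    {"event": "on_tool_stream", "name": "my_func", "data": {"chunk": "olleh"}},
    {"event": "on_tool_end", "name": "my_func", "data": {"output": "olleh"}},
]
lines = [describe(event) for event in sample_events]
print("\n".join(lines))
```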



If you're on an earlier version of LangChain this API will not exist for you! 

This is currently a **beta API**, and we would love to hear any feedback about it.

In [13]:
import langchain_core

langchain_core.__version__

'0.1.14'

### Chat Model

Let's start off by looking at the events produced by a chat model.

In [14]:
events = []
async for event in model.astream_events("hello", version="v1"):
    events.append(event)



:::{.callout-info}

Hey what's that funny version="v1" parameter in the API?! 😾

This is a **beta API**, and we're almost certainly going to make some changes to it.

This version parameter will allow us to minimize such breaking changes to your code.

Essentially, we are annoying you now, so we don't have to annoy you later.
:::

Let's take a look at a few of the start events and a few of the end events.

In [16]:
events[:3]

[{'event': 'on_chat_model_start',
  'run_id': '60b654c8-f516-4a0e-ac75-d7b702789f39',
  'name': 'ChatOpenAI',
  'tags': [],
  'metadata': {},
  'data': {'input': 'hello'}},
 {'event': 'on_chat_model_stream',
  'run_id': '60b654c8-f516-4a0e-ac75-d7b702789f39',
  'tags': [],
  'metadata': {},
  'name': 'ChatOpenAI',
  'data': {'chunk': AIMessageChunk(content='')}},
 {'event': 'on_chat_model_stream',
  'run_id': '60b654c8-f516-4a0e-ac75-d7b702789f39',
  'tags': [],
  'metadata': {},
  'name': 'ChatOpenAI',
  'data': {'chunk': AIMessageChunk(content='Hello')}}]

In [17]:
events[-2:]

[{'event': 'on_chat_model_stream',
  'run_id': '60b654c8-f516-4a0e-ac75-d7b702789f39',
  'tags': [],
  'metadata': {},
  'name': 'ChatOpenAI',
  'data': {'chunk': AIMessageChunk(content='')}},
 {'event': 'on_chat_model_end',
  'name': 'ChatOpenAI',
  'run_id': '60b654c8-f516-4a0e-ac75-d7b702789f39',
  'tags': [],
  'metadata': {},
  'data': {'output': AIMessageChunk(content='Hello! How can I assist you today?')}}]
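
Since the `on_chat_model_stream` events carry the individual chunks, the full response can be rebuilt by concatenating their `content`. The sketch below shows this on hand-made events; `toy_event` and `collect_text` are hypothetical helpers, and `SimpleNamespace` stands in for the real chunk objects (the only assumption is that a chunk exposes a `.content` string).

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable


def toy_event(kind: str, **data: Any) -> Dict[str, Any]:
    """Build an event dictionary with the same keys as the output above."""
    return {"event": kind, "name": "ChatOpenAI", "tags": [], "metadata": {}, "data": data}


def collect_text(events: Iterable[Dict[str, Any]]) -> str:
    """Concatenate the content of every chat model stream chunk."""
    parts = []
    for event in events:
        if event["event"] == "on_chat_model_stream":
            parts.append(event["data"]["chunk"].content)
    return "".join(parts)


tokens = ["", "Hello", "!", " How can I assist you today?", ""]
toy_events = [
    toy_event("on_chat_model_start", input="hello"),
    *[toy_event("on_chat_model_stream", chunk=SimpleNamespace(content=t)) for t in tokens],
    toy_event(
        "on_chat_model_end",
        output=SimpleNamespace(content="Hello! How can I assist you today?"),
    ),
]

text = collect_text(toy_events)
print(text)
```

In this toy data the concatenated stream matches the `output` carried by the end event, mirroring the first and last events shown above.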

### Chain

Let's reuse the example chain that parsed streaming JSON to explore the streaming events API.

In [18]:
chain = model | JsonOutputParser()  # This parser only works with OpenAI right now

events = [
    event
    async for event in chain.astream_events(
        'output a list of the countries france, spain and japan and their populations in JSON format. Use a dict with an outer key of "countries" which contains a list of countries. Each country should have the key `name` and `population`',
        version="v1",
    )
]

If you examine the first few events, you'll notice that there are **3** different start events rather than **2**.

The three start events correspond to:

1. The chain (model + parser)
2. The model
3. The parser

In [19]:
events[:3]

[{'event': 'on_chain_start',
  'run_id': '2ba4da15-91ef-4d47-8fac-a6bd634f31de',
  'name': 'RunnableSequence',
  'tags': [],
  'metadata': {},
  'data': {'input': 'output a list of the countries france, spain and japan and their populations in JSON format. Use a dict with an outer key of "countries" which contains a list of countries. Each country should have the key `name` and `population`'}},
 {'event': 'on_chat_model_start',
  'name': 'ChatOpenAI',
  'run_id': 'd8a52c93-a3ce-4931-9a95-18a0a609fbec',
  'tags': ['seq:step:1'],
  'metadata': {},
  'data': {'input': {'messages': [[HumanMessage(content='output a list of the countries france, spain and japan and their populations in JSON format. Use a dict with an outer key of "countries" which contains a list of countries. Each country should have the key `name` and `population`')]]}}},
 {'event': 'on_parser_start',
  'name': 'JsonOutputParser',
  'run_id': '4e5e3ee6-343a-4b8b-a3b5-b9d2fdf9be43',
  'tags': ['seq:step:2'],
  'metadata': {

What do you think you'd see if you looked at the last 3 events? What about the middle?

Let's use this API to output the stream events from the model and the parser. We're ignoring start events, end events, and events from the chain.

In [20]:
num_events = 0

async for event in chain.astream_events(
    'output a list of the countries france, spain and japan and their populations in JSON format. Use a dict with an outer key of "countries" which contains a list of countries. Each country should have the key `name` and `population`',
    version="v1",
):
    kind = event["event"]
    if kind == "on_chat_model_stream":
        print(f"Chat model chunk: {repr(event['data']['chunk'].content)}", flush=True)
    if kind == "on_parser_stream":
        print(f"Parser chunk: {event['data']['chunk']}", flush=True)
    num_events += 1
    if num_events > 30:
        # Truncate the output
        print("...")
        break

Chat model chunk: ''
Parser chunk: {}
Chat model chunk: '{\n'
Chat model chunk: ' '
Chat model chunk: ' "'
Chat model chunk: 'countries'
Chat model chunk: '":'
Parser chunk: {'countries': []}
Chat model chunk: ' [\n'
Chat model chunk: '   '
Parser chunk: {'countries': [{}]}
Chat model chunk: ' {\n'
Chat model chunk: '     '
Chat model chunk: ' "'
Chat model chunk: 'name'
Chat model chunk: '":'
Parser chunk: {'countries': [{'name': ''}]}
Chat model chunk: ' "'
Parser chunk: {'countries': [{'name': 'France'}]}
Chat model chunk: 'France'
Chat model chunk: '",\n'
Chat model chunk: '     '
Chat model chunk: ' "'
...


Because both the model and the parser support streaming, we see streaming events from both components in real time! Kind of cool, isn't it? 🦜

### Filtering Events

Because this API produces so many events, it is useful to be able to filter on events.

You can filter by either component `name`, component `tags` or component `type`.
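
Conceptually, each filter is a predicate over the event dictionaries. The sketch below re-creates that idea in plain Python on hand-written events; `filter_events` and `event_type` are hypothetical helpers, not LangChain's implementation. Deriving the type from the event name, and keeping an event when it matches any of the supplied filters, are both assumptions of this sketch.

```python
from typing import Any, Dict, Iterable, Iterator, Optional, Sequence


def event_type(event_name: str) -> str:
    """Derive the component type, e.g. 'on_chat_model_stream' -> 'chat_model'."""
    return event_name[len("on_"):].rpartition("_")[0]


def filter_events(
    events: Iterable[Dict[str, Any]],
    include_names: Optional[Sequence[str]] = None,
    include_types: Optional[Sequence[str]] = None,
    include_tags: Optional[Sequence[str]] = None,
) -> Iterator[Dict[str, Any]]:
    """Keep events that match at least one supplied filter (all events if none given)."""
    no_filters = include_names is None and include_types is None and include_tags is None
    for event in events:
        keep = no_filters
        if include_names is not None and event["name"] in include_names:
            keep = True
        if include_types is not None and event_type(event["event"]) in include_types:
            keep = True
        if include_tags is not None and set(event["tags"]) & set(include_tags):
            keep = True
        if keep:
            yield event


sample = [
    {"event": "on_chain_start", "name": "RunnableSequence", "tags": ["my_chain"]},
    {"event": "on_chat_model_start", "name": "model", "tags": ["seq:step:1", "my_chain"]},
    {"event": "on_parser_start", "name": "my_parser", "tags": ["seq:step:2", "my_chain"]},
]

by_name = [e["event"] for e in filter_events(sample, include_names=["my_parser"])]
by_type = [e["event"] for e in filter_events(sample, include_types=["chat_model"])]
by_tag = [e["event"] for e in filter_events(sample, include_tags=["my_chain"])]
print(by_name, by_type, by_tag)
```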

#### By Name

In [21]:
chain = model.with_config({"run_name": "model"}) | JsonOutputParser().with_config(
    {"run_name": "my_parser"}
)

max_events = 0
async for event in chain.astream_events(
    'output a list of the countries france, spain and japan and their populations in JSON format. Use a dict with an outer key of "countries" which contains a list of countries. Each country should have the key `name` and `population`',
    version="v1",
    include_names=["my_parser"],
):
    print(event)
    max_events += 1
    if max_events > 10:
        # Truncate output
        print("...")
        break

{'event': 'on_parser_start', 'name': 'my_parser', 'run_id': '6801e146-21a7-4660-8521-a6e2ed6080df', 'tags': ['seq:step:2'], 'metadata': {}, 'data': {}}
{'event': 'on_parser_stream', 'name': 'my_parser', 'run_id': '6801e146-21a7-4660-8521-a6e2ed6080df', 'tags': ['seq:step:2'], 'metadata': {}, 'data': {'chunk': {}}}
{'event': 'on_parser_stream', 'name': 'my_parser', 'run_id': '6801e146-21a7-4660-8521-a6e2ed6080df', 'tags': ['seq:step:2'], 'metadata': {}, 'data': {'chunk': {'countries': []}}}
{'event': 'on_parser_stream', 'name': 'my_parser', 'run_id': '6801e146-21a7-4660-8521-a6e2ed6080df', 'tags': ['seq:step:2'], 'metadata': {}, 'data': {'chunk': {'countries': [{}]}}}
{'event': 'on_parser_stream', 'name': 'my_parser', 'run_id': '6801e146-21a7-4660-8521-a6e2ed6080df', 'tags': ['seq:step:2'], 'metadata': {}, 'data': {'chunk': {'countries': [{'name': ''}]}}}
{'event': 'on_parser_stream', 'name': 'my_parser', 'run_id': '6801e146-21a7-4660-8521-a6e2ed6080df', 'tags': ['seq:step:2'], 'metadat

#### By Type

In [22]:
chain = model.with_config({"run_name": "model"}) | JsonOutputParser().with_config(
    {"run_name": "my_parser"}
)

max_events = 0
async for event in chain.astream_events(
    'output a list of the countries france, spain and japan and their populations in JSON format. Use a dict with an outer key of "countries" which contains a list of countries. Each country should have the key `name` and `population`',
    version="v1",
    include_types=["chat_model"],
):
    print(event)
    max_events += 1
    if max_events > 10:
        # Truncate output
        print("...")
        break

{'event': 'on_chat_model_start', 'name': 'model', 'run_id': 'c702b10a-7b47-41dd-83d9-0163d9489bce', 'tags': ['seq:step:1'], 'metadata': {}, 'data': {'input': {'messages': [[HumanMessage(content='output a list of the countries france, spain and japan and their populations in JSON format. Use a dict with an outer key of "countries" which contains a list of countries. Each country should have the key `name` and `population`')]]}}}
{'event': 'on_chat_model_stream', 'name': 'model', 'run_id': 'c702b10a-7b47-41dd-83d9-0163d9489bce', 'tags': ['seq:step:1'], 'metadata': {}, 'data': {'chunk': AIMessageChunk(content='')}}
{'event': 'on_chat_model_stream', 'name': 'model', 'run_id': 'c702b10a-7b47-41dd-83d9-0163d9489bce', 'tags': ['seq:step:1'], 'metadata': {}, 'data': {'chunk': AIMessageChunk(content='{\n')}}
{'event': 'on_chat_model_stream', 'name': 'model', 'run_id': 'c702b10a-7b47-41dd-83d9-0163d9489bce', 'tags': ['seq:step:1'], 'metadata': {}, 'data': {'chunk': AIMessageChunk(content=' ')}}


#### By Tags

:::{.callout-warning}

Tags are inherited by child components of a given runnable. 

If you're using tags to filter, make sure that this is what you want.
:::
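
To see why inheritance matters when filtering, consider this toy model of a chain and its children. `ToyRunnable` and `effective_tags` are hypothetical and only illustrate the rule stated in the warning: a child ends up with its own tags plus those of its parent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class ToyRunnable:
    """Hypothetical stand-in for a runnable with tags and child components."""

    name: str
    tags: List[str] = field(default_factory=list)
    children: List["ToyRunnable"] = field(default_factory=list)


def effective_tags(node: ToyRunnable, inherited: Sequence[str] = ()) -> Dict[str, List[str]]:
    """Map each component name to its own tags followed by inherited ones."""
    own = list(node.tags) + [tag for tag in inherited if tag not in node.tags]
    result = {node.name: own}
    for child in node.children:
        result.update(effective_tags(child, own))
    return result


toy_chain = ToyRunnable(
    name="RunnableSequence",
    tags=["my_chain"],
    children=[
        ToyRunnable(name="ChatOpenAI", tags=["seq:step:1"]),
        ToyRunnable(name="JsonOutputParser", tags=["seq:step:2"]),
    ],
)

tags_by_component = effective_tags(toy_chain)
print(tags_by_component)
```

Filtering this toy tree on `my_chain` would therefore match the model and the parser as well as the chain itself.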

In [24]:
chain = (model | JsonOutputParser()).with_config({"tags": ["my_chain"]})

max_events = 0
async for event in chain.astream_events(
    'output a list of the countries france, spain and japan and their populations in JSON format. Use a dict with an outer key of "countries" which contains a list of countries. Each country should have the key `name` and `population`',
    version="v1",
    include_tags=["my_chain"],
):
    print(event)
    max_events += 1
    if max_events > 10:
        # Truncate output
        print("...")
        break

{'event': 'on_chain_start', 'run_id': 'c5699d72-3afb-44d7-bd66-ecf073abe668', 'name': 'RunnableSequence', 'tags': ['my_chain'], 'metadata': {}, 'data': {'input': 'output a list of the countries france, spain and japan and their populations in JSON format. Use a dict with an outer key of "countries" which contains a list of countries. Each country should have the key `name` and `population`'}}
{'event': 'on_chat_model_start', 'name': 'ChatOpenAI', 'run_id': '220e6a6e-f4bc-4614-afdd-4cc709f9d1fb', 'tags': ['seq:step:1', 'my_chain'], 'metadata': {}, 'data': {'input': {'messages': [[HumanMessage(content='output a list of the countries france, spain and japan and their populations in JSON format. Use a dict with an outer key of "countries" which contains a list of countries. Each country should have the key `name` and `population`')]]}}}
{'event': 'on_parser_start', 'name': 'JsonOutputParser', 'run_id': '6c34c77c-cf48-48c5-a151-3e227634e873', 'tags': ['seq:step:2', 'my_chain'], 'metadata': 

### Non-streaming components

Remember how some components don't stream well because they don't operate on **input streams**?

While such components can break streaming of the final output when using `astream`, `astream_events` will still yield stream events of intermediate steps that support streaming!
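
The difference can be pictured with a small generator, shown here as a sketch that does not use LangChain. `token_source` and `run_with_events` are hypothetical: the intermediate step reports every chunk as an event, while the final step needs the complete input and so contributes a single result at the end.

```python
from typing import Any, Dict, Iterator, List


def token_source() -> Iterator[str]:
    """Hypothetical streaming intermediate step."""
    yield from ["France", "Spain", "Japan"]


def run_with_events() -> Iterator[Dict[str, Any]]:
    """Emit one event per intermediate chunk, then a single final output."""
    seen: List[str] = []
    for token in token_source():
        seen.append(token)
        # The intermediate step streams, so its chunks surface immediately.
        yield {"event": "on_parser_stream", "data": {"chunk": token}}
    # The final step works on the full input, so it produces only one result.
    yield {"event": "on_chain_end", "data": {"output": sorted(seen)}}


emitted = list(run_with_events())
for event in emitted:
    print(event)
```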

In [25]:
def _extract_country_names(inputs):
    """A function that does not operate on input streams and breaks streaming."""
    if not isinstance(inputs, dict):
        return ""

    if "countries" not in inputs:
        return ""

    countries = inputs["countries"]

    if not isinstance(countries, list):
        return ""

    country_names = [
        country.get("name") for country in countries if isinstance(country, dict)
    ]
    return country_names


chain = (
    model | JsonOutputParser() | _extract_country_names
)  # This parser only works with OpenAI right now

As expected, the `astream` API doesn't work correctly because `_extract_country_names` doesn't operate on streams.

In [26]:
async for chunk in chain.astream(
    'output a list of the countries france, spain and japan and their populations in JSON format. Use a dict with an outer key of "countries" which contains a list of countries. Each country should have the key `name` and `population`',
):
    print(chunk)

['France', 'Spain', 'Japan']


Now, let's confirm that with `astream_events` we're still seeing streaming output from the model and the parser.

In [27]:
num_events = 0

async for event in chain.astream_events(
    'output a list of the countries france, spain and japan and their populations in JSON format. Use a dict with an outer key of "countries" which contains a list of countries. Each country should have the key `name` and `population`',
    version="v1",
):
    kind = event["event"]
    if kind == "on_chat_model_stream":
        print(f"Chat model chunk: {repr(event['data']['chunk'].content)}", flush=True)
    if kind == "on_parser_stream":
        print(f"Parser chunk: {event['data']['chunk']}", flush=True)
    num_events += 1
    if num_events > 30:
        # Truncate the output
        print("...")
        break

Chat model chunk: ''
Parser chunk: {}
Chat model chunk: '{\n'
Chat model chunk: ' '
Chat model chunk: ' "'
Chat model chunk: 'countries'
Chat model chunk: '":'
Parser chunk: {'countries': []}
Chat model chunk: ' [\n'
Chat model chunk: '   '
Parser chunk: {'countries': [{}]}
Chat model chunk: ' {\n'
Chat model chunk: '     '
Chat model chunk: ' "'
Chat model chunk: 'name'
Chat model chunk: '":'
Parser chunk: {'countries': [{'name': ''}]}
Chat model chunk: ' "'
Parser chunk: {'countries': [{'name': 'France'}]}
Chat model chunk: 'France'
Chat model chunk: '",\n'
Chat model chunk: '     '
Chat model chunk: ' "'
Chat model chunk: 'population'
Chat model chunk: '":'
Parser chunk: {'countries': [{'name': 'France', 'population': ''}]}
Chat model chunk: ' "'
...


### Propagating Callbacks

:::{.callout-attention}

If you invoke runnables inside your custom tools, you need to propagate callbacks to those runnables; otherwise, no stream events will be generated for them.

:::

:::{.callout-info}

When using `RunnableLambda`s or the `@chain` decorator, callbacks are propagated automatically behind the scenes.

:::

In [28]:
from langchain_core.runnables import RunnableLambda
from langchain_core.tools import tool


def reverse_word(word: str):
    return word[::-1]


reverse_word = RunnableLambda(reverse_word)


@tool
def my_func(word: str):
    """My func"""
    return reverse_word.invoke(word)


async for event in my_func.astream_events("hello", version="v1"):
    print(event)

{'event': 'on_tool_start', 'run_id': '4ff0f956-252a-47ea-ba19-c4208b848914', 'name': 'my_func', 'tags': [], 'metadata': {}, 'data': {'input': 'hello'}}
{'event': 'on_tool_stream', 'run_id': '4ff0f956-252a-47ea-ba19-c4208b848914', 'tags': [], 'metadata': {}, 'name': 'my_func', 'data': {'chunk': 'olleh'}}
{'event': 'on_tool_end', 'name': 'my_func', 'run_id': '4ff0f956-252a-47ea-ba19-c4208b848914', 'tags': [], 'metadata': {}, 'data': {'output': 'olleh'}}


In [29]:
@tool
def correct_tool(word: str, callbacks):
    """A tool that correctly propagates callbacks."""
    return reverse_word.invoke(word, {"callbacks": callbacks})


async for event in correct_tool.astream_events("hello", version="v1"):
    print(event)

{'event': 'on_tool_start', 'run_id': '5c08b99f-15d2-4131-a8e0-6d7fc5ce1fd7', 'name': 'correct_tool', 'tags': [], 'metadata': {}, 'data': {'input': 'hello'}}
{'event': 'on_chain_start', 'name': 'reverse_word', 'run_id': 'c8c97861-6f1d-4e8d-abcf-7d13f63e2566', 'tags': [], 'metadata': {}, 'data': {'input': 'hello'}}
{'event': 'on_chain_end', 'name': 'reverse_word', 'run_id': 'c8c97861-6f1d-4e8d-abcf-7d13f63e2566', 'tags': [], 'metadata': {}, 'data': {'input': 'hello', 'output': 'olleh'}}
{'event': 'on_tool_stream', 'run_id': '5c08b99f-15d2-4131-a8e0-6d7fc5ce1fd7', 'tags': [], 'metadata': {}, 'name': 'correct_tool', 'data': {'chunk': 'olleh'}}
{'event': 'on_tool_end', 'name': 'correct_tool', 'run_id': '5c08b99f-15d2-4131-a8e0-6d7fc5ce1fd7', 'tags': [], 'metadata': {}, 'data': {'output': 'olleh'}}
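
The contrast between `my_func` and `correct_tool` can be mimicked without LangChain. In this sketch, callbacks are plain functions that receive an event name; `reverse_word_toy`, `toy_tool` and `run` are hypothetical helpers, and the inner step can only report its events if the outer one hands the callbacks down.

```python
from typing import Callable, List, Optional

Callback = Callable[[str], None]


def reverse_word_toy(word: str, callbacks: Optional[List[Callback]] = None) -> str:
    """Inner step: reports start/end only to the callbacks it was given."""
    for callback in callbacks or []:
        callback("reverse_word:start")
    result = word[::-1]
    for callback in callbacks or []:
        callback("reverse_word:end")
    return result


def toy_tool(word: str, callbacks: List[Callback], propagate: bool) -> str:
    """Outer step: optionally passes its callbacks down to the inner step."""
    for callback in callbacks:
        callback("tool:start")
    result = reverse_word_toy(word, callbacks if propagate else None)
    for callback in callbacks:
        callback("tool:end")
    return result


def run(propagate: bool) -> List[str]:
    """Run the toy tool and return the event names that were observed."""
    seen: List[str] = []
    toy_tool("hello", [seen.append], propagate)
    return seen


without_propagation = run(propagate=False)
with_propagation = run(propagate=True)
print(without_propagation)
print(with_propagation)
```

Without propagation only the outer events are observed, which corresponds to the missing `reverse_word` events in the first tool's output.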


In [30]:
from langchain_core.runnables import RunnableLambda


async def reverse_and_double(word: str):
    return await reverse_word.ainvoke(word) * 2


reverse_and_double = RunnableLambda(reverse_and_double)

await reverse_and_double.ainvoke("1234")

async for event in reverse_and_double.astream_events("1234", version="v1"):
    print(event)

{'event': 'on_chain_start', 'run_id': '24e3e2d7-8105-42bb-9339-66f3f81083a7', 'name': 'reverse_and_double', 'tags': [], 'metadata': {}, 'data': {'input': '1234'}}
{'event': 'on_chain_start', 'name': 'reverse_word', 'run_id': '3591decf-02c0-4533-8d84-e3426b107cb3', 'tags': [], 'metadata': {}, 'data': {'input': '1234'}}
{'event': 'on_chain_end', 'name': 'reverse_word', 'run_id': '3591decf-02c0-4533-8d84-e3426b107cb3', 'tags': [], 'metadata': {}, 'data': {'input': '1234', 'output': '4321'}}
{'event': 'on_chain_stream', 'run_id': '24e3e2d7-8105-42bb-9339-66f3f81083a7', 'tags': [], 'metadata': {}, 'name': 'reverse_and_double', 'data': {'chunk': '43214321'}}
{'event': 'on_chain_end', 'name': 'reverse_and_double', 'run_id': '24e3e2d7-8105-42bb-9339-66f3f81083a7', 'tags': [], 'metadata': {}, 'data': {'output': '43214321'}}


Or you can use the `@chain` decorator:

In [31]:
from langchain_core.runnables import chain


@chain
async def reverse_and_double(word: str):
    return await reverse_word.ainvoke(word) * 2


await reverse_and_double.ainvoke("1234")

async for event in reverse_and_double.astream_events("1234", version="v1"):
    print(event)

{'event': 'on_chain_start', 'run_id': '7be3c371-f5bd-47e5-bc80-19f0284582b0', 'name': 'reverse_and_double', 'tags': [], 'metadata': {}, 'data': {'input': '1234'}}
{'event': 'on_chain_start', 'name': 'reverse_word', 'run_id': '6fb0970b-855a-4a14-83d6-a5068926f6db', 'tags': [], 'metadata': {}, 'data': {'input': '1234'}}
{'event': 'on_chain_end', 'name': 'reverse_word', 'run_id': '6fb0970b-855a-4a14-83d6-a5068926f6db', 'tags': [], 'metadata': {}, 'data': {'input': '1234', 'output': '4321'}}
{'event': 'on_chain_stream', 'run_id': '7be3c371-f5bd-47e5-bc80-19f0284582b0', 'tags': [], 'metadata': {}, 'name': 'reverse_and_double', 'data': {'chunk': '43214321'}}
{'event': 'on_chain_end', 'name': 'reverse_and_double', 'run_id': '7be3c371-f5bd-47e5-bc80-19f0284582b0', 'tags': [], 'metadata': {}, 'data': {'output': '43214321'}}
