## Streaming Agent Messages with Token Streaming

You can stream the messages an agent generates in real time using either `on_messages_stream()` or `run_stream()`. You can also stream the individual tokens generated by the underlying model.

### Token Streaming with `model_client_stream=True`

* Setting `model_client_stream=True` on the agent enables token streaming from the model client.
* With this setting, the agent yields `ModelClientStreamingChunkEvent` messages from the `on_messages_stream()` and `run_stream()` generators.
* Each `ModelClientStreamingChunkEvent` carries the tokens the model has just generated.
* This lets you display the model's output incrementally instead of waiting for the complete message.

**Important Note:**

* The underlying model API must support streaming tokens for this feature to function.
* Consult your model provider's documentation to confirm token streaming capabilities.
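
The chunk-then-complete-message pattern described above can be sketched without calling a model. The example below is only an illustration under stated assumptions: `ChunkEvent`, `FinalMessage`, and `fake_token_stream` are hypothetical stand-ins, not AutoGen classes, and the "tokens" are simply whitespace-separated words. It shows how a consumer might collect chunks as they arrive and then compare them with the complete message.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List, Tuple, Union


@dataclass
class ChunkEvent:
    """Hypothetical stand-in for a streaming chunk event (not the AutoGen class)."""
    content: str


@dataclass
class FinalMessage:
    """Hypothetical stand-in for the complete message that follows the chunks."""
    content: str


async def fake_token_stream(text: str) -> AsyncIterator[Union[ChunkEvent, FinalMessage]]:
    """Yield one chunk per word, then the complete message."""
    for word in text.split(" "):
        await asyncio.sleep(0)  # hand control back to the event loop, as real I/O would
        yield ChunkEvent(content=word + " ")
    yield FinalMessage(content=text)


async def collect(text: str) -> Tuple[List[str], str]:
    """Gather chunks as they arrive and keep the complete message separately."""
    chunks: List[str] = []
    final = ""
    async for item in fake_token_stream(text):
        if isinstance(item, ChunkEvent):
            chunks.append(item.content)  # a UI would render this piece immediately
        else:
            final = item.content
    return chunks, final


chunks, final = asyncio.run(collect("Azure AI Search is a cloud search service"))
```

In this simulation the joined chunks reproduce the complete message, so a real consumer that renders chunks incrementally would typically avoid printing the complete message a second time.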

### `on_messages_stream()`

* This method returns an asynchronous generator.
* It yields each message the agent produces while handling the request.
* If `model_client_stream=True`, it also yields `ModelClientStreamingChunkEvent` messages.
* The final item yielded is a `Response` object; the complete response message is available through its `chat_message` attribute.
* This lets you observe the agent's intermediate steps, actions, and token generation as they occur.
* For example, you can pass the stream to `Console` to print the messages as they are generated.
* If the agent is configured with tools (for example, a `web_search` tool), you can also watch the tool calls and the results that shape its response.
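
To make the "intermediate items first, response last" ordering concrete, here is a minimal simulation. The names `InnerEvent`, `FakeResponse`, and `fake_on_messages_stream` are assumptions invented for this sketch; only the `chat_message` attribute name is taken from the description above. It shows one way a consumer might separate intermediate items from the final response.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List, Optional, Tuple, Union


@dataclass
class InnerEvent:
    """Hypothetical stand-in for an intermediate message such as a tool call."""
    description: str


@dataclass
class FakeResponse:
    """Hypothetical stand-in for the final item; mirrors only `chat_message`."""
    chat_message: str


async def fake_on_messages_stream() -> AsyncIterator[Union[InnerEvent, FakeResponse]]:
    """Yield intermediate events first and the response last."""
    yield InnerEvent("tool call requested")
    yield InnerEvent("tool call result received")
    yield FakeResponse(chat_message="Here is the answer.")


async def consume() -> Tuple[List[str], Optional[str]]:
    """Log intermediate events and pull the final text out of the response."""
    inner_log: List[str] = []
    final_text: Optional[str] = None
    async for item in fake_on_messages_stream():
        if isinstance(item, FakeResponse):
            final_text = item.chat_message  # the last item carries the final message
        else:
            inner_log.append(item.description)
    return inner_log, final_text


inner_log, final_text = asyncio.run(consume())
```

Checking the item type, rather than assuming a fixed number of items, keeps the consumer correct whether or not the agent emits tool calls or token chunks along the way.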

```python
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_agentchat.agents import AssistantAgent
from autogen_core import CancellationToken
from autogen_agentchat.messages import TextMessage
from autogen_agentchat.ui import Console
```

```python
model_client = OpenAIChatCompletionClient(
    model="gpt-4o-mini",
)
agent = AssistantAgent(
    name="assistant",
    model_client=model_client,
    model_client_stream=True,  # Enable token streaming from the model client.
)
```

```python
async def assistant_run_stream() -> None:
    # Option 1: read each message from the stream (as shown in the previous example).
    # async for message in agent.on_messages_stream(
    #     [TextMessage(content="Share information about azure ai search.", source="user")],
    #     cancellation_token=CancellationToken(),
    # ):
    #     print(message)

    # Option 2: use Console to print all messages as they appear.
    await Console(
        agent.on_messages_stream(
            [TextMessage(content="Share information about azure ai search.", source="user")],
            cancellation_token=CancellationToken(),
        ),
        output_stats=True,  # Enable stats printing.
    )


# Use asyncio.run(assistant_run_stream()) when running in a script.
await assistant_run_stream()
```

---------- assistant ----------
Azure AI Search, also known as Azure Cognitive Search, is a cloud search service provided by Microsoft Azure. It is designed to help developers build rich search experiences over large volumes of content, enabling capabilities like full-text search, sophisticated querying, and more. Here are some key features and functionalities of Azure AI Search:

1. **Full-Text Search**: Azure Cognitive Search provides powerful indexing and querying capabilities that allow users to search through large sets of unstructured and structured data.

2. **AI Enrichment**: The service can enrich content using built-in cognitive skills to extract insights from various data sources. This includes image analysis, natural language processing, and entity recognition.

3. **Faceted Navigation**: It allows users to narrow down search results through faceted navigation, which is filtering based on categories or properties.

4. **Scalability**: Being part of Azure, it can scale to fi