How to impose token count limits on ChatHistory #12450
Unanswered
yuichiromukaiyama asked this question in Q&A
-
I was previously looking into issue #6155. I don't think the discussion there has progressed much, but in Semantic Kernel, when the token count of a ChatHistory grows, is there a best practice for trimming the conversation history once it exceeds a certain token threshold, either by removing only the minimum necessary parts of the history or by cutting off the last user prompt?
I understand that there is a feature for summarizing ChatHistory. However, my understanding is that it has to be called explicitly. In an AI agent implementation where inference runs multiple times, for example, you have to remember to call it every time. Rather than summarizing, I would prefer to delete excess tokens by trimming the conversation history or the last message content directly.
Currently, this is what I'm doing:
• I wrap the inference call in another function. Before inference, I count the tokens with tiktoken and delete parts of the history manually, roughly as in the sketch below.
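A minimal sketch of that workaround, assuming the public `ChatHistory`/`ChatMessageContent` attributes from recent semantic-kernel Python releases; the helper name and the 4000-token budget are illustrative, and the count here is a rough per-message text estimate rather than the exact prompt size the model sees:

```python
import tiktoken
from semantic_kernel.contents import AuthorRole, ChatHistory


def trim_history_to_token_limit(history: ChatHistory, max_tokens: int = 4000) -> None:
    """Drop the oldest non-system messages until the history fits the budget."""
    enc = tiktoken.get_encoding("cl100k_base")

    def total_tokens() -> int:
        # Rough estimate: sum of token counts of each message's text content.
        return sum(len(enc.encode(m.content or "")) for m in history.messages)

    while total_tokens() > max_tokens and len(history.messages) > 1:
        # Preserve a leading system message, if any, and evict the oldest turn.
        index = 1 if history.messages[0].role == AuthorRole.SYSTEM else 0
        del history.messages[index]
```

Called right before each inference, this keeps the prompt under the budget without needing a summarization pass.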
-
Looks like you are using Python, based on your comment in the other post. Here is a sample of using the truncation history reducer; you can also set the reducer to auto-reduce:
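A minimal sketch of that pattern, assuming recent semantic-kernel Python import paths. Note that `target_count` and `threshold_count` bound the number of messages kept, not tokens, so pick counts that match your token budget:

```python
import asyncio

from semantic_kernel.contents import (
    AuthorRole,
    ChatHistoryTruncationReducer,
    ChatMessageContent,
)


async def main() -> None:
    # Keep at most target_count messages; trimming only triggers once the
    # history grows past target_count + threshold_count.
    history = ChatHistoryTruncationReducer(
        target_count=4,
        threshold_count=2,
        auto_reduce=True,  # reduce inside add_message_async, no explicit call
    )

    for i in range(10):
        await history.add_message_async(
            ChatMessageContent(role=AuthorRole.USER, content=f"message {i}")
        )

    # With auto_reduce=True the history never grows past the configured bounds.
    print(len(history.messages))


asyncio.run(main())
```

If messages are appended through the plain synchronous `add_message` helpers instead, auto-reduce does not fire, but the reducer can still be applied explicitly with `await history.reduce()`.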