
Commit 4b23389

Python: Emit token usage with streaming chat completion agent. (#12416)
### Motivation and Context

The chat completion agent was not emitting token usage during streaming invocation because we were only allowing through `response.items`. In the case of token usage, `response.items` is `[]` and the usage is contained in the message's `metadata` dict. This PR fixes that bug by allowing through `response.items or response.metadata.get("usage")`. Two new samples are added to the concepts/agents/chat_completion dir to show how one can track token usage for streaming and non-streaming agent invocation. Token usage handling is also added to the chat completion agent integration tests.

### Description

- Fixes a bug where we weren't emitting the streaming token usage for the chat completion agent. Also now includes the `prompt_tokens_details` and `completion_tokens_details` models that are returned but were not previously handled.
- Adds new samples.
- Updates integration tests to track token usage and make sure the counts are non-zero.
- Closes #12411

### Contribution Checklist

- [X] The code builds clean without any errors or warnings
- [X] The PR follows the [SK Contribution Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md) and the [pre-submission formatting script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts) raises no violations
- [X] All unit tests pass, and I have added new tests where possible
- [X] I didn't break anyone 😄
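The one-line gate change can be illustrated with a standalone sketch. The `Chunk` dataclass below is a hypothetical stand-in for Semantic Kernel's streaming message type, not the library's actual class; only the boolean condition mirrors the PR:

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    # Hypothetical stand-in for a streaming chat message chunk:
    # content items plus a metadata dict that may carry "usage".
    items: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


def should_emit_old(chunk: Chunk) -> bool:
    # Old gate: usage-only chunks have items == [], so they were dropped.
    return bool(chunk.items)


def should_emit_new(chunk: Chunk) -> bool:
    # Fixed gate: also let through chunks that carry only usage metadata.
    return bool(chunk.items or chunk.metadata.get("usage"))


usage_chunk = Chunk(items=[], metadata={"usage": {"prompt_tokens": 10, "completion_tokens": 5}})
print(should_emit_old(usage_chunk))  # False: the usage was silently lost
print(should_emit_new(usage_chunk))  # True: the usage now reaches the caller
```

A chunk with neither items nor usage is still filtered out under the new gate, so purely empty deltas are not emitted.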
1 parent c3c8dfa commit 4b23389

File tree

6 files changed: +268 additions, −5 deletions

python/samples/concepts/README.md

Lines changed: 3 additions & 1 deletion

```diff
@@ -44,11 +44,13 @@
 - [Chat Completion Agent as Kernel Function](./agents/chat_completion_agent/chat_completion_agent_as_kernel_function.py)
 - [Chat Completion Agent Function Termination](./agents/chat_completion_agent/chat_completion_agent_function_termination.py)
-- [Chat Completion Agent Templating](./agents/chat_completion_agent/chat_completion_agent_prompt_templating.py)
 - [Chat Completion Agent Message Callback Streaming](./agents/chat_completion_agent/chat_completion_agent_message_callback_streaming.py)
 - [Chat Completion Agent Message Callback](./agents/chat_completion_agent/chat_completion_agent_message_callback.py)
+- [Chat Completion Agent Templating](./agents/chat_completion_agent/chat_completion_agent_prompt_templating.py)
+- [Chat Completion Agent Streaming Token Usage](./agents/chat_completion_agent/chat_completion_agent_streaming_token_usage.py)
 - [Chat Completion Agent Summary History Reducer Agent Chat](./agents/chat_completion_agent/chat_completion_agent_summary_history_reducer_agent_chat.py)
 - [Chat Completion Agent Summary History Reducer Single Agent](./agents/chat_completion_agent/chat_completion_agent_summary_history_reducer_single_agent.py)
+- [Chat Completion Agent Token Usage](./agents/chat_completion_agent/chat_completion_agent_token_usage.py)
 - [Chat Completion Agent Truncate History Reducer Agent Chat](./agents/chat_completion_agent/chat_completion_agent_truncate_history_reducer_agent_chat.py)
 - [Chat Completion Agent Truncate History Reducer Single Agent](./agents/chat_completion_agent/chat_completion_agent_truncate_history_reducer_single_agent.py)
```

python/samples/concepts/agents/chat_completion_agent/chat_completion_agent_streaming_token_usage.py

Lines changed: 110 additions & 0 deletions (new file)

```python
# Copyright (c) Microsoft. All rights reserved.

import asyncio
from typing import Annotated

from semantic_kernel.agents import ChatCompletionAgent, ChatHistoryAgentThread
from semantic_kernel.connectors.ai.completion_usage import CompletionUsage
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion
from semantic_kernel.functions import kernel_function

"""
The following sample demonstrates how to create a chat completion agent
and use it with streaming responses. It also shows how to track token
usage during the streaming process.
"""


# Define a sample plugin for the sample
class MenuPlugin:
    """A sample Menu Plugin used for the concept sample."""

    @kernel_function(description="Provides a list of specials from the menu.")
    def get_specials(self) -> Annotated[str, "Returns the specials from the menu."]:
        return """
        Special Soup: Clam Chowder
        Special Salad: Cobb Salad
        Special Drink: Chai Tea
        """

    @kernel_function(description="Provides the price of the requested menu item.")
    def get_item_price(
        self, menu_item: Annotated[str, "The name of the menu item."]
    ) -> Annotated[str, "Returns the price of the menu item."]:
        return "$9.99"


async def main() -> None:
    agent = ChatCompletionAgent(
        service=AzureChatCompletion(),
        name="Assistant",
        instructions="Answer questions about the menu.",
        plugins=[MenuPlugin()],
    )

    # Create a thread for the agent
    # If no thread is provided, a new thread will be
    # created and returned with the initial response
    thread: ChatHistoryAgentThread = None

    user_inputs = [
        "Hello",
        "What is the special soup?",
        "How much does that cost?",
        "Thank you",
    ]

    completion_usage = CompletionUsage()

    for user_input in user_inputs:
        print(f"\n# User: '{user_input}'")
        async for response in agent.invoke_stream(
            messages=user_input,
            thread=thread,
        ):
            if response.content:
                print(response.content, end="", flush=True)
            if response.metadata.get("usage"):
                completion_usage += response.metadata["usage"]
                print(f"\nStreaming Usage: {response.metadata['usage']}")
            thread = response.thread
        print()

    # Print the completion usage
    print(f"\nStreaming Total Completion Usage: {completion_usage.model_dump_json(indent=4)}")


"""
Sample Output:

# User: 'Hello'
Hello! How can I help you with the menu today?

# User: 'What is the special soup?'
The special soup today is Clam Chowder. Would you like more details or are you interested in something else from
the menu?

# User: 'How much does that cost?'
The Clam Chowder special soup costs $9.99. Would you like to add it to your order or ask about something else?

# User: 'Thank you'
You're welcome! If you have any more questions or need help with the menu, just let me know. Enjoy your meal!

Streaming Total Completion Usage: {
    "prompt_tokens": 1150,
    "prompt_tokens_details": {
        "audio_tokens": 0,
        "cached_tokens": 0
    },
    "completion_tokens": 134,
    "completion_tokens_details": {
        "accepted_prediction_tokens": 0,
        "audio_tokens": 0,
        "reasoning_tokens": 0,
        "rejected_prediction_tokens": 0
    }
}
"""


if __name__ == "__main__":
    asyncio.run(main())
```
python/samples/concepts/agents/chat_completion_agent/chat_completion_agent_token_usage.py

Lines changed: 111 additions & 0 deletions (new file)

```python
# Copyright (c) Microsoft. All rights reserved.

import asyncio
from typing import Annotated

from semantic_kernel.agents import ChatCompletionAgent, ChatHistoryAgentThread
from semantic_kernel.connectors.ai.completion_usage import CompletionUsage
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion
from semantic_kernel.functions import kernel_function

"""
The following sample demonstrates how to create a chat completion agent
and use it with non-streaming responses. It also shows how to track token
usage during agent invoke.
"""


# Define a sample plugin for the sample
class MenuPlugin:
    """A sample Menu Plugin used for the concept sample."""

    @kernel_function(description="Provides a list of specials from the menu.")
    def get_specials(self) -> Annotated[str, "Returns the specials from the menu."]:
        return """
        Special Soup: Clam Chowder
        Special Salad: Cobb Salad
        Special Drink: Chai Tea
        """

    @kernel_function(description="Provides the price of the requested menu item.")
    def get_item_price(
        self, menu_item: Annotated[str, "The name of the menu item."]
    ) -> Annotated[str, "Returns the price of the menu item."]:
        return "$9.99"


async def main() -> None:
    agent = ChatCompletionAgent(
        service=AzureChatCompletion(),
        name="Assistant",
        instructions="Answer questions about the menu.",
        plugins=[MenuPlugin()],
    )

    # Create a thread for the agent
    # If no thread is provided, a new thread will be
    # created and returned with the initial response
    thread: ChatHistoryAgentThread = None

    user_inputs = [
        "Hello",
        "What is the special soup?",
        "How much does that cost?",
        "Thank you",
    ]

    completion_usage = CompletionUsage()

    for user_input in user_inputs:
        print(f"\n# User: '{user_input}'")
        async for response in agent.invoke(
            messages=user_input,
            thread=thread,
        ):
            if response.content:
                print(response.content)
            if response.metadata.get("usage"):
                completion_usage += response.metadata["usage"]
            thread = response.thread
        print()

    # Print the completion usage
    print(f"\nNon-Streaming Total Completion Usage: {completion_usage.model_dump_json(indent=4)}")


"""
Sample Output:

# User: 'Hello'
Hello! How can I help you with the menu today?

# User: 'What is the special soup?'
The special soup today is Clam Chowder. Would you like to know more about it or see the other specials?

# User: 'How much does that cost?'
The Clam Chowder special costs $9.99. Would you like to add that to your order or need more information?

# User: 'Thank you'
You're welcome! If you have any more questions or need help with the menu, just let me know. Enjoy your day!

Non-Streaming Total Completion Usage: {
    "prompt_tokens": 772,
    "prompt_tokens_details": {
        "audio_tokens": 0,
        "cached_tokens": 0
    },
    "completion_tokens": 92,
    "completion_tokens_details": {
        "accepted_prediction_tokens": 0,
        "audio_tokens": 0,
        "reasoning_tokens": 0,
        "rejected_prediction_tokens": 0
    }
}
"""


if __name__ == "__main__":
    asyncio.run(main())
```

python/semantic_kernel/agents/chat_completion/chat_completion_agent.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -452,7 +452,7 @@ async def invoke_stream(

                 if (
                     role == AuthorRole.ASSISTANT
-                    and response.items
+                    and (response.items or response.metadata.get("usage"))
                     and not any(
                         isinstance(item, (FunctionCallContent, FunctionResultContent)) for item in response.items
                     )
```
python/semantic_kernel/connectors/ai/completion_usage.py

Lines changed: 32 additions & 3 deletions

```diff
@@ -1,27 +1,56 @@
 # Copyright (c) Microsoft. All rights reserved.

+
 from openai.types import CompletionUsage as OpenAICompletionUsage
+from openai.types.completion_usage import CompletionTokensDetails, PromptTokensDetails

 from semantic_kernel.kernel_pydantic import KernelBaseModel


 class CompletionUsage(KernelBaseModel):
-    """Completion usage information."""
+    """A class representing the usage of tokens in a completion request."""

     prompt_tokens: int | None = None
+    prompt_tokens_details: PromptTokensDetails | None = None
     completion_tokens: int | None = None
+    completion_tokens_details: CompletionTokensDetails | None = None

     @classmethod
     def from_openai(cls, openai_completion_usage: OpenAICompletionUsage):
-        """Create a CompletionUsage object from an OpenAI response."""
+        """Create a CompletionUsage instance from an OpenAICompletionUsage instance."""
         return cls(
             prompt_tokens=openai_completion_usage.prompt_tokens,
+            prompt_tokens_details=openai_completion_usage.prompt_tokens_details
+            if openai_completion_usage.prompt_tokens_details
+            else None,
             completion_tokens=openai_completion_usage.completion_tokens,
+            completion_tokens_details=openai_completion_usage.completion_tokens_details
+            if openai_completion_usage.completion_tokens_details
+            else None,
         )

     def __add__(self, other: "CompletionUsage") -> "CompletionUsage":
-        """Add two CompletionUsage objects."""
+        """Combine two CompletionUsage instances by summing their token counts."""
+
+        def _merge_details(cls, a, b):
+            """Merge two details objects by summing their fields."""
+            if a is None and b is None:
+                return None
+            kwargs = {}
+            for field in cls.__annotations__:
+                x = getattr(a, field, None)
+                y = getattr(b, field, None)
+                value = None if x is None and y is None else (x or 0) + (y or 0)
+                kwargs[field] = value
+            return cls(**kwargs)
+
         return CompletionUsage(
             prompt_tokens=(self.prompt_tokens or 0) + (other.prompt_tokens or 0),
             completion_tokens=(self.completion_tokens or 0) + (other.completion_tokens or 0),
+            prompt_tokens_details=_merge_details(
+                PromptTokensDetails, self.prompt_tokens_details, other.prompt_tokens_details
+            ),
+            completion_tokens_details=_merge_details(
+                CompletionTokensDetails, self.completion_tokens_details, other.completion_tokens_details
+            ),
         )
```
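The field-by-field merge can be exercised standalone. The sketch below re-implements the same logic with a plain dataclass standing in for OpenAI's `PromptTokensDetails` pydantic model, so it runs without `openai` or `semantic_kernel` installed; the summing behavior mirrors the PR's `_merge_details`:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PromptTokensDetails:
    # Minimal stand-in for OpenAI's prompt-token details model,
    # used only to illustrate the merge logic.
    audio_tokens: Optional[int] = None
    cached_tokens: Optional[int] = None


def merge_details(cls, a, b):
    # Sum each annotated field, treating a missing value as 0,
    # and keep None only when both sides are None.
    if a is None and b is None:
        return None
    kwargs = {}
    for field_name in cls.__annotations__:
        x = getattr(a, field_name, None)
        y = getattr(b, field_name, None)
        kwargs[field_name] = None if x is None and y is None else (x or 0) + (y or 0)
    return cls(**kwargs)


merged = merge_details(
    PromptTokensDetails,
    PromptTokensDetails(audio_tokens=1, cached_tokens=None),
    PromptTokensDetails(audio_tokens=2, cached_tokens=256),
)
print(merged)  # PromptTokensDetails(audio_tokens=3, cached_tokens=256)
```

Iterating `cls.__annotations__` keeps the helper generic across both details models, so new token-detail fields added upstream are summed without code changes here.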

python/tests/integration/agents/chat_completion_agent/test_chat_completion_agent_integration.py

Lines changed: 11 additions & 0 deletions

```diff
@@ -5,6 +5,7 @@
 import pytest

 from semantic_kernel.agents import ChatCompletionAgent
+from semantic_kernel.connectors.ai.completion_usage import CompletionUsage
 from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion, OpenAIChatCompletion
 from semantic_kernel.contents import AuthorRole, ChatMessageContent, StreamingChatMessageContent
 from semantic_kernel.contents.image_content import ImageContent
@@ -86,10 +87,15 @@ async def test_invoke(self, chat_completion_agent: ChatCompletionAgent, agent_te
         """Test invoke of the agent."""
         responses = await agent_test_base.get_invoke_with_retry(chat_completion_agent, messages="Hello")
         assert len(responses) > 0
+        usage: CompletionUsage = CompletionUsage()
         for response in responses:
             assert isinstance(response.message, ChatMessageContent)
             assert response.message.role == AuthorRole.ASSISTANT
             assert response.message.content is not None
+            if response.metadata.get("usage"):
+                usage += response.metadata["usage"]
+        assert usage.prompt_tokens > 0
+        assert usage.completion_tokens > 0

     @pytest.mark.parametrize("chat_completion_agent", ["azure", "openai"], indirect=True, ids=["azure", "openai"])
     async def test_invoke_with_thread(self, chat_completion_agent: ChatCompletionAgent, agent_test_base: AgentTestBase):
@@ -115,10 +121,15 @@ async def test_invoke_stream(self, chat_completion_agent: ChatCompletionAgent, a
         """Test invoke stream of the agent."""
         responses = await agent_test_base.get_invoke_stream_with_retry(chat_completion_agent, messages="Hello")
         assert len(responses) > 0
+        usage: CompletionUsage = CompletionUsage()
         for response in responses:
             assert isinstance(response.message, StreamingChatMessageContent)
             assert response.message.role == AuthorRole.ASSISTANT
             assert response.message.content is not None
+            if response.metadata.get("usage"):
+                usage += response.metadata["usage"]
+        assert usage.prompt_tokens > 0
+        assert usage.completion_tokens > 0

     @pytest.mark.parametrize("chat_completion_agent", ["azure", "openai"], indirect=True, ids=["azure", "openai"])
     async def test_invoke_stream_with_thread(
```
