Description
Library name and version
Azure.AI.Agents.Persistent 1.2.0-beta.8
Describe the bug
Each subsequent call of `PersistentAgentsChatClient.GetStreamingResponseAsync` on the same thread takes 500–1000 ms longer than the previous one. The cause is an inefficient line of code that loops through every previous `ThreadRun` on every call, just to check whether any of them are still running:
```csharp
await foreach (ThreadRun? run in _client!.Runs.GetRunsAsync(threadId, limit: 1, ListSortOrder.Descending, cancellationToken: cancellationToken).ConfigureAwait(false))
```
Not only that, it does a separate API call for each one.
It explicitly sets `limit` to 1 and the sort order to `Descending`. That does not mean
"Get the single most recent run",
which would make more sense; because of the `await foreach`, it instead means
"Loop through every single run ever made on this thread, one at a time, without batching the API calls."
We can see that `_client!.Runs.GetRunsAsync` returns `AsyncPageable<ThreadRun>`, and the `AsyncPageable<T>` doc comment clearly states:

> Enumerate the values in the collection asynchronously. This may make multiple service requests.
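To illustrate the cost (a sketch under the assumption that the service honors `limit: 1` as the page size; this is not a claim about the SDK's internals): enumerating the `AsyncPageable<ThreadRun>` item by item walks every page, and with `limit: 1` each page holds a single run, so each page boundary crossed triggers another service request:

```csharp
// Hypothetical illustration: with limit: 1, each page returned by the
// service contains a single ThreadRun, so N historical runs means roughly
// N HTTP requests when the whole collection is enumerated.
int requestCount = 0;
await foreach (Page<ThreadRun> page in _client!.Runs
    .GetRunsAsync(threadId, limit: 1, ListSortOrder.Descending)
    .AsPages())
{
    requestCount++; // each page corresponds to one service request
}
// requestCount now approximates the total number of runs on the thread
```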
Expected behavior
I think the expected behavior of this line of code may have originally been
"Get the single most recent run",
and looking at every single run might be unintentional. That would make more sense to me: get the most recent run, check whether it is still active and needs to be canceled, or whether it is the special case of a pending tool response.
If it is indeed necessary to look at every single run, I would expect it not to be forced to do so one at a time! Remove `limit: 1` and let it use the default page size of 20.
Or, at a minimum: since I have set `ThreadAndRunOptions.TruncationStrategy` to `new Truncation(TruncationStrategy.LastMessages) { LastMessages = 10 }`, add a special case so I don't have to loop through the entire run history when I'm deliberately windowing the context to the 10 most recent messages anyway.
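For reference, a minimal sketch of the "get only the most recent run" behavior, assuming the check really only needs the latest run (the `RunStatus` values named here are my guess at the relevant ones):

```csharp
// Fetch only the newest run and stop after the first item, so no further
// pages are requested from the service.
ThreadRun? latestRun = null;
await foreach (ThreadRun run in _client!.Runs
    .GetRunsAsync(threadId, limit: 1, ListSortOrder.Descending, cancellationToken: cancellationToken)
    .ConfigureAwait(false))
{
    latestRun = run;
    break; // first item of the first page; don't enumerate the rest
}

if (latestRun is not null &&
    (latestRun.Status == RunStatus.Queued || latestRun.Status == RunStatus.InProgress))
{
    // cancel the run, or handle the pending tool-response case here
}
```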
Actual behavior
Really slow: each response takes longer than the one before it, and the client issues a separate API call for every run that has accumulated on the thread.
Reproduction Steps
Step 1: Create azure-openai agent in Foundry
Step 2:
```csharp
var client = new PersistentAgentsClient("your connection string", new DefaultAzureCredential());
PersistentAgentThread thread = client.Threads.CreateThread();
var message = new ChatMessage(ChatRole.User, "Hi!");

for (int i = 0; i < 25; i++)
{
    var startTime = DateTime.UtcNow;
    await foreach (var update in client.AsIChatClient("your agent id").GetStreamingResponseAsync([message], options: new ChatOptions() { ConversationId = thread.Id }))
    {
        //Console.Write(update.Text);
    }
    Console.WriteLine($"Duration: {DateTime.UtcNow - startTime}");
}
```
Step 3: :(
Environment
- net8.0
- Windows 11 Enterprise
- Visual Studio 18.0.11201.2