Build benchmark of OpenAI assistants API latency #198

@EdmundKorley

Description

Since the Assistants API is now effectively deprecated, we want to start an epic to add support for the Responses API, where all improvements for building RAG chatbots on OpenAI will land.

First, we want to measure the latency of the OpenAI Assistants API so we can quantify the performance gains. Anecdotally, the Responses API seems at least 2x as fast, but we need to measure this, so we will build a benchmark in this ticket.

Here is a high-level plan:

Create a set of test assistants:

- [small-vector-store (2MB), medium-vector-store (10MB), large-vector-store (100MB)]
- [kunji-assistant (100MB)]: a copy of the prod assistant
- [english-vector-store, hindi-vector-store] × [english-queries, hindi-queries]
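The matrix above could be captured as a small config plus a helper that builds the creation parameters for each benchmark assistant. This is only a sketch: the names and sizes come from the list above, the model choice is an assumption, and the `tool_resources`/`file_search` shape follows the Assistants API convention for attaching a vector store.

```python
# Benchmark assistants from the plan above (sizes are the vector-store sizes).
TEST_ASSISTANTS = {
    "small-vector-store": "2MB",
    "medium-vector-store": "10MB",
    "large-vector-store": "100MB",
    "kunji-assistant": "100MB",  # copy of the prod assistant
}

def assistant_params(name: str, vector_store_id: str, model: str = "gpt-4o") -> dict:
    """Kwargs for client.beta.assistants.create(), attaching one vector store
    via the file_search tool. Model name here is an illustrative assumption."""
    return {
        "name": name,
        "model": model,
        "tools": [{"type": "file_search"}],
        "tool_resources": {"file_search": {"vector_store_ids": [vector_store_id]}},
    }
```

Each entry would be created once with `client.beta.assistants.create(**assistant_params(...))` and its ID recorded for the benchmark runs.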

On the test assistants created above, run a list of 50-100 test queries, measure the latency of each call, and compute the mean latency.

We will set this up as a CLI command rather than as part of the test suite, because we don't want to mock out the OpenAI API (we want to make real calls to the service).
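A minimal sketch of that CLI, assuming the `openai` Python SDK with its thread/run helpers; the flag names and query-file format are illustrative, not decided:

```python
import argparse
import statistics
import time

def measure(fn) -> float:
    """Wall-clock seconds for one call."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

def summarize(latencies: list[float]) -> dict:
    """Mean and p95 latency over the recorded samples."""
    return {
        "n": len(latencies),
        "mean_s": statistics.mean(latencies),
        "p95_s": sorted(latencies)[int(0.95 * (len(latencies) - 1))],
    }

def run_benchmark(assistant_id: str, queries: list[str]) -> dict:
    # Real OpenAI calls -- intentionally not mocked.
    from openai import OpenAI
    client = OpenAI()

    def ask(q: str) -> None:
        # One thread per query; block until the run finishes.
        thread = client.beta.threads.create(
            messages=[{"role": "user", "content": q}]
        )
        client.beta.threads.runs.create_and_poll(
            thread_id=thread.id, assistant_id=assistant_id
        )

    return summarize([measure(lambda q=q: ask(q)) for q in queries])

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Assistants API latency benchmark")
    parser.add_argument("--assistant-id", required=True)
    parser.add_argument("--queries-file", required=True, help="one query per line")
    args = parser.parse_args()
    with open(args.queries_file) as f:
        queries = [line.strip() for line in f if line.strip()]
    print(run_benchmark(args.assistant_id, queries))
```

Timing the full create-thread-and-poll cycle keeps the measurement comparable to what a synchronous Responses API call would replace end to end.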

The /threads/sync endpoint is used only for load testing.

CI

We can run the benchmark in GitHub Actions, and eventually we could post the results to a spreadsheet.

Next steps

This prepares for a separate ticket in which we will add a separate set of endpoints for the synchronous Responses API and re-run the benchmark to measure the performance improvements we see.

Status: Closed