
[agents] compute-embeddings: Implement batch and async execution #337

Merged · 7 commits into main · Sep 5, 2023
Conversation

eolivelli (Member) commented Sep 4, 2023

Sending batches of requests to the API reduces the number of calls and mitigates the risk of hitting rate-limit errors.

Summary:

  • the compute-embeddings agent now sends API requests in batches and is fully async
  • unfortunately the OpenAI Java client is not async, so only Vertex AI and Hugging Face REST are truly async
  • introduced a new parameter batch-size in the compute-embeddings agent, with a default value of 10
  • introduced a new parameter flush-interval in the compute-embeddings agent, with a default of 0 (ms)
  • the API call is executed on reaching batch-size pending records or after flush-interval ms of waiting (see the sketch after this list)
  • the feature is disabled by default because it could add extra latency to chatbot pipelines
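
To illustrate the trigger logic, here is a minimal sketch of how such a batcher can work. This is not the actual agent code: the `EmbeddingsBatcher` class and the `EmbeddingsService` interface below are hypothetical stand-ins, assuming an async client that accepts a list of texts per call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical interface; a real embeddings client would sit behind this.
interface EmbeddingsService {
    CompletableFuture<List<List<Double>>> computeEmbeddings(List<String> texts);
}

/**
 * Accumulates texts and flushes them in a single API call when batchSize
 * pending records are reached or when flushIntervalMs elapses, whichever
 * comes first. With flushIntervalMs == 0 batching is effectively disabled
 * and every record is flushed immediately (the backward-compatible default).
 */
class EmbeddingsBatcher {
    private final EmbeddingsService service;
    private final int batchSize;
    private final long flushIntervalMs;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    private List<String> pending = new ArrayList<>();
    private List<CompletableFuture<List<Double>>> pendingResults = new ArrayList<>();
    private ScheduledFuture<?> flushTimer;

    EmbeddingsBatcher(EmbeddingsService service, int batchSize, long flushIntervalMs) {
        this.service = service;
        this.batchSize = batchSize;
        this.flushIntervalMs = flushIntervalMs;
    }

    synchronized CompletableFuture<List<Double>> add(String text) {
        CompletableFuture<List<Double>> result = new CompletableFuture<>();
        pending.add(text);
        pendingResults.add(result);
        if (flushIntervalMs <= 0 || pending.size() >= batchSize) {
            flush(); // batching disabled or batch full: flush right away
        } else if (flushTimer == null) {
            // first record of a new batch: arm the flush timer
            flushTimer = scheduler.schedule(this::flush, flushIntervalMs, TimeUnit.MILLISECONDS);
        }
        return result;
    }

    private synchronized void flush() {
        if (flushTimer != null) {
            flushTimer.cancel(false);
            flushTimer = null;
        }
        if (pending.isEmpty()) {
            return;
        }
        List<String> batch = pending;
        List<CompletableFuture<List<Double>>> results = pendingResults;
        pending = new ArrayList<>();
        pendingResults = new ArrayList<>();
        // One API call for the whole batch; complete each record's future.
        service.computeEmbeddings(batch).whenComplete((vectors, error) -> {
            for (int i = 0; i < results.size(); i++) {
                if (error != null) {
                    results.get(i).completeExceptionally(error);
                } else {
                    results.get(i).complete(vectors.get(i));
                }
            }
        });
    }
}
```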

Interesting things to document:

  • you should enable batching (batch-size > 0 and flush-interval > 0) on pipelines that are not latency-sensitive (such as background text processing)
  • on chatbot pipelines it is better to keep the defaults (flush-interval at 0) in order to reduce latency

Example of a pipeline with batching enabled:
```yaml
pipeline:
  - name: "compute-embeddings"
    id: "step1"
    type: "compute-ai-embeddings"
    output: "chunks-topic"
    configuration:
      model: "text-embedding-ada-002" # This needs to match the name of the model deployment, not the base model
      embeddings-field: "value.embeddings_vector"
      text: "{{% value.text }}"
      batch-size: 10
      flush-interval: 1000
```
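
With this configuration the agent issues one API call per batch of up to 10 records, and any partially filled batch is flushed after 1000 ms, so a record waits at most one second before its embedding is computed.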

Backward compatibility
This PR is 100% backward compatible because flush-interval defaults to 0, so the agent keeps computing a single embedding at a time.

@eolivelli eolivelli closed this Sep 4, 2023
@eolivelli eolivelli reopened this Sep 4, 2023
@eolivelli eolivelli merged commit d2bb1ea into main Sep 5, 2023
8 checks passed
@eolivelli eolivelli deleted the impl/batch-embeddings branch September 5, 2023 06:40