[Question]: How is OpenAI API Call Concurrency Managed? #1095

@zhangxingeng

Description


Do you need to ask a question?

  • I have searched the existing questions and discussions, and this question is not already answered.
  • I believe this is a legitimate question, not just a bug or feature request.

Your Question

Hi there,

First of all, thank you for sharing this amazing project! I’ve been going through the code, and I really appreciate the effort and thought put into it.

I had a question regarding how the project manages concurrent OpenAI API calls. From what I can see, multiple functions (e.g., extract_entities, kg_query, mix_kg_vector_query, etc.) use llm_model_func, which in turn calls openai_complete_if_cache. However, I didn’t notice any explicit mechanism limiting the number of concurrent requests to OpenAI.

Given that OpenAI enforces rate limits, I was wondering:

  1. Is there an existing mechanism to control the number of concurrent API calls that I might have missed?
  2. If not, was this an intentional design choice, or could this potentially lead to exceeding rate limits when multiple requests are sent simultaneously?
  3. Would adding an async semaphore (e.g., asyncio.Semaphore) be a recommended way to limit concurrent calls if needed?

I just wanted to clarify this before running it at scale. I really appreciate any insights you can provide. Thanks again for the great work!

Additional Context

No response

Labels

question: Further information is requested