[Question]: How is OpenAI API Call Concurrency Managed? #1095

@zhangxingeng

Description


Do you need to ask a question?

  • I have searched the existing questions and discussions, and this question is not already answered.
  • I believe this is a legitimate question, not just a bug or feature request.

Your Question

Hi there,

First of all, thank you for sharing this amazing project! I’ve been going through the code, and I really appreciate the effort and thought put into it.

I had a question regarding how the project manages concurrent OpenAI API calls. From what I can see, multiple functions (e.g., extract_entities, kg_query, mix_kg_vector_query, etc.) use llm_model_func, which in turn calls openai_complete_if_cache. However, I didn’t notice any explicit mechanism limiting the number of concurrent requests to OpenAI.

Given that OpenAI enforces rate limits, I was wondering:

  1. Is there an existing mechanism to control the number of concurrent API calls that I might have missed?
  2. If not, was this an intentional design choice, or could this potentially lead to exceeding rate limits when multiple requests are sent simultaneously?
  3. Would adding an async semaphore (e.g., asyncio.Semaphore) be a recommended way to limit concurrent calls if needed?

I just wanted to clarify this before running it at scale. I really appreciate any insights you can provide. Thanks again for the great work!

Additional Context

No response

Labels

question: Further information is requested