
Add parallel processing for OpenAI completion models #1460

Open
pbevan1 wants to merge 4 commits into main

Conversation

@pbevan1 (Contributor) commented Feb 22, 2024

Implements https://github.com/openai/openai-cookbook/blob/main/examples/api_request_parallel_processor.py to speed up OpenAI API calls, as requested in #1410. The processor (sketched below the list):

  • Makes requests concurrently, to maximize throughput
  • Throttles request and token usage, to stay under rate limits (max_tokens_per_minute, max_requests_per_minute)
  • Retries failed requests up to {max_attempts} times, to avoid missing data
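
Roughly, the pattern looks like this (a minimal sketch of the cookbook approach, not the code in this PR; the default limits, token estimate, and function names below are illustrative):

```python
# Minimal sketch of the cookbook-style throttle/retry loop (illustrative, not the PR's code).
import asyncio
import time

from openai import AsyncOpenAI

client = AsyncOpenAI()

# Illustrative defaults; the PR discovers the real limits at runtime and lets gen_kwargs override them.
MAX_REQUESTS_PER_MINUTE = 3_500
MAX_TOKENS_PER_MINUTE = 90_000
MAX_ATTEMPTS = 5


async def _call_with_retries(request: dict):
    """Retry a single chat-completion call with exponential backoff, up to MAX_ATTEMPTS times."""
    for attempt in range(MAX_ATTEMPTS):
        try:
            return await client.chat.completions.create(**request)
        except Exception:
            if attempt == MAX_ATTEMPTS - 1:
                raise
            await asyncio.sleep(2 ** attempt)


async def process_requests(requests: list[dict], tokens_per_request: int = 500) -> list:
    """Launch requests concurrently, throttling to stay under the RPM/TPM budgets."""
    request_budget, token_budget = MAX_REQUESTS_PER_MINUTE, MAX_TOKENS_PER_MINUTE
    last_refill = time.monotonic()
    tasks = []
    for request in requests:
        # Wait (letting in-flight tasks run) until both budgets have refilled enough for one more call.
        while request_budget < 1 or token_budget < tokens_per_request:
            await asyncio.sleep(0.1)
            now = time.monotonic()
            request_budget = min(request_budget + MAX_REQUESTS_PER_MINUTE * (now - last_refill) / 60,
                                 MAX_REQUESTS_PER_MINUTE)
            token_budget = min(token_budget + MAX_TOKENS_PER_MINUTE * (now - last_refill) / 60,
                               MAX_TOKENS_PER_MINUTE)
            last_refill = now
        request_budget -= 1
        token_budget -= tokens_per_request
        tasks.append(asyncio.create_task(_call_with_retries(request)))
    return await asyncio.gather(*tasks)
```

Each entry of `requests` would be the kwargs for one chat-completion call (model, messages, max_tokens, ...).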

Gives a many-times speedup for OpenAI models (dependent on the user's rate limits).

Uses a 1-token dummy request to get a user- and model-specific Tokens Per Minute (TPM) rate limit. Requests Per Minute (RPM) are not available programmatically for some reason, but I've raised this with OpenAI. Both RPM and TPM can be overridden in the gen_kwargs using max_tokens_per_minute and max_requests_per_minute. Not sure if/where that should be documented?
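
The general shape of that dummy request is something like this (sketch only; x-ratelimit-limit-tokens is OpenAI's documented rate-limit response header, and the model name is a placeholder):

```python
# Sketch: a 1-token dummy request exposes the account/model TPM limit via the response headers.
from openai import OpenAI

client = OpenAI()

raw = client.chat.completions.with_raw_response.create(
    model="gpt-3.5-turbo",  # placeholder; the harness would use the model under evaluation
    messages=[{"role": "user", "content": "hi"}],
    max_tokens=1,
)
max_tokens_per_minute = int(raw.headers["x-ratelimit-limit-tokens"])
```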

I've only implemented this for OpenAI chat completions, since the completions endpoint is now legacy, but it would be easy to do the same for completions. I've also separated the local model call from the OpenAI model call, as I'm not sure people would want async/parallel requests for a local model?

Also, following the OpenAI example, I cache requests to a JSONL file. I write it to a temp location and clean up afterwards, but please let me know if it would be better to keep everything in memory (I'm not certain of the size of all of the evaluations).
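
Roughly, the temp-file handling looks like this (sketch; the naming and exact cleanup in the diff may differ):

```python
# Sketch of the temporary JSONL request cache with cleanup (illustrative, not the exact diff).
import json
import os
import tempfile


def run_with_request_cache(requests: list[dict]):
    """Write the requests to a temporary JSONL file, process it, then always clean up."""
    fd, path = tempfile.mkstemp(suffix=".jsonl")
    try:
        with os.fdopen(fd, "w") as f:
            for request in requests:
                f.write(json.dumps(request) + "\n")  # one request object per line
        # ... hand `path` to the parallel processor and collect the results ...
    finally:
        os.remove(path)  # remove the cache file even if processing fails
```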

@CLAassistant commented Feb 22, 2024

CLA assistant check
All committers have signed the CLA.

@haileyschoelkopf (Contributor)

Thanks very much for this PR! I will try to review it as soon as I can.

@sanchit-ahuja commented Mar 3, 2024

Hi @pbevan1,
This PR looks quite good to me. I've left some comments that you can easily address, plus a few general thoughts below. @haileyschoelkopf, please feel free to chip in.

  1. Can you add some unit tests for your core logic?
  2. Move the constants to a constants file and import them from there (I've added comments about this).
  3. Your logic functions are big. Is it possible to break them down into smaller functions?

I cache in temp and do clean up after, but please let me know if it would be better just doing it in memory

AFAIK, lm-eval-harness does not support any parallelization. Is it possible for you to do a stress test or an evaluation with both approaches and post the numbers here? It would be a great exercise and will help us make an informed decision about how to support this kind of operation for other LM models as well. (Again, @haileyschoelkopf, please feel free to correct me here, I might be wrong.)
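
For rough numbers, something like the following could be run once on main and once on this branch, comparing the wall-clock times (a sketch; the simple_evaluate usage and the model/task/limit choices are just placeholders):

```python
# Sketch of a quick wall-clock comparison; run on main and on this branch and compare the prints.
import time

import lm_eval

start = time.perf_counter()
results = lm_eval.simple_evaluate(
    model="openai-chat-completions",
    model_args="model=gpt-3.5-turbo",  # placeholder model
    tasks=["gsm8k"],                   # placeholder task
    limit=100,                         # small sample to keep the comparison cheap
)
print(f"wall clock: {time.perf_counter() - start:.1f}s")
```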

I believe adding tests should be a priority here.

@LSinev (Contributor) commented Jun 13, 2024

Any plans to resolve the conflicts, add the requested changes/updates, and merge before the new release?

@Peter-Devine
Bump - this would be a really useful feature that would massively speed up eval time
