Description
Problem:
Current implementation processes prompts sequentially, making evaluations on large datasets extremely slow. Need concurrent processing with rate limit protection.
Requirements:
- Process prompts in batches with configurable concurrency
- Add semaphore-based rate limiting
- Respect API rate limits
- Provide batch size configuration
- Handle failures gracefully in batch mode
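The semaphore-based rate limiting requested above can be illustrated with a minimal, self-contained sketch. `MAX_CONCURRENT` and `fake_api_call` are hypothetical names for this example, standing in for the real API client:

```python
import asyncio

# Illustrative sketch: a semaphore caps how many coroutines run at once.
MAX_CONCURRENT = 3

async def fake_api_call(i: int) -> str:
    await asyncio.sleep(0.01)  # stand-in for network latency
    return f"response-{i}"

async def main() -> list:
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)

    async def limited(i: int) -> str:
        # At most MAX_CONCURRENT tasks hold the semaphore at any moment;
        # the rest wait here until a slot frees up.
        async with semaphore:
            return await fake_api_call(i)

    return await asyncio.gather(*(limited(i) for i in range(10)))

results = asyncio.run(main())
```

All ten tasks are scheduled at once, but the semaphore ensures only three ever issue a call concurrently, which is the mechanism that keeps the adapter under API rate limits.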
Acceptance Criteria:
- `ModelAdapter` has `batch_generate()` method
- Configurable `max_concurrent_requests` parameter
- Configurable `batch_size` parameter
- Respects rate limits (uses `asyncio.Semaphore`)
- Failed requests don't stop entire batch
- 10x faster on 100+ prompt datasets
- Tests verify concurrent execution
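The last criterion can be checked by measuring peak in-flight tasks. A hypothetical test sketch (names here are illustrative, not from the codebase):

```python
import asyncio

# Test sketch: verify no more than `max_concurrent` coroutines run at once
# by tracking the peak number of tasks inside the semaphore.
async def run_with_limit(n_tasks: int, max_concurrent: int) -> int:
    semaphore = asyncio.Semaphore(max_concurrent)
    active = 0
    peak = 0

    async def task() -> None:
        nonlocal active, peak
        async with semaphore:
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # hold the slot briefly
            active -= 1

    await asyncio.gather(*(task() for _ in range(n_tasks)))
    return peak

peak = asyncio.run(run_with_limit(20, 5))
assert 1 <= peak <= 5
```

A real test would call `batch_generate()` against a mocked `_generate_single` that does the same bookkeeping.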
Implementation Hints:
```python
import asyncio
from typing import List

async def batch_generate(
    self,
    prompts: List[str],
    batch_size: int = 10,
    max_concurrent: int = 5,
    **kwargs,
) -> List[str]:
    # Cap the number of in-flight requests across the whole run.
    semaphore = asyncio.Semaphore(max_concurrent)

    async def generate_with_limit(prompt: str) -> str:
        async with semaphore:
            return await self._generate_single(prompt, **kwargs)

    # Split prompts into fixed-size batches.
    batches = [prompts[i:i + batch_size] for i in range(0, len(prompts), batch_size)]
    all_results = []
    for batch in batches:
        # return_exceptions=True keeps one failed request from aborting the batch.
        batch_results = await asyncio.gather(
            *[generate_with_limit(p) for p in batch],
            return_exceptions=True,
        )
        all_results.extend(batch_results)
    return all_results
```
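Because `return_exceptions=True` places exceptions into the result list instead of raising, callers must filter them out. A self-contained usage sketch (`flaky_generate` is a hypothetical stand-in for the adapter's generate call):

```python
import asyncio

# Stand-in for a generate call that sometimes fails.
async def flaky_generate(prompt: str) -> str:
    if "bad" in prompt:
        raise RuntimeError(f"API error for {prompt!r}")
    return prompt.upper()

async def main() -> list:
    # With return_exceptions=True, gather returns exception objects in place
    # of results rather than cancelling the remaining tasks.
    return await asyncio.gather(
        *(flaky_generate(p) for p in ["ok-1", "bad-2", "ok-3"]),
        return_exceptions=True,
    )

results = asyncio.run(main())
successes = [r for r in results if not isinstance(r, Exception)]
failures = [r for r in results if isinstance(r, Exception)]
```

Here `successes` is `['OK-1', 'OK-3']` and `failures` holds the one `RuntimeError`, so the batch completes despite the failed prompt.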