Hi,
The current scikit-llm implementation is synchronous: prompts are sent to the API one by one. This is not ideal for large datasets on a high-tier (high TPM/RPM) account. Would it be possible to add a batched async feature?
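For illustration, here is a minimal sketch of what such a feature might look like, built on the official `openai` v1 async client. The function names (`classify_batch`, `_complete`), the model choice, and the concurrency limit are illustrative assumptions, not part of scikit-llm's API:

```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def _complete(prompt: str, sem: asyncio.Semaphore) -> str:
    # The semaphore caps in-flight requests so we stay under RPM limits.
    async with sem:
        resp = await client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

async def classify_batch(prompts: list[str], max_concurrency: int = 16) -> list[str]:
    # Fire all requests concurrently; asyncio.gather returns results in input order.
    sem = asyncio.Semaphore(max_concurrency)
    return await asyncio.gather(*(_complete(p, sem) for p in prompts))

# Example usage:
# labels = asyncio.run(classify_batch(["prompt 1", "prompt 2"]))
```

With a semaphore-bounded `gather` like this, throughput scales with the account's rate limits instead of per-request latency.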
Reference: oaib (a library for batched async OpenAI API requests)