Move batching from `Task` to `LLM`, fix `vLLM.generate` and add `DISTILABEL_LOG_LEVEL` #371
Description
This PR moves the batching from the `Task` to the `LLM`, so that the `LLM` handles batches in the best way possible rather than via a simple for-loop, since some LLM engines have mechanisms to handle batches more efficiently. The `prepare_input` abstract method has also been removed from `LLM`, as it is no longer needed unless a specific LLM implementation requires it.

This PR also fixes `vLLM.generate` and stops propagating unsolicited `inputs` through the `Pipeline`, so that only the ones solicited via the property
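The batching change above can be sketched as follows. This is an illustrative mock, not distilabel's actual API: the class and method names (`LLM`, `NaiveLLM`, `generate`, `_generate_one`) are assumptions. The point is that the whole batch is handed to the `LLM`, which may use an engine-native batched path (as vLLM does) or fall back to a per-input loop:

```python
from abc import ABC, abstractmethod
from typing import List


class LLM(ABC):
    """Hypothetical base class: receives the whole batch at once,
    instead of the Task calling it once per input."""

    @abstractmethod
    def generate(self, inputs: List[str]) -> List[str]:
        ...


class NaiveLLM(LLM):
    """Fallback implementation: engines without native batching
    can still process the batch with a simple for-loop."""

    def generate(self, inputs: List[str]) -> List[str]:
        return [self._generate_one(prompt) for prompt in inputs]

    def _generate_one(self, prompt: str) -> str:
        # Placeholder for a real single-prompt generation call
        return f"completion for: {prompt}"
```

An engine-backed subclass would instead override `generate` to submit all `inputs` in one call, letting the engine schedule the batch itself.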
are kept.

Besides that, this PR also adds the `DISTILABEL_LOG_LEVEL` environment variable to control the log level of `distilabel`, which defaults to `INFO`.

Example
Find a full example at https://huggingface.co/datasets/alvarobartt/instruction-dataset-mistral-7b-instruct-v0.2/blob/main/example.py
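As a minimal sketch of the new environment variable, the snippet below shows how a library can typically read such a variable and configure logging from it; the variable name and the `INFO` default come from this PR, but the reading logic here is an assumption, not distilabel's actual implementation:

```python
import logging
import os

# Read the log level from the environment, defaulting to INFO
# (the default stated in this PR); the lookup below is illustrative.
level_name = os.environ.get("DISTILABEL_LOG_LEVEL", "INFO")

# logging.getLevelName maps a level name ("DEBUG", "INFO", ...) to its
# numeric value, which basicConfig accepts.
logging.basicConfig(level=logging.getLevelName(level_name))
```

In practice this means the level can be set when launching a script, e.g. `DISTILABEL_LOG_LEVEL=DEBUG python example.py`.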