Move batching from `Task` to `LLM`, fix `vLLM.generate` and add `DISTILABEL_LOG_LEVEL` #371
Description
This PR moves the batching from the `Task` to the `LLM`, so that the `LLM` handles batches in the best way possible rather than via a simple for-loop, since some LLM engines have mechanisms to handle batches more efficiently. The `prepare_input` abstract method has also been removed from `LLM`, as it is no longer needed unless a specific LLM implementation requires it.

This PR also fixes `vLLM.generate` and stops propagating unsolicited `inputs` through the `Pipeline`, so that only the ones solicited via the property
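The batching change above can be sketched as follows. This is an illustrative mock, not distilabel's actual API: the class and method names (`LLM`, `NaiveLLM`, `generate`, `_generate_one`) are assumptions. The point is that the whole batch is handed to the `LLM`, which may use an engine-native batched path (as vLLM does) or fall back to a per-input loop:

```python
from abc import ABC, abstractmethod
from typing import List


class LLM(ABC):
    """Hypothetical base class: receives the whole batch at once,
    instead of the Task calling it once per input."""

    @abstractmethod
    def generate(self, inputs: List[str]) -> List[str]:
        ...


class NaiveLLM(LLM):
    """Fallback implementation: engines without native batching
    can still process the batch with a simple for-loop."""

    def generate(self, inputs: List[str]) -> List[str]:
        return [self._generate_one(prompt) for prompt in inputs]

    def _generate_one(self, prompt: str) -> str:
        # Placeholder for a real single-prompt generation call
        return f"completion for: {prompt}"
```

An engine-backed subclass would instead override `generate` to submit all `inputs` in one call, letting the engine schedule the batch itself.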
are kept.

Besides that, this PR also adds the `DISTILABEL_LOG_LEVEL` environment variable to control the log level of `distilabel`, which defaults to `INFO`.

Example
Find a full example at https://huggingface.co/datasets/alvarobartt/instruction-dataset-mistral-7b-instruct-v0.2/blob/main/example.py
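As a minimal sketch of the new environment variable, the snippet below shows how a library can typically read such a variable and configure logging from it; the variable name and the `INFO` default come from this PR, but the reading logic here is an assumption, not distilabel's actual implementation:

```python
import logging
import os

# Read the log level from the environment, defaulting to INFO
# (the default stated in this PR); the lookup below is illustrative.
level_name = os.environ.get("DISTILABEL_LOG_LEVEL", "INFO")

# logging.getLevelName maps a level name ("DEBUG", "INFO", ...) to its
# numeric value, which basicConfig accepts.
logging.basicConfig(level=logging.getLevelName(level_name))
```

In practice this means the level can be set when launching a script, e.g. `DISTILABEL_LOG_LEVEL=DEBUG python example.py`.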