Use `QueueHandler` for `Pipeline` logging #489

gabrielmbmb · 2024-03-28T13:56:00Z

Description

This PR updates the logging setup for the local `Pipeline` to use a `QueueHandler` and a `QueueListener` (a thread in the main process). With this setup, each child process sends its log messages through a `QueueHandler` backed by a `multiprocessing.Queue` to the `QueueListener`, which is in charge of handling them.

This change was mainly motivated by the need to improve log output in environments like Google Colab and IPython notebooks, where not all log messages from the child processes were being displayed.
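The setup described above can be sketched with the standard library alone. This is a minimal illustration, not the code from this PR: the helper names `start_listener`, `setup_child_logging`, `worker`, and `run_demo` are hypothetical, and the log format is an arbitrary choice.

```python
import logging
import logging.handlers
import multiprocessing as mp


def start_listener(log_queue, *handlers):
    """Start a QueueListener (a thread) in the main process.

    The listener pulls records off the queue and passes them to `handlers`.
    """
    listener = logging.handlers.QueueListener(
        log_queue, *handlers, respect_handler_level=True
    )
    listener.start()
    return listener


def setup_child_logging(log_queue, level=logging.INFO):
    """Send every record emitted in the current process to the shared queue."""
    root = logging.getLogger()
    root.handlers.clear()  # drop any handlers inherited from the parent
    root.addHandler(logging.handlers.QueueHandler(log_queue))
    root.setLevel(level)


def worker(log_queue, name):
    """Hypothetical child-process entry point."""
    setup_child_logging(log_queue)
    logging.getLogger(f"step.{name}").info("loaded in child process")


def run_demo():
    """Spawn two child processes whose logs are printed by the main process."""
    log_queue = mp.Queue()
    console = logging.StreamHandler()
    console.setFormatter(
        logging.Formatter("[%(processName)s] %(name)s: %(message)s")
    )
    listener = start_listener(log_queue, console)
    procs = [mp.Process(target=worker, args=(log_queue, n)) for n in ("a", "b")]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
    listener.stop()  # processes the remaining records, then joins the thread
```

Calling `run_demo()` under an `if __name__ == "__main__":` guard keeps the sketch compatible with the `spawn` and `forkserver` start methods. Because only the main process writes to the console, output handling stays in one place regardless of how the notebook or terminal treats child-process streams.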

alvarobartt · 2024-03-28T15:42:53Z

src/distilabel/llm/base.py

@@ -72,8 +70,8 @@ def generate(
        per input in `inputs`."""
        pass

-    @cached_property
-    def generate_parameters(self) -> List[inspect.Parameter]:
+    @property


Then the `from functools import cached_property` import can be removed too, right?

Yes! BTW, I removed it because in some environments `pickle` was not able to serialize it.
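For illustration, a plain `property` version could look like the sketch below. The diff above is truncated, so the method body and the `generate` signature here are assumptions rather than the code from this PR; the class is a hypothetical stand-in for the real `LLM`.

```python
import inspect
from typing import List


class LLM:
    """Hypothetical stand-in for the real `LLM` base class."""

    def generate(
        self, inputs: List[str], num_generations: int = 1, temperature: float = 0.7
    ) -> List[List[str]]:
        # Placeholder body: repeat each input `num_generations` times.
        return [[text] * num_generations for text in inputs]

    @property
    def generate_parameters(self) -> List[inspect.Parameter]:
        # Recomputed on every access, so no cached value is stored
        # on the instance and no caching descriptor lives on the class.
        return list(inspect.signature(self.generate).parameters.values())
```

The trade-off is that the signature is inspected on each access instead of once, which is cheap compared with a generation call.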

alvarobartt · 2024-03-28T15:44:21Z

src/distilabel/steps/task/generate_embeddings.py

@@ -45,6 +45,8 @@ class GenerateEmbeddings(Step):
    llm: LLM

    def load(self) -> None:
+        """Loads the `LLM` used to generate the embeddings."""
+        super().load()


If all the steps inheriting from `Step` actually need to call `super().load()`, isn't it better to add it as a separate method that runs within the post-init, i.e. `model_post_init` on `_Step`?

Sadly we can't do that :( We need to create the logger after `setup_logging` has been called in each process.

Can we then just call `step.load()` and then `step.setup_logging`, but manage that within the `Pipeline` instead of always having to add it as part of the `load` method? :/
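One way to read the suggestion in this thread is sketched below: the pipeline owns the per-process entry point, configures logging first, creates the step's logger, and only then calls `load()`, so subclasses do not have to call `super().load()`. Everything here is hypothetical (`run_step_in_child`, the `Step` stand-in, the `_logger` attribute as used, and the logger name) and is not the approach merged in this PR.

```python
import logging
from typing import Callable, Optional


class Step:
    """Hypothetical stand-in for the step base class."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._logger: Optional[logging.Logger] = None

    def load(self) -> None:
        """Subclasses override this without calling super().load()."""


class EmbeddingsStep(Step):
    def load(self) -> None:
        # The logger already exists because the pipeline created it first.
        self._logger.info("loading resources for %s", self.name)
        self.loaded = True


def run_step_in_child(step: Step, setup_logging: Callable[[], None]) -> Step:
    """Hypothetical per-process entry point owned by the pipeline."""
    setup_logging()  # 1. configure logging for this process
    step._logger = logging.getLogger(f"pipeline.step.{step.name}")  # 2. create the logger
    step.load()  # 3. load the step's resources
    return step
```

The ordering constraint mentioned above (logger created after `setup_logging` runs in each process) is preserved, since both happen inside the function executed in the child process.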

gabrielmbmb and others added 12 commits March 27, 2024 20:13

Use QueueHandler to send all logs to main process

72fd69c

Remove multiprocess dep

68607b7

Fix multiprocess logging

cd92dd5

multiprocess again?

f45122c

Fix setting up listener

d7144c2

Use multiprocessing again

7dee9ee

Remove get_logger function

6e7838c

Set multiprocessing start method to forkserver

2e48886

Fix cannot pickle cache_property

ea4acf6

Fix _logger not set

5e027c2

Improve visibility on step loading

319430d

Add missing log level from env variable

7766986

gabrielmbmb added the enhancement and fix labels Mar 28, 2024

gabrielmbmb added this to the 1.0.0 milestone Mar 28, 2024

gabrielmbmb requested a review from alvarobartt March 28, 2024 13:56

gabrielmbmb self-assigned this Mar 28, 2024

Set httpx logging level to critical

eaad2e4

alvarobartt approved these changes Mar 28, 2024


gabrielmbmb added 4 commits March 28, 2024 22:22

Merge branch 'core-refactor' into better_logging

f93cae6

Fix unit tests

dc783b5

Create basic in logger in BasePipeline

e58dcb4

Ignore error

044a13b

gabrielmbmb merged commit 2cd8391 into core-refactor Mar 28, 2024
4 checks passed

gabrielmbmb deleted the better_logging branch March 28, 2024 22:21
