Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use QueueHandler for Pipeline logging #489

Merged
merged 17 commits into from
Mar 28, 2024
Merged

Conversation

gabrielmbmb
Copy link
Member

@gabrielmbmb gabrielmbmb commented Mar 28, 2024

Description

This PR updates the logging setup for the local Pipeline to use a QueueHandler and a QueueListener (a thread in the main process). With this setup, child processes send log messages using a multiprocessing.Queue and the QueueHandler to the QueueListener, which will be in charge of handling the log messages.

This change was mainly motivated to improve the logging messages in environments like Google Colab and IPython Notebooks, in which not all the logging messages (from the child processes) were being displayed.

image

@gabrielmbmb gabrielmbmb added enhancement New feature or request fix labels Mar 28, 2024
@gabrielmbmb gabrielmbmb added this to the 1.0.0 milestone Mar 28, 2024
@gabrielmbmb gabrielmbmb self-assigned this Mar 28, 2024
@@ -72,8 +70,8 @@ def generate(
per input in `inputs`."""
pass

@cached_property
def generate_parameters(self) -> List[inspect.Parameter]:
@property
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then the from functools import cached_property can be removed too, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes! BTW I removed it because in some environments pickle was not able to serialize it.

@@ -45,6 +45,8 @@ class GenerateEmbeddings(Step):
llm: LLM

def load(self) -> None:
"""Loads the `LLM` used to generate the embeddings."""
super().load()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If all the steps inheriting from Step need to actually call the super().load() isn't it better if we just add it as a separate method that runs within the post_init? i.e. model_post_init on _Step?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly we can't do that :( We need to create the logger after having called setup_logging in each process.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we then just do step.load() and then step.setup_logging? But to manage that within the Pipeline instead of having to always add that as part of the load method :/

@gabrielmbmb gabrielmbmb merged commit 2cd8391 into core-refactor Mar 28, 2024
4 checks passed
@gabrielmbmb gabrielmbmb deleted the better_logging branch March 28, 2024 22:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request fix
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

2 participants