Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify logs management #1696

Merged
merged 17 commits into from
Nov 11, 2021
Merged

Simplify logs management #1696

merged 17 commits into from
Nov 11, 2021

Conversation

ZanSara
Copy link
Contributor

@ZanSara ZanSara commented Nov 4, 2021

Related to #1683 and #1538

With the Pipeline's debug functionality, the behavior of the root logger was modified in a way that affected some Haystack submodules' loggers. This PR should restore their original behavior.

Note: as you can see, the PR introduces a little block of code in each of the submodules. The code is identical, so a small function could be made out of it. However, I think hiding this configuration in a function might make less obvious how the loggers work for that specific submodule, which might cause subtle problems later. Opinions welcome 🙂


UPDATE: following a deeper look at the current status, we decided to simplify the whole logger's situation along the lines of #1714, and I'm updating this PR in that direction.

Checklist:

  • Drop the log capture feature from the Pipeline's debug function
  • Remove all handlers from all loggers
  • Set the root logger to WARNING to keep most of the underlying libraries quiet (Elasticsearch, FAISS, etc)
  • Set the haystack logger to INFO to see enough logs during the execution of tutorials and scripts
  • Remove custom log levels from any Haystack module
  • Normalize out usage of log levels across different modules, pushing some logs back to DEBUG and a few others up to INFO
  • Format all logs coming from haystack in a standard way

@ZanSara
Copy link
Contributor Author

ZanSara commented Nov 8, 2021

I dug a bit more in the loggers situation. I was hoping to find a simple solution, but right now what I proposed seems to be the simplest possible.

I have a couple of ideas for improving the current status:

  1. The logs capture feature of the pipeline's debug mode was probably overkill: we might consider removing it with its related magic (even though I enjoyed a lot implementing it 😅)
  2. We could review the log levels as such:
    • Leave the root logger on WARNING, to avoid too much noise from transformers, faiss or elasticsearch
    • Set the main haystack logger to INFO so that we can understand what's going on
    • Remove all special setups from the loggers of the submodules
    • Review the logs coming from the modeling part. A good amount of those should be just moved back to DEBUG.

With both the above points, the entire logs configuration code could be summarized in two lines in haystack/__init__.py:

logging.basicConfig(format="%(levelname)s - %(name)s -  %(message)s", datefmt="%m/%d/%Y %H:%M:%S", level=logging.WARNING)
logging.getLogger("haystack").setLevel(logging.INFO)

which I actually like a lot more than what I've created so far 😅

@ZanSara ZanSara changed the title Fix loggers for selected Haystack modules Simplify logs management Nov 9, 2021
@ZanSara ZanSara marked this pull request as draft November 9, 2021 14:32
@ZanSara
Copy link
Contributor Author

ZanSara commented Nov 9, 2021

Note for the review: I suggest cloning the branch and run a couple of tutorials to check if the output is too verbose (It is definitely way more verbose than it was before).

@ZanSara ZanSara marked this pull request as ready for review November 9, 2021 15:56
Copy link
Contributor

@bogdankostic bogdankostic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good to me! I ran a few tutorials and noticed that when we load a Reader model, we print device information twice, so I fixed that.

Copy link
Member

@tholor tholor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested with Tutorial 1 and the logs are back to the old, verbose state. Probably even a bit more verbose now.

As discussed, let's clean up a bit and adjust the log level of a few messages. Made some suggestions here. It should be quick to do this first iteration based on output of tutorial 1 now and then we can iterate on smaller log messages that catch out attention in the next weeks.

@brandenchan I was wondering if we need to update any information in the documentation about activating debug in pipe.run() as we dropped the debug_logs param here and are not returning the logs anymore. However, I couldn't find it in the docs anywhere. My best guess was this section but this is rather about adding debug information for custom nodes. Do we have this already documented anywhere else?

@ZanSara
Copy link
Contributor Author

ZanSara commented Nov 10, 2021

I've been running a few more tutorials and noticed some other logs that might change level:

  • WARNING - haystack.document_stores.base - Duplicate Documents: Document with id '7ed12f389f7f085bb30c7d00abd26f81' already exists in index 'document' -> Goes to INFO?
  • WARNING - haystack.modeling.data_handler.processor - Currently no support in Processor for returning problematic ids. This is thrown unconditionally in
    logger.warning(f"Currently no support in Processor for returning problematic ids")
    --> INFO?

@tholor
Copy link
Member

tholor commented Nov 11, 2021

Yeah, first one can go to info. Second to Debug even, I'd say

@tholor tholor added the type:refactor Not necessarily visible to the users label Nov 11, 2021
@ZanSara ZanSara merged commit 42c8edc into master Nov 11, 2021
@ZanSara ZanSara deleted the single_module_logger_levels branch November 11, 2021 09:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:refactor Not necessarily visible to the users
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Suppress useless warning from transformers about not loaded model weights
3 participants