Prevent smart_open from writing to logs on import #476

Merged
merged 6 commits into from Apr 25, 2020

mpenkov
Collaborator

@mpenkov mpenkov commented Apr 16, 2020

Motivation

Checklist

Before you create the PR, please make sure you have:

  • Picked a concise, informative and complete title
  • Clearly explained the motivation behind the PR
  • Linked to any existing issues that your PR will be solving
  • Included tests for any new functionality
  • Checked that all unit tests pass

Owner

@piskvorky piskvorky left a comment

Not the fix I expected :)

I thought the issue was the global (import-time) logging calls, not the relative order of imports.
If this works it definitely deserves a code comment!

@piskvorky
Owner

piskvorky commented Apr 16, 2020

I did some tests to wrap my head around what's happening. Recording my thoughts here:

Before this PR:

  1. app imports lib
  2. lib calls logger.info on import
  3. logging prints the raw message to stderr
  4. app configures logging with logging.basicConfig()
  5. app calls logger.info
  6. message is correctly formatted to log

After this PR:

  1. app imports lib
  2. lib adds NullHandler
  3. lib calls logger.info on import
  4. logging ignores the message
  5. app configures logging with logging.basicConfig()
  6. app calls logger.info
  7. message is correctly formatted to log

I was surprised by "Before 6)": I thought "Before 4)" would be ignored because "Before 3)" already configured the logging system, but that's not the case. "Before 4)" still takes effect.

So #475 is less severe than I thought. The application's logging was not messed up – users just saw some cryptic messages in their stderr on smart_open import – incl. indirect import via other libraries.
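
To make the two sequences above concrete, here is a minimal sketch. It is not smart_open code: the logger names and the helper `emit_on_import` are hypothetical. It uses `logger.warning` rather than `logger.info` because, on Python 3, the handler of last resort only emits WARNING and above. Propagation is switched off purely to keep the demo independent of any handlers already attached to the root logger.

```python
import contextlib
import io
import logging


def emit_on_import(logger_name, add_null_handler):
    """Simulate a library logging a warning at import time.

    Returns whatever reached stderr before the app configured logging.
    """
    lib_logger = logging.getLogger(logger_name)
    # Demo-only: ignore handlers that the surrounding environment may have
    # put on the root logger, so only this logger's own handlers count.
    lib_logger.propagate = False
    if add_null_handler:
        lib_logger.addHandler(logging.NullHandler())

    captured = io.StringIO()
    with contextlib.redirect_stderr(captured):
        lib_logger.warning("unable to import submodule")
    return captured.getvalue()


# No handler anywhere: logging falls back to logging.lastResort,
# which writes the bare message to stderr.
before = emit_on_import("hypothetical_lib_before", add_null_handler=False)

# A NullHandler counts as a handler, so the fallback never fires.
after = emit_on_import("hypothetical_lib_after", add_null_handler=True)

print(repr(before))  # 'unable to import submodule\n'
print(repr(after))   # ''
```

The fallback handler is never attached to the root logger, which is why a later `logging.basicConfig()` in the application still takes effect.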

@piskvorky
Owner

Btw, with this PR, users will not see these "unable to import 'smart_open.gcs', disabling that module." messages at all. Is that what we want? And if it is, why are they there in the first place?

The global (import-time) logging is weird. Perhaps this should be a warning, or not import-time?

@mpenkov
Collaborator Author

mpenkov commented Apr 24, 2020

@radim @menshikh-iv OK, I tried a different approach. Instead of logging the import error, we handle it silently and raise an exception when the user tries to use the module whose import failed. For example, here's what they'll see if they try to open a GCS URL without the google-cloud-core library installed:

$ python -c 'import smart_open;smart_open.open("gs://foo/bar")'
Traceback (most recent call last):
  File "/home/misha/git/smart_open/smart_open/transport.py", line 74, in get_transport
    submodule = _REGISTRY[scheme]
KeyError: 'gs'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/misha/git/smart_open/smart_open/smart_open_lib.py", line 224, in open
    binary = _open_binary_stream(uri, binary_mode, transport_params)
  File "/home/misha/git/smart_open/smart_open/smart_open_lib.py", line 398, in _open_binary_stream
    submodule = transport.get_transport(scheme)
  File "/home/misha/git/smart_open/smart_open/transport.py", line 76, in get_transport
    raise NotImplementedError(message)
NotImplementedError: Unable to handle scheme 'gs', expected one of ('', 'file', 'hdfs', 'http', 'https', 's3', 's3a', 's3n', 's3u', 'scp', 'sftp', 'ssh', 'webhdfs'). Extra dependencies required by 'gs' may be missing. See <https://github.com/RaRe-Technologies/smart_open/blob/master/README.rst> for details.

Please have a look and let me know what you think. If this is acceptable, then we can roll back the previous logging-related commits.

@radim

radim commented Apr 24, 2020

Wrong Radim 😉

@mpenkov
Collaborator Author

mpenkov commented Apr 24, 2020

I meant @piskvorky, whoops

@mpenkov mpenkov changed the title from "Configure logging handlers before submodule imports" to "Prevent smart_open from writing to logs on import" Apr 25, 2020
@mpenkov mpenkov merged commit dd6f7b5 into piskvorky:master Apr 25, 2020
@mpenkov mpenkov deleted the fix-logging branch April 25, 2020 05:55
__version__ = version.__version__

logger = logging.getLogger(__name__)
Owner

Why this revert? Isn't it safer to add the handler early on, to avoid issues like this in the future?

Collaborator Author

I reverted it because it was no longer a necessary part of the solution.

If you think it's worth keeping for future-proofing the existing solution, then I can add it back in.

Owner

Not critical, but it makes sense to add the handler as soon as possible. Who knows what the contributed modules will do in the future (a missed logging call on import…).

Development

Successfully merging this pull request may close these issues.

Importing gensim prints stuff and configures logging Unable to load smart_open.gcs
4 participants