Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move optional dependencies to extras #454

Merged
merged 26 commits into from
Apr 7, 2020

Conversation

Amertz08
Copy link
Contributor

@Amertz08 Amertz08 commented Apr 1, 2020

IMPORTANT: this PR requires a migration path

We should think about how and when we will expose users to this change.

Motivation

Checklist

Before you create the PR, please make sure you have:

  • Picked a concise, informative and complete title
  • Clearly explained the motivation behind the PR
  • Linked to any existing issues that your PR will be solving
  • Included tests for any new functionality
  • Checked that all unit tests pass

Workflow

Please avoid rebasing and force-pushing to the branch of the PR once a review is in progress.
Rebasing can make your commits look a bit cleaner, but it also makes life more difficult from the reviewer, because they are no longer able to distinguish between code that has already been reviewed, and unreviewed code.

setup.py Outdated Show resolved Hide resolved
@mpenkov mpenkov changed the title Implement extras requires migration path Implement extras Apr 2, 2020
setup.py Show resolved Hide resolved
README.rst Outdated Show resolved Hide resolved
setup.py Outdated Show resolved Hide resolved
@mpenkov mpenkov changed the title Implement extras Move optional dependencies to extras Apr 6, 2020
Copy link
Collaborator

@mpenkov mpenkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work @Amertz08 !!

@Tyrannas
Copy link

Tyrannas commented Apr 6, 2020

I'm not sure about this but I think the imports in the code should now be in try except ? Like if someone installs the smart_open[gcs] the smart_open.s3 will throw an import error

@Amertz08
Copy link
Contributor Author

Amertz08 commented Apr 6, 2020

Test coverage keeps failing in 3.8. @Tyrannas technically not yet because this doesn't not install the AWS deps when you do smart_open[gsc] as they are still in the base requirements.

@Amertz08
Copy link
Contributor Author

Amertz08 commented Apr 6, 2020

We can still add it just so it's not forgotten though.

@Tyrannas
Copy link

Tyrannas commented Apr 6, 2020

We can still add it just so it's not forgotten though.

I was thinking something like:

try:
    import boto3
except ImportError:
    logging.getLogger('smart_open').info("Boto3 dependency is not installed, smart_open s3 will therefore not be working.")

Using info instead of warning to avoid regular loggers to be spammed since logging level is by default warning.

What do you think? Something like that

@mpenkov
Copy link
Collaborator

mpenkov commented Apr 6, 2020

@Tyrannas I think we already do this:

https://github.com/RaRe-Technologies/smart_open/blob/29092ba823ba61028598dbe94324b750f19244a1/smart_open/transport.py#L43

The next step is to get the backends to automatically tell the users which packages they need, e.g. "to enable S3 support, run pip install smart_open[s3]". Not sure what the best way to do that is, but it's probably out of scope for this PR.

@Tyrannas
Copy link

Tyrannas commented Apr 6, 2020

@Tyrannas I think we already do this:

https://github.com/RaRe-Technologies/smart_open/blob/29092ba823ba61028598dbe94324b750f19244a1/smart_open/transport.py#L43

The next step is to get the backends to automatically tell the users which packages they need, e.g. "to enable S3 support, run pip install smart_open[s3]". Not sure what the best way to do that is, but it's probably out of scope for this PR.

@mpenkov yeah but for example you are importing .s3 in smart_open __init__ file and .s3 imports boto3, so this will lead to an Import Error

So it's not totally out of scope since it will prevent smart_open to crash

@Amertz08
Copy link
Contributor Author

Amertz08 commented Apr 6, 2020

We can just use deprecation warnings. It's built into python.
https://docs.python.org/3.6/library/warnings.html

smart_open/s3.py Outdated
import botocore.client
import botocore.exceptions
except ImportError:
sys.stderr.write("Install via smart_open[aws] or smart_open[all] to use this module")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this instead of a logger.info() ? Otherwise the message will be displayed all the time for every library that is not installed

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or a least the s3 import needs to be taken out from init.py otherwise the actual version will sys.exit(1) without boto3 installed

Copy link
Contributor Author

@Amertz08 Amertz08 Apr 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logger will only print if the root logger is configured which is no guarantee.

As far as that import path in __init__.py goes it would seem that it would need to be removed. Once those underlying dependencies are dropped from being installed I believe the import will cause issues. If sys.exit is dropped and the shortcut path maintained it would cause the import error to be suppressed here and thrown elsewhere. Whereas by maintaining the sys.exit the program exits exactly where the message to fix the issue is printed vs exiting with an import error elsewhere that doesn't immediately tell you what the issue is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would probably want a deprecation warning for the import path in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was unaware of that import path so thank you for pointing that out.

Copy link
Contributor Author

@Amertz08 Amertz08 Apr 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah so didn't see those imports in transport.py. sys.exit will cause an issue every time. So yeah this is a bigger issue than just the short import path in __init__.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe move register_transport('smart_open.s3') to after the try/except block in each of the cloud modules?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ moving the register after try/except did not work so I moved it back. Also I modified the try/except block to reraise the ImportError as the register_transport would never see it w/o it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have two separate problems.

  1. We import s3.iter_bucket in init.py. This will explode if boto3 is not installed. Currently, we always install boto3 (for backwards compatibility) so it's not an acute problem.
  2. When we try to register all the transports, we log a warning for each submodule we couldn't import. This will be annoying for people who intentionally didn't install dependencies for a particular submodule.

I think the solution for (1) is simple. When we stop installing boto3 by default, we can try-catch around the culprit import. We can also remove the import altogether, and direct people to use smart_open.s3.iter_bucket via a warning.

I think the solution for (2) is to group the submodules that failed to register together, and then log a single warning like:

The following submodules could not be imported due to missing dependencies: abs, s3, gcs. To use those submodules, you must install their dependencies.  See <URL> for more info.

Once people get used to the new extra handling (a few versions later), we can reduce the verbosity of the log message.

@Amertz08 I'd like to include your PR in a soon-to-be bugfix release of smart_open (we need to do one because the Google Cloud deps are messing things up for some people). Can you please roll it back to the state it was in when I approved it? I will then merge. We can continue our discussion and work in a later PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Please roll back all commits after 0d86f9f. Don't force push, it will axe our discussions).

@Amertz08
Copy link
Contributor Author

Amertz08 commented Apr 6, 2020

Yeah so I'm rethinking how to do the deprecation warnings. I was unaware of the register_transport function attempting to import all of the modules. The way it is currently written will be really spammy in stdout. I think a fix might be to check the scheme in get_transport and if it's in one of the modules we throw the DeprecationWarning there. I believe this would limit the warnings to the user who are actually using those modules.

setup.py Outdated Show resolved Hide resolved
setup.py Outdated Show resolved Hide resolved
@mpenkov
Copy link
Collaborator

mpenkov commented Apr 7, 2020

Merging. Thank you for the hard work @Amertz08 .

I've made additional issues #463 and #464 to continue the discussion @Tyrannas .

@mpenkov mpenkov merged commit 085ab22 into piskvorky:master Apr 7, 2020
@Amertz08 Amertz08 deleted the extra-requires branch April 7, 2020 19:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Don't package all cloud dependencies at once
4 participants