Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace deprecated smart_open.smart_open function with a thin wrapper #544

Merged
merged 10 commits into from
Oct 8, 2020

Conversation

mpenkov
Copy link
Collaborator

@mpenkov mpenkov commented Oct 1, 2020

We're about to do a major version bump, so it's a good chance to drop deprecated functionality.

@mpenkov mpenkov requested a review from piskvorky October 1, 2020 11:57
@mpenkov mpenkov changed the title Delete deprecated smart_open.samrt_open function Delete deprecated smart_open.smart_open function Oct 1, 2020
@piskvorky
Copy link
Owner

piskvorky commented Oct 1, 2020

Scary. Any harm keeping smart_open.smart_open as an alias of smart_open.open?

This will break a lot of code… including our own in gensim-data.

@mpenkov
Copy link
Collaborator Author

mpenkov commented Oct 1, 2020

Scary. Any harm keeping smart_open.smart_open as an alias of smart_open.open?

Yeah, the harm is that they're not strictly compatible - the former passes parameters as *args, the latter wraps them in a dict.

This will break a lot of code… including our own in gensim-data.

We could either pin the version, or stop using the deprecated function. It's been deprecated for years now.

@piskvorky
Copy link
Owner

piskvorky commented Oct 1, 2020

Yeah, we'll have to re-release the datasets. They're meant to be immutable, so can't (=shouldn't) edit.

@piskvorky
Copy link
Owner

piskvorky commented Oct 1, 2020

It's been deprecated for years now.

Less than a year (Nov 2019).

The smart_open README looks broken / unfinished; can you have a look?

Version 2.0 will introduce a backwards incompatible installation method with regards to the cloud dependencies. A migration path to minimize breaking was introduced in version x.x.x. If you want to maintain backwards compatibility (installing all dependencies) install this package via smart_open[all] now and once the change is made you should not have any issues. If all you care about is AWS dependencies for example you can install via smart_open[s3] and once the dependency change is made you will simply drop the unwanted dependencies. You can read more about the motivations here

@mpenkov
Copy link
Collaborator Author

mpenkov commented Oct 1, 2020

A migration path to minimize breaking was introduced in version x.x.x.

Let's take a moment to truly appreciate this prophetically ironic sentence...

@krzemienski
Copy link

krzemienski commented Oct 1, 2020

A migration path to minimize breaking was introduced in version x.x.x.

Let's take a moment to truly appreciate this prophetically ironic sentence...

Or maybe another moment to wonder just how smart smart_open is? Glad I’ve learned to watch repos I use in production code!

@mpenkov
Copy link
Collaborator Author

mpenkov commented Oct 2, 2020

Yeah, we'll have to re-release the datasets. They're meant to be immutable, so can't (=shouldn't) edit.

If we're gonna open up that can of worms, then it may be worth thinking about how we handle gensim-data. As @gojomo and @menshikh-iv pointed out, the current approach has several shortcomings:

  1. Code accompanying each dataset is not in version control. It's an __init__.py file attached to each release.
  2. There is no dependency control. If the dataset depends on a package that isn't already installed, or touches a part of gensim that no longer exists, it isn't possible to install it.
  3. There is a security risk because gensim downloads a Python script from a network location and then executes it.

While most of the above is for us to work out as part of the gensim effort (e.g. piskvorky/gensim#2283), the relevance to smart_open is as follows:

  1. We cannot easily locate datasets that use the old deprecated smart_open.smart_open function. We have to go through each release and eyeball the __init__.py files. This is a bad sign.
  2. We cannot pin the smart_open version for each dataset.

@piskvorky WDYT about the above points? How shall we move forward, both with this PR and in general?

@gojomo
Copy link
Contributor

gojomo commented Oct 2, 2020

One way to still-soften the removal would be to leave a stub that errors with a message pointing to help (or if possible, the exact recommended line-of-code with which to replace old usages). Compared to just a cannot import name 'smart_open' error on import, or a has no attribute 'smart_open' error on a smart_open.smart_open() usage, it will save those hitting it some Googling, and could also mention "function was deprecated in x.x.x [ha] and is now removed".

For gensim-data, I'd recommend it become its own stub PyPI package, with versioned releases, & any code that ever executes/downloads other large out-of-package data in source control. Adding a new named dataset would require a new release, which users would pip install --upgrade gensim-data so their download-stubs know the new dataset name/remotes.

@piskvorky
Copy link
Owner

piskvorky commented Oct 2, 2020

Gensim-data is low priority for me right now (and has been for a long time). I'd probably design it differently (and not call it api – what API?), but I'd still keep the immutability of a resource release/version (of both code and data), that's vital. The "code" part can be handled by standard packaging.

This came to my attention only because I saw deprecations coming from a dataset that relies on smart_open.open, which will trigger a hard fail after this PR. Other than that I have no bandwidth for gensim-data atm, sorry.

@piskvorky WDYT about the above points? How shall we move forward, both with this PR and in general?

Fix README + keep smart_open.smart_open. I understand it's incompatible (params in args vs dict) so an alias is risky, so probably an error stub like @gojomo says.

'for more information.' % sorted(kwargs)
)
del kwargs
return open(**locals())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the actual end goal is true deprecation/removal of the function itself - not just some sets of supported parameters – perhaps OK to fail in this case, too. (Or at least: show some deprecation warning before proceeding?) However, if it's true that in this case, the caller can just use smart_open.open() with exactly the same params instead, the message could assure them that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't fail in this case because the gensim-data stuff will break.

I agree with the other suggestion.

Copy link
Owner

@piskvorky piskvorky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clever solution! I like it.

How about the incomplete README?

smart_open/smart_open_lib.py Outdated Show resolved Hide resolved
@mpenkov mpenkov changed the title Delete deprecated smart_open.smart_open function Replace deprecated smart_open.smart_open function with a thin wrapper Oct 4, 2020
@mpenkov mpenkov merged commit d03bdeb into develop Oct 8, 2020
@mpenkov mpenkov deleted the rm_smart_open branch October 8, 2020 08:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants