Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pathlib monkeypatch with replacement of pathlib.Path.open #436

Merged
merged 10 commits into from Mar 21, 2020

Conversation

menshikh-iv
Copy link
Contributor

@menshikh-iv menshikh-iv commented Mar 21, 2020

Problem

If you use pathlib.Path instead of str in your code, you should use the internal open method like

with mypath.open("r") as infile:
    ...

of course, you can pass them to smart_open like this

with smart_open.open(mypath, 'r') as infile:
    ...

but this doesn't looks "good enough"

Solution

I implement "mokeypatch" function that replaces pathlib.Path.open to smart_open.open

Example

from pathlib import Path
from smart_open.smart_open_lib import patch_pathlib

patch_pathlib()

pth = Path("/dev/random")

with pth.open("rb") as infile:
    print(pth.read(4))

@mpenkov mpenkov merged commit 90d4f72 into piskvorky:master Mar 21, 2020
@mpenkov
Copy link
Collaborator

mpenkov commented Mar 21, 2020

Looks good to me! Merged.

Thank you for your contribution @menshikh-iv !!

@menshikh-iv menshikh-iv deleted the mokeypath4pathlib branch March 21, 2020 13:41
Drop-in replacement of ``pathlib.Path.open``
--------------------------------------------

Now you can natively use ``smart_open.open`` with your ``Path`` objects.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is "now" referring to? @mpenkov let's clean this up a bit. How about:

``smart_open.open`` can also be used with ``Path`` objects. The built-in `Path.open()` is not able to read text from compressed files, so use ``patch_pathlib`` to replace it with `smart_open.open()` instead:

.. code-block:: python

  >>> from pathlib import Path
  >>> from smart_open.smart_open_lib import patch_pathlib
  >>> 
  >>> patch_pathlib()  # replace `Path.open` with `smart_open.open`.
  >>>
  >>> path = Path("smart_open/tests/test_data/crime-and-punishment.txt.gz")
  >>>
  >>> with path.open("r") as infile:
  ...     print(infile.readline()[:41])
   
  В начале июля, в чрезвычайно жаркое время

  >>> # You can also use the patch as a context manager, to automatically restore the original ``Path.open()`` at the end:
  >>> with patch_pathlib():
  ...     with Path("smart_open/tests/test_data/crime-and-punishment.txt.gz").open("r") as infile:
  ...         print(infile.readline()[:41])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK with first part, but definitely agains of context-manager usage in this case, because:

your variant

with patch_pathlib():
    with Path("smart_open/tests/test_data/crime-and-punishment.txt.gz").open("r") as infile:
        print(infile.readline()[:41])

how I'll do that

with smart_open("smart_open/tests/test_data/crime-and-punishment.txt.gz") as infile:
    print(infile.readline()[:41])

shorter and simpler (and nothing will change If you replace str to Path here)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true.

pathlib = sys.modules.get("pathlib", None)

if not pathlib:
raise RuntimeError("Can't patch 'pathlib.Path.open', you should import 'pathlib' first")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not import it ourselves?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't do that ourselves, this is bad tone.

BTW - this doesn't help to the user, because the user will not have Path class available even if we import that ourselves

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow – imported modules are singletons, it doesn't matter who imported it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's true, but user has no access to pathlib itself (without digging into sys.modules), like

In [1]: def f(): 
   ...:     import pathlib 
   ...:                                                                         

In [2]: pathlib                                                                 
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-2-7bc182bbe5df> in <module>
----> 1 pathlib

NameError: name 'pathlib' is not defined

I expect that user UNDERSTAND what they do, for this reason, I expect than user firstly import pathlib and after apply monkeypathing.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't get it. What's the point of failing with "import pathlib first!", when we can import it ourselves?

Copy link
Owner

@piskvorky piskvorky Mar 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

…plus pathlib seems to be already pre-imported internally:

>>> import sys
>>> sys.modules.get("pathlib", None)
<module 'pathlib' from '/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pathlib.py'>

So when can this if condition actually happen? Is it some Python version compatibility thing?
If so, needs a better comment, and a different exception message.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my environment:

(smart_open) sergeyich:smart_open misha$ python
Python 3.7.6 (default, Dec 30 2019, 19:38:28)
[Clang 11.0.0 (clang-1100.0.33.16)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.modules.get('pathlib', 'not there')
'not there'

However, smart_open already imports pathlib by itself: https://github.com/RaRe-Technologies/smart_open/blob/master/smart_open/smart_open_lib.py#L51

So, given that we're already doing it, it's probably simpler to patch it directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok @mpenkov, if we do that anyway, I agree.

smart_open/tests/test_smart_open.py Show resolved Hide resolved
lines = infile.readlines()

_patch_pathlib(obj.old_impl)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test for the context manager? That should be the preferred way of restoring the original, instead of calling _patch_pathlib(obj.old_impl).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added context manager for testing purposes, I see no cases for "real usage" of context manager in this case

@piskvorky
Copy link
Owner

piskvorky commented Mar 21, 2020

Sorry for the late review, I see this was already merged. Needs a little language + API clean up.

@piskvorky
Copy link
Owner

piskvorky commented Mar 21, 2020

@mpenkov the main smart_open badge shows a red "build failing" after this merge – can you have a look? Thanks.

@mpenkov mpenkov mentioned this pull request Mar 22, 2020
@mpenkov
Copy link
Collaborator

mpenkov commented Mar 22, 2020

@menshikh-iv @piskvorky Please see #437, I've incorporated the post-merge comments into there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants