Heads up: pathlib.VirtualPath may be coming soon #347

Open
barneygale opened this issue Jul 6, 2023 · 6 comments

@barneygale

barneygale commented Jul 6, 2023

I've logged a CPython PR that adds a private pathlib._VirtualPath class:

python/cpython#106337

If/when that lands, it's not much more work to drop the underscore and introduce pathlib.VirtualPath to the world. This would be Python 3.13 at the earliest.

Would it be suitable for use in cloudpathlib? It would be great to have your feedback.

@barneygale
Author

barneygale commented Dec 27, 2023

Hello - I've published CPython's private PathBase class in a PyPI package: https://pypi.org/project/pathlib-abc/0.1.0/. No docs yet but I should have them ready soon.

If the PyPI package succeeds and matures, I'm hopeful we can make PathBase public in Python itself. Hopefully you can make it work; do let me know if not!

@rpmcginty

If PathBase does get adopted in the standard library in an upcoming release, using it as the base class for CloudPath would make adopting this package all the more compelling.

@barneygale, you don't have to provide this, but would you be able to share another limited implementation example (in addition to TarPath), for something like an S3Path?

@barneygale
Author

barneygale commented Jan 4, 2024

Here's a basic example:

import errno
import io
import boto3
from pathlib_abc import PathBase, UnsupportedOperation


class S3Path(PathBase):
    __slots__ = ('client',)

    def __init__(self, *pathsegments, client=None):
        super().__init__(*pathsegments)
        self.client = client or boto3.client('s3')

    def __repr__(self):
        return f"{type(self).__name__}({str(self)!r}, client={self.client!r})"

    def __hash__(self):
        return hash(str(self))

    def __eq__(self, other):
        if not isinstance(other, S3Path):
            return NotImplemented
        return str(self) == str(other)

    def with_segments(self, *pathsegments):
        return type(self)(*pathsegments, client=self.client)

    @property
    def bucket(self):
        if self.is_absolute() and len(self.parts) > 1:
            return self.parts[1]
        return ''

    @property
    def key(self):
        if self.is_absolute() and len(self.parts) > 1:
            return '/'.join(self.parts[2:])
        return ''

    def open(self, mode='r', buffering=-1, encoding=None, errors=None, newline=None):
        if buffering != -1 or not self.is_absolute():
            raise UnsupportedOperation()
        action = ''.join(c for c in mode if c not in 'btU')
        if action == 'r':
            try:
                fileobj = self.client.get_object(Bucket=self.bucket, Key=self.key)['Body']
            except self.client.exceptions.NoSuchKey:
                raise FileNotFoundError(errno.ENOENT, 'Not found', str(self)) from None
        else:
            raise UnsupportedOperation()
        if 'b' not in mode:
            fileobj = io.TextIOWrapper(fileobj, encoding, errors, newline)
        return fileobj

if __name__ == '__main__':
    path = S3Path('/mybucket/mydir/myfile.txt')
    print(path)
    print(repr(path))
    print(path.bucket)
    print(path.key)
    print(path.read_text())
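
The two pieces of logic in the example that don't need a network connection can be tried offline: splitting an absolute path into bucket and key, and wrapping a binary stream for text reads. This is a minimal sketch under assumptions, not part of pathlib-abc or boto3: `split_bucket_key` and `open_text` are hypothetical helper names, `PurePosixPath` stands in for the path parsing, and a `BytesIO` stands in for the S3 response body.

```python
import io
from pathlib import PurePosixPath


def split_bucket_key(path):
    """Mirror the bucket/key properties above using PurePosixPath parts."""
    parts = PurePosixPath(path).parts
    # An absolute path has '/' as its first part; the bucket is the next one.
    if len(parts) > 1 and parts[0] == '/':
        return parts[1], '/'.join(parts[2:])
    return '', ''


def open_text(raw, encoding='utf-8'):
    """Wrap a binary stream for text reading, like the text-mode branch of open()."""
    return io.TextIOWrapper(raw, encoding=encoding)


bucket, key = split_bucket_key('/mybucket/mydir/myfile.txt')

# BytesIO is a stand-in for the streaming body returned by get_object().
body = io.BytesIO('héllo\n'.encode('utf-8'))
text = open_text(body).read()
```

A path that names only the bucket (`/mybucket`) yields an empty key, and a relative path yields empty strings for both, matching the properties in the example.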

Would it be useful for me to add stat() and iterdir() too?

@jayqi
Member

jayqi commented Jan 4, 2024

Hi @barneygale,

Very cool!

I saw that python/cpython#106337 got merged. Does that mean that the private pathlib._PathBase will definitely be released with Python 3.13 later this year? Does that mean we should treat pathlib-abc as a backport rather than a prototype?

It would be great if you have documentation coming about the benefits of using PathBase.

Is the benefit mainly that we'd be able to reduce the amount of code we have by inheriting methods from PathBase instead?

We also struggle a lot with typechecking: we want to have some kind of common denominator AnyPath that pathlib and cloudpathlib types would subtype. Is this something PathBase might be able to help with?

@barneygale
Author

> Does that mean that the private pathlib._PathBase will definitely be released with Python 3.13 later this year? Does that mean we should treat pathlib-abc as a backport rather than a prototype?

The cut-off date for new features in 3.13 is May this year. It's possible but unlikely that PathBase will be made public in time. I think 3.14 or 3.15 is a more likely timeframe. Things can move slowly in CPython development!

I think you should treat pathlib-abc as a prototype at the moment. It will become a backport once this stuff is public in CPython. I appreciate the PyPI package description is misleading - will fix.

> It would be great if you have documentation coming about the benefits of using PathBase.
>
> Is the benefit mainly that we'd be able to reduce the amount of code we have by inheriting methods from PathBase instead?

Pretty much. You get all the PurePath behaviour, plus a few high-level methods like glob() and read_text() that are implemented in terms of lower-level abstract methods.
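
To illustrate the "high-level methods implemented in terms of lower-level abstract methods" idea, here is a minimal sketch of the pattern. It does not use the real PathBase: `MiniPathBase` and `DictPath` are hypothetical names, and the in-memory `files` dict is an assumed stand-in for a storage backend. The subclass implements only `open()`, and the read helpers come for free from the base class.

```python
import io


class MiniPathBase:
    """Hypothetical stand-in for PathBase; not the real pathlib_abc class."""

    def __init__(self, name):
        self.name = name

    def open(self, mode='r'):
        # Low-level method that each backend must provide.
        raise NotImplementedError

    def read_text(self):
        # High-level method written purely in terms of open().
        with self.open('r') as f:
            return f.read()

    def read_bytes(self):
        with self.open('rb') as f:
            return f.read()


class DictPath(MiniPathBase):
    """Toy backend that serves file contents from an in-memory dict."""

    files = {'greeting.txt': b'hello'}

    def open(self, mode='r'):
        try:
            data = self.files[self.name]
        except KeyError:
            raise FileNotFoundError(self.name) from None
        raw = io.BytesIO(data)
        return raw if 'b' in mode else io.TextIOWrapper(raw, encoding='utf-8')


text = DictPath('greeting.txt').read_text()
data = DictPath('greeting.txt').read_bytes()
```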

> We also struggle a lot with typechecking: we want to have some kind of common denominator AnyPath that pathlib and cloudpathlib types would subtype. Is this something PathBase might be able to help with?

That's the ultimate goal, yeah. Once PathBase is in CPython, you'll be able to write:

def do_thing(path: pathlib.PathBase | str | os.PathLike):
    if not isinstance(path, pathlib.PathBase):
        path = pathlib.Path(path)
    with path.open('r') as f:
        ...

Or if you don't care about accepting strings/arbitrary pathlike objects:

def do_thing(path: pathlib.PathBase):
    with path.open('r') as f:
        ...

This doesn't work with the PyPI package because pathlib.Path isn't a subclass of pathlib_abc.PathBase. Perhaps that's solvable with some temporary __instancecheck__() magic?

@barneygale
Author

Looks like it's fairly simple to make pathlib classes virtual subclasses of the ABCs:

diff --git a/pathlib_abc/__init__.py b/pathlib_abc/__init__.py
index db9ac15..aa5718e 100644
--- a/pathlib_abc/__init__.py
+++ b/pathlib_abc/__init__.py
@@ -1,3 +1,4 @@
+import abc
 import functools
 ntpath = object()
 from . import _posixpath as posixpath
@@ -169,7 +170,7 @@ class _PathParents(Sequence):
         return "<{}.parents>".format(type(self._path).__name__)
 
 
-class PurePathBase:
+class PurePathBase(abc.ABC):
     """Base class for pure path objects.
 
     This class *does not* provide several magic methods that are defined in
@@ -1143,3 +1144,8 @@ class PathBase(PurePathBase):
     def as_uri(self):
         """Return the path as a URI."""
         self._unsupported("as_uri")
+
+
+import pathlib
+PurePathBase.register(pathlib.PurePath)
+PathBase.register(pathlib.Path)
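
The effect of the `register()` calls in the diff can be checked with the standard library alone. This is a minimal sketch, assuming empty stand-in classes: the `PurePathBase` and `PathBase` defined here are hypothetical placeholders, not the real pathlib_abc classes.

```python
import abc
import pathlib


class PurePathBase(abc.ABC):
    """Hypothetical stand-in for pathlib_abc.PurePathBase."""


class PathBase(PurePathBase):
    """Hypothetical stand-in for pathlib_abc.PathBase."""


# Register the stdlib classes as virtual subclasses, as in the diff.
PurePathBase.register(pathlib.PurePath)
PathBase.register(pathlib.Path)

concrete_is_pathbase = isinstance(pathlib.Path('.'), PathBase)
pure_is_purepathbase = isinstance(pathlib.PurePosixPath('a/b'), PurePathBase)
pure_is_pathbase = isinstance(pathlib.PurePosixPath('a/b'), PathBase)
```

Registration affects only runtime `isinstance()` and `issubclass()` checks. The registered classes inherit nothing from the ABC, and static type checkers generally do not treat virtual subclasses as subtypes, so this likely helps with the `isinstance` branch of `do_thing()` more than with annotations.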
