pjbull changed the title from "Library is broken for S3 buckets?" to "Globbing top-level bucket returns malformed CloudPaths" on Jan 5, 2023
Thanks for the report @ssoj13. There are two separate issues here, both of which affect only .glob and no other methods.
Issue 1 - Globbing across buckets
Globbing across buckets is not currently implemented, and likely will not be since it would need to be specially handled.
CloudPath("s3://").glob("*") # this throws an error
To get all of the buckets that a user can see, you can use iterdir:
CloudPath("s3://").iterdir() # this lists buckets
In this case, we should at least raise a user-friendly error that indicates that globbing across buckets is not supported.
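A minimal sketch of such a guard, assuming a hypothetical helper (`check_glob_target` and its message are illustrative, not cloudpathlib's actual code):

```python
def check_glob_target(cloud_path: str) -> None:
    """Raise a friendly error when globbing at the scheme root.

    Hypothetical sketch: cloudpathlib's real check would live inside
    CloudPath.glob; this only illustrates the proposed user-friendly error.
    """
    # A path like "s3://" has no bucket component after the scheme.
    _scheme, _, rest = cloud_path.partition("://")
    if not rest.strip("/"):
        raise NotImplementedError(
            "Globbing across buckets is not supported; "
            "use iterdir() to list the buckets you can see."
        )

# Globbing inside a bucket passes the check silently:
check_glob_target("s3://bucket1")
```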
Issue 2 - Globbing at bucket-level results in malformed paths
Your second issue, CloudPath("s3://bucket").glob("*"), is a bug that looks like it was introduced by #304. You can try version 0.11.0 to see whether it reproduces; it likely will not, though that version is substantially slower. As a workaround for now, you can use iterdir at the top level, which is equivalent to glob("*"). Note that .glob works as expected within folders, e.g. CloudPath("s3://bucket/folder").glob("*").
The fix here should be to properly form paths at the top level so this doesn't happen.
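One way that top-level join could be made robust (a sketch under assumptions, not the actual patch; `join_cloud` is a hypothetical helper) is to strip any leading slashes from the child segment before appending it to the scheme prefix:

```python
def join_cloud(prefix: str, child: str) -> str:
    # Hypothetical helper: joining "s3://" with "/bucket1" naively via "/"
    # yields "s3:////bucket1"; normalizing the slashes first avoids that.
    prefix = prefix if prefix.endswith("/") else prefix + "/"
    return prefix + child.lstrip("/")

print(join_cloud("s3://", "/bucket1"))       # s3://bucket1
print(join_cloud("s3://bucket1", "folder"))  # s3://bucket1/folder
```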
As I understand it, simple code like this is supposed to work just fine, but it doesn't:
So in the bucket S3Paths I end up with a malformed URL like "s3:////bucket1":
The error happens here: https://github.com/drivendataorg/cloudpathlib/blob/master/cloudpathlib/cloudpath.py#L398
It happens when s3:// gets joined with /bucket1 via slash in https://github.com/drivendataorg/cloudpathlib/blob/master/cloudpathlib/client.py#L64
The next problem is that these "bucket" entries don't actually have a bucket attribute set, which causes confusion internally, so a subsequent
bucket.glob('*')
causes havoc: it somehow pulls the 2nd bucket into the 1st one:
raise ValueError("{!r} is not in the subpath of {!r}"
ValueError: '/bucket2' is not in the subpath of '/bucket1' OR one path is relative and the other is absolute.
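That quoted error is pathlib's own: when the bucket attribute is lost, the code ends up asking whether one bucket's root is relative to another's, which pathlib rejects. A standalone reproduction:

```python
from pathlib import PurePosixPath

# Asking for '/bucket2' relative to '/bucket1' raises the same ValueError
# shown in the traceback (the exact wording varies by Python version).
try:
    PurePosixPath("/bucket2").relative_to("/bucket1")
except ValueError as exc:
    print(type(exc).__name__)  # ValueError
```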
Moreover, using the library like this:
files = list(AnyPath('s3://bucket1').glob('*'))
produces:
S3Path('s3://bucket1/bucket1/root_folder')
which is obviously incorrect, with the bucket name appearing twice in the path (and a subsequent .glob('*') failing as well).
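The doubled bucket name is what you would get if the bucket URL were joined with a key that already includes the bucket, e.g. if a key relative to the bucket ("root_folder") had the bucket name prepended again somewhere. This is a guess at the mechanism, shown with plain strings:

```python
bucket_url = "s3://bucket1"

# S3 object keys are relative to the bucket ("root_folder"), but if the
# code prepends the bucket name to the key a second time...
bad_key = "bucket1/root_folder"
print(f"{bucket_url}/{bad_key}")  # s3://bucket1/bucket1/root_folder
```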
Is it me doing something horribly wrong, or is S3 support broken right now?