glob: difference between CloudPath and Paths #154

@remi-braun

Hello,

I have a file tree replicated between my S3-compatible storage and a local filesystem.
However, there is a discrepancy when I use the glob function.
The filesystem glob works fine, but I have to add "**/*/" at the beginning of the pattern for CloudPath's glob to return anything.

# Composition of the bucket (sorry for the long names)
# NB: there are duplicates with and without /, may be related to #148
>> list(cloud_path.glob("**/*"))
[S3Path('s3://my_bucket/MODULES/PROCESS/CLIPPING/'), 
S3Path('s3://my_bucket/MODULES/PROCESS/CLIPPING/DAX/'), 
S3Path('s3://my_bucket/MODULES/PROCESS/CLIPPING/DAX'), 
S3Path('s3://my_bucket/MODULES/PROCESS/CLIPPING/DAX/BY_PRODUCTS/'), 
S3Path('s3://my_bucket/MODULES/PROCESS/CLIPPING/DAX/BY_PRODUCTS'), 
S3Path('s3://my_bucket/MODULES/PROCESS/CLIPPING/DAX/BY_PRODUCTS/20191215T110441_S2_T30TXP_L2A_122756/'), 
S3Path('s3://my_bucket/MODULES/PROCESS/CLIPPING/DAX/BY_PRODUCTS/20191215T110441_S2_T30TXP_L2A_122756'), 
S3Path('s3://my_bucket/MODULES/PROCESS/CLIPPING/DAX/BY_PRODUCTS/20191215T110441_S2_T30TXP_L2A_122756/14_20191215T110441_S2_T30TXP_L2A_122756_CLIP.tif')]

# Patterns that fail to retrieve the file 14_20191215T110441_S2_T30TXP_L2A_122756_CLIP.tif
>> list(cloud_path.glob("DAX/BY_PRODUCTS/*/14_20191215T110441_S2_T30TXP_L2A_122756_CLIP.tif"))
[]
>> list(cloud_path.glob("**/DAX/BY_PRODUCTS/*/14_20191215T110441_S2_T30TXP_L2A_122756_CLIP.tif"))
[]
>> list(cloud_path.glob("*DAX/BY_PRODUCTS/*/14_20191215T110441_S2_T30TXP_L2A_122756_CLIP.tif"))
[]

# Correct way
>> list(cloud_path.glob("**/*DAX/BY_PRODUCTS/*/14_20191215T110441_S2_T30TXP_L2A_122756_CLIP.tif"))
[S3Path('s3://my_bucket/MODULES/PROCESS/CLIPPING/DAX/BY_PRODUCTS/20191215T110441_S2_T30TXP_L2A_122756/14_20191215T110441_S2_T30TXP_L2A_122756_CLIP.tif')]

Note: I am still on my Ceph-compatible storage (see #148); I hope this is not related 😓
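
For comparison, the local-filesystem side can be reproduced with pathlib alone. This is a minimal sketch, not the exact setup from the report: it assumes the base path corresponds to the CLIPPING folder (the issue does not say what cloud_path points to), it rebuilds the layout in a temporary directory, and build_local_tree and local_matches are hypothetical helper names. With pathlib, each of the four patterns tried above should find the file, which is the behaviour one would expect CloudPath.glob to mirror.

```python
import tempfile
from pathlib import Path

FILE_NAME = "14_20191215T110441_S2_T30TXP_L2A_122756_CLIP.tif"
PRODUCT = "20191215T110441_S2_T30TXP_L2A_122756"


def build_local_tree(root: Path) -> Path:
    """Recreate the layout from the bucket listing and return the base directory.

    Assumption: the base corresponds to .../MODULES/PROCESS/CLIPPING.
    """
    base = root / "MODULES" / "PROCESS" / "CLIPPING"
    target_dir = base / "DAX" / "BY_PRODUCTS" / PRODUCT
    target_dir.mkdir(parents=True)
    (target_dir / FILE_NAME).touch()
    return base


def local_matches(base: Path, pattern: str) -> list[str]:
    """Return the glob matches as POSIX-style paths relative to base."""
    return sorted(p.relative_to(base).as_posix() for p in base.glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    base = build_local_tree(Path(tmp))
    # The same four patterns that were tried against the bucket.
    patterns = [
        f"DAX/BY_PRODUCTS/*/{FILE_NAME}",
        f"**/DAX/BY_PRODUCTS/*/{FILE_NAME}",
        f"*DAX/BY_PRODUCTS/*/{FILE_NAME}",
        f"**/*DAX/BY_PRODUCTS/*/{FILE_NAME}",
    ]
    results = {pattern: local_matches(base, pattern) for pattern in patterns}
    for pattern, found in results.items():
        print(pattern, "->", found)
```

Running the same patterns against the cloud path and comparing the two result sets should make the discrepancy easy to see, independently of the duplicate entries mentioned in #148.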
