feat: support for matching file paths against Unix shell style patterns (glob). #1251

Open
RinChanNOWWW opened this issue Jan 29, 2023 · 8 comments

Comments

@RinChanNOWWW

RinChanNOWWW commented Jan 29, 2023

I hope OpenDAL can support glob operations like the glob crate does.

Both blocking and non-blocking methods are needed.

@ClSlaid
Contributor

ClSlaid commented Feb 13, 2023

I'm afraid it is not possible to implement a perfect glob API. By perfect, I mean one where only the objects matching the glob are transmitted between OpenDAL and the underlying storage service.
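For illustration, the closest a backend can get is a prefix pushdown: the part of the pattern before its first wildcard, cut back to a directory boundary, can serve as the listing prefix, so entries outside it are never fetched, while everything under it still has to be matched on the client. This is a std-only sketch; `literal_dir_prefix` is a hypothetical helper, not an OpenDAL API, and it only treats `*`, `?` and `[` as metacharacters.

```rust
/// Hypothetical helper: returns the longest directory prefix of `pattern`
/// that contains no glob metacharacters. A lister could start from this
/// prefix instead of the root; entries under it must still be matched.
fn literal_dir_prefix(pattern: &str) -> &str {
    // Byte position of the first metacharacter, or the end if there is none.
    let first_meta = pattern
        .find(|c: char| matches!(c, '*' | '?' | '['))
        .unwrap_or(pattern.len());
    // Cut back to the last '/' before that position so the prefix names a
    // whole directory ("valid/dir/" rather than "valid/dir/test-").
    match pattern[..first_meta].rfind('/') {
        Some(i) => &pattern[..=i],
        None => "",
    }
}

fn main() {
    for p in ["media/**/*.jpg", "valid/dir/test-*.txt", "*.txt"] {
        // An empty prefix means the whole tree has to be listed.
        println!("{} -> {:?}", p, literal_dir_prefix(p));
    }
}
```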

Given that, and since OpenDAL offers BlockingObjectLister and ObjectLister, I suggest listing with a filter:

use futures::TryStreamExt;
use glob::Pattern;
use opendal::{services::Fs, Operator};

#[tokio::main]
async fn main() {
    let mut op_builder = Fs::default();
    op_builder.root("/tmp/opendal/");
    op_builder.atomic_write_dir("/tmp/opendal/");
    let op = Operator::create(op_builder).expect("must succeed").finish();

    for i in 0..100 {
        let path = format!("valid/dir/test-{}.txt", i);
        op.object(&path).create().await.unwrap();
        let junk = format!("invalid/dir/junk-{}.txt", i);
        op.object(&junk).create().await.unwrap();
    }

    let gm = Pattern::new("valid/dir/test-*.txt").expect("pattern should be valid");

    let mut lister = op.object("/").scan().await.unwrap();

    // cannot use Option::filter in the loop condition:
    // while let Some(obj) = lister.try_next().await.unwrap().filter(|o| gm.matches(o.path()))
    // because the first non-matching object becomes None and ends the loop early
    while let Some(obj) = lister.try_next().await.unwrap() {
        if gm.matches(obj.path()) {
            println!("{} is valid", obj.path());
        }
    }
}
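The pitfall in the commented-out line can be reproduced without OpenDAL. In this sketch a plain slice stands in for the lister (an assumption for illustration only): `Option::filter` maps the first non-matching entry to `None`, and `while let` reads that as the end of the stream. On the stream side, the counterpart of the correct version is `TryStreamExt::try_filter` from the futures crate.

```rust
/// Deliberately buggy: `Option::filter` in the loop condition turns the
/// first non-matching entry into `None`, which ends the loop early.
fn filter_in_condition(entries: &[&str]) -> Vec<String> {
    let mut it = entries.iter().copied();
    let mut out = Vec::new();
    while let Some(p) = it.next().filter(|p| p.starts_with("valid/")) {
        out.push(p.to_string());
    }
    out
}

/// Correct: visit every entry and keep only the matching ones.
fn filter_in_body(entries: &[&str]) -> Vec<String> {
    entries
        .iter()
        .copied()
        .filter(|p| p.starts_with("valid/"))
        .map(String::from)
        .collect()
}

fn main() {
    // Stand-in for a lister: entries arrive in storage order, not match order.
    let entries = ["invalid/junk-0.txt", "valid/test-0.txt", "valid/test-1.txt"];
    println!("{:?}", filter_in_condition(&entries)); // []
    println!("{:?}", filter_in_body(&entries)); // ["valid/test-0.txt", "valid/test-1.txt"]
}
```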

But, really, it's a little too verbose...

@Xuanwo
Member

Xuanwo commented Feb 13, 2023

Maybe we can provide an API like scan_glob().
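One possible shape for such an API, sketched as a blocking adapter: `scan_glob` here is hypothetical and takes any iterator of paths (standing in for a recursive lister) plus a matcher closure. With the glob crate, the matcher could be `|p| pattern.matches(p)`; a non-blocking variant would wrap a stream the same way.

```rust
/// Hypothetical blocking `scan_glob`: wraps a recursive listing (any iterator
/// of paths here) and yields only the entries accepted by `is_match`.
fn scan_glob<'a, I, F>(entries: I, is_match: F) -> impl Iterator<Item = &'a str>
where
    I: IntoIterator<Item = &'a str>,
    F: Fn(&str) -> bool,
{
    // Every entry is still visited; only the matching ones are yielded.
    entries.into_iter().filter(move |p| is_match(*p))
}

fn main() {
    // Stand-in for a recursive listing in storage order.
    let listing = ["invalid/dir/junk-0.txt", "valid/dir/test-0.txt", "valid/dir/test-1.txt"];
    // Stand-in matcher for "valid/dir/test-*.txt"; a real one would use a glob pattern.
    let hits: Vec<&str> =
        scan_glob(listing, |p| p.starts_with("valid/dir/test-") && p.ends_with(".txt")).collect();
    println!("{:?}", hits); // ["valid/dir/test-0.txt", "valid/dir/test-1.txt"]
}
```

The design keeps matching separate from listing, so the same adapter works whatever pattern syntax is chosen.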

@Xuanwo
Member

Xuanwo commented Apr 12, 2023

It would be interesting to provide an op.glob("media/**/*.jpg") API, so that users can write:

let mut it = op.glob("media/**/*.jpg").await?;

while let Some(entry) = it.next().await? {
   do_something(&entry)
}
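To make the intended semantics of `media/**/*.jpg` concrete, here is a small std-only matcher. It is a sketch, not the glob crate's algorithm: paths are split on `/`, `**` matches zero or more whole segments (assumed to mirror the glob crate's documented `**` behavior), `*` and `?` match within a single segment, and character classes such as `[a-z]` are not supported. `glob_match` is a hypothetical name.

```rust
/// Matches one path segment against one pattern segment:
/// `*` matches any run of characters (possibly empty), `?` exactly one.
fn match_segment(pat: &[char], s: &[char]) -> bool {
    match (pat.first().copied(), s.first().copied()) {
        (None, None) => true,
        // Either `*` matches nothing, or it swallows one more character.
        (Some('*'), _) => {
            match_segment(&pat[1..], s) || (!s.is_empty() && match_segment(pat, &s[1..]))
        }
        (Some('?'), Some(_)) => match_segment(&pat[1..], &s[1..]),
        (Some(p), Some(c)) if p == c => match_segment(&pat[1..], &s[1..]),
        _ => false,
    }
}

/// Matches pattern segments against path segments.
fn match_segments(pat: &[&str], path: &[&str]) -> bool {
    match pat.split_first() {
        None => path.is_empty(),
        // `**` matches zero or more whole segments: try every split point.
        Some((&"**", rest)) => (0..=path.len()).any(|i| match_segments(rest, &path[i..])),
        Some((p, rest)) => match path.split_first() {
            Some((s, path_rest)) => {
                let pc: Vec<char> = p.chars().collect();
                let sc: Vec<char> = s.chars().collect();
                match_segment(&pc, &sc) && match_segments(rest, path_rest)
            }
            None => false,
        },
    }
}

/// Hypothetical helper: does `path` match the slash-separated glob `pattern`?
fn glob_match(pattern: &str, path: &str) -> bool {
    let pat: Vec<&str> = pattern.split('/').collect();
    let segs: Vec<&str> = path.split('/').collect();
    match_segments(&pat, &segs)
}

fn main() {
    for path in ["media/a.jpg", "media/2023/01/b.jpg", "media/b.png", "other/c.jpg"] {
        println!("{} -> {}", path, glob_match("media/**/*.jpg", path));
    }
}
```

Even with such a matcher, every listed entry still goes through `glob_match` one by one; the pattern only decides which entries are kept.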

@xyjixyjixyji
Contributor

This seems interesting, I will have a look then xD.

@xyjixyjixyji
Contributor

I cannot think of a better way than wrapping list with a simple filter, just as #1251 (comment) said.

@suyanhanx
Member

You always have to check one by one.😣

@xyjixyjixyji
Contributor

Yeah, the sequential scan is unavoidable... This is a good-to-have feature, though not a primitive one lol. I'm not sure whether OpenDAL should support such higher-level operations.

@Xuanwo
Member

Xuanwo commented Apr 13, 2023

OpenDAL is open to adding features that align with our vision. And yes, it would be a good-to-have feature, so there is no rush to implement it.
