make it easy to index opendal with db #3977

Open
prabirshrestha opened this issue Jan 13, 2024 · 2 comments
Comments

@prabirshrestha

I'm building a Nextcloud alternative using opendal in Rust and am now at a phase where I need to index the fs backend into a db so I can implement a rich search experience.

Currently I have basic indexing working, but it is a naive implementation without any performance optimizations. Here is the pseudocode.

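use futures::TryStreamExt; // provides try_next() on the lister

let version = guid::new(); // pseudocode: a fresh id for this indexing run
let mut ds = op.lister_with("/").recursive(true).await?;
while let Some(de) = ds.try_next().await? {
    // add de to the db, tagged with version, along with path, parent, filename, mtime and size
}
// delete every db entry whose version is not equal to `version`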

While this works, it could be slow. I'm wondering if there are better ways to make it faster.

  1. Parallel indexer. Is there something like jwalk or godirwalk that can be used to walk in parallel?
  2. Instead of deleting at the end, I was thinking of deleting as I go along when it is a re-index rather than the first index. For example, list /, diff it against the db entries for /, delete any db entries missing from the listing, and so on. I see that there is a FlatLister, but I'm wondering if there could be some other lister that would help me here.
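
To make item 2 concrete, here is a minimal sketch of the per-directory diff, using only the standard library. It assumes the fresh listing of one directory and the db rows with that `parent` are both available as sets of names. `DirDiff` and `diff_dir` are hypothetical helpers for illustration, not opendal APIs.

```rust
use std::collections::BTreeSet;

/// Result of comparing one directory's fresh listing with its indexed rows.
#[derive(Debug, PartialEq)]
pub struct DirDiff {
    /// Present in the db but no longer in the backend: delete these rows.
    pub stale: Vec<String>,
    /// Present in the backend but not yet in the db: insert these rows.
    pub added: Vec<String>,
}

/// Compare the names listed for a single directory against the names the db
/// already holds for that directory. Both outputs come back in sorted order
/// because `BTreeSet::difference` iterates in ascending order.
pub fn diff_dir(listed: &BTreeSet<String>, indexed: &BTreeSet<String>) -> DirDiff {
    DirDiff {
        stale: indexed.difference(listed).cloned().collect(),
        added: listed.difference(indexed).cloned().collect(),
    }
}
```

In practice `listed` would presumably come from a non-recursive list of the directory and `indexed` from a query on the `parent` column. When a stale entry is itself a directory, every row under that path prefix would need deleting as well.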

After the initial indexing, I don't expect the reindexing to run again: since I'm building the webdav server, I can do deletes/inserts/updates on the fly to the db. But I'm curious how others would manage indexing.

@Xuanwo
Member

Xuanwo commented Jan 15, 2024

Interesting idea! I'm also considering listing concurrently. Do you mind if the results are returned out of order?
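
As a rough illustration of what concurrent listing with out-of-order results could look like, here is a minimal sketch using only the standard library. It is not opendal's implementation. The `list_dir` callback is a hypothetical stand-in for a non-recursive list call, and the sketch assumes directory paths end with `/`.

```rust
use std::collections::VecDeque;
use std::sync::{Condvar, Mutex};
use std::thread;

/// Shared state for the worker pool.
struct State {
    queue: VecDeque<String>, // directories still waiting to be listed
    in_flight: usize,        // directories currently being listed
    found: Vec<String>,      // every entry seen so far, in no particular order
}

/// Walk the tree under `root` with `workers` threads. `list_dir` returns the
/// direct children of one directory; children ending in '/' are treated as
/// directories and queued for listing. The result order is not deterministic.
pub fn walk_parallel<F>(root: &str, workers: usize, list_dir: F) -> Vec<String>
where
    F: Fn(&str) -> Vec<String> + Sync,
{
    let state = Mutex::new(State {
        queue: VecDeque::from([root.to_string()]),
        in_flight: 0,
        found: Vec::new(),
    });
    let wake = Condvar::new();

    thread::scope(|s| {
        for _ in 0..workers.max(1) {
            s.spawn(|| loop {
                // Take the next directory, or exit once nothing is queued
                // and no other worker can still produce more work.
                let dir = {
                    let mut st = state.lock().unwrap();
                    loop {
                        if let Some(d) = st.queue.pop_front() {
                            st.in_flight += 1;
                            break d;
                        }
                        if st.in_flight == 0 {
                            return;
                        }
                        st = wake.wait(st).unwrap();
                    }
                };

                // The slow I/O happens outside the lock.
                let children = list_dir(&dir);

                let mut st = state.lock().unwrap();
                for child in children {
                    if child.ends_with('/') {
                        st.queue.push_back(child.clone());
                    }
                    st.found.push(child);
                }
                st.in_flight -= 1;
                drop(st);
                wake.notify_all();
            });
        }
    });

    state.into_inner().unwrap().found
}
```

Because entries arrive in whatever order the workers finish, a consumer that needs ordering would have to sort afterwards, for example in the db query.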

@prabirshrestha
Author

My indexed table contains path, parent, name and a few other metadata fields such as size, mime_type and mtime. Since I don't have any sort of hierarchy or relation in the table, the order doesn't matter for me. Currently my backend webdav server always sorts alphabetically by name, and I plan to add sorting by one of these columns: name, size or mtime.
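
A minimal in-memory sketch of that row shape and the planned sort options, assuming the column names above. The field types, the `SortBy` enum and `sort_entries` are illustrative assumptions, and with a real db the ordering would more likely be done in the query itself.

```rust
use std::cmp::Ordering;

/// One indexed row. Column names follow the table described above;
/// the Rust types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct IndexedEntry {
    pub path: String,
    pub parent: String,
    pub name: String,
    pub size: u64,
    pub mime_type: String,
    pub mtime: i64, // assumed to be seconds since the Unix epoch
}

/// Columns the listing can be sorted by.
#[derive(Debug, Clone, Copy)]
pub enum SortBy {
    Name,
    Size,
    Mtime,
}

/// Sort ascending by the chosen column, breaking ties by name so the
/// result is stable regardless of the order in which rows were indexed.
pub fn sort_entries(entries: &mut [IndexedEntry], by: SortBy) {
    entries.sort_by(|a, b| {
        let primary = match by {
            SortBy::Name => Ordering::Equal,
            SortBy::Size => a.size.cmp(&b.size),
            SortBy::Mtime => a.mtime.cmp(&b.mtime),
        };
        primary.then_with(|| a.name.cmp(&b.name))
    });
}
```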

I have a concept of storage where one can attach arbitrary backends (Storage Type) from opendal. I plan to add some sort of UX in the settings and the files page indicating that indexing is in progress, so you may not see everything.

Once the initial indexing has completed, I expect the user to use my webdav server, so I can make sure the db stays in sync without reindexing.

With rust+opendal, I feel my app running on a Synology is 5x faster than Synology Photos when just navigating with indexing disabled, so I'm not worried about other perf yet (and this is without any optimization on my side). I do have a lot of photos, and I expect a lot of first-time users to enable indexing, so being able to recursively traverse very quickly is very important to me, as this would be one of the first impressions for a new user.

I have been very impressed with how fast https://github.com/SmilyOrg/photofield indexes, so looking at how it does that would be a good start.
