This issue was moved to a discussion.

Improve file storage structure to improve delete performance in large file sets #12248

Closed
3 tasks done
emrahnazif opened this issue Mar 19, 2022 · 4 comments

Comments

@emrahnazif

Preflight Checklist

Describe the Bug

I have a "collection A" with a "multiple files" field in it. Sometimes I delete items from "collection A", so their files are no longer needed.
I regularly delete the files that are no longer necessary.
With around 1.8 million files (all stored locally), deleting a single file takes up to 10 seconds (through both the API and the App).
Meanwhile, creating new items and uploading files works smoothly (I don't know whether there is a performance drop there as well; I am running Directus on a good machine).

During normal operation, the CPU level is perfectly fine. However, while I am deleting files, the CPU hits 100%, which immediately slows down all other operations.

I am deleting files one by one, so I don't know how "Delete Multiple Files" would behave.

I am running v9.7.0 on a dedicated Mac mini M1 machine.

To Reproduce

Create a collection with a "multiple files" field in it.
Add many items, with around 1-2 million files in total.
Try to delete a file.

Errors Shown

No errors.

What version of Directus are you using?

v9.7.0

What version of Node.js are you using?

v16.14.0

What database are you using?

MySQL 8

What browser are you using?

Chrome, or through the API

What operating system are you using?

macOS

How are you deploying Directus?

running locally

@emrahnazif
Author

I assume this may be related to a "find and delete" approach being used instead of deleting by absolute path.
Image files can have several transformations; I believe that is the reason for the "find and delete".

Do you have any suggestions?

@rijkvanzanten
Member

I assume this may be related to a "find and delete" approach being used instead of deleting by absolute path.
Image files can have several transformations; I believe that is the reason for the "find and delete".

That's exactly what it is. It has to delete all files that start with the same prefix. Doing a "delete where file starts with X" isn't available in all/most platforms, which in turn means that the delete operation has to scan over every file name, and delete the file when the name matches the check.
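
To make the cost concrete, here is a minimal sketch of a "scan and delete by prefix" operation on a local directory. It is not Directus's actual implementation; `deleteByPrefix` is a hypothetical helper written only to illustrate the behavior described above. The point is that every entry in the directory has to be listed and checked, so the work grows with the total number of files rather than with the number of matches.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper (not Directus code): delete every regular file in
// `root` whose name starts with `prefix`, and return how many were removed.
function deleteByPrefix(root: string, prefix: string): number {
  let deleted = 0;
  // readdirSync returns *every* entry in the directory, matching or not,
  // so with ~1.8 million files each delete walks ~1.8 million names.
  for (const entry of fs.readdirSync(root, { withFileTypes: true })) {
    if (entry.isFile() && entry.name.startsWith(prefix)) {
      fs.unlinkSync(path.join(root, entry.name));
      deleted++;
    }
  }
  return deleted;
}
```

With the file names from the example in this thread, `deleteByPrefix("./uploads", "example")` would remove both `example.jpeg` and `example-5e49aa94cacb75d547c4ee1d8f32bdbb.jpeg`, but only after listing the whole directory.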

There's no real way around that now. The only way to fix this would be to upgrade the way Directus manages thumbnails in the first place, so we can easily delete a whole folder's worth of thumbnails, rather than having to find individual files. So for example something like:

// before
example.jpeg
example-5e49aa94cacb75d547c4ee1d8f32bdbb.jpeg

// after
example.jpeg
thumbnails/
  example/
    5e49aa94cacb75d547c4ee1d8f32bdbb.jpeg
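
Under the proposed "after" layout, a delete could address both the original and its thumbnails directly by path. The following is only a sketch under that assumption: `deleteWithThumbnails` and the exact `thumbnails/<name>/` convention are hypothetical and taken from the example above, not from any shipped Directus version.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper for the proposed layout:
//   <root>/<name>.<ext>
//   <root>/thumbnails/<name>/<hash>.<ext>
function deleteWithThumbnails(root: string, filename: string): void {
  // "example.jpeg" -> "example"
  const base = path.parse(filename).name;
  // Remove the original; `force` makes a missing file a no-op.
  fs.rmSync(path.join(root, filename), { force: true });
  // Remove the whole thumbnail folder in one call. Nothing else in `root`
  // is listed, so the cost depends only on this file's own thumbnails.
  fs.rmSync(path.join(root, "thumbnails", base), { recursive: true, force: true });
}
```

Compared with a prefix scan, the work here no longer depends on how many unrelated files share the storage root.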

@rijkvanzanten rijkvanzanten changed the title from "Performance drop on delete files" to "Improve file storage structure to improve delete performance in large file sets" Mar 21, 2022

@emrahnazif
Author

Thank you for your prompt response. Your solution looks simple and great. That should fix it.
This is quite a bottleneck for projects with many files. Apparently, not too many people deal with millions of files.

As a quick and dirty workaround, I created a second "local" storage with a new STORAGE_LOCAL_ROOT="./uploads-0001".
I will probably end up creating 100-200 "local" storages and distributing the files across them randomly.
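
One way to sketch that workaround is shown below. It swaps the random choice for a hash of the file ID, so the same file always maps to the same root; that substitution, the `pickUploadRoot` name, and the `./uploads-NNNN` naming (modeled on the root above) are all assumptions made for illustration. With N roots, each prefix scan would only have to walk roughly 1/N of the files.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: map a file ID to one of `rootCount` upload roots,
// named "./uploads-0001" .. "./uploads-NNNN".
function pickUploadRoot(fileId: string, rootCount: number): string {
  if (!Number.isInteger(rootCount) || rootCount < 1) {
    throw new RangeError("rootCount must be a positive integer");
  }
  // Hash the ID so files spread roughly evenly across the roots.
  const digest = createHash("sha256").update(fileId).digest();
  // First 4 bytes as an unsigned integer, reduced to 0 .. rootCount - 1.
  const index = digest.readUInt32BE(0) % rootCount;
  // Roots are numbered from 1 and zero-padded to four digits.
  return `./uploads-${String(index + 1).padStart(4, "0")}`;
}
```

For example, `pickUploadRoot(fileId, 200)` would choose among 200 roots; each root would still need its own storage location configured in Directus.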

Can you confirm that using "S3_DRIVER" won't fix this?

@rijkvanzanten
Member

Can you confirm that using "S3_DRIVER" won't fix this?

It won't fully fix it, but it might run faster. I'm not sure if AWS can process file reads quicker than Node.js can read the local file system.

@directus directus locked and limited conversation to collaborators May 25, 2022
@rijkvanzanten rijkvanzanten converted this issue into discussion #13559 May 25, 2022
