-
-
Notifications
You must be signed in to change notification settings - Fork 108
feat: Implement new virtual glob filesystem #971
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…alls which may degrade performance over network mounts
I'm wondering why the C.JPG test didn't pass? I have passed manually my e2e tests against a local server. Test pass or fail in the same way ;-) , for folders and google takeouts, raw file systems and zipped folders and zipped takeout. So, good job! I'm wondering if this kind of optimization can be applied when parsing distant zip files? Thank you a lot. |
Oh, well that's interesting! Good catch. I think that'll be due to filesystems. I'm developing on macOS, which uses case-insensitive filesystems by default. Assuming you use a standard linux setup with ext4 (which github actions does), that'll be case-sensitive, so Given that, I think we should consider the previous test "C.jpg" matching "C.JPG" to be the desired behaviour, and revert your change. Instead, I would switch L123 to use lowercase filenames. Eg, switch it to: (the above being said - no idea how case-insensitivity should be handled for zip files etc...) |
But yes, this can likely be used for remote zip files too, but would need to experiment a little to see how exactly zip files are handled in go! Should be doable though. And should be able to reuse much of the existing implementation; it might be as simple as using a Would probably want to do this in a separate PR though; keep things separate. |
Issue
When uploading images from a networked folder, immich-go slows down due to a large number of syscalls to get file metadata. This can be seen in the attached flamegraph, obtained by trying to upload pictures from a mounted SMB drive:
Fix
We can improve this by walking the filesystem beforehand, and keeping the results in memory. Not only does this batch all the metadata reads, it also deduplicates them, reducing overall network and disk usage, greatly improving performance.
This PR implements such a "virtual" filesystem, as a drop-in replacement for the existing implementation.
Improvements
In my environment, I was able to go from uploading one picture every 2 seconds, to 25 pictures per second; a near 50x increase.
Screenshots
Before, with syscall heavy implementation:

After, with virtual FS implementation:

Notes
All changes to the tests were made before the FS was implemented; they have been run on both the previous and the new FS implementations to ensure backwards compatibility.