Skip to content

feat: Implement new virtual glob filesystem #971

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

PaulSonOfLars
Copy link

@PaulSonOfLars PaulSonOfLars commented Jun 14, 2025

Issue

When uploading images from a networked folder, immich-go slows down due to a large number of syscalls to get file metadata. This can be seen in the attached flamegraph, obtained by trying to upload pictures from a mounted SMB drive:

Fix

We can improve this by walking the filesystem beforehand, and keeping the results in memory. Not only does this batch all the metadata reads, it also deduplicates them, reducing overall network and disk usage, greatly improving performance.

This PR implements such a "virtual" filesystem, as a drop-in replacement for the existing implementation.

Improvements

In my environment, I was able to go from uploading one picture every 2 seconds, to 25 pictures per second; a near 50x increase.

Screenshots

Before, with syscall heavy implementation:
image

After, with virtual FS implementation:
image

Notes

All changes to the tests were made before the FS was implemented; they have been run on both the previous and the new FS implementations to ensure backwards compatibility.

@simulot
Copy link
Owner

simulot commented Jun 16, 2025

I'm wondering why the C.JPG test didn't pass?

I have passed manually my e2e tests against a local server. Test pass or fail in the same way ;-) , for folders and google takeouts, raw file systems and zipped folders and zipped takeout. So, good job!

I'm wondering if this kind of optimization can be applied when parsing distant zip files?

Thank you a lot.

@PaulSonOfLars
Copy link
Author

Oh, well that's interesting! Good catch.

I think that'll be due to filesystems. I'm developing on macOS, which uses case-insensitive filesystems by default. Assuming you use a standard linux setup with ext4 (which github actions does), that'll be case-sensitive, so .JPG and .jpg would be considered different files.

Given that, I think we should consider the previous test "C.jpg" matching "C.JPG" to be the desired behaviour, and revert your change. Instead, I would switch L123 to use lowercase filenames. Eg, switch it to: parts = strings.Split(lowerFName, string(os.PathSeparator) - that should change the matching logic to be properly case-insensitive.
Glad I wrote that test :D

(the above being said - no idea how case-insensitivity should be handled for zip files etc...)

@PaulSonOfLars
Copy link
Author

PaulSonOfLars commented Jun 16, 2025

But yes, this can likely be used for remote zip files too, but would need to experiment a little to see how exactly zip files are handled in go!

Should be doable though. And should be able to reuse much of the existing implementation; it might be as simple as using a ZipReadCloser for the rootFS instead of an actual directory! Will try that out on the e2e testdata zip :)

Would probably want to do this in a separate PR though; keep things separate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants