
Virtualize zip / gzip compressed files to reduce not just space but churn #605

Open
brendanheywood opened this issue Mar 26, 2024 · 0 comments

@brendanheywood (Contributor)

This is a very 'out there' idea :)

The idea is that if you upload a large compressed file, e.g. an .mbz backup file with lots of binary content (i.e. a Moodle backup that deliberately includes files), then objectfs would, under the hood, disassemble the file and, for each binary file inside it, check whether that file is already in object storage with an exact hash match. If it is, objectfs replaces it with a reference and stores the reduced-size compressed file instead. It reverses this operation transparently on the way out. These files would not be compatible with CloudFront serving, since that relies on the file sitting in storage ready to go in its final form. This would need to be bulletproof: the regenerated file must be an exact binary match of the original, or it could create problems elsewhere.
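A minimal Python sketch of the round trip, for ZIP only and under loud assumptions: a plain dict stands in for object storage, SHA-1 is used on the assumption that it matches Moodle's file contenthash, and `virtualize`, `reassemble` and the segment format are hypothetical names, not objectfs APIs. It relies on the fact that uncompressed (ZIP_STORED) members sit verbatim in the archive bytes, so cutting them out and splicing them back in reproduces the file exactly without re-encoding anything:

```python
import hashlib
import io
import struct
import zipfile

# Hypothetical stand-in for object storage: SHA-1 hex digest -> file bytes.
Store = dict[str, bytes]
# A segment is ("raw", bytes) kept inline, or ("ref", sha1_hex) pointing into the store.
Segment = tuple

LOCAL_HEADER_FIXED = 30  # size of the fixed part of a ZIP local file header


def stored_member_spans(archive: bytes) -> list[tuple[int, int]]:
    """Return sorted (start, end) byte ranges of uncompressed member data."""
    spans = []
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for info in zf.infolist():
            # Only stored, unencrypted, non-empty members appear verbatim.
            if (info.compress_type != zipfile.ZIP_STORED
                    or info.flag_bits & 0x1
                    or info.file_size == 0):
                continue
            hdr = info.header_offset
            # Name/extra lengths must be read from the local header itself,
            # because they can differ from the central directory copy.
            name_len, extra_len = struct.unpack("<HH", archive[hdr + 26:hdr + 30])
            start = hdr + LOCAL_HEADER_FIXED + name_len + extra_len
            spans.append((start, start + info.compress_size))
    return sorted(spans)


def disassemble(archive: bytes, store: Store) -> list[Segment]:
    """Cut out member data whose hash is already in the store."""
    segments, cursor = [], 0
    for start, end in stored_member_spans(archive):
        if start < cursor:
            continue  # overlapping entries (malformed zip): leave inline
        digest = hashlib.sha1(archive[start:end]).hexdigest()
        if digest in store:
            segments.append(("raw", archive[cursor:start]))
            segments.append(("ref", digest))
            cursor = end
    segments.append(("raw", archive[cursor:]))
    return segments


def reassemble(segments: list[Segment], store: Store) -> bytes:
    """Splice referenced bytes back in to rebuild the original archive."""
    return b"".join(store[s[1]] if s[0] == "ref" else s[1] for s in segments)


def virtualize(archive: bytes, store: Store) -> list[Segment]:
    """Only accept the reduced form if it round-trips byte-for-byte."""
    segments = disassemble(archive, store)
    if reassemble(segments, store) != archive:
        return [("raw", archive)]  # fall back to storing the original untouched
    return segments
```

Everything other than the deduplicated member bodies (local headers, the central directory, comments) stays as raw bytes, so nothing ever has to be regenerated; deflated members are simply left inline in this sketch.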

This way, if automated backups are constantly churning out backup files, the stored backup files themselves stay quite small. But it still wastes a lot of time grinding through these files, when file-less backups are the better option anyway.

Most of the time, a binary file inside a compressed archive isn't recompressed at all; it is just inserted as-is. So pulling a single archive apart and storing its pieces separately shouldn't take noticeably more space than the archive itself. But there will be a massive space reduction when files are never deleted (for PITR recovery, for example), because every retained backup would otherwise carry its own copy of the same embedded files.
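A rough model of the retention case, assuming N retained backups of the same course, each embedding files totalling F bytes that are already in object storage, plus M bytes of everything else (XML, archive headers, references). The symbols are illustrative, not measurements:

```math
S_{\text{plain}} = N\,(F + M), \qquad
S_{\text{virtual}} \approx N\,M, \qquad
S_{\text{plain}} - S_{\text{virtual}} \approx N\,F
```

The saving grows linearly with N, which is why it matters most when old backups are never deleted.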

There would need to be specific code written for each type of compression format.
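A sketch of how per-format handling might be dispatched, assuming an .mbz may be either a zip or a gzip-compressed tar depending on site settings. The `Strategy` names and the policy table are hypothetical; the point is that each format needs its own handler, and anything a handler cannot reproduce exactly falls back to storing the original bytes:

```python
from enum import Enum


class Strategy(Enum):
    SPLIT_STORED_MEMBERS = "split-stored-members"  # hypothetical zip handler
    STORE_AS_IS = "store-as-is"  # safe fallback: no virtualization


# Leading magic bytes of each container format.
MAGIC = {
    b"PK\x03\x04": "zip",  # ZIP local file header
    b"\x1f\x8b": "gzip",   # gzip member header (e.g. a tar.gz)
}

# Per-format policy. gzip wraps the whole tar in one deflate stream, so member
# bytes never appear verbatim, and reproducing the original exactly would mean
# re-deflating with identical settings, which is not guaranteed. Until a
# handler can prove an exact match, such files are stored as-is.
POLICY = {
    "zip": Strategy.SPLIT_STORED_MEMBERS,
    "gzip": Strategy.STORE_AS_IS,
}


def detect_format(data: bytes) -> str:
    """Identify the container format from its leading magic bytes."""
    for magic, name in MAGIC.items():
        if data.startswith(magic):
            return name
    return "unknown"


def choose_strategy(data: bytes) -> Strategy:
    # Anything unrecognised falls back to storing the original bytes untouched.
    return POLICY.get(detect_format(data), Strategy.STORE_AS_IS)
```

Keying the policy on magic bytes rather than the file extension avoids trusting the `.mbz` name, which doesn't say which container is inside.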
