file type based chunking / compression heuristic #82
Release 0.25 introduced different compression options, and the compression type is stored with the data chunk, so this issue's idea is now possible to implement.
Is this related to restic's interesting new chunking algorithm?
No, borg also does CDC, just using a different algorithm. For chunking, it is not easy, but just as an example: for compression, it is easier; we could e.g. do something like:
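As a rough illustration of the idea (names and the extension table are made up for this sketch, not borg's actual API or configuration): pick a compression method per file based on its extension, storing already-compressed formats as-is.

```python
import os

# Extensions whose contents are typically already compressed;
# recompressing them wastes CPU for little or no gain.
# (Illustrative list, not exhaustive.)
ALREADY_COMPRESSED = {'.gz', '.bz2', '.xz', '.zip', '.jpg', '.png', '.mp3', '.mp4'}

def compression_for(path, default='lz4'):
    """Return a compression method name for *path* based on its extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext in ALREADY_COMPRESSED:
        return 'none'  # store uncompressed
    return default

print(compression_for('backup/photo.JPG'))   # -> none
print(compression_for('backup/report.txt'))  # -> lz4
```

Since 0.25 stores the compression type with each chunk, mixing methods like this within one archive would be compatible with the repository format.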
It could chunk along boundaries of files not only within TARs, but also VM images; header information of media files that is more likely to change than the data stream; OpenOffice files, which are also compressed ZIP files; EXIF tags and some raw image formats, which may contain previews that fit neatly into a chunk… etc. etc. I could see some real benefits. And funky heuristics and support for certain file types could be added incrementally without harming any compatibility. And yet, it would require a lot of work and knowledge of all the various formats… and it needs to be crafted carefully so the parser will robustly only output suggested chunk sizes to the chunker, and not be prone to security issues when trying to parse dozens of file types. Cool idea, but maybe something for post-1.0?
It would be nice if there were a more lax content-equivalence mode that could dedup compressed things more effectively.
See also #765.
Btrfs has the following logic, maybe we could do something similar:
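A sketch of a btrfs-style compressibility probe (assumed behaviour for illustration, not the actual btrfs code): compress a small sample of the data with a fast setting, and only compress the whole chunk if the sample shrinks by enough.

```python
import zlib

def should_compress(data, sample_size=4096, threshold=0.9):
    """Heuristically decide whether *data* is worth compressing.

    Compresses the first *sample_size* bytes at a fast zlib level and
    checks whether the result is below *threshold* times the sample size.
    """
    sample = data[:sample_size]
    if not sample:
        return False
    probe = zlib.compress(sample, 1)  # level 1: cheap probe, not final output
    return len(probe) < threshold * len(sample)

print(should_compress(b'abc' * 5000))  # highly repetitive -> True
```

The probe costs one fast compression pass over a few KiB per chunk, which is cheap compared to compressing an incompressible chunk in full and then discarding the result.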
File type based compression implemented by: #810
I am splitting this into multiple tickets:
So, as all of this is now covered in these tickets, I am closing this one as a duplicate.
We could have special chunkers / no chunking and compression algorithms / no compression for specific file types (as determined by file extension or magic), file sizes, etc.
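Detection by magic bytes, as mentioned above, could look roughly like this (the magic table and function names are illustrative, not from borg):

```python
# A few well-known magic-byte prefixes (not exhaustive).
MAGIC = {
    b'\x1f\x8b': 'gzip',
    b'PK\x03\x04': 'zip',
    b'\xff\xd8\xff': 'jpeg',
    b'\x89PNG': 'png',
}

def detect_type(header):
    """Classify a file by the leading bytes of its content."""
    for magic, name in MAGIC.items():
        if header.startswith(magic):
            return name
    return 'unknown'

print(detect_type(b'\x1f\x8b\x08\x00'))  # -> gzip
print(detect_type(b'hello world'))       # -> unknown
```

Magic-based detection is more reliable than extensions (which can be missing or wrong), at the cost of reading the first bytes of each file before choosing a chunker or compressor.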