chexum/nnzoom

nnzoom: identify how to re-compress compressed files

Storing files uncompressed makes the best use of deduplication. To support that, nnzoom tries to work out how a compressed file was produced, so that the identical binary stream can be regenerated from the uncompressed data. It can also be used to store archives with this year's compression while still being able to re-create the original archive, for example to check a signature provided for it.

Sometimes it is trivial: only the right header options need to be re-created (timestamp and/or filename), though the stored filename may be something you can't easily re-create.

It's often tedious to identify the compression level when neither the best nor the fastest setting was used, because only those two cases are recorded in the header.
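The level search can be pictured as a brute-force loop: decompress the stream, recompress it at every level, and keep the levels that reproduce the original bytes. The sketch below is only an illustration using Python's `zlib` module, so it can match zlib-based tools at best; GNU gzip and its patched builds may emit different bitstreams. The function name `matching_zlib_levels` is hypothetical and is not taken from nnzoom.

```python
import zlib
from typing import List


def matching_zlib_levels(deflate_body: bytes) -> List[int]:
    """Return every zlib level (1-9) whose raw Deflate output equals deflate_body."""
    # wbits=-15 selects a raw Deflate stream without a gzip or zlib wrapper.
    original = zlib.decompress(deflate_body, wbits=-15)
    matches = []
    for level in range(1, 10):
        compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
        candidate = compressor.compress(original) + compressor.flush()
        if candidate == deflate_body:
            matches.append(level)
    return matches
```

More than one level can match, especially for small inputs, which is why the sketch returns a list rather than a single value.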

Occasionally it's hard, because some program versions and options produce different bitstreams, and the tool used was not always built from the official GNU sources.

    * gzip-1.2.4a.tar.gz {"gzip": ["-9", 35615, 8, 0, 917999688, 2, 3, null, null, null, null]}
    * gzip-1.2.4.tar.gz {"gzip": ["-9", 35615, 8, 0, 745769892, 2, 3, null, null, null, null]}
    * gzip-1.3.12.tar.gz {"gzip": ["-9", 35615, 8, 0, 1176517776, 2, 3, null, null, null, null]}
    * gzip-1.3.13.tar.gz {"gzip-1.3.12-ubuntu --rsyncable": ["-9", 35615, 8, 0, 0, 2, 3, null, null, null, null]}
    * gzip-1.3.9.tar.gz {"gzip": ["-9", 35615, 8, 0, 1166171396, 2, 3, null, null, null, null]}
    * gzip-1.4.tar.gz {"gzip": ["-9", 35615, 8, 0, 0, 2, 3, null, null, null, null]}
    * gzip-1.5.tar.gz {"gzip-1.3.12-ubuntu --rsyncable": ["-9", 35615, 8, 0, 0, 2, 3, null, null, null, null]}
    * gzip-1.6.tar.gz {"gzip": ["-9", 35615, 8, 0, 0, 2, 3, null, null, null, null]}
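The numbers in each entry look like the fixed 10-byte gzip header from RFC 1952: 35615 is 0x8b1f (the magic bytes read as a little-endian 16-bit value), 8 is the Deflate method, followed by flags, modification time, extra flags (2 means maximum compression) and OS (3 means Unix). This mapping is inferred from the values and is an assumption, not something the listing states; the meaning of the trailing `null` entries is left open. A minimal sketch that reads those fields, with the hypothetical names `GzipHeader` and `parse_gzip_header`:

```python
import struct
from typing import NamedTuple


class GzipHeader(NamedTuple):
    magic: int   # 0x8b1f (35615) when read little-endian
    method: int  # 8 = Deflate
    flags: int   # FLG bit field (FNAME, FCOMMENT, ...)
    mtime: int   # Unix timestamp, 0 when not stored
    xfl: int     # 2 = maximum compression, 4 = fastest
    os: int      # 3 = Unix


def parse_gzip_header(data: bytes) -> GzipHeader:
    """Decode the fixed 10-byte gzip header defined in RFC 1952."""
    if len(data) < 10:
        raise ValueError("a gzip header needs at least 10 bytes")
    # Little-endian: uint16, uint8, uint8, uint32, uint8, uint8.
    return GzipHeader(*struct.unpack("<HBBIBB", data[:10]))
```

Under this reading, an `mtime` of 0 (as in the gzip-1.4 and later entries) means the timestamp was not stored in the header.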

Currently supported

  • Deflate with any version of "gzip". Many of the "--rsyncable" patches produce slightly different bitstreams, so ideally each patched build is kept as a separate executable.
  • Deflate with "minigzip" and/or a zlib based tool.
  • Deflate with "pigz", which includes rsyncable as an option without needing a patch.
  • As a test, "zopfli" and "7z -tgzip".
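Trying several tools amounts to running each candidate executable on the uncompressed data and comparing the Deflate body with the target. The sketch below is an assumption about how such a loop could look, not nnzoom's actual code. The `CANDIDATES` table and the function names are hypothetical, and only the standard gzip flags `-1`/`-6`/`-9`, `-n` (omit name and timestamp) and `-c` (write to stdout) are used. It also assumes each tool emits the plain 10-byte header with no optional fields.

```python
import shutil
import subprocess
from typing import Dict, List, Optional

# Hypothetical candidate table: label -> command line reading stdin, writing stdout.
CANDIDATES: Dict[str, List[str]] = {
    "gzip -9": ["gzip", "-9", "-n", "-c"],
    "gzip -6": ["gzip", "-6", "-n", "-c"],
    "gzip -1": ["gzip", "-1", "-n", "-c"],
}


def deflate_body(gz: bytes) -> bytes:
    """Strip the 10-byte header and 8-byte trailer (assumes no optional header fields)."""
    return gz[10:-8]


def find_candidate(original: bytes, target_body: bytes,
                   candidates: Dict[str, List[str]]) -> Optional[str]:
    """Return the label of the first candidate that reproduces target_body."""
    for label, argv in candidates.items():
        if shutil.which(argv[0]) is None:
            continue  # executable not installed, skip it
        result = subprocess.run(argv, input=original,
                                stdout=subprocess.PIPE,
                                stderr=subprocess.DEVNULL)
        if result.returncode == 0 and deflate_body(result.stdout) == target_body:
            return label
    return None
```

Patched builds would be added as extra entries pointing at their own executables, in line with keeping them separate.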

TODO

  • Other compression tools (bzip2, xz)
  • Keep a partial hash of the uncompressed file to stop earlier if it's not a match
  • Older "libatomic_ops" and "gc" tarballs use an unusually strong compression that is not pigz/zopfli/7z
  • Older "libz" likewise (wonder why?)
  • Check whether BSD really uses zlib (the variant I called minigzip)
  • Actually re-create the original archive based on the output
  • While looking at OpenSSH tarballs and learning about BSD gzsig(1), I also found pristine-tar: "https://joeyh.name/code/pristine-tar/"
