Checksums that have changed #18044

Closed
ilovezfs opened this issue Sep 13, 2017 · 31 comments

@ilovezfs
Contributor

I have only checked the stable specs so far, but the checksums for the following tarballs have changed, and I expect more to change over the coming days:

airspy: https://github.com/airspy/host/archive/v1.0.9.tar.gz
algernon: https://github.com/xyproto/algernon/archive/1.5.1.tar.gz
amazon-ecs-cli: https://github.com/aws/amazon-ecs-cli/archive/v0.6.4.tar.gz
apache-brooklyn-cli: https://github.com/apache/brooklyn-client/archive/rel/apache-brooklyn-0.11.0.tar.gz
aptly: https://github.com/smira/aptly/archive/v1.1.1.tar.gz
arabica: https://github.com/jezhiggins/arabica/archive/2016-January.tar.gz
assh: https://github.com/moul/advanced-ssh-config/archive/v2.6.0.tar.gz
aws-apigateway-importer: https://github.com/awslabs/aws-apigateway-importer/archive/aws-apigateway-importer-1.0.1.tar.gz
aws-sdk-cpp: https://github.com/aws/aws-sdk-cpp/archive/1.1.50.tar.gz
azure-cli: https://github.com/Azure/azure-xplat-cli/archive/v0.10.15-July2017.tar.gz
bartycrouch: https://github.com/Flinesoft/BartyCrouch/archive/3.8.1.tar.gz
bazel@0.2: https://github.com/bazelbuild/bazel/archive/0.2.3.tar.gz
blink1: https://github.com/todbot/blink1/archive/v1.98.tar.gz
bluepill: https://github.com/linkedin/bluepill/archive/v1.1.2.tar.gz
c14-cli: https://github.com/online-net/c14-cli/archive/0.1.tar.gz
certbot: https://github.com/certbot/certbot/archive/v0.18.1.tar.gz
cjdns: https://github.com/cjdelisle/cjdns/archive/cjdns-v20.tar.gz
cli53: https://github.com/barnybug/cli53/archive/0.8.9.tar.gz
cocoapods: https://github.com/CocoaPods/CocoaPods/archive/1.3.1.tar.gz
codequery: https://github.com/ruben2020/codequery/archive/v0.21.0.tar.gz
convox: https://github.com/convox/rack/archive/20170818204318.tar.gz
cookiecutter: https://github.com/audreyr/cookiecutter/archive/1.5.1.tar.gz
cpprestsdk: https://github.com/Microsoft/cpprestsdk/archive/v2.9.1.tar.gz
deisctl: https://github.com/deis/deis/archive/v1.13.4.tar.gz
docker-machine-completion: https://github.com/docker/machine/archive/v0.12.2.tar.gz
docker-machine-driver-vultr: https://github.com/janeczku/docker-machine-vultr/archive/v1.4.0.tar.gz
docker-machine-parallels: https://github.com/Parallels/docker-machine-parallels/archive/v1.2.3.tar.gz
echoprint-codegen: https://github.com/echonest/echoprint-codegen/archive/v4.12.tar.gz
erlang: https://github.com/erlang/otp/archive/OTP-20.0.4.tar.gz
erlang@17: https://github.com/erlang/otp/archive/OTP-17.5.6.9.tar.gz
erlang@18: https://github.com/erlang/otp/archive/OTP-18.3.4.tar.gz
erlang@19: https://github.com/erlang/otp/archive/OTP-19.3.tar.gz
flow: https://github.com/facebook/flow/archive/v0.54.1.tar.gz
gearboy: https://github.com/drhelius/Gearboy/archive/gearboy-2.3.1.tar.gz
gearsystem: https://github.com/drhelius/Gearsystem/archive/gearsystem-2.2.tar.gz
git-lfs: https://github.com/git-lfs/git-lfs/archive/v2.2.1.tar.gz
git-town: https://github.com/Originate/git-town/archive/v4.2.1.tar.gz
giter8: https://github.com/foundweekends/giter8/archive/v0.9.0.tar.gz
glbinding: https://github.com/cginternals/glbinding/archive/v2.1.3.tar.gz
globjects: https://github.com/cginternals/globjects/archive/v1.0.0.tar.gz
gollum: https://github.com/trivago/gollum/archive/v0.4.5.tar.gz
gomplate: https://github.com/hairyhenderson/gomplate/archive/v2.0.1.tar.gz
google-java-format: https://github.com/google/google-java-format/archive/google-java-format-1.4.tar.gz
gosu: https://github.com/gosu-lang/gosu-lang/archive/v1.14.6.tar.gz
hh: https://github.com/dvorka/hstr/archive/1.22.tar.gz
hub: https://github.com/github/hub/archive/v2.2.9.tar.gz
ice: https://github.com/zeroc-ice/ice/archive/v3.7.0.tar.gz
io: https://github.com/stevedekorte/io/archive/2017.09.06.tar.gz
jmxtrans: https://github.com/jmxtrans/jmxtrans/archive/jmxtrans-parent-267.tar.gz
kompose: https://github.com/kubernetes/kompose/archive/v1.1.0.tar.gz
kops: https://github.com/kubernetes/kops/archive/1.7.0.tar.gz
kube-aws: https://github.com/kubernetes-incubator/kube-aws/archive/v0.9.8.tar.gz
kubernetes-cli@1.3: https://github.com/kubernetes/kubernetes/archive/v1.3.10.tar.gz
libgit2: https://github.com/libgit2/libgit2/archive/v0.26.0.tar.gz
liquigraph: https://github.com/fbiville/liquigraph/archive/liquigraph-3.0.1.tar.gz
logtalk: https://github.com/LogtalkDotOrg/logtalk3/archive/lgt3112stable.tar.gz
mame: https://github.com/mamedev/mame/archive/mame0189.tar.gz
metashell: https://github.com/metashell/metashell/archive/v3.0.0.tar.gz
minimesos: https://github.com/ContainerSolutions/minimesos/archive/0.13.0.tar.gz
mogenerator: https://github.com/rentzsch/mogenerator/archive/1.31.tar.gz
monax: https://github.com/monax/monax/archive/v0.18.0.tar.gz
mongoose: https://github.com/cesanta/mongoose/archive/6.8.tar.gz
nats-streaming-server: https://github.com/nats-io/nats-streaming-server/archive/v0.5.0.tar.gz
ndpi: https://github.com/ntop/nDPI/archive/2.0.tar.gz
opencv: https://github.com/opencv/opencv/archive/3.3.0.tar.gz
openvdb: https://github.com/dreamworksanimation/openvdb/archive/v4.0.2.tar.gz
pcap_dnsproxy: https://github.com/chengr28/Pcap_DNSProxy/archive/v0.4.9.0.tar.gz
ponscripter-sekai: https://github.com/sekaiproject/ponscripter-fork/archive/v0.0.6.tar.gz
prometheus: https://github.com/prometheus/prometheus/archive/v1.7.1.tar.gz
protobuf: https://github.com/google/protobuf/archive/v3.4.0.tar.gz
protobuf@3.1: https://github.com/google/protobuf/archive/v3.1.0.tar.gz
pumba: https://github.com/gaia-adm/pumba/archive/0.4.5.tar.gz
rclone: https://github.com/ncw/rclone/archive/v1.37.tar.gz
redex: https://github.com/facebook/redex/archive/v1.1.0.tar.gz
robot-framework: https://github.com/robotframework/robotframework/archive/3.0.2.tar.gz
rom-tools: https://github.com/mamedev/mame/archive/mame0189.tar.gz
sops: https://github.com/mozilla/sops/archive/2.0.10.tar.gz
ssdb: https://github.com/ideawu/ssdb/archive/1.9.4.tar.gz
swagger-codegen: https://github.com/swagger-api/swagger-codegen/archive/v2.2.3.tar.gz
swift: https://github.com/apple/swift/archive/swift-3.1.1-RELEASE.tar.gz
swiftformat: https://github.com/nicklockwood/SwiftFormat/archive/0.29.5.tar.gz
syncthing-inotify: https://github.com/syncthing/syncthing-inotify/archive/v0.8.7.tar.gz
taylor: https://github.com/yopeso/Taylor/archive/0.2.2.tar.gz
tbb: https://github.com/01org/tbb/archive/2017_U7.tar.gz
terraform: https://github.com/hashicorp/terraform/archive/v0.10.4.tar.gz
terragrunt: https://github.com/gruntwork-io/terragrunt/archive/v0.13.2.tar.gz
treefrog: https://github.com/treefrogframework/treefrog-framework/archive/v1.18.0.tar.gz
voldemort: https://github.com/voldemort/voldemort/archive/release-1.10.25-cutoff.tar.gz
voltdb: https://github.com/VoltDB/voltdb/archive/voltdb-6.9.tar.gz
wolfssl: https://github.com/wolfSSL/wolfssl/archive/v3.12.0-stable.tar.gz
wrangler: https://github.com/RefactoringTools/wrangler/archive/wrangler1.2.tar.gz
xctool: https://github.com/facebook/xctool/archive/0.3.3.tar.gz
yaml-cpp: https://github.com/jbeder/yaml-cpp/archive/release-0.5.3.tar.gz
zorba: https://github.com/28msec/zorba/archive/3.1.tar.gz

CC @DomT4 this makes the patch ordeal look like nothing.

@ilovezfs
Contributor Author

What we know so far: libgit2/libgit2#4343 (comment)

@fxcoudert
Member

Either we hope that these sums don't change too often, bite the bullet this time and wait for the next. Or we disable SHA sums for all github tarballs, and start moving all formulas for which there exists an alternative hosting to that.

We could also in theory start mirroring every github tarball, but that seems like a no-no to me (cf: Sisyphus).

@ilovezfs
Contributor Author

We could also in theory start mirroring every github tarball, but that seems like a no-no to me (cf: Sisyphus).

That will not actually help, unless we made the mirror the primary, because we don't fall through to a mirror on a checksum mismatch.

@DomT4
Member

DomT4 commented Sep 13, 2017

Oh god. This is going to be actual hell.

Or we disable SHA sums for all github tarballs, and start moving all formulas for which there exists an alternative hosting to that.

Tossing any form of integrity verification out of the window seems like a drastic first step to take. Ideally, if anyone is extremely bored, projects need to get nagged to start doing actual release tarballs & have explained to them why this is important.

@ilovezfs
Contributor Author

Tossing any form of integrity verification out of the window seems like a drastic first step to take.

I'm just going to assume @fxcoudert was being facetious. 🙉

@DomT4
Member

DomT4 commented Sep 13, 2017

😆 In my defence, with how busy this week is proving to be, if you tell me that you're actually a 🐐 I'd probably take it at face value for at least an hour or so before thinking "Wait a second".

@ilovezfs
Contributor Author

I am actually a 🐐.

@DomT4
Member

DomT4 commented Sep 13, 2017

The team used to joke Xu was very possibly a non-malicious botnet because of how quickly high-quality solutions & ideas were churned out and implemented, so, rule nothing out I guess 🙈.

@peff
Contributor

peff commented Sep 13, 2017

Either we hope that these sums don't change too often, bite the bullet this time and wait for the next.

Historically, they don't change that often, and certainly we (GitHub) don't plan to make gratuitous changes. We're discussing the possibility of providing byte-stable tarballs automatically (by cementing the archives, bugs and all, at the time a tag is uploaded).

In the meantime, what can we do to help resolve the pain of this particular change? If you provided a list of archive URLs, would it be helpful to get back a list of the updated hash for each case? That would allow a mass-update of the homebrew formulas.

@fxcoudert
Member

fxcoudert commented Sep 13, 2017

We have 1498 URLs matching github.com/.*/archive/ in homebrew-core and homebrew-science, so I guess that's the extent of our potential problem.

But from where I'm sitting, not all of them are returning hashes that differ from the stored values. For example, https://github.com/01org/hyperscan/archive/v4.5.2.tar.gz still gives the value of 1f8fa44e94b642e54edc6a74cb8117d01984c0e661a15cad5a785e3ba28d18f5 we have on record, while https://github.com/airspy/host/archive/v1.0.9.tar.gz currently yields (for me) 967ef256596d4527b81f007f77b91caec3e9f5ab148a8fec436a703db85234cc which is different from our expectation of 358fea19f90bde13babc57ee7fdefeff3d8d8f5d629b0891734c5d4e811e8e6b.

To do a mass update we need things to have stabilised.
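A rough sketch of the kind of check described above: find GitHub archive URLs in formula text and compare a downloaded tarball's SHA-256 against the recorded value. The function names and the sample formula text are made up for illustration (this is not how brew itself does it), and the download step is left out so the snippet runs offline.

```python
import hashlib
import re

# Roughly the github.com/.*/archive/ pattern quoted above, tightened so a
# match stops at whitespace or a closing quote (an assumption of this sketch).
ARCHIVE_URL = re.compile(r"https://github\.com/[^\s\"']+/archive/[^\s\"']+")


def find_archive_urls(formula_text: str) -> list:
    """Return every GitHub auto-generated archive URL found in the text."""
    return ARCHIVE_URL.findall(formula_text)


def checksum_matches(tarball_bytes: bytes, expected_sha256: str) -> bool:
    """Compare the SHA-256 of the downloaded bytes with the recorded value."""
    return hashlib.sha256(tarball_bytes).hexdigest() == expected_sha256.lower()


# Hypothetical formula fragment, only to exercise the helpers.
sample_formula = '''
  url "https://github.com/smira/aptly/archive/v1.1.1.tar.gz"
  homepage "https://www.aptly.info/"
'''
print(find_archive_urls(sample_formula))

payload = b"stand-in for downloaded tarball bytes"
print(checksum_matches(payload, hashlib.sha256(payload).hexdigest()))
```

Running the comparison on two different days would show whether a given URL has stabilised yet.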

@ilovezfs
Contributor Author

@peff See #18048

In the meantime, what can we do to help resolve the pain of this particular change?

Unfortunately, the major concern is whether the underlying files have changed, which we can only know

a) if we have access to the original tarballs to compare them to the new tarballs
b) if we have some way to reconstruct the old checksum from the current archives

In the meantime, what can we do to help resolve the pain of this particular change? If you provided a list of archive URLs, would it be helpful to get back a list of the updated hash for each case? That would allow a mass-update of the homebrew formulas.

The list is not fixed since I assume not all have been converted to the new format yet.

@ilovezfs
Contributor Author

We're discussing the possibility of providing byte-stable tarballs automatically (by cementing the archives, bugs and all, at the time a tag is uploaded).

@peff please, please, please.... That would be wonderful ❤️

@peff
Contributor

peff commented Sep 13, 2017

Not all hashes will change. Most of the changes are due to the bugfix in git/git@22f0dcd, so only tarballs with filenames greater than 100 characters are affected.

You're right that there may also be some caching in effect. We can flush those caches, which would give us a stable state.
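To guess in advance which tarballs fall into the "longer than 100 characters" bucket, one could list the paths inside an already-downloaded archive. This is only a sketch: it assumes the limit applies to the full stored path (prefix included), which is not spelled out here, and `build_demo_tarball` is a made-up helper used purely to produce test input.

```python
import io
import tarfile


def paths_over_limit(tar_bytes: bytes, limit: int = 100) -> list:
    """Return archive member paths longer than `limit` characters."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as archive:
        return [name for name in archive.getnames() if len(name) > limit]


def build_demo_tarball(names: list) -> bytes:
    """Hypothetical helper: an in-memory .tar.gz of empty files, demo only."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name in names:
            archive.addfile(tarfile.TarInfo(name))
    return buffer.getvalue()


short_name = "aptly-1.1.1/README.md"
long_name = "aptly-1.1.1/" + "d" * 60 + "/" + "f" * 60 + ".go"
demo = build_demo_tarball([short_name, long_name])

# Only the long path should be reported.
print(paths_over_limit(demo))
```

An empty result would suggest the hash is unlikely to move because of the Git patch, though retags and renames can still change it.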

a) if we have access to the original tarballs to compare them to the new tarballs
b) if we have some way to reconstruct the old checksum from the current archives

You can verify the change by running a version of Git both with and without that patch (i.e., git revert 22f0dcd9634a818a0c83f23ea1a48f2d620c0546 and build the result). For instance:

$ git clone https://github.com/smira/aptly
$ cd aptly
$ /path/to/reverted/build/bin-wrappers/git archive --format=tar.gz --prefix=aptly-1.1.1/ v1.1.1 | sha256sum
92aa5caa12d756cb7469fa5772a03d7631b73d655b7329408a4d597ee8fb0ba4  -
$ /path/to/modern/git/bin-wrappers/git archive --format=tar.gz --prefix=aptly-1.1.1/ v1.1.1 | sha256sum
f7bc97e46cbff2f194af2c09db099d252f290a2ea90c251adea115e6d66cf31d  -

which matches what's in #18048. A few caveats, though:

  • the prefix is obviously important, and comes from the canonical repo name. So if a repo is renamed, GitHub will issue a redirect from the old name, but the resulting tarball will have the new prefix in it. airspy/host is an example of this, and probably has been broken since it was renamed in March.

  • note that the v is stripped when producing the prefix name from the tag name

  • this is calling gzip -cn under the hood. Which is pretty stable, but it's possible your local gzip may produce slightly different output, wrecking the hash
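The first two caveats can be written down as a small helper that builds the `--prefix` value from the canonical repository name and the tag. This is a sketch, not GitHub's actual logic: stripping the `v` only when a digit follows is an assumption made here so that tags such as `voltdb-6.9` are left alone, and `archive_prefix` is a made-up name.

```python
import re


def archive_prefix(repo_name: str, tag: str) -> str:
    """Build the directory prefix used inside the generated tarball.

    Assumption: a leading "v" is dropped only when it is followed by a digit.
    """
    version = tag[1:] if re.match(r"v\d", tag) else tag
    return f"{repo_name}-{version}/"


# Matches the --prefix used in the git archive commands above.
print(archive_prefix("aptly", "v1.1.1"))

# After a rename the prefix follows the new canonical name,
# which by itself is enough to change the hash.
print(archive_prefix("host", "v1.0.9"))
print(archive_prefix("airspyone_host", "v1.0.9"))
```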

@ilovezfs
Contributor Author

It looks like git will use brew's version of gzip if it happens to be installed, and that produces the correct checksums.

iMac-TMP:aptly joe$ /usr/local/opt/git-reverted/bin/git archive --format=tar.gz --prefix=aptly-1.1.1/ v1.1.1 | shasum -a256
92aa5caa12d756cb7469fa5772a03d7631b73d655b7329408a4d597ee8fb0ba4  -
iMac-TMP:aptly joe$ /usr/local/opt/git/bin/git archive --format=tar.gz --prefix=aptly-1.1.1/ v1.1.1 | shasum -a256
f7bc97e46cbff2f194af2c09db099d252f290a2ea90c251adea115e6d66cf31d  -

So this looks promising.

@ilovezfs
Contributor Author

For stable, I have verified all of the original checksums could be reconstructed except for

  1. actual retags:

hh: dvorka/hstr#231
ndpi: ntop/nDPI#446
openvdb: https://github.com/dreamworksanimation/openvdb/issues/172
ssdb: ideawu/ssdb#1139

So far ssdb is the only upstream to acknowledge the retag.

  2. kops, which is likely somehow related to the problem described in Was 1.6.0 retagged? kubernetes/kops#2630 and Github source code archives are not immutable (?) kubernetes/kubernetes#46443. I have informed upstream here: Was 1.6.0 retagged? kubernetes/kops#2630 (comment)

  3. kubernetes-cli@1.3, which was already known to be wrong due to those same two issues (Was 1.6.0 retagged? kubernetes/kops#2630 and Github source code archives are not immutable (?) kubernetes/kubernetes#46443) as noted here: Github source code archives are not immutable (?) kubernetes/kubernetes#46443 (comment)

So I have merged #18048 and reverted (in #18053) the updates for hh, ndpi, and openvdb pending responses from upstream, and will not revert the others since they're accounted for.

On to devel next …

@ilovezfs ilovezfs mentioned this issue Sep 14, 2017
4 tasks
@ilovezfs
Contributor Author

The devel specs for freerdp and hub (irony, much?) needed to be fixed:
#18055

I was able to reconstruct the original checksum for both.

The remaining mismatches will likely be in resource blocks.

@ilovezfs ilovezfs mentioned this issue Sep 14, 2017
4 tasks
@ilovezfs
Contributor Author

Two checksums that were fine yesterday have gone bad today. I'm fixing them in #18072.

povray: https://github.com/POV-Ray/povray/archive/v3.7.0.3.tar.gz
teleport: https://github.com/gravitational/teleport/archive/v2.2.7.tar.gz

@peff
Contributor

peff commented Sep 14, 2017

@ilovezfs I did a complete run over all of the GitHub archive URLs I could find in homebrew-core, and picked up a few stragglers. The results are in #18073, along with the scripts I used (they clone each repo, so unless you're running them on a fast connection, it will take a while).

@peff
Contributor

peff commented Sep 14, 2017

I did a complete run

Note that this just looked for changes due to the Git patch. I didn't look for retags, renames, etc, that would cause the formula hashes to be invalid.

@sjackman
Member

You lot are all amazing. Thanks for your fast and hard work to resolve this issue so quickly.

@ilyavaiser

brew install hh

Error: SHA256 mismatch
Expected: c4995e7041dc66e2118f83bd4c6c7f4cff5b4c493ca28bd7e4aef76edeff71ba
Actual: 384fee04e4c80a1964dcf443131c1da4a20dd474fb48132a51d3de0a946ba996

@ilovezfs
Contributor Author

@ilyaskorik yes that is an actual retag, which I've reported upstream, as noted here: #18044 (comment)

@ilyavaiser

@ilovezfs But I cannot install this program

@ilovezfs
Contributor Author

@ilyaskorik yes, you'll need to wait for upstream to respond to dvorka/hstr#231 or risk changing the checksum yourself.

@boegel

boegel commented Sep 21, 2017

@peff Do you mind clarifying something?

Not all hashes will change. Most of the changes are due to the bugfix in git/git@22f0dcd, so only tarballs with filenames greater than 100 characters are affected.

With that in mind, why is the tagged 'release' https://github.com/airspy/host/archive/v1.0.9.tar.gz affected, the filename there is way shorter than 100 characters... Am I overlooking something?

In the long run, what is the best alternative here to avoid being hit from possible future changes in the generated tarballs?
Do a git clone (or download & unpack) of the tagged version and create a tarball yourself (via tar cfvz)? I guess that wouldn't be reliable either to generate a tarball that has the same checksum on different systems, for pretty much the same reasons that cause this issue in the first place...

@peff
Contributor

peff commented Sep 21, 2017

With that in mind, why is the tagged 'release' https://github.com/airspy/host/archive/v1.0.9.tar.gz affected, the filename there is way shorter than 100 characters... Am I overlooking something?

@boegel The airspy/host hash was not affected by any change to the tarball-generating code. Its contents changed because the repository was renamed, and we use the canonical repository name as the tarball prefix. We provide an HTTP redirect from the old name, though you probably would want to update to the new URL along with the new hash. E.g.:

$ curl -w '%{redirect_url}' -o /dev/null https://github.com/airspy/host/archive/v1.0.9.tar.gz 
https://github.com/airspy/airspyone_host/archive/v1.0.9.tar.gz

In the long run, what is the best alternative here to avoid being hit from possible future changes in the generated tarballs?

Right now, I think there are basically two options:

  1. Projects can create their own tarball using git archive (or tar, or whatever) and then upload them as an artifact to the Releases page. These are stored byte-for-byte indefinitely. Many projects already do this because they run autoconf or similar to generate their release tarballs. The big downsides are:
  • The UI on the Releases page still lists the auto-generated tarball (albeit at the bottom), making it potentially confusing for visitors.

  • This is an extra step that is borne by the project maintainer. But it's the packager who deals with the fallout from changing hashes. So the incentives are misaligned. And my understanding is that in general it's hard for packagers like homebrew to convince project maintainers to follow specific workflows.

  2. Verification doesn't necessarily have to be over the byte-for-byte tarball contents. It could be over some canonicalized representation (e.g., concatenating the filenames and contents in order and taking the sha256sum of that). The problems there are:
  • A convenient tool doesn't already exist (though it would not be too hard to come up with one).

  • It increases the attack surface for impersonating an official tarball, since an attacker has more leeway to change the bytes in a way that keeps their tarball matching the canonical hash, but which may behave differently in practice.

  • Likewise, it increases the attack surface of the overall system. Right now nothing gets fed to tar until its hash is checked. If your canonical-hash-checker uses tar under the hood, then an attacker may target bugs in tar even before the hash is checked. You can say the same for existing hash checkers, of course, but it's a lot more likely to have bugs in something as complicated as tar versus sha256sum.

    I don't know if those are compelling downsides or not.

That's a bit of a brain dump on the subject. There may be other options that I haven't thought of, too.
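For what it's worth, the canonicalized-hash idea from option 2 could look roughly like the sketch below. It is a toy, not an existing tool: `canonical_hash` and `make_tarball` are made-up names, it only covers regular files (no symlinks, permissions, or empty directories), and it reads the archive with Python's `tarfile` before anything has been verified, which is exactly the attack-surface concern raised above.

```python
import hashlib
import io
import tarfile


def canonical_hash(tar_bytes: bytes) -> str:
    """Hash file names and contents in sorted order, ignoring tar/gzip metadata."""
    digest = hashlib.sha256()
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as archive:
        files = sorted((m for m in archive.getmembers() if m.isfile()),
                       key=lambda m: m.name)
        for member in files:
            name = member.name.encode("utf-8")
            data = archive.extractfile(member).read()
            # Length-prefix each field so name/content boundaries are unambiguous.
            digest.update(len(name).to_bytes(8, "big") + name)
            digest.update(len(data).to_bytes(8, "big") + data)
    return digest.hexdigest()


def make_tarball(files: dict, mtime: int) -> bytes:
    """Hypothetical helper: build an in-memory .tar.gz for the demo."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = mtime
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


# Same files, different member order and timestamps.
first = make_tarball({"demo-1.0/a.txt": b"alpha", "demo-1.0/b.txt": b"beta"}, mtime=0)
second = make_tarball({"demo-1.0/b.txt": b"beta", "demo-1.0/a.txt": b"alpha"}, mtime=1)

print(hashlib.sha256(first).hexdigest() == hashlib.sha256(second).hexdigest())
print(canonical_hash(first) == canonical_hash(second))
```

The two tarballs differ byte-for-byte but hash identically under the canonical scheme, which is the property a packager would want; whether the extra attack surface is worth it is a separate question.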

@ilovezfs
Contributor Author

We're discussing the possibility of providing byte-stable tarballs automatically (by cementing the archives, bugs and all, at the time a tag is uploaded).

@peff is this still in the cards as a possibility?

@peff
Contributor

peff commented Sep 21, 2017

@ilovezfs Sorry, I don't have an update. It's not something I'd be working on personally, but my understanding is it's still being explored.

@ilovezfs
Contributor Author

@peff OK, thanks :)

@boegel

boegel commented Sep 21, 2017

@peff Thanks for the extensive reply, much appreciated!

@tgamblin

tgamblin commented Sep 21, 2017

@peff: Thanks from me for the detailed response, as well.

I think ensuring byte-stable tarballs, either by saving them or using something like Debian's special tar arguments, is the way to go here. We were hit with this in Spack as well, and I'm having trouble deciding exactly how to resolve this, since it's ambiguous whether there are going to be any guarantees on tarballs in /archive/ links going forward.

If you do not end up guaranteeing anything about /archive/ links, I'll probably convert our 7-800 affected packages to use /release where possible, and use git clone for specific commits where there is no stable release. I may also try to convince some package maintainers to post actual releases, but I agree strongly with all the downsides to that which you mentioned, particularly:

my understanding is that in general it's hard for packagers like homebrew to convince project maintainers to follow specific workflows.

It would be good if you put a warning on the release page about the /archive/ links so that other folks aren't encouraged to rely on their hashes.
