Work Towards Reproducible/Reproducible Builds (Research Needed?) #3927

Closed
sarciszewski opened this issue Apr 11, 2015 · 13 comments

@sarciszewski

This is best illustrated by example:

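#!/usr/bin/env bash

# Download the officially hosted build of the tagged release.
wget -O hosted.phar https://getcomposer.org/download/1.0.0-alpha9/composer.phar

# Build the same tag from source (run inside a clone of the Composer repository).
git checkout 1.0.0-alpha9
bin/compile

# Hex-dump both archives and diff the dumps; an empty diff means identical builds.
xxd composer.phar > ver1.hex
xxd hosted.phar > ver2.hex
diff ver1.hex ver2.hex > shouldbezero.diff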

This script produces an 8.9 MB diff file (shouldbezero.diff); it would be empty if the two builds were identical.

What can be done to make composer.phar builds reproducible?

@alcohol
Member

alcohol commented Apr 13, 2015

What is the actual issue here?

@sarciszewski
Author

The answer to your question was a brief Google search away.

I was attempting to verify that the available .phar is equivalent to one built from source, and there were a lot of differences. Ideally, downloading the .phar from the Composer website and building it from source should yield identical files, but they did not.

The issue here is that, unless builds are deterministic, if getcomposer.org is compromised and the deliverables are trojaned, it will be very hard to detect. If builds were reproducible from source, then one simply has to checkout the release tag, build the .phar, and compare to what's being served.
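
A minimal sketch of that comparison, assuming both files are already on disk. The file names follow the script earlier in this thread, and the `same_build` helper is hypothetical, not part of Composer:

```bash
#!/usr/bin/env bash
# Sketch: byte-for-byte comparison of a locally built phar and a downloaded one.
# `same_build` is an illustrative helper name, not an existing tool.

# Returns 0 when both files are byte-identical, non-zero otherwise.
same_build() {
    cmp -s "$1" "$2"
}
```

For example, `same_build composer.phar hosted.phar && echo identical || echo DIFFERENT` would report whether the served file matches the local build.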

As a safeguard for the entire PHP developer community, I believe it is worth investigating how to make composer.phar build byte-for-byte identically from source, no matter who builds it.

@sarciszewski
Author

As a follow-up, this is an open-ended issue for the community.

Unlike the cryptographic signature issue I opened last year and @padraic has been trying to advocate for, I don't know if a solution exists.

That said, if only one project can tackle deterministic builds and the goal is securing PHP developers the world over, the best candidate is Composer.

@sarciszewski changed the title from "Reproducible Builds" to "Work Towards Reproducible/Reproducible Builds (Research Needed?)" Apr 13, 2015
@alcohol
Member

alcohol commented Apr 13, 2015

The phar can sometimes lag a few minutes behind the master branch (or more even, not sure on that to be honest). It's still an alpha product, constantly changing. Why would you expect it to result in identical builds? Also, there are bound to be differences since there are artifacts in the code that relate to when the build was run (for example, a timestamp and possibly also a commit hash). So they will never be 100% identical. See Compiler.php for specifics.

@sarciszewski
Author

> The phar can sometimes lag a few minutes behind the master branch (or more even, not sure on that to be honest). It's still an alpha product, constantly changing.

Quoting my first post:

wget -O hosted.phar https://getcomposer.org/download/1.0.0-alpha9/composer.phar
git checkout 1.0.0-alpha9

I wasn't checking the master branch.

@alcohol
Member

alcohol commented Apr 14, 2015

And you also didn't read my full reply. Maybe take the time to do so before replying wastefully?

@sbuzonas
Contributor

The build timestamp in the version output that alcohol mentioned is the big one I can think of. The signature is the second. Then take into consideration what a PHAR is: a compressed archive of files. Each file has a creation time, modified time, access time, uid, gid, permissions, etc. I don't know how much of this metadata is persisted when the PHAR is built, but I know some of it is. Your best bet is to extract the PHAR, set the release date constant back to the placeholders in src/Composer/Composer.php and diff your directories.
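
The extract-and-diff approach could look roughly like this. It is a sketch only: it assumes the `php` CLI with the phar extension is available, and the `extract_phar` and `diff_trees` helper names are made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: compare two phars by file content instead of by raw archive bytes.
# Helper names are hypothetical.

# Extract a phar into a directory using PHP's Phar class
# (assumes the php CLI with the phar extension enabled).
extract_phar() {
    mkdir -p "$2" &&
    php -r '(new Phar($argv[1]))->extractTo($argv[2]);' -- "$1" "$2"
}

# Recursively compare the contents of two extracted trees. Prints the
# differences and returns 0 only when every file matches.
diff_trees() {
    diff -r "$1" "$2"
}
```

For example, `extract_phar hosted.phar hosted && extract_phar composer.phar built && diff_trees hosted built` would list the files whose contents differ, while ignoring archive-level metadata such as manifest timestamps.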

@paragonie-scott

We tested 1.0.0-alpha9 with our new pharaoh auditing utility on the .phar downloaded from https://getcomposer.org and the one built from source.

The results are available here: https://gist.github.com/paragonie-scott/ccb86b34ff0577d229bc

@alcohol

> Why would you expect it to result in identical builds?

It's not expected behaviour, it's requested behaviour. Deterministic builds are a highly desirable property to prevent targeted malware attacks. If a skilled analyst can audit the source code, then verify that the deliverable is identical to what they get when they build from source, there can be a reasonable assurance that the .phar deliverables have not been tampered with. Even in the absence of GPG or OpenSSL signatures.

> Also, there are bound to be differences since there are artifacts in the code that relate to when the build was run (for example, a timestamp and possibly also a commit hash).

Pharaoh extracts the one you provide and the one we build and compares them with the git diff utility, thereby making the timestamps and commit hashes moot.

The artifacts in the code are precisely what can be addressed in a stable release to make the builds deterministic.

@slbmeh

> The signature is the second.

To the best of my knowledge, Composer doesn't actually employ an asymmetric cryptographic signature (e.g. OpenSSL) in the PHAR building process, so building from source ought to produce the same signature (because hash functions are deterministic) if the underlying code is identical.

> Your best bet is to extract the PHAR ... diff your directories.

This excerpt is precisely what v0.1.1 of Pharaoh does. 👍

On Topic

EDIT: Actually, I misread the diff. There are changes outside the autoloader.

We still believe that small tweaks to the code to make builds deterministic would allow for automated threat detection and prevention by independent third parties. Which is sort of the entire goal on our end. Therefore, I'd like to request that this thread stay open for future discussions along this vein.

@alcohol
Member

alcohol commented Apr 16, 2015

I understand you tested the alpha release specifically, but my point is that most users simply download the 'latest' snapshot. This snapshot points at the master branch and could potentially lag behind a commit (or several) depending on when you download it. The 'version' is stored inside the phar archive, which in the case of a non-tagged release is the sha of the commit. So there are several factors that you would always have to take into account.

I think this issue will probably stay irrelevant until composer heads towards a more stable release cycle. I wouldn't count on that in the very near future though. Just my thoughts.

@sbuzonas
Contributor

@paragonie-scott you're right, the normal signature for a phar is deterministic. I mentioned it because even the slightest difference in the contents changes the signature, which makes a hex dump differ that much more.

Looking at the diff, I think the only difference is that they were installed with two different versions of composer. I initially thought your build was phr_6bTGzo... but on a second pass I believe it is more likely to be phr_471Znr... The build in phr_6bTGzo was generated with a version of composer prior to the .hh extension updates to the autoloader, which is much older than 1.0.0-alpha9.

@Seldaek
Member

Seldaek commented May 1, 2015

Sooo... there were a bunch of various issues making the phar file vary at every build, which I fixed in the commits linked above!

Now the most fun of the issues is that the phar extension stores a unix timestamp for each file in its file manifest. After reading their whole spec and then diving into the source, it turns out that timestamp isn't related to the file at all, nor configurable: it's just time() for every file. As time() isn't quite reproducible given the time-space continuum and all that, the only way, it seems, was to patch the phar after creation and then update its signature, which I did in a tiny new library: https://github.com/Seldaek/phar-utils/blob/master/src/Timestamps.php#L33

I now get the same phar output on linux and windows running bin/compile on both from the same git commit, but if you wanna check it out and confirm you're most welcome to do so.
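
One way to confirm this locally is to run the build twice from the same checkout and compare the results. The sketch below assumes `bin/compile` writes `composer.phar` into the current directory, as in the script at the top of this thread; `check_reproducible` is a hypothetical helper, not something Composer ships.

```bash
#!/usr/bin/env bash
# Sketch: run a build command twice and check that the output file is
# byte-identical both times. `check_reproducible` is an illustrative name.

# Usage: check_reproducible <output-file> <build-command> [args...]
# Returns 0 if both runs produce identical output, 1 if they differ,
# 2 if the build command or the setup fails.
check_reproducible() {
    local output=$1; shift
    local first
    first=$(mktemp) || return 2
    "$@" || { rm -f "$first"; return 2; }
    cp "$output" "$first" || { rm -f "$first"; return 2; }
    sleep 1   # let the clock advance so any embedded time() value would change
    "$@" || { rm -f "$first"; return 2; }
    cmp -s "$output" "$first"
    local status=$?
    rm -f "$first"
    return "$status"
}
```

For example, `check_reproducible composer.phar bin/compile` would return 0 only if two consecutive builds match byte for byte. Comparing across machines (for instance Linux and Windows) would still need the files or their checksums to be exchanged by hand.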

Obviously this whole mess is not applicable to the previous releases, but upcoming ones should hopefully be good, and so are the dev snapshots by the way.

@Seldaek Seldaek closed this as completed May 1, 2015
@Seldaek Seldaek added the Bug label May 1, 2015
@Seldaek Seldaek added this to the Backwards Compatible milestone May 1, 2015
@sarciszewski
Author

> Sooo... there were a bunch of various issues making the phar file vary at every build, which I fixed in the commits linked above!

👍 Awesome!

I wrote and published a tool called Pharaoh if you'd like to use it to perform the meaningful comparisons. (Timestamps are benign; I was interested in the phar stub and the file contents.)

I'll check this out this weekend and publish my findings :)

@Seldaek
Member

Seldaek commented May 2, 2015 via email
