Give guidance on reproducible builds #1865
Comments
I think it would be a bad idea to water down the reproducibility criterion by permitting certain classes of differences. To put it glibly: either a checksum matches or it doesn't. Furthermore, eliminating timestamp pollution from the artifacts of a complex build process is often the absolute lowest-hanging fruit on the way to a stable reproducible environment. There are many more tedious aspects to getting all the bits lined up just right in a way that can be replicated perfectly by others, so calling out timestamps as explicitly excluded from the matching criteria wouldn't help anyone much towards being able to claim build reproducibility for their project.

I do sympathize with projects that don't, themselves, publish binary artifacts for various reasons. However, I think this could be addressed in other ways, such as by allowing community-driven reproducibility projects to confer gold status through some sort of trusted consensus mechanism. Free GitHub Actions minutes for OSS projects would go a long way towards providing ready-made infrastructure for collaborative "build verification" services for various platforms.

Build reproducibility is becoming a cornerstone of security (see the recent US DoD Securing the Software Supply Chain: Recommended Practices for Developers). I think it should remain part of the gold criteria. Also, I do think that a watered-down goal of "an attempt at reproducibility" with some exceptions might make a good addition to a lower tier.
I think we should differentiate between projects that distribute binaries and projects that only distribute source code. For projects that only distribute source code, there is no binary that can be verified. I don't think upstream projects that are known to produce binaries/artifacts that are difficult to secure further down the supply chain should be allowed in Gold Tier.
Normalizing timestamps in a build is a fairly trivial issue; it's much easier to just fix the differences there than to write programs that try to tell benign differences and underhanded backdoors apart with 100% reliability. Needing manual intervention to inspect diffs should be the exception, not the norm, for software with the gold badge.
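To illustrate how trivial the timestamp fix usually is, here is a minimal sketch using the `SOURCE_DATE_EPOCH` convention from reproducible-builds.org. The `build/` directory and the use of the last commit time are assumptions for the example; GNU `tar`, `touch`, and `git` are required.

```shell
# Derive a stable reference time from the last commit (assumed layout).
SOURCE_DATE_EPOCH="$(git log -1 --pretty=%ct)"
export SOURCE_DATE_EPOCH

# Clamp every file's mtime to that instant so the archive is stable.
find build/ -exec touch --date="@${SOURCE_DATE_EPOCH}" {} +

# Pack with a fixed file order and fixed ownership (GNU tar options).
tar --sort=name --mtime="@${SOURCE_DATE_EPOCH}" \
    --owner=0 --group=0 --numeric-owner \
    -cf release.tar build/
```

Run twice from the same source, the two `release.tar` files should be byte-identical regardless of when the builds happened.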
@marcprux hit a lot of important points, thanks! Doing something "bit-for-bit identical except for ..." presents real pragmatic challenges.
In short: it is trivial to compare two artifacts, but it is a whole world of difficulty to compare only parts of two artifacts. I would strongly caution against using "reproducible builds" in any way other than https://reproducible-builds.org/docs/definition/, which really comes down to bit-for-bit reproducible, without exception.
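The "trivial to compare two artifacts" point reduces to a single checksum check. A minimal sketch, where `build()` is a hypothetical stand-in for a real build step:

```shell
# Hypothetical build step: writes one artifact into out/.
build() { mkdir -p out && printf 'artifact-bytes' > out/release.bin; }

build
sha256sum out/release.bin > first.sha256   # record the first build

rm -rf out
build
sha256sum --check first.sha256             # any single-bit diff fails this
```

By contrast, there is no equally simple command for "identical except for timestamps", which is the asymmetry the comment describes.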
The projects that I use from the Apache Software Foundation, such as Apache Maven and Apache NetBeans, publish a "convenience binary" along with the source release, all under the Apache name. Apache NetBeans even publishes a Snap package binary. The Maven build is reproducible, but the NetBeans build has quite some way to go.

I have found it surprisingly, and frustratingly, difficult to get changes related to reproducible builds accepted by upstream projects. Holding the Apache Software Foundation to a different standard than other open-source projects would make that even more difficult: it would remove my incentive to make the changes and one of their incentives to accept them.

I would prefer the meaning of reproducible builds to remain bit-for-bit identical, including timestamps. Even for organizations that truly publish only a source release, one could argue that they should have the gold badge only if that source can be built in a reproducible manner.
I will second this, @vagrantc. My own App Fair process creates verifiably reproducible iOS apps, but it is a constant struggle against ever-changing versions of the tools generating indeterminate output in insidious new ways (e.g., due to changes in Xcode's compiler parallelization). My main motivation for spending all these hours on tedious devops debugging is the stretch goal that these apps eventually achieve gold certification, and thereby serve as paragons of trust to the mobile community.
As a strong supporter of bit-for-bit integrity without compromises: when developers report challenges producing software that adheres to some evaluation criterion (be that testability, security, performance, or other), it often helps to introduce tooling improvements so that flaws (and opportunities) can be detected earlier in the assembly process (for example, the "shift security left" mantra, and similarly with continuous integration in general).

I have a sense that the challenges experienced by many developers result from the fact that we generally have to inspect the output of builds (diffs, sometimes binary) to identify where non-reproducible elements have appeared, and then perform sometimes-mentally-challenging detective work to theorize and evaluate what could have caused those artifacts to appear.

Hermetic build environments that can detect changes as soon as they're introduced during assembly could, I think, be an area where improved tooling might help to stem the introduction of non-reproducible elements early during development, in a way that could be largely ecosystem-agnostic, and help to win developer mindshare. There could be practical challenges implementing fail-fast hermetic builds (are filesystem reads/writes the unit of integrity? is language-level and/or IDE-level support required? how would ephemeral files and tempfiles be handled?), but I think they're manageable.

And similar to test-driven development: not everyone will want to adopt early detection, since it would add development friction, but for those who understand its value as an investment, the benefits should be clear.
(Sorry for sidetracking a bit. Again, I would reiterate that there shouldn't be exceptions for timestamps: it's not clear what even counts as "simply a timestamp" at the binary level, it would open the door to the very risks that reproducible builds are intended to solve, and there are unanswered questions about how integrity verification could be performed on content that fundamentally differs -- all points that others have alluded to. I do, however, want to state both my support and my suggestion that there may be solutions to address these concerns.)
I think this isn't a good idea because it can lead to the dangerous and false assumption that a rebuild differing only in an embedded timestamp can be considered identical in behaviour. Any binary could change its behaviour when there is some specific embedded timestamp. Yes, that would be visible in the source, but it might be intentionally hidden as well. So while, once we get to the "only the timestamp is different" level of almost-reproducibility, it's easy to go the last step (easiest: replace the timestamp in the binary you just built with the original one), this step is just as crucially important as all the other ones, so it can't get any special treatment. (The Android app world has a related problem with its embedded signatures, which you can never reproduce except by copying the signature from the original binary into your rebuild as a last step. But without this step, and with a different signature, the app is expected to behave differently in many scenarios.)
> Hermetic build environments that can detect changes as soon as
> they're introduced during assembly
An option that should work until such environments are available would
be to compare the two build trees after a build. Those will have files
that aren't copied into the build artefacts, but the comparison could
be restricted to files that typically end up in the build artefacts.
That should help narrow down the source of non-determinism.
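A minimal sketch of that tree comparison, assuming two independent builds landed in hypothetical `tree-a/` and `tree-b/` directories, restricted to a few file types that typically end up in artefacts:

```shell
# Compare only object files, shared libraries, and jars across the two
# build trees; report each pair that differs to localize non-determinism.
find tree-a -type f \( -name '*.o' -o -name '*.so' -o -name '*.jar' \) |
while read -r f; do
    cmp -s "$f" "tree-b/${f#tree-a/}" || echo "differs: ${f#tree-a/}"
done
```

A tool like diffoscope can then be pointed at just the reported pairs, rather than at the whole tree.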
> practical challenges implementing fail-fast hermetic builds
Essentially this would mean instrumenting all the tools used during a
build to record data and related metadata and record the chain of
transformation connecting the source files to the build artefacts.
The program receiving the instrumentation would then terminate the
build when it receives data different to a previous build.
That would have a lot of false positives though, since input that is
non-deterministic might not get represented in the build artefacts.
The build tracing feature would be useful for other situations too.
I think that I have seen a paper about a build tracing tool, but that
used ptrace rather than build instrumentation within tools and I
haven't been able to track down the paper.
--
bye,
pabs
https://bonedaddy.net/pabs3/
While I understand the desire for reproducible builds, in cases such as Java, where timestamps are introduced into the zip-file archives (.jar, .war, and .ear files, for those not familiar with Java), blasting the timestamps so they are all set deterministically can actually cause useful information to be lost.

Case in point: soon after Oracle acquired Sun Microsystems, pretty much every one of Oracle's patch release notes for Java (including some of their corresponding CVE descriptions) was of the intentionally vague form "multiple unspecified vulnerabilities were patched", or some such BS. My management would ask me, "Would you please analyze the patches and tell us if there's anything we urgently need to patch?" (This was way before SCA tools, BTW.) So I would extract all the .class files from (typically) rt.jar and look at the modification timestamps to see which had been updated since the last patch release we were using. Then I'd decompile those .class files, do the same for the .class files from the corresponding jar of the previous patch release, and finally diff the two versions to see what Oracle had actually fixed. (Don't miss that work at all!) Had those timestamps all been identical because of deterministic reproducible builds, that task would have taken a hundredfold longer.

So while there are times when deterministic, reproducible builds might be useful (and they never will be unless people verify them in their CI/CD pipelines, which I think most companies will be reluctant to do because of the build-time resource commitment involved), IMO, for most cases it brings very little added value. Just my $.02.
Having reproducible builds does not preclude incremental updates to Java archives. It's just that the dates of the old and new class files would be meaningful, such as their separate release dates. OpenJDK builds don't use such incremental updates anymore, but they could, and they could do so in a reproducible manner, allowing your detective work to go on as before. Reproducible builds is about blasting away all the useless, meaningless differences: the timestamps of files created during the build, the unsorted order of files in their directories, or the random build paths used in a transient container. When the useless differences are removed, the meaningful differences can be found.
Oh, but its value to OpenJDK is already apparent, even though its build has been reproducible only since May. For just one example, this old Javadoc bug, only tangentially related to reproducible builds, would have been impossible to find, and its fix impossible to verify, without the easy ability to create bit-for-bit identical builds.
If the timestamps are not deterministic, they could very well be entirely arbitrary: you might end up with the timestamps of whatever checkout the build of those class files happened to be performed on, whatever timestamp the developer happened to use at the time, or whatever wonky clock was in use, which would actually prevent you from comparing the timestamps in the way you described. Clamping the timestamps to the last source change, or some other meaningful timestamp, will more reliably get you the feature you described, presuming the other files actually retain meaningful timestamps (the last modification in VCS, for example, rather than whatever happened to be the on-disk time), and it prevents files generated during the build from needlessly differing. And if they don't preserve meaningful timestamps, then you're no worse off than you were. There is no need to blindly reset timestamps if the process otherwise maintains meaningful ones; embedding the current clock time will nearly always require a maximally detailed process of comparison.
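Clamping, as opposed to blindly resetting, only touches files whose timestamps are newer than the reference point, so older, still-meaningful timestamps survive. A minimal sketch, assuming GNU findutils and a hypothetical `REF_EPOCH` taken from the last source change:

```shell
# Reference point: e.g. the last commit time (hard-coded here for the sketch).
REF_EPOCH=1700000000

# Clamp ONLY files newer than the reference; files with older,
# meaningful timestamps are left exactly as they were.
find build/ -newermt "@${REF_EPOCH}" -exec touch --date="@${REF_EPOCH}" {} +
```

Files stamped during the build collapse onto the reference time, while files carrying genuine history keep it.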
Nothing is useful unless people actually try to do it, true. But if we are talking about a best-practice gold standard, let us not set the sights too low either. Some things are harder and take more effort, and the distinction between "this project follows all known best practices", "this project follows many best practices", and "this project follows some best practices" should be reflected in the levels.
Just chiming in here to discourage any relaxation of the gold standard. The gold standard should be clear: bit-for-bit identical reproducibility. Please do not carve out subtle exceptions for variable timestamps. For a project that distributes only source code artifacts, I still think it's worth asking during the review whether the generated artifacts used by end users can be built reproducibly. Obviously, we don't want to require source-only software projects to distribute binaries, but presumably the developers do have some practice building some user-facing artifacts. Such a project should be able to concisely describe a particular toolchain and set of compilation/configuration options and dependencies that are known to provide a reproducible build covering a substantial portion of the codebase.
Some projects have raised concerns about challenges meeting the `build_reproducible` gold criterion. The purpose of this criterion is to counter malicious builds, as happened with SolarWinds' Orion, by enabling verifiable reproducible builds. We still want to counter the attack, but we may be able to relax the requirement slightly while still countering it. So, under the `build_reproducible` gold criterion, modify:
- Change the second sentence to read:
- Change "result" to "built result", and replace the final period with: