Skip to content

Make build reproducible#103

Merged
ddekany merged 14 commits intoapache:2.3-gaefrom
chrisrueger:make-build-reproducible
Dec 27, 2023
Merged

Make build reproducible#103
ddekany merged 14 commits intoapache:2.3-gaefrom
chrisrueger:make-build-reproducible

Conversation

@chrisrueger
Copy link
Contributor

  • e.g. all files now have fixed timestamp of '02-01-1980 00:00' and fixed permissions
  • furthermore OSGi MANIFEST.MF has no timestamped attributes

Before this change:

unzip -l freemarker-gae-2.3.33-SNAPSHOT.jar 
Archive:  freemarker-gae-2.3.33-SNAPSHOT.jar
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  12-22-2023 19:25   META-INF/
     5935  12-22-2023 19:25   META-INF/MANIFEST.MF
    11358  10-08-2022 22:17   META-INF/LICENSE
        0  12-22-2023 19:25   freemarker/
        0  12-22-2023 19:25   freemarker/cache/
     1025  12-22-2023 19:25   freemarker/cache/AndMatcher.class

After this change:

unzip -l freemarker-gae-2.3.33-SNAPSHOT.jar
Archive:  freemarker-gae-2.3.33-SNAPSHOT.jar
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  02-01-1980 00:00   META-INF/
     5833  02-01-1980 00:00   META-INF/MANIFEST.MF
    11358  02-01-1980 00:00   META-INF/LICENSE
        0  02-01-1980 00:00   freemarker/
        0  02-01-1980 00:00   freemarker/cache/
     1025  02-01-1980 00:00   freemarker/cache/AndMatcher.class

Also there are differences in the MANIFEST.MF

image

Attached 2 files before.txt and after.txt to compare the outputs.
MANIFEST.MF is the printable output from the jarviewer of bndtools which produces a nice readable output.

after.txt
before.txt

- e.g. all files now have fixed timestamp of '02-01-1980 00:00'  and fixed permissions
- furthermore OSGi MANIFEST.MF has no timestamped attributes
* this makes it easier to build e.g. when no JDK16 is installed. It installs it automatically (see https://docs.gradle.org/current/userguide/toolchains.html#sub:download_repositories).
see also the discussion here apache#79 (comment)

Feel free to exclude this commit if this should be in a separate PR
@chrisrueger
Copy link
Contributor Author

@ddekany those changes are an attempt for a reproducible build. (see discussion #79 (review) )

One nice side effect could be to show up in this list at https://github.com/jvm-repo-rebuild/reproducible-central

Feel free to cherry pick whatever you see fit. It tried to describe what the change does in the description above.

@chrisrueger chrisrueger marked this pull request as ready for review December 23, 2023 18:58
@ddekany
Copy link
Contributor

ddekany commented Dec 23, 2023

So is it already reproducible with this little change?

Things to consider:

  • gradlew should remain executable.
  • Also, we must not attempt to print the build date in freemarker.core.CommandLine anymore then
  • Regarding download the exact JDK-s (foojay-resolver-convention)... quite brutal, but, I guess, nobody cares about a few tens of megabytes anymore. Not sure how it finds the exact version though (like 16, but which sub-version). Also probably should be disableable.

@chrisrueger
Copy link
Contributor Author

chrisrueger commented Dec 23, 2023

So is it already reproducible with this little change?

This change at least makes the (linux) file-timestamps and permissions kinda static and removes timestamps from Manifest.MF. So yes to a large extend it is. I based my suggestion here on this which I was suggested recently as an example for a reproducible build.

Where I am unsure is the timestamps in the local build vs. release build e.g.
Bundle-Version 2.3.33.nightly_20231222T192537Z which contains the timestamp of the build.

But my assumption was that the "real" release build won't have the timestamp.

Maybe you could try to do a preview of a real release build without the timestamps (or tell me how).
In theory your build and my build should look exactly the same if the build has no timestamp in the version.

But I am not an expert. We could start with this. I think more information can be found here also a list of tools to verify it.

  • Regarding download the exact JDK-s (foojay-resolver-convention)... quite brutal, but, I guess, nobody cares about a few tens of megabytes anymore.

Feel free to remove this commit about foojay-resolver-convention if you have concerns. Maybe I shouldn't have included it here.
Was more a suggestion based on my recent comment. Should I revert it?

Only the first commit 1f9800c is related to reproducibility.

  • gradlew should remain executable.

Not sure I understand. I have used ./gradlew jar to build. So I would say, yes it is still executable.

  • Also, we must not attempt to print the build date in freemarker.core.CommandLine anymore then

Not sure I understand how this relates to reproducible. Is this something affection the build result artifact (freemarker.jar)?

@ddekany
Copy link
Contributor

ddekany commented Dec 24, 2023

So there's freemarker.jar:freemarker/version.properties, which contains a buildTimestamp, that we fill for each build (see freemarker/build/FreemarkerVersionService.kt). And that's what said command line utility prints too. For the reproducible build this feature has to be removed in effect.

The version number itself has no date in a release (see in freemarker.jar:freemarker/version.properties), but as per above, the build still won't be reproducible until we stop filling the buildTimestamp.

foojay-resolver-convention: Ah, it only downloads if the given JDK is not already found. But then, my question remains. If the build has to be reproducible, then we had to be more specific than just "JDK 16", right?

gradlew: I meant, it's not executable in the source distribution tar.
tar -tvf apache-freemarker-gae-src-2.3.33-SNAPSHOT.tgz gives -rw-r--r-- 0/0 8941 1970-01-02 01:00 gradlew.
But I just realized that that's already broken in the 2.3-gae branch. But I guess if I fix that these, the filePermissions { unix("rw-r--r--") } you added will collide with that.

@chrisrueger
Copy link
Contributor Author

chrisrueger commented Dec 24, 2023

So there's freemarker.jar:freemarker/version.properties, which contains a buildTimestamp, that we fill for each build (see freemarker/build/FreemarkerVersionService.kt). And that's what said command line utility prints too. For the reproducible build this feature has to be removed in effect.

The version number itself has no date in a release (see in freemarker.jar:freemarker/version.properties), but as per above, the build still won't be reproducible until we stop filling the buildTimestamp.

Ok thanks for the tip. The buildTimestamp=@timestampNice@ was not the only problem, but basically all timestamps in versions.properties.

I managed to get a reproducible build locally. But I had to manually adjust versions.properties and completely remove all the qualifiers and @-variables from versions.properties. So basically I hardcoded everything in versions.properties to 2.3.33.

I guess this is not how its done, but at least it proves that if the output of versions.properties does not include timestamps, then the build is truly reproducible. I verified it by building two times and then compare the resulting jars with https://try.diffoscope.org/

What do I need to do, to build so the version is 2.3.33 without qualifiers?

I think it basically comes down to "How to to create a versions.properties without any timestamps or at least a fixed timestamp like gradle does (1980-02-01T00:00:00Z)

foojay-resolver-convention: Ah, it only downloads if the given JDK is not already found. But then, my question remains. If the build has to be reproducible, then we had to be more specific than just "JDK 16", right?

You mean more specific to "what exact Java version (incl. build) to use and to do to get the exact same build result"?
I think the people at reproducible build have tried answering that in this .buildspec file.

I need to get a bit more familiar with it. I just found out about it.

gradlew: I meant, it's not executable in the source distribution tar. tar -tvf apache-freemarker-gae-src-2.3.33-SNAPSHOT.tgz gives -rw-r--r-- 0/0 8941 1970-01-02 01:00 gradlew. But I just realized that that's already broken in the 2.3-gae branch. But I guess if I fix that these, the filePermissions { unix("rw-r--r--") } you added will collide with that.

How do you create this apache-freemarker-gae-src-2.3.33-SNAPSHOT.tgz?
I like to try it out.

@ddekany
Copy link
Contributor

ddekany commented Dec 24, 2023

What do I need to do, to build so the version is 2.3.33 without qualifiers?

version.properties itself is the source of the version number. Before a release, we just modify it to have version=2.3.33 (and the other similiar version properties there), and commit it. But see the comments in version.properties. So, we have to remove buildTimestamp=..., but everything else can stay.

You mean more specific to "what exact Java version (incl. build) to use and to do to get the exact same build result"?

Yes. Like is it Oracle, is it OpenJDK, etc., and then which build exactly. I assume we have to specify that, and from there, the JDK will be pretty much always downloaded at first build (because the user doesn't have exactly the same version).

How do you create this apache-freemarker-gae-src-2.3.33-SNAPSHOT.tgz? I like to try it out.

gradlew build, and then see in build/distributions.

* I did that because there are testcases which failed when removing the buildTimestamp completely. Also there is code reading it. To stay backwards compatible we return a fixed date.
* if this should be removed completely then more places needed to be adjusted, which I tried to avoid here to keep it simple
* this does not touch the other artifacts like source distribution and thus avoids problems with files like gradlew not being executable
* because otherwise they are replaced during the build with a timestamp
* now attributes are applied to all jars  e.g. freemarker-gae-2.3.33-sources.jar freemarker-gae-2.3.33-javadoc.jar freemarker-gae-2.3.33.jar
@chrisrueger
Copy link
Contributor Author

What do I need to do, to build so the version is 2.3.33 without qualifiers?

version.properties itself is the source of the version number. Before a release, we just modify it to have version=2.3.33 (and the other similiar version properties there), and commit it. But see the comments in version.properties. So, we have to remove buildTimestamp=..., but everything else can stay.

Ok thanks. I discovered one problem: The tokens like nightly_@timestampInVersion@ in comments of version.properties were replaced too (I think in FreemarkerRootPlugin ). I think that this can easily be forgotten during a release. Thus I changed the comment in 9677cda slightly to nightly_[timestampInVersion] (square brackets [] instead of @@).

That way during a release you only have to adjust the real properties version, mavenVersion, versionForOSGi and do not care about the comments. If you know a better solution to avoid the replacement in comments, I'm glad to know.

Few more words to my latest 3 commits:

c9aa766 : I decided to NOT remove the buildTimestamp for now, because removal made some testcases fail and the buildtimestamp is also referenced in code (Version.java). I wanted to avoid too many changes for this PR here for now, before having a discussion about it.
So for now buildtimestamp is a fixed value now (buildTimestamp=1980-02-01T00:00:00Z). So it is rather meaningless, but at least backwards-compatible for the current code.

17ee92c and 5bcc56f: here I think I fixed the problem related to executability of gradlew which you pointed out.

gradlew: I meant, it's not executable in the source distribution tar.
tar -tvf apache-freemarker-gae-src-2.3.33-SNAPSHOT.tgz gives -rw-r--r-- 0/0 8941 1970-01-02 01:00 gradlew.
But I just realized that that's already broken in the 2.3-gae branch. But I guess if I fix that these, the filePermissions { unix("rw-r--r--") } you added will collide with that.

I just applied the permissions and reproducible attributes (file timestamps, ordering) to the contents of the .jar files e.g. in build/libs like freemarker-gae-2.3.33-sources.jar , freemarker-gae-2.3.33-javadoc.jar, freemarker-gae-2.3.33.jar.

IMO these three files are what's need to be reproducible.

You mean more specific to "what exact Java version (incl. build) to use and to do to get the exact same build result"?

Yes. Like is it Oracle, is it OpenJDK, etc., and then which build exactly. I assume we have to specify that, and from there, the JDK will be pretty much always downloaded at first build (because the user doesn't have exactly the same version).

Hmm regarding exact vendor and version. I did a little more investigation.
In short: I have not really found out yet how to specify this in such detail.
It looks like foo-jay works based on gradle toolchain definitions: see https://github.com/gradle/foojay-toolchains?tab=readme-ov-file#matching-toolchain-specifications

I found that you could specify the vendor.
I also checked a few .buildspec of the reproducible central guys files e.g. https://github.com/jvm-repo-rebuild/reproducible-central/blob/master/content/io/opentelemetry/instrumentation/opentelemetry-1.14.0-alpha.buildspec which mostly define jdk=11, jdk=17 and the like. Less often I have found more specific definition like jdk=17.0.3. but no vendor like Oracle, Temurin.

The guide e.g. https://maven.apache.org/guides/mini/guide-reproducible-builds.html mentions:

Generally depend on the major version of the JDK used to compile. (Even with source/target defined, each major JDK version changes the generated bytecode)

And I have the feeling that this is the way it is treated.

The .buildinfo file (see here) seems to contain more details, but it seems to be just for documentation how a build has been created... (used to lookup by humans not by tools).

After 3 years of work on Reproducible Builds, it has been found more useful as an internal file format: Reproducible Central and its .buildspec format is more what we need to check that Reproducible Builds results has been achieved. .buildinfo just records a build, be it reproducible or not.

So I don't really know how to proceed. My gut feeling is that we could keep it like it is now.
We could just explain e.g. in the README which JDK you need for a reproducible build (and make a recommendation) and then it is up to the developer bringing it's machine to this state.

We could provide

plugins {
    id("org.gradle.toolchains.foojay-resolver-convention") version "0.7.0"
}

to make it easier to build right away, but document in the README that for a reproducible build a specific JDK is required.

How do you create this apache-freemarker-gae-src-2.3.33-SNAPSHOT.tgz? I like to try it out.

gradlew build, and then see in build/distributions.

Oh, so simple.. Thanks :)

One thing I could not test:
When I adjusted version.properties to 2.3.33 to simulate a release, gradlew clean build and gradlew clean check failed with

Execution failed for task ':verifyReleaseSetup'.
> Package signing is disabled!

Only gradlew clean jar worked. So I just verified the freemarker.jar in build/libs/freemarker-gae-2.3.33.jar.

Maybe you could try it since I guess you have this package signing enabled

@ddekany
Copy link
Contributor

ddekany commented Dec 26, 2023

Ok thanks. I discovered one problem: The tokens like nightly_@timestampInVersion@ in comments of version.properties were replaced too (I think in FreemarkerRootPlugin ). I think that this can easily be forgotten during a release. Thus I changed the comment in 9677cda slightly to nightly_[timestampInVersion] (square brackets [] instead of @@).

Now that that the timestamp causes complications, we should just change the convention to version=2.3.33-nightly. Pretty much nobody cares about the build timestamp of a snapshot version after all. (Of course it would be also possible to set the filter so that it doesn't replace in comment lines, but that would just adds even more complication.)

c9aa766 : I decided to NOT remove the buildTimestamp for now, because removal made some testcases fail and the buildtimestamp is also referenced in code (Version.java). I wanted to avoid too many changes for this PR here for now, before having a discussion about it.

I say, we just keep it in the Version API, but mark it as deprecated, and always return null. (It was documented that it can return null, and it's quite unlikely that anyone even calls that method, and also fails on null.) That's the price of reproducible builds. The JUnit tests has to be adjusted for this. And all these change belongs to this PR of course.

After 3 years of work on Reproducible Builds, it has been found more useful as an internal file format: Reproducible Central and its .buildspec format is more what we need to check that Reproducible Builds results has been achieved. .buildinfo just records a build, be it reproducible or not.

So I don't really know how to proceed. My gut feeling is that we could keep it like it is now. We could just explain e.g. in the README which JDK you need for a reproducible build (and make a recommendation) and then it is up to the developer bringing it's machine to this state.

Ironically, we did have the exact JDK version in the META-INF of freemarker.jar, but now that was removed because of reproducible builds, which makes it harder to actually reproduce the build for someone else. Anyway... We could put this information automatically into the source artifact (that one we publish to Maven Central, and also into the source release), and the we can tell people to look for the information there, if they need to reproduce the build, and it didn't match at first attempt.

gradlew build, and then see in build/distributions.

Oh, so simple.. Thanks :)

One thing I could not test: When I adjusted version.properties to 2.3.33 to simulate a release, gradlew clean build and gradlew clean check failed with

Execution failed for task ':verifyReleaseSetup'.
> Package signing is disabled!

Yeah, releases must be signed. (If you have gpg installed anyway, that's the easiest.) Please check the included gradle.properties to set that up. (Also I add pointers for building distribution artifacts into the README.)

@chrisrueger
Copy link
Contributor Author

chrisrueger commented Dec 26, 2023

Now that that the timestamp causes complications, we should just change the convention to version=2.3.33-nightly.

Ok, I will change that accordingly.

I say, we just keep it in the Version API, but mark it as deprecated, and always return null. (It was documented that it can return null, and it's quite unlikely that anyone even calls that method, and also fails on null.) That's the price of reproducible builds. The JUnit tests has to be adjusted for this. And all these change belongs to this PR of course.

Ok, I will work on that.

Ironically, we did have the exact JDK version in the META-INF of freemarker.jar, but now that was removed because of reproducible builds, which makes it harder to actually reproduce the build for someone else.

I just tested to make that possible again:
In osgi.bnd I just removed the attribute -noextraheaders: true which I added (or set it to false)

Then the following attributes are added back to MANIFEST.MF

image

I think that way we have it close to as it was before, but without timestamps.

We could still think about an additional buildinfo file in the future.

Version.getBuildDate() now always returns null because we do not store the buildTimestamp anymore for reproducible builds
@ddekany
Copy link
Contributor

ddekany commented Dec 26, 2023

But because we don't download a specific JDK (although even if we did, one wonders if that will be possible few years later), I think it's better if the information is not there. Because, in your case, if someone rebuilds this, it very likely will not be Eclipse Adoptium, while here's a realistic chance that the 16 they use generates the same bytecode. That's why I though that it's just 16 then, and we put the information about the exacy JDK variant elsewhere, like into the source artifact and source distro tgz.

@chrisrueger
Copy link
Contributor Author

But because we don't download a specific JDK (although even if we did, one wonders if that will be possible few years later), I think it's better if the information is not there. Because, in your case, if someone rebuilds this, it very likely will not be Eclipse Adoptium, while here's a realistic chance that the 16 they use generates the same bytecode. That's why I though that it's just 16 then, and we put the information about the exacy JDK variant elsewhere, like into the source artifact and source distro tgz.

I see. Just reverted this. So no JDK in the MANIFEST.MF.

I will check how to put

the information about the exacy JDK variant elsewhere, like into the source artifact and source distro tgz

@ddekany
Copy link
Contributor

ddekany commented Dec 26, 2023

FYI, now you can issue ./gradlew build -Pfreemarker.allowUnsignedReleaseBuild=true (in 2.3-gae) to avoid signing.

* now the source distribution tarball (e.g. apache-freemarker-gae-src-2.3.33.tgz) contains a .buildinfo file containing information about how the source is to be built to get a reproducible build
* based on https://reproducible-builds.org/docs/jvm/
* we can add more information if needed

Example output:

#Effective recorded build environment information
#Tue Dec 26 12:08:57 CET 2023
source.scm.uri=scm\:git\:https\://git-wip-us.apache.org/repos/asf/freemarker.git
build-tool=gradle
build.setup=https\://github.com/apache/freemarker/blob/2.3-gae/README.md\#building-freemarker
java.vendor=Eclipse Adoptium
java.version=17.0.9
os.name=Mac OS X
buildinfo.version=1.0-SNAPSHOT
source.scm.tag=v2.3.33
@chrisrueger
Copy link
Contributor Author

chrisrueger commented Dec 26, 2023

FYI, now you can issue ./gradlew build -Pfreemarker.allowUnsignedReleaseBuild=true (in 2.3-gae) to avoid signing.

Thanks.

Commit 3791ef5 now creates a .buildinfo file in source distribution.

E.g. my local build created this:

#Effective recorded build environment information
#Tue Dec 26 12:16:27 CET 2023
source.scm.uri=scm\:git\:https\://git-wip-us.apache.org/repos/asf/freemarker.git
build-tool=gradle
build.setup=https\://github.com/apache/freemarker/blob/2.3-gae/README.md\#building-freemarker
java.vendor=Eclipse Adoptium
java.version=17.0.9
os.name=Mac OS X
buildinfo.version=1.0-SNAPSHOT
source.scm.tag=v2.3.33

Where would you mention and explain this file?

private final String originalStringValue;

private final Boolean gaeCompliant;
private final Date buildDate;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Admittedly this has near 0 practical importance, but... as Version has public constructor, and we allow passing in the build date, it would be more round if getBuildDate() still gives back the same value. Even if it will be almost always null.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok than I misinterpreted your previous comment.

I say, we just keep it in the Version API, but mark it as deprecated, and always return null. (It was documented that it can return null, and it's quite unlikely that anyone even calls that method, and also fails on null.)

Could you please clarify?

Do you mean we keep Version.java as it was before, but just remove the part in Configuration.java

9a138b6#diff-114f8ea56965362193c291c9a631f7c7860c71c8ec76a94b16f5f801267bd578

Copy link
Contributor

@ddekany ddekany Dec 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Version need not change, we just won't have the build date in version.properties.

@ddekany
Copy link
Contributor

ddekany commented Dec 26, 2023

I guess .buildinfo should also be part of the source artifact on Maven.

@chrisrueger
Copy link
Contributor Author

I guess .buildinfo should also be part of the source artifact on Maven.

I could do that.
I just noticed the following sentence on the bottom of https://reproducible-builds.org/docs/jvm/

Notice that ${artifactId}-${version}-sources.jar files published in Maven repositories are not buildable sources, but sources for IDEs.

@ddekany
Copy link
Contributor

ddekany commented Dec 26, 2023

in Maven repositories are not buildable sources, but sources for IDEs.

Yes, but I guess, if someone ever wants to check a build, they check against a jar on Maven Central, and that's published together with source artifact. The src distro is not necessarily that tightly coupled. But it's a minor thing really.

@chrisrueger
Copy link
Contributor Author

chrisrueger commented Dec 26, 2023

in Maven repositories are not buildable sources, but sources for IDEs.

Yes, but I guess, if someone ever wants to check a build, they check against a jar on Maven Central, and that's published together with source artifact. The src distro is not necessarily that tightly coupled. But it's a minor thing really.

Ok. Should I put it in the root of the jar or META-INF ? Currently there is nothing in the root.

EDIT: in root it would be

image

* then this would also be in the sources.jar in maven central
@ddekany
Copy link
Contributor

ddekany commented Dec 26, 2023

The root is fine.

* as discussed we keep the buildDate in Version.java but deprecate the getBuildDate() method.
* but we keep the new behavior that Configuration.java does not pass the buildDate anymore to the constructor since it we removed it from version.properties
@chrisrueger
Copy link
Contributor Author

The root is fine.

Ok I just commited this.

I also reverted the Version.java (see comment)

Let me know if you had something else in mind.

@chrisrueger
Copy link
Contributor Author

Ok, I am done for today. Let me know if you see more issues.
Otherwise this is ready to merge or cherry pick.

@ddekany
Copy link
Contributor

ddekany commented Dec 26, 2023

I will merge this (then add stuff to README and version history). Can you send me the exact e-mail address you were using for the Apache ICLA (assuming you have an ICLA)?

@chrisrueger
Copy link
Contributor Author

Apache ICLA

Sent you an email

@ddekany ddekany merged commit af7054a into apache:2.3-gae Dec 27, 2023
@ddekany
Copy link
Contributor

ddekany commented Dec 27, 2023

Did some cleanup/docs after merge: 663925c

@chrisrueger
Copy link
Contributor Author

Great thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants