Ensure reproducible builds for daffodil release candidate container #1254
Update the daffodil release candidate container README to use a git URL to the main daffodil branch to download the container source. This ensures release candidate builds use the latest source and do not include any local changes that could affect reproducibility. Also add documentation on how to build, test, and remove local changes, since using a git URL makes that a bit more difficult.
DAFFODIL-2907
To build or update the Daffodil release candidate container image:

-podman build -t daffodil-release-candidate /path/to/daffodil.git/containers/release-candidate/
+podman build -t daffodil-release-candidate https://github.com/apache/daffodil.git#main:containers/release-candidate
This takes whatever the head of main is, not a specific tag. Shouldn't it use a specific tag?
Or rather, the issue is that the current state of the main branch can also be "not quite the right thing", just as a local change in the sandbox can.
So does this really improve things?
Correct, it just uses whatever main is at the time of the build. There are two things I think this improves:
- It makes it less likely to include local changes. Say you have the latest daffodil repo but make local changes (e.g., when I recently tested updating the container for Fedora 40). If I were doing a release, I could forget to undo those changes and rebuild. Using a git URL at least ensures there are no local changes.
- It makes VS Code/SBT releases easier. The Daffodil VS Code and SBT extensions are in separate repos, so developers of those might not keep their Daffodil repo up to date, or might not have a local clone of the Daffodil repo at all. They would need to either make a fresh clone or remember to update their clone before building the container. Using a git URL avoids all of this: they just run `podman build` and it fetches the latest source and builds it. To be clear, I'm not suggesting anyone has done this; it just eliminates one potential source of errors.
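For the local-directory workflow, the "forgot to undo local changes" risk from the first point could be caught with a pre-build guard. This is a minimal sketch, not something in the Daffodil repo: the function name `require_clean_checkout` is hypothetical, and it relies on `git status --porcelain`, which prints nothing when there are no modified, staged, or untracked files. It does not check whether the checkout is at the latest main; that would additionally need a `git fetch` and a comparison against `origin/main`.

```shell
#!/bin/sh
# Hypothetical helper: succeed (return 0) only if the git checkout at $1 has
# no modified, staged, or untracked files. Returns 1 if there are local
# changes, 2 if git itself fails (e.g., $1 is not a git repository).
require_clean_checkout() {
  repo="$1"
  # --porcelain gives stable, script-friendly output; empty output means clean.
  changes="$(git -C "$repo" status --porcelain)" || return 2
  if [ -n "$changes" ]; then
    echo "local changes in $repo:" >&2
    echo "$changes" >&2
    return 1
  fi
  return 0
}

# Example usage (path is a placeholder):
#   require_clean_checkout /path/to/daffodil.git &&
#     podman build -t daffodil-release-candidate \
#       /path/to/daffodil.git/containers/release-candidate/
```

Chaining with `&&` means the image is only built when the guard succeeds.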
You're right that this does nothing to tell us which version of the container was actually used for a specific build. But we already have that problem today. This didn't try to solve that; it just tries to make things a little easier and avoid local changes.
Eventually I would like this all to go away. For example, some ASF projects have infrastructure set up so that you just push a tag and that triggers everything on some build infrastructure. Ideally, pushing a tag would cause GitHub Actions to build and run the container from that tagged commit. No need to have podman or build a container or anything; it would all happen remotely. Then we would know the tagged commit was used for building the container and building daffodil. That's a bit more effort though, especially since things like signing keys and SVN passwords are needed for GitHub Actions to do everything it needs to do. I think it's possible in theory, but it's a decent amount of work.
What are some examples of ASF projects that have this "just push a tag" mechanism working?
I don't know of a good way to find all of them, but searching the INFRA Jira for "GPG_SIGNING_KEY" (I believe this is the secret that Infra adds to GitHub Actions), it looks like it's mostly projects that are part of Apache Logging.
Infra also has documentation about Automated Release Signing and what they require before they will allow it.
The biggest issue is reproducibility: our builds are close to reproducible, close enough that it's pretty easy to verify the differences are expected, but they aren't 100% bit-for-bit identical.
Note that if we only built and released source artifacts, we could get it to work without a problem (those are already reproducible). But the binaries are much harder, and personally I think they should be built as part of the release process, even though ASF doesn't require it. I've opened a PR with sbt-native-packager to fix our zip binaries: sbt/sbt-native-packager#160. I don't think we can ever make the Windows installer binaries 100% reproducible, so we'll have to find out whether ASF can make an exception if we ever want to go this route.
ASF Infra is also working on artifacts.apache.org to ease the whole release process. I think it's still TBD how much that will affect our release process (e.g., whether it will include signing), but that's something to consider too, though I think it's a ways off.
This change to using the URL doesn't work for me:

But if I use the original command line, it works:
Might your version of podman be too old? I have 4.9.4 and it works for me. Maybe git URL support is a relatively new feature?
I found the commit that added branch and subdirectory support for Git URLs: containers/buildah@5d9d28b. It was merged in v1.27.0 in 2022, so you need a fairly new version. Note that the command you used works fine; just make sure there are no local changes and that you have the latest main branch. We may want to revert this change if v1.27.0 isn't widely distributed, or if this turns out to be some other bug.
podman --version reports version 3.4.4, which is what Ubuntu 22.04 (very recent) installs with the regular apt install of podman. Per
Yeah, we might have to stick with not using a GitHub URL. What version of
Yep, it looks like Ubuntu is a few versions behind the buildah release that added support for git URLs with directories (v1.27), so maybe we should undo this commit. In theory you could replace "podman" with "docker" in all the commands and everything should just work. I think docker has supported git URLs with directories for much longer than podman, so it's probably supported in whatever version of docker you have. However, I haven't actually tested docker, so there may be subtle differences. It's probably safest to just use the directory instead of a git URL.
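If keeping both options were ever preferred over reverting, a build script could pick the context based on the installed buildah version. This is a minimal sketch under stated assumptions: the 1.27.0 threshold comes from the buildah commit mentioned above, the function names are hypothetical, the local path is a placeholder, and it assumes `sort -V` (GNU coreutils version sort) is available. Note that podman's own version number (e.g., 3.4.4) is not the buildah version it bundles.

```shell
#!/bin/sh
# Minimum buildah version with branch + subdirectory support for git URL
# build contexts (containers/buildah@5d9d28b, released in v1.27.0).
MIN_BUILDAH_VERSION="1.27.0"

# version_at_least HAVE WANT: succeed if HAVE >= WANT, using version sort.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# choose_build_context BUILDAH_VERSION LOCAL_DIR: print the git URL context
# if this buildah version supports it, otherwise print LOCAL_DIR.
choose_build_context() {
  if version_at_least "$1" "$MIN_BUILDAH_VERSION"; then
    echo "https://github.com/apache/daffodil.git#main:containers/release-candidate"
  else
    echo "$2"
  fi
}

# Example usage (assumption: podman reports its bundled buildah version via
# this Go template field; verify with `podman info` on your system):
#   v="$(podman info --format '{{.Host.BuildahVersion}}')"
#   ctx="$(choose_build_context "$v" /path/to/daffodil.git/containers/release-candidate/)"
#   podman build -t daffodil-release-candidate "$ctx"
```

Falling back to the local directory keeps older distributions working, at the cost of reintroducing the local-changes risk the git URL was meant to avoid.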