Dockerfile for release automation #13250
Conversation
|
Having a dockerfile that can do identical tarballs to the one we ship is a sensible idea as it makes it easier for anyone who feels like it to reproduce them. I don't think there is any particular need to do releases "in the cloud" as it does not actually add value or protection to the process. As long as the tarballs can be reproduced, they can be verified to be built from a known git repository state and then it does not matter where they were generated. |
Got it, that makes sense! I can remove the GitHub Action integration if you don't want it to create and upload those tarballs. The only problem I can see is that if the Dockerfile is not part of your workflow or of some sort of automation, it's very easy to forget to keep it up to date. Do you have ideas how we could make sure we don't forget about it?

Then I did the following experiment: I went to the release page, downloaded the latest release tarball, and re-created it locally with:

docker build -t curl/curl .
docker run --rm -v (pwd):/usr/src -w /usr/src curl/curl autoreconf -fi
docker run --rm -v (pwd):/usr/src -w /usr/src curl/curl ./configure --without-ssl --without-libpsl
docker run --rm -v (pwd):/usr/src -w /usr/src curl/curl make -j8
docker run --rm -v (pwd):/usr/src -w /usr/src curl/curl ./maketgz 8.7.1

Here is an overview of what a recursive diff is showing me after following these simple steps:

$ diff -qr local remote
Files local/curl-8.7.1/CHANGES and remote/curl-8.7.1/CHANGES differ
Files local/curl-8.7.1/docs/curl-config.1 and remote/curl-8.7.1/docs/curl-config.1 differ
Files local/curl-8.7.1/include/curl/curlver.h and remote/curl-8.7.1/include/curl/curlver.h differ
Files local/curl-8.7.1/include/curl/Makefile.in and remote/curl-8.7.1/include/curl/Makefile.in differ
Files local/curl-8.7.1/ltmain.sh and remote/curl-8.7.1/ltmain.sh differ
Files local/curl-8.7.1/packages/Makefile.in and remote/curl-8.7.1/packages/Makefile.in differ
Files local/curl-8.7.1/packages/vms/Makefile.in and remote/curl-8.7.1/packages/vms/Makefile.in differ
Files local/curl-8.7.1/projects/Windows/VC14/lib/libcurl.vcxproj and remote/curl-8.7.1/projects/Windows/VC14/lib/libcurl.vcxproj differ
Files local/curl-8.7.1/projects/Windows/VC14/src/curl.vcxproj and remote/curl-8.7.1/projects/Windows/VC14/src/curl.vcxproj differ
Files local/curl-8.7.1/projects/Windows/VC14.10/lib/libcurl.vcxproj and remote/curl-8.7.1/projects/Windows/VC14.10/lib/libcurl.vcxproj differ
Files local/curl-8.7.1/projects/Windows/VC14.10/src/curl.vcxproj and remote/curl-8.7.1/projects/Windows/VC14.10/src/curl.vcxproj differ
Files local/curl-8.7.1/projects/Windows/VC14.20/lib/libcurl.vcxproj and remote/curl-8.7.1/projects/Windows/VC14.20/lib/libcurl.vcxproj differ
Files local/curl-8.7.1/projects/Windows/VC14.20/src/curl.vcxproj and remote/curl-8.7.1/projects/Windows/VC14.20/src/curl.vcxproj differ
Files local/curl-8.7.1/projects/Windows/VC14.30/lib/libcurl.vcxproj and remote/curl-8.7.1/projects/Windows/VC14.30/lib/libcurl.vcxproj differ
Files local/curl-8.7.1/projects/Windows/VC14.30/src/curl.vcxproj and remote/curl-8.7.1/projects/Windows/VC14.30/src/curl.vcxproj differ
Files local/curl-8.7.1/scripts/Makefile.in and remote/curl-8.7.1/scripts/Makefile.in differ
Files local/curl-8.7.1/src/tool_hugehelp.c and remote/curl-8.7.1/src/tool_hugehelp.c differ
Files local/curl-8.7.1/tests/data/Makefile.in and remote/curl-8.7.1/tests/data/Makefile.in differ
Files local/curl-8.7.1/tests/http/Makefile.in and remote/curl-8.7.1/tests/http/Makefile.in differ
Files local/curl-8.7.1/tests/Makefile.in and remote/curl-8.7.1/tests/Makefile.in differ

What I can see in there
and this is just from looking through the first five or so differences. I don't think I can get to the bottom of all of those differences, and I for sure cannot judge why they're different. I believe some could be from slightly different tools or from how we invoke them. I would appreciate your help here in getting to the bottom of this. |
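For anyone wanting to script this comparison, here is a minimal sketch using only the Python standard library, roughly equivalent to `diff -qr local remote`. The directory names are placeholders for the two extracted tarballs:

```python
# Hedged sketch: recursively list files that differ between two
# unpacked tarball trees, similar to `diff -qr local remote`.
import filecmp


def diff_trees(left, right):
    """Return sorted relative paths that differ or exist on one side only."""
    mismatches = []
    stack = [("", filecmp.dircmp(left, right))]
    while stack:
        prefix, node = stack.pop()
        # dircmp compares in shallow mode: identical stat signatures count
        # as equal without reading bytes; differing sizes force a content read.
        mismatches.extend(f"{prefix}{name}" for name in node.diff_files)
        mismatches.extend(f"{prefix}{name}" for name in node.left_only + node.right_only)
        for name, sub in node.subdirs.items():
            stack.append((f"{prefix}{name}/", sub))
    return sorted(mismatches)
```

For byte-exact verification one could instead call `filecmp.cmp(a, b, shallow=False)` per file, at the cost of reading every file.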
|
|
The autotools-related differences I assume are because your image does not use the exact same set of tools that I used when I made the release. |
Possibly also Speaking of 8.7.1, also |
|
Thanks folks! I made some progress here and found two date-related issues
Looking at the GitHub release for 8.7.1, it was published on 2024-03-27 (~10am-ish): https://github.com/curl/curl/releases/tag/curl-8_7_1 In the release tarballs I can find a datestamp from 2024-03-26, while other datestamps seem to be correctly set to 2024-03-27. My hunch is that
This is in no way critical; I just wanted to leave it here since it makes automatically comparing the diffs tricky. Then I looked into the differences in include/curl/Makefile.in:

djh@rf /t/curlpkgs> diff {local,remote}/curl-8.7.1/include/curl/Makefile.in
462c462
< echo ' cd $(top_srcdir) && $(AUTOMAKE) --gnu include/curl/Makefile'; \
---
> echo ' cd $(top_srcdir) && $(AUTOMAKE) --foreign include/curl/Makefile'; \
464c464
< $(AUTOMAKE) --gnu include/curl/Makefile
---
> $(AUTOMAKE) --foreign include/curl/Makefile

I checked the Debian bookworm docker image; here are the versions installed
Comparing to the mail https://curl.se/mail/lib-2024-03/0062.html
The versions seem to match except for the perl and git binaries, but those should not make a difference here. What I can see is diffs like

< version: $progname $scriptversion Debian-2.4.7-5
---
> version: $progname $scriptversion Debian-2.4.7-7

so I believe even the slight differences in
I don't know why the Microsoft files are changing and I don't have the experience to look into this. Summary: there are still some slight differences between the release tarballs and what I can re-create now with this docker build environment; some we can explain (see above) and some are still unexplained (Microsoft builds, mismatch in datestamp). The autotools-generated code seems to differ even though the tools have the same major/minor/patch versions, making it tricky to do any automated comparisons. |
Can you tell which files these are and how they are changing? Is it their line endings perhaps? |
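One quick way to test the line-ending hypothesis is a small check that normalizes CRLF/CR to LF before comparing. A minimal sketch (the paths are placeholders):

```python
# Sketch: report whether two files differ ONLY in line endings (CRLF vs LF).
def differs_only_in_line_endings(path_a, path_b):
    with open(path_a, "rb") as f:
        a = f.read()
    with open(path_b, "rb") as f:
        b = f.read()
    if a == b:
        return False  # byte-identical: no difference at all
    # Normalize CRLF (and stray lone CR) to LF, then compare again.
    norm = lambda data: data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return norm(a) == norm(b)
```

If this returns False for the vcxproj files, the difference is more than line endings (e.g. ordering of entries).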
|
The Microsoft-related files I can see changing in my diff output. Here is
I don't know how these files are getting generated, whether the entries there can be in a different order, or whether the build command differs. |
|
It seems indeed that There is a generator logic in |
|
This script sorts the lists. I have not figured out why it would create output not sorted... |
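One possible explanation, offered here only as a generic illustration: "sorted" depends on the collation rule, so a list sorted under one rule (say, case-insensitively) can look unsorted under another (byte order). The filenames below are made up, not taken from the curl project files:

```python
# Illustrative sketch: the same list, "sorted" under two different rules.
names = ["curl.c", "Makefile.am", "tool_cb_hdr.c", "CMakeLists.txt"]

bytewise = sorted(names)                 # ASCII/byte order: uppercase sorts first
caseless = sorted(names, key=str.lower)  # case-insensitive order

print(bytewise)
print(caseless)
```

If the generator and the comparison use different rules, the output will look unsorted to one of them even though both are internally consistent.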
This file is included in the tarball by mistake. The date in this file is the date when it was generated, the day before the release. I will make a PR to remove it from the dist; it should get generated in the build. |
The markdown file is already there and the .1 file gets generated in the build. Ref: #13250
|
@daniel-j-h maybe you can submit the maketgz fix separately, to allow us to gradually and step by step fix the nits you have identified |
|
Got it! Can do: #13280 |
|
Ah, the image needs SOURCE_DATE_EPOCH. |
The SOURCE_DATE_EPOCH env var is needed to date-stamp releases properly with the release date, when re-creating official releases. Ref: curl#13250
Good catch!! I just added it to the docker environment and re-built the release tarballs. It looks like there are still unsorted lists in there, though. For example, compare the files in this list
These differences still exist even after re-building everything with On a positive note: if we figure out the Microsoft file differences, then the only remaining differences are due to the slight mismatch in tools outlined in #13250 (comment). I'm wondering, are you on Debian bookworm and using the packages from apt? Or did you install specific packages yourself from source? I can see three ways forward: either we track your specific setup in this Dockerfile, or we standardize on tools from a distribution such as Debian bookworm, or maybe you are willing to do releases in the self-contained docker environment. |
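For reference, a minimal sketch of how SOURCE_DATE_EPOCH works (GNU date assumed; the epoch value below corresponds to 2024-03-27 00:00 UTC and is used only as an example; in a git checkout one could derive it with `git log -1 --format=%ct`):

```shell
# Pin SOURCE_DATE_EPOCH so every tool that honors it stamps the same date.
SOURCE_DATE_EPOCH=1711497600   # example value: 2024-03-27 00:00:00 UTC
export SOURCE_DATE_EPOCH

# Any date-stamping step can then format this fixed timestamp:
date -u -d "@$SOURCE_DATE_EPOCH" +%Y-%m-%d   # prints 2024-03-27
```

Passing it as a build arg (as in the script later in this thread) makes the same value visible inside the container.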
We should figure that out and make them stable.
I'm on Debian unstable/sid and I use packages from apt, but...
I am prepared to use this dockerfile when releasing tarballs going forward to ease the process for people who want to reproduce them. I can probably also make the daily snapshot builds use the same thing. |
This target generates the MSVC project files. This change removes the extra sorting and instead makes the script use the order of the files as listed in the variables - which are mostly sorted anyway. This is an attempt to make the project file generation more easily reproducible. Ref: #13250
|
#13294 is an attempt to simplify the project file generation, in the hope that the sorting was a reason for the diff. The sorting was unnecessary in any case. |
This target generates the MSVC project files. This change removes the extra sorting and instead makes the script use the order of the files as listed in the variables - which are mostly sorted anyway. This is an attempt to make the project file generation more easily reproducible. Ref: #13250 Closes #13294
|
Sorry for the slow responses; I've been out sick these days and hope to get better by the weekend. Some quick responses below
There are two sides to this
https://docs.docker.com/reference/dockerfile/#user The second step is optional and is only needed because some programs expect an actual user and group (and e.g. a home directory) to exist when they see the uid and gid that you pass in. And if you want to map your host user and group to the container's user and group, it's best to add |
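The two-step pattern described above could look roughly like this. This is a hedged sketch only: the base image, the "dev" user name, and the default ids are illustrative, not what the PR actually uses:

```dockerfile
# Sketch of host-to-container user mapping (names and defaults are made up).
FROM debian:bookworm-slim

# Step 1: accept the host's ids at build time so bind-mounted files
# keep sane ownership inside and outside the container.
ARG UID=1000
ARG GID=1000

# Step 2 (optional): create a real group and user behind those ids,
# for tools that expect a passwd entry and a home directory.
RUN groupadd -g "${GID}" dev \
 && useradd -m -u "${UID}" -g "${GID}" dev

USER dev
WORKDIR /home/dev
```

At build time one would pass `--build-arg UID=$(id -u) --build-arg GID=$(id -g)` to match the host user.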
|
This assumes that the docker image used is not compromised itself, either via the distro or via the docker image maintainers. If possible, it would be nice to be able to generate the tarball using different distros dockerized by different groups of people, e.g. Debian and Alpine, or perhaps Fedora, and compare them. Not sure if a tarball can be made reproducible like that though. |
autoconf, automake and libtool are most likely the tools whose versions are most relevant; the others most probably do not matter at all. I also don't want a "trust competition". Reproducible builds are primarily interesting and important in the short term: to allow users to easily verify the last few releases (say, within the last year or so). An obvious advantage of using Debian as a base for this is familiarity: trusting the brand, its processes, and that they also patch packages for functionality. Going with a separate environment means leaning on someone else's process, possibly a less familiar one. |
|
and also: we have removed the need for awk in maketgz (since #13311) so |
A breaking change resulting from building from the latest Debian packages could come at any time. Maybe in 5 years, maybe in a week. Also, given that Debian does not sign their container images or produce reproducible digests, we have no easy way to know if the Debian in the container is actually the Debian we expect it to be.

My arguments here would be far stronger if we were talking about generating deterministic binaries. That said, binaries (and the archives that generate them) can stick around and see some level of use for decades. There are (sadly) many mission-critical embedded devices in use today with very old versions of curl on them, and we can expect versions installed today will likewise be used many years into the future. When it comes to security auditing and forensics, it becomes relevant to recreate the exact supply chains of binaries created years ago that resulted in recent harm, to rule out or identify supply chain attacks or obscure vulnerabilities that only show up depending on the tools used to build them.

My efforts here and elsewhere are about trying to get widely used projects to trend towards releasing with methods that are predictably deterministic and long-term auditable. That of course does not have to be with Stagex, though with my bias hat firmly in place, I can say it does make it much simpler to audit and maintain, IMO.

If using Debian is a hard requirement, I would suggest using scripts like https://github.com/reproducible-containers/repro-sources-list.sh or the setup I wrote in https://git.distrust.co/public/toolchain to produce a set of locked hashes of a given snapshot of Debian dependencies, then download and install these exact snapshots as I describe above. This path is complex, slow, difficult to maintain, very disk-heavy, and definitely a hack to force Debian to do something it was not meant to do, but it does work and is my go-to when projects mandate Debian out of familiarity or other reasons.

We did have this Debian-locking approach to determinism audited by Cure53, as we used it in AirgapOS, though we are now moving that project to Stagex to reduce maintenance burden: https://git.distrust.co/public/airgap/src/branch/main/audits/cure53-2020.pdf |
Those who ship binaries might appreciate reproducible source tarballs, but they need to take care of the binary producing part themselves.
Sure, that's ideal. But I realize we may need to set priorities and make decisions. Making it dead easy to verify source packages twenty years later is not something I think is worth spending many brain cells on. Diminishing returns and all that. The people who are stuck on ancient versions are also probably the ones least likely to actually want to verify these things, as they are clearly not very concerned...
I don't believe it is. I think we are still in a process where we assess our options and their pros and cons. |
|
Considering the (continuous) effort necessary for keeping the pre-built autotools bits deterministic (and then verifying them), I'd risk saying that a more efficient alternative is to offer a source tarball that doesn't contain pre-generated autotools files. Those would require the builder to run A trivial way to do this is to GPG sign the tarballs created automatically by GitHub: This has the downside of missing pre-generated manual files, and building those locally requires Perl. Fixing this needs a somewhat more involved solution, but much less so than the dance with reproducible OS environments, IMO. |
|
I agree that we could remove the need for a lot of this by not generating anything at all for releases. I just don't think it would benefit our users.
Sacrificing build convenience for easier tarball verification would go against what we know plenty of people want, in favor of something virtually nobody has asked for. |
They are (for now). Last year this broke for a few days due to an update they did. The fallout was large enough to have it reverted. They then guaranteed it for 1 year, meaning: not guaranteed beyond that, but the archives are still reproducible. https://github.blog/2023-02-21-update-on-the-future-stability-of-source-code-archives-and-hashes/ The next best thing is cloning a specific Git hash. Which needs Git of course, and relies on SHA-1 hashes, which are known to allow collisions. It is also easy to mess up by using the unencrypted git:// protocol. Is it possible to double-check a Git clone against a known state?
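One sketch of such a double-check, independent of Git and SHA-1: compute your own SHA-256 digest over the tree contents of two independently obtained copies (a clone and a known-good tarball, say) and compare. This is an illustration only; it ignores .git metadata, file modes, and symlinks:

```python
# Sketch: a deterministic SHA-256 digest over a directory tree, so two
# independently obtained checkouts can be compared without trusting SHA-1.
import hashlib
import os


def tree_digest(root):
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        # Sort in place so traversal order is deterministic; skip .git.
        dirnames[:] = sorted(d for d in dirnames if d != ".git")
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Mix in the relative path and the file bytes, NUL-separated.
            digest.update(os.path.relpath(path, root).encode() + b"\0")
            with open(path, "rb") as f:
                digest.update(f.read())
            digest.update(b"\0")
    return digest.hexdigest()
```

Two trees with identical paths and contents yield the same hex digest regardless of where they live on disk.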
I understand. Adding that it's probably expected that cmake or autotools is already installed on many systems, just like (GNU) make is, or is as easy to install as the latter. But yeah, it might take time till some of these become as ubiquitous. (Maybe worth a survey question?) |
The people who appreciate using configure on older systems because it doesn't require installing anything may be on operating systems that don't have a C++ compiler, so no cmake. They may have a non-GNU make, or a GNU make from two decades ago. They may not have autotools at all, or have autotools from two decades ago that can't be used to regenerate curl's configure.ac. A major stated advantage of autotools is that you can make a dist tarball on bleeding-edge distros but still have it work on very old systems without modern autotools, as long as they have a POSIX shell, make, and general utilities. Continuing to offer autotools but requiring its users to generate the configure script themselves seems like a very mixed message. |
Out of curiosity, is there any other autotools compatibility requirement for Line 26 in 49f83c3
It seems a little arbitrary to take for granted both a compatible make tool and C compiler, but not a compatible autotools. Though I admit to having no data or even anecdotal info about this. I'd have expected autotools to have lived together with make and C compilers way back. CMake is much newer of course, and the minimum required 3.7 (2018-12-03) can be a limitation indeed. (Even without this limitation, CMake really only seems to go back about a decade from now as 'ubiquitous'.) Either way, I understand that pre-packaging autotools stuff is something expected, period. Also, those worried about the state of pre-packaged stuff (and who have a compatible autotools) can always |
Both make and C are much older than autotools, and are required by POSIX. However, I'll admit I didn't check what the minimum autotools requirement for curl is before commenting. I wonder if it's still tested against that. 🤔 xc-am-iface.m4 does offer support for "versions of automake older than 1.14", though it's not clear what the actual minimum is (nor, indeed, whether that is tested either). |
|
However, I'll admit I didn't check what the minimum autotools requirement for
curl is before commenting. I wonder if it's still tested against that. 🤔
INTERNALS.md says these are the minimums:
- GNU Libtool 1.4.2
- GNU Autoconf 2.59
- GNU Automake 1.7
- GNU M4 1.4
But, I doubt people actually test with these very often. The linux-old CI
environment (Debian stretch) contains these:
- GNU Libtool 2.4.6
- GNU Autoconf 2.69
- GNU Automake 1.15
- GNU M4 1.4.18
Those are all newer (mostly much newer) than the documented minimum, so it's
probably worth adding an autoconf build there to at least help ensure
compatibility with somewhat older versions.
|
|
Thanks for the minimums info. Dates for these:
I wonder what might be the oldest GNU C (or other brand) compiler able to compile curl. 3.0.0 to 3.3.0 were released in the above timeframe; 2.9.5 is from 1999-07-29. It'd be an interesting experiment to test for the autotools minimums (linux-old CI already revealed two CMake build issues), if there exists any re-usable online infrastructure for that. |
A C89 compliant compiler should still be able to build curl even if the compiler is from the 90s. We stick to C89 partly because of that. And if it does not due to some mistake somewhere, it can't be very important since nobody has reported it... |
It is not arbitrary. It is a design choice and how autotools has always worked. Users everywhere have always been able to install software without having autotools themselves. You install autotools only when you want to develop software; if you just want to build and install software, autotools may not be installed (or up to date). Thus, suddenly asking people to install and use autotools is wrong and would cause a lot of friction. Update: configure itself is probably also more portable and functional than the autotools themselves, so you can install curl using configure on systems where you might not be able to easily use autotools. |
|
Hey @bagder I just made some changes here
To build, then to run: this gets you into the container based on the image you built in the previous step. Because we created a dedicated user, we work in its home directory and map the host's user and group ids to the container's user. Like I said in https://github.com/curl/curl/pull/13250/files#discussion_r1559389357 it's not perfect, but I'd still love your feedback and whether it's worth getting this in. Wanna give it a final look? |
|
I'm using the following script to build a test release using this Dockerfile and it works fine:

version="${1:-}"
if [ -z "$version" ]; then
  echo "Specify a version number!"
  exit 1
fi
user="$(id -u):$(id -g)"
make distclean
docker build \
--build-arg SOURCE_DATE_EPOCH=$(date -u +%s) \
--build-arg UID=$(id -u) \
--build-arg GID=$(id -g) \
-t curl/curl .
run="run --rm -it -u $(id -u):$(id -g) -v $(pwd):/usr/src -w /usr/src curl/curl"
docker $run autoreconf -fi
docker $run ./configure --without-ssl --without-libpsl
docker $run make -sj8
docker $run ./maketgz $version |
|
Thanks! |
|
The tarball generator script using this Dockerfile is in #13388 |
Hey @bagder 👋 I've seen your post on mastodon on how to reproduce the release tarballs.
I wanted to bounce this off of you as an idea
This is far from reproducible builds but maybe at least it could be a cheap way forward for reproducible release tarballs.
Here is the github action running in my fork
https://github.com/daniel-j-h/curl/actions/runs/8499306747/job/23279970815
What's left to do here is
Refs