
Dockerfile for release automation #13250

Closed · wants to merge 5 commits
41 changes: 41 additions & 0 deletions Dockerfile
@@ -0,0 +1,41 @@
# Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.
#
# SPDX-License-Identifier: curl

# Self-contained build environment to match the release environment.
#
# Build and set the timestamp for the date corresponding to the release
#
# docker build --build-arg SOURCE_DATE_EPOCH=1711526400 --build-arg UID=$(id -u) --build-arg GID=$(id -g) -t curl/curl .
#
# Then run commands from within the build environment, for example
#
# docker run --rm -it -u $(id -u):$(id -g) -v $(pwd):/usr/src -w /usr/src curl/curl autoreconf -fi
# docker run --rm -it -u $(id -u):$(id -g) -v $(pwd):/usr/src -w /usr/src curl/curl ./configure --without-ssl --without-libpsl
# docker run --rm -it -u $(id -u):$(id -g) -v $(pwd):/usr/src -w /usr/src curl/curl make
# docker run --rm -it -u $(id -u):$(id -g) -v $(pwd):/usr/src -w /usr/src curl/curl ./maketgz 8.7.1
#
# or get into a shell in the build environment, for example
#
#   docker run --rm -it -u $(id -u):$(id -g) -v $(pwd):/usr/src -w /usr/src curl/curl bash
# $ autoreconf -fi
# $ ./configure --without-ssl --without-libpsl
# $ make
# $ ./maketgz 8.7.1

# To update, get the latest digest e.g. from https://hub.docker.com/_/debian/tags
FROM debian:bookworm-slim@sha256:993f5593466f84c9200e3e877ab5902dfc0e4a792f291c25c365dbe89833411f
Member:

Forgive a docker newbie, but where does the data for this line come from?

Contributor:

You can find it when you pull the image:

docker pull debian:bookworm-slim
bookworm-slim: Pulling from library/debian
8a1e25ce7c4f: Pull complete
Digest: sha256:ccb33c3ac5b02588fc1d9e4fc09b952e433d0c54d8618d0ee1afadf1f3cf2455
Status: Downloaded newer image for debian:bookworm-slim
docker.io/library/debian:bookworm-slim

You can probably also get it with docker manifest.
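For instance, one way to read the digest without pulling (a sketch; assumes a recent Docker with buildx, output abbreviated):

docker buildx imagetools inspect debian:bookworm-slim
# Name:    docker.io/library/debian:bookworm-slim
# Digest:  sha256:...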

Member:

Alternatively, one might be able to use the existing curl-dev-debian image (in the curl GHCR) as a base.

Member:

What's the practical difference?

Contributor (author):

I got this digest by going to https://hub.docker.com/_/debian, checking the tags at https://hub.docker.com/_/debian/tags, and filtering for bookworm-slim.


I didn't know there was already a curl Docker image; I read the original mail listing the tools and operating system and simply went for Debian bookworm. Happy to change this.


RUN apt-get update -qq && apt-get install -qq -y --no-install-recommends \
Comment:

If we want to deterministically install the same package versions, I think we need to drop the apt-get update here? Otherwise you'll get whatever version was latest in bookworm at the time you ran the docker build command, which is not reproducible. Instead, update the FROM clause whenever we want to bump the versions for reproducibility.

@lrvick (Apr 10, 2024):

  1. This still relies on some non-reproducible packages in Debian.
  2. You will still need tools that don't ship with the minimal Debian container.
  3. Packages installed via apt-get update get purged regularly, so you would need to use the Debian archive.
  4. The Debian archive is very slow, very unreliable, and very bandwidth-constrained. You also need to write a lot of custom tooling to hash-lock exact package versions, or apt will try to grab whatever the mirror hands it.
  5. Signatures on packages pinned today will expire and break apt.

I have gone all the way down this road, and realized Debian is just a non-starter for practical/maintainable reproducible builds in a container.

I am working on another PR with a stagex-based Containerfile that is deterministic today (and in the future) and full source bootstrapped.

@lyxell (Apr 10, 2024):

Can we be more concrete here? It would be fine if the tool versions differ as long as they still produce the same tarball. Which of the steps taken to produce the tarballs might give different results when the tool versions differ?

Dropping apt-get update will not work since the apt cache is empty by default for the Debian Docker images.
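For example (error text approximate):

docker run --rm debian:bookworm-slim apt-get install -y autoconf
# E: Unable to locate package autoconf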

@lrvick (Apr 10, 2024):

Different versions of automake, autoconf, tar, zip, bzip2, etc. could all end up causing different results in the future.

I just pushed a draft PR with an example that uses a "from scratch" container and an explicit set of fixed/deterministic/multi-signed dependencies:

#13338

It does not have zip support (until next week, as we need to do a release cycle to package it), but otherwise it should be good to go, and it can easily be extended to build deterministic curl binaries as well, if desired.

Comment:

How about something along the lines of this?

FROM debian:bookworm-slim@sha256:993f5593466f84c9200e3e877ab5902dfc0e4a792f291c25c365dbe89833411f

RUN apt-get update -qq && apt-get install -qq -y --no-install-recommends \
    build-essential=12.9 \
    make=4.3-4.1 \
    autoconf=2.71-3 \
    automake=1:1.16.5-1.3 \
    ...

Contributor:

The only package versions that matter at all here are the versions of the autotools packages (autoconf, automake, libtool...).

Simply print the debian (apt) versions of this exceedingly small number of packages to a reproducible-manifest.txt inside the tarball before you tar it up. Problem solved.
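A minimal sketch of that manifest step, assuming it runs inside the build container right before the tarball is created (file name and package list as suggested above):

dpkg-query -W -f='${Package} ${Version}\n' \
    autoconf automake libtool > reproducible-manifest.txt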

@lrvick As far as rebuilding all packages from scratch goes, it appears you have reinvented the Gentoo Linux distro except NIH. Gentoo, incidentally, also lets you record an exact git repository hash of the Gentoo tree and guarantee that all packages in your container use that and nothing else.

But all this is the mootest of moot points, because if you want to guarantee that everyone is using the exact same docker container to reproduce the tarballs, then you simply publish the container you used -- as mentioned above. This is very reproducible as you're uploading @bagder's actual (virtual) laptop used to make the release, and anyone can download @bagder's (virtual) laptop and use it to create the tarball (and check that it's the virtual laptop created in CI).
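In practice, publishing the (virtual) laptop could be a sketch as simple as this (image tag and file name illustrative):

# producer: archive the exact image used for the release
docker save curl/curl | gzip > curl-release-env.tar.gz

# verifier: load the identical image and rerun the tarball build
docker load < curl-release-env.tar.gz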

Comment:

A decision has already been made here so I am merely responding to some potential misconceptions above for the sake of anyone following this thread.

> Simply print the debian (apt) versions of this exceedingly small number of packages to a reproducible-manifest.txt inside the tarball before you tar it up. Problem solved.

That would require a user who wishes to verify to track down all those versions, and all their dependencies, and assemble them together again. Version numbers are not good enough, as those signing keys expire and are rotated often. You would at a minimum need to publish the entire installed apt tree in the container, with hashes of every .deb file, so they can be retrieved from the Debian archive and directly installed in a reproduction container. This would still be very arduous for a user without any scripted help, but it would at least be a path. I have written a few tools to help with this, mentioned elsewhere. It is a terrible reproduction path, but it does work.
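A rough sketch of that kind of hash-locking, run inside the original container (file names illustrative):

# record the full installed package tree with exact versions
dpkg-query -W -f='${Package}=${Version}\n' > apt-tree.txt

# re-fetch the exact .debs and record their hashes for later verification
apt-get download $(cat apt-tree.txt)
sha256sum ./*.deb > deb-hashes.txt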

> @lrvick As far as rebuilding all packages from scratch goes, it appears you have reinvented the Gentoo Linux distro except NIH.

Gentoo Linux is not reproducible, quorum-signed, full-source-bootstrapped, or OCI-native. It is true that we do in fact compile everything from source, but we publish our binaries and expect users to use those, which makes us closer to traditional Linux distros like Debian than to Gentoo.

The only other deterministic, full-source-bootstrapped Linux distro that exists to my knowledge is GNU Guix. However, they do not publish reproducible container images and do not do quorum signing. They are also glibc-based and optimized for end-user desktop use, which comes with dramatically higher attack surface and complexity than is required when the goal is simply building software securely. Still, Guix would be the best distro to compare us to.

If you are not familiar with full-source-bootstrapping, Guix has a great writeup as they were the first distro to do it. Stagex was the second.

https://guix.gnu.org/en/blog/2023/the-full-source-bootstrap-building-from-source-all-the-way-down/

> If you want to guarantee that everyone is using the exact same docker container to reproduce the tarballs, then you simply publish the container you used

Agreed. That is the reason to use containers, but the context here is trying to minimize the chance of supply-chain attacks. We are trying to solve for trust.

If the binary container image is compromised by the person with the publishing API key, then anything downstream from that container image is also compromised. See the XZ attack and Ken Thompson's Reflections on Trusting Trust.

You can however pull down the "musl" and "autoconf" binary stagex container images and directly use them together. You can also verify, via their PGP signatures, that multiple people built them for you and got the exact same hash, so you don't have to build them yourself if you trust those people are not colluding. You can also choose to build them yourself and add your own signature if you don't trust those folks.
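As a sketch, composing those images looks roughly like this (image references illustrative; in practice you would pin digests, and autoconf also needs at least m4 and perl):

# Containerfile sketch: compose per-package stagex images into a build root
FROM stagex/musl AS musl
FROM stagex/autoconf AS autoconf

FROM scratch
COPY --from=musl / /
COPY --from=autoconf / /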

With Debian, OTOH, they build their published Docker images in a non-deterministic way and do not sign them, so you have no evidence that they are free of tampering. Very much like the XZ situation. Using Docker Debian unfortunately just moves the problem up one layer.

Contributor:

All of this is completely irrelevant, since the curl project is trying to solve reproducible curl release tarballs, not reproducible linux distros as general-purpose computing platforms.

> That would require a user who wishes to verify to track down all those versions, and all their dependencies, and assemble them together again. Version numbers are not good enough, as those signing keys expire and are rotated often. You would at a minimum need to publish the entire installed apt tree in the container, with hashes of every .deb file, so they can be retrieved from the Debian archive and directly installed in a reproduction container.

No, again, version numbers of the autotools set of packages alone (and not any dependencies) are sufficient, since they allow one to determine the correct version of the autotools packages to install from ANY source, modulo Debian-specific patches publicly recorded in the Debian VCS for that small handful of packages.

No hashes necessary, no dependencies necessary, no access to .debs necessary.

It is slightly more awkward to install a local autotools toolchain with debian patches on your choice of personally trusted computing systems than to install a prebuilt .deb, but not much harder.

Also "as those keys expire and are rotated often" does not sound like a practical objection to me. If you use old stuff signed by crypto keys which are old, then naturally you backtrace the trust source for those keys, and subsequently backdate the verification routine to respect keys that claim to have expired in the past, but from the perspective of what you're reproducing, in the future.

> If you are not familiar with full-source-bootstrapping, Guix has a great writeup as they were the first distro to do it. Stagex was the second.

Sure, I'm familiar with it. I don't think that's the concern the curl project has. It will also be more interesting to me personally when it no longer relies on untrusted binaries such as, say, the docker daemon, guile-bootstrap, or any other potentially malicious tools used to host the computing platform that runs the full-source bootstrap.

> If the binary container image is compromised by the person with the publishing API key, then anything downstream from that container image is also compromised. See the XZ attack and Ken Thompson's Reflections on Trusting Trust.

I've seen both quite well, thanks. Assuming a non-compromised docker daemon, it really is not hard to rebuild the container image using, say, https://snapshot.debian.org/ and test that your binary container image produces the same tarball. Again, the container doesn't have to be byte-for-byte reproducible; it simply has to be some form of instructions that others can follow to yield the same... curl release tarball.
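As a sketch, such a rebuild could point apt at a fixed snapshot (timestamp illustrative):

FROM debian:bookworm-slim
# pin the package universe to one snapshot; check-valid-until=no is needed
# because snapshot Release files eventually pass their Valid-Until date
RUN echo 'deb [check-valid-until=no] https://snapshot.debian.org/archive/debian/20240401T000000Z bookworm main' \
        > /etc/apt/sources.list && \
    apt-get update -qq && apt-get install -qq -y --no-install-recommends \
        autoconf automake libtool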

No need to trust curl devs to publish a non-compromised spinoff container.

> With Debian, OTOH, they build their published Docker images in a non-deterministic way and do not sign them, so you have no evidence that they are free of tampering. Very much like the XZ situation. Using Docker Debian unfortunately just moves the problem up one layer.

Sounds like a market opportunity for someone to build deterministic debian docker images that are signed.

The underlying .deb packages are reproducible: https://tests.reproducible-builds.org/debian/reproducible.html

(Feel free to run your own builders to verify that!)

Comment:

Also "as those keys expire and are rotated often" does not sound like a practical objection to me. If you use old stuff signed by crypto keys which are old, then naturally you backtrace the trust source for those keys, and subsequently backdate the verification routine to respect keys that claim to have expired in the past, but from the perspective of what you're reproducing, in the future.

If keys are revoked because they used weak, vulnerable algorithms, or are already compromised, or belonged to a maintainer who is no longer trusted (like the xz maintainer), then those keys could be used to sign a malicious alternative to the package you are actually requesting, with the same version number, and apt would not know the difference. This is why hash-locking is an absolute necessity when using old packages whose signatures may no longer be trustworthy.

> It really is not hard to rebuild the container image using, say, https://snapshot.debian.org/

This actually is surprisingly hard. Debian snapshots are hosted by a single deployment that is, most of the time, crippled to dialup speeds, in my experience and in the experience of virtually every user of projects of mine that relied on it. Also, for the above reasons about not being able to trust expired/revoked keys, you have to pull the .deb files manually without installing, hash-verify them, and then install from that folder as a local mirror. For these two reasons, in the many reproducible-build projects I have worked on based on Debian, we all ended up forced to use our own git LFS mirrors of .deb packages, which was a major pain.

> It is slightly more awkward to install a local autotools toolchain with debian patches on your choice of personally trusted computing systems than to install a prebuilt .deb, but not much harder.

To be fair to this point, for your very specific use case here, you could probably get away with using a generic latest Debian, building your own autotools, and then using those autotools to build the tarball. That is, of course, if any signed/trusted/reproducible Debian base images actually existed :/
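A sketch of that route (version chosen to match the Debian package pinned above; the hash placeholder must come from a record you already trust):

curl -LO https://ftp.gnu.org/gnu/autoconf/autoconf-2.71.tar.xz
echo '<known-good-sha256>  autoconf-2.71.tar.xz' | sha256sum -c -
tar xf autoconf-2.71.tar.xz && cd autoconf-2.71
./configure --prefix="$HOME/tools" && make && make install
export PATH="$HOME/tools/bin:$PATH"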

Contributor:

> If keys are revoked because they used weak, vulnerable algorithms, or are already compromised, or belonged to a maintainer who is no longer trusted (like the xz maintainer), then those keys could be used to sign a malicious alternative to the package you are actually requesting, with the same version number, and apt would not know the difference. This is why hash-locking is an absolute necessity when using old packages whose signatures may no longer be trustworthy.

Your logical fallacy of the day is: "Moving the goalposts".

Now that we have subset the original stated problem from "it is policy to regularly expire and rotate keys" down to "the key turned out to be weak or was actively compromised", it turns out the problem is very manageable and everyone has been managing precisely this for years.

> This actually is surprisingly hard. Debian snapshots are hosted by a single deployment that is, most of the time, crippled to dialup speeds, in my experience and in the experience of virtually every user of projects of mine that relied on it.

It does indeed sound very unfortunate that the Debian project lacks robust resources and using Debian services is slow.

Someone should sponsor infrastructure or something... however, "reproducibility is slow because of a slow server" is not exactly proof that it is unreproducible.

> Also, for the above reasons about not being able to trust expired/revoked keys, you have to pull the .deb files manually without installing, hash-verify them, and then install from that folder as a local mirror. For these two reasons, in the many reproducible-build projects I have worked on based on Debian, we all ended up forced to use our own git LFS mirrors of .deb packages, which was a major pain.

This sounds very suspicious because it is not how the Debian repository format works. I don't know whether you've studied the Debian policy documentation for this, but a Debian repository consists of a cryptographically signed root manifest that securely (we hope) verifies sha256/512-hashed sub-manifests, which in turn record a sha256/512-hashed set of .debs.

As long as you can securely verify the root manifest you have a full chain of trust. If for some reason you determine that the PGP signature isn't reliable for this purpose, you can save the sha512 or blake2 hash of the root manifest, and guarantee that unless an attack is found that permits simultaneously forging the md5, sha1, sha256, and sha512 for the same malicious file, the single file you saved hashes for has covered all .debs for that Debian release.

You should probably be able to just sideload your own known-good Release file, and apt will use that and recursively handle all the rest for you. And there are various caching proxies you could use to accelerate it with a local mirror.
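Concretely, the chain can be walked by hand (paths abbreviated; package chosen as an example):

# 1. verify the clearsigned root manifest against the archive keyring
gpg --verify InRelease
# 2. InRelease records the SHA256 of each Packages index
sha256sum main/binary-amd64/Packages.xz      # compare with the InRelease entry
# 3. each Packages index records the SHA256 of every .deb it covers
sha256sum pool/main/a/autoconf/autoconf_2.71-3_all.deb   # compare with the Packages entry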

> That is, of course, if any signed/trusted/reproducible Debian base images actually existed :/

I'll reiterate my philosophical musing about how it would be great if someone who was passionate about this and interested in working with existing communities would offer to help make this a reality. Again, the packages that go into an image are reproducible, so most of the work is already done and the rest is just "whatever is needed regardless of distro".

build-essential make autoconf automake libtool git perl zip zlib1g-dev gawk && \
rm -rf /var/lib/apt/lists/*

ARG UID=1000 GID=1000

RUN groupadd --gid $GID dev && \
useradd --uid $UID --gid dev --shell /bin/bash --create-home dev

USER dev:dev

ARG SOURCE_DATE_EPOCH
ENV SOURCE_DATE_EPOCH=${SOURCE_DATE_EPOCH:-1}