
mk-ca-bundle.pl: make generated timestamps deterministic#20528

Closed
vszakats wants to merge 17 commits into curl:master from
vszakats:mkbundleref

Conversation

@vszakats
Member

@vszakats vszakats commented Feb 5, 2026

With default invocation, make generated file timestamps deterministic
by looking up (via the GitHub API) the last commit that modified
certdata.txt, along with its commit timestamp.

Also:

  • show the URL used to download certdata.txt from.
  • make ca-bundle.crt timestamp match certdata.txt's.

  • fail if the stable timestamp could not be determined, and therefore the output is not reproducible. [another time]
  • maybe just omit the timestamp from the comment when using -r? [NO, but also skip adding -r for now]
  • make the timestamp of the output the same as the input's?

@github-actions github-actions bot added the script label Feb 5, 2026
@vszakats vszakats changed the title mk-ca-bundle.pl: add option to use specific ref, with deterministic t… mk-ca-bundle.pl: add option to use specific ref, with deterministic timestamp Feb 5, 2026
@vszakats vszakats added the TLS label Feb 5, 2026
@vszakats vszakats marked this pull request as draft February 5, 2026 14:36
@bagder
Member

bagder commented Feb 5, 2026

It struck me that it could be an idea to change the default behavior to first check which ref is the latest in the particular tree to check, and then get that ref.

That would make us always get certs by ref and thus also get a stable date.

Does it make sense?

@vszakats
Member Author

vszakats commented Feb 5, 2026

It'd be a useful feature to detect upstream updates. Though for a deterministic
build, it's still useful (or rather: necessary) to pick a specific hash that's pinned
to the build.

I'm not sure how to detect the latest commit for a specific file in a specific
branch though. I guess there should be an API for it.

@bagder
Member

bagder commented Feb 5, 2026

> Though for a deterministic build, it's still useful (or rather: necessary) to pick a specific hash that's pinned
> to the build.

Absolutely, and I'm not arguing about that. I was just thinking that by switching to always getting a ref, we would get a date for all fetches. The difference would only be which ref we get.

I'll take a look as well.

@bagder
Member

bagder commented Feb 5, 2026

> I'll take a look as well.

Meh, I can't figure it out, but the API docs seem to imply it should be possible.

@vszakats
Member Author

vszakats commented Feb 5, 2026

This seems to be working at first sight:

curl 'https://api.github.com/repos/mozilla-firefox/firefox/commits?path=/security/nss/lib/ckfw/builtins/certdata.txt&sha=release'

@bagder
Member

bagder commented Feb 6, 2026

So with jq the last commit to the certdata file in the release branch:

curl 'https://api.github.com/repos/mozilla-firefox/firefox/commits?path=/security/nss/lib/ckfw/builtins/certdata.txt&sha=release' | jq '.[0].sha'

@bagder
Member

bagder commented Feb 6, 2026

I learned jq has a "first" builtin which makes it even simpler looking:

curl 'https://api.github.com/repos/mozilla-firefox/firefox/commits?path=/security/nss/lib/ckfw/builtins/certdata.txt&sha=release' | jq first.sha
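
For anyone who wants the same lookup without jq, here is a minimal Python sketch of the idea in the commands above. It only builds the query URL and parses a response body; the sample JSON is made up and trimmed to the fields used, and picking the committer date (rather than the author date) is an assumption, not something this thread settles. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode

API = "https://api.github.com/repos/mozilla-firefox/firefox/commits"
CERTDATA_PATH = "/security/nss/lib/ckfw/builtins/certdata.txt"


def commits_url(path=CERTDATA_PATH, ref="release"):
    """Build the same query as the curl commands above."""
    # safe="/" keeps the slashes in the path readable, as in the curl URL.
    return API + "?" + urlencode({"path": path, "sha": ref}, safe="/")


def latest_commit(body):
    """Return (sha, committer date) of the first listed commit.

    Mirrors `jq first.sha`: the first array element is taken as the
    most recent commit touching the path.
    """
    commits = json.loads(body)
    if not commits:
        raise ValueError("no commits returned for this path/ref")
    first = commits[0]
    return first["sha"], first["commit"]["committer"]["date"]


# Made-up response with placeholder values, trimmed to the fields used here.
sample = json.dumps([
    {"sha": "a" * 40, "commit": {"committer": {"date": "2026-01-02T03:04:05Z"}}},
    {"sha": "b" * 40, "commit": {"committer": {"date": "2025-12-01T00:00:00Z"}}},
])

sha, date = latest_commit(sample)
print(commits_url())
print(sha, date)
```

The network call is left out on purpose, so the parsing can be tried without hitting the API.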

@vszakats
Member Author

vszakats commented Feb 6, 2026

I'm trying to figure out what would be a nice way to integrate it into c4w,
with provisions for curl-container. certdata date/timestamp is the source
of pain. It either needs a GitHub-specific API call to figure it out, or a way
to omit it or set it manually via an option in mk-ca-bundle.pl.

psl is similar, but we can avoid the GH API call by downloading the
commit-specific source tarball, which has the correct timestamp in it.
Works with Codeberg/Gitea/GitLab too. Not viable for certdata, because
the tarball is 1GB.

Version detection can be done with that one curl call, and this already
works in c4w for psl. Also for certdata extended with the path/ref query
options.

@vszakats
Member Author

vszakats commented Feb 8, 2026

I ended up solving it within c4w, by reusing the existing bump/download
logic and adding the timestamp retrieval for certdata.txt. Then call
mk-ca-bundle.pl -n, which sets the header timestamp based on the
input certdata.txt file:
curl/curl-for-win@d23f024

For curl-container it'd still be useful to implement some of this within
mk-ca-bundle.pl. Details depend on how it would be best used there.

In c4w, the slight downside is departing from the curl website, which is
a widely used reference source for the CA bundle. Upside is that it makes
the dependency chain shorter and may potentially make updates quicker.

The direct way also has the limitation that it lacks support for verifying
the payload hash. This is because the payload hash is re-used as the
commit hash to address the Git content. This should be safe, but
perhaps keeping a hash would be useful here too, for good measure? [→ IMPLEMENTED]
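
A rough sketch of what keeping a separate payload hash could look like, assuming SHA-256 and a hash pinned at bump time. This is not the curl-for-win implementation; the function name and sample payload are hypothetical.

```python
import hashlib
import hmac


def verify_sha256(payload: bytes, expected_hex: str) -> None:
    """Raise ValueError if the payload does not match the pinned SHA-256."""
    actual = hashlib.sha256(payload).hexdigest()
    if not hmac.compare_digest(actual, expected_hex.lower()):
        raise ValueError(f"SHA-256 mismatch: expected {expected_hex}, got {actual}")


# Placeholder payload; in a real setup the pinned value would be stored
# next to the pinned commit hash when the dependency is bumped.
payload = b"example certdata.txt content\n"
pinned = hashlib.sha256(payload).hexdigest()

verify_sha256(payload, pinned)  # passes silently when the content matches
```

The commit hash already addresses the content in Git, so this check would be an extra safeguard against a corrupted or substituted download rather than a replacement for pinning.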

@vszakats
Member Author

vszakats commented Feb 8, 2026

Actually, another benefit of implementing some of this logic within
the script is that it would make its output more deterministic.
Meaning, everyone running the same revision of this script with
the Mozilla repo being also at the same revision would end up with
a bit-by-bit identical ca-bundle.crt output (assuming the
same set of script options, or none, of course).
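
To illustrate the point about bit-identical output, a small Python sketch: if the only time-dependent part of the file is a header timestamp, taking it from the commit instead of the wall clock makes two runs produce the same bytes. The header text and `render_bundle` are hypothetical and do not reproduce the script's real header format.

```python
import hashlib
from datetime import datetime, timezone


def render_bundle(certs: str, stamp: datetime) -> bytes:
    """Render a bundle with a timestamped header (hypothetical layout)."""
    header = "## Hypothetical header, timestamp: " + stamp.strftime("%Y-%m-%dT%H:%M:%SZ") + "\n"
    return (header + certs).encode("utf-8")


certs = "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----\n"

# Timestamp taken from the input's commit (placeholder value).
commit_time = datetime(2026, 1, 2, 3, 4, 5, tzinfo=timezone.utc)

run1 = render_bundle(certs, commit_time)
run2 = render_bundle(certs, commit_time)
same = hashlib.sha256(run1).hexdigest() == hashlib.sha256(run2).hexdigest()

# With the wall clock, the header (and so the hash) changes from run to run.
wallclock = render_bundle(certs, datetime.now(timezone.utc))
print(same, wallclock != run1)
```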

@bagder
Member

bagder commented Feb 9, 2026

> everyone running the same revision of this script with
> the Mozilla repo being also at the same revision would end up with
> a bit-by-bit identical ca-bundle.crt output

I think it is also nicer to get the actual date of the commit/remote file into the generated ca-bundle.crt rather than the time of when the script ran.

@vszakats vszakats marked this pull request as ready for review March 7, 2026 02:06
@vszakats vszakats changed the title mk-ca-bundle.pl: add option to use specific ref, with deterministic timestamp mk-ca-bundle.pl: add option to use specific ref, make timestamps deterministic Mar 7, 2026
@vszakats vszakats changed the title mk-ca-bundle.pl: add option to use specific ref, make timestamps deterministic mk-ca-bundle.pl: make generated timestamps deterministic Mar 7, 2026
@vszakats
Member Author

vszakats commented Mar 7, 2026

Ended up keeping it simple by just making timestamps
stable by retrieving the last certdata.txt commit and its
timestamp.
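
As an illustration of the timestamp part, here is a Python sketch that converts an ISO 8601 commit date into Unix seconds and applies it as the output file's modification time. The script itself is Perl; this only mirrors the idea, and the file name, helper names and date are placeholders.

```python
import os
import tempfile
from datetime import datetime, timezone


def iso_to_epoch(stamp: str) -> int:
    """Convert a 'YYYY-MM-DDTHH:MM:SSZ' UTC timestamp to Unix seconds."""
    dt = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def set_mtime(path: str, epoch: int) -> None:
    """Set both access and modification time of the file to the given epoch."""
    os.utime(path, (epoch, epoch))


with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "ca-bundle.crt")
    with open(out, "w") as f:
        f.write("placeholder\n")
    # Placeholder commit date, in the format the GitHub API uses for commits.
    set_mtime(out, iso_to_epoch("2026-01-02T03:04:05Z"))
    mtime = int(os.stat(out).st_mtime)
```

Forcing UTC when parsing matters here: a naive datetime would be interpreted in the local time zone and give different results on different machines.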

@vszakats vszakats force-pushed the mkbundleref branch 2 times, most recently from 25d36ca to 7166727 Compare March 9, 2026 11:02
@vszakats vszakats closed this in ca92e20 Mar 16, 2026
@vszakats vszakats deleted the mkbundleref branch March 16, 2026 11:00
@bagder
Member

bagder commented Mar 27, 2026

@vszakats a downside with this approach has turned up: the date in the Mozilla file is potentially a long time from the current date, and we no longer get the download timestamp in there.

The caextract web page now says "This bundle was generated at Feb 11 2026" for the certificate that was new in the bundle on March 19!

I've updated it to say "This bundle was updated by Mozilla at ..." but still. Users on the caextract page did not get this bundle until March 19. I think we should show the latter date, not the date from one month before it was accessible...

@vszakats
Member Author

vszakats commented Mar 27, 2026

I saw that, yes. We can just revert this (curl-for-win doesn't rely on this), or move it behind an option?

I was surprised by such a time difference, given that the script looks for the commit date. Does it mean Mozilla pushes to the public repo not in real time (or close to it), but only once a month / occasionally? (But even a small difference between publishing and detection by curl.se may cause a different date in the archived bundle filename and the reproducible date within it.)

edit: the only line to keep unconditionally is utime() at the end.

vszakats added a commit to vszakats/curl that referenced this pull request Mar 27, 2026
Behind new option `-r` for reproducible output.

Mozilla may push to its repo much later than the commit date, which can
be a source of confusion when using the reproducible timestamp (which is
determined by the commit date) by default.

Also: drop a stray variable assigned, but not used.

Reported-by: Daniel Stenberg
Bug: curl#20528 (comment)
Follow-up to ca92e20 curl#20528
vszakats added a commit to vszakats/curl that referenced this pull request Mar 27, 2026
Behind new option `-r` for reproducible output.

Mozilla may push to its repo much later than the commit date, which can
be a source of confusion when using the reproducible timestamp (which is
determined by the commit date) by default.

Example: https://curl.se/ca/cacert-2026-03-19.pem

Also: drop a stray variable assigned, but not used.

Reported-by: Daniel Stenberg
Bug: curl#20528 (comment)
Follow-up to ca92e20 curl#20528
vszakats added a commit to vszakats/curl that referenced this pull request Mar 27, 2026
Behind new option `-r`, for reproducible output.

Mozilla may push to its repo much later than the commit date, which can
be a source of confusion when using the reproducible timestamp (which is
determined by the commit date) by default.

Example: https://curl.se/ca/cacert-2026-03-19.pem

Also: drop a stray variable assigned, but not used.

Reported-by: Daniel Stenberg
Bug: curl#20528 (comment)
Follow-up to ca92e20 curl#20528
vszakats added a commit that referenced this pull request Mar 27, 2026
Mozilla may push to its repo much later than the commit date, which can
be a source of confusion when using the reproducible timestamp (which is
determined by the commit date) by default. Example:

https://curl.se/ca/cacert-2026-03-19.pem vs.
https://github.com/mozilla-firefox/firefox/commits/1a84aee6387d2f9c9531c655edeea4a80aa0fcfa/security/nss/lib/ckfw/builtins/certdata.txt

This feature had no actual user (or a planned one) from within curl at
the moment, and was not requested by curl users. curl-for-win does this on
its own, which is the more practical way there since everything (not
just the CA bundle) needs to be reproducible anyway. I surmise this may
be true for most if not all reproducible use-cases.

Another limitation was that it could bump into GitHub's rate limiting,
needing further updates.

Also: code had some unintended leftovers.

Reported-by: Daniel Stenberg
Bug: #20528 (comment)
Follow-up to ca92e20 #20528

Closes #21116
vszakats added a commit to vszakats/curl that referenced this pull request Mar 27, 2026
vszakats added a commit that referenced this pull request Mar 27, 2026

2 participants