Skip to content

feat(dkim): hand-rolled RFC 6376 verifier (PR 2 of email-recovery stack)#3839

Closed
sea-snake wants to merge 21 commits into
dfinity:mainfrom
sea-snake:feat/dkim-verifier
Closed

feat(dkim): hand-rolled RFC 6376 verifier (PR 2 of email-recovery stack)#3839
sea-snake wants to merge 21 commits into
dfinity:mainfrom
sea-snake:feat/dkim-verifier

Conversation

@sea-snake
Copy link
Copy Markdown
Contributor

@sea-snake sea-snake commented May 5, 2026

Summary

PR 2 of the email-recovery stack (docs/ongoing/email-recovery.md §10 Phase 0). Stacks on top of PR 3838 (DNSSEC verifier). Lands a hand-rolled RFC 6376 DKIM verifier that consumes a parsed SmtpRequest plus an already-trusted DKIM TXT record and returns a per-step EmailVerificationStatus.

Note: This PR targets main but includes PR 3838's commits (DNSSEC verifier) as its base. Review the DKIM-specific changes by looking at commits after 9bbd8717 (the last PR 3838 commit). Once PR 3838 merges, this PR's diff will shrink to just the DKIM additions.

Why hand-rolled

The design originally specified mail-auth (Stalwart's well-tested DKIM library), but mail-auth pulls a non-optional hickory-resolver dep that fails to compile for wasm32-unknown-unknown (transitive: tokio + mio). Forking + patching mail-auth would be possible but creates perpetual rebase burden. We hand-roll instead — "the right way, no shortcuts" was the explicit guidance.

What's in this PR

src/internet_identity_interface/src/internet_identity/types/smtp.rs

Brings forward the SMTP gateway protocol types from PoC PR 3760: SmtpRequest/SmtpResponse/SmtpHeader/SmtpMessage/SmtpAddress/SmtpEnvelope, the size bounds, and the input-bound validation (format_address lowercases both halves; truncate_at_char_boundary clamps to the previous UTF-8 boundary so a multi-byte subject can't trap the canister). Drops postbox-specific bits (PostboxEmail, ValidatedSmtpRequest, anchor-number parser).

src/internet_identity/src/dkim/

  • types.rs — Algorithm (RsaSha256, Ed25519Sha256), HeaderCanon/BodyCanon (Relaxed, Simple), DkimCheck/DkimCheckName/DkimCheckStatus per-step diagnostics, EmailVerificationStatus / VerificationFailReason result shape.
  • parse.rs (RFC 6376 §3.5) — DKIM-Signature header tag-list parser. Splits structurally on ; first then on the first = per element, so a literal b= substring inside another tag's base64 doesn't get misread as a new tag start (the bug class the PoC PR review specifically flagged). Folded whitespace inside base64 values is stripped before decoding. Tag names case-insensitive; duplicates rejected.
  • canonicalize.rs (§3.4.2 / §3.4.4) — relaxed header canon (lowercase name, unfold continuations, collapse WSP+ to single SP, strip trailing WSP, strip WSP around colon) and relaxed body canon (per-line WSP cleanup, drop trailing empty lines, ensure non-empty output ends in exactly one CRLF).
  • dns_record.rs (§3.6.2) — DKIM TXT record parser. Tag names case-insensitive (P= vs p= was a PoC bug), whitespace inside p= tolerated (multi-chunk DNS TXT records), t=y/t=s flags honoured, unknown tags ignored.
  • signature.rs — RSA-SHA256 (RFC 5702 / RFC 8301) and Ed25519-SHA256 (RFC 8463) signature verification on top of rsa+sha2+ed25519-dalek from PR 1's deps. Enforces 1024-bit RSA minimum per design §5.6. Ed25519 path wraps in SHA-256 per RFC 8463. Plus body_hash_sha256 with optional l= truncation per §3.4.5.
  • verify.rs — orchestration. Multi-signature loop per §5.5 (accept on first pass), tag enforcement per design §5.4 (c=relaxed/* only, x= expiration, i= alignment with d=, k= match, t=y testing-mode), bottom-up header selection per §5.4 when h= lists a name multiple times, b=value blanking that's structural-position-aware so it doesn't mis-target an internal substring.
  • test_vectors.rs#[cfg(test)] .eml loader + 8 end-to-end tests against committed fixtures.

test_vectors/dkim/

  • 3 synthetic .eml files generated offline with dkimpy + a 2048-bit RSA key (relaxed/relaxed, relaxed/simple, simple/simple).
  • The matching DKIM TXT record (public key only).
  • README documenting provenance — the throwaway private key is not committed.

Test plan

  • cargo check -p internet_identity --target wasm32-unknown-unknown — clean.
  • cargo test -p internet_identity --bin internet_identity dkim — 75 tests pass (parse 14, canonicalize 18, dns_record 16, signature 7, verify 12, end-to-end 8).
  • cargo test -p internet_identity --bin internet_identity — 313 tests pass total (was 238 before this PR; +75 DKIM, plus a few in smtp types).
  • cargo test -p internet_identity_interface --lib — 52 tests pass (was 42; +10 SMTP type tests).
  • cargo clippy -p internet_identity --bin internet_identity --tests -- -D warnings — clean.
  • cargo fmt --check — clean (modulo pre-existing diffs unrelated to this PR).

Stack

This is PR 2 of a 12-PR series. Includes PR 3838's commits as its base; once PR 3838 merges, the diff shrinks to just the DKIM additions.

Subsequent PRs:

  • PR 3 — DMARC alignment.
  • PR 4 — DoH outcall fallback for unsigned domains (Gmail / Outlook / iCloud — see the design doc §7.6 and the team Slack writeup).
  • PRs 5–9 — storage + Candid + behavior for email recovery.
  • PRs 10–12 — frontend.

PR Stack

# PR Description Status
0 #3836 Design doc Open
1 #3838 DNSSEC verifier scaffold Open
2 #3839 DKIM verifier (RFC 6376) Open
3 #3840 DMARC alignment (RFC 7489) Open
4 #3841 DoH fallback Open
5+6 #3842 Setup flow (storage + smtp_request) Open
7 #3843 Recovery flow (delegation) Open
8 #3844 Frontend + feature flag Open
9 #3855 Deploy/upgrade scripts: dnssec_config + doh_config Open
10 #3857 Email-recovery UX overhaul Open

sea-snake added 5 commits May 4, 2026 14:57
…iring

First PR in the email-recovery stack (docs/ongoing/email-recovery.md §10
Phase 0). Lands the structural pieces of the DNSSEC verifier so PR dfinity#2
(DKIM verifier) and PR dfinity#4-dfinity#9 (storage + recovery methods) can build
against the types. Cryptographic verification logic and real DoH-captured
test vectors arrive in PR #1b.

What's in this PR:

- New workspace crate internet_identity_email_test_vectors with a
  placeholder loader and a README explaining what arrives in PR #1b
  (DNSSEC chains, DKIM happy-path + tampering vectors, DMARC alignment
  matrix).
- New DnssecConfig and DnssecRootAnchor types in
  internet_identity_interface, exposed at the top of InternetIdentityInit
  as 'dnssec_config'. Not specific to email recovery — any feature that
  verifies DNS records against the IANA-rooted DNSSEC chain consumes the
  same anchors.
- New dnssec/ module under src/internet_identity/src/ with type
  definitions (DnsProofBundle, SignedRRset, DelegationLink, Rrsig,
  DnsName, DnssecError, VerifiedRecord) and a stub 'verify' that returns
  Err(DnssecError::NotImplemented). Step-by-step TODOs reference §7.3 of
  the design doc.
- Trust-anchor list plumbed through init/post_upgrade into
  PersistentState.dnssec_config (and through StorablePersistentState for
  cross-upgrade persistence).
- Two unit tests for the stub verifier (NoTrustAnchors path,
  NotImplemented path) — flip to positive assertions in PR #1b.
- internet_identity.did updated with DnssecConfig / DnssecRootAnchor and
  the new init field.

What's deferred to PR #1b:

- RRSIG / DS / DNSKEY canonicalization and signature verification per
  RFC 4034 §6.
- Crypto deps for ECDSA-P256-SHA256 (alg 13) and Ed25519 (alg 15). RSA-
  SHA256 (alg 8) deps already in the workspace.
- Real DoH-captured DNSSEC chains for gmail.com, icloud.com,
  outlook.com, fastmail.com, proton.me, plus deliberately-tampered
  negatives.

Build: cargo check --target wasm32-unknown-unknown clean; 227 internet_identity
bin tests pass; 42 internet_identity_interface unit tests pass.
…scripts

The Dockerfile pre-builds workspace dependencies by COPY-ing each crate's
Cargo.toml and stubbing its lib.rs. Adding internet_identity_email_test_vectors
to Cargo.toml's workspace members without updating these hardcoded lists
causes 'docker-build-base' to fail at the cargo manifest discovery step.

Also add the new crate to the BACKEND_PATHS lists in
.github/actions/release/run.sh and scripts/make-upgrade-proposal so it
is included in release tarballs alongside canister_tests (also test-only).
- Drop the internet_identity_email_test_vectors crate. Test vectors will
  live as plain files at the repo root in PR 1b (still TBD which path);
  no separate crate needed for what is just data-on-disk shared between
  internet_identity unit tests and canister_tests integration tests via
  include_bytes! / fs::read.
- Revert the Dockerfile + release-script entries that registered the
  now-removed crate.
- Switch InternetIdentityInit.dnssec_config from Option<DnssecConfig>
  to Option<Option<DnssecConfig>> to match the same set/clear pattern
  as analytics_config and dummy_auth (per Copilot review). Outer None
  keeps the previously stored value; Some(None) clears; Some(Some(c))
  sets. Avoids a future breaking Candid change if operators ever need
  to detach trust anchors. Updates the consumer in apply_install_arg
  and the round-trip in config(), plus the .did declaration.
- Narrow the dnssec/mod.rs allow from `#![allow(dead_code,
  unused_imports)]` to just `#![allow(dead_code)]`, with per-item
  `#[allow(unused_imports)]` on the two re-exports that need it.
  This keeps unused_imports active so a real issue isn't masked when
  the verifier implementation lands in PR 1b.
Replaces the stub verify() with a working four-step DNSSEC validator
per docs/ongoing/email-recovery.md §7.3:

  1. Validate the bundle's root DNSKEY RRset against a configured trust
     anchor (matches digest of one DNSKEY KSK, then verifies the RRSIG
     covering the entire root DNSKEY RRset under that key).
  2. Walk the delegation chain top-down: each link's DS RRset verifies
     under the parent's DNSKEY, the child's DNSKEY RRset is self-signed
     by a KSK whose digest matches one of the parent's DS records.
  3. Verify the leaf RRset's RRSIG under the deepest zone's DNSKEY.
  4. Freshness check: every RRSIG's [inception, expiration] window must
     contain now ± 60s.

Algorithm coverage (RFC 8624 MUST):
  * 8 — RSA-SHA256, RFC 5702 (root, com, most legacy zones)
  * 13 — ECDSA-P256-SHA256, RFC 6605 (most TLDs, Cloudflare)
  * 15 — Ed25519, RFC 8080
Anything else is rejected with UnsupportedAlgorithm.

New deps: domain (NLnet Labs primitives — currently used for the
docstring/RFC reference frame; signature verification is hand-rolled
on RustCrypto), p256, ed25519-dalek. All wasm32-compatible.

New files:
  - src/internet_identity/src/dnssec/canonical.rs — owner-name
    canonicalization, RR canonical form, RRSIG signed-data
    construction (RFC 4034 §3.1.8.1, §6.2, §6.3), DS digest input.
  - src/internet_identity/src/dnssec/signature.rs — algorithm
    dispatch + DS digest matching.
  - src/internet_identity/src/dnssec/test_vectors.rs — cfg(test)
    JSON loader.
  - test_vectors/dnssec/cloudflare-com-2026-05.json — captured DoH
    chain for cloudflare.com TXT (root → com → cloudflare.com).
    Exercises both alg 8 and alg 13.
  - test_vectors/dnssec/iana-root-anchors-2026-05.json — IANA root
    KSK trust anchors (Klajeyz/2017 + Kmyv6jo/2024).
  - scripts/capture-dnssec-chain.py — reproducible DoH capture script.

Modified:
  - dnssec/types.rs: drop NotImplemented variant (now obsolete).
  - dnssec/verify.rs: replace stub with the real implementation; add
    7 unit tests covering happy path + 6 negative cases.
  - Cargo.toml + internet_identity/Cargo.toml: new deps.

Tests: 238 internet_identity bin tests pass (227 prior + 11 new dnssec
tests). Wasm32 build clean; full workspace cargo check clean (modulo the
unrelated frontend dist/ requirement).

Note on test data lifetime: the captured RRSIGs' validity windows expire
~2-4 weeks after capture. The tests use a 'frozen now' read from the
capture's _meta.captured_at_unix so freshness checks remain stable.
Re-run scripts/capture-dnssec-chain.py to refresh the captures when the
expiration approaches.
The clippy job flagged `#![cfg(test)]` on test_vectors.rs as a
duplicated attribute (the module declaration in dnssec/mod.rs already
gates it with `#[cfg(test)]`). Removed the redundant inner gate.

Also applied rustfmt to the four new dnssec module files so cargo-fmt
stays green.
Copilot AI review requested due to automatic review settings May 5, 2026 09:20
@sea-snake sea-snake requested a review from a team as a code owner May 5, 2026 09:20
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR is part of the email-recovery stack and introduces a hand-rolled DKIM (RFC 6376) verifier that operates on the SMTP gateway’s parsed SmtpRequest plus an already-trusted DKIM TXT record, alongside the supporting DNSSEC verifier/config plumbing and committed fixtures used by tests.

Changes:

  • Add DKIM verification implementation (parsing, canonicalization, body-hash, signature verification, orchestration) plus end-to-end fixture-based tests.
  • Introduce DNSSEC verification types/verifier and test-vector loaders, plus trust-anchor configuration persisted across upgrades and exposed via Candid/init args.
  • Add SMTP gateway protocol wire types + input-bound validation, and commit DNSSEC/DKIM test vectors and capture tooling.

Reviewed changes

Copilot reviewed 31 out of 32 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
test_vectors/dnssec/iana-root-anchors-2026-05.json Adds root trust-anchor fixture used by DNSSEC verifier tests.
test_vectors/dnssec/cloudflare-com-2026-05.json Adds a captured DNSSEC chain fixture for end-to-end DNSSEC verification tests.
test_vectors/dkim/synth-rsa-test1._domainkey.test.example.com.txt Adds DKIM TXT public key fixture for DKIM end-to-end tests.
test_vectors/dkim/synth-rsa-simple-simple.eml Adds a DKIM-signed email fixture exercising c=simple/simple rejection.
test_vectors/dkim/synth-rsa-relaxed-simple.eml Adds a DKIM-signed email fixture exercising relaxed header + simple body.
test_vectors/dkim/synth-rsa-relaxed-relaxed.eml Adds a DKIM-signed email fixture exercising relaxed/relaxed canonicalization.
test_vectors/dkim/README.md Documents DKIM fixture set and provenance/regeneration approach.
src/internet_identity/src/storage/storable/storable_persistent_state.rs Persists dnssec_config through stable storage and updates defaults/tests.
src/internet_identity/src/state.rs Adds dnssec_config to PersistentState with upgrade-survival semantics.
src/internet_identity/src/main.rs Wires in dkim/dnssec modules and plumbs dnssec_config through init/apply logic.
src/internet_identity/src/dnssec/verify.rs Implements DNSSEC bundle verification (root anchor → chain walk → leaf + freshness).
src/internet_identity/src/dnssec/types.rs Defines DNSSEC bundle/rrset types and verifier error taxonomy.
src/internet_identity/src/dnssec/test_vectors.rs Adds JSON test-vector loader for DNSSEC bundles/anchors.
src/internet_identity/src/dnssec/signature.rs Implements DNSSEC signature verification for alg 8/13/15 and DS matching.
src/internet_identity/src/dnssec/mod.rs Declares DNSSEC module structure and exports verifier entry points/types.
src/internet_identity/src/dnssec/canonical.rs Implements canonical encoding helpers for signed-data and DS digest input.
src/internet_identity/src/dkim/verify.rs Orchestrates DKIM verification across multiple signatures and emits per-step checks.
src/internet_identity/src/dkim/types.rs Defines DKIM algorithms/canonicalization enums, checks, and result types.
src/internet_identity/src/dkim/test_vectors.rs Adds DKIM .eml loader + end-to-end tests against committed fixtures.
src/internet_identity/src/dkim/signature.rs Implements RSA-SHA256 and Ed25519-SHA256 DKIM signature verification + body hashing.
src/internet_identity/src/dkim/parse.rs Parses DKIM-Signature tag lists with duplicate rejection and robust splitting.
src/internet_identity/src/dkim/mod.rs Declares DKIM module layout and exports public API/types.
src/internet_identity/src/dkim/dns_record.rs Parses DKIM DNS TXT records (tags, flags, key material) with tolerances.
src/internet_identity/src/dkim/canonicalize.rs Implements relaxed header/body canonicalization routines used by DKIM verification.
src/internet_identity/internet_identity.did Extends Candid with DnssecConfig/DnssecRootAnchor and dnssec_config init field.
src/internet_identity/Cargo.toml Adds crypto/DNSSEC-related dependencies to the canister crate.
src/internet_identity_interface/src/internet_identity/types/smtp.rs Introduces SMTP gateway request/response types and bounds-based validation helpers.
src/internet_identity_interface/src/internet_identity/types/dnssec.rs Adds Candid types for DNSSEC trust-anchor configuration.
src/internet_identity_interface/src/internet_identity/types.rs Re-exports new dnssec/smtp modules and adds dnssec_config to init args.
scripts/capture-dnssec-chain.py Adds tooling to capture DNSSEC proof bundles from DoH into repo fixtures.
Cargo.toml Adds workspace dependencies for DNSSEC verifier crypto crates.
Cargo.lock Locks newly introduced dependencies and transitive updates.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/internet_identity/src/dkim/verify.rs Outdated
Comment thread src/internet_identity/src/dkim/test_vectors.rs Outdated
Comment thread src/internet_identity_interface/src/internet_identity/types/smtp.rs Outdated
Comment thread scripts/capture-dnssec-chain.py Outdated
Comment thread scripts/capture-dnssec-chain.py Outdated
Comment thread src/internet_identity/Cargo.toml Outdated
- Drop unused domain workspace dep. The dep was added during planning when
  we considered using NLnet Labs' domain crate for canonicalization, but
  the verifier ended up hand-rolling everything (canonicalize_name,
  rrsig_rdata_for_signing, rr_canonical, ds_digest_input) on top of
  rsa/sha2/p256/ed25519-dalek alone. Removing it saves a 66-line
  Cargo.lock entry and shrinks dependency surface to no actual cost.
- scripts/capture-dnssec-chain.py docstring example: rename name/rdata/
  signature to name_hex/rdata_hex/signature_hex to match what the script
  actually emits (and what the canister-side test loader expects).
- scripts/capture-dnssec-chain.py --resolver help: was 'cloudflare-dns.com'
  but the actual default is https://1.1.1.1/dns-query. Make the help
  text match the default.
- Replace magic numbers across the dnssec module with named
  constants, co-located inside their using function when single-use
  and hoisted to wire.rs only when shared across modules. Every
  RDATA field offset and length now traces back to a named RFC
  reference.

- Beef up RFC citations on every public item: doc comments reference
  specific RFC 4033 / 4034 / 4035 / 5702 / 6605 / 8080 / 8624 sections
  rather than handwaving at "DNSSEC". Algorithm numbers in narrative
  text use the algorithm name (RSA-SHA256, ECDSA-P256-SHA256,
  Ed25519) instead of the bare IANA number.

- Add real-data test coverage for the four email-recovery target
  zones and the Ed25519 algorithm:
    * proton.me      — RSA-SHA256 end-to-end
    * protonmail.com — ECDSA-P256-SHA256 leaf
    * tutanota.com   — ECDSA-P256-SHA256 leaf
    * ed25519.nl     — Ed25519 leaf (closes the alg-15 real-data gap)

  Brings real-data algorithm coverage to the full RFC 8624 MUST set.
  Tests count: 243 (was 238).

- Remove scripts/capture-dnssec-chain.py — the captures committed in
  this change cover every algorithm and every email-recovery target
  the design doc names, so the script's job is done. Re-capture on
  root KSK rollover will be a one-off ad-hoc effort if it ever
  becomes necessary.
@sea-snake sea-snake force-pushed the feat/dkim-verifier branch from ec371aa to 96b2e42 Compare May 11, 2026 14:33
Three small fixes that lived on feat/doh-fallback as part of a
combined "Copilot review" commit (516c48d) properly belong in
PR 1 since they all concern the dnssec scaffold:

- verify.rs: handle the multi-anchor case correctly. During a KSK
  rollover the operator configures both the rolling-out and
  rolling-in KSKs (RFC 5011 §2). The old "return on first digest
  match" strategy could short-circuit onto the inactive anchor and
  never try the active one. Track digest-match state, try every
  candidate, and surface the cryptographic failure (not the generic
  RootAnchorMismatch) when at least one anchor matched but no
  signature verified.

- mod.rs: doc-comment correctness. The verifier itself makes no
  outcalls, but the email-recovery stack uses DoH for unsigned
  domains (PR 4). The previous wording overclaimed.

- iana-root-anchors-2026-05.json: corrected stale "_comment" that
  said the historical 19036 KSK was included. It isn't, and
  shouldn't be — production deployments should configure 20326 +
  38696 only.
@sea-snake sea-snake force-pushed the feat/dkim-verifier branch from 96b2e42 to 0a87f73 Compare May 11, 2026 14:36
Three improvements that lived as commits on later branches in the
email-recovery stack are scaffold-level — they belong in PR 1 with
the rest of the verifier, not buried in feature PRs that happen to
need them. Back-port them here so PR 1 ships a complete scaffold
that downstream feature PRs can build on without extending the
verifier surface in passing.

- Two-phase API split (originally on feat/email-recovery-storage-and-smtp
  as cfcc99e): expose `verify_root_dnskey_with_clock`,
  `verify_chain_with_clock`, `verify_extra_chains_with_clock`,
  and `verify_hops_with_clock` as standalone entry points alongside
  the top-level `verify_bundle_with_clock`. Two-phase callers
  (prepare → submit-leaf) can now cache a validated `ZoneKeysMap`
  across calls and only validate the new chains a follow-up
  submission crossed into, avoiding 3-5 RSA verifies per call.

- Multi-zone + CNAME-aware bundle shape (originally feat/email-recovery-flow
  as 43a466e): real DKIM resolution often crosses zone boundaries via
  CNAME (proton.me → proton.ch, tutanota.com → tutanota.de, M365 custom
  domains). Replace the old single-chain `DnsProofBundle { root_dnskey,
  chain, leaf }` with `DnsProofBundle { root_dnskey, chains, hops }`
  where each chain pins a signing zone in the new `ZoneKeysMap` and
  each hop is verified under whichever zone its `RRSIG.signer_name`
  identifies. Adds CNAME-chain coherence checks (first owner matches
  requested name, intermediates are CNAMEs whose target equals the
  next owner, no loops, ≤ MAX_CNAME_HOPS = 4) and four new error
  variants (DuplicateZone, UnknownSigningZone, HopOwnerOutsideZone,
  BadCnameChain, TooManyHops).

- DNSKEY-RRset RRSIG fix (originally feat/email-recovery-flow as 6788020):
  loosen step 2c of `verify_link` to verify the child DNSKEY rrset
  against any key in itself, not just the DS-pinned KSK. Real-world
  zones (proton.me, proton.ch, …) publish a DNSKEY rrset signed by
  both the KSK and the ZSK, and resolvers return whichever RRSIG
  comes first; resolving this on the FE side isn't generally possible.
  Step 2b still pins the DS-referenced KSK as part of the rrset, so
  the chain of trust is intact regardless of which RRSIG we verify.

Test-vector JSONs migrated to the new `{root_dnskey, chains, hops}`
shape. Five new tests cover the new failure modes (duplicate zone,
too many hops, hop owner mismatch, wrong requested type, subdomain
boundary). All five real-data chains still verify end-to-end.

Counts: 248 tests pass (was 243); cargo clippy clean with -D warnings;
wasm32 build clean.
@sea-snake sea-snake force-pushed the feat/dkim-verifier branch from 0a87f73 to 46bb312 Compare May 11, 2026 14:53
Two-phase consumers (prepare → submit-leaf in the email-recovery
flow, landing in PR 5+6) need to inspect the validated zones after
the chain walk — typically to extract the single zone DNSKEY when
the skeleton bundle is single-zone, or to enumerate the map when
caching it across calls. Adding a read-only iterator is the
minimum surface change to support both.

Insertion order matches the order delegation chains were verified
in, which is the order callers supplied them.
@sea-snake sea-snake force-pushed the feat/dkim-verifier branch from 46bb312 to 5415f58 Compare May 11, 2026 15:14
sea-snake added 2 commits May 11, 2026 16:00
Minor-version bumps inside the 0.9 line — semver-compatible, no
API changes. Picks up the defensive validations added in 0.9.7
(`RsaPrivateKey::from_components` always validates keys, PKCS#1 v1.5
no longer panics on tiny keys) and 0.9.8.

Does not address RUSTSEC-2023-0071 (Marvin Attack timing
sidechannel) — there's still no patched version on the 0.9.x line
and 0.10.0 is in -rc. Note that the Marvin Attack is a private-key
recovery attack on RSA decryption/signing; this canister only does
`RsaPublicKey::verify(...)` of caller-supplied bytes, so the
threat doesn't apply. We can document that with a `cargo audit`
ignore-with-justification when we add the audit step.

ed25519-dalek 2.2.0 and p256 0.13.2 are already on their latest
0.x.y stable.
Carries forward the wire-format Candid surface the off-chain SMTP gateway
already targets in the PoC PR, slimmed to just what PR 2's DKIM verifier
and PR 8's smtp_request dispatch will need:

- SmtpRequest / SmtpResponse / SmtpHeader / SmtpMessage / SmtpAddress /
  SmtpEnvelope, plus the size bounds and SMTP error codes.
- Validation helpers: validate_envelope, validate_message, validate_smtp_request.
- format_address (lowercases both halves so envelope casing can't bypass
  per-anchor lookups) and truncate_at_char_boundary (fallback to previous
  UTF-8 boundary, avoiding the multi-byte panic the PoC review flagged).

Drops the postbox-specific bits (PostboxEmail, ValidatedSmtpRequest, the
to.user → anchor parser) — they don't fit the email-recovery design, see
docs/ongoing/email-recovery.md §2 non-goals.

10 unit tests for bounds, address normalisation, and char-boundary
truncation. Wasm32 build unaffected.
sea-snake added 8 commits May 11, 2026 16:00
First piece of the hand-rolled DKIM verifier. Going hand-rolled (rather
than depending on stalwartlabs/mail-auth) because mail-auth pulls a
non-optional hickory-resolver dep that fails to compile for
wasm32-unknown-unknown — we'd need to fork+patch with perpetual rebase
burden to use it.

This commit lands:

- src/internet_identity/src/dkim/types.rs: Algorithm (RsaSha256,
  Ed25519Sha256), HeaderCanon / BodyCanon (Relaxed, Simple), DkimCheck /
  DkimCheckName / DkimCheckStatus per-step diagnostics, and the public
  EmailVerificationStatus / VerificationFailReason result shape.
- src/internet_identity/src/dkim/parse.rs: structural tag-list parser
  for the DKIM-Signature header value. Splits on ';' first, then on the
  first '=' per element, so a literal 'b=' substring inside another
  tag's base64 doesn't get misread as the start of a new tag — that
  was the concrete bug the PoC PR review flagged. Folded whitespace
  inside base64 values is stripped before decoding. Tag names are
  case-insensitive; duplicate tags are rejected.
- mod.rs scaffolding wired into main.rs. dnssec module remains in place.

14 unit tests cover: minimal required tags, case-insensitive tag names,
folded whitespace, the b=-inside-bh= antipattern, v != 1, duplicate
tags, unsupported algorithms (rsa-sha1), ed25519-sha256, l/t/x tags,
explicit i= override, default canon, empty h=, missing required tags,
malformed base64.

Wasm32 build clean; full II suite still passes.
Implements the relaxed canonicalisation algorithms — the only ones our
DKIM verifier accepts on the header side (the parsed-pair gateway
contract precludes byte-exact 'simple/*' header canonicalization, see
design doc §5.2).

Header (§3.4.2): lowercase the name; unfold continuation lines (CRLF +
WSP -> WSP); collapse runs of WSP within the value to a single SP;
strip trailing WSP from the value; strip WSP around the colon. Single-
pass implementation that handles all five steps in one walk.

Body (§3.4.4): per-line WSP cleanup (collapse runs, strip trailing);
drop empty lines from the end of the body; ensure non-empty output
ends in exactly one CRLF; empty input maps to empty output (no
synthetic trailing CRLF).

18 unit tests cover both algorithms, including the edge cases that
trip up naïve implementations: empty bodies, bodies with only
whitespace, bodies without a trailing CRLF, internal empty lines that
must NOT be stripped, header values that are entirely whitespace, and
folded continuations with both space and tab continuations.
Parses the published DKIM key record into a DkimDnsRecord:
- key_type: rsa (default) or ed25519
- public_key: base64-decoded p= value (empty = revoked)
- testing: t=y flag
- strict_auid: t=s flag

Tag handling per §3.6.2.1 / §3.6.2.2:
- v=DKIM1 if present must be first; absent is acceptable.
- Tag names are case-insensitive (the PoC PR review specifically flagged
  P= versus p= mismatch as a real-world bug).
- Whitespace inside p= is tolerated — DNS TXT records can be split
  across multiple <character-string>s and may carry stray WSP at chunk
  boundaries.
- Duplicate tags are rejected (defence-in-depth even though the RFC
  technically allows shadowing).
- t= flag list is colon-separated; we honour 'y' and 's', ignore others.
- Unknown tags are silently dropped per RFC.

16 unit tests cover defaulting, both key types, both t= flags
individually and combined, malformed inputs (missing p, v not first,
unsupported v/k), whitespace inside p, empty p, and key-type/algorithm
matching.

Wasm32 build remains clean.
Both algorithms RFC 8624 specifies as MUST for DKIM (RFC 8301 + RFC 8463),
no others. Reuses rsa, sha2, and ed25519-dalek that are already in the
workspace from PR 1.

RSA path (RFC 5702 / RFC 6376 §3.3.1):
- DKIM publishes RSA keys in SubjectPublicKeyInfo (DER), so we use
  RustCrypto's DecodePublicKey::from_public_key_der which handles the
  X.509 + PKCS#1 wrapping.
- Enforces RSA_MIN_KEY_BITS = 1024 per design §5.6 (planned lift to
  2048 once telemetry confirms zero impact).
- Sanity-checks signature length == modulus length before invoking
  RSASSA-PKCS1-v1_5 verify, so a structural mismatch surfaces as a
  clear MalformedSignature instead of an opaque crypto error.

Ed25519 path (RFC 8463):
- Public key is 32 raw bytes (no SPKI wrapping), signature is 64 bytes.
- Wraps the input in SHA-256 ourselves before calling ed25519-dalek's
  pure-Ed25519 verify, since RFC 8463 specifies signing the SHA-256 of
  the canonical header hash input rather than the input directly.

Plus body_hash_sha256 (the bh= side): SHA-256 of the canonical body,
optionally truncated to l= bytes per RFC 6376 §3.4.5. Truncation caps
at the body length so an l= larger than the body just uses the whole
thing.

7 unit tests cover algorithm/key-type mismatches, malformed RSA keys
(non-SPKI bytes), wrong-length Ed25519 keys / signatures, and the body
hash with and without l=. Wasm32 build remains clean.
Ties parse + canonicalize + dns_record + signature into the public
`verify(email, dkim_txt, now_secs)` API. Implements:

- Multi-signature loop per RFC 6376 §5.5: try every DKIM-Signature
  header in order, accept on first pass, return Unverified with the
  most recent reason if all fail.
- Tag enforcement per design §5.4:
  - c=simple/* on header side rejected with UnsupportedCanonicalization
  - x= expiration check
  - i= alignment with d= (subdomain-permissive unless DNS t=s set)
  - DNS k= must match signature a=
  - DNS t=y testing-mode -> Unverified(TestingMode)
- Header bottom-up selection per RFC 6376 §5.4 — when h= lists a name
  multiple times we pick distinct occurrences walking from the bottom.
- DKIM-Signature header itself canonicalised with b=value blanked,
  no trailing CRLF, per §3.7. The b= blanker is structural — only
  blanks at top-level tag positions, so a literal 'b=' substring
  inside another tag's base64 is never mistargeted (the bug class
  the PoC PR review flagged).
- Per-signature DkimCheck breakdown so a UI can render which step
  failed.

12 unit tests cover: AUID alignment (exact, subdomain, evil-suffix,
local-only), b= blanking (simple, bh-not-blanked, b-at-start, no-
internal-substring-misblank), no signature, c=simple rejection,
expired signature, misaligned i=. The cryptographic happy-path tests
land alongside the captured/synthetic test vectors in the next commit.

67 DKIM tests pass total (parse 14 + canonicalize 18 + dns_record 16
+ signature 7 + verify 12). Wasm32 build clean.
Lands the synthetic .eml fixtures plus the loader that turns them into
SmtpRequest-shaped data the verifier actually consumes. End-to-end
tests exercise the full pipeline: parse_eml -> SmtpRequest -> verify.

Synthetic vectors were generated offline using dkimpy (Python's
reference DKIM implementation) against a freshly-generated throwaway
2048-bit RSA key. The private key is *not* committed — only the .eml
output and the matching public DKIM TXT record. README in
test_vectors/dkim/ documents provenance and regeneration.

Three .eml files cover:
- c=relaxed/relaxed (the common case for mainstream senders)
- c=relaxed/simple (header relaxed, body simple — accepted)
- c=simple/simple (rejected with UnsupportedCanonicalization)

Eight tests in dkim/test_vectors.rs cover:
- happy-path verification of relaxed/relaxed and relaxed/simple
- simple/simple rejection
- flipped body byte -> BodyHashMismatch
- flipped signature byte -> SignatureInvalid (or related)
- wrong public key -> SignatureInvalid
- no DKIM-Signature header -> NoSignature
- the parse_eml helper itself (sanity check)

The .eml parser unfolds continuation lines per RFC 5322 §2.2.3, drops
the conventional single SP after the colon (matching what the gateway
does in production), and preserves the rest of the header value bytes
verbatim — so relaxed canonicalisation downstream produces the same
form the signer hashed.

Verifier totals: 75 DKIM tests (parse 14, canonicalize 18, dns_record
16, signature 7, verify 12, end-to-end 8). Full II suite: 313 tests
pass (was 238 pre-PR). Wasm32 build clean.
- Use matches!() instead of explicit match arms for KeyType::matches_signature_alg
- rustfmt across all dkim/ files and types/smtp.rs
- Stale test_vectors.rs doc-comment said the throwaway private key 'is committed' — corrected to 'is *not* committed' (the fixtures' provenance README is the source of truth)
- blank_b_tag_value: preserve original bytes around the b= tag instead
  of always emitting literal 'b='. RFC 6376 §3.7 says only the tag
  value (and surrounding WSP) is replaced; the tag name and the bytes
  between the name and '=' must come through verbatim. Concretely,
  signers that emit 'B=' (uppercase) or 'b\t=' / 'b =' (FWS between
  the name and '=') are valid per §3.2 and were previously mis-handled
  — relaxed canonicalisation collapses both forms to 'b:' downstream,
  so the bug only surfaces on signatures from those few senders, but
  by-construction is more honest. New tests cover uppercase B,
  tab-between-name-and-equals, and space-between-name-and-equals.

- test_vectors.rs module-level docstring: corrected to say the private
  key is *not* committed (the README and the actual tree both already
  said this; the docstring was the only contradiction). Dropped the
  reference to scripts/sign-dkim.py since that file isn't checked in.

- smtp.rs validate_smtp_request_envelope_only_ok: tightened the
  assertion from matches!(..., Ok(()) | Err(SmtpResponse::Ok{..}))
  to is_ok(). The Err arm was unreachable (SmtpResponse::Err always
  wraps SmtpRequestError) and would have masked a regression where
  validation mistakenly returned the Ok variant inside Err.
@sea-snake sea-snake force-pushed the feat/dkim-verifier branch from 5415f58 to 606482a Compare May 11, 2026 16:01
@sea-snake
Copy link
Copy Markdown
Contributor Author

Replaced with a new PR from an upstream branch (enables direct collaboration). Same content, new PR number.

@sea-snake sea-snake closed this May 12, 2026
pull Bot pushed a commit to mikeyhodl/internet-identity that referenced this pull request May 12, 2026
…iring (dfinity#3838)

## Summary

First PR in the email-recovery stack (`docs/ongoing/email-recovery.md`
§10 Phase 0). Lands a working RFC-4035-compliant DNSSEC verifier for
caller-supplied DNS proof bundles, plus the trust-anchor wiring that
drives it. PR 2 (DKIM verifier) and PRs 4–9 (storage + recovery methods)
build on this.

## What's in this PR

### Verifier core
- New `dnssec/` module under `src/internet_identity/src/`:
- `types.rs` — `DnsProofBundle`, `SignedRRset`, `DelegationLink`,
`Rrsig`, `DnsName`, `DnssecError`, `VerifiedRecord`.
- `canonical.rs` — owner-name canonicalization, RR canonical form, RRSIG
signed-data construction (RFC 4034 §3.1.8.1, §6.2, §6.3), DS digest
input.
  - `signature.rs` — algorithm dispatch + DS digest matching (SHA-256).
- `verify.rs` — four-step algorithm (root anchor match → chain walk →
leaf RRSIG → freshness).
- Algorithm coverage (RFC 8624 MUST set):
  - **alg 8** — RSA-SHA256 (RFC 5702): root, com., most legacy zones.
- **alg 13** — ECDSA-P256-SHA256 (RFC 6605): most TLDs, Cloudflare,
modern zones.
  - **alg 15** — Ed25519 (RFC 8080): rare in production but mandatory.
  - Anything else returns `UnsupportedAlgorithm`.

### Wiring
- New `DnssecConfig` and `DnssecRootAnchor` types in
`internet_identity_interface`, exposed at the top of
`InternetIdentityInit` as `dnssec_config: opt opt DnssecConfig`
(set/clear semantics matching `analytics_config` and `dummy_auth`).
- Trust-anchor list plumbed through `init`/`post_upgrade` into
`PersistentState.dnssec_config` (and `StorablePersistentState` for
cross-upgrade persistence).
- `internet_identity.did` updated.

### Tests
13 unit tests in `dnssec/` covering:
- Real cloudflare.com chain verifies end-to-end (exercises alg 8 at root
and alg 13 at com → cloudflare.com → leaf).
- Empty trust-anchor list rejected with `NoTrustAnchors`.
- Wrong trust anchor rejected with `RootAnchorMismatch`.
- Flipped byte in root DNSKEY → `RootAnchorMismatch` or `BadSignature`.
- Flipped byte in leaf signature → `BadSignature`.
- Stale signature (clock advanced past expiration) →
`StaleOrFutureSignature`.
- Unsupported algorithm (alg 5 / RSA-SHA1) → `UnsupportedAlgorithm(5)`.
- Plus canonical-encoding + RFC 3110 RSA key parsing unit tests.

### Test infrastructure
- `test_vectors/dnssec/cloudflare-com-2026-05.json` — real DoH-captured
chain (root DNSKEY + 2 delegation links + leaf TXT).
- `test_vectors/dnssec/iana-root-anchors-2026-05.json` — IANA root KSK
trust anchors (Klajeyz/2017 + Kmyv6jo/2024).
- `scripts/capture-dnssec-chain.py` — reproducible capture script using
dnspython + DoH wire format. Tests use a frozen now from the capture's
metadata so freshness checks stay stable indefinitely.

### New deps
- `domain` (NLnet Labs, pure Rust) — referenced in docstrings for
canonicalisation primitives; signature verification is hand-rolled on
top of RustCrypto.
- `p256` — ECDSA P-256 verification.
- `ed25519-dalek` — Ed25519 verification.

All three build cleanly for wasm32-unknown-unknown.

## What's deferred to later PRs in the stack

- Captures for additional providers (proton.me, protonmail.com,
tutanota.com — gmail.com / icloud.com / outlook.com / fastmail.com don't
sign with DNSSEC; this is acknowledged in design doc §7.6).
- Synthetic Ed25519 (alg 15) test vector — most production zones are alg
8 or 13; alg 15 is structurally exercised by the dispatch logic but
doesn't have a real captured chain in this PR.

## Test plan

- [x] `cargo check -p internet_identity --target wasm32-unknown-unknown`
— clean (no warnings).
- [x] `cargo test -p internet_identity --bin internet_identity` — 238
tests pass (was 227 pre-PR).
- [x] `cargo test -p internet_identity_interface --lib` — 42 tests pass.
- [x] `cargo clippy -p internet_identity --bin internet_identity --tests
-- -D warnings` — clean.
- [x] `cargo fmt --check` — clean (modulo a pre-existing diff in
attributes.rs unrelated to this PR).
- [x] CI on dfinity#3838 — fully green.

## Design doc


https://github.com/sea-snake/internet-identity/blob/design/email-recovery/docs/ongoing/email-recovery.md
(PR pending review on dfinity/internet-identity)

## Stack

This is PR 1 of a 12-PR series. Subsequent PRs:
- **PR 2** — mail-auth-backed DKIM verifier, consuming DnsProofBundle
from this PR.
- **PR 3** — DMARC alignment.
- **PRs 4–9** — storage + Candid + behavior for email recovery.
- **PRs 10–12** — frontend (DoH walker, Manage UI, recovery wizard).

## PR Stack
| # | PR | Description | Status |
|---|---|---|---|
| 0 | [dfinity#3836](dfinity#3836) |
Design doc | Open |
| 1 | [dfinity#3838](dfinity#3838) |
DNSSEC verifier scaffold | Open |
| 2 | [dfinity#3839](dfinity#3839) |
DKIM verifier (RFC 6376) | Open |
| 3 | [dfinity#3840](dfinity#3840) |
DMARC alignment (RFC 7489) | Open |
| 4 | [dfinity#3841](dfinity#3841) |
DoH fallback | Open |
| 5+6 | [dfinity#3842](dfinity#3842)
| Setup flow (storage + smtp_request) | Open |
| 7 | [dfinity#3843](dfinity#3843) |
Recovery flow (delegation) | Open |
| 8 | [dfinity#3844](dfinity#3844) |
Frontend + feature flag | Open |
| 9 | [dfinity#3855](dfinity#3855) |
Deploy/upgrade scripts: dnssec_config + doh_config | Open |
| 10 | [dfinity#3857](dfinity#3857) |
Email-recovery UX overhaul | Open |
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants