
Migrate IDNA canonicalization from stdlib (IDNA 2003) to PyPI idna (IDNA 2008) across signing/ #777

@bokelley


Context

ADCP #3690 security.mdx line 1132 mandates the IDNA 2008 A-label form for host canonicalization in origin comparisons. The package currently uses the stdlib codec, host.encode("idna"), which implements IDNA 2003, in four places:

The commit body of #775 acknowledged this divergence and deliberately matched the existing convention rather than fragmenting the canonicalization story across modules. This issue tracks the package-wide migration.

Practical impact (today)

IDNA 2003 vs IDNA 2008 differ on a small set of code points:

  • ß (German sharp s) — IDNA 2003 maps to ss; IDNA 2008 preserves as xn--strae-oqa.
  • Final sigma ς — IDNA 2003 maps to σ; IDNA 2008 preserves.
  • ZWJ / ZWNJ — IDNA 2003 strips; IDNA 2008 may preserve in valid contexts.
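A minimal sketch of the first two differences, assuming the PyPI idna package is installed (pip install idna). The wrapper names canon_idna2003 and canon_idna2008 are hypothetical and only label which standard each call follows; the Greek example is written with escapes so the final sigma (U+03C2) and ordinary sigma (U+03C3) cannot be confused:

```python
import idna  # PyPI package (pip install idna); not part of the stdlib


def canon_idna2003(host: str) -> bytes:
    """Hypothetical wrapper around the stdlib "idna" codec (IDNA 2003)."""
    return host.encode("idna")


def canon_idna2008(host: str) -> bytes:
    """Hypothetical wrapper around the idna package (IDNA 2008 + UTS #46)."""
    return idna.encode(host, uts46=True)


# IDNA 2003 maps "ß" to "ss"; IDNA 2008 keeps "ß" and Punycode-encodes the label.
print(canon_idna2003("straße.de"))  # b'strasse.de'
print(canon_idna2008("straße.de"))  # b'xn--strae-oqa.de'

# "βόλος.com" ends in final sigma; "βόλοσ.com" ends in ordinary sigma.
final_sigma = "\u03b2\u03cc\u03bb\u03bf\u03c2.com"
plain_sigma = "\u03b2\u03cc\u03bb\u03bf\u03c3.com"

# IDNA 2003 folds the final sigma to an ordinary sigma, so both names collapse.
print(canon_idna2003(final_sigma) == canon_idna2003(plain_sigma))  # True
# IDNA 2008 (non-transitional, the idna package default) keeps them distinct.
print(canon_idna2008(final_sigma) == canon_idna2008(plain_sigma))  # False
```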

For ad-tech operator domains the practical risk is near zero. The real exposure is cross-SDK conformance: a peer SDK using the idna package would canonicalize the same wire bytes differently. This package maps straße.de to strasse.de, while the peer maps it to xn--strae-oqa.de, so the two SDKs would disagree on whether straße.de and xn--strae-oqa.de are the same origin.
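The disagreement can be reproduced with a host-equality check written once per canonicalizer. This is a sketch that assumes the idna package is installed; the function names are hypothetical, and a full origin comparison would also compare scheme and port, which are left out here:

```python
import idna  # PyPI package (pip install idna)


def same_host_idna2003(a: str, b: str) -> bool:
    # stdlib codec: all-ASCII input passes through unchanged, and
    # non-ASCII labels are nameprep'd (ß -> ss) before Punycode encoding.
    return a.encode("idna") == b.encode("idna")


def same_host_idna2008(a: str, b: str) -> bool:
    # idna package: UTS #46 mapping, then IDNA 2008 A-label encoding.
    return idna.encode(a, uts46=True) == idna.encode(b, uts46=True)


pairs = [("straße.de", "xn--strae-oqa.de"), ("straße.de", "strasse.de")]
for a, b in pairs:
    print(f"{a} vs {b}: 2003={same_host_idna2003(a, b)} 2008={same_host_idna2008(a, b)}")
# straße.de vs xn--strae-oqa.de: 2003=False 2008=True
# straße.de vs strasse.de: 2003=True 2008=False
```

Both pairs flip, which is exactly the cross-SDK conformance gap: neither SDK is internally inconsistent, but they answer the same-origin question differently for identical inputs.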

What to change

  1. Add idna to pyproject.toml [project.dependencies].
  2. Replace host.encode("idna") with idna.encode(host, uts46=True) at all four callsites in a single commit.
  3. Update the canonicalization rationale comment in each module. The four comments currently cross-reference one another, so the migration must update all of them.
  4. Add a regression test asserting that straße.de canonicalizes to xn--strae-oqa.de (IDNA 2008 behavior), not strasse.de (IDNA 2003 behavior). This verifies that the migration actually changed behavior.
  5. Run the storyboard + conformance suites to confirm no regressions in the existing IDNA paths.
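Steps 2 and 4 might look like the following sketch. The canonicalize_host helper and the test names are hypothetical (the issue does not say whether the four callsites share a helper), and the tests are plain pytest-style functions. idna.encode returns bytes, as host.encode("idna") does, so the call is a type-compatible replacement:

```python
import idna  # step 1: listed under [project.dependencies] in pyproject.toml


def canonicalize_host(host: str) -> bytes:
    """Return the IDNA 2008 A-label form of host (hypothetical helper name).

    uts46=True applies the UTS #46 mapping (e.g. lower-casing) before encoding;
    the package default transitional=False preserves deviation characters
    such as "ß" and "ς" instead of folding them the IDNA 2003 way.
    Invalid hosts raise idna.IDNAError, a subclass of UnicodeError, so
    existing `except UnicodeError` handlers keep working.
    """
    return idna.encode(host, uts46=True)


def test_sharp_s_is_encoded_per_idna_2008():
    # IDNA 2008 keeps "ß" and Punycode-encodes the label.
    assert canonicalize_host("straße.de") == b"xn--strae-oqa.de"
    # This is the stdlib IDNA 2003 result; seeing it means the migration regressed.
    assert canonicalize_host("straße.de") != b"strasse.de"


def test_u_label_and_a_label_are_the_same_host():
    assert canonicalize_host("straße.de") == canonicalize_host("xn--strae-oqa.de")
```

One side effect worth checking in step 5: UTS #46 lower-cases ASCII letters, whereas the stdlib codec returns all-ASCII hosts unchanged, so EXAMPLE.com canonicalizes to b"example.com" after the migration but to b"EXAMPLE.com" before it.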

Why one commit

Splitting the migration across PRs would leave the codebase in a state where two SDK modules canonicalize the same host differently. Keep all four callsites coherent — one commit, one review.

References

🤖 Generated with Claude Code
