Django 6.0 support: replace sanitize_address() / international email address handling #444

@medmunds

The current django-anymail release (13.1) uses the Django internal function sanitize_address(), which is deprecated in Django 6.0. Although this will continue to work for now (with deprecation warnings), Anymail should stop using that function.

sanitize_address() has three primary functions:

  1. Blocking CR and NL characters in addresses (under some conditions), to help prevent email header injection vulnerabilities
  2. Converting non-ASCII display-names to RFC 2047 encoded-word syntax
  3. Converting non-ASCII domains to IDNA (Django specifically uses IDNA 2003, as that's what's available in Python's standard libraries)
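As a rough illustration of those three steps, here is a minimal sketch using only Python's standard library. The function name sketch_sanitize is hypothetical, and this is not Django's actual implementation; it only approximates the behavior described above.

```python
from email.utils import formataddr


def sketch_sanitize(display_name: str, addr_spec: str) -> str:
    """Illustrative stand-in for the three steps; not Django's code."""
    # Step 1: refuse CR and NL anywhere in the address (header injection guard).
    if any(ch in display_name + addr_spec for ch in "\r\n"):
        raise ValueError("address contains CR or NL")
    if "@" not in addr_spec:
        raise ValueError("addr_spec must contain '@'")

    # Step 3: convert a non-ASCII domain with the stdlib "idna" codec,
    # which implements IDNA 2003.
    local_part, _, domain = addr_spec.rpartition("@")
    if not domain.isascii():
        domain = domain.encode("idna").decode("ascii")

    # Step 2: formataddr() emits an RFC 2047 encoded-word for a non-ASCII
    # display-name, and quotes an ASCII display-name containing specials.
    # It raises UnicodeEncodeError if the local part is still non-ASCII.
    return formataddr((display_name, f"{local_part}@{domain}"))


print(sketch_sanitize("Jörg", "jorg@bücher.example"))
```

The printed value is pure ASCII: an encoded-word for the display-name, followed by the address with a Punycode domain.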

Anymail centralizes email address handling in its anymail.utils.EmailAddress object, which is where sanitize_address() is currently called.

EmailAddress already blocks CR/NL, so the first function is covered. (ESP APIs really should prevent header injection themselves, but experience has shown there are sometimes gaps.)

The other two functions have to do with handling international email addresses, and require a bit more thought.

Encoding non-ASCII display names

Most ESP APIs correctly convert a non-ASCII display name to an RFC 2047 encoded-word, but there are frequently bugs. The bugs can often be avoided by encoding the display name in Anymail, before calling the ESP API.

The plan: Anymail's EmailAddress should expose an option for whether to apply RFC 2047 to the display name, and then individual ESP backends should use that option as necessary.

Here are the results of some testing I ran in early 2025 (and repeated in December 2025), posting non-ASCII display names directly to each ESP's API (without involving Anymail). "Bug" means Anymail can apply RFC 2047 locally to avoid the problem; "BUG" in all caps is an issue with no workaround possible:

ESP            From   To     Reply-To
Amazon SES     Bug    Bug    Bug
Brevo          Bug    Bug    BUG
Mailersend     ok     ok     ok
Mailgun        ok     ok     ok
Mailjet        ok     ok     ok
Mailpace       ok     BUG    ok
Mailtrap       ok     ok     ok
Mandrill       BUG    BUG    BUG
Postmark       ok     ok     ok
Resend         ok     ok     ok
Scaleway       ok     ok     ok
Sendgrid       ok     ok     ok
Sparkpost      ok     ok     ok
Unisender Go   ok     ok     BUG

Bugs:

  • Amazon SES: SendBulkEmail v2 converts non-ASCII characters in display-names to Unicode replacement characters (�), which it then sends as raw (8-bit) utf-8.[1] Anymail can work around this by generating the encoded-word itself, which then sends correctly. (Amazon SES's SendEmail with simple or raw content seems to handle display-names correctly.)
  • Brevo: Silently ignores a display-name that contains both non-ASCII characters and a comma. Anymail could work around this by generating the encoded-word itself, but that exposes another bug: in the Reply-To field only, Brevo incorrectly wraps an encoded-word in double quotes.[2]
  • Mailpace: Rejects attempts to use a comma in a To display-name. (Otherwise correctly handles RFC 2047 encoding.)
  • Mailtrap: Fixed 2025-12-05. (Was: Rejects a display-name containing a comma in the From field, complaining that the "'From:' header does not match the sender's domain.")
  • Mandrill: Correctly applies RFC 2047 encoding, but then incorrectly wraps the encoded-word in double quotes.[2] Anymail cannot work around this.
  • Unisender Go: In the Reply-To field only, converts a non-ASCII display-name to ISO-8859-1 and leaves that as raw (8-bit) characters in the header.[1] Anymail could work around this by generating the encoded-word itself, but that exposes another bug: in the reply_to_name API field, Unisender Go incorrectly wraps an encoded-word in double quotes.[2]
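One way the per-backend option could be driven is by a small lookup built from the lowercase "Bug" cells in the table above. The sketch below is only an illustration: the ESP keys, field names, and function names are hypothetical and are not Anymail's API.

```python
from email.charset import Charset
from email.utils import quote

# Fields where the table reports a lowercase "Bug", i.e. where encoding
# the display-name locally avoids the ESP problem. Keys are hypothetical.
LOCAL_RFC2047_FIELDS = {
    "amazon_ses": {"from", "to", "reply_to"},  # seen with SendBulkEmail v2
    "brevo": {"from", "to"},  # Reply-To is "BUG": no workaround
}

# Characters that force an ASCII display-name into a quoted-string.
SPECIALS = set('()<>@,:;.\\"[]')


def format_address(display_name: str, addr_spec: str, *, use_rfc2047: bool) -> str:
    """Format an address, optionally RFC 2047 encoding the display-name."""
    if not display_name:
        return addr_spec
    if use_rfc2047 and not display_name.isascii():
        # A bare encoded-word: it must not be wrapped in double quotes.
        encoded = Charset("utf-8").header_encode(display_name)
        return f"{encoded} <{addr_spec}>"
    # Otherwise leave Unicode as-is for the ESP to encode.
    if SPECIALS.intersection(display_name):
        display_name = f'"{quote(display_name)}"'
    return f"{display_name} <{addr_spec}>"


def format_for_esp(esp: str, field: str, display_name: str, addr_spec: str) -> str:
    """Pick the option per ESP and header field from the lookup above."""
    use_rfc2047 = field in LOCAL_RFC2047_FIELDS.get(esp, set())
    return format_address(display_name, addr_spec, use_rfc2047=use_rfc2047)


print(format_for_esp("brevo", "to", "Jörg, Müller", "j@example.com"))
print(format_for_esp("brevo", "reply_to", "Jörg, Müller", "j@example.com"))
```

Keeping the decision in data rather than in each backend's code would make it easy to drop an entry once an ESP fixes its bug, as happened with Mailtrap.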

Encoding non-ASCII domain names

International domain names must be converted to a 7-bit encoding before sending. Of Anymail's supported ESPs, only two do that on their own (based on tests in early 2025):

  • Mailgun applies UTS46 non-transitional encoding, which is based on the latest IDNA 2008 spec and is arguably the correct behavior.
  • Unisender Go applies UTS46 transitional encoding, which maintains compatibility with the earlier IDNA 2003 spec. This is possibly outdated, but I wouldn't call it incorrect.

All other ESPs either reject addresses with non-ASCII domains before sending (most ESPs), treat them as an internal bounce (Mailjet, Sparkpost), or silently drop the message (SendGrid).

Django's sanitize_address() uses IDNA 2003, and the updated django.core.mail code in Django 6.0 does the same. IDNA 2003 is the only IDNA option available in Python without installing third-party libraries.

The plan: Anymail should make IDN encoding a configuration option. The default should be IDNA 2008. [Edit: changed default, see comment below. The original proposal was Python's built-in IDNA 2003, to maintain compatibility with Django and earlier Anymail versions.] Users should be able to choose a different encoding if they prefer: IDNA 2003, IDNA 2008, or UTS46, with the latter two using the idna or uts46 libraries. (Or select "no encoding" if their ESP handles it.)
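A minimal sketch of what such a selectable encoder could look like follows. The option values ("idna2003", "idna2008", "uts46", "none") and the function names are assumptions for illustration, not Anymail settings. The IDNA 2008 and UTS46 branches assume the third-party idna package is installed and use its uts46 preprocessing flag rather than the separate uts46 library.

```python
from typing import Callable, Dict


def _idna2003(domain: str) -> str:
    # Python's built-in "idna" codec implements IDNA 2003.
    return domain.encode("idna").decode("ascii")


def _idna2008(domain: str) -> str:
    import idna  # third-party package, imported lazily (assumed installed)

    return idna.encode(domain).decode("ascii")


def _uts46(domain: str) -> str:
    import idna  # third-party package, imported lazily (assumed installed)

    # UTS46 mapping applied before IDNA 2008 encoding.
    return idna.encode(domain, uts46=True).decode("ascii")


# Hypothetical option values; "none" leaves the domain for the ESP to encode.
IDN_ENCODERS: Dict[str, Callable[[str], str]] = {
    "idna2003": _idna2003,
    "idna2008": _idna2008,
    "uts46": _uts46,
    "none": lambda domain: domain,
}


def encode_domain(domain: str, encoding: str = "idna2008") -> str:
    """Encode a domain with the selected IDN encoder."""
    try:
        encoder = IDN_ENCODERS[encoding]
    except KeyError:
        raise ValueError(f"unknown IDN encoding: {encoding!r}") from None
    return domain if domain.isascii() else encoder(domain)


# IDNA 2003 maps the German sharp s to "ss", so the choice is observable:
print(encode_domain("faß.de", "idna2003"))
```

The choice matters because the encodings can disagree on the same input: IDNA 2003 turns faß.de into fass.de, while IDNA 2008 keeps the ß and produces a distinct xn-- label, which is a different domain.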

Custom headers

While we're fixing up international character handling, it's worth looking at how the ESP APIs handle non-ASCII characters in custom header values. There are three ESPs with problems in custom headers (based on the same testing):

  • Amazon SES [Bug, with workaround]: SendBulkEmail v2 rejects Headers and ReplacementHeaders with non-ASCII characters. Supplying an encoded-word is an effective workaround.
  • Brevo [BUG, no workaround]: Transmits non-ASCII characters in headers as raw (8-bit) utf-8.[1] No workaround is possible, because Brevo's API decodes an RFC 2047 encoded-word in a custom header value and transmits it as raw (8-bit) utf-8.
  • Unisender Go [Bug, with workaround]: Converts non-ASCII characters in headers to ISO-8859-1 and transmits that as raw 8-bit values.[1] Supplying an encoded-word is an effective workaround.
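For the two ESPs where supplying an encoded-word works, the header value could be pre-encoded along these lines. This is a sketch using Python's standard email.header module; the function name encode_header_value is hypothetical.

```python
from email.header import Header, decode_header


def encode_header_value(value: str) -> str:
    """Return a 7-bit ASCII form of a custom header value."""
    if "\r" in value or "\n" in value:
        raise ValueError("header value contains CR or NL")
    if value.isascii():
        return value  # nothing to encode
    # maxlinelen=0 disables folding, so the result is a single line
    # suitable for a JSON API field.
    return Header(value, charset="utf-8").encode(maxlinelen=0)


encoded = encode_header_value("Prüfung bestanden")
print(encoded)

# Round trip with the stdlib decoder to confirm nothing was lost.
decoded = "".join(
    part.decode(charset or "ascii") if isinstance(part, bytes) else part
    for part, charset in decode_header(encoded)
)
print(decoded)
```

Disabling folding leaves one long encoded-word for long values, so a production version would probably need to split the value into several shorter encoded-words.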

Footnotes

  1. All email headers must be 7-bit ASCII. 8-bit header values can interfere with delivery! (Unless the receiving SMTP server advertises support for the SMTPUTF8 extension. The one I tested with did not.) Non-ASCII characters must be encoded, usually as RFC 2047 encoded-words. In an address header, encoded-word only works for display-name; domains must use IDNA encoding.

  2. An RFC 2047 encoded-word "MUST NOT" be included in a quoted-string. Strictly compliant mail apps would show it to the recipient without decoding. (Gmail goes ahead and decodes it anyway.)
