Sendgrid: charset of text attachment is always iso-8859-1 #150

Closed
nuschk opened this issue May 23, 2019 · 15 comments
Labels
blocked Waiting on an external dependency docs esp:SendGrid

Comments

@nuschk

nuschk commented May 23, 2019

  • Anymail version
    Latest, 6.0.1

  • ESP (Mailgun, SendGrid, etc.)
    Sendgrid

  • Your ANYMAIL settings (change secrets to "redacted")
    Default

  • Versions of Django, requests, python
    django==1.11.20
    requests==2.20.0
    python==3.6.8

  • Exact error message and/or stack trace
    I'm sending an icalendar (ICS) file as attachment. The file's content is text (I don't think the exact content is relevant here, just note that there are some non-ascii chars around, like ü, and ä):

BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//flatfox//flatfox.ch//
METHOD:REQUEST
BEGIN:VEVENT
SUMMARY:Viewing at Schreinerstrasse 64\, 8004 Zürich
DTSTART;VALUE=DATE-TIME:20190526T150000Z
DTEND;VALUE=DATE-TIME:20190526T153000Z
DTSTAMP;VALUE=DATE-TIME:20190523T121031Z
UID:036af9c4-74a8-4952-b4f0-a8212b80734a.201905261500@flatfox.ch
SEQUENCE:9
ATTENDEE;CN="Bernhard Mäder";CUTYPE=INDIVIDUAL;PARTSTAT=ACCEPTED;ROLE=REQ
 -PARTICIPANT;RSVP=FALSE;X-NUM-GUESTS=0:MAILTO:<some_email>@flatfox.ch
DESCRIPTION:Address: Schreinerstrasse 64\, 8004 Zürich\nContact person: B
 ernhard Mäder\nWhen: Sun\, 05/26/2019\, 5 p.m.\nAttendees: Max Muster - 2
 738492379847
LOCATION:Schreinerstrasse 64\, 8004 Zürich
ORGANIZER;CN=flatfox:MAILTO:<no_reply>@flatfox.ch
PRIORITY:5
STATUS:CONFIRMED
BEGIN:VALARM
ACTION:DISPLAY
DESCRIPTION:Viewing at Schreinerstrasse 64\, 8004 Zürich
TRIGGER:-PT30M
END:VALARM
END:VEVENT
END:VCALENDAR

The resulting payload (as received by the email client, after the sendgrid roundtrip) is as follows:

--cc91f1040f2ddb86e28e23abd2298e8686508c1291386a1d824cb54aebce
Content-Disposition: attachment; filename="termin.ics"
Content-Transfer-Encoding: base64
Content-Type: text/calendar; name=termin.ics; charset=iso-8859-1

QkVHSU46VkNBTEVOREFSDQpWRVJTSU9OOjIuMA0KUFJPRElEOi0vL2ZsYXRmb3gvL2ZsYXRmb3gu
Y2gvLw0KTUVUSE9EOlJFUVVFU1QNCkJFR0lOOlZFVkVOVA0KU1VNTUFSWTpCZXNpY2h0aWd1bmcg
ZvxyIFNjaHJlaW5lcnN0cmFzc2UgNjRcLCA4MDA0IFr8cmljaA0KRFRTVEFSVDtWQUxVRT1EQVRF
LVRJTUU6MjAxOTA1MjZUMTUwMDAwWg0KRFRFTkQ7VkFMVUU9REFURS1USU1FOjIwMTkwNTI2VDE1
MzAwMFoNCkRUU1RBTVA7VkFMVUU9REFURS1USU1FOjIwMTkwNTIzVDEyMzUzNFoNClVJRDo2Mzhi
YjBjYS1mMjhhLTQ4YWItYWVkOS04ODYxYmY1ZTg3NGQuMTY2OTg2QGZsYXRmb3guY2gNClNFUVVF
TkNFOjENCkFUVEVOREVFO0NOPSJNYXggTXVzdGVyIjtDVVRZUEU9SU5ESVZJRFVBTDtQQVJUU1RB
VD1BQ0NFUFRFRDtST0xFPVJFUS1QQVJUDQogSUNJUEFOVDtSU1ZQPUZBTFNFO1gtTlVNLUdVRVNU
Uz0wOk1BSUxUTzpiZXJuaGFyZC5tYWVkZXJAb3V0bG9vay5jb20NCkRFU0NSSVBUSU9OOkFkcmVz
c2U6IFNjaHJlaW5lcnN0cmFzc2UgNjRcLCA4MDA0IFr8cmljaFxuS29udGFrdHBlcnNvbjogQmUN
CiBybmhhcmQgTeRkZXJcblplaXQ6IFNvXCwgMjYuMDUuMjAxOVwsIDE3OjAwDQpMT0NBVElPTjpT
Y2hyZWluZXJzdHJhc3NlIDY0XCwgODAwNCBa/HJpY2gNCk9SR0FOSVpFUjtDTj1mbGF0Zm94Ok1B
SUxUTzpjYWxlbmRhckBmbGF0Zm94LmNoDQpQUklPUklUWTo1DQpTVEFUVVM6Q09ORklSTUVEDQpC
RUdJTjpWQUxBUk0NCkFDVElPTjpESVNQTEFZDQpERVNDUklQVElPTjpCZXNpY2h0aWd1bmcgZvxy
IFNjaHJlaW5lcnN0cmFzc2UgNjRcLCA4MDA0IFr8cmljaA0KVFJJR0dFUjotUFQzME0NCkVORDpW
QUxBUk0NCkVORDpWRVZFTlQNCkVORDpWQ0FMRU5EQVINCg==
--cc91f1040f2ddb86e28e23abd2298e8686508c1291386a1d824cb54aebce--

The problem with this:

  • The payload is encoded with UTF-8, then converted to base64
  • The charset states iso-8859-1
  • Some mail clients (outlook.com in particular) will now use the declared charset to decode the payload, which renders some characters incorrectly. This is per spec, where the ICS's encoding may be deduced from the container's.

So, I'd like the charset to be set to utf-8.

Also interesting, but not really important: gmail, thunderbird and a few other clients will (incorrectly) use UTF-8, as this is the default encoding for icalendar files.
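
To make the mismatch concrete, here is a minimal sketch using only the Python standard library (it is not taken from Anymail or SendGrid code). It shows what a client ends up displaying when the sender encodes text as UTF-8 but the client decodes it with a declared charset of iso-8859-1. The sample string is just one word from the calendar above.

```python
import base64

text = "Zürich"

# Sender side: encode the text as UTF-8, then base64 for transport.
payload = base64.b64encode(text.encode("utf-8"))

# Receiver side: undo base64, then decode the bytes with some charset.
raw = base64.b64decode(payload)
as_declared = raw.decode("iso-8859-1")  # trusts the Content-Type charset
as_intended = raw.decode("utf-8")       # the charset the sender actually used

print(repr(as_declared))
print(repr(as_intended))
```

The two-byte UTF-8 sequence for ü turns into two separate Latin-1 characters, which is the kind of garbling described above.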

  • Any other relevant code and settings (e.g., for problems
    sending, your code that sends the message)
    None
@medmunds
Contributor

Hmm. I'm not sure where the charset=iso-8859-1 is coming from.

I just ran this quick test with the SendGrid backend, and the message I received did not specifically declare an attachment charset (so was correctly decoded as utf-8). You can do this in python manage.py shell:

from django.core.mail import EmailMessage
att = """\
BEGIN:VCALENDAR
[... copied from your report above ...]
END:VCALENDAR
"""
msg = EmailMessage(to=['my email'], subject='SendGrid att test', body='check attachment')
msg.attach('termin.ics', att, 'text/calendar')
msg.send()

The raw message I received had these attachment headers, and my email clients displayed the original UTF-8 characters as expected. (I tried both Gmail and Outlook.com.) Notice there's no mention of iso-8859-1:

Content-Disposition: attachment; filename="termin.ics"
Content-Transfer-Encoding: base64
Content-Type: text/calendar; name="termin.ics"

QkVHSU46VkNBTEVOREFS...

So, a few possibilities why you're getting different results:

  • Are you receiving mail behind a proxy that decodes and re-encodes the message before delivering it to you? Like a spam/virus filtering gateway? If so, there might be a bug in that proxy.
  • I guess it would help to see your code that constructs and sends the message, to see how it differs from my example above. In particular, where the attachment content is coming from and how it gets attached to the message. (Maybe something along that path is forcing—or trying to auto-detect—character encoding?)
  • Django's DEFAULT_CHARSET setting can affect email attachment charset, if you've changed it. (But the default is 'utf-8', which should do the right thing.)

Incidentally, the SendGrid API doesn't provide any way to declare the charset for attachment content. (I can't find specific docs on charsets, but it seems like SendGrid just assumes UTF-8.) And their API requires base64 encoding for all attachment content (even text).
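
As a rough illustration of that last point, the sketch below builds the kind of attachment object that SendGrid's v3 Mail Send JSON expects, as far as I understand the API. This is not Anymail's actual code, and `ics_text` is placeholder content. The point is that the object carries base64 bytes and a MIME type, with no field saying which charset those bytes are in.

```python
import base64
import json

# Placeholder calendar text (an assumption for illustration only).
ics_text = "BEGIN:VCALENDAR\nSUMMARY:Viewing in Zürich\nEND:VCALENDAR\n"

attachment = {
    # The API wants base64 even for text, so the str must become bytes first.
    "content": base64.b64encode(ics_text.encode("utf-8")).decode("ascii"),
    "type": "text/calendar",
    "filename": "termin.ics",
    "disposition": "attachment",
}

print(json.dumps(attachment, indent=2))
```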

@nuschk
Author

nuschk commented May 24, 2019

Hey Mike, thanks for the fast response.

I'm sorry, I should have provided a full example. Here goes, this one reproduces my problem:

from django.core.mail import EmailMessage

# Raw string, so the iCalendar escapes (\, and \n) reach the attachment literally
ics_data = r"""BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//flatfox//flatfox.ch//
METHOD:REQUEST
BEGIN:VEVENT
SUMMARY:Viewing at Schreinerstrasse 64\, 8004 Zürich
DTSTART;VALUE=DATE-TIME:20190526T150000Z
DTEND;VALUE=DATE-TIME:20190526T153000Z
DTSTAMP;VALUE=DATE-TIME:20190523T121031Z
UID:036af9c4-74a8-4952-b4f0-a8212b80734a.201905261500@flatfox.ch
SEQUENCE:9
ATTENDEE;CN="Bernhard Mäder";CUTYPE=INDIVIDUAL;PARTSTAT=ACCEPTED;ROLE=REQ
 -PARTICIPANT;RSVP=FALSE;X-NUM-GUESTS=0:MAILTO:<some_email>@flatfox.ch
DESCRIPTION:Address: Schreinerstrasse 64\, 8004 Zürich\nContact person: B
 ernhard Mäder\nWhen: Sun\, 05/26/2019\, 5 p.m.\nAttendees: Max Muster - 2
 738492379847
LOCATION:Schreinerstrasse 64\, 8004 Zürich
ORGANIZER;CN=flatfox:MAILTO:<no-reply>@flatfox.ch
PRIORITY:5
STATUS:CONFIRMED
BEGIN:VALARM
ACTION:DISPLAY
DESCRIPTION:Viewing at Schreinerstrasse 64\, 8004 Zürich
TRIGGER:-PT30M
END:VALARM
END:VEVENT
END:VCALENDAR"""

msg = EmailMessage(to=['<some_email>'], subject='Test', body='Nope')
msg.attach('termin.ics', ics_data, 'text/calendar')
msg.send()

And the resulting email as received by gmail:
"""
Delivered-To: <some_email>
Received: by 2002:a67:e059:0:0:0:0:0 with SMTP id n25csp3021256vsl;
Thu, 23 May 2019 22:22:22 -0700 (PDT)
...

--9e94b2f4a717f4011522c3f0fd23df848d075a48523d34c2fcef0989f247
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0

Nope
--9e94b2f4a717f4011522c3f0fd23df848d075a48523d34c2fcef0989f247
Content-Disposition: attachment; filename="termin.ics"
Content-Transfer-Encoding: base64
Content-Type: text/calendar; name=termin.ics; charset=iso-8859-1

QkVHSU46VkNBTEVOREFSClZFUlNJT046Mi4wClBST0RJRDotLy9mbGF0Zm94Ly9mbGF0Zm94LmNo
...
--9e94b2f4a717f4011522c3f0fd23df848d075a48523d34c2fcef0989f247--
"""

Also:

  • No DEFAULT_CHARSET is set. I can even do msg.message().as_string() and see that Django is setting charset=utf-8.
  • I'm sending to gmail and outlook, with the same result. I don't think there can be a proxy involved. As the charset is not coming from anymail, it must originate from sendgrid I guess.
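
For anyone who wants to run the same kind of local check without a Django project, here is a small stand-in using the standard library's email package (which, as far as I know, is what Django's mail classes build on). It is only a sketch: `ics_text` is a placeholder, and the part is built by hand rather than through `EmailMessage.attach()`.

```python
from email.mime.text import MIMEText

ics_text = "SUMMARY:Viewing in Zürich"  # placeholder content

# Build a text/calendar part that explicitly declares utf-8.
part = MIMEText(ics_text, "calendar", "utf-8")
part.add_header("Content-Disposition", "attachment", filename="termin.ics")

content_type = part["Content-Type"]
charset = part.get_content_charset()

# Decode the transfer encoding, then decode the bytes with the declared charset.
decoded = part.get_payload(decode=True).decode(charset)

print(content_type)
print(decoded)
```

If the locally generated part declares utf-8 like this, any iso-8859-1 in the delivered message was introduced after the message left your code.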

After muddling through the code and the API specs, I'm unsure this can even be solved from anymail's side. I think I'll have to check with sendgrid. The funny thing is that you're getting different results...

@medmunds
Contributor

The funny thing is that you're getting different results

Yep, since your test code is exactly equivalent to mine. I'm guessing there's something different in our SendGrid account settings that causes our attachments to get different charsets... oh, found it!

It's a SendGrid bug. Workaround is to enable click tracking (or any sort of tracking?) in your SendGrid settings: https://app.sendgrid.com/settings/tracking

I normally run SendGrid with open and click tracking enabled, for testing Anymail's webhooks. When I disable all tracking settings, I'm able to reproduce the problem you see with your example code. (The delivered email incorrectly has attachment charset=iso-8859-1.)

If I enable just click tracking, your example code results in an email with no attachment charset, which gets correctly interpreted as UTF-8. (It also causes SendGrid to add an HTML body generated from the plaintext body. And both body parts have charset=UTF-8, vs. charset=us-ascii when tracking is not enabled.) I haven't tried other tracking settings, but it's likely anything that causes SendGrid to rewrite the message will be a workaround.

My best guess is there's a problem in SendGrid's API-to-SMTP gateway, where it tries to guess the charset for the various message parts it's composing, and gets it wrong for attachments. (Or they've just hard coded text attachment charset to iso-8859-1.) But then there's a counteracting bug (feature?) in the SendGrid tracking filter, where it strips (or corrects?) charsets throughout the message as it rewrites it to add tracking. So if you've enabled tracking, you won't have the problem.

Let me know what you hear from SendGrid support. If they don't show an interest in fixing the API, we should at least document this in Anymail's list of SendGrid quirks.

For interacting with SendGrid support it can be helpful to have the raw API call data. You can dump it to the console by adding "DEBUG_API_REQUESTS": True to your ANYMAIL settings. (Watch out for your API token in the dump if sharing publicly.)
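
A minimal settings sketch for that, assuming an otherwise default Anymail setup with SendGrid (the API key value is a placeholder):

```python
# settings.py (fragment)
ANYMAIL = {
    "SENDGRID_API_KEY": "redacted",  # placeholder, use your real key
    # Dump raw ESP API requests and responses to the console.
    # Intended for debugging only; the output can include secrets.
    "DEBUG_API_REQUESTS": True,
}
```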

[Incidentally, rewriting messages is apparently hard to get right. Amazon SES has a similar bug where enabling their tracking filter corrupts otherwise valid messages.]

@medmunds medmunds added blocked Waiting on an external dependency docs esp:SendGrid labels May 24, 2019
@medmunds
Contributor

Also, I've got one other possible workaround you could try. It results in multiple, conflicting charsets on the attachment, and I don't know how Outlook would handle that. (I can't test Outlook myself, because apparently SendGrid has moved me to a sending IP on Outlook's block list. Sigh.)

# _possible_ workaround
msg.attach('termin.ics"; charset="utf-8', ics_data, 'text/calendar')  # [unmatched quotes sic]

results in:

Content-Disposition: attachment; filename="termin.ics"; charset="utf-8"
Content-Transfer-Encoding: base64
Content-Type: text/calendar; name=termin.ics; charset=iso-8859-1

But I don't know if Outlook will even notice the charset in the Content-Disposition, let alone give it precedence over the Content-Type charset.
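
For what it's worth, here is how Python's own email parser treats those headers. This is only a sketch of one parser's behavior and says nothing about what Outlook does; the base64 body is a stand-in.

```python
from email import message_from_string

raw = (
    'Content-Disposition: attachment; filename="termin.ics"; charset="utf-8"\n'
    "Content-Transfer-Encoding: base64\n"
    "Content-Type: text/calendar; name=termin.ics; charset=iso-8859-1\n"
    "\n"
    "QkVHSU4=\n"  # stand-in body: base64 of b"BEGIN"
)

part = message_from_string(raw)

# The charset a parser normally uses comes from Content-Type.
content_type_charset = part.get_content_charset()

# The extra parameter is still there, but only if you look for it explicitly.
disposition_charset = part.get_param("charset", header="content-disposition")

print(content_type_charset)
print(disposition_charset)
```

So at least with this parser, the Content-Disposition charset is ignored unless the client goes out of its way to read it.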

@medmunds
Contributor

Gah. One other workaround is to use an alternative part rather than an attachment:

from django.core.mail import EmailMultiAlternatives
msg = EmailMultiAlternatives(to=['<some_email>'], subject='Test', body='Nope')
msg.attach_alternative(ics_data, 'text/calendar')  # <<< alternative, not attachment
msg.send()

SendGrid still uses charset=iso-8859-1 for the calendar part, but correctly encodes the calendar text in that charset (=FC is the quoted-printable ISO-8859-1 encoding of ü):

--79f64a4aed941a586569a6557a5d4998e5e63d65b169f81a50e3acfd5a57
Content-Transfer-Encoding: quoted-printable
Content-Type: text/calendar; charset=iso-8859-1
Mime-Version: 1.0

BEGIN:VCALENDAR
...
DESCRIPTION:Viewing at Schreinerstrasse 64\, 8004 Z=FCrich
...

Gmail seems to display and handle a text/calendar alternative the same as an attachment. Don't know what Outlook or other clients will do.
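
To check that the alternative part really is self-consistent, you can decode one of its lines by hand. A quick sketch with the standard library, using the DESCRIPTION line from the delivered message above:

```python
import quopri

# One line of the delivered part, as quoted-printable bytes.
line = b"DESCRIPTION:Viewing at Schreinerstrasse 64\\, 8004 Z=FCrich"

# Undo quoted-printable, then decode with the charset the part declares.
decoded = quopri.decodestring(line).decode("iso-8859-1")

print(decoded)
```

Here the declared charset and the actual bytes agree, which is why this workaround displays correctly even though the charset is still iso-8859-1.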

Two downsides: you don't get to control the filename (Gmail calls it "invite.ics"). And SendGrid is one of very few ESPs that support adding arbitrary alternative parts, so you'd have to switch back to attachments if you ever change ESP.

(Best solution would still be getting SendGrid to fix how their API handles attachments, of course.)

@nuschk
Author

nuschk commented May 26, 2019

You're the man, Mike, many thanks for your in-depth analysis!

I will certainly try the workarounds and test them against different clients and see how it goes. I agree, these are hacks, it should be fixed in sendgrid.

I'm still emailing with sendgrid support, will post back when I know more.

@medmunds
Contributor

medmunds commented Jul 2, 2019

@nuschk did you end up getting any useful suggestions from SendGrid support? (I'm pretty sure I already know the answer, but wanted to double check.)

@medmunds
Contributor

medmunds commented Jul 7, 2019

@nuschk I'm no longer able to reproduce this bug: SendGrid is no longer forcing charset=iso-8859-1 on text attachments, from what I can tell. (Or maybe they've started running all messages through their tracking rewriter that strips the incorrect attachment charset—even when tracking isn't enabled and nothing needs to be rewritten.)

If you're still seeing the problem, let me know. Otherwise I'm going to assume that your SendGrid support request actually resulted in an API fix, which is great news!

@medmunds medmunds closed this as completed Jul 7, 2019
@nuschk
Author

nuschk commented Jul 8, 2019

Hey Mike

Sorry for my late reply, I've been away on vacation and decided not to take my email with me. ;-)

I'm actually still in conversation with SendGrid support. It's a rather forthcoming back-and-forth, so I've come away with a good impression of their support (we're sending ~1M mails, so we pay them at least a few hundred USD, which might have helped).

I discovered that the problem persisted for all types of text/* attachments. Unfortunately, it doesn't occur on all kinds of accounts (as you discovered as well, although changing the tracking settings didn't help in my case). So they weren't really able to reproduce it consistently, although they took the report seriously.

Their final advice was to send the calendar data as an alternative part (as you suggested as well). Maybe their engineering team fixed the root cause in the meantime, even if support didn't imply they would... I will have to check, but haven't yet had the time for it.

Thanks so much for your help, much appreciated!

@medmunds
Contributor

Hey Bernhard,

(No worries on the replies—I figured you were probably on holiday or something.)

I'd been trying to construct a reproducible case, without using Anymail, to report against the sendgrid-python package. That's when I discovered that even your original problem report seemed to be working now.

From what you say, it's not clear to me whether SendGrid actually resolved the issue, or whether there's some intermittent problem (or different code running on different SendGrid API servers). In any case, there's not anything we can do about it in Anymail, but if you (or anyone else) sees the problem again, please add a comment here and I can at least add it to the documentation.

(I use a free SendGrid account for testing Anymail—my company switched to another ESP a few years ago. So my SendGrid support requests generally just get a "you should upgrade to a dedicated IP plan" form response. Glad to hear yours got more attention.)

@nuschk
Author

nuschk commented Jul 12, 2019

I'm now unable to reproduce the issue with our account and settings. So yeah, they did fix it on their side. Yay 🎉!

And sure, will report back if it comes up again!

Again, thanks so much for your help!

@swrobel

swrobel commented Dec 12, 2019

This is unrelated to django-anymail, but just in case anyone else ends up here, Sendgrid is still changing the charset on some messages. From their support:

Talking with my team, I've discovered that we are seeing unexpected charset changes when sending mail through our system under some circumstances. With that being said, our team is monitoring the impact of this issue. We are gathering information regarding the different use cases for being able to control the charset completely in your requests as opposed to them being changed. We recently passed this case along to our engineering team based on unexpected behavior and I will reply here as soon as we have more information in regards to this from that team.

@medmunds
Contributor

@swrobel thanks for passing along the info from SendGrid support. Please let us know if they end up adding attachment charset control to the API—I can update Anymail's attachment handling to pass along the charset.

(SendGrid's API does allow some additional fields in the attachment type, but charset isn't one of them. E.g., "type": "text/calendar; method=POST" is accepted. But "type": "text/calendar; charset=utf-8" resulted in an error about illegal ";" characters when I tried it earlier this year.)

medmunds added a commit that referenced this issue Dec 13, 2019
Document SendGrid's unpredictable behavior around forcing `charset="iso-8859-1"` into text attachments. (Since it seems to be happening again.)

See #150 for details.
medmunds added a commit that referenced this issue Nov 27, 2020
Document SendGrid's unpredictable behavior around forcing `charset="iso-8859-1"` into text attachments. (Since it seems to be happening again.)

See #150 for details.
@gawry

gawry commented May 31, 2022

@medmunds With this method (adding an alternative part), does Gmail read the invite properly, and does auto-accept work? I'm trying to get auto-accept to work, without success.

from django.core.mail import EmailMultiAlternatives
msg = EmailMultiAlternatives(to=['<some_email>'], subject='Test', body='Nope')
msg.attach_alternative(ics_data, 'text/calendar')  # <<< alternative, not attachment
msg.send()

@medmunds
Contributor

@gawry sorry, I don't know what auto-accept is. If that's a Gmail feature, you might get better info asking in a Google Workspace developer support forum.

If auto-accept works when you send the text/calendar data as an attachment, but doesn't work when sending it as an alternative part, you have your answer. And if it doesn't work in either case, then something else is going wrong and it's probably not related to SendGrid or django-anymail.

But in any case, until SendGrid either fixes their end, or adds an API option to control attachment charset, the only truly reliable workarounds are:

  • Avoid non-ASCII characters in text attachments when using SendGrid (or just accept that those characters may sometimes show up garbled in some client apps)
  • Or switch to a different ESP that doesn't mangle text attachments
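
If you go with the first option, a small pre-send check can flag the characters at risk. This is only a sketch: `find_non_ascii` is a hypothetical helper name, not part of Django or Anymail, and the sample text is a placeholder.

```python
def find_non_ascii(text):
    """Return (line_number, character) pairs for every non-ASCII character."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for char in line:
            if ord(char) > 127:
                hits.append((line_number, char))
    return hits


sample = "BEGIN:VCALENDAR\nSUMMARY:Viewing in Zürich\nEND:VCALENDAR"
print(find_non_ascii(sample))
```

An empty result means the attachment is pure ASCII, so it decodes the same way under utf-8 and iso-8859-1.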
