Work around broken encodings in received messages #2700

Closed
3 tasks done
flexibeast opened this issue Apr 30, 2024 · 7 comments
Labels
mu4e specific for mu4e rfe

Comments

@flexibeast

Context: mu+mu4e 1.12.4 running on Emacs 29.3, both installed via Portage (net-mail/mu and app-editors/emacs) on Gentoo.

i recently received an email whose raw subject line is:

Subject: =?ISO-8859-1?Q?We=92ve_reconnected_=96_and_next_steps?=

This subject line is correctly displayed in mu4e:view mode, as:

We’ve reconnected – and next steps

but in mu4e:headers mode, it's displayed as:

We\222ve reconnected \226 and next steps

i.e. the byte sequence:

We�ve reconnected � and next steps

This appears to be the result of the rfc2047-decode-region function, or some equivalent, not being run on the text; running that function on it results in correct display.

Checklist

  • you are running either an 1.10.x/1.12.x release or master (otherwise please upgrade)
  • you can reproduce the problem without 3rd party extensions (including Doom/Evil, various extensions etc.)
  • you have read all of the above
@flexibeast flexibeast added bug mu4e specific for mu4e new labels Apr 30, 2024
@djcb
Owner

djcb commented Apr 30, 2024

Can you attach a message file (anonymized as needed) where this happens? Thanks.

@djcb djcb removed the new label Apr 30, 2024
@flexibeast
Author

The only such email i have is the one i received today, which contains personal health information. i've redacted the bodies (i.e. the two MIME parts) basically in their entirety, and also redacted various bits of header content in a minimal way, hopefully still leaving it usable.

email.txt

@djcb
Owner

djcb commented Apr 30, 2024

Emacs is just showing what it gets from the mu-server; it doesn't decode anything in the headers buffer.
Looking in a message (where it's shown as expected) with M-x describe-char, I get:

   character: ’ (displayed as ’) (codepoint 8217, #o20031, #x2019)
              charset: windows-1252 (WINDOWS-1252 (Latin I))

so the problem seems to be that the original message uses the windows-1252 charset, but claims to be ISO-8859-1:

Subject: =?ISO-8859-1?Q?We=92ve_reconnected_=96_and_next_steps?=

You can see this if you change the subject to

Subject: =?WINDOWS-1252?Q?We=92ve_reconnected_=96_and_next_steps?=

it will show correctly (after re-indexing etc.).
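
The mismatch described above can also be checked outside Emacs. The following is a minimal Python sketch using only the standard library (it is not part of mu, mu4e or GMime); it decodes the Q-encoded payload from the subject line under both charsets.

```python
import quopri

# Q-encoded payload copied from the Subject header above.
payload = b"We=92ve_reconnected_=96_and_next_steps"

# header=True applies the RFC 2047 rule that "_" stands for a space.
raw = quopri.decodestring(payload, header=True)

# As labelled: ISO-8859-1 maps 0x92 and 0x96 to C1 control characters.
# This is consistent with the \222 and \226 seen in the headers view
# (octal 222 is 0x92, octal 226 is 0x96).
as_latin1 = raw.decode("iso-8859-1")

# As actually encoded: windows-1252 maps the same two bytes to
# U+2019 (right single quotation mark) and U+2013 (en dash).
as_cp1252 = raw.decode("windows-1252")

print(repr(as_latin1))
print(as_cp1252)
```

Only the charset label differs between the two decodes; the bytes are identical, which is why relabelling the header is enough to fix the display.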

@djcb
Owner

djcb commented Apr 30, 2024

Now, while it's the message's sender that's misbehaving, that won't help us very much.

mu can't easily do what Gnus mail does (we're bound by GMime), but I'll turn this into an RFE ticket and see if we can find a work-around.

@djcb djcb changed the title from "[mu4e bug] Subject line not encoded correctly" to "Work around broken encodings in received messages" Apr 30, 2024
@djcb djcb added rfe and removed bug labels Apr 30, 2024
@flexibeast
Author

Ah, great analysis, thank you. i think i had indeed run describe-char and noticed that 1252 was mentioned, but it didn't click that this was not what the "Subject" header was claiming ....

Thanks for converting this to an RFE. 👍 i'm going to try emailing postmaster@salesforce about this, which is probably unlikely to result in any change, but at least i'll have tried. 😛

@flexibeast
Author

It's just come to my attention that the HTML5 spec says that "ISO-8859-1" is to be interpreted as Windows-1252. So presumably what's happening in this email is that it's assumed it will be read in a Web-based client - which, to be fair, is more than likely the case - such that the HTML5 spec is applicable. Which, fwiw, feels incorrect to me: even if the email body contains only a text/html MIME part, the headers are certainly not HTML.
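
For illustration, the lenient reading mentioned here (treat a declared ISO-8859-1 as windows-1252) could look roughly like the Python sketch below. It is only a sketch under stated assumptions: it uses Python's standard `email.header.decode_header`, not mu or GMime, and the names `LENIENT_CHARSETS` and `decode_header_leniently` are hypothetical.

```python
from email.header import decode_header

# Hypothetical alias table: charset labels to reinterpret leniently,
# following the web convention of reading ISO-8859-1 as windows-1252.
LENIENT_CHARSETS = {"iso-8859-1": "windows-1252"}


def decode_header_leniently(value: str) -> str:
    """Decode RFC 2047 encoded-words, preferring windows-1252 over ISO-8859-1."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, str):
            # No encoded-words were found; decode_header returns the text as-is.
            parts.append(chunk)
            continue
        declared = (charset or "ascii").lower()
        preferred = LENIENT_CHARSETS.get(declared, declared)
        try:
            parts.append(chunk.decode(preferred))
        except (UnicodeDecodeError, LookupError):
            # Unknown charset or undecodable byte: fall back to ISO-8859-1,
            # which assigns a code point to every byte and so cannot fail.
            parts.append(chunk.decode("iso-8859-1"))
    return "".join(parts)


subject = "=?ISO-8859-1?Q?We=92ve_reconnected_=96_and_next_steps?="
print(decode_header_leniently(subject))
```

The two charsets differ only in the byte range 0x80 to 0x9F, so text that really is ISO-8859-1 and contains only printable characters decodes the same way under either label.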

@djcb
Owner

djcb commented Jul 27, 2024

I'm moving this to the IDEAS.org file and will close it here shortly... this would probably best be solved at the GMime level.

@djcb djcb closed this as completed in e399fdc Jul 27, 2024