Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bad encoding in mu4e-view with mu4e~view-gnus #1823

Closed
thierryvolpiatto opened this issue Oct 28, 2020 · 32 comments
Closed

Bad encoding in mu4e-view with mu4e~view-gnus #1823

thierryvolpiatto opened this issue Oct 28, 2020 · 32 comments

Comments

@thierryvolpiatto
Copy link
Contributor

Expected or desired behavior

Have accentuated characters correctly encoded.

Actual behavior

Accentuated characters are not encoded.

e.g. pièce => pi\303\250ce

This happen only in the gnus view, mu4e~view-internal is encoding properly.

Possibles solutions:

  1. Force reencoding instead of limiting to message with a content-type header:
diff --git a/mu4e/mu4e-view.el b/mu4e/mu4e-view.el
index d415d254..16ac7b66 100644
--- a/mu4e/mu4e-view.el
+++ b/mu4e/mu4e-view.el
@@ -390,11 +390,9 @@ article-mode."
     (switch-to-buffer (get-buffer-create mu4e~view-buffer-name))
     (erase-buffer)
     (insert-file-contents-literally path)
-    (unless (message-fetch-field "Content-Type" t)
-      ;; For example, for messages in `mu4e-drafts-folder'
-      (let ((coding (or (default-value 'buffer-file-coding-system)
-                        'prefer-utf-8)))
-        (recode-region (point-min) (point-max) coding 'no-conversion)))
+    (let ((coding (or (default-value 'buffer-file-coding-system)
+                      'prefer-utf-8)))
+      (recode-region (point-min) (point-max) coding 'no-conversion))
     (setq
      gnus-summary-buffer (get-buffer-create " *appease-gnus*")
      gnus-original-article-buffer (current-buffer))
  1. Use insert-file-contents (not litteraly)
diff --git a/mu4e/mu4e-view.el b/mu4e/mu4e-view.el
index d415d254..83b75113 100644
--- a/mu4e/mu4e-view.el
+++ b/mu4e/mu4e-view.el
@@ -389,7 +389,7 @@ article-mode."
                                             gnus-buttonized-mime-types)))
     (switch-to-buffer (get-buffer-create mu4e~view-buffer-name))
     (erase-buffer)
-    (insert-file-contents-literally path)
+    (insert-file-contents path)
     (unless (message-fetch-field "Content-Type" t)
       ;; For example, for messages in `mu4e-drafts-folder'
       (let ((coding (or (default-value 'buffer-file-coding-system)

In this case it is probably not needed to recode region as insert-file-contents is already doing it.

Versions of mu, mu4e/emacs, operating system etc.

Last Mu from Git, Emacs-27, Linuxmint.

@djcb
Copy link
Owner

djcb commented Nov 3, 2020

Can you attach a message (sufficiently censored of course) where this happens? Thanks.

@thierryvolpiatto
Copy link
Contributor Author

thierryvolpiatto commented Nov 3, 2020 via email

@thierryvolpiatto
Copy link
Contributor Author

I use 2) since some days now without problems. I have also removed the block where you recode region, it is unneeded with insert-file-contents:

diff --git a/mu4e/mu4e-view.el b/mu4e/mu4e-view.el
index d415d254..259ab00a 100644
--- a/mu4e/mu4e-view.el
+++ b/mu4e/mu4e-view.el
@@ -389,12 +389,7 @@ article-mode."
                                             gnus-buttonized-mime-types)))
     (switch-to-buffer (get-buffer-create mu4e~view-buffer-name))
     (erase-buffer)
-    (insert-file-contents-literally path)
-    (unless (message-fetch-field "Content-Type" t)
-      ;; For example, for messages in `mu4e-drafts-folder'
-      (let ((coding (or (default-value 'buffer-file-coding-system)
-                        'prefer-utf-8)))
-        (recode-region (point-min) (point-max) coding 'no-conversion)))
+    (insert-file-contents path)
     (setq
      gnus-summary-buffer (get-buffer-create " *appease-gnus*")
      gnus-original-article-buffer (current-buffer))

@djcb
Copy link
Owner

djcb commented Nov 3, 2020

Thanks -- I didn't find the message though... did github cut it out perhaps? You can send it to me directly if you do not want to use the github web interface.

We did see quite a few of such encoding probs in the gnus-view early days (2018!) but I haven't seen them lately; will be interesting to see what's going on.

@djcb djcb closed this as completed in 861194f Nov 11, 2020
@djcb
Copy link
Owner

djcb commented Nov 11, 2020

Thanks, yeah, I can reproduce it with the test message, I've installed your change.

Now, the old way (using insert-file-contents-literally) may have been there to fix some other issues, but I could see any problem when trying your solution with some older "problem cases".

Thanks!

@Chris00
Copy link
Contributor

Chris00 commented Jan 5, 2021

This change causes problems for some messages. An example is attached. Here is how it is rendered.
email
test-encoding.txt

@Chris00
Copy link
Contributor

Chris00 commented Jan 5, 2021

With insert-file-contents-literally, it works fine. Moreover its documentation says that

A buffer may be modified in several ways after reading into the buffer,
due to Emacs features such as format decoding, character code
conversion, ‘find-file-hook’, automatic uncompression, etc.

This function ensures that none of these modifications will take place.

I do not think that we want to run ‘find-file-hook’ and the like in this case. @thierryvolpiatto, please upload a message using the Github interface (not replying by email).

@Chris00
Copy link
Contributor

Chris00 commented Jan 5, 2021

The problem with insert-file-contents-literally — which leaves to Gnus to do the right thing with the Content-type — is when there is no content type (such as for mails in the Draft folder). We could replace the added complexity by calling insert-file-contents in that case (provided we are happy with the hooks).

@thierryvolpiatto
Copy link
Contributor Author

I can't reproduce with your example email. As a french user I receive a lot of mails from french people without any decoding problem, did I miss something?
Here a screenshot with your example email:
Capture d’écran_2021-01-05_15-32-30

@thierryvolpiatto
Copy link
Contributor Author

As a replacement see also mm-insert-file-contents it is working fine here.

@@ -390 +390 @@ article-mode."
-    (insert-file-contents path)
+    (mm-insert-file-contents path nil nil nil nil t)

@Chris00
Copy link
Contributor

Chris00 commented Jan 11, 2021

@thierryvolpiatto I do not know what you did to my message — did you copy the file in one of your folders? — because it is not encrypted. On my above message (mm-insert-file-contents path nil nil nil nil t) does not work for me. And BTW, I also am a French user. ;-)

@Chris00
Copy link
Contributor

Chris00 commented Jan 11, 2021

The problem comes from the presence of format=flowed—without it, the decoding works...

@Chris00
Copy link
Contributor

Chris00 commented Jan 11, 2021

Another example that does not work with (mm-insert-file-contents path nil nil nil nil t) or (insert-file-contents path):

Content-Transfer-Encoding: 8bit
From: test <test@gmail.com>
Date: Mon, 20 Aug 2018 10:05:08 +0200
Content-Type: text/html; charset="utf-8"
Subject: test

<html><body>
Afficher  Documents partagés
</body>
</html>

(to be copied to a file in a mail subdirectory cur/)

@Chris00
Copy link
Contributor

Chris00 commented Jan 11, 2021

One thing that works for me is

    (insert-file-contents-literally path)
    (unless (mail-fetch-field "Content-Type" t)
      (mm-insert-file-contents path nil nil nil t t))

@thierryvolpiatto
Copy link
Contributor Author

thierryvolpiatto commented Jan 12, 2021 via email

@Chris00
Copy link
Contributor

Chris00 commented Jan 12, 2021

however why not decoding the text with the right encoding instead of reinserting all?

No problem to do that — that is what was done before your change. However, checking on some examples, this does not work as well as the previous code (what mm-insert-file-contents does for that has to be understood). Here is an example:

To: no-endoding@gmail.com
Subject: NO encoding specified but UTF-8
From: Christophe TROESTLER <Christophe.Troestler@umons.ac.be>
Date: Sun, 5 Jan 2021 22:37:00 +0200

Mathématique « diff » è

(I tried with this example saved both in UTF-8 and in latin-1.) Note that no Content-Type is present. The absence of Content-Type should be fairly rare (mostly for drafts).

P.S. Could you post the message that did not decode properly and triggered all this?

@thierryvolpiatto
Copy link
Contributor Author

thierryvolpiatto commented Jan 12, 2021 via email

@thierryvolpiatto
Copy link
Contributor Author

thierryvolpiatto commented Jan 12, 2021 via email

@Chris00
Copy link
Contributor

Chris00 commented Jan 12, 2021

It works except when the file is saved in latin-1. The following seems better:

     (switch-to-buffer (get-buffer-create mu4e~view-buffer-name))
     (erase-buffer)
-    (insert-file-contents path)
+    (insert-file-contents-literally path)
+    (unless (mail-fetch-field "Content-Type" t)
+      (decode-coding-region (point-min) (point-max)
+                            (detect-coding-region (point-min) (point-max) t)))
     (setq
      gnus-summary-buffer (get-buffer-create " *appease-gnus*")

It seems from the documentation that detect-coding-region will always return a coding system, so no or is required. Does this work for you?

@Chris00
Copy link
Contributor

Chris00 commented Jan 12, 2021

It is not, all my email beeing encrypted, I have always:

Content-Type: multipart/encrypted; boundary=[...]; protocol="application/pgp-encrypted"

In this case, I guess (mail-fetch-field "Content-Type" t) returns t and so the decoding is not performed (Gnus does it). I meant (mail-fetch-field "Content-Type" t) returning nil is rare—except for the drafts.

@thierryvolpiatto
Copy link
Contributor Author

thierryvolpiatto commented Jan 13, 2021 via email

@thierryvolpiatto
Copy link
Contributor Author

Here what is working with my email with its subject encoded in latin and all your example emails you sent:

diff --git a/mu4e/mu4e-view.el b/mu4e/mu4e-view.el
index 97d940f2..bd5c0f73 100644
--- a/mu4e/mu4e-view.el
+++ b/mu4e/mu4e-view.el
@@ -376,6 +376,10 @@ article-mode."
 (defvar-local mu4e~gnus-article-mime-handles nil)
 (put 'mu4e~gnus-article-mime-handles 'permanent-local t)
 
+(defvar mu4e-content-type-to-decode '("multipart/encrypted")
+  "Decode message when its content-type is member of this list.
+When no content-type is found in message message is decoded.")
+
 (defun mu4e~view-gnus (msg)
   "View MSG using Gnus' article mode. Experimental."
   (require 'gnus-art)
@@ -388,7 +392,12 @@ article-mode."
                                             gnus-buttonized-mime-types)))
     (switch-to-buffer (get-buffer-create mu4e~view-buffer-name))
     (erase-buffer)
-    (insert-file-contents path)
+    (insert-file-contents-literally path)
+    (let* ((ct (message-fetch-field "Content-Type" t))
+           (split (and ct (split-string ct ";")))) 
+      (unless (and ct (not (member (car split) mu4e-content-type-to-decode)))
+        (decode-coding-region (point-min) (point-max)
+                              (detect-coding-region (point-min) (point-max) t))))
     (setq
      gnus-summary-buffer (get-buffer-create " *appease-gnus*")
      gnus-original-article-buffer (current-buffer))

An exception, you html example email you sent here unless mm-text-html-renderer is non nil, in this case the html backend e.g. w3m, shr etc... is doing the job.

@thierryvolpiatto
Copy link
Contributor Author

I created the variable mu4e-content-type-to-decode so that we can add more cases "to decode".

@thierryvolpiatto
Copy link
Contributor Author

Here the comment just added:

        ;; When a content-type is found and it is member of
        ;; `mu4e-content-type-to-decode' do nothing and let
        ;; `gnus-display-mime-function' doing the job, otherwise it
        ;; would defeat the encoding done here as it expects raw text.

@Chris00
Copy link
Contributor

Chris00 commented Jan 13, 2021

@thierryvolpiatto Would it be possible that you post a message here (maybe a fake one that you create in a file on your disk) that exhibits your problem? I sent to myself an encrypted message

Content-Type: multipart/encrypted; boundary="=-=-=";
	protocol="application/pgp-encrypted"

but the accents display properly.

@Chris00
Copy link
Contributor

Chris00 commented Jan 13, 2021

(Alternatively, you can send me an email that you know is causing trouble.)

@Chris00
Copy link
Contributor

Chris00 commented Jan 13, 2021

Alternative proposal:

                                             gnus-buttonized-mime-types)))
     (switch-to-buffer (get-buffer-create mu4e~view-buffer-name))
-    (erase-buffer)
-    (insert-file-contents-literally path)
-    (unless (mail-fetch-field "Content-Type" t)
-      (decode-coding-region (point-min) (point-max)
-                            (detect-coding-region (point-min) (point-max) t)))
+    (buffer-disable-undo)
+    (insert-file-contents-literally path nil nil nil t)
+    (mm-enable-multibyte)
     (setq
      gnus-summary-buffer (get-buffer-create " *appease-gnus*")
      gnus-original-article-buffer (current-buffer))
-    (run-hooks 'gnus-article-decode-hook)
+    (let* ((ct (mail-fetch-field "Content-Type"))
+           (ct (and ct (mail-header-parse-content-type ct)))
+           (charset (mail-content-type-get ct 'charset))
+           (charset (if charset (intern charset)))
+           (gnus-newsgroup-charset
+            (if (coding-system-p charset) charset
+              (detect-coding-region (point-min) (point-max) t))))
+      (run-hooks 'gnus-article-decode-hook))
     (let ((mu4e~view-rendering t) ; customize gnus in mu4e
           (max-specpdl-size mu4e-view-max-specpdl-size)

Chris00 added a commit to Chris00/mu that referenced this issue Jan 13, 2021
@thierryvolpiatto
Copy link
Contributor Author

thierryvolpiatto commented Jan 13, 2021 via email

@Chris00
Copy link
Contributor

Chris00 commented Jan 13, 2021

If you refer to accented letters in the subject, I think it is badly encoded. One may attempt to work around this by specifically applying the charset (in the last version) to the subject.

Chris00 added a commit to Chris00/mu that referenced this issue Jan 13, 2021
@Chris00
Copy link
Contributor

Chris00 commented Jan 13, 2021

This works:

                                             gnus-buttonized-mime-types)))
     (switch-to-buffer (get-buffer-create mu4e~view-buffer-name))
-    (erase-buffer)
-    (insert-file-contents path)
+    (buffer-disable-undo)
+    (insert-file-contents-literally path nil nil nil t)
+    (mm-enable-multibyte)
     (setq
      gnus-summary-buffer (get-buffer-create " *appease-gnus*")
      gnus-original-article-buffer (current-buffer))
-    (run-hooks 'gnus-article-decode-hook)
+    (let* ((ct (mail-fetch-field "Content-Type"))
+           (ct (and ct (mail-header-parse-content-type ct)))
+           (charset (mail-content-type-get ct 'charset))
+           (charset (and charset (intern charset)))
+           (gnus-newsgroup-charset
+            (if (and charset (coding-system-p charset)) charset
+              (detect-coding-region (point-min) (point-max) t))))
+      (run-hooks 'gnus-article-decode-hook))
     (let ((mu4e~view-rendering t) ; customize gnus in mu4e
           (max-specpdl-size mu4e-view-max-specpdl-size)

@thierryvolpiatto
Copy link
Contributor Author

thierryvolpiatto commented Jan 14, 2021 via email

@thierryvolpiatto
Copy link
Contributor Author

Very good you found gnus-newsgroup-charset it looks like the best approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants