Non-UTF-8 attachments are messed up after being downloaded by Offlineimap3 #44

Closed
Elvith opened this issue Feb 1, 2021 · 13 comments
Labels
question Further information is requested

Comments

@Elvith

Elvith commented Feb 1, 2021

I am not entirely sure this is an Offlineimap3 bug. I upgraded to Offlineimap3 yesterday evening, and when I started it for the first time this morning, I had to fix the Python script that uses GPG to decrypt the files where I store my passwords. That script now works, but my emails still show some weird behavior that did not exist yesterday. Since the Python script/GPG problem was clearly related to Offlineimap3, I am guessing that this is not a coincidence and that Offlineimap3 is also responsible for the other bug. Moreover, the emails that are messed up on my Debian machine using offlineimap are not messed up on my phone using K9-Mail, so I’d say the problem is not in the emails themselves but in my Debian setup (mutt+offlineimap).

The problem is that emails with a non-UTF-8 charset, for instance iso-8859-15, that have been downloaded by Offlineimap3 (so, downloaded today) are not displayed correctly. This is the case both for text/html attachments (displayed in mutt with w3m) and for text/plain attachments. I have strings like ï¿œ everywhere I should have é, for instance.

Similar emails that were downloaded before the Offlineimap3 upgrade are displayed correctly, so the problem really seems to be with Offlineimap3.

If I run file on one of these text attachments, I find that the charset is UTF-8, even if the HTML header says otherwise. For emails downloaded before the Offlineimap3 upgrade, the charset indicated by file and that indicated in the HTML header are consistent.

If I run iconv -f UTF-8 -t UTF-8 on one of these text attachments, the ᅵ is changed into �.

So my feeling is that somehow Offlineimap3 writes all emails it is downloading in UTF-8 but does not convert the characters from the old charset to the UTF-8 charset properly. By doing so, characters like é are changed into the special UTF-8 character �, but then mutt and w3m still think that the file is in ISO-8859 or whatever because this is what is written in the HTML header, and are unable to display correctly the text.
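The hypothesis above can be reproduced in a few lines of Python. This is only a sketch of how such a symptom could arise, not a statement about what Offlineimap3's code actually does: the sample word is made up, and the "faulty step" (decoding as UTF-8 with replacement) is an assumption taken from the description in this report.

```python
# Hypothetical sample text; the codec names are standard Python codecs.
original = "décidé"
wire_bytes = original.encode("iso-8859-15")  # what the sender put in the message

# Assumed faulty step: the bytes are decoded as UTF-8, and every byte that
# is not valid UTF-8 is replaced by U+FFFD (the replacement character).
damaged_text = wire_bytes.decode("utf-8", errors="replace")

# The damaged text is then written to disk as UTF-8.
stored_bytes = damaged_text.encode("utf-8")

# A viewer that trusts the declared charset decodes those bytes as
# ISO-8859-15 again and shows three characters per lost letter.
displayed = stored_bytes.decode("iso-8859-15")

print(repr(damaged_text))
print(stored_bytes)
print(displayed)
```

Under these assumptions each é ends up as the byte sequence EF BF BD, which reads as ï¿œ in ISO-8859-15, matching the strings reported above. The original byte is gone after the first step, so the damage cannot be undone by converting the file again.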

I am not a charset expert and I am probably not making any sense, sorry. I will try to answer as precisely as I can any question you might have.

General information

  • system/distribution (with version): Debian testing
  • offlineimap version (offlineimap -V): 7.3.0
  • Python version: 3.9.1
  • server name or domain:
  • CLI options:

Configuration file offlineimaprc

[general]
# General information.  See the fully annotated example for more information
# https://github.com/jgoerzen/offlineimap/blob/master/offlineimap.conf
pythonfile = ~/Programmation/Scripts/offlineimap_gpg.py

metadata = ~/.offlineimap
accounts = account1,account2,account3,account4
maxsyncaccounts = 6
socktimeout = 60
ui = basic

[mbnames]
# This populates a mailbox file for mutt to use.
# We disabled this once it was made !

enabled = no
filename = ~/.mail_configs/mutt/mailboxes
header = "mailboxes "

peritem = "+%(accountname)s/%(foldername)s"
sep = " "
footer = "\n"

[Account account1]
localrepository = local-account1
remoterepository = remote-account1
autorefresh = 2
quick = 2
postsynchook = ~/Programmation/Scripts/spamscanner.sh

[Account account2]
localrepository = local-account2
remoterepository = remote-account2
autorefresh = 2
quick = 2

[Account account3]
localrepository = local-account3
remoterepository = remote-account3
autorefresh = 2
quick = 2

[Account account4]
localrepository = local-account4
remoterepository = remote-account4
autorefresh = 2
quick = 2

[Repository local-account1]
type = Maildir
localfolders = /home/elvith/.mail/account1
nametrans = lambda foldername: re.sub ('INBOX.INBOX', 'INBOX', re.sub (r'^', r'INBOX.', foldername))

[Repository remote-account1] 
type = IMAP 
remotehost = REDACTED
remoteuser = REDACTED
remotepasseval = get_pass_account1()
ssl = yes
sslcacertfile = OS-DEFAULT
realdelete = yes
maxconnections = 3
#holdconnectionopen = true
#keepalive = 60
folderfilter = lambda f: f not in ['Templates']
nametrans = lambda foldername: re.sub('^INBOX\.', '', foldername)

[Repository local-account2]
type = Maildir
localfolders = /home/elvith/.mail/account2
nametrans = lambda foldername: re.sub ('inbox', 'INBOX', foldername) 

[Repository remote-account2] 
type = IMAP 
remotehost = REDACTED
remoteuser = REDACTED
remotepasseval = get_pass_account2()
ssl = yes
sslcacertfile = OS-DEFAULT
realdelete = yes
maxconnections = 3
#holdconnectionopen = true
#keepalive = 60
nametrans = lambda foldername: re.sub ('INBOX', 'inbox', foldername)

[Repository local-account3]
type = Maildir
localfolders = /home/elvith/.mail/account3 
nametrans = lambda foldername: re.sub ('inbox', 'INBOX', foldername) 

[Repository remote-account3]
type = IMAP
remotehost = REDACTED
remoteuser = REDACTED
remotepasseval = get_pass_account3()
ssl = yes
sslcacertfile = OS-DEFAULT
ssl_version = tls1_2
cert_fingerprint = 4169f501ed5ddfcc6c6705bc9114bae5c89b8ca7
realdelete = yes
maxconnections = 3
#holdconnectionopen = true
#keepalive = 60
folderfilter = lambda f: f in ['INBOX','sent','trash']
nametrans = lambda foldername: re.sub ('INBOX', 'inbox', foldername)

[Repository local-account4]
type = Maildir
localfolders = /home/elvith/.mail/account4
nametrans = lambda foldername: re.sub ('INBOX.INBOX', 'INBOX', re.sub ('inbox', 'INBOX', re.sub (r'^', r'INBOX.', foldername)))

[Repository remote-account4]
type = IMAP
remotehost = REDACTED
remoteuser = REDACTED
remotepasseval = get_pass_account4()
ssl = yes
sslcacertfile = OS-DEFAULT
realdelete = yes
maxconnections = 3
#holdconnectionopen = true
#keepalive = 60
folderfilter = lambda foldername: not re.search('INBOX.INBOX.', foldername)

# Change INBOX.*, and lowercase
nametrans = lambda foldername: re.sub('^inbox\.', '', foldername.lower())

pythonfile (if any)

This is the file offlineimap_gpg.py that is referred to in the file .offlineimaprc above

#! /usr/bin/env python
from subprocess import check_output

def get_pass_account1():
        return check_output("gpg -dq ~/Documents/Informatique/password_account1.gpg", shell=True).decode().strip("\n")

def get_pass_account2():
        return check_output("gpg -dq ~/Documents/Informatique/password_account2.gpg", shell=True).decode().strip("\n")

def get_pass_account3():
        return check_output("gpg -dq ~/Documents/Informatique/password_account3.gpg", shell=True).decode().strip("\n")

def get_pass_account4():
        return check_output("gpg -dq ~/Documents/Informatique/password_account4.gpg", shell=True).decode().strip("\n")
jishac added a commit to jishac/offlineimap3 that referenced this issue Feb 24, 2021
…s well and I reviewed the code several times. However, I cannot test it, testers wanted!

This commit: Minor bug fixes from testing

Should finalize implementation of enhancement OfflineIMAP#48

And fix issues OfflineIMAP#43 and OfflineIMAP#44

Signed-off-by: Joseph Ishac <jishac@nasa.gov>
Tested-by: Joseph Ishac <jishac@nasa.gov>
@jishac jishac mentioned this issue Feb 24, 2021
5 tasks
@Elvith
Author

Elvith commented Mar 28, 2021

With the latest version of Offlineimap3 in Debian Testing (offlineimap3 0.0~git20210225.1e7ef9e+dfsg-3 according to https://tracker.debian.org/pkg/offlineimap3) I still encounter the bug. (I re-downloaded from the IMAP server one faulty message and it is still messed up.)

Below is an extract of the text of one such message (although I do not think this can really help…) with the correct utf-8 encoding. The characters replaced by � are mostly ’ é à è. In mutt they are by default displayed as � because the text is not recognized as utf-8 but as windows-1252, but if I change the encoding I obtain what follows.


*La direction du CNRS a d�cid� de publier en 2021 les classementsdes 
jurys d�admissibilit� aux concours de charg�s de recherchepar ordre 
alphab�tique*, alors qu'ils �taient communiqu�s jusqu�en 2020par ordre 
de m�rite.

Le Sgen CFDT Recherche^EPST s�oppose � cette d�cision qui m�prise les 
r�gles jusque-l� admises par l�ensemble de la communaut� et d�nature le 
jugement par les pairs r�alis� par les sections du comit� national. 
Cette absence de transparence instille le doute sur l�impartialit� du 
jury d�admission et met � mal la confiance que tous et notamment les 
candidats portent sur l�organisation des concours au CNRS�!

Can I do anything to help?

@sudipm-mukherjee
Contributor

Not sure if this is related, but another Debian user has reported a non-UTF-8 decoding error. It looks like #56 has not completely fixed the UTF-8 issue.

https://bugs.debian.org/985663

@thekix do you think these two are the same issue, or do you want me to open a new issue for the Debian bug?
@jishac fyi.

@jishac
Contributor

jishac commented Mar 29, 2021

The stack trace at

https://bugs.debian.org/985663

is a bit suspicious since it lists

  File "/usr/share/offlineimap3/offlineimap/folder/Maildir.py", line 262, in getmessage
    retval = file.read()

which is the old code base. Is there a chance that both old and new are installed?

@sudipm-mukherjee
Contributor

Thanks for spotting that @jishac, I should have checked before posting here. :(
Let me go back to the user and check.

@jishac
Contributor

jishac commented Mar 29, 2021

> With the latest version of Offlineimap3 in Debian Testing (offlineimap3 0.0~git20210225.1e7ef9e+dfsg-3 according to https://tracker.debian.org/pkg/offlineimap3) I still encounter the bug. (I re-downloaded from the IMAP server one faulty message and it is still messed up.)

@Elvith I've had a few messages that were permanently altered by the old code base when uploading a message to the server in this very way (utf-8 encoding when the email states another). Are you able to diagnose the raw message stored on the server (by downloading it out of band for example)? If it is mangled on the server, then a manual correction may be needed.

@Elvith
Author

Elvith commented Mar 31, 2021

> @Elvith I've had a few messages that were permanently altered by the old code base when uploading a message to the server in this very way (utf-8 encoding when the email states another). Are you able to diagnose the raw message stored on the server (by downloading it out of band for example)? If it is mangled on the server, then a manual correction may be needed.

I am not sure what you mean. What I did was:

  1. Kill offlineimap
  2. Remove (rm -r) the directory on my computer where offlineimap downloads emails
  3. Start offlineimap again.
    Then it re-created the missing directory, and re-downloaded all missing emails from the server. And the faulty email was still messed up.

Is there another test I might do?

@jishac
Contributor

jishac commented Mar 31, 2021

If your email provider has a web interface, most of the time you can download a single raw message. Some providers make it hard (Outlook Web Mail comes to mind) but most I've seen have some way of doing it. Then inspect or open that email in your viewer of choice (mutt, Thunderbird, etc). If you do not have a web interface, then you could enable debugging in offlineimap to capture the raw message, but I would not recommend doing that for the entire mailbox as you would generate a lot of output.

Basically, by trying to capture what the server has stored as the message we can determine if offlineimap is having an issue or if the problem exists in the source message stored on the server.

@Elvith
Author

Elvith commented Mar 31, 2021

OK. I just looked at the “source” of the faulty message on my webmail interface; it is still scrambled.

But if I understand correctly how offlineimap synchronizes the messages on my computer and on the server, this just means that the scrambled version on my computer, scrambled by offlineimap when downloaded, replaced at some point the original version on the server. This does not mean that the original version on the server was also scrambled and therefore this does not mean that offlineimap is not having an issue. Or does it?

And if this is correct, then in fact this message does not prove anything, since it is dated from March 22, before the upgrade of offlineimap on my debian… It was scrambled on my computer before the upgrade, the scrambled version was uploaded/synchronized on the server before the upgrade, and therefore the message will remain scrambled, upgrade or not. No upgrade will miraculously “un-scramble” it.
What I would need in order to claim that offlineimap is still buggy is a new scrambled email, received after the upgrade. Correct?

@jishac
Contributor

jishac commented Mar 31, 2021

> OK. I just looked at the “source” of the faulty message on my webmail interface; it is still scrambled.

> But if I understand correctly how offlineimap synchronizes the messages on my computer and on the server, this just means that the scrambled version on my computer, scrambled by offlineimap when downloaded, replaced at some point the original version on the server. This does not mean that the original version on the server was also scrambled and therefore this does not mean that offlineimap is not having an issue. Or does it?

> And if this is correct, then in fact this message does not prove anything, since it is dated from March 22, before the upgrade of offlineimap on my debian… It was scrambled on my computer before the upgrade, the scrambled version was uploaded/synchronized on the server before the upgrade, and therefore the message will remain scrambled, upgrade or not. No upgrade will miraculously “un-scramble” it.

Correct, no upgrade will miraculously “un-scramble” the original message as the server copy is affected, very likely in exactly the manner you outlined (see note at end). Knowing the old offlineimap behavior, one could try and write a tool to seek out such messages based on a heuristic, and try to correct them. What I did in my case is manually correct any message that was "important".
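A heuristic of the kind mentioned here could look like the sketch below. It is an illustration only, not a tool that exists in offlineimap: the function name `suspicious_parts` is hypothetical, and the rule it applies (a text part declares a non-UTF-8 charset yet contains the UTF-8 bytes of U+FFFD) is an assumption based on the symptom described in this issue. It only flags candidates; the lost characters still have to be restored by hand.

```python
from email import message_from_bytes

# UTF-8 encoding of U+FFFD, the replacement character: b'\xef\xbf\xbd'
REPLACEMENT_UTF8 = "\ufffd".encode("utf-8")


def suspicious_parts(raw_message: bytes):
    """Return (content_type, declared_charset) for each text part that
    declares a non-UTF-8 charset but contains UTF-8 replacement bytes."""
    msg = message_from_bytes(raw_message)
    hits = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip multipart containers and binary attachments
        charset = (part.get_content_charset() or "us-ascii").lower()
        if charset in ("utf-8", "utf8"):
            continue  # U+FFFD is at least legal in a UTF-8 part
        # decode=True undoes base64/quoted-printable and returns raw bytes
        payload = part.get_payload(decode=True) or b""
        if REPLACEMENT_UTF8 in payload:
            hits.append((part.get_content_type(), charset))
    return hits
```

To scan a Maildir, one could read each message file in binary mode and pass its contents to this function; any message with a non-empty result is worth inspecting manually.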

> What I would need in order to claim that offlineimap is still buggy is a new scrambled email, received after the upgrade. Correct?

Correct, seeing the issue on a new message after the upgrade would be a bug, but the new handling should be robust to encoding types so you should not experience any.

PS: It can be hard to assign blame for scrambled/mangled messages. I have seen scrambled/mangled email arrive in my inbox. This is usually rare for non-spam messages (but possible). That said, if the server copy of the March 22 message has an "X-OfflineIMAP" header, then offlineimap uploaded that copy and is likely to blame.

@Elvith
Author

Elvith commented Mar 31, 2021

> PS: It can be hard to assign blame for scrambled/mangled messages. I have seen scrambled/mangled email arrive in my inbox. This is usually rare for non-spam messages (but possible). That said, if the server copy of the March 22 message has an "X-OfflineIMAP" header, then offlineimap uploaded that copy and is likely to blame.

Interesting. No, it does not have that header, but the local copy does not have it either. I guess I must have configured my offlineimap in such a way that this header is not written, or maybe that is the default behavior.

Anyway, thank you for your precious input, I will keep an eye on new emails in the forthcoming weeks and I will report here in case of scrambled messages. 👍

@thekix
Contributor

thekix commented Oct 11, 2021

Hi @Elvith

can we close this issue?

Thanks a lot.
kix

@thekix thekix added the question Further information is requested label Oct 11, 2021
@Elvith
Author

Elvith commented Oct 12, 2021

Yes, the problem seems solved, I have not had any issue since my previous message. Thank you very much!

@Elvith Elvith closed this as completed Oct 12, 2021
@nicolas33
Member

Side note: the "X-OfflineIMAP" header is only set when the server does not support the UID PLUS IMAP extension. This header allows offlineimap to get the assigned UID once the email is uploaded in this case.

However, adding a new header to each uploaded email might be useful to track what was done by offlineimap. This might help a lot when debugging issues.
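The header check discussed in this thread can be sketched with Python's standard `email` module. The function name `has_offlineimap_header` is hypothetical, and the header name is the one quoted above. As the side note explains, the header is only written when the server lacks UID PLUS, so a missing header does not show that offlineimap did not upload the message.

```python
from email import message_from_bytes


def has_offlineimap_header(raw_message: bytes) -> bool:
    """Return True if the raw message carries an X-OfflineIMAP header.

    Header lookup in the email package is case-insensitive.
    """
    msg = message_from_bytes(raw_message)
    return msg.get("X-OfflineIMAP") is not None
```

The raw bytes can come from a Maildir file opened in binary mode or from a message source downloaded through a webmail interface.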
