Non-UTF-8 attachments are messed up after being downloaded by Offlineimap3 #44
Comments
Should finalize implementation of enhancement OfflineIMAP#48 and fix issues OfflineIMAP#43 and OfflineIMAP#44.
…s well and I reviewed the code several times. However, I cannot test it, testers wanted! This commit: Minor bug fixes from testing. Should finalize implementation of enhancement OfflineIMAP#48 and fix issues OfflineIMAP#43 and OfflineIMAP#44. Signed-off-by: Joseph Ishac <jishac@nasa.gov> Tested-by: Joseph Ishac <jishac@nasa.gov>
With the latest version of Offlineimap3 in Debian Testing (offlineimap3 0.0~git20210225.1e7ef9e+dfsg-3 according to https://tracker.debian.org/pkg/offlineimap3) I still encounter the bug. (I re-downloaded one faulty message from the IMAP server and it is still messed up.) Below is an extract of the text of one such message (although I do not think this can really help…) with the correct UTF-8 encoding. The characters replaced by � are mostly ’ é à è. In mutt they are displayed as � by default because the text is recognized as windows-1252 rather than UTF-8, but if I change the encoding I obtain what follows.
Can I do anything to help?
The stack trace at
is a bit suspicious since it lists
which is the old code base. Is there a chance that both old and new are installed?
Thanks for spotting that @jishac, I should have checked before posting here. :(
@Elvith I've had a few messages that were permanently altered by the old code base when uploading a message to the server in this very way (utf-8 encoding when the email states another). Are you able to diagnose the raw message stored on the server (by downloading it out of band for example)? If it is mangled on the server, then a manual correction may be needed.
Not sure what you mean. What I did is:
Is there another test I might do?
If your email provider has a web interface, most of the time you can download a single raw message. Some providers make it hard (Outlook Web Mail comes to mind) but most I've seen have some way of doing it. Then inspect or open that email in your viewer of choice (mutt, Thunderbird, etc). If you do not have a web interface, then you could enable debugging in offlineimap to capture the raw message, but I would not recommend doing that for the entire mailbox as you would generate a lot of output. Basically, by capturing what the server has stored as the message, we can determine whether offlineimap is having an issue or whether the problem exists in the source message stored on the server.
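For anyone without a webmail interface, here is a minimal sketch of the "out of band" download using Python's standard `imaplib`. The host, account, mailbox and UID in the commented usage are placeholders, and the helper name `fetch_raw_message` is made up for illustration; it is not part of offlineimap. It selects the mailbox read-only and uses `BODY.PEEK[]`, so it should neither modify the message nor mark it as read.

```python
import imaplib  # used by the commented usage below


def fetch_raw_message(conn, mailbox, uid):
    """Return the raw bytes of one message exactly as the server stores it.

    `conn` is an already logged-in imaplib.IMAP4 / IMAP4_SSL connection.
    """
    # Read-only select: nothing on the server is changed by this session.
    status, _ = conn.select(mailbox, readonly=True)
    if status != "OK":
        raise RuntimeError(f"cannot select mailbox {mailbox!r}")

    # BODY.PEEK[] returns the whole message without setting the \Seen flag.
    status, data = conn.uid("FETCH", str(uid), "(BODY.PEEK[])")
    if status != "OK" or not data or not isinstance(data[0], tuple):
        raise RuntimeError(f"cannot fetch UID {uid}")

    # data[0] is (response envelope, raw message bytes).
    return data[0][1]


# Hypothetical usage (placeholder host, account, and UID):
#
#   conn = imaplib.IMAP4_SSL("imap.example.org")
#   conn.login("user@example.org", "password")
#   raw = fetch_raw_message(conn, "INBOX", 1234)
#   with open("message.eml", "wb") as fh:
#       fh.write(raw)
#   conn.logout()
```

The saved `message.eml` can then be opened in mutt or Thunderbird, or inspected with a hex viewer, independently of the local Maildir copy. Mailbox names containing spaces may need to be quoted before being passed to `select`.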
OK. I just looked at the “source” of the faulty message on my webmail interface; it is still scrambled. But if I understand correctly how offlineimap synchronizes the messages on my computer and on the server, this just means that the scrambled version on my computer, scrambled by offlineimap when downloaded, replaced at some point the original version on the server. This does not mean that the original version on the server was also scrambled and therefore this does not mean that offlineimap is not having an issue. Or does it? And if this is correct, then in fact this message does not prove anything, since it is dated March 22, before the upgrade of offlineimap on my Debian… It was scrambled on my computer before the upgrade, the scrambled version was uploaded/synchronized to the server before the upgrade, and therefore the message will remain scrambled, upgrade or not. No upgrade will miraculously “un-scramble” it.
Correct, no upgrade will miraculously “un-scramble” the original message as the server copy is affected, very likely in exactly the manner you outlined (see note at end). Knowing the old offlineimap behavior, one could try and write a tool to seek out such messages based on a heuristic, and try to correct them. What I did in my case is manually correct any message that was "important".
Correct, seeing the issue on a new message after the upgrade would be a bug, but the new handling should be robust to encoding types so you should not experience any. PS: It can be hard to assign blame for scrambled/mangled messages. I have seen scrambled/mangled email that arrived in my inbox. Usually rare for non-spam messages (but possible). That said, if the server copy of the March 22 message has an "X-OfflineIMAP" header, then offlineimap uploaded that copy and is likely to blame.
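As a rough illustration of the heuristic idea, here is a sketch (not an existing offlineimap tool) that takes the raw bytes of a message and flags text parts whose declared charset is not UTF-8 but whose decoded body contains the UTF-8 byte sequence for U+FFFD (`EF BF BD`). The function names are made up, and the rule is only an assumption about what the old behavior leaves behind; it can miss damaged messages and should be treated as a hint rather than proof. The second helper checks for the "X-OfflineIMAP" header mentioned above.

```python
from email import message_from_bytes, policy

# UTF-8 encoding of U+FFFD, the "replacement character".
REPLACEMENT_UTF8 = "\ufffd".encode("utf-8")  # b'\xef\xbf\xbd'


def suspicious_parts(raw):
    """List (content type, declared charset) for text parts that look mangled."""
    msg = message_from_bytes(raw, policy=policy.default)
    findings = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        charset = part.get_content_charset() or "us-ascii"
        # decode=True undoes base64 / quoted-printable and yields raw bytes.
        payload = part.get_payload(decode=True) or b""
        if charset not in ("utf-8", "utf8") and REPLACEMENT_UTF8 in payload:
            findings.append((part.get_content_type(), charset))
    return findings


def has_offlineimap_header(raw):
    """True if the message carries an X-OfflineIMAP header."""
    msg = message_from_bytes(raw, policy=policy.default)
    return "X-OfflineIMAP" in msg  # header lookup is case-insensitive
```

Running `suspicious_parts` over the files of a Maildir (each read in binary mode) would give a candidate list to review by hand before attempting any correction.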
Interesting. No, it does not have that header, but the local copy does not have it either. I guess I must have configured my offlineimap in such a way that this header is not written, or maybe it is the default behavior. Anyway, thank you for your precious input, I will keep an eye on new emails in the forthcoming weeks and I will report here in case of scrambled messages. 👍
Hi @Elvith can we close this issue? Thanks a lot.
Yes, the problem seems solved, I have not had any issue since my previous message. Thank you very much!
Side note: the "X-OfflineIMAP" header is only set when the server does not support the UID PLUS IMAP extension. In that case, this header allows offlineimap to get the assigned UID once the email is uploaded. However, adding a new header on each uploaded email might be useful to track what was done by offlineimap. This might help a lot to debug issues.
I am not entirely sure this is indeed an Offlineimap3 bug. But I upgraded to offlineimap3 yesterday evening, and after starting it for the first time this morning, I had to correct my python script that decrypts with GPG the files where I store my passwords, and now this script is working but I still have some weird behavior with my emails that did not exist yesterday. Since the python script/GPG thing was clearly related to Offlineimap3, I am guessing that this is not a coincidence and Offlineimap3 is also responsible for the other bug. Moreover, the emails that are messed up on my Debian using offlineimap are not messed up on my phone using K9-Mail, so I’d say it is not the emails themselves that have a problem, but indeed my Debian setup (so, mutt+offlineimap).
The problem is that emails with non-utf-8 charset, for instance iso-8859-15, that have been downloaded by Offlineimap3 (so, downloaded today) are not displayed correctly. This is the case both for text/html attachments (displayed in mutt with w3m) or for text/plain attachments. I have strings like ï¿œ everywhere I should have é, for instance.
Similar emails that were downloaded before the Offlineimap3 upgrade are displayed correctly, so the problem really seems to be with Offlineimap3.
If I run `file` on one of these text attachments, I find that the charset is UTF-8, even if the HTML header says otherwise. For emails downloaded before the Offlineimap3 upgrade, the charset indicated by `file` and that indicated in the HTML header are consistent.
If I run `iconv -f UTF-8 -t UTF-8` on one of these text attachments, the ï¿œ is changed into �.
So my feeling is that somehow Offlineimap3 writes all emails it is downloading in UTF-8 but does not convert the characters from the old charset to the UTF-8 charset properly. By doing so, characters like é are changed into the special UTF-8 character �, but then mutt and w3m still think that the file is in ISO-8859 or whatever because this is what is written in the HTML header, and are unable to display correctly the text.
I am not a charset expert and I am probably not making any sense, sorry. I will try to answer as precisely as I can any question you might have.
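The symptoms described above can be reproduced in a few lines of Python. This is only a sketch of one mechanism that would produce exactly these strings, assuming the bytes were decoded as UTF-8 with replacement of invalid sequences and then written back as UTF-8; it is not taken from the Offlineimap3 code, and the function name is made up.

```python
def mangle_like_report(text, declared_charset="iso-8859-15"):
    """Simulate the suspected corruption and how a viewer then shows it."""
    # 1. The message body as originally sent, e.g. 'é' -> b'\xe9'.
    original_bytes = text.encode(declared_charset)

    # 2. Assumed faulty step: decode as UTF-8 regardless of the declared
    #    charset. A lone b'\xe9' is invalid UTF-8 and becomes U+FFFD.
    wrongly_decoded = original_bytes.decode("utf-8", errors="replace")

    # 3. Written to disk as UTF-8: U+FFFD -> b'\xef\xbf\xbd'.
    stored_bytes = wrongly_decoded.encode("utf-8")

    # 4. The viewer trusts the declared charset and decodes those three
    #    bytes one by one.
    return stored_bytes.decode(declared_charset)


print(mangle_like_report("é"))  # ï¿œ
```

In ISO-8859-15 the bytes `EF BF BD` map to ï, ¿ and œ, which matches the ï¿œ seen in mutt and w3m. Reading the same stored bytes as UTF-8 instead gives �, consistent with the `iconv` observation. The original byte is gone after step 2, so the damage cannot be undone from the file alone.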
General information
offlineimap version (`offlineimap -V`): 7.3.0
Configuration file offlineimaprc
pythonfile (if any)
This is the file offlineimap_gpg.py that is referred to in the file .offlineimaprc above