Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crash when uploading locally created no-utf-8 encoded message #43

Closed
sudipm-mukherjee opened this issue Feb 1, 2021 · 11 comments
Closed

Comments

@sudipm-mukherjee
Copy link
Contributor

General informations

  • system/distribution (with version): Debian
  • offlineimap version (offlineimap -V): 0.0~git20210105.00d395b+dfsg-2
  • Debian bug: https://bugs.debian.org/981485

Logs, error

OfflineIMAP 7.3.0
  Licensed under the GNU GPL v2 or any later version (with an OpenSSL exception)
imaplib2 v3.05, Python v3.9.1+, OpenSSL 1.1.1i  8 Dec 2020
Account sync foo:
 *** Processing account foo
 Establishing connection to mail.foo.com:993 (fooRemote)
Folder INBOX [acc: foo]:
 Syncing INBOX: IMAP -> Maildir
 Copy message UID -2 (1/2) fooLocal:INBOX -> fooRemote:INBOX
 ERROR: Copying message -2 [acc: foo]
  'utf-8' codec can't decode byte 0xfc in position 317: invalid start byte
 ERROR: while syncing INBOX [account foo]
  'utf-8' codec can't decode byte 0xfc in position 317: invalid start byte
 ERROR: ERROR in syncfolder for foo folder INBOX: Traceback (most recent call last):
  File "/usr/share/offlineimap3/offlineimap/accounts.py", line 666, in syncfolder
    localfolder.syncmessagesto(remotefolder, statusfolder)
  File "/usr/share/offlineimap3/offlineimap/folder/Base.py", line 1186, in syncmessagesto
    action(dstfolder, statusfolder)
  File "/usr/share/offlineimap3/offlineimap/folder/Base.py", line 1013, in __syncmessagesto_copy
    self.copymessageto(uid, dstfolder, statusfolder, register=0)
  File "/usr/share/offlineimap3/offlineimap/folder/Base.py", line 902, in copymessageto
    message = self.getmessage(uid)
  File "/usr/share/offlineimap3/offlineimap/folder/Maildir.py", line 262, in getmessage
    retval = file.read()
  File "/usr/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 317: invalid start byte

  'utf-8' codec can't decode byte 0xfc in position 317: invalid start byte
Account sync foo:
 *** Finished account 'foo' in 0:00
ERROR: Exceptions occurred during the run!
ERROR: Copying message -2 [acc: foo]
  'utf-8' codec can't decode byte 0xfc in position 317: invalid start byte

Traceback:
  File "/usr/share/offlineimap3/offlineimap/folder/Base.py", line 902, in copymessageto
    message = self.getmessage(uid)
  File "/usr/share/offlineimap3/offlineimap/folder/Maildir.py", line 262, in getmessage
    retval = file.read()
  File "/usr/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)

Steps to reproduce the error

Details at the Debian bug report. But a summary is:

Compose a message in mutt containing German umlauts. 
system uses utf-8 encoding, and mutt has a send_charset
option that defaults to `us-ascii:iso-8859-1:utf-8` and thus the
message was encoded as "Content-Type: text/plain; charset=iso-8859-1".
the error is seen while offlineimap tries to upload the sent message to IMAP server
@sudipm-mukherjee
Copy link
Contributor Author

Another user reported the same issue at https://bugs.debian.org/981685

@ahf
Copy link

ahf commented Feb 3, 2021

I'm affected by this as well. I'm using Danish characters in my email signature and have set send_charset = "us-ascii:iso-8859-1:utf-8" in my muttrc. This allows mutt to recode my emails down to the first character set that works, which in most cases is ISO-8859-1, which offlineimap3 seems unhappy about.

Exciting with the Python 3 version of OfflineIMAP!

@jishac
Copy link
Contributor

jishac commented Feb 3, 2021

Like @ahf , I have noticed this on sent mail using mutt as well.

Furthermore, the patch listed at Debian bug 981485 will result in issues when the message is synced to an IMAP server since the encoding is hard coded to utf-8 and will result in a discrepancy between the content type listed in the email header and the actual encoding. In other words the email will be encoded for 'utf-8' but say it's encoded as 'iso-8859-1' resulting in mangled text when viewed in an email client.

So a proper fix would either need to mangle the original message to change the encoding type, or the code will need to factor in and store the encoding so that it can be properly encoded/decoded at various points throughout the software.

A work around in the interim is to set send_charset = "us-ascii:utf-8" and avoid using other charsets like 'iso-8859-1'. The change to mutt to fix this offlineimap bug is not ideal but will sidestep the issue of composing messages in mutt for the time being at the cost of a few extra bytes here and there.

@jishac
Copy link
Contributor

jishac commented Feb 3, 2021

The hard coded use of utf-8 is likely the cause of bug #44 as well

@Elvith
Copy link

Elvith commented Feb 4, 2021

I am affected by this bug as well (the change of encoding from utf-8 to iso-8859-1 is not due to German umlauts in my case, but to the “&” character).

Being the user who reported bug #44, I think that the two bugs are probably related indeed.

@thekix
Copy link
Contributor

thekix commented Feb 19, 2021

Hello,

this bug is very interesting, but IMO, it is hard to solve it in the right way :-)

I will try to explain it:

    # Interface from BaseFolder
    def getmessage(self, uid):
        """Return the content of the message."""

        filename = self.messagelist[uid]['filename']
        filepath = os.path.join(self.getfullname(), filename)
        file = open(filepath, 'rt')
        retval = file.read()
        file.close()
        # TODO: WHY are we replacing \r\n with \n here? And why do we
        #      read it as text?
        return retval.replace("\r\n", "\n")

This function reads the message as text, (rt), then replace the carriage return). Probably this is not the right way to do it, and we should simply read the message as binary (take a look in the rb, something like:

    # Interface from BaseFolder
    def getmessage(self, uid):
        """Return the content of the message."""

        filename = self.messagelist[uid]['filename']
        filepath = os.path.join(self.getfullname(), filename)
        file = open(filepath, 'rb')
        retval = file.read()
        file.close()
        return retval

The problem is we need make more changes in other parts. We need check the header to read some values.

IMO, there are three options to solve the problem:

  1. Deep analysis of this code and rewrite some functions. Read it as binary,...
  2. Include a new option in the configuration file to specify the charset (like mutt)
  3. Try to detect the charset

I will try with the last option, because it is backward compatible with offlineimap2 and it is faster than option 1 (I am very very busy these days)

Regards,
kix

PS. I won't close this bug, because I will try to check the option 1.
PS2. Please, the new patch includes a new library, chardet (python3-chardet) in Debian. @sudipm-mukherjee please, check the Depends
PS3. If the patch is working for you, please, add an smile or something to this post (as feedback). Thanks.

thekix added a commit to thekix/offlineimap3 that referenced this issue Feb 19, 2021
This patch includes charset detection to read the message.

This patch is related to issue OfflineIMAP#43

Signed-off-by: Rodolfo García Peñas (kix) <kix@kix.es>
@thekix thekix mentioned this issue Feb 19, 2021
5 tasks
@jishac
Copy link
Contributor

jishac commented Feb 19, 2021

this bug is very interesting, but IMO, it is hard to solve it in the right way :-)

IMO, there are three options to solve the problem:

1. Deep analysis of this code and rewrite some functions. Read it as binary,...

2. Include a new option in the configuration file to specify the charset (like mutt)

3. Try to detect the charset

I will try with the last option, because it is backward compatible with offlineimap2 and it is faster than option 1 (I am very very busy these days)

Regards,
kix

PS. I won't close this bug, because I will try to check the option 1.
PS2. Please, the new patch includes a new library, chardet (python3-chardet) in Debian. @sudipm-mukherjee please, check the Depends
PS3. If the patch is working for you, please, add an smile or something to this post (as feedback). Thanks.

@thekix - I have made significant progress with option 1 with #48, if you have time to give it a look.... I can create a pull request if desired as I am just getting to testing the changes.

The problem with option 3 is that it won't work given that messages can contain multiple encodings and simply detecting one the encoding doesn't save you when you go to write it back to the server. I will explain further in #53 comments.

@thekix
Copy link
Contributor

thekix commented Feb 19, 2021

Hi @jishac

Of course, IMO the option 1 is the best. I was checking your repo/patch, amazing!

Some comments:

Please double check the syntax, for example, some spaces here:

        msg.add_header(headername,headervalue)
        return msg.get_all(headername,[])

I think you are replacing the function "get_message_date()" (file emailutil):

                - message_timestamp = emailutil.get_message_date(content, 'Date')
                + message_timestamp = self.get_message_date(msg, 'Date')

IMO is better change it in the same file. Take a look that you are changing all calls:

kix@inle:~/src/offlineimap3$ rgrep get_message_date * | grep -v binar
offlineimap/emailutil.py:def get_message_date(content, header='Date'):
offlineimap/folder/IMAP.py:            rtime = emailutil.get_message_date(content)
offlineimap/folder/Maildir.py:                message_timestamp = emailutil.get_message_date(content, 'Date')
offlineimap/folder/Maildir.py:                    message_timestamp = emailutil.get_message_date(
offlineimap/folder/Maildir.py:                datestr = emailutil.get_message_date(content)
offlineimap/folder/Maildir.py:                date = emailutil.get_message_date(content, 'Date')
offlineimap/folder/Maildir.py:                datestr = emailutil.get_message_date(content)
kix@inle:~/src/offlineimap3$ 

Please, could you create a new pull request with these changes and with the current offlineimap status? (remove my stuff and include your code).

Again, thanks a lot for your amazing work!!

Best regards,
kix

jishac added a commit to jishac/offlineimap3 that referenced this issue Feb 24, 2021
jishac added a commit to jishac/offlineimap3 that referenced this issue Feb 24, 2021
…s well and I reviewed the code several times. However, I cannot test it, testers wanted!

This commit: Minor bug fixes from testing

Should finalize implementation of enhancement OfflineIMAP#48
OfflineIMAP#48

And fix issues OfflineIMAP#43 and OfflineIMAP#44
OfflineIMAP#43
OfflineIMAP#44

Signed-off-by: Joseph Ishac <jishac@nasa.gov>
Tested-by: Joseph Ishac <jishac@nasa.gov>
@thekix
Copy link
Contributor

thekix commented Aug 7, 2021

Hello @sudipm-mukherjee

probably we can close this bug. Is it ok?

Regards!
kix

@sudipm-mukherjee
Copy link
Contributor Author

@thekix yes, sorry I forgot to close it after packaging the fix in Debian.

@thekix
Copy link
Contributor

thekix commented Aug 7, 2021

Thanks!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants