'utf7' codec can't decode byte 0xd0 in position 0 #299

Closed
gaydenko opened this issue Feb 1, 2016 · 27 comments

Comments

@gaydenko

gaydenko commented Feb 1, 2016

Hi! I know utf7 <-> utf8 folder conversion is experimental, but since the absence of this feature would be an absolute show stopper for many users (me among them), I guess the report is legitimate :)

So, the first offlineimap run is OK: folders are created and messages are fetched. The second run results in:

⋊> ~ offlineimap
OfflineIMAP 6.6.1
  Licensed under the GNU GPL v2 or any later version (with an OpenSSL exception)
Account sync me:
 *** Processing account me
 Establishing connection to imap.example.com:993
 ERROR: While attempting to sync account 'me'
  'utf7' codec can't decode byte 0xd0 in position 0: unexpected special character
 *** Finished account 'me' in 0:00
ERROR: Exceptions occurred during the run!
ERROR: While attempting to sync account 'me'
  'utf7' codec can't decode byte 0xd0 in position 0: unexpected special character

Traceback:
  File "/usr/lib/python2.7/site-packages/offlineimap/accounts.py", line 263, in syncrunner
    self.__sync()
  File "/usr/lib/python2.7/site-packages/offlineimap/accounts.py", line 329, in __sync
    remoterepos.sync_folder_structure(localrepos, statusrepos)
  File "/usr/lib/python2.7/site-packages/offlineimap/repository/Base.py", line 230, in sync_folder_structure
    newdst_name = folder.getvisiblename().replace(
  File "/usr/lib/python2.7/site-packages/offlineimap/folder/IMAP.py", line 267, in getvisiblename
    return imaputil.decode_mailbox_name(vname)
  File "/usr/lib/python2.7/site-packages/offlineimap/imaputil.py", line 364, in decode_mailbox_name
    return ret.decode('utf-7').encode('utf-8')
  File "/usr/lib/python2.7/encodings/utf_7.py", line 12, in decode
    return codecs.utf_7_decode(input, errors, True)

Is it possible to work around the issue?
Also, I'm ready to switch to git head if directed there :)
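
For readers following along, the failing call at the bottom of the traceback can be reproduced in isolation. This is only a sketch in Python 3 (the report is from Python 2.7), and it assumes the folder name was already UTF-8 when it reached the UTF-7 decoder, which the error message suggests but the report does not confirm.

```python
# Assumption: the name handed to the UTF-7 decoder is already UTF-8 encoded.
name_utf8 = "Исходящие".encode("utf-8")  # first byte is 0xd0

try:
    name_utf8.decode("utf-7")  # same codec as in decode_mailbox_name
    error = None
except UnicodeDecodeError as exc:
    error = exc

# The exception records the codec's view of the failure.
print(error)
print(hex(error.object[error.start]), "at position", error.start)
```

Bytes at or above 0x80 are never valid in UTF-7 input, so a UTF-8 name with non-ASCII characters fails on its first such byte.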

@gaydenko
Author

gaydenko commented Feb 1, 2016

I have also tried to set it up manually as described here:
http://piao-tech.blogspot.no/2010/03/get-offlineimap-working-with-non-ascii.html

And I got exactly the same result (the first run is OK, then the error).
Unfortunately I'm not a Python programmer and cannot pin down the issue in the utf7 codec.

@gaydenko
Author

gaydenko commented Feb 1, 2016

@nicolas33 , I have discovered interesting information while playing with options:

  1. Start the first run - all is OK, folders are created, messages are fetched.
  2. Start the next run - the error shown above.
  3. Add the createfolders = False option to the remote repository config section.
  4. Start the next run - syncing does work!!, changes are fetched.

So the issue probably lies somewhere in nametrans rather than in the utf7 codec itself, as I guessed before. That is, some code ignores the name translation.

I have also tried git master to be sure it behaves the same, and it does :)

@gaydenko
Author

gaydenko commented Feb 1, 2016

Adding the readonly = True option also results in a successful (one-way) synchronization. So it seems the code propagating local changes to the remote repository doesn't encode the names back, that is, from utf8 to utf7.

@nicolas33
Member

What's the output of offlineimap --info?

@gaydenko
Author

gaydenko commented Feb 1, 2016

⋊> ~ offlineimap --info
OfflineIMAP 6.6.1
  Licensed under the GNU GPL v2 or any later version (with an OpenSSL exception)
Remote repository 'me-yandex-remote': type 'IMAP'
Host: imap.yandex.ru Port: None SSL: True
Establishing connection to imap.yandex.ru:993
Server supports ID extension.
Server welcome string: * OK Yandex IMAP4rev1 at imap29j.mail.yandex.net:993 ready to talk with ::ffff:some-IP:54140, 2016-Feb-01 19:10:29, TAL8OH3Px8cb
Server capabilities: ('IMAP4REV1', 'CHILDREN', 'UNSELECT', 'LITERAL+', 'NAMESPACE', 'XLIST', 'BINARY', 'UIDPLUS', 'ENABLE', 'ID', 'IDLE', 'MOVE')

Folderlist:
 Archive
 INBOX
 Trash
 &BBgEQQRFBD4ENARPBEkEOAQ1- -> Исходящие
 &BB4EQgQ,BEAEMAQyBDsENQQ9BD0ESwQ1- -> Отправленные
 &BCEEPwQwBDw- -> Спам
 &BCMENAQwBDsENQQ9BD0ESwQ1- -> Удаленные
 &BCcENQRABD0EPgQyBDgEOgQ4- -> Черновики

Local repository 'me-yandex-local': type 'Maildir'
Folderlist:
 Trash
 Отправленные
 Удаленные
 Archive
 INBOX
 Исходящие
 Черновики
 Спам

⋊> ~   

Not sure it is important, but Trash and Удаленные have the same meaning (and both folders are also visible via the web interface).
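
The left-hand names in the folder list above are in IMAP's modified UTF-7 (RFC 3501, section 5.1.3), which differs from plain UTF-7 in using `&` as the shift character and `,` instead of `/` in the base64 alphabet. The following is a minimal Python 3 sketch of the decoding direction; the function name `decode_imap_utf7` is made up for illustration and is not OfflineIMAP's API.

```python
import base64


def decode_imap_utf7(name: str) -> str:
    """Decode an IMAP modified UTF-7 mailbox name (illustrative helper)."""
    out = []
    i = 0
    while i < len(name):
        ch = name[i]
        if ch != "&":
            out.append(ch)           # plain printable ASCII passes through
            i += 1
            continue
        end = name.index("-", i)     # raises ValueError on an unterminated shift
        chunk = name[i + 1:end]
        if chunk == "":
            out.append("&")          # "&-" is the escape for a literal ampersand
        else:
            b64 = chunk.replace(",", "/")    # modified alphabet uses ',' for '/'
            b64 += "=" * (-len(b64) % 4)     # restore the stripped padding
            out.append(base64.b64decode(b64).decode("utf-16-be"))
        i = end + 1
    return "".join(out)


print(decode_imap_utf7("&BCEEPwQwBDw-"))
print(decode_imap_utf7("&BB4EQgQ,BEAEMAQyBDsENQQ9BD0ESwQ1-"))
```

The two sample inputs are taken from the folder list above and decode to the names shown on the right of the arrows.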

@nicolas33
Member

My best guess is that the replace function in folder.getvisiblename().replace(...) assumes ASCII characters only. This relies on Python, and Python 2 makes such assumptions. That's one of the reasons why working with UTF-8 in Python 2 is hard.

As you already know, UTF-8 support never got a successful implementation despite serious attempts. I'm not willing to go into this further because I don't want to replay the match again. IOW, I'm not surprised the same limitations are surfacing in different ways depending on the case.

I've merged the current implementation because it's a small change in the code. However, it seriously affects the stability of the program. That's why it will likely never be marked stable. The current approach is: if it works for you, you're lucky and you might choose to go for it. If it doesn't, sorry, but this feature is known not to work in many cases and it's not available to you. Enabling this feature might turn out to be a poor choice with future releases since it's not supported at all.

Anyway, this is still a valid bug report. If one day someone is willing to dig into this, the bug tracker is where to start. Keeping this open.

@gaydenko
Author

gaydenko commented Feb 2, 2016

@nicolas33 , in the context of this issue, should I try https://github.com/OfflineIMAP/imapfw ?

@nicolas33
Member

I aim for imapfw to replace OfflineIMAP in the long term. You might like to try it now so you can share your expectations in detail, but it won't sync anything. It's still a work in progress. I want early support for UTF-8, yes.

I did not talk about imapfw because it's not ready yet. Things progress at the rate of free time and contributions. ,-)

@gaydenko
Author

gaydenko commented Feb 2, 2016

> I want early support for UTF-8, yes.

So imapfw is the world's first candidate to implement the feature. I have added your blog to my RSS reader :)

@uliska
Contributor

uliska commented Sep 21, 2017

I am also suffering from this and see it as an unacceptable limitation (I am currently trying to figure out whether I can build my email backup/handling infrastructure on offlineimap). I do not want folders like Antwort.Sch\xc3\xb6nberg in the system but properly UTF-8 encoded names. However, I may look into the matter if I get some guidance and feedback.

I've collected somewhat more info. When repeatedly syncing a mailbox it starts well, establishing the connection, then

2017-09-21 12:06:53 DEBUG: [maildir]: _GETFOLDERS_SCANDIR STARTING. root = /path/to/local-mailbox, extension = None
2017-09-21 12:06:53 DEBUG: [maildir]:   toppath = /path/to/local-mailbox

which is followed by a sequence of pairs like

2017-09-21 12:06:53 DEBUG: [maildir]:   dirname = Sent.Queue
2017-09-21 12:06:53 DEBUG: [maildir]:   This is maildir folder 'Sent.Queue'.

going through all the toplevel folders of the mailbox. This may also include special characters:

2017-09-21 12:06:53 DEBUG: [maildir]:   dirname = Entwürfe
2017-09-21 12:06:53 DEBUG: [maildir]:   This is maildir folder 'Entwürfe'.

Then follows the suspicious action

2017-09-21 12:06:53 DEBUG: [maildir]:   dirname = 
2017-09-21 12:06:53 DEBUG: [maildir]: 
    _GETFOLDERS_SCANDIR RETURNING ['Sent.Queue', <...>  'Entw\xc3\xbcrfe' <...>]

(with <...> representing other folders). I'm not sure if the empty dirname is already a symptom or if it only represents the end of the list. But obviously the list that is reported here has not been properly encoded or decoded.

The next thing that can be seen is the error

2017-09-21 12:06:53 ERROR: ERROR: While attempting to sync account 'mailUrsliskaDe'
  'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)

followed by the actual Python backtrace

2017-09-21 12:06:53 ERROR: ['  File "/usr/share/offlineimap/offlineimap/accounts.py", line 278, 
in syncrunner\n    self.__sync()\n', '  File "/usr/share/offlineimap/offlineimap/accounts.py", line 344,
in __sync\n    remoterepos.sync_folder_structure(localrepos, statusrepos)\n', '  
File "/usr/share/offlineimap/offlineimap/repository/Base.py", line 245, 
in sync_folder_structure\n    tmp_remotefolder = remote_repo.getfolder(remote_name)\n', '  
File "/usr/share/offlineimap/offlineimap/repository/IMAP.py", line 434,
in getfolder\n    return self.getfoldertype()(self.imapserver, foldername, self)\n', ' 
File "/usr/share/offlineimap/offlineimap/folder/IMAP.py", 
line 49, in __init__\n    super(IMAPFolder, self).__init__(name, repository)\n', '  
File "/usr/share/offlineimap/offlineimap/folder/Base.py", line 52, 
in __init__\n    self.visiblename = repository.nametrans(name)\n', '  
File "<string>", line 1, 
in <lambda>\n']
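
The position in the error message matches the folder name from the debug output: `Entw` is four ASCII bytes, so the first byte of the UTF-8 encoded `ü` sits at index 4. The sketch below is Python 3 and decodes as ASCII explicitly; in Python 2 the same decode happens implicitly whenever a byte string is mixed with a unicode string. Which expression inside the nametrans lambda triggers it is not visible in the backtrace, so that part remains an assumption.

```python
# UTF-8 bytes of 'Entwürfe', as they appear in the _GETFOLDERS_SCANDIR list.
folder = b"Entw\xc3\xbcrfe"

try:
    folder.decode("ascii")  # explicit here; implicit in Python 2 str/unicode mixing
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(error)
print(folder.decode("utf-8"))  # decoding with the right codec works
```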

@nicolas33
Member

We failed at teaching offlineimap to work with UTF-8, sorry.

@uliska
Contributor

uliska commented Sep 21, 2017

I found a workaround with the external utf file mentioned in the original post. I will give more details when I have the chance. It looks like it works (for me) now :-)

@uliska
Contributor

uliska commented Sep 21, 2017

@gaydenko probably you're not into this topic anymore by now, but I'll post my findings anyway as they may become useful for others stumbling over this.
Actually the problem is described here: http://www.offlineimap.org/doc/nametrans.html

  • When syncing for the first time, the local repository is empty. The folders from the IMAP server are then converted from IMAP utf-7 to utf-8.
  • When syncing for the second time, there are utf-7 folders in the remote repository and utf-8 folders locally. So we have to convert the utf-8 folders to utf-7 on the way from local to remote.

What I did was save the code from http://piao-tech.blogspot.de/2010/03/get-offlineimap-working-with-non-ascii.html as a file which is included via the pythonfile directive.

Then I entered (as directed by that post)

nametrans = lambda foldername: foldername.decode('imap4-utf-7').encode('utf-8')

in the remote part of the configuration.

The trick is to add the reverse operation in the local part of the config

nametrans = lambda foldername: foldername.decode('utf-8').encode('imap4-utf-7')

Basically this can work because "imap4-utf-7" is created as a "real" encoding, so decode and encode can work in both directions.

If I can find where the options are actually used in the now built-in functionality I may try to find a way to integrate it into a pull request.
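
To illustrate the reverse direction that the local `nametrans` has to perform, here is a minimal Python 3 sketch of encoding a name into IMAP modified UTF-7. It does not use the `imap4-utf-7` codec from the blog post (that one has to be registered by the included file); `encode_imap_utf7` is a hypothetical helper written only for this example.

```python
import base64


def encode_imap_utf7(name: str) -> str:
    """Encode a mailbox name as IMAP modified UTF-7 (illustrative helper)."""
    out = []
    pending = []  # run of characters that must be base64 encoded

    def flush():
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(raw).decode("ascii")
            # Modified UTF-7 drops the padding and uses ',' instead of '/'.
            out.append("&" + b64.rstrip("=").replace("/", ",") + "-")
            pending.clear()

    for ch in name:
        if 0x20 <= ord(ch) <= 0x7E:      # printable ASCII is sent as-is
            flush()
            out.append("&-" if ch == "&" else ch)
        else:
            pending.append(ch)
    flush()
    return "".join(out)


print(encode_imap_utf7("Спам"))
print(encode_imap_utf7("Entwürfe"))
```

Feeding the local UTF-8 names through such a function yields the names the server already knows, which is why adding the reverse `nametrans` stops the second run from trying to handle them as new folders.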

@nicolas33
Member

Would be so great to have a patch for this. Though, there must be no regression. This must be declared experimental and enabled via a configuration option.

Good work, BTW.

@uliska
Contributor

uliska commented Sep 22, 2017

I've started with something but need further guidance on the integration.

If you look at the branch on my fork you will see that I integrated the code from this blog post and provided two interface functions to convert foldernames in both directions.

This seems to work properly, and I think it is acceptable to add this file, as it is only imported when requested by the user.

The third commit adds preliminary integration: adding IMAP-utf8 = yes to the local and utf8-IMAP = yes to the remote repository configuration will activate re-encoding in both directions.

However, this integration is far from ideal, as discussed in the commit message. But I didn't see (yet) how to implement the integration as it should be.

@uliska
Contributor

uliska commented Sep 22, 2017

OH STOP! Don't look at it yet, I have to repair an embarrassing mistake I just noticed ... (I did everything in the wrong direction, though only regarding naming.)

@uliska
Contributor

uliska commented Sep 22, 2017

OK, I fixed it: The previous comment is still true, but:

@uliska
Contributor

uliska commented Sep 22, 2017

Please add new issues or extend the existing ones at https://github.com/uliska/offlineimap/issues

@uliska
Contributor

uliska commented Sep 22, 2017

I have the impression I'm getting closer :-)
The utf8 branch now provides the following characteristics:

  • Add a config option utf8foldernames on Account level
  • If that is set IMAP folder names are converted to utf-8, and local folder names are converted back to IMAP utf-7. So the user should only specify it in an IMAP->Maildir account.
  • It should not interfere anymore with a nametrans config. Instead the functions will first handle the encoding and then pass the result to a user-provided nametrans function. NOTE: I haven't tested this as I don't use any nametrans functions.

Actually this makes the decodefoldernames repository config obsolete, and it should be discussed whether to remove it. There is some value in keeping it for backward compatibility, but as you have said, it is already labeled experimental, so it might be justified to remove the option.

(what is of course still missing before opening a pull request is a documenting entry in the example .conf file).

@nicolas33
Member

nicolas33 commented Sep 23, 2017

> I have the impression I'm getting closer :-)
> The utf8 branch now provides the following characteristics:
>
>   • Add a config option utf8foldernames on Account level

Ok.

>   • If that is set IMAP folder names are converted to utf-8, and local folder names are converted back to IMAP utf-7. So the user should only specify it in an IMAP->Maildir account.

I wonder what would happen for users (wrongly) enabling utf8foldernames and having IMAP/IMAP setups.

>   • It should not interfere anymore with a nametrans config. Instead the functions will first handle the encoding and then pass the result to a user-provided nametrans function. NOTE: I haven't tested this as I don't use any nametrans functions.

We could call for some other contributors to test this once applied to the 'next' branch.

> Actually this makes the decodefoldernames repository config obsolete, and it should be discussed whether to remove it. There is some value in keeping it for backward compatibility, but as you have said, it is already labeled experimental, so it might be justified to remove the option.

Yes, I'm fine with removing this option. As long as this option can still work when utf8foldernames is disabled, I'd rather keep it for some time.

IOW, I think we should stop the sync of the account when both are enabled for this account.

> (what is of course still missing before opening a pull request is a documenting entry in the example .conf file).

Good! Don't forget to add a note that this feature is still not tested with nametrans. Mark the option experimental ("This option is EXPERIMENTAL") and it should all be fine. ,-)
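
The "stop the sync when both are enabled" rule could be checked up front with something like the sketch below. It is not the code from the utf8 branch: the helper `conflicting_accounts` is hypothetical, the sample configuration is made up, and it only assumes the usual `[Account ...]` / `[Repository ...]` section layout together with the two option names discussed in this thread.

```python
import configparser

# Made-up sample configuration for illustration only.
SAMPLE = """
[Account me]
localrepository = me-local
remoterepository = me-remote
utf8foldernames = yes

[Repository me-local]
type = Maildir

[Repository me-remote]
type = IMAP
decodefoldernames = yes
"""


def conflicting_accounts(cfg):
    """Return account names that enable both options at once (hypothetical check)."""
    bad = []
    for section in cfg.sections():
        if not section.startswith("Account "):
            continue
        account = cfg[section]
        if not account.getboolean("utf8foldernames", fallback=False):
            continue
        remote = "Repository " + account.get("remoterepository", fallback="")
        if cfg.has_section(remote) and cfg[remote].getboolean(
            "decodefoldernames", fallback=False
        ):
            bad.append(section[len("Account "):])
    return bad


cfg = configparser.ConfigParser()
cfg.read_string(SAMPLE)
conflicts = conflicting_accounts(cfg)
print(conflicts)
```

An account listed in `conflicts` would then be skipped with an error instead of being synced.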

@uliska
Contributor

uliska commented Sep 25, 2017

> Actually this makes the decodefoldernames repository config obsolete, and it should be discussed whether to remove it. There is some value in keeping it for backward compatibility, but as you have said, it is already labeled experimental, so it might be justified to remove the option.

> Yes, I'm fine with removing this option. As long as this option can still work when utf8foldernames is disabled, I'd rather keep it for some time.

I've now made decodefoldernames use the new code (uliska@642071c). Basically it has become an alias, and it doesn't interfere with the utf8foldernames option. I'll add a deprecation comment for it in the example conf file.

> If that is set IMAP folder names are converted to utf-8, and local folder names are converted back to IMAP utf-7. So the user should only specify it in an IMAP->Maildir account.

> I wonder what would happen for users (wrongly) enabling utf8foldernames and having IMAP/IMAP setups.

It will probably break things and either produce scrambled folder names on the "local" IMAP repository or (probably) fail with an exception because it can't write to that. This is ugly, but actually you could already produce this behaviour with the existing decodefoldernames option, I think.

I have thought about it and I think the problem should be handled slightly differently than it currently is. Encoding from utf-8 to IMAP utf-7 should not be handled in the Maildir folder class but rather in the IMAP folder class. Basically we should always treat folder names as utf-8 and only do the conversions at the border between an IMAP folder and the internal representation.

I will look into this, and somehow I have the feeling that if we find an elegant way to do this, supporting utf-8 conversion for IMAP folders could eventually become a default feature.

@nicolas33
Member

nicolas33 commented Sep 25, 2017

> I've now made decodefoldername use the new code (uliska/offlineimap@642071c), basically it has become an alias and it doesn't interfere with the utf8foldernames option. I'll add a deprecation comment in the example conf file for it.

Why introduce utf8foldernames, then?

I'm not sure superseding decodefoldernames is great because both don't imply the same nametrans, AFAICT.

Either we supersede the current code with great release notes or we introduce yet another option. Doing both is confusing.

> It will probably break things and either produce scrambled folder names on the "local" IMAP repository or (probably) fail with an exception because it can't write to that. This is ugly, but actually you could already produce this behaviour with the existing decodefoldernames option, I think.

I think you're right. I was thinking that preventing this incorrect behaviour would not be hard. ,-)

> I have thought about it and I think the problem should be handled slightly differently than it currently is. Encoding from utf-8 to IMAP utf-7 should not be handled in the Maildir folder class but rather in the IMAP folder class. Basically we should always treat folder names as utf-8 and only do the conversions at the border between an IMAP folder and the internal representation.

I fully agree.

> I will look into this, and somehow I have the feeling that if we find an elegant way to do this, supporting utf-8 conversion for IMAP folders could eventually become a default feature.

Sure!

@nicolas33
Member

@uliska I think I won't reply in your fork anymore. We've passed the early stages for this feature and I'd rather all the contributors can follow the contributions here (or they won't).

Please make PRs for the next reviews. We don't care much how many PRs this will require, so feel free. ,-)

@uliska
Contributor

uliska commented Sep 25, 2017

> @uliska I think I won't reply in your fork anymore. We've passed the early stages for this feature and I'd rather all the contributors can follow the contributions here (or they won't).

OK. Having issues on my fork was intended as a private todo list anyway.

> Please make PRs for the next reviews. We don't care much how many PRs this will require, so feel free. ,-)

OK. I'll mark them as WIP to indicate when I don't consider them ready yet.

> Why introduce utf8foldernames, then?

As the new option is located at the account level it's a "new" one anyway.

> I'm not sure superseding decodefoldernames is great because both don't imply the same nametrans, AFAICT.

Going from IMAP to Maildir they imply the same thing. The other direction hadn't been possible yet.

> Either we supersede the current code with great release notes or we introduce yet another option. Doing both is confusing.

Hm, I'm not in the position to argue but I think that's not completely true (because the new option is at a different level). What I think will be necessary is

  • Add/document a new option utf8foldernames that can be set at account level
  • Mark the decodefoldernames IMAP repository option as deprecated.

I don't think this is terribly confusing. At least not more confusing than any added functionality can be.

@nicolas33
Member

> Hm, I'm not in the position to argue but I think that's not completely true (because the new option is at a different level). What I think will be necessary is

Ok.

>   • Add/document a new option utf8foldernames that can be set at account level
>   • Mark the decodefoldernames IMAP repository option as deprecated.
>
> I don't think this is terribly confusing. At least not more confusing than any added functionality can be.

Ok. What about only renaming utf8foldernames to decodefoldernames? The new code would appear to users as just a move to the upper section. I'm nitpicking, here.

@uliska
Contributor

uliska commented Sep 25, 2017

I have no problem with renaming it "back". I have the impression this is more confusing, but since we are talking about functionality that had already been labeled experimental, this will not be an issue, I think.

But what I would say is: if we provide a decodefoldernames option on account level, we should definitely remove the old one on repository level. Having two different options with the same name seems pretty strange to me. So adding a newly named one and deprecating the old one might be a smoother transition. The cleaner cut would probably be to "lift" the "same" function upwards.

@nicolas33
Member

Actually, I've had the feeling that your feature was not exactly code refactoring of the decodefoldernames when you first talked about it (outside the upper level move). Next, you almost convinced me this was the case in which case superseding decodefoldernames is correct. Now, I realize that the current decodefoldernames setups would be damaged by the new utf8foldernames feature (see the latest comment in your fork talking about the side effects on nametrans).

Most of the users don't read the changelogs. Hence,

  1. If superseding the current decodefoldernames has impacts on the current setups we should introduce a new configuration option.
  2. Otherwise, the best is to just move the option up to the account section.

I'm fine with whatever changes are made to decodefoldernames because it's marked experimental. OTOH, we should really make the changes smoothly and avoid breakages if possible. This would be best, though not mandatory. That's what the experimental flag is about, after all. Still, I know the smooth transition is not hard to support, so I'm in favour of avoiding breakages in the current setups caused by the required changes in nametrans. While at it, we should not change the current decodefoldernames for the same reason (damaging the current setups).

Hope this helps.

uliska added a commit to uliska/offlineimap that referenced this issue Sep 28, 2017
Add code to reencode IMAP folder names to regular utf-8.
This starts an implementation that will add a new config option
`utf8foldernames` on account level which will fix OfflineIMAP#299 and on the
long run replace the current `decodefoldernames` option.

This commit introduces code to register an `imap4_utf_7` codec
on which two-way conversion methods will later be built.

Signed-off-by: Urs Liska <git@ursliska.de>
uliska added a commit to uliska/offlineimap that referenced this issue Oct 1, 2017
Add code to reencode IMAP folder names to regular utf-8.
This starts an implementation that will add a new config option
`utf8foldernames` on account level which will fix OfflineIMAP#299 and on the
long run replace the current `decodefoldernames` option.

This commit introduces code to register an `imap4_utf_7` codec
on which two-way conversion methods will later be built.

Original code by
(https://www.blogger.com/profile/16648963337079496096),
taken from
http://piao-tech.blogspot.no/2010/03/get-offlineimap-working-with-non-ascii.html

In the comment
http://piao-tech.blogspot.com/2010/03/get-offlineimap-working-with-non-ascii.html?showComment=1316041409339#c669880170006851138
indicates that this code is expected to be incorporated into offlineIMAP and therefore the author implicitly agrees to put it under this license.

Signed-off-by: Urs Liska <git@ursliska.de>
(cherry picked from commit 8691dd5)
nicolas33 pushed a commit that referenced this issue Oct 2, 2017
Add code to reencode IMAP folder names to regular utf-8.
This starts an implementation that will add a new config option
`utf8foldernames` on account level which will fix #299 and on the
long run replace the current `decodefoldernames` option.

This commit introduces code to register an `imap4_utf_7` codec
on which two-way conversion methods will later be built.

Original code by
(https://www.blogger.com/profile/16648963337079496096),
taken from
http://piao-tech.blogspot.no/2010/03/get-offlineimap-working-with-non-ascii.html

In the comment
http://piao-tech.blogspot.com/2010/03/get-offlineimap-working-with-non-ascii.html?showComment=1316041409339#c669880170006851138
indicates that this code is expected to be incorporated into offlineIMAP and therefore the author implicitly agrees to put it under this license.

Signed-off-by: Urs Liska <git@ursliska.de>
Signed-off-by: Nicolas Sebrecht <nicolas.s-dev@laposte.net>
michaelcoyote pushed a commit to michaelcoyote/offlineimap that referenced this issue Jan 15, 2018
3 participants