Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add 'utf8foldernames' account option #498

Closed
wants to merge 8 commits into from

Conversation

uliska
Copy link
Contributor

@uliska uliska commented Sep 28, 2017

This pull request adds an option utf8foldernames that can be applied on account level. If enabled folder names are immediately converted from IMAP4_utf-7 to utf-8 when fetched from an IMAP server and reencoded to utf-7 immediately before contacting the IMAP server again. This means that all operations performed during the synchronization and all values passed to config options can be submitted as utf-8 strings.

More details can be found in the commit messages and in the documentation added in 83490c6.

Enabling utf-8 support will fundamentally break an existing synchronization pair, so after some discussion and consideration I realized that there is no need trying to avoid breaks of individual functions. Users who want to enable utf-8 will have to create a new copy of the local repository and review any configuration options they have specified. This is clearly documented.

More details on the state of development and testing at the end.

Peer reviews

Trick to fetch the pull request:
there is a (read-only) refs/pull/ namespace.

git fetch OFFICIAL_REPOSITORY_NAME pull/PULL_ID/head:LOCAL_BRANCH_NAME

This PR

  • I've read the DCO.
  • I've read the Coding Guidelines
  • The relevant informations about the changes stands in the commit message, not here in the message of the pull request.
  • Code changes follow the style of the files they change.
  • Code is tested (provide details). See below

References

Additional information

I have tested the functionality against web.de, a public freemail service, and against accounts on a Dovecot installation on my own remote server.

I did not try to test any possible combination of server settings and did not perform any stress testing against configuration options. I just tested that it "works for me", i.e. on the accounts I already have.

I have tested the following configuration options with utf8foldernames:

  • decodefoldernames
    The old option does not work together with the new, so having both is considered a misconfiguration.
    In this case the affected account will not be synchonized (with an error message), other accounts are not affected.
  • folderfilter
  • folderincludes
  • foldersort

I have also tested synchronization between two IMAP accounts. While in this case it may seem unnecessary to convert to utf-8 and then back to utf-7 it may make sense because it allows to specify filters and functions with utf-8 values.

I have not tested the following configurations:

  • nametrans
    as I didn't figure out a reasonable way to use nametrans in the first place.
    However, apart from having passed the utf-8 encoded folder name to nametrans there should be no differences.
  • idlefolders
    I didn't manage to get this option to work at all, not even with the stable version without the new option. So I couldn't test any issues with this.
  • GMail
    I didn't manage to set up a GMail synchronization so I didn't test the new option with that.
    However, as GMail inherits from the IMAP classes the changes do have effects on that. I assume that GMail suffered the same limitations as plain IMAP, so I assume that it will experience the same benefits now, but I strongly suggest that this be tested separately.


# Functionality to convert folder names encoded in IMAP_utf_7 to utf_8.
# This is achieved by defining 'imap4_utf_7' as a proper encoding scheme.
# Original code by (https://www.blogger.com/profile/16648963337079496096), from
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somehow this comment got lost during reshaping of the branch:
http://piao-tech.blogspot.com/2010/03/get-offlineimap-working-with-non-ascii.html?showComment=1316041409339#c669880170006851138
indicates to me that the original author of this code expects it to be incorporated into offlineIMAP and therefore implicitly agrees to put it under its license.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add this credit to the commit message instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would require rebasing and thus changing the existing pushed commits. What I can do is adding another commit which removes the credit from the file and adds URL, reference to the author and mention of the license question in the commit message.
Should I do that @nicolas33? That's not perfectly clean either.

Copy link
Member

@nicolas33 nicolas33 Oct 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Find the ancestor of your first commit of the branch and rebase on top of this. It's likely 1ce596d.

You might like to tag this ancestor to have a pointer:

git tag utf8-ancestor 1ce596d7135e

This way it's possible to rebase as many times as needed to improve the patches (or squash already existing patches) with

git rebase -i utf8-ancestor

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The correct ancestor of this branch is v7.1.2. So, all the rebases can be done against tag v7.1.2.

Copy link
Contributor Author

@uliska uliska Oct 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm.
My problem is not how to rebase. If I'm going to rebase I can as well do that on top of next. The problem is/was the question whether rebasing will destroy the review history - and if so if this is a problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An compromise would be to finish the review until you are happy with it, and then I can rebase the changes in a new branch and push that as an alternative PR. So we'll have a clean history and you still have the original branch as a reference if anything should be unclear.

Copy link
Member

@nicolas33 nicolas33 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice PR! Some comments to fix but the most important changes looks OK to me.

@@ -273,6 +273,17 @@ def syncrunner(self):
if e.severity >= OfflineImapError.ERROR.CRITICAL:
raise
return

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange. git diff --check didn't point this out.

@@ -273,6 +273,17 @@ def syncrunner(self):
if e.severity >= OfflineImapError.ERROR.CRITICAL:
raise
return

if self.utf_8_support and self.remoterepos.getdecodefoldernames():
import sys, logging
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imports to remove.

"\nAccount setting 'utf8foldernames' and repository " +
"setting 'decodefoldernames'\nmay not be used at the " +
"same time. This account has not been synchronized.\n" +
"Please check the configuration and documentation.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use self.ui.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(instead of logging.error)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I'm not sure self.ui is the correct way to go because this likely won't set the correct exit code.
I think it's better to raise OfflineImapError, level REPO; See offlineimap/error.py.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it's correctly done, but b70f053 tries to fix the issue.

offlineimap.conf Outdated
# - no support is provided.
#
# This feature is EXPERIMENTAL and was merged because it's small changes in the
# code. However, this might seriously decrease the stability of the program.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, remove "and was merged because...". This was true for decodefoldernames that I never intented to mark stable.

However, this news feature is the correct way to implement encodings and if this proves to be stable I do intend to mark it as TESTING and finally STABLE.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same goes for the next sentence.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(below)

offlineimap.conf Outdated
# it's not supported at all and new releases might break the setup.
#
# IMPORTANT: READ THIS SECTION if you intend to enable this feature for an
# EXISTING SYNCHRONIZATION pair!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"pair" is not a term we use in offlineimap. Do you mean "account"?

# In any case the given name is first dequoted.
name = imaputil.IMAP_utf8(imaputil.dequote(name)) \
if decode and repository.account.utf_8_support \
else imaputil.dequote(name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather avoir multi-lines statements.

if ... :
    ....
else:
    ....

except imapobj.readonly:
imapobj.select(self.getfullname(), readonly = True, force = force)
imapobj.select(self.getfullIMAPname(), readonly = True, force = force)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd remove the unexpected whitespaces while at it.


# Functionality to convert folder names encoded in IMAP_utf_7 to utf_8.
# This is achieved by defining 'imap4_utf_7' as a proper encoding scheme.
# Original code by (https://www.blogger.com/profile/16648963337079496096), from
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add this credit to the commit message instead.

@nicolas33
Copy link
Member

And please, rebase your work on top of the current next branch. There should be no conflicts.

@nicolas33
Copy link
Member

I've tested this branch today. It appears that offlineimap is syncing both UTF-7 & UTF-8:

Folder INBOX/sp&AOk-cial [acc: z]:
 Syncing INBOX/sp&AOk-cial: IMAP -> Maildir
Folder INBOX/spécial [acc: z]:
 Syncing INBOX/spécial: IMAP -> Maildir
[Account z]
localrepository = Localz
remoterepository = Remotez
utf8foldernames = yes
autorefresh = 0.1
#postsynchook = ./test.sh
#maxage = 2015-01-01
maxconnections = 1

[Repository Localz]
type = Maildir
localfolders = ~/Mail/.nicolas.XXX
filename_use_mail_timestamp = yes
sep = .
#folderfilter = lambda f: f in ['INBOX.sp&AOk-cial']

[Repository Remotez]
type = IMAP
#ssl = %(default_ssl)s
ssl = yes
cert_fingerprint = XXX
remotehost = imap.laposte.net
remoteuser = nicolas.XXX
foldersort = %(INBOX_FIRST)s
remotepasseval = passwd()
#newmail_hook = lambda: os.system("notify-send plop")
#copy_ignore_eval = lambda x: {'INBOX/sp&AOk-cial': range(2422, 5500)}.get(x)
retrycount = 3

@nicolas33
Copy link
Member

nicolas33 commented Sep 30, 2017

My bad. It appeared that the server had a folder called 'INBOX/sp&AOk-cial' (remainder from some old tests). Once removed it works great.

@uliska
Copy link
Contributor Author

uliska commented Sep 30, 2017

LOL.
I had a similar issue when testing earlier and thought new local folders weren't synchronized to the remote server. I could see them in the server's file system but not in the mail client, so I thought the folder with special characters was somehow written in a wrong manner.

Eventually I realized that Thunderbird doesn't automatically update the folder structure for an IMAP account.

uliska added a commit to uliska/offlineimap that referenced this pull request Oct 1, 2017
Addresses OfflineIMAP#498 (comment)

Signed-off-by: Urs Liska <git@ursliska.de>
uliska added a commit to uliska/offlineimap that referenced this pull request Oct 1, 2017
Addresses OfflineIMAP#498 (comment)

Signed-off-by: Urs Liska <git@ursliska.de>
uliska added a commit to uliska/offlineimap that referenced this pull request Oct 1, 2017
Addresses OfflineIMAP#498 (comment)

Signed-off-by: Urs Liska <git@ursliska.de>
uliska added a commit to uliska/offlineimap that referenced this pull request Oct 1, 2017
Addresses OfflineIMAP#498 (comment)

Signed-off-by: Urs Liska <git@ursliska.de>
uliska added a commit to uliska/offlineimap that referenced this pull request Oct 1, 2017
Better error message handling when utf8foldernames *and*
decodefoldernames has been configured both in one account.

Addresses OfflineIMAP#498 (comment), but I'm not sure if it is really correct.

Signed-off-by: Urs Liska <git@ursliska.de>
@uliska
Copy link
Contributor Author

uliska commented Oct 1, 2017

OK, one request is still open (have asked for clarification), and another one may not correctly been fixed, but apart from that I think I'm through with your requests, @nicolas33.

Copy link
Member

@nicolas33 nicolas33 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • 6a0b5587ea7 should be improved and squashed into bf59b2601a9 with 4a462cc92.
  • d402e7ca2 and 9f81ae8a4 should be squashed into 7c32ffc05a
  • 18bdaa6ec2 should be squashed into c1c967ec8e

@@ -69,6 +69,8 @@ def __init__(self, config, name):
self.name = name
self.metadatadir = config.getmetadatadir()
self.localeval = config.getlocaleval()
# Store utf-8 support as a property of Account object
self.utf_8_support = self.getconfboolean('utf8foldernames', False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good.

"'%s'. "% self.getname())
# Abort *this* account without doing any changes.
# Other accounts are not affected.
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raise OfflineImapError(...) instead of ui.error() + return.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. If I do that then the program will terminate immediately like this (with two accounts, one of them being "misconfigured"):

$ offlineimap
OfflineIMAP 7.1.2
  Licensed under the GNU GPL v2 or any later version (with an OpenSSL exception)
imaplib2 v2.57 (bundled), Python v2.7.13, OpenSSL 1.1.0f  25 May 2017
Account sync git@ursliska.de:
 *** Processing account git@ursliska.de
Thread 'Account sync mail@ursliska.de' terminated with exception:
Traceback (most recent call last):
  File "/home/urs/git/software/offlineimap/offlineimap/threadutil.py", line 160, in run
    Thread.run(self)
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/urs/git/software/offlineimap/offlineimap/accounts.py", line 285, in syncrunner
    OfflineImapError.ERROR.REPO)
OfflineImapError: Configuration mismatch in account 'mail@ursliska.de'. 
Account setting 'utf8foldernames' and repository setting 'decodefoldernames'
may not be used at the same time. This account has not been synchronized.
Please check the configuration and documentation.


Last 3 debug messages logged for Account sync mail@ursliska.de prior to exception:
thread: Register new thread 'Account sync mail@ursliska.de' (account 'mail@ursliska.de')
imap: Using authentication mechanisms ['GSSAPI', 'XOAUTH2', 'CRAM-MD5', 'PLAIN', 'LOGIN']
maildir: MaildirRepository initialized, sep is '.'

Any later accounts won't be started at all.
The code in the commit will properly sync the other account:

$ offlineimap
OfflineIMAP 7.1.2
  Licensed under the GNU GPL v2 or any later version (with an OpenSSL exception)
imaplib2 v2.57 (bundled), Python v2.7.13, OpenSSL 1.1.0f  25 May 2017
Account sync mail@ursliska.de:
 ERROR: Configuration mismatch in account 'mail@ursliska.de'. 
  Configuration mismatch in account 'mail@ursliska.de'. 
Account setting 'utf8foldernames' and repository setting 'decodefoldernames'
may not be used at the same time. This account has not been synchronized.
Please check the configuration and documentation.
Account sync git@ursliska.de:
 *** Processing account git@ursliska.de
 Establishing connection to XXX:993 (git@ursliska.de.Remote)
Folder Archives [acc: git@ursliska.de]:

...

Folder Trash [acc: git@ursliska.de]:
 Syncing Trash: IMAP -> Maildir
Account sync git@ursliska.de:
 *** Finished account 'git@ursliska.de' in 0:08
ERROR: Exceptions occurred during the run!
ERROR: Configuration mismatch in account 'mail@ursliska.de'. 
  Configuration mismatch in account 'mail@ursliska.de'. 
Account setting 'utf8foldernames' and repository setting 'decodefoldernames'
may not be used at the same time. This account has not been synchronized.
Please check the configuration and documentation.

I'm still not sure if what I did is the best way, but directly raising that error clearly gives undesirable results.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, we'll fix this later.

imapobj.select(self.getfullIMAPname(), readonly=True, force=force)


def getfullIMAPname(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too many blank lines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, see https://github.com/uliska/offlineimap/blame/b70f053a2fb8813fdd79475e11dd7d5bb9bf33f4/offlineimap/folder/IMAP.py#L78. And there's a number of further instances of this. (But of course I'll fix the instances in this file)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right there is a lot of unexpected in the sources.

@nicolas33
Copy link
Member

Here are my latest comments. We're getting pretty close! ,-)

@nicolas33
Copy link
Member

Adding a mention @uliska for above comments.

uliska added a commit to uliska/offlineimap that referenced this pull request Oct 1, 2017
Addresses OfflineIMAP#498 (comment)

Signed-off-by: Urs Liska <git@ursliska.de>
@nicolas33
Copy link
Member

@uliska
This PR looks OK to me. Feel free to rebase. ,-)

An compromise would be to finish the review until you are happy with it, and then I can rebase the changes in a new branch and push that as an alternative PR. So we'll have a clean history and you still have the original branch as a reference if anything should be unclear.

Ok, I didn't get your initial post. I have PR from people with all kind of knowledge, some not even aware of 'git rebase'. Sorry.

To anwser your question, I don't think any rebase will clear the history. Github might be changing without warnings, though. I have all your successive PR locally. I'm curious what would happen if you "erase" the current PR with the rebased branch. Please do!

Add code to reencode IMAP folder names to regular utf-8.
This starts an implementation that will add a new config option
`utf8foldernames` on account level which will fix OfflineIMAP#299 and on the
long run replace the current `decodefoldernames` option.

This commit introduces code to register an `imap4_utf_7` codec
on which two-way conversion methods will later be built.

Original code by
(https://www.blogger.com/profile/16648963337079496096),
taken from
http://piao-tech.blogspot.no/2010/03/get-offlineimap-working-with-non-ascii.html

In the comment
http://piao-tech.blogspot.com/2010/03/get-offlineimap-working-with-non-ascii.html?showComment=1316041409339#c669880170006851138
indicates that this code is expected to be incorporated into offlineIMAP and therefore the author implicitly agrees to put it under this license.

Signed-off-by: Urs Liska <git@ursliska.de>
(cherry picked from commit 8691dd5)
This commit adds two functions
- imaputil.IMAP_utf8()
- imaputil.utf8_IMAP()
as an interface to the new imap4_utf_7 codec.

Signed-off-by: Urs Liska <git@ursliska.de>
(cherry picked from commit 44ce947)
While intending *not* to change the behaviour of the existing
decodefoldernames option this commit transparently improves
the coding.
So far this worked by overriding the folder's getvisiblename() method
which reads self.visiblename from and applies the conversion on
*every* invocation of getvisiblename().
This commit does the calculation once in the IMAPFolder's __init__.

Signed-off-by: Urs Liska <git@ursliska.de>
(cherry picked from commit 662c7d0)
Signed-off-by: Urs Liska <git@ursliska.de>
(cherry picked from commit 829c7ec)
If utf8foldernames is enabled on account level all folder names read
from the IMAP server will immediately be reencoded to UTF-8. Names
will be treated as UTF-8 as long as the IMAP server isn't contacted again,
for which they are reencoded to IMAP4-UTF-7.

This means that any further processing such as nametrans, folderfilter
etc. will act upon the UTF-8 names, which will have to be documented
carefully.

NOTE 1:
GMail repositories and folders inherit from the IMAP... classes, so I don't
know yet if these changes have ugly side-effects. But web research suggests
that GMail IMAP folders are equally encoded in UTF-7 so that should work
identically here and incorporate the same improvements.

NOTE 2:
I could not test the behaviour with idlefolders as I didn't get this option
to work at all, not even with the latest stable version.

NOTE 3:
I *did* test to sync an IMAP repository against another IMAP repository.

Signed-off-by: Urs Liska <git@ursliska.de>
Addresses OfflineIMAP#498 (comment)

Signed-off-by: Urs Liska <git@ursliska.de>
The new 'utf8foldernames' will not work together with the existing
'decodefoldernames' option (which will be documented in the next
commit). Therefore this commit will check for this condition and
abort the synchronization of a misconfigured account before doing
any changes.
Other accounts are not affected.

Signed-off-by: Urs Liska <git@ursliska.de>
- Document the new utf8foldernames config option
- Deprecate the old decodefoldernames option
  Update its documentation, discussing the limitations.

Signed-off-by: Urs Liska <git@ursliska.de>
nicolas33 pushed a commit that referenced this pull request Oct 2, 2017
Addresses #498 (comment)

Signed-off-by: Urs Liska <git@ursliska.de>
Signed-off-by: Nicolas Sebrecht <nicolas.s-dev@laposte.net>
@nicolas33
Copy link
Member

Hmm, what sucks most is that github won't notificate about the update of the branch.

Anyway, applied. Thank you very much!

@nicolas33 nicolas33 closed this Oct 2, 2017
@uliska
Copy link
Contributor Author

uliska commented Oct 2, 2017

Indeed, the "history" shown above seems to be a mix of existing and overwritten commits. My impression is that - in order to keep review items - they keep copies of commits even when they don't actually exist anymore. Whether that's actually true or not, I'd say this gives huge potential for confusion and follow-up errors through people responding wrongly to what they think they see.

@nicolas33
Copy link
Member

Yes, I think you're right.
I'd say Github is buggy.

michaelcoyote pushed a commit to michaelcoyote/offlineimap that referenced this pull request Jan 15, 2018
Addresses OfflineIMAP#498 (comment)

Signed-off-by: Urs Liska <git@ursliska.de>
Signed-off-by: Nicolas Sebrecht <nicolas.s-dev@laposte.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants