Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

quota -f wrongly removes quota roots from sub mailboxes #2877

Closed
illenseer opened this issue Sep 17, 2019 · 12 comments
Closed

quota -f wrongly removes quota roots from sub mailboxes #2877

illenseer opened this issue Sep 17, 2019 · 12 comments
Assignees
Labels
2.5 affects 2.5 3.0 affects 3.0 3.1 affects 3.1 dev releases

Comments

@illenseer
Copy link

Ubuntu 18.04
Cyrus-IMAP 3.0.11 build from source (replication enabled)
Cyrus SASL 2.1.27~101-g0780600+dfsg-3ubuntu2
Problems also seen on FreeBSD 10.3/11.3 with Cyrus 3.0.11 and 2.5

altnamespace: yes
unixhierarchysep: yes
virtdomains: userid

quota_db: quotalegacy
autocreate_quota: 0

On some mailboxes there is a problem with quota handling, but most of the mailboxes are handled correctly. The problem occurs when running quota -f to fix quotas, but instead it removes some quota roots from sub mailboxes. We set quotas only on top level mailboxes never on sub mailboxes.
This is how the quota should look like:

# sudo -u cyrus /usr/lib/cyrus/bin/quota -d vrmd.de
   Quota   % Used     Used             Resource Root
 2097152        1    26343              STORAGE user/mail@vrmd.de
                       718              MESSAGE user/mail@vrmd.de
                         0 X-ANNOTATION-STORAGE user/mail@vrmd.de
                         5        X-NUM-FOLDERS user/mail@vrmd.de

But after running quota -f, quota roots in sub mailboxes are being removed and in the corresponding cyrus.header files:

# sudo -u cyrus /usr/lib/cyrus/bin/quota -d vrmd.de -f
vrmd.de!user.mail: STORAGE usage was 26975777, now 23530672
vrmd.de!user.mail: MESSAGE usage was 718, now 691
vrmd.de!user.mail: X-NUM-FOLDERS usage was 5, now 1
vrmd.de!user.mail.Drafts: quota root vrmd.de!user.mail --> (none)
vrmd.de!user.mail.Junk: quota root vrmd.de!user.mail --> (none)
vrmd.de!user.mail.Sent: quota root vrmd.de!user.mail --> (none)
vrmd.de!user.mail.Trash: quota root vrmd.de!user.mail --> (none)
   Quota   % Used     Used             Resource Root
 2097152        1    22979              STORAGE user/mail@vrmd.de
                       691              MESSAGE user/mail@vrmd.de
                         0 X-ANNOTATION-STORAGE user/mail@vrmd.de
                         1        X-NUM-FOLDERS user/mail@vrmd.de

Correct Quota for the user can only be restored by removing quota completely for that user from the quota_db and then set the quota again. But another quota -f will remove the quota roots from sub mailboxes again.

@elliefm elliefm added 2.5 affects 2.5 3.0 affects 3.0 labels Sep 18, 2019
@elliefm
Copy link
Contributor

elliefm commented Sep 18, 2019

The problem occurs when running quota -f to fix quotas, but instead it removes some quota roots from sub mailboxes. We set quotas only on top level mailboxes never on sub mailboxes.

If you never set quotas on sub mailboxes, but quota -f removes quota roots from sub mailboxes, then are you saying that quota -f is removing quotaroots that don't exist?

Are you able to verify (before running -f) that the sub mailboxes definitely don't accidentally have a quota root set somehow? (Maybe the user's client is setting them?)

I'm trying to reproduce this (without deliberately creating quotas on the sub mailboxes) but so far haven't been able to...

@illenseer
Copy link
Author

If you never set quotas on sub mailboxes, but quota -f removes quota roots from sub mailboxes, then are you saying that quota -f is removing quotaroots that don't exist?

Are you able to verify (before running -f) that the sub mailboxes definitely don't accidentally have a quota root set somehow? (Maybe the user's client is setting them?)

Okay, what we do is, we set quota on the INBOX and this includes all the sub folders/mailboxes (e.g. Archive, Drafts, Junk, Sent, Trash, …), in the corresponding cyrus.header files of theses mailboxes the quota root is set to the INBOX and this is what we want, one quota for the whole account.

Running quota shows now the right amount of X-NUM-FOLDERS and everything seems fine.

Running quota -f sets now the quota root to (none) in sub folders/mailboxes and which removes the quota root from the cyrus.header files and X-NUM-FOLDERS is reduced to 1.

To be fair, quota -f works on most of our accounts as intended, only on less than 1% of our accounts this problem occurs.

For one account we could fix the problem by moving the user to another Cyrus backend with sync_client and running quota -f on that account is fine now. But we moved recently several Cyrus backends completely with sync_client to new hardware and accounts with this problems seem at first glance correct, but after running quota -f the problem shows again.

It seems there is race condition somewhere in quota -f. I am willing to help, but so far I could not figure out, what the cause is.

@elliefm
Copy link
Contributor

elliefm commented Sep 19, 2019

Okay, what we do is, we set quota on the INBOX and this includes all the sub folders/mailboxes (e.g. Archive, Drafts, Junk, Sent, Trash, …), in the corresponding cyrus.header files of theses mailboxes the quota root is set to the INBOX and this is what we want, one quota for the whole account.

That makes sense, and is pretty much exactly what I started out testing with

Running quota shows now the right amount of X-NUM-FOLDERS and everything seems fine.

Running quota -f sets now the quota root to (none) in sub folders/mailboxes and which removes the quota root from the cyrus.header files and X-NUM-FOLDERS is reduced to 1.

Aaand that's where things get weird!

To be fair, quota -f works on most of our accounts as intended, only on less than 1% of our accounts this problem occurs.

Long shot, but can you think of anything the accounts this affects might have in common? (Though, I assume if you could, you would've already mentioned.) I don't suppose they're listed in admins: in imapd.conf are they? (I don't know if/how quota applies to admin users, but altnamespace is hairy for admins soooo maybe?)

For one account we could fix the problem by moving the user to another Cyrus backend with sync_client and running quota -f on that account is fine now. But we moved recently several Cyrus backends completely with sync_client to new hardware and accounts with this problems seem at first glance correct, but after running quota -f the problem shows again.

So, it sounds like it keeps happening with these accounts, even after they've been fixed once they're liable to get weird again? And accounts that don't have the problem don't seem to get it, either?

It seems there is race condition somewhere in quota -f. I am willing to help, but so far I could not figure out, what the cause is.

Thanks. It's sounding to me like the key here is going to be figuring what weird thing these accounts have in common, and if we can reproduce that we can probably reproduce the problem.

Meanwhile, I'll have another look through the quota tool code and see if any unhandled edge cases stick out from that direction...

@elliefm elliefm self-assigned this Sep 19, 2019
@elliefm
Copy link
Contributor

elliefm commented Sep 20, 2019

Not sure if it's important, but what's your value for improved_mboxlist_sort (imapd.conf)?

@elliefm
Copy link
Contributor

elliefm commented Sep 20, 2019

vrmd.de!user.mail.Drafts: quota root vrmd.de!user.mail --> (none)

So, this output comes from fixquota_fixroot() when its root argument is NULL. It's only called in this way when findroot() can't find its quotaroot (and thus thisquota == -1). So I think it's safe to say this is the code path that is removing the quotaroot from the sub mailbox.

It's very interesting to observe that findroot() does not check all the quota roots, just the ones that it thinks it hasn't done yet. So, maybe we're landing on an edge case where we don't examine the sub mailbox until after we think we've finished processing the inbox's quotaroot, so we ignore that, and then don't find any other suitable quotaroot for the sub mailbox.... I'm trying to tickle this in a test case but so far still haven't managed to.

@elliefm
Copy link
Contributor

elliefm commented Sep 20, 2019

I think I've found it!

If improved_mboxlist_sort is set to 'no', and if you have userids that contain hyphens, and if some other userid is identical to the pre-hyphen part of one of those, they'll be processed in an order like this:

user/foo@example.com
user/foo-bar@example.com
user/foo/Drafts@example.com
]...]

i.e. foo-bar's inbox will be seen before foo's sub mailboxes. So when it does eventually get to user/foo/Drafts and such, user/foo is no longer in consideration as a possible quota root for it... so it finds nothing, and nukes what was there.

This is the exact class of problem that improved_mboxlist_sort is meant to resolve, but it's also a fairly complicated procedure to change it on a live system (so don't just reflexively do that!)

I think I have a fix on the way...

@elliefm
Copy link
Contributor

elliefm commented Sep 20, 2019

I think this commit fixes it, by making an optimisation that can only work with improved_mbox_sort: yes only active when that configuration is present: elliefm@06778af

The optimisation was originally added as 054f7f2 (2.5.0), and its commit message reads:

This is not only more efficient, and actually correct, but
it removes the window in which folders can be created and
not be counted correctly (except in the case of multiple
subroots, where things can happen between them)

So, as a side effect I guess my fix re-opens "the window in which folders can be created and not counted correctly", but I'm not sure if anything can be done about that. It seems to work okay for me in testing. Do you have somewhere to gently test it yourself, and see if it fixes the issue (and doesn't break anything else)?

@elliefm elliefm added the 3.1 affects 3.1 dev releases label Sep 20, 2019
@elliefm
Copy link
Contributor

elliefm commented Sep 20, 2019

For one account we could fix the problem by moving the user to another Cyrus backend with sync_client and running quota -f on that account is fine now. But we moved recently several Cyrus backends completely with sync_client to new hardware and accounts with this problems seem at first glance correct, but after running quota -f the problem shows again.

In hindsight, I guess this worked for that one user because the backend they were moved to didn't have some other very similar userid with a hyphen in a bad place. But when you mass-moved several complete backends, the userid combinations that trigger the bug all got moved together, so it kept happening.

I keep saying "hyphen", but I really mean "any character that is valid in a mailbox name and sorts earlier than . in ASCII order". There might be other characters in this set (I haven't checked) but hyphen in usernames is reasonably common, so at a guess that's probably the culprit here.

@illenseer
Copy link
Author

Okay, I checked and improved_mboxlist_sort: no is set indeed and for all affected mailboxes there are also mailboxes userpart-<something>. I will try to test the fix and report back.

@illenseer
Copy link
Author

I tried the patch (elliefm/cyrus-imapd@06778af) on a test system with production data and it fixed the broken quota roots. This patch seems to fix our problems.
Thank you for helping and fixing the bug.

@elliefm
Copy link
Contributor

elliefm commented Sep 23, 2019

Great, thanks heaps for testing that out! I'll get it merged properly for inclusion in future releases.

elliefm added a commit to elliefm/cyrus-imapd that referenced this issue Sep 23, 2019
When improved_mboxlist_sort is not enabled, we cannot rely on mailbox
iteration order to prove we're finished with any particular quotaroot.

Fixes cyrusimap#2877
@elliefm elliefm reopened this Sep 23, 2019
elliefm added a commit that referenced this issue Sep 23, 2019
When improved_mboxlist_sort is not enabled, we cannot rely on mailbox
iteration order to prove we're finished with any particular quotaroot.

Fixes #2877
@elliefm
Copy link
Contributor

elliefm commented Sep 23, 2019

The fix is now on 2.5 (86407b4), 3.0 (eadda03) and master (17d49eb)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.5 affects 2.5 3.0 affects 3.0 3.1 affects 3.1 dev releases
Projects
None yet
Development

No branches or pull requests

2 participants