Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data loss due to skipping index records with MODSEQ = 0 #2839

Closed
mdoggydog opened this issue Jul 26, 2019 · 12 comments
Closed

Data loss due to skipping index records with MODSEQ = 0 #2839

mdoggydog opened this issue Jul 26, 2019 · 12 comments
Assignees
Labels

Comments

@mdoggydog
Copy link

@mdoggydog mdoggydog commented Jul 26, 2019

In attempting to upgrade a cyrus-imapd 2.4.17 installation (e.g., Debian jessie) to 3.0.8 (Debian buster, the recent stable release), I noticed messages disappeared from mailboxes. The message files are still in the spool, but are not advertised to IMAP clients and do not appear to utilities like mbexamine. It appears that the common trait of all of these messages is a MODSEQ of zero in their index records.

In #1029 there is evidence that at some point there was some code to try to reinitialize zero MODSEQ's to ones (which apparently never happened to all the messages I am looking at). I didn't see anything in the 2.4.17 code base that would obviously apply this kind of fix. Nonetheless, 2.4.17 (and probably 2.5.x) seems to happily process zero MODSEQ messages.

At some point in the 3.x branch, however, the mailbox_iter abstraction was added, along with this line of code

        if (record->modseq <= iter->changedsince) continue;

which has the effect of ignoring/discarding all MODSEQ = 0 records, even if iter->changedsince itself is set to zero (which seems to be used by callers to indicate "ignore modseq when iterating"). This means that these index records are ignored even by reconstruct.

I suspect that simply changing that one line to

        if ((iter->changedsince > 0) && 
            (record->modseq <= iter->changedsince)) continue;

will solve the problem while retaining the existing meaning for changedsince --- but I don't know enough about the semantics of MODSEQ to know if this will be ok. (And I haven't tried it out yet myself.)

@mdoggydog

This comment has been minimized.

Copy link
Author

@mdoggydog mdoggydog commented Jul 27, 2019

FYI, I have also filed a debian bug report referencing this one:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=933163

@brong brong self-assigned this Jul 29, 2019
@brong brong added 3.0 bug P1 labels Jul 29, 2019
brong added a commit that referenced this issue Jul 29, 2019
brong added a commit that referenced this issue Jul 29, 2019
@brong

This comment has been minimized.

Copy link
Member

@brong brong commented Jul 29, 2019

yes, this is a legit and pretty bad bug - sorry about that! Your fix is fine. I've pushed equivalent code to the master and 3.0 branches.

@elliefm

This comment has been minimized.

Copy link
Contributor

@elliefm elliefm commented Jul 29, 2019

For people who don't have this patch yet, is there something they can invoke to safely initialise affected modseqs to non-zero?

The patch from #1029 looks like it was on the 'reconstruct -V' code path, but it looks like the specific "detect and fix zero" behaviour was lost in the big pre-2.5 refactors. I haven't yet looked into whether the behaviour is still in 2.4

@elliefm

This comment has been minimized.

Copy link
Contributor

@elliefm elliefm commented Jul 29, 2019

Just checking the 2.4 branch and it was the big "The Rest" commit (fdc0eb3) that replaced the original patched modseq handling with refactored code, which precedes gitk tells me precedes 2.4.0. I'm not sure if the re-initialize on upgrade behaviour was necessarily lost at this point, but the comment explaining what it was definitely was...

So I guess anyone who previously upgraded to 2.4 from an earlier version, and now upgrades to 3.0 from there, might potentially trip over this, even if they try to be cautious and reconstruct -V max before upgrading. Uggh

@elliefm

This comment has been minimized.

Copy link
Contributor

@elliefm elliefm commented Jul 29, 2019

There will be a new 3.0 release in the next few days containing this fix, just getting release notes and other relevant documentation in order

@elliefm elliefm added the 3.1 label Jul 29, 2019
@elliefm

This comment has been minimized.

Copy link
Contributor

@elliefm elliefm commented Jul 30, 2019

Just released 3.0.11 containing this fix :)

@elliefm elliefm closed this Jul 30, 2019
@mdoggydog

This comment has been minimized.

Copy link
Author

@mdoggydog mdoggydog commented Jul 30, 2019

(Thank you for the very speedy validation and fix-up release!)

@mdoggydog

This comment has been minimized.

Copy link
Author

@mdoggydog mdoggydog commented Jul 30, 2019

(Oh, and, well, it looks like the 3.0.11 release notes inadvertently point to #2389 instead of #2839 --- i.e., the 8 and 3 got transposed.)

@elliefm

This comment has been minimized.

Copy link
Contributor

@elliefm elliefm commented Aug 1, 2019

crap, retconning...

@elliefm

This comment has been minimized.

Copy link
Contributor

@elliefm elliefm commented Aug 1, 2019

Fixed in git, website should rebuild on the next hour. Thanks!

elliefm added a commit to elliefm/cassandane that referenced this issue Aug 5, 2019
This introduces a new dependency on IO::File::fcntl, which you can
obtain by installing the IO::File::Lockable CPAN package.

(regression test for cyrusimap/cyrus-imapd#2839)
elliefm added a commit to elliefm/cassandane that referenced this issue Aug 5, 2019
This introduces a new dependency on IO::File::fcntl, which you can
obtain by installing the IO::File::Lockable CPAN package.

(regression test for cyrusimap/cyrus-imapd#2839)
@elliefm

This comment has been minimized.

Copy link
Contributor

@elliefm elliefm commented Aug 7, 2019

For package maintainers that can't easily bump to a new numbered version, but can apply a patch to what you've currently got, the commit you want is 0284050

@elliefm

This comment has been minimized.

Copy link
Contributor

@elliefm elliefm commented Dec 30, 2019

The same 2839/2389 typo was also in the upgrade documentation, which I've just discovered and corrected, and will be live on the web in a few minutes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.