Changelog trimming ignores disabled replica-agreement and can erase updates not yet replicated #2472
Comments
Comment from tbordaz (@tbordaz) at 2017-10-24 18:30:25 |
Comment from tbordaz (@tbordaz) at 2017-10-24 18:30:32 Metadata Update from @tbordaz:
|
Comment from mreynolds (@mreynolds389) at 2017-10-26 18:01:15 Metadata Update from @mreynolds389:
|
Comment from lkrispen (@elkris) at 2017-11-14 11:23:32 Metadata Update from @elkris:
|
Comment from lkrispen (@elkris) at 2017-11-14 11:30:00 |
Comment from tbordaz (@tbordaz) at 2017-11-17 14:48:06 The patch is good.
|
Comment from lkrispen (@elkris) at 2017-11-17 15:47:33 I see your point: you want to trim as much as possible. I'm not sure it is worth it; it will make the calculation of the purge RUV more complicated, with no guarantee that it will not result in missing CSNs when the agreement is re-enabled later. With the proposed patch, what can happen is that the changelog grows until the agreement is really removed, but that can be explained and corrected. While writing this, something we could consider is to always take max(consumerRuv, currentRUV - purgedelay) as the purge limit. We already purge entry state and tombstones rigidly, so maybe the same should be done for the changelog. |
Comment from tbordaz (@tbordaz) at 2017-11-17 16:43:14 I agree. Keeping it simple will simplify diagnosis. While writing, I thought of another corner case that currently prevents purging: an out-of-sync replica that remains in the topology. Nobody can update it, but the replica agreements (RAs) pointing to that replica will prevent purging. Again, that would rather be a job for monitoring. I am not sure about your algorithm. We want to purge old updates (< now - purgedelay) and updates that are already known by consumers. Do you want to merge those two conditions into one? |
Comment from lkrispen (@elkris) at 2017-11-17 16:49:27 I was probably unclear; I was thinking of the "other" purge delay. We have a purge delay defined in the replica, used for tombstone purging and entry state purging. It defines the time span over which replication should be able to resolve updates. |
Comment from tbordaz (@tbordaz) at 2017-11-17 17:01:02 Ah! Yes, I agree. It leads to incoherency, and we need changelog trimming and entry state purging to use the same value. I wrote a test case for it but did not create a ticket. |
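The purge limit suggested earlier in this thread, max(consumerRuv, currentRUV - purgedelay), can be read roughly as in the sketch below. This is only one possible interpretation, not the server's code: a RUV is reduced to a single timestamp, and `purge_limit`, `consumer_times`, `current_time` and `purge_delay` are hypothetical names chosen for illustration.

```python
def purge_limit(consumer_times, current_time, purge_delay):
    """Return the timestamp below which changelog entries may be trimmed.

    Assumption: each consumer RUV is collapsed to one timestamp, the
    newest update that consumer is known to have received.
    """
    # The slowest consumer normally holds trimming back.
    slowest_consumer = min(consumer_times)
    # The replica purge delay bounds how far back replication must be
    # able to resolve updates, so nothing older needs to be kept.
    oldest_resolvable = current_time - purge_delay
    return max(slowest_consumer, oldest_resolvable)


# A consumer stuck at t=100 no longer blocks trimming forever.
stale = purge_limit([100, 900], current_time=1000, purge_delay=300)
# Consumers that are all recent still set the limit themselves.
fresh = purge_limit([800, 900], current_time=1000, purge_delay=300)
```

With this reading, an out-of-sync consumer only delays trimming for at most the purge delay, which matches the way tombstones and entry state are purged.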
Comment from firstyear (@Firstyear) at 2017-11-20 10:49:10 For now I think I agree with @elkris that the patch can be simple and we can add more later. Once we have added the improvement, it already improves our server state. :) |
Comment from tbordaz (@tbordaz) at 2017-11-28 08:44:09 Metadata Update from @tbordaz:
|
Comment from lkrispen (@elkris) at 2018-01-11 15:43:49 So do we agree on the simple patch? |
Comment from lkrispen (@elkris) at 2018-01-11 15:43:52 Metadata Update from @elkris:
|
Comment from tbordaz (@tbordaz) at 2018-01-11 16:28:00 Yes I do. Taking into account all replica agreements will fix the issue. |
Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/49413
Issue Description
The issue is related to changelog trimming.
Updates are trimmed on the condition that they are older than maxage AND have been replicated to all consumers.
The set of consumers taken into consideration is limited to those with an enabled replica agreement.
So if, for any reason, a replica agreement is disabled for a short period of time and the trimming thread runs during that time, the latest updates known by the consumer can be trimmed, and replication breaks.
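The failure mode described above can be illustrated with a small model. It is a simplified sketch, not the actual server code: CSNs are plain integers, a consumer's state is a single highest-known CSN rather than a full RUV, and `Agreement`, `trim` and `include_disabled` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Agreement:
    name: str
    enabled: bool
    consumer_max_csn: int  # assumption: highest CSN the consumer has received


def trim(changelog, agreements, now, maxage, include_disabled):
    """Return the changelog entries that survive one trimming pass.

    changelog is a list of (csn, timestamp) pairs. An entry is trimmed
    only if it is older than maxage AND every considered consumer has it.
    """
    considered = [a for a in agreements if include_disabled or a.enabled]
    # With no consumer to consider, only the age condition applies.
    limit = min((a.consumer_max_csn for a in considered), default=None)
    kept = []
    for csn, timestamp in changelog:
        old_enough = now - timestamp > maxage
        replicated = limit is None or csn <= limit
        if not (old_enough and replicated):
            kept.append((csn, timestamp))
    return kept


changelog = [(1, 0), (2, 1), (3, 2), (4, 3), (5, 4)]
agreements = [
    Agreement("to-consumer-A", enabled=True, consumer_max_csn=5),
    Agreement("to-consumer-B", enabled=False, consumer_max_csn=2),
]

# Reported behaviour: the disabled agreement is ignored, so CSNs 3..5
# are trimmed although consumer B never received them.
buggy = trim(changelog, agreements, now=100, maxage=10, include_disabled=False)
# Taking all agreements into account keeps what consumer B still needs.
fixed = trim(changelog, agreements, now=100, maxage=10, include_disabled=True)
```

In this model `buggy` is empty while `fixed` still holds CSNs 3 to 5, so consumer B can catch up once its agreement is re-enabled.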
Package Version and Platform
any version
Steps to reproduce
attached test case
Actual results
Replication breaks
Expected results
Replication should not break