Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changelog trimming ignores disabled replica-agreement and can erase updates not yet replicated #2472

Open
389-ds-bot opened this issue Sep 13, 2020 · 16 comments
Milestone

Comments

@389-ds-bot
Copy link

Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/49413


Issue Description

The issue is related to changelog trimming.
The updates are trimmed at the condition they are older than maxage AND have been replicated to all consumers.
The set of consumers taken into consideration is those that have an enabled replica agreement.

So if for any reason a replica agreement is disabled for a short period of time, and trimming thread runs at that time, then latest updates known by the consumer can be trimmed and replication breaks

Package Version and Platform

any version

Steps to reproduce

attached test case

Actual results

Replication breaks

Expected results

Replication should not break

@389-ds-bot 389-ds-bot added this to the 1.4 backlog milestone Sep 13, 2020
@389-ds-bot
Copy link
Author

Comment from tbordaz (@tbordaz) at 2017-10-24 18:30:25

ticket49413_test.py

@389-ds-bot
Copy link
Author

Comment from tbordaz (@tbordaz) at 2017-10-24 18:30:32

Metadata Update from @tbordaz:

  • Custom field component adjusted to None
  • Custom field origin adjusted to None
  • Custom field reviewstatus adjusted to None
  • Custom field type adjusted to None
  • Custom field version adjusted to None

@389-ds-bot
Copy link
Author

Comment from mreynolds (@mreynolds389) at 2017-10-26 18:01:15

Metadata Update from @mreynolds389:

  • Issue set to the milestone: 1.4 backlog

@389-ds-bot
Copy link
Author

Comment from lkrispen (@elkris) at 2017-11-14 11:23:32

Metadata Update from @elkris:

  • Issue assigned to elkris

@389-ds-bot
Copy link
Author

Comment from lkrispen (@elkris) at 2017-11-14 11:30:00

0001-Ticket-49413-Changelog-trimming-ignores-disabled-rep.patch

@389-ds-bot
Copy link
Author

Comment from tbordaz (@tbordaz) at 2017-11-17 14:48:06

The patch is good.
Do you think it worth improving it with additional conditions ?

  • tests of modifytimestamp/createtimestamp ? For example if the agreement is disabled from longer than the changelog purge delay, then it is safe to ignore that replica agreement to compute the ruv to purge.
  • tests of nsds5replicaLastUpdateStart ? For example if the RA have been in constant backoff for longer that changelog purge delay (likely the consumer is unreachable), then it is safe to ignore that RA

@389-ds-bot
Copy link
Author

Comment from lkrispen (@elkris) at 2017-11-17 15:47:33

I see your point. you want to trim as much as possibe.

I'm not sure it is worth it, it will make calculation of the purge ruv more complicated and have no guarantee it will not result in missing csns when the agmt is reenabled later.

with the proposed patch, what can happen is, is that the changelog grows until the agmt is really removed, but it can be explained and corrected.

But while writing, omething we could consider is to always take the max(consumerRuv, currentRUV - purgedelay) as purge limit. we do purge entrystate and tombstones rigidly, so maybe it should be done for the changelog as well.

@389-ds-bot
Copy link
Author

Comment from tbordaz (@tbordaz) at 2017-11-17 16:43:14

I agree. Making it simple will simplify diagnostic.
We can transfer the detection of dangling RA to some external tool (it could part of the monitoring aspect)

while writing I think of an other corner case that currently prevent to purge that is when a out of sync replica remains in the topology. Nobody can update it, but the RAs to the replica will prevent purging. Again it would rather be the job of monitoring aspect.

I am not sure about your algo. We want to purge old updates (<now-purgedelay) and updates that are already known by consumers. Do you want to merge those two conditions in only one ?
If consumers/supplier are in sync, the purge limit would purge the most recent updates ?

@389-ds-bot
Copy link
Author

Comment from lkrispen (@elkris) at 2017-11-17 16:49:27

I was probably unclear, I wa sthinking of the "other" purge delay. We have a purge delay defined in the replica used for tombstone purging and entry state purging. It defines a time span how long replication should be able resolve updates.
But then we have in the changelog a separate setting, either maxage or maxentries, which are unrelated to this. My thought was if we should not respect the pirge delay set in the replica

@389-ds-bot
Copy link
Author

Comment from tbordaz (@tbordaz) at 2017-11-17 17:01:02

ah !! yes I agree

It leads to incoherency and we need to have both changelog trimming and entry state purging having the same value. I wrote a testcase for it but not created a ticket.

@389-ds-bot
Copy link
Author

Comment from firstyear (@Firstyear) at 2017-11-20 10:49:10

For now I think I agree with @elkris that the patch can be simple and we can add more later? because we have added the improvement, then this already improves our server state. :)

@389-ds-bot
Copy link
Author

Comment from tbordaz (@tbordaz) at 2017-11-28 08:44:09

Metadata Update from @tbordaz:

@389-ds-bot
Copy link
Author

Comment from lkrispen (@elkris) at 2018-01-11 15:43:49

so do we agree on the simple patch ?

@389-ds-bot
Copy link
Author

Comment from lkrispen (@elkris) at 2018-01-11 15:43:52

Metadata Update from @elkris:

  • Custom field reviewstatus adjusted to ack (was: None)

@389-ds-bot
Copy link
Author

Comment from tbordaz (@tbordaz) at 2018-01-11 16:28:00

Yes I do. Taking into account all replica agreements will fix the issue.
In addition, if an old replica agreement (that needs to be cleanup) is preventing the trimming, it is fine. It is an administrative task to clear it.

@389-ds-bot
Copy link
Author

Comment from lkrispen (@elkris) at 2018-01-11 16:58:45

To ssh://pagure.io/389-ds-base.git
339caec..575d9e2 master -> master

To ssh://pagure.io/389-ds-base.git
f20aa78..8a4b26e 389-ds-base-1.3.7 -> 389-ds-base-1.3.7

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant