do not treat missing csn as fatal #2079
Comments
Comment from mreynolds (@mreynolds389) at 2016-11-03 19:18:15 Just sharing some recent information: Red Hat IT just ran into this in a testing environment with DS 10.0. A CSN failed to be committed to the changelog (deadlock retry errors), and this sent several agreements into a stop-fatal state (from which there is no return). The CSN was committed to the changelog one second later, but the agreements were already halted. Restarting the server fixed the issue (I'm assuming disabling/enabling the agreements would have worked too). |
Comment from nhosoi (@nhosoi) at 2016-11-04 00:17:56 Linked to Bugzilla bug: https://bugzilla.redhat.com/show_bug.cgi?id=1391700 |
Comment from nhosoi (@nhosoi) at 2016-11-04 00:20:09 Linked to Bugzilla bug: https://bugzilla.redhat.com/show_bug.cgi?id=1391701 |
Comment from lkrispen (@elkris) at 2016-11-04 18:35:50 attachment |
Comment from lkrispen (@elkris) at 2016-11-04 18:38:09 The attached part 1 removes the automatic selection of an alternative CSN and goes into backoff instead of fatal. The second part would have to check the "enforce" attr and then use the next best CSN. |
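Editor's sketch of the behaviour described in the comment above, not the actual patch. It assumes CSNs are fixed-width strings that sort in commit order, and it models the "enforce" attr as a hypothetical boolean `allow_next_best`; the comment does not say how that attribute is named or evaluated, so the function, enum, and parameter names here are all illustrative.

```python
from enum import Enum
from typing import Optional, Sequence, Tuple


class Outcome(Enum):
    SEND_FROM_ANCHOR = "send_from_anchor"
    SEND_FROM_NEXT_BEST = "send_from_next_best"
    BACKOFF = "backoff"  # retry later; never a fatal, unrecoverable stop


def choose_start(
    anchor: str, changelog: Sequence[str], allow_next_best: bool
) -> Tuple[Outcome, Optional[str]]:
    """Pick where a replication session starts sending updates.

    anchor          -- CSN the session expects to find in the changelog
    changelog       -- CSNs currently present (simplified to plain strings)
    allow_next_best -- hypothetical stand-in for the "enforce" attr check
    """
    if anchor in changelog:
        return Outcome.SEND_FROM_ANCHOR, anchor
    if allow_next_best:
        # Assumption: the "next best" CSN is the smallest one after the anchor.
        later = sorted(c for c in changelog if c > anchor)
        if later:
            return Outcome.SEND_FROM_NEXT_BEST, later[0]
    # Part 1 of the change: a missing CSN leads to backoff, not a fatal state.
    return Outcome.BACKOFF, None
```

With this shape, a CSN that is committed to the changelog slightly late (as in the report above) is picked up on a later retry instead of halting the agreement.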
Comment from lkrispen (@elkris) at 2016-11-17 20:33:56 attachment |
Comment from lkrispen (@elkris) at 2016-11-17 20:35:13 attachment |
Comment from lkrispen (@elkris) at 2016-11-17 20:39:00 There are two new attached patches. The second one is a consolidated version of part1+part2 and should be used in reviews. |
Comment from nhosoi (@nhosoi) at 2016-11-17 23:52:00 Looks good to me. |
Comment from firstyear (@Firstyear) at 2016-11-18 04:42:36 Thanks Ludwig, I'm happy with the change you made for my suggestion! |
Comment from mreynolds (@mreynolds389) at 2016-12-22 20:54:25 We have a regression with cleanallruv. This was found by running ds/dirsrvtests/tests/suites/replication/cleanallruv_test.py. After running a cleanallruv task just once we get missing CSN errors and things break:
Going to gather more info, and add it to the ticket. |
Comment from mreynolds (@mreynolds389) at 2016-12-22 21:28:22 Update: When the cleanallruv task is run on Master A to remove Master D, it purges the changelog of all the changes from Master A (but not Master B, C, or D). D is the only one that's supposed to be cleaned, but only the local changes (Master A) are purged. |
Comment from mreynolds (@mreynolds389) at 2016-12-22 21:55:06 False alarm. This fix exposed a regression from ticket 48964 |
Comment from nhosoi (@nhosoi) at 2017-01-05 04:59:58 Do we need to backport to the 1.2.11 branch? |
Comment from lkrispen (@elkris) at 2017-01-12 19:31:09 attachment |
Comment from lkrispen (@elkris) at 2017-01-12 19:31:39 attached backport to 1.2.11 |
Comment from nhosoi (@nhosoi) at 2017-01-13 00:03:03
Ack. Thanks, Ludwig! |
Comment from lkrispen (@elkris) at 2017-01-13 15:11:42 committed to 1.2.11 branch: commit 55aa091 |
Comment from mreynolds (@mreynolds389) at 2017-02-14 18:47:19 Acked |
Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/49020
There have been many tickets and fixes about managing CSNs in a replication session, and the matter is not yet fully settled.
There are situations where replication should back off instead of going into a fatal state.
A summary of the problem and status was discussed on a mailing list and is cited here: