
Recovered supplier needs to reject direct updates until it is in sync with the topology #1317

Open
389-ds-bot opened this issue Sep 12, 2020 · 10 comments
Labels: priority_medium (good value but complex/risky/not crucial), replication (Issue involves replication)
Milestone: 1.4.4

Comments

@389-ds-bot

Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/47986


The problem is that if a recovered supplier accepts direct updates before it is back in sync, replication both to and from that supplier is broken.

Use case: MMR with two suppliers, M1/rid1 and M2/rid2.

The ldif RUV is: [rid1_t0, rid2_t1]

T20:
M1 RUV is [rid1_t5, rid2_t6]
M2 RUV is [rid1_t5, rid2_t6]

M1 is recovered from the ldif file

T21:
M1 RUV is [rid1_t0, rid2_t1]
M2 RUV is [rid1_t5, rid2_t6]

T22:
An LDAP client sends an update to M1
M1 RUV is [rid1_t22, rid2_t1]
M2 RUV is [rid1_t5, rid2_t6]

T23:
M2 starts a replication session to M1 and updates it with [rid2_t1..rid2_t6]
M1 RUV is [rid1_t22, rid2_t6]
M2 RUV is [rid1_t5, rid2_t6]

At this point replication is broken both ways: M2 does not have rid1_t22 in its changelog, so it cannot update M1; and since the import cleared M1's changelog, M1 does not have rid1_t5 and cannot update M2.

This problem exists with ldif recovery, but I think it also exists with backup recovery.
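
Below is a minimal lib389/pytest sketch of this timeline (it is not the actual test added later in this thread; the fixture, instance names, ldif path, and user entries are assumptions used only for illustration):

```python
from lib389.topologies import topology_m2 as topo
from lib389.replica import ReplicationManager
from lib389.idm.user import UserAccounts
from lib389._constants import DEFAULT_SUFFIX, DEFAULT_BENAME


def test_recovered_supplier_accepts_direct_update(topo):
    m1 = topo.ms["supplier1"]
    m2 = topo.ms["supplier2"]
    repl = ReplicationManager(DEFAULT_SUFFIX)

    # t0/t1: export an ldif (with replication data) while both RUVs agree
    repl.wait_for_replication(m1, m2)
    stale_ldif = f'{m1.get_ldif_dir()}/m1_stale.ldif'
    m1.stop()
    m1.db2ldif(bename=DEFAULT_BENAME, suffixes=[DEFAULT_SUFFIX],
               excludeSuffixes=None, encrypt=False, repl_data=True,
               outputfile=stale_ldif)
    m1.start()

    # t5/t6: more updates on both suppliers so the live RUVs move past the ldif
    UserAccounts(m1, DEFAULT_SUFFIX).create_test_user(uid=1001)
    UserAccounts(m2, DEFAULT_SUFFIX).create_test_user(uid=1002)
    repl.wait_for_replication(m1, m2)
    repl.wait_for_replication(m2, m1)

    # T21: recover M1 from the stale ldif; the import clears M1's changelog
    m1.stop()
    m1.ldif2db(bename=DEFAULT_BENAME, suffixes=[DEFAULT_SUFFIX],
               excludeSuffixes=None, encrypt=False, import_file=stale_ldif)
    m1.start()

    # T22: direct update on the recovered, out-of-sync M1 (the "killer" update).
    # Today it is accepted, which is the bug; with the requested fix the server
    # would reject it until M1 is back in sync with the topology.
    UserAccounts(m1, DEFAULT_SUFFIX).create_test_user(uid=2001)

    # T23: from here neither direction converges again; see the end-of-test
    # checks sketched further down in this thread.
```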

389-ds-bot added the replication (Issue involves replication) label on Sep 12, 2020
389-ds-bot added this to the 1.4.4 milestone on Sep 12, 2020
@389-ds-bot (Author)

Comment from nhosoi (@nhosoi) at 2015-03-10 23:41:51

Comments made in the ticket triage:
Ludwig: should be done, but it is a change in behaviour, so it should be configurable.
Thierry: if configurable, what would be the default behavior: reject or accept?

@389-ds-bot (Author)

Comment from nhosoi (@nhosoi) at 2016-05-13 00:19:24

Per triage, push the target milestone to 1.3.6.

@389-ds-bot (Author)

Comment from nhosoi (@nhosoi) at 2017-02-11 22:49:27

Metadata Update from @nhosoi:

  • Issue set to the milestone: 1.3.6.0

@389-ds-bot (Author)

Comment from mreynolds (@mreynolds389) at 2017-05-08 22:27:14

Metadata Update from @mreynolds389:

  • Issue close_status updated to: None
  • Issue set to the milestone: 1.4 backlog (was: 1.3.6.0)

@389-ds-bot (Author)

Comment from mreynolds (@mreynolds389) at 2020-05-27 16:11:11

Metadata Update from @mreynolds389:

  • Custom field reviewstatus adjusted to None
  • Issue set to the milestone: 1.4.4 (was: 1.4 backlog)
  • Issue tagged with: Replication

@droideck (Member) commented Sep 1, 2023

Okay, I wrote a test case (for which we just need to add the error checks), and it looks like this: droideck@763ff3c

And when I run ds-replcheck after the test, I get the following report:
i1347_report.txt

So, the issue seems legit.
But please recheck the code and the report in case I missed something.

@progier389 (Contributor)

I am also quite sure that the issue is legit.
About your test case:

  • IMHO the pause_all_replicas / resume_all_replicas is useless (because the backend is down during the import)
  • Probably a good idea to wait for replication s2→s1 and s1→s2 at the end of the test (since replication is broken, it should fail); see the sketch below
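
For illustration, those end-of-test checks might look roughly like this helper (a sketch only; the helper name, the exception type, and whether wait_for_replication raises on timeout are assumptions):

```python
import pytest
from lib389.replica import ReplicationManager
from lib389._constants import DEFAULT_SUFFIX


def assert_replication_broken(s1, s2):
    """With the bug present, neither direction converges, so both
    waits are expected to fail within their timeout."""
    repl = ReplicationManager(DEFAULT_SUFFIX)
    with pytest.raises(Exception):
        repl.wait_for_replication(s2, s1, timeout=30)
    with pytest.raises(Exception):
        repl.wait_for_replication(s1, s2, timeout=30)
```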

@droideck (Member) commented Sep 1, 2023

I am also quite sure that the issue is legit. About your test case:

  • IMHO the pause_all_replicas / resume_all_replicas is useless (because the backend is down during the import)
  • Probably a good idea to wait for replication s2→s1 and s1→s2 at the end of the test (since replication is broken, it should fail)

Sounds good! I'll play around with the test a bit more and create a PR later next week. I'll probably set it as XFail, as I'm not exactly sure when we'll work on that...

@tbordaz (Contributor) commented Sep 4, 2023

Not sure where to comment :(

I would suggest a slight change before resuming the servers:
the test should stop s2 during the import/start/killer_update on s1.
To hit the replication breakage, we need s2 not to replicate to s1 before the killer_update occurs.
Once the killer_update is completed, you may start s2 and verify that both S1→S2 and S2→S1 are broken. A sketch of that ordering follows.
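
For illustration, that ordering might look like this (a sketch only; s1, s2, stale_ldif, and the specific killer update are assumptions carried over from the earlier test sketch):

```python
from lib389.idm.user import UserAccounts
from lib389._constants import DEFAULT_SUFFIX, DEFAULT_BENAME


def recover_s1_and_apply_killer_update(s1, s2, stale_ldif):
    """Keep s2 down so it cannot replicate to s1 before the killer
    update lands on the freshly recovered (out-of-sync) s1."""
    s2.stop()                                   # s2 must not catch s1 up first
    s1.stop()
    s1.ldif2db(bename=DEFAULT_BENAME, suffixes=[DEFAULT_SUFFIX],
               excludeSuffixes=None, encrypt=False, import_file=stale_ldif)
    s1.start()
    UserAccounts(s1, DEFAULT_SUFFIX).create_test_user(uid=3001)  # killer update
    s2.start()
    # ...then verify that both S1→S2 and S2→S1 are broken
```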

vashirov added the priority_medium (good value but complex/risky/not crucial) label on Sep 6, 2023
droideck added a commit to droideck/389-ds-base that referenced this issue Sep 6, 2023
Description: Add a test that checks the situation where a recovered
supplier accepts direct updates before being in sync, so that replication
both to and from that supplier is broken.

Related: 389ds#1317

Reviewed by: ?
droideck added a commit that referenced this issue Sep 7, 2023
Description: Add a test that checks the situation where a recovered
supplier accepts direct updates before being in sync, so that replication
both to and from that supplier is broken.

Related: #1317

Reviewed by: @progier389 (Thanks!)
@tbordaz (Contributor) commented Oct 18, 2023

Somehow related to #2035
