
Recovered supplier needs to reject direct updates until it is in sync with the topology #1317

Open
389-ds-bot opened this issue Sep 12, 2020 · 10 comments
Labels: priority_medium (good value but complex/risky/not crucial), replication (Issue involves replication)
Milestone: 1.4.4

Comments

@389-ds-bot

Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/47986


The problem is that if a recovered supplier accepts direct updates before it is back in sync, replication both to and from that supplier is broken.

Use case: MMR with two suppliers, M1/rid1 and M2/rid2.

The ldif RUV is: [rid1_t0, rid2_t1]

T20:
M1 RUV is [rid1_t5, rid2_t6]
M2 RUV is [rid1_t5, rid2_t6]

M1 is recovered from the ldif file

T21:
M1 RUV is [rid1_t0, rid2_t1]
M2 RUV is [rid1_t5, rid2_t6]

T22:
An LDAP client sends an update to M1
M1 RUV is [rid1_t22, rid2_t1]
M2 RUV is [rid1_t5, rid2_t6]

T23:
M2 starts a replication session to M1 and updates it with [rid2_t1..rid2_t6]
M1 RUV is [rid1_t22, rid2_t6]
M2 RUV is [rid1_t5, rid2_t6]

At this point replication is broken both ways: M2 does not have rid1_t22 in its changelog, so it cannot update M1; and since the import cleared M1's changelog, M1 does not have rid1_t5 and cannot update M2.

This problem exists with ldif recovery, but I think it also exists with backup recovery.
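
Below is a minimal lib389/pytest sketch of this timeline (it is not the actual test added later in this thread; the fixture, instance names, ldif path, and user entries are assumptions used only for illustration):

```python
from lib389.topologies import topology_m2 as topo
from lib389.replica import ReplicationManager
from lib389.idm.user import UserAccounts
from lib389._constants import DEFAULT_SUFFIX, DEFAULT_BENAME


def test_recovered_supplier_accepts_direct_update(topo):
    m1 = topo.ms["supplier1"]
    m2 = topo.ms["supplier2"]
    repl = ReplicationManager(DEFAULT_SUFFIX)

    # t0/t1: export an ldif (with replication data) while both RUVs agree
    repl.wait_for_replication(m1, m2)
    stale_ldif = f'{m1.get_ldif_dir()}/m1_stale.ldif'
    m1.stop()
    m1.db2ldif(bename=DEFAULT_BENAME, suffixes=[DEFAULT_SUFFIX],
               excludeSuffixes=None, encrypt=False, repl_data=True,
               outputfile=stale_ldif)
    m1.start()

    # t5/t6: more updates on both suppliers so the live RUVs move past the ldif
    UserAccounts(m1, DEFAULT_SUFFIX).create_test_user(uid=1001)
    UserAccounts(m2, DEFAULT_SUFFIX).create_test_user(uid=1002)
    repl.wait_for_replication(m1, m2)
    repl.wait_for_replication(m2, m1)

    # T21: recover M1 from the stale ldif; the import clears M1's changelog
    m1.stop()
    m1.ldif2db(bename=DEFAULT_BENAME, suffixes=[DEFAULT_SUFFIX],
               excludeSuffixes=None, encrypt=False, import_file=stale_ldif)
    m1.start()

    # T22: direct update on the recovered, out-of-sync M1 (the "killer" update).
    # Today it is accepted, which is the bug; with the requested fix the server
    # would reject it until M1 is back in sync with the topology.
    UserAccounts(m1, DEFAULT_SUFFIX).create_test_user(uid=2001)

    # T23: from here neither direction converges again; see the end-of-test
    # checks sketched further down in this thread.
```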

389-ds-bot added the replication (Issue involves replication) label on Sep 12, 2020
389-ds-bot added this to the 1.4.4 milestone on Sep 12, 2020
@389-ds-bot (Author)

Comment from nhosoi (@nhosoi) at 2015-03-10 23:41:51

Comments made in the ticket triage:
Ludwig: should be done, but it is a change in behaviour, so it should be configurable.
Thierry: if configurable, what would be the default behavior: reject or accept?

@389-ds-bot (Author)

Comment from nhosoi (@nhosoi) at 2016-05-13 00:19:24

Per triage, push the target milestone to 1.3.6.

@389-ds-bot (Author)

Comment from nhosoi (@nhosoi) at 2017-02-11 22:49:27

Metadata Update from @nhosoi:

  • Issue set to the milestone: 1.3.6.0

@389-ds-bot (Author)

Comment from mreynolds (@mreynolds389) at 2017-05-08 22:27:14

Metadata Update from @mreynolds389:

  • Issue close_status updated to: None
  • Issue set to the milestone: 1.4 backlog (was: 1.3.6.0)

@389-ds-bot (Author)

Comment from mreynolds (@mreynolds389) at 2020-05-27 16:11:11

Metadata Update from @mreynolds389:

  • Custom field reviewstatus adjusted to None
  • Issue set to the milestone: 1.4.4 (was: 1.4 backlog)
  • Issue tagged with: Replication

@droideck (Member) commented Sep 1, 2023

Okay, I wrote a test case (for which we just need to add the error checks), and it looks like this: droideck@763ff3c

And when I run ds-replcheck after the test, I get the following report:
i1347_report.txt

So, the issue seems legit.
But please recheck the code and the report in case I missed something.

@progier389 (Contributor)

I am also quite sure that the issue is legit.
About your test case:

  • IMHO the pause_all_replicas / resume_all_replicas is useless (because the backend is down during the import)
  • Probably a good idea to wait for replication s2→s1 and s1→s2 at the end of the test (since replication is broken, it should fail); see the sketch below
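
For illustration, those end-of-test checks might look roughly like this helper (a sketch only; the helper name, the exception type, and whether wait_for_replication raises on timeout are assumptions):

```python
import pytest
from lib389.replica import ReplicationManager
from lib389._constants import DEFAULT_SUFFIX


def assert_replication_broken(s1, s2):
    """With the bug present, neither direction converges, so both
    waits are expected to fail within their timeout."""
    repl = ReplicationManager(DEFAULT_SUFFIX)
    with pytest.raises(Exception):
        repl.wait_for_replication(s2, s1, timeout=30)
    with pytest.raises(Exception):
        repl.wait_for_replication(s1, s2, timeout=30)
```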

@droideck (Member) commented Sep 1, 2023

I am also quite sure that the issue is legit. About your test case:

  • IMHO the pause_all_replicas / resume_all_replicas is useless (because the backend is down during the import)
  • Probably a good idea to wait for replication s2→s1 and s1→s2 at the end of the test (since replication is broken, it should fail)

Sounds good! I'll play around with the test a bit more and create a PR later next week. I'll probably set it as XFail, as I'm not exactly sure when we'll work on that...

@tbordaz (Contributor) commented Sep 4, 2023

Not sure where to comment :(

I would suggest a slight change before resuming the servers:
the test should stop s2 during the import/start/killer_update on s1.
To hit the replication breakage, we need s2 not to replicate to s1 before the killer_update occurs.
Once the killer_update is completed, you may start s2 and verify that both S1→S2 and S2→S1 are broken. A sketch of that ordering follows.
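
For illustration, that ordering might look like this (a sketch only; s1, s2, stale_ldif, and the specific killer update are assumptions carried over from the earlier test sketch):

```python
from lib389.idm.user import UserAccounts
from lib389._constants import DEFAULT_SUFFIX, DEFAULT_BENAME


def recover_s1_and_apply_killer_update(s1, s2, stale_ldif):
    """Keep s2 down so it cannot replicate to s1 before the killer
    update lands on the freshly recovered (out-of-sync) s1."""
    s2.stop()                                   # s2 must not catch s1 up first
    s1.stop()
    s1.ldif2db(bename=DEFAULT_BENAME, suffixes=[DEFAULT_SUFFIX],
               excludeSuffixes=None, encrypt=False, import_file=stale_ldif)
    s1.start()
    UserAccounts(s1, DEFAULT_SUFFIX).create_test_user(uid=3001)  # killer update
    s2.start()
    # ...then verify that both S1→S2 and S2→S1 are broken
```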

vashirov added the priority_medium (good value but complex/risky/not crucial) label on Sep 6, 2023
droideck added a commit to droideck/389-ds-base that referenced this issue Sep 6, 2023
Description: Add a test that checks the situation where a recovered
supplier accepts direct updates before being in sync, so that replication
both to and from that supplier is broken.

Related: 389ds#1317

Reviewed by: ?
droideck added a commit that referenced this issue Sep 7, 2023
Description: Add a test that checks the situation where a recovered
supplier accepts direct updates before being in sync, so that replication
both to and from that supplier is broken.

Related: #1317

Reviewed by: @progier389 (Thanks!)
@tbordaz (Contributor) commented Oct 18, 2023

Somehow related to #2035
