Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When creating a view, the creator should be our own address #54

Closed
wants to merge 2 commits into from

Conversation

dimbleby
Copy link
Contributor

I don't think that I've seen this cause problems, but it seems surprising and dangerous for a node A to create a view whose ViewID says that its creator is B (say). This must open up the possibility of two nodes creating two different views with the same ViewID, with confusion to follow.

(I considered also changing this bit of code so that the local node was the coordinator. But I don't think that this ought to be necessary, so I've stuck with the smallest change that I think needs making).

@dimbleby
Copy link
Contributor Author

Actually, having the view creator and coordinator be different breaks at least the recently added in code in MERGE3 ("Only add view creators which are actually in the set as well").

  • Suppose A, B and C all have the view C|1000 [A, B, C]
  • D has view D|0 [D]

Now A won't be a merge leader because it's not a view-creator, and C won't be a merge leader because it's not a coordinator; and if the ordering of addresses falls unfortunately then D will expect one of the others to do it.

I wouldn't be surprised if there's other code that implicitly assumes the view creator is also the coordinator. I'll look at making another change to restore that property.

@belaban
Copy link
Owner

belaban commented Jun 25, 2012

No, it is necessary to have a creator which is not in the view: when A leaves A|6={A,B,C}, it'll create and install B|7={B,C}

@dimbleby
Copy link
Contributor Author

I see Util.shutdown() where a leaving member will "// inject view in which the shut down member is the only element", but I haven't found code that behaves in the way you describe. Probably I'm just missing it - could you point it out?

But this seems dangerous, too. Suppose A and C leave at about the same time. Then I think you're saying that A would create B|7={B,C} while C would create B|7={A,B}. Is that right? (Not to mention that B might also be creating a B|7, eg if some other member D is joining at the same time...)

Why isn't it a bad idea to have different views with the same view ID?

@belaban
Copy link
Owner

belaban commented Jul 2, 2012

Just returned from JBossWorld. Your pull requests affect code that has worked well for 10+ years, so understand that I'm very hesitant to change this, to make edge cases involving a protocol that nobody else uses (yet) work. I want to be 110% sure that I don't jeopardize mainstream code by fixing an edge case. This takes time.
However, I also need to release 3.1 asap, so I'll possibly move the SEQUENCER related cases into 3.2.

@dimbleby
Copy link
Contributor Author

dimbleby commented Jul 2, 2012

Welcome back.

Just so that we're clear, only JGRP-1484 is SEQUENCER-specific. The open pull requests are not. I don't know whether this makes a difference to you.

I had hoped that this change would be uncontroversial, but since it seems not to be... would it be useful if I opened a new ticket to cover it? I somehow feel as though there's more space for discussion over there than here on github!

@belaban
Copy link
Owner

belaban commented Jul 2, 2012

Well, the other pull request is regarding view creation, and that code is core and has worked well for years, so I'm hesitant to touch it.

@dimbleby
Copy link
Contributor Author

dimbleby commented Jul 2, 2012

Of course it's sensible to be cautious about making changes to field-hardened code.

On the other hand: even old code can contain old bugs; and if such a bug is found then we surely shouldn't be so afraid of change that we just have to leave the bug there forever?

For what it's worth, my stress tests have been running with this fix and have not seen problems. (In view of their effectiveness at finding bugs, I think there's a very good chance that if anything was seriously broken by this, I'd have found out).

What would it take to convince you that this fix is safe?

@dimbleby
Copy link
Contributor Author

dimbleby commented Jul 7, 2012

I've just noticed JGRP-1002, saying that it would be a good thing if a post-merge coordinator was also a pre-merge coordinator.

This pull request will have that effect (since the new coordinator is the merge leader, who was chosen from among the existing coordinators).

@belaban
Copy link
Owner

belaban commented Aug 31, 2012

What do you want to achieve with this pull request ?

  • Make the merge leader the new coordinator of the view
  • Sort the merged view but move the merge leader to the head

So this doesn't fix a bug, correct ?

I'm trying to gauge what the benefit of this change is, compared to the risks.

@belaban
Copy link
Owner

belaban commented Aug 31, 2012

Suppose A and C leave at about the same time. Then I think you're saying that A would create
B|7={B,C} while C would create B|7={A,B}. Is that right? (Not to mention that B might also be
creating a B|7, eg if some other member D is joining at the same time...)

Yes, if we have A|6={A,B,C}, then A leaving we'd end up with view B|7={B,C}. C would not create a view on its own as it's not the coordinator. C would ping A, and once B|7 is installed, B to send it a leave request. B would then create B|8={B}.

@belaban
Copy link
Owner

belaban commented Aug 31, 2012

I've just noticed JGRP-1002, saying that it would be a good thing if a post-merge coordinator was
also a pre-merge coordinator.
This pull request will have that effect (since the new coordinator is the merge leader, who was
chosen from among the existing coordinators).

Not necessarily: one of the existing coordinator will be picked to be the new coordinator, so only one partition will continue to have the same coordinator. E.g. if we have {A,B,C}, {M,N,O} and {X,Y,Z}, then the merge coordinators are A, M and X. Say we pick M (after sorting), then 2 partitions will change their coordinator (from A->M and from X->M) whereas one partition will continue to use the existing coordinator.

@belaban
Copy link
Owner

belaban commented Sep 4, 2012

I don't see any benefit of applying this pull request; as a matter of fact there's a lot of risk associated with it. I'd have to browse through a lot of code to see whether your change affects it.
Plus, this doesn't fix a bug.

@belaban belaban closed this Sep 4, 2012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants