Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(atomix): refactor cluster event test #4327

Closed
wants to merge 1 commit into from

Conversation

deepthidevaki
Copy link
Contributor

Description

Refactor DefaultClusterEventTest to remove all thread.sleep() and introduce deterministic waiting.

Related issues

closes #4307

Pull Request Checklist

  • All commit messages match our commit message guidelines
  • The submitting code follows our code style
  • If submitting code, please run mvn clean install -DskipTests locally before committing

@Zelldon
Copy link
Member

Zelldon commented Apr 17, 2020

😄 actually I created today a branch where I removed the DefaultEventService 😅 maybe we can discuss if we really want to keep it

@Zelldon
Copy link
Member

Zelldon commented Apr 17, 2020

@deepthidevaki
Copy link
Contributor Author

Yes. That module has some bugs now. But I like the general idea of it that we can subscribe to specific events and broadcast will only send to that specific receivers instead of sending to the whole cluster.

@npepinpe
Copy link
Member

npepinpe commented Apr 17, 2020

Don't we use it for long polling? But I agree, it's knowingly buggy, so it's arguable if we want to keep it and fix it, or just get rid of it and use something else.

EDIT: could we fix it instead? I remember refactoring it wasn't really hard, just a bit time consuming to write proper tests for the new version.

@Zelldon
Copy link
Member

Zelldon commented Apr 17, 2020

I mean it is a just a buggy wrapper around the clusterCommunicationService. I was able to replace everything so 🤷

@deepthidevaki
Copy link
Contributor Author

The behavior is different. In communicationService a broadcast send the message to all nodes in the cluster. But the eventService broadcast send the messages only to subscribers.

@npepinpe
Copy link
Member

Should we Zoom? 🚀

@Zelldon
Copy link
Member

Zelldon commented Apr 17, 2020

lets do it on monday

@deepthidevaki
Copy link
Contributor Author

Sending to only subscribers is useful in long polling notifications since only gateways are interested in those messages. Not sure how much overhead it has in a large cluster if every broker is broadcasting this to every other broker and gateways.
For deployment we can get away with out the event service and use the communication service because the sender knows who are the leaders and unicast to them instead of broadcast. This is what we did for snapshot replication.

The current issue in DefaultClusterEventService is that the gossiping subscribers info is not implemented correctly and after a node restart the subscription info is lost and the nodes do not re-gossip them. I discussed with @npepinpe and one way to fix it is to use the membership protocol to propagate the subscribers info. Both SWIM and heartbeat protocol in atomix propagates the member properties and we can add the topics each member subscribes in its properties. Use the gossip provided by the membership protocol instead of using the buggy custom gossip implemented in DefaultClusterEventService.

I think it is a useful abstraction and we should fix it and use it. Let's discuss it on monday.

@Zelldon
Copy link
Member

Zelldon commented Apr 29, 2020

@deepthidevaki this is superseded by #4387 right?

@deepthidevaki
Copy link
Contributor Author

Replaced by #4387

@deepthidevaki deepthidevaki deleted the 4307-clusterevent-test branch June 10, 2020 10:21
github-merge-queue bot pushed a commit that referenced this pull request Apr 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

io.atomix.cluster.messaging.impl.DefaultClusterEventServiceTest.testClusterEventService is flaky
3 participants