Add metrics to SwimMembershipProtocol #6003

Closed
deepthidevaki opened this issue Dec 11, 2020 · 7 comments · Fixed by #12850
Labels: area/observability, component/gossip, kind/toil, scope/broker, version:8.3.0-alpha2, version:8.3.0

Comments

@deepthidevaki
Contributor

Description

To help tune the configuration parameters of the SWIM protocol, it would be useful to add some metrics.
Examples:

  • Probe latency - RTT for a probe request
  • Gossip latency - how long until a metadata change on a node is propagated to another node.
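
A minimal sketch of how the probe RTT from the list above could be timed, assuming the probe is an asynchronous request whose future completes when the acknowledgement arrives. `ProbeLatencyTimer` and its methods are hypothetical names, not existing Zeebe or Atomix classes; a real implementation would more likely pass the elapsed time to the project's metrics library than keep counters by hand.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical helper (not part of Zeebe or Atomix): times an async probe round trip. */
final class ProbeLatencyTimer {
  private final LongSupplier nanoClock; // injectable so tests can control time
  private final AtomicLong count = new AtomicLong();
  private final AtomicLong totalNanos = new AtomicLong();
  private final AtomicLong maxNanos = new AtomicLong();

  ProbeLatencyTimer(LongSupplier nanoClock) {
    this.nanoClock = nanoClock;
  }

  /** Starts the probe and records the elapsed time once it completes or fails. */
  <T> CompletableFuture<T> time(Supplier<CompletableFuture<T>> probe) {
    final long start = nanoClock.getAsLong();
    return probe.get().whenComplete((response, error) -> record(nanoClock.getAsLong() - start));
  }

  private void record(long elapsedNanos) {
    count.incrementAndGet();
    totalNanos.addAndGet(elapsedNanos);
    maxNanos.accumulateAndGet(elapsedNanos, Math::max);
  }

  long count() {
    return count.get();
  }

  long maxNanos() {
    return maxNanos.get();
  }

  double meanNanos() {
    final long n = count.get();
    return n == 0 ? 0.0 : (double) totalNanos.get() / n;
  }
}
```

In production the clock would be `System::nanoTime`. Failed probes are recorded too in this sketch; whether timeouts should count towards the latency metric is a design choice left open here.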

Related to #4827 (comment) #4827 (comment)

@deepthidevaki deepthidevaki added kind/toil Categorizes an issue or PR as general maintenance, i.e. cleanup, refactoring, etc. scope/broker Marks an issue or PR to appear in the broker section of the changelog Impact: Observability labels Dec 11, 2020
@npepinpe npepinpe added area/observability Marks an issue as observability related and removed Impact: Observability labels Apr 11, 2022
@Zelldon
Member

Zelldon commented Jan 5, 2023

I feel that by adding messaging service metrics (#11353) we have this covered as well. We have labels for the message types, which also lets us see probe and sync request-response latencies, etc.

wdyt @npepinpe
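
To illustrate the idea of latencies labelled by message type, here is a rough sketch of a bucketed histogram keyed by a type label. It is not the implementation from #11353; `LabelledLatencyHistogram`, the bucket bounds, and label values such as `"probe"` and `"sync"` are assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical sketch: request-response latency histograms keyed by a message-type label. */
final class LabelledLatencyHistogram {
  // Bucket upper bounds in milliseconds; one extra bucket catches everything larger.
  private static final long[] BOUNDS_MS = {1, 5, 10, 50, 100, 500};
  private final Map<String, AtomicLongArray> bucketsByType = new ConcurrentHashMap<>();

  /** Records one observed latency under the given message-type label. */
  void observe(String messageType, long latencyMs) {
    final AtomicLongArray buckets =
        bucketsByType.computeIfAbsent(
            messageType, type -> new AtomicLongArray(BOUNDS_MS.length + 1));
    buckets.incrementAndGet(bucketIndex(latencyMs));
  }

  /** Returns the index of the first bucket whose upper bound is >= latencyMs. */
  static int bucketIndex(long latencyMs) {
    for (int i = 0; i < BOUNDS_MS.length; i++) {
      if (latencyMs <= BOUNDS_MS[i]) {
        return i;
      }
    }
    return BOUNDS_MS.length; // overflow bucket
  }

  long countInBucket(String messageType, int bucket) {
    final AtomicLongArray buckets = bucketsByType.get(messageType);
    return buckets == null ? 0 : buckets.get(bucket);
  }
}
```

With one histogram per label, a dashboard section for SWIM could filter on the probe and sync labels without any SWIM-specific instrumentation.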

@Zelldon
Member

Zelldon commented Jan 5, 2023

I could add a separate section to the dashboard for SWIM using these metrics.

@rodrigo-lourenco-lopes
Contributor

Since we are broadcasting the gossip of these updates with no answers, how could we go about measuring this gossip latency?
How about implementing a response for these broadcasts along with the metric and putting everything behind a feature flag?
Or ideally, there is a more straightforward way to measure this.

https://github.com/camunda/zeebe/blob/5280440c43122cf2da5b4b83dbff76a644273e7c/atomix/cluster/src/main/java/io/atomix/cluster/protocol/SwimMembershipProtocol.java#LL753C1-L758C4

wdyt? @npepinpe @Zelldon

@npepinpe
Member

npepinpe commented May 15, 2023

Honestly, it sounds to me like distributed tracing would be the tool we want to have here. Start an operation somewhere, measure when it completes somewhere else, possibly with hops.

As we don't have that yet, I would postpone this. However, there may be other metrics we'd like just from the local node?

@rodrigo-lourenco-lopes
Contributor

The other thing we could perhaps measure is sync(), since we have a response for that one.

@lenaschoenburg
Member

We could also try to just export the current state as metrics. Then we could derive additional properties such as "how long does it take to propagate changes throughout the entire cluster" by calculating it on the metrics. Not ideal but I feel like just exporting the current state of SWIM as metrics is useful already.
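
As a sketch of the "derive it from exported state" idea: assume each node exports, as a metric, the metadata version it currently sees for a given member, and that a collector samples those values with timestamps. The propagation time for a version is then roughly the gap between the first and the last node reporting it. `PropagationEstimator`, the node names, and the notion of a numeric metadata version are assumptions for illustration, not something the thread specifies.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

/** Hypothetical sketch: estimates cluster-wide propagation time from sampled per-node state. */
final class PropagationEstimator {
  // version -> (observing node -> earliest sample time at which it reported that version)
  private final Map<Long, Map<String, Long>> firstSeen = new HashMap<>();

  /** Feeds one sample: observerNode reported seeing the given version at timestampMs. */
  void onSample(String observerNode, long version, long timestampMs) {
    firstSeen
        .computeIfAbsent(version, v -> new HashMap<>())
        .merge(observerNode, timestampMs, Long::min);
  }

  /** Time between the first and last node seeing the version; empty until all nodes have. */
  OptionalLong propagationMs(long version, int clusterSize) {
    final Map<String, Long> seen = firstSeen.get(version);
    if (seen == null || seen.size() < clusterSize) {
      return OptionalLong.empty();
    }
    long earliest = Long.MAX_VALUE;
    long latest = Long.MIN_VALUE;
    for (long timestamp : seen.values()) {
      earliest = Math.min(earliest, timestamp);
      latest = Math.max(latest, timestamp);
    }
    return OptionalLong.of(latest - earliest);
  }
}
```

The estimate is only as precise as the sampling interval, which is one reason this approach is "not ideal" compared with measuring at the source.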

@rodrigo-lourenco-lopes
Contributor

@oleschoenburg If I understood correctly, we would export the local state of all members on each node?

@lenaschoenburg lenaschoenburg added the version:8.3.0-alpha2 Marks an issue as being completely or in parts released in 8.3.0-alpha2 label Jun 7, 2023
@megglos megglos added the version:8.3.0 Marks an issue as being completely or in parts released in 8.3.0 label Oct 5, 2023

7 participants