Expose gateway topology #4554

Closed
npepinpe opened this issue May 19, 2020 · 11 comments · Fixed by #6091
Labels
kind/toil Categorizes an issue or PR as general maintenance, i.e. cleanup, refactoring, etc. scope/gateway Marks an issue or PR to appear in the gateway section of the changelog

Comments

@npepinpe
Member

npepinpe commented May 19, 2020

Description

It's particularly useful to know what the gateway thinks the topology of the cluster is at a given point in time when debugging, especially when doing post-mortem investigations.

The gateway topology should be exposed as:

  1. Metrics: so we can consume them as a time series and find point-in-time state
  2. MBean/Actuator: so we can debug a running system (Optional)
@npepinpe added the kind/toil, Status: Needs Priority, and scope/gateway labels and removed the Status: Needs Priority label on May 19, 2020
@pihme
Contributor

pihme commented Jun 5, 2020

For exposing topology through MBean I would recommend using:

(Note that the info endpoint currently is deactivated)

Happy to help out if you have questions about the actuators.

Not sure how to expose topology as a metric.

@korthout
Member

korthout commented Jun 5, 2020

@pihme thanks 👍

@aivinog1
Contributor

Hi! I would like to start working on this. I can start by implementing the MBean part; after that we can discuss the metrics :)

@npepinpe
Member Author

Hey @aivinog1, we haven't really done anything with MBeans so far. I think @pihme looked into it, and it wasn't all that great? Correct me here if I'm wrong, I only vaguely remember.

My first suggestion would be to implement the metrics in the Topology manager, then add a simple actuator which just returns the topology as JSON, but I'm open to suggestions.
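
To make the "topology as JSON" idea concrete, here is a minimal sketch, written in Python for brevity rather than in Zeebe's Java. The type and field names (`BrokerInfo`, `PartitionInfo`, `node_id`, `partition_id`, `role`) are assumptions for illustration only and are not the gateway's actual response schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class PartitionInfo:
    partition_id: int
    role: str  # "LEADER" or "FOLLOWER" (assumed encoding)


@dataclass
class BrokerInfo:
    node_id: int
    partitions: List[PartitionInfo] = field(default_factory=list)


def topology_as_json(brokers: List[BrokerInfo]) -> str:
    """Serialize a (hypothetical) gateway view of the cluster to a JSON string."""
    # asdict() recurses into nested dataclasses, so partitions are converted too.
    return json.dumps({"brokers": [asdict(b) for b in brokers]}, sort_keys=True)
```

An actuator-style endpoint would then only need to return the string produced by `topology_as_json` for whatever the topology manager currently holds.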

@aivinog1
Contributor

@npepinpe Hi! I have a question about what the metrics should look like. Should they mirror the topology response command? For example: the gateway knows about 2 brokers (the first metric); the first broker knows about 2 partitions (so there must be a second metric); both partitions are healthy, and the broker is the leader for partition 2 (a third metric).

@npepinpe added this to To do in Zeebe on Dec 14, 2020
@deepthidevaki self-assigned this on Jan 8, 2021
@deepthidevaki
Contributor

@npepinpe I don't understand why we need to expose the topology via actuators. The topology is already exposed via zbctl.

For the metrics:
IMO, the interesting metric to expose is which broker the gateway considers the leader for each partition. I don't think the whole topology needs to be exposed via metrics. So my proposal is to have one metric that indicates who the leader is for a partition.
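
A rough sketch of what a single "leader per partition" metric could hold, in plain Python with no metrics library. The class name, the use of the leader's node id as the gauge value, and the `-1` sentinel for "no known leader" are all assumptions made for illustration, not what Zeebe implements.

```python
from typing import Dict, Optional


class PartitionLeaderGauge:
    """Hypothetical gauge: one sample per partition, value = leader node id."""

    NO_LEADER = -1  # assumed sentinel for "gateway knows no leader"

    def __init__(self) -> None:
        self._leader_by_partition: Dict[int, int] = {}

    def set_leader(self, partition_id: int, node_id: Optional[int]) -> None:
        # Passing None records that the gateway currently sees no leader.
        self._leader_by_partition[partition_id] = (
            self.NO_LEADER if node_id is None else node_id
        )

    def samples(self) -> Dict[int, int]:
        # Return a copy ordered by partition id, ready to be scraped.
        return dict(sorted(self._leader_by_partition.items()))
```

Scraped periodically, such samples would give a time series from which the leader known to the gateway at any point in time can be read back.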

@Zelldon
Member

Zelldon commented Jan 11, 2021

I agree with @deepthidevaki. I would do it similarly to the metrics we have in raft.

@npepinpe
Member Author

Since we already expose the topology as part of our client, you're right, no point making it an actuator.

As for the metrics, definitely the leader is usually the important part, but it can still be useful to know the followers to ensure that we have a consistent view of the cluster. Are we sure that knowing the wrong followers could never lead to an issue or help us diagnose one?

@deepthidevaki
Contributor

The gateway doesn't care about followers. We have raft metrics that show which nodes are followers for each partition. That is enough for debugging, IMO.

image

@npepinpe
Member Author

Couldn't it be useful to know the complete topology from the gateway point of view at times? As in, what if I have no leader? Is the gateway unaware of one, or is it a follower according to it, or...? Couldn't it also allow us to track bugs related to how the topology is built over time by checking the transitions from follower to leader and back according to the gateway?
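
As an illustration of checking transitions over time, the sketch below walks a time-ordered series of role samples for one broker and partition and reports each change. The `(timestamp, role)` sample shape and the role strings are assumptions; a real query would run against whatever the metrics backend stores.

```python
from typing import Iterable, List, Tuple


def role_transitions(
    samples: Iterable[Tuple[int, str]]
) -> List[Tuple[int, str, str]]:
    """Return (timestamp, old_role, new_role) for every role change.

    `samples` must be in time order; each item is (timestamp, role).
    """
    transitions = []
    previous = None
    for timestamp, role in samples:
        # The first sample has nothing to compare against, so it is skipped.
        if previous is not None and role != previous:
            transitions.append((timestamp, previous, role))
        previous = role
    return transitions
```

For example, the samples `[(1, "FOLLOWER"), (2, "LEADER")]` yield a single transition at timestamp 2.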

@deepthidevaki
Contributor

Ya. Makes sense.
Then the metrics would show the following table:
GatewayId | Broker | Partition | Role (Leader or Follower)
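
One way to picture that table as a labelled metric is sketched below: each row becomes one sample line in the Prometheus text exposition style. The metric name `gateway_topology_partition_role`, the label names, and the numeric encoding (1 for leader, 0 for follower) are hypothetical choices for illustration.

```python
from typing import Iterable, Tuple

# Assumed numeric encoding of the role column.
ROLE_VALUES = {"FOLLOWER": 0, "LEADER": 1}


def render(
    rows: Iterable[Tuple[str, int, int, str]],
    metric: str = "gateway_topology_partition_role",
) -> str:
    """Render (gateway_id, broker, partition, role) rows as sample lines."""
    lines = []
    for gateway, broker, partition, role in sorted(rows):
        labels = f'gateway="{gateway}",broker="{broker}",partition="{partition}"'
        # Doubled braces in the f-string emit the literal { } around the labels.
        lines.append(f"{metric}{{{labels}}} {ROLE_VALUES[role]}")
    return "\n".join(lines)
```

With one sample per gateway, broker, and partition, a missing leader shows up as a partition with no sample whose value is 1.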

@zeebe-bors bot closed this as completed in e67dcd0 on Jan 15, 2021
Zeebe automation moved this from Ready to Done on Jan 15, 2021
@menski removed this from Done in Zeebe on Mar 9, 2021
github-merge-queue bot pushed a commit that referenced this issue Apr 16, 2024
6 participants