Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add metrics for group size and threshold #771

Merged
merged 4 commits into from
Apr 14, 2021
Merged

Conversation

willscott
Copy link
Member

fix #767

@willscott
Copy link
Member Author

cc @lukevalenta

@@ -77,6 +77,8 @@ func (d *discrepancyStore) Put(b *chain.Beacon) error {
discrepancy := float64(actual-expected) / float64(time.Millisecond)
metrics.BeaconDiscrepancyLatency.Set(float64(actual-expected) / float64(time.Millisecond))
metrics.LastBeaconRound.Set(float64(b.GetRound()))
metrics.GroupSize.Set(float64(d.group.Len()))
metrics.GroupThreshold.Set(float64(d.group.Threshold))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going for putting group info, we might as well put the genesis time field and the transition field ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As well as the list of nodes names ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would exposing those fields as metrics be useful for debugging and alerts? If not, maybe better to leave them out to reduce LoC.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't aim for LoC reduction per se but I was simply suggesting, if you think that's useless that's fine for me. From my experience, it has been useful for debugging yes especially during a transition but I guess we deal with that with the drand output itself already.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense! I guess I had a specific alert use case for the group size and threshold metrics, but not the other fields. But no real harm in adding the fields if they might be useful in the future. Either way is fine with me!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can already get genesis out of the /info endpoint, and i think have monitoring on that one. What do you mean by the transition field, @nikkolasg ? the most recent time the group transitioned?

The group size is I think what we want rather than node names - the nodes can then be accessed through another subset of metrics already. A variable array of string names doesn't work super well in the prometheus model so i'm not really sure how i'd represent / encode that usefully.

Copy link
Collaborator

@nikkolasg nikkolasg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - sorry for the delay.
Let's merge when travis is done.

@nikkolasg
Copy link
Collaborator

We need to merge #797 first so tests pass

@willscott willscott merged commit dbeff67 into master Apr 14, 2021
@willscott willscott deleted the metrics/groupsize branch April 14, 2021 15:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expose group size and threshold via prometheus metrics
4 participants