-
Notifications
You must be signed in to change notification settings - Fork 109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add metrics for group size and threshold #771
Conversation
cc @lukevalenta |
@@ -77,6 +77,8 @@ func (d *discrepancyStore) Put(b *chain.Beacon) error { | |||
discrepancy := float64(actual-expected) / float64(time.Millisecond) | |||
metrics.BeaconDiscrepancyLatency.Set(float64(actual-expected) / float64(time.Millisecond)) | |||
metrics.LastBeaconRound.Set(float64(b.GetRound())) | |||
metrics.GroupSize.Set(float64(d.group.Len())) | |||
metrics.GroupThreshold.Set(float64(d.group.Threshold)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're going for putting group info, we might as well put the genesis time field and the transition field ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As well as the list of nodes names ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would exposing those fields as metrics be useful for debugging and alerts? If not, maybe better to leave them out to reduce LoC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't aim for LoC reduction per se but I was simply suggesting, if you think that's useless that's fine for me. From my experience, it has been useful for debugging yes especially during a transition but I guess we deal with that with the drand output itself already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense! I guess I had a specific alert use case for the group size and threshold metrics, but not the other fields. But no real harm in adding the fields if they might be useful in the future. Either way is fine with me!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can already get genesis out of the /info
endpoint, and i think have monitoring on that one. What do you mean by the transition
field, @nikkolasg ? the most recent time the group transitioned?
The group size is I think what we want rather than node names - the nodes can then be accessed through another subset of metrics already. A variable array of string names doesn't work super well in the prometheus model so i'm not really sure how i'd represent / encode that usefully.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - sorry for the delay.
Let's merge when travis is done.
We need to merge #797 first so tests pass |
fix #767