Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

persist: add some more useful metrics #13244

Merged
merged 1 commit into from Jun 24, 2022

Conversation

danhhz
Copy link
Contributor

@danhhz danhhz commented Jun 23, 2022

This adds prometheus metric coverage for encoding and decoding of state
(might have helped diagnose the CI slowness we investigated while trying
to land #13160). It also adds coverage for CaS retries as a result of
expectation mismatch (ditto).

Motivation

  • This PR adds a feature that has not yet been specified.

    [Write a brief specification for the feature, including justification
    for its inclusion in Materialize, as if you were writing the original
    feature specification.]

Testing

  • This PR has adequate test coverage / QA involvement has been duly considered.

Release notes

This PR includes the following user-facing behavior changes:

  • N/A

@danhhz danhhz requested review from aljoscha and pH14 June 23, 2022 19:26
@@ -37,6 +37,8 @@ pub struct Metrics {
pub cmds: CmdsMetrics,
/// Metrics for each retry loop.
pub retries: RetriesMetrics,
/// Metrics for various encodings and decodings.
pub codec: CodecMetrics,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh! I should make this a set of vec counters with state, key, val as the labels. also I should add trace batch encode/decode to this PR

Copy link
Contributor

@aljoscha aljoscha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These make a lot of sense! I can only say that the ones you added are good, I can't say which ones we should yet add. 😅

CodecsMetrics {
state: self.codec_metrics("state"),
batch: self.codec_metrics("batch"),
// Maybe {key,val}_{encode,decode} too. Probably not time or diff.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, or just a TODO for later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any thoughts on whether we should do this? easy enough to do but I wasn't sure if we thought the overhead of the measurement would be worth it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unless we're concerned with metrics overhead, as a user I'd always be in favor of more metrics 😁

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wfm! we can always rip it out later. added a helper for actually doing the measurements to CodecMetrics

Copy link
Contributor

@pH14 pH14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -279,6 +283,7 @@ where
>(
&mut self,
cmd: &CmdMetrics,
metrics: &Metrics,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious - is there a reason this needs to be passed in as a param vs accessing the metrics via self?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦 it's been a week, apparently. done!

CodecsMetrics {
state: self.codec_metrics("state"),
batch: self.codec_metrics("batch"),
// Maybe {key,val}_{encode,decode} too. Probably not time or diff.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unless we're concerned with metrics overhead, as a user I'd always be in favor of more metrics 😁

This adds prometheus metric coverage for encoding and decoding of state
(might have helped diagnose the CI slowness we investigated while trying
to land MaterializeInc#13160). It also adds coverage for CaS retries as a result of
expectation mismatch (ditto).
Copy link
Contributor Author

@danhhz danhhz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs!

@@ -279,6 +283,7 @@ where
>(
&mut self,
cmd: &CmdMetrics,
metrics: &Metrics,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦 it's been a week, apparently. done!

CodecsMetrics {
state: self.codec_metrics("state"),
batch: self.codec_metrics("batch"),
// Maybe {key,val}_{encode,decode} too. Probably not time or diff.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wfm! we can always rip it out later. added a helper for actually doing the measurements to CodecMetrics

@danhhz danhhz enabled auto-merge June 24, 2022 16:05
@danhhz danhhz merged commit 953a909 into MaterializeInc:main Jun 24, 2022
@danhhz danhhz deleted the persist_metrics branch June 24, 2022 16:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants