db: Metrics.WAL.BytesWritten appears bogus #3505

Closed
jbowens opened this issue Apr 11, 2024 · 1 comment · Fixed by #3555
Labels
A-storage, bug (Something isn't working), o-testcluster (Issues found as part of DRT testing), T-storage

Comments

@jbowens
Collaborator

jbowens commented Apr 11, 2024

This metric spikes to values orders of magnitude larger than WAL.BytesIn. It is also orders of magnitude larger than the bytes flushed or compacted, and larger than the node-level bytes written.

Internal slack link: https://cockroachlabs.slack.com/archives/C06TG0C6VGS/p1712855654340209?thread_ts=1712673101.370579&cid=C06TG0C6VGS

jbowens added the bug (Something isn't working), T-storage, and A-storage labels Apr 11, 2024
jbowens added this to Incoming in (Deprecated) Storage via automation Apr 11, 2024
ajstorm added the o-testcluster (Issues found as part of DRT testing) label Apr 11, 2024
nicktrav moved this from Incoming to Next in (Deprecated) Storage Apr 16, 2024
@jbowens
Collaborator Author

jbowens commented Apr 25, 2024

Taking the rate:

[Screenshot 2024-04-25 at 10:59:10 AM: per-instance graph of the rate query below]
rate(storage_wal_bytes_written{cluster="$cluster",instance=~"$instances"}[$__rate_interval])

The raw counter values:

[Screenshot 2024-04-25 at 10:59:54 AM: per-instance graph of the raw counter query below]
storage_wal_bytes_written{cluster="$cluster",instance=~"$instances"}

I don't really understand the rate graph. The counter appears to violate monotonicity, which is probably the source of the issue. I don't understand Prometheus well enough to explain why such a small dip in the counter produces such a large increase in the rate.
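Documented Prometheus behavior offers a likely explanation for the spike. rate() and increase() treat any decrease in a counter as a counter reset. They assume the counter restarted from zero, so the entire post-decrease value is counted as new increase. For a large cumulative byte counter, a dip of a few hundred bytes is therefore read as billions of bytes written.

The Go sketch below illustrates only this reset handling. Prometheus's extrapolation to the window boundaries is omitted. The function name counterIncrease and the sample values are hypothetical.

```go
package main

import "fmt"

// counterIncrease approximates how Prometheus's increase()/rate() accumulate
// a counter across consecutive samples (boundary extrapolation omitted).
// A decrease between samples is treated as a counter reset: the counter is
// assumed to have gone 0 -> cur, so the whole current value is added.
func counterIncrease(samples []float64) float64 {
	if len(samples) < 2 {
		return 0
	}
	total := 0.0
	for i := 1; i < len(samples); i++ {
		prev, cur := samples[i-1], samples[i]
		if cur < prev {
			// Reset detected: count the full post-reset value.
			total += cur
		} else {
			total += cur - prev
		}
	}
	return total
}

func main() {
	// Hypothetical samples of a large cumulative byte counter (not real data).
	monotonic := []float64{5_000_000_000, 5_000_001_000, 5_000_002_000}
	// The same counter with a transient 500-byte dip.
	withDip := []float64{5_000_000_000, 5_000_001_000, 5_000_000_500, 5_000_002_000}

	fmt.Printf("%.0f\n", counterIncrease(monotonic)) // 2000
	fmt.Printf("%.0f\n", counterIncrease(withDip))   // 5000003000
}
```

Under this model, the 500-byte dip adds about 5 GB of apparent increase to the window. That would be consistent with a rate spike orders of magnitude above the true write rate.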

jbowens added a commit to jbowens/pebble that referenced this issue Apr 25, 2024
The Metrics.WAL.BytesWritten metric is intended to be a monotonically
increasing counter of all bytes written to the write-ahead log. Previously, it
was possible for this metric to violate monotonicity immediately after a WAL
rotation: the d.logSize value, which corresponds to the size of the current
WAL, was not reset to zero at rotation. It was only reset after the first
write to the new WAL.

Close cockroachdb#3505.
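The minimal Go sketch below makes this failure mode concrete under an assumed, simplified model. In the model, the counter is reported as the bytes of already-rotated WALs plus the current WAL's size, which is the role d.logSize plays in the commit message. The walCounter type, its methods, and the byte counts are hypothetical and are not Pebble's code.

Suppose rotation folds the old WAL's size into the rotated total but leaves the size field untouched. The old WAL is then briefly counted twice, and the first write to the new WAL makes the counter drop. Zeroing the size at rotation keeps the counter monotonic.

```go
package main

import "fmt"

// walCounter is a hypothetical, simplified model of a cumulative
// "WAL bytes written" counter. It is not Pebble's implementation.
type walCounter struct {
	rotatedBytes  uint64 // final sizes of WALs that have been rotated out
	logSize       uint64 // size of the current WAL (cf. d.logSize)
	stale         bool   // logSize still holds the previous WAL's size
	resetOnRotate bool   // true models the fix: zero logSize at rotation
}

// bytesWritten is the value a metrics scrape would observe.
func (w *walCounter) bytesWritten() uint64 {
	return w.rotatedBytes + w.logSize
}

// rotate switches to a new WAL, folding the old WAL's size into rotatedBytes.
func (w *walCounter) rotate() {
	w.rotatedBytes += w.logSize
	if w.resetOnRotate {
		w.logSize = 0
	} else {
		// Buggy behavior: the old size lingers and is counted twice.
		w.stale = true
	}
}

// write appends n bytes to the current WAL.
func (w *walCounter) write(n uint64) {
	if w.stale {
		// The first write to the new WAL finally discards the stale size.
		w.logSize = 0
		w.stale = false
	}
	w.logSize += n
}

// trace scrapes the counter after a write, after a rotation, and after the
// first write to the new WAL.
func trace(resetOnRotate bool) []uint64 {
	w := &walCounter{resetOnRotate: resetOnRotate}
	var scrapes []uint64
	w.write(1000)
	scrapes = append(scrapes, w.bytesWritten())
	w.rotate()
	scrapes = append(scrapes, w.bytesWritten())
	w.write(10)
	scrapes = append(scrapes, w.bytesWritten())
	return scrapes
}

// isNonDecreasing reports whether a counter series never goes down.
func isNonDecreasing(xs []uint64) bool {
	for i := 1; i < len(xs); i++ {
		if xs[i] < xs[i-1] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(trace(false), isNonDecreasing(trace(false))) // [1000 2000 1010] false
	fmt.Println(trace(true), isNonDecreasing(trace(true)))   // [1000 1000 1010] true
}
```

In this model, the drop from 2000 to 1010 is the kind of small monotonicity violation that a Prometheus rate query would interpret as a counter reset.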
jbowens added a commit to jbowens/pebble that referenced this issue Apr 29, 2024
jbowens added a commit that referenced this issue Apr 30, 2024
(Deprecated) Storage automation moved this from Next to Done Apr 30, 2024
jbowens added a commit to jbowens/pebble that referenced this issue Apr 30, 2024
jbowens added a commit to jbowens/pebble that referenced this issue Apr 30, 2024
jbowens added a commit that referenced this issue Apr 30, 2024
jbowens added a commit to jbowens/pebble that referenced this issue May 6, 2024
jbowens added a commit to jbowens/pebble that referenced this issue May 9, 2024
jbowens added a commit that referenced this issue May 13, 2024
Projects
Archived in project

2 participants