Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvclient: add metrics for proxy requests #120239

Merged
merged 2 commits into from Mar 25, 2024

Conversation

andrewbaptist
Copy link
Collaborator

@andrewbaptist andrewbaptist commented Mar 11, 2024

Proxy behavior was added in a previous commit. This commit adds 4 new
metrics to track the client and server side of proxying and the number
of errors that occured as a result of the proxy. These statistics will
normally be zero or close to zero. While there is a partial partition
the metric will be increased.

Epic: none

Release note: Adds four new metrics: distsender.rpc.proxy.sent,
distsender.rpc.proxy.err, distsender.rpc.proxy.forward.sent,
distsender.rpc.proxy.forward.err to track the number and outcome of
proxy requests. Operators should monitor and alert on
distsender.rpc.proxy.sent as it indicates there is likely a network
partition in the system.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@andrewbaptist andrewbaptist force-pushed the 2024-03-07-proxy-metrics branch 8 times, most recently from 7bb5517 to 88cb4c3 Compare March 15, 2024 12:10
@andrewbaptist andrewbaptist force-pushed the 2024-03-07-proxy-metrics branch 2 times, most recently from 9db45e1 to 47162bc Compare March 23, 2024 19:41
@andrewbaptist andrewbaptist marked this pull request as ready for review March 23, 2024 19:41
@andrewbaptist andrewbaptist requested review from a team as code owners March 23, 2024 19:41
@andrewbaptist andrewbaptist changed the title metrics for proxy requests kvclient: add metrics for proxy requests Mar 23, 2024
Copy link
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist)


pkg/kv/kvclient/kvcoord/dist_sender.go line 206 at r1 (raw file):

	metaDistSenderProxySentCount = metric.Metadata{
		Name:        "distsender.rpc.proxy.sent",
		Help:        `This counts the number of proxy attempts for Send requests which originated on this client`,

nit: consider making this more understandable to operators -- they don't know what Send requests or stacks are, and to them a client is a SQL client not a DistSender. Something along the lines of "Number of gateway attempts to proxy a request to an unreachable leaseholder via a follower replica." Same goes for the other descriptions.


pkg/ui/workspaces/db-console/src/views/cluster/containers/nodeGraphs/dashboards/networking.tsx line 207 at r2 (raw file):

    <LineGraph
      title="Proxy requests"

Should we try to present some of these in the same chart? For example, we could group either of proxy/forward or requests/errors into 2 charts rather than having 4 charts.


pkg/ui/workspaces/db-console/src/views/cluster/containers/nodeGraphs/dashboards/networking.tsx line 264 at r2 (raw file):

    <LineGraph
        title="Proxy forwards"

Should this be "Proxy forward errors"?

Copy link
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, accidentally published before approving.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist)

Copy link
Collaborator Author

@andrewbaptist andrewbaptist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem! Thanks for the quick review!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker)


pkg/kv/kvclient/kvcoord/dist_sender.go line 206 at r1 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

nit: consider making this more understandable to operators -- they don't know what Send requests or stacks are, and to them a client is a SQL client not a DistSender. Something along the lines of "Number of gateway attempts to proxy a request to an unreachable leaseholder via a follower replica." Same goes for the other descriptions.

I cleaned up this text. Good suggestion.


pkg/ui/workspaces/db-console/src/views/cluster/containers/nodeGraphs/dashboards/networking.tsx line 207 at r2 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

Should we try to present some of these in the same chart? For example, we could group either of proxy/forward or requests/errors into 2 charts rather than having 4 charts.

I had a conversation with the UI team about this, and they recommended to keep seperate graphs. Multiple per-node lines on a single graph are discouraged as they don't scale well


pkg/ui/workspaces/db-console/src/views/cluster/containers/nodeGraphs/dashboards/networking.tsx line 264 at r2 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

Should this be "Proxy forward errors"?

Yep - there were some other small typos I cleaned up as well. Thanks!

Proxy behavior was added in a previous commit. This commit adds 4 new
metrics to track the client and server side of proxying and the number
of errors that occured as a result of the proxy. These statistics will
normally be zero or close to zero. While there is a partial partition
the metric will be increased.

Epic: none

Release note (ops change): Adds four new metrics: distsender.rpc.proxy.sent,
distsender.rpc.proxy.err, distsender.rpc.proxy.forward.sent,
distsender.rpc.proxy.forward.err to track the number and outcome of
proxy requests. Operators should monitor and alert on
`distsender.rpc.proxy.sent` as it indicates there is likely a network
partition in the system.
Changes the network page to add two new graphs. The first shows the
number of proxy requests each node is sending. The second shows any
proxy errors that occur.

Epic: none

Release note (ui change): Adds two new graphs to the network page.
@andrewbaptist
Copy link
Collaborator Author

bors r=erikgrinaker

TFTR!

@craig
Copy link
Contributor

craig bot commented Mar 25, 2024

@craig craig bot merged commit 00a6257 into cockroachdb:master Mar 25, 2024
20 of 22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants