@hczhu-db hczhu-db commented Dec 15, 2024

This is to prevent the Receive server from being overloaded.

Tested in dev-aws-eu-west-1

[dev-aws-eu-west-1] [pantheon] [pantheon-db-rep0-0] > logs | rg pending
ts=2024-12-17T17:38:14.728273143Z caller=receive.go:272 level=info name=pantheon-db component=receive msg="set max pending gRPC write request in limiter" max_pending_requests=1000

@hczhu-db hczhu-db force-pushed the load-shedding branch 8 times, most recently from 174dd33 to d52aa34 Compare December 15, 2024 23:29
level.Info(logger).Log("msg", "set max pending gRPC write request in limiter", "max_pending_requests", conf.maxPendingGrpcWriteRequests)
}
limiter, err := receive.NewLimiterWithOptions(
conf.writeLimitsConfig,
Collaborator

@jnyi jnyi Dec 16, 2024

Consider reusing writeLimitsConfig so there are fewer interface changes?

Author

@hczhu-db hczhu-db Dec 17, 2024

I agree that it'd be ideal to have this config field in writeLimitsConfig, but DB pods don't load writeLimitsConfig at all. Pantheon-writer pods load that config. I'll have to keep it this way.
The interface is not changed. receive.NewLimiter() stays the same. I added another function receive.NewLimiterWithOptions().
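
A minimal sketch of the "keep the old constructor, add a new one" approach described above. Everything here is hypothetical and is not the actual receive package: the lowercase names newLimiter, newLimiterWithOptions, limiterOptions, and the configPath parameter are stand-ins, and having the old constructor delegate to the new one is an assumption rather than something the PR states.

```go
package main

// limiterOptions is a hypothetical bag of extra settings that only some
// callers (for example the DB pods) need to provide.
type limiterOptions struct {
	// maxPendingRequests caps in-flight gRPC writes; 0 disables the feature.
	maxPendingRequests int32
}

// limiter is a hypothetical stand-in for the real limiter type.
type limiter struct {
	configPath         string
	maxPendingRequests int32
}

// newLimiter keeps its original signature, so existing callers compile
// unchanged. In this sketch it delegates with zero-valued options.
func newLimiter(configPath string) *limiter {
	return newLimiterWithOptions(configPath, limiterOptions{})
}

// newLimiterWithOptions is the added entry point for callers that want
// to enable load shedding.
func newLimiterWithOptions(configPath string, opts limiterOptions) *limiter {
	return &limiter{
		configPath:         configPath,
		maxPendingRequests: opts.maxPendingRequests,
	}
}
```

With this shape, a caller that never loads the write-limits config can still pass a pending-request cap through the options struct, while callers of the original constructor get the feature disabled by default.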

Collaborator

Got it, that's fair. Do you want to add some unit tests for the limiter to cover the load-shedding behavior?

Collaborator

@jnyi jnyi left a comment

Left a few comments, great job. I think we should also track pending remote-write requests using the limiter.

// Value 0 disables the feature.
maxPendingRequests int32
pendingRequests atomic.Int32
maxPendingRequestLimitHit prometheus.Counter
Collaborator

LGTM. Shall we consider adding an alert around this metric?

Author

Definitely once the counter is there.
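
For readers following this thread, here is a rough sketch of how the fields quoted in the hunk (maxPendingRequests, pendingRequests, and a limit-hit counter) could fit together. It is an illustration under assumptions, not the merged code: the type pendingLimiter and the methods tryAcquire and release are hypothetical names, and a plain atomic.Int64 stands in for the prometheus.Counter so the example needs only the standard library.

```go
package main

import "sync/atomic"

// pendingLimiter is a hypothetical stand-in for the limiter fields in the PR.
type pendingLimiter struct {
	// Value 0 disables the feature.
	maxPendingRequests int32
	pendingRequests    atomic.Int32
	// limitHit stands in for the maxPendingRequestLimitHit prometheus.Counter.
	limitHit atomic.Int64
}

func newPendingLimiter(maxPending int32) *pendingLimiter {
	return &pendingLimiter{maxPendingRequests: maxPending}
}

// tryAcquire reserves a slot for one request. It returns false when the
// request should be shed because too many requests are already pending.
func (l *pendingLimiter) tryAcquire() bool {
	if l.maxPendingRequests <= 0 {
		return true // feature disabled, nothing is tracked
	}
	if l.pendingRequests.Add(1) > l.maxPendingRequests {
		// Over the cap: undo the reservation and record the rejection.
		l.pendingRequests.Add(-1)
		l.limitHit.Add(1)
		return false
	}
	return true
}

// release frees the slot taken by a successful tryAcquire.
func (l *pendingLimiter) release() {
	if l.maxPendingRequests <= 0 {
		return
	}
	l.pendingRequests.Add(-1)
}
```

Incrementing first and rolling back on overflow keeps the check to a single atomic operation on the common path. An alert of the kind suggested above would then be based on the rate of the limit-hit counter.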


// RemoteWrite implements the gRPC remote write handler for storepb.WriteableStore.
func (h *Handler) RemoteWrite(ctx context.Context, r *storepb.WriteRequest) (*storepb.WriteResponse, error) {
if h.Limiter.ShouldRejectNewRequest() {
Collaborator

Nice. Shall we add a unit test for this behavior?

Author

I can do that in a follow-up PR while testing it in Dev. It's quite tricky to write a unit test for such a feature. I want to see how useful it is in Dev before spending time on it.
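
One possible way to make such a test deterministic, sketched under assumptions: hold every admitted request open on a channel until all clients have either entered the write path or been rejected, then count the outcomes. The handler type, remoteWrite, runScenario, and errTooManyPending below are hypothetical and self-contained; they do not use the real Handler, Limiter, or storepb types.

```go
package main

import (
	"errors"
	"sync"
	"sync/atomic"
)

var errTooManyPending = errors.New("too many pending write requests")

// handler is a hypothetical stand-in for the gRPC write handler.
type handler struct {
	maxPending int32 // 0 disables load shedding
	pending    atomic.Int32
	write      func() // the slow part of a request, injected by the test
}

// remoteWrite sheds the request if the pending cap is already reached.
func (h *handler) remoteWrite() error {
	if h.maxPending > 0 {
		if h.pending.Add(1) > h.maxPending {
			h.pending.Add(-1)
			return errTooManyPending
		}
		defer h.pending.Add(-1)
	}
	h.write()
	return nil
}

// runScenario fires `clients` concurrent requests and keeps admitted ones
// in flight until every client has been admitted or rejected.
func runScenario(maxPending int32, clients int) (accepted, rejected int) {
	release := make(chan struct{})
	var settled sync.WaitGroup
	settled.Add(clients)

	h := &handler{maxPending: maxPending, write: func() {
		settled.Done() // admitted: now holding a slot
		<-release      // stay pending until the test lets go
	}}

	results := make(chan error, clients)
	for i := 0; i < clients; i++ {
		go func() {
			err := h.remoteWrite()
			if err != nil {
				settled.Done() // rejected without entering write
			}
			results <- err
		}()
	}

	settled.Wait() // every client is either in flight or rejected
	close(release)
	for i := 0; i < clients; i++ {
		if err := <-results; err != nil {
			rejected++
		} else {
			accepted++
		}
	}
	return accepted, rejected
}
```

Because admitted requests cannot finish before all clients have settled, a cap of 3 with 10 clients admits exactly 3 and sheds 7 regardless of goroutine scheduling, which avoids sleeps or timing assumptions in the test.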

Collaborator

@jnyi jnyi left a comment

lgtm, thank you for doing this!

@hczhu-db hczhu-db merged commit 2fecf4d into db_main Dec 17, 2024
14 checks passed
@hczhu-db hczhu-db deleted the load-shedding branch December 17, 2024 19:39
4 participants