Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allocator: implement output buffering to reduce frequent log.VEvent calls #107421

Open
wenyihu6 opened this issue Jul 23, 2023 · 0 comments
Open
Labels
A-kv-distribution Relating to rebalancing and leasing. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team
Projects

Comments

@wenyihu6
Copy link
Contributor

wenyihu6 commented Jul 23, 2023

Is your feature request related to a problem? Please describe.

When investigating delays in an allocator simulator test, we notice that a
significant amount of time were spent on log.VEventf. The issue originates
from rankedCandidateListForRebalancing repeatedly calling
IsStoreReadyForRoutineReplicaTransfer on all stores that pass
StoreFilterThrottled . As a short term mitigation, we removed the
log.VEventf for less interesting cases where store are alive. However, this
could indicate a prevalent issue.

A longer term strategy is to delve into different sections of the allocator code
and implement output buffering at a higher level to reduce frequency of
log.VEventfcalls.

Jira issue: CRDB-30033

@wenyihu6 wenyihu6 added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Jul 23, 2023
wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue Jul 23, 2023
…ineReplicaTransferInternal

When investigating an allocator simulator test, we observed a significant amount
of time spent by `log.VEventf`, consuming 64% of time (according to pprof CPU
analysis). This was mainly caused by `rankedCandidateListForRebalancing`
repeatedly calling `IsStoreReadyForRoutineReplicaTransfer` on all stores that
pass the `StoreFilterThrottled` filter. As a temporary mitigation, we deleted
`log.VEventf` in `isStoreReadyForRoutineReplicaTransferInternal` for less
interesting cases where stores are alive.  However, our longer term strategy is
to buffer such outputs in the allocator at a higher level to reduce
`log.VEventf` calls.

Related: cockroachdb#107421

Release note: None
@wenyihu6 wenyihu6 added the T-kv KV Team label Jul 23, 2023
@blathers-crl blathers-crl bot added this to Incoming in KV Jul 23, 2023
@wenyihu6 wenyihu6 moved this from Incoming to On Hold in KV Jul 23, 2023
@wenyihu6 wenyihu6 added the A-kv-distribution Relating to rebalancing and leasing. label Jul 23, 2023
@wenyihu6 wenyihu6 changed the title allocator: add output buffering to reduce frequent log.VEvent calls allocator: implement output buffering to reduce frequent log.VEvent calls Jul 23, 2023
craig bot pushed a commit that referenced this issue Jul 24, 2023
106316: concurrency: non-locking reads of RC txns ignore exclusive locks r=nvanbenschoten a=arulajmani

Non-locking reads issued by transactions that can tolerate write-skew
do not block on exclusive locks after this patch.

Resolves #94729

Release note: None

107420: allocator: remove log.VEventf for alive stores in isStoreReady… r=kvoli a=wenyihu6

When investigating an allocator simulator test, we observed a significant amount
of time spent by `log.VEventf`, consuming 64% of time (according to pprof CPU
analysis). This was mainly caused by `rankedCandidateListForRebalancing`
repeatedly calling `IsStoreReadyForRoutineReplicaTransfer` on all stores that
pass the `StoreFilterThrottled` filter. As a temporary mitigation, we deleted
`log.VEventf` in `isStoreReadyForRoutineReplicaTransferInternal` for less
interesting cases where stores are alive.  However, our longer term strategy is
to buffer such outputs in the allocator at a higher level to reduce
`log.VEventf` calls.

Informs: #107421

Release note: None

----

| Before| After |
| -| - |
| <img width="300" alt="Screenshot 2023-07-21 at 6 08 01 PM" src="https://github.com/cockroachdb/cockroach/assets/56973754/e544282c-9405-49b1-9469-c8e54330794b"> |<img width="300" alt="Screenshot 2023-07-22 at 11 08 33 PM" src="https://github.com/cockroachdb/cockroach/assets/56973754/02c7ee25-d3e4-4b01-9a91-131c05046895">

Co-authored-by: Arul Ajmani <arulajmani@gmail.com>
Co-authored-by: wenyihu6 <wenyi.hu@cockroachlabs.com>
wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue Aug 31, 2023
When investigating delays in the allocator simulator test, we found that a
significant amount of time were spent on log.VEvent. Upon investigating, the log
events removed in this patch are those that occurred most frequently during
testing. I'm considering removing some of the log events here and would
appreciate a second pair of eye to better decide whether these log events are
useful for tracing.

See also: cockroachdb#107420
Part of: cockroachdb#107421
Release note: none
blathers-crl bot pushed a commit that referenced this issue Apr 15, 2024
…ineReplicaTransferInternal

When investigating an allocator simulator test, we observed a significant amount
of time spent by `log.VEventf`, consuming 64% of time (according to pprof CPU
analysis). This was mainly caused by `rankedCandidateListForRebalancing`
repeatedly calling `IsStoreReadyForRoutineReplicaTransfer` on all stores that
pass the `StoreFilterThrottled` filter. As a temporary mitigation, we deleted
`log.VEventf` in `isStoreReadyForRoutineReplicaTransferInternal` for less
interesting cases where stores are alive.  However, our longer term strategy is
to buffer such outputs in the allocator at a higher level to reduce
`log.VEventf` calls.

Related: #107421

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-distribution Relating to rebalancing and leasing. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team
Projects
No open projects
KV
On Hold
Development

No branches or pull requests

1 participant