kvserver: remove below-raft throttling #57247

Closed · Fixed by #98762

sumeerbhola opened this issue Nov 30, 2020 · 8 comments

Labels: A-kv-replication (Relating to Raft, consensus, and coordination.) · C-enhancement (Solution expected to add code/behavior + preserve backward-compat; pg compat issues are an exception) · T-kv-replication (KV Replication Team)

Comments

sumeerbhola (Collaborator) commented Nov 30, 2020

We currently throttle the addition of SSTables to the LSM by background bulk operations. This throttling happens in two places: in Store.Send, prior to the proposal, and in addSSTablePreApply, when applying to the state machine at each replica. The latter is problematic in that it (a) delays application of later entries in the Raft log at that replica, and (b) blocks a worker on the Raft scheduler, which can impact other ranges.

The Store.Send throttling only looks at the health of the store on the leaseholder, and we have seen cases where one node (or a few nodes) in a cluster has an unhealthy LSM store. If we maintained soft state in each node about the health of other stores in the system (this state would be slow-changing), we could use it in Store.Send and remove the apply-time throttling.

This store health information can also be useful for throttling in the GC queue.
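
Not part of the proposal as written, just a minimal sketch of what such a Store.Send-time check could look like. It assumes slow-changing per-store health info (say, L0 sub-level counts) is already available locally; every type and name below is hypothetical.

package kvserver

import (
	"context"
	"time"
)

// storeHealth is hypothetical, slow-changing soft state about a remote store.
type storeHealth struct {
	L0Sublevels int       // coarse proxy for an inverted LSM
	Updated     time.Time // when this info was last refreshed
}

// healthTracker caches the most recently known health info per store.
type healthTracker struct {
	byStore map[int32]storeHealth
}

// anyOverloaded reports whether any of the given replica stores looks unhealthy.
func (t *healthTracker) anyOverloaded(storeIDs []int32, maxL0 int) bool {
	for _, id := range storeIDs {
		if h, ok := t.byStore[id]; ok && h.L0Sublevels > maxL0 {
			return true
		}
	}
	return false
}

// maybeThrottleAddSSTable blocks a bulk-ingestion request before the proposal
// while any of the range's replica stores is overloaded, so that no throttling
// is needed below Raft at apply time.
func maybeThrottleAddSSTable(ctx context.Context, t *healthTracker, replicaStores []int32) error {
	const maxL0Sublevels = 20 // illustrative threshold
	for t.anyOverloaded(replicaStores, maxL0Sublevels) {
		select {
		case <-time.After(time.Second): // back off, then re-check
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return nil
}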

Jira issue: CRDB-2847

Epic: CRDB-15069

@sumeerbhola sumeerbhola added A-disaster-recovery A-kv-replication Relating to Raft, consensus, and coordination. labels Nov 30, 2020
blathers-crl (bot) commented Nov 30, 2020

Hi @sumeerbhola, please add a C-ategory label to your issue. Check out the label system docs.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@sumeerbhola sumeerbhola added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Nov 30, 2020
@mwang1026 mwang1026 added this to Triage in Disaster Recovery Backlog via automation Dec 8, 2020
@mwang1026 mwang1026 added this to Incoming in Storage via automation Dec 15, 2020
@mwang1026 mwang1026 removed this from Triage in Disaster Recovery Backlog Dec 15, 2020
@jlinder jlinder added the T-storage Storage Team label Jun 16, 2021
@mwang1026 mwang1026 added A-kv-distribution Relating to rebalancing and leasing. T-kv KV Team and removed A-kv-replication Relating to Raft, consensus, and coordination. T-storage Storage Team labels Dec 8, 2021
@blathers-crl blathers-crl bot added this to Incoming in KV Dec 8, 2021
@mwang1026 mwang1026 added this to Triage in Disaster Recovery Backlog via automation Dec 8, 2021
blathers-crl (bot) commented Dec 8, 2021

cc @cockroachdb/bulk-io

@erikgrinaker erikgrinaker added A-kv-replication Relating to Raft, consensus, and coordination. T-kv-replication KV Replication Team and removed A-kv-distribution Relating to rebalancing and leasing. T-kv KV Team labels Dec 9, 2021
@erikgrinaker erikgrinaker added this to Incoming in Replication via automation Dec 9, 2021
@erikgrinaker erikgrinaker removed this from Incoming in KV Dec 9, 2021
@erikgrinaker erikgrinaker removed this from Triage in Disaster Recovery Backlog Dec 9, 2021
erikgrinaker (Contributor) commented

@cockroachdb/repl discussed this a bit last week. We definitely need to avoid below-Raft throttling while holding the Raft mutex. Posting some preliminary thoughts below, nothing concrete yet.

We think that the store-level throttling likely happens a bit early -- it seems like the ideal throttling point would be right as the request is about to acquire latches in the concurrency manager. Requests can linger for a long time between traversing the store and being evaluated, e.g. because they're waiting for latches. Throttling any later than that basically means we'd have to choose between head-of-line blocking (because we've acquired latches), reordering problems (because we've already evaluated the request), or request re-evaluation (to handle reordering).

We also felt like a distributed throttling scheme to take into account the health of follower stores would likely be too complex (and possibly more in the purview of the allocator, which should move replicas off of hot nodes). On followers, we can often defer command application as long as commands get committed to the log, which would allow us to do some level of local Raft throttling. But we would have to do this throttling without holding onto a scheduler goroutine, by returning early and backing off.
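
Not a design, just a rough sketch of the "return early and back off" idea for local apply-time deferral on followers; the scheduler shape and all names below are invented.

package kvserver

import "time"

type rangeID int64

// raftWorkScheduler re-enqueues ranges for later processing instead of
// parking a worker goroutine while the local store is overloaded.
type raftWorkScheduler struct {
	enqueueAfter func(rangeID, time.Duration) // hypothetical re-scheduling hook
}

// maybeApplyCommittedEntries applies committed entries unless the local store
// is overloaded, in which case it yields the worker and retries the range
// later. The entries remain committed in the log either way.
func (s *raftWorkScheduler) maybeApplyCommittedEntries(
	id rangeID, storeOverloaded func() bool, apply func(),
) {
	if storeOverloaded() {
		// Return early rather than sleeping: this frees the scheduler worker
		// for other ranges instead of blocking it while the store recovers.
		s.enqueueAfter(id, 100*time.Millisecond)
		return
	}
	apply()
}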

We'll also need to keep other throttling mechanisms in mind here, e.g. admission control and the proposal quota pool.

erikgrinaker (Contributor) commented

Note to self: if we were to do store-level throttling based on the state of other replica stores, we should consider using the existing store gossip mechanism:

// Gossip store descriptor.
return s.cfg.Gossip.AddInfoProto(gossipStoreKey, storeDesc, gossip.StoreTTL)

// StoreDescriptor holds store information including store attributes, node
// descriptor and store capacity.
message StoreDescriptor {
  optional int32 store_id = 1 [(gogoproto.nullable) = false,
    (gogoproto.customname) = "StoreID", (gogoproto.casttype) = "StoreID"];
  optional Attributes attrs = 2 [(gogoproto.nullable) = false];
  optional NodeDescriptor node = 3 [(gogoproto.nullable) = false];
  optional StoreCapacity capacity = 4 [(gogoproto.nullable) = false];
  optional StoreProperties properties = 5 [(gogoproto.nullable) = false];
}

Remote store info is maintained in StorePool (primarily for use by the allocator/rebalancer) by subscribing to gossip updates:

storeRegex := gossip.MakePrefixPattern(gossip.KeyStorePrefix)
g.RegisterCallback(storeRegex, sp.storeGossipUpdate, gossip.Redundant)
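
A hypothetical sketch (not actual CockroachDB code) of subscribing to that same store gossip and caching each store's gossiped capacity, so Store.Send could consult it before admitting bulk writes. The gossip calls mirror the snippets above; the callback signature and the bookkeeping are assumptions.

package kvserver

import (
	"github.com/cockroachdb/cockroach/pkg/gossip"
	"github.com/cockroachdb/cockroach/pkg/roachpb"
	"github.com/cockroachdb/cockroach/pkg/util/syncutil"
)

// remoteStoreHealth caches the last gossiped capacity per store.
type remoteStoreHealth struct {
	mu         syncutil.Mutex
	capacities map[roachpb.StoreID]roachpb.StoreCapacity
}

func newRemoteStoreHealth() *remoteStoreHealth {
	return &remoteStoreHealth{capacities: map[roachpb.StoreID]roachpb.StoreCapacity{}}
}

// registerGossipCallback subscribes to store descriptor gossip, following the
// StorePool pattern quoted above.
func (h *remoteStoreHealth) registerGossipCallback(g *gossip.Gossip) {
	storeRegex := gossip.MakePrefixPattern(gossip.KeyStorePrefix)
	g.RegisterCallback(storeRegex, func(_ string, content roachpb.Value) {
		var desc roachpb.StoreDescriptor
		if err := content.GetProto(&desc); err != nil {
			return // ignore malformed gossip entries
		}
		h.mu.Lock()
		defer h.mu.Unlock()
		h.capacities[desc.StoreID] = desc.Capacity
	}, gossip.Redundant)
}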

erikgrinaker (Contributor) commented

#74254 is related.

erikgrinaker (Contributor) commented

Store read amp is gossiped as of #77040.

@tbg tbg changed the title bulk, kv: improve throttling for background write operations to LSM store by using store state of all replicas kvserver: remove below-raft throttling May 31, 2022
tbg (Member) commented May 31, 2022

Updating this issue to link it properly into #79215.

tbg (Member) commented Jun 2, 2022

#75066 (comment) is also concerned with removing below-raft throttling, so closing this one.

@tbg tbg closed this as completed Jun 2, 2022
Replication automation moved this from 22.2 to Done Jun 2, 2022
Storage automation moved this from Incoming to Done Jun 2, 2022
tbg added a commit to tbg/cockroach that referenced this issue Mar 16, 2023
**This is for 23.2 only**

We shouldn't artificially delay SST ingestion below raft because this
exacerbates memory pressure (cockroachdb#81834).

Anecdotally I see in my "typical experiment" (restores under I/O
bandwidth constraints) that `PreIngestDelay` is mostly fairly small
compared to the slowness that comes from write bandwidth overload
itself, so at least in those experiments removing this has little
to no effect.

As we are also working on replication admission control[^1] and are
looking into improving the memory footprint under unchecked overload[^2],
now is a good time to rip this out, as we'll be in a good place to
address any detrimental fallout from doing so.

[^1]: cockroachdb#95563
[^2]: cockroachdb#98576

Touches cockroachdb#81834.
Touches cockroachdb#57247.

Epic: none
Release note: None
@tbg tbg reopened this Mar 16, 2023
@tbg tbg self-assigned this Mar 16, 2023
tbg added a commit to tbg/cockroach that referenced this issue Apr 14, 2023
@craig craig bot closed this as completed in 479ee62 Apr 18, 2023