Leader promotions exceed default timeout values when etcd is holding many leases #9496

cosgroveb · 2018-03-27T21:12:56Z

Hey folks,

This is a bit of a follow-up to the issue @mgates opened around improving lease performance #9418.

While testing out a cluster tracking many leases (millions) we noticed that if a leader election occurred, it would endlessly time out and fail to elect a leader leaving the cluster in an unhealthy state. After digging in and adding benchmarks to lessor.go it became apparent that the problem lies in the Promote method and, specifically, is another O(n) problem much like #9418.

Specifically, lessor iterates through the entire lease map several times and even sorts them (unsafeLeases).

These benchmarks show the effect pretty clearly:

func BenchmarkLessorPromote1(b *testing.B) { benchmarkLessorPromote(1, b) }                    
func BenchmarkLessorPromote10(b *testing.B) { benchmarkLessorPromote(10, b) }                  
func BenchmarkLessorPromote100(b *testing.B)  { benchmarkLessorPromote(100, b) }               
func BenchmarkLessorPromote1000(b *testing.B)  { benchmarkLessorPromote(1000, b) }             
func BenchmarkLessorPromote10000(b *testing.B)  { benchmarkLessorPromote(10000, b) }           
func BenchmarkLessorPromote100000(b *testing.B)  { benchmarkLessorPromote(100000, b) }         
}  
                                                                                               
func benchmarkLessorPromote(size int, b *testing.B) {                                          
  be, tmpPath := backend.NewDefaultTmpBackend()                                                
  le := newLessor(be, minLeaseTTL)                                                             
  defer le.Stop()                                                                              
  defer cleanup(be, tmpPath)                                                                   
  for i := 0; i < size; i++ {                                                                  
    le.Grant(LeaseID(i), int64(100+i))                                                         
  }                                                                                            
  b.ResetTimer()                                                                               
  for i := 0; i < b.N; i++ {                                                                   
    le.Promote(0)                                                                              
  }                                                                                            
}

Results:

goos: linux                                                              
goarch: amd64                                                            
BenchmarkLessorPromote1-16              10000000               142 ns/op 
BenchmarkLessorPromote10-16               100000             12679 ns/op 
BenchmarkLessorPromote100-16               10000            136276 ns/op 
BenchmarkLessorPromote1000-16               1000           1404405 ns/op 
BenchmarkLessorPromote10000-16                 3         401482438 ns/op 
BenchmarkLessorPromote100000-16                1        5157859968 ns/op 
PASS                                                                     
ok      _/home/pair/etcd/lease  16.093s

Obviously, we can take stabs at some improvements but given the complexity of this routine, we're not really confident that we understand the intent of every part of it.

We observed that a workaround does exist in making the leaseRevokeRate unusually large. Is it possible to make anything configurable here or perhaps feature switch this off? If not, can you folks provide some guidance that could help us make some performance improvements with confidence?

cc @mgates @jcalvert

The text was updated successfully, but these errors were encountered:

gyuho · 2018-03-27T21:17:09Z

Revoke rate is there to prevent spikes in Raft proposals.

We observed that a workaround does exist in making the leaseRevokeRate unusually large. Is it possible to make anything configurable here or perhaps feature switch this off?

Let's address slow revoke routine with #9418 first, and investigate further what's the next performance bottleneck.

Thanks!

…leases Add a persist expiry flag that allows the expiry field of leases to be persisted along with the ID and ttl. Defaults to false. Addresses etcd-io#9496

…leases Add a persist expiry flag that allows the expiry field of leases to be persisted along with the ID and ttl. Defaults to false. Addresses etcd-io#9496 Conflicts: embed/config.go embed/etcd.go etcdmain/config.go etcdserver/config.go snapshot/v3_snapshot.go

…leases Add a persist expiry flag that allows the expiry field of leases to be persisted along with the ID and ttl. Defaults to false. Addresses etcd-io#9496

This moves lease pile-up reduction into a goroutine which mostly operates on a copy of the lease list, to avoid locking. This prevents timeouts when the lessor is locked for a long time (when there are a lot of leases, mostly). This should solve etcd-io#9496. Before: ``` BenchmarkLessorPromote1-16 500000 4036 ns/op BenchmarkLessorPromote10-16 500000 3932 ns/op BenchmarkLessorPromote100-16 500000 3954 ns/op BenchmarkLessorPromote1000-16 300000 3906 ns/op BenchmarkLessorPromote10000-16 300000 4639 ns/op BenchmarkLessorPromote100000-16 100 27216481 ns/op BenchmarkLessorPromote1000000-16 100 325164684 ns/op ``` After: ``` BenchmarkLessorPromote1-16 500000 3769 ns/op BenchmarkLessorPromote10-16 500000 3835 ns/op BenchmarkLessorPromote100-16 500000 3829 ns/op BenchmarkLessorPromote1000-16 500000 3665 ns/op BenchmarkLessorPromote10000-16 500000 3800 ns/op BenchmarkLessorPromote100000-16 300000 4114 ns/op BenchmarkLessorPromote1000000-16 300000 5143 ns/op ```

This moves lease pile-up reduction into a goroutine which mostly operates on a copy of the lease list, to avoid locking. This prevents timeouts when the lessor is locked for a long time (when there are a lot of leases, mostly). This should solve etcd-io#9496. We had a problem where when we had a lot of leases (100kish), and a leader election happened the lessor was locked for a long time causing timeouts when we tried to grant a lease. This changes it so that the lessor is locked in batches, which allows for creation of leases while the leader is initializing the lessor. It still won't start expiring leases until it's all done rewriting them, though. Before: ``` BenchmarkLessorPromote1-16 500000 4036 ns/op BenchmarkLessorPromote10-16 500000 3932 ns/op BenchmarkLessorPromote100-16 500000 3954 ns/op BenchmarkLessorPromote1000-16 300000 3906 ns/op BenchmarkLessorPromote10000-16 300000 4639 ns/op BenchmarkLessorPromote100000-16 100 27216481 ns/op BenchmarkLessorPromote1000000-16 100 325164684 ns/op ``` After: ``` BenchmarkLessorPromote1-16 500000 3769 ns/op BenchmarkLessorPromote10-16 500000 3835 ns/op BenchmarkLessorPromote100-16 500000 3829 ns/op BenchmarkLessorPromote1000-16 500000 3665 ns/op BenchmarkLessorPromote10000-16 500000 3800 ns/op BenchmarkLessorPromote100000-16 300000 4114 ns/op BenchmarkLessorPromote1000000-16 300000 5143 ns/op ```

stale · 2020-04-07T06:11:38Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

jcalvert mentioned this issue Apr 3, 2018

Add new flag to persist lease expiry #9526

Closed

gyuho added the area/performance label May 2, 2018

cosgroveb mentioned this issue May 7, 2018

lease: do lease pile-up reduction in the background #9699

Closed

echlebek mentioned this issue Oct 21, 2019

Stability and Performance Epic sensu/sensu-go#3348

Closed

stale bot added the stale label Apr 7, 2020

stale bot closed this as completed Apr 28, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Leader promotions exceed default timeout values when etcd is holding many leases #9496

Leader promotions exceed default timeout values when etcd is holding many leases #9496

cosgroveb commented Mar 27, 2018

gyuho commented Mar 27, 2018

stale bot commented Apr 7, 2020

Leader promotions exceed default timeout values when etcd is holding many leases #9496

Leader promotions exceed default timeout values when etcd is holding many leases #9496

Comments

cosgroveb commented Mar 27, 2018

gyuho commented Mar 27, 2018

stale bot commented Apr 7, 2020