Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Leader promotions exceed default timeout values when etcd is holding many leases #9496

Closed
cosgroveb opened this issue Mar 27, 2018 · 2 comments

Comments

@cosgroveb
Copy link
Contributor

Hey folks,

This is a bit of a follow-up to the issue @mgates opened around improving lease performance #9418.

While testing out a cluster tracking many leases (millions) we noticed that if a leader election occurred, it would endlessly time out and fail to elect a leader leaving the cluster in an unhealthy state. After digging in and adding benchmarks to lessor.go it became apparent that the problem lies in the Promote method and, specifically, is another O(n) problem much like #9418.

Specifically, lessor iterates through the entire lease map several times and even sorts them (unsafeLeases).

These benchmarks show the effect pretty clearly:

func BenchmarkLessorPromote1(b *testing.B) { benchmarkLessorPromote(1, b) }                    
func BenchmarkLessorPromote10(b *testing.B) { benchmarkLessorPromote(10, b) }                  
func BenchmarkLessorPromote100(b *testing.B)  { benchmarkLessorPromote(100, b) }               
func BenchmarkLessorPromote1000(b *testing.B)  { benchmarkLessorPromote(1000, b) }             
func BenchmarkLessorPromote10000(b *testing.B)  { benchmarkLessorPromote(10000, b) }           
func BenchmarkLessorPromote100000(b *testing.B)  { benchmarkLessorPromote(100000, b) }         
}  
                                                                                               
func benchmarkLessorPromote(size int, b *testing.B) {                                          
  be, tmpPath := backend.NewDefaultTmpBackend()                                                
  le := newLessor(be, minLeaseTTL)                                                             
  defer le.Stop()                                                                              
  defer cleanup(be, tmpPath)                                                                   
  for i := 0; i < size; i++ {                                                                  
    le.Grant(LeaseID(i), int64(100+i))                                                         
  }                                                                                            
  b.ResetTimer()                                                                               
  for i := 0; i < b.N; i++ {                                                                   
    le.Promote(0)                                                                              
  }                                                                                            
}                                                                                              

Results:

goos: linux                                                              
goarch: amd64                                                            
BenchmarkLessorPromote1-16              10000000               142 ns/op 
BenchmarkLessorPromote10-16               100000             12679 ns/op 
BenchmarkLessorPromote100-16               10000            136276 ns/op 
BenchmarkLessorPromote1000-16               1000           1404405 ns/op 
BenchmarkLessorPromote10000-16                 3         401482438 ns/op 
BenchmarkLessorPromote100000-16                1        5157859968 ns/op 
PASS                                                                     
ok      _/home/pair/etcd/lease  16.093s                                  

Obviously, we can take stabs at some improvements but given the complexity of this routine, we're not really confident that we understand the intent of every part of it.

We observed that a workaround does exist in making the leaseRevokeRate unusually large. Is it possible to make anything configurable here or perhaps feature switch this off? If not, can you folks provide some guidance that could help us make some performance improvements with confidence?

cc @mgates @jcalvert

@gyuho
Copy link
Contributor

gyuho commented Mar 27, 2018

Revoke rate is there to prevent spikes in Raft proposals.

We observed that a workaround does exist in making the leaseRevokeRate unusually large. Is it possible to make anything configurable here or perhaps feature switch this off?

Let's address slow revoke routine with #9418 first, and investigate further what's the next performance bottleneck.

Thanks!

jcalvert pushed a commit to jcalvert/etcd that referenced this issue Apr 3, 2018
…leases

Add a persist expiry flag that allows the expiry field of leases to be
persisted along with the ID and ttl. Defaults to false. Addresses etcd-io#9496
jcalvert pushed a commit to jcalvert/etcd that referenced this issue Apr 3, 2018
…leases

Add a persist expiry flag that allows the expiry field of leases to be
persisted along with the ID and ttl. Defaults to false. Addresses etcd-io#9496
jcalvert pushed a commit to jcalvert/etcd that referenced this issue Apr 3, 2018
…leases

Add a persist expiry flag that allows the expiry field of leases to be
persisted along with the ID and ttl. Defaults to false. Addresses etcd-io#9496
jcalvert pushed a commit to jcalvert/etcd that referenced this issue Apr 3, 2018
…leases

Add a persist expiry flag that allows the expiry field of leases to be
persisted along with the ID and ttl. Defaults to false.

Addresses etcd-io#9496
jcalvert pushed a commit to jcalvert/etcd that referenced this issue Apr 4, 2018
…leases

Add a persist expiry flag that allows the expiry field of leases to be
persisted along with the ID and ttl. Defaults to false.

Addresses etcd-io#9496

Conflicts:
	embed/config.go
	embed/etcd.go
	etcdmain/config.go
	etcdserver/config.go
	snapshot/v3_snapshot.go
jcalvert pushed a commit to jcalvert/etcd that referenced this issue Apr 23, 2018
…leases

Add a persist expiry flag that allows the expiry field of leases to be
persisted along with the ID and ttl. Defaults to false.

Addresses etcd-io#9496
mgates pushed a commit to mgates/etcd that referenced this issue May 4, 2018
This moves lease pile-up reduction into a goroutine which mostly operates on a copy of
the lease list, to avoid locking. This prevents timeouts when the lessor is locked
for a long time (when there are a lot of leases, mostly). This should solve etcd-io#9496.

Before:
```
BenchmarkLessorPromote1-16                        500000 4036 ns/op
BenchmarkLessorPromote10-16                       500000 3932 ns/op
BenchmarkLessorPromote100-16                      500000 3954 ns/op
BenchmarkLessorPromote1000-16                     300000 3906 ns/op
BenchmarkLessorPromote10000-16                    300000 4639 ns/op
BenchmarkLessorPromote100000-16                      100 27216481 ns/op
BenchmarkLessorPromote1000000-16                     100 325164684 ns/op
```

After:
```
BenchmarkLessorPromote1-16                        500000 3769 ns/op
BenchmarkLessorPromote10-16                       500000 3835 ns/op
BenchmarkLessorPromote100-16                      500000 3829 ns/op
BenchmarkLessorPromote1000-16                     500000 3665 ns/op
BenchmarkLessorPromote10000-16                    500000 3800 ns/op
BenchmarkLessorPromote100000-16                   300000 4114 ns/op
BenchmarkLessorPromote1000000-16                  300000 5143 ns/op
```
mgates pushed a commit to mgates/etcd that referenced this issue May 11, 2018
This moves lease pile-up reduction into a goroutine which mostly operates on a copy of
the lease list, to avoid locking. This prevents timeouts when the lessor is locked
for a long time (when there are a lot of leases, mostly). This should solve etcd-io#9496.

Before:
```
BenchmarkLessorPromote1-16                        500000 4036 ns/op
BenchmarkLessorPromote10-16                       500000 3932 ns/op
BenchmarkLessorPromote100-16                      500000 3954 ns/op
BenchmarkLessorPromote1000-16                     300000 3906 ns/op
BenchmarkLessorPromote10000-16                    300000 4639 ns/op
BenchmarkLessorPromote100000-16                      100 27216481 ns/op
BenchmarkLessorPromote1000000-16                     100 325164684 ns/op
```

After:
```
BenchmarkLessorPromote1-16                        500000 3769 ns/op
BenchmarkLessorPromote10-16                       500000 3835 ns/op
BenchmarkLessorPromote100-16                      500000 3829 ns/op
BenchmarkLessorPromote1000-16                     500000 3665 ns/op
BenchmarkLessorPromote10000-16                    500000 3800 ns/op
BenchmarkLessorPromote100000-16                   300000 4114 ns/op
BenchmarkLessorPromote1000000-16                  300000 5143 ns/op
```
mgates pushed a commit to mgates/etcd that referenced this issue May 25, 2018
This moves lease pile-up reduction into a goroutine which mostly operates on a copy of
the lease list, to avoid locking. This prevents timeouts when the lessor is locked
for a long time (when there are a lot of leases, mostly). This should solve etcd-io#9496.

Before:
```
BenchmarkLessorPromote1-16                        500000 4036 ns/op
BenchmarkLessorPromote10-16                       500000 3932 ns/op
BenchmarkLessorPromote100-16                      500000 3954 ns/op
BenchmarkLessorPromote1000-16                     300000 3906 ns/op
BenchmarkLessorPromote10000-16                    300000 4639 ns/op
BenchmarkLessorPromote100000-16                      100 27216481 ns/op
BenchmarkLessorPromote1000000-16                     100 325164684 ns/op
```

After:
```
BenchmarkLessorPromote1-16                        500000 3769 ns/op
BenchmarkLessorPromote10-16                       500000 3835 ns/op
BenchmarkLessorPromote100-16                      500000 3829 ns/op
BenchmarkLessorPromote1000-16                     500000 3665 ns/op
BenchmarkLessorPromote10000-16                    500000 3800 ns/op
BenchmarkLessorPromote100000-16                   300000 4114 ns/op
BenchmarkLessorPromote1000000-16                  300000 5143 ns/op
```
mgates pushed a commit to mgates/etcd that referenced this issue May 25, 2018
This moves lease pile-up reduction into a goroutine which mostly operates on a copy of
the lease list, to avoid locking. This prevents timeouts when the lessor is locked
for a long time (when there are a lot of leases, mostly). This should solve etcd-io#9496.

Before:
```
BenchmarkLessorPromote1-16                        500000 4036 ns/op
BenchmarkLessorPromote10-16                       500000 3932 ns/op
BenchmarkLessorPromote100-16                      500000 3954 ns/op
BenchmarkLessorPromote1000-16                     300000 3906 ns/op
BenchmarkLessorPromote10000-16                    300000 4639 ns/op
BenchmarkLessorPromote100000-16                      100 27216481 ns/op
BenchmarkLessorPromote1000000-16                     100 325164684 ns/op
```

After:
```
BenchmarkLessorPromote1-16                        500000 3769 ns/op
BenchmarkLessorPromote10-16                       500000 3835 ns/op
BenchmarkLessorPromote100-16                      500000 3829 ns/op
BenchmarkLessorPromote1000-16                     500000 3665 ns/op
BenchmarkLessorPromote10000-16                    500000 3800 ns/op
BenchmarkLessorPromote100000-16                   300000 4114 ns/op
BenchmarkLessorPromote1000000-16                  300000 5143 ns/op
```
mgates pushed a commit to mgates/etcd that referenced this issue Jun 6, 2018
This moves lease pile-up reduction into a goroutine which mostly operates on a copy of
the lease list, to avoid locking. This prevents timeouts when the lessor is locked
for a long time (when there are a lot of leases, mostly). This should solve etcd-io#9496.

We had a problem where when we had a lot of leases (100kish), and a
leader election happened the lessor was locked for a long time causing
timeouts when we tried to grant a lease. This changes it so that the
lessor is locked in batches, which allows for creation of leases while
the leader is initializing the lessor. It still won't start expiring
leases until it's all done rewriting them, though.

Before:
```
BenchmarkLessorPromote1-16                        500000 4036 ns/op
BenchmarkLessorPromote10-16                       500000 3932 ns/op
BenchmarkLessorPromote100-16                      500000 3954 ns/op
BenchmarkLessorPromote1000-16                     300000 3906 ns/op
BenchmarkLessorPromote10000-16                    300000 4639 ns/op
BenchmarkLessorPromote100000-16                      100 27216481 ns/op
BenchmarkLessorPromote1000000-16                     100 325164684 ns/op
```

After:
```
BenchmarkLessorPromote1-16                        500000 3769 ns/op
BenchmarkLessorPromote10-16                       500000 3835 ns/op
BenchmarkLessorPromote100-16                      500000 3829 ns/op
BenchmarkLessorPromote1000-16                     500000 3665 ns/op
BenchmarkLessorPromote10000-16                    500000 3800 ns/op
BenchmarkLessorPromote100000-16                   300000 4114 ns/op
BenchmarkLessorPromote1000000-16                  300000 5143 ns/op
```
mgates pushed a commit to mgates/etcd that referenced this issue Jun 6, 2018
This moves lease pile-up reduction into a goroutine which mostly operates on a copy of
the lease list, to avoid locking. This prevents timeouts when the lessor is locked
for a long time (when there are a lot of leases, mostly). This should solve etcd-io#9496.

We had a problem where when we had a lot of leases (100kish), and a
leader election happened the lessor was locked for a long time causing
timeouts when we tried to grant a lease. This changes it so that the
lessor is locked in batches, which allows for creation of leases while
the leader is initializing the lessor. It still won't start expiring
leases until it's all done rewriting them, though.

Before:
```
BenchmarkLessorPromote1-16                        500000 4036 ns/op
BenchmarkLessorPromote10-16                       500000 3932 ns/op
BenchmarkLessorPromote100-16                      500000 3954 ns/op
BenchmarkLessorPromote1000-16                     300000 3906 ns/op
BenchmarkLessorPromote10000-16                    300000 4639 ns/op
BenchmarkLessorPromote100000-16                      100 27216481 ns/op
BenchmarkLessorPromote1000000-16                     100 325164684 ns/op
```

After:
```
BenchmarkLessorPromote1-16                        500000 3769 ns/op
BenchmarkLessorPromote10-16                       500000 3835 ns/op
BenchmarkLessorPromote100-16                      500000 3829 ns/op
BenchmarkLessorPromote1000-16                     500000 3665 ns/op
BenchmarkLessorPromote10000-16                    500000 3800 ns/op
BenchmarkLessorPromote100000-16                   300000 4114 ns/op
BenchmarkLessorPromote1000000-16                  300000 5143 ns/op
```
mgates pushed a commit to mgates/etcd that referenced this issue Jun 6, 2018
This moves lease pile-up reduction into a goroutine which mostly operates on a copy of
the lease list, to avoid locking. This prevents timeouts when the lessor is locked
for a long time (when there are a lot of leases, mostly). This should solve etcd-io#9496.

We had a problem where when we had a lot of leases (100kish), and a
leader election happened the lessor was locked for a long time causing
timeouts when we tried to grant a lease. This changes it so that the
lessor is locked in batches, which allows for creation of leases while
the leader is initializing the lessor. It still won't start expiring
leases until it's all done rewriting them, though.

Before:
```
BenchmarkLessorPromote1-16                        500000 4036 ns/op
BenchmarkLessorPromote10-16                       500000 3932 ns/op
BenchmarkLessorPromote100-16                      500000 3954 ns/op
BenchmarkLessorPromote1000-16                     300000 3906 ns/op
BenchmarkLessorPromote10000-16                    300000 4639 ns/op
BenchmarkLessorPromote100000-16                      100 27216481 ns/op
BenchmarkLessorPromote1000000-16                     100 325164684 ns/op
```

After:
```
BenchmarkLessorPromote1-16                        500000 3769 ns/op
BenchmarkLessorPromote10-16                       500000 3835 ns/op
BenchmarkLessorPromote100-16                      500000 3829 ns/op
BenchmarkLessorPromote1000-16                     500000 3665 ns/op
BenchmarkLessorPromote10000-16                    500000 3800 ns/op
BenchmarkLessorPromote100000-16                   300000 4114 ns/op
BenchmarkLessorPromote1000000-16                  300000 5143 ns/op
```
jcalvert pushed a commit to mgates/etcd that referenced this issue Jun 7, 2018
This moves lease pile-up reduction into a goroutine which mostly operates on a copy of
the lease list, to avoid locking. This prevents timeouts when the lessor is locked
for a long time (when there are a lot of leases, mostly). This should solve etcd-io#9496.

We had a problem where when we had a lot of leases (100kish), and a
leader election happened the lessor was locked for a long time causing
timeouts when we tried to grant a lease. This changes it so that the
lessor is locked in batches, which allows for creation of leases while
the leader is initializing the lessor. It still won't start expiring
leases until it's all done rewriting them, though.

Before:
```
BenchmarkLessorPromote1-16                        500000 4036 ns/op
BenchmarkLessorPromote10-16                       500000 3932 ns/op
BenchmarkLessorPromote100-16                      500000 3954 ns/op
BenchmarkLessorPromote1000-16                     300000 3906 ns/op
BenchmarkLessorPromote10000-16                    300000 4639 ns/op
BenchmarkLessorPromote100000-16                      100 27216481 ns/op
BenchmarkLessorPromote1000000-16                     100 325164684 ns/op
```

After:
```
BenchmarkLessorPromote1-16                        500000 3769 ns/op
BenchmarkLessorPromote10-16                       500000 3835 ns/op
BenchmarkLessorPromote100-16                      500000 3829 ns/op
BenchmarkLessorPromote1000-16                     500000 3665 ns/op
BenchmarkLessorPromote10000-16                    500000 3800 ns/op
BenchmarkLessorPromote100000-16                   300000 4114 ns/op
BenchmarkLessorPromote1000000-16                  300000 5143 ns/op
```
mgates pushed a commit to mgates/etcd that referenced this issue Jun 12, 2018
This moves lease pile-up reduction into a goroutine which mostly operates on a copy of
the lease list, to avoid locking. This prevents timeouts when the lessor is locked
for a long time (when there are a lot of leases, mostly). This should solve etcd-io#9496.

We had a problem where when we had a lot of leases (100kish), and a
leader election happened the lessor was locked for a long time causing
timeouts when we tried to grant a lease. This changes it so that the
lessor is locked in batches, which allows for creation of leases while
the leader is initializing the lessor. It still won't start expiring
leases until it's all done rewriting them, though.

Before:
```
BenchmarkLessorPromote1-16                        500000 4036 ns/op
BenchmarkLessorPromote10-16                       500000 3932 ns/op
BenchmarkLessorPromote100-16                      500000 3954 ns/op
BenchmarkLessorPromote1000-16                     300000 3906 ns/op
BenchmarkLessorPromote10000-16                    300000 4639 ns/op
BenchmarkLessorPromote100000-16                      100 27216481 ns/op
BenchmarkLessorPromote1000000-16                     100 325164684 ns/op
```

After:
```
BenchmarkLessorPromote1-16                        500000 3769 ns/op
BenchmarkLessorPromote10-16                       500000 3835 ns/op
BenchmarkLessorPromote100-16                      500000 3829 ns/op
BenchmarkLessorPromote1000-16                     500000 3665 ns/op
BenchmarkLessorPromote10000-16                    500000 3800 ns/op
BenchmarkLessorPromote100000-16                   300000 4114 ns/op
BenchmarkLessorPromote1000000-16                  300000 5143 ns/op
```
mgates pushed a commit to mgates/etcd that referenced this issue Jun 12, 2018
This moves lease pile-up reduction into a goroutine which mostly operates on a copy of
the lease list, to avoid locking. This prevents timeouts when the lessor is locked
for a long time (when there are a lot of leases, mostly). This should solve etcd-io#9496.

We had a problem where when we had a lot of leases (100kish), and a
leader election happened the lessor was locked for a long time causing
timeouts when we tried to grant a lease. This changes it so that the
lessor is locked in batches, which allows for creation of leases while
the leader is initializing the lessor. It still won't start expiring
leases until it's all done rewriting them, though.

Before:
```
BenchmarkLessorPromote1-16                        500000 4036 ns/op
BenchmarkLessorPromote10-16                       500000 3932 ns/op
BenchmarkLessorPromote100-16                      500000 3954 ns/op
BenchmarkLessorPromote1000-16                     300000 3906 ns/op
BenchmarkLessorPromote10000-16                    300000 4639 ns/op
BenchmarkLessorPromote100000-16                      100 27216481 ns/op
BenchmarkLessorPromote1000000-16                     100 325164684 ns/op
```

After:
```
BenchmarkLessorPromote1-16                        500000 3769 ns/op
BenchmarkLessorPromote10-16                       500000 3835 ns/op
BenchmarkLessorPromote100-16                      500000 3829 ns/op
BenchmarkLessorPromote1000-16                     500000 3665 ns/op
BenchmarkLessorPromote10000-16                    500000 3800 ns/op
BenchmarkLessorPromote100000-16                   300000 4114 ns/op
BenchmarkLessorPromote1000000-16                  300000 5143 ns/op
```
mgates pushed a commit to mgates/etcd that referenced this issue Jun 12, 2018
This moves lease pile-up reduction into a goroutine which mostly operates on a copy of
the lease list, to avoid locking. This prevents timeouts when the lessor is locked
for a long time (when there are a lot of leases, mostly). This should solve etcd-io#9496.

We had a problem where when we had a lot of leases (100kish), and a
leader election happened the lessor was locked for a long time causing
timeouts when we tried to grant a lease. This changes it so that the
lessor is locked in batches, which allows for creation of leases while
the leader is initializing the lessor. It still won't start expiring
leases until it's all done rewriting them, though.

Before:
```
BenchmarkLessorPromote1-16                        500000 4036 ns/op
BenchmarkLessorPromote10-16                       500000 3932 ns/op
BenchmarkLessorPromote100-16                      500000 3954 ns/op
BenchmarkLessorPromote1000-16                     300000 3906 ns/op
BenchmarkLessorPromote10000-16                    300000 4639 ns/op
BenchmarkLessorPromote100000-16                      100 27216481 ns/op
BenchmarkLessorPromote1000000-16                     100 325164684 ns/op
```

After:
```
BenchmarkLessorPromote1-16                        500000 3769 ns/op
BenchmarkLessorPromote10-16                       500000 3835 ns/op
BenchmarkLessorPromote100-16                      500000 3829 ns/op
BenchmarkLessorPromote1000-16                     500000 3665 ns/op
BenchmarkLessorPromote10000-16                    500000 3800 ns/op
BenchmarkLessorPromote100000-16                   300000 4114 ns/op
BenchmarkLessorPromote1000000-16                  300000 5143 ns/op
```
jcalvert pushed a commit to jcalvert/etcd that referenced this issue Jul 2, 2018
This moves lease pile-up reduction into a goroutine which mostly operates on a copy of
the lease list, to avoid locking. This prevents timeouts when the lessor is locked
for a long time (when there are a lot of leases, mostly). This should solve etcd-io#9496.

We had a problem where when we had a lot of leases (100kish), and a
leader election happened the lessor was locked for a long time causing
timeouts when we tried to grant a lease. This changes it so that the
lessor is locked in batches, which allows for creation of leases while
the leader is initializing the lessor. It still won't start expiring
leases until it's all done rewriting them, though.

Before:
```
BenchmarkLessorPromote1-16                        500000 4036 ns/op
BenchmarkLessorPromote10-16                       500000 3932 ns/op
BenchmarkLessorPromote100-16                      500000 3954 ns/op
BenchmarkLessorPromote1000-16                     300000 3906 ns/op
BenchmarkLessorPromote10000-16                    300000 4639 ns/op
BenchmarkLessorPromote100000-16                      100 27216481 ns/op
BenchmarkLessorPromote1000000-16                     100 325164684 ns/op
```

After:
```
BenchmarkLessorPromote1-16                        500000 3769 ns/op
BenchmarkLessorPromote10-16                       500000 3835 ns/op
BenchmarkLessorPromote100-16                      500000 3829 ns/op
BenchmarkLessorPromote1000-16                     500000 3665 ns/op
BenchmarkLessorPromote10000-16                    500000 3800 ns/op
BenchmarkLessorPromote100000-16                   300000 4114 ns/op
BenchmarkLessorPromote1000000-16                  300000 5143 ns/op
```
@stale
Copy link

stale bot commented Apr 7, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 7, 2020
@stale stale bot closed this as completed Apr 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

2 participants