
Add new flag to persist lease expiry #9526

Closed
wants to merge 1 commit

Conversation

jcalvert
Contributor

@jcalvert jcalvert commented Apr 3, 2018

This pull request adds an opt-in configuration flag to persist lease expiry. It is opt-in because the extra persistence overhead would be undesirable for a cluster where leases are frequently renewed. However, where the extra network/disk traffic is acceptable, this solves two problems:

  • When etcd is used for distributed locking via leases, a leader election currently resets the expiry because it is held only in memory. If a crashed client held a lock when a leader election occurred (e.g. during a cluster upgrade), that lock could be held for up to twice the specified TTL. This is unexpected behavior and forces application code that uses etcd to implement its own staleness checking.

  • Addressing "Leader promotions exceed default timeout values when etcd is holding many leases" (#9496): when there are large numbers of leases (~10 million), the algorithm that prevents lease-expiration pile-up on Promote must iterate over every lease, which has a significant time cost and can lead to timeouts. By persisting this expiry value, the existing distribution of remaining time is maintained and pile-up does not occur.

This is achieved by adding the existing expiry field on the lease type to the lease protobuf definition, and persisting the lease whenever that value is updated (Grant, Renew), but only if the server is configured to do so.
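
Roughly, a minimal sketch of the intended behavior (not etcd's actual lessor code; `persistExpiry`, `store`, and `grantOrRenew` are illustrative names, not real etcd identifiers): the expiry is recomputed on Grant/Renew and written out alongside the ID and TTL only when the opt-in flag is enabled.

```go
package main

import (
	"fmt"
	"time"
)

// lease holds the in-memory state of a single lease.
type lease struct {
	id     int64
	ttl    int64     // seconds
	expiry time.Time // today this value lives only in memory
}

// lessor owns all leases; persistExpiry stands in for the proposed opt-in flag.
type lessor struct {
	persistExpiry bool
	store         func(id, ttl, expiryUnix int64) // stand-in for the backend bucket write
}

// grantOrRenew (re)computes the expiry and, only when the flag is set, writes
// it to the backend so a new leader can restore the original deadline instead
// of resetting it.
func (le *lessor) grantOrRenew(l *lease) {
	l.expiry = time.Now().Add(time.Duration(l.ttl) * time.Second)
	if le.persistExpiry {
		le.store(l.id, l.ttl, l.expiry.Unix())
	}
}

func main() {
	le := &lessor{
		persistExpiry: true,
		store: func(id, ttl, expiryUnix int64) {
			fmt.Printf("persisted lease %d: ttl=%ds expiry=%d\n", id, ttl, expiryUnix)
		},
	}
	le.grantOrRenew(&lease{id: 1, ttl: 3600})
}
```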

coauthored with @cosgroveb

@xiang90
Contributor

xiang90 commented Apr 12, 2018

I would rather make this per-lease instead of a global flag on the server.

@jcalvert
Contributor Author

@xiang90

Our current use case for this change is one where we would not want a mixture of expiry modes. We do not envision needing such a mixture in the future, and hesitate to speculate on what that use case might be.

Working on a per-lease basis would also require a client-side change, which is a larger commitment than we're looking to make at this point. Additionally, we would need to find an alternative way to address the performance issue in the second bullet point.

If you all think that a configuration flag isn't a reasonable way to do this, we're certainly willing to think on it and decide if we want to come back to it later.

@xiang90
Contributor

xiang90 commented Apr 17, 2018

@jcalvert Persisting leases has a huge cost when you have a lot of keep-alived leases. For short leases, I doubt there is a need to persist them. For long leases, maybe there is a use case.

@jcalvert
Contributor Author

@xiang90 Our use case is a very large number (10 million) of long-lived leases that never receive a keep-alive, as we use this as a TTL. This helps us avoid needing to embed timestamps and do checking in our client-side code. We would be happy to explain more about this use case if necessary!
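
To make that use case concrete, here is a small sketch against etcd's Go client (clientv3; the import path varies by etcd release, and the endpoint/key names are placeholders): grant a long-lived lease, attach a key to it, and never send a keep-alive, so the key simply disappears when the TTL elapses, with no client-side timestamping or staleness checks.

```go
package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx := context.Background()

	// Grant a one-hour lease and never keep it alive: the lease acts purely as a TTL.
	lease, err := cli.Grant(ctx, 3600)
	if err != nil {
		log.Fatal(err)
	}

	// The key is removed automatically when the lease expires.
	if _, err := cli.Put(ctx, "session/1234", "payload", clientv3.WithLease(lease.ID)); err != nil {
		log.Fatal(err)
	}
}
```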

…leases

Add a persist expiry flag that allows the expiry field of leases to be
persisted along with the ID and ttl. Defaults to false.

Addresses etcd-io#9496
@xiang90
Contributor

xiang90 commented Apr 23, 2018

Our use case is a very large number (10 million) of long-lived leases that never receive a keep-alive

This is not really what the current lease is designed for. It is designed to use short-term timeouts to indicate liveness. The best way to handle your use case is to add persistence on a per-lease basis.

@junhopark

Hi @xiang90 - I wanted to let you know that our team here has been discussing #9699 and is still actively pursuing getting that pull request merged eventually. What we ultimately decide to do with this particular PR (#9526) may depend on the outcome of #9699, so if it's okay with you, we'd like to leave this PR open. Can you please let us know if you'd like us to take action on this PR soon? Thank you.

@xiang90
Contributor

xiang90 commented Jul 24, 2018

This is already fixed by #9924.
