New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Correctly handle TTLs on update #68699
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: enj If they are not already assigned, you can assign the PR to them by writing The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
We don't use the watch cache for events, so shouldn't be a blocker there.
…On Fri, Sep 14, 2018 at 6:38 PM k8s-ci-robot ***@***.***> wrote:
[APPROVALNOTIFIER] This PR is *NOT APPROVED*
This pull-request has been approved by: *enj
<#68699#>*
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: *smarterclayton*
If they are not already assigned, you can assign the PR to them by writing /assign
@smarterclayton in a comment when ready.
The full list of commands accepted by this bot can be found here
<https://go.k8s.io/bot-commands>.
The pull request process is described here
<https://git.k8s.io/community/contributors/guide/owners.md#the-code-review-process>
Needs approval from an approver in each of these files:
- *pkg/registry/OWNERS
<https://github.com/kubernetes/kubernetes/blob/master/pkg/registry/OWNERS>*
- *staging/src/k8s.io/apiserver/OWNERS
<https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/OWNERS>*
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
—
You are receiving this because you are on a team that was mentioned.
Reply to this email directly, view it on GitHub
<#68699 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABG_p4Qj6TQgC3-v41BjND_ZIFYrNJz8ks5ubC_MgaJpZM4WqCrE>
.
|
This is a lot of change. What's the actual bug this is fixing? |
Ah yeah, forgot about that. Will update the description.
Mostly just comments and tests...
// TTLFunc returns the TTL (time to live) that objects should be persisted
// with. The existing parameter is the current TTL or the default for this
// operation. The update parameter indicates whether this is an operation
// against an existing object.
//
// Objects that are persisted with a TTL are evicted once the TTL expires.
TTLFunc func(obj runtime.Object, existing uint64, update bool) (uint64, error) In the ETCD2 code, when The code as it stands today, with ETCD3, always sets |
992e415
to
93c7255
Compare
/assign @jpbetz |
@wenjiaswe: GitHub didn't allow me to request PR reviews from the following users: wenjiaswe. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
This change updates the ETCD3 objState to store the associated lease ID. If the objState has an associated lease, the value is used to look up the remaining TTL. This value is used to populate the ResponseMeta.TTL field which is passed to TTLFuncs as the existing TTL value. This guarantees that an object with a TTLFunc can reliably use the existing TTL value on update. Furthermore, as an optimization, objects that use the existing TTL on update will reuse the same lease. If an object with a TTL is updated, it will ignore the suggestion from the watch cache and instead directly pull from ETCD. This could have some performance implications for TTL'd resources that rely on the watch cache as an optimization. No such resource exists in core Kubernetes today. Signed-off-by: Monis Khan <mkhan@redhat.com>
93c7255
to
d656cc0
Compare
Added a test to confirm the updated |
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the ETCD2 code, when
update
wastrue
, you could simplyreturn existing
if you wanted to leave the TTL value unchanged. Getting this same behavior with ETCD3 is more difficult since you have to deal with an extra API (leases) and the watch cache (data from the cache does not have lease/TTL information).The code as it stands today, with ETCD3, always sets
existing
to0
because it does not fetch the lease to retrieve the current TTL value. This change corrects that deficit as best as it can while preserving the watch cache optimization for all non-TTL'd objects.
@enj, Would you add this to the description? The explanation of how this should work (and does with etcd2) is helpful in understanding why this needs to be fixed.
@@ -49,6 +49,9 @@ func NewREST(optsGetter generic.RESTOptionsGetter, ttl uint64) *REST { | |||
NewListFunc: func() runtime.Object { return &api.EventList{} }, | |||
PredicateFunc: event.MatchEvent, | |||
TTLFunc: func(runtime.Object, uint64, bool) (uint64, error) { | |||
// this means that on every update, bump the TTL | |||
// so an event expires at last modified time + ttl | |||
// the last modified timestamp is not stored |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had a bit of trouble groking the last line here. What are we trying to indicate?
For first two lines, Maybe just "Bump the TTL on every update so events expire at last modified time + TTL." ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For first two lines, Maybe just "Bump the TTL on every update so events expire at last modified time + TTL." ?
👍
Had a bit of trouble groking the last line here. What are we trying to indicate?
I was thinking through how etcd TTLs are Kube API hostile (see bold):
- They do not respect Kube API semantics such as finalizers
- You cannot generally TTL a resource (requires code in the server). The GC comes close when you have an ownership hierarchy but that does not work when the owner is "time."
- TTLs are fuzzy - they expire at some point close to the requested TTL and the
leaseManager
makes this even more fuzzy. I am also unsure of the semantics of TTLs when etcd goes through leader election changes. - There is no way from the Kube API to determine if something has a lease attached to it and the remaining time
- Even when you know the amount a TTL will be bumped based on configuration, the Kube API does not track last modified time so you cannot calculate when the resource will expire
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@janetkuo @caesarxuchao Thoughts on above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like events have a field called lastTimestamp
which is kept updated by the event sink code so I used that to make the comment clearer.
|
||
// always return a nil TTL when we have no TTL func so that the | ||
// storage interface can distinguish between no TTL, 0 TTL, 1+ TTL | ||
if e.TTLFunc == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we eliminate the copy paste of the e.TTLFunc == nil
checks? Maybe default TTL function that we use and does the right thing so that we don't have to special case e.TTLFunc == nil
plus modify calculateTTL
to return a pointer for TTL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will see what I can do, the semantics for TTL on create and update are different since create does not care about nil
whereas update does.
@@ -582,11 +582,16 @@ func (e *Store) Update(ctx context.Context, name string, objInfo rest.UpdatedObj | |||
return nil, nil, err | |||
} | |||
} | |||
|
|||
// always return a nil TTL when we have no TTL func so that the | |||
// storage interface can distinguish between no TTL, 0 TTL, 1+ TTL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the semantic difference between "no TTL" and "0 TTL"? Might be worth clarifying here as it's not obvious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see storage.Interface GuarenteedUpdate
has a listing of the semantics. Maybe reference that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the distinction is really important on update, but does not matter for create.
if st != nil && ttl == st.meta.TTL && st.lease != clientv3.NoLease { | ||
return []clientv3.OpOption{clientv3.WithLease(st.lease)}, nil | ||
} | ||
|
||
id, err := s.leaseManager.GetLease(ctx, ttl) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the difference between s.leaseManager.GetLease(ctx, ttl)
and ttl == st.meta.TTL
is that the lease manager call would create options with a remaining TTL == to the provided ttl, but ttl == st.meta.TTL
will return the lease ID previously used and not extend the remaining TTL? I.e. for st.meta.TTL = 5
where the lease exists and has a remaining TTL of 3 seconds, because 2 seconds have already passed ttlOps(ctx, 5, st)
would return a ID of existing TTL, with 3 seconds remaining, but for another objState.meta.TTL = 30
where lease exists, ttlOps(ctx, 5, st)
returns a lease with remaining TTL of 5 seconds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the difference between
s.leaseManager.GetLease(ctx, ttl)
andttl == st.meta.TTL
is that the lease manager call would create options with a remaining TTL == to the provided ttl, butttl == st.meta.TTL
will return the lease ID previously used and not extend the remaining TTL?
Yeah, without the new if
statement I added, even when the same TTL is requested, a new lease would get created and attached to the key (and the old lease would be unattached from the key). I assume the point of passing update
and existing
to the TTLFunc
is to allow flows that basically say "do not change the TTL once it is set." I doubt it is a super common flow, but it does not seem wise to keep creating new leases when the old one can be reused.
I.e. for
st.meta.TTL = 5
where the lease exists and has a remaining TTL of 3 seconds, because 2 seconds have already passedttlOps(ctx, 5, st)
would return a ID of existing TTL, with 3 seconds remaining
st.meta.TTL
is assigned to the remaining TTL value (not the originally requested TTL), meaning in this example it would be 3
. The TTLFunc
tells you how much time is remaining via the existing
parameter.
If it was based on the originally requested TTL, the bump case used by events would not work: you request 30s the first time, and if you request 30s again it would not bump and reuse the lease.
but for another
objState.meta.TTL = 30
where lease exists,ttlOps(ctx, 5, st)
returns a lease with remaining TTL of 5 seconds?
I believe this is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the difference between
s.leaseManager.GetLease(ctx, ttl)
andttl == st.meta.TTL
is that the lease manager call would create options with a remaining TTL == to the provided ttl, butttl == st.meta.TTL
will return the lease ID previously used and not extend the remaining TTL?Yeah, without the new
if
statement I added, even when the same TTL is requested, a new lease would get created and attached to the key (and the old lease would be unattached from the key). I assume the point of passingupdate
andexisting
to theTTLFunc
is to allow flows that basically say "do not change the TTL once it is set." I doubt it is a super common flow, but it does not seem wise to keep creating new leases when the old one can be reused.
The lease manager already ensures that leases are reused when appropriate.
I.e. for
st.meta.TTL = 5
where the lease exists and has a remaining TTL of 3 seconds, because 2 seconds have already passedttlOps(ctx, 5, st)
would return a ID of existing TTL, with 3 seconds remaining
st.meta.TTL
is assigned to the remaining TTL value (not the originally requested TTL), meaning in this example it would be3
. TheTTLFunc
tells you how much time is remaining via theexisting
parameter.If it was based on the originally requested TTL, the bump case used by events would not work: you request 30s the first time, and if you request 30s again it would not bump and reuse the lease.
but for another
objState.meta.TTL = 30
where lease exists,ttlOps(ctx, 5, st)
returns a lease with remaining TTL of 5 seconds?I believe this is correct.
It feels like behavior violates the principal of least surprise. If the goal is to support a "continue with existing lease" option, I'm in favor or making it an more explicit than providing a TTL that matches the meta.TTL. One concrete example of how ttl == st.meta.TTL
might be suprising is that if one intends to renew the lease to a desired TTL to X seconds, so they return that value, but meta.TTL just happens to be X so instead they get the remaining time on the existing lease, not a full X seconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lease manager already ensures that leases are reused when appropriate.
It just does simple math based on last lease it created. It cannot correctly reuse leases over time for the same key. It shares leases across keys created at close intervals within some approximation of the requested TTL.
One concrete example of how ttl == st.meta.TTL might be suprising is that if one intends to renew the lease to a desired TTL to X seconds, so they return that value, but meta.TTL just happens to be X so instead they get the remaining time on the existing lease, not a full X seconds.
In general the TTL value is fuzzy with the lease manager in play. Realistically one should expected the TTL to be within +/- 30 seconds of the requested time. You asked for some TTL (from "now") and we honored it approximately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It just does simple math based on last lease it created. It cannot correctly reuse leases over time for the same key. It shares leases across keys created at close intervals within some approximation of the requested TTL.
We shouldn't need to care what etcd lease is used, only that the "at least as long as X" guarantee is adhered to. The lease manager achieves this and does it in a way that offers some efficiency gains.
In general the TTL value is fuzzy with the lease manager in play. Realistically one should expected the TTL to be within +/- 30 seconds of the requested time. You asked for some TTL (from "now") and we honored it approximately.
Anyone expecting +/- 30 seconds is in trouble, because that's not the guarantee. The guarantee provided by etcd is that the lease will expire no sooner than the TTL, hence that's the guarantee we've got in k8s and should focus on. In etcd leader elections extend out TTL without any theoretical bounds and we recently merged etcd-io/etcd#9924 to try to reduce the probability of this being an issue in practice.
The problem as I see it with this change is that if one returns 5 s as expected, they expect the lease to last at least 5 seconds, but if the st.meta.TTL and the existing lease only has 3 seconds left, then they got 3 seconds instead of the desired 5, which does not honor their intent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem as I see it with this change is that if one returns 5 s as expected, they expect the lease to last at least 5 seconds, but if the st.meta.TTL and the existing lease only has 3 seconds left, then they got 3 seconds instead of the desired 5, which does not honor their intent.
I do not see how this would ever happen. If you asked for 5 and 3 s was remaining, you would get 5. If you said 3, you would get 3 and the lease would be reused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyone expecting +/- 30 seconds is in trouble
I meant that the TTL requested from etcd would be within some range, not that etcd would honor it explicitly within that range.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now I have left this in (because it is easy to drop the if
statement and update the tests to not require lease reuse).
@@ -738,28 +747,56 @@ func (s *store) getStateFromObject(obj runtime.Object) (*objState, error) { | |||
return state, nil | |||
} | |||
|
|||
func (s *store) updateState(st *objState, userUpdate storage.UpdateFunc) (runtime.Object, uint64, error) { | |||
ret, ttlPtr, err := userUpdate(st.obj, *st.meta) | |||
var errStaleTTL = errors.New("updateState needs current objState for TTL calculation") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this to etcd3/errors.go
and make it a const?
Is 'stale' the right word here? That imply something has exceeded its shelf life. But the comment suggests the objState is missing needed data for a TTL calculation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
errIncompleteTTL
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works for me. With a clear comment and error string that feels clear enough to me.
var ttl uint64 | ||
if ttlPtr != nil { | ||
// if userUpdate asserts a TTL and we are using a cached object, it means that we passed a zero TTL | ||
// to userUpdate as the existing object's TTL (because the cache does not know the TTL value). | ||
// send a specific error in this case to signal that we need to get the actual object from ETCD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: etcd should be lowercase per etcd community guidelines (it's not an acronym).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that is a fun bit of trivia.
var ttl uint64 | ||
if ttlPtr != nil { | ||
// if userUpdate asserts a TTL and we are using a cached object, it means that we passed a zero TTL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm finding all the TTL states and their semantics difficult to keep track of. I'm concerned that this code is going to be difficult to maintain. Should introduce additional structs with more explicitly named states?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will write it out on paper and see if I can come up with something cleaner.
@@ -582,11 +582,16 @@ func (e *Store) Update(ctx context.Context, name string, objInfo rest.UpdatedObj | |||
return nil, nil, err | |||
} | |||
} | |||
|
|||
// always return a nil TTL when we have no TTL func so that the | |||
// storage interface can distinguish between no TTL, 0 TTL, 1+ TTL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see storage.Interface GuarenteedUpdate
has a listing of the semantics. Maybe reference that?
Added to description, will add it to the |
Signed-off-by: Monis Khan <mkhan@redhat.com>
@enj Previously, you mentioned that this might not actually need the "reuse lease" functionality. Would it be safe to strip that out? It seems to be the main remaining cause of complexity in the logic. |
Signed-off-by: Monis Khan <mkhan@redhat.com>
@enj: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
@enj: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
/remove-lifecycle stale |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
/remove-lifecycle stale |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
@fejta-bot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
This change updates the ETCD3 objState to store the associated lease ID. If the objState has an associated lease, the value is used to look up the remaining TTL. This value is used to populate the ResponseMeta.TTL field which is passed to TTLFuncs as the existing TTL value.
This guarantees that an object with a TTLFunc can reliably use the existing TTL value on update. Furthermore, as an optimization, objects that use the existing TTL on update will reuse the same lease.
If an object with a TTL is updated, it will ignore the suggestion from the watch cache and instead directly pull from ETCD. This could have some performance implications for TTL'd resources that rely on the watch cache as an optimization. No such resource exists in core Kubernetes today.
To understand the underlying issue, one must look at how TTLFunc is defined today:
In the ETCD2 code, when
update
wastrue
, you could simplyreturn existing
if you wanted to leave the TTL value unchanged. Getting this same behavior with ETCD3 is more difficult since you have to deal with an extra API (leases) and the watch cache (data from the cache does not have lease/TTL information).The code as it stands today, with ETCD3, always sets
existing
to0
because it does not fetch the lease to retrieve the current TTL value. This change corrects that deficit as best as it can while preserving the watch cache optimization for all non-TTL'd objects.Signed-off-by: Monis Khan mkhan@redhat.com
/kind bug
@kubernetes/sig-api-machinery-pr-reviews