Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FQDN: fix incorrect reaping of newly-expired IPs #31205

Merged
merged 2 commits into from Mar 11, 2024

Conversation

squeed
Copy link
Contributor

@squeed squeed commented Mar 6, 2024

A bug was found where a low-TTL name was incorrectly reaped despite
being part of an active connection. After looking at logs, and
reproducing locally, it was determined that there is an unfortunate
interleaving between the DNS and CT GC loops.

The code attempts to prevent this issue by ensuring that names inserted
after CT GC has started are exempt from reaping. However, we don't
actually track the insertion time, we track the DNS TTL expiration time,
which is strictly in the past. In fact, it can be up to a minute in the
past. We shouldn't rely on timestamps anyways, as the scheduler can
always play tricks on us.

So, if a CT GC run has started and finished in the time between name
expiration and insertion in to zombies, the IP address is immediately
considered dead and unnecessarily reaped.

Timeline:

T1. name expires
T2. CT GC starts and finishes
T3. Zombies.SetCTGCTime(T2)
T4. Zombies.Upsert(name, T1)
T5. Zombies.GC()

At T5, zombies.GC will remove IPs associated with name, because T2 > T1.

The solution is to use an explicit serial number to ensure that CTGC has
completed a full run before we are allowed to delete an IP. We actually
need to let CT GC run twice, as it may have started before this zombie
was added and thus not marked it alive.

Additionally, we already have a grace period, the idle connection
timeout, that gives applications a chance to re-use an expired IP.
However, we did not respect this grace period if the IP in question did
not have an entry in conntrack. So, pad deletion time by this grace
period as well, just to be sure this grace period applies to all
possible deletions.

Separately, don't tell FQDN that CT GC is complete if it encountered an error, or didn't do a full pass.

Fixes a bug where ToFQDN IPs may be garbage collected too early, disrupting existing connections.

@squeed squeed requested review from a team as code owners March 6, 2024 16:23
@squeed squeed requested review from aspsk and pippolo84 March 6, 2024 16:23
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Mar 6, 2024
@squeed squeed added kind/bug This is a bug in the Cilium logic. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. area/fqdn Affects the FQDN policies feature labels Mar 6, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Mar 6, 2024
Copy link
Member

@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix. I was sort of wondering about alternatives that avoid add extra fields that are not exported to DNSZombieMapping and ignored by the marshaller. Given the timeline you've detailed:

T1. name expires
T2. GC starts and finishes
T3. Zombies.SetCTGCTime(T2)
T4. Zombies.Upsert(name, T1)
T5. Zombies.GC()

Have you considered modifying step T4 to pass T4 instead of T1 to the Upsert call? It seems a bit suspicious that the time value passed in is subject to a race.

To clarify, is T2 CT GC or FQDN GC?

pkg/fqdn/cache.go Outdated Show resolved Hide resolved
@squeed
Copy link
Contributor Author

squeed commented Mar 7, 2024

Have you considered modifying step T4 to pass T4 instead of T1 to the Upsert call?

I initially didn't do this, because the expiration time is relevant for determining which name to reap in the event of an overflow. However, I did wind up taking this suggestion with a bit of extra logic to handle the expiration-in-the-future case (i.e. over limit).

It seems a bit suspicious that the time value passed in is subject to a race.

Be careful, lest the ghost of Leslie Lamport come down and cause race conditions in all of your code that relies on clocks :-). Remember, the scheduler can and will break all your assumptions. Besides, I am uncomfortable with the fiddliness of exactly how the CT GC timestamp is interpreted. If we want to ensure CT GC runs twice, I'd rather use a counter.

@squeed
Copy link
Contributor Author

squeed commented Mar 7, 2024

/test

@squeed
Copy link
Contributor Author

squeed commented Mar 7, 2024

Have you considered modifying step T4 to pass T4 instead of T1 to the Upsert call?

I initially didn't do this, because the expiration time is relevant for determining which name to reap in the event of an overflow. However, I did wind up taking this suggestion with a bit of extra logic to handle the expiration-in-the-future case (i.e. over limit).

And I just tried this, and it broke tests because of a subtle interaction with timestamps and cilium-dbg fqdn cache clean. And this is why I dislike relying on timestamps. So we can't change the expiration time from T1 to T4 without breaking fqdn cache clean, which tests rely on extensively.

Nevermind, I'm wrong. We can insert a time of T4, but not a time of T4 + grace. Complicated.

@squeed
Copy link
Contributor Author

squeed commented Mar 7, 2024

/test

@squeed
Copy link
Contributor Author

squeed commented Mar 7, 2024

It seems a bit suspicious that the time value passed in is subject to a race.

Be careful, lest the ghost of Leslie Lamport come down and cause race conditions in all of your code that relies on clocks :-).

Here's one way it could happen:

  1. FQDN.GC(T1) starts
  2. CTGCStart := T2. GC finishes.
  3. Zombie.Upsert(T1)
  4. Zombie.SetCTGCTime(T2)
  5. Zombie.GC() removes the IP, because T1 < T2

So, we really need the explicit conntrack generation -- it ensures a full CT GC pass has run, giving us a chance to mark the IP as alive. We also need the timestamps, so we can correctly compute grace period and prioritize newer connections in the event of overflow.

@chancez
Copy link
Contributor

chancez commented Mar 7, 2024

This is probably not useful as a drive-by comment, so feel free to ignore me, but don't forget to think about time going backwards too.

@stensonb
Copy link

stensonb commented Mar 9, 2024

Would this impact v1.14.2 and above? I'm pretty sure I'm seeing this in my clusters running v1.14.2.

Excited to see this fixed, either way. Thanks for your efforts @squeed

@squeed
Copy link
Contributor Author

squeed commented Mar 9, 2024

Would this impact v1.14.2 and above? I'm pretty sure I'm seeing this in my clusters running v1.14.2.

I’m not sure of the exact point version, but I’ve seen this in v1.14.5 clusters for sure.

@squeed squeed requested a review from christarazi March 11, 2024 09:59
@squeed squeed added needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch needs-backport/1.15 This PR / issue needs backporting to the v1.15 branch labels Mar 11, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.15.2 Mar 11, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.14.8 Mar 11, 2024
Copy link
Member

@pippolo84 pippolo84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks! 💯
Just a non-blocking nit left inline.

pkg/fqdn/cache.go Outdated Show resolved Hide resolved
@squeed squeed requested a review from chancez March 11, 2024 11:26
A bug was found where a low-TTL name was incorrectly reaped despite
being part of an active connection. After looking at logs, and
reproducing locally, it was determined that there is an unfortunate
interleaving between the DNS and CT GC loops.

The code attempts to prevent this issue by ensuring that names inserted
after CT GC has started are exempt from reaping. However, we don't
actually track the insertion time, we track the DNS TTL expiration time,
which is strictly in the past. In fact, it can be up to a minute in the
past. We shouldn't rely on timestamps anyways, as the scheduler can
always play tricks on us.

So, if a CT GC run has started and finished in the time between name
expiration and insertion in to zombies, the IP address is immediately
considered dead and unnecessarily reaped.

Timeline:

T1. name expires
T2. CT GC starts and finishes
T3. Zombies.SetCTGCTime(T2)
T4. Zombies.Upsert(name, T1)
T5. Zombies.GC()

At T5, zombies.GC will remove IPs associated with name, because T2 > T1.

The solution is to use an explicit serial number to ensure that CTGC has
completed a full run before we are allowed to delete an IP. We actually
need to let CT GC run twice, as it may have started before this zombie
was added and thus not marked it alive.

Additionally, we already have a grace period, the idle connection
timeout, that gives applications a chance to re-use an expired IP.
However, we did not respect this grace period if the IP in question did
not have an entry in conntrack. So, pad deletion time by this grace
period as well, just to be sure this grace period applies to all
possible deletions.

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
@squeed
Copy link
Contributor Author

squeed commented Mar 11, 2024

/test

@squeed squeed added release-blocker/1.14 This issue will prevent the release of the next version of Cilium. release-blocker/1.15 This issue will prevent the release of the next version of Cilium. labels Mar 11, 2024
pkg/fqdn/cache.go Show resolved Hide resolved
@jrajahalme jrajahalme added this pull request to the merge queue Mar 11, 2024
@squeed squeed added the backport/author The backport will be carried out by the author of the PR. label Mar 11, 2024
Merged via the queue into cilium:main with commit b284170 Mar 11, 2024
62 checks passed
@squeed squeed added the backport-pending/1.15 The backport for Cilium 1.15.x for this PR is in progress. label Mar 11, 2024
@github-actions github-actions bot added backport-done/1.15 The backport for Cilium 1.15.x for this PR is done. and removed backport-pending/1.15 The backport for Cilium 1.15.x for this PR is in progress. labels Mar 11, 2024
@jrajahalme jrajahalme removed the needs-backport/1.15 This PR / issue needs backporting to the v1.15 branch label Mar 11, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport done to v1.15 in 1.15.2 Mar 11, 2024
@squeed squeed added backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. and removed needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Mar 12, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.14 in 1.14.8 Mar 12, 2024
@github-actions github-actions bot added backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. and removed backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. labels Mar 12, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed this from Backport pending to v1.14 in 1.14.8 Mar 12, 2024
@stensonb
Copy link

@squeed just wanted to thank you for this! 1.14.8 works great!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/fqdn Affects the FQDN policies feature backport/author The backport will be carried out by the author of the PR. backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. backport-done/1.15 The backport for Cilium 1.15.x for this PR is done. kind/bug This is a bug in the Cilium logic. release-blocker/1.14 This issue will prevent the release of the next version of Cilium. release-blocker/1.15 This issue will prevent the release of the next version of Cilium. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.
Projects
No open projects
1.15.2
Backport done to v1.15
Development

Successfully merging this pull request may close these issues.

None yet

7 participants