Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

service: Always allocate higher ID for svc/backend #18113

Merged
merged 1 commit into from Dec 3, 2021

Conversation

brb
Copy link
Member

@brb brb commented Dec 3, 2021

Previously, it was possible that a backend or a service would get
allocated ID, which would be ID_backend_A < ID < ID_backend_B. This
could have happened after cilium-agent restart, as the nextID was not
advanced upon the restoration of IDs.

This could have led to situations in which the per-packet LB could
selected a backend which did not belong to a requested service when the
following was fulfilled in the chronological order:

  1. Previously the same client made the request to the service and the
    backend with ID_x was chosen.
  2. The service endpoint (backend) with ID_x was removed.
  3. cilium-agent was restarted.
  4. A new service backend which does not belong to the initial service
    was created and got the ID_x allocated.
  5. The CT_SERVICE entry for the old connection was not removed by the CT
    GC.
  6. The same client made a new connection to the same service from the
    same src port.

The above led the lb{4,6}_local() to select the wrong backend, as it
found the CT_SERVICE entry with the backend ID_x.

The advancement of the nextID upon the restoration only partly mitigates
the issue. The real fix would be to introduce a match map which key
would be (svc_id, backend_id), and it would be populated by the agent.
The lb{4,6}_local() routines would consult the map to detect whether the
backend belongs to the service.

Previously, it was possible that a backend or a service would get
allocated ID, which would be ID_backend_A < ID < ID_backend_B. This
could have happened after cilium-agent restart, as the nextID was not
advanced upon the restoration of IDs.

This could have led to situations in which the per-packet LB could
selected a backend which did not belong to a requested service when the
following was fulfilled in the chronological order:

1. Previously the same client made the request to the service and the
   backend with ID_x was chosen.
2. The service endpoint (backend) with ID_x was removed.
3. cilium-agent was restarted.
4. A new service backend which does not belong to the initial service
   was created and got the ID_x allocated.
5. The CT_SERVICE entry for the old connection was not removed by the CT
   GC.
6. The same client made a new connection to the same service from the
   same src port.

The above led the lb{4,6}_local() to select the wrong backend, as it
found the CT_SERVICE entry with the backend ID_x.

The advancement of the nextID upon the restoration only partly mitigates
the issue. The real fix would be to introduce a match map which key
would be (svc_id, backend_id), and it would be populated by the agent.
The lb{4,6}_local() routines would consult the map to detect whether the
backend belongs to the service.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
@brb brb added kind/bug This is a bug in the Cilium logic. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. sig/loadbalancing labels Dec 3, 2021
@brb brb requested review from borkmann and a team December 3, 2021 15:21
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.11.0 Dec 3, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.10.6 Dec 3, 2021
@brb
Copy link
Member Author

brb commented Dec 3, 2021

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Dec 3, 2021
@borkmann borkmann merged commit 33bd95c into master Dec 3, 2021
@borkmann borkmann deleted the pr/brb/fix-backend-id-alloc branch December 3, 2021 18:42
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.10 in 1.11.0 Dec 3, 2021
@joestringer joestringer added backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. and removed backport-pending/1.11 labels Dec 5, 2021
@joestringer joestringer moved this from Backport pending to v1.11 to Backport done to v1.11 in 1.11.0 Dec 5, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.10 in 1.10.6 Dec 6, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.10 to Backport done to v1.10 in 1.10.6 Dec 10, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.10 to Backport done to v1.10 in 1.10.6 Dec 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. kind/bug This is a bug in the Cilium logic. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/minor This PR changes functionality that users may find relevant to operating Cilium.
Projects
No open projects
1.10.6
Backport done to v1.10
1.11.0
Backport done to v1.11
Development

Successfully merging this pull request may close these issues.

None yet

4 participants