Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize identity allocation with CRD backend. #23064

Merged
merged 1 commit into from Jan 23, 2023

Conversation

alan-kut
Copy link
Contributor

Introduce by key indexer to prevent going through all the identities all the time.
GlobalIdentity is moved to a separate packet so it can be imported without cyclic dependency.

Signed-off-by: Alan Kutniewski kutniewski@google.com
Fixes: #22984

Optimize getting identity by key with CRD Backend by introducing indexer.

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jan 12, 2023
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Jan 12, 2023
@alan-kut alan-kut marked this pull request as ready for review January 12, 2023 14:51
@alan-kut alan-kut requested review from a team as code owners January 12, 2023 14:51
@alan-kut alan-kut force-pushed the ids-opt branch 2 times, most recently from 7e75539 to 59cd5d8 Compare January 13, 2023 09:17
@alan-kut
Copy link
Contributor Author

Travic CI - Pull Request - Build Errored:
It first failed with

The command "sudo -E apt-get -yq --no-install-suggests --no-install-recommends $(travis_apt_get_options) install kernel-package gnupg libncurses5" failed and exited with 100 during .

and then succeeded.

Gateway API Conformance Test failed with:

Error from server (NotFound): pods "coredns-565d847f94-g6mkl" not found

I don't think it is related to changes in the PR.

@squeed
Copy link
Contributor

squeed commented Jan 16, 2023

Kicked travis; if it fails again, then something is wrong (obviously not caused by this PR).

Copy link
Contributor

@squeed squeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a nice easy win, thanks!

A few nits, a few possible broader refactors, but I don't want to bikeshed :-)

@alan-kut alan-kut force-pushed the ids-opt branch 4 times, most recently from 372f769 to e9ce046 Compare January 16, 2023 15:23
@alan-kut alan-kut requested review from squeed and removed request for aditighag January 16, 2023 15:25
@alan-kut
Copy link
Contributor Author

I replied/fixed everything except for the indexing function. I'll try to determine whether it is safe to return error if the object is not a CiliumIdentity - it should be but I don't want to risk the store to panic

@aditighag
Copy link
Member

aditighag commented Jan 19, 2023

/test

Job 'Cilium-PR-K8s-1.24-kernel-5.4' failed:

Click to show.

Test Name

K8sUpdates Tests upgrade and downgrade from a Cilium stable image to master

Failure Output

FAIL: Unable to download helm chart v1.12 from GitHub

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.24-kernel-5.4 so I can create one.

@aditighag aditighag requested a review from a team January 19, 2023 01:21
@aditighag
Copy link
Member

I triggered tests just to be sure there are no regressions.

@alan-kut
Copy link
Contributor Author

Cilium Datapath (ci-datapath) — Datapath tests failed

📋 Test Report
❌ 5/31 tests failed (12/237 actions), 0 tests skipped, 0 scenarios skipped:
connectivity test failed: 5 tests failed
Test [no-policies]:
  ❌ no-policies/pod-to-world/http-to-one.one.one.one-0: cilium-test/client-755fb678bd-76fh5 (10.244.3.33) -> one.one.one.one-http (one.one.one.one:80)
  ❌ no-policies/pod-to-world/https-to-one.one.one.one-0: cilium-test/client-755fb678bd-76fh5 (10.244.3.33) -> one.one.one.one-https (one.one.one.one:443)
  ❌ no-policies/pod-to-world/https-to-one.one.one.one-index-0: cilium-test/client-755fb678bd-76fh5 (10.244.3.33) -> one.one.one.one-https-index (one.one.one.one:443)
  ❌ no-policies/pod-to-world/http-to-one.one.one.one-1: cilium-test/client2-5b97d7bc66-t4nb4 (10.244.3.66) -> one.one.one.one-http (one.one.one.one:80)
  ❌ no-policies/pod-to-world/https-to-one.one.one.one-1: cilium-test/client2-5b97d7bc66-t4nb4 (10.244.3.66) -> one.one.one.one-https (one.one.one.one:443)
  ❌ no-policies/pod-to-world/https-to-one.one.one.one-index-1: cilium-test/client2-5b97d7bc66-t4nb4 (10.244.3.66) -> one.one.one.one-https-index (one.one.one.one:443)
Test [to-entities-world]:
  ❌ to-entities-world/pod-to-world/http-to-one.one.one.one-0: cilium-test/client-755fb678bd-76fh5 (10.244.3.33) -> one.one.one.one-http (one.one.one.one:80)
  ❌ to-entities-world/pod-to-world/http-to-one.one.one.one-1: cilium-test/client2-5b97d7bc66-t4nb4 (10.244.3.66) -> one.one.one.one-http (one.one.one.one:80)
Test [client-egress-l7]:
  ❌ client-egress-l7/pod-to-world/http-to-one.one.one.one-0: cilium-test/client2-5b97d7bc66-t4nb4 (10.244.3.66) -> one.one.one.one-http (one.one.one.one:80)
Test [client-egress-l7-named-port]:
  ❌ client-egress-l7-named-port/pod-to-world/http-to-one.one.one.one-0: cilium-test/client2-5b97d7bc66-t4nb4 (10.244.3.66) -> one.one.one.one-http (one.one.one.one:80)
Test [to-fqdns]:
  ❌ to-fqdns/pod-to-world/http-to-one.one.one.one-0: cilium-test/client-755fb678bd-76fh5 (10.244.3.33) -> one.one.one.one-http (one.one.one.one:80)
  ❌ to-fqdns/pod-to-world/http-to-one.one.one.one-1: cilium-test/client2-5b97d7bc66-t4nb4 (10.244.3.66) -> one.one.one.one-http (one.one.one.one:80)
Error: Process completed with exit code 1.

@alan-kut
Copy link
Contributor Author

All those curls terminated with exit code 28 which is time out.

@squeed
Copy link
Contributor

squeed commented Jan 19, 2023

test failures should be fixed by #23171

@alan-kut
Copy link
Contributor Author

I'll wait for it then and then rebase once it's merged.

@squeed
Copy link
Contributor

squeed commented Jan 23, 2023

/test

@aditighag aditighag added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jan 23, 2023
@aditighag
Copy link
Member

🚢

@ldelossa ldelossa merged commit 78cbaf0 into cilium:master Jan 23, 2023
@christarazi christarazi added area/daemon Impacts operation of the Cilium daemon. kind/performance There is a performance impact of this. kind/enhancement This would improve or streamline existing functionality. labels Jan 24, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed ready-to-merge This PR has passed all tests and received consensus from code owners to merge. labels Jan 24, 2023
antonipp pushed a commit to DataDog/cilium that referenced this pull request Jan 5, 2024
[ Backporter's notes: Ignoring changes from cilium#23064 ]

pkg/allocator: Improve 'Key allocation attempt failed' handling for CRD mode

In CRD mode, the Cilium agent uses CRD to create identities. After an
identity is created, the agent acquires a reference for that key. This
involves fetching the CRD from the local Kubernetes cache and checking
for an annotation applied by cilium-operator to mark the identity for
deletion. However, there may be a delay before the Cilium Identity is
cached locally, leading to the 'Key allocation attempt failed' error. This
patch ensures that we fallback to the newly allocated Cilium Identity if
it's not found in the Kubernetes cache.

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Anton Ippolitov <anton.ippolitov@datadoghq.com>
antonipp pushed a commit to DataDog/cilium that referenced this pull request Jan 5, 2024
[ Backporter's notes: Ignoring changes from cilium#23064 ]

pkg/allocator: Improve 'Key allocation attempt failed' handling for CRD mode

In CRD mode, the Cilium agent uses CRD to create identities. After an
identity is created, the agent acquires a reference for that key. This
involves fetching the CRD from the local Kubernetes cache and checking
for an annotation applied by cilium-operator to mark the identity for
deletion. However, there may be a delay before the Cilium Identity is
cached locally, leading to the 'Key allocation attempt failed' error. This
patch ensures that we fallback to the newly allocated Cilium Identity if
it's not found in the Kubernetes cache.

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Anton Ippolitov <anton.ippolitov@datadoghq.com>
antonipp pushed a commit to DataDog/cilium that referenced this pull request Jan 5, 2024
[ Backporter's notes: Ignoring changes from cilium#23064, replaced maps.Clone() with Go 1.19 compatible code ]

pkg/allocator: Improve 'Key allocation attempt failed' handling for CRD mode

In CRD mode, the Cilium agent uses CRD to create identities. After an
identity is created, the agent acquires a reference for that key. This
involves fetching the CRD from the local Kubernetes cache and checking
for an annotation applied by cilium-operator to mark the identity for
deletion. However, there may be a delay before the Cilium Identity is
cached locally, leading to the 'Key allocation attempt failed' error. This
patch ensures that we fallback to the newly allocated Cilium Identity if
it's not found in the Kubernetes cache.

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Anton Ippolitov <anton.ippolitov@datadoghq.com>
antonipp pushed a commit to DataDog/cilium that referenced this pull request Jan 5, 2024
[ Backporter's notes: Ignoring changes from cilium#23064, replaced maps.Clone() with Go 1.19 compatible code ]

pkg/allocator: Improve 'Key allocation attempt failed' handling for CRD mode

In CRD mode, the Cilium agent uses CRD to create identities. After an
identity is created, the agent acquires a reference for that key. This
involves fetching the CRD from the local Kubernetes cache and checking
for an annotation applied by cilium-operator to mark the identity for
deletion. However, there may be a delay before the Cilium Identity is
cached locally, leading to the 'Key allocation attempt failed' error. This
patch ensures that we fallback to the newly allocated Cilium Identity if
it's not found in the Kubernetes cache.

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Anton Ippolitov <anton.ippolitov@datadoghq.com>
antonipp pushed a commit to DataDog/cilium that referenced this pull request Jan 5, 2024
[ Backporter's notes: Ignoring changes from cilium#23064, replaced maps.Clone() with Go 1.19 compatible code ]

pkg/allocator: Improve 'Key allocation attempt failed' handling for CRD mode

In CRD mode, the Cilium agent uses CRD to create identities. After an
identity is created, the agent acquires a reference for that key. This
involves fetching the CRD from the local Kubernetes cache and checking
for an annotation applied by cilium-operator to mark the identity for
deletion. However, there may be a delay before the Cilium Identity is
cached locally, leading to the 'Key allocation attempt failed' error. This
patch ensures that we fallback to the newly allocated Cilium Identity if
it's not found in the Kubernetes cache.

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Anton Ippolitov <anton.ippolitov@datadoghq.com>
antonipp pushed a commit to DataDog/cilium that referenced this pull request Jan 5, 2024
[ Backporter's notes: Ignoring changes from cilium#23064, replaced maps.Clone() with Go 1.19 compatible code ]

pkg/allocator: Improve 'Key allocation attempt failed' handling for CRD mode

In CRD mode, the Cilium agent uses CRD to create identities. After an
identity is created, the agent acquires a reference for that key. This
involves fetching the CRD from the local Kubernetes cache and checking
for an annotation applied by cilium-operator to mark the identity for
deletion. However, there may be a delay before the Cilium Identity is
cached locally, leading to the 'Key allocation attempt failed' error. This
patch ensures that we fallback to the newly allocated Cilium Identity if
it's not found in the Kubernetes cache.

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Anton Ippolitov <anton.ippolitov@datadoghq.com>
antonipp pushed a commit to DataDog/cilium that referenced this pull request Jan 5, 2024
…RD mode

[ upstream commit e39fcae ]

[ Backporter's notes: Ignoring changes from cilium#23064, replaced maps.Clone() with Go 1.19 compatible code ]

In CRD mode, the Cilium agent uses CRD to create identities. After an
identity is created, the agent acquires a reference for that key. This
involves fetching the CRD from the local Kubernetes cache and checking
for an annotation applied by cilium-operator to mark the identity for
deletion. However, there may be a delay before the Cilium Identity is
cached locally, leading to the 'Key allocation attempt failed' error. This
patch ensures that we fallback to the newly allocated Cilium Identity if
it's not found in the Kubernetes cache.

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Anton Ippolitov <anton.ippolitov@datadoghq.com>
antonipp pushed a commit to DataDog/cilium that referenced this pull request Jan 8, 2024
…RD mode

[ upstream commit e39fcae ]

[ Backporter's notes: Ignoring changes from cilium#23064, replaced maps.Clone() with Go 1.19 compatible code ]

In CRD mode, the Cilium agent uses CRD to create identities. After an
identity is created, the agent acquires a reference for that key. This
involves fetching the CRD from the local Kubernetes cache and checking
for an annotation applied by cilium-operator to mark the identity for
deletion. However, there may be a delay before the Cilium Identity is
cached locally, leading to the 'Key allocation attempt failed' error. This
patch ensures that we fallback to the newly allocated Cilium Identity if
it's not found in the Kubernetes cache.

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Anton Ippolitov <anton.ippolitov@datadoghq.com>
antonipp pushed a commit to DataDog/cilium that referenced this pull request Jan 17, 2024
…RD mode

[ upstream commit e39fcae ]

[ Backporter's notes: Ignoring changes from cilium#23064, replaced maps.Clone() with Go 1.19 compatible code ]

In CRD mode, the Cilium agent uses CRD to create identities. After an
identity is created, the agent acquires a reference for that key. This
involves fetching the CRD from the local Kubernetes cache and checking
for an annotation applied by cilium-operator to mark the identity for
deletion. However, there may be a delay before the Cilium Identity is
cached locally, leading to the 'Key allocation attempt failed' error. This
patch ensures that we fallback to the newly allocated Cilium Identity if
it's not found in the Kubernetes cache.

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Anton Ippolitov <anton.ippolitov@datadoghq.com>
julianwiedmann pushed a commit that referenced this pull request Feb 7, 2024
…RD mode

[ upstream commit e39fcae ]

[ Backporter's notes: Ignoring changes from #23064, replaced maps.Clone() with Go 1.19 compatible code ]

In CRD mode, the Cilium agent uses CRD to create identities. After an
identity is created, the agent acquires a reference for that key. This
involves fetching the CRD from the local Kubernetes cache and checking
for an annotation applied by cilium-operator to mark the identity for
deletion. However, there may be a delay before the Cilium Identity is
cached locally, leading to the 'Key allocation attempt failed' error. This
patch ensures that we fallback to the newly allocated Cilium Identity if
it's not found in the Kubernetes cache.

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Anton Ippolitov <anton.ippolitov@datadoghq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/daemon Impacts operation of the Cilium daemon. kind/community-contribution This was a contribution made by a community member. kind/enhancement This would improve or streamline existing functionality. kind/performance There is a performance impact of this. release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CFP: Optimization of identity allocation with CRD as a backend
6 participants