Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove duplicate CiliumNode watcher #17873

Merged
merged 6 commits into from Nov 22, 2021

Conversation

aanm
Copy link
Member

@aanm aanm commented Nov 13, 2021

In certain cases the Cilium Operator could end up with two CiliumNode
k8s watchers. This PR re-writes the code in a way that a single
watcher will be used for all cases.

@aanm aanm added kind/bug This is a bug in the Cilium logic. release-note/misc This PR makes changes that have no direct user impact. labels Nov 13, 2021
@aanm
Copy link
Member Author

aanm commented Nov 13, 2021

/test

@aanm aanm force-pushed the pr/fix-cilium-node-watchers branch from 2368af8 to c3eaf9d Compare November 13, 2021 21:23
@aanm
Copy link
Member Author

aanm commented Nov 13, 2021

/test

Job 'Cilium-PR-K8s-1.22-kernel-4.9' failed and has not been observed before, so may be related to your PR:

Click to show.

Test Name

K8sServicesTest Checks ClusterIP Connectivity Checks service on same node

Failure Output

FAIL: cannot curl to service IP from host: time-> DNS: '0.000017()', Connect: '0.000000',Transfer '0.000000', total '5.001987'command terminated with exit code 28

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-1.22-kernel-4.9 so I can create a new GitHub issue to track it.

Copy link
Member

@jrajahalme jrajahalme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK on the top level, but I was not happy with 3 sets of handler funcs which all repeat the same conversion and error handling.

operator/cilium_node.go Outdated Show resolved Hide resolved
operator/cilium_node.go Outdated Show resolved Hide resolved
}
} else {
// nodeManager not nil thus the events will be processed by
// ciliumNodeKVStore and the nodeManager
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a lot of code duplication here, would it be better to only have one set of resource event handler funcs that would work both with nil and non-nil nodeManager?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is, I was worried on optimizing the code over readbility. Please see the commit operator: simplify duplicated code in the event handling after this one where I've simplified the code a little bit.

operator/cilium_node.go Outdated Show resolved Hide resolved
Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from docs perspective. Looks like Jarno already reviewed the core functionality here, so I skipped that for now. Let me know if you would like additional review.

@aanm
Copy link
Member Author

aanm commented Nov 16, 2021

@joestringer @ldelossa since this code changed so much in this PR I decided to go ahead and added in the last commit to have the workqueue pattern that we have discussed a couple weeks ago. Let me know what you think. /cc @christarazi FYI

@aanm
Copy link
Member Author

aanm commented Nov 16, 2021

/test

@aanm aanm added the area/operator Impacts the cilium-operator component label Nov 17, 2021
@aanm
Copy link
Member Author

aanm commented Nov 18, 2021

/test

Job 'Cilium-PR-K8s-GKE' failed and has not been observed before, so may be related to your PR:

Click to show.

Test Name

K8sDatapathConfig Host firewall With native routing

Failure Output

FAIL: Failed to reach 10.128.0.61:80 from testclient-qx6jt

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-GKE so I can create a new GitHub issue to track it.

Job 'Cilium-PR-K8s-1.21-kernel-4.19' failed and has not been observed before, so may be related to your PR:

Click to show.

Test Name

K8sDatapathConfig Transparent encryption DirectRouting Check connectivity with transparent encryption and direct routing

Failure Output

FAIL: Connectivity test between nodes failed

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-1.21-kernel-4.19 so I can create a new GitHub issue to track it.

Copy link
Contributor

@ldelossa ldelossa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My feedback was handled. Approving contingent on other's reviews.

Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I'm sure the more I look, the more I could find ways we can incrementally tidy up the code but I think it'll just become bikeshedding at that point. 🚢

operator/cilium_node.go Outdated Show resolved Hide resolved
@aanm
Copy link
Member Author

aanm commented Nov 19, 2021

/test

Job 'Cilium-PR-K8s-1.22-kernel-4.9' failed and has not been observed before, so may be related to your PR:

Click to show.

Test Name

K8sServicesTest Checks service across nodes IPv6 masquerading across K8s nodes

Failure Output

FAIL: Can not connect to url "\"http://[fd03::310]:80/\"" from pod(test-k8s2-7f96d84c65-tn2w6)

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-1.22-kernel-4.9 so I can create a new GitHub issue to track it.

@aanm
Copy link
Member Author

aanm commented Nov 20, 2021

/mlh new-flake Cilium-PR-K8s-1.22-kernel-4.9

👍 created #17942

@aanm
Copy link
Member Author

aanm commented Nov 20, 2021

/test-1.22-4.9

Copy link
Member

@jrajahalme jrajahalme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM, one question below regarding workqueue handling.

operator/cilium_node.go Outdated Show resolved Hide resolved
In certain cases the Cilium Operator could end up with two CiliumNode
k8s watchers. This commit re-writes the code in a way that a single
watcher will be used for all cases.

Signed-off-by: André Martins <andre@cilium.io>
The code present in k8s_node was not specific to a k8s node. Thus, this
commit moved the code into a better appropriate file, cilium_node.go

Signed-off-by: André Martins <andre@cilium.io>
This flag is no longer used in the code so we can safely remove it.

Fixes: 3aacf41 ("add capability to disable CNP NodeStatus updates")
Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: André Martins <andre@cilium.io>
As this code suffers a major rewrite, this commit introduces the right
way of handling Kubernetes events, with a workqueue.

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: André Martins <andre@cilium.io>
@aanm aanm force-pushed the pr/fix-cilium-node-watchers branch from 25377ca to ab6ec99 Compare November 22, 2021 22:30
@aanm
Copy link
Member Author

aanm commented Nov 22, 2021

/test

@aanm aanm added the release-blocker/1.11 This issue will prevent the release of the next version of Cilium. label Nov 22, 2021
@aanm
Copy link
Member Author

aanm commented Nov 22, 2021

I'll merge this PR since there was only one line added and the previous CI run was green IIRC.:

--- a/operator/cilium_node.go
+++ b/operator/cilium_node.go
@@ -232,9 +232,6 @@ func processNextWorkItem(queue workqueue.RateLimitingInterface, syncHandler func
                return true
        }
 
+       log.WithError(err).Errorf("sync %q failed with %v", key, err)
+       queue.AddRateLimited(key)
+
        return true
 }

This line will only be executed on error cases which is unlikely to be caught by our CI.

@aanm aanm merged commit f313f7d into cilium:master Nov 22, 2021
@aanm aanm deleted the pr/fix-cilium-node-watchers branch November 22, 2021 23:34
sayboras added a commit to sayboras/cilium that referenced this pull request Apr 20, 2022
This flag is deprecated in 1.11, so remove it for 1.12 now.

Relates to cilium#17873

Signed-off-by: Tam Mach <tam.mach@isovalent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/operator Impacts the cilium-operator component kind/bug This is a bug in the Cilium logic. release-blocker/1.11 This issue will prevent the release of the next version of Cilium. release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants