Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updates informer pkg to use TransformFunc() #25604

Merged
merged 1 commit into from Jun 14, 2023

Conversation

danehans
Copy link
Contributor

@danehans danehans commented May 22, 2023

  • Removes ConvertFunc in favor of TransformFunc for object transformation. Informers are now created with TransformFunc instead of ConvertFunc to mutate certain resources in K8s Caches.
  • Renames ConvertXXX functions to TransformXXX.
  • Adds tests to pkg/k8s/init_test.go to improve test coverage of transformer functions.

Fixes: #24923

@danehans danehans requested review from a team as code owners May 22, 2023 22:08
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 22, 2023
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label May 22, 2023
Copy link
Member

@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! A few comments to address below.

operator/watchers/cilium_endpoint.go Outdated Show resolved Hide resolved
operator/cmd/root.go Outdated Show resolved Hide resolved
operator/cmd/root.go Show resolved Hide resolved
@christarazi christarazi added kind/enhancement This would improve or streamline existing functionality. sig/k8s Impacts the kubernetes API, or kubernetes -> cilium internals translation layers. release-note/misc This PR makes changes that have no direct user impact. labels May 23, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels May 23, 2023
Comment on lines -449 to -451
nodeInterface := k8s.ConvertToNode(k8sNode)
typesNode := nodeInterface.(*k8sTypes.Node)
k8sNodeParsed := k8s.ParseNode(typesNode, source.Unspec)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This transformation was removed since #25282 updated GetK8sNode() to return a slim node instead of a k8s node.

Copy link
Member

@giorio94 giorio94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Changes LGTM

Just a couple of nits inline. Could you please also extend the commit message with the context around this change?

pkg/k8s/informer/informer.go Outdated Show resolved Hide resolved
pkg/k8s/informer/informer.go Outdated Show resolved Hide resolved
@danehans
Copy link
Contributor Author

@giorio94 thanks for the /approval. I updated the PR description and commit 44598e0 resolves your nits.

@giorio94
Copy link
Member

@giorio94 thanks for the /approval. I updated the PR description and commit 44598e0 resolves your nits.

Thanks! Could you please also extend the commit message with a bit of description?

if transformer != nil {
var err error
if obj, err = transformer(d.Object); err != nil {
return err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the behavior of the informer when we return an error here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The informer will not process the object. For example, the clustermesh-apiserver creates an informer that transforms k8s nodes to cilium nodes. If TransformToCiliumNode() returns an error, the provided object is not created/updated. We could set RetryOnError to re-enqueue the object. Thoughts?

@danehans
Copy link
Contributor Author

danehans commented Jun 2, 2023

@christarazi and others, is there anything else needed to merge this PR?

@danehans
Copy link
Contributor Author

danehans commented Jun 5, 2023

Commit c2078e9 rebases due to merge conflicts from #25049.

@gandro
Copy link
Member

gandro commented Jun 6, 2023

/test

Removes ConvertFunc in favor of k8s upstream TransformFunc for object
transformation. Informers are now created with TransformFunc instead
of ConvertFunc to mutate certain resources in K8s Caches.

Renames ConvertXXX functions to TransformXXX.

Adds tests to improve test coverage of transformer functions.

Signed-off-by: Daneyon Hansen <daneyon.hansen@solo.io>
@danehans
Copy link
Contributor Author

danehans commented Jun 6, 2023

Commit 0b535aa rebases to resolve a merge conflict.

@christarazi
Copy link
Member

christarazi commented Jun 6, 2023

/test

Job 'Cilium-PR-K8s-1.26-kernel-net-next' failed:

Click to show.

Test Name

K8sUpdates Tests upgrade and downgrade from a Cilium stable image to master

Failure Output

FAIL: Unexpected missed tail call

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.26-kernel-net-next/527/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.26-kernel-net-next so I can create one.

Then please upload the Jenkins artifacts to that issue.

@danehans
Copy link
Contributor Author

danehans commented Jun 8, 2023

@christarazi thanks for kicking off the tests. The test-1.26-net-next job is failing due to:

/home/jenkins/workspace/Cilium-PR-K8s-1.26-kernel-net-next/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:515
Unexpected missed tail call
Expected
    <int>: 1
to be ==
    <int>: 0
/home/jenkins/workspace/Cilium-PR-K8s-1.26-kernel-net-next/src/github.com/cilium/cilium/test/k8s/updates.go:129

It appears the upgrade was unsuccessful (^ calls assertUpgradeSuccessful()), possibly due to one of the agents failing readiness probes:

22:18:19 STEP: Cilium is not ready yet: connectivity health is failing: Cluster connectivity is unhealthy on 'cilium-rwcn6': Exitcode: 1 
Err: exit status 1
Stdout:
 	 
Stderr:
 	 Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init), install-cni-binaries (init)
	 Error: Cannot get status/probe: Put "[http://%2Fvar%2Frun%2Fcilium%2Fhealth.sock/v1beta/status/probe":](http://%2Fvar%2Frun%2Fcilium%2Fhealth.sock/v1beta/status/probe) dial unix /var/run/cilium/health.sock: connect: no such file or directory
	 
	 command terminated with exit code 1

I don't believe this is related to the PR so I wanted to get your input on how to proceed.

@christarazi
Copy link
Member

It's flake #24514.

@christarazi
Copy link
Member

Let's see if the next test run can pass without flakes.

@christarazi
Copy link
Member

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 10, 2023
@danehans
Copy link
Contributor Author

@christarazi when you have a moment, can you please advise me on how to move this PR forward?

@joestringer
Copy link
Member

@danehans once the reviews are all approved from relevant code owner areas and the required CI tests are all passing, the maintainer's little helper marks the PR ready-to-merge like above. Someone then comes through and does last sanity checks and merges the PR. Thanks for the contribution!

@joestringer joestringer merged commit 12ea4bb into cilium:main Jun 14, 2023
62 of 63 checks passed
@odinuge
Copy link
Member

odinuge commented Jun 14, 2023

Hi all!

Are we sure this is fine, and tested that it indeed works for the DeletedFinalStateUnknown state?

I would expect with the current version of client-go, the object type inside cache.DeletedFinalStateUnknown in TransformToCCNP to be either *cilium_v2.CiliumClusterwideNetworkPolicy, or *types.SlimCNP. With the newest fixes in client-go (not yet in cilium), I would expect it to always be *types.SlimCNP. It seems like the current implementation with this PR won't properly handle that situation.

When applying this patch (the one below), I would expect it to pass, but it does not. Looks like pre this PR, this would work as expected, and the test would pass.

diff --git a/pkg/k8s/factory_functions_test.go b/pkg/k8s/factory_functions_test.go
index 9478107618..8f0dfa720b 100644
--- a/pkg/k8s/factory_functions_test.go
+++ b/pkg/k8s/factory_functions_test.go
@@ -1271,6 +1271,24 @@ func (s *K8sSuite) Test_TransformToCCNP(c *C) {
 			},
 			expected: true,
 		},
+		{
+			name: "delete final state unknown transformation v2",
+			args: args{
+				obj: cache.DeletedFinalStateUnknown{
+					Key: "foo",
+					Obj: &types.SlimCNP{
+						CiliumNetworkPolicy: &v2.CiliumNetworkPolicy{},
+					},
+				},
+			},
+			want: cache.DeletedFinalStateUnknown{
+				Key: "foo",
+				Obj: &types.SlimCNP{
+					CiliumNetworkPolicy: &v2.CiliumNetworkPolicy{},
+				},
+			},
+			expected: true,
+		},
 		{
 			name: "delete final state unknown transformation",
 			args: args{

With this PR we get the following err;

    factory_functions_test.go:1331:
        ... obtained *errors.errorString = &errors.errorString{s:"unknown object type *types.SlimCNP"} ("unknown object type *types.SlimCNP")
        ... expected = nil
        ...   any(
        +       e"unknown object type *types.SlimCNP",
          )

When applying the following diff the tests pass again;

diff --git a/pkg/k8s/factory_functions.go b/pkg/k8s/factory_functions.go
index 58a6cef2ef..9e8225634b 100644
--- a/pkg/k8s/factory_functions.go
+++ b/pkg/k8s/factory_functions.go
@@ -573,6 +573,11 @@ func TransformToCCNP(obj interface{}) (interface{}, error) {
 			},
 		}, nil
 	case cache.DeletedFinalStateUnknown:
+		_, isSlim := concreteObj.Obj.(*types.SlimCNP)
+		if isSlim {
+			return concreteObj, nil
+		}
+
 		ccnp, ok := concreteObj.Obj.(*cilium_v2.CiliumClusterwideNetworkPolicy)
 		if !ok {
 			return nil, fmt.Errorf("unknown object type %T", concreteObj.Obj)

Ooor am I missing something..? been a few weeks since I was deep into this

@odinuge
Copy link
Member

odinuge commented Jun 14, 2023

cc @joestringer @danehans @christarazi ^

@odinuge
Copy link
Member

odinuge commented Jun 14, 2023

Also, feel free to ping me on PRs around this part of client-go, given I re(wrote) some of it a few months ago to fix some policy leaks we saw in cilium 😄

@danehans
Copy link
Contributor Author

danehans commented Jun 14, 2023

@odinuge thanks for the feedback. I think I see the issue that you describe. With this PR, nil is being returned instead of the provided obj when obj is not a CiliumClusterwideNetworkPolicy or a DeletedFinalStateUnknown with a CiliumClusterwideNetworkPolicy.

If obj is a SlimCNP, that is inconsistent with the godocs for this function (prior to this PR) and inconsistent with other TransformXXX functions. Why call TransformXXX() if the obj is already a SlimXXX? Instead, pass a nil transformer when calling any of the New informer functions.

Should we allow unexpected objects to be returned to the caller or should all TransformXXX be updated to return the provided obj and a nil error if obj is a SlimXXX?

@odinuge
Copy link
Member

odinuge commented Jun 14, 2023

Thanks for looking into this @danehans!

Yeah, the old godocs were pretty ambiguous, and the k8s behavior in client-go that the informer mimics, was pretty inconsistent previously. It still has some legacy it has to deal with, and i think thats the reason for the current function . I have also spent a lot of time wrapping my head around this. So I very much agree this is a bit confusing 😅

I think it all boils down to this;

var err error
if obj, err = transformer(d.Object); err != nil {
return err
}

There, we always call the transform funcs, just like client-go. So even if the object is an DeletedFinalStateUnknown, we run the transformer.

I should also have been explicit that my comment applies equally to all those transformer functions that return a different type than they receive; not only the CCNP one. 😄

Should we maybe open a new issue to track this? What do you think @danehans? If you to implement a fix here, I'm happy to review! 😄 (tho. I'm heading out of the office for the next two weeks 🌴 🇳🇴 )

@danehans
Copy link
Contributor Author

The code above that you reference is within a if transformer != nil {, hence my previous comment about callers not setting the transformer if transformation is unneeded. For backward compatibility purposes, I'll update each func to check for the appropriate SlimXX object and return the object without transformation and a nil error.

@odinuge
Copy link
Member

odinuge commented Jun 14, 2023

The code above that you reference is within a if transformer != nil {, hence my previous comment about callers not setting the transformer if transformation is unneeded.

But for eg CCNPs, we do want to convert the objects, so we have to add the/an converter func to the informer for that (and the other) resources.

For backward compatibility purposes, I'll update each func to check for the appropriate SlimXX object and return the object without transformation and a nil error.

We will continue to need it. When we switch to use the client-go deltaFifo transformer config, we should be able to rewrite the converter funcs to return the input value directly, together with no error, when the input is of the type DeletedFinalStateUnknown. There is some docs Daniel wrote about how it works inside the deltaFIFO here.

@danehans
Copy link
Contributor Author

I assigned myself #26240 and will push a PR to fix it.

@odinuge
Copy link
Member

odinuge commented Jun 14, 2023

Thanks @danehans!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/community-contribution This was a contribution made by a community member. kind/enhancement This would improve or streamline existing functionality. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact. sig/k8s Impacts the kubernetes API, or kubernetes -> cilium internals translation layers.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make use of the TransFunc introduced in k8s 1.27
9 participants