CI: RuntimeFQDNPolicies Enforces ToFQDNs policy: Docker containers mistmach with Cilium Endpoints #14463

Closed
tklauser opened this issue Dec 21, 2020 · 6 comments · Fixed by #16068
Labels: area/CI (Continuous Integration testing issue or flake), ci/flake (This is a known failure that occurs in the tree. Please investigate me!)


@tklauser

Seen on #14460

/home/jenkins/workspace/Cilium-PR-Runtime-4.9/runtime-gopath/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:423
Docker containers mistmach with Cilium Endpoints
Expected
    <*errors.errorString | 0xc0039f2ec0>: {
        s: "ContainerID eb9a28f3fc0178cd189694a92fcba479009112b9f00d7b13873102edf5a5d759 is not present in the endpoint list",
    }
to be nil
/home/jenkins/workspace/Cilium-PR-Runtime-4.9/runtime-gopath/src/github.com/cilium/cilium/vendor/github.com/onsi/ginkgo/internal/leafnodes/runner.go:64

test_results_Cilium-PR-Runtime-4.9_3104_BDD-Test-PR.zip

https://jenkins.cilium.io/job/Cilium-PR-Runtime-4.9/3104/

tklauser added the area/CI and ci/flake labels on Dec 21, 2020
@jrajahalme

@pchaigno

pchaigno commented Feb 3, 2021

Hit again at #14712.

@jrajahalme

Again in #15458

@jrajahalme

Hit again in #15725

@pchaigno

pchaigno commented May 3, 2021

Analysis

The function ValidateEndpointsAreCorrect is failing here because one of the Docker containers is missing from the cilium endpoint list output. This function compares container IDs, and the container with Cilium endpoint ID 3158 doesn't (yet?) have a container ID in cilium endpoint list.
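
To make the failure mode concrete, here is a minimal sketch of the kind of comparison described above. It is not the actual test helper: the Endpoint type, the missingContainer function, and the container names are hypothetical, and only the error text is taken from the failure output. An endpoint whose container ID is still empty cannot match any Docker container, so the check fails for that container.

```go
package main

import "fmt"

// Endpoint is a hypothetical, trimmed-down view of one entry in the
// endpoint list. ContainerID stays empty until the agent learns it.
type Endpoint struct {
	ID          int
	ContainerID string
}

// missingContainer returns an error for the first Docker container ID
// that no endpoint in the list reports, or nil if all are present.
func missingContainer(containerIDs []string, endpoints []Endpoint) error {
	known := make(map[string]struct{}, len(endpoints))
	for _, ep := range endpoints {
		if ep.ContainerID != "" {
			known[ep.ContainerID] = struct{}{}
		}
	}
	for _, id := range containerIDs {
		if _, ok := known[id]; !ok {
			return fmt.Errorf("ContainerID %s is not present in the endpoint list", id)
		}
	}
	return nil
}

func main() {
	// Placeholder IDs for illustration only.
	endpoints := []Endpoint{
		{ID: 3172, ContainerID: "container-a"},
		{ID: 3158}, // container ID not assigned yet
	}
	fmt.Println(missingContainer([]string{"container-a", "container-b"}, endpoints))
}
```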

If we take another endpoint with a container-id, for example 3172, we can see in Cilium logs that the container-id is only assigned some time after the endpoint creation:

Dec 21 01:58:14 runtime cilium-agent[10828]: level=info msg="New endpoint" containerID= datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=3172 ipv4= ipv6= k8sPodName=/ subsys=endpoint
Dec 21 01:58:14 runtime cilium-agent[10828]: level=info msg="Resolving identity labels (blocking)" containerID= datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=3172 identityLabels="reserved:init" ipv4= ipv6= k8sPodName=/ subsys=endpoint
Dec 21 01:58:14 runtime cilium-agent[10828]: level=info msg="Identity of endpoint changed" containerID= datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=3172 identity=5 identityLabels="reserved:init" ipv4= ipv6= k8sPodName=/ oldIdentity="no identity" subsys=endpoint
[...]
Dec 21 01:58:48 runtime cilium-agent[10828]: level=info msg="API request released by rate limiter" burst=3 limit=0.20/s maxWaitDuration=15s maxWaitDurationLimiter=14.999958083s name=endpoint-patch parallelRequests=2 subsys=rate uuid=7ec2173f-263d-4afb-9282-4b940cd4f879 waitDurationLimiter=3.165267985s waitDurationTotal=3.16572926s
Dec 21 01:58:48 runtime cilium-agent[10828]: level=info msg="Patch endpoint request" addressing="<nil>" containerID=5a02a0b129c8ecf551e588ec9e01d206eed1dcab0ace45a3e33be0e29a314540 datapathConfiguration="<nil>" endpointID="docker-endpoint:4a905ebfbf44e29e3aefa4ee196397cd6e74d015b22b12c41f83ad287805cdb2" interface= k8sPodName=/ labels="[container:app=test container:id.app3]" subsys=daemon
Dec 21 01:58:48 runtime cilium-agent[10828]: level=info msg="Resolving identity labels (non-blocking)" containerID= datapathPolicyRevision=10 desiredPolicyRevision=10 endpointID=3172 identity=5 identityLabels="container:app=test,container:id.app3" ipv4= ipv6= k8sPodName=/ subsys=endpoint
Dec 21 01:58:48 runtime cilium-agent[10828]: level=info msg="Identity of endpoint changed" containerID= datapathPolicyRevision=10 desiredPolicyRevision=10 endpointID=3172 identity=1306 identityLabels="container:app=test,container:id.app3" ipv4= ipv6= k8sPodName=/ oldIdentity=5 subsys=endpoint

Maybe endpoint 3158 just didn't receive its container-id yet? The fact that it still has a reserved:init identity seems to confirm that.

Solution

Now, I'm unsure if we should run ValidateEndpointsAreCorrect in a loop until it succeeds or times out, or if we should just ignore reserved:init endpoints in that function.
