CI: Conformance Runtime (ci-runtime): Runtime (agent): DNS proxy policy works if Cilium stops #33249

Closed
gandro opened this issue Jun 19, 2024 · 3 comments · Fixed by #33272
Labels: area/CI (Continuous Integration testing issue or flake) · ci/flake (This is a known failure that occurs in the tree. Please investigate me!) · release-blocker/1.16 (This issue will prevent the release of the next version of Cilium.)

gandro (Member) commented Jun 19, 2024

CI failure


/home/runner/work/cilium/cilium/test/ginkgo-ext/scopes.go:515
07:49:15 STEP: Setting up policy: /root/go/src/github.com/cilium/cilium/test/policy_644568a1.json
07:49:16 STEP: Setting up policy: /root/go/src/github.com/cilium/cilium/test/policy_a304e84d.json
07:49:17 STEP: Curl from "app1" to "http://world1.cilium.test/"
07:49:17 STEP: Curl from "app1" to "http://world1.outside.test/" should fail
07:49:22 STEP: Dumping IP cache before Cilium is stopped
Local scope identities in IP cache before Cilium restart: map[172.18.0.2/32:16777223 192.168.1.42/32:16777222 fe80::/10:16777219 fe80::1/128:16777220 fe80::2/128:16777221]
07:49:23 STEP: Stopping Cilium
cilium.test selectors before restart: 16777218
16777223

07:49:23 STEP: Testing connectivity from "app1" to the IP "172.18.0.2" without DNS request
07:49:23 STEP: Curl from "app1" to "http://world1.cilium.test/" with Cilium down
07:49:28 STEP: Testing that invalid traffic is still block when Cilium is down%!(EXTRA string=app1, string=172.18.0.7)
07:49:33 STEP: Starting Cilium again
07:49:33 STEP: Restarting Cilium
07:49:45 STEP: Dumping IP cache after Cilium is restarted
Local scope identities in IP cache after Cilium restart: map[172.18.0.2/32:16777223 192.168.1.42/32:16777222 fe80::/10:16777219 fe80::1/128:16777220 fe80::2/128:16777221]
07:49:45 STEP: Setting up policy: /root/go/src/github.com/cilium/cilium/test/policy_c64253b9.json
cilium.test selectors after restart: 16777218
16777224

FAIL: Expected
    <string>: 16777218\n16777224\n
to equal
    <string>: 16777218\n16777223\n
=== Test Finished at 2024-06-19T07:49:45Z====

• Failure [41.399 seconds]
RuntimeAgentFQDNPolicies
/home/runner/work/cilium/cilium/test/ginkgo-ext/scopes.go:461
  DNS proxy policy works if Cilium stops [It]
  /home/runner/work/cilium/cilium/test/ginkgo-ext/scopes.go:515

  Expected
      <string>: 16777218\n16777224\n
  to equal
      <string>: 16777218\n16777223\n

  /home/runner/work/cilium/cilium/test/runtime/fqdn.go:1139

logs_25069779201.zip

@gandro gandro added area/CI Continuous Integration testing issue or flake ci/flake This is a known failure that occurs in the tree. Please investigate me! labels Jun 19, 2024
@gandro gandro self-assigned this Jun 19, 2024
gandro (Member, Author) commented Jun 19, 2024

gandro added the release-blocker/1.16 label Jun 19, 2024
gandro (Member, Author) commented Jun 19, 2024

I think the following is happening right now:

  1. We do a lookup for world1.cilium.test, and it returns exactly one IP per IP family. We allocate one FQDN identity (fqdn:world1.cilium.test,reserved:world-ipvN) per family.
  2. We shut down Cilium before it manages to checkpoint the latest FQDN identities.
  3. Cilium comes up again, sees the IP(s) in the IPCache, and withholds their numeric identity. Because the identity has no labels and is unique to the IP, Cilium adds a CIDR label, so the IP is added to the new cache as a "restored" CIDR identity.
  4. FQDN policy is re-imported (after endpoint regeneration!). Because this adds an fqdn:world1.cilium.test label to the IP, its label set changes and we allocate a new numeric identity for it.
  5. The test checks the numeric identities and fails because they changed.
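The sequence above can be sketched with a toy allocator. This is a minimal model under stated assumptions, not Cilium's implementation: the `identityAllocator` type and its `restore`/`allocate` methods are hypothetical, labels are simplified to plain strings (the reserved:world-ipvN label is left out), and only the IPv4 address 172.18.0.2 from the log is modeled. It shows why attaching an extra label to an IP whose identity was restored under a CIDR-only label set produces a new numeric identity.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// identityAllocator is a hypothetical toy model of local-scope identity
// allocation, not Cilium's real allocator: every distinct label set maps
// to exactly one numeric identity.
type identityAllocator struct {
	next     uint32            // next numeric identity to hand out
	byLabels map[string]uint32 // canonical label set -> numeric identity
}

func newIdentityAllocator(next uint32) *identityAllocator {
	return &identityAllocator{next: next, byLabels: map[string]uint32{}}
}

// canonical sorts a copy of the labels so that label order does not matter.
func canonical(labels []string) string {
	sorted := append([]string(nil), labels...)
	sort.Strings(sorted)
	return strings.Join(sorted, ",")
}

// restore binds a label set to a withheld numeric identity, modeling a
// restarted agent reusing the identity it found in the IPCache.
func (a *identityAllocator) restore(labels []string, withheld uint32) uint32 {
	a.byLabels[canonical(labels)] = withheld
	if withheld >= a.next {
		a.next = withheld + 1
	}
	return withheld
}

// allocate returns the identity already bound to this exact label set, or
// hands out a fresh one if the label set has not been seen before.
func (a *identityAllocator) allocate(labels []string) uint32 {
	k := canonical(labels)
	if id, ok := a.byLabels[k]; ok {
		return id
	}
	id := a.next
	a.next++
	a.byLabels[k] = id
	return id
}

func main() {
	// Step 1: before the restart, 172.18.0.2 carried the FQDN identity
	// 16777223 (value taken from the CI log).
	const fqdnID uint32 = 16777223

	// Steps 2-3: no checkpointed FQDN labels survive the restart, so the IP
	// is restored under a CIDR label only, keeping the withheld identity.
	// Local-scope identities start at 1<<24 in this toy model.
	a := newIdentityAllocator(1 << 24)
	restoredID := a.restore([]string{"cidr:172.18.0.2/32"}, fqdnID)

	// Step 4: the FQDN policy re-import adds an fqdn label to the same IP.
	// The label set is now different, so a new numeric identity is allocated.
	newID := a.allocate([]string{"cidr:172.18.0.2/32", "fqdn:world1.cilium.test"})

	fmt.Println(restoredID, newID) // 16777223 16777224
}
```

In this model, if the restored label set had already matched the one requested during the policy re-import, `allocate` would find it and return 16777223 unchanged, which is roughly the outcome the check-pointing is meant to guarantee.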

gandro (Member, Author) commented Jun 19, 2024

The "good" news is that this identity change shouldn't actually cause drops, because it happens after restoration, so this is mainly a false-positive test failure. Still, we want to fix either the test or the check-pointing (or both).
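On the test side, one possible direction, sketched here as an assumption and not as the change made in #33272, is to parse the selector output into sets and report exactly which identities were removed or added, instead of failing on raw string equality. The `identitySet` and `diffIdentities` helpers are hypothetical; the input strings are the ones from the failure output in this issue.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// identitySet parses newline-separated numeric identities, as printed in
// the "cilium.test selectors" output, into a set. Blank lines are ignored.
func identitySet(out string) map[string]bool {
	set := map[string]bool{}
	for _, line := range strings.Split(out, "\n") {
		if id := strings.TrimSpace(line); id != "" {
			set[id] = true
		}
	}
	return set
}

// diffIdentities returns identities present only before the restart
// (removed) and only after it (added), each sorted lexicographically so
// the output is stable.
func diffIdentities(before, after string) (removed, added []string) {
	b, a := identitySet(before), identitySet(after)
	for id := range b {
		if !a[id] {
			removed = append(removed, id)
		}
	}
	for id := range a {
		if !b[id] {
			added = append(added, id)
		}
	}
	sort.Strings(removed)
	sort.Strings(added)
	return removed, added
}

func main() {
	// Selector outputs before and after the restart, copied from the CI log.
	removed, added := diffIdentities("16777218\n16777223\n", "16777218\n16777224\n")
	fmt.Printf("removed=%v added=%v\n", removed, added) // removed=[16777223] added=[16777224]
}
```

For this flake, the diff points directly at the restored identity 16777223 being replaced by 16777224, which matches step 4 of the analysis, while 16777218 is unaffected.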
