CI: K8sDatapathConfig Host firewall: Managed to reach #16159

Closed
christarazi opened this issue May 14, 2021 · 7 comments
Labels
area/CI Continuous Integration testing issue or flake area/host-firewall Impacts the host firewall or the host endpoint. ci/flake This is a known failure that occurs in the tree. Please investigate me! sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale.
Projects

Comments

@christarazi
Member

christarazi commented May 14, 2021

CI failure

/home/jenkins/workspace/Cilium-PR-K8s-1.20-kernel-4.19/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:518
Managed to reach 192.168.36.12:69 from testclient-g6zfn
Expected command: kubectl exec -n 202105141738k8sdatapathconfighostfirewallwithvxlan testclient-g6zfn -- curl --path-as-is -s -D /dev/stderr --fail --connect-timeout 5 --max-time 20 tftp://192.168.36.12:69/hello -w "time-> DNS: '%{time_namelookup}(%{remote_ip})', Connect: '%{time_connect}',Transfer '%{time_starttransfer}', total '%{time_total}'"
To have failed, but it was successful:
Exitcode: 0 
Stdout:
 	 
	 Hostname: k8s2
	 
	 Request Information:
	 	client_address=10.0.0.168
	 	client_port=43811
	 	real path=/hello
	 	request_scheme=tftp
	 
	 time-> DNS: '0.000016()', Connect: '0.000031',Transfer '0.000000', total '0.006178'
Stderr:
 	 

/usr/local/go/src/runtime/asm_amd64.s:1371

https://jenkins.cilium.io/job/Cilium-PR-K8s-1.20-kernel-4.19/419/testReport/junit/Suite-k8s-1/20/K8sDatapathConfig_Host_firewall_With_VXLAN/

00898f73_K8sDatapathConfig_Host_firewall_With_VXLAN.zip

Seems to be the sibling of #15575

@christarazi christarazi added area/CI Continuous Integration testing issue or flake ci/flake This is a known failure that occurs in the tree. Please investigate me! area/host-firewall Impacts the host firewall or the host endpoint. labels May 14, 2021
@pchaigno
Member

pchaigno commented May 17, 2021

This is all a bit strange...

The Hubble capture shows policy verdicts for that connection:

$ cat /tmp/hubble.json | ./hubble observe --to-port 69 --from-ip 10.0.0.168 -t policy-verdict
May 14 17:39:18.408: 10.0.0.168:43811 -> 192.168.36.12:69 L3-Only FORWARDED (UDP)
May 14 17:39:18.408: 10.0.0.168:43811 -> 192.168.36.12:69 L3-Only FORWARDED (UDP)
Verbose JSON output:
{
  "time": "2021-05-14T17:39:18.408304832Z",
  "verdict": "FORWARDED",
  "ethernet": {
    "source": "5e:bc:77:59:a5:10",
    "destination": "5a:bd:cf:ff:c5:16"
  },
  "IP": {
    "source": "10.0.0.168",
    "destination": "192.168.36.12",
    "ipVersion": "IPv4"
  },
  "l4": {
    "UDP": {
      "source_port": 43811,
      "destination_port": 69
    }
  },
  "source": {
    "identity": 2,
    "labels": [
      "reserved:world"
    ]
  },
  "destination": {
    "identity": 1,
    "labels": [
      "reserved:host"
    ]
  },
  "Type": "L3_L4",
  "node_name": "k8s2",
  "event_type": {
    "type": 5
  },
  "traffic_direction": "INGRESS",
  "policy_match_type": 1,
  "is_reply": false,
  "Summary": "UDP"
}
{
  "time": "2021-05-14T17:39:18.408305735Z",
  "verdict": "FORWARDED",
  "ethernet": {
    "source": "5e:bc:77:59:a5:10",
    "destination": "5a:bd:cf:ff:c5:16"
  },
  "IP": {
    "source": "10.0.0.168",
    "destination": "192.168.36.12",
    "ipVersion": "IPv4"
  },
  "l4": {
    "UDP": {
      "source_port": 43811,
      "destination_port": 69
    }
  },
  "source": {
    "identity": 2,
    "labels": [
      "reserved:world"
    ]
  },
  "destination": {
    "identity": 1,
    "labels": [
      "reserved:host"
    ]
  },
  "Type": "L3_L4",
  "node_name": "k8s2",
  "event_type": {
    "type": 5
  },
  "traffic_direction": "INGRESS",
  "policy_match_type": 1,
  "is_reply": false,
  "Summary": "UDP"
}

Two things are weird here:

  1. The source identity should be transmitted via the tunnel metadata, so I would expect it to be the pod's identity. Maybe the pod hadn't received its identity yet, but in that case I would expect unmanaged, not world.
  2. It says it matched on an L3 + L4 rule, but if the connection was allowed because of the world source identity, it should match on an L3 rule only. AFAIK, we don't have any policy allowing port 69 (since it should be denied).

The first one is very likely the reason for the flake. Unfortunately, we don't have the Hubble flows on the source node:

requested data has been overwritten and is no longer available

If we had that, we could check the resolved source identity for those packets and maybe understand where world is coming from.
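For anyone retracing this, here is a minimal sketch of the check described above: scan exported Hubble flows and flag those to port 69 whose source IP looks like a pod IP but whose source identity is `reserved:world` (2). It assumes one compact JSON flow object per line, with the same field names as the verbose output above. The function name `flows_with_world_source` and the default pod CIDR `10.0.0.0/8` are hypothetical, chosen only for illustration.

```python
import ipaddress
import json

WORLD_IDENTITY = 2  # reserved:world, as reported in the flows above


def flows_with_world_source(lines, pod_cidr="10.0.0.0/8", port=69):
    """Yield UDP flows to `port` whose source IP is inside `pod_cidr`
    but whose source identity was resolved as reserved:world.

    `pod_cidr` is an assumption; adjust it to the cluster's pod CIDR.
    """
    network = ipaddress.ip_network(pod_cidr)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        flow = json.loads(line)
        udp = flow.get("l4", {}).get("UDP", {})
        if udp.get("destination_port") != port:
            continue
        src_ip = flow.get("IP", {}).get("source")
        if src_ip is None or ipaddress.ip_address(src_ip) not in network:
            continue
        if flow.get("source", {}).get("identity") == WORLD_IDENTITY:
            yield flow
```

Feeding it the lines of the exported capture (for example, an open file handle on `/tmp/hubble.json`, if that file holds one flow per line) would list the candidate flows to inspect.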

@pchaigno
Member

Taking another sysdump gets us a bit further:
https://jenkins.cilium.io/job/Cilium-PR-K8s-1.19-kernel-5.4/103/testReport/junit/Suite-k8s-1/19/K8sDatapathConfig_Host_firewall_With_VXLAN/ with sysdump fb9445d1_K8sDatapathConfig_Host_firewall_With_VXLAN.7z.zip.

On the destination node:

{
  "time": "2021-05-17T11:05:50.333292524Z",
  "verdict": "FORWARDED",
  "ethernet": {
    "source": "3a:9f:09:8d:41:92",
    "destination": "a6:8f:dc:66:a2:d3"
  },
  "IP": {
    "source": "10.0.1.242",
    "destination": "192.168.36.12",
    "ipVersion": "IPv4"
  },
  "l4": {
    "UDP": {
      "source_port": 38673,
      "destination_port": 69
    }
  },
  "source": {
    "identity": 2,
    "labels": [
      "reserved:world"
    ]
  },
  "destination": {
    "identity": 1,
    "labels": [
      "reserved:host"
    ]
  },
  "Type": "L3_L4",
  "node_name": "k8s2",
  "event_type": {
    "type": 5
  },
  "traffic_direction": "INGRESS",
  "policy_match_type": 1,
  "Summary": "UDP"
}

On the source node:

{
  "time": "2021-05-17T11:05:50.333832826Z",
  "verdict": "FORWARDED",
  "ethernet": {
    "source": "2a:4c:7d:b7:8a:cd",
    "destination": "86:99:45:40:69:54"
  },
  "IP": {
    "source": "10.0.1.242",
    "destination": "192.168.36.12",
    "ipVersion": "IPv4"
  },
  "l4": {
    "UDP": {
      "source_port": 38673,
      "destination_port": 69
    }
  },
  "source": {
    "ID": 1773,
    "identity": 7367,
    "namespace": "202105171105k8sdatapathconfighostfirewallwithvxlan",
    "labels": [
      "k8s:io.cilium.k8s.policy.cluster=default",
      "k8s:io.cilium.k8s.policy.serviceaccount=default",
      "k8s:io.kubernetes.pod.namespace=202105171105k8sdatapathconfighostfirewallwithvxlan",
      "k8s:zgroup=testClient"
    ],
    "pod_name": "testclient-vd92d"
  },
  "destination": {
    "identity": 6,
    "labels": [
      "reserved:remote-node"
    ]
  },
  "Type": "L3_L4",
  "node_name": "k8s1",
  "event_type": {
    "type": 4,
    "sub_type": 4
  },
  "trace_observation_point": "TO_OVERLAY",
  "Summary": "UDP"
}

So the source identity (7367) is somehow lost between the two nodes, even though it was set in the tunnel metadata (which happens just before the TRACE_TO_OVERLAY observation point).
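The comparison above can be sketched as a small script that pairs flows from the two nodes by 5-tuple and reports where the source identity differs. This is only an illustration under the assumption that flows are already parsed into dictionaries shaped like the JSON above; the helper names `flow_key` and `identity_mismatches` are hypothetical.

```python
def flow_key(flow):
    """Build a 5-tuple key: (src IP, dst IP, protocol, src port, dst port)."""
    ip = flow["IP"]
    # The "l4" object holds a single protocol entry, e.g. {"UDP": {...}}.
    proto, ports = next(iter(flow["l4"].items()))
    return (
        ip["source"],
        ip["destination"],
        proto,
        ports.get("source_port"),
        ports.get("destination_port"),
    )


def identity_mismatches(source_node_flows, destination_node_flows):
    """Return (key, identity on source node, identity on destination node)
    for every connection whose source identity differs between nodes."""
    seen = {}
    for flow in source_node_flows:
        seen[flow_key(flow)] = flow["source"].get("identity")

    mismatches = []
    for flow in destination_node_flows:
        key = flow_key(flow)
        dst_view = flow["source"].get("identity")
        if key in seen and seen[key] != dst_view:
            mismatches.append((key, seen[key], dst_view))
    return mismatches
```

Applied to the two flows quoted above, this would report the connection 10.0.1.242:38673 -> 192.168.36.12:69 with identity 7367 on k8s1 and identity 2 on k8s2.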

@stale

stale bot commented Jul 21, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Jul 21, 2021
@christarazi christarazi removed the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Jul 21, 2021
@brb brb added the sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. label Feb 17, 2022
@github-actions

This comment was marked as outdated.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Jul 9, 2022
@pchaigno pchaigno removed the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Jul 9, 2022
@pchaigno pchaigno changed the title from "CI: K8sDatapathConfig Host firewall With VXLAN: Managed to reach" to "CI: K8sDatapathConfig Host firewall: Managed to reach" Aug 16, 2022
@github-actions

This comment was marked as resolved.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Oct 16, 2022
@pchaigno pchaigno removed the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Oct 16, 2022
@github-actions

This issue has been automatically marked as stale because it has not
had recent activity. It will be closed if no further activity occurs.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Dec 16, 2022
@github-actions

This issue has not seen any activity since it was marked stale.
Closing.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Dec 30, 2022
3 participants