Intermittent report of dropped logs from hubble. #15445

Closed
gtsjamesbond opened this issue Mar 24, 2021 · 29 comments
Labels
kind/bug: This is a bug in the Cilium logic.
kind/community-report: This was reported by a user in the Cilium community, e.g. via Slack.
need-more-info: More information is required to further debug or fix the issue.
needs/triage: This issue requires triaging to establish severity and next steps.
sig/hubble: Impacts hubble server or relay.
sig/policy: Impacts whether traffic is allowed or denied based on user-defined policies.
stale: The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale.

Comments

@gtsjamesbond

Hello,
We've been noticing some unusual behavior with Cilium/Hubble and we're curious whether anyone else has seen anything similar. Essentially, we've written network policies that allow traffic to a particular CIDR block/port, but we're seeing intermittent drop logs from Hubble with drop reason 133 (Policy denied) where we know the traffic should be allowed (and usually is allowed). However, we haven't actually noticed any service interruption due to those drops, so we're unsure whether these are real intermittent Cilium drops or whether Hubble might be falsely reporting them. Has anyone seen anything similar, and any ideas whether the cause might be false reporting by Hubble or real intermittent drops by Cilium? We're running Cilium v1.8.6 on Kubernetes 1.18.

here’s an example of a drop:

"attributes": {
    "source": {
        "namespace": "studio-live",
        "identity": 42150,
        "ID": 37,
        "labels": [
            "k8s:app.kubernetes.io/managed-by=spinnaker",
            "k8s:app.kubernetes.io/name=baas",
            "k8s:app=baas-config",
            "k8s:io.cilium.k8s.namespace.labels.field.cattle.io/projectId=p-v9rth",
            "k8s:io.cilium.k8s.policy.cluster=default",
            "k8s:io.cilium.k8s.policy.serviceaccount=default",
            "k8s:io.kubernetes.pod.namespace=studio-live",
            "k8s:studio-context=live"
        ],
        "pod_name": "baas-config-d84bc885f-8s564"
    },
    "IP": {
        "ipVersion": "IPv4",
        "destination": "10.8.2.59",
        "source": "10.8.1.121"
    },
    "l4": {
        "TCP": {
            "destination_port": 26257,
            "source_port": 48794,
            "flags": {
                "PSH": true,
                "ACK": true
            }
        }
    },
    "destination": {
        "identity": 2,
        "labels": [
            "reserved:world"
        ]
    },
    "node_name": "ip-10-8-1-11.us-east-2.compute.internal",
    "traffic_direction": "EGRESS",
    "ethernet": {
        "destination": "2e:cf:00:b3:44:3c",
        "source": "86:4f:19:2d:9b:ef"
    },
    "Type": "L3_L4",
    "event_type": {
        "sub_type": 133,
        "type": 1
    },
    "verdict": "DROPPED",
    "Summary": "TCP Flags: ACK, PSH",
    "time": "2021-03-19T18:08:08.011207109Z",
    "drop_reason": 133
}

policy:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  annotations:
    artifact.spinnaker.io/location: studio-live
    artifact.spinnaker.io/name: network-config
    artifact.spinnaker.io/type: kubernetes/CiliumNetworkPolicy.cilium.io
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"cilium.io/v2","kind":"CiliumNetworkPolicy", [...]}
    moniker.spinnaker.io/application: baas
    moniker.spinnaker.io/cluster: CiliumNetworkPolicy.cilium.io network-config
  creationTimestamp: "2021-01-24T15:04:42Z"
  generation: 4
  labels:
    app.kubernetes.io/managed-by: spinnaker
    app.kubernetes.io/name: baas
  managedFields:
  - apiVersion: cilium.io/v2
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:artifact.spinnaker.io/location: {}
          f:artifact.spinnaker.io/name: {}
          f:artifact.spinnaker.io/type: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
          f:moniker.spinnaker.io/application: {}
          f:moniker.spinnaker.io/cluster: {}
        f:labels:
          .: {}
          f:app.kubernetes.io/managed-by: {}
          f:app.kubernetes.io/name: {}
      f:spec:
        .: {}
        f:egress: {}
        f:endpointSelector:
          .: {}
          f:matchLabels:
            .: {}
            f:app: {}
    manager: kubectl
    operation: Update
    time: "2021-03-06T23:00:22Z"
  name: network-config
  namespace: studio-live
  resourceVersion: "64708762"
  selfLink: /apis/cilium.io/v2/namespaces/studio-live/ciliumnetworkpolicies/network-config
  uid: af4b7347-921f-436d-ab8f-102933da6bf7
spec:
  egress:
  - toCIDRSet:
    - cidr: 10.0.0.0/8
  - toCIDR:
    - 169.254.169.254/32
    - 184.32.0.0/16
  - toFQDNs:
    - matchName: bondtech-docker-local.jfrog.io
    - matchPattern: baas-*.*.*.rds.amazonaws.com
[...]
  - toPorts:
    - ports:
      - port: "53"
        protocol: ANY
      rules:
        dns:
        - matchPattern: '*'
    - ports:
      - port: "8125"
        protocol: UDP
    - ports:
      - port: "6000"
        protocol: TCP
      - port: "7000"
        protocol: TCP
  - toEndpoints:
    - matchLabels:
        app: baas-auth
    - matchLabels:
        k8s:app: cockroachdb
        k8s:io.kubernetes.pod.namespace: crdb-dev3
[...]
  endpointSelector:
    matchLabels:
      app: baas-config

How to reproduce the issue

  1. Intermittent. This seems to happen randomly, so we do not have a concrete way to reproduce it at will. We asked in the Slack channel, and after initial triage with @pchaigno and @aditighag the recommendation was to gather a sysdump and post a bug report for further review.
    cilium-sysdump-prod2-20210324-012008.zip
    cilium-sysdump-prod3-20210324-012151.zip
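
One way to capture these drop events as structured records for later correlation is to query Hubble from inside a Cilium agent pod. A minimal sketch, assuming Hubble is enabled in the agents; the pod name is a placeholder:

$ kubectl -n kube-system get pods -l k8s-app=cilium        # list the agent pods, pick the one on the affected node
$ kubectl -n kube-system exec cilium-xxxxx -- hubble observe --type drop --verdict DROPPED -o jsonpb

The jsonpb output includes drop_reason and the resolved source/destination identities, the same fields shown in the flow above.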
@pchaigno (Member)

What is the second sysdump (prod3)?

Was 10.8.2.59 associated with a pod at any point? In the FQDN cache, I see that cockroachdb.crdb-prod2.svc.cluster.local. resolved to 10.8.2.59. If it was associated with a pod, was that pod matched by:

      - matchLabels:
          k8s:app: cockroachdb
          k8s:io.kubernetes.pod.namespace: crdb-prod2

Were the policy drops happening when the sysdump was taken?
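
A quick way to check how the agent currently maps that IP is to look at the FQDN cache and the datapath ipcache from inside the Cilium agent pod on the node that reported the drop; a sketch, assuming the agent CLI output is similar in v1.8:

$ cilium fqdn cache list | grep 10.8.2.59     # names that resolved to this IP
$ cilium bpf ipcache get 10.8.2.59            # security identity the datapath maps the IP to

If the ipcache has no entry for the pod IP at the time of the drop, the destination is typically classified as reserved:world, which would match the identity 2 Hubble reports.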

@gtsjamesbond (Author)

Hey @pchaigno
Sorry for the confusion. We have two clusters running Cilium and we were/are seeing the drops across both. The drops we see logged are mainly to CockroachDB, an RDS database, and an external FQDN. 10.8.2.59 was assigned to one of our CockroachDB pods, but I believe it has since been cycled/replaced. The CockroachDB pod should have matched that label. If there is any additional clarification/information that we need to provide, please let me know.

Regards.

@pchaigno (Member)

10.8.2.59 was assigned to one of our CockroachDB pods, but I believe it has since been cycled/replaced.

When was the sysdump collected? Is the following series of events correct?

  1. Drops are happening.
  2. Pod 10.8.2.59 is deleted.
  3. Sysdump is collected.

It would be nice to get a sysdump before the pod is deleted, ideally while the drops are still happening.
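
To catch the drops live on the affected node before the pod is recycled, one option is to watch the agent's drop notifications while the sysdump is being collected; a sketch, with a placeholder agent pod name:

$ kubectl -n kube-system exec cilium-xxxxx -- cilium monitor --type drop

cilium monitor prints the drop reason and the numeric source/destination identities as the datapath sees them at the moment of the drop, which can then be compared against the policy and the ipcache.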

@gtsjamesbond (Author)

Hey @pchaigno,
I believe that sequence is correct. That just happened to be the initial request that we opened our Slack conversation with. We can attempt to capture a more recent failure and sysdump, as we see these frequently enough. I'll post an update with the new information.

Regards.

@aditighag (Member) commented Mar 24, 2021

@gtsjamesbond Just to confirm, there are no real packet drops, only Hubble-reported drops, correct? I think this context was missing from the Slack conversation. I'm not sure we need the sig/policy label on this issue if that's the case.

@gtsjamesbond (Author)

@aditighag This is one thing that we were unsure of: we are getting the hubble reports of drops, BUT the application is behaving as expected (like there are no drops), so this is leading us to believe something is not correct, or we are misinterpreting the meaning of the drop reports. This is one of the things we wanted to verify, namely whether the drop reports that we are seeing are in fact accurate or false reports. If they are false reports, we would like to determine why, to see if there is any way to "tune these out" so that we aren't alerted if we don't need to be.

Regards.

@gtsjamesbond (Author)

@pchaigno @aditighag
Below is a recent drop
{
    "destination": {
        "ID": 549,
        "identity": 20476,
        "labels": [
            "k8s:app.kubernetes.io/managed-by=spinnaker",
            "k8s:app.kubernetes.io/name=baas",
            "k8s:app=baas-auth",
            "k8s:io.cilium.k8s.namespace.labels.field.cattle.io/projectId=p-f78q9",
            "k8s:io.cilium.k8s.policy.cluster=default",
            "k8s:io.cilium.k8s.policy.serviceaccount=default",
            "k8s:io.kubernetes.pod.namespace=studio-sandbox",
            "k8s:studio-context=sandbox"
        ]
    }
}

$ kubectl get pods -A -o wide | grep -i "10.8.4.89"
studio-sandbox   baas-auth-58db55c49-tb2x6
I have attached a current sysdump.
cilium-sysdump-20210324-135815.zip

Regards.

@aditighag (Member)

we are getting the hubble reports of drops, BUT the application is behaving as expected (like there are no drops)

So if pod1 on node1 is talking to pod2 on node2, you see Hubble reported drops on node1, but the packets reach node2/pod2?

@pchaigno (Member)

we are getting the hubble reports of drops, BUT the application is behaving as expected (like there are no drops), so this is leading us to believe something is not correct

If the drops are intermittent, they might not cause connectivity issues. That seems more likely than Hubble getting packet drop notifications out of nowhere 🙂

@gtsjamesbond (Author) commented Mar 24, 2021

@pchaigno I see your point. What is of more interest to our team is why we would get the drop to begin with if the target IP should be covered by one of the rules. Is what we are seeing an artifact of a network issue manifesting, or is the CNP actually causing a drop? We want to make sure that we have the proper interpretation of the drop logs that we are seeing.

We also see this intermittently with a target we have identified using FQDN. Any ideas there?

Regards.

@pchaigno (Member)

why we would get the drop to begin with if the target IP should be covered by one of the rules? Is what we are seeing an artifact of a network issue manifesting? or CNP actually causing a drop?

My guess would be that the drops are caused by some transient issue in the resolution of the security identity. The destination identity reported by Hubble is reserved:world, which is sort of a default for when we can't associate another identity.

We can attempt to capture a more recent failure and dump as we see these frequently enough.

I think that would help understand why the identity resolution failed.
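
One way to test this hypothesis during a drop window is to ask the agent what verdict it would compute for the same flow, once against the real destination pod and once against the world identity; a sketch, assuming cilium policy trace supports these flags in this version (pod names are placeholders):

$ cilium policy trace --src-k8s-pod studio-live:baas-config-xxxxx --dst-k8s-pod crdb-prod2:cockroachdb-xxxxx --dport 26257
$ cilium policy trace --src-k8s-pod studio-live:baas-config-xxxxx --dst-identity 2 --dport 26257

If the first trace is allowed and the second is denied, the drops are consistent with the destination IP temporarily losing its pod identity in the ipcache and being evaluated as reserved:world.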

@gtsjamesbond (Author) commented Mar 25, 2021

@pchaigno here are two drops logged in the last hour to different destinations, and also the sysdump


{
    "destination": {
        "identity": 2,
        "labels": [
            "reserved:world"
        ]
    },
    "destination_names": [
        "cockroachdb.crdb-prod3.svc.cluster.local"
    ],
    "drop_reason": 133,
    "ethernet": {
        "destination": "42:bb:c6:65:d5:80",
        "source": "de:3a:da:05:fb:38"
    },
    "event_type": {
        "sub_type": 133,
        "type": 1
    },
    "IP": {
        "destination": "10.8.3.79",
        "ipVersion": "IPv4",
        "source": "10.8.5.49"
    },
    "l4": {
        "TCP": {
            "destination_port": 26257,
            "flags": {
                "ACK": true,
                "PSH": true
            },
            "source_port": 58578
        }
    },
    "node_name": "ip-10-8-5-103.us-east-2.compute.internal",
    "source": {
        "ID": 786,
        "identity": 12971,
        "labels": [
            "k8s:app.kubernetes.io/managed-by=spinnaker",
            "k8s:app.kubernetes.io/name=baas",
            "k8s:app=baas-config",
            "k8s:io.cilium.k8s.namespace.labels.field.cattle.io/projectId=p-f78q9",
            "k8s:io.cilium.k8s.policy.cluster=default",
            "k8s:io.cilium.k8s.policy.serviceaccount=default",
            "k8s:io.kubernetes.pod.namespace=studio-live",
            "k8s:studio-context=live"
        ],
        "namespace": "studio-live",
        "pod_name": "baas-config-6f6d88ff9b-g8x6s"
    },
    "Summary": "TCP Flags: ACK, PSH",
    "time": "2021-03-25T15:48:36.581529800Z",
    "traffic_direction": "EGRESS",
    "Type": "L3_L4",
    "verdict": "DROPPED"
}

{
    "destination": {
        "identity": 2,
        "labels": [
            "reserved:world"
        ]
    },
    "destination_names": [
        "rabbitmq.studio-live.svc.cluster.local"
    ],
    "destination_service": {
        "name": "rabbitmq",
        "namespace": "studio-live"
    },
    "drop_reason": 133,
    "ethernet": {
        "destination": "32:73:93:80:d3:68",
        "source": "6e:f5:ae:67:4a:2e"
    },
    "event_type": {
        "sub_type": 133,
        "type": 1
    },
    "IP": {
        "destination": "172.20.244.191",
        "ipVersion": "IPv4",
        "source": "10.8.5.18"
    },
    "l4": {
        "TCP": {
            "destination_port": 5672,
            "flags": {
                "SYN": true
            },
            "source_port": 38390
        }
    },
    "node_name": "ip-10-8-5-103.us-east-2.compute.internal",
    "source": {
        "ID": 587,
        "identity": 30337,
        "labels": [
            "k8s:app.kubernetes.io/managed-by=spinnaker",
            "k8s:app.kubernetes.io/name=baas",
            "k8s:app=baas-bsa",
            "k8s:io.cilium.k8s.namespace.labels.field.cattle.io/projectId=p-f78q9",
            "k8s:io.cilium.k8s.policy.cluster=default",
            "k8s:io.cilium.k8s.policy.serviceaccount=default",
            "k8s:io.kubernetes.pod.namespace=studio-live",
            "k8s:studio-context=live"
        ],
        "namespace": "studio-live",
        "pod_name": "baas-bsa-6c5c9965f8-t8qkf"
    },
    "Summary": "TCP Flags: SYN",
    "time": "2021-03-25T15:42:58.661531335Z",
    "traffic_direction": "EGRESS",
    "Type": "L3_L4",
    "verdict": "DROPPED"
}

@pchaigno (Member)

@gtsjamesbond I think you forgot to attach the sysdump. The Hubble log is also ill-formatted.

@gtsjamesbond (Author)

@pchaigno Attaching dump again.
cilium-sysdump-20210325-114456.zip

@gtsjamesbond (Author)

@pchaigno @aditighag , were you able to gain any insight from the latest attached logs uploaded on March 25?

Regards.

@aditighag (Member)

@pchaigno @aditighag , were you able to gain any insight from the latest attached logs uploaded on March 25?

Regards.

Busy with the 1.10 release, I'll try to take a look (most likely) after the release unless someone else beats me to it.

stale bot commented Jun 3, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@errordeveloper (Contributor)

Busy with the 1.10 release, I'll try to take a look (most likely) after the release unless someone else beats me to it.

@aditighag as you showed some interest, I've assigned this to you :)

@othmane399

I have the same behavior on some pods.

@aditighag (Member)

I have the same behavior on some pods.

@othmane399 What Cilium version is this? We've had a couple of fixes around identity propagation delays (#16302) that could potentially be related. Can you check whether you have the fix?

@othmane399

@aditighag I'm on 1.10.3, so basically yes, I have the changes.

@joaoubaldo (Contributor)

I am experiencing something very similar to this. In my case, the drops reported by Hubble are real (see tests below).
Running Cilium 1.10.3 on EKS 1.20.

For context, we use Traefik (as a DaemonSet) to forward traffic to some-service Pods/Endpoints, and from time to time our metrics show an increasing number of HTTP 504s for ingresses.
When these 504s are happening, Traefik logs show messages such as:
'504 Gateway Timeout' caused by: dial tcp 10.210.145.199:3000: i/o timeout
We then confirm that the IP is assigned to a Pod:

$ kubectl get pods -o wide | grep some-service
some-service-6ccccf645d-222k9                     1/1     Running   0          3h12m   10.210.169.205   ip-10-210-184-168.eu-west-1.compute.internal   <none>           <none>
some-service-6ccccf645d-5t9qk                     1/1     Running   0          3h12m   10.210.112.82    ip-10-210-117-158.eu-west-1.compute.internal   <none>           <none>
some-service-6ccccf645d-6l4k6                     1/1     Running   0          137m    10.210.129.206   ip-10-210-128-32.eu-west-1.compute.internal    <none>           <none>
some-service-6ccccf645d-9snjn                     1/1     Running   2          3h12m   10.210.145.199   ip-10-210-147-20.eu-west-1.compute.internal    <none>           <none> <---- curling this from another pod results in DROPPED
some-service-6ccccf645d-bgpk6                     1/1     Running   0          156m    10.210.185.161   ip-10-210-184-168.eu-west-1.compute.internal   <none>           <none>
some-service-6ccccf645d-btr8l                     1/1     Running   0          3h12m   10.210.124.231   ip-10-210-117-158.eu-west-1.compute.internal   <none>           <none>
some-service-6ccccf645d-c9mdv                     1/1     Running   0          154m    10.210.164.170   ip-10-210-184-168.eu-west-1.compute.internal   <none>           <none>
some-service-6ccccf645d-gtlpj                     1/1     Running   0          125m    10.210.135.205   ip-10-210-128-32.eu-west-1.compute.internal    <none>           <none>
some-service-6ccccf645d-kqvxb                     1/1     Running   0          3h12m   10.210.166.35    ip-10-210-184-168.eu-west-1.compute.internal   <none>           <none>
some-service-6ccccf645d-ln4v5                     1/1     Running   0          125m    10.210.171.201   ip-10-210-184-168.eu-west-1.compute.internal   <none>           <none>
some-service-6ccccf645d-rfmf4                     1/1     Running   0          142m    10.210.126.142   ip-10-210-117-158.eu-west-1.compute.internal   <none>           <none>
some-service-6ccccf645d-wvc7s                     1/1     Running   2          3h11m   10.210.140.27    ip-10-210-147-20.eu-west-1.compute.internal    <none>           <none>
some-service-6ccccf645d-xfbtp                     1/1     Running   0          3h12m   10.210.109.77    ip-10-210-117-158.eu-west-1.compute.internal   <none>           <none>
some-service-6ccccf645d-xfmdq                     1/1     Running   0          136m    10.210.132.141   ip-10-210-128-32.eu-west-1.compute.internal    <none>           <none>
some-service-6ccccf645d-z7gf7                     1/1     Running   0          154m    10.210.184.122   ip-10-210-184-168.eu-west-1.compute.internal   <none>           <none>
some-service-6ccccf645d-ztm87                     1/1     Running   0          143m    10.210.116.247   ip-10-210-117-158.eu-west-1.compute.internal   <none>           <none>

Making an HTTP request to 10.210.145.199:3000 from any Traefik Pod fails, with Cilium dropping the packet. hubble observe shows:

Mar  9 14:09:02.709: traefik/traefik-dhzrk:45498 <> 10.210.145.199:3000 Policy denied DROPPED (TCP Flags: SYN)
Mar  9 14:09:02.709: traefik/traefik-dhzrk:45498 <> 10.210.145.199:3000 Policy denied DROPPED (TCP Flags: SYN)
Mar  9 14:09:04.484: traefik/traefik-dhzrk:46534 <> 10.210.145.199:3000 Policy denied DROPPED (TCP Flags: SYN)
Mar  9 14:09:04.484: traefik/traefik-dhzrk:46534 <> 10.210.145.199:3000 Policy denied DROPPED (TCP Flags: SYN)
Mar  9 14:09:05.493: traefik/traefik-dhzrk:46534 <> 10.210.145.199:3000 Policy denied DROPPED (TCP Flags: SYN)
Mar  9 14:09:05.493: traefik/traefik-dhzrk:46534 <> 10.210.145.199:3000 Policy denied DROPPED (TCP Flags: SYN)

However a request to another some-service Pod IP (e.g. 10.210.164.170) succeeds.
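
A comparison that might narrow this down, run against the Cilium agent on the node hosting the Traefik pod (a sketch; the agent pod name is a placeholder), is to dump the ipcache entry for the failing destination and for a working one, and check whether the failing IP still maps to the pod's security identity:

$ kubectl -n kube-system exec cilium-agent-on-traefik-node -- cilium bpf ipcache get 10.210.145.199   # failing pod IP
$ kubectl -n kube-system exec cilium-agent-on-traefik-node -- cilium bpf ipcache get 10.210.164.170   # working pod IP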

The relevant Cilium Endpoint

  {
    "id": 1114,
    "spec": {
      "label-configuration": {},
      "options": {
        "Conntrack": "Enabled",
        "ConntrackAccounting": "Enabled",
        "ConntrackLocal": "Disabled",
        "Debug": "Disabled",
        "DebugLB": "Disabled",
        "DebugPolicy": "Disabled",
        "DropNotification": "Enabled",
        "MonitorAggregationLevel": "Medium",
        "NAT46": "Disabled",
        "PolicyAuditMode": "Disabled",
        "PolicyVerdictNotification": "Enabled",
        "TraceNotification": "Enabled"
      }
    },
    "status": {
      "controllers": [
        {
          "configuration": {
            "error-retry": true,
            "error-retry-base": "2s"
          },
          "name": "endpoint-1114-regeneration-recovery",
          "status": {
            "last-failure-timestamp": "0001-01-01T00:00:00.000Z",
            "last-success-timestamp": "0001-01-01T00:00:00.000Z"
          },
          "uuid": "95f459bf-b865-4323-af63-1566bedabadc"
        },
        {
          "configuration": {
            "error-retry": true,
            "interval": "5m0s"
          },
          "name": "resolve-identity-1114",
          "status": {
            "last-failure-timestamp": "0001-01-01T00:00:00.000Z",
            "last-success-timestamp": "2022-03-09T12:16:18.025Z",
            "success-count": 40
          },
          "uuid": "1c4fa874-3084-47c3-9f78-682d2b2f08fa"
        },
        {
          "configuration": {
            "error-retry": true,
            "interval": "1m0s"
          },
          "name": "sync-policymap-1114",
          "status": {
            "last-failure-timestamp": "0001-01-01T00:00:00.000Z",
            "last-success-timestamp": "2022-03-09T12:18:17.522Z",
            "success-count": 1062
          },
          "uuid": "9c7beb4f-0854-4d85-bce0-64679702ff01"
        },
        {
          "configuration": {
            "error-retry": true,
            "interval": "10s"
          },
          "name": "sync-to-k8s-ciliumendpoint (1114)",
          "status": {
            "last-failure-timestamp": "0001-01-01T00:00:00.000Z",
            "last-success-timestamp": "2022-03-09T12:18:38.699Z",
            "success-count": 1186
          },
          "uuid": "67de00c4-ba2b-4cc8-9c22-a576f26a9353"
        }
      ],
      "external-identifiers": {
        "container-id": "9723d07a45d0b8891c64eb3d715d1f17c4defaef00995ddfe4df4d3e74f58fd3",
        "k8s-namespace": "some-namespace",
        "k8s-pod-name": "some-service-6ccccf645d-9snjn",
        "pod-name": "some-namespace/some-service-6ccccf645d-9snjn"
      },
      "health": {
        "bpf": "OK",
        "connected": true,
        "overallHealth": "OK",
        "policy": "OK"
      },
      "identity": {
        "id": 38850,
        "labels": [
          "k8s:serviceSelector=some-service-selector",
          "k8s:app.kubernetes.io/name=some-service",
          "k8s:io.cilium.k8s.namespace.labels.functional_area=engineering",
          "k8s:io.kubernetes.pod.namespace=some-namespace",
          "k8s:io.cilium.k8s.policy.cluster=default",
          "k8s:io.cilium.k8s.namespace.labels.team=some-namespace",
          "k8s:app.kubernetes.io/managed-by=spinnaker",
          "k8s:io.cilium.k8s.policy.serviceaccount=some-service",
          "k8s:app.kubernetes.io/instance=some-service",
          "k8s:frontedBy=some-service-svc"
        ],
        "labelsSHA256": "3613c13e2efceb441c618ffc4c04ce3044f7737db2ffe6b932d897511ddc58a1"
      },
      "labels": {
        "derived": [
          "k8s:pod-template-hash=6ccccf645d"
        ],
        "realized": {},
        "security-relevant": [
          "k8s:app.kubernetes.io/instance=some-service",
          "k8s:app.kubernetes.io/managed-by=spinnaker",
          "k8s:app.kubernetes.io/name=some-service",
          "k8s:frontedBy=some-service-svc",
          "k8s:io.cilium.k8s.namespace.labels.functional_area=engineering",
          "k8s:io.cilium.k8s.namespace.labels.team=some-namespace",
          "k8s:io.cilium.k8s.policy.cluster=default",
          "k8s:io.cilium.k8s.policy.serviceaccount=some-service",
          "k8s:io.kubernetes.pod.namespace=some-namespace",
          "k8s:serviceSelector=some-service-selector"
        ]
      },
      "log": [
        {
          "code": "OK",
          "message": "Successfully regenerated endpoint program (Reason: one or more identities created or deleted)",
          "state": "ready",
          "timestamp": "2022-03-09T12:17:17Z"
        }
      ],
      "namedPorts": [
        {
          "name": "http",
          "port": 3000,
          "protocol": "TCP"
        }
      ],
      "networking": {
        "addressing": [
          {
            "ipv4": "10.210.145.199"
          }
        ],
        "host-mac": "d2:83:14:7e:b8:5d",
        "interface-index": 483,
        "interface-name": "lxc96c6117f9e34",
        "mac": "6a:66:e9:05:5d:c0"
      },
      "policy": {
        "proxy-statistics": [],
        "realized": {
          "allowed-egress-identities": [
            0
          ],
          "allowed-ingress-identities": [
            0,
            1
          ],
          "build": 28,
          "cidr-policy": {
            "egress": [],
            "ingress": []
          },
          "denied-egress-identities": null,
          "denied-ingress-identities": null,
          "id": 38850,
          "l4": {
            "egress": [],
            "ingress": []
          },
          "policy-enabled": "none",
          "policy-revision": 28
        },
        "spec": {
          "allowed-egress-identities": [
            0
          ],
          "allowed-ingress-identities": [
            1,
            0
          ],
          "build": 28,
          "cidr-policy": {
            "egress": [],
            "ingress": []
          },
          "denied-egress-identities": null,
          "denied-ingress-identities": null,
          "id": 38850,
          "l4": {
            "egress": [],
            "ingress": []
          },
          "policy-enabled": "none",
          "policy-revision": 28
        }
      },
      "realized": {
        "label-configuration": {},
        "options": {
          "Conntrack": "Enabled",
          "ConntrackAccounting": "Enabled",
          "ConntrackLocal": "Disabled",
          "Debug": "Disabled",
          "DebugLB": "Disabled",
          "DebugPolicy": "Disabled",
          "DropNotification": "Enabled",
          "MonitorAggregationLevel": "Medium",
          "NAT46": "Disabled",
          "PolicyAuditMode": "Disabled",
          "PolicyVerdictNotification": "Enabled",
          "TraceNotification": "Enabled"
        }
      },
      "state": "ready"
    }
  }
]

CiliumNetworkPolicy applied to Traefik

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: traefik
  namespace: traefik
spec:
  egress:
  - toEntities:
    - cluster
  - toPorts:
    - ports:
      - port: "443"
    toServices:
    - k8sService:
        namespace: default
        serviceName: kubernetes
  - toPorts:
    - ports:
      - port: "8126"
        protocol: TCP
      - port: "8125"
        protocol: UDP
    toServices:
    - k8sServiceSelector:
        namespace: datadog
        selector:
          matchLabels:
            agent.datadoghq.com/name: datadog
            app.kubernetes.io/instance: agent
  - toFQDNs:
    - matchName: update.traefik.io
    toPorts:
    - ports:
      - port: "443"
  - toEndpoints:
    - matchLabels:
        io.kubernetes.pod.namespace: kube-system
        k8s-app: kube-dns
    toPorts:
    - ports:
      - port: "53"
        protocol: UDP
      rules:
        dns:
        - matchPattern: '*'
  - toEndpoints:
    - matchLabels:
        io.kubernetes.pod.namespace: engineering
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
  endpointSelector:
    matchLabels:
      app.kubernetes.io/name: traefik
  ingress:
  - fromEntities:
    - host
    - remote-node
    toPorts:
    - ports:
      - port: "8082"
        protocol: TCP
      - port: "8443"
      - port: "8445"
  - fromEntities:
    - world
    toPorts:
    - ports:
      - port: "8082"
      - port: "8443"
      - port: "8444"
      - port: "8445"

Any assistance on how to troubleshoot this further is very welcome.

Thank you

@pchaigno (Member)

@joaoubaldo Could you please open a new issue with the same information as above plus a full dump of those Hubble drops (-o jsonpb)? A Cilium sysdump of the cluster would also help.

github-actions bot commented Jul 9, 2022

This issue has been automatically marked as stale because it has not
had recent activity. It will be closed if no further activity occurs.

github-actions bot

This issue has been automatically marked as stale because it has not
had recent activity. It will be closed if no further activity occurs.

github-actions bot

This issue has not seen any activity since it was marked stale.
Closing.

@luanphantiki

I have the same issue; it is random, not repeating, and the TCP flags are never SYN. My version:

root@xxx:/home/cilium# cilium version
Client: 1.12.2 c7516b9 2022-09-14T15:25:06+02:00 go version go1.18.6 linux/amd64
Daemon: 1.12.2 c7516b9 2022-09-14T15:25:06+02:00 go version go1.18.6 linux/amd64

@smeeklai

Does anybody know how I can get the same drop traffic log as the author? What do I need to enable?

@joaoubaldo (Contributor)

Does anybody know how I can get the same drop traffic log as the author? What do I need to enable?

@smeeklai you can get that info with something like hubble observe -t drop --output jsonpb
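
The drop flows are only visible if Hubble itself is enabled in the Cilium installation. A minimal sketch of enabling it with Helm, assuming the chart repo is added as cilium and the release is named cilium in kube-system:

$ helm upgrade cilium cilium/cilium -n kube-system --reuse-values --set hubble.enabled=true --set hubble.relay.enabled=true

After the agents restart, hubble observe can be run from inside any Cilium agent pod, or against Hubble Relay with the standalone hubble CLI.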
