Exposing new BPF failed: invalid LXC MAC: invalid MAC address #14100

Closed
ArthurChiao opened this issue Nov 20, 2020 · 5 comments · Fixed by #14114
Labels
area/host-firewall Impacts the host firewall or the host endpoint. kind/bug This is a bug in the Cilium logic. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.

Comments

@ArthurChiao
Contributor

This is caused by changes to the security-relevant labels configuration. I am not sure whether it is expected behavior or a bug.

General Information

  • Cilium version: 1.8.4
  • Kernel version: 4.19.118

Dig inside

According to https://docs.cilium.io/en/v1.8/operations/scalability/identity-relevant-labels/#configuring-identity-relevant-labels,
after a list of security-relevant labels is passed to the Cilium agent and the agent is restarted, all endpoints on the node are regenerated with the given labels plus the default labels.

One problem is that if reserved:host (or reserved:.*) is not included in the provided label list, the cilium_host endpoint loses its reserved:host label. This is OK at this point.

But when the change is rolled back, that is, when the custom label list is removed and the agent is restarted, the agent raises the error shown in the title.
Note that from then on, traffic for all pods on this node is interrupted.

My understanding is that IsHost() returns false for the cilium_host endpoint in this case: https://github.com/cilium/cilium/blob/master/pkg/endpoint/bpf.go#L629

The agent then treats it as a normal endpoint and tries to read its lxcMAC, which fails because cilium_host has no corresponding lxcMAC.
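The failure path described above can be modelled in a few lines. This is only a simplified Python sketch of the reasoning, not Cilium's Go code: `Endpoint`, `is_host`, `parse_mac`, and `expose_datapath` are hypothetical names, and the only behavior taken from the report is that the host endpoint is recognized by its reserved:host label and has no LXC MAC.

```python
import re
from dataclasses import dataclass, field

# Hypothetical stand-in for the agent's MAC parser: an empty string is rejected.
_MAC_RE = re.compile(r"^([0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}$")


def parse_mac(value: str) -> str:
    """Return the normalized MAC, or raise ValueError if it is malformed."""
    if not _MAC_RE.match(value):
        raise ValueError(f"invalid MAC address {value!r}")
    return value.lower()


@dataclass
class Endpoint:
    name: str
    labels: set = field(default_factory=set)
    lxc_mac: str = ""  # cilium_host has no container-side MAC

    def is_host(self) -> bool:
        # Assumption of this sketch: the host endpoint is identified
        # purely by carrying the reserved:host label.
        return "reserved:host" in self.labels


def expose_datapath(ep: Endpoint) -> str:
    """Pick the host or per-endpoint path; only the latter needs a MAC."""
    if ep.is_host():
        return "host datapath"
    try:
        mac = parse_mac(ep.lxc_mac)
    except ValueError as err:
        raise ValueError(f"invalid LXC MAC: {err}") from err
    return f"endpoint datapath for {mac}"


if __name__ == "__main__":
    print(expose_datapath(Endpoint("cilium_host", {"reserved:host"})))
    try:
        # Same endpoint after the label was filtered out.
        expose_datapath(Endpoint("cilium_host"))
    except ValueError as err:
        print("regeneration failed:", err)
```

Once the label is gone, the sketch falls through to the per-endpoint branch and fails on the empty MAC, which mirrors the "invalid LXC MAC" error in the title.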

Try-outs

A quick test showed that either of the following two approaches avoids the problem:

  1. Via config: add reserved:host (or reserved:.*) to the custom label list.
  2. Via patch: add reserved:host (or reserved:.*) to the hard-coded default label list and recompile the Cilium agent: https://github.com/cilium/cilium/blob/master/pkg/labelsfilter/filter.go#L163

Thanks!

@joestringer

@ArthurChiao ArthurChiao added the kind/bug This is a bug in the Cilium logic. label Nov 20, 2020
@kkourt kkourt assigned kkourt and aanm and unassigned kkourt Nov 20, 2020
@aanm
Member

aanm commented Nov 20, 2020

@ArthurChiao did you miss some outputs? The text that you have written in the PR description says "it will raise the above errors." but I don't understand which errors you are referring to.

@aanm aanm added the need-more-info More information is required to further debug or fix the issue. label Nov 20, 2020
@ArthurChiao
Contributor Author

Sorry, "the above errors" means the title Exposing new BPF failed: invalid LXC MAC: invalid MAC address.

Full error messages look like this:

level=warning msg="Regeneration of endpoint failed" bpfCompilation=0s bpfLoadProg=0s bpfWaitForELF=0s bpfWriteELF=0s buildDuration="573.714µs" containerID= datapathPolicyRevision=0 desiredPolicyRevision=3 endpointID=1826 error="Exposing new BPF failed: invalid LXC MAC: invalid MAC address " identity=281659 ipv4= ipv6= k8sPodName=/ mapSync="15.369µs" policyCalculation="8.24µs" prepareBuild="230.602µs" proxyConfiguration="16.706µs" proxyPolicyCalculation="98.879µs" proxyWaitForAck=0s reason="one or more identities created or deleted" subsys=endpoint waitingForCTClean=507ns waitingForLock="1.929µs"
level=error msg="endpoint regeneration failed" containerID= datapathPolicyRevision=0 desiredPolicyRevision=3 endpointID=1826 error="Exposing new BPF failed: invalid LXC MAC: invalid MAC address " identity=281659 ipv4= ipv6= k8sPodName=/ subsys=endpoint

@aanm aanm assigned pchaigno and unassigned aanm Nov 20, 2020
@joestringer joestringer added area/host-firewall Impacts the host firewall or the host endpoint. and removed need-more-info More information is required to further debug or fix the issue. labels Nov 20, 2020
@pchaigno pchaigno removed their assignment Nov 20, 2020
@pchaigno
Member

Thanks for the report @ArthurChiao!
Could you send a pull request with the fix? Happy to help on any blocker (and to review of course)!

@pchaigno
Member

pchaigno commented Dec 9, 2020

@ArthurChiao Could you detail how you specified the list of security-relevant labels? I'm unable to reproduce with our --labels flag unless I explicitly exclude reserved:host. I don't think the patch at #14114 currently protects against explicit exclusion of reserved:host though.

@ArthurChiao
Contributor Author

ArthurChiao commented Dec 10, 2020

All my previous tests were based on the ConfigMap:

$ ls
bpf-ct-global-any-max   clustermesh-config               enable-health-checking  enable-node-port                    k8s-kubeconfig-path     masquerade             sidecar-istio-proxy-image
bpf-ct-global-tcp-max   cluster-name                     enable-hubble           flannel-manage-existing-containers  kube-proxy-replacement  monitor-aggregation    tofqdns-enable-poller
clean-cilium-bpf-state  custom-cni-conf                  enable-ipv4             flannel-master-device               kvstore                 policy-audit-mode      tunnel
clean-cilium-state      debug                            enable-ipv6             flannel-uninstall-on-exit           kvstore-opt             preallocate-bpf-maps   wait-bpf-mount
cluster-id              enable-endpoint-health-checking  enable-legacy-services  identity-allocation-mode            labels                  prometheus-serve-addr

# workable label list
$ cat labels
reserved:.* k8s:!io.cilium.k8s.namespace.labels.* k8s:io.cilium.k8s.policy k8s:app k8s:name [our-specific-labels] 
# broken label list
$ cat labels
k8s:!io.cilium.k8s.namespace.labels.* k8s:io.cilium.k8s.policy k8s:app k8s:name [our-specific-labels] 

Here, [our-specific-labels] is a list of labels of the form k8s:xxx.ourcorp.com/yyy. I can send the complete label list or any other configuration via DM if you are interested. @pchaigno Thanks!
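To illustrate why the second list drops reserved:host, here is a simplified Python model of prefix-based label filtering. It is a sketch under stated assumptions, not the agent's implementation: `Prefix`, `parse_prefix`, `match_len`, and `is_identity_relevant` are hypothetical names, the rule "once any inclusive prefix is configured, only matching labels are kept" is an assumption about the filter, and the elided [our-specific-labels] entries are left out.

```python
import re
from typing import List, NamedTuple


class Prefix(NamedTuple):
    source: str   # e.g. "k8s" or "reserved"
    pattern: str  # regular expression matched from the start of the key
    ignore: bool  # True for "!"-prefixed (exclusion) entries


def parse_prefix(text: str) -> Prefix:
    """Parse 'source:[!]pattern' as written in the label lists above."""
    source, _, rest = text.partition(":")
    ignore = rest.startswith("!")
    return Prefix(source, rest[1:] if ignore else rest, ignore)


def match_len(prefix: Prefix, label: str) -> int:
    """Length of the matched part of the label key, 0 if there is no match."""
    source, _, key = label.partition(":")
    if source != prefix.source:
        return 0
    m = re.match(prefix.pattern, key)
    return m.end() if m else 0


def is_identity_relevant(label: str, prefixes: List[Prefix]) -> bool:
    # Assumed rule: with at least one inclusive prefix the list acts as an
    # allowlist; otherwise a label is kept unless an exclusion matches it.
    allowlist = any(not p.ignore for p in prefixes)
    included = max((match_len(p, label) for p in prefixes if not p.ignore), default=0)
    ignored = max((match_len(p, label) for p in prefixes if p.ignore), default=0)
    return (not allowlist and ignored == 0) or included > ignored


COMMON = "k8s:!io.cilium.k8s.namespace.labels.* k8s:io.cilium.k8s.policy k8s:app k8s:name"
working = [parse_prefix(p) for p in ("reserved:.* " + COMMON).split()]
broken = [parse_prefix(p) for p in COMMON.split()]

if __name__ == "__main__":
    for name, cfg in (("working", working), ("broken", broken)):
        print(name, "keeps reserved:host:", is_identity_relevant("reserved:host", cfg))
```

Under these assumptions, the second list behaves as an allowlist containing only k8s prefixes, so reserved:host matches nothing and is no longer treated as identity-relevant.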

@borkmann borkmann added the sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. label Jan 12, 2021
ArthurChiao added a commit to ctripcloud/cilium that referenced this issue Jan 23, 2021
Fix cilium#14100

The identity-relevant label list is a list of label prefixes assembled from
two parts:

1. Base part:
  1.1. Read from a user-specified JSON file (--label-prefix-file) if one is
       provided. Default: `--label-prefix-file=""`.
  1.2. If `--label-prefix-file=""`, read from the hardcoded default list
       (`func defaultLabelPrefixCfg()`).
2. Additional part: read from user input (--labels). Default: `--labels=""`.

When `--label-prefix-file=""` (the default) is combined with
`--labels=<custom-list>`, and `reserved:host` (or `reserved:.*`) is not
included in `<custom-list>`, the `cilium_host` endpoint loses its
`reserved:host` label.

When rolling back to the default configuration, that is, setting `--labels=""`
and restarting the agent, the Cilium agent raises errors like the following:

```
level=warning msg="Regeneration of endpoint failed" .. error="Exposing new BPF failed: invalid LXC MAC: invalid MAC address "
level=error msg="endpoint regeneration failed" ..  error="Exposing new BPF failed: invalid LXC MAC: invalid MAC address "
```

Subsequently, traffic for all pods on this node is interrupted.

This is because the agent relies on the `reserved:host` label to distinguish
the `cilium_host` endpoint from normal endpoints, and the former has no
`lxcMAC`. Reserved labels should never be excluded from the default label
list. Adding the reserved labels to the default label list solves the problem.

Appendix:

Sample custom label file (--label-prefix-file) to overwrite the default base
label list:

```
{
    "version": 1,
    "valid-prefixes": [
        {
            "source": "k8s",
            "prefix": "io.kubernetes.pod.namespace"
        }, {
            "source": "k8s",
            "prefix": ":io.cilium.k8s.namespace.labels"
        }, {
            "source": "k8s",
            "prefix": "app.kubernetes.io"
        }, {
            "source": "k8s",
            "prefix": "k8s!:io.kubernetes"
        }, {
            "source": "k8s",
            "prefix": "!kubernetes.io"
        }, {
            "source": "k8s",
            "prefix": "!.*beta.kubernetes.io"
        }, {
            "source": "k8s",
            "prefix": "!k8s.io"
        }, {
            "source": "k8s",
            "prefix": "!pod-template-generation"
        }, {
            "source": "k8s",
            "prefix": "!pod-template-hash"
        }, {
            "source": "k8s",
            "prefix": "!controller-revision-hash"
        }, {
            "source": "k8s",
            "prefix": "!annotation.*"
        }, {
            "source": "k8s",
            "prefix": "!etcd_node"
        }
    ]
}
```

Signed-off-by: ArthurChiao <arthurchiao@hotmail.com>
aanm pushed a commit that referenced this issue Jan 25, 2021
Fix #14100

Signed-off-by: ArthurChiao <arthurchiao@hotmail.com>
michi-covalent pushed a commit that referenced this issue Feb 11, 2021
[ upstream commit 16e8f2f ]

Fix #14100

Signed-off-by: ArthurChiao <arthurchiao@hotmail.com>
Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>
pchaigno pushed a commit that referenced this issue Feb 12, 2021
[ upstream commit 16e8f2f ]

Fix #14100

Signed-off-by: ArthurChiao <arthurchiao@hotmail.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
borkmann pushed a commit that referenced this issue Feb 12, 2021
[ upstream commit 16e8f2f ]

Fix #14100

Signed-off-by: ArthurChiao <arthurchiao@hotmail.com>
Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>
aanm pushed a commit that referenced this issue Feb 15, 2021
[ upstream commit 16e8f2f ]

Fix #14100

Signed-off-by: ArthurChiao <arthurchiao@hotmail.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>