Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass vlan tagged packets back to the kernel stack. #16772

Merged
merged 1 commit into from Jul 23, 2021
Merged

Pass vlan tagged packets back to the kernel stack. #16772

merged 1 commit into from Jul 23, 2021

Conversation

kvaster
Copy link
Contributor

@kvaster kvaster commented Jul 5, 2021

VLAN packets will be catched with bpf on main interface first. We
need to passs it back to the kernel stack. VLAN info/tag will be stripped and
packet will be reenqueued on proper interface or dropped.

Also we do not need to process VLAN packets on egress, cause either
we've already processed such packets in case bpf program is attached
to VLAN interface or we do not need to process such packets at all in
other way.

VLAN tagged packets will be processed in the folowing way:

  • If vlan tag is zero packet will be processed as usual (cause it is targeted for main interface)
  • For each device will be allowed only corresponding vlan tags.
  • By default will be allowed only tags from devices controlled by cilium.
  • Any other tags may be allowed via option (--vlan-bpf-bypass), --vlan-bpf-bypass 0 may be used for allowing all available vlan tags.
  • All other vlan tags will be always dropped for security reason.

Concerns:

  • Probably ifindexes and vlan tags should be sorted to make configuration predictable.
  • Probably something like * should be used insteadof 0 for allowing all available vlan tags.

Previous pull request and discussion: #15534

Fixes: #14579

Signed-off-by: Viktor Kuzmin kvaster@gmail.com

Make NodePort BPF to work on VLAN devices

@kvaster kvaster requested review from a team July 5, 2021 06:23
@kvaster kvaster requested review from a team as code owners July 5, 2021 06:23
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jul 5, 2021
@kvaster kvaster requested review from borkmann and twpayne July 5, 2021 06:23
@twpayne twpayne added dont-merge/needs-release-note release-note/minor This PR changes functionality that users may find relevant to operating Cilium. labels Jul 5, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jul 5, 2021
@twpayne
Copy link
Contributor

twpayne commented Jul 5, 2021

Please can you add a release note to the PR description, as documented in step 11 here.

@kvaster
Copy link
Contributor Author

kvaster commented Jul 6, 2021

fixed!

@kvaster kvaster requested a review from twpayne July 7, 2021 11:14
@twpayne
Copy link
Contributor

twpayne commented Jul 8, 2021

test-me-please

@kvaster
Copy link
Contributor Author

kvaster commented Jul 8, 2021

I've updated PR to fix some CI problems - cmdref docs.

@kvaster
Copy link
Contributor Author

kvaster commented Jul 12, 2021

Any updates ?

Copy link
Contributor

@twpayne twpayne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good from a CLI perspective, but I need the experienced datapath folk to review the datapath aspects.

Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems much closer to striking the right balance between providing the configurability you're looking for without surprising users too much with traffic bypassing Cilium logic.

I've requested a few changes below to make the change easier to reason about, both from a user perspective (semantics/naming and traceability) but also from a developer perspective (make it super simple to follow the flow of logic, test it so we can be sure that different variations of the option produce the desired behaviour).

The only thing I didn't raise below is the complexity concern. I strongly suspect that if you configure too many VLANs in this option, bpf program load will fail and the Cilium container will crash out because it cannot configure the datapath in the way you have requested. This could happen either when you first enable the option (because you specified too many tags), or it could plausibly happen later on when you upgrade to a new release, because each release can change the datapath. When you upgrade, there could be just enough additional BPF instructions introduced by this logic that when combined with the rest of the new datapath logic, it pushes the program complexity over some limit. I don't know where that limit is, and this PR is not proposing any testing for it. For now if this is a super-advanced option, maybe you're willing to take on that risk and deal with the consequences / roll back & report issues upstream when you hit them during upgrade. But we may want to think a bit more about integration testing and how we can detect such problems earlier in the development cycle so we can start the discussions early on how to address them.

pkg/datapath/linux/config/config.go Show resolved Hide resolved
daemon/cmd/daemon_main.go Outdated Show resolved Hide resolved
pkg/option/config.go Outdated Show resolved Hide resolved
bpf/bpf_host.c Outdated Show resolved Hide resolved
bpf/bpf_host.c Outdated Show resolved Hide resolved
pkg/datapath/linux/config/config.go Outdated Show resolved Hide resolved
@kvaster
Copy link
Contributor Author

kvaster commented Jul 12, 2021

From my point of view it's rather safe to just bypass any non-zero vlan tag to kernel and allow it to do it's job, bit cilium policy is to control everything on devices controlled by cilium. In 99% situations there will be no or 1-2 vlan tags for 1-2 devices. But maybe it's a good idea to generate just return CTX_ACT_OK if 'all' vlans are requested.

bpf/bpf_host.c Outdated Show resolved Hide resolved
@kvaster
Copy link
Contributor Author

kvaster commented Jul 12, 2021

Thanks for comments. Will get back tomorrow with changes.

@kvaster kvaster requested a review from a team as a code owner July 13, 2021 17:47
@maintainer-s-little-helper
Copy link

Commit 3e74eb655d512e97d6a87478787341ba0e7126ca does not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. and removed dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. labels Jul 16, 2021
@kvaster
Copy link
Contributor Author

kvaster commented Jul 16, 2021

I've pressed 'update branch' by an accident... Sorry...

@kvaster
Copy link
Contributor Author

kvaster commented Jul 17, 2021

What are next steps ? Is there anything more I can do to help this finalize ?

@brb brb added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jul 20, 2021
@brb
Copy link
Member

brb commented Jul 20, 2021

Thanks. Marking as ready to merge.

@kvaster As a follow up, could you extend the kube-proxy-replacement guide by documenting your change?

@errordeveloper errordeveloper removed their assignment Jul 20, 2021
@errordeveloper errordeveloper removed their request for review July 20, 2021 08:46
@errordeveloper
Copy link
Contributor

Unasigning myself as this PR has 3 reviews already.

@brb
Copy link
Member

brb commented Jul 20, 2021

CC'ing @pchaigno and @jrfastab for visibility. This PR should fix host-fw and IPSec on vlan devices.

Copy link
Member

@nathanjsweet nathanjsweet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a hubble and api perspective. LGTM. The datapath stuff makes sense to me too.

@maintainer-s-little-helper maintainer-s-little-helper bot removed the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jul 20, 2021
@joestringer
Copy link
Member

The bot has marked not ready-to-merge because there are still @cilium/helm and @cilium/hubble reviews outstanding. @errordeveloper , it looks like you were pinged for those aspects (including the hubble API changes). Could you take a brief look specifically at those areas or consider raising this in the sig-hubble so that someone with the relevant expertise can take a look?

@kvaster
Copy link
Contributor Author

kvaster commented Jul 21, 2021

Thanks. Marking as ready to merge.

@kvaster As a follow up, could you extend the kube-proxy-replacement guide by documenting your change?

I will do this as a follow up PR!

Copy link
Contributor

@errordeveloper errordeveloper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Protobuf and helm changes LGTM!

@kvaster
Copy link
Contributor Author

kvaster commented Jul 23, 2021

It seems that all reviews are passed...

// if it's vlan device and we're controlling vlan main device
// and either all vlans are allowed, or we're controlling vlan device or vlan is explicitly allowed
if ok && devices[vlan.ParentIndex] && (devices[vlan.Index] || allowedVlans[vlan.VlanId]) {
vlansByIfIndex[vlan.ParentIndex] = append(vlansByIfIndex[vlan.ParentIndex], vlan.VlanId)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appending of vlanIDs that may not be ordered introduced a new unit test flake #17104.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kvaster would you be able to look into this? Usually the solution is just to sort the slice once it's been assembled but before it is used to generate code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, looks like @jrajahalme beat you to it :) Feel free to review though: #17105

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was away... Seems problem was already solved... I will make one more PR soon as discussed - docs update.

@jhutchins
Copy link

Is there an upcoming 1.10 release that will include this fix in the near future?

@joestringer
Copy link
Member

Hi @jhutchins , typically more invasive features / changes are scheduled for the next upcoming release in order to minimize the risk of breakage for existing users. This will be included in the upcoming v1.11 release.

kvaster added a commit to kvaster/cilium that referenced this pull request Oct 7, 2021
Follow-up for PR: cilium#16772

Signed-off-by: Viktor Kuzmin <kvaster@gmail.com>
aanm pushed a commit that referenced this pull request Oct 13, 2021
Follow-up for PR: #16772

Signed-off-by: Viktor Kuzmin <kvaster@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note/minor This PR changes functionality that users may find relevant to operating Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NodePort doesn't work correctly when VLANs are involved
10 participants