Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow icmp fragmentation needed agent option #8218

Merged
merged 1 commit into from
Aug 27, 2019

Conversation

fristonio
Copy link
Member

@fristonio fristonio commented Jun 5, 2019


This change is Reviewable

@fristonio fristonio requested review from a team June 5, 2019 12:50
@fristonio fristonio requested a review from a team as a code owner June 5, 2019 12:50
pkg/option/config.go Outdated Show resolved Hide resolved
@coveralls
Copy link

coveralls commented Jun 5, 2019

Coverage Status

Coverage decreased (-0.004%) to 44.066% when pulling 28e9724 on fristonio:allow-icmp-frag-needed into c703ce7 on cilium:master.

Copy link
Member

@brb brb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! Please see my comments below.

daemon/daemon_main.go Outdated Show resolved Hide resolved
bpf/lib/policy.h Outdated Show resolved Hide resolved
@fristonio fristonio force-pushed the allow-icmp-frag-needed branch 2 times, most recently from d0faf34 to cdb39aa Compare June 5, 2019 15:15
@tgraf tgraf added this to the 1.6.0 milestone Jun 5, 2019
@brb brb added sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. labels Jun 5, 2019
@tgraf
Copy link
Member

tgraf commented Jun 6, 2019

This looks fantastic

@tgraf
Copy link
Member

tgraf commented Jun 6, 2019

test-me-please

@tgraf tgraf added the release-note/minor This PR changes functionality that users may find relevant to operating Cilium. label Jun 6, 2019
@tgraf
Copy link
Member

tgraf commented Jun 6, 2019

@fristonio

00:55:59.097      runtime: Jun 06 08:51:33 runtime cilium-agent[28160]: level=warning msg="In file included from /var/lib/cilium/bpf/bpf_overlay.c:37:" subsys=daemon
00:55:59.097      runtime: Jun 06 08:51:33 runtime cilium-agent[28160]: level=warning msg="/var/lib/cilium/bpf/lib/policy.h:147:18: error: variable has incomplete type 'struct icmphdr'" subsys=daemon
00:55:59.097      runtime: Jun 06 08:51:33 runtime cilium-agent[28160]: level=warning msg="                struct icmphdr icmphdr;" subsys=daemon
00:55:59.097      runtime: Jun 06 08:51:33 runtime cilium-agent[28160]: level=warning msg="                               ^" subsys=daemon
00:55:59.097      runtime: Jun 06 08:51:33 runtime cilium-agent[28160]: level=warning msg="/var/lib/cilium/bpf/lib/policy.h:147:10: note: forward declaration of 'struct icmphdr'" subsys=daemon
00:55:59.097      runtime: Jun 06 08:51:33 runtime cilium-agent[28160]: level=warning msg="                struct icmphdr icmphdr;" subsys=daemon
00:55:59.097      runtime: Jun 06 08:51:33 runtime cilium-agent[28160]: level=warning msg="                       ^" subsys=daemon
00:55:59.097      runtime: Jun 06 08:51:33 runtime cilium-agent[28160]: level=warning msg="/var/lib/cilium/bpf/lib/policy.h:158:22: error: use of undeclared identifier 'ICMP_DEST_UNREACH'" subsys=daemon
00:55:59.097      runtime: Jun 06 08:51:33 runtime cilium-agent[28160]: level=warning msg="                if(icmphdr.type == ICMP_DEST_UNREACH && icmphdr.code == ICMP_FRAG_NEEDED)" subsys=daemon
00:55:59.097      runtime: Jun 06 08:51:33 runtime cilium-agent[28160]: level=warning msg="                                   ^" subsys=daemon
00:55:59.097      runtime: Jun 06 08:51:33 runtime cilium-agent[28160]: level=warning msg="/var/lib/cilium/bpf/lib/policy.h:158:59: error: use of undeclared identifier 'ICMP_FRAG_NEEDED'" subsys=daemon
00:55:59.097      runtime: Jun 06 08:51:33 runtime cilium-agent[28160]: level=warning msg="                if(icmphdr.type == ICMP_DEST_UNREACH && icmphdr.code == ICMP_FRAG_NEEDED)" subsys=daemon
00:55:59.097      runtime: Jun 06 08:51:33 runtime cilium-agent[28160]: level=warning msg="                                                                        ^" subsys=daemon
00:55:59.097      runtime: Jun 06 08:51:33 runtime cilium-agent[28160]: level=warning msg="3 errors generated." subsys=daemon

It will make sense to add ALLOW_ICMP_FRAG_NEEDED to the following list in bpf/Makefile:

LXC_OPTIONS = \
         -DSKIP_DEBUG \
         -DHAVE_LPM_MAP_TYPE \
         -DHAVE_LRU_MAP_TYPE \
         -DENABLE_IPV4 \
         -DENABLE_IPV4:-DHAVE_LPM_MAP_TYPE \
         -DENABLE_IPV4:-DHAVE_LPM_MAP_TYPE:-DHAVE_LRU_MAP_TYPE \
         -DENABLE_IPV6 \
         -DENABLE_IPV6:-DHAVE_LPM_MAP_TYPE \
         -DENABLE_IPV6:-DHAVE_LPM_MAP_TYPE:-DHAVE_LRU_MAP_TYPE \
         -DENABLE_IPV6:-DENABLE_IPV4 \
         -DENABLE_IPV6:-DENABLE_IPV4:-DENABLE_ROUTING \
         -DENABLE_IPV4:-DENABLE_IPV6:-DHAVE_LPM_MAP_TYPE:-DHAVE_LRU_MAP_TYPE \
         -DENABLE_HOST_REDIRECT:-DENABLE_IPV4:-DENABLE_IPV6 \
         -DENABLE_HOST_REDIRECT:-DENABLE_IPV4:-DENABLE_IPV6:-DENABLE_NAT46

@tgraf
Copy link
Member

tgraf commented Jun 6, 2019

You can then test-compile your code by running make in bpf/

@fristonio fristonio requested a review from a team as a code owner June 6, 2019 19:40
@tgraf
Copy link
Member

tgraf commented Jun 6, 2019

test-me-please

@fristonio
Copy link
Member Author

Will take a look at this as soon. as I get access to my laptop.

@vadorovsky
Copy link
Member

test-me-please

@fristonio
Copy link
Member Author

@tgraf I could not figure out the source of error from the CI logs. It's showing Cancelling nested steps due to timeout and error is in K8s 1.10 net-next vm provisioning fail. I am not sure if this is due to my PR or something else. Can you take a look?

@ianvernon
Copy link
Member

@fristonio we've had some issues with timeouts affecting our CI - these are actively being worked on. I'll run the build again and see if the issue you observed reproduces. thanks for your patience!

@ianvernon
Copy link
Member

test-me-please

@fristonio
Copy link
Member Author

Thanks a lot @ianvernon
The build still errored and was again due to a timeout.

06:08:02      Test NAT46 connectivity between containers [It]
06:08:02      /home/jenkins/workspace/Cilium-PR-Ginkgo-Tests-Validated/runtime-gopath/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:410
06:08:02  
06:08:02      Endpoints not ready after timeout
[2019-06-09T00:38:02.852Z]     Expected
[2019-06-09T00:38:02.852Z]         <bool>: false
[2019-06-09T00:38:02.852Z]     to be true

But this was while checking NAT46 connectivity, I am not sure if this is due to code changes or a timeout problem like before.
I will try to debug the issue locally once and see if I can find something.

@tgraf
Copy link
Member

tgraf commented Jun 10, 2019

The program is now over the instruction limit:

Jun 09 00:33:49 runtime cilium-agent[15492]: level=error msg="Command execution failed" cmd="[tc filter replace dev lxc9e98214713e4 ingress prio 1 handle 1 bpf da obj 2074_next/bpf_lxc.o sec from-container]" error="exit status 1" subsys=datapath-loader
Jun 09 00:33:49 runtime cilium-agent[15492]: level=warning subsys=datapath-loader
Jun 09 00:33:49 runtime cilium-agent[15492]: level=warning msg="Prog section '2/10' rejected: Invalid argument (22)!" subsys=datapath-loader
Jun 09 00:33:49 runtime cilium-agent[15492]: level=warning msg=" - Type:         3" subsys=datapath-loader
Jun 09 00:33:49 runtime cilium-agent[15492]: level=warning msg=" - Attach Type:  0" subsys=datapath-loader
Jun 09 00:33:49 runtime cilium-agent[15492]: level=warning msg=" - Instructions: 4107 (11 over limit)" subsys=datapath-loader
Jun 09 00:33:49 runtime cilium-agent[15492]: level=warning msg=" - License:      GPL" subsys=datapath-loader
Jun 09 00:33:49 runtime cilium-agent[15492]: level=warning subsys=datapath-loader
Jun 09 00:33:49 runt

I have a local change to split LB into a separate section to relax this. I'll push it and you can rebase on top of that.

@fristonio
Copy link
Member Author

By instruction limits you mean the max instruction limits(4096) on a BPF program right?
Also, how do we split LB into separate sections, is this splitting done using tail calls?
I would like to explore a bit more regarding this, can you point me to the code sections for this?

@tgraf
Copy link
Member

tgraf commented Jun 11, 2019

By instruction limits you mean the max instruction limits(4096) on a BPF program right?

Yes

Also, how do we split LB into separate sections, is this splitting done using tail calls?

Exactly

I would like to explore a bit more regarding this, can you point me to the code sections for this?

Sure, I've pushed the branch here:
https://github.com/cilium/cilium/compare/pr/tgraf/split-lxc-prog

@borkmann
Copy link
Member

test-me-please

@brb
Copy link
Member

brb commented Jun 13, 2019

CI failed due to the same issue with instr count:

Jun 12 15:45:35 runtime cilium-agent[17494]: level=warning msg="Prog section '2/10' rejected: Invalid argument (22)!" subsys=datapath-loader
Jun 12 15:45:35 runtime cilium-agent[17494]: level=warning msg=" - Type:         3" subsys=datapath-loader
Jun 12 15:45:35 runtime cilium-agent[17494]: level=warning msg=" - Attach Type:  0" subsys=datapath-loader
Jun 12 15:45:35 runtime cilium-agent[17494]: level=warning msg=" - Instructions: 4107 (11 over limit)" subsys=datapath-loader
Jun 12 15:45:35 runtime cilium-agent[17494]: level=warning msg=" - License:      GPL" subsys=datapath-loader

@joestringer
Copy link
Member

@brb yeah, the LXC prog splitout will need to be done first. @fristonio , will you look into that? Happy to provide pointers/help if you have any questions.

@fristonio
Copy link
Member Author

Yeah sure, I would love to. I am not very experienced with BPF but will surely try to help out.

@aanm
Copy link
Member

aanm commented Jun 14, 2019

@fristonio needs rebase against master.

@aanm aanm added the dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. label Jun 14, 2019
@vadorovsky
Copy link
Member

test-me-please

@aanm
Copy link
Member

aanm commented Jun 19, 2019

@fristonio needs a new rebase. It seems the tests failed last time. Can you rebase so we can trigger the CI again?

@brb
Copy link
Member

brb commented Jul 2, 2019

@fristonio Let us know if we should do the rebase.

@fristonio
Copy link
Member Author

Sorry, I was a bit busy the past week.
Also, I think since this change is not in the master yet the tests will fail again.

@tgraf
Copy link
Member

tgraf commented Jul 9, 2019

test-me-please

@stale
Copy link

stale bot commented Aug 8, 2019

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Aug 8, 2019
@stale stale bot removed the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Aug 13, 2019
pkg/option/config.go Outdated Show resolved Hide resolved
@tgraf
Copy link
Member

tgraf commented Aug 26, 2019

@fristonio This looks ready to merge, just needs a rebase.

@fristonio
Copy link
Member Author

@tgraf I have rebased the pull request, but I am not sure if this will pass the ginkgo tests as I think that the split-lxc-prog branch changes are still not in master.

@tgraf tgraf removed the dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. label Aug 27, 2019
@tgraf
Copy link
Member

tgraf commented Aug 27, 2019

test-me-please

* Add flag allow-icmp-frag-needed for cilium-agent to allow ICMP type3
code 4 packets in policy.
* This will allow TCP Path MTU discovery not to fail.

Signed-off-by: Deepesh Pathak <deepshpathak@gmail.com>
@tgraf tgraf merged commit 7435ff3 into cilium:master Aug 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note/minor This PR changes functionality that users may find relevant to operating Cilium. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[NetPol] Allow ICMP messages of a certain type/code
10 participants