Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpf: significantly improve capacity of TCP CT tables #10518

Merged
merged 1 commit into from
Mar 25, 2020

Conversation

borkmann
Copy link
Member

@borkmann borkmann commented Mar 9, 2020

ct_create{4,6}() inserts related entries into the TCP CT tables given
the map is usually in the form of ct_create4(get_ct_map4(&tuple)) or
ct_create6(get_ct_map6(&tuple)). Similarly, the lookup parts are in
form of ct_lookup4(get_ct_map4(&tuple)) or ct_lookup6(get_ct_map6(&tuple)).

However, the tuples' nexthdr usually points to the one in the packet.
This means, we can /never/ find a related entry since it sits in the TCP
CT tables, but their lookup is always in the ANY table instead.

Fix the insertions by adding to the CT_MAP_ANY{4,6} tables and by that
implicityly double the capacity of TCP CT tables.

Go even beyond that by not creating related entries for CT_SERVICE entries.

It does not make sense to create CT_SERVICE entries with related flag
since we don't translate ICMP there anyway. Save overhead and don't add
them to the maps (same for NodePort/NAT related ones).

Fixes: 750b3f9 ("bpf: Split connection tracking for TCP and non-TCP")
Signed-off-by: Daniel Borkmann daniel@iogearbox.net


This change is Reviewable

@borkmann borkmann added pending-review sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. labels Mar 9, 2020
@borkmann borkmann requested review from joestringer and a team March 9, 2020 19:44
@maintainer-s-little-helper maintainer-s-little-helper bot added this to In progress in 1.8.0 Mar 9, 2020
@borkmann
Copy link
Member Author

borkmann commented Mar 9, 2020

test-me-please

@borkmann borkmann requested review from tklauser and brb March 9, 2020 19:47
@coveralls
Copy link

coveralls commented Mar 9, 2020

Coverage Status

Coverage decreased (-0.02%) to 45.435% when pulling eddafe1 on pr/fix-icmp-related into cfc5e8c on master.

@borkmann
Copy link
Member Author

borkmann commented Mar 9, 2020

Hitting #10517 as well now in CI:

2020-03-09T20:55:14.995873841Z level=debug msg="NAME tc STATUS R PID 1629 CPU: 80.46% MEM: 0.27% CMDLINE: tc filter replace dev lxc0af759b50da1 ingress prio 1 handle 1 bpf da obj 633_next/bpf_lxc.o sec from-container MEM-EXT: RSS: 10 VMS: 24 Data: 0 Stack: 0 Locked: 0 Swap: 0" endpointID=3505 subsys=endpoint
2020-03-09T20:55:15.234964416Z level=error msg="Command execution failed" cmd="[tc filter replace dev lxc0f91cdb14e63 ingress prio 1 handle 1 bpf da obj 3505_next/bpf_lxc.o sec from-container]" error="exit status 1" subsys=datapath-loader
2020-03-09T20:55:15.235043078Z level=warning msg="Log buffer too small to dump verifier log 16777215 bytes (10 tries)!" subsys=datapath-loader
2020-03-09T20:55:15.235104785Z level=warning msg="Error fetching program/map!" subsys=datapath-loader
2020-03-09T20:55:15.235151382Z level=warning msg="Unable to load program" subsys=datapath-loader

Copy link
Member

@brb brb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good find! One comment.

bpf/lib/conntrack.h Show resolved Hide resolved
@brb
Copy link
Member

brb commented Mar 10, 2020

Do we consider this as a bug? Asking, as we might need to set needs-backport labels accordingly.

@borkmann
Copy link
Member Author

Do we consider this as a bug? Asking, as we might need to set needs-backport labels accordingly.

No-one reported an issue at this point, but we potentially could backport.

@borkmann
Copy link
Member Author

test-me-please

@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.7.2 Mar 10, 2020
@borkmann
Copy link
Member Author

test-me-please

@borkmann
Copy link
Member Author

provision failure again

@borkmann
Copy link
Member Author

test-me-please

@borkmann
Copy link
Member Author

test-me-please

@christarazi
Copy link
Member

Build expired. Triggering again...

@christarazi
Copy link
Member

christarazi commented Mar 24, 2020

@tklauser
Copy link
Member

tklauser commented Mar 24, 2020

test-me-please

net-next vm provisioning fail: https://jenkins.cilium.io/job/Cilium-PR-Ginkgo-Tests-Validated/18158/

@tklauser
Copy link
Member

tklauser commented Mar 24, 2020

@tklauser
Copy link
Member

tklauser commented Mar 24, 2020

test-me-please

Hit #9330 again, it seems: https://jenkins.cilium.io/job/Cilium-PR-Ginkgo-Tests-Validated/18182/

@tklauser
Copy link
Member

test-me-please

ct_create{4,6}() inserts related entries into the TCP CT tables given
the map is usually in the form of ct_create4(get_ct_map4(&tuple)) or
ct_create6(get_ct_map6(&tuple)). Similarly, the lookup parts are in
form of ct_lookup4(get_ct_map4(&tuple)) or ct_lookup6(get_ct_map6(&tuple)).

However, the tuples' nexthdr usually points to the one in the packet.
This means, we can /never/ find a related entry since it sits in the TCP
CT tables, but their lookup is always in the ANY table instead.

Fix the insertions by adding to the CT_MAP_ANY{4,6} tables and by that
implicityly double the capacity of TCP CT tables.

Go even beyond that by not creating related entries for CT_SERVICE entries.

It does not make sense to create CT_SERVICE entries with related flag
since we don't translate ICMP there anyway. Save overhead and don't add
them to the maps (same for NodePort/NAT related ones).

Fixes: 750b3f9 ("bpf: Split connection tracking for TCP and non-TCP")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
@borkmann
Copy link
Member Author

borkmann commented Mar 25, 2020

test-me-please

net-next vm provisioning fail: https://jenkins.cilium.io/job/Cilium-PR-Ginkgo-Tests-Validated/18221/

@tklauser
Copy link
Member

test-me-please

1 similar comment
@borkmann
Copy link
Member Author

test-me-please

@borkmann
Copy link
Member Author

Hit #10231

@borkmann
Copy link
Member Author

test-me-please

@borkmann borkmann merged commit 424ea70 into master Mar 25, 2020
1.8.0 automation moved this from In progress to Merged Mar 25, 2020
@borkmann borkmann deleted the pr/fix-icmp-related branch March 25, 2020 22:52
@borkmann borkmann moved this from In progress (1.8) to Done in 1.9 kube-proxy removal & general dp optimization Mar 26, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.7 in 1.7.2 Apr 1, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.7 in 1.7.2 Apr 1, 2020
@joestringer joestringer moved this from Backport pending to v1.7 to Backport done to v1.7 in 1.7.2 Apr 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note/minor This PR changes functionality that users may find relevant to operating Cilium. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
Projects
No open projects
1.7.2
Backport done to v1.7
1.8.0
  
Merged
Development

Successfully merging this pull request may close these issues.

None yet

6 participants