Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

datapath: Switch to LPM policy map #23885

Merged
merged 8 commits into from Apr 25, 2023

Conversation

jrajahalme
Copy link
Member

@jrajahalme jrajahalme commented Feb 20, 2023

Change to use LPM trie map for endpoint policy, as this allows prefix based wildcarding of protocol and port, when placed at the end of the map key. This helps eliminate two policy map lookups, which should help offset the performance penalty of a more complex map lookup.

This involves both changing the map type, and extending the policymap key with the prefix length field that is customary to all LPM maps. At the same time we need to reorganise the key fields so that the fields we plan to wildcard are at the end of the key and in the order of the port being the last field and protocol before it, as it can be wildcarded also when the protocol is not wildcarded, but the protocol can be wildcarded only if the port is also (completely) wildcarded.

With this new LPM policy map we still need a second lookup with an explicitly wildcarded security ID, as it can be wildcarded also when protocol and/or port are being matched.

As Cilium agent can re-populate policy maps from scratch there is no need for any specific upgrade/downgrade support for the map type and key layout change. Old programs keep using the old policy maps they were started with, while new programs for all endpoints upon the start of the upgraded Cilium agent will the new maps when they are loaded.

This change is a prerequisite for port range support in network policy.

@jrajahalme jrajahalme added kind/enhancement This would improve or streamline existing functionality. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. labels Feb 20, 2023
@jrajahalme jrajahalme requested review from a team as code owners February 20, 2023 12:31
@maintainer-s-little-helper maintainer-s-little-helper bot added dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Feb 20, 2023
@jrajahalme jrajahalme marked this pull request as draft February 20, 2023 12:31
@jrajahalme jrajahalme marked this pull request as ready for review February 20, 2023 12:59
@jrajahalme jrajahalme added the release-note/misc This PR makes changes that have no direct user impact. label Feb 20, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Feb 20, 2023
@jrajahalme
Copy link
Member Author

jrajahalme commented Feb 20, 2023

/test

Job 'Cilium-PR-K8s-1.24-kernel-5.4' failed:

Click to show.

Test Name

K8sUpdates Tests upgrade and downgrade from a Cilium stable image to master

Failure Output

FAIL: Found 15 k8s-app=cilium logs matching list of errors that must be investigated:

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.24-kernel-5.4 so I can create one.

Job 'Cilium-PR-K8s-1.16-kernel-4.19' failed:

Click to show.

Test Name

K8sUpdates Tests upgrade and downgrade from a Cilium stable image to master

Failure Output

FAIL: Found 15 k8s-app=cilium logs matching list of errors that must be investigated:

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.16-kernel-4.19 so I can create one.

Copy link
Contributor

@thorn3r thorn3r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agent bits LGTM, but i see some CI failures that look maybe relevant to these changes:

Tests upgrade and downgrade from a Cilium stable image to master:
Key-size mismatch for BPF map" file-path=/sys/fs/bpf/tc/globals/cilium_policy_00835 new=8 old=12 subsys=bpf

Comment on lines 164 to 165
} else if prefix > 8 {
port = fmt.Sprintf("0x%x/%d/%s", dport, prefix-8, proto.String())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we care here about values > 24? i.e. are they valid, or are we using prefixlen to bound the value?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understood, 24 = 16+8, i.e. it's the maximum length for the port and protocol fields.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thats how it seems to me as well. i guess my question could be rephrased a bit clearer as:
is it possible to encounter a value > 24 that can cause some unexpected behavior here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added safety checks, thanks!

Copy link
Contributor

@gentoo-root gentoo-root left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are also a couple of questions, sorry in advance if they are too stupid, I'm new to this code area.

bpf/lib/maps.h Show resolved Hide resolved
bpf/lib/common.h Show resolved Hide resolved
bpf/lib/policy.h Outdated Show resolved Hide resolved
bpf/lib/policy.h Outdated Show resolved Hide resolved
Comment on lines 164 to 165
} else if prefix > 8 {
port = fmt.Sprintf("0x%x/%d/%s", dport, prefix-8, proto.String())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understood, 24 = 16+8, i.e. it's the maximum length for the port and protocol fields.

pkg/maps/policymap/policymap.go Outdated Show resolved Hide resolved
cilium/cmd/helpers.go Outdated Show resolved Hide resolved
@github-actions
Copy link

github-actions bot commented Apr 1, 2023

This pull request has been automatically marked as stale because it
has not had recent activity. It will be closed if no further activity
occurs. Thank you for your contributions.

@github-actions github-actions bot added stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. and removed stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. labels Apr 1, 2023
@jrajahalme
Copy link
Member Author

Replaced the last commit with a simpler method of adding an exception to the fail-on-specific warning code for the policy map.

@pchaigno
Copy link
Member

Does that mean all users will get a non-avoidable, non-actionable warning on upgrades and downgrades? 😬

Copy link
Member

@aditighag aditighag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to use LPM trie map for endpoint policy, as this allows prefix
based wildcarding of protocol and port, when placed at the end of the map
key.

This commit is missing a lot of context. The commit shuffles some of the fields in policy key around. So how do we ensure that the datapath is in sync with the control plane state? I would've expected to see some map versioning code where the agent would populate the v2 policy map with the new key layout, and then atomically swap the old map. If the commit takes a different approach, then it needs to be explicitly called out in the commit description and code.

Also, please mention in the code why the port and protocol fields need to be at the end of the key struct.

@jrajahalme
Copy link
Member Author

jrajahalme commented Apr 20, 2023

Also, please mention in the code why the port and protocol fields need to be at the end of the key struct.

Added comment right before the moved protocol and dport fields:

	/*
	 * 'protocol' and 'port' can be wildcarded as indicated by 'lpm_key',
	 * hence they need to be at the end of the key. The order of these fields is
	 * significant as protocol can only be wildcarded if dport is also fully
	 * wildcarded. 'protocol' is never partially wildcarded, so it is either fully
	 * wildcarded or not wildcarded at all. 'dport' can be partially wildcarded.
	 */

(will push this change once the current CI run finishes)

@jrajahalme
Copy link
Member Author

This commit is missing a lot of context. The commit shuffles some of the fields in policy key around. So how do we ensure that the datapath is in sync with the control plane state? I would've expected to see some map versioning code where the agent would populate the v2 policy map with the new key layout, and then atomically swap the old map. If the commit takes a different approach, then it needs to be explicitly called out in the commit description and code.

A bpf program will keep on using the (policy) map it was loaded with, even when the userspace creates a new map on the same path for the upgrade. The new map is then only used by the new bpf program once it is loaded. This means that versioning does not need to be explicit, and a given version of Cilium agent does not need to understand the (policy) map layout of another version.

@jrajahalme
Copy link
Member Author

/test

@jrajahalme
Copy link
Member Author

@aditighag Elaborated on the LPM comment a bit, now it is placed before the key definition and reads:

/*
 * Longest-prefix match map lookup only matches the number of bits from the
 * beginning of the key stored in the map indicated by the 'lpm_key' field in
 * the same stored map key, not including the 'lpm_key' field itself. Note that
 * the 'lpm_key' value passed in the lookup function argument needs to be a
 * "full prefix" (POLICY_FULL_PREFIX defined below).
 * 
 * Since we need to be able to wildcard 'sec_label' independently on 'protocol'
 * and 'dport' fields, we'll need to do that explicitly with a separate lookup
 * where 'sec_label' is zero. For the 'protocol' and 'port' we can use the
 * longest-prefix match by placing them at the end ot the key in this specific
 * order, as we want to be able to wildcard those fields in a specific pattern:
 * 'protocol' can only be wildcarded if dport is also fully wildcarded.
 * 'protocol' is never partially wildcarded, so it is either fully wildcarded or
 * not wildcarded at all. 'dport' can be partially wildcarded, but only when
 * 'protocol' is fully specified. This follows the logic that the destination
 * port is a property of a transport protocol and can not be specified without
 * also specifying the protocol.
 */

I also moved the definition of POLICY_FULL_PREFIX to be right after the key definition to keep them together.

@jrajahalme
Copy link
Member Author

Does that mean all users will get a non-avoidable, non-actionable warning on upgrades and downgrades? 😬

For now, yes, until bpf.Map is refactored to let the caller do the warning only when needed.

@jrajahalme
Copy link
Member Author

Removed trailing whitespace.

@jrajahalme
Copy link
Member Author

/test

Copy link
Member

@aditighag aditighag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding detailed comments around the port and protocol fields.
Re: map layout change: I suppose the agent populates a new map based on the revised key/value entries, and then swaps out the old map? While this PR doesn't change the logic explicitly, it does change the map type in addition to map key layout.
(The context needs to go in the PR/commit description. )

* order, as we want to be able to wildcard those fields in a specific pattern:
* 'protocol' can only be wildcarded if dport is also fully wildcarded.
* 'protocol' is never partially wildcarded, so it is either fully wildcarded or
* not wildcarded at all. 'dport' can be partially wildcarded, but only when
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does partially wildcarded mean? Does it indicate a port range?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is a prerequisite for implementing port ranges, yes.

pkg/endpoint/bpf.go Show resolved Hide resolved
pkg/maps/policymap/policymap.go Outdated Show resolved Hide resolved
cilium/cmd/bpf_policy_get.go Outdated Show resolved Hide resolved
bpf/lib/common.h Show resolved Hide resolved
Reduce confusion by always keeping the policymap key and value in network
byteorder.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Use PolicyEntry level accessor for IsDeny() so that the callers do not need to
know about the flags explicitly.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Do not export flags definition, simplify via using custom type instead of
depending on conversions to uint8.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Use string for protocol in policymap key conversion to disambiguate port
and protocol numbers, and always include it in the string.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Change to use LPM trie map for endpoint policy, as this allows prefix
based wildcarding of protocol and port, when placed at the end of the map
key. This helps eliminate two policy map lookups, which should help
offset the performance penalty of a more complex map lookup.

This change is a prerequisite for port range support in network policy.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Add policymap unit testing to validate these policymap key & value invariants:
- protocol/nexthdr can only be wildcarded if destination port is wildcarded
- value has wildcardNexthdr flag set if and only if key wildcards Nexthdr
- value has wildcardDestPort flag set if and only if key wildcards DestPort
- deny entries have no redirection nor auth type

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
…rate

bpf unit tests failed with a null pointer access verifier error (R1
invalid mem access 'inv'). Apparently the assignment of 'l4policy' to
'policy' was confusing, so separate the code path for 'l4policy' to keep
verifier happy.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Do not fail tests for policy map key or value size mismatches, as in case
of policy maps the maps are regenerated from the agent without any help
from the datapath.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
@jrajahalme
Copy link
Member Author

jrajahalme commented Apr 24, 2023

/test

Job 'Cilium-PR-K8s-1.27-kernel-net-next' failed:

Click to show.

Test Name

K8sDatapathConfig Host firewall With VXLAN and endpoint routes

Failure Output

FAIL: Failed to reach 192.168.56.12:80 from testclient-nnjzn

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.27-kernel-net-next/126/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.27-kernel-net-next so I can create one.

Then please upload the Jenkins artifacts to that issue.

Copy link
Member

@aditighag aditighag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Policy engine changes look good for my code owners. Thanks!

pkg/maps/policymap/policymap.go Show resolved Hide resolved
@jrajahalme jrajahalme merged commit 84e356d into cilium:main Apr 25, 2023
56 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/enhancement This would improve or streamline existing functionality. release-note/misc This PR makes changes that have no direct user impact. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants