datapath: Switch to LPM policy map #23885
79b2706 to a862cf9
/test
Job 'Cilium-PR-K8s-1.24-kernel-5.4' failed.
Job 'Cilium-PR-K8s-1.16-kernel-4.19' failed.
Agent bits LGTM, but I see some CI failures that look like they may be relevant to these changes:
Tests upgrade and downgrade from a Cilium stable image to master:
Key-size mismatch for BPF map" file-path=/sys/fs/bpf/tc/globals/cilium_policy_00835 new=8 old=12 subsys=bpf
cilium/cmd/bpf_policy_get.go
Outdated
} else if prefix > 8 {
	port = fmt.Sprintf("0x%x/%d/%s", dport, prefix-8, proto.String())
Do we care here about values > 24? I.e., are they valid, or are we using prefixlen to bound the value?
As far as I understood, 24 = 16+8, i.e. it's the maximum length for the port and protocol fields.
That's how it seems to me as well. I guess my question could be rephrased more clearly as:
is it possible to encounter a value > 24 that can cause some unexpected behavior here?
Added safety checks, thanks!
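For readers following the thread, here is a minimal Go sketch of the kind of bounds check being discussed. It is not the code that was pushed to the PR: `formatPortProto`, `protoName`, the constants, and the output formats for the fully specified and fully wildcarded cases are all hypothetical. It only assumes what the thread states, namely that the prefix covers an 8-bit protocol followed by a 16-bit port, so 24 is the maximum.

```go
package main

import "fmt"

const (
	protoBits = 8                    // width of the protocol field
	portBits  = 16                   // width of the destination port field
	maxPrefix = protoBits + portBits // 24, the bound discussed in the review
)

// protoName is a hypothetical stand-in for proto.String() in the reviewed snippet.
func protoName(proto uint8) string {
	switch proto {
	case 6:
		return "TCP"
	case 17:
		return "UDP"
	default:
		return fmt.Sprintf("%d", proto)
	}
}

// formatPortProto renders a port/protocol pair for a given prefix length and
// rejects prefix lengths that do not describe a valid wildcarding pattern.
func formatPortProto(dport uint16, proto uint8, prefix uint8) (string, error) {
	switch {
	case prefix > maxPrefix:
		return "", fmt.Errorf("prefix %d exceeds the maximum of %d bits", prefix, maxPrefix)
	case prefix == maxPrefix: // port and protocol fully specified
		return fmt.Sprintf("%d/%s", dport, protoName(proto)), nil
	case prefix > protoBits: // port partially wildcarded, same format as the reviewed line
		return fmt.Sprintf("0x%x/%d/%s", dport, prefix-protoBits, protoName(proto)), nil
	case prefix == protoBits: // port fully wildcarded, protocol specified
		return fmt.Sprintf("*/%s", protoName(proto)), nil
	case prefix == 0: // both fully wildcarded
		return "*", nil
	default: // 1..7 would wildcard only part of the protocol
		return "", fmt.Errorf("prefix %d would partially wildcard the protocol", prefix)
	}
}

func main() {
	for _, p := range []uint8{24, 16, 8, 0, 25} {
		s, err := formatPortProto(0x1f90, 6, p)
		fmt.Println(p, s, err)
	}
}
```

Returning an error for out-of-range values, rather than letting `prefix-8` produce a nonsensical port prefix, keeps unexpected map contents visible instead of silently misprinting them.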
There are also a couple of questions, sorry in advance if they are too stupid, I'm new to this code area.
This pull request has been automatically marked as stale because it
a862cf9 to e49e7b3
Replaced the last commit with a simpler method of adding an exception to the fail-on-specific warning code for the policy map.
Does that mean all users will get a non-avoidable, non-actionable warning on upgrades and downgrades? 😬
Change to use LPM trie map for endpoint policy, as this allows prefix
based wildcarding of protocol and port, when placed at the end of the map
key.
This commit is missing a lot of context. The commit shuffles some of the fields in policy key around. So how do we ensure that the datapath is in sync with the control plane state? I would've expected to see some map versioning code where the agent would populate the v2 policy map with the new key layout, and then atomically swap the old map. If the commit takes a different approach, then it needs to be explicitly called out in the commit description and code.
Also, please mention in the code why the port and protocol fields need to be at the end of the key struct.
Added comment right before the moved protocol and dport fields:
(will push this change once the current CI run finishes)
A BPF program keeps using the (policy) map it was loaded with, even when userspace creates a new map on the same path for the upgrade. The new map is then used only by the new BPF program once it is loaded. This means that versioning does not need to be explicit, and a given version of the Cilium agent does not need to understand the (policy) map layout of another version.
/test
@aditighag Elaborated on the LPM comment a bit, now it is placed before the key definition and reads:
I also moved the definition of POLICY_FULL_PREFIX to be right after the key definition to keep them together.
6916c92 to 0142baf
For now, yes, until
0142baf to 04fd126
Removed trailing whitespace.
/test
Thanks for adding detailed comments around the port and protocol fields.
Re: map layout change: I suppose the agent populates a new map based on the revised key/value entries, and then swaps out the old map? While this PR doesn't change the logic explicitly, it does change the map type in addition to map key layout.
(The context needs to go in the PR/commit description. )
* order, as we want to be able to wildcard those fields in a specific pattern:
* 'protocol' can only be wildcarded if dport is also fully wildcarded.
* 'protocol' is never partially wildcarded, so it is either fully wildcarded or
* not wildcarded at all. 'dport' can be partially wildcarded, but only when
What does partially wildcarded mean? Does it indicate a port range?
This PR is a prerequisite for implementing port ranges, yes.
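To illustrate what a partially wildcarded port means in terms of ranges, the following Go sketch computes the inclusive port range covered when only the top bits of a 16-bit port are matched. The helper `portRange` is hypothetical and not part of the PR. How port ranges will eventually be encoded into policy map entries is not stated in this thread, so treat this only as the arithmetic behind prefix matching.

```go
package main

import (
	"errors"
	"fmt"
)

// portRange returns the inclusive range of ports that share the top
// prefixLen bits with port. Hypothetical helper for illustration only.
func portRange(port uint16, prefixLen uint8) (lo, hi uint16, err error) {
	if prefixLen > 16 {
		return 0, 0, errors.New("port prefix length must be in 0..16")
	}
	var all uint16 = 0xffff
	mask := all << (16 - prefixLen) // becomes 0 when prefixLen == 0
	lo = port & mask                // clear the wildcarded low bits
	hi = lo | ^mask                 // set the wildcarded low bits
	return lo, hi, nil
}

func main() {
	lo, hi, _ := portRange(8080, 12)
	fmt.Printf("%d-%d\n", lo, hi) // prints 8080-8095
}
```

A single prefix always covers a power-of-two sized, aligned block of ports, so an arbitrary range would in general need several such entries.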
Reduce confusion by always keeping the policymap key and value in network byteorder. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Use PolicyEntry level accessor for IsDeny() so that the callers do not need to know about the flags explicitly. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Do not export flags definition, simplify via using custom type instead of depending on conversions to uint8. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Use string for protocol in policymap key conversion to disambiguate port and protocol numbers, and always include it in the string. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Change to use LPM trie map for endpoint policy, as this allows prefix based wildcarding of protocol and port, when placed at the end of the map key. This helps eliminate two policy map lookups, which should help offset the performance penalty of a more complex map lookup. This change is a prerequisite for port range support in network policy. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
04fd126 to 42fc378
Add policymap unit testing to validate these policymap key & value invariants:
- protocol/nexthdr can only be wildcarded if destination port is wildcarded
- value has wildcardNexthdr flag set if and only if key wildcards Nexthdr
- value has wildcardDestPort flag set if and only if key wildcards DestPort
- deny entries have no redirection nor auth type
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
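The four invariants in this commit message can be expressed as a small checker. The Go sketch below is illustrative only: the types `policyKey` and `policyEntry`, their fields, the flag names, and `checkInvariants` are hypothetical and do not mirror the actual policymap package or its unit tests.

```go
package main

import (
	"errors"
	"fmt"
)

type entryFlags uint8

const (
	flagDeny entryFlags = 1 << iota
	flagWildcardNexthdr
	flagWildcardDestPort
)

// policyKey records only which fields the key wildcards (hypothetical layout).
type policyKey struct {
	NexthdrWildcarded  bool
	DestPortWildcarded bool
}

// policyEntry is a hypothetical map value with flags, redirect port, and auth type.
type policyEntry struct {
	Flags     entryFlags
	ProxyPort uint16
	AuthType  uint8
}

// checkInvariants returns an error for the first violated invariant, or nil.
func checkInvariants(k policyKey, e policyEntry) error {
	has := func(f entryFlags) bool { return e.Flags&f != 0 }
	switch {
	case k.NexthdrWildcarded && !k.DestPortWildcarded:
		return errors.New("nexthdr wildcarded while destination port is not")
	case has(flagWildcardNexthdr) != k.NexthdrWildcarded:
		return errors.New("wildcardNexthdr flag does not match key")
	case has(flagWildcardDestPort) != k.DestPortWildcarded:
		return errors.New("wildcardDestPort flag does not match key")
	case has(flagDeny) && (e.ProxyPort != 0 || e.AuthType != 0):
		return errors.New("deny entry carries redirection or auth type")
	}
	return nil
}

func main() {
	k := policyKey{DestPortWildcarded: true}
	e := policyEntry{Flags: flagWildcardDestPort}
	fmt.Println(checkInvariants(k, e)) // prints <nil>
}
```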
…rate bpf unit tests failed with a null pointer access verifier error (R1 invalid mem access 'inv'). Apparently the assignment of 'l4policy' to 'policy' was confusing, so separate the code path for 'l4policy' to keep verifier happy. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Do not fail tests for policy map key or value size mismatches, as in case of policy maps the maps are regenerated from the agent without any help from the datapath. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
42fc378 to 6e93afb
/test
Job 'Cilium-PR-K8s-1.27-kernel-net-next' failed.
Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.27-kernel-net-next/126/
Policy engine changes look good for my code owners. Thanks!
Change to use LPM trie map for endpoint policy, as this allows prefix based wildcarding of protocol and port, when placed at the end of the map key. This helps eliminate two policy map lookups, which should help offset the performance penalty of a more complex map lookup.
This involves both changing the map type and extending the policymap key with the prefix length field that is customary for all LPM maps. At the same time, the key fields need to be reorganised so that the fields we plan to wildcard are at the end of the key, with the protocol first and the port last. The port can be wildcarded even when the protocol is not, but the protocol can be wildcarded only if the port is also (completely) wildcarded.
With this new LPM policy map we still need a second lookup with an explicitly wildcarded security ID, as the security ID can be wildcarded even when the protocol and/or port are being matched.
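The lookup scheme described above can be emulated in userspace Go as a rough sketch. Everything here is an assumption made for illustration: the `rule` type, the linear scan standing in for the kernel LPM trie, the use of identity 0 as the wildcard security ID, and the simple "exact identity first, then wildcard" fallback. The real datapath may combine the two lookup results differently.

```go
package main

import "fmt"

// rule is a hypothetical policy entry; prefix counts matched bits of the
// 8-bit protocol followed by the 16-bit destination port (0..24).
type rule struct {
	identity uint32
	proto    uint8
	dport    uint16
	prefix   uint8
	verdict  string
}

// l4Bits packs protocol and port in key order: protocol first, port last.
func l4Bits(proto uint8, dport uint16) uint32 {
	return uint32(proto)<<16 | uint32(dport)
}

// lpmLookup returns the rule for the given identity with the longest
// protocol/port prefix that matches, emulating one LPM trie lookup.
func lpmLookup(rules []rule, identity uint32, proto uint8, dport uint16) (rule, bool) {
	var best rule
	found := false
	want := l4Bits(proto, dport)
	for _, r := range rules {
		if r.identity != identity || r.prefix > 24 {
			continue
		}
		mask := (uint32(0xffffff) << (24 - r.prefix)) & 0xffffff
		if (want^l4Bits(r.proto, r.dport))&mask != 0 {
			continue
		}
		if !found || r.prefix > best.prefix {
			best, found = r, true
		}
	}
	return best, found
}

// policyLookup does the two lookups: exact identity, then wildcard identity 0.
func policyLookup(rules []rule, identity uint32, proto uint8, dport uint16) (rule, bool) {
	if r, ok := lpmLookup(rules, identity, proto, dport); ok {
		return r, true
	}
	return lpmLookup(rules, 0, proto, dport)
}

func main() {
	rules := []rule{
		{identity: 1234, proto: 6, dport: 80, prefix: 24, verdict: "allow-http"},
		{identity: 0, proto: 6, prefix: 8, verdict: "allow-any-tcp"},
	}
	r, ok := policyLookup(rules, 1234, 6, 443)
	fmt.Println(r.verdict, ok) // prints allow-any-tcp true
}
```

Because the prefix only wildcards trailing bits, one lookup per identity replaces the separate exact, port-wildcarded, and protocol-wildcarded probes, while the identity itself still needs its own second probe.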
As the Cilium agent can re-populate policy maps from scratch, there is no need for any specific upgrade/downgrade support for the map type and key layout change. Old programs keep using the old policy maps they were started with, while the new programs, loaded for all endpoints when the upgraded Cilium agent starts, will use the new maps.
This change is a prerequisite for port range support in network policy.