Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes for ip-masq-agent #26435

Merged
merged 7 commits into from Jun 26, 2023
Merged

Fixes for ip-masq-agent #26435

merged 7 commits into from Jun 26, 2023

Conversation

qmonnet
Copy link
Member

@qmonnet qmonnet commented Jun 23, 2023

It looks like I didn't test the ip-masq-agent quite enough when adding support for IPv6 - in particular, I didn't test with IPv4 only. Sebastian reported that when IPv6 is off, ip-masq-agent is broken. This PR contains a number of fixes to address that.

Fixes: #26262

Fix some map handling logic as well as some issues with CLI commands related to ip-masq-agent, introduced with IPv6 support

@qmonnet qmonnet added kind/bug This is a bug in the Cilium logic. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. release-note/misc This PR makes changes that have no direct user impact. release-blocker/1.14 This issue will prevent the release of the next version of Cilium. labels Jun 23, 2023
@qmonnet qmonnet force-pushed the pr/ipmasq-fixes branch 2 times, most recently from bb4044f to 5bd49ed Compare June 23, 2023 01:41
@qmonnet
Copy link
Member Author

qmonnet commented Jun 23, 2023

/test

@qmonnet
Copy link
Member Author

qmonnet commented Jun 23, 2023

/test

@qmonnet qmonnet requested a review from gandro June 23, 2023 12:14
@qmonnet qmonnet marked this pull request as ready for review June 23, 2023 12:14
@qmonnet qmonnet requested review from a team as code owners June 23, 2023 12:14
@borkmann borkmann requested a review from brb June 23, 2023 12:32
@@ -361,6 +361,18 @@ var _ = Describe("K8sDatapathConfig", func() {
testIPMasqAgent()
})

It("DirectRouting, IPv4 only", func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JFYI, we want to move this to CLI connectivity test in v1.15.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK thanks, this shouldn't be an issue I think? Or do you have an alternative to suggest?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this PR it's not an issue. I'm expecting the move to happen early in the v1.15 dev cycle.

Copy link
Member

@bimmlerd bimmlerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

contributing lgtm

@qmonnet
Copy link
Member Author

qmonnet commented Jun 23, 2023

The first two commits from the previous iteration ended up here by mistake (they're from #26454). I removed them, the rest is unchanged.

@qmonnet
Copy link
Member Author

qmonnet commented Jun 23, 2023

/test

Copy link
Contributor

@learnitall learnitall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM on changes in examples/

Copy link
Member

@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for ci-structure. It would be good to have a tracking issue for migrating the test out of Ginkgo.

@qmonnet
Copy link
Member Author

qmonnet commented Jun 24, 2023

Ok, I added an item to #26074. Where would be the best location for this test? Basically I want to make sure that Cilium deploys well with the ip-masq-agent on IPv4 only (and that we don't reintroduce this IPv6-map-related warning that would make it unhealthy) and that the feature works as expected. Would connectivity tests be the way to go (as a follow-up)?

pkg/datapath/maps/map.go Outdated Show resolved Hide resolved
@@ -26,7 +26,7 @@ var bpfIPMasqListCmd = &cobra.Command{
Run: func(cmd *cobra.Command, args []string) {
common.RequireRootPrivilege("cilium bpf ipmasq list")

cidrs, err := (&ipmasq.IPMasqBPFMap{}).Dump()
cidrs, err := (&ipmasq.IPMasqBPFMap{}).DumpForProtocols(getMasqueradingProtocols())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getMasqueradingProtocols seems like a pretty big hammer to me. Could you instead call IPMasq6Map and IPMasq4Map (ignoring any "file doesn't exist" error) before Dump()?

Copy link
Member Author

@qmonnet qmonnet Jun 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline -- This won't work, because we're trying to Open() the map in DumpWithCallback(), so if we initialise ipmasq.ipMasq6Map by calling ipmasq.IPMasq6Map() in the CLI, we get an error:

root@kind-worker:/home/cilium# cilium bpf ipmasq list
error dumping contents of map: loading pinned map /sys/fs/bpf/tc/globals/cilium_ipmasq_v6: no such file or directory

I'll follow your alternate suggestion instead, using the result of ipv4Needed := ipmasq.IPMasq4Map().Open() == nil (and v6) to check whether the maps are open and get the booleans to pass to DumpForProtocols(), instead of going for the API request.

Thanks for the review!

Copy link
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thanks for fixing this!

@qmonnet
Copy link
Member Author

qmonnet commented Jun 26, 2023

/test

@qmonnet
Copy link
Member Author

qmonnet commented Jun 26, 2023

I forgot to update my commit log after addressing Lorenz's feedback. I updated and triggered the tests again...
/test

Copy link
Contributor

@derailed derailed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qmonnet Nice work! Just a few picks...

daemon/cmd/datapath.go Outdated Show resolved Hide resolved
pkg/datapath/maps/map.go Show resolved Hide resolved
pkg/maps/ipmasq/ipmasq.go Show resolved Hide resolved
pkg/maps/ipmasq/ipmasq.go Show resolved Hide resolved
pkg/maps/ipmasq/ipmasq.go Show resolved Hide resolved
@@ -121,6 +131,10 @@ func (*IPMasqBPFMap) Dump() ([]net.IPNet, error) {
return cidrs, nil
}

func (*IPMasqBPFMap) Dump() ([]net.IPNet, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still used? Also no comment on exported method.

Perhaps a larger discussion here but this sounds like prime candidates for generics.
There is very little use for this type as shown below as since this is just a marker type that just keep checking what kind of ips are at play and whether the correct matching configs are set.

Note: in this case we need an instance that is passed in on the fly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still used, in the agent. The method didn't have a comment so far and I didn't think of adding one; but I can do, for both.

As for the generics, it makes sense, but have we been using them so far in the code base? This PR is a set of fixes, I'd as well avoid moving things too much or introducing generics in here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally Quentin, I understand. Just a quick note, for potential refacs at some other time.

Fix a bug introduced with the support for IPv6 ip-masq-agent. The macros
for IPv6 are not at the right location, and they will be set if IPv4
masquerading is set, even if it is disabled for IPv6. Let's move the
code to its right location.

Fixes: 7fd9a0d ("ipmasq: Enable IPv6 support for ip-masq-agent")
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
When adding support for IPv6 ip-masq-agent, the related maps were not
handled correctly in RemoveDisabledMaps(), resulting in maps being kept,
even if option.Config.EnableIPv(4|6)Masquerade is disabled. Let's fix
the conditions to sweep these maps.

Fixes: c5348d2 ("ipmasq: Implement ip-masq-agent support for IPv6")
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Before we introduced support for the IPv6 ip-masq-agent, only IPv4 would
support it. Therefore, if the ip-masq-agent agent was enabled in the
config, this meant that we had validated that IPv4 masquerading was set,
and we could afford creating the related map on the combination of
options (IPv4 + ip-masq-agent enabled).

Now that we have IPv6 support, having IPv4 and the ip-masq-agent enabled
no longer means we're using it for IPv4, given that we may chose to
enable masquerading for IPv6 only. So we need to update the conditions
in the datapath package, to create these maps only when necessary.

Fixes: 7fd9a0d ("ipmasq: Enable IPv6 support for ip-masq-agent")
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
If we don't pay attention to what IP protocols should get masqueraded,
and try to interact with ip-masq-agent maps without precautions, we can
end up in a situation where we try to open a map that does not exist.

For example, if the cluster runs with IPv4 only, and we try to add the
default prefixes (IPv4 and IPv6) to the ip-masq-agent, then we try to
Update() an entry in the IPv6 map for the ip-masq-agent. But this map
does not exist (we run with IPv4 only and never created it). So we get a
warning in the logs, preventing Cilium to be healthy:

    loading pinned map /sys/fs/bpf/tc/globals/cilium_ipmasq_v6: no such file or directory

Fix this by restricting the operations to the IP versions for which we
have masquerading enabled.

Fixes: c5348d2 ("ipmasq: Implement ip-masq-agent support for IPv6")
Reported-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
When adding support for IPv6 ip-masq-agent in Cilium, we made sure to
Dump() the contents of the maps only for the map that had been defined
(IPv4 and/or IPv6 versions).

This works well in the agent, but not from the CLI, which also calls
Dump() for "cilium bpf ipmasq list". In that case, the variable
referencing the maps have never been set, and we end up exiting from the
function without dumping any contents.

To fix this, we need the CLI to be able to chose what maps should be
needed. The ipmasq package does not expose the maps directly, so we want
to pass it the values for option.Config.EnableIPv(4|6)Masquerade
instead. These configuration options are not directly available from the
CLI. Instead, we check whether the IPv4 and IPv6 ip-masq-agent maps are
open, and let the ipmasq package know which maps to dump accordingly.

Fixes: c5348d2 ("ipmasq: Implement ip-masq-agent support for IPv6")
Reported-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
YAML indentation is not correct, and the example file cannot be applied
in a straightforward fashion. Let's fix it.

Fixes: 2dc829b ("docs: Simplify example how to config ipmasq via ConfigMap")
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
In order to detect issues occurring with ip-masq-agent when IPv4 only is
enabled, add a new test to deploy the ip-masq-agent without IPv6.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
@qmonnet
Copy link
Member Author

qmonnet commented Jun 26, 2023

I addressed the feedback from Fernand - I factored a check in daemon/cmd/datapath.go, and added comments for the two exported Dump*() functions - and rebased.

@qmonnet
Copy link
Member Author

qmonnet commented Jun 26, 2023

/test

@qmonnet qmonnet requested a review from derailed June 26, 2023 21:15
Copy link
Contributor

@derailed derailed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qmonnet LGTM. Thank you for the updates!

@qmonnet
Copy link
Member Author

qmonnet commented Jun 26, 2023

All codeowners are covered by reviews, and CI is green. Let's get these fixes merged

@qmonnet qmonnet merged commit fa67780 into cilium:main Jun 26, 2023
65 checks passed
@qmonnet qmonnet deleted the pr/ipmasq-fixes branch June 26, 2023 23:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug This is a bug in the Cilium logic. release-blocker/1.14 This issue will prevent the release of the next version of Cilium. release-note/misc This PR makes changes that have no direct user impact. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ip-masq-agent broken for IPv4-only
8 participants