Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend KeepConfiguration= functionality #409

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

chr4
Copy link
Contributor

@chr4 chr4 commented Sep 19, 2023

Description

With systemd-v243 (which is the default since focal), the "CriticalConnection" attribute was
revamped into "KeepConfiguration" [1], addressing several issues, especially in high-availability
environents [2][3].

The new "KeepConfiguration" setting can take multiple arguments, according to the documentation
[4]:

Takes a boolean or one of "static", "dhcp-on-stop", "dhcp". When "static", systemd-networkd will
not drop static addresses and routes on starting up process. When set to "dhcp-on-stop",
systemd-networkd will not drop addresses and routes on stopping the daemon. When "dhcp", the
addresses and routes provided by a DHCP server will never be dropped even if the DHCP lease
expires. This is contrary to the DHCP specification, but may be the best choice if, e.g., the
root filesystem relies on this connection. The setting "dhcp" implies "dhcp-on-stop", and "yes"
implies "dhcp" and "static". Defaults to "no".

If I understand the code correctly, it still uses "CriticalConnection" when setting "critical: true" in the netplan configuration.

Closes: https://bugs.launchpad.net/netplan/+bug/1896799

Checklist

  • Runs make check successfully.
  • Retains 100% code coverage (make check-coverage).
  • New/changed keys in YAML format are documented.
  • (Optional) Adds example YAML for new feature.
  • (Optional) Closes an open bug in Launchpad.

@chr4
Copy link
Contributor Author

chr4 commented Sep 19, 2023

This is a first draft - not tested yet. Feedback welcome!

@chr4 chr4 force-pushed the feat/keep-configuration branch 6 times, most recently from f66728c to 49a24de Compare September 19, 2023 13:26
@chr4
Copy link
Contributor Author

chr4 commented Sep 19, 2023

Not sure how to fix the Python SIGSEGVs?

@daniloegea
Copy link
Collaborator

Hello @chr4,

Thank you for bringing this to our attention and also for your PR, it's much appreciated.

Yes, we are still emitting the following configuration when critical: true is used in Netplan:

[DHCP]
CriticalConnection=true

As you already said, According to systemd's NEWS file this option was deprecated some time ago:

The CriticalConnection= setting in .network files is now deprecated,
and replaced by a new KeepConfiguration= setting which allows more
detailed configuration of the IP configuration to keep in place.

So yes, we need to get rid of CriticalConnection in favor of KeepConfiguration.

The problem is that we can't get rid of the critical key in Netplan, because it would break Netplan for everybody using this option.

Instead of replacing it with keep-configuration, I'd suggest a much simpler approach: adapt critical to accept all the new values supported by KeepConfiguration, including true and false.

This would make the patch simpler and wouldn't break Netplan. Also, all those functions and variables renaming wouldn't be necessary.

The second approach would be to keep critical as it is, deprecate it and introduce a new key, such as keep-configuration.

Keeping critical untouched would still work because systemd still accepts CriticalConnection, even though it's not even documented anymore. See https://github.com/systemd/systemd/blob/main/src/network/networkd-network.c#L245

In my opinion we could reuse critical and just change it to accept all the new values accepted by KeepConfiguration.

@slyon @vorlonofportland any thoughts on this?

@daniloegea daniloegea added the community This PR has been proposed by somebody outside of the Netplan team and roadmap commitments. label Sep 21, 2023
src/abi.h Outdated
@@ -211,7 +211,7 @@ struct netplan_net_definition {
/* status options */
gboolean optional;
NetplanOptionalAddressFlag optional_addresses;
gboolean critical;
char* keep_configuration;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make things simpler, we should use an enum for this option. It has a well defined set of possible values, so it would naturally fit well in an enum:

enum {
    NETPLAN_CRITICAL_TRUE,
    NETPLAN_CRITICAL_FALSE,
    NETPLAN_CRITICAL_STATIC,
    ...,
} CriticalOptions;

Please, check my comments in the summary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! Let me revamp

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been trying to rework this, but I'm not sure if I'm on the right track, as quite a few tests are failing or even segfaulting.

Copy link
Collaborator

@daniloegea daniloegea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi there! Sorry for taking so long to reply. I added a few comments in your most recent commits. I hope it helps.

May I ask you to squash your commits for the next iteration? It will make it a bit easier to read the changes.

Thank you! :)

@@ -225,7 +225,7 @@ reset_netdef(NetplanNetDefinition* netdef, NetplanDefType new_type, NetplanBacke

netdef->optional = FALSE;
netdef->optional_addresses = 0;
FREE_AND_NULLIFY(netdef->keep_configuration);
netdef->critical = NETPLAN_CRITICAL_FALSE;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably shouldn't initialize critical with false.

According to the documentation, KeepConfiguration defaults to "dhcp-on-stop" when systemd-networkd is running in initrd, "yes" when the root filesystem is a network filesystem, and "no" otherwise.

Initializing it to false would imply setting it to false even when the default is something else. I suggest adding a new entry in the enum, something like NETPLAN_CRITICAL_UNSET, and use it as the default value. Then in the switch-case you added, you'd also have an option for the false case and and an option that does nothing for the unset case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for using NETPLAN_CRITICAL_UNSET here.

src/parse.c Outdated
@@ -2785,7 +2785,7 @@ static const mapping_entry_handler dhcp6_overrides_handlers[] = {
{"accept-ra", YAML_SCALAR_NODE, {.generic=handle_accept_ra}, netdef_offset(accept_ra)}, \
{"activation-mode", YAML_SCALAR_NODE, {.generic=handle_activation_mode}, netdef_offset(activation_mode)}, \
{"addresses", YAML_SEQUENCE_NODE, {.generic=handle_addresses}, NULL}, \
{"keep-configuration", YAML_SCALAR_NODE, {.generic=handle_netdef_bool}, netdef_offset(keep_configuration)}, \
{"critical", YAML_SCALAR_NODE, {.generic=handle_netdef_bool}, netdef_offset(critical)}, \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the new critical option is not a boolean anymore, we'll need to update the parser. You'll need to create a new function that checks what option the user set and store one of the enum variants in the variable. You can base your implementation on the handle_auth_methods function for example https://github.com/chr4/netplan/blob/feat/keep-configuration/src/parse.c#L947

And as we want to still accept the boolean values so we'll not break existing configurations out there, we'll need to accept all the boolean values accepted by netplan. See https://github.com/chr4/netplan/blob/feat/keep-configuration/src/parse.c#L402

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like any changes to src/parse.c have been lost in a rebase?

case NETPLAN_CRITICAL_DHCP_ON_STOP:
g_string_append(s, "KeepConnection=dhcp-on-stop\n");
break;
default: g_assert_not_reached(); // LCOV_EXCL_LINE
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the reasons it's failing tests is because the case for the false option is missing here. See my comment below about the enum.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@@ -75,8 +75,8 @@ def set_name(self) -> str:
return _string_realloc_call_no_error(lambda b: lib.netplan_netdef_get_set_name(self._ptr, b, len(b)))

@property
def keep_configuration(self) -> str:
return _string_realloc_call_no_error(lambda b: lib.netplan_netdef_get_keep_configuration(self._ptr, b, len(b)))
def critical(self) -> bool:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could keep the Python binding untouched and return False if critical is false (or unset) and True if it's set to any of the other values...

Although we probably should change it to return the enum values. You can see an example of enums in the Python code here https://github.com/chr4/netplan/blob/feat/keep-configuration/python-cffi/netplan/_utils.py#L27

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an interesting thought. I feel like we should not diverge the bindings from the native API behavior, though. But we need to think about how we can handle that in a backwards compatible way.

@slyon slyon changed the title Migrate CriticalConnection (deprecated) to KeepConfiguration Extend KeepConfiguration functionality Nov 27, 2023
@slyon slyon changed the title Extend KeepConfiguration functionality Extend KeepConfiguration= functionality Nov 27, 2023
Copy link
Collaborator

@slyon slyon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for your contribution to Netplan @chr4! And for the initial review from @daniloegea

Overall, I think this is going into the right direction, but could you please have a look at #424 ? Maybe we should make it a two-step approach to land that first (and fix the bug at hand), then work on the extension of the new setting as an additional step.

Danilo's suggestions are good and should be addressed. I like the approach of keeping the critical: setting in Netplan and just extending it to accept "bool or scalar" eventually. So we can keep backwards compatibility and support the new usecase going forward.

I left some additional remarks inline, which need some more work. Especially:

  • Any changes to src/parse.c seem to been lost. We need those.
  • unit-test and CI failures. We want it to be (mostly) green. Danilo gave some good advice of how to approach this. You might also want to run make check-coverage locally, to test on your own, outside the CI.

@@ -225,7 +225,7 @@ reset_netdef(NetplanNetDefinition* netdef, NetplanDefType new_type, NetplanBacke

netdef->optional = FALSE;
netdef->optional_addresses = 0;
FREE_AND_NULLIFY(netdef->keep_configuration);
netdef->critical = NETPLAN_CRITICAL_FALSE;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for using NETPLAN_CRITICAL_UNSET here.

@@ -374,11 +374,19 @@ Match devices by MAC when setting options like: `wakeonlan` or `*-offload`.
> (networkd backend only) Allow the specified interface to be configured even
> if it has no carrier.

- **critical** (bool)
- **critical** (scalar)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd need to keep this as "(bool or scalar)", making use of handle_generic_bool() internally in parse.c, checking for the error condition in the GError** error parameter. Then falling back to "scalar" (and validating the input to be one of the accepted strings) if a bool cannot be parsed.

NETPLAN_CRITICAL_STATIC,
NETPLAN_CRITICAL_DHCP,
NETPLAN_CRITICAL_DHCP_ON_STOP,
} NetplanCriticalOption;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: we might just want to call that NetplanCritical to stay more in line with the other enums. Everything is an "Option", so that suffix feels kind of redundant.

PS: please also add and use a NETPLAN_CRITICAL_UNSET option, as suggested by @daniloegea

Note: changing the gboolean type to an enum should not break our ABI, as both will be mapped to a gint in the end.

case NETPLAN_CRITICAL_DHCP_ON_STOP:
g_string_append(s, "KeepConnection=dhcp-on-stop\n");
break;
default: g_assert_not_reached(); // LCOV_EXCL_LINE
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Comment on lines +105 to 106
NETPLAN_INTERNAL NetplanCriticalOption
_netplan_netdef_get_critical(const NetplanNetDefinition* netdef);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This is NETPLAN_INTERNAL API, as also desribed by the _netplan prefix. It's supposed to be used by Netplan tools/CLI only, not by external consumers. Although, changing the signature is an API breaking change, I think it's fine as we're the only ones affected.

Comment on lines +382 to +388
> When set to "static", static addresses and routes won't be dropped on
> starting up process.
> When set to "dhcp-on-stop", addresses and routes won't be dropped when
> stopping the daemon.
> When set to "dhcp", addresses and routes provided by a DHCP server will
> never be dropped even if the DHCP lease expires, implies "dhcp-on-stop".
> When set to "yes", "dhcp" and "static" is implied.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Did you double-check if something similar to KeepConfiguration= exists for NetworkManager these days? I'd like to align the wording between the two backends. If it does not exist it might be fine re-using the systemd-networkd terminology here. I need to think about this a bit more.

src/parse.c Outdated
@@ -2785,7 +2785,7 @@ static const mapping_entry_handler dhcp6_overrides_handlers[] = {
{"accept-ra", YAML_SCALAR_NODE, {.generic=handle_accept_ra}, netdef_offset(accept_ra)}, \
{"activation-mode", YAML_SCALAR_NODE, {.generic=handle_activation_mode}, netdef_offset(activation_mode)}, \
{"addresses", YAML_SEQUENCE_NODE, {.generic=handle_addresses}, NULL}, \
{"keep-configuration", YAML_SCALAR_NODE, {.generic=handle_netdef_bool}, netdef_offset(keep_configuration)}, \
{"critical", YAML_SCALAR_NODE, {.generic=handle_netdef_bool}, netdef_offset(critical)}, \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like any changes to src/parse.c have been lost in a rebase?

case NETPLAN_CRITICAL_DHCP_ON_STOP:
g_string_append(s, "KeepConnection=dhcp-on-stop\n");
break;
default: g_assert_not_reached(); // LCOV_EXCL_LINE
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@@ -75,8 +75,8 @@ def set_name(self) -> str:
return _string_realloc_call_no_error(lambda b: lib.netplan_netdef_get_set_name(self._ptr, b, len(b)))

@property
def keep_configuration(self) -> str:
return _string_realloc_call_no_error(lambda b: lib.netplan_netdef_get_keep_configuration(self._ptr, b, len(b)))
def critical(self) -> bool:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an interesting thought. I feel like we should not diverge the bindings from the native API behavior, though. But we need to think about how we can handle that in a backwards compatible way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community This PR has been proposed by somebody outside of the Netplan team and roadmap commitments.
Projects
None yet
3 participants