Add ability to skip loading kernel modules in antrea-agent #5754

Merged: antoninbas merged 2 commits into antrea-io:main from add-dontLoadKernelModules on Dec 1, 2023

antoninbas
Contributor

In order to support some specialized distributions, we may need to provide users with the ability to skip loading kernel modules. In particular, this is required to support Talos Linux (see #5707).

The Antrea Agent may try to load modules in 2 places:

  1. in the install-cni initContainer: we try to load modules, mostly as a sanity check. If loading the openvswitch module fails, the container fails.
  2. in the antrea-ovs container: this is outside of our direct control, but the ovs-ctl start script will try to load the openvswitch module if not detected.

For install-cni, we introduce an environment variable, SKIP_LOADING_KERNEL_MODULES. If set, we do not run modprobe at all.
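The guard described here could look roughly like the sketch below. This is a minimal illustration, not the actual install-cni script: the function name `load_kernel_module` and the `MODPROBE` override (handy for dry runs) are hypothetical, and "set" is interpreted as "non-empty", which is an assumption.

```shell
#!/usr/bin/env bash
# Sketch only: not the actual install-cni script.
# load_kernel_module and the MODPROBE override are hypothetical names.
load_kernel_module() {
    local module="$1"
    # Assumption: any non-empty value of SKIP_LOADING_KERNEL_MODULES means "skip".
    if [ -n "${SKIP_LOADING_KERNEL_MODULES:-}" ]; then
        echo "Skipping modprobe for ${module}"
        return 0
    fi
    # Falls back to the real modprobe unless MODPROBE is overridden.
    "${MODPROBE:-modprobe}" "${module}"
}
```

With `SKIP_LOADING_KERNEL_MODULES=1` in the environment, `load_kernel_module openvswitch` returns success without invoking modprobe; otherwise a modprobe failure propagates as the function's exit status.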

For antrea-ovs, we introduce a new flag, --skip-kmod, to the start_ovs script. If provided, we ensure that ovs-ctl will not try to run modprobe, by replacing the ovs-kmod-ctl utility script with a no-op.
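As a rough sketch of how the flag could be wired up (this is not the actual start_ovs script): the helper names `disable_ovs_kmod_ctl` and `handle_args` and the overridable `SCRIPTS_DIR` variable are hypothetical, while the default path and the backup-then-overwrite steps follow the diff excerpt in this PR.

```shell
#!/usr/bin/env bash
# Sketch only: not the actual start_ovs script.
# SCRIPTS_DIR is a hypothetical override so the logic can be tried in a scratch directory.
SCRIPTS_DIR="${SCRIPTS_DIR:-/usr/share/openvswitch/scripts}"

# Hypothetical helper: keep a backup of ovs-kmod-ctl, then overwrite it with ":".
# Redirecting into the existing file preserves its permissions.
disable_ovs_kmod_ctl() {
    cp "${SCRIPTS_DIR}/ovs-kmod-ctl" "${SCRIPTS_DIR}/ovs-kmod-ctl.bak"
    echo ":" > "${SCRIPTS_DIR}/ovs-kmod-ctl"
}

# Hypothetical argument handling: only --skip-kmod is recognized in this sketch.
handle_args() {
    local arg
    for arg in "$@"; do
        if [ "${arg}" = "--skip-kmod" ]; then
            disable_ovs_kmod_ctl
        fi
    done
}
```

Calling `handle_args --skip-kmod` before OVS is started leaves a no-op ovs-kmod-ctl in place, so any later invocation of it by ovs-ctl does nothing.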

To simplify usage, we introduce a new Helm configuration value, agent.dontLoadKernelModules. If set to true, we will take care of both configurations above.
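For illustration, enabling the new value from a values file might look like the following. The release name, chart reference (`antrea/antrea`) and namespace in the commented command are assumptions about a typical install and should be adjusted to your setup; only the `agent.dontLoadKernelModules` key comes from this PR.

```shell
#!/usr/bin/env bash
# Write a minimal values file that turns on the new option.
values_file="$(mktemp)"
cat > "${values_file}" <<'EOF'
agent:
  dontLoadKernelModules: true
EOF

# Assumed release name, chart reference and namespace; requires a reachable cluster:
# helm upgrade --install antrea antrea/antrea --namespace kube-system -f "${values_file}"
echo "Values written to ${values_file}"
```

The same setting can presumably be passed inline with `--set agent.dontLoadKernelModules=true` instead of a values file.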

Note that even when skipping "explicit" Kernel module loading, the module will still be automatically loaded on the host when starting OVS if needed. This seems to be expected for recent Linux Kernel versions.
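To observe this on a Node, one option is to check the loaded-module list after OVS has started. The helper below is a small sketch; the function name `module_loaded` and its optional file argument are hypothetical. Note that a module compiled into the kernel (rather than built as a loadable module) does not appear in /proc/modules, so a negative result is not conclusive on its own.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the module is listed in /proc/modules.
# The second argument lets the check run against another file with the same format.
module_loaded() {
    local module="$1"
    local modules_file="${2:-/proc/modules}"
    # The first field of each line is the module name.
    awk -v m="${module}" '$1 == m { found = 1 } END { exit !found }' "${modules_file}"
}

if module_loaded openvswitch; then
    echo "openvswitch is loaded"
else
    echo "openvswitch is not listed in /proc/modules"
fi
```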

With this change, Antrea can run on Talos Linux (confirmed with both the Docker and QEMU provisioners).

As part of this change, we also introduce the agent.antreaOVS.extraEnv Helm value, to inject arbitrary environment variables in the antrea-ovs container. This is for parity with other antrea-agent containers, and is not strictly required.

Signed-off-by: Antonin Bas <abas@vmware.com>
@antoninbas antoninbas requested a review from tnqn November 29, 2023 19:37
# skip-kmod flag, which prevents the ovs-ctl script from trying to load any Kernel module. In
# order for this to work, we need to turn ovs-kmod-ctl into a "no-op".
cp /usr/share/openvswitch/scripts/ovs-kmod-ctl /usr/share/openvswitch/scripts/ovs-kmod-ctl.bak
echo ":" > /usr/share/openvswitch/scripts/ovs-kmod-ctl
Member

Does `:` have any special meaning as a script?

Contributor Author

It should be no different from an empty script. I used `:` to emphasize that it was a shell script and that we meant for it to be a no-op.
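A quick way to check this claim is to run both variants through a shell and compare exit statuses. This is a standalone sketch using scratch files, not part of the PR; the `insert` argument is only illustrative and is ignored by both scripts.

```shell
#!/usr/bin/env bash
# Compare a script containing only ":" with a zero-byte script.
workdir="$(mktemp -d)"
echo ":" > "${workdir}/colon-script"   # same content as the replaced ovs-kmod-ctl
: > "${workdir}/empty-script"          # zero-byte file

sh "${workdir}/colon-script" insert
colon_status=$?
sh "${workdir}/empty-script" insert
empty_status=$?

echo "colon=${colon_status} empty=${empty_status}"   # prints: colon=0 empty=0
rm -rf "${workdir}"
```

Both scripts produce no output and exit with status 0, so the `:` serves only to signal intent to a reader.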

| agent.antreaOVS.logFileMaxNum | int | `4` | Max number of log files. |
| agent.antreaOVS.logFileMaxSize | int | `100` | Max size in MBs of any single log file. |
| agent.antreaOVS.resources | object | `{"requests":{"cpu":"200m"}}` | Resource requests and limits for the antrea-ovs container. |
| agent.antreaOVS.securityContext.capabilities | list | `["SYS_NICE","NET_ADMIN","SYS_ADMIN","IPC_LOCK"]` | Capabilities for the antrea-ovs container. |
| agent.antreaOVS.securityContext.privileged | bool | `false` | Run the antrea-ovs container as privileged. |
| agent.apiPort | int | `10350` | Port for the antrea-agent APIServer to serve on. |
| agent.dnsPolicy | string | `""` | DNS Policy for the antrea-agent Pods. If empty, the Kubernetes default will be used. |
| agent.dontLoadKernelModules | bool | `false` | Do not try to load any of the required Kernel modules (e.g., openvswitch) during initialization of the antrea-agent. Most users should never need to set this to true, but it may be required with some specific distributions. |
Member

nit: why not call it `agent.loadKernelModules` and default it to true?

Contributor Author

I like having the default value be the default value for the type (boolean here) when possible. It shows that loading kernel modules is meant to be the default / general behavior here, and that we need to set a variable to change it.

{{- if .Values.agent.dontLoadKernelModules }}
- name: SKIP_LOADING_KERNEL_MODULES
  value: "1"
{{- end }}
Member

Should it stop mounting host-lib-modules in this case?

Contributor Author

That's a good point, we don't need it. The downside is that it introduces one more difference between the 2 "modes", but that's not a big deal. I will make the change and run some sanity checks.

Signed-off-by: Antonin Bas <abas@vmware.com>
Member

@tnqn tnqn left a comment

LGTM

@tnqn tnqn added the action/release-note Indicates a PR that should be included in release notes. label Dec 1, 2023
@antoninbas
Contributor Author

/test-all

@antoninbas
Contributor Author

Validated the latest version with a Talos cluster

@antoninbas antoninbas merged commit e5a9ba1 into antrea-io:main Dec 1, 2023
50 of 54 checks passed
@antoninbas antoninbas deleted the add-dontLoadKernelModules branch December 1, 2023 17:37