How do I get Cilium to run on WSL2? #29302

Open
2 tasks done
HummingMind opened this issue Nov 21, 2023 · 33 comments
Labels
area/documentation Impacts the documentation, including textual changes, sphinx, or other doc generation code. good-first-issue Good starting point for new developers, which requires minimal understanding of Cilium. kind/community-report This was reported by a user in the Cilium community, eg via Slack. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.

Comments

@HummingMind

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

Cilium pods are crashing with the following error message in version 1.15.0-pre.2:

level=fatal msg="failed to start: daemon creation failed: error while initializing daemon: failed while reinitializing datapath: removing ipv6 proxy routing rule: address family not supported by protocol" subsys=daemon

Version 1.14.4 works fine.

Cilium Version

1.15.0-pre.2

Kernel Version

linux-msft-wsl-5.15.133.1

Kubernetes Version

1.28.3

Sysdump

No response

Relevant log output

No response

Anything else?

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@HummingMind HummingMind added kind/bug This is a bug in the Cilium logic. kind/community-report This was reported by a user in the Cilium community, eg via Slack. needs/triage This issue requires triaging to establish severity and next steps. labels Nov 21, 2023
@HummingMind
Author

HummingMind commented Nov 21, 2023

I understand this might be related to ipv6, but what changed between 1.14.4 and 1.15.0-pre.2? Works fine with 1.14.4 but not with 1.15.0-pre.2

WSL2, kind cluster, with custom-compiled kernel with all the required modules according to:
https://docs.cilium.io/en/latest/operations/system_requirements/

@lmb lmb added the sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. label Nov 21, 2023
@lmb lmb changed the title from "cilium fails to start - v1.15.0-pre.2 - local kind cluster on WSL2 (with custom compiled WSL2 kernel)" to "cilium fails to start on WSL2: removing ipv6 proxy routing rule: address family not supported by protocol" Nov 21, 2023
@ti-mo ti-mo self-assigned this Nov 21, 2023
@ti-mo ti-mo removed the needs/triage This issue requires triaging to establish severity and next steps. label Nov 21, 2023
@networkop
Contributor

I believe CONFIG_IPV6_MULTIPLE_TABLES=y may be missing

@ti-mo
Contributor

ti-mo commented Nov 21, 2023

but what changed between 1.14.4 and 1.15.0-pre.2?

tl;dr: a lot. These rules used to be installed/removed by a shell script with a bunch of ip rule .. || true that swept all these errors under the rug. Fortunately, this is no longer the case. :)

@HummingMind I've put up #29311 that ignores only the specific EAFNOSUPPORT returned by the rule removal. Feel free to play around with this on your own. Are WSL2 kernels typically built with v6 disabled? In this case, you'll need to disable Cilium's v6 support. Otherwise, you'll just hit errors elsewhere.

@networkop's suggestion above may also prove useful. Did you explicitly disable CONFIG_IPV6_MULTIPLE_TABLES? I can't figure out from the kernel docs whether this defaults to true or not, but it's pulled in by at least CONFIG_IPV6_SEG6_LWTUNNEL. Unfortunately, there are many places returning EAFNOSUPPORT.
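
To illustrate the difference between `ip rule .. || true` and ignoring one specific error, here is a minimal shell sketch. It is not the code from #29311 (Cilium's datapath is written in Go); `run_ignoring_eafnosupport` is a hypothetical helper name, and matching on the error text is an assumption made purely for illustration.

```shell
# Run a command and treat its failure as success ONLY when stderr reports
# "address family not supported by protocol" (EAFNOSUPPORT).
# Any other failure is printed and passed through, unlike `cmd || true`.
run_ignoring_eafnosupport() {
  status=0
  # Capture stderr only; stdout is discarded to keep the sketch short.
  err=$("$@" 2>&1 >/dev/null) || status=$?
  if [ "$status" -eq 0 ]; then
    return 0
  fi
  case $err in
    *"ddress family not supported by protocol"*) return 0 ;;
  esac
  printf '%s\n' "$err" >&2
  return "$status"
}
```

With a helper like this, a rule removal that fails because the kernel lacks IPv6 policy routing would be tolerated, while a permissions error or a typo in the command would still surface.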

@networkop
Contributor

With WSL2's kernel, a LOT of default settings are disabled, so just going by https://docs.cilium.io/en/latest/operations/system_requirements/ may not help. I can include a diff of the non-default flags I've got enabled (1.15.0-pre.2 runs fine for me), but that may include some options that Cilium doesn't need.

@HummingMind
Author

I believe CONFIG_IPV6_MULTIPLE_TABLES=y may be missing

I just checked the config-wsl file, and yes, this is not set.
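
A check like this can be scripted. The following is a minimal sketch, assuming the options live in a plain-text kernel config such as Microsoft/config-wsl; `missing_kconfig` is a hypothetical helper name, not something shipped with the kernel tree or with Cilium.

```shell
# Print every option from the argument list that is NOT set to y or m
# in the given kernel config file.
# Usage: missing_kconfig FILE OPTION [OPTION...]
missing_kconfig() {
  config_file=$1
  shift
  for opt in "$@"; do
    # Match the whole line so CONFIG_IPV6 does not match CONFIG_IPV6_SIT.
    grep -qE "^${opt}=(y|m)$" "$config_file" || printf '%s\n' "$opt"
  done
}
```

For example, `missing_kconfig Microsoft/config-wsl CONFIG_IPV6 CONFIG_IPV6_MULTIPLE_TABLES` would print only the options that still need to be enabled.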

@HummingMind
Author

HummingMind commented Nov 21, 2023

Are WSL2 kernels typically built with v6 disabled? In this case, you'll need to disable Cilium's v6 support. Otherwise, you'll just hit errors elsewhere.

There is a new networking mode in WSL2 2.0.9, called mirrored mode (experimental), which adds IPv6 support to WSL2. I'll try it out, and if it doesn't work, I'll go back to figuring out what might be missing from the compiled kernel (such as CONFIG_IPV6_MULTIPLE_TABLES=y, which was mentioned by @networkop). I am far from an expert on Linux and kernel compilation, but some IPv6 options are enabled in the config, such as CONFIG_IPV6=y (so I think IPv6 might actually be enabled).

I wish you guys had official docs on compiling the WSL2 kernel, as it is a really popular dev environment for local kubernetes testing/development. 🙏🏻

@HummingMind
Author

I can include a diff of non-default flags (1.15.0-pre.2 runs fine for me) that I've got enabled but that may include some of the stuff not needed by Cilium.

I'll play around with a few things, but if I can't get it to work, I'll take you up on the offer. Thank you!

@HummingMind
Author

HummingMind commented Nov 21, 2023

WSL2 mirrored mode doesn't work with Docker yet. 🤣
Issue here: microsoft/WSL#10494

Will try disabling IPv6 during the install, to see if that works. If it doesn't, I'll go back to messing with the WSL2 kernel. 😨

@HummingMind
Author

Looks like IPv6 is disabled by default anyway during the Cilium installation. I tried with the --set ipv6.enabled=false flag just in case, but I am getting the same error.

@HummingMind
Author

HummingMind commented Nov 22, 2023

I set CONFIG_IPV6_SEG6_LWTUNNEL=y, and the original error message is gone.
New error is:

level=info msg="Start hook executed" duration="10.68µs" function="*statedb.DB.Start" subsys=hive
level=error msg="Start hook failed" error="NewHandleAt failed: protocol not supported" function="*linux.devicesController.Start" subsys=hive
level=info msg=Stopping subsys=hive
level=info msg="Stop hook executed" duration="4.662µs" function="*statedb.DB.Stop" subsys=hive

@HummingMind
Author

HummingMind commented Nov 22, 2023

Got it to work! I just enabled most of the IPv6 modules in the networking section.

image

I guess you can close this!

Thank you both!

@ti-mo
Contributor

ti-mo commented Dec 13, 2023

@HummingMind Thanks for reporting back! Would you be willing to contribute a small section to our documentation with your findings? Sounds like it would help out quite a few WSL2 users!

@ti-mo ti-mo added area/documentation Impacts the documentation, including textual changes, sphinx, or other doc generation code. good-first-issue Good starting point for new developers, which requires minimal understanding of Cilium. and removed kind/bug This is a bug in the Cilium logic. labels Dec 13, 2023
@ti-mo ti-mo changed the title from "cilium fails to start on WSL2: removing ipv6 proxy routing rule: address family not supported by protocol" to "How do I get Cilium to run on WSL2?" Dec 13, 2023
@HummingMind
Author

I can certainly contribute. Not sure I am qualified though, as this was the first time that I ever had to compile a Linux kernel. 😆

How would I go about doing this? I am also a bit new to open source 😨

@ti-mo
Contributor

ti-mo commented Dec 18, 2023

I can certainly contribute. Not sure I am qualified though, as this was the first time that I ever had to compile a Linux kernel. 😆

If you've made it through building your own kernel, you're officially qualified. 😉

How would I go about doing this? I am also a bit new to open source 😨

Not a problem! We have exhaustive documentation on the contribution process, see https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#submitting-a-pull-request. Granted, most of these things don't apply if you're just making a documentation change. Essentially, working on the docs implies editing a text file. Very straightforward! Get yourself a copy of the cilium/cilium source code and run make render-docs to get a live preview in your browser.

Now, I'm not sure what the scope of the docs should be here. We're not going to document the whole kernel build process, but maybe some general pointers would be nice. Note that we already document a set of minimal kernel configs needed for Cilium to function correctly. Maybe linking to https://wsl.dev/wslcilium could be useful to avoid documenting the whole process for now, at least until wsl2 gains better defaults or makes it easier to plug in custom kernels.

@sadath-12

sadath-12 commented Jan 5, 2024

+1, adding this to the docs would really help people, including me, @HummingMind. Or maybe you could just add it here as a comment, and someone else will look at it and make the contribution.

@HummingMind
Author

Ok. I'll post the instructions here in a bit. Someone else can submit them as a PR.

@sadath-12

works thanks 😊

@HummingMind
Author

HummingMind commented Jan 5, 2024

Note: This was tested on the Ubuntu 22.04 (2204.3.49.0) WSL2 image from the Microsoft Store

Note: I compiled Kernel version linux-msft-wsl-5.15.137.3 (from https://github.com/microsoft/WSL2-Linux-Kernel.git). So BIG TCP support will not be available for IPv6 and IPv4.

Note: you need to have Git installed. I included it in the first step just in case, but it should already be there.

Note: I also had to install the "bc" and "dwarves" packages, otherwise the compilation was erroring out. So make sure to include them.

Note: you can use the sed command to automate the setting/updating process of the kernel configuration options. Can also use grep to check and confirm the settings. Otherwise, you can do this however you can/want.

The information and the instructions were gathered from the following sources:

https://learn.microsoft.com/en-us/community/content/wsl-user-msft-kernel-v6

https://wsl.dev/wslcilium/

https://docs.cilium.io/en/stable/operations/system_requirements/#linux-kernel (I actually followed the v1.15.0-rc.0 docs, but this link is more suitable for the future versions of Cilium)

Steps followed (inside your WSL2 Ubuntu distro):

sudo apt update && sudo apt install build-essential bc dwarves flex git bison libssl-dev libelf-dev

git clone https://github.com/microsoft/WSL2-Linux-Kernel.git

cd WSL2-Linux-Kernel

nano ./Microsoft/config-wsl

Make the changes to the kernel configuration options here, save the file, and exit nano (see the exact kernel options at the end of this post). Once you run the make command below, you might be prompted with a couple of additional configuration questions for other kernel options. If you are, just hit Enter to accept the defaults, or select whatever you need if you know what you are doing.

make -j$(nproc) KCONFIG_CONFIG=Microsoft/config-wsl

sudo make modules_install headers_install

cp arch/x86/boot/bzImage /mnt/c/some_folder_on_your_windows_disk

cd .. && sudo rm -r WSL2-Linux-Kernel/

In Windows, create the WSL configuration file at %USERPROFILE%\.wslconfig, add the following entry, and save the file:

[wsl2]
kernel=C:\\some_folder_on_your_windows_disk\\bzImage

Then do:

wsl --shutdown

You should be good to go after this. Start WSL2 again and install kind/k3d + Cilium. You can check the kernel inside WSL2 with:

uname -r

The kernel configuration options for the config-wsl file:

From the Cilium documentation, this is what is required (depending on what functionality you need):

CONFIG_BPF_JIT=y
CONFIG_BPF_SYSCALL=y
CONFIG_BPF=y
CONFIG_CGROUP_BPF=y
CONFIG_CGROUPS=y
CONFIG_CRYPTO_AEAD=m
CONFIG_CRYPTO_AEAD2=m
CONFIG_CRYPTO_AES=m
CONFIG_CRYPTO_CBC=m
CONFIG_CRYPTO_GCM=m
CONFIG_CRYPTO_HMAC=m
CONFIG_CRYPTO_SEQIV=m
CONFIG_CRYPTO_SHA1=y
CONFIG_CRYPTO_SHA256=m
CONFIG_CRYPTO_USER_API_HASH=y
CONFIG_INET_XFRM_MODE_TUNNEL=m
CONFIG_INET_ESP=m
CONFIG_INET_IPCOMP=m
CONFIG_INET_TUNNEL=m
CONFIG_INET_XFRM_TUNNEL=m
CONFIG_INET6_ESP=m
CONFIG_INET6_IPCOMP=m
CONFIG_INET6_TUNNEL=m
CONFIG_INET6_XFRM_TUNNEL=m
CONFIG_IP_SET_HASH_IP=m
CONFIG_IP_SET=m
CONFIG_NET_CLS_ACT=y
CONFIG_NET_CLS_BPF=y
CONFIG_NET_SCH_FQ=m
CONFIG_NET_SCH_INGRESS=y
CONFIG_NETFILTER_XT_MATCH_MARK=m
CONFIG_NETFILTER_XT_MATCH_SOCKET=m
CONFIG_NETFILTER_XT_SET=m
CONFIG_NETFILTER_XT_TARGET_CT=m
CONFIG_NETFILTER_XT_TARGET_TPROXY=m
CONFIG_PERF_EVENTS=y
CONFIG_SCHEDSTATS=y
CONFIG_XFRM_ALGO=m
CONFIG_XFRM_OFFLOAD=y
CONFIG_XFRM_STATISTICS=y
CONFIG_XFRM_USER=m
CONFIG_XFRM=y

PS: You should also follow the "Load the modules" instructions from https://wsl.dev/wslcilium/#load-the-modules
Those modules should be running so Cilium can "detect" them during installation. That way it can use the newer features that depend on these modules, instead of falling back to older functionality. The quick steps are:

awk '(NR>1) { print $2 }' /usr/lib/modules/$(uname -r)/modules.alias | sudo tee /etc/modules-load.d/cilium.conf

sudo nano /lib/systemd/system/systemd-modules-load.service

comment out the following lines, so they look like this:

#ConditionVirtualization=!container
#ConditionDirectoryNotEmpty=|/lib/modules-load.d
#ConditionDirectoryNotEmpty=|/usr/lib/modules-load.d
#ConditionDirectoryNotEmpty=|/usr/local/lib/modules-load.d
#ConditionDirectoryNotEmpty=|/etc/modules-load.d
#ConditionDirectoryNotEmpty=|/run/modules-load.d
#ConditionKernelCommandLine=|modules-load
#ConditionKernelCommandLine=|rd.modules-load

save the file, then:

sudo systemctl daemon-reload

sudo systemctl restart systemd-modules-load

sudo lsmod

The very last command should show the running modules, something similar to:

image
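
The manual edit of systemd-modules-load.service can also be expressed as a sed filter. This is a minimal sketch, assuming the unit file contains the Condition lines listed above; `comment_out_conditions` is a hypothetical helper name. It leaves every other line untouched and skips lines that are already commented out.

```shell
# Read a systemd unit file on stdin and write it to stdout with the
# ConditionVirtualization=, ConditionDirectoryNotEmpty= and
# ConditionKernelCommandLine= lines commented out.
comment_out_conditions() {
  sed -E 's/^(Condition(Virtualization|DirectoryNotEmpty|KernelCommandLine)=)/#\1/'
}
```

Running `comment_out_conditions < /lib/systemd/system/systemd-modules-load.service` previews the result without changing anything; the same sed expression with `sudo sed -i -E` would apply it in place.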

@sadath-12

thanks going to follow and try this

@HummingMind
Author

HummingMind commented Jan 5, 2024

This is more informational than anything else, but you can see here the desired configuration vs the default values:

image

This is for the options listed in https://docs.cilium.io/en/stable/operations/system_requirements/#linux-kernel

There are 42 options (as of Cilium 1.15.0) that need to be enabled and configured (with the desired values above).

Some were not set and commented out, some were missing, some just had Y instead of M (see the table for all the differences).

I use "grep" to find these lines (and determine what is missing) and "sed" to change the values or add the missing lines.
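
The grep and sed approach described above could look roughly like the sketch below. It assumes GNU sed (as shipped with Ubuntu on WSL2) and a plain-text config file such as Microsoft/config-wsl; `set_kconfig` is a hypothetical helper name.

```shell
# Set OPTION to VALUE in a kernel config file.
# Handles three cases: the option already has a value, the option is
# present as "# OPTION is not set", or the option is missing entirely.
# Usage: set_kconfig FILE OPTION VALUE
set_kconfig() {
  file=$1
  opt=$2
  val=$3
  if grep -qE "^(# )?${opt}(=| is not set)" "$file"; then
    # Rewrite the existing line in place.
    sed -i -E "s/^(# )?${opt}(=.*| is not set)\$/${opt}=${val}/" "$file"
  else
    # Option not mentioned at all: append it.
    printf '%s=%s\n' "$opt" "$val" >> "$file"
  fi
}
```

For example, `set_kconfig Microsoft/config-wsl CONFIG_NET_SCH_FQ m` would turn either `# CONFIG_NET_SCH_FQ is not set` or `CONFIG_NET_SCH_FQ=y` into `CONFIG_NET_SCH_FQ=m`. The make step may still ask about dependent options afterwards.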

@HummingMind
Author

HummingMind commented Jan 5, 2024

@ti-mo

Looks like with version linux-msft-wsl-5.15.137.3 of the WSL2 kernel, Microsoft enabled the missing IPv6 features. Here is the line from the changelog:

Enable IPV6 multiple routing tables (IPV6_MULTIPLE_TABLES)

So Cilium's documentation is now applicable to the latest WSL2 kernel. The modules you have listed in the docs are all that is needed (I did not have to change any additional ones). The only catch was that these lines:

CONFIG_INET{,6}_ESP=m
CONFIG_INET{,6}_IPCOMP=m
CONFIG_INET{,6}_XFRM_TUNNEL=m
CONFIG_INET{,6}_TUNNEL=m

had to be entered as:

CONFIG_INET_ESP=m
CONFIG_INET_IPCOMP=m
CONFIG_INET_XFRM_TUNNEL=m
CONFIG_INET_TUNNEL=m
CONFIG_INET6_ESP=m
CONFIG_INET6_IPCOMP=m
CONFIG_INET6_XFRM_TUNNEL=m
CONFIG_INET6_TUNNEL=m

You can close this issue. The docs don't really need to be updated (unless people want to follow the steps as I outlined them in my posts above).
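
The `{,6}` notation in those lines is shell brace shorthand rather than kernel config syntax, which is why it has to be written out in config-wsl. A minimal, portable sketch that prints the expanded lines (`expand_inet_options` is a hypothetical helper name):

```shell
# Print the eight CONFIG_INET / CONFIG_INET6 lines that the
# CONFIG_INET{,6}_* shorthand stands for, one per line.
expand_inet_options() {
  for family in "" 6; do
    for suffix in ESP IPCOMP XFRM_TUNNEL TUNNEL; do
      printf 'CONFIG_INET%s_%s=m\n' "$family" "$suffix"
    done
  done
}
```

The output can be compared against the config file or appended to it after removing any conflicting entries.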

@sadath-12

@HummingMind I did all those things, but uname -r still shows the old kernel name, even though I changed the name in the configuration. To check, I also installed Cilium and hit the same issue again. Any reason why WSL might not be picking up the newly compiled kernel?

@HummingMind
Author

HummingMind commented Jan 6, 2024

@sadath-12
uname -r should show 5.15.137.3-microsoft-standard-WSL2+ as your kernel name (or whatever you changed it to).

If it doesn't, that means you are not pointing at the new kernel in your wsl config.

Make sure that in Windows you create the .wslconfig file in your %USERPROFILE% location (would be something like C:\Users\yourusername). This file should have the path to your bzImage that is stored in your Windows storage somewhere:

[wsl2]
kernel=C:\\some_folder_on_your_windows_disk\\bzImage

Then, once you do wsl --shutdown, you should be able to start your WSL2 instance again and it should return the new kernel version with uname -r
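
Since the doubled backslashes are easy to get wrong, the file content can be generated instead of typed. This is a minimal sketch; `render_wslconfig` is a hypothetical helper name, and the path passed to it is whatever Windows location holds your bzImage.

```shell
# Print a .wslconfig that points WSL2 at a custom kernel.
# The argument is a normal Windows path with single backslashes;
# each backslash is doubled in the output, as the [wsl2] kernel= entry expects.
render_wslconfig() {
  escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g')
  printf '[wsl2]\nkernel=%s\n' "$escaped"
}
```

From inside WSL2 the output could be redirected to the .wslconfig file under your Windows profile, for example under /mnt/c/Users/yourusername/ if the C: drive is mounted at /mnt/c (an assumption about your setup). Run wsl --shutdown afterwards so the new kernel is picked up.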

@sadath-12

sadath-12 commented Jan 7, 2024

@HummingMind Yes, the solution was to create the file with code .wslconfig instead of creating a .txt document. That worked, but now when I open WSL2 I get:

The operation timed out because a response was not received from the virtual machine or container. Error code: Wsl/Service/CreateInstance/CreateVm/HCS_E_CONNECTION_TIMEOUT Press any key to continue...

According to posts online, wsl --shutdown fixes this for many people, but it does not seem to work for me. I later looked at microsoft/WSL#10196; if I try a single slash or put the path in quotes, I revert back to the old kernel.

@sadath-12

sadath-12 commented Jan 7, 2024

This is more informational than anything else, but you can see here the desired configuration vs the default values:

image

This is for the options listed in https://docs.cilium.io/en/stable/operations/system_requirements/#linux-kernel

There are 42 options (as of Cilium 1.15.0) that need to be enabled and configured (with the desired values above).

Some were not set and commented out, some were missing, some just had Y instead of M (see the table for all the differences).

I use "grep" to find these lines (and determine what is missing) and "sed" to change the values or add the missing lines.

I did the configs from here, and then for compiling and moving the kernel I referred to https://wsl.dev/wslcilium/ and kept [wsl2] kernel=C:\\wslkernel\\kernel-cilium in the .wslconfig. It mostly feels like the path is not being picked up correctly.

@sadath-12

Referring to microsoft/WSL#8793 (comment): I do also have it enabled.

C:\Windows\System32>bcdedit | find /i "hypervisor"
hypervisorlaunchtype Auto

@sadath-12

Currently I think that, since I used different ways to compile it, the kernel might be corrupted. I'll try compiling from scratch again.

@networkop
Contributor

I've also found that building the loadable modules works around a few quirks when Cilium tries to run modprobe in FindOrLoadModules. Without it, FindOrLoadModules returns an error and Cilium tries to fall back to an alternative (e.g. from BPF-based masquerading to iptables).
Here are the instructions:

# set the target kernel version (must match what uname -r reports for the new kernel)
export KERNELRELEASE=5.15.90.3-microsoft-standard-WSL2+
# build the loadable modules
make KERNELRELEASE=$KERNELRELEASE -j16 KCONFIG_CONFIG=Microsoft/config-wsl modules
# copy modules to /lib/modules/
sudo make KERNELRELEASE=$KERNELRELEASE -j16 KCONFIG_CONFIG=Microsoft/config-wsl modules_install

@HummingMind
Author

Currently I think that, since I used different ways to compile it, the kernel might be corrupted. I'll try compiling from scratch again.

Yeah, I would try a clean clone of the WSL2 kernel repo.

Also, I modified the last section of my original post to include the instructions on loading the modules, per @networkop recommendation (although he does it a bit differently).

@sadath-12

Now it worked easily for me. Thank you @HummingMind for making this possible. I really appreciate it; every one of your guides has been worth following.

@HummingMind
Author

@sadath-12 glad to hear you got it working! 🍻

@sadath-12

@HummingMind were you able to run Tetragon by building images locally? I can't; it seems there is a much bigger obstacle there for WSL2 users.

@ti-mo ti-mo removed their assignment Jan 11, 2024
@HummingMind
Author

@HummingMind were you able to run Tetragon by building images locally? I can't; it seems there is a much bigger obstacle there for WSL2 users.

I have not tried it.
