Skip to content

Conversation

@l0rd
Copy link
Member

@l0rd l0rd commented Mar 10, 2025

The output of wsl --status has changed with newer versions of Windows and WSL. It's output can still be parsed to figure out if the WSL cli is installed and if the WSL feature is enabled but the strings to look for have changed.

The checks have been fixed, some unit tests added and the unit tests are now part of the (cirrus) CI checks on Windows too.

Fixes #25234

Does this PR introduce a user-facing change?

None

@openshift-ci openshift-ci bot added release-note-none approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Mar 10, 2025
@mheon
Copy link
Member

mheon commented Mar 10, 2025

Not going to lie, I kind of hate string-parsing error messages, but I'm assuming there isn't another way?

@baude
Copy link
Member

baude commented Mar 10, 2025

@l0rd is it possible to begin to migrate support to only wsl2 ?

@l0rd
Copy link
Member Author

l0rd commented Mar 11, 2025

Not going to lie, I kind of hate string-parsing error messages, but I'm assuming there isn't another way?

I hate that too. We introduced that a couple of years ago, and in the meantime, the output message of wsl.exe --status has (of course) changed, and our fragile output parsing got broken. The alternative is using a command that requires admin privileges (dism). Still, we don't want to require that for machine init, so my approach has been to fix the current issue by testing wsl --status with as many versions of the WSL cli as possible and adding the corresponding unit tests.

In the future, we may want to investigate how wsl.exe itself retrieves information about Windows features without requiring admin privileges. But even if we find the mechanism used by wsl.exe there may be no guarantee that it will work with future versions of Windows because it may not be a "public" API (dism is the tool to check Windows features).

@l0rd
Copy link
Member Author

l0rd commented Mar 11, 2025

@l0rd is it possible to begin to migrate support to only wsl2 ?

That's already the case: if the WSL CLI is configured to use version 1 (that means that it uses the underlying WSL Windows feature rather than the Virtual Machine Platform) we try to change it to use version 2. And, if it that fails, machine init fails. This behavior has probably been introduced with the release of v5.0.

@l0rd
Copy link
Member Author

l0rd commented Mar 11, 2025

cc @BlackHole1

winmake.ps1 Outdated
function Local-Unit {
Build-Ginkgo
$skippackages="pkg\machine\apple,pkg\machine\applehv,pkg\machine\e2e,pkg\machine\libkrun,pkg\machine\provider,pkg\machine\proxyenv,pkg\machine\qemu"
Run-Command "./test/tools/build/ginkgo.exe -vv -r --tags `"$remotetags`" -timeout=90m --trace --no-color --skip-package `"$skippackages`" pkg\machine"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we run all unit tests on windows not just pkg/machine? unit tests could be added anywhere.
point in case #25323

If these issue is that these do not pass then we need to do the hard work and have !windows build tags for them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the review. I have changed that to run the tests for every package.

Packages with compilation errors are skipped and failing tests (3 files) have now either !windows tag or if runtime.GOOS == "windows" t.Skip().

Comment on lines 36 to 38
Write-Host "`nRunning podman-machine unit tests"
Run-Command ".\winmake localunit"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not done like this.

Merging both tests will make it hard to figure out what failed. But most importantly this breaks all our CI assumptions that we have in our skip tests based on sources. machine tests will only run on machine changes, unit tests might be elsewhere, I do see that this is not the case right now but the next person will overlook this when they enable windows unit tests outside of pkg/machine.

This should really be its own top level task I would think like the existing unit test. That also means we do run them twice for WSL and hyperV, I guess right now they are fast enough so that this doesn't matter much but still.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. I have added a new top level cirrus task unit podman windows rootless host sqlite.

Comment on lines +25 to +27
wslNotInstalledMessages = []string{"kernel file is not found", "The Windows Subsystem for Linux is not installed"}
vmpDisabledMessages = []string{"enable the Virtual Machine Platform Windows feature", "Enable \"Virtual Machine Platform\""}
wslDisabledMessages = []string{"enable the \"Windows Subsystem for Linux\" optional component"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not apply to non-English speaking countries. We found that when the system language is Chinese, the output of WSL is also in Chinese.

We can only judge through specific strings, so the following sentences cannot work properly in non-English language systems:

  • kernel file is not found
  • The Windows Subsystem for Linux is not installed
  • enable the Virtual Machine Platform Windows feature
  • Enable "Virtual Machine Platform"
  • enable the "Windows Subsystem for Linux" optional component

In certain WSL versions in the Chinese system, Windows Subsystem for Linux will be Linux 的 Windows 子系统, Virtual Machine Platform will be 虚拟机平台

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your feedback @BlackHole1, you are right. I am afraid that Chinese is only the tip of the iceberg, there are 43 language packs for latest Windows. What's your suggestion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In our code, we make judgments using the following keywords:

keywords := []string{"Windows Subsystem for Linux", "BIOS", "wsl.exe", "enablevirtualization", "WSL1"}

In our scenario, we require both Windows Subsystem for Linux and Virtual Machine Platform to be enabled at the same time, as this helps us avoid dealing with some edge cases.

For more information, see: https://github.com/oomol-lab/ovm-win/blob/0bf791ef3ff41a2800c47b822b07957f1cc6ffc0/pkg/wsl/check.go#L244-L294


Init System

  1. No enable Features
  2. No Update WSL
PS C:\Users\live> wsl --help

版权所有 (c) Microsoft Corporation。保留所有权利。

用法: wsl.exe [参数]

参数:

   --install
       安装适用于 Linux 的 Windows 子系统。如果未指定任何选项,
       建议的功能将与默认分发版一起安装。

       有关安装选项的完整列表,请访问 https://aka.ms/wslinstall。

   --update
       更新到最新版本的适用于 Linux 的 Windows 子系统。

   --status
       显示适用于 Linux 的 Windows 子系统状态。

   --help
       显示使用情况信息。

PS C:\Users\live> wsl --status
未安装适用于 Linux 的 Windows 子系统。可通过运行 “wsl.exe --install” 进行安装。
有关详细信息,请访问 https://aka.ms/wslinstall

NOTE!!!!
wsl --status exit code is not 0!

Only Updated WSL

wsl --help has too much output, so ignore it.

PS C:\Users\live> wsl --status
默认版本: 2
当前计算机配置不支持 WSL1。
若要使用 WSL1,请启用“Windows Subsystem for Linux”可选组件。
当前计算机配置不支持 WSL2。
请启用“虚拟机平台”可选组件,并确保在 BIOS 中启用虚拟化。
通过运行以下命令启用“虚拟机平台”: wsl.exe --install --no-distribution
有关信息,请访问 https://aka.ms/enablevirtualization

NOTE!!!!
wsl --status exit code is 0!

Updated WSL + Only Enable Windows Subsystem for Linux

PS C:\Users\live> wsl --status
默认版本: 2
当前计算机配置不支持 WSL2。
请启用“虚拟机平台”可选组件,并确保在 BIOS 中启用虚拟化。
通过运行以下命令启用“虚拟机平台”: wsl.exe --install --no-distribution
有关信息,请访问 https://aka.ms/enablevirtualization

Updated WSL + Only Enable Virtual Machine Platform

PS C:\Users\live> wsl --status
默认版本: 2
当前计算机配置不支持 WSL1。
若要使用 WSL1,请启用“Windows Subsystem for Linux”可选组件。

Updated WSL + Enable All

PS C:\Users\live> wsl --status
默认版本: 2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to differences between different WSL versions, my suggestion is to first check the user's WSL version. If it is below a certain version, upgrade it to the latest

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BlackHole1, thank you for your suggestions. However, I am hesitant to make the check too accurate as it augments the risk of false positives (mistakenly assuming that WSL is not installed). A false positive makes podman unusable, as podman init fails, and users can't turn off the WSL check. Given the fragility of the output parsing, we should stay on the safe side and accept a high rate of false negatives (mistakenly assuming that WSL is installed) to avoid any false positives.

Also, we should avoid preemptively updating WSL, even when it's unnecessary. Some company policies may block updating to the latest WSL version, and Podman trying to update at every machine init would be problematic. We do try to update, but only if we find WSL v1.

That said, the problem is far from solved. As you found out, when Windows is configured in a language other than English, we assume that WSL is installed. And, as you suggested, we may need to check the BIOS virtualization support. But considered the risks, a safer approach would be to accept that we may fail to find out that WSL is outdated or not present and, instead, improve the error messages shown when the VM installation fails with a clear explanation of the error and a suggestion on how to move forward.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a safer approach would be to accept that we may fail to find out that WSL is outdated or not present and, instead, improve the error messages shown when the VM installation fails with a clear explanation of the error and a suggestion on how to move forward.

I agree with this point. Letting users install and update WSL themselves might be a better approach.

@BlackHole1
Copy link
Contributor

BlackHole1 commented Mar 12, 2025

Not going to lie, I kind of hate string-parsing error messages, but I'm assuming there isn't another way?

@mheon In fact, there is no way without applying for administrator privileges. Docker for Windows seems to register a service in Windows to avoid the need for administrator privileges, but I'm not sure.

Under non-administrator privileges, there are only two methods:

  1. String parsing
  2. Calling Windows’s private API

is it possible to begin to migrate support to only wsl2 ?

@baude My suggestion is to not support WSL1, but to only support WSL2.

@BlackHole1
Copy link
Contributor

There is still a problem:

  • When the BIOS does not have virtualization support enabled, podman machine init will still fail without notifying the user.

Maybe this can be achieved in another PR.

@packit-as-a-service
Copy link

[NON-BLOCKING] Packit jobs failed. @containers/packit-build please check. Everyone else, feel free to ignore.

Fixes containers#25234

Signed-off-by: Mario Loriedo <mario.loriedo@gmail.com>
@l0rd l0rd force-pushed the fix-wsl-check branch 4 times, most recently from 05d259d to 874f52f Compare March 13, 2025 11:35
.cirrus.yml Outdated


unit_test_windows_task:
name: &std_name_fmt "$TEST_FLAVOR $PODBIN_NAME $DISTRO_NV $PRIV_NAME $TEST_ENVIRON ${CI_DESIRED_DATABASE}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name is rather confusing as most of the options have nothing to do with unit tests it should follow the existing unit test name.
I suggest you name this "Unit tests on Windows"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed 👍

winmake.ps1 Outdated
$skippackages="hack,internal\domain\infra\abi,internal\domain\infra\tunnel,libpod\lock\shm,pkg\api\handlers\libpod,pkg\api\handlers\utils,pkg\bindings,"
$skippackages+="pkg\domain\infra\abi,pkg\emulation,pkg\machine\apple,pkg\machine\applehv,pkg\machine\e2e,pkg\machine\libkrun,"
$skippackages+="pkg\machine\provider,pkg\machine\proxyenv,pkg\machine\qemu,pkg\specgen\generate,pkg\systemd,test\e2e,test\utils"
Run-Command "./test/tools/build/ginkgo.exe -vv -r --tags `"$remotetags`" -timeout=90m --trace --no-color --skip-package `"$skippackages`""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this need to have two dashes for timeout?
Also 90m is to high for unit tests, the default task timeout is 20m so it should be lower than that. Or you just skip setting a timeout.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I have fixed it and set the timeout to 15 minutes. Previous run took around 7 min.

Add a new target in winmake.ps1 to run unit tests and use
use it in a new cirrus task.

Fix machine_windows_test.go to make it work in CI machine.

Add the `!windows` tag on tests files that fail on Windows.

Signed-off-by: Mario Loriedo <mario.loriedo@gmail.com>
@l0rd
Copy link
Member Author

l0rd commented Mar 20, 2025

@mheon @baude @Luap99 tests are green, and we should decide what to do with this PR. There are two commits:

  1. Enable running unit tests on Windows: it's a step forward and should be merged.
  2. The second commit detects if WSL is installed or not. It's a fix of an existing check that used to work in v4.x. But it's fragile as it works only when the OS language is English, and its usefulness is affected by this issue (we detect that WSL isn't installed, but then we fail to install it).

I can update this PR and remove the second commit. But if we want to re-enable the automatic installation of WSL, we should consider merging both commits and plan to work on #25523. We removed the automatic installation of WSL in the Windows installer. Still, it may make sense to fix it in machine init as that's easier for the team to maintain (and consider doing the same for Hyper-V).

@mheon
Copy link
Member

mheon commented Mar 20, 2025

I'm in favor of a merge as-is. The solution is imperfect, but automatic installing of WSL if missing is definitely something I want to keep in the installer - so steps to restore it should be taken (even if they are incomplete)

Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 20, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: l0rd, Luap99

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@baude
Copy link
Member

baude commented Mar 24, 2025

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 24, 2025
@openshift-merge-bot openshift-merge-bot bot merged commit 7f11ce8 into containers:main Mar 24, 2025
87 of 88 checks passed
Comment on lines -222 to +230
Push-Location ./test/tools
Run-Command "go build -o build/ginkgo.exe ./vendor/github.com/onsi/ginkgo/v2/ginkgo"
Pop-Location
Run-Command "go build -o ./test/tools/build/ginkgo.exe ./vendor/github.com/onsi/ginkgo/v2/ginkgo"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@l0rd with this change, ginkgo is being built from the main repo vendor, while it was using test/tools repo vendor before.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I did that to avoid a "versions don't match" warning at every execution. But that may have been a mistake because I don't know why have ginkgo declared twice (and with two non matching versions). I am happy to fix it if there is a better way to solve the problem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation. This is being fixed in #25732.

@stale-locking-app stale-locking-app bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Jun 30, 2025
@stale-locking-app stale-locking-app bot locked as resolved and limited conversation to collaborators Jun 30, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. machine release-note-none

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Command machine init fails to check if WSL is installed

6 participants