@lstocchi
Contributor

@lstocchi lstocchi commented May 26, 2025

This patch changes how WSL detection works. The old approach, using the output of the `wsl --status` command to detect missing features required by WSL, is not fully reliable. WSL checks whether the WSL feature is enabled and whether the vmcompute service exists. However, this is not enough to tell whether the Virtual Machine Platform feature is enabled: the vmcompute service could exist because it was installed by other tools, or it could exist but be stopped.

This patch instead tries to execute the import command and, if it fails, checks the error. If the error is related to the Host Compute Service, it tries to install all the features required by WSL.
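The try-then-classify flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in the patch: the marker strings, `isHostComputeServiceError`, `importOrInstall`, and the two callbacks are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// hcsErrorMarkers lists substrings that, in this sketch, are taken to mean
// the Host Compute Service is unavailable. The values are illustrative
// assumptions, not the list used by the patch.
var hcsErrorMarkers = []string{
	"HCS_E_SERVICE_NOT_AVAILABLE",
	"HCS_E_HYPERV_NOT_INSTALLED",
}

// isHostComputeServiceError reports whether the output of a failed import
// mentions one of the assumed markers (case-insensitive).
func isHostComputeServiceError(output string) bool {
	upper := strings.ToUpper(output)
	for _, marker := range hcsErrorMarkers {
		if strings.Contains(upper, marker) {
			return true
		}
	}
	return false
}

// importOrInstall tries the import first and falls back to installing the
// Windows features only when the failure looks HCS-related.
// runImport and installFeatures are hypothetical callbacks.
func importOrInstall(runImport func() (string, error), installFeatures func() error) error {
	output, err := runImport()
	if err == nil {
		return nil
	}
	if !isHostComputeServiceError(output) {
		return fmt.Errorf("import failed: %w", err)
	}
	return installFeatures()
}

func main() {
	installed := false
	err := importOrInstall(
		func() (string, error) {
			// Simulated failure output; the exact text is an assumption.
			return "Error code: Wsl/Service/CreateVm/HCS_E_SERVICE_NOT_AVAILABLE", errors.New("exit status 1")
		},
		func() error { installed = true; return nil },
	)
	fmt.Println("features installed:", installed, "error:", err)
}
```

Keeping the classification in one small function means the list of known errors can grow later without touching the control flow.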

The flow is the same as before: the user is asked to execute the `podman machine init` command with elevated privileges. Eventually, after the WSL and VMP features are enabled, the user is asked to reboot the machine.

When the machine restarts, PowerShell is invoked again and executes the init command.

The code also fixes some issues that could cause misbehavior when the elevated shell is invoked recursively, such as an unreleased lock or a missing file.

Fixes #25523

Fix the check and the automatic installation of WSL2 on Windows when the command `podman machine init` is executed.

@openshift-ci openshift-ci bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/release-note-label-needed Enforce release-note requirement, even if just None labels May 26, 2025
Member

@l0rd l0rd left a comment

`.\winmake lint` is failing. It's not related to this PR, but you need to rebase onto the current main branch to make it pass.

Member

@Luap99 Luap99 left a comment

I cannot comment on any Windows/WSL specifics, but the locking does not seem right.

Comment on lines 110 to 112
// avoid using defer as the command could be re-executed in elevated mode (on WSL)
// and it could get stuck because of the lock
machineLock.Unlock()
Member

This is now missing an unlock call on the error condition above.
And it is not at all clear that this is correct: you have just removed a major critical section, and I see no reason why things like CreateVM() should not be locked. Calls like mc.Write() 100% must be locked, and so must most others.
Unfortunately, the locking model of podman machine is completely undocumented, so it is hard to tell what should be locked and what can be done without holding the lock.

Contributor Author

Thanks for pointing this out. At first I thought the lock was superfluous, as the machine does not exist yet and there was no risk that other operations like start/stop/rm would try to access the same resources. But in fact a double init operation on the same machine could happen, or something else I didn't think of.

I tried to refactor it without overhauling the code.
It works like before, except that when init tries to invoke another init in elevated mode, it unlocks first so that the child is free to perform a complete init workflow.
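One way to picture the refactored locking is the sketch below. It is only an illustration under stated assumptions: a `sync.Mutex` stands in for the machine lock, and `errNeedsElevation`, `initMachine`, and `simulateInit` are hypothetical names, not podman's. The lock is released exactly once, either by the deferred call on every return path or earlier, just before handing off to the elevated child.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errNeedsElevation is a hypothetical sentinel meaning "re-run init elevated".
var errNeedsElevation = errors.New("elevation required")

// initMachine holds the lock for the whole critical section and releases it
// exactly once, including on error paths.
func initMachine(lock *sync.Mutex, createVM func() error, reexecElevated func() error) error {
	lock.Lock()
	released := false
	release := func() {
		if !released {
			released = true
			lock.Unlock()
		}
	}
	defer release() // covers every return path, including errors

	if err := createVM(); err != nil {
		if errors.Is(err, errNeedsElevation) {
			release() // the child runs a complete init and needs the lock
			return reexecElevated()
		}
		return err
	}
	return nil
}

// simulateInit runs initMachine with a createVM stub that returns createErr.
// It reports whether the stand-in child could take the lock and whether the
// lock is free again once initMachine has returned.
func simulateInit(createErr error) (childGotLock, freeAfter bool, err error) {
	var lock sync.Mutex
	child := func() error {
		childGotLock = lock.TryLock()
		if childGotLock {
			lock.Unlock()
		}
		return nil
	}
	err = initMachine(&lock, func() error { return createErr }, child)
	freeAfter = lock.TryLock()
	if freeAfter {
		lock.Unlock()
	}
	return childGotLock, freeAfter, err
}

func main() {
	fmt.Println(simulateInit(errNeedsElevation))
	fmt.Println(simulateInit(errors.New("disk full")))
}
```

The guarded `release` closure keeps the `defer` safety net for error paths while still allowing the early unlock before the elevated re-execution.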

Comment on lines +270 to +278
err = mp.CreateVM(createOpts, mc, &ignBuilder)
if err != nil {
	return err
}
Member

I don't see any reason why this was moved. If the order is important, then it needs a big comment explaining why.

Contributor Author

Added a comment. Basically, I want to avoid creating any resources before CreateVM, since CreateVM may be the call that invokes a new init to perform the setup. So I swapped its order with AddSSHConnectionsToPodmanSocket.

}

-	return os.Truncate(name, 0)
+	_, err = os.Create(name)
Member

The function is called truncate, but now it no longer truncates, which is confusing.

Contributor Author

renamed 👍
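For context on the change discussed in this thread: in Go's standard library, `os.Truncate` returns an error when the file does not exist, while `os.Create` creates the file or truncates an existing one. The sketch below shows the difference; `resetFile` and `compareOnMissingFile` are hypothetical names chosen for the example, not the names used in the patch.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resetFile leaves an empty file at name whether or not it existed before.
// os.Create opens with O_CREATE|O_TRUNC, so it covers both cases.
func resetFile(name string) error {
	f, err := os.Create(name)
	if err != nil {
		return err
	}
	return f.Close() // close the handle instead of discarding it
}

// compareOnMissingFile tries both calls on a path that does not exist yet.
func compareOnMissingFile() (truncateFailed bool, size int64, err error) {
	dir, err := os.MkdirTemp("", "reset-demo")
	if err != nil {
		return false, 0, err
	}
	defer os.RemoveAll(dir)

	name := filepath.Join(dir, "missing.log")
	truncateFailed = os.Truncate(name, 0) != nil
	if err := resetFile(name); err != nil {
		return truncateFailed, 0, err
	}
	info, err := os.Stat(name)
	if err != nil {
		return truncateFailed, 0, err
	}
	return truncateFailed, info.Size(), nil
}

func main() {
	failed, size, err := compareOnMissingFile()
	fmt.Println("truncate failed on missing file:", failed)
	fmt.Println("size after resetFile:", size, "error:", err)
}
```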

@l0rd
Member

l0rd commented May 26, 2025

@lstocchi, thank you for this PR. The approach is the right one. I hope you find a way to release the machine lock later in the process (i.e., before restarting in elevated mode), or we should consider another way to check for WSL before the machine lock is acquired. It's also important to find out the different error codes that can be returned when the features are missing (see this comment too), but this is probably less critical.

Adding an end-to-end test for the WSL installation that runs in our CI is complicated. I don't think it's worth it.

@lstocchi lstocchi force-pushed the wsl branch 3 times, most recently from ef0bffc to 17bc0ee Compare May 27, 2025 10:27
@packit-as-a-service

[NON-BLOCKING] Packit jobs failed. @containers/packit-build please check. Everyone else, feel free to ignore.

@lstocchi
Contributor Author

> @lstocchi, thank you for this PR. The approach is the right one. I hope you find a way to release the machine lock later in the process (i.e., before restarting in elevated mode), or we should consider another way to check for WSL before the machine lock is acquired.

It's quite a mess with the current workflow. I tried not to change the code too much, so if you have any suggestions, feel free to let me know.

> It's also important to find out the different error codes that can be returned when the features are missing (see this comment too), but this is probably less critical.

Yes, I agree. If we reach an agreement on this PR, we can always extend the list of errors related to WSL features later on, as users run into them. BTW, I'll take a look at the WSL source code to see if I can find some other known errors.

@l0rd l0rd added the No New Tests Allow PR to proceed without adding regression tests label May 27, 2025
@l0rd
Member

l0rd commented May 27, 2025

I tried again to run machine init on my laptop with the Virtual Machine Platform disabled, and it behaved as expected: WSL got installed, etc. I will continue the review tomorrow.

@lstocchi lstocchi marked this pull request as ready for review May 28, 2025 15:38
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 28, 2025
Member

@l0rd l0rd left a comment

I have added some new comments. Other suggestions:

  1. avoid pulling the WSL machine OS image at every invocation of machine init (3 times); pull it only the first time
  2. render the progress bars better when the Windows features are installed (I am not sure if we can do much)
  3. show the user a message saying that podman won't be available until Windows is rebooted (instead of showing an error). This is particularly important if the user agrees to install WSL but refuses to reboot the machine immediately.

None of these are blocking for this PR from my point of view (the automatic WSL installation is fixed), but I would appreciate it if you could at least work on avoiding returning an error before the reboot.
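The third suggestion could be approached with a sentinel error that the command layer turns into an informational message rather than a failure. The sketch below is purely illustrative: `errRebootRequired`, `report`, and the message texts are assumptions, not part of this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// errRebootRequired is a hypothetical sentinel returned once the Windows
// features are enabled but a reboot is still pending.
var errRebootRequired = errors.New("reboot required")

// report maps the outcome of init to a user-facing message and exit code.
// A pending reboot is treated as information, not as a failure.
func report(err error) (message string, exitCode int) {
	switch {
	case err == nil:
		return "machine init complete", 0
	case errors.Is(err, errRebootRequired):
		return "WSL features installed; podman will be available after Windows is rebooted", 0
	default:
		return "Error: " + err.Error(), 1
	}
}

func main() {
	// errors.Is also matches the sentinel when it has been wrapped.
	wrapped := fmt.Errorf("enabling Windows features: %w", errRebootRequired)
	msg, code := report(wrapped)
	fmt.Println(code, msg)
}
```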

@lstocchi
Contributor Author

lstocchi commented Jun 3, 2025

> I have added some new comments. Other suggestions:
>
>   1. avoid pulling the WSL machine OS image at every invocation of machine init (3 times); pull it only the first time
>   2. render the progress bars better when the Windows features are installed (I am not sure if we can do much)
>   3. show the user a message saying that podman won't be available until Windows is rebooted (instead of showing an error). This is particularly important if the user agrees to install WSL but refuses to reboot the machine immediately.
>
> None of these are blocking for this PR from my point of view (the automatic WSL installation is fixed), but I would appreciate it if you could at least work on avoiding returning an error before the reboot.

I would open two separate issues for the first two points.
The first one in particular is quite intrusive. I guess it would change the workflow for other OSes as well.

@lstocchi lstocchi requested a review from l0rd June 3, 2025 10:33
This patch changes how WSL detection works.
The old approach, using the output of the wsl --status command to detect
missing features required by WSL, is not fully reliable.
WSL checks whether the WSL feature is enabled and whether the vmcompute
service exists. However, this is not enough to tell whether the Virtual
Machine Platform feature is enabled: the vmcompute service could exist
because it was installed by other tools, or it could exist but be stopped.

This patch instead tries to execute the import command and, if it fails,
checks the error. If the error is related to the Host Compute Service,
it tries to install all the features required by WSL.

The flow is the same as before: the user is asked to execute the podman
machine init command with elevated privileges. Eventually, after the
WSL and VMP features are enabled, the user is asked to reboot the machine.

When the machine restarts, PowerShell is invoked again and executes
the init command.

The code also fixes some issues that could cause misbehavior when the
elevated shell is invoked recursively, such as an unreleased lock or a
missing file.

Signed-off-by: lstocchi <lstocchi@redhat.com>
@l0rd
Member

l0rd commented Jun 5, 2025

/lgtm
/hold

@Luap99 @baude PTAL

@lstocchi you should add a release notes section in the description. Something like:

```release-note
Fix the check and the automatic installation of WSL2 on Windows when the command `podman machine init` is executed.
```

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 5, 2025
@openshift-ci openshift-ci bot added lgtm Indicates that a PR is ready to be merged. release-note and removed do-not-merge/release-note-label-needed Enforce release-note requirement, even if just None labels Jun 5, 2025
@l0rd
Member

l0rd commented Jun 13, 2025

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 13, 2025
@openshift-ci
Contributor

openshift-ci bot commented Jun 13, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: l0rd, lstocchi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 13, 2025
@openshift-merge-bot openshift-merge-bot bot merged commit 60859b0 into containers:main Jun 13, 2025
82 of 83 checks passed
@stale-locking-app stale-locking-app bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 12, 2025
@stale-locking-app stale-locking-app bot locked as resolved and limited conversation to collaborators Sep 12, 2025

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. machine No New Tests Allow PR to proceed without adding regression tests release-note

Development

Successfully merging this pull request may close these issues.

Automatic installation of WSL fails

3 participants