-
Notifications
You must be signed in to change notification settings - Fork 2.9k
fix wsl install workflow on machine init command #26201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
l0rd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.\winmake lint is failing. It's not related to this PR but you need to rebase to current main branch to make it work.
Luap99
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot comment on any windows/wsl specifics but the locking doe snot seem right
pkg/machine/shim/host.go
Outdated
| // avoid using defer as the command could be re-executed in elevated mode (on WSL) | ||
| // and it could get stuck because of the lock | ||
| machineLock.Unlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is missing an unlock call on the error condition above now.
And it is not clear at all that this is correct, you have just removed a major critical section and I See no reason why things like CreateVM() should not be locked, calls like mc.Write() 100% must be locked and most others.
Unfortunately the locking mode of podman machine is competently undocumented so it is hard to tell what should be locked and what can be done without holding the lock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for pointing to it. At first i thought it was superfluous as the machine does not exist and there was no risk other operations like start/stop/rm would try to access the same resources... but, effectively, it could happen a double init operation on the same machine or something else i didn't think at.
I tried to refactor it by not revolutionizing the code.
It works like before, except that when init tries to invoke another init in elevated mode, it unlocks sp that the child is free to perform a complete init workflow.
| err = mp.CreateVM(createOpts, mc, &ignBuilder) | ||
| if err != nil { | ||
| return err | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any reason of why this was moved. IF the order is important then it needs a big comment explaining why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment. Basically i want to avoid creating any resources before the CreateVM that could be responsible to invoke a new init that will perform the setup. So i switched the order with the AddSSHConnectionsToPodmanSocket
| } | ||
|
|
||
| return os.Truncate(name, 0) | ||
| _, err = os.Create(name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the function is called truncate but now it no longer truncates which is confusing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed 👍
|
@lstocchi, thank you for this PR. The approach is the right one. I hope you find a way to release the machine lock later in the process (i.e., before restarting in elevated mode), or we should consider another way to check for WSL before the machine lock is acquired. It's also important to find out the different error codes that can be returned when the features are missing (see this comment too), but this is probably less critical. Adding an end to end test for the WSL installation, that runs in our CI, is complicated. I don't think it's worth it. |
ef0bffc to
17bc0ee
Compare
|
[NON-BLOCKING] Packit jobs failed. @containers/packit-build please check. Everyone else, feel free to ignore. |
it's quite a mess with the current workflow. I tried not to change the code too much so if you have any suggestions, feel free to let me know.
Yes i agree, if we find an agreement on this PR, we could always enhance the list of errors related to WSL features later on when users face them. BTW I'll give a look at the WSL source code to see if i can find out some other known errors |
|
I tried again to run |
l0rd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added some new comments. Other suggestions:
- avoiding pulling the WSL machine OS image at every invocation of
machine init(3 times) but only the first time - better rendering of the progress bars when the Windows features are installed (I am not sure if we can do much)
- provide a message to the user mentioning that podman won't be available until Windows is rebooted (instead of showing an error). This is particularly important if the user accepts to install WSL but refuse to boot the machine imediately.
None of these are blocking from my point view for this PR (automatically WSL installation is fixed) but I would apreciate if you could work at least on avoiding returning an error before the reboot.
I would open two separate issues for the first 2 points. |
this patch changes how the detection of wsl works. The old way of using wsl --status command output to detect some missing features required by WSL is not fully reliable. WSL checks if the wsl feature is enabled and if the vmcompute service do exist. However, this is not enough to identify if the virtual machine platform feature is enabled. The vmcompute service could exist because it has been installed by other tools or it could exist but being stopped. The way proposed by this patch is to try execute the import command and, if it fails, check the error and if it is related to the Host Compute Service try to install all features required by WSL. The flow is the same as before, the user is asked to execute the podman machine init command with elevated privileges. Eventually, after enabling WSL and VMP features, the user is asked to reboot the machine. When the machine restarts, the powershell gets invoked again and execute the command init. The code also fixes some issues that could cause misbehaviors when invoking recursively the elevated shell, like an unreleased lock, or a missing file. Signed-off-by: lstocchi <lstocchi@redhat.com>
|
/hold cancel |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: l0rd, lstocchi The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
60859b0
into
containers:main
this patch changes how the detection of wsl works. The old way of using wsl --status command output to detect some missing features required by WSL is not fully reliable. WSL checks if the wsl feature is enabled and if the vmcompute service do exist. However, this is not enough to identify if the virtual machine platform feature is enabled. The vmcompute service could exist because it has been installed by other tools or it could exist but being stopped.
The way proposed by this patch is to try execute the import command and, if it fails, check the error and if it is related to the Host Compute Service try to install all features required by WSL.
The flow is the same as before, the user is asked to execute the podman machine init command with elevated privileges. Eventually, after enabling WSL and VMP features, the user is asked to reboot the machine.
When the machine restarts, the powershell gets invoked again and execute the command init.
The code also fixes some issues that could cause misbehaviors when invoking recursively the elevated shell, like an unreleased lock, or a missing file.
it fixes #25523