Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Windows] Optimize the containerized OVS installation #6383

Merged
merged 1 commit into from
Jun 28, 2024

Conversation

wenyingd
Copy link
Contributor

@wenyingd wenyingd commented May 30, 2024

  1. Add logic to check the installed OVSext drivers, if only the desired version
    of driver is already installed, skip the installation; otherwise, remove the
    existing the drivers and re-install.
  2. Add logic to check the installed VC redistributable files, if the existing
    installd vc_redist version is equal to or higher than the lowest requirement,
    skip the installation; otherwise re-install the files provided in the image.
  3. Optimize the logic of maintaining the system paths by removing the duplicated
    path.
  4. Optimize Uninstall-OVS script by removing OVS bin paths from the system path
    after it is deleted.

@wenyingd wenyingd requested a review from XinShuYang May 30, 2024 10:44
@wenyingd
Copy link
Contributor Author

@XinShuYang @antoninbas I have introduced a ps module "OVSHelper.psm1" in this change, which is used to provide general functions independent from the exact versions of OVS. So I place it in the container image directly, rather than put the functions in Install-OVSDriver.ps1 packaged in the configmap. I didn't add the functions to the existing Helper.psm1, because I want to only provide the necessary function in the PS module file in container images.

I am thinking how to co-work with the existing script Install-OVS.ps1, which is trying to install OVS driver and/or userspaces on the Windows Node and register as native service. In theory, I would prefer to use the new implementations in Install-OVS.ps1, and I also hope to keep that script is independent as it used to work. I don't have a good idea to directly copy the functions in the script or import the new module. Copy functions may introduce troubles in the future code maintenance; while import-module may introduce file dependencies for the script.

}
} else {
Write-Host "OVS driver is not installed via the host-process containers, please install it manually."
Copy link
Contributor

@XinShuYang XinShuYang May 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When Test-Path $OVSHelperScript fails, we can't determine if the driver has been installed on the host. I think please install it manually might be misleading.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this what you want?

 if(-not (Test-Path $OVSHelperScript)) {
      $driverStatus = netcftg -q ovsext
      if ($driverStatus -like "*not installed*") {
         Write-Host "OVS driver is not installed on the Windows Node, please install it manually.
      }
}

If there is no $OVSHelperScript provided in the image, we would ask the user to manually install the driver or update it but not operate from container.

@wenyingd wenyingd changed the title [Windows] Optimize the containerized OVS installation steps [Windows] Optimize the containerized OVS installation May 31, 2024
@antoninbas
Copy link
Contributor

Let's merge #6360 first, given that the mentorship program is coming to an end and I don't want to have @shikharish to resolve conflicts.

@luolanzone luolanzone added the area/OS/windows Issues or PRs related to the Windows operating system. label Jun 3, 2024
docs/windows.md Outdated Show resolved Hide resolved
hack/windows/Install-OVS.ps1 Outdated Show resolved Hide resolved
hack/windows/Install-OVS.ps1 Outdated Show resolved Hide resolved
hack/windows/Install-OVS.ps1 Outdated Show resolved Hide resolved
hack/windows/Install-OVS.ps1 Outdated Show resolved Hide resolved
hack/windows/Install-OVS.ps1 Outdated Show resolved Hide resolved
hack/windows/Install-OVS.ps1 Outdated Show resolved Hide resolved
hack/windows/Install-OVS.ps1 Outdated Show resolved Hide resolved
@luolanzone
Copy link
Contributor

Please add proper description on PR summary and commit.

@wenyingd wenyingd force-pushed the issue_6358 branch 5 times, most recently from 417e126 to 93a0b0e Compare June 6, 2024 01:35
@wenyingd wenyingd force-pushed the issue_6358 branch 3 times, most recently from ea32285 to 8e090cf Compare June 6, 2024 16:12
build/yamls/antrea-windows-with-ovs.yml Outdated Show resolved Hide resolved
hack/windows/Install-OVS.ps1 Show resolved Hide resolved
docs/windows.md Outdated Show resolved Hide resolved
}
$OVSInstallScript = "$mountPath\k\antrea\Install-OVS.ps1"
if (Test-Path $OVSInstallScript) {
& $OVSInstallScript -InstallUserspace $false -LocalFile $mountPath\openvswitch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case user want ot bring their own signed driver, User must be able to know if driver is already installed or not or failed to install , it might not be preferable to check log for failure later.
Instead Start-Job can also be used to schedule the script in background asynchronously, while capturing the status of the background script Install-OVS.ps1 installation using wait-job as and when it returns success or error ! and then just validate driver installation success or failure here as it may allow user to know if driver is installed ? and then they might want to uninstall it and use their own signed driver.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it , I can try with command "Start-Job" to run the script.

hack/windows/Install-OVS.ps1 Outdated Show resolved Hide resolved
Comment on lines 54 to 57
if ($ovsFile.BaseName -eq "openvswitch") {
return $ovsFile.FullName
}
Log "Unsupported local directory: $LocalFile. Only directories with a base name 'openvswitch' are supported"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit surprising IMO. Why does the basename matter given that we will copy everything we need to OVSInstallDir anyway?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I use base name "openvswitch" as a check here becauseI don't have a good idea on how to check if the local OVS directory is valid or not, and both our opensource OVS files (injected in the container image) and the commercial OVS files are using this name. So I hard-code it here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you have changed it so that it no longer relies on the name? I think it makes more sense that way, because only the contents matter.

hack/windows/Install-OVS.ps1 Show resolved Hide resolved
@wenyingd wenyingd force-pushed the issue_6358 branch 4 times, most recently from d1b3224 to 514cf85 Compare June 13, 2024 10:17
@wenyingd
Copy link
Contributor Author

/test-windows-all

hack/windows/Install-OVS.ps1 Show resolved Hide resolved
hack/windows/Install-OVS.ps1 Outdated Show resolved Hide resolved
hack/windows/Install-OVS.ps1 Show resolved Hide resolved
hack/windows/Install-OVS.ps1 Show resolved Hide resolved
# Copy $OVSDriverPath to host path "c:/openvwitch/driver", and then install ovsext.inf with the host path.
# This is a workaround for error code "0x80070003" with containerd v1.7+. The error happens when Windows
# utitilty netcfg tries to access the container's mount path "C:/hpc/".
$hostDriverPath="${OVSInstallDir}\driver"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rename it to driverStagingPath. By the way I assume that we can actually delete ${OVSInstallDir}\driver after running netcfg?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used to try to remove the ${OVSInstallDir}\driver from the host, but I found that the ovsext driver was shown as "not installed" some time after the Pod is down. This is different from my expectation that OVS driver is supposed to be loaded after installed. So I keep the files on the disk in my final change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that's interesting. I guess that is one more reason that it needs to be copied to a "host path", as it needs to be somewhat permanent.

# This is a workaround for error code "0x80070003" with containerd v1.7+. The error happens when Windows
# utitilty netcfg tries to access the container's mount path "C:/hpc/".
$hostDriverPath="${OVSInstallDir}\driver"
if ($OVSDriverPath -ne $hostDriverPath){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should remove this test in all the copy functions.
I cannot think of any remaining legitimate case in which $OVSLocalPath and $OVSInstallDir would be the same? In order for this to happen, the user would have to invoke the script with LocalFile == OVSInstallDir, which IMO does not need to be a supported case (we could exit with an error). Maybe I am missing something though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the use scenario, sometimes we may need to replace OVS driver on the Windows host for test purpose, e.g., adding logs in the ovsext driver to perform troubleshooting sessions, at this time we can directly replace the driver files in $hostDriverPath, and then call the script.

This is not for production usage, so if you prefer we ignore it, I can remove the compare and this check in the start,

if (($LocalFile -ne "") -and ("$LocalFile" -eq "$OVSInstallDir")) {
    Log "LocalFile and OVSInstallDir are the same, exiting"
    exit 1
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. But for troubleshooting wouldn't you also need to somehow overwrite the OVS driver version, or CheckAndInstallOVSDriver would return early?

If this is convenient for debugging and there is no simple alternative, then I think you should keep it, and maybe just log a warning when "$LocalFile" -eq "$OVSInstallDir".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my last change, I remove the path compare on driver path and usr/bin as you suggested, and add a precheck at the start to reject the input if $LocalFile equals to $OVSInstallDir. For the local troubleshooting (not production env), I can call UnInstall-OVS.ps1 then call Install-OVS.ps1 as an alternative. Please help review the patch again. @antoninbas

hack/windows/Install-OVS.ps1 Outdated Show resolved Hide resolved
hack/windows/Install-OVS.ps1 Outdated Show resolved Hide resolved
hack/windows/Install-OVS.ps1 Outdated Show resolved Hide resolved
$hostUsrBinPath = "C:\openvswitch\usr\bin"
$usrBinPath="${OVSInstallPath}\usr\bin"
if ("$usrBinPath" -ne "$hostUsrBinPath") {
if (-not $global:updateOVS) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't really sound like an issue for an initContainer?

@@ -208,7 +208,7 @@ your own OVS package with a signed OVS kernel driver, you would run:

```powershell
curl.exe -LO https://raw.githubusercontent.com/antrea-io/antrea/main/hack/windows/Install-OVS.ps1
.\Install-OVS.ps1 -ImportCertificate $false -Local -LocalFile <PathToOVSPackage>
.\Install-OVS.ps1 -InstallUserspace $true -LocalFile <PathToOVSPackage>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since InstallUserspace now is true by default, I think we don't need to specify it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, should we update the guidance of preparing containerized OVS in line 196-204?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to highlight "-InstallUserspace $true" with the context expecting to install Windows native Services.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to highlight "-InstallUserspace $true" with the context expecting to install Windows native Services.

Got it, I am ok with it.

Also, should we update the guidance of preparing containerized OVS in line 196-204?

@wenyingd Any follow-ups regarding this question?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, remove the original descriptions and example.

Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, thanks for making all these changes

I know that as part of CI, we test the containerized case. Is there any validation as part of CI for the "service" case (.\Install-OVS.ps1 -InstallUserspace $true run directly on the host), or alternatively can you confirm that you tested the most recent version on the script manually?

hack/windows/Install-OVS.ps1 Outdated Show resolved Hide resolved
if ($InstallUserspace -eq $true) {
$OVSBinPaths="${OVSBinPaths};${OVSLocalPath}\usr\sbin"
}
InstallOpenSSLFiles "$OVSBinPaths"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems that we copy the SSL files to $OVSBinPaths ("${OVSLocalPath}\usr\bin"), and then we rely on CopyOVSUtilities to copy them to the OVS install directory? Was that intentional? I guess it will work but shouldn't InstallOpenSSLFiles copy the files to their "final" installation location directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current solution is on desired that copy the ssl files to the LocalFile first, then copy the final files to the install dir. This is because the ssl dll files are required by both usr/bin and usr/sbin, and the two directories are copied in different step: usr/bin is copied in step CopyOVSUtilities, and usr/sbin is copied in step InstallOVSServices. If we plan to copy ssl dll files directly to the final path, the coy operations are distributed into different steps, and we also need to record the path where the dll files are extracted until the end.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, sounds good

@wenyingd
Copy link
Contributor Author

I have manually tested the script on my local testbed with both cases to install userspace services and not install services.

@XinShuYang
Copy link
Contributor

XinShuYang commented Jun 27, 2024

I know that as part of CI, we test the containerized case. Is there any validation as part of CI for the "service" case (.\Install-OVS.ps1 -InstallUserspace $true run directly on the host), or alternatively can you confirm that you tested the most recent version on the script manually?

I have manually tested the script on my local testbed with both cases to install userspace services and not install services.

@wenyingd For the OVS host service test, We have jenkins-windows-ovs in CI to verify related code changes. In the run_install_windows_ovs function, it tries to run Install-OVS.ps1 and perform a health check for OVS. With the current code changes in this PR, do you think we need to update or add some checks in this job?

@wenyingd wenyingd force-pushed the issue_6358 branch 2 times, most recently from 66be333 to 4647e64 Compare June 27, 2024 07:45
@wenyingd wenyingd force-pushed the issue_6358 branch 4 times, most recently from 2e86758 to a5e9d15 Compare June 27, 2024 11:54
antoninbas
antoninbas previously approved these changes Jun 27, 2024
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@antoninbas
Copy link
Contributor

/test-windows-all

}
Write-Host "Completed OVS installation"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

XinShuYang
XinShuYang previously approved these changes Jun 28, 2024
Copy link
Contributor

@XinShuYang XinShuYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, only have some comments about logs.

1. Add logic to check the installed OVSext drivers, if only the desired version
   of driver is already installed, skip the installation; otherwise, remove the
   existing the drivers and re-install.
2. Add logic to check the installed VC redistributable files, if the existing
   installd vc_redist version is equal to or higher than the lowest requirement,
   skip the installation; otherwise re-install with the provided files.
3. Optimize the logic of maintaining the env paths by removing the duplicated
   paths.
4. Optimize Uninstall-OVS script by removing OVS bin paths from the system path
   after it is deleted.

Signed-off-by: Wenying Dong <wenyingd@vmware.com>
Copy link
Contributor

@XinShuYang XinShuYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, since the latest change is only about log updates, we don't need to rerun windows jenkins jobs.

@antoninbas
Copy link
Contributor

/test-windows-all

@antoninbas antoninbas merged commit fc40157 into antrea-io:main Jun 28, 2024
52 of 60 checks passed
@antoninbas antoninbas added the action/release-note Indicates a PR that should be included in release notes. label Jun 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/release-note Indicates a PR that should be included in release notes. area/OS/windows Issues or PRs related to the Windows operating system.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants