fix: azurelinux configure GRID vGPU licensing #8106
Set gridd config and restart its systemd service so that the daemon can acquire a license.

Signed-off-by: Mitch Zhu <mitchzhu@microsoft.com>
Pull request overview
Updates Linux provisioning to configure NVIDIA GRID vGPU licensing on Mariner/AzureLinux and regenerates the corresponding CustomData snapshot test fixtures used by pkg/agent.
Changes:
- Add Mariner/AzureLinux-only logic in `configGPUDrivers()` to set `FeatureType=1` in `/etc/nvidia/gridd.conf` and (re)enable/(re)start `nvidia-gridd` when `NVIDIA_GPU_DRIVER_TYPE=grid`.
- Regenerate multiple `pkg/agent/testdata/**/CustomData` files to reflect the updated provisioning payload.
- No functional code changes outside the GPU provisioning path; remaining diffs are snapshot updates.
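The `FeatureType=1` step can be tried in isolation. The following is a minimal sketch, not the PR's code: the helper name `set_feature_type`, the temporary file, and the sample config contents are assumptions made for illustration, and GNU `sed` is assumed for `-i`. It splits the edit into a delete and a separate append. A single `sed` invocation of the form `-e '/^FeatureType=/d' -e '$ a FeatureType=1'` skips the append when the `FeatureType=` entry is already the last line, because `d` ends the cycle before the `$ a` command runs, and it does nothing on an empty file.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not from the PR): make FeatureType=1 the only
# FeatureType entry in the given file, wherever an old entry sat.
set_feature_type() {
    local conf="$1"
    # Step 1: drop every existing FeatureType= line.
    sed -i -e '/^FeatureType=/d' "$conf"
    # Step 2: append the desired value as a separate command, so it still
    # runs when the deleted entry was the last line or the file is empty.
    printf 'FeatureType=1\n' >> "$conf"
}

# Work on a scratch copy instead of /etc/nvidia/gridd.conf.
conf="$(mktemp)"
printf 'ServerAddress=\nFeatureType=0\nEnableUI=FALSE\n' > "$conf"

set_feature_type "$conf"
set_feature_type "$conf"   # a second run should neither duplicate nor drop the entry

result="$(cat "$conf")"
count="$(grep -c '^FeatureType=' "$conf")"
echo "$result"
rm -f "$conf"
```

Whether a second run ever happens during provisioning is not stated in this thread, so treat the idempotency concern as something to check rather than a confirmed defect.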
Reviewed changes
Copilot reviewed 32 out of 72 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| parts/linux/cloud-init/artifacts/cse_config.sh | Adds GRID vGPU licensing configuration/restart for nvidia-gridd on Mariner/AzureLinux during GPU driver setup. |
| pkg/agent/testdata/MarinerV2+Kata/CustomData | Regenerated CustomData fixture. |
| pkg/agent/testdata/MarinerV2+CustomCloud/CustomData | Regenerated CustomData fixture. |
| pkg/agent/testdata/CustomizedImage/CustomData | Regenerated CustomData fixture. |
| pkg/agent/testdata/AzureLinuxV3+Kata/CustomData | Regenerated CustomData fixture. |
    # GRID vGPU licensing: configure and restart nvidia-gridd after device nodes exist
    if [ "$NVIDIA_GPU_DRIVER_TYPE" = "grid" ]; then
        sed -i -e '/^FeatureType=/d' -e '$ a FeatureType=1' /etc/nvidia/gridd.conf
        systemctl enable nvidia-gridd.service
We have `systemctlEnableAndStart`, which you can use here.
Got it. Updated to use `systemctlEnableAndStart` as suggested.
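For readers following the thread, the gating on the driver type can be sketched as below. This is an illustration under stated assumptions, not the merged code: `enable_and_start_stub` is a stand-in for AgentBaker's `systemctlEnableAndStart` (whose arguments and retry behavior are not shown in this thread), `configure_gridd_if_needed` is a hypothetical wrapper, and `cuda` is used only as an example of a non-GRID value.

```shell
#!/usr/bin/env bash
set -euo pipefail

calls=()

# Stand-in for AgentBaker's systemctlEnableAndStart. The real helper's
# arguments and retry behavior are not shown in this thread, so this stub
# only records which unit it was asked to enable and start.
enable_and_start_stub() {
    calls+=("$1")
}

# Hypothetical wrapper: only touch nvidia-gridd for the GRID driver type.
configure_gridd_if_needed() {
    local driver_type="$1"
    if [ "$driver_type" = "grid" ]; then
        enable_and_start_stub nvidia-gridd.service
    fi
}

configure_gridd_if_needed "cuda"   # example non-GRID value: no service action
configure_gridd_if_needed "grid"   # GRID: the licensing daemon is started

echo "units started: ${calls[*]}"
```

Routing the start through one shared helper keeps the enable and start logic in a single place instead of repeating raw `systemctl` calls at each call site.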
Pull request overview
This PR updates the Linux CSE GPU driver configuration to properly bring up NVIDIA GRID vGPU licensing on Azure Linux/Mariner by starting the nvidia-gridd systemd service after GPU device nodes are available, and refreshes the generated CustomData snapshots accordingly.
Changes:
- Start/enable `nvidia-gridd` during GPU driver configuration for Mariner/Azure Linux when `NVIDIA_GPU_DRIVER_TYPE=grid`.
- Regenerate/update multiple `pkg/agent/testdata/**/CustomData` snapshot files to reflect the script change.
Reviewed changes
Copilot reviewed 32 out of 72 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| parts/linux/cloud-init/artifacts/cse_config.sh | Adds a nvidia-gridd start/enable step for GRID driver type on Mariner/Azure Linux. |
| pkg/agent/testdata/MarinerV2+Kata/CustomData | Updated generated CustomData snapshot (gzip/base64 payload changed). |
| pkg/agent/testdata/MarinerV2+CustomCloud/CustomData | Updated generated CustomData snapshot (gzip/base64 payload changed). |
| pkg/agent/testdata/CustomizedImage/CustomData | Updated generated CustomData snapshot (gzip/base64 payload changed). |
| pkg/agent/testdata/AzureLinuxV3+Kata/CustomData | Updated generated CustomData snapshot (gzip/base64 payload changed). |
What this PR does / why we need it:
I’ve added the Azure Linux NVIDIA vGPU driver installation path to AgentBaker (#7986). The driver installs and works as expected, but vGPU licensing was not being configured. This change adds the missing setup so that vGPU licensing is configured correctly.

Verified with an additional GRID test scenario in AgentBaker E2E, similar to `ScenarioUbuntu2404GRID`: https://msazure.visualstudio.com/CloudNativeCompute/_build/results?buildId=157022962&view=results
Which issue(s) this PR fixes:
Fixes #