chore: update nvidia device plugin #3225

sozercan · 2020-05-11T20:01:04Z

Reason for Change:

Updates the nvidia-device-plugin version and moves the image to MCR.

Note on the versioning change on NVIDIA's side: https://github.com/NVIDIA/k8s-device-plugin#versioning

Issue Fixed:

Requirements:

uses conventional commit messages
includes documentation
adds unit tests
tested upgrade from previous version

Notes:

sozercan · 2020-05-11T20:05:10Z

/assign @jackfrancis

codecov · 2020-05-11T20:15:59Z

Codecov Report

Merging #3225 into master will decrease coverage by 0.02%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #3225      +/-   ##
==========================================
- Coverage   71.45%   71.43%   -0.03%     
==========================================
  Files         147      147              
  Lines       25643    25653      +10     
==========================================
  Hits        18324    18324              
- Misses       6177     6187      +10     
  Partials     1142     1142

Impacted Files                       Coverage Δ
pkg/api/k8s_versions.go              100.00% <100.00%> (ø)
cmd/get_logs.go                      17.27% <0.00%> (-0.83%) ⬇️
pkg/engine/templates_generated.go    39.64% <0.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 78e3b18...54d36c4.

mboersma · 2020-05-11T22:31:57Z

1.0.0-beta.6 > 1.11 is confusing, but documented as you pointed out.

vhd/packer/install-dependencies.sh
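
To make the ordering concern above concrete, here is a minimal sketch. It assumes the new tags follow semver precedence, as NVIDIA's versioning note describes, and that the older tags such as 1.11 are plain two-part numbers. `parse_tag` is a hypothetical helper, not part of aks-engine or the device plugin, and it only handles the tag shapes mentioned in this thread.

```python
import re

def parse_tag(tag):
    """Turn a tag like '1.11' or '1.0.0-beta.6' into a sortable tuple.

    Hypothetical helper for illustration only. The last element is True
    for a final release and False for a pre-release, so that a pre-release
    sorts before the final release with the same numbers.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)(?:\.(\d+))?(?:-(.+))?", tag)
    if match is None:
        raise ValueError(f"unrecognized tag: {tag}")
    major, minor, patch, prerelease = match.groups()
    return (int(major), int(minor), int(patch or 0), prerelease is None)

old_tag = "1.11"          # older tag named in this thread
new_tag = "1.0.0-beta.6"  # newer tag named in this thread

# Compared purely as version numbers, the newer release sorts lower.
newer_sorts_lower = parse_tag(new_tag) < parse_tag(old_tag)
print(newer_sorts_lower)
```

Any tooling that picks "the highest tag" by numeric comparison would therefore prefer the older image, which is why the tag is pinned explicitly rather than derived by sorting.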

sozercan · 2020-05-11T22:42:28Z

@mboersma yea nvidia also deprecated nvidia-docker2 but it's still required for k8s 😕 https://github.com/NVIDIA/nvidia-docker#upgrading-with-nvidia-docker2-deprecated

mboersma · 2020-05-12T15:50:24Z

I ran the cuda-vector-add test against a new Standard_NC12 1.19.0-alpha.3 cluster off this branch, and it passed:

% k logs -f cuda-vector-add    
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done

The nodes report nvidia.com/gpu: "2" and cpu: "12", and the nvidia-device-plugin pods are running with:

    Image:          mcr.microsoft.com/oss/nvidia/k8s-device-plugin:1.0.0-beta6

mboersma

/lgtm

acs-bot · 2020-05-12T15:54:37Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mboersma, sozercan

Needs approval from an approver in each of these files:

~~OWNERS~~ [mboersma]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

update nvidia device plugin

97f9b85

acs-bot added the size/XS label May 11, 2020

acs-bot assigned jackfrancis May 11, 2020

acs-bot added size/S and removed size/XS labels May 11, 2020

update packer and 1.10 version

4759fd1

sozercan force-pushed the nvidia-device-plugin-update branch from b5e9df9 to 4759fd1 Compare May 11, 2020 21:28

mboersma added the gpu GPU-related issues and fixes label May 11, 2020

sozercan commented May 11, 2020

vhd/packer/install-dependencies.sh (outdated)

remove 1.11 from packer

54d36c4

mboersma approved these changes May 12, 2020

acs-bot assigned mboersma May 12, 2020

acs-bot added the lgtm label May 12, 2020

mboersma merged commit 6272a70 into Azure:master May 12, 2020

acs-bot added the approved label May 12, 2020

sozercan deleted the nvidia-device-plugin-update branch May 12, 2020 16:43

fmotrifork mentioned this pull request May 26, 2020

aks-engine 0.51.0 fishworks/fish-food#748

Merged
