fix: nil pointer dereference around getDevices:d.CPUAffinity #141

Closed

rockrush commented Oct 9, 2019

As mentioned in issue #140 by @jucrouzet and @RenaudWasTaken, when CPUAffinity is not set, dereferencing it with *(d.CPUAffinity) panics with a nil pointer dereference and causes the whole process to exit.
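A minimal sketch of the failure mode described above, assuming CPUAffinity is an optional pointer field. The `device` type and the `unsafeAffinity` and `panics` helpers are hypothetical stand-ins for illustration, not the plugin's actual code:

```go
package main

import "fmt"

// device is a hypothetical stand-in for the NVML device struct; only the
// optional CPUAffinity pointer matters for this illustration.
type device struct {
	CPUAffinity *uint
}

// unsafeAffinity dereferences the pointer unconditionally.
// It panics at run time when CPUAffinity is nil.
func unsafeAffinity(d *device) int64 {
	return int64(*d.CPUAffinity)
}

// panics reports whether calling f triggers a panic.
func panics(f func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	node := uint(1)
	withAffinity := &device{CPUAffinity: &node}
	withoutAffinity := &device{} // CPUAffinity left nil

	fmt.Println(unsafeAffinity(withAffinity))
	fmt.Println(panics(func() { unsafeAffinity(withoutAffinity) }))
}
```

Without a `recover`, the second call would terminate the process, which matches the crash reported in the issue.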

rockrush changed the title from "fix: nil pointer dereference around getDevices:d.CPUAffinity #140" to "fix: nil pointer dereference around getDevices:d.CPUAffinity" on Oct 9, 2019
@@ -40,13 +40,18 @@ func getDevices() []*pluginapi.Device {
for i := uint(0); i < n; i++ {
d, err := nvml.NewDeviceLite(i)
check(err)
var cpu_affinity int64 = 0
if d.CPUAffinity != nil {
cpu_affinity = int64(*(d.CPUAffinity))

If d.CPUAffinity is not set, then we shouldn’t force it to 0. Instead, we should just not set the Topology field of the device at all.

In what situation is d.CPUAffinity left unset, so that this dereference fails?

@klueska So should NewDeviceLite() be replaced by NewDevice(), which always sets CPUAffinity in numaNode()?

I think it may be more complicated than that, because NewDevice() currently always sets CPUAffinity to "something" (e.g. 0) even if the underlying host doesn't have any NUMA info for it (i.e. the system returns -1). I think we will need to change NVML to change the way it reports things, in addition to changing the plugin. I plan to look into this later today.
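One way the reporting change described above could look, as a sketch only: the helper `affinityFromNUMANode` is hypothetical, and it assumes the host reports a negative value (such as -1) when no NUMA information exists. It is not the actual NVML bindings code.

```go
package main

import "fmt"

// affinityFromNUMANode is a hypothetical helper. raw is the NUMA node the
// host reports for a GPU; a negative value is assumed to mean "no NUMA info".
// It returns nil in that case instead of forcing the affinity to 0.
func affinityFromNUMANode(raw int) *uint {
	if raw < 0 {
		return nil
	}
	node := uint(raw)
	return &node
}

func main() {
	for _, raw := range []int{-1, 0, 1} {
		if a := affinityFromNUMANode(raw); a == nil {
			fmt.Printf("raw=%d: no affinity reported\n", raw)
		} else {
			fmt.Printf("raw=%d: affinity=%d\n", raw, *a)
		}
	}
}
```

Returning nil keeps "no NUMA info" distinguishable from "NUMA node 0", which a plain 0 cannot express.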

Once https://gitlab.com/nvidia/container-toolkit/gpu-monitoring-tools/merge_requests/4 merges, we can update the vendoring for the NVML bindings and update this patch to set Topology selectively, based on whether d.CPUAffinity is nil.

OK. That change has merged. You will need to update the vendoring to pull it in and then update the logic here to only set Topology if d.CPUAffinity != nil.
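The requested logic might look roughly like the sketch below. The `Device`, `TopologyInfo`, and `NUMANode` types are local stand-ins modeled on the Kubernetes device plugin API so that the example is self-contained; `nvmlDevice` and `buildDevice` are hypothetical names, and the merged change may differ.

```go
package main

import "fmt"

// Local stand-ins modeled on the device plugin API types (assumption:
// the real pluginapi types have the same shape for these fields).
type NUMANode struct{ ID int64 }
type TopologyInfo struct{ Nodes []*NUMANode }
type Device struct {
	ID       string
	Health   string
	Topology *TopologyInfo
}

// nvmlDevice is a hypothetical stand-in for the NVML device struct.
type nvmlDevice struct {
	UUID        string
	CPUAffinity *uint
}

// buildDevice sets Topology only when CPUAffinity is non-nil, so a device
// without NUMA information is advertised with no topology at all.
func buildDevice(d nvmlDevice) *Device {
	dev := &Device{ID: d.UUID, Health: "Healthy"}
	if d.CPUAffinity != nil {
		dev.Topology = &TopologyInfo{
			Nodes: []*NUMANode{
				&NUMANode{ID: int64(*d.CPUAffinity)},
			},
		}
	}
	return dev
}

func main() {
	node := uint(1)
	for _, d := range []nvmlDevice{
		{UUID: "GPU-a", CPUAffinity: &node},
		{UUID: "GPU-b"}, // no NUMA info
	} {
		dev := buildDevice(d)
		fmt.Printf("%s: has topology = %t\n", dev.ID, dev.Topology != nil)
	}
}
```

Leaving Topology nil, rather than pointing it at node 0, avoids advertising an affinity the host never reported.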

@RenaudWasTaken

Hello!

Can you move your PR to GitLab? The code is now hosted there and mirrored to GitHub:
https://gitlab.com/nvidia/kubernetes/device-plugin

Also, it looks like you haven't signed off your commits. Do you mind taking care of it?
https://stackoverflow.com/questions/13043357/git-sign-off-previous-commits

Thanks a lot!

@RenaudWasTaken

Merged :)

@klueska

klueska commented Oct 12, 2019

I hope it didn’t merge without my suggested changes. Where can I see the final PR?

@RenaudWasTaken

@klueska I believe your changes were taken into account. The MR was opened on GitLab here:
https://gitlab.com/nvidia/kubernetes/device-plugin/merge_requests/12/

@klueska

klueska commented Oct 14, 2019

Yep. Looks good. Thanks!

@klueska

klueska commented Oct 15, 2019

Fix released: https://github.com/NVIDIA/k8s-device-plugin/releases/tag/1.0.0-beta4
