Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add dev nvidia with nv #1358

Merged
merged 7 commits into from Mar 27, 2018
Merged

Add dev nvidia with nv #1358

merged 7 commits into from Mar 27, 2018

Conversation

GodloveD
Copy link
Collaborator

@GodloveD GodloveD commented Mar 4, 2018

Description of the Pull Request (PR):

Several users noted that the --nv option does not work in conjunction with the --contain option. This is because /dev/nvidia* devices are not bind mounted into the container if --contain is used. This PR fixes this issue by bind mounting /dev/nvidia* if the --nv option is passed along with the --contain option.

I'm not sure why the commit history includes all of the other commits. It could be because that history was inadvertently lost in the release-2.4 branch when bug fixes were backported by cherry-picking merges. If that is the case it will actually be a good thing to add this history back into the release-2.4 branch where it belongs.

Note that this PR is slated for 2.4.4. So it should not be merged into release-2.4 until the 2.4.3 release has been tagged.

This fixes or addresses the following GitHub issues:

Checkoff for all PRs:

  • I have read the Guidelines for Contributing, and this PR conforms to the stated requirements.
  • I have added changes to the CHANGELOG and and documentation updates to the singularityware documentation base.
  • I have tested this PR locally with a make test
  • This PR is NOT against the project's master branch
  • I have added myself as a contributor to the contributors's file
  • This PR is ready for review and/or merge

Attn: @singularityware-admin

@GodloveD GodloveD changed the base branch from release-2.4 to development-2.x March 9, 2018 01:58
@GodloveD
Copy link
Collaborator Author

GodloveD commented Mar 9, 2018

@ctmadison and @paulo7777 this should be ready for testing.

@dtrudg
Copy link
Contributor

dtrudg commented Mar 9, 2018

Confirm this is working for me. GPU visible with --nv and --contain

$ singularity shell --nv --contain digits-6.0.simg

>>> from tensorflow.python.client import device_lib
>>> local_device_protos = device_lib.list_local_devices()
2018-03-09 15:15:56.051745: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2018-03-09 15:15:56.051784: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2018-03-09 15:15:56.051798: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2018-03-09 15:15:56.154121: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-03-09 15:15:56.154419: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: GeForce GTX 960
major: 5 minor: 2 memoryClockRate (GHz) 1.2785
pciBusID 0000:01:00.0
Total memory: 3.95GiB
Free memory: 3.74GiB
2018-03-09 15:15:56.154437: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0
2018-03-09 15:15:56.154444: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y
2018-03-09 15:15:56.154464: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 960, pci bus id: 0000:01:00.0)

Copy link
Contributor

@dtrudg dtrudg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested (see comment), and looks good to me.

On the C front - only thing I see would be if singularity_registry_get("NVDEV") could return an incorrectly terminated string, but I don't believe that's the case.

@dtrudg dtrudg added this to the 2.5 Release milestone Mar 9, 2018
@ctmadison ctmadison self-assigned this Mar 9, 2018
@GodloveD
Copy link
Collaborator Author

Thanks for merging the CHANGELOG.md @dctrud !

Copy link
Collaborator

@cclerget cclerget left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We must ensure SINGULARITY_NVDEV contains only paths starting with /dev/

Edit: or just use opendir/readdir directly in C code when --nv flag requested and match for nvidia entries

char *nvdev = strtok(nvdevs, ",");
while ( nvdev != NULL ) {
singularity_message(2, "Binding device %s\n", nvdev);
bind_dev(sessiondir, nvdev);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GodloveD That's introduce a security issue, SINGULARITY_NVDEV is fully controlled by user so he could bind anything he want into container without sanity checks

@GodloveD
Copy link
Collaborator Author

Thanks for the review @cclerget. And thanks for the suggestion to use opendir and readdir. I think it makes the code cleaner and hopefully it makes the code safer as well.

@GodloveD
Copy link
Collaborator Author

As far as I know, @paulo7777 has tested this and it seems to be working for him. @paulo7777 can you verify?

@dtrudg
Copy link
Contributor

dtrudg commented Mar 27, 2018

@paulo7777 confirmed - also working for me here and @cclerget has approved from a security standpoint. Merging this.

Copy link
Contributor

@dtrudg dtrudg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirm this is working for me with the further changes, and Paulo has confirmed okay too.

@dtrudg dtrudg merged commit 6d982f4 into apptainer:development-2.x Mar 27, 2018
@GodloveD
Copy link
Collaborator Author

Yay!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants