Pass GPU diagnostics from worker to scheduler #2932

Merged
merged 2 commits into from Aug 9, 2019

Conversation

@mrocklin
Member

commented Aug 6, 2019

This does a few things:

  1. Use pynvml to collect information about any CUDA GPUs present
  2. Optionally add those metrics to the worker's initial handshake and
    heartbeats
  3. Collect that information in the scheduler in the WorkerState object
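
A minimal sketch of how these three steps could fit together, not the code in this PR. The `pynvml` calls are real NVML bindings, but `gpu_metrics`, `heartbeat_payload`, `WorkerRecord`, and the key names are hypothetical stand-ins chosen for illustration. The collector returns an empty dict when `pynvml` or a GPU is unavailable.

```python
from dataclasses import dataclass, field


def gpu_metrics():
    """Per-GPU utilization and memory use; {} when pynvml or a GPU is unavailable."""
    try:
        import pynvml  # optional dependency, imported lazily

        pynvml.nvmlInit()
    except Exception:
        return {}
    try:
        handles = [
            pynvml.nvmlDeviceGetHandleByIndex(i)
            for i in range(pynvml.nvmlDeviceGetCount())
        ]
        if not handles:
            return {}
        return {
            "utilization": [pynvml.nvmlDeviceGetUtilizationRates(h).gpu for h in handles],
            "memory-used": [pynvml.nvmlDeviceGetMemoryInfo(h).used for h in handles],
        }
    except Exception:
        return {}
    finally:
        pynvml.nvmlShutdown()


def heartbeat_payload(base, include_gpu=True):
    """Hypothetical helper: copy the heartbeat message and attach GPU metrics if collected."""
    msg = dict(base)
    gpu = gpu_metrics() if include_gpu else {}
    if gpu:
        msg["gpu"] = gpu
    return msg


@dataclass
class WorkerRecord:
    """Stand-in for the scheduler's WorkerState; only the fields needed here."""

    address: str
    extra: dict = field(default_factory=dict)

    def update_from_heartbeat(self, msg):
        # Keep the latest GPU metrics reported by this worker, if any.
        if "gpu" in msg:
            self.extra["gpu"] = msg["gpu"]
```

On a machine without a GPU, `heartbeat_payload` returns the base message unchanged, so the common case carries no extra data.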

For now these just hang out in the scheduler information,
but in the future they might be used for dashboards,
or possibly for scheduling decisions.

I believe that everything GPU-specific here is fairly well separated
and generalized (others should be able to follow this pattern to add
more diagnostics relatively easily), but it would be good to hear from
others on whether this is out of scope.
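
As a rough illustration of what "following this pattern" might look like, here is a sketch of a generic registry of optional metrics. The names `EXTRA_METRICS`, `register_metric`, and `collect_extra_metrics` are hypothetical and are not taken from this PR; the idea is only that each diagnostic is a zero-argument callable and that a failing one is skipped instead of breaking the heartbeat.

```python
import time

# Hypothetical registry: metric name -> zero-argument callable returning a value.
EXTRA_METRICS = {}


def register_metric(name, func):
    """Add an optional diagnostic to be gathered at heartbeat time."""
    EXTRA_METRICS[name] = func


def collect_extra_metrics():
    """Evaluate every registered metric, skipping any that raise."""
    out = {}
    for name, func in EXTRA_METRICS.items():
        try:
            out[name] = func()
        except Exception:
            # A broken or unavailable diagnostic should not affect the others.
            continue
    return out


# Example registration using a metric that needs no special hardware.
register_metric("clock", time.time)
```

With this shape, a GPU collector would be one more `register_metric` call, and workers without the relevant hardware or library would simply omit that key.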

mrocklin added 2 commits Aug 6, 2019
Pass GPU diagnostics from worker to scheduler

@mrocklin mrocklin force-pushed the mrocklin:gpu-diagnostics branch from 0dc2f13 to f30d4b3 Aug 7, 2019

@TomAugspurger

Member

commented Aug 7, 2019

In general, this kind of special / extra information seems fine, as long as it doesn't affect the common case when GPUs aren't present (and this one is fine).

@mrocklin mrocklin merged commit a555155 into dask:master Aug 9, 2019

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/travis-ci/pr The Travis CI build passed

@mrocklin mrocklin deleted the mrocklin:gpu-diagnostics branch Aug 9, 2019
