Pass GPU diagnostics from worker to scheduler #2932

Merged
merged 2 commits into from Aug 9, 2019

mrocklin (Member) commented Aug 6, 2019

This does a few things:

1.  Use `pynvml` to collect information about any CUDA GPUs present
2.  Optionally add those metrics to the worker's initial handshake and
    heartbeats
3.  Collect that information in the scheduler in the `WorkerState` object
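
As a rough illustration of step 1, here is a minimal sketch of collecting GPU metrics with `pynvml`. The function name `gpu_metrics` and the dictionary keys are assumptions made for this example and are not taken from the PR's diff. It returns an empty dict when `pynvml` is missing or NVML cannot be initialized, so a machine without GPUs is unaffected.

```python
def gpu_metrics():
    """Return per-GPU utilization and memory use, or {} if no GPU is usable.

    Hypothetical helper for illustration; the name and keys are assumptions.
    """
    try:
        import pynvml  # optional dependency

        pynvml.nvmlInit()
    except Exception:
        # pynvml not installed, or no NVIDIA driver/library available
        return {}
    try:
        count = pynvml.nvmlDeviceGetCount()
        handles = [pynvml.nvmlDeviceGetHandleByIndex(i) for i in range(count)]
        return {
            # percent of time the GPU was busy over the last sample period
            "utilization": [
                pynvml.nvmlDeviceGetUtilizationRates(h).gpu for h in handles
            ],
            # bytes of device memory currently in use
            "memory-used": [
                pynvml.nvmlDeviceGetMemoryInfo(h).used for h in handles
            ],
        }
    except Exception:
        # treat any NVML query failure as "no metrics available"
        return {}
    finally:
        pynvml.nvmlShutdown()
```

Calling `gpu_metrics()` on a machine without a GPU simply yields `{}`, which the caller can skip.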

For now these metrics just sit in the scheduler information,
but in the future they might be used for dashboards,
or possibly for scheduling decisions.

I believe that everything GPU-specific here is fairly well separated
and generalized (others should be able to follow this pattern to add
more diagnostics relatively easily), but it would be good to hear from
others on whether this is out of scope.
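
To make the pattern concrete, here is a simplified sketch of how optional metrics could travel from a worker heartbeat into the scheduler's per-worker record. Everything in it (`extra_metrics`, `heartbeat_payload`, `handle_heartbeat`, and this stripped-down `WorkerState`) is a hypothetical stand-in written for illustration, not the actual code in this PR or in `distributed`.

```python
# Hypothetical registry: metric name -> zero-argument callable returning a dict
extra_metrics = {}


def heartbeat_payload():
    """Build the metrics dict a worker would send with each heartbeat."""
    payload = {}
    for name, func in extra_metrics.items():
        try:
            result = func()
        except Exception:
            # a failing optional metric must never break the heartbeat
            continue
        if result:
            # only include metrics that produced data
            payload[name] = result
    return payload


class WorkerState:
    """Simplified stand-in for the scheduler's per-worker record."""

    def __init__(self, address):
        self.address = address
        self.metrics = {}


def handle_heartbeat(ws, payload):
    """Scheduler side: store the latest metrics on the worker's record."""
    ws.metrics = payload
```

Adding another diagnostic under this sketch is a matter of registering one more callable, for example `extra_metrics["gpu"] = some_function`. Callables that return nothing are left out of the payload entirely.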

mrocklin added 2 commits Aug 6, 2019
mrocklin force-pushed the gpu-diagnostics branch from 0dc2f13 to f30d4b3 Aug 7, 2019

TomAugspurger (Member) commented Aug 7, 2019

In general, this kind of special / extra information seems fine, as long as it doesn't affect the common case when GPUs aren't present (and this one is fine).

mrocklin merged commit a555155 into dask:master Aug 9, 2019
2 checks passed
mrocklin deleted the gpu-diagnostics branch Aug 9, 2019