Ensure unique CUDA worker names #1270
Conversation
/azp run libertem.libertem-data
Azure Pipelines successfully started running 1 pipeline(s).
Codecov Report
@@ Coverage Diff @@
## master #1270 +/- ##
==========================================
+ Coverage 72.37% 72.42% +0.04%
==========================================
Files 295 295
Lines 15455 15458 +3
Branches 2654 2656 +2
==========================================
+ Hits 11186 11195 +9
+ Misses 3859 3853 -6
Partials 410 410
Looks good to me, thanks!
Following #1270 by @matbryan, start several CUDA workers per device. This keeps the CUDA device utilized while a worker is busy with transfers or CPU work such as decoding. For live processing we routinely run several UDFs in parallel, meaning it is likely that not all UDFs support CUDA. Furthermore, we already accepted CPU workload on CUDA workers before. With this change we start hybrid CPU+CUDA workers, so that only CUDA-only workloads are scheduled exclusively on CUDA workers. This also avoids oversubscription and straggling partitions. In some tests, this change brought about a 20 % speed-up compared to the previous worker setup. A RAM budget of 4 GB worked well in simple tests, to be observed further; more than four workers per device rarely brought a benefit. As a drive-by: fix mypy issues, clean up, and apply minor bug fixes.
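The hybrid CPU+CUDA idea above can be sketched as a worker-spec builder. This is a minimal illustration, not LiberTEM's actual API: the function name, dict layout, and resource keys are hypothetical, chosen only to show how each CUDA worker can also advertise CPU capability so CPU-only UDFs may run on it.

```python
def hybrid_cluster_spec(n_cpu, cuda_devices, workers_per_device):
    # Hypothetical sketch: build a name -> options mapping where CUDA
    # workers are hybrid (they carry both CPU and CUDA resources).
    spec = {}
    for i in range(n_cpu):
        spec[f"cpu-{i}"] = {"resources": {"CPU": 1, "compute": 1}}
    for dev in cuda_devices:
        for j in range(workers_per_device):
            # first worker on a device keeps the plain name,
            # later ones get a per-device counter suffix
            name = f"cuda-{dev}" if j == 0 else f"cuda-{dev}-{j - 1}"
            spec[name] = {
                # hybrid: CPU and CUDA resources on the same worker
                "resources": {"CPU": 1, "CUDA": 1, "compute": 1},
                "device": dev,
            }
    return spec
```

A CUDA-only workload would then request the CUDA resource and land only on CUDA workers, while CPU work can be scheduled anywhere.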
Fixes the case where a CUDA worker spec with the same name squashes the previous worker spec in the workers_spec dictionary. This enables spawning multiple CUDA workers on the same GPU via cluster_spec, which currently has to be done manually by editing the returned dictionary (now more challenging due to the tracing setup). The first CUDA worker created on each device keeps an unmodified name ('default-cuda-0'); subsequent workers on the same device are named 'default-cuda-0-0', 'default-cuda-0-1', 'default-cuda-0-2', etc.
Closes #1225
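The squashing problem described above is plain dict semantics: assigning to an existing key overwrites the earlier entry. A small sketch (worker names follow the scheme described above; the dict layout is illustrative, not LiberTEM's actual spec format):

```python
# Without unique names, identical keys overwrite earlier entries,
# so only one of the three intended workers survives.
squashed = {}
for _ in range(3):
    squashed["default-cuda-0"] = {"device": 0}
assert len(squashed) == 1  # three workers collapsed into one

# With the unique-name scheme, each worker keeps its own entry:
# the first worker keeps the plain name, later ones get a suffix.
unique = {}
for i in range(3):
    name = "default-cuda-0" if i == 0 else f"default-cuda-0-{i - 1}"
    unique[name] = {"device": 0}
assert len(unique) == 3
```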
/azp run libertem.libertem-data
passed