
Ensure unique CUDA worker names #1270

Merged: 1 commit merged into LiberTEM:master from the multi_workers_gpu branch on Jun 20, 2022

Conversation

@matbryan52 (Member) commented Jun 20, 2022

Fixes the case where a CUDA worker spec with the same name squashes the previous worker spec in the workers_spec dictionary. This enables spawning multiple CUDA workers on the same GPU via cluster_spec, which currently has to be done manually by editing the returned dictionary (now more challenging due to the tracing setup!).
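
For illustration, a minimal sketch of how this could look from Python, assuming the cluster_spec and DaskJobExecutor.make_local signatures shown below (treat the exact parameter names as assumptions rather than documented API); repeating a device id in cudas is how several workers on one GPU would be requested:

    # Hedged sketch: two CUDA workers on GPU 0 plus two CPU workers.
    from libertem.api import Context
    from libertem.executor.dask import DaskJobExecutor, cluster_spec

    # Repeating device id 0 requests two workers on the same GPU; with this PR
    # they receive distinct names ('default-cuda-0', 'default-cuda-0-0') instead
    # of the second spec entry squashing the first.
    spec = cluster_spec(cpus=[0, 1], cudas=[0, 0], has_cupy=True)
    executor = DaskJobExecutor.make_local(spec=spec)
    ctx = Context(executor=executor)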

The first CUDA worker created on each device keeps an unmodified name ('default-cuda-0'); subsequent workers on the same device are named 'default-cuda-0-0', 'default-cuda-0-1', 'default-cuda-0-2', etc.
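
The deduplication boils down to suffixing an index once a name is already taken; an illustrative sketch of that logic follows (the helper name is hypothetical, not necessarily the exact code in src/libertem/executor/dask.py):

    def _unique_worker_name(base, existing):
        # Return `base` if it is still free, otherwise `base-0`, `base-1`, ...
        # (illustrative helper, hypothetical name)
        if base not in existing:
            return base
        idx = 0
        while f"{base}-{idx}" in existing:
            idx += 1
        return f"{base}-{idx}"

    workers_spec = {}
    for requested in ["default-cuda-0", "default-cuda-0", "default-cuda-0"]:
        name = _unique_worker_name(requested, workers_spec)
        workers_spec[name] = {"device": 0}  # placeholder for the real spec entry

    # workers_spec keys: 'default-cuda-0', 'default-cuda-0-0', 'default-cuda-0-1'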

Closes #1225

Contributor Checklist:

Reviewer Checklist:

  • /azp run libertem.libertem-data passed
  • No import of GPL code from MIT code

@matbryan52 (Member, Author) commented:

/azp run libertem.libertem-data

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).


codecov bot commented Jun 20, 2022

Codecov Report

Merging #1270 (76c8d56) into master (9fec71c) will increase coverage by 0.04%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1270      +/-   ##
==========================================
+ Coverage   72.37%   72.42%   +0.04%     
==========================================
  Files         295      295              
  Lines       15455    15458       +3     
  Branches     2654     2656       +2     
==========================================
+ Hits        11186    11195       +9     
+ Misses       3859     3853       -6     
  Partials      410      410              
Impacted Files                              Coverage             Δ
src/libertem/executor/dask.py               79.92% <100.00%>     (+1.99%) ⬆️
src/libertem/io/corrections/detector.py     89.61% <0.00%>       (+0.64%) ⬆️

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9fec71c...76c8d56.

@sk1p added this to the 0.10 milestone on Jun 20, 2022
@sk1p (Member) left a comment:

Looks good to me, thanks!

@sk1p merged commit 7cc1dbe into LiberTEM:master on Jun 20, 2022
@matbryan52 deleted the multi_workers_gpu branch on June 20, 2022 at 12:15
uellue added a commit to uellue/LiberTEM that referenced this pull request Sep 5, 2022
Following LiberTEM#1270 by @matbryan52, start several CUDA workers per device.
This keeps the CUDA device utilized while a worker is busy with transfers or
CPU work such as decoding.

For live processing we routinely run several UDFs in parallel, meaning it is
likely that not all of them support CUDA. Furthermore, we already accepted CPU
workloads on CUDA workers before.

With this change, we start hybrid CPU+CUDA workers, so that only CUDA-only
workloads are restricted to CUDA workers. This also avoids oversubscription
and straggling partitions.

In some tests, this change brought about a 20% speed-up compared to the
previous worker setup.

A RAM budget of 4 GB worked well in simple tests and remains to be observed
further. More than four workers rarely brought a benefit.

As a drive-by: fix mypy issues, clean up, and apply minor bug fixes.
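
To make the scheduling constraint concrete, here is a generic dask.distributed sketch of resource-based scheduling in the spirit of the hybrid CPU+CUDA workers described above; the resource names and the LocalCluster setup are assumptions for illustration, not LiberTEM's actual worker spec:

    # Illustrative only: hybrid workers advertise both a CPU and a CUDA "slot",
    # and CUDA-only tasks are pinned to workers that offer the CUDA resource.
    from dask.distributed import Client, LocalCluster

    def cuda_only_task(x):
        # Placeholder for work that needs a GPU (e.g. a CuPy-backed UDF).
        return x * 2

    def cpu_task(x):
        # Placeholder for work that any worker can run (e.g. decoding).
        return x + 1

    if __name__ == "__main__":
        # Two hybrid workers, each accepting CPU as well as CUDA workloads.
        cluster = LocalCluster(n_workers=2, resources={"CPU": 1, "CUDA": 1})
        client = Client(cluster)

        # The resources= constraint keeps CUDA-only work on CUDA-capable workers,
        # while plain CPU work may run on any worker.
        gpu_future = client.submit(cuda_only_task, 21, resources={"CUDA": 1})
        cpu_future = client.submit(cpu_task, 41)
        print(gpu_future.result(), cpu_future.result())

        client.close()
        cluster.close()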
sk1p pushed a commit that referenced this pull request Nov 8, 2022
uellue added a commit to uellue/LiberTEM that referenced this pull request Dec 21, 2022
uellue added a commit to uellue/LiberTEM that referenced this pull request Jan 10, 2023
uellue added a commit to uellue/LiberTEM that referenced this pull request Jan 12, 2023
uellue added a commit to uellue/LiberTEM that referenced this pull request Feb 9, 2023
uellue added a commit to uellue/LiberTEM that referenced this pull request Mar 17, 2023
sk1p pushed a commit that referenced this pull request Mar 20, 2023

Successfully merging this pull request may close these issues:

  • Allow spawning multiple CUDA workers on a single device when using cluster_spec (#1225)