Ensure unique CUDA worker names #1270
Conversation
/azp run libertem.libertem-data
Azure Pipelines successfully started running 1 pipeline(s).
Codecov Report
@@ Coverage Diff @@
## master #1270 +/- ##
==========================================
+ Coverage 72.37% 72.42% +0.04%
==========================================
Files 295 295
Lines 15455 15458 +3
Branches 2654 2656 +2
==========================================
+ Hits 11186 11195 +9
+ Misses 3859 3853 -6
Partials 410 410
Looks good to me, thanks!
Following #1270 by @matbryan, start several CUDA workers per device. This keeps the CUDA device utilized while a worker is busy with transfers or CPU work such as decoding. For live processing we routinely run several UDFs in parallel, meaning it is likely that not all UDFs support CUDA. Furthermore, we already accepted CPU workload on CUDA workers before. With this change we start hybrid CPU+CUDA workers, so that only CUDA-only workloads are scheduled exclusively on CUDA workers. This also avoids oversubscription and straggling partitions. In some tests, this change brought about a 20 % speed-up compared to the previous worker setup. A RAM budget of 4 GB worked well in simple tests, to be observed further; more than four workers per device rarely brought a benefit. As a drive-by: fix mypy issues, clean up, and apply minor bug fixes.
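The hybrid CPU+CUDA idea above can be sketched as a worker-spec builder. This is a minimal illustration, not LiberTEM's actual API: the function name, dict layout, and resource keys are hypothetical, chosen only to show how each CUDA worker can also advertise CPU capability so CPU-only UDFs may run on it.

```python
def hybrid_cluster_spec(n_cpu, cuda_devices, workers_per_device):
    # Hypothetical sketch: build a name -> options mapping where CUDA
    # workers are hybrid (they carry both CPU and CUDA resources).
    spec = {}
    for i in range(n_cpu):
        spec[f"cpu-{i}"] = {"resources": {"CPU": 1, "compute": 1}}
    for dev in cuda_devices:
        for j in range(workers_per_device):
            # first worker on a device keeps the plain name,
            # later ones get a per-device counter suffix
            name = f"cuda-{dev}" if j == 0 else f"cuda-{dev}-{j - 1}"
            spec[name] = {
                # hybrid: CPU and CUDA resources on the same worker
                "resources": {"CPU": 1, "CUDA": 1, "compute": 1},
                "device": dev,
            }
    return spec
```

A CUDA-only workload would then request the CUDA resource and land only on CUDA workers, while CPU work can be scheduled anywhere.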
Fixes the case where a CUDA worker spec with the same name squashes the previous worker spec in the workers_spec dictionary. This enables spawning multiple CUDA workers on the same GPU via cluster_spec, which currently has to be done manually by editing the returned dictionary (now more challenging due to the tracing setup). The first CUDA worker created on each device keeps an unmodified name ('default-cuda-0'); subsequent workers on the same device are named 'default-cuda-0-0', 'default-cuda-0-1', 'default-cuda-0-2', etc.
Closes #1225
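The squashing problem described above is plain dict semantics: assigning to an existing key overwrites the earlier entry. A small sketch (worker names follow the scheme described above; the dict layout is illustrative, not LiberTEM's actual spec format):

```python
# Without unique names, identical keys overwrite earlier entries,
# so only one of the three intended workers survives.
squashed = {}
for _ in range(3):
    squashed["default-cuda-0"] = {"device": 0}
assert len(squashed) == 1  # three workers collapsed into one

# With the unique-name scheme, each worker keeps its own entry:
# the first worker keeps the plain name, later ones get a suffix.
unique = {}
for i in range(3):
    name = "default-cuda-0" if i == 0 else f"default-cuda-0-{i - 1}"
    unique[name] = {"device": 0}
assert len(unique) == 3
```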
/azp run libertem.libertem-data
passed