add cudaDeviceCount flag to the request requirements #1895

Merged: 1 commit, Aug 29, 2023

Contributor

@ndonyapour ndonyapour commented Aug 28, 2023

Hello,

I have been attempting to execute a CUDA-based workflow using the --parallel flag. However, I encountered the following error message:

ERROR Got workflow error: 'cudaDeviceCount'
Traceback (most recent call last):
  File "/home/donyapourn2/actions-runner/_work/workflow-inference-compiler/workflow-inference-compiler/3/envs/globalwic/lib/pypy3.9/site-packages/cwltool/executors.py", line 314, in _runner
    job.run(runtime_context, TMPDIR_LOCK)
  File "/home/donyapourn2/actions-runner/_work/workflow-inference-compiler/workflow-inference-compiler/3/envs/globalwic/lib/pypy3.9/site-packages/cwltool/job.py", line 823, in run
    self._setup(runtimeContext)
  File "/home/donyapourn2/actions-runner/_work/workflow-inference-compiler/workflow-inference-compiler/3/envs/globalwic/lib/pypy3.9/site-packages/cwltool/job.py", line 181, in _setup
    count = cuda_check(cuda_req, math.ceil(self.builder.resources["cudaDeviceCount"]))
KeyError: 'cudaDeviceCount'

After investigating the code, it appears that the evalResources function does not include cudaDeviceCount in the resource request, which is why the lookup in job.py raises the KeyError above. I have made the necessary changes to fix the error.
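
The failure mode described above can be illustrated with a minimal sketch. This is not the cwltool code: `select_resources_buggy`, `select_resources_fixed`, `setup_job`, and the request keys other than `cudaDeviceCount` are hypothetical names chosen for illustration. The sketch assumes a resource-selection step that copies only a fixed set of keys, so a later plain dictionary lookup of `cudaDeviceCount` fails; forwarding the key when it is present avoids the error.

```python
import math
from typing import Dict, Union

Number = Union[int, float]


# Hypothetical stand-in for a resource-selection step; names are illustrative
# and are not copied from cwltool.
def select_resources_buggy(request: Dict[str, Number]) -> Dict[str, Number]:
    """Copy only the classic resource keys, silently dropping the CUDA one."""
    return {
        "cores": request["coresMin"],
        "ram": request["ramMin"],
    }


def select_resources_fixed(request: Dict[str, Number]) -> Dict[str, Number]:
    """Same selection, but forward the CUDA device count when it was requested."""
    result = select_resources_buggy(request)
    if "cudaDeviceCount" in request:
        result["cudaDeviceCount"] = request["cudaDeviceCount"]
    return result


def setup_job(resources: Dict[str, Number]) -> int:
    """Mimic the lookup in the traceback: a plain index that raises KeyError."""
    return math.ceil(resources["cudaDeviceCount"])


# Assumed example request: one core, 256 units of RAM, one CUDA device.
request = {"coresMin": 1, "ramMin": 256, "cudaDeviceCount": 1}

try:
    setup_job(select_resources_buggy(request))
except KeyError as err:
    print(f"buggy path raised KeyError: {err}")

print("fixed path device count:", setup_job(select_resources_fixed(request)))
```

The guard on `"cudaDeviceCount" in request` keeps workflows without a CUDA requirement unchanged, which is one reasonable design choice for a fix of this kind.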

codecov bot commented Aug 29, 2023

Codecov Report

Merging #1895 (e855fed) into main (509ffb9) will increase coverage by 0.00%.
The diff coverage is 75.00%.

@@           Coverage Diff           @@
##             main    #1895   +/-   ##
=======================================
  Coverage   83.95%   83.96%           
=======================================
  Files          46       46           
  Lines        8152     8163   +11     
  Branches     2168     2168           
=======================================
+ Hits         6844     6854   +10     
- Misses        838      839    +1     
  Partials      470      470           
Files Changed Coverage Δ
cwltool/executors.py 84.51% <71.42%> (+0.23%) ⬆️
cwltool/process.py 92.81% <100.00%> (+0.01%) ⬆️

Member

mr-c commented Aug 29, 2023

Thank you @ndonyapour !

I'm going to see if I can come up with a unit test for this, before merging.

mr-c force-pushed the cuda_parallel branch 2 times, most recently from bb7cbbb to a90d10b, August 29, 2023 07:44
Co-authored-by: Nazanin Donyapour <nazanin.donyapour@gmail.com>
Member

mr-c commented Aug 29, 2023

@ndonyapour can you test my changes with a real workflow using cudaDeviceCountMin and cwltool --parallel?

@ndonyapour
Contributor Author

Yes, I tested your changes and they worked!

@mr-c mr-c merged commit 9accbc5 into common-workflow-language:main Aug 29, 2023
41 of 42 checks passed