Application Failure When Submitting Dask-Yarn Model Inferencing Job Remotely #152

Closed
rileyhun opened this issue Nov 28, 2021 · 0 comments

rileyhun commented Nov 28, 2021

What happened:
I've been following the documentation here to submit my application with dask-yarn. Unfortunately, the job keeps failing when I set --deploy-mode to remote, although it does seem to work when --deploy-mode is local. The other thing to note is that the worker count and vcores shown for the application don't match what I specified in my dask-yarn submit parameters. I looked through the YARN application logs, but they weren't particularly helpful. The logs just say

21/11/28 10:47:18 INFO skein.ApplicationMaster: Shutting down: Exception in submitted dask application, see logs for more details

...but don't point me to where to look for this exception.
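
One way to look for the underlying exception, sketched under a few assumptions: YARN log aggregation is enabled on the cluster, the `yarn` CLI is available on the machine used to inspect the job, and the script's Python traceback was written to one of the application's container logs. `find_tracebacks` is a hypothetical helper name (not part of dask-yarn or YARN), and the application ID is a placeholder.

```shell
# Hypothetical helper: read log text on stdin and print every Python
# traceback header it finds, with line numbers and up to 30 lines of
# trailing context so the exception type and message are visible.
find_tracebacks() {
  grep -n -A 30 'Traceback (most recent call last)'
}

# Usage (assumption: <application-id> is replaced with the real ID, taken
# from the output of `dask-yarn submit` or from the ResourceManager UI):
#   yarn logs -applicationId <application-id> | find_tracebacks
```

If nothing is printed, the traceback may not have reached the aggregated logs, or aggregation may still be in progress for a recently finished application.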

What you expected to happen:

I expected the application to run to completion, but the final status returned was FAILED.

Minimal Complete Verifiable Example:

dask-yarn submit \
  --name uq_component_batch_inference \
  --environment s3://ch-ml-data/uq_component_count/dask_environment/uq_component_dask.tar.gz \
  --deploy-mode remote \
  --worker-count 30 \
  --worker-vcores 2 \
  --worker-memory 8GiB \
  myscript.py

Anything else we need to know?:
Relevant files are attached here:
Archive.zip

Environment:
Only 26 containers and 26 vcores despite my specifying 30 workers with 2 cores each:
Screen Shot 2021-11-28 at 3 09 41 AM

Application failed
Screen Shot 2021-11-28 at 3 10 56 AM

  • Dask-Yarn version: 0.9.0
  • Python version: 3.7.10
  • Install method (conda, pip, source): conda