Fixes/improvements to coiled_flow #4
Comments
Thanks for digging in @scharlottej13. What you're observing with the different options to Bucket names are hard-coded b/c this is intended (long-term) to be part of a pipeline that can be used for incremental processing. It would be fine for them to be public, read-only. I'm using the output of this workflow here. Agree about the |
Ah ok, I hadn't realized
Thx for the explanation, that's cool that it will feed into the xgboost example + post too.
Oh right. I'm curious if there is a way to make this work... |
Depends heavily on how the |
Closing this for now, opened an issue for |
I was working through running this flow as part of https://github.com/coiled/platform/issues/294 and noticed a few things (in order of importance). Thanks for putting this flow together @hayesgb and for working together on this, otherwise I would not have noticed any of these. To be clear, I'm happy to open up PRs for any/all of these.
dask.distributed.get_client puts most of the work on a single worker.
Screencast of scheduler dashboard: https://user-images.githubusercontent.com/8620816/210840205-61108962-0e82-45d3-ba5a-a8a80e1ec770.mov
prefect_dask.get_dask_client, however, this no longer works! The fix instead is to use dask.distributed.get_worker_client(separate_thread=False), which is essentially what get_dask_client does in Add get_dask_client PrefectHQ/prefect-dask#33.
Furthermore, though I was able to use get_worker_client(separate_thread=False) in a minimal reproducer, I'm still working on ensuring it works for flows/coiled_flow.py; the write seems slower than I'd expect:
coiled-from-prefect/flows/coiled_flow.py
Lines 75 to 80 in 865253f
All this to say, I'm still working on disentangling what's going on here, and will open a follow-up issue in the appropriate repo.
minimal repro w/ worker_client
The S3 bucket names are hard-coded and not public; is this intentional?
We shouldn't need a Prefect Block for the AWS credentials-- Coiled automatically creates + sends an STS token to the cluster. I also think this is more secure than grabbing it from a Prefect Block, as this returns a string:
coiled-from-prefect/flows/coiled_flow.py
Line 41 in 865253f