Add draft of institutional FAQ #5214
Conversation
Soliciting @mmccarty @datametrician @ericdill @hussainsultan @dharhas for additional questions

How about something along the lines of "How do I convince my IT teams to use Dask?" Things like security, open license, pre-requisites for the requester to self-service, and accessibility of admin logs might be included for convincing IT teams. Thoughts?
I love it. We can have a whole IT section. What else should be in there?
Today I demo'ed killing a worker during a large matrix operation on a k8s cluster. Many minds blown!
I am hoping we can crowdsource some of the things that may matter to an IT team. In my mind, we should start off with how to build a business case for Dask, a set of actions that a Dask user may ask an IT team to perform, and a set of best practices around operations.
What else matters to an IT organization to build comfort in accepting a new distributed service?
@mmccarty I agree, that should be part of the business case to answer a question like: "Why should I maintain a new service when I already maintain Spark?"
I'm working on an executive-level deck to explain "what is Dask and why do I care?" I've been asked to keep it to 5 - 6 slides. It is currently 23 slides. :( |
Or "How would I set up Dask under my company's cloud governance policies?" A few random topics come to mind:
Thanks for pinging me on this @mrocklin. Questions that are at the forefront of my thinking, coming from a background in Data Engineering on Hadoop / Spark / Yarn:
And some generic questions / statements to consider
cc @mariusvniekerk who can provide some additional color / context / ideas / questions to this thread.

I suspect that this is internal-specific, but if a generic version of that existed I'm sure that it would see some use. @mmccarty

Thanks all. These responses have been great. I'm now thinking about how to organize this information into a concise set of questions for users to navigate, and that I (and hopefully others?) can write. To start off, I'm personally thinking of writing a short paragraph or two for each of around 20 questions. Should we be thinking bigger than this? If so, would you all be interested in helping as a group? If not, how do we select a small set of questions from those listed above, and how do we organize them?
I'd probably structure them around high-level topics like "Deploying Dask at scale".
@mrocklin Happy to share a generic version of the deck once it is ready. Also, happy to help out with this document.
Need a section that addresses the things that security folks will be concerned about, e.g. which ports does Dask use, and do any ports need to be opened?
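For what it's worth, the defaults here are documented and configurable: the scheduler listens for clients and workers on TCP port 8786 and serves its diagnostic dashboard on 8787, and worker ports can be pinned at startup. A sketch of a firewall-friendly setup (flag names from the `distributed` CLI; double-check against your installed version):

```shell
# Start the scheduler on explicit, firewall-friendly ports
# (8786 and 8787 are the documented defaults, made explicit here)
dask-scheduler --port 8786 --dashboard-address :8787

# Pin worker and nanny communication ports to known values so that
# only these need to be opened between cluster nodes
dask-worker tcp://scheduler-host:8786 --worker-port 9000 --nanny-port 9001
```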
OK, I've pushed up some content. Currently it is organized under three common roles that I often find in presentation rooms. Here is the current structure.
There were many questions above I didn't have time to answer. My hope is that this provides a firm enough base that we can add things to it as necessary.
I also plan to cull this list a bit after I rest and clear out my memory of it.
If Dask's centralized scheduler goes down then you would need to resubmit the
computation. This is a fairly standard level of resiliency today, shared with
other tooling like Apache Spark, Flink, and others.
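The resubmit-on-failure model described in that snippet can be sketched as a plain-Python retry loop. This is purely illustrative: `submit_computation` is a hypothetical zero-argument callable standing in for whatever submits your graph and blocks on the result; it is not part of the dask.distributed API.

```python
import time


def run_with_resubmit(submit_computation, max_attempts=3, backoff_seconds=1.0):
    """Retry a whole computation if the centralized scheduler dies mid-run.

    `submit_computation` is a hypothetical zero-argument callable that
    submits the graph and blocks for the result, raising on failure.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return submit_computation()
        except ConnectionError:
            # Scheduler went away: no partial state survives, so we
            # resubmit the computation from scratch, with backoff.
            if attempt == max_attempts:
                raise
            time.sleep(backoff_seconds * attempt)
```

The key property being illustrated: because the scheduler holds all in-flight state, recovery means starting over, not resuming, which is why a resource manager that restarts the scheduler process is usually paired with client-side resubmission like this.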
It might be good to mention distributed vs. centralized schedulers here? Ray and Spark both have a distributed scheduler, but I don't know much about them.
A concern I've heard is the centralized scheduler going down, but how likely is this, given that the Dask scheduler isn't doing any of the work itself? It only has one job.
From a brief look, tangram seems to be a distributed system that is happy to launch Spark jobs, rather than an alternative distributed Spark driver.
Ray is genuinely a different architecture. I'm curious: I know that it uses Redis for metadata communication. Does it use distributed Redis for this or the more common single-node Redis server? If the former, how resilient is distributed Redis?
Sorry, I was misinformed about tangram and Spark. I don't see a native distributed Spark driver; my internal expert is double-checking whether this is natively possible. Going back to Spark scheduler resiliency, it is achievable with a resource manager like Yarn, which also transfers state.
From this discussion on Ray:
specifications of the tasks are stored in a sharded in-memory database, currently we are not resilient to the failure of this database, but we're prototyping a fault tolerance scheme based on chain replication for it
I believe this is the Redis metadata store you mention, so it is indeed no different from a resilience perspective.
Is it safe to say that scheduler resilience is left to a resource manager, relying on the user to resubmit computations?
This fact checking was useful since this topic recently came up at C1. I think what you have is fine, but other readers may want to dig in on this as well.
Right, in normal operation relying on a resource manager like Yarn or Kubernetes seems to solve the problem for most institutions. That being said, a highly available scheduler has been requested a few times, and is something that could be done. It's a non-trivial effort though and, despite interest, I haven't seen a case where it was actually necessary.
Agreed. I haven't either, yet people seem to be concerned about it. I'll keep an eye on it.
The stock answer here is that "Dask has the same resilience properties as Spark"
The *vast majority* of institutional users though do not reach this limit.
For more information you may want to peruse our :doc:`best practices
<best-practices>`
One question I get: how does Dask move data around?
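For what it's worth, the usual answer is that the scheduler only tracks *where* each intermediate result lives; workers fetch dependencies directly from their peers over TCP, so bulk data does not flow through the scheduler. Here is a toy plain-Python model of that bookkeeping (a sketch of the idea, not Dask's actual internals):

```python
# Toy model: the scheduler tracks *where* results live;
# workers move data peer-to-peer.
class ToyScheduler:
    def __init__(self):
        self.who_has = {}  # task key -> worker address holding that result

    def task_finished(self, key, worker):
        self.who_has[key] = worker

    def where_is(self, key):
        return self.who_has[key]


class ToyWorker:
    def __init__(self, address, scheduler):
        self.address = address
        self.scheduler = scheduler
        self.data = {}  # locally held results

    def run_task(self, key, func, *dep_keys, peers):
        # Ask the scheduler who holds each missing dependency, then
        # copy it directly from that peer (stands in for a TCP transfer).
        args = []
        for dep in dep_keys:
            if dep not in self.data:
                holder = peers[self.scheduler.where_is(dep)]
                self.data[dep] = holder.data[dep]
            args.append(self.data[dep])
        self.data[key] = func(*args)
        self.scheduler.task_finished(key, self.address)
```

Usage: if worker `a` computes `x` and worker `b` then runs a task depending on `x`, `b` pulls `x` from `a` directly; the scheduler only ever sees metadata.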
Good stuff! A few more thoughts above. Let me know if you want help.

That would be welcome
Thanks Tom |
I plan to merge this in later today if there are no further comments. There is still plenty of work to do here, but hopefully this can provide a base from which to expand. |
Thanks @mrocklin. We'll do additional items as followup PRs. |
Fixes #5016
black dask
/flake8 dask