
Roadmap #322

Merged
merged 4 commits into from Aug 19, 2018

Conversation

@TomAugspurger (Member) commented Jul 27, 2018

Closes #210

@mrocklin (Member) commented Jul 27, 2018

I like the high level framing. I'm somewhat concerned that it is specific in some instances (like text processing and async algorithms) but not comprehensive. I think that if a document like this lists specific cases then it is obligated to list all of the ways in which it may grow comprehensively. Personally I feel like I don't know enough about the ways in which dask-ml will grow in the near future (even a month out is pretty murky) so I find this sort of thing challenging.

@mrocklin (Member) commented Jul 27, 2018

Also, given that this sort of thing is likely to change frequently for a project like dask-ml, I wonder if it would be better suited as a living document like a google doc or hackmd page.

@stsievert (Member) left a comment

This looks pretty good. It has some high level framing.

Maybe we should rename "asynchronous algorithms" to "distributed optimization", and write something like

Dask's distributed architecture opens a lot of doors for optimization, especially as dataset size grows and clusters become more common. This comes in two flavors:

  1. optimization algorithms tuned for distributed systems
  2. managing the input and output of different machines.

This would bring practical advantages including reduced training time and memory consumption.
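The two flavors in the suggested text can be illustrated with a small, self-contained sketch (hypothetical code, not from the PR or from dask-ml; all names and numbers are invented): each simulated "machine" runs an optimization algorithm on its own shard of the data, and the driver manages the machines' inputs and outputs by scattering shards and averaging the per-worker parameters.

```python
# Hypothetical sketch of "distributed optimization" via parameter
# averaging. Each simulated worker runs plain gradient descent on its
# own shard of the data; the driver averages the fitted parameters.
from concurrent.futures import ThreadPoolExecutor
import random

random.seed(0)
TRUE_W = 3.0
# Full dataset: y = TRUE_W * x plus a little noise.
xs = [random.uniform(-1, 1) for _ in range(400)]
data = [(x, TRUE_W * x + random.uniform(-0.01, 0.01)) for x in xs]

def local_fit(shard, steps=200, lr=0.1):
    """1-D least-squares gradient descent on one worker's shard."""
    w = 0.0
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in shard) / len(shard)
        w -= lr * grad
    return w

# "Managing the input and output of different machines": split the data
# into shards, fit each shard in parallel, then combine the results.
shards = [data[i::4] for i in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    local_ws = list(pool.map(local_fit, shards))
w_avg = sum(local_ws) / len(local_ws)
print(round(w_avg, 2))
```

One-shot parameter averaging is among the simplest schemes "tuned for distributed systems"; a real implementation would run the workers on Dask's distributed scheduler rather than in threads, and more sophisticated algorithms communicate between workers during training rather than only at the end.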


1. Working with existing libraries within the Python ecosystem
2. Using the features of Dask to scale computation to larger datasets and larger
problems

@stsievert (Member) commented Jul 28, 2018:

larger datasets and larger
problems

Maybe "larger datasets and harder problems"?


Other Deep Learning and Machine Learning frameworks have their own distributed
runtimes. Examples include Tensorflow, PyTorch, XGBoost, and LightGBM. Dask-ML
is not interested in re-implementing everything these libraries too. Rather,

@stsievert (Member) commented Jul 28, 2018:

everything these libraries too do

Typo: too => do

@stsievert (Member) commented Aug 7, 2018

Parts of this roadmap, or a summary of it, are featured on https://www.quansight.com/projects

Dask ML has two goals. Scale machine learning to more computation; and scale machine learning to big data. Dask’s architecture can scale to many machines using parallelism for optimization.

  • Big Data Algorithms
  • Optimization Framework
  • Existing Libraries ​

with a big “coming soon” button at the bottom where other libraries link to their roadmap (e.g., https://docs.wixstatic.com/ugd/06679a_0a932ffb16e4445fba8bb0c7f8d81cd6.pdf). The two roadmaps I looked at were both PDFs styled the same way.

@scopatz commented Aug 14, 2018

@stsievert - those went up accidentally via an administrative error and were removed as soon as I discovered them. All apologies!

We are currently waiting for this to be merged. Then we'll cut a draft of the brochure for your approval and editing before anything hits our website.

@TomAugspurger (Member, Author) commented Aug 16, 2018

On the call today, we discussed a roadmap document that just lists the goals and values of Dask-ML, and links to GitHub issues for individual items. This seems like a good compromise for a project as young as dask-ml.

I've created a roadmap label and tagged a few issues with it.

@TomAugspurger (Member, Author) commented Aug 18, 2018

CI failures are being tracked in #333 and fixed upstream.

@TomAugspurger (Member, Author) commented Aug 18, 2018

Maybe I'll wait till Monday for a +1 from @stsievert or @mrocklin before merging. I hope this accurately captures what we discussed on the call.

@mrocklin (Member) commented Aug 19, 2018

@TomAugspurger TomAugspurger merged commit 0a73136 into dask:master Aug 19, 2018
1 of 4 checks passed

ci/circleci: py27 - Your tests failed on CircleCI
ci/circleci: py36 - Your tests failed on CircleCI
ci/circleci: sklearn_dev - Your tests failed on CircleCI
continuous-integration/travis-ci/pr - The Travis CI build passed
@TomAugspurger TomAugspurger deleted the TomAugspurger:roadmap branch Aug 19, 2018
stsievert added a commit to stsievert/dask-ml that referenced this pull request Aug 23, 2018