Roadmap #322

Merged
merged 4 commits into dask:master from TomAugspurger:roadmap on Aug 19, 2018

Conversation

@TomAugspurger
Member

TomAugspurger commented Jul 27, 2018

Closes #210

TomAugspurger added some commits Jul 26, 2018

wip
@mrocklin

Member

mrocklin commented Jul 27, 2018

I like the high level framing. I'm somewhat concerned that it is specific in some instances (like text processing and async algorithms) but not comprehensive. I think that if a document like this lists specific cases then it is obligated to list all of the ways in which it may grow comprehensively. Personally I feel like I don't know enough about the ways in which dask-ml will grow in the near future (even a month out is pretty murky) so I find this sort of thing challenging.

@mrocklin

Member

mrocklin commented Jul 27, 2018

Also, given that this sort of thing is likely to change frequently for a project like dask-ml, I wonder if it would be better suited as a living document like a Google Doc or HackMD page.

@stsievert

This looks pretty good. It has some high level framing.

Maybe we should rename "asynchronous algorithms" to "distributed optimization", and write something like

Dask's distributed architecture opens a lot of doors for optimization, especially as dataset size grows and clusters become more common. This comes in two flavors:

  1. optimization algorithms tuned for distributed systems
  2. managing the input and output of different machines.

This would bring practical advantages including reduced training time and memory consumption.
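A minimal sketch of what that could look like in practice (illustrative only, not part of the proposed text), assuming dask_ml.wrappers.Incremental wrapping a scikit-learn estimator that implements partial_fit; the data shapes and chunk sizes here are made up:

```python
# Illustrative sketch: incremental optimization over a chunked Dask array.
# Assumes dask, dask.distributed, dask-ml, and scikit-learn are installed.
import dask.array as da
from dask.distributed import Client
from dask_ml.wrappers import Incremental
from sklearn.linear_model import SGDClassifier

client = Client()  # connect to (or start) a distributed scheduler

# Synthetic data, chunked so blocks can live on different workers.
X = da.random.random((100_000, 20), chunks=(10_000, 20))
y = da.random.randint(0, 2, size=(100_000,), chunks=(10_000,))

# Incremental feeds each block to the estimator's partial_fit in turn,
# so the optimization state is updated as data flows through the cluster.
clf = Incremental(SGDClassifier(), scoring="accuracy")
clf.fit(X, y, classes=[0, 1])
```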

1. Working with existing libraries within the Python ecosystem
2. Using the features of Dask to scale computation to larger datasets and larger problems

@stsievert

stsievert Jul 28, 2018

Contributor

larger datasets and larger problems

Maybe "larger datasets and harder problems"?
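For the second item, a minimal sketch (again illustrative, with hypothetical shapes) of the scaling goal: a Dask-ML estimator fit directly on a chunked array rather than a single in-memory one:

```python
# Illustrative sketch: a Dask-ML estimator operating on chunked data.
import dask.array as da
from dask_ml.linear_model import LogisticRegression

# Chunked along rows; each block can be processed in parallel.
X = da.random.random((200_000, 10), chunks=(20_000, 10))
y = (da.random.random((200_000,), chunks=(20_000,)) > 0.5).astype(int)

lr = LogisticRegression()
lr.fit(X, y)           # the solver iterates over blocks, not one big array
preds = lr.predict(X)  # also a lazy, chunked Dask array
print(preds[:10].compute())
```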

Other Deep Learning and Machine Learning frameworks have their own distributed runtimes. Examples include Tensorflow, PyTorch, XGBoost, and LightGBM. Dask-ML is not interested in re-implementing everything these libraries too. Rather,

@stsievert

stsievert Jul 28, 2018

Contributor

everything these libraries too

Typo: too => do
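The point this paragraph is making can be sketched with XGBoost's own Dask integration (xgboost.dask, which postdates this thread and supersedes the separate dask-xgboost package): Dask stages the data and connects the workers, while XGBoost's own distributed runtime does the training. A hedged, illustrative example with made-up shapes:

```python
# Illustrative sketch: hand distributed training off to XGBoost's runtime.
import dask.array as da
from dask.distributed import Client
from xgboost.dask import DaskXGBClassifier

client = Client()  # XGBoost discovers and uses this Dask cluster

X = da.random.random((50_000, 10), chunks=(5_000, 10))
y = da.random.randint(0, 2, size=(50_000,), chunks=(5_000,))

# Dask moves the chunks to the workers; XGBoost's distributed training
# performs the actual optimization across them.
clf = DaskXGBClassifier(n_estimators=50)
clf.fit(X, y)
```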

@stsievert

Contributor

stsievert commented Aug 7, 2018

Parts of this roadmap, or a summary of it, are featured on https://www.quansight.com/projects

Dask ML has two goals: scale machine learning to more computation, and scale machine learning to big data. Dask's architecture can scale to many machines using parallelism for optimization.

  • Big Data Algorithms
  • Optimization Framework
  • Existing Libraries

with a big “coming soon” button at the bottom where other libraries link to their roadmap (e.g., https://docs.wixstatic.com/ugd/06679a_0a932ffb16e4445fba8bb0c7f8d81cd6.pdf). The two roadmaps I looked at were both PDFs styled the same way.

@scopatz

scopatz commented Aug 14, 2018

@stsievert - those went up accidentally via an administrative error and were removed as soon as I discovered them. All apologies!

We are currently waiting for this to be merged. Then we'll cut a draft of the brochure for your approval and editing before anything hits our website.

@TomAugspurger

Member

TomAugspurger commented Aug 16, 2018

On the call today, we discussed a roadmap document that just lists the goals and values of Dask-ML and links to GitHub issues for individual items. This seems like a good compromise for a project as young as dask-ml.

I've created a roadmap label and tagged a few issues with it.

@TomAugspurger

Member

TomAugspurger commented Aug 18, 2018

CI failures are being tracked in #333 and fixed upstream.

@TomAugspurger

Member

TomAugspurger commented Aug 18, 2018

Maybe I'll wait till Monday for a +1 from @stsievert or @mrocklin before merging. I hope this accurately captures what we discussed on the call.

@mrocklin

Member

mrocklin commented Aug 19, 2018

TomAugspurger merged commit 0a73136 into dask:master Aug 19, 2018

1 of 4 checks passed

ci/circleci: py27 Your tests failed on CircleCI
ci/circleci: py36 Your tests failed on CircleCI
ci/circleci: sklearn_dev Your tests failed on CircleCI
continuous-integration/travis-ci/pr The Travis CI build passed

TomAugspurger deleted the TomAugspurger:roadmap branch Aug 19, 2018

stsievert added a commit to stsievert/dask-ml that referenced this pull request Aug 23, 2018
