This repository has been archived by the owner on Jun 11, 2022. It is now read-only.
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent 4e0212b, commit 2403401
Showing 14 changed files with 280 additions and 26 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
# Dask-ML Benchmarks | ||
|
||
Documenting the scalability and performance of Dask-ML. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
build/ | ||
./auto_examples/ |
Binary file not shown.
Binary file not shown.
Binary file added (+24.8 KB): docs/source/auto_examples/images/sphx_glr_plot_parallel_post_fit_scaling_001.png
Binary file added (+16 KB): ...ce/auto_examples/images/thumb/sphx_glr_plot_parallel_post_fit_scaling_thumb.png
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
:orphan: | ||
|
||
|
||
|
||
|
||
.. raw:: html | ||
|
||
    <div class="sphx-glr-thumbcontainer" tooltip="This example demonstrates :class:`dask_ml.wrappers.ParallelPostFit`. A :class:`sklearn.svm.SVC`..."> | ||
|
||
.. only:: html | ||
|
||
    .. figure:: /auto_examples/images/thumb/sphx_glr_plot_parallel_post_fit_scaling_thumb.png | ||
|
||
        :ref:`sphx_glr_auto_examples_plot_parallel_post_fit_scaling.py` | ||
|
||
.. raw:: html | ||
|
||
    </div> | ||
|
||
|
||
.. toctree:: | ||
   :hidden: | ||
|
||
   /auto_examples/plot_parallel_post_fit_scaling | ||
.. raw:: html | ||
|
||
    <div style='clear:both'></div> | ||
|
||
|
||
|
||
.. only :: html | ||
|
||
    .. container:: sphx-glr-footer | ||
|
||
        .. container:: sphx-glr-download | ||
|
||
            :download:`Download all examples in Python source code: auto_examples_python.zip <//Users/taugspurger/sandbox/dask-ml-benchmarks/docs/source/auto_examples/auto_examples_python.zip>` | ||
|
||
        .. container:: sphx-glr-download | ||
|
||
            :download:`Download all examples in Jupyter notebooks: auto_examples_jupyter.zip <//Users/taugspurger/sandbox/dask-ml-benchmarks/docs/source/auto_examples/auto_examples_jupyter.zip>` | ||
|
||
.. only:: html | ||
|
||
    .. rst-class:: sphx-glr-signature | ||
|
||
        `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.readthedocs.io>`_ |
54 changes: 54 additions & 0 deletions
docs/source/auto_examples/plot_parallel_post_fit_scaling.ipynb
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,54 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"%matplotlib inline" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"\nParallelizing Prediction\n========================\n\nThis example demonstrates :class:`dask_ml.wrappers.ParallelPostFit`.\nA :class:`sklearn.svm.SVC` is fit on a small dataset that easily fits\nin memory.\n\nAfter training, we predict for successively larger datasets. We compare\n\n* The serial prediction time using the regular SVC.predict method\n* The parallel prediction time using\n  :meth:`dask_ml.wrappers.ParallelPostFit.predict`\n\nWe see that the parallel version is faster, especially for larger datasets.\nAdditionally, the parallel version from ParallelPostFit scales out to larger\nthan memory datasets.\n\nWhile only predict is demonstrated here, wrappers.ParallelPostFit is equally\nuseful for predict_proba and transform.\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"from timeit import default_timer as tic\n\nimport pandas as pd\nimport seaborn as sns\nimport sklearn.datasets\nfrom sklearn.svm import SVC\n\nimport dask_ml.datasets\nfrom dask_ml.wrappers import ParallelPostFit\n\nX, y = sklearn.datasets.make_classification(n_samples=1000)\nclf = ParallelPostFit(SVC(gamma='scale'))\nclf.fit(X, y)\n\n\nNs = [100_000, 200_000, 400_000, 800_000]\ntimings = []\n\n\nfor n in Ns:\n    X, y = dask_ml.datasets.make_classification(n_samples=n,\n                                                random_state=n,\n                                                chunks=n // 20)\n    t1 = tic()\n    # Serial scikit-learn version\n    clf.estimator.predict(X)\n    timings.append(('Scikit-Learn', n, tic() - t1))\n\n    t1 = tic()\n    # Parallelized scikit-learn version\n    clf.predict(X).compute()\n    timings.append(('dask-ml', n, tic() - t1))\n\n\ndf = pd.DataFrame(timings,\n                  columns=['method', 'Number of Samples', 'Predict Time'])\nax = sns.catplot(x='Number of Samples', y='Predict Time', hue='method',\n                 kind='point', data=df, aspect=1.5)" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.6.5" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 0 | ||
} |
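The notebook above is plain nbformat 4 JSON, so its code cells can be pulled out with the standard library alone. The following is a minimal sketch of that idea; the helper name `code_cells` and the tiny stand-in notebook are hypothetical and only mirror the shape of the file in this diff.

```python
import json


def code_cells(notebook_text):
    """Return the source of every code cell in an nbformat-4 notebook."""
    nb = json.loads(notebook_text)
    sources = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", "")
        # nbformat allows "source" to be one string or a list of strings.
        if isinstance(src, list):
            src = "".join(src)
        sources.append(src)
    return sources


# Hypothetical stand-in notebook with the same layout as the one above.
example = json.dumps({
    "cells": [
        {"cell_type": "code", "source": ["%matplotlib inline"]},
        {"cell_type": "markdown", "source": ["Some text\n"]},
        {"cell_type": "code", "source": ["x = 1\n", "y = x + 1"]},
    ],
    "nbformat": 4,
    "nbformat_minor": 0,
})

cells = code_cells(example)
print(cells)
```

Markdown cells are skipped on purpose, so the result contains only what would be executed.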
59 changes: 59 additions & 0 deletions
docs/source/auto_examples/plot_parallel_post_fit_scaling.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,59 @@ | ||
""" | ||
Parallelizing Prediction | ||
======================== | ||
This example demonstrates :class:`dask_ml.wrappers.ParallelPostFit`. | ||
A :class:`sklearn.svm.SVC` is fit on a small dataset that easily fits | ||
in memory. | ||
After training, we predict for successively larger datasets. We compare | ||
* The serial prediction time using the regular SVC.predict method | ||
* The parallel prediction time using | ||
  :meth:`dask_ml.wrappers.ParallelPostFit.predict` | ||
We see that the parallel version is faster, especially for larger datasets. | ||
Additionally, the parallel version from ParallelPostFit scales out to larger | ||
than memory datasets. | ||
While only predict is demonstrated here, wrappers.ParallelPostFit is equally | ||
useful for predict_proba and transform. | ||
""" | ||
from timeit import default_timer as tic | ||
|
||
import pandas as pd | ||
import seaborn as sns | ||
import sklearn.datasets | ||
from sklearn.svm import SVC | ||
|
||
import dask_ml.datasets | ||
from dask_ml.wrappers import ParallelPostFit | ||
|
||
X, y = sklearn.datasets.make_classification(n_samples=1000) | ||
clf = ParallelPostFit(SVC(gamma='scale')) | ||
clf.fit(X, y) | ||
|
||
|
||
Ns = [100_000, 200_000, 400_000, 800_000] | ||
timings = [] | ||
|
||
|
||
for n in Ns: | ||
    X, y = dask_ml.datasets.make_classification(n_samples=n, | ||
                                                random_state=n, | ||
                                                chunks=n // 20) | ||
    t1 = tic() | ||
    # Serial scikit-learn version | ||
    clf.estimator.predict(X) | ||
    timings.append(('Scikit-Learn', n, tic() - t1)) | ||
|
||
    t1 = tic() | ||
    # Parallelized scikit-learn version | ||
    clf.predict(X).compute() | ||
    timings.append(('dask-ml', n, tic() - t1)) | ||
|
||
|
||
df = pd.DataFrame(timings, | ||
                  columns=['method', 'Number of Samples', 'Predict Time']) | ||
ax = sns.catplot(x='Number of Samples', y='Predict Time', hue='method', | ||
                 kind='point', data=df, aspect=1.5) |
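The script splits each dataset into 20 chunks (`chunks=n // 20`) and predicts on them independently. The sketch below illustrates that blockwise pattern with the standard library only. It is not dask-ml's implementation: `toy_predict`, `split_into_chunks` and `blockwise_predict` are hypothetical names, and the thread pool stands in for a Dask scheduler purely to show the structure (threads will not speed up this pure-Python toy).

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(rows, chunk_size):
    """Split a list of rows into consecutive blocks of at most chunk_size."""
    return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]


def toy_predict(block):
    """Stand-in for a fitted estimator's predict: 1 if the row sum is positive."""
    return [1 if sum(row) > 0 else 0 for row in block]


def blockwise_predict(rows, chunk_size, max_workers=4):
    """Apply toy_predict to each block independently, then concatenate in order."""
    blocks = split_into_chunks(rows, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order, so labels line up with rows.
        results = pool.map(toy_predict, blocks)
        return [label for block in results for label in block]


rows = [[i - 50, 1] for i in range(100)]
n = len(rows)
parallel = blockwise_predict(rows, chunk_size=n // 20)  # 20 blocks, as in the script
serial = toy_predict(rows)
print(parallel == serial)
```

Because prediction on one row does not depend on any other row, the blockwise result matches the serial one; that independence is what makes post-fit methods such as predict straightforward to parallelize.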
1 change: 1 addition & 0 deletions
docs/source/auto_examples/plot_parallel_post_fit_scaling.py.md5
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
6d397165ad9f798ec5746ebf2f35d717 |
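The `.py.md5` file holds a 32-character hexadecimal MD5 digest. Sphinx-Gallery appears to store it so it can tell whether the example script changed since the last build; that reading is an assumption, not something this diff confirms. A minimal sketch of computing such a digest for a file, using the hypothetical helper `md5_of_file` and a temporary file as the input:

```python
import hashlib
import os
import tempfile


def md5_of_file(path, block_size=65536):
    """Return the hexadecimal MD5 digest of a file, read in blocks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


# Hypothetical payload; the real input would be the example script.
payload = b"print('hello')\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    tmp_path = tmp.name

file_digest = md5_of_file(tmp_path)
os.remove(tmp_path)
print(file_digest)
```

Comparing a freshly computed digest with the stored one is enough to detect any byte-level change in the script.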
101 changes: 101 additions & 0 deletions
docs/source/auto_examples/plot_parallel_post_fit_scaling.rst
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,101 @@ | ||
|
||
|
||
.. _sphx_glr_auto_examples_plot_parallel_post_fit_scaling.py: | ||
|
||
|
||
Parallelizing Prediction | ||
======================== | ||
|
||
This example demonstrates :class:`dask_ml.wrappers.ParallelPostFit`. | ||
A :class:`sklearn.svm.SVC` is fit on a small dataset that easily fits | ||
in memory. | ||
|
||
After training, we predict for successively larger datasets. We compare | ||
|
||
* The serial prediction time using the regular SVC.predict method | ||
* The parallel prediction time using | ||
  :meth:`dask_ml.wrappers.ParallelPostFit.predict` | ||
|
||
We see that the parallel version is faster, especially for larger datasets. | ||
Additionally, the parallel version from ParallelPostFit scales out to larger | ||
than memory datasets. | ||
|
||
While only predict is demonstrated here, wrappers.ParallelPostFit is equally | ||
useful for predict_proba and transform. | ||
|
||
|
||
|
||
|
||
.. image:: /auto_examples/images/sphx_glr_plot_parallel_post_fit_scaling_001.png | ||
:align: center | ||
|
||
|
||
|
||
|
||
|
||
.. code-block:: python | ||
|
||
    from timeit import default_timer as tic | ||
    import pandas as pd | ||
    import seaborn as sns | ||
    import sklearn.datasets | ||
    from sklearn.svm import SVC | ||
    import dask_ml.datasets | ||
    from dask_ml.wrappers import ParallelPostFit | ||
    X, y = sklearn.datasets.make_classification(n_samples=1000) | ||
    clf = ParallelPostFit(SVC(gamma='scale')) | ||
    clf.fit(X, y) | ||
    Ns = [100_000, 200_000, 400_000, 800_000] | ||
    timings = [] | ||
    for n in Ns: | ||
        X, y = dask_ml.datasets.make_classification(n_samples=n, | ||
                                                    random_state=n, | ||
                                                    chunks=n // 20) | ||
        t1 = tic() | ||
        # Serial scikit-learn version | ||
        clf.estimator.predict(X) | ||
        timings.append(('Scikit-Learn', n, tic() - t1)) | ||
        t1 = tic() | ||
        # Parallelized scikit-learn version | ||
        clf.predict(X).compute() | ||
        timings.append(('dask-ml', n, tic() - t1)) | ||
    df = pd.DataFrame(timings, | ||
                      columns=['method', 'Number of Samples', 'Predict Time']) | ||
    ax = sns.catplot(x='Number of Samples', y='Predict Time', hue='method', | ||
                     kind='point', data=df, aspect=1.5) | ||
|
||
**Total running time of the script:** ( 0 minutes 22.372 seconds) | ||
|
||
|
||
|
||
.. only :: html | ||
|
||
    .. container:: sphx-glr-footer | ||
|
||
        .. container:: sphx-glr-download | ||
|
||
            :download:`Download Python source code: plot_parallel_post_fit_scaling.py <plot_parallel_post_fit_scaling.py>` | ||
|
||
        .. container:: sphx-glr-download | ||
|
||
            :download:`Download Jupyter notebook: plot_parallel_post_fit_scaling.ipynb <plot_parallel_post_fit_scaling.ipynb>` | ||
|
||
.. only:: html | ||
|
||
    .. rst-class:: sphx-glr-signature | ||
|
||
        `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.readthedocs.io>`_ |
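The example records timings as `(method, n, seconds)` tuples. As a rough sketch of how those records could be collected and turned into a per-size speedup, the helpers below use only `timeit.default_timer`, as the script does. The names `time_call` and `speedups` are hypothetical, and the numbers in `placeholder` are arbitrary values chosen for easy arithmetic, not measurements from this benchmark.

```python
from timeit import default_timer as tic


def time_call(label, n, func, *args):
    """Run func(*args) once and return a (label, n, elapsed_seconds) record."""
    start = tic()
    func(*args)
    return (label, n, tic() - start)


def speedups(timings, baseline, candidate):
    """Map each n to baseline_time / candidate_time where both were recorded."""
    by_key = {(method, n): seconds for method, n, seconds in timings}
    result = {}
    for (method, n), seconds in by_key.items():
        other = by_key.get((candidate, n))
        if method == baseline and other:
            result[n] = seconds / other
    return result


# Arbitrary placeholder records, NOT results from the benchmark above.
placeholder = [
    ("Scikit-Learn", 100, 2.0), ("dask-ml", 100, 0.5),
    ("Scikit-Learn", 200, 5.0), ("dask-ml", 200, 1.0),
]
ratios = speedups(placeholder, "Scikit-Learn", "dask-ml")
record = time_call("noop", 10, sum, [1, 2, 3])
print(ratios)
```

A ratio above 1 means the candidate was faster than the baseline for that dataset size.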