TPOT 0.10.0 release #855

Merged
merged 76 commits on Apr 12, 2019
Changes from all commits
76 commits
f505039
add more GP type for strongly typed GP
weixuanfu Apr 13, 2018
fa96afe
clean up codes
weixuanfu Apr 13, 2018
3a44c42
add tempate param
weixuanfu Apr 13, 2018
fbb78b5
add fix length param
weixuanfu Apr 16, 2018
c9a4aae
inital trial for template function
weixuanfu Apr 16, 2018
bdd7f7d
template works
weixuanfu Apr 16, 2018
87bf8c8
Dataset selector
trangdata Apr 18, 2018
aad6bcf
Merge remote-tracking branch 'trang/grixor-dataset-selector' into dat…
weixuanfu Apr 19, 2018
4b1e6a4
change filename to lowercase for dataset_selector
weixuanfu Apr 19, 2018
e03654a
refine dataset selector
weixuanfu Apr 19, 2018
92f0b4f
bug fix
weixuanfu Apr 19, 2018
24ac32f
Merge pull request #3 from weixuanfu/template_opt
weixuanfu Apr 26, 2018
5046388
add test codes and fix dataset check in the fit
weixuanfu Apr 26, 2018
c4a32e2
fix a bug
weixuanfu Apr 26, 2018
e311ac2
use fname instead of index
weixuanfu Apr 30, 2018
5fde220
better unit tests
weixuanfu Apr 30, 2018
04abe10
better per test
weixuanfu Apr 30, 2018
f9ad82e
clean codes
weixuanfu Apr 30, 2018
109812e
fix type unit tests
weixuanfu Apr 30, 2018
ed1cb15
Add first set of hyperparams
Apr 26, 2018
843cbc8
fix most unit tests 1 left related to stats
weixuanfu Apr 30, 2018
39709b6
Rename nn config dict
Apr 30, 2018
db54137
fix bug in expr_mut
weixuanfu May 1, 2018
b193abf
fix unit tests
weixuanfu May 1, 2018
61dfd56
remove example
weixuanfu May 1, 2018
06dc7e3
fix all unit tests
weixuanfu May 1, 2018
7dab563
Merge pull request #5 from weixuanfu/datasel_op
weixuanfu May 1, 2018
c078ad2
Merge pull request #4 from weixuanfu/neural_networks
weixuanfu May 2, 2018
fce0051
add tree structure
weixuanfu May 7, 2018
6a7937a
clean codes
weixuanfu May 7, 2018
95870f8
add combineDFs in pipeline
weixuanfu May 7, 2018
a0dfe3e
more complex tree sturture
weixuanfu May 8, 2018
0024a78
more complex tree stucture
weixuanfu May 8, 2018
d81e3c9
update base for pd df input
weixuanfu Jun 18, 2018
dcda20d
add support for CLI for template
weixuanfu Jun 28, 2018
cb3e6e6
clean docs
weixuanfu Jun 28, 2018
7349cbe
rebase dev branch
weixuanfu Aug 6, 2018
c1f0095
fix fail pipeline during init and mutation
weixuanfu Aug 7, 2018
38e6a55
fix unit tests
weixuanfu Aug 7, 2018
7d04263
clean codes
weixuanfu Aug 7, 2018
f152452
clean codes
weixuanfu Aug 7, 2018
5330515
refine return from dataset selector and use object for ret type
weixuanfu Aug 22, 2018
2acc71f
fix conflists
weixuanfu Sep 12, 2018
ca3fdab
fix unit test
weixuanfu Sep 12, 2018
b77a5b6
change random set for fixing a unit test
weixuanfu Sep 12, 2018
c305731
Merge remote-tracking branch 'upstream/development' into template_opt
weixuanfu Sep 12, 2018
3b348e9
fix bug in decorators for mut/xo
weixuanfu Sep 17, 2018
b614de9
better pset
weixuanfu Sep 18, 2018
37f268e
better min and max
weixuanfu Sep 18, 2018
ef29cd9
refine dataset selector
weixuanfu Oct 15, 2018
fc6545e
fix a bug
weixuanfu Oct 15, 2018
d9a668c
add sorted feat_list
weixuanfu Dec 6, 2018
2587a1e
fix a unit test
weixuanfu Dec 6, 2018
334b9ee
support eli5
weixuanfu Dec 18, 2018
c54db6e
fix conflicts
weixuanfu Mar 14, 2019
814a3a2
fix a bug
weixuanfu Mar 14, 2019
a0836f5
fix a bug when sample size is less than 50
weixuanfu Mar 18, 2019
40b97c3
update version number
weixuanfu Apr 4, 2019
9173699
add 3 unit tests for template
weixuanfu Apr 4, 2019
0219032
make dataselector selctor as selector instead of transformer
weixuanfu Apr 4, 2019
4ba9a07
rename DS to FeatureSetSelector
weixuanfu Apr 11, 2019
08aee66
refine n_jobs parameter to support n_jobs < -2 #846
weixuanfu Apr 11, 2019
a2264b0
refine template docs
weixuanfu Apr 11, 2019
7823ab5
add teimplate examples
weixuanfu Apr 11, 2019
83c641d
add FFS example
weixuanfu Apr 11, 2019
909981e
generate html pages
weixuanfu Apr 11, 2019
b1b61e4
refine html pages
weixuanfu Apr 11, 2019
be0783d
remove tpot nn config
weixuanfu Apr 11, 2019
49b73b8
refine memory param #837
weixuanfu Apr 11, 2019
a2fe7cd
install xgboost into windows ci
weixuanfu Apr 11, 2019
dafb188
xgboost from anaconda
weixuanfu Apr 11, 2019
a06c33f
refine unit tests because xgboost cannot install in windows ci
weixuanfu Apr 11, 2019
1764731
refine docs of FSS selector
weixuanfu Apr 11, 2019
8c90b85
add unit tests for more coverage
weixuanfu Apr 12, 2019
fbda879
Merge pull request #854 from weixuanfu/template_opt
weixuanfu Apr 12, 2019
85f0c11
rm landscape status until it works
weixuanfu Apr 12, 2019
2 changes: 0 additions & 2 deletions README.md
@@ -1,11 +1,9 @@
Master status: [![Master Build Status - Mac/Linux](https://travis-ci.org/EpistasisLab/tpot.svg?branch=master)](https://travis-ci.org/EpistasisLab/tpot)
[![Master Build Status - Windows](https://ci.appveyor.com/api/projects/status/b7bmpwpkjhifrm7v/branch/master?svg=true)](https://ci.appveyor.com/project/weixuanfu/tpot?branch=master)
[![Master Code Health](https://landscape.io/github/EpistasisLab/tpot/master/landscape.svg?style=flat)](https://landscape.io/github/EpistasisLab/tpot/master)
[![Master Coverage Status](https://coveralls.io/repos/github/EpistasisLab/tpot/badge.svg?branch=master)](https://coveralls.io/github/EpistasisLab/tpot?branch=master)

Development status: [![Development Build Status - Mac/Linux](https://travis-ci.org/EpistasisLab/tpot.svg?branch=development)](https://travis-ci.org/EpistasisLab/tpot/branches)
[![Development Build Status - Windows](https://ci.appveyor.com/api/projects/status/b7bmpwpkjhifrm7v/branch/development?svg=true)](https://ci.appveyor.com/project/weixuanfu/tpot?branch=development)
[![Development Code Health](https://landscape.io/github/EpistasisLab/tpot/development/landscape.svg?style=flat)](https://landscape.io/github/EpistasisLab/tpot/development)
[![Development Coverage Status](https://coveralls.io/repos/github/EpistasisLab/tpot/badge.svg?branch=development)](https://coveralls.io/github/EpistasisLab/tpot?branch=development)

Package information: [![Python 2.7](https://img.shields.io/badge/python-2.7-blue.svg)](https://www.python.org/download/releases/2.7/)
24 changes: 22 additions & 2 deletions docs/api/index.html
@@ -149,6 +149,7 @@ <h1 id="classification">Classification</h1>
<strong>subsample</strong>=1.0, <strong>n_jobs</strong>=1,
<strong>max_time_mins</strong>=None, <strong>max_eval_time_mins</strong>=5,
<strong>random_state</strong>=None, <strong>config_dict</strong>=None,
<strong>template</strong>="RandomTree",
<strong>warm_start</strong>=False,
<strong>memory</strong>=None,
<strong>use_dask</strong>=False,
@@ -246,7 +247,7 @@ <h1 id="classification">Classification</h1>
<blockquote>
Number of processes to use in parallel for evaluating pipelines during the TPOT optimization process.
<br /><br />
Setting <em>n_jobs</em>=-1 will use as many cores as available on the computer. Beware that using multiple processes on the same machine may cause memory issues for large datasets
Setting <em>n_jobs</em>=-1 will use as many cores as available on the computer. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used. Beware that using multiple processes on the same machine may cause memory issues for large datasets.
</blockquote>
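<p>For instance, the effective number of worker processes implied by the rule above can be computed as in this sketch (using <code>os.cpu_count()</code> as a stand-in for n_cpus):</p>
<pre><code class="Python">import os

def effective_n_jobs(n_jobs):
    # n_jobs = -1 uses every core; for n_jobs below -1, (n_cpus + 1 + n_jobs) cores are used
    return os.cpu_count() + 1 + n_jobs if n_jobs &lt; 0 else n_jobs

# effective_n_jobs(-1) == os.cpu_count(); effective_n_jobs(-2) == os.cpu_count() - 1
</code></pre>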

<strong>max_time_mins</strong>: integer or None, optional (default=None)
@@ -285,6 +286,15 @@ <h1 id="classification">Classification</h1>
See the <a href="../using/#built-in-tpot-configurations">built-in configurations</a> section for the list of configurations included with TPOT, and the <a href="../using/#customizing-tpots-operators-and-parameters">custom configuration</a> section for more information and examples of how to create your own TPOT configurations.
</blockquote>

<strong>template</strong>: string (default="RandomTree")
<blockquote>
Template of a predefined pipeline structure. This option specifies a desired structure for the machine learning pipelines evaluated by TPOT.
<br /><br />
So far this option only supports a linear pipeline structure. Each step in the pipeline must be either a main class of operators (Selector, Transformer or Classifier) or a specific operator (e.g. <code>SelectPercentile</code>) defined in the TPOT operator configuration. If a step is a main class, TPOT will randomly choose among all operators of that class (subclasses of <a href="https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/feature_selection/base.py#L17"><code>SelectorMixin</code></a>, <a href="https://scikit-learn.org/stable/modules/generated/sklearn.base.TransformerMixin.html"><code>TransformerMixin</code></a> or <a href="https://scikit-learn.org/stable/modules/generated/sklearn.base.ClassifierMixin.html"><code>ClassifierMixin</code></a> in scikit-learn) for that step. Steps in the template are delimited by "-", e.g. "SelectPercentile-Transformer-Classifier". With the default value "RandomTree", TPOT generates tree-based pipelines randomly.

See the <a href="../using/#template-option-in-tpot">Template option in TPOT</a> section for more details.
</blockquote>

<strong>warm_start</strong>: boolean, optional (default=False)
<blockquote>
Flag indicating whether the TPOT instance will reuse the population from previous calls to <em>fit()</em>.
@@ -611,6 +621,7 @@ <h1 id="regression">Regression</h1>
<strong>subsample</strong>=1.0, <strong>n_jobs</strong>=1,
<strong>max_time_mins</strong>=None, <strong>max_eval_time_mins</strong>=5,
<strong>random_state</strong>=None, <strong>config_dict</strong>=None,
<strong>template</strong>="RandomTree",
<strong>warm_start</strong>=False,
<strong>memory</strong>=None,
<strong>use_dask</strong>=False,
@@ -709,7 +720,7 @@ <h1 id="regression">Regression</h1>
<blockquote>
Number of processes to use in parallel for evaluating pipelines during the TPOT optimization process.
<br /><br />
Setting <em>n_jobs</em>=-1 will use as many cores as available on the computer. Beware that using multiple processes on the same machine may cause memory issues for large datasets
Setting <em>n_jobs</em>=-1 will use as many cores as available on the computer. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used. Beware that using multiple processes on the same machine may cause memory issues for large datasets.
</blockquote>

<strong>max_time_mins</strong>: integer or None, optional (default=None)
@@ -748,6 +759,15 @@ <h1 id="regression">Regression</h1>
See the <a href="../using/#built-in-tpot-configurations">built-in configurations</a> section for the list of configurations included with TPOT, and the <a href="../using/#customizing-tpots-operators-and-parameters">custom configuration</a> section for more information and examples of how to create your own TPOT configurations.
</blockquote>

<strong>template</strong>: string (default="RandomTree")
<blockquote>
Template of a predefined pipeline structure. This option specifies a desired structure for the machine learning pipelines evaluated by TPOT.
<br /><br />
So far this option only supports a linear pipeline structure. Each step in the pipeline must be either a main class of operators (Selector, Transformer or Regressor) or a specific operator (e.g. <code>SelectPercentile</code>) defined in the TPOT operator configuration. If a step is a main class, TPOT will randomly choose among all operators of that class (subclasses of <a href="https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/feature_selection/base.py#L17"><code>SelectorMixin</code></a>, <a href="https://scikit-learn.org/stable/modules/generated/sklearn.base.TransformerMixin.html"><code>TransformerMixin</code></a> or <a href="https://scikit-learn.org/stable/modules/generated/sklearn.base.RegressorMixin.html"><code>RegressorMixin</code></a> in scikit-learn) for that step. Steps in the template are delimited by "-", e.g. "SelectPercentile-Transformer-Regressor". With the default value "RandomTree", TPOT generates tree-based pipelines randomly.

See the <a href="../using/#template-option-in-tpot">Template option in TPOT</a> section for more details.
</blockquote>
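<p>A minimal sketch of this parameter on the regression side (mirroring the classifier example in the usage documentation):</p>
<pre><code class="Python">from tpot import TPOTRegressor

tpot_obj = TPOTRegressor(
    template='Selector-Transformer-Regressor'
)
</code></pre>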

<strong>warm_start</strong>: boolean, optional (default=False)
<blockquote>
Flag indicating whether the TPOT instance will reuse the population from previous calls to <em>fit()</em>.
2 changes: 1 addition & 1 deletion docs/index.html
@@ -213,5 +213,5 @@

<!--
MkDocs version : 0.17.2
Build Date UTC : 2019-03-01 17:12:19
Build Date UTC : 2019-04-11 20:02:14
-->
22 changes: 16 additions & 6 deletions docs/search/search_index.json

Large diffs are not rendered by default.

20 changes: 10 additions & 10 deletions docs/sitemap.xml
@@ -4,79 +4,79 @@

<url>
<loc>http://epistasislab.github.io/tpot/</loc>
<lastmod>2019-03-01</lastmod>
<lastmod>2019-04-11</lastmod>
<changefreq>daily</changefreq>
</url>



<url>
<loc>http://epistasislab.github.io/tpot/installing/</loc>
<lastmod>2019-03-01</lastmod>
<lastmod>2019-04-11</lastmod>
<changefreq>daily</changefreq>
</url>



<url>
<loc>http://epistasislab.github.io/tpot/using/</loc>
<lastmod>2019-03-01</lastmod>
<lastmod>2019-04-11</lastmod>
<changefreq>daily</changefreq>
</url>



<url>
<loc>http://epistasislab.github.io/tpot/api/</loc>
<lastmod>2019-03-01</lastmod>
<lastmod>2019-04-11</lastmod>
<changefreq>daily</changefreq>
</url>



<url>
<loc>http://epistasislab.github.io/tpot/examples/</loc>
<lastmod>2019-03-01</lastmod>
<lastmod>2019-04-11</lastmod>
<changefreq>daily</changefreq>
</url>



<url>
<loc>http://epistasislab.github.io/tpot/contributing/</loc>
<lastmod>2019-03-01</lastmod>
<lastmod>2019-04-11</lastmod>
<changefreq>daily</changefreq>
</url>



<url>
<loc>http://epistasislab.github.io/tpot/releases/</loc>
<lastmod>2019-03-01</lastmod>
<lastmod>2019-04-11</lastmod>
<changefreq>daily</changefreq>
</url>



<url>
<loc>http://epistasislab.github.io/tpot/citing/</loc>
<lastmod>2019-03-01</lastmod>
<lastmod>2019-04-11</lastmod>
<changefreq>daily</changefreq>
</url>



<url>
<loc>http://epistasislab.github.io/tpot/support/</loc>
<lastmod>2019-03-01</lastmod>
<lastmod>2019-04-11</lastmod>
<changefreq>daily</changefreq>
</url>



<url>
<loc>http://epistasislab.github.io/tpot/related/</loc>
<lastmod>2019-03-01</lastmod>
<lastmod>2019-04-11</lastmod>
<changefreq>daily</changefreq>
</url>

56 changes: 53 additions & 3 deletions docs/using/index.html
@@ -80,6 +80,12 @@
<li class="toctree-l2"><a href="#customizing-tpots-operators-and-parameters">Customizing TPOT's operators and parameters</a></li>


<li class="toctree-l2"><a href="#template-option-in-tpot">Template option in TPOT</a></li>


<li class="toctree-l2"><a href="#featuresetselector-in-tpot">FeatureSetSelector in TPOT</a></li>


<li class="toctree-l2"><a href="#pipeline-caching-in-tpot">Pipeline caching in TPOT</a></li>


@@ -367,7 +373,7 @@ <h1 id="tpot-on-the-command-line">TPOT on the command line</h1>
<td>Any positive integer or -1</td>
<td>Number of CPUs for evaluating pipelines in parallel during the TPOT optimization process.
<br /><br />
Assigning this to -1 will use as many cores as available on the computer.</td>
Assigning this to -1 will use as many cores as available on the computer. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used.</td>
</tr>
<tr>
<td>-maxtime</td>
@@ -409,6 +415,15 @@ <h1 id="tpot-on-the-command-line">TPOT on the command line</h1>
</td>
</tr>
<tr>
<td>-template</td>
<td>TEMPLATE</td>
<td>String</td>
<td>Template of a predefined pipeline structure. This option specifies a desired structure for the machine learning pipelines evaluated by TPOT. So far this option only supports a linear pipeline structure. Each step in the pipeline must be either a main class of operators (Selector, Transformer, Classifier or Regressor) or a specific operator (e.g. <code>SelectPercentile</code>) defined in the TPOT operator configuration. If a step is a main class, TPOT will randomly choose among all operators of that class (subclasses of <a href="https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/feature_selection/base.py#L17"><code>SelectorMixin</code></a>, <a href="https://scikit-learn.org/stable/modules/generated/sklearn.base.TransformerMixin.html"><code>TransformerMixin</code></a>, <a href="https://scikit-learn.org/stable/modules/generated/sklearn.base.ClassifierMixin.html"><code>ClassifierMixin</code></a> or <a href="https://scikit-learn.org/stable/modules/generated/sklearn.base.RegressorMixin.html"><code>RegressorMixin</code></a> in scikit-learn) for that step. Steps in the template are delimited by "-", e.g. "SelectPercentile-Transformer-Classifier". With the default value "RandomTree", TPOT generates tree-based pipelines randomly.

See the <a href="../using/#template-option-in-tpot">Template option in TPOT</a> section for more details.
</td>
</tr>
<tr>
<td>-memory</td>
<td>MEMORY</td>
<td>String or file path</td>
@@ -641,6 +656,41 @@ <h1 id="customizing-tpots-operators-and-parameters">Customizing TPOT's operators
<p>When using the command-line interface, the configuration file specified in the <code>-config</code> parameter <em>must</em> name its custom TPOT configuration <code>tpot_config</code>. Otherwise, TPOT will not be able to locate the configuration dictionary.</p>
<p>For more detailed examples of how to customize TPOT's operator configuration, see the default configurations for <a href="https://github.com/EpistasisLab/tpot/blob/master/tpot/config/classifier.py">classification</a> and <a href="https://github.com/EpistasisLab/tpot/blob/master/tpot/config/regressor.py">regression</a> in TPOT's source code.</p>
<p>Note that you must have all of the corresponding packages for the operators installed on your computer, otherwise TPOT will not be able to use them. For example, if XGBoost is not installed on your computer, then TPOT will simply not import nor use XGBoost in the pipelines it considers.</p>
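<p>For instance, a minimal configuration file for the <code>-config</code> parameter could look like the sketch below (the operators and parameter ranges shown are illustrative, not TPOT's full defaults):</p>
<pre><code class="Python"># tpot_custom_config.py -- hypothetical file name passed via -config
# The dictionary MUST be named tpot_config; keys are operator import paths,
# values map each hyperparameter to its candidate values.
tpot_config = {
    'sklearn.naive_bayes.GaussianNB': {},
    'sklearn.tree.DecisionTreeClassifier': {
        'criterion': ['gini', 'entropy'],
        'max_depth': range(1, 11)
    }
}
</code></pre>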
<h1 id="template-option-in-tpot">Template option in TPOT</h1>
<p>The template option provides a way to specify a desired structure for the machine learning pipeline, which may reduce TPOT computation time and potentially provide more interpretable results. The current implementation only supports linear pipelines.</p>
<p>Below is a simple example of using the <code>template</code> option. The pipelines generated/evaluated in TPOT will follow this structure: the 1st step is a feature selector (a subclass of <a href="https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/feature_selection/base.py#L17"><code>SelectorMixin</code></a>), the 2nd step is a feature transformer (a subclass of <a href="https://scikit-learn.org/stable/modules/generated/sklearn.base.TransformerMixin.html"><code>TransformerMixin</code></a>) and the 3rd step is a classifier (a subclass of <a href="https://scikit-learn.org/stable/modules/generated/sklearn.base.ClassifierMixin.html"><code>ClassifierMixin</code></a>). The last step must be <code>Classifier</code> in a <code>TPOTClassifier</code> template and <code>Regressor</code> in a <code>TPOTRegressor</code> template. <strong>Note: although <code>SelectorMixin</code> is a subclass of <code>TransformerMixin</code> in scikit-learn, <code>Transformer</code> in this option excludes subclasses of <code>SelectorMixin</code>.</strong></p>
<pre><code class="Python">tpot_obj = TPOTClassifier(
template='Selector-Transformer-Classifier'
)
</code></pre>

<p>If a specific operator, e.g. <code>SelectPercentile</code>, is preferred for the 1st step of the pipeline, the template can be defined as 'SelectPercentile-Transformer-Classifier', as in the sketch below.</p>
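<pre><code class="Python">tpot_obj = TPOTClassifier(
    template='SelectPercentile-Transformer-Classifier'
)
</code></pre>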
<h1 id="featuresetselector-in-tpot">FeatureSetSelector in TPOT</h1>
<p><code>FeatureSetSelector</code> is a new operator in TPOT. It enables feature selection based on <em>a priori</em> expert knowledge. For example, in RNA-seq gene expression analysis, this operator can be used in the 1st step of the pipeline, via the <code>template</code> option above, to select one or more gene (feature) sets based on GO (Gene Ontology) terms or on annotated gene sets from the Molecular Signatures Database (<a href="http://software.broadinstitute.org/gsea/msigdb/index.jsp">MSigDB</a>), in order to reduce dimensionality and TPOT computation time. This operator requires a subset list in csv format. The csv file has exactly three columns: the 1st column holds feature set names, the 2nd column the total number of features in each set, and the 3rd column a ";"-delimited list of feature names (if the input X is a pandas.DataFrame) or indexes (if the input X is a numpy.ndarray). Below is an example of how to use this operator in TPOT.</p>
<p>Please check our <a href="https://www.biorxiv.org/content/10.1101/502484v1.article-info">preprint paper</a> for more details.</p>
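<p>A hypothetical subset file following the three-column layout described above might look like this (the header and set names are illustrative, not taken from TPOT's test data):</p>
<pre><code>subset_name,n_features,features
gene_set_1,3,geneA;geneB;geneC
gene_set_2,2,geneD;geneE
</code></pre>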
<pre><code class="Python">from tpot import TPOTClassifier
import numpy as np
import pandas as pd
from tpot.config import classifier_config_dict
test_data = pd.read_csv(&quot;https://raw.githubusercontent.com/EpistasisLab/tpot/master/tests/tests.csv&quot;)
test_X = test_data.drop(&quot;class&quot;, axis=1)
test_y = test_data['class']

# add FeatureSetSelector into tpot configuration
classifier_config_dict['tpot.builtins.FeatureSetSelector'] = {
'subset_list': ['https://raw.githubusercontent.com/EpistasisLab/tpot/master/tests/subset_test.csv'],
'sel_subset': [0,1] # select only one feature set, a list of index of subset in the list above
#'sel_subset': list(combinations(range(3), 2)) # select two feature sets
}


tpot = TPOTClassifier(generations=5,
population_size=50, verbosity=2,
template='FeatureSetSelector-Transformer-Classifier',
config_dict=classifier_config_dict)
tpot.fit(test_X, test_y)
</code></pre>
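<p>After fitting, the resulting pipeline can be exported as usual (the output filename here is hypothetical):</p>
<pre><code class="Python">tpot.export('tpot_fss_pipeline.py')
</code></pre>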

<h1 id="pipeline-caching-in-tpot">Pipeline caching in TPOT</h1>
<p>With the <code>memory</code> parameter, pipelines can cache the results of each transformer after fitting them. This feature is used to avoid repeated computation by transformers within a pipeline when the parameters and input data are identical to those of another pipeline fitted during the optimization process. TPOT allows users to specify a custom directory path or a <a href="https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/externals/joblib/memory.py#L847"><code>sklearn.externals.joblib.Memory</code></a> object in case they want to re-use the memory cache in future TPOT runs (or a <code>warm_start</code> run).</p>
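<p>A minimal sketch of the custom-directory method (the directory here comes from <code>tempfile.mkdtemp()</code>; any writable path would do):</p>
<pre><code class="Python">from tempfile import mkdtemp
from tpot import TPOTClassifier

cachedir = mkdtemp()  # TPOT caches fitted transformers in this directory
tpot = TPOTClassifier(memory=cachedir)
</code></pre>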
<p>There are three methods for enabling memory caching in TPOT:</p>
@@ -684,8 +734,8 @@ <h1 id="parallel-training-with-dask">Parallel Training with Dask</h1>
<p>For large problems, or when working in a Jupyter notebook, we highly recommend distributing the work on a <a href="http://dask.pydata.org/en/latest/">Dask</a> cluster.
The <a href="https://mybinder.org/v2/gh/dask/dask-examples/master?filepath=machine-learning%2Ftpot.ipynb">dask-examples binder</a> has a runnable example
with a small dask cluster.</p>
<p>To use your Dask cluster to fit a TPOT model, specify the <code>use_dask</code> keyword when you create the TPOT estimator. <strong>Note: if <code>use_dask=True</code>, TPOT will use as many cores as available on the your Dask cluster regardless of whether <code>n_jobs</code> is specified.</strong></p>
<pre><code class="python">estimator = TPOTEstimator(use_dask=True)
<p>To use your Dask cluster to fit a TPOT model, specify the <code>use_dask</code> keyword when you create the TPOT estimator. <strong>Note: if <code>use_dask=True</code>, TPOT will use as many cores as available on your Dask cluster. If <code>n_jobs</code> is specified, it controls the chunk size (10*<code>n_jobs</code>, if that is less than the offspring size) of parallel training.</strong></p>
<pre><code class="python">estimator = TPOTEstimator(use_dask=True, n_jobs=-1)
</code></pre>
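<p>One way to attach to a cluster first (a sketch assuming <code>dask.distributed</code> is installed; the local <code>Client()</code> below stands in for a real cluster address):</p>
<pre><code class="python">from dask.distributed import Client
from tpot import TPOTClassifier

client = Client()  # starts a local Dask cluster; pass an address to use a remote one
tpot = TPOTClassifier(generations=5, population_size=20, use_dask=True)
</code></pre>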

<p>This will use all the workers on your cluster to do the training, and use <a href="https://dask-ml.readthedocs.io/en/latest/hyper-parameter-search.html#avoid-repeated-work">Dask-ML's pipeline rewriting</a> to avoid re-fitting estimators multiple times on the same set of data.