[Feature request] Implement n_jobs=-2 like scikit-learn #817

mendel5 · 2021-02-26T11:20:41Z

When working with sklearn (scikit-learn) I am used to setting the parameter n_jobs=-2. As explained at https://scikit-learn.org/stable/glossary.html#term-n_jobs this means:

n_jobs is an integer, specifying the maximum number of concurrently running workers.
If 1 is given, no joblib parallelism is used at all, which is useful for debugging.
If set to -1, all CPUs are used. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used.
For example with n_jobs=-2, all CPUs but one are used.
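
A minimal sketch of the rule quoted above, to make the arithmetic concrete. The helper name `resolve_n_jobs` and its `n_cpus` argument are hypothetical and are not part of tsfresh or scikit-learn. Clamping the result to at least one worker is a choice made for this sketch.

```python
import os


def resolve_n_jobs(n_jobs, n_cpus=None):
    """Translate a scikit-learn style n_jobs value into a worker count.

    Hypothetical helper for illustration only.
    """
    if n_cpus is None:
        # os.cpu_count() can return None, so fall back to a single CPU.
        n_cpus = os.cpu_count() or 1
    if n_jobs == 0:
        # The quoted glossary rule defines no meaning for 0.
        raise ValueError("n_jobs == 0 has no meaning in this convention")
    if n_jobs < 0:
        # n_cpus + 1 + n_jobs: -1 -> all CPUs, -2 -> all but one, ...
        # Clamp so that a very negative value still yields one worker.
        return max(n_cpus + 1 + n_jobs, 1)
    return n_jobs


# The same n_jobs=-2 adapts to whatever machine the code runs on.
for n_cpus in (4, 8, 16):
    print(n_cpus, "CPUs ->", resolve_n_jobs(-2, n_cpus=n_cpus), "workers")
```

With `n_jobs=-2` this gives 3 workers on a 4-core machine and 15 on a 16-core machine.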

When I set the parameter n_jobs=-2 in the extract_features() function I get an error: ValueError: Number of processes must be at least 1.

If tsfresh accepted n_jobs=-2, it would be possible to write code that runs on different kinds of CPUs and tells tsfresh "use all CPU cores except for one core". The code would then adapt to the CPU it is running on, which might be an older 4-core Intel CPU or a newer 8-, 12- or 16-core Ryzen CPU.

nils-braun · 2021-02-26T21:00:09Z

That is a very good suggestion! Would you like to do a PR?
It is mostly used in the MultiprocessingDistributor and in the calculate_relevance_table (and probably a bunch of docstrings).

mendel5 · 2021-02-27T13:44:58Z

> Would you like to do a PR?

I can try it. However it might take some weeks because I'm quite busy right now.

> MultiprocessingDistributor

Do you mean this one: https://github.com/blue-yonder/tsfresh/blob/main/tsfresh/utilities/distribution.py#L401?

> calculate_relevance_table

Do you mean this one: https://github.com/blue-yonder/tsfresh/blob/main/tsfresh/feature_selection/relevance.py#L31?

> and probably a bunch of docstrings

A grep search over the full repo returns this:

$ grep -rni "n_jobs"
docs/text/tsfresh_on_a_cluster.rst:27:`n_jobs`. This field defaults to
docs/text/tsfresh_on_a_cluster.rst:46:`n_jobs` and `chunksize`. Both behave analogue to the parameters
docs/text/tsfresh_on_a_cluster.rst:50:setting the parameter `n_jobs` to 0.
docs/text/tsfresh_on_a_cluster.rst:133:                         n_jobs=4)
tsfresh/feature_selection/relevance.py:37:    n_jobs=defaults.N_PROCESSES,
tsfresh/feature_selection/relevance.py:131:    :param n_jobs: Number of processes to use during the p-value calculation
tsfresh/feature_selection/relevance.py:132:    :type n_jobs: int
tsfresh/feature_selection/relevance.py:195:        if n_jobs == 0:
tsfresh/feature_selection/relevance.py:199:                processes=n_jobs,
tsfresh/feature_selection/relevance.py:230:            if n_jobs != 0:
tsfresh/feature_selection/relevance.py:297:        if n_jobs != 0:
tsfresh/feature_selection/selection.py:25:    n_jobs=defaults.N_PROCESSES,
tsfresh/feature_selection/selection.py:110:    :param n_jobs: Number of processes to use during the p-value calculation
tsfresh/feature_selection/selection.py:111:    :type n_jobs: int
tsfresh/feature_selection/selection.py:170:        n_jobs=n_jobs,
tsfresh/transformers/feature_selector.py:68:        n_jobs=defaults.N_PROCESSES,
tsfresh/transformers/feature_selector.py:101:        :param n_jobs: Number of processes to use during the p-value calculation
tsfresh/transformers/feature_selector.py:102:        :type n_jobs: int
tsfresh/transformers/feature_selector.py:144:        self.n_jobs = n_jobs
tsfresh/transformers/feature_selector.py:180:            n_jobs=self.n_jobs,
tsfresh/transformers/feature_augmenter.py:67:                 n_jobs=tsfresh.defaults.N_PROCESSES, show_warnings=tsfresh.defaults.SHOW_WARNINGS,
tsfresh/transformers/feature_augmenter.py:96:        :param n_jobs: The number of processes to use for parallelization. If zero, no parallelization is used.
tsfresh/transformers/feature_augmenter.py:97:        :type n_jobs: int
tsfresh/transformers/feature_augmenter.py:136:        self.n_jobs = n_jobs
tsfresh/transformers/feature_augmenter.py:205:                                              n_jobs=self.n_jobs, show_warnings=self.show_warnings,
tsfresh/transformers/relevant_feature_augmenter.py:96:        n_jobs=defaults.N_PROCESSES,
tsfresh/transformers/relevant_feature_augmenter.py:150:        :param n_jobs: The number of processes to use for parallelization. If zero, no parallelization is used.
tsfresh/transformers/relevant_feature_augmenter.py:151:        :type n_jobs: int
tsfresh/transformers/relevant_feature_augmenter.py:223:        self.n_jobs = n_jobs
tsfresh/transformers/relevant_feature_augmenter.py:325:                                                      n_jobs=self.feature_extractor.n_jobs,
tsfresh/transformers/relevant_feature_augmenter.py:395:            n_jobs=self.n_jobs,
tsfresh/transformers/relevant_feature_augmenter.py:410:            n_jobs=self.n_jobs,
tsfresh/convenience/relevant_extraction.py:27:                              n_jobs=defaults.N_PROCESSES,
tsfresh/convenience/relevant_extraction.py:89:    :param n_jobs: The number of processes to use for parallelization. If zero, no parallelization is used.
tsfresh/convenience/relevant_extraction.py:90:    :type n_jobs: int
tsfresh/convenience/relevant_extraction.py:168:                             n_jobs=n_jobs,
tsfresh/convenience/relevant_extraction.py:180:                            n_jobs=n_jobs,
tsfresh/scripts/measure_execution_time.py:46:    n_jobs = luigi.IntParameter()
tsfresh/scripts/measure_execution_time.py:59:        extract_features(df, column_id="id", column_sort="time", n_jobs=self.n_jobs,
tsfresh/scripts/measure_execution_time.py:70:            "n_jobs": self.n_jobs,
tsfresh/scripts/measure_execution_time.py:84:    n_jobs = luigi.IntParameter()
tsfresh/scripts/measure_execution_time.py:96:        extract_features(df, column_id="id", column_sort="time", n_jobs=self.n_jobs,
tsfresh/scripts/measure_execution_time.py:103:            "n_jobs": self.n_jobs,
tsfresh/scripts/measure_execution_time.py:121:                                     n_jobs=job,
tsfresh/scripts/measure_execution_time.py:125:                                     n_jobs=job,
tsfresh/scripts/measure_execution_time.py:133:                        n_jobs=job,
tsfresh/scripts/measure_execution_time.py:142:                            n_jobs=job,
tsfresh/feature_extraction/extraction.py:30:                     n_jobs=defaults.N_PROCESSES, show_warnings=defaults.SHOW_WARNINGS,
tsfresh/feature_extraction/extraction.py:91:    :param n_jobs: The number of processes to use for parallelization. If zero, no parallelization is used.
tsfresh/feature_extraction/extraction.py:92:    :type n_jobs: int
tsfresh/feature_extraction/extraction.py:155:                                n_jobs=n_jobs, chunk_size=chunksize,
tsfresh/feature_extraction/extraction.py:177:                   n_jobs, chunk_size, disable_progressbar, show_warnings, distributor,
tsfresh/feature_extraction/extraction.py:214:    :param n_jobs: The number of processes to use for parallelization. If zero, no parallelization is used.
tsfresh/feature_extraction/extraction.py:215:    :type n_jobs: int
tsfresh/feature_extraction/extraction.py:235:            if n_jobs == 0:
tsfresh/feature_extraction/extraction.py:239:                distributor = MultiprocessingDistributor(n_workers=n_jobs,
tsfresh/utilities/dataframe_functions.py:315:                     n_jobs=defaults.N_PROCESSES, show_warnings=defaults.SHOW_WARNINGS,
tsfresh/utilities/dataframe_functions.py:374:    :param n_jobs: The number of processes to use for parallelization. If zero, no parallelization is used.
tsfresh/utilities/dataframe_functions.py:375:    :type n_jobs: int
tsfresh/utilities/dataframe_functions.py:416:                                      n_jobs=n_jobs,
tsfresh/utilities/dataframe_functions.py:478:        if n_jobs == 0:
tsfresh/utilities/dataframe_functions.py:482:            distributor = MultiprocessingDistributor(n_workers=n_jobs,
notebooks/advanced/compare-runtimes-of-feature-calculators.ipynb:173:    "                                                n_jobs=0, \n",
tests/benchmark.py:28:    benchmark(extract_features, df, column_id="id", column_sort="time", n_jobs=0,
tests/benchmark.py:35:    benchmark(extract_features, df, column_id="id", column_sort="time", n_jobs=0,
tests/benchmark.py:43:    benchmark(extract_relevant_features, df, y, column_id="id", column_sort="time", n_jobs=0,
tests/units/feature_selection/test_relevance.py:84:        relevance_table = calculate_relevance_table(X, y_binary, n_jobs=0)
tests/units/feature_selection/test_relevance.py:103:        relevance_table = calculate_relevance_table(X, y_real, n_jobs=0)
tests/units/feature_selection/test_relevance.py:138:                X, y_real, n_jobs=0, ml_task="regression", show_warnings=True
tests/units/transformers/test_feature_augmenter.py:24:                                     n_jobs=0,
tests/units/transformers/test_feature_augmenter.py:60:                                     n_jobs=0,
tests/units/transformers/test_feature_augmenter.py:87:                                     n_jobs=0,
tests/units/feature_extraction/test_extraction.py:22:        self.n_jobs = 1
tests/units/feature_extraction/test_extraction.py:30:                                              n_jobs=self.n_jobs)
tests/units/feature_extraction/test_extraction.py:44:                                                  n_jobs=self.n_jobs)
tests/units/feature_extraction/test_extraction.py:54:                                              column_value="val", n_jobs=self.n_jobs,
tests/units/feature_extraction/test_extraction.py:121:                                              n_jobs=self.n_jobs).sort_index()
tests/units/feature_extraction/test_extraction.py:126:                                                          n_jobs=self.n_jobs).sort_index()
tests/units/feature_extraction/test_extraction.py:140:        X = extract_features(df, column_id="id", column_value="val", n_jobs=self.n_jobs,
tests/units/feature_extraction/test_extraction.py:152:        extract_features(df, column_id="id", column_value="val", n_jobs=self.n_jobs,
tests/units/feature_extraction/test_extraction.py:164:                             n_jobs=self.n_jobs)
tests/units/feature_extraction/test_extraction.py:173:                                             n_jobs=self.n_jobs)
tests/units/feature_extraction/test_extraction.py:177:                                           n_jobs=0)
tests/units/feature_extraction/test_extraction.py:188:                                              n_jobs=self.n_jobs)
tests/units/feature_extraction/test_extraction.py:210:        self.n_jobs = 2
tests/units/feature_extraction/test_extraction.py:226:                                              n_jobs=self.n_jobs)
tests/units/feature_extraction/test_settings.py:59:                                 n_jobs=0)
tests/units/feature_extraction/test_settings.py:64:                                 n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:23:                          rolling_direction=1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:29:                          rolling_direction=1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:34:                          rolling_direction=1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:40:                          rolling_direction=1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:45:                          rolling_direction=1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:50:                          rolling_direction=0, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:55:                          rolling_direction=0, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:62:                          rolling_direction=1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:68:                          rolling_direction=1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:75:                          rolling_direction=1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:114:                                                  column_kind=None, rolling_direction=1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:122:                                                  max_timeshift=4, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:130:                                                  max_timeshift=2, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:154:                                                  max_timeshift=2, min_timeshift=2, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:208:                                                  column_kind=None, rolling_direction=-1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:216:                                                  max_timeshift=None, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:224:                                                  max_timeshift=1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:247:                                                  max_timeshift=2, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:272:                                                  max_timeshift=4, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:298:                                                  min_timeshift=2, max_timeshift=3, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:348:                                                  column_kind=None, rolling_direction=2, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:368:                                                  column_kind=None, rolling_direction=-2, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:404:                                                  column_kind="kind", rolling_direction=-1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:427:                                                  rolling_direction=-1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:477:                                                  rolling_direction=-1, max_timeshift=1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:571:                                                 column_kind=None, rolling_direction=1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:623:                                                  column_kind=None, rolling_direction=1, n_jobs=0)

nils-braun · 2021-02-27T16:36:47Z

> I can try it. However it might take some weeks because I'm quite busy right now.

That would be awesome! If this is not fast enough for you, I can also try to have a look - but more contributors is always better :-)

> Do you mean this one:

Yes and yes. Sorry, I was on my smartphone - thanks for providing the links. These two places are basically the only ones where n_jobs is actually used (the rest just pass it along).
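
Since only two places actually consume n_jobs, the value could be normalized once at each of them before it is compared to zero or handed on as a process count. The following is a sketch of that idea only, not the tsfresh implementation: `effective_n_workers` and `describe_execution` are hypothetical names, and the sketch assumes that 0 keeps its documented tsfresh meaning of "no parallelization".

```python
import os


def effective_n_workers(n_jobs, n_cpus=None):
    """Hypothetical normalization step for the two places that use n_jobs.

    Assumptions of this sketch: 0 still means "no parallelization",
    positive values pass through unchanged, and negative values count
    back from the number of CPUs (n_cpus + 1 + n_jobs).
    """
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    if n_jobs >= 0:
        return n_jobs
    # Never hand fewer than one process to the multiprocessing layer.
    return max(n_cpus + 1 + n_jobs, 1)


def describe_execution(n_jobs, n_cpus=None):
    """Show which branch the existing `n_jobs == 0` checks would take."""
    n_workers = effective_n_workers(n_jobs, n_cpus)
    if n_workers == 0:
        return "sequential"
    return f"multiprocessing with {n_workers} workers"


print(describe_execution(0, n_cpus=8))
print(describe_execution(-2, n_cpus=8))
```

Because the result is never negative, the existing zero checks would keep working and the process pool would only ever receive a count of at least 1.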

> A grep search over the full repo returns this:

Here are the docstrings that one would need to fix (the rest is not relevant, as only variables are passed).

tsfresh/convenience/relevant_extraction.py:    :param n_jobs: The number of processes to use for parallelization. If zero, no parallelization is used.
tsfresh/feature_extraction/extraction.py:    :param n_jobs: The number of processes to use for parallelization. If zero, no parallelization is used.
tsfresh/feature_extraction/extraction.py:    :param n_jobs: The number of processes to use for parallelization. If zero, no parallelization is used.
tsfresh/feature_selection/relevance.py:    :param n_jobs: Number of processes to use during the p-value calculation
tsfresh/feature_selection/selection.py:    :param n_jobs: Number of processes to use during the p-value calculation
tsfresh/transformers/feature_augmenter.py:        :param n_jobs: The number of processes to use for parallelization. If zero, no parallelization is used.
tsfresh/transformers/feature_selector.py:        :param n_jobs: Number of processes to use during the p-value calculation
tsfresh/transformers/relevant_feature_augmenter.py:        :param n_jobs: The number of processes to use for parallelization. If zero, no parallelization is used.
tsfresh/utilities/dataframe_functions.py:    :param n_jobs: The number of processes to use for parallelization. If zero, no parallelization is used.

nils-braun · 2021-05-14T15:46:36Z

You could have a look at https://github.com/blue-yonder/tsfresh/pull/852/files as a starting point :-)

stergion linked a pull request Dec 22, 2022 that will close this issue

Allow n_jobs to receive negative values. #993

Open
