[Feature request] Implement n_jobs=-2 like scikit-learn #817

Open
mendel5 opened this issue Feb 26, 2021 · 4 comments · May be fixed by #993

mendel5 commented Feb 26, 2021

When working with scikit-learn (sklearn), I usually set the parameter n_jobs=-2. As explained at https://scikit-learn.org/stable/glossary.html#term-n_jobs, this means:

n_jobs is an integer, specifying the maximum number of concurrently running workers.
If 1 is given, no joblib parallelism is used at all, which is useful for debugging.
If set to -1, all CPUs are used. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used.
For example with n_jobs=-2, all CPUs but one are used.
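
The rule quoted above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the helper name `resolve_n_jobs`, its `n_cpus` parameter, the clamp to at least one worker, and the rejection of 0 are assumptions made for this sketch and are not part of tsfresh or scikit-learn.

```python
import os


def resolve_n_jobs(n_jobs, n_cpus=None):
    """Translate a scikit-learn style n_jobs value into a worker count.

    Hypothetical helper for illustration only.
    """
    if n_cpus is None:
        # os.cpu_count() can return None, so fall back to a single CPU.
        n_cpus = os.cpu_count() or 1
    if n_jobs == 0:
        # The quoted rule does not define 0; this sketch rejects it.
        raise ValueError("n_jobs == 0 is not covered by the quoted rule")
    if n_jobs < 0:
        # -1 -> all CPUs, -2 -> all but one, and so on.
        # Assumption: never resolve to fewer than one worker.
        return max(n_cpus + 1 + n_jobs, 1)
    return n_jobs


print(resolve_n_jobs(-2, n_cpus=4))  # all CPUs but one on a 4-core machine
```

Passing `n_cpus` explicitly keeps the example deterministic; leaving it out uses the CPU count of the machine the code runs on.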

When I set the parameter n_jobs=-2 in the extract_features() function, I get an error: ValueError: Number of processes must be at least 1.
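
The message matches the one Python's standard `multiprocessing.Pool` raises when it is given a process count below 1, so the error most likely comes from the negative value being passed through unchanged. That origin is an assumption, not something confirmed in this thread. The snippet below reproduces the message with the standard library alone; `pool_error_message` is a hypothetical helper name.

```python
import multiprocessing


def pool_error_message(processes):
    """Return the error text Pool raises for an invalid process count.

    Hypothetical helper; returns None if the pool could be created.
    """
    try:
        # For processes < 1, Pool raises before starting any worker.
        with multiprocessing.Pool(processes=processes):
            return None
    except ValueError as exc:
        return str(exc)


if __name__ == "__main__":
    print(pool_error_message(-2))
```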

If tsfresh accepted n_jobs=-2, it would be possible to write code that works across different kinds of CPUs by telling tsfresh "use all CPU cores except one". The code would then adapt to the CPU it is running on, whether that is an older 4-core Intel CPU or a newer Ryzen with 8, 12, or 16 cores.

@nils-braun

That is a very good suggestion! Would you like to do a PR?
n_jobs is mostly used in the MultiprocessingDistributor and in calculate_relevance_table (and probably mentioned in a bunch of docstrings).


mendel5 commented Feb 27, 2021

> Would you like to do a PR?

I can try it. However, it might take a few weeks because I'm quite busy right now.

> MultiprocessingDistributor

Do you mean this one: https://github.com/blue-yonder/tsfresh/blob/main/tsfresh/utilities/distribution.py#L401?

> calculate_relevance_table

Do you mean this one: https://github.com/blue-yonder/tsfresh/blob/main/tsfresh/feature_selection/relevance.py#L31?

> and probably a bunch of docstrings

A grep search over the full repo returns this:

$ grep -rni "n_jobs"
docs/text/tsfresh_on_a_cluster.rst:27:`n_jobs`. This field defaults to
docs/text/tsfresh_on_a_cluster.rst:46:`n_jobs` and `chunksize`. Both behave analogue to the parameters
docs/text/tsfresh_on_a_cluster.rst:50:setting the parameter `n_jobs` to 0.
docs/text/tsfresh_on_a_cluster.rst:133:                         n_jobs=4)
tsfresh/feature_selection/relevance.py:37:    n_jobs=defaults.N_PROCESSES,
tsfresh/feature_selection/relevance.py:131:    :param n_jobs: Number of processes to use during the p-value calculation
tsfresh/feature_selection/relevance.py:132:    :type n_jobs: int
tsfresh/feature_selection/relevance.py:195:        if n_jobs == 0:
tsfresh/feature_selection/relevance.py:199:                processes=n_jobs,
tsfresh/feature_selection/relevance.py:230:            if n_jobs != 0:
tsfresh/feature_selection/relevance.py:297:        if n_jobs != 0:
tsfresh/feature_selection/selection.py:25:    n_jobs=defaults.N_PROCESSES,
tsfresh/feature_selection/selection.py:110:    :param n_jobs: Number of processes to use during the p-value calculation
tsfresh/feature_selection/selection.py:111:    :type n_jobs: int
tsfresh/feature_selection/selection.py:170:        n_jobs=n_jobs,
tsfresh/transformers/feature_selector.py:68:        n_jobs=defaults.N_PROCESSES,
tsfresh/transformers/feature_selector.py:101:        :param n_jobs: Number of processes to use during the p-value calculation
tsfresh/transformers/feature_selector.py:102:        :type n_jobs: int
tsfresh/transformers/feature_selector.py:144:        self.n_jobs = n_jobs
tsfresh/transformers/feature_selector.py:180:            n_jobs=self.n_jobs,
tsfresh/transformers/feature_augmenter.py:67:                 n_jobs=tsfresh.defaults.N_PROCESSES, show_warnings=tsfresh.defaults.SHOW_WARNINGS,
tsfresh/transformers/feature_augmenter.py:96:        :param n_jobs: The number of processes to use for parallelization. If zero, no parallelization is used.
tsfresh/transformers/feature_augmenter.py:97:        :type n_jobs: int
tsfresh/transformers/feature_augmenter.py:136:        self.n_jobs = n_jobs
tsfresh/transformers/feature_augmenter.py:205:                                              n_jobs=self.n_jobs, show_warnings=self.show_warnings,
tsfresh/transformers/relevant_feature_augmenter.py:96:        n_jobs=defaults.N_PROCESSES,
tsfresh/transformers/relevant_feature_augmenter.py:150:        :param n_jobs: The number of processes to use for parallelization. If zero, no parallelization is used.
tsfresh/transformers/relevant_feature_augmenter.py:151:        :type n_jobs: int
tsfresh/transformers/relevant_feature_augmenter.py:223:        self.n_jobs = n_jobs
tsfresh/transformers/relevant_feature_augmenter.py:325:                                                      n_jobs=self.feature_extractor.n_jobs,
tsfresh/transformers/relevant_feature_augmenter.py:395:            n_jobs=self.n_jobs,
tsfresh/transformers/relevant_feature_augmenter.py:410:            n_jobs=self.n_jobs,
tsfresh/convenience/relevant_extraction.py:27:                              n_jobs=defaults.N_PROCESSES,
tsfresh/convenience/relevant_extraction.py:89:    :param n_jobs: The number of processes to use for parallelization. If zero, no parallelization is used.
tsfresh/convenience/relevant_extraction.py:90:    :type n_jobs: int
tsfresh/convenience/relevant_extraction.py:168:                             n_jobs=n_jobs,
tsfresh/convenience/relevant_extraction.py:180:                            n_jobs=n_jobs,
tsfresh/scripts/measure_execution_time.py:46:    n_jobs = luigi.IntParameter()
tsfresh/scripts/measure_execution_time.py:59:        extract_features(df, column_id="id", column_sort="time", n_jobs=self.n_jobs,
tsfresh/scripts/measure_execution_time.py:70:            "n_jobs": self.n_jobs,
tsfresh/scripts/measure_execution_time.py:84:    n_jobs = luigi.IntParameter()
tsfresh/scripts/measure_execution_time.py:96:        extract_features(df, column_id="id", column_sort="time", n_jobs=self.n_jobs,
tsfresh/scripts/measure_execution_time.py:103:            "n_jobs": self.n_jobs,
tsfresh/scripts/measure_execution_time.py:121:                                     n_jobs=job,
tsfresh/scripts/measure_execution_time.py:125:                                     n_jobs=job,
tsfresh/scripts/measure_execution_time.py:133:                        n_jobs=job,
tsfresh/scripts/measure_execution_time.py:142:                            n_jobs=job,
tsfresh/feature_extraction/extraction.py:30:                     n_jobs=defaults.N_PROCESSES, show_warnings=defaults.SHOW_WARNINGS,
tsfresh/feature_extraction/extraction.py:91:    :param n_jobs: The number of processes to use for parallelization. If zero, no parallelization is used.
tsfresh/feature_extraction/extraction.py:92:    :type n_jobs: int
tsfresh/feature_extraction/extraction.py:155:                                n_jobs=n_jobs, chunk_size=chunksize,
tsfresh/feature_extraction/extraction.py:177:                   n_jobs, chunk_size, disable_progressbar, show_warnings, distributor,
tsfresh/feature_extraction/extraction.py:214:    :param n_jobs: The number of processes to use for parallelization. If zero, no parallelization is used.
tsfresh/feature_extraction/extraction.py:215:    :type n_jobs: int
tsfresh/feature_extraction/extraction.py:235:            if n_jobs == 0:
tsfresh/feature_extraction/extraction.py:239:                distributor = MultiprocessingDistributor(n_workers=n_jobs,
tsfresh/utilities/dataframe_functions.py:315:                     n_jobs=defaults.N_PROCESSES, show_warnings=defaults.SHOW_WARNINGS,
tsfresh/utilities/dataframe_functions.py:374:    :param n_jobs: The number of processes to use for parallelization. If zero, no parallelization is used.
tsfresh/utilities/dataframe_functions.py:375:    :type n_jobs: int
tsfresh/utilities/dataframe_functions.py:416:                                      n_jobs=n_jobs,
tsfresh/utilities/dataframe_functions.py:478:        if n_jobs == 0:
tsfresh/utilities/dataframe_functions.py:482:            distributor = MultiprocessingDistributor(n_workers=n_jobs,
notebooks/advanced/compare-runtimes-of-feature-calculators.ipynb:173:    "                                                n_jobs=0, \n",
tests/benchmark.py:28:    benchmark(extract_features, df, column_id="id", column_sort="time", n_jobs=0,
tests/benchmark.py:35:    benchmark(extract_features, df, column_id="id", column_sort="time", n_jobs=0,
tests/benchmark.py:43:    benchmark(extract_relevant_features, df, y, column_id="id", column_sort="time", n_jobs=0,
tests/units/feature_selection/test_relevance.py:84:        relevance_table = calculate_relevance_table(X, y_binary, n_jobs=0)
tests/units/feature_selection/test_relevance.py:103:        relevance_table = calculate_relevance_table(X, y_real, n_jobs=0)
tests/units/feature_selection/test_relevance.py:138:                X, y_real, n_jobs=0, ml_task="regression", show_warnings=True
tests/units/transformers/test_feature_augmenter.py:24:                                     n_jobs=0,
tests/units/transformers/test_feature_augmenter.py:60:                                     n_jobs=0,
tests/units/transformers/test_feature_augmenter.py:87:                                     n_jobs=0,
tests/units/feature_extraction/test_extraction.py:22:        self.n_jobs = 1
tests/units/feature_extraction/test_extraction.py:30:                                              n_jobs=self.n_jobs)
tests/units/feature_extraction/test_extraction.py:44:                                                  n_jobs=self.n_jobs)
tests/units/feature_extraction/test_extraction.py:54:                                              column_value="val", n_jobs=self.n_jobs,
tests/units/feature_extraction/test_extraction.py:121:                                              n_jobs=self.n_jobs).sort_index()
tests/units/feature_extraction/test_extraction.py:126:                                                          n_jobs=self.n_jobs).sort_index()
tests/units/feature_extraction/test_extraction.py:140:        X = extract_features(df, column_id="id", column_value="val", n_jobs=self.n_jobs,
tests/units/feature_extraction/test_extraction.py:152:        extract_features(df, column_id="id", column_value="val", n_jobs=self.n_jobs,
tests/units/feature_extraction/test_extraction.py:164:                             n_jobs=self.n_jobs)
tests/units/feature_extraction/test_extraction.py:173:                                             n_jobs=self.n_jobs)
tests/units/feature_extraction/test_extraction.py:177:                                           n_jobs=0)
tests/units/feature_extraction/test_extraction.py:188:                                              n_jobs=self.n_jobs)
tests/units/feature_extraction/test_extraction.py:210:        self.n_jobs = 2
tests/units/feature_extraction/test_extraction.py:226:                                              n_jobs=self.n_jobs)
tests/units/feature_extraction/test_settings.py:59:                                 n_jobs=0)
tests/units/feature_extraction/test_settings.py:64:                                 n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:23:                          rolling_direction=1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:29:                          rolling_direction=1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:34:                          rolling_direction=1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:40:                          rolling_direction=1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:45:                          rolling_direction=1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:50:                          rolling_direction=0, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:55:                          rolling_direction=0, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:62:                          rolling_direction=1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:68:                          rolling_direction=1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:75:                          rolling_direction=1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:114:                                                  column_kind=None, rolling_direction=1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:122:                                                  max_timeshift=4, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:130:                                                  max_timeshift=2, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:154:                                                  max_timeshift=2, min_timeshift=2, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:208:                                                  column_kind=None, rolling_direction=-1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:216:                                                  max_timeshift=None, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:224:                                                  max_timeshift=1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:247:                                                  max_timeshift=2, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:272:                                                  max_timeshift=4, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:298:                                                  min_timeshift=2, max_timeshift=3, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:348:                                                  column_kind=None, rolling_direction=2, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:368:                                                  column_kind=None, rolling_direction=-2, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:404:                                                  column_kind="kind", rolling_direction=-1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:427:                                                  rolling_direction=-1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:477:                                                  rolling_direction=-1, max_timeshift=1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:571:                                                 column_kind=None, rolling_direction=1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:623:                                                  column_kind=None, rolling_direction=1, n_jobs=0)


nils-braun commented Feb 27, 2021

> I can try it. However it might take some weeks because I'm quite busy right now.

That would be awesome! If this is not fast enough for you, I can also try to have a look - but more contributors are always better :-)

> Do you mean this one:

Yes and yes. Sorry, I was on my smartphone - thanks for providing the links. These two code parts are basically the only places where n_jobs is actually used (the rest just pass it along).
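
Since only those two places act on the value, negative values could in principle be normalized once at that boundary while every pass-through caller stays unchanged. The sketch below is one possible shape for that step, not the project's implementation: `plan_execution` and its return format are hypothetical, and it keeps the tsfresh convention from the docstrings that 0 means no parallelization.

```python
def plan_execution(n_jobs, n_cpus):
    """Decide how to run given n_jobs and the available CPU count.

    Hypothetical helper for illustration only.
    """
    if n_jobs == 0:
        # tsfresh convention: zero means no parallelization.
        return ("serial", 0)
    if n_jobs < 0:
        # scikit-learn convention for negative values,
        # clamped so that at least one worker is used (an assumption).
        n_jobs = max(n_cpus + 1 + n_jobs, 1)
    return ("multiprocessing", n_jobs)


print(plan_execution(-2, n_cpus=8))  # all but one of 8 CPUs
```

Keeping 0 as the serial case means existing calls such as the `n_jobs=0` usages in the tests would behave as before under this scheme.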

> A grep search over the full repo returns this:

Here are the docstrings that would need to be fixed (the rest is not relevant, as the variable is only passed along).

tsfresh/convenience/relevant_extraction.py:    :param n_jobs: The number of processes to use for parallelization. If zero, no parallelization is used.
tsfresh/feature_extraction/extraction.py:    :param n_jobs: The number of processes to use for parallelization. If zero, no parallelization is used.
tsfresh/feature_extraction/extraction.py:    :param n_jobs: The number of processes to use for parallelization. If zero, no parallelization is used.
tsfresh/feature_selection/relevance.py:    :param n_jobs: Number of processes to use during the p-value calculation
tsfresh/feature_selection/selection.py:    :param n_jobs: Number of processes to use during the p-value calculation
tsfresh/transformers/feature_augmenter.py:        :param n_jobs: The number of processes to use for parallelization. If zero, no parallelization is used.
tsfresh/transformers/feature_selector.py:        :param n_jobs: Number of processes to use during the p-value calculation
tsfresh/transformers/relevant_feature_augmenter.py:        :param n_jobs: The number of processes to use for parallelization. If zero, no parallelization is used.
tsfresh/utilities/dataframe_functions.py:    :param n_jobs: The number of processes to use for parallelization. If zero, no parallelization is used.

@nils-braun

You could have a look at https://github.com/blue-yonder/tsfresh/pull/852/files for a starting point :-)

stergion linked a pull request on Dec 22, 2022 that will close this issue.