
View and Set N Jobs #1029

Merged
merged 5 commits into blue-yonder:main from parallelset on Aug 3, 2023

Conversation

DhruvSrikanth
Contributor

Reason for PR:

When profiling specific functions, it is often helpful to have more fine-grained control over the N_PROCESSES constant used in multiprocessing.

PR Description:

This PR adds getter and setter functions for the N_PROCESSES constant, which lives alongside the other default constants.
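A minimal sketch of what such a getter/setter pair could look like; the function names and validation below are illustrative assumptions rather than the exact code of this PR (which touches tsfresh/utilities/profiling.py), and they assume the constant is importable as tsfresh.defaults.N_PROCESSES.

```python
# Illustrative sketch only -- the actual names and location in this PR may differ.
import tsfresh.defaults


def get_n_processes() -> int:
    """Return the current default number of worker processes."""
    return tsfresh.defaults.N_PROCESSES


def set_n_processes(n_processes: int) -> None:
    """Globally override the default number of worker processes."""
    if n_processes < 0:
        raise ValueError("n_processes must be a non-negative integer")
    tsfresh.defaults.N_PROCESSES = n_processes
```

Note that call sites importing the constant by value (`from tsfresh.defaults import N_PROCESSES`) capture it at import time, so a setter like this only affects code that reads the module attribute at runtime.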

@nils-braun (Collaborator) left a comment

Thanks @DhruvSrikanth - the changes look good, but I might not have understood the motivation for them. As far as I can see, all functions that use multiprocessing have an n_jobs setting that allows you to change the number of jobs used. A quick search through the repo only gave me results where the default is used as a function parameter default - so you should be able to override it in any case.
Am I missing something?
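For reference, overriding the worker count per call looks like this; the DataFrame and column names are placeholders:

```python
# Per-call override of the number of worker processes; df, "id" and "time" are placeholders.
from tsfresh import extract_features

features = extract_features(df, column_id="id", column_sort="time", n_jobs=2)
```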



class MultiprocessingDistributorTestCase(TestCase):
    def test_n_jobs(self):
Collaborator

Could you make sure that the default setting (as it was before running this test) is restored after the test, even if the test fails?
Just in case any other test assumes the default value is still set.
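One way to guarantee the reset even when an assertion fails is unittest's addCleanup; a minimal sketch, assuming the default is importable as tsfresh.defaults.N_PROCESSES (the direct attribute assignment stands in for whatever setter this PR adds):

```python
import unittest

import tsfresh.defaults


class MultiprocessingDistributorTestCase(unittest.TestCase):
    def test_n_jobs(self):
        original = tsfresh.defaults.N_PROCESSES
        # addCleanup runs even if an assertion below fails, so the default is always restored.
        self.addCleanup(setattr, tsfresh.defaults, "N_PROCESSES", original)

        tsfresh.defaults.N_PROCESSES = 1  # stand-in for the setter added in this PR
        self.assertEqual(tsfresh.defaults.N_PROCESSES, 1)
```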

Contributor Author

Yup, I can do that. Should I make the changes and open a new PR?

Collaborator

You can just add the changes to this PR as they belong together :)

tsfresh/utilities/profiling.py (review thread outdated and resolved)
@DhruvSrikanth
Contributor Author

I was not able to pass the n_jobs parameter to some of the individual feature extractors, such as mean, median, etc. I wanted to profile specific feature extractors, so I thought having more fine-grained control over the default number of processes would be useful. Happy to provide a more concrete example as well.

@nils-braun
Collaborator

Yes, I might need a more concrete example, because each specific feature extractor (mean etc.) does not do multiprocessing anyway - multiprocessing is only used to process multiple time series in parallel. If there is only a single time series, changing the number of jobs will not help :)
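To make that concrete, a small made-up example: in the long input format with one value column, the work is split per id, so n_jobs only pays off when there is more than one time series.

```python
# Made-up data: two time series (ids 1 and 2) can be split across workers; a single id cannot.
import pandas as pd
from tsfresh import extract_features

df = pd.DataFrame(
    {
        "id": [1, 1, 1, 2, 2, 2],
        "time": [0, 1, 2, 0, 1, 2],
        "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    }
)

# At most two chunks exist here, so more than two workers cannot speed this up.
features = extract_features(df, column_id="id", column_sort="time", n_jobs=2)
```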

@DhruvSrikanth
Contributor Author

I see. So each feature extractor uses a single process to perform the computation.

What about the case where multiple features need to be extracted from the same time series? Would a single process handle all of the feature extractors, or would n_jobs allow you to distribute the feature extraction across multiple processes (even though there is only a single time series)?

Irrespective of that, I believe it would be helpful to have a function to globally set the number of jobs used in any computation. I don't see it being more or less useful than having n_jobs as a parameter (as is already present in the package); however, I think it would be convenient for users who prefer to set it once. That being said, I'm open to being convinced otherwise. :)

P.S. Apologies for the late response.

@nils-braun
Collaborator

The parallelization is only over the time series, so a single time series will never be processed in parallel - multiple features for a single time series are calculated in the same process.

Sure, we can still have the changes in - I just wanted to make sure you actually get what you expected :)

After my comment is in, we can merge!

@DhruvSrikanth
Contributor Author

DhruvSrikanth commented Jul 31, 2023

I've incorporated the following changes:

  1. simplified the imports
  2. reset n_jobs after running the test

Are there any others I should incorporate? @nils-braun

@nils-braun (Collaborator) left a comment

Thanks! Looks good!

@codecov-commenter

Codecov Report

Patch and project coverage are unchanged.

Comparison: base (ac413b6) 93.43% vs. head (8cf3bf8) 93.43%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1029   +/-   ##
=======================================
  Coverage   93.43%   93.43%           
=======================================
  Files          20       20           
  Lines        1888     1888           
  Branches      372      372           
=======================================
  Hits         1764     1764           
  Misses         85       85           
  Partials       39       39           

☔ View full report in Codecov by Sentry.

@DhruvSrikanth
Contributor Author

Thanks!

How can I check if the PR has been merged into main?

@nils-braun merged commit f3a6a7c into blue-yonder:main on Aug 3, 2023
4 checks passed
@nils-braun
Collaborator

I have just done so - you can see it mentioned on this PR page.

@DhruvSrikanth
Contributor Author

Awesome! Thanks!!

@DhruvSrikanth deleted the parallelset branch on August 3, 2023, 21:13