
Implement parallelization of feature calculation per kind #35

Closed
jneuff wants to merge 14 commits into master from issues-4

Conversation

@jneuff (Collaborator) commented Nov 3, 2016

If there are several kinds of time series, their features are calculated in parallel using a process pool.
The default is one process per CPU. This setting can be overridden in the FeatureExtractionSettings object provided to extract_features.
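
To make the approach concrete, here is a minimal sketch (not the actual tsfresh internals) of parallelizing per kind with multiprocessing.Pool: one feature-calculation job per kind of time series, with the per-kind results merged afterwards. The helper names and the toy features are illustrative only.

    import multiprocessing

    import pandas as pd


    def _features_for_one_kind(kind_and_df):
        """Illustrative worker: compute a few toy features for one kind."""
        kind, df = kind_and_df
        grouped = df.groupby("id")["value"]
        return pd.DataFrame({kind + "__mean": grouped.mean(),
                             kind + "__max": grouped.max()})


    def extract_per_kind_in_parallel(dataframes_by_kind, n_processes=None):
        """Run one job per kind in a process pool and merge the results.

        n_processes=None lets multiprocessing default to one worker per CPU,
        mirroring the default behaviour described in this PR.
        """
        pool = multiprocessing.Pool(processes=n_processes)
        try:
            per_kind = pool.map(_features_for_one_kind,
                                list(dataframes_by_kind.items()))
        finally:
            pool.close()
            pool.join()
        # Aggregate the per-kind feature matrices column-wise into one frame.
        return pd.concat(per_kind, axis=1)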

@MaxBenChrist (Collaborator)

Have you checked the maximal memory usage? Are the time series containers per kind duplicated?

@jneuff (Collaborator, Author) commented Nov 4, 2016

The maximal memory usage depends on the size of the input, as the time series containers per kind (which we internally store in one dict) are copied to the worker processes.
So the memory usage is higher than before. I will add instructions to set the field processes in FeatureExtractionSettings to 1 in a low-memory environment.
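
A minimal usage sketch of that low-memory setting; the import path is an assumption, and only FeatureExtractionSettings, its processes field, and extract_features are taken from this discussion.

    from tsfresh.feature_extraction.settings import FeatureExtractionSettings  # import path assumed

    settings = FeatureExtractionSettings()
    settings.processes = 1  # run the feature calculation serially in low-memory environments
    # ... then hand `settings` to extract_features as described above.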

It seems that the Travis jobs stall at some point. All tests run fine on my machine. It might be a memory issue. I will investigate this further and update the PR.

@MaxBenChrist (Collaborator)

I looked into the process and memory issues. Travis only gives you 3 GB of memory and 2 cores. The combination of parallelizing the individual tests and then parallelizing the different p-value calculations could cause Travis to time out.

We could use just one joblib process if the environment variable "TRAVIS" is set.
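
A sketch of that check (Travis CI sets the TRAVIS environment variable to "true" in its builds):

    import os

    # Fall back to a single worker on Travis; None means one process per CPU.
    n_processes = 1 if os.environ.get("TRAVIS") == "true" else None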

@coveralls: Coverage decreased (-9.8%) to 83.747% when pulling 28ff4c2 on issues-4 into 06dd31a on master.

@coveralls: Coverage decreased (-9.8%) to 83.747% when pulling 3310e78 on issues-4 into 06dd31a on master.

@MaxBenChrist (Collaborator)

pytest-xdist was set to 4 workers, which may be too much for 3 GB of RAM and 2 cores. I set it to auto.

@coveralls commented Nov 12, 2016: Coverage decreased (-9.8%) to 83.747% when pulling d2cc5f7 on issues-4 into 06dd31a on master.

@MaxBenChrist (Collaborator) commented Nov 12, 2016

Finally it is passing... so we should deploy xdist with only 2 workers. I would prefer to use -n auto in setup.cfg and then change this line in the .travis.yml.

Furthermore, we have to look into that coverage drop.
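
For reference, the setup.cfg change being proposed would look roughly like this; the section header may be [pytest] or [tool:pytest] depending on the pytest version in use:

    # setup.cfg (sketch)
    [pytest]
    addopts = -n auto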

@coveralls commented Nov 12, 2016: Coverage decreased (-9.8%) to 83.747% when pulling 3d5c11f on issues-4 into 06dd31a on master.

@coveralls: Coverage decreased (-9.8%) to 83.747% when pulling c28c3a6 on issues-4 into 06dd31a on master.

@MaxBenChrist (Collaborator) commented Nov 12, 2016

I don't understand Travis's behavior at this point. It stalls at random unit tests. If I reduce the number of xdist worker nodes to 1 or 2, it passes on Python 3.5.2 but fails on Python 2.7. The error I get is:

"No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself."

I cannot reproduce any of the errors locally. I will have to set up a local Travis Docker image and investigate further.

@coveralls: Coverage decreased (-9.8%) to 83.747% when pulling 9b52448 on issues-4 into 06dd31a on master.

@coveralls commented Nov 13, 2016: Coverage decreased (-9.8%) to 83.747% when pulling d8a7f34 on issues-4 into 06dd31a on master.

@jneuff (Collaborator, Author) commented Nov 13, 2016

I couldn't reproduce the stalling tests in a similarly sized VM with the same operating system as Travis. I have yet to try with Docker.

Looking around a bit, I found a Travis config containing:

# The next couple lines fix a crash with multiprocessing on Travis and are not specific to using Miniconda
  - sudo rm -rf /dev/shm
  - sudo ln -s /run/shm /dev/shm

Maybe they ran into the same issue. However, I'd rather not switch to a sudo-enabled Travis image.

We could just set the default number of processes to 1 and leave it at that. An alternative would be to set an environment variable in the travis file and set the number of processes accordingly.

Regarding the unit test for parallel extraction: in my opinion we don't need it. We rely on multiprocessing.Pool to do its job, and I trust it is tested to work as expected.

@MaxBenChrist (Collaborator) commented Nov 13, 2016

> We could just set the default number of processes to 1 and leave it at that. An alternative would be to set an environment variable in the travis file and set the number of processes accordingly.

The thing is, sometimes the stalling tests are not even the ones where the Pool method is called.

I also tried reducing the number of processes to 1, see commit 3cc9d56. The tests still failed :-/

> Regarding the unit test for parallel extraction: in my opinion we don't need it. We rely on multiprocessing.Pool to do its job, and I trust it is tested to work as expected.

We should at least test the aggregation of the different jobs for the different kinds of time series. Maybe we can mock the library?
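
A self-contained sketch of that idea: replace multiprocessing.Pool with an in-process stand-in and check that mapping jobs over the pool and aggregating still gives the serial result. The toy aggregation function stands in for the per-kind extraction; none of these names exist in tsfresh.

    import multiprocessing
    from unittest import mock  # on Python 2 this would be the external `mock` package


    class SerialPool(object):
        """Stand-in for multiprocessing.Pool that runs jobs in the current process."""

        def __init__(self, *args, **kwargs):
            pass

        def map(self, func, iterable):
            return [func(item) for item in iterable]

        def close(self):
            pass

        def join(self):
            pass


    def _square(x):
        return x * x


    def aggregate_with_pool(items):
        """Toy stand-in for the per-kind extraction: map jobs over a pool, then combine."""
        pool = multiprocessing.Pool()
        try:
            results = pool.map(_square, items)
        finally:
            pool.close()
            pool.join()
        return sum(results)


    def test_aggregation_without_real_processes():
        # The patch target must be the name the module under test actually imports.
        with mock.patch("multiprocessing.Pool", SerialPool):
            assert aggregate_with_pool([1, 2, 3]) == 14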

@MaxBenChrist (Collaborator)

I think we have to debug this in a Docker box.

Here is an explanation of how to set up a Travis Docker box:
https://docs.travis-ci.com/user/common-build-problems/#Troubleshooting-Locally-in-a-Docker-Image

@MaxBenChrist (Collaborator)

Travis is weird. If I repeat the branch testing with Python 2.7 and Python 3.5, Python 2.7 fails. If I then restart just the Python 2.7 job, it passes.

Anyway, I turned off xdist and joblib and got a memory allocation error. So it seems that memory is indeed the issue here.

@MaxBenChrist removed their assignment Nov 21, 2016
@nils-braun mentioned this pull request Nov 24, 2016
@coveralls commented Nov 25, 2016: Coverage decreased (-8.7%) to 85.571% when pulling 1c24fec on issues-4 into aa42083 on master.

@MaxBenChrist (Collaborator)

Done in 80803d6

@nils-braun deleted the issues-4 branch December 14, 2016 21:50
@nils-braun added this to the Speed milestone Dec 26, 2016