Missing __name__ = '__main__' guard for Multiprocessing on windows #185

Closed
ShahuN-107 opened this issue Apr 2, 2017 · 19 comments
@ShahuN-107

ShahuN-107 commented Apr 2, 2017

Hi All,
I've got the following problem:

  1. Windows 7: Ultimate
  2. tsfresh==0.7.0
  3. The data on which the problem occurred: CV_50_100.csv
    (have many more similar, but just uploading one)
    CV_50_100.zip

from tsfresh import extract_features
import pandas as pd

df = pd.read_csv('CV_50_100.csv')

feat = extract_features(df, column_id='T1')

Also breaks with:

from tsfresh import extract_features
import pandas as pd

df = pd.read_csv('CV_50_100.csv')

feat = extract_features(df, column_id='T1', column_sort='Timestamp')

I've spoken to @MaxBenChrist on Gitter, and he suggested opening this issue.

Edit: Typo in tsfresh version.

@ShahuN-107
Author

As it's a very long error, I decided to post it in a separate comment (so you can delete it if not needed):

Feature Extraction: 0%| | 0/6 [00:00<?, ?it/s]Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 114, in _main
prepare(preparation_data)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 225, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
Traceback (most recent call last):
File "<string>", line 1, in <module>
run_name="__mp_main__")
File "C:\ProgramData\Anaconda3\lib\runpy.py", line 263, in run_path
File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
pkg_name=pkg_name, script_name=fname)
File "C:\ProgramData\Anaconda3\lib\runpy.py", line 96, in _run_module_code
exitcode = _main(fd)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 114, in _main
mod_name, mod_spec, pkg_name, script_name)
File "C:\ProgramData\Anaconda3\lib\runpy.py", line 85, in _run_code
prepare(preparation_data)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 225, in prepare
exec(code, run_globals)
File "C:\Shaun CSC\evertbase2\tstest.py", line 6, in <module>
_fixup_main_from_path(data['init_main_from_path'])
File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
feat = extract_features(df, column_id='T1', column_sort='Timestamp')
File "C:\ProgramData\Anaconda3\lib\site-packages\tsfresh\feature_extraction\extraction.py", line 115, in extract_features
run_name="__mp_main__")column_id, column_value)

File "C:\ProgramData\Anaconda3\lib\runpy.py", line 263, in run_path
File "C:\ProgramData\Anaconda3\lib\site-packages\tsfresh\feature_extraction\extraction.py", line 152, in _extract_features_parallel_per_kind
pool = Pool(settings.n_processes)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 119, in Pool
pkg_name=pkg_name, script_name=fname)
File "C:\ProgramData\Anaconda3\lib\runpy.py", line 96, in _run_module_code
context=self.get_context())
File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 168, in __init__
mod_name, mod_spec, pkg_name, script_name)
File "C:\ProgramData\Anaconda3\lib\runpy.py", line 85, in _run_code
self._repopulate_pool()
File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 233, in _repopulate_pool
exec(code, run_globals)
File "C:\Shaun CSC\evertbase2\tstest.py", line 6, in <module>
feat = extract_features(df, column_id='T1', column_sort='Timestamp')
File "C:\ProgramData\Anaconda3\lib\site-packages\tsfresh\feature_extraction\extraction.py", line 115, in extract_features
w.start()
File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 105, in start
column_id, column_value)
self._popen = self._Popen(self) File "C:\ProgramData\Anaconda3\lib\site-packages\tsfresh\feature_extraction\extraction.py", line 152, in _extract_features_parallel_per_kind

File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
pool = Pool(settings.n_processes)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 119, in Pool
return Popen(process_obj)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
context=self.get_context())prep_data = spawn.get_preparation_data(process_obj._name)

File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 168, in __init__
File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
_check_not_importing_main()
File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
is not going to be frozen to produce an executable.''')
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.
self._repopulate_pool()

@MaxBenChrist
Collaborator

MaxBenChrist commented Apr 2, 2017

This looks like a Windows error related to the parallelization. Can you try to run the same snippet on a Linux or macOS machine?

I do not have access to any Windows machine, so I cannot debug this.

@MaxBenChrist
Collaborator

Your first snippet causes an error because tsfresh treats the Timestamp column as a time series column and expects floats instead of timestamps.

However, the second one passes.

@MaxBenChrist
Collaborator

I don't know if @jneuff or @nils-braun has a Windows machine at hand, but I doubt it :D :D

@ShahuN-107
Author

I have tried this on a Windows 10 machine with the same results.

Thanks in advance,
Shaun

@MaxBenChrist MaxBenChrist changed the title extract features error Multiprocessing on windows Apr 3, 2017
@moritzgelb
Contributor

Hi @ShahuN-107,

I finally succeeded in setting up a Windows environment. :D

The solution for your problem seems rather simple as explained here.
Just change your script to:

from tsfresh import extract_features
import pandas as pd

if __name__ == '__main__':
    df = pd.read_csv('CV_50_100.csv')
    feat = extract_features(df, column_id='T1')

Nevertheless, there is a failure when converting string to float, but this is not related to this issue.

Cheers,
Moritz

@MaxBenChrist
Collaborator

Thanks @moritzgelb, so you are now the tsfresh expert for windows? :D

@moritzgelb moritzgelb self-assigned this Apr 6, 2017
@moritzgelb
Contributor

Yes, seems so. :D

@MaxBenChrist MaxBenChrist changed the title Multiprocessing on windows Missing __name__ = '__main__' guard for Multiprocessing on windows Apr 6, 2017
@MaxBenChrist
Collaborator

I think we should fix that globally:

See those threads

http://stackoverflow.com/questions/29690091/python2-7-exception-the-freeze-support-line-can-be-omitted-if-the-program

http://stackoverflow.com/questions/39468658/figure-out-if-called-from-function-without-main-guard

So, on Windows the multiprocessing library keeps spawning child processes in a loop. We should be able to catch that with an if __name__ == '__main__' guard somewhere. However, I still have to think about where to put that guard. Maybe you have some ideas, @moritzgelb @jneuff @nils-braun?

@MaxBenChrist MaxBenChrist reopened this Apr 6, 2017
@moritzgelb
Contributor

moritzgelb commented Apr 6, 2017

@MaxBenChrist

I'm not sure we should take care of this. As stated in the links you quoted, the multiprocessing failure on Windows can be avoided by using if __name__ == '__main__' in the script that imports the tsfresh functions.
The FAQ now also explains how to fix this.

@MaxBenChrist
Collaborator

MaxBenChrist commented Apr 6, 2017

I think the user experience suffers if one has to wrap the tsfresh calls in the if __name__ == '__main__' guard. We should try to do it internally in tsfresh.

@nils-braun
Collaborator

I totally agree, Max, that the user experience suffers, but as far as I understand it, it is technically not possible to do this at the library level. The script that calls extract_features must handle this, and that script is written by the user, not by us.

@moritzgelb
Contributor

@MaxBenChrist
I suggest closing this issue, since the user should take care of this, as pointed out by Nils.

@MaxBenChrist
Collaborator

MaxBenChrist commented Apr 17, 2017

Okay, I understand that an if __name__ == '__main__' guard needs to be placed in the top-level script, so the user has to add it.

Maybe we can inspect the stack trace inside extract_features to prevent a flood of jobs from spawning? I will read up on that.

So let us keep this issue open until we have a technical argument for why the guard in the top-level script cannot be replaced.
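
The idea of detecting the situation from inside the library could be sketched roughly as follows. This is an illustration under assumptions, not code from tsfresh: the helper names `running_in_worker_process` and `check_can_start_pool` are hypothetical. The sketch relies only on `multiprocessing.current_process()`, whose name is "MainProcess" in the process the user started.

```python
import multiprocessing


def running_in_worker_process():
    """Return True when called from a process started by multiprocessing.

    The process the user launched is named "MainProcess"; children created
    by multiprocessing get other names unless someone renames them.
    """
    return multiprocessing.current_process().name != "MainProcess"


def check_can_start_pool():
    """Hypothetical helper a library could call before creating a Pool.

    It raises a readable error instead of letting a re-imported, unguarded
    script try to start workers from inside a worker.
    """
    if running_in_worker_process():
        raise RuntimeError(
            "Refusing to start worker processes from inside a worker. "
            "Wrap the calling code in \"if __name__ == '__main__':\"."
        )


# In the process the user started, the check passes silently.
check_can_start_pool()
print(running_in_worker_process())
```

A check like this would only replace the long traceback with a shorter message. The unguarded script body is still executed again in each child, so the guard in the top-level script remains the actual fix.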

@MaxBenChrist
Collaborator

Guys, what do you think of adding a check when tsfresh is imported that triggers a warning if Windows is detected?

In this warning we can recommend the main guard.
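
An import-time check of this kind might look like the sketch below. It is an assumption about how such a warning could be written, not what tsfresh does; `MAIN_GUARD_HINT`, `main_guard_hint` and `warn_about_main_guard` are hypothetical names. The platform test uses `sys.platform`, which is "win32" on Windows.

```python
import sys
import warnings

# Hypothetical message text; the wording is illustrative only.
MAIN_GUARD_HINT = (
    "On Windows, multiprocessing starts workers by re-importing the main "
    "script. Put the calling code under \"if __name__ == '__main__':\"."
)


def main_guard_hint(platform_name=sys.platform):
    """Return the hint text on Windows, otherwise None."""
    if platform_name == "win32":
        return MAIN_GUARD_HINT
    return None


def warn_about_main_guard(platform_name=sys.platform):
    """Emit the hint as a UserWarning; meant to be called once at import."""
    hint = main_guard_hint(platform_name)
    if hint is not None:
        warnings.warn(hint, UserWarning, stacklevel=2)


# A library would place this call at the top level of its package.
warn_about_main_guard()
```

One trade-off to keep in mind: when workers are started by re-importing modules, each worker imports the library again, so the warning would probably be repeated once per worker.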

@jscheithe

jscheithe commented Mar 6, 2018

Hey there,

just to let you know: I just spent half a day trying to fix this for my case.

Although this is not an issue of this package, I think it's important to mention it in the documentation.

My solution: put everything in the file you're running inside an if __name__ == '__main__': check (including all imports).
And maybe add a call to multiprocessing.freeze_support() right after the check, too (it seems to depend on your actual machine whether you need this or not).

This worked for me, although only from the command line, not from the IPython console.
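
The layout described above can be sketched with the standard library alone. In this minimal example `square` and `main` are placeholders standing in for the real tsfresh call, and the pool size of 2 is arbitrary. The imports stay at the top here, which is the usual placement; the worker function has to live at module level so that child processes can find it.

```python
import multiprocessing


def square(x):
    """Toy worker; stands in for the real per-chunk computation."""
    return x * x


def main():
    # The Pool is created only inside main(), never at import time.
    with multiprocessing.Pool(processes=2) as pool:
        return pool.map(square, [1, 2, 3])


if __name__ == "__main__":
    # Only matters for scripts frozen into a Windows executable;
    # in a normal interpreter run this call does nothing.
    multiprocessing.freeze_support()
    print(main())
```

The command-line observation fits how multiprocessing works: child processes need to import the main module from a file, and code typed into an interactive session has no such file.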

@nils-braun
Collaborator

It is written in the FAQ. If this is still a problem for users and we need to make it clearer, feel free to reopen.

@sronilsson

sronilsson commented Mar 13, 2021

Got this error on macOS with conda and Python 3.8 (with or without the main guard). It works in Python 3.6, though.
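
A plausible factor here, though the comment does not confirm it, is that Python 3.8 changed the default start method on macOS from fork to spawn, so scripts there start workers the same way as on Windows. The sketch below only reports how the running interpreter is set up; `start_method_report` is a hypothetical helper name.

```python
import multiprocessing
import sys


def start_method_report():
    """Describe how this interpreter would start worker processes."""
    return {
        "python": "%d.%d" % sys.version_info[:2],
        "platform": sys.platform,
        # None means no method has been fixed yet, so the platform
        # default applies when the first process is started.
        "configured": multiprocessing.get_start_method(allow_none=True),
        "available": multiprocessing.get_all_start_methods(),
    }


print(start_method_report())
```

Comparing this output between the Python 3.6 and 3.8 environments would show whether the two differ in their multiprocessing setup.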

@assiswagner

Got the same error on Windows 10, even when using Anaconda. The error is related to multiprocessing.
When I use WSL (Windows Subsystem for Linux), it runs perfectly.


7 participants