Threads are restarted over and over in tsfresh.extract_features if using multiprocessing #364

jscheithe · 2018-02-28T13:11:42Z

Hi there,

First of all, thanks for this package, I'm using it very happily!

Since yesterday, I can't run tsfresh.extract_features and tsfresh.select_features with n_jobs > 1:

When using IPython, the progress bar in the console stays at 0% forever. In the Task Manager I can see that the processes are started, run for about one second, then die and are restarted over and over.
When I execute the script with python from the command line I get a RuntimeError as pasted below.

The DataFrame I'm passing looks like this:

            LOAD         AX    ...    trial_id
597.0   8.894621  -1.000000    ...         302
598.0   8.546521   0.000000    ...         302
.
.
.
234.0   8.123546   1.234567    ...         303
.
.
.
[57570 rows x 25 columns]

I'm calling tsfresh.extract_features(timeseries, column_id='trial_id', n_jobs=2) and each sub-frame with the same trial_id has the same shape: [57 rows x 25 columns]

I wish I knew what caused the error; this worked until yesterday on my machine (Windows 7 x64).
Also, other scripts using the multiprocessing package work fine.

I'd be glad for any help. Thanks!

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\SCHEITHE\AppData\Local\Continuum\miniconda3\envs\p36\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\SCHEITHE\AppData\Local\Continuum\miniconda3\envs\p36\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "C:\Users\SCHEITHE\AppData\Local\Continuum\miniconda3\envs\p36\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\SCHEITHE\AppData\Local\Continuum\miniconda3\envs\p36\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\Users\SCHEITHE\AppData\Local\Continuum\miniconda3\envs\p36\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Users\SCHEITHE\AppData\Local\Continuum\miniconda3\envs\p36\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Users\SCHEITHE\AppData\Local\Continuum\miniconda3\envs\p36\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "T:\AR\Studenten\Studenten_2017\Jakob_Scheithe\python-projects\bl3data-package\bl3data\trialdb.py", line 489, in <module>
    TDB = TrialDB(data_dict=full_data_dict)
  File "T:\AR\Studenten\Studenten_2017\Jakob_Scheithe\python-projects\bl3data-package\bl3data\trialdb.py", line 117, in __init__
    self.add_data_dict(data_dict)
  File "T:\AR\Studenten\Studenten_2017\Jakob_Scheithe\python-projects\bl3data-package\bl3data\trialdb.py", line 152, in add_data_dict
    self.add_tsfresh_features()
  File "T:\AR\Studenten\Studenten_2017\Jakob_Scheithe\python-projects\bl3data-package\bl3data\trialdb.py", line 239, in add_tsfresh_features
    tsff = fs.tsfresh_features(self)
  File "T:\AR\Studenten\Studenten_2017\Jakob_Scheithe\python-projects\bl3data-package\bl3data\feature_selection.py", line 87, in tsfresh_features
    extracted_features = tsfresh.extract_features(timeseries, column_id='trial_id', n_jobs=2)
  File "C:\Users\SCHEITHE\AppData\Local\Continuum\miniconda3\envs\p36\lib\site-packages\tsfresh\feature_extraction\extraction.py", line 152, in extract_features
    distributor=distributor)
  File "C:\Users\SCHEITHE\AppData\Local\Continuum\miniconda3\envs\p36\lib\site-packages\tsfresh\feature_extraction\extraction.py", line 226, in _do_extraction
    progressbar_title="Feature Extraction")
  File "C:\Users\SCHEITHE\AppData\Local\Continuum\miniconda3\envs\p36\lib\site-packages\tsfresh\utilities\distribution.py", line 341, in __init__
    self.pool = Pool(processes=n_workers)
  File "C:\Users\SCHEITHE\AppData\Local\Continuum\miniconda3\envs\p36\lib\multiprocessing\context.py", line 119, in Pool
    context=self.get_context())
  File "C:\Users\SCHEITHE\AppData\Local\Continuum\miniconda3\envs\p36\lib\multiprocessing\pool.py", line 174, in __init__
    self._repopulate_pool()
  File "C:\Users\SCHEITHE\AppData\Local\Continuum\miniconda3\envs\p36\lib\multiprocessing\pool.py", line 239, in _repopulate_pool
    w.start()
  File "C:\Users\SCHEITHE\AppData\Local\Continuum\miniconda3\envs\p36\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Users\SCHEITHE\AppData\Local\Continuum\miniconda3\envs\p36\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\SCHEITHE\AppData\Local\Continuum\miniconda3\envs\p36\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Users\SCHEITHE\AppData\Local\Continuum\miniconda3\envs\p36\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "C:\Users\SCHEITHE\AppData\Local\Continuum\miniconda3\envs\p36\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
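The traceback shows the mechanism: on Windows, multiprocessing starts children with the "spawn" method, so each child re-runs the main script through runpy.run_path(..., run_name="__mp_main__") (see the spawn.py and runpy.py frames). A top-level call that reaches extract_features therefore runs again inside every child, which then tries to start its own pool while still bootstrapping. The sketch below imitates only that re-run step with the standard library and starts no processes; the two script snippets and the helper simulate_spawn_import are hypothetical illustrations, not part of tsfresh or multiprocessing.

```python
import os
import runpy
import tempfile
import textwrap

# Hypothetical script with a top-level call, like trialdb.py in the traceback.
UNGUARDED = textwrap.dedent("""
    calls = []
    def start_work():
        calls.append("pool started")
    start_work()  # top-level call: runs every time the script is executed
""")

# The same script with the call moved behind the main guard.
GUARDED = textwrap.dedent("""
    calls = []
    def start_work():
        calls.append("pool started")
    if __name__ == "__main__":
        start_work()  # skipped when the script is re-run as "__mp_main__"
""")


def simulate_spawn_import(source):
    """Run `source` the way a spawned child re-runs the main script
    (runpy.run_path with run_name="__mp_main__") and return its globals."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "script.py")
        with open(path, "w") as f:
            f.write(source)
        return runpy.run_path(path, run_name="__mp_main__")


if __name__ == "__main__":
    print(simulate_spawn_import(UNGUARDED)["calls"])  # ['pool started']
    print(simulate_spawn_import(GUARDED)["calls"])    # []
```

In a real spawned child, the unguarded version would call Pool() at this point, which is what _check_not_importing_main rejects with the RuntimeError above.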


jscheithe · 2018-03-01T13:03:16Z

I just encountered this problem in code unrelated to tsfresh.

So most likely this is not a tsfresh problem. Sorry for this!

TheYoxy · 2018-03-29T14:31:59Z

Did you find out what was causing this issue?
I'm having the same one :/

MaxBenChrist · 2018-03-31T11:32:35Z

It's hard for us to judge what causes this problem. Maybe @jscheithe can clarify how he fixed the issue.

Just a guess: did you try to start the feature extraction inside a pool worker?

jscheithe · 2018-04-06T13:15:03Z

Yes, I did! Sorry I didn't post an update.

This is the same as #185. There I wrote:

My solution: put everything in the file you're running inside an if __name__ == '__main__': check (including all imports).
Maybe also add a call to multiprocessing.freeze_support() right after the check (whether you need this seems to depend on your machine).

Also, even with the guard, the problem occurs when I start the script from an IPython console. On my machine, I actually have to start all scripts that involve multiprocessing from the command line...
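A minimal sketch of that layout with plain multiprocessing (no tsfresh involved; square and main are hypothetical names). In this sketch imports and function definitions stay at module level, which works because executing them has no side effects; only the code that actually starts processes sits behind the guard. Per the Python documentation, freeze_support() has no effect unless the program has been frozen into a Windows executable, so calling it is harmless elsewhere.

```python
import multiprocessing as mp


def square(x):
    # Worker function at module level, so a spawned child can find it
    # after re-running this script.
    return x * x


def main():
    # The only place where processes are started; never called at import time.
    with mp.Pool(processes=2) as pool:
        print(pool.map(square, range(5)))  # [0, 1, 4, 9, 16]


if __name__ == "__main__":
    # Only relevant for frozen Windows executables; must come first in the guard.
    mp.freeze_support()
    main()
```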

directnirvana · 2018-04-07T07:04:44Z

I am getting the same error, and it appears to be fixed with if __name__ == '__main__': as well. However, I am attempting to write a module for other programs to import, and in those cases, of course, __name__ does not equal '__main__'. I've tried __name__ == moduleName:, but that doesn't seem to have the same protective effect. Are there any other suggestions on how to avoid this? Perhaps by disabling multiprocessing to avoid spawning the child processes?
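One way to structure such a module, sketched with the standard library only (summarize and summarize_all are hypothetical stand-ins for real feature extraction): do no work at import time, and expose a worker-count parameter in the spirit of tsfresh's n_jobs, where, according to the tsfresh API documentation, n_jobs=0 disables parallelization. With a sequential default, importing programs need no main guard; only callers that opt into a pool must provide one in their own script.

```python
import multiprocessing as mp
from statistics import mean


def summarize(series):
    # Hypothetical per-series computation standing in for feature extraction.
    return {"mean": mean(series), "max": max(series)}


def summarize_all(series_list, n_jobs=0):
    """Library entry point; nothing runs when the module is imported.

    n_jobs=0 runs sequentially in the calling process, so no child
    processes are started and no main guard is needed. n_jobs > 0 uses
    a pool, and then the calling script must have a main guard.
    """
    if n_jobs == 0:
        return [summarize(s) for s in series_list]
    with mp.Pool(processes=n_jobs) as pool:
        return pool.map(summarize, series_list)


if __name__ == "__main__":
    data = [[1, 2, 3], [4, 5]]
    print(summarize_all(data))            # sequential: safe anywhere
    print(summarize_all(data, n_jobs=2))  # parallel: relies on this guard
```

The design choice is that the module never decides on its own to start processes; the top-level script, which is the one place that can reliably hold the main guard, makes that decision.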

jscheithe · 2018-04-09T16:22:03Z

This should only happen if your module contains direct calls to a function that uses multiprocessing (or imports a module that does).

If so, you can consider rewriting something like this:

import tsfresh
...
tsfresh.extract_features(..)

to something like this:

import tsfresh
...
def my_extraction():
    return tsfresh.extract_features(..)

I don't exactly understand why, but for me it also helped to

if __name__ == '__main__':
    import mymodule

in the script I was running.

I'm not sure this is correct, so maybe someone who understands this better can be of more help.

MaxBenChrist · 2018-04-09T16:39:19Z

Unfortunately, this is a Windows-specific issue within the multiprocessing library, see #185.

From what I understand, there is nothing we can do inside the tsfresh library to catch this.

jscheithe · 2018-04-10T11:54:13Z

However, I think it is important to mention this in the documentation and provide some workaround(s).

This can be a really frustrating problem, especially if you're new to multiprocessing (on Windows).

MaxBenChrist · 2018-04-10T12:04:04Z

We touch on this topic in the FAQ: http://tsfresh.readthedocs.io/en/latest/text/faq.html

Where would you expect this information as a user?

jscheithe · 2018-04-12T10:32:46Z

E.g., for tsfresh.feature_extraction.extraction.extract_features, the default value of the parameter n_jobs is 2.

So I think there are two options:

1. Change the default value to 1 and warn in http://tsfresh.readthedocs.io/en/latest/api/tsfresh.feature_extraction.html that changing the parameter might be problematic on Windows (though in general it is a good idea!).
2. Keep n_jobs=2 as the default and warn about the problematic behaviour right in the quickstart section (and in the API reference).

I think I'd prefer the second one.
However, this is just my viewpoint as a rather inexperienced user.

directnirvana · 2018-05-15T07:21:51Z

Thanks for the responses. I see that the FAQ does mention the issue now, and I suppose that is where I would have put it as well. I think @jscheithe's suggestions are worth considering; alternatively, the FAQ could describe the problem explicitly as a Windows problem, since novice users such as myself may not realize that tsfresh uses multiprocessing at all.

tccf1109 · 2022-02-10T15:36:50Z

Hello everyone, I am new to tsfresh. I ran into the same problem as described above, and I can indeed resolve it with an if __name__ == "__main__": guard.

For my use case, I want to call a method that takes a DataFrame as input, extracts features, and returns a DataFrame with the extracted features.

I've tried the following, where in the .py file I put all the imports and the function I want to use inside an if __name__ == "__main__": block. In this case, however, I cannot call the function and get "AttributeError: module 'feature_extraction.FirstTest' has no attribute 'my_extraction'".

if __name__ == "__main__":
    import pandas as pd
    from tsfresh import extract_features
    from tsfresh.examples.robot_execution_failures import (
        download_robot_execution_failures,
        load_robot_execution_failures,
    )
    from tsfresh.feature_extraction import MinimalFCParameters

    def my_extraction():
        # Download the dataset
        download_robot_execution_failures()

        # Load the dataset
        timeseries, y = load_robot_execution_failures()

        # Extract just the first time series from the dataset
        pattern = timeseries.loc[timeseries["id"] == 1]

        # Remove all variables other than the minimum
        pattern = pattern.filter(items=["id", "time", "F_z"])

        # Only use minimal features
        settings = MinimalFCParameters()
        # Extract the features from the single pattern
        extracted_features = extract_features(
            pattern, column_id="id", column_sort="time", default_fc_parameters=settings
        )

        print(extracted_features)
        return extracted_features

kempa-liehr · 2022-02-10T18:44:15Z

Hi @tccf1109,
All imports and the definition of function my_extraction() have to be located before the if __name__ == "__main__": clause. Then, you can call my_extraction() from within the if-clause.

Cheers,
Andreas

import pandas as pd
from tsfresh import extract_features
from tsfresh.examples.robot_execution_failures import (
    download_robot_execution_failures,
    load_robot_execution_failures,
)
from tsfresh.feature_extraction import MinimalFCParameters

def my_extraction():
    # Download the dataset
    download_robot_execution_failures()

    # Load the dataset
    timeseries, y = load_robot_execution_failures()

    # Extract just the first time series from the dataset
    pattern = timeseries.loc[timeseries["id"] == 1]

    # Remove all variables other than the minimum
    pattern = pattern.filter(items=["id", "time", "F_z"])

    # Only use minimal features
    settings = MinimalFCParameters()
    # Extract the features from the single pattern
    extracted_features = extract_features(
        pattern, column_id="id", column_sort="time", default_fc_parameters=settings
    )

    print(extracted_features)
    return extracted_features


if __name__ == "__main__":
    my_extraction()
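Why the first version failed: when feature_extraction.FirstTest is imported, its __name__ is the module name rather than "__main__", so the def inside the guard never executes and the module ends up without a my_extraction attribute. A small sketch that imitates an import with types.ModuleType and exec (the helper import_from_source is hypothetical, and the function bodies are trimmed placeholders) shows the difference between the two layouts:

```python
import types

# tccf1109's layout: the function only exists when the file is run as a script.
BROKEN = '''
if __name__ == "__main__":
    def my_extraction():
        return "features"
'''

# Andreas's layout: definitions at module level, only the call is guarded.
FIXED = '''
def my_extraction():
    return "features"

if __name__ == "__main__":
    my_extraction()
'''


def import_from_source(name, source):
    """Build a module like `import name` would: __name__ is `name`, not "__main__"."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


if __name__ == "__main__":
    broken = import_from_source("feature_extraction.FirstTest", BROKEN)
    fixed = import_from_source("feature_extraction.FirstTest", FIXED)
    print(hasattr(broken, "my_extraction"))  # False
    print(fixed.my_extraction())             # features
```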

jscheithe closed this as completed Mar 1, 2018

nils-braun mentioned this issue on Aug 28, 2020 in #750 "Make sure the main-guard is prominently mentioned in our docu" (open).
