Ilastik API: prediction gets stuck when not run in main process #2517

Open
constantinpape opened this issue Jan 4, 2022 · 4 comments · May be fixed by #2604
Labels
api ilastik.experimental.api

Comments

@constantinpape
Member

constantinpape commented Jan 4, 2022

Describe the bug

Prediction with the ilastik API gets stuck when it's not run in the main process.

To Reproduce

import numpy as np
from ilastik.experimental.api import from_project_file
from concurrent import futures
from xarray import DataArray

ILP = "vnc2d.ilp"


def predict_ilp():
    ilp = from_project_file(ILP)
    inp = DataArray(np.random.rand(1, 128, 128), dims=("z", "y", "x"))
    pred = ilp.predict(inp)
    print("Passed", pred.shape)


def predict_in_thread_or_process(pool):
    with pool(1) as ex:
        task = ex.submit(predict_ilp)
        task.result()


print("Main process")
predict_ilp()
print("In thread")
predict_in_thread_or_process(futures.ThreadPoolExecutor)
print("In process)
predict_in_thread_or_process(futures.ProcessPoolExecutor)

Prediction in the main process and in a thread work, but prediction with the process pool does not terminate.
The project is available here. (But the same issue will most likely occur for any other project).

Why is this needed?

I want to use the ilastik API in a PyTorch data loader, which uses multiprocessing (and this can't be changed...).
I assume that this issue might be hard to fix properly due to the whole lazyflow / greenlet stuff.
Maybe a workaround would be to expose the classifier and the feature functionality through the API so that they can be used without running full ilastik inference, e.g. get_classifier (returns the RF) and get_features (which either returns a callable that computes the features from raw input, or just the filter and sigma values to be interpreted by the user).
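To illustrate, a purely hypothetical sketch of how that could look from the user side (neither get_classifier nor get_features exists in the current API; the names and return types here are just assumptions, e.g. an sklearn-style classifier):

# purely hypothetical sketch - get_classifier / get_features are not part of
# the current ilastik.experimental.api; names and return types are assumptions
import numpy as np
from ilastik.experimental.api import from_project_file

ilp = from_project_file("vnc2d.ilp")

rf = ilp.get_classifier()               # hypothetical: the trained random forest
compute_features = ilp.get_features()   # hypothetical: callable raw -> feature stack

raw = np.random.rand(128, 128).astype("float32")
feat = compute_features(raw)            # e.g. shape (y, x, n_features)
prob = rf.predict_proba(feat.reshape(-1, feat.shape[-1]))
print(prob.shape)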

Desktop (please complete the following information):

  • ilastik version: current main
  • OS: Ubuntu 20.04
@k-dominik
Contributor

k-dominik commented Jan 10, 2022

Thanks for providing the concise example - very easy to reproduce.

The root cause seems to be ilastik/lazyflow creating worker threads once imported. These will not be available to the forked processes.
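
A minimal, ilastik-free sketch of that failure mode (assuming the default fork start method on Linux; the timeout is only there to turn the hang into a visible exception):

# worker thread started at import time in the parent; it is not recreated
# in forked children, so anything waiting on it blocks there forever
import os
import queue
import threading
from concurrent import futures

tasks = queue.Queue()


def _worker():
    while True:
        fn, reply = tasks.get()
        reply.put(fn())


# started once, in the main process, at "import time"
threading.Thread(target=_worker, daemon=True).start()


def run_via_pool():
    reply = queue.Queue()
    tasks.put((lambda: os.getpid(), reply))
    # in a forked child there is no worker thread serving the queue,
    # so without the timeout this get() would never return
    return reply.get(timeout=5)


if __name__ == "__main__":
    print("main:", run_via_pool())  # works
    with futures.ProcessPoolExecutor(1) as ex:
        ex.submit(run_via_pool).result()  # raises queue.Empty after 5 s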

Currently, there are a couple of workarounds, but I'm not sure if any of those are applicable in your case.

  1. Never import anything from ilastik in your main process
    Your code example will run through if you put the import (from ilastik.experimental.api import from_project_file) into the predict_ilp function, and never call this function from your main process.
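    A minimal sketch of this variant (your example with the import moved into predict_ilp and no call from the main process; untested here):
    import numpy as np
    from concurrent import futures
    from xarray import DataArray

    ILP = "vnc2d.ilp"


    def predict_ilp():
        # import ilastik only inside the worker function, so the lazyflow
        # worker threads are created in whichever process actually runs it
        from ilastik.experimental.api import from_project_file

        ilp = from_project_file(ILP)
        inp = DataArray(np.random.rand(1, 128, 128), dims=("z", "y", "x"))
        pred = ilp.predict(inp)
        print("Passed", pred.shape)


    def predict_in_thread_or_process(pool):
        with pool(1) as ex:
            task = ex.submit(predict_ilp)
            task.result()


    if __name__ == "__main__":
        print("In thread")
        predict_in_thread_or_process(futures.ThreadPoolExecutor)
        print("In process")
        predict_in_thread_or_process(futures.ProcessPoolExecutor)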
  2. Don't fork, but spawn (https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods). Your example works for me if I change it as follows (this is noticeably slower than forking):
    import numpy as np
    from multiprocessing import set_start_method
    from concurrent import futures
    from xarray import DataArray
    from ilastik.experimental.api import from_project_file   
    
    ILP = "vnc2d.ilp"
    
    
    def predict_ilp():
        ilp = from_project_file(ILP)
        inp = DataArray(np.random.rand(1, 128, 128), dims=("z", "y", "x"))
        pred = ilp.predict(inp)
        print("Passed", pred.shape)
    
    
    def predict_in_thread_or_process(pool):
        with pool(1) as ex:
            task = ex.submit(predict_ilp)
            task.result()
    
    
    if __name__ == "__main__":
        set_start_method("spawn")
    
        print("Main process")
        predict_ilp()
        print("In thread")
        predict_in_thread_or_process(futures.ThreadPoolExecutor)
        print("In process")
        predict_in_thread_or_process(futures.ProcessPoolExecutor)
  3. Don't use worker threads in lazyflow. This might be what you want anyway if you're going the multiprocessing route...
    import numpy as np
    from concurrent import futures
    from xarray import DataArray
    import lazyflow
    from ilastik.experimental.api import from_project_file
    lazyflow.request.Request.reset_thread_pool(0)
    
    ILP = "vnc2d.ilp"
    
    
    def predict_ilp():
        ilp = from_project_file(ILP)
        inp = DataArray(np.random.rand(1, 128, 128), dims=("z", "y", "x"))
        pred = ilp.predict(inp)
        print("Passed", pred.shape)
    
    
    def predict_in_thread_or_process(pool):
        with pool(1) as ex:
            task = ex.submit(predict_ilp)
            task.result()
    
    
    print("Main process")
    predict_ilp()
    print("In thread")
    predict_in_thread_or_process(futures.ThreadPoolExecutor)
    print("In process")
    predict_in_thread_or_process(futures.ProcessPoolExecutor)

would any of those options work for you?

@constantinpape
Member Author

Thanks for looking into this @k-dominik. For my issue, option 3 is the way to go and it's working now :).
It would be good to document this somehow, but feel free to close this issue.

@k-dominik
Contributor

I'd like to keep it open until we find a user-friendly solution to this. I have to admit that I haven't thought about this before - so it's great that you brought it up!

constantinpape added the api ilastik.experimental.api label Feb 7, 2022
k-dominik added a commit to k-dominik/ilastik that referenced this issue Aug 24, 2022
this enables external parallelization via multiprocessing

fixes ilastik#2517
k-dominik linked a pull request Aug 24, 2022 that will close this issue
k-dominik added a commit to k-dominik/ilastik that referenced this issue Aug 25, 2022
any import from experimental will reset the threadpool to 0
in order to have all other tests run normally, we reset it after
each test module for the experimental tests
further reference:
see ilastik#2517
@imagesc-bot

This issue has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/documentation-ilastik-api/83202/2
