
Fix for CI problem - closing multiprocessing.pool again #403

Merged
merged 6 commits on Jun 25, 2020

Conversation

PhilipMay
Contributor

@PhilipMay PhilipMay commented Jun 6, 2020

For the background of this PR see here and the following comments: #385 (comment)

TODO

  • should we write a regression test with an additional dependency on psutil? Details below.
  • what about num_processes = mp.cpu_count() - 1 vs. num_processes = mp.cpu_count()? Details below.
  • write docstrings
  • check examples and other documentation
  • should other code be changed to close the pool?

@PhilipMay
Contributor Author

The docstring of the Inferencer class says that num_processes, when set to None, lets the Inferencer use all CPU cores. Is that true, given that we set num_processes = mp.cpu_count() - 1?

Or do we do that because the root task also needs CPU time?

@PhilipMay
Contributor Author

PhilipMay commented Jun 6, 2020

I am thinking of adding a test (assert) to check that all child processes are closed (as a regression test). That would require a new dependency called psutil.

Code would be something like this:

import psutil
current_process = psutil.Process()
children = current_process.children()  # direct child processes still running
assert len(children) == 0

What do you think?
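A variant of this check without the extra dependency could use the standard library's multiprocessing.active_children(), which also reaps finished children (a sketch for comparison, not part of this PR's code):

```python
import multiprocessing as mp

# Open and properly close a worker pool, then verify that no child
# processes of the current process are left running.
pool = mp.Pool(processes=2)
pool.close()  # no more tasks will be submitted
pool.join()   # wait for all workers to exit

# active_children() joins and discards already-finished children,
# so a cleanly closed pool leaves this list empty.
assert mp.active_children() == []
```

Unlike psutil, this only sees children started via multiprocessing, which is exactly what the regression test cares about here.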

@PhilipMay
Contributor Author

PhilipMay commented Jun 7, 2020

More than 25 builds in a row have been green. I think the CI problem (#385) is fixed. ;-)

@PhilipMay PhilipMay changed the title Fix for unclosed worker pool - which might cause the CI problems Fix for CI problem - closing multiprocessing.pool again Jun 7, 2020
@tanaysoni tanaysoni self-requested a review June 8, 2020 13:07
@tanaysoni
Contributor

Hi @PhilipMay, thank you for the PR and comprehensive tests!

farm/infer.py Outdated
except AttributeError:
pass
yield from predictions
# Use context manager to close the pool again
Contributor

With this approach, a new instance of multiprocessing.Pool is created for each inference request. The spawning and teardown of the Pool may cause problems when deploying the REST APIs using gunicorn/uvicorn/fastapi stack.

Additionally, there's some overhead of Pool creation for each inference request when processing large streams.

What do you think?

Contributor Author

@PhilipMay PhilipMay Jun 8, 2020

Well - there are many things that come to mind:

  1. I agree that the fix in this form does not seem to be optimal and needs to be improved. But it has revealed that the CI problem is caused by an unclosed worker pool. IMO, as a quick fix, this test (test_inference.py) should temporarily be set to single-threaded to avoid the CI problems. This way the other PRs are not "disturbed" by random CI failures.

  2. I agree - creating and closing the worker pool is ugly. A solution could be to expose the context manager to the client. The client would then use this CM to bundle requests under it. What do you think?

  3. A more general question (issue): When you use gunicorn or something else to run Flask in a production environment and start a worker pool that accepts HTTP requests from clients: is it really wise to start subprocesses in these pooled workers? In Java (Tomcat, ...) it would be really evil to do this. I am not sure about Python. If I have Gunicorn running with 5 workers to accept requests and a CPU with 6 cores and 12 hyperthreads, I do not think it is useful for the 5 workers to each start 6 or 12 pooled subprocesses (30 to 60 subprocesses) under load. IMO the architecture needs to be changed when you want to do stuff like that. The answer could be this pattern:

Use a REST pattern that accepts the data and returns a URL with a time estimate. The client can then poll this URL for an answer: it either gets the answer or the info that the answer is not available yet. In the background, the Flask code takes the request, pushes it to an async messaging system, returns the URL to the client, and is done. See Celery / RabbitMQ / Redis / etc. This way you can scale your ML resources by just starting / stopping additional ML servers that listen and answer on the message queue.

Contributor Author

@PhilipMay PhilipMay Jun 8, 2020

Regarding item 2 above: Even simpler would be to design the API in a way that lets the API client pass in a Pool instance, delegating the opening and closing to the client. Then we could add some documentation about the usage and everything is clean.

The client creates a Pool (with a context manager if he wants) and then passes it to us. He can call us as many times as he wants, bundling the requests and avoiding unnecessary pool creation and closing. When the client is done with our API, he is responsible for closing the pool again.

Contributor Author

@PhilipMay PhilipMay Jun 8, 2020

I suggest extending the signature of inference_from_dicts in infer.py with an additional parameter called multiprocessing_pool. Then pass it to _get_predictions_and_aggregate and simply use it there. No pool creation in our API; the API client has to do that.

Then we could add defensive code to inference_from_dicts to check that the parameter configuration / combination is valid.

What do you think?
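Roughly sketched, such a client-managed pool parameter could look like this (toy code; the function bodies and the dummy prediction logic are illustrative assumptions, not FARM's actual implementation):

```python
from multiprocessing import Pool


def _predict_one(sample):
    # Placeholder for the real per-sample inference work.
    return {"text": sample["text"], "label": "dummy"}


def inference_from_dicts(dicts, multiprocessing_pool=None):
    """Run inference, never creating or closing a pool ourselves.

    If the caller passes a Pool it is used; opening and closing it
    stays entirely the caller's responsibility.
    """
    if multiprocessing_pool is None:
        return [_predict_one(d) for d in dicts]  # single-process default
    return multiprocessing_pool.map(_predict_one, dicts)


if __name__ == "__main__":
    data = [{"text": "hello"}, {"text": "world"}]
    # The client owns the pool's lifecycle, e.g. via a context manager:
    with Pool(processes=2) as pool:
        print(inference_from_dicts(data, multiprocessing_pool=pool))
```

With this shape, no pool is ever left open by the library: either none is created, or the client's context manager closes it.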

Contributor

Hi @PhilipMay, thank you for the detailed discussion! Here are my thoughts:

  1. I agree that the fix in this form does not seem to be optimal and needs to be improved. But it has revealed that the CI problem is caused by an unclosed worker pool. IMO, as a quick fix, this test (test_inference.py) should temporarily be set to single-threaded to avoid the CI problems. This way the other PRs are not "disturbed" by random CI failures.

Yes, we should do this quick fix for the CI pipeline until we have a more stable solution.

  1. I agree - creating and closing the worker pool is ugly. A solution could be to expose the context manager to the client. The client would then use this CM to bundle requests under it. What do you think?
  2. A more general question (issue): When you use gunicorn or something else to run Flask in a production environment and start a worker pool that accepts HTTP requests from clients: is it really wise to start subprocesses in these pooled workers? In Java (Tomcat, ...) it would be really evil to do this. I am not sure about Python. If I have Gunicorn running with 5 workers to accept requests and a CPU with 6 cores and 12 hyperthreads, I do not think it is useful for the 5 workers to each start 6 or 12 pooled subprocesses (30 to 60 subprocesses) under load. IMO the architecture needs to be changed when you want to do stuff like that. The answer could be this pattern:
     Use a REST pattern that accepts the data and returns a URL with a time estimate. The client can then poll this URL for an answer: it either gets the answer or the info that the answer is not available yet. In the background, the Flask code takes the request, pushes it to an async messaging system, returns the URL to the client, and is done. See Celery / RabbitMQ / Redis / etc. This way you can scale your ML resources by just starting / stopping additional ML servers that listen and answer on the message queue.

Yes, I agree. I think a scalable solution is implementing an async API where we log requests to a message queue (Redis/RabbitMQ) and use Celery to process them as you mentioned. This would also enable possibilities to split an inference request across multiple GPUs and aggregate the result.

Regarding item 2 above: Even simpler would be to design the API in a way that lets the API client pass in a Pool instance, delegating the opening and closing to the client. Then we could add some documentation about the usage and everything is clean.
I suggest extending the signature of inference_from_dicts in infer.py with an additional parameter called multiprocessing_pool. Then pass it to _get_predictions_and_aggregate and simply use it there. No pool creation in our API; the API client has to do that.

I reckon this is a good solution for the API, but it may get a bit difficult for new FARM users to get started with the Inferencer. With the current approach, multiprocessing in the Inferencer can be used without any configuration/param. What do you think?

Contributor Author

@PhilipMay PhilipMay Jun 10, 2020

Yes, we should do this quick fix for the CI pipeline until we have a more stable solution.

Ok - want me to do a PR that temporarily disables multiprocessing in the tests?

Yes, I agree. I think a scalable solution is implementing an async API where we log requests to a message queue (Redis/RabbitMQ) and use Celery to process them as you mentioned. This would also enable possibilities to split an inference request across multiple GPUs and aggregate the result.

Yes. IMO this should not be in the scope of FARM. FARM should just be a tool that is used inside such a tech stack.

I reckon this is a good solution for the API, but it may get a bit difficult for new FARM users to get started with the Inferencer.

Well - with example code I think it should be possible to use. The question is: which alternative do we have that is free of bugs?

With the current approach, multiprocessing in the Inferencer can be used without any configuration/param. What do you think?

Yes. From the API user's perspective the current implementation seems nice. But it has a bug...

Which alternatives do we have? Maybe you could review and discuss my implementation / change below.

Contributor

Ok - want me to do a PR that temporarily disables multiprocessing in the tests?

Yes, that'd be great.

Yes. IMO this should not be in the scope of FARM. FARM should just be a tool that is used inside such a tech stack.

Yes, I agree. This functionality, for instance for the question-answering task, could be implemented in Haystack.

Well - with example code I think it should be possible to use. The question is: which alternative do we have that is free of bugs?
Yes. From the API user's perspective the current implementation seems nice. But it has a bug...

I suspect the current API implementation works in the context of multiprocessing. The multiprocessing.Pool helps to speed up an individual inference request by using multiple CPU cores. It's more optimized for latency than throughput, as we end up with too many processes (Pool x Gunicorn workers). Do you think there are any potential bugs that could arise in inference with this approach?

Contributor Author

@PhilipMay PhilipMay Jun 10, 2020

I suspect the current API implementation works in the context of multiprocessing. The multiprocessing.Pool helps to speed up an individual inference request by using multiple CPU cores. It's more optimized for latency than throughput, as we end up with too many processes (Pool x Gunicorn workers). Do you think there are any potential bugs that could arise in inference with this approach?

IMO the implementation works. When the client of our API wants a pool of workers, he is free to create one exactly how he wants, use it with our API, and close it whenever he wants. That is neither good nor bad; it is how it is.

In the context of a REST API with Flask and so on, I would not use this pooling as a solution. I would use async messaging to connect Flask with another ML server. But it's up to the user (client) of our API how he wants to use our software. All we could do is write something in the docs about this.

Contributor

In the context of a REST API with Flask and so on, I would not use this pooling as a solution. I would use async messaging to connect Flask with another ML server. But it's up to the user (client) of our API how he wants to use our software. All we could do is write something in the docs about this.

I agree. By default, the pooling is turned off here in the API.

@tanaysoni
Contributor

The docstring of the Inferencer class says that num_processes, when set to None, lets the Inferencer use all CPU cores. Is that true, given that we set num_processes = mp.cpu_count() - 1?

Or do we do that because the root task also needs CPU time?

Thank you for bringing this up. The docstrings should be updated to reflect this n-1 behaviour.

@PhilipMay
Contributor Author

PhilipMay commented Jun 8, 2020

Removed the empty commits to trigger CI.

@PhilipMay
Contributor Author

Rebased master.

@PhilipMay
Contributor Author

PhilipMay commented Jun 8, 2020

I implemented the "API user has to provide the multiprocessing pool" change described above. There is some more stuff to do (see first comment on top). But before I continue I would like to get feedback / review on this design. Should I continue in this direction or stop?

@PhilipMay PhilipMay requested a review from tanaysoni June 8, 2020 19:18
@PhilipMay
Contributor Author

@tanaysoni
So what about my proposed pooling fix as it is now with the last commit: 2004e5f

After the discussion above: Do you think this is going in the right direction? Should I add the open TODOs from above? What about my design choices? Do you agree, or should we change direction or stop and do something different?

@tanaysoni
Contributor

@tanaysoni
So what about my proposed pooling fix as it is now with the last commit: 2004e5f

After the discussion above: Do you think this is going in the right direction? Should I add the open TODOs from above? What about my design choices? Do you agree, or should we change direction or stop and do something different?

Hi @PhilipMay, here's what I think:

  1. We fix the tests by running them with a single process by getting rid of num_processes=None parameters.
  2. We keep multiprocessing.Pool as an instance attribute of the Inferencer (like in the current version). It'd work well for doing streaming inference on a large number of documents, as we avoid creating a Pool for each request. For the REST API, we retain the present behaviour of not using a Pool.
  3. We update the docstrings to reflect that n-1 processes are spawned when num_processes is set as None.

@PhilipMay
Contributor Author

@tanaysoni I agree with everything but point 2.

If we keep that, then we (the API client code) have no possibility to close the pool again. That way we open up a big window for very ugly bugs. These unclosed pools can cause memory issues that are very hard to find (as you can see with this CI bug we have).

What is wrong with my suggestion to give the client the possibility to manage the pool himself? He can pass in a pool, we use that pool, and the client can close it again. This pool is optional. If the client does not pass a pool, he gets no pool and is single-threaded as "normal".

@tanaysoni
Contributor

If we keep that, then we (the API client code) have no possibility to close the pool again. That way we open up a big window for very ugly bugs. These unclosed pools can cause memory issues that are very hard to find (as you can see with this CI bug we have).

Yes, using Pools with API is tricky, as you mention. Since it's disabled by default, the users will not run into issues.

What is wrong with my suggestion to give the client the possibility to manage the pool himself? He can pass in a pool, we use that pool, and the client can close it again. This pool is optional. If the client does not pass a pool, he gets no pool and is single-threaded as "normal".

I think it might require more engineering to build a robust solution for using Pool with the APIs. Here are related issues for FastAPI/Gunicorn deployment: tiangolo/fastapi#841 & encode/uvicorn#548.

As we discussed earlier, instead of multiprocessing inside the API, a message-queue/worker-based approach would be a better option for leveraging multiple processes.

@PhilipMay
Contributor Author

Yes, we should do this quick fix for the CI pipeline until we have a more stable solution.

The quick fix PR is here: #406

@PhilipMay
Contributor Author

PhilipMay commented Jun 11, 2020

@tanaysoni
So you suggest changing the default so that no pooling is used (num_processes=0). Right?

My problem with this is that the bug still remains when the API user is not using the default. I see 2 options.

  1. this fix here -> let the API user provide (open / close) the pool
  2. Remove the pooling completely.

Since you are not happy with option 1, what about option 2?

Wouldn't it be better to remove a feature with a bug than to provide an optional feature with an ugly bug that is hard to debug and find?

@PhilipMay
Contributor Author

The discussion has somehow stalled. How can we continue?

Although we have a temporary fix for the CI OOM problem, we still have a bug that opens a worker pool that cannot be closed again.

@tanaysoni
Contributor

Hi @PhilipMay, thank you for your patience.

@tholor, @Timoeller, and I had a discussion. We agree that opening a pool with no option for closing it is not an ideal situation. For that, we can implement a close_multiprocessing_pool() method in the Inferencer and add a docstring suggesting to close the pool when you're done using the Inferencer. This approach will have no breaking changes in FARM.

As for the REST API, FARM currently has a minimal Flask API to serve as an explanatory implementation. For very large scale deployments, as we discussed, having a Celery/RabbitMQ like solution would be a good fit, but that is out-of-scope for FARM. So, for now, we keep the same behaviour of disabling multiprocessing by default in the Flask API.
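A minimal sketch of such a close_multiprocessing_pool() method, assuming a heavily simplified Inferencer (the real FARM class holds far more state; the constructor logic here is illustrative only):

```python
import multiprocessing as mp


class Inferencer:
    """Minimal sketch of an Inferencer that owns an optional worker pool."""

    def __init__(self, num_processes=None):
        # None -> use n-1 workers; 0 -> multiprocessing disabled.
        if num_processes is None:
            num_processes = max(mp.cpu_count() - 1, 1)
        self.process_pool = (
            mp.Pool(processes=num_processes) if num_processes > 0 else None
        )

    def close_multiprocessing_pool(self):
        """Close the worker pool and wait for its processes to exit.

        Call this when done with the Inferencer, ideally in a finally
        block, to avoid leaking child processes.
        """
        if self.process_pool is not None:
            self.process_pool.close()
            self.process_pool.join()
            self.process_pool = None
```

Making the method idempotent (the None check) means callers can safely invoke it more than once.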

@PhilipMay
Contributor Author

@tholor, @Timoeller, and I had a discussion. We agree that opening a pool with no option for closing it is not an ideal situation. For that, we can implement a close_multiprocessing_pool() method in the Inferencer and add a docstring suggesting to close the pool when you're done using the Inferencer. This approach will have no breaking changes in FARM.

Ok. I agree to this because I did not have those potential breaking changes in mind. Want me to implement a suggestion? I would revert this PR to the beginning and fix it here without opening a new one.

@tanaysoni
Contributor

Want me to implement a suggestion?

@PhilipMay, do you mean another approach than close_multiprocessing_pool()? In that case, it might be efficient if we discuss the solution before you put the effort into implementation.

@PhilipMay
Contributor Author

No. I mean the approach suggested by you.

@tanaysoni
Contributor

Yes, that'd be great @PhilipMay!

@PhilipMay PhilipMay closed this Jun 20, 2020
@PhilipMay
Contributor Author

Reverted all changes and rebased master.

- add function
- call function in test
- disabled quick fix
@PhilipMay
Contributor Author

Reopen PR

@PhilipMay PhilipMay reopened this Jun 20, 2020
@PhilipMay
Contributor Author

The docstring of the Inferencer class says that num_processes, when set to None, lets the Inferencer use all CPU cores. Is that true, given that we set num_processes = mp.cpu_count() - 1?
Or do we do that because the root task also needs CPU time?

Thank you for bringing this up. The docstrings should be updated to reflect this n-1 behaviour.

I changed the docstring to reflect that.

@PhilipMay
Contributor Author

PhilipMay commented Jun 21, 2020

@tanaysoni can you please check this idea: #403 (comment)

The question is: should we extend the test and add one more dependency (psutil) to check if all subprocesses are closed? What do you think?

The code would look like this:

import psutil
current_process = psutil.Process()
children = current_process.children()  # direct child processes still running
assert len(children) == 0

@PhilipMay
Contributor Author

PhilipMay commented Jun 21, 2020

I would like to suggest changing the default setting for num_processes from None to 0.

With the default of None we use multiprocessing, which requires special attention (calling close_multiprocessing_pool in a finally block). Setting it to 0 would not require any special attention from the API user because nothing needs to be closed. Also, the examples would not have to be changed (extended), which keeps them easier to read.

@tanaysoni what do you think?
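The "special attention" mentioned here is the try/finally pattern. With a stub stand-in for the Inferencer (illustrative toy code, not FARM's API), it could look like this:

```python
import multiprocessing as mp


class StubInferencer:
    """Stand-in for an Inferencer with a worker pool, for illustration."""

    def __init__(self):
        self.process_pool = mp.Pool(processes=2)

    def inference_from_dicts(self, dicts):
        # The builtin len is picklable, so it serves as a trivial "inference" task.
        return self.process_pool.map(len, dicts)

    def close_multiprocessing_pool(self):
        self.process_pool.close()
        self.process_pool.join()


inferencer = StubInferencer()
try:
    results = inferencer.inference_from_dicts([{"a": 1}, {"b": 2, "c": 3}])
finally:
    # Release the workers even if inference raised.
    inferencer.close_multiprocessing_pool()
```

The finally block guarantees the pool is closed on both the success and the error path, which is exactly the discipline the docstring would ask of API users.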

@tanaysoni
Contributor

The question is: should we extend the test and add one more dependency (psutil) to check if all subprocesses are closed? What do you think?

The code would look like this:

import psutil
current_process = psutil.Process()
children = current_process.children()  # direct child processes still running
assert len(children) == 0

Yes, that sounds good!

I would like to suggest changing the default setting for num_processes from None to 0.

With the default of None we use multiprocessing, which requires special attention (calling close_multiprocessing_pool in a finally block). Setting it to 0 would not require any special attention from the API user because nothing needs to be closed. Also, the examples would not have to be changed (extended), which keeps them easier to read.

@tholor and I had a discussion on this. We think that many new FARM users work with notebooks or Python scripts (similar to the FARM examples). In both cases, the processes eventually terminate upon exit. So, in our opinion, it would be nice to have the performance gains of multiprocessing for new users.

Contributor

@tanaysoni tanaysoni left a comment

Thanks again @PhilipMay for working on this! This looks good to go.

@tanaysoni tanaysoni merged commit 451bc0b into deepset-ai:master Jun 25, 2020
@PhilipMay
Contributor Author

Hi @tanaysoni
That merge was very quick. I was not fully done with this. There are still 2 open TODOs (see first entry on top).
I will just open a new PR for them.
If you want to prepare a release soon - please tell me so I can add the missing stuff.

@tanaysoni
Contributor

Apologies for that @PhilipMay. Yes, I think the easiest is to open a new PR. There isn't a release planned as yet, so you can take your time to add the missing stuff.

@tholor
Member

tholor commented Jun 26, 2020

If the only change there is the additional line of inferencer.close_multiprocessing_pool(), I'll be fine with it. Otherwise, we should discuss this again.
