Fix for CI problem - closing multiprocessing.pool again #403
Conversation
Force-pushed from e4b8dbf to cfe897a.
The docstring of the [...] Or do we do that because the root task also needs CPU time?
I am thinking of adding a test (assert) that checks that all child processes are closed (as a regression test). That would need a new package called psutil. The code would be something like this:

```python
import psutil

# After inference, no worker processes should be left behind.
current_process = psutil.Process()
children = current_process.children()
assert len(children) == 0
```

What do you think?
More than 25 builds in a row have been green. I think the CI problem (#385) is fixed. ;-)
Hi @PhilipMay, thank you for the PR and comprehensive tests!
farm/infer.py (Outdated)

```python
except AttributeError:
    pass
yield from predictions
# Use context manager to close the pool again
```
With this approach, a new instance of `multiprocessing.Pool` is created for each inference request. The spawning and teardown of the Pool may cause problems when deploying the REST APIs using the gunicorn/uvicorn/fastapi stack. Additionally, there's some overhead of Pool creation for each inference request when processing large streams.

What do you think?
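To make the concern concrete, here is a minimal, self-contained sketch of the per-request pool pattern under discussion; the function names and the placeholder work are illustrative, not FARM's actual implementation:

```python
import multiprocessing as mp

def _process_chunk(chunk):
    # Placeholder for the real per-chunk preprocessing / prediction work.
    return [len(d.get("text", "")) for d in chunk]

def inference_from_dicts(dicts, num_processes=4, chunksize=2):
    chunks = [dicts[i:i + chunksize] for i in range(0, len(dicts), chunksize)]
    # A fresh pool is spawned for this single call ...
    with mp.Pool(processes=num_processes) as pool:
        results = pool.map(_process_chunk, chunks)
    # ... and torn down again here, so every request pays the process
    # start-up and teardown cost.
    return [r for chunk_result in results for r in chunk_result]

if __name__ == "__main__":
    print(inference_from_dicts([{"text": "hello"}, {"text": "world"}]))
```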
Well - there are many things that come to my mind:

1. I agree that the fix in this form does not seem to be optimal and needs to be improved. But it has shown that the CI problem is caused by an unclosed worker pool. IMO, as a quick fix, this test (`test_inference.py`) should temporarily be set to single-threaded to avoid the CI problems. This way the other PRs are not "disturbed" by random CI failures.
2. I agree - creating and closing the worker pool is ugly. A solution could be to expose a context manager to the client. The client then has to use this context manager so it can bundle its requests under it (see the sketch after this list). What do you think?
3. A more general question (issue): when you use gunicorn or something else to run Flask in a production environment and start a worker pool that accepts HTTP requests from clients, is it really clever to start subprocesses in these pooled threads? In Java (Tomcat, ...) it would be really evil to do this; I am not sure about Python. When I have Gunicorn running with 5 workers to accept requests and a CPU with 6 cores and 12 hyperthreads, I do not think it is useful for the 5 workers to each start 6 or 12 pooled subprocesses (30 to 60 subprocesses) under load. IMO the architecture needs to be changed when you want to do stuff like that. The answer could be this pattern: use a REST pattern that accepts the data and returns a URL with a time estimate. The client can then poll this URL to get the answer; it either gets the answer or the info that the answer is not available yet. In the background, the Flask code takes the request, pushes it to an async messaging system, returns the URL to the client, and is done. See Celery / RabbitMQ / Redis / etc. This way you can scale your ML resources by just starting/stopping additional ML servers that listen and answer on the message queue.
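Here is a rough, hypothetical sketch of what exposing such a context manager to the client could look like (re: item 2 above). The simplified `Inferencer` class and its `pooled()` method are made-up names for illustration, not the existing FARM API:

```python
import multiprocessing as mp
from contextlib import contextmanager

class Inferencer:
    """Simplified stand-in for FARM's Inferencer."""

    def __init__(self):
        self._pool = None

    @contextmanager
    def pooled(self, num_processes=4):
        # The client opens the pool once and bundles many requests under it.
        self._pool = mp.Pool(processes=num_processes)
        try:
            yield self
        finally:
            # The pool is guaranteed to be closed again, even on errors.
            self._pool.close()
            self._pool.join()
            self._pool = None

    def inference_from_dicts(self, dicts):
        if self._pool is not None:
            return self._pool.map(str, dicts)  # placeholder work
        return [str(d) for d in dicts]         # single-process fallback

if __name__ == "__main__":
    inferencer = Inferencer()
    with inferencer.pooled(num_processes=2) as inf:
        print(inf.inference_from_dicts([{"text": "a"}]))
        print(inf.inference_from_dicts([{"text": "b"}]))
```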
Regarding item 2 above: even simpler would be to design the API in a way that lets the API client pass in a `Pool` instance and delegate the opening and closing to the client. Then we could add some documentation about the usage and everything is clean.

The client creates a Pool (with a context manager if he wants to) and then passes it to us. He can call us as many times as he wants to bundle the requests and avoid unnecessary pool creation and closing. When the client is done with our API, he is responsible for closing the pool again.
I suggest extending the signature of `inference_from_dicts` in `infer.py` with an additional parameter called `multiprocessing_pool`. Then pass that to `_get_predictions_and_aggregate` and just use it there. No pool creation in our API; the API client has to do that.

Then we could add defensive code to `inference_from_dicts` to check if the parameter configuration / combination is valid.

What do you think?
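A minimal sketch of the proposal, using the names from this thread; `inference_from_dicts` and `_get_predictions_and_aggregate` are the existing function names in `infer.py`, while the `multiprocessing_pool` parameter, the type check, and the placeholder `_predict` are only illustrative:

```python
import multiprocessing as mp
from multiprocessing.pool import Pool

def _predict(d):
    # Placeholder for the real per-document model call.
    return {"input": d}

def _get_predictions_and_aggregate(dicts, pool):
    if pool is None:
        # Single-process fallback when the client did not pass a pool.
        return [_predict(d) for d in dicts]
    # Use the client-provided pool; it is never closed inside the API.
    return pool.map(_predict, dicts)

def inference_from_dicts(dicts, multiprocessing_pool=None):
    # Defensive check of the parameter combination, as suggested above.
    if multiprocessing_pool is not None and not isinstance(multiprocessing_pool, Pool):
        raise TypeError("multiprocessing_pool must be a multiprocessing.Pool or None")
    return _get_predictions_and_aggregate(dicts, multiprocessing_pool)

if __name__ == "__main__":
    # The API client owns the pool and closes it via the context manager.
    with mp.Pool(processes=2) as pool:
        print(inference_from_dicts([{"text": "a"}, {"text": "b"}], multiprocessing_pool=pool))
```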
Hi @PhilipMay, thank you for the detailed discussion! Here are my thoughts:

> I agree that the fix in this form does not seem to be optimal and needs to be improved. But it has shown that the CI problem is caused by an unclosed worker pool. IMO, as a quick fix, this test (`test_inference.py`) should temporarily be set to single-threaded to avoid the CI problems. This way the other PRs are not "disturbed" by random CI failures.

Yes, we should do this quick fix for the CI pipeline until we have a more stable solution.

> I agree - creating and closing the worker pool is ugly. A solution could be to expose a context manager to the client. [...]

> A more general question (issue): [...] The answer could be this pattern: use a REST pattern that accepts the data and returns a URL with a time estimate. [...] See Celery / RabbitMQ / Redis / etc. This way you can scale your ML resources by just starting/stopping additional ML servers that listen and answer on the message queue.

Yes, I agree. I think a scalable solution is implementing an async API where we log requests to a message queue (Redis/RabbitMQ) and use Celery to process them, as you mentioned. This would also enable possibilities to split an inference request across multiple GPUs and aggregate the results.

> Regarding item 2 above: even simpler would be to design the API in a way that lets the API client pass in a Pool instance and delegate the opening and closing to the client. [...]

> I suggest extending the signature of `inference_from_dicts` in `infer.py` with an additional parameter called `multiprocessing_pool`. [...] No pool creation in our API; the API client has to do that.

I reckon this is a good solution for the API, but it may get a bit difficult for new FARM users to get started with the Inferencer. With the current approach, multiprocessing in the Inferencer can be used without any configuration/param. What do you think?
> Yes, we should do this quick fix for the CI pipeline until we have a more stable solution.

Ok - want me to do a PR with temporarily disabled multiprocessing in the tests?

> Yes, I agree. I think a scalable solution is implementing an async API where we log requests to a message queue (Redis/RabbitMQ) and use Celery to process them, as you mentioned. This would also enable possibilities to split an inference request across multiple GPUs and aggregate the results.

Yes. IMO this should not be in the scope of FARM. FARM should just be a tool that is used inside such a tech stack.

> I reckon this is a good solution for the API, but it may get a bit difficult for new FARM users to get started with the Inferencer.

Well - with example code I think it should be possible to use. The question is: which alternative do we have that is free of bugs?

> With the current approach, multiprocessing in the Inferencer can be used without any configuration/param. What do you think?

Yes. From an API user's perspective the current implementation seems to be nice. But it has a bug...

Which alternatives do we have? Maybe you could review and discuss my implementation / change from below.
> Ok - want me to do a PR with temporarily disabled multiprocessing in the tests?

Yes, that'd be great.

> Yes. IMO this should not be in the scope of FARM. FARM should just be a tool that is used inside such a tech stack.

Yes, I agree. This functionality, for the question-answering task for instance, could be implemented in Haystack.

> Well - with example code I think it should be possible to use. The question is: which alternative do we have that is free of bugs?

> Yes. From an API user's perspective the current implementation seems to be nice. But it has a bug...

I suspect the current API implementation works in the context of multiprocessing. The `multiprocessing.Pool` helps to speed up an individual inference request by using multiple CPU cores. It's more optimized for latency than throughput, as we end up with too many processes (Pool x Gunicorn workers). Do you think there are any potential bugs that could arise in inference with this approach?
> I suspect the current API implementation works in the context of multiprocessing. The `multiprocessing.Pool` helps to speed up an individual inference request by using multiple CPU cores. It's more optimized for latency than throughput, as we end up with too many processes (Pool x Gunicorn workers). Do you think there are any potential bugs that could arise in inference with this approach?

IMO the implementation works. When the client of our API wants to have a pool of workers, he is free to create one exactly how he wants it to be, use it with our API, and close it whenever he wants. That is not good or bad; it is how it is.

In the context of a REST API with Flask and so on, I would not use this pooling as a solution. I would use async messaging to connect Flask with another ML server. But it's up to the user (client) of our API how he wants to use our software. All we could do is write something about this in the docs.
> In the context of a REST API with Flask and so on, I would not use this pooling as a solution. I would use async messaging to connect Flask with another ML server. But it's up to the user (client) of our API how he wants to use our software. All we could do is write something about this in the docs.

I agree. By default, the pooling is turned off here in the API.

Thank you for bringing this up. The docstrings should be updated to reflect this n-1 behaviour.
Force-pushed from 0eb95f4 to cfe897a.

Removed the empty commits to trigger CI.
Force-pushed from cfe897a to 150b61f.

Rebased master.
I implemented the "API user has to provide the multiprocessing pool" change described above. There is some more stuff to do (see the first comment on top). But before I continue, I would like to get feedback / a review on this design. Should I continue in this direction or stop?
@tanaysoni After the discussion above: do you think this goes in the right direction? Should I add the open TODOs from above? What about my design choices? Do you agree, should we change the direction, or stop and do something different?
Hi @PhilipMay, here's what I think: [...]
@tanaysoni I agree with everything but point 2. If we keep that, then we (the API client code) have no possibility to close the pool again. That way we open up a big window for very ugly bugs. These unclosed pools can cause memory issues that are very hard to find (as you can see with this CI bug we have). What is wrong with my suggestion to give the client the possibility to manage the pool itself? He can pass in a pool, we use that pool, and the client can close it again. This pool is optional; if the client does not pass a pool, he gets no pool and runs single-threaded as "normal".
Yes, using Pools with the API is tricky, as you mention. Since it's disabled by default, the users will not run into issues.

I think it might require more engineering to build a robust solution for using a Pool with the APIs. Here are related issues for FastAPI/Gunicorn deployment: fastapi/fastapi#841 & encode/uvicorn#548. As we discussed earlier, instead of multiprocessing within the API, a message-queue/worker-based approach would be a better option to leverage multiple processes.
The quick fix PR is here: #406
@tanaysoni My problem with this is that the bug still remains when the API user is not using the default. I see 2 options. [...]

Since you are not happy with option 1, what about option 2? Wouldn't it be better to remove a feature with a bug than to provide an optional feature that has an ugly bug that is hard to debug and find?
The discussion has somehow stalled. How can we continue? Although we have a temporary fix for the CI OOM problem, we still have a bug that opens a worker pool that cannot be closed again.
Hi @PhilipMay, thank you for your patience. @tholor, @Timoeller, and I had a discussion. We agree that opening a pool with no option for closing it is not an ideal situation. For that, we can implement a [...].

As for the REST API, FARM currently has a minimal Flask API to serve as an explanatory implementation. For very large scale deployments, as we discussed, having a Celery/RabbitMQ-like solution would be a good fit, but that is out of scope for FARM. So, for now, we keep the same behaviour of disabling multiprocessing by default in the Flask API.
Ok. I agree to this because I did not have those potential breaking changes in mind. Want me to implement the suggestion? I would revert this PR to the beginning to fix it here without opening a new one.
@PhilipMay, do you mean another approach than [...]?
No. I mean the approach suggested by you.

Yes, that'd be great @PhilipMay!
Force-pushed from 2004e5f to 6edf91e.

Reverted all changes and rebased master.
- add function
- call function in test
- disabled quick fix
Reopen PR

I changed the docstring to reflect that.
@tanaysoni can you please check this idea: #403 (comment)

The question is: should we extend the test and add one more dependency (psutil) to check that all subprocesses are closed? What do you think? The code would look like this:

```python
import psutil

current_process = psutil.Process()
children = current_process.children()
assert len(children) == 0
```
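If we go that way, a rough sketch of how the check could be wired into `test_inference.py` as a pytest-style regression test; the test name is a placeholder and the actual inference call is omitted:

```python
import psutil

def test_inferencer_leaves_no_child_processes():
    # ... run an inference that uses the multiprocessing pool here ...
    # Afterwards, no worker processes should be left behind.
    children = psutil.Process().children(recursive=True)
    assert len(children) == 0
```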
I would like to suggest changing the default setting for [...]. When setting it to [...]. @tanaysoni, what do you think?
Yes, that sounds good!

@tholor and I had a discussion on this. We think that many new FARM users work with notebooks or Python scripts (similar to the FARM examples). In both cases, the processes eventually terminate upon exit. So, in our opinion, it would be nice to have the performance gains of multiprocessing for the new users.
Force-pushed from 78288db to 8771792.
Thanks again @PhilipMay for working on this! This looks good-to-go.
Hi @tanaysoni [...]
Apologies for that @PhilipMay. Yes, I think the easiest is to open a new PR. There isn't a release planned as yet, so you can take your time to add the missing stuff.
If the only change there is the additional line of [...]
For the background of this PR see here and the following comments: #385 (comment)

TODO

- Add a regression test that checks all child processes are closed? That would require the additional dependency `psutil`. Details see below.
- `num_processes = mp.cpu_count() - 1` vs. `num_processes = mp.cpu_count()`? Details see below.