
Fix for CI problem - closing multiprocessing.pool again #403

Merged
merged 6 commits on Jun 25, 2020

Conversation

PhilipMay
Contributor

@PhilipMay PhilipMay commented Jun 6, 2020

For the background of this PR see here and the following comments: #385 (comment)

TODO

  • should we write a regression test with an additional dependency on psutil? Details below.
  • what about num_processes = mp.cpu_count() - 1 vs. num_processes = mp.cpu_count()? Details below.
  • write docstrings
  • check examples and other documentation
  • should other code be changed to close the pool?

@PhilipMay
Contributor Author

The docstring of the Inferencer class says that num_processes, when set to None, lets the Inferencer use all CPU cores. Is that true, given that we set num_processes = mp.cpu_count() - 1?

Or do we do that because the root task also needs CPU time?

@PhilipMay
Contributor Author

PhilipMay commented Jun 6, 2020

I am thinking of adding a test (assert) to check that all child processes are closed (as a regression test). That would require a new dependency called psutil.

Code would be something like this:

import psutil
current_process = psutil.Process()
children = current_process.children()  # direct child processes still running
assert len(children) == 0

What do you think?
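A variant of this check without the extra dependency could use the standard library's multiprocessing.active_children(), which also reaps finished children (a sketch for comparison, not part of this PR's code):

```python
import multiprocessing as mp

# Open and properly close a worker pool, then verify that no child
# processes of the current process are left running.
pool = mp.Pool(processes=2)
pool.close()  # no more tasks will be submitted
pool.join()   # wait for all workers to exit

# active_children() joins and discards already-finished children,
# so a cleanly closed pool leaves this list empty.
assert mp.active_children() == []
```

Unlike psutil, this only sees children started via multiprocessing, which is exactly what the regression test cares about here.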

@PhilipMay
Contributor Author

PhilipMay commented Jun 7, 2020

More than 25 builds in a row have been green. I think the CI problem (#385) is fixed. ;-)

@PhilipMay PhilipMay changed the title Fix for unclosed worker pool - which might cause the CI problems Fix for CI problem - closing multiprocessing.pool again Jun 7, 2020
@tanaysoni tanaysoni self-requested a review June 8, 2020 13:07
@tanaysoni
Contributor

Hi @PhilipMay, thank you for the PR and comprehensive tests!

farm/infer.py Outdated
except AttributeError:
pass
yield from predictions
# Use context manager to close the pool again
Contributor

With this approach, a new instance of multiprocessing.Pool is created for each inference request. The spawning and teardown of the Pool may cause problems when deploying the REST APIs using gunicorn/uvicorn/fastapi stack.

Additionally, there's some overhead of Pool creation for each inference request when processing large streams.

What do you think?

Contributor Author

@PhilipMay PhilipMay Jun 8, 2020

Well - there are many things that come to mind:

  1. I agree that the fix in this form does not seem to be optimal and needs to be improved. But it has revealed that the CI problem is caused by an unclosed worker pool. IMO, as a quick fix, this test (test_inference.py) should temporarily be set to single-threaded to avoid the CI problems. This way the other PRs are not "disturbed" by random CI failures.

  2. I agree - creating and closing the worker pool is ugly. A solution could be to expose the context manager to the client. The client would then use this CM to bundle requests under it. What do you think?

  3. A more general question (issue): When you use gunicorn or something else to run Flask in a production environment and start a worker pool that accepts HTTP requests from clients: is it really wise to start subprocesses in these pooled workers? In Java (Tomcat, ...) it would be really evil to do this. I am not sure about Python. If I have Gunicorn running with 5 workers to accept requests and a CPU with 6 cores and 12 hyperthreads, I do not think it is useful for the 5 workers to each start 6 or 12 pooled subprocesses (30 to 60 subprocesses) under load. IMO the architecture needs to be changed when you want to do stuff like that. The answer could be this pattern:

Use a REST pattern that accepts the data and returns a URL with a time estimate. The client can then poll this URL for an answer: it either gets the answer or the info that the answer is not available yet. In the background, the Flask code takes the request, pushes it to an async messaging system, returns the URL to the client, and is done. See Celery / RabbitMQ / Redis / etc. This way you can scale your ML resources by just starting / stopping additional ML servers that listen and answer on the message queue.

Contributor Author

@PhilipMay PhilipMay Jun 8, 2020

Regarding item 2 above: Even simpler would be to design the API in a way that lets the API client pass in a Pool instance, delegating the opening and closing to the client. Then we could add some documentation about the usage and everything is clean.

The client creates a Pool (with a context manager if he wants) and then passes it to us. He can call us as many times as he wants, bundling the requests and avoiding unnecessary pool creation and closing. When the client is done with our API, he is responsible for closing the pool again.

Contributor Author

@PhilipMay PhilipMay Jun 8, 2020

I suggest extending the signature of inference_from_dicts in infer.py with an additional parameter called multiprocessing_pool. Then pass it to _get_predictions_and_aggregate and simply use it there. No pool creation in our API; the API client has to do that.

Then we could add defensive code to inference_from_dicts to check that the parameter configuration / combination is valid.

What do you think?
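Roughly sketched, such a client-managed pool parameter could look like this (toy code; the function bodies and the dummy prediction logic are illustrative assumptions, not FARM's actual implementation):

```python
from multiprocessing import Pool


def _predict_one(sample):
    # Placeholder for the real per-sample inference work.
    return {"text": sample["text"], "label": "dummy"}


def inference_from_dicts(dicts, multiprocessing_pool=None):
    """Run inference, never creating or closing a pool ourselves.

    If the caller passes a Pool it is used; opening and closing it
    stays entirely the caller's responsibility.
    """
    if multiprocessing_pool is None:
        return [_predict_one(d) for d in dicts]  # single-process default
    return multiprocessing_pool.map(_predict_one, dicts)


if __name__ == "__main__":
    data = [{"text": "hello"}, {"text": "world"}]
    # The client owns the pool's lifecycle, e.g. via a context manager:
    with Pool(processes=2) as pool:
        print(inference_from_dicts(data, multiprocessing_pool=pool))
```

With this shape, no pool is ever left open by the library: either none is created, or the client's context manager closes it.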

Contributor

Hi @PhilipMay, thank you for the detailed discussion! Here are my thoughts:

  1. I agree that the fix in this form does not seem to be optimal and needs to be improved. But it has revealed that the CI problem is caused by an unclosed worker pool. IMO, as a quick fix, this test (test_inference.py) should temporarily be set to single-threaded to avoid the CI problems. This way the other PRs are not "disturbed" by random CI failures.

Yes, we should do this quick fix for the CI pipeline until we have a more stable solution.

  1. I agree - creating and closing the worker pool is ugly. A solution could be to expose the context manager to the client. The client would then use this CM to bundle requests under it. What do you think?
  2. A more general question (issue): When you use gunicorn or something else to run Flask in a production environment and start a worker pool that accepts HTTP requests from clients: is it really wise to start subprocesses in these pooled workers? In Java (Tomcat, ...) it would be really evil to do this. I am not sure about Python. If I have Gunicorn running with 5 workers to accept requests and a CPU with 6 cores and 12 hyperthreads, I do not think it is useful for the 5 workers to each start 6 or 12 pooled subprocesses (30 to 60 subprocesses) under load. IMO the architecture needs to be changed when you want to do stuff like that. The answer could be this pattern:
     Use a REST pattern that accepts the data and returns a URL with a time estimate. The client can then poll this URL for an answer: it either gets the answer or the info that the answer is not available yet. In the background, the Flask code takes the request, pushes it to an async messaging system, returns the URL to the client, and is done. See Celery / RabbitMQ / Redis / etc. This way you can scale your ML resources by just starting / stopping additional ML servers that listen and answer on the message queue.

Yes, I agree. I think a scalable solution is implementing an async API where we log requests to a message queue (Redis/RabbitMQ) and use Celery to process them as you mentioned. This would also enable possibilities to split an inference request across multiple GPUs and aggregate the result.

Regarding item 2 above: Even simpler would be to design the API in a way that lets the API client pass in a Pool instance, delegating the opening and closing to the client. Then we could add some documentation about the usage and everything is clean.
I suggest extending the signature of inference_from_dicts in infer.py with an additional parameter called multiprocessing_pool. Then pass it to _get_predictions_and_aggregate and simply use it there. No pool creation in our API; the API client has to do that.

I reckon this is a good solution for the API, but it may get a bit difficult for new FARM users to get started with the Inferencer. With the current approach, multiprocessing in the Inferencer can be used without any configuration/param. What do you think?

Contributor Author

@PhilipMay PhilipMay Jun 10, 2020

Yes, we should do this quick fix for the CI pipeline until we have a more stable solution.

Ok - want me to do a PR that temporarily disables multiprocessing in the tests?

Yes, I agree. I think a scalable solution is implementing an async API where we log requests to a message queue (Redis/RabbitMQ) and use Celery to process them as you mentioned. This would also enable possibilities to split an inference request across multiple GPUs and aggregate the result.

Yes. IMO this should not be in the scope of FARM. FARM should just be a tool that is used inside such a tech stack.

I reckon this is a good solution for the API, but it may get a bit difficult for new FARM users to get started with the Inferencer.

Well - with example code I think it should be possible to use. The question is: which alternative do we have that is free of bugs?

With the current approach, multiprocessing in the Inferencer can be used without any configuration/param. What do you think?

Yes. From the API user's perspective the current implementation seems nice. But it has a bug...

Which alternatives do we have? Maybe you could review and discuss my implementation / change below.

Contributor

Ok - want me to do a PR that temporarily disables multiprocessing in the tests?

Yes, that'd be great.

Yes. IMO this should not be in the scope of FARM. FARM should just be a tool that is used inside such a tech stack.

Yes, I agree. This functionality, for instance for the question-answering task, could be implemented in Haystack.

Well - with example code I think it should be possible to use. The question is: which alternative do we have that is free of bugs?
Yes. From the API user's perspective the current implementation seems nice. But it has a bug...

I suspect the current API implementation works in the context of multiprocessing. The multiprocessing.Pool helps to speed up an individual inference request by using multiple CPU cores. It's more optimized for latency than throughput, as we end up with too many processes (Pool x Gunicorn workers). Do you think there are any potential bugs that could arise in inference with this approach?

Contributor Author

@PhilipMay PhilipMay Jun 10, 2020

I suspect the current API implementation works in the context of multiprocessing. The multiprocessing.Pool helps to speed up an individual inference request by using multiple CPU cores. It's more optimized for latency than throughput, as we end up with too many processes (Pool x Gunicorn workers). Do you think there are any potential bugs that could arise in inference with this approach?

IMO the implementation works. When the client of our API wants a pool of workers, he is free to create one exactly how he wants, use it with our API, and close it whenever he wants. That is neither good nor bad; it is how it is.

In the context of a REST API with Flask and so on, I would not use this pooling as a solution. I would use async messaging to connect Flask with another ML server. But it's up to the user (client) of our API how he wants to use our software. All we could do is write something in the docs about this.

Contributor

In the context of a REST API with Flask and so on, I would not use this pooling as a solution. I would use async messaging to connect Flask with another ML server. But it's up to the user (client) of our API how he wants to use our software. All we could do is write something in the docs about this.

I agree. By default, the pooling is turned off here in the API.

@tanaysoni
Contributor

The docstring of the Inferencer class says that num_processes, when set to None, lets the Inferencer use all CPU cores. Is that true, given that we set num_processes = mp.cpu_count() - 1?

Or do we do that because the root task also needs CPU time?

Thank you for bringing this up. The docstrings should be updated to reflect this n-1 behaviour.

@PhilipMay
Contributor Author

PhilipMay commented Jun 8, 2020

Removed the empty commits to trigger CI.

@PhilipMay
Contributor Author

Rebased master.

@PhilipMay
Contributor Author

PhilipMay commented Jun 8, 2020

I implemented the "API user has to provide the multiprocessing pool" change described above. There is some more stuff to do (see first comment on top). But before I continue I would like to get feedback / review on this design. Should I continue in this direction or stop?

@PhilipMay PhilipMay requested a review from tanaysoni June 8, 2020 19:18
@PhilipMay
Contributor Author

@tanaysoni
So what about my proposed pooling fix as it is now with the last commit: 2004e5f

After the discussion above: Do you think this is going in the right direction? Should I add the open TODOs from above? What about my design choices? Do you agree, or should we change direction or stop and do something different?

@tanaysoni
Contributor

@tanaysoni
So what about my proposed pooling fix as it is now with the last commit: 2004e5f

After the discussion above: Do you think this is going in the right direction? Should I add the open TODOs from above? What about my design choices? Do you agree, or should we change direction or stop and do something different?

Hi @PhilipMay, here's what I think:

  1. We fix the tests by running them with a single process by getting rid of num_processes=None parameters.
  2. We keep multiprocessing.Pool as an instance attribute of the Inferencer (like in the current version). It'd work well for doing streaming inference on a large number of documents, as we avoid creating a Pool for each request. For the REST API, we retain the present behaviour of not using a Pool.
  3. We update the docstrings to reflect that n-1 processes are spawned when num_processes is set as None.

@PhilipMay
Contributor Author

@tanaysoni I agree with everything but point 2.

If we keep that, then we (the API client code) have no possibility to close the pool again. That way we open up a big window for very ugly bugs. These unclosed pools can cause memory issues that are very hard to find (as you can see with this CI bug we have).

What is wrong with my suggestion to give the client the possibility to manage the pool himself? He can pass in a pool, we use that pool, and the client can close it again. This pool is optional. If the client does not pass a pool, he gets no pool and is single-threaded as "normal".

@tanaysoni
Contributor

If we keep that, then we (the API client code) have no possibility to close the pool again. That way we open up a big window for very ugly bugs. These unclosed pools can cause memory issues that are very hard to find (as you can see with this CI bug we have).

Yes, using Pools with API is tricky, as you mention. Since it's disabled by default, the users will not run into issues.

What is wrong with my suggestion to give the client the possibility to manage the pool himself? He can pass in a pool, we use that pool, and the client can close it again. This pool is optional. If the client does not pass a pool, he gets no pool and is single-threaded as "normal".

I think it might require more engineering to build a robust solution for using Pool with the APIs. Here are related issues for FastAPI/Gunicorn deployment: tiangolo/fastapi#841 & encode/uvicorn#548.

As we discussed earlier, instead of multiprocessing inside the API, a message-queue/worker-based approach would be a better option for leveraging multiple processes.

@PhilipMay
Contributor Author

Yes, we should do this quick fix for the CI pipeline until we have a more stable solution.

The quick fix PR is here: #406

@PhilipMay
Contributor Author

PhilipMay commented Jun 11, 2020

@tanaysoni
So you suggest changing the default so that no pooling is used (num_processes=0). Right?

My problem with this is that the bug still remains when the API user is not using the default. I see 2 options.

  1. this fix here -> let the API user provide (open / close) the pool
  2. Remove the pooling completely.

Since you are not happy with option 1, what about option 2?

Wouldn't it be better to remove a feature with a bug than to provide an optional feature with an ugly bug that is hard to debug and find?

@PhilipMay
Contributor Author

The discussion has somehow stalled. How can we continue?

Although we have a temporary fix for the CI OOM problem, we still have a bug that opens a worker pool that cannot be closed again.

@tanaysoni
Contributor

Hi @PhilipMay, thank you for your patience.

@tholor, @Timoeller, and I had a discussion. We agree that opening a pool with no option for closing it is not an ideal situation. For that, we can implement a close_multiprocessing_pool() method in the Inferencer and add a docstring suggesting to close the pool when you're done using the Inferencer. This approach will have no breaking changes in FARM.

As for the REST API, FARM currently has a minimal Flask API to serve as an explanatory implementation. For very large scale deployments, as we discussed, having a Celery/RabbitMQ like solution would be a good fit, but that is out-of-scope for FARM. So, for now, we keep the same behaviour of disabling multiprocessing by default in the Flask API.
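A minimal sketch of such a close_multiprocessing_pool() method, assuming a heavily simplified Inferencer (the real FARM class holds far more state; the constructor logic here is illustrative only):

```python
import multiprocessing as mp


class Inferencer:
    """Minimal sketch of an Inferencer that owns an optional worker pool."""

    def __init__(self, num_processes=None):
        # None -> use n-1 workers; 0 -> multiprocessing disabled.
        if num_processes is None:
            num_processes = max(mp.cpu_count() - 1, 1)
        self.process_pool = (
            mp.Pool(processes=num_processes) if num_processes > 0 else None
        )

    def close_multiprocessing_pool(self):
        """Close the worker pool and wait for its processes to exit.

        Call this when done with the Inferencer, ideally in a finally
        block, to avoid leaking child processes.
        """
        if self.process_pool is not None:
            self.process_pool.close()
            self.process_pool.join()
            self.process_pool = None
```

Making the method idempotent (the None check) means callers can safely invoke it more than once.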

@PhilipMay
Contributor Author

@tholor, @Timoeller, and I had a discussion. We agree that opening a pool with no option for closing it is not an ideal situation. For that, we can implement a close_multiprocessing_pool() method in the Inferencer and add a docstring suggesting to close the pool when you're done using the Inferencer. This approach will have no breaking changes in FARM.

Ok. I agree to this because I did not have those potential breaking changes in mind. Want me to implement a suggestion? I would revert this PR to the beginning and fix it here without opening a new one.

@tanaysoni
Contributor

Want me to implement a suggestion?

@PhilipMay, do you mean another approach than close_multiprocessing_pool()? In that case, it might be efficient if we discuss the solution before you put the effort into implementation.

@PhilipMay
Contributor Author

No. I mean the approach suggested by you.

@tanaysoni
Contributor

Yes, that'd be great @PhilipMay!

@PhilipMay PhilipMay closed this Jun 20, 2020
@PhilipMay
Contributor Author

Reverted all changes and rebased master.

- add function
- call function in test
- disabled quick fix
@PhilipMay
Contributor Author

Reopen PR

@PhilipMay PhilipMay reopened this Jun 20, 2020
@PhilipMay
Contributor Author

The docstring of the Inferencer class says that num_processes, when set to None, lets the Inferencer use all CPU cores. Is that true, given that we set num_processes = mp.cpu_count() - 1?
Or do we do that because the root task also needs CPU time?

Thank you for bringing this up. The docstrings should be updated to reflect this n-1 behaviour.

I changed the docstring to reflect that.

@PhilipMay
Contributor Author

PhilipMay commented Jun 21, 2020

@tanaysoni can you please check this idea: #403 (comment)

The question is: should we extend the test and add one more dependency (psutil) to check if all subprocesses are closed? What do you think?

The code would look like this:

import psutil
current_process = psutil.Process()
children = current_process.children()  # direct child processes still running
assert len(children) == 0

@PhilipMay
Contributor Author

PhilipMay commented Jun 21, 2020

I would like to suggest changing the default setting for num_processes from None to 0.

With the default of None we use multiprocessing, which requires special attention (calling close_multiprocessing_pool in a finally block). Setting it to 0 would not require any special attention from the API user because nothing needs to be closed. Also, the examples would not have to be changed (extended), which keeps them easier to read.

@tanaysoni what do you think?
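The "special attention" mentioned here is the try/finally pattern. With a stub stand-in for the Inferencer (illustrative toy code, not FARM's API), it could look like this:

```python
import multiprocessing as mp


class StubInferencer:
    """Stand-in for an Inferencer with a worker pool, for illustration."""

    def __init__(self):
        self.process_pool = mp.Pool(processes=2)

    def inference_from_dicts(self, dicts):
        # The builtin len is picklable, so it serves as a trivial "inference" task.
        return self.process_pool.map(len, dicts)

    def close_multiprocessing_pool(self):
        self.process_pool.close()
        self.process_pool.join()


inferencer = StubInferencer()
try:
    results = inferencer.inference_from_dicts([{"a": 1}, {"b": 2, "c": 3}])
finally:
    # Release the workers even if inference raised.
    inferencer.close_multiprocessing_pool()
```

The finally block guarantees the pool is closed on both the success and the error path, which is exactly the discipline the docstring would ask of API users.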

@tanaysoni
Contributor

The question is: should we extend the test and add one more dependency (psutil) to check if all subprocesses are closed? What do you think?

The code would look like this:

import psutil
current_process = psutil.Process()
children = current_process.children()  # direct child processes still running
assert len(children) == 0

Yes, that sounds good!

I would like to suggest changing the default setting for num_processes from None to 0.

With the default of None we use multiprocessing, which requires special attention (calling close_multiprocessing_pool in a finally block). Setting it to 0 would not require any special attention from the API user because nothing needs to be closed. Also, the examples would not have to be changed (extended), which keeps them easier to read.

@tholor and I had a discussion on this. We think that many new FARM users work with notebooks or Python scripts (similar to the FARM examples). In both cases, the processes eventually terminate upon exit. So, in our opinion, it would be nice to have the performance gains of multiprocessing for new users.

Contributor

@tanaysoni tanaysoni left a comment

Thanks again @PhilipMay for working on this! This looks good to go.

@tanaysoni tanaysoni merged commit 451bc0b into deepset-ai:master Jun 25, 2020
@PhilipMay
Contributor Author

Hi @tanaysoni
That merge was very quick. I was not fully done with this. There are still 2 open TODOs (see first entry on top).
I will just open a new PR for them.
If you want to prepare a release soon - please tell me so I can add the missing stuff.

@tanaysoni
Contributor

Apologies for that @PhilipMay. Yes, I think the easiest is to open a new PR. There isn't a release planned as yet, so you can take your time to add the missing stuff.

@tholor
Member

tholor commented Jun 26, 2020

If the only change there is the additional line of inferencer.close_multiprocessing_pool(), I'll be fine with it. Otherwise, we should discuss this again.
