Improve server_simple by adding predict_batch and adding GPU option #2064

Merged
merged 4 commits into allenai:master on Nov 17, 2018

Conversation

handsomezebra (Contributor) commented on Nov 16, 2018

Some simple improvements to server_simple.py:

  1. Add a predict_batch method so a client can send a list of items for prediction in a single request.
  2. Add the command-line option --cuda-device (see the sketch after this list).
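
For context, here is a minimal sketch of what the two additions could look like. It is not the merged diff: it assumes the Flask `app`, the argparse `parser`, and the AllenNLP `predictor` that server_simple.py already sets up, and the route and handler names are illustrative. `predict_batch_json` is the Predictor's existing batched prediction method.

```python
# Sketch only: a batched route plus a --cuda-device flag, assuming the
# `app`, `parser`, and `predictor` objects from server_simple.py.
from flask import jsonify, request

@app.route('/predict_batch', methods=['POST'])
def predict_batch():
    # The client POSTs a JSON list of inputs; predict_batch_json runs
    # them through the model as one batch instead of one at a time.
    inputs = request.get_json()
    return jsonify(predictor.predict_batch_json(inputs))

# GPU option (wiring is illustrative):
parser.add_argument('--cuda-device', type=int, default=-1,
                    help='id of the GPU to run the model on, or -1 for CPU')
args = parser.parse_args()
if args.cuda_device >= 0:
    predictor._model.cuda(args.cuda_device)
```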
matt-gardner (Member) commented on Nov 16, 2018

What's the use case for batch predictions in the server? I'm just wondering if this is adding complexity that we don't really need, or if there's a better way to solve the problem here (e.g., just using a Predictor in a python script instead of in a server). It's probably fine, I just want to be sure this is a use case we want to support. Also, @joelgrus, can you take this one (when you have time to get to it)?

handsomezebra (Contributor, Author) commented on Nov 16, 2018

Thanks for the comments. Batch prediction and the GPU option are both useful for improving response time and/or throughput. For example, if I want to tag a long document, I can split it into segments and tag the segments in batches.
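
Purely for illustration (not part of this PR), a client along these lines could do that splitting and send the segments as one batched request; the port, route, and the "sentence" input key are assumptions that depend on the model and on how the server is launched.

```python
# Illustrative client: split a long document into segments and tag them
# in one batched request. URL, port, and the "sentence" key are assumed.
import requests

with open('long_document.txt') as f:
    document = f.read()

segments = [s.strip() for s in document.split('\n') if s.strip()]

response = requests.post(
    'http://localhost:8000/predict_batch',
    json=[{'sentence': segment} for segment in segments],
)
tags_per_segment = response.json()  # one prediction per segment
```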

joelgrus self-requested a review on Nov 16, 2018
matt-gardner (Member) commented on Nov 16, 2018

Well, yes, of course batch prediction makes things faster. I'm just wondering why you're sending them to a server instead of just running the Predictor directly in your script. Going through a server will add all kinds of overhead (serializing things to json, reading them from json, inter-process communication, etc.). I'm not sure in what circumstance you'll actually want to have batched input to a server.

Your "if I have a long document I can split them into segments" statement - is that splitting into segments happening in a browser? Or in a python script? If it's happening in a python script, you are definitely better off just using a Predictor directly. If it's happening in javascript in a browser, then you have a good use-case for this PR.

handsomezebra (Contributor, Author) commented on Nov 16, 2018

In my use case, one request may go through several steps/models to get a final response, so some upstream components preprocess the request and generate inputs to a model. Each step or model is a separate service, which gives better isolation and easier deployment of those components. For example, I may want to host different services on different machines, use GPU machines for some services but CPU machines for others, and, when necessary, create multiple instances of the same model and do load balancing, etc.

joelgrus (Collaborator) left a comment

lgtm, thanks

joelgrus merged commit 2a2d9c9 into allenai:master on Nov 17, 2018
2 of 3 checks passed
codecov/patch: 62% of diff hit (target 90%)
Pull Requests (AllenNLP Library): TeamCity build finished
codecov/project: 92% (-1%) compared to 19c784f