Is it a good idea to support predicting multiple instances upon one request? #2929
Currently, we're using the gRPC predict_raw interface of Seldon to do model serving. I wonder whether there are any cons to supporting prediction of a batch of instances (< 20) in a single request, e.g. we are thinking of using a dedicated batch request message, serialized into the binData field of SeldonMessage; similarly, the response would be a batch of results serialized the same way. Is it a good idea?

I also wonder whether supporting batch prediction would make it impossible to integrate with more complex features of Seldon in the future, like those shown in the "Complex Seldon Core Inference Graph" diagram.
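As a rough illustration of the kind of request described above (the actual message definitions from the issue are not reproduced here), a small batch could be packed into binData with a JSON envelope; the `{"instances": ...}` layout is an assumption for the sketch, not part of the Seldon Core API:

```python
# Hypothetical sketch: pack up to ~20 instances into a single SeldonMessage
# via its binData bytes field. The JSON envelope is an assumption.
import json

from seldon_core.proto import prediction_pb2

instances = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # one feature vector per instance
request = prediction_pb2.SeldonMessage(
    binData=json.dumps({"instances": instances}).encode("utf-8")
)

# On the server side, predict_raw would decode the same envelope:
batch = json.loads(request.binData.decode("utf-8"))["instances"]
```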
Comments

Hey @zyxue, I'm not sure if you've checked out the early support for batch processing in Seldon Core yet? One of the points it tries to tackle is precisely how to send and process large batches of requests. It would be good to hear your thoughts on that!

The batch processing in Seldon Core is for offline use (not real time), right?

You can use gRPC with a batch as long as every component in your graph can handle a batch of instances. As @adriangonz mentioned, for offline tasks the batch processing route is probably better suited, but it sounds like you are looking for real time.
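To make the "every component can handle a batch" point concrete, here is a minimal sketch of a batch-capable model for the Seldon Core Python wrapper; the class name and scoring logic are hypothetical, but the predict(X, features_names) entry point is the wrapper's standard interface:

```python
# Illustrative batch-capable model for the Seldon Core Python wrapper.
# When a request carries multiple rows, X arrives as a 2-D array and
# predict() returns one result per row.
import numpy as np

class BatchCapableModel:
    def predict(self, X, features_names=None):
        X = np.atleast_2d(X)                  # shape: (n_instances, n_features)
        return X.sum(axis=1, keepdims=True)   # placeholder for real scoring
```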
Closing now. Please reopen if further discussion is needed.