
Is it a good idea to support predicting multiple instances upon one request? #2929

Closed
zyxue opened this issue Feb 5, 2021 · 4 comments

zyxue commented Feb 5, 2021

Currently, we're using the gRPC and predict_raw interface of Seldon for model serving. I wonder whether there are any cons to supporting batch prediction (< 20 instances per request).

E.g., we are thinking of using messages like

message BatchRequest {
    repeated PredictionRequest requests = 1;
}

message PredictionRequest {
    float feature_1 = 1;  // field types are illustrative
    float feature_2 = 2;
    // ...
}

which is serialized to the binData field of SeldonMessage.

Similarly, the response would be

message BatchResponse {
    repeated PredictionResponse responses = 1;
}
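
For concreteness, a minimal sketch of packing such a batch into binData on the client side might look like the following Python, assuming the messages above are compiled into a (hypothetical) batch_pb2 module and that the stock Seldon v1 prediction proto is available via the seldon_core package:

from seldon_core.proto import prediction_pb2
import batch_pb2  # hypothetical module generated from the BatchRequest proto above

batch = batch_pb2.BatchRequest(
    requests=[
        batch_pb2.PredictionRequest(feature_1=0.1, feature_2=0.2),
        batch_pb2.PredictionRequest(feature_1=0.3, feature_2=0.4),
    ]
)

# The serialized bytes travel opaquely through the inference graph in binData;
# the model server would be responsible for parsing them back into a BatchRequest.
msg = prediction_pb2.SeldonMessage(binData=batch.SerializeToString())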

Is it a good idea?

I wonder if supporting batch prediction would make it impossible to integrate with more complex features of Seldon in the future, like those in the "Complex Seldon Core Inference Graph" shown in this picture:

[Image: Complex Seldon Core Inference Graph]

adriangonz commented

Hey @zyxue, I'm not sure if you've checked out the early support for batch processing in Seldon Core yet? One of the points it tries to tackle is precisely how to send / process large batches of requests.

It would be good to hear your thoughts on that!


zyxue commented Feb 18, 2021

The batch processing in Seldon Core is for offline use (not real time), right?

ukclivecox commented

You can use gRPC with a batch as long as all components in your graph can handle a batch of instances. As @adriangonz mentioned, the batch processing route is probably better suited for offline tasks, but it sounds like you are looking for real time.
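
For the real-time path, a minimal sketch of sending a batch through the standard gRPC Predict endpoint, assuming the stock Seldon v1 prediction proto shipped with the seldon_core Python package (the endpoint address is a placeholder):

import grpc
from seldon_core.proto import prediction_pb2, prediction_pb2_grpc

# Three instances with two features each; the first tensor dimension is the batch size.
request = prediction_pb2.SeldonMessage(
    data=prediction_pb2.DefaultData(
        names=["feature_1", "feature_2"],
        tensor=prediction_pb2.Tensor(
            shape=[3, 2],
            values=[0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
        ),
    )
)

channel = grpc.insecure_channel("localhost:8000")
stub = prediction_pb2_grpc.SeldonStub(channel)

# Every component in the graph must treat the first dimension as the batch axis.
response = stub.Predict(request)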

ukclivecox commented

Closing now. Please reopen if further discussion is needed.
