📚 The doc issue

I set the `batchSize` of the registered model to 10, and then set `micro_batch_size` to 1. So for model inference, will it wait for 10 requests to complete preprocessing in parallel before aggregating them for inference?
Suggest a potential alternative/fix
No response
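For reference, a setup like the one described might be expressed in the model's config file roughly as follows. This is a sketch based on TorchServe's micro-batching example; the exact key names and the `parallelism` worker counts are assumptions and should be checked against the current TorchServe docs:

```yaml
# model-config.yaml (sketch; key names per TorchServe's micro-batching example)
batchSize: 10          # frontend aggregates up to 10 requests (or waits for maxBatchDelay)
micro_batching:
  micro_batch_size: 1  # handler pipeline processes 1 request per micro-batch
  parallelism:
    preprocess: 2      # illustrative worker counts, not from the question
    inference: 1
    postprocess: 2
```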
Hi @pengxin233

Yes, it will still aggregate 10 requests (or wait until the max batch delay) before performing the inference. The handler's inference method will only see a single request at a time, but the pre- and post-processing will run in parallel with the inference. So whether this configuration is performant depends on your use case (is your handler dominated by pre-/post-processing?).
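The pipelining described above can be illustrated outside TorchServe. The snippet below is plain Python, not TorchServe's actual micro-batching implementation; `preprocess`, `inference`, and `postprocess` are dummy stand-ins. It splits an aggregated batch of 10 into micro-batches of 1 and overlaps preprocessing of later micro-batches with inference on earlier ones:

```python
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 10        # frontend aggregates up to 10 requests (or hits max batch delay)
MICRO_BATCH_SIZE = 1   # handler pipeline sees one request per micro-batch

def preprocess(req):   # dummy stand-in for the handler's preprocessing
    return req * 2

def inference(x):      # dummy stand-in for the model forward pass
    return x + 1

def postprocess(y):    # dummy stand-in for the handler's postprocessing
    return y - 1

def handle(batch, micro_batch_size=MICRO_BATCH_SIZE):
    # Split the aggregated batch into micro-batches.
    micro_batches = [batch[i:i + micro_batch_size]
                     for i in range(0, len(batch), micro_batch_size)]
    results = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Preprocessing runs in the pool, so preprocessing of later
        # micro-batches proceeds while inference runs on earlier ones.
        futures = [pool.submit(lambda mb=mb: [preprocess(r) for r in mb])
                   for mb in micro_batches]
        for f in futures:
            mb = f.result()
            # Inference still sees one micro-batch (here, one request) at a time.
            results.extend(postprocess(inference(x)) for x in mb)
    return results

print(handle(list(range(BATCH_SIZE))))  # → [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

The point of the sketch: inference is serialized per micro-batch, but the total time is dominated by whichever stage is the bottleneck, which is why the speedup from micro-batching depends on how much of the handler's work is pre-/post-processing.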