📚 The doc issue

I set the `batchSize` of the registered model to 10, and then set `micro_batch_size` to 1. So for model inference, will it wait for 10 requests to complete preprocessing in parallel before aggregating them for inference?
Suggest a potential alternative/fix
No response
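For reference, a setup like the one described might be expressed in the model's config file roughly as follows. This is a sketch based on TorchServe's micro-batching example; the exact key names and the `parallelism` worker counts are assumptions and should be checked against the current TorchServe docs:

```yaml
# model-config.yaml (sketch; key names per TorchServe's micro-batching example)
batchSize: 10          # frontend aggregates up to 10 requests (or waits for maxBatchDelay)
micro_batching:
  micro_batch_size: 1  # handler pipeline processes 1 request per micro-batch
  parallelism:
    preprocess: 2      # illustrative worker counts, not from the question
    inference: 1
    postprocess: 2
```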
Hi @pengxin233

Yes, it will still aggregate 10 requests (or wait until the max batch delay) before performing the inference. The handler's inference method will only see a single request at a time, but the pre- and post-processing will run in parallel with the inference. So whether this configuration is performant depends on your use case (is your handler dominated by pre-/post-processing?).
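The pipelining described above can be illustrated outside TorchServe. The snippet below is plain Python, not TorchServe's actual micro-batching implementation; `preprocess`, `inference`, and `postprocess` are dummy stand-ins. It splits an aggregated batch of 10 into micro-batches of 1 and overlaps preprocessing of later micro-batches with inference on earlier ones:

```python
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 10        # frontend aggregates up to 10 requests (or hits max batch delay)
MICRO_BATCH_SIZE = 1   # handler pipeline sees one request per micro-batch

def preprocess(req):   # dummy stand-in for the handler's preprocessing
    return req * 2

def inference(x):      # dummy stand-in for the model forward pass
    return x + 1

def postprocess(y):    # dummy stand-in for the handler's postprocessing
    return y - 1

def handle(batch, micro_batch_size=MICRO_BATCH_SIZE):
    # Split the aggregated batch into micro-batches.
    micro_batches = [batch[i:i + micro_batch_size]
                     for i in range(0, len(batch), micro_batch_size)]
    results = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Preprocessing runs in the pool, so preprocessing of later
        # micro-batches proceeds while inference runs on earlier ones.
        futures = [pool.submit(lambda mb=mb: [preprocess(r) for r in mb])
                   for mb in micro_batches]
        for f in futures:
            mb = f.result()
            # Inference still sees one micro-batch (here, one request) at a time.
            results.extend(postprocess(inference(x)) for x in mb)
    return results

print(handle(list(range(BATCH_SIZE))))  # → [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

The point of the sketch: inference is serialized per micro-batch, but the total time is dominated by whichever stage is the bottleneck, which is why the speedup from micro-batching depends on how much of the handler's work is pre-/post-processing.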