
If micro_batch_size is set to 1, is model inference still batch processed? #3120

Open
pengxin233 opened this issue Apr 29, 2024 · 1 comment

Comments

@pengxin233

📚 The doc issue

I set the batchSize of the registered model to 10 and then set the micro_batch_size to 1. For model inference, will it still wait for 10 requests to finish preprocessing in parallel before aggregating them for inference?
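For context, a minimal sketch of the setup described above, assuming a model archive named `my_model.mar` already placed in the model store: `batch_size` and `max_batch_delay` are passed when registering the model through the TorchServe management API (default port 8081), while `micro_batch_size` is configured separately in the model's YAML config as in the micro-batching example in the repo.

```python
# Hypothetical registration call mirroring the setup in the question:
# batch_size=10 is handled by the frontend; micro_batch_size=1 would be set
# in the model's model-config.yaml, not here.
import requests

resp = requests.post(
    "http://localhost:8081/models",
    params={
        "url": "my_model.mar",    # archive name in the model store (assumed)
        "batch_size": 10,         # frontend aggregates up to 10 requests
        "max_batch_delay": 100,   # ms to wait before sending a partial batch
        "initial_workers": 1,
    },
)
print(resp.status_code, resp.text)
```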

Suggest a potential alternative/fix

No response

@mreso
Collaborator

mreso commented Apr 29, 2024

Hi @pengxin233
yes, it will still aggregate 10 requests (or wait until the max batch delay) before performing inference. The inference method of the handler will only see a single request at a time, but the pre- and post-processing will run in parallel with the inference. So whether this configuration is performant depends on your use case (is your handler dominated by pre/post-processing?).
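A simplified, self-contained sketch of the behaviour described here (not TorchServe's actual micro-batching implementation; the stage functions and timings are placeholders): the worker receives the aggregated batch of 10, splits it into micro-batches of size `micro_batch_size=1`, and runs preprocess, inference, and postprocess as pipeline stages so that pre/post-processing of one micro-batch overlaps with inference of another.

```python
# Pipeline sketch: preprocess, inference, and postprocess run in their own
# threads and pass micro-batches through queues, so stages overlap.
import queue
import threading
import time

BATCH_SIZE = 10        # corresponds to batchSize of the registered model
MICRO_BATCH_SIZE = 1   # corresponds to micro_batch_size

def preprocess(mb):
    time.sleep(0.01)   # stand-in for tokenization / image decoding
    return mb

def inference(mb):
    time.sleep(0.01)   # stand-in for the model forward pass
    return mb

def postprocess(mb):
    time.sleep(0.01)   # stand-in for formatting the response
    return mb

def stage(fn, in_q, out_q):
    """Pull micro-batches from in_q, apply fn, push to out_q until sentinel."""
    while True:
        mb = in_q.get()
        if mb is None:
            out_q.put(None)
            return
        out_q.put(fn(mb))

def handle(batch):
    """Process one aggregated batch as a pipeline of micro-batches."""
    q0, q1, q2, q3 = (queue.Queue() for _ in range(4))
    threads = [
        threading.Thread(target=stage, args=(preprocess, q0, q1)),
        threading.Thread(target=stage, args=(inference, q1, q2)),
        threading.Thread(target=stage, args=(postprocess, q2, q3)),
    ]
    for t in threads:
        t.start()

    # Split the batch into micro-batches; while micro-batch i is in inference,
    # micro-batch i+1 can already be in preprocessing.
    for i in range(0, len(batch), MICRO_BATCH_SIZE):
        q0.put(batch[i:i + MICRO_BATCH_SIZE])
    q0.put(None)

    results = []
    while (mb := q3.get()) is not None:
        results.extend(mb)
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    start = time.time()
    out = handle([f"request-{i}" for i in range(BATCH_SIZE)])
    print(f"{len(out)} responses in {time.time() - start:.3f}s")
```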
