Is your feature request related to a problem?
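Performance is not optimal: run_loop does not yet overlap process_batch with send_multipart, and the CPU time spent on scheduling is not hidden.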
Describe the Solution you'd like
Although micro_batch is implemented in the batch scheduling part, run_loop has not yet been refactored to overlap process_batch with send_multipart. In addition, similar to zero-overhead scheduling, we should hide the CPU latency introduced by the scheduling itself.
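
A rough sketch of the kind of overlap meant here, not the actual implementation. It assumes sending can be handed to a background thread through a bounded queue, so that the main loop starts the next process_batch while the previous result is still being sent. The bodies of process_batch and send_multipart, the `sent` list, and the queue size are hypothetical stand-ins for illustration only. With a real ZeroMQ socket, the socket should be owned by the sender thread alone, since ZeroMQ sockets are not thread-safe.

```python
import queue
import threading
import time


def process_batch(batch):
    # Hypothetical stand-in for the real compute step.
    time.sleep(0.01)
    return [x * 2 for x in batch]


def send_multipart(frames, sent):
    # Hypothetical stand-in for socket.send_multipart; records what was sent.
    time.sleep(0.01)
    sent.append(frames)


def run_loop(batches, sent):
    # Bounded queue: applies backpressure if sending falls behind compute.
    outbox = queue.Queue(maxsize=2)

    def sender():
        # Drain the queue until the None sentinel arrives.
        while True:
            frames = outbox.get()
            if frames is None:
                break
            send_multipart(frames, sent)

    worker = threading.Thread(target=sender, daemon=True)
    worker.start()

    for batch in batches:
        result = process_batch(batch)
        # Hand off the result; the next process_batch starts while
        # the sender thread is still busy with this one.
        outbox.put(result)

    outbox.put(None)  # Tell the sender thread to stop.
    worker.join()
```

A single sender thread with a FIFO queue keeps the output order identical to the batch order. Hiding the scheduling latency could follow the same pattern, by preparing the next batch while the current one is running, but that part is not shown here.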
Alternatives Considered (Optional)
No response
Additional Context (Optional)
No response