Open
Description
Problem:
Customers with high traffic volumes want to trade some latency for higher throughput by grouping requests into dynamic batches.
Goal:
Leverage Triton's dynamic batching capabilities to enable support for dynamic batches in Merlin.
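To make the latency/throughput tradeoff concrete, here is a minimal sketch of the grouping idea in plain Python. It is not Triton's implementation: the function `group_into_batches` and its parameters are hypothetical, loosely modeled on the `max_queue_delay_microseconds` and batch-size settings in Triton's `dynamic_batching` model configuration. A pending batch is dispatched when it is full or when its oldest request has waited the maximum queue delay.

```python
from typing import List, Tuple


def group_into_batches(
    arrival_times_us: List[int],
    max_batch_size: int,
    max_queue_delay_us: int,
) -> List[Tuple[int, List[int]]]:
    """Group request arrival times (microseconds) into batches.

    Hypothetical illustration only. Returns (dispatch_time_us, arrivals) pairs.
    """
    batches: List[Tuple[int, List[int]]] = []
    current: List[int] = []
    for t in sorted(arrival_times_us):
        # If the oldest queued request hit its deadline before this arrival,
        # the pending batch was dispatched at that deadline.
        if current and t > current[0] + max_queue_delay_us:
            batches.append((current[0] + max_queue_delay_us, current))
            current = []
        current.append(t)
        # A full batch is dispatched immediately.
        if len(current) == max_batch_size:
            batches.append((t, current))
            current = []
    if current:
        # Leftover requests wait out the full queue delay.
        batches.append((current[0] + max_queue_delay_us, current))
    return batches


arrivals = [0, 10, 20, 500, 510]
print(group_into_batches(arrivals, max_batch_size=4, max_queue_delay_us=100))
print(len(group_into_batches(arrivals, max_batch_size=4, max_queue_delay_us=0)))
```

With a 100 µs delay the five requests collapse into two batches, at the cost of the first request waiting the full 100 µs. With a delay of 0 every request is dispatched alone.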
New Functionality
- Models
  - ...
- Transformers4Rec
  - ...
- NVTabular
- Dataloader
- Systems
  - Dynamic batching with Triton
  - Serving-time padding operator (to use with dynamic batching)
- Examples
  - Example of dynamic batching
  - Blog post on dynamic batching and tradeoff between latency and throughput
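The serving-time padding operator is presumably needed because requests can only be combined into one batch when their inputs have matching shapes, unless ragged batching is enabled. The following is a rough sketch of what such padding could look like, not the Merlin Systems operator itself. The name `pad_to_length` and its parameters are hypothetical.

```python
from typing import List, Sequence


def pad_to_length(
    rows: Sequence[Sequence[int]],
    target_len: int,
    pad_value: int = 0,
) -> List[List[int]]:
    """Right-pad (or truncate) each row so that every row has target_len items.

    Hypothetical helper for illustration; not part of any Merlin API.
    """
    padded = []
    for row in rows:
        clipped = list(row[:target_len])  # truncate rows that are too long
        padded.append(clipped + [pad_value] * (target_len - len(clipped)))
    return padded


# Three requests with different sequence lengths become one rectangular batch.
print(pad_to_length([[1, 2, 3], [4], []], target_len=4))
```

Padding at serving time rather than in the client keeps request payloads small, but the model must treat `pad_value` as a mask value.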
Constraints:
Within Triton
Starting Point: