[RMP] Dynamic Batching support at serving time #906

@EvenOldridge

Problem:

Customers with high traffic volumes want to trade some per-request latency for higher throughput by grouping individual requests into dynamic batches.

Goal:

Leverage Triton's dynamic batching capabilities to enable support for dynamic batches in Merlin.
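To make the latency/throughput tradeoff concrete, here is a minimal sketch of a batching policy in plain Python. It is a simplified model for illustration, not Triton's actual scheduler: the function name `group_into_batches` and its parameters are hypothetical, loosely mirroring the `max_batch_size` and `max_queue_delay_microseconds` settings in a Triton model configuration. A batch is dispatched once it is full, or once a new request arrives after the oldest queued request has already waited longer than the allowed delay.

```python
from typing import List


def group_into_batches(
    arrivals_us: List[int],
    max_batch_size: int,
    max_queue_delay_us: int,
) -> List[List[int]]:
    """Group request arrival times (microseconds) into batches.

    Hypothetical helper for illustration only. A batch is closed when it
    holds max_batch_size requests, or when the next request arrives more
    than max_queue_delay_us after the oldest request in the batch.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    for t in sorted(arrivals_us):
        batch_full = len(current) >= max_batch_size
        waited_too_long = bool(current) and t - current[0] > max_queue_delay_us
        if batch_full or waited_too_long:
            batches.append(current)
            current = []
        current.append(t)
    if current:
        batches.append(current)
    return batches


# Six requests; a longer allowed delay yields fewer, larger batches.
arrivals = [0, 40, 90, 300, 310, 900]
small_delay = group_into_batches(arrivals, max_batch_size=4, max_queue_delay_us=10)
large_delay = group_into_batches(arrivals, max_batch_size=4, max_queue_delay_us=100)
print(len(small_delay), len(large_delay))
```

Raising the allowed queue delay lets more requests share one model execution (higher throughput), at the cost of the earliest request in each batch waiting longer (higher latency).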

New Functionality

  • Models

    • ...
  • Transformers4Rec

    • ...
  • NVTabular

  • Dataloader

Systems

  • Dynamic batching with Triton
  • Serving-time padding operator (to use with dynamic batching)
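The padding operator matters because requests with variable-length (ragged) list features cannot be stacked into a single batch tensor until they share a common length. The following is a rough sketch of that idea in plain Python; `pad_sequences`, `pad_value`, and `max_len` are hypothetical names for illustration and do not describe an existing Merlin or Triton operator.

```python
from typing import List, Optional, Sequence


def pad_sequences(
    rows: Sequence[Sequence[int]],
    pad_value: int = 0,
    max_len: Optional[int] = None,
) -> List[List[int]]:
    """Right-pad (and truncate) ragged rows to one common length.

    Hypothetical helper for illustration only. If max_len is None, the
    length of the longest row is used, so nothing is truncated.
    """
    if max_len is None:
        max_len = max((len(row) for row in rows), default=0)
    padded: List[List[int]] = []
    for row in rows:
        kept = list(row[:max_len])  # truncate rows longer than max_len
        kept.extend([pad_value] * (max_len - len(kept)))  # fill the rest
        padded.append(kept)
    return padded


# Three requests with item-id histories of different lengths.
batch = pad_sequences([[1, 2, 3], [4], [5, 6]])
print(batch)
```

Fixing `max_len` at serving time, rather than padding to the longest row in each batch, keeps the input shape identical across requests, which is one way to make independently arriving requests batchable.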

Examples

  • Example of dynamic batching
  • Blog post on dynamic batching and the tradeoff between latency and throughput

Constraints:

Within Triton

Starting Point:
