Support multiple model inferences per prediction request for TensorFlow and ONNX #562

@deliahu

Description

It should be possible to run multiple model inferences for one prediction request in the TensorFlow and ONNX runtimes.

Example use cases:

  1. An NLP model (e.g. BERT) which has an input length limit, but the user wants to analyze an entire document (see the sketch after this list)
  2. A text generation model that generates one word at a time, whereas the user wants to generate a full paragraph
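As a rough illustration of use case 1, a predictor could loop over the model client within a single prediction request, chunking the document and aggregating the per-chunk results. The sketch below assumes a predictor-style interface where `client.predict(payload)` performs one model inference; the `DummyClient` stub, the 512-token limit, whitespace tokenization, and the score-averaging step are illustrative assumptions, not part of this proposal.

```python
# Sketch of use case 1: split a long document into model-sized chunks and run
# one inference per chunk inside a single prediction request.
# Assumptions (not from this issue): a client exposing client.predict(payload)
# for a single inference, a 512-token input limit, whitespace tokenization.

class Predictor:
    def __init__(self, client, max_tokens=512):
        self.client = client          # e.g. the TensorFlow/ONNX runtime client
        self.max_tokens = max_tokens  # per-inference input limit of the model

    def _chunks(self, text):
        # Naive whitespace split; a real BERT predictor would use its tokenizer.
        tokens = text.split()
        for i in range(0, len(tokens), self.max_tokens):
            yield " ".join(tokens[i : i + self.max_tokens])

    def predict(self, payload):
        # Multiple model inferences for one prediction request:
        # one client.predict(...) call per chunk, then aggregate.
        scores = [self.client.predict({"text": chunk})["score"]
                  for chunk in self._chunks(payload["document"])]
        return {"score": sum(scores) / len(scores), "num_chunks": len(scores)}


if __name__ == "__main__":
    class DummyClient:
        # Stand-in for the real runtime client so the sketch runs on its own.
        def predict(self, payload):
            return {"score": (len(payload["text"]) % 100) / 100.0}

    predictor = Predictor(DummyClient(), max_tokens=8)
    print(predictor.predict({"document": "word " * 30}))
```

Use case 2 would follow the same pattern: call `client.predict(...)` in a loop, feeding each generated word back into the input for the next inference until the paragraph is complete.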

Labels

enhancement (New feature or request)
