Support inferring batch size from tensor argument inputs #4617

Merged
merged 3 commits into NVIDIA:main on Jan 31, 2023

Conversation

Member

@stiepan stiepan commented Jan 26, 2023

Signed-off-by: Kamil Tokarski ktokarski@nvidia.com

Category:

New feature (non-breaking change which adds functionality)

Description:

This PR updates the way the executor sets the requested batch size. If any tensor inputs are available, the executor assumes the operator expects uniform batch sizes across inputs and outputs and sets the requested batch size from the first tensor input it can find: either a positional input or a named (argument) input. Otherwise, if none are available, the stage queue batch size is used.
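
For reference, a minimal standalone sketch of that ordering (the SimpleWorkspace type and function names below are illustrative stand-ins, not DALI's actual Workspace API):

```cpp
#include <iostream>
#include <optional>
#include <vector>

// Illustrative stand-in for a workspace: only per-input batch sizes are tracked.
struct SimpleWorkspace {
  std::vector<int> input_batch_sizes;      // positional (regular) tensor inputs
  std::vector<int> arg_input_batch_sizes;  // named (argument) tensor inputs
};

// Returns the batch size of the first tensor input found, positional inputs
// first, then argument inputs; std::nullopt means no tensor inputs exist.
std::optional<int> InferBatchSizeFromInputs(const SimpleWorkspace &ws) {
  if (!ws.input_batch_sizes.empty()) return ws.input_batch_sizes.front();
  if (!ws.arg_input_batch_sizes.empty()) return ws.arg_input_batch_sizes.front();
  return std::nullopt;
}

// The requested batch size: inferred from inputs if possible, otherwise the
// stage queue batch size is used as the fallback.
int RequestedBatchSize(const SimpleWorkspace &ws, int stage_queue_batch_size) {
  return InferBatchSizeFromInputs(ws).value_or(stage_queue_batch_size);
}

int main() {
  SimpleWorkspace only_arg_input{{}, {5}};  // operator fed only via a named argument
  SimpleWorkspace no_inputs{{}, {}};        // source operator with no tensor inputs
  std::cout << RequestedBatchSize(only_arg_input, 8) << "\n";  // prints 5
  std::cout << RequestedBatchSize(no_inputs, 8) << "\n";       // prints 8
  return 0;
}
```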

Additional information:

The common pattern introduced with the end2end variable batch size was to look at the regular inputs or, if there were none, resort to GetRequestedBatchSize. This PR moves that behaviour to the executor and includes named argument inputs in the search for the batch size before resorting to the stage batch size. @klecki updated the autograph layer to reflect that by appropriately "hoisting" the splits in #4618.

Affected modules and functionalities:

Regular usage of the operators should not see any difference.

Key points relevant for the review:

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

@@ -245,30 +245,6 @@ class DLTensorPythonFunctionImpl : public Operator<Backend> {
bool synchronize_stream_;
bool batch_processing;
std::vector<TensorLayout> output_layouts_;

private:
int GetCurrBatchSize(Workspace &ws) {
Member Author
@stiepan stiepan Jan 26, 2023

Getting the batch size can be done with InferBatchSizeFromInput. The extra checks were unnecessary: EnforceUniformInput/OutputBatchSize at the operator level already does that. As for the check of the requested batch size (ws.GetRequestedBatchSize(i) == ws.GetRequestedBatchSize(0)), the value is set uniformly by batch_sizes_.resize(NumOutput(), batch_size);, so it was true anyway.
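
To illustrate (standalone toy snippet, not the operator's code): filling the vector with a single value in resize makes every entry equal, so the removed per-output comparison could never fail.

```cpp
#include <cassert>
#include <vector>

int main() {
  const int num_outputs = 3;
  const int batch_size = 8;
  std::vector<int> batch_sizes_;
  // Mirrors batch_sizes_.resize(NumOutput(), batch_size): starting from an
  // empty vector, every element gets the same fill value.
  batch_sizes_.resize(num_outputs, batch_size);
  for (int i = 0; i < num_outputs; ++i)
    assert(batch_sizes_[i] == batch_sizes_[0]);  // the removed check, always true here
  return 0;
}
```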

@klecki klecki self-assigned this Jan 26, 2023
@stiepan stiepan marked this pull request as ready for review January 30, 2023 10:54
@stiepan stiepan force-pushed the infer_batch_size_from_arg_input branch from 40726d2 to 916bc82 Compare January 30, 2023 11:37
@@ -72,7 +72,7 @@ class TransformBaseOp : public SequenceOperator<Backend, true> {

bool SetupImpl(std::vector<OutputDesc> &output_descs, const Workspace &ws) override {
has_input_ = ws.NumInput() > 0;
- auto curr_batch_size = has_input_ ? ws.GetInputBatchSize(0) : ws.GetRequestedBatchSize(0);
+ auto curr_batch_size = InferBatchSizeFromInput(ws);
Contributor

This defeats the purpose of RequestedBatchSize. Instead, move InferBatchSize to the executor and invoke it prior to SetBatchSize. It will be a workaround, but at least we'll have one place where we can change or improve batch size inference instead of spreading it across operators. It also allows for some smart(er) solutions or a non-local batch size source.
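
A rough sketch of that arrangement, with illustrative class and method names rather than DALI's actual executor types: the inference lives in one place and runs before the operator's setup, so GetRequestedBatchSize keeps its meaning.

```cpp
#include <optional>
#include <vector>

// Illustrative stand-in for an operator node as seen by the executor.
struct OpNodeSketch {
  std::vector<int> input_batch_sizes;      // regular (positional) tensor inputs
  std::vector<int> arg_input_batch_sizes;  // named (argument) tensor inputs
  int requested_batch_size = 0;
  void SetBatchSize(int bs) { requested_batch_size = bs; }
};

class ExecutorSketch {
 public:
  explicit ExecutorSketch(int stage_queue_batch_size)
      : stage_queue_batch_size_(stage_queue_batch_size) {}

  // Batch size inference happens in one place, before the operator's setup
  // runs, so operators can keep relying on the requested batch size.
  void PrepareOp(OpNodeSketch &op) const {
    op.SetBatchSize(InferBatchSize(op).value_or(stage_queue_batch_size_));
  }

 private:
  // First positional input wins, then argument inputs, otherwise no answer.
  static std::optional<int> InferBatchSize(const OpNodeSketch &op) {
    if (!op.input_batch_sizes.empty()) return op.input_batch_sizes.front();
    if (!op.arg_input_batch_sizes.empty()) return op.arg_input_batch_sizes.front();
    return std::nullopt;
  }

  int stage_queue_batch_size_;
};

int main() {
  ExecutorSketch exec(8);
  OpNodeSketch op{{}, {5}};  // only an argument input, with batch size 5
  exec.PrepareOp(op);        // op.requested_batch_size becomes 5
  return 0;
}
```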

Contributor
@mzient mzient left a comment

Move the batch size calculation to the executor and keep RequestedBatchSize.

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
@stiepan stiepan force-pushed the infer_batch_size_from_arg_input branch from 916bc82 to 7403586 Compare January 31, 2023 11:07
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
def test_named_tensor_arguments(op):

ops2params = {
fn.permute_batch: _tensor_arg_permute_batch_params,
Contributor

permute_batch does have an input, although by original design the output batch size was taken from indices. I don't think there's any special case for this op, or any other means to infer the batch size, which would mean that it takes the batch size just from the regular input, not the argument input.

Member Author

I put it here on purpose, as it calls GetRequestedBatchSize manually as an extra check that GetRequestedBatchSize returns the "split" batch size.

@stiepan
Member Author

stiepan commented Jan 31, 2023

!build

@dali-automaton
Collaborator

CI MESSAGE: [7159582]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [7159582]: BUILD FAILED

@dali-automaton
Collaborator

CI MESSAGE: [7159582]: BUILD PASSED

@stiepan stiepan merged commit 90e112c into NVIDIA:main Jan 31, 2023
aderylo pushed a commit to zpp-dali-2022/DALI that referenced this pull request Mar 17, 2023
* Set requested batch size based on the op's tensor arguments if available
* Include arg inputs in the executor's check for an all-empty batch

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
@JanuszL JanuszL mentioned this pull request Sep 6, 2023