MixedBackend support for InputOperator #4603
Conversation
!build
CI MESSAGE: [7086603]: BUILD STARTED
!build
CI MESSAGE: [7088060]: BUILD STARTED
CI MESSAGE: [7088060]: BUILD PASSED
```cpp
 private:
  void PutTogetherDaliGraph() {
    pipeline_->AddOperator(OpSpec("PassthroughInput")
```
Ideally this `PassthroughInput` would be defined in a `_test.*` file so that it doesn't belong to the DALI library.
```cpp
}
// if the output is pinned and the input is not, it needs to be copied
if (target.is_pinned() && !tensor_list_elm.front()->is_pinned()) {
  const auto &shapes = tensor_list_elm.front()->shape();
```
nitpick: `auto &elem = tensor_list_elm.front();` (you use it several times in this scope)
```cpp
target.CopySample(sample_id, *tensor_list_elm.front(), sample_id,
                  AccessOrder::host());
```
Suggested change:
```diff
-target.CopySample(sample_id, *tensor_list_elm.front(), sample_id,
-                  AccessOrder::host());
+target.CopySample(sample_id, *tensor_list_elm.front(), sample_id);
```
leave the order to be figured out from the arguments?
Signed-off-by: szalpal <mszolucha@nvidia.com>
Looks good, mostly nitpicks about test scope and the misleading naming of the test operator.
```cpp
DALI_SCHEMA(PassthroughInput)
    .DocStr(
        R"code(
The operator that is a passthrough operator and also an input operator. Used for test only.
```
To be a Passthrough operator it must contain a `PassThrough` or `SamplewisePassThrough` declaration in the schema. This one doesn't, so I would name it something along the lines of `TestInputOperator` or similar.
Renamed to IdentityInput
```cpp
this->ForwardCurrentData(intermediate,
                         std::is_same_v<Backend, CPUBackend> ? ws.GetThreadPool() : *tp_);
out.Copy(intermediate, ws.stream());
```
Could we check, for the CPUBackend, if we can just ForwardCurrentData to the output without copying it?
I'm not sure what you have in mind, but let me explain what happens here.
The point of this test is to check both overloads:

```cpp
void DLL_PUBLIC ForwardCurrentData(TensorList<CPUBackend> &target, ThreadPool &tp);
void DLL_PUBLIC ForwardCurrentData(TensorList<GPUBackend> &target, cudaStream_t stream = nullptr);
```

inside an `InputOperator<MixedBackend>` operator. We don't care about `InputOperator<CPUBackend>` and `InputOperator<GPUBackend>`, because these two are tested with the `ExternalSource` tests.

Therefore we have a `cpu_input` flag, specified and switched in tests, which picks the proper overload:
```cpp
void RunCpuInput(Workspace &ws) {
  auto &out = ws.Output<OutBackend>(0);
  TensorList<CPUBackend> intermediate;
  this->ForwardCurrentData(intermediate,  // ForwardCurrentData(TensorList<CPUBackend> &, ThreadPool &)
                           std::is_same_v<Backend, CPUBackend> ? ws.GetThreadPool() : *tp_);
  out.Copy(intermediate, ws.stream());
}

void RunGpuInput(Workspace &ws) {
  auto &out = ws.Output<OutBackend>(0);
  this->ForwardCurrentData(out, ws.stream());  // ForwardCurrentData(TensorList<GPUBackend> &, cudaStream_t)
}
```

This makes both of them tested inside the Mixed operator, and we're happy :)
Ok, as long as we don't test the CPU and GPU variants, it doesn't make sense to do it. In that case, IMO it is not necessary to implement this operator as templated over Backend; it should just be implemented for `MixedBackend`. I can see that we only do `DALI_REGISTER_OPERATOR(IdentityInput, IdentityInput<MixedBackend>, Mixed);`, but that could just be `DALI_REGISTER_OPERATOR(IdentityInput, IdentityInputMixed, Mixed);`.
```cpp
{2, 2, true, false, true},
{3, 3, true, false, true},
{2, 2, true, true, false},
{3, 3, true, true, false},
```
Suggested change:
```diff
-{2, 2, true, false, true},
-{3, 3, true, false, true},
-{2, 2, true, true, false},
-{3, 3, true, true, false},
+{2, 2, true, false, true},
+{3, 3, true, false, false},
+{2, 2, true, true, false},
+{3, 3, true, true, true},
```
Should we make every other usage of cpu_input different for the particular executor type?
Done
!build
CI MESSAGE: [7097931]: BUILD STARTED |
CI MESSAGE: [7097931]: BUILD PASSED |
* Add In/Out type logic to the `InputOperator`.
* Add `ForwardCurrentData` specializations for `MixedBackend`.
* Refactor `SetExternalInput` routines and add `MixedBackend` handling.
* Add `IdentityInput` test-only operator.
* Add `InputOperator<MixedBackend>` tests.

So far we never had a `MixedBackend` input operator. None of `ExternalSourceCpu`, `ExternalSourceGpu` and `VideoInputCpu` had to inherit from `InputOperator<MixedBackend>`. With `VideoInputGpu` the situation changes, since this operator is actually a mixed operator. Therefore, support for `MixedBackend` operators had to be added to the `InputOperator`.

The mixed input operator has a somewhat different meaning than the other two. Both CPU and GPU operators have a uniform device across the execution, while a Mixed operator changes the device in the process. In other words, the following pattern occurs:

```
OpBackend  InpB  OutB
CPU        CPU   CPU
MXD        CPU   GPU
GPU        GPU   GPU
```

Let's take an example of an operator that subclasses `InputOperator`:

1. `MyOp<CPUBackend>`. This part is already handled by the `InputOperator`. Since the operator takes a CPU input and returns a CPU output, the "intermediate" data (i.e. the data that is acquired using the InputOperator API `ForwardCurrentData`) will also have a `CPUBackend`.
2. `MyOp<GPUBackend>`. This part is also handled already. Since the operator takes a GPU input and returns a GPU output, the intermediate data has to be on the GPU.
3. `MyOp<MixedBackend>`. This part is what the PR targets.

The input operator grabs the actual input from the top of the queue (i.e. the `CachingList` in the implementation). Since a mixed operator has a CPU input and returns a GPU output, we do not know which backend of the data the operator needs to proceed. It might be CPU data (which I believe will be the more common case), e.g. when the operator needs to conduct Huffman CPU-based decoding and then proceed with JPEG decoding on the GPU. However, maybe the operator does not need CPU data, but needs to be a Mixed operator for some other reason; in this case, the operator would like `ForwardCurrentData` to return a tensor already with a GPU backend.

For these reasons, there are 2 overloads of `ForwardCurrentData`: one for a CPU and one for a GPU target tensor. The former will be the more common case; the latter is just a convenience function, making the code a lot simpler. `ForwardCurrentData<Mixed>(Tensor<CPU>)` behaves in the same way as its CPU sibling: copy or swap, depending on the pageability. In the case of `ForwardCurrentData<Mixed>(Tensor<GPU>)`, the data at the top of the queue is always CPU data (since a Mixed operator has a CPU input), therefore the InputOperator needs to copy this data to the GPU.

For the CPU part, the operator has to create its own ThreadPool, since Mixed operators are not supplied with one automatically.

Signed-off-by: Michał Szołucha <mszolucha@nvidia.com>
Category:
New feature (non-breaking change)
Description:
Additional information:
Affected modules and functionalities:
Key points relevant for the review:
Tests:
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: N/A