
Constant operator and Python wrapper. #1699

Merged
merged 6 commits into NVIDIA:master on Feb 4, 2020

Conversation

mzient
Contributor

@mzient mzient commented Jan 28, 2020

Add constant operator and unify it in Python with types.Constant.

Signed-off-by: Michal Zientkiewicz michalz@nvidia.com

Why do we need this PR?

  • It adds a new feature that makes operators like Slice, as well as writing tests, easier and more productive.

What happened in this PR?

  • What solution was applied:
    • Added an operator which returns the same data every time, without copying.
    • Renamed dali.types.Constant to dali.types.ScalarConstant and added a new dali.types.Constant function which can produce the new Constant node (see the usage sketch below).
  • Affected modules and functionalities:
    • Python wrapper (ops, types)
  • Key points relevant for the review:
    • Mostly the Python wrapper
  • Validation and testing:
    • Python tests *
  • Documentation (including examples):
    • Not yet *

JIRA TASK: N/A
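
A minimal usage sketch of the unified API described in the list above; it is only an illustration built from the names visible in this PR (ScalarConstant, and the new types.Constant with dtype/shape/layout/device parameters), not code taken from the PR:

import numpy as np
import nvidia.dali.types as types

# Scalar constants used for argument promotion keep the old behaviour,
# now under the ScalarConstant name:
half = types.ScalarConstant(0.5)

# Inside a pipeline definition, types.Constant can instead produce a
# Constant operator node that returns the same data every iteration:
node = types.Constant(np.full((2, 3), 42, dtype=np.int32),
                      layout="HW", device="cpu")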

@mzient mzient requested review from Kh4L, klecki, ptrendx and a team January 28, 2020 17:04
@mzient mzient changed the title Add constant operator and unify it in Python with types.Constant. Constant operator and Python wrapper. Jan 28, 2020
@mzient
Contributor Author

mzient commented Jan 28, 2020

!build

@dali-automaton
Collaborator

CI MESSAGE: [1095600]: BUILD STARTED

@@ -21,7 +21,7 @@
from nvidia.dali import backend as b
from nvidia.dali.types import _type_name_convert_to_string, _type_convert_value, \
_vector_element_type, _bool_types, _int_types, _int_like_types, _float_types, \
DALIDataType, CUDAStream, Constant
DALIDataType, CUDAStream, ScalarConstant as _Constant
Contributor Author

...actually, we should add more underscores and import aliases here, because it pollutes the ops module.

@dali-automaton
Collaborator

CI MESSAGE: [1095600]: BUILD FAILED

arg_inp = kwargs[k]
if arg_inp is None:
continue
if type(arg_inp) is _Constant:
Contributor

You could write this for consistency:

Suggested change
if type(arg_inp) is _Constant:
if isinstance(arg_inp, _Constant):

Contributor Author

ok

@@ -291,6 +291,9 @@ def __init__(self):
def id(self):
return self._id

def _instantiate_constant_node(constant):
return Constant(value = [constant.value], dtype = constant.dtype)
Contributor

Suggested change
return Constant(value = [constant.value], dtype = constant.dtype)
return _Constant(value = [constant.value], dtype = constant.dtype)

Contributor Author

@mzient mzient Jan 29, 2020

That's intended (it should be ops.Constant, but I don't think a module can refer to itself in Python). I renamed _Constant to _ScalarConstant to avoid confusion.

Contributor Author

Logic changed. Now I use the types.Constant function.

def _numpy_to_dali_type(t):
if t is None:
return None
import numpy as np
Contributor

Do we need this import here?

Contributor Author

Not really, it's already there in the outer scope.

@@ -21,7 +21,7 @@
from nvidia.dali import backend as b
from nvidia.dali.types import _type_name_convert_to_string, _type_convert_value, \
_vector_element_type, _bool_types, _int_types, _int_like_types, _float_types, \
DALIDataType, CUDAStream, Constant
DALIDataType, CUDAStream, ScalarConstant as _Constant
Contributor

Suggested change
DALIDataType, CUDAStream, ScalarConstant as _Constant
DALIDataType, CUDAStream, Constant as _Constant

Contributor Author

No, it needs to be exactly the ScalarConstant. I can use _ScalarConstant instead.

Contributor

@JanuszL JanuszL left a comment

You forgot to add the operator itself.

@mzient
Contributor Author

mzient commented Jan 29, 2020

!build

@dali-automaton
Collaborator

CI MESSAGE: [1097209]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [1097209]: BUILD PASSED

@mzient mzient requested a review from Kh4L January 29, 2020 13:23
if arg_inp is None:
continue
if isinstance(type(arg_inp), _ScalarConstant):
arg_inp = instantiate_constant_node(arg_inp)
Contributor

I can only see _instantiate_constant_node (starting with underscore). Is this codepath tested?

Contributor Author

Bug. Fixed.

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@@ -314,14 +317,19 @@ def __init__(self, inputs, op, **kwargs):
# Argument inputs
for k in sorted(kwargs.keys()):
if k not in ["name"]:
if not isinstance(kwargs[k], _EdgeReference):
arg_inp = kwargs[k]
Contributor

Maybe loop over (key, value) pairs here? But I don't know what's up with the sorting.

Contributor Author

I don't know if I can overwrite this reference in a loop when enumerating like that. It would certainly be quite unexpected.

Contributor Author

Sorting is necessary to properly match inputs (which are numbered) to arguments. I'd leave it like this.
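
For context, a simplified sketch of the loop being discussed; the names (_ScalarConstant, _instantiate_constant_node) come from this PR's diff, but the surrounding checks are condensed:

# Sorted iteration gives a deterministic order, so the named argument inputs
# line up with the numbered inputs expected by the backend operator.
for k in sorted(kwargs.keys()):
    if k == "name":
        continue
    arg_inp = kwargs[k]
    if arg_inp is None:
        continue
    # Promote a Python-side scalar constant to a real Constant node before
    # wiring it in as an argument input; rebinding arg_inp here does not
    # touch the kwargs dict itself.
    if isinstance(arg_inp, _ScalarConstant):
        arg_inp = _instantiate_constant_node(arg_inp)
    # ... arg_inp is treated as a graph edge from this point on.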

dali/python/nvidia/dali/ops.py
.AddOptionalArg<int>("shape", "The desired shape of the output. "
"If not set, the data is assumed to be 1D",
std::vector<int>())
.AddOptionalArg<float>("fdata", "Contents of the constant produced (for floating point types).",
Contributor

Shouldn't it be prohibited to pass both fdata and idata to one node? I think we should raise an error in such a case.

Contributor Author

We do?

Contributor

Can you also warn the user in the doc string?
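
A standalone illustration of the mutual-exclusion rule being discussed; this is not the PR's actual validation code, just the shape of such a check:

#include <stdexcept>
#include <vector>

// Reject ambiguous input: the constant payload must come from exactly one
// of the two arguments, never both.
void ValidateConstantArgs(const std::vector<float> &fdata,
                          const std::vector<int> &idata) {
  if (!fdata.empty() && !idata.empty())
    throw std::invalid_argument(
        "Constant: `fdata` and `idata` cannot both be specified.");
}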

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@mzient
Contributor Author

mzient commented Jan 29, 2020

!build

@dali-automaton
Collaborator

CI MESSAGE: [1097734]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [1097734]: BUILD PASSED

@mzient mzient requested review from JanuszL and klecki January 30, 2020 07:53
Contributor

@jantonguirao jantonguirao left a comment

LGTM, but please check my comments


The floating point input data should be placed in `fdata` argument and integer data in `idata`.
The data is a flat vector of values or a single scalar. The data is then reshaped according
to the `shape` argument. If the data is scalar, it will be broadcast to fill the entire shape.
Contributor

Suggested change
to the `shape` argument. If the data is scalar, it will be broadcast to fill the entire shape.
to the `shape` argument. If the data is scalar, it will be broadcasted to fill the entire shape.

Contributor Author

According to a dictionary:
to broadcast; p. broadcast, occas. broadcasted.

.NumInput(0)
.NumOutput(1)
.AddOptionalArg<int>("shape", "The desired shape of the output. "
"If not set, the data is assumed to be 1D",
Contributor

nitpick: messy indentation here

assert(!idata_.empty());
FillTensorList<type>(output_, output_shape_, idata_, ws.stream());
}
), (DALI_FAIL("Unsupported type"))); // NOLINT
Contributor

print the type

#include "dali/core/static_switch.h"

#define CONSTANT_OP_SUPPORTED_TYPES \
(bool, int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t, int64_t, uint64_t, float)
Contributor

Wondering, how about float16? It might be useful if we want to make a quick synthetic pipeline producing constant data, just to measure the training speed without preprocessing.

Contributor Author

Would need special treatment for GPU code to compile, but yeah, I can try.

"explicitly before casting to builtin `float`.").format(_float_types))

def __str__(self):
return "{}:{}".format(self.value, self.dtype)

def __repr__(self):
return "{}".format(self.value)

def _is_scalar_shape(shape):
return shape is None or shape == 1 or shape == [1]
Contributor

Suggested change
return shape is None or shape == 1 or shape == [1]
return shape is None or shape == 1 or shape == [1] or shape == (1,)

Does it make sense?

if value.dtype == np.uint64:
value = value.astype(np.uint32)

def _numpy_to_dali_type(t):
Contributor

this function seems useful enough to be exposed
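
A rough sketch of what exposing such a helper could look like; the DALIDataType member names below are assumptions based on the public enum, and the private helper in this PR may cover a different set of types:

import numpy as np
from nvidia.dali.types import DALIDataType

# Hypothetical public counterpart of the private helper discussed above.
_NP_TO_DALI = {
    np.dtype(np.bool_):   DALIDataType.BOOL,
    np.dtype(np.uint8):   DALIDataType.UINT8,
    np.dtype(np.int32):   DALIDataType.INT32,
    np.dtype(np.int64):   DALIDataType.INT64,
    np.dtype(np.float32): DALIDataType.FLOAT,
    np.dtype(np.float64): DALIDataType.FLOAT64,
}

def numpy_to_dali_type(t):
    if t is None:
        return None
    return _NP_TO_DALI[np.dtype(t)]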

@@ -0,0 +1,144 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
Contributor

nitpick: 2020

Comment on lines +80 to +86
bool SetupImpl(std::vector<OutputDesc> &output_desc, const Workspace &ws) override {
output_desc.resize(1);
if (output_shape_.empty()) {
int batch_size = this->spec_.template GetArgument<int>("batch_size");
output_shape_ = uniform_list_shape(batch_size, shape_arg_);
}
output_desc[0] = { output_shape_, TypeTable::GetTypeInfo(output_type_) };
Contributor

If we do not want the executor to allocate for us, why do we bother with this function?

Contributor Author

@mzient mzient Jan 30, 2020

True. That is, the function can be there, but filling the output_desc is not necessary.


out.Reset();
out.ShareData(&output_);
out.Resize(output_shape_);
Contributor

Do you need to call Resize after ShareData?

Contributor

No need, the metadata (without the layout) is copied.


int n = tmp.size() * sizeof(Dst);
for (int i = 0; i < shape.num_samples(); i++)
cudaMemcpyAsync(dst.mutable_tensor<Dst>(i), tmp.data(), n, cudaMemcpyHostToDevice, stream);
Contributor

I know this happens only once (or rather batch size times), but someone complained to me once about invoking a lot of cudaMemcpys.

You probably can do the Host -> Device copy once and then do Device -> Device copies for the rest.
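
A sketch of the suggested pattern (illustrative names, not the PR's code): one Host to Device copy for the first sample, then Device to Device copies for the rest.

#include <cuda_runtime.h>
#include <vector>

// Replicate one host buffer across many device buffers: a single H2D copy,
// followed by D2D copies that stay on the GPU.
void ReplicateConstant(const std::vector<void *> &dst_ptrs,
                       const void *host_src, size_t nbytes,
                       cudaStream_t stream) {
  if (dst_ptrs.empty()) return;
  cudaMemcpyAsync(dst_ptrs[0], host_src, nbytes,
                  cudaMemcpyHostToDevice, stream);
  for (size_t i = 1; i < dst_ptrs.size(); i++)
    cudaMemcpyAsync(dst_ptrs[i], dst_ptrs[0], nbytes,
                    cudaMemcpyDeviceToDevice, stream);
}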

void FillTensorVector(
TensorVector<CPUBackend> &dst, const TensorListShape<> &shape, const std::vector<Src> &src) {
dst.SetContiguous(false);
dst.Resize(shape);
Contributor

This Resize + mutable_data<Dst> later will actually do a batch size's worth of allocations.

If you really want to save that memory, you probably should use a single Tensor and share data with it.

Do we really want to be that clever, or should we just go with one big allocation and have an actual TensorList underneath here?

It was supposed to get optimized in some MakeContiguous scenarios.

Contributor Author

If the constant node is supposed to be copied to the GPU, then it should have been created with device="gpu" in the first place.

Contributor Author

As for the first point: I traced it in the debugger; it doesn't allocate any memory, because at the point of the resize the underlying tensors don't have a type yet.

Contributor

@klecki klecki left a comment

shape = value.shape
data = value.flatten().tolist()
else:
def _type_from_value_or_list(v):
Contributor

What about bool?

Contributor Author

Fixed.

shape = shape, dtype = dtype, layout = layout)
return op()

def Constant(value, dtype = None, shape = None, layout = None, device = None):
Contributor

Maybe pass additional kwargs to the ops.Constant?
I'm thinking about naming the operator, for example.

Contributor Author

Done.
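
A hedged sketch of the forwarding that "Done" refers to, based on the signature visible in the diff above; the _split_value helper is hypothetical and stands in for the real value-to-fdata/idata conversion:

import nvidia.dali.ops as ops

def Constant(value, dtype=None, shape=None, layout=None, device=None, **kwargs):
    # Hypothetical helper: split `value` into a float or an int payload,
    # standing in for how the real implementation fills fdata/idata.
    def _split_value(v):
        flat = list(v) if hasattr(v, "__iter__") else [v]
        if any(isinstance(x, float) for x in flat):
            return [float(x) for x in flat], None
        return None, [int(x) for x in flat]

    fdata, idata = _split_value(value)
    args = dict(device=device or "cpu", dtype=dtype, shape=shape, layout=layout)
    if fdata is not None:
        args["fdata"] = fdata
    else:
        args["idata"] = idata
    # Extra keyword arguments (e.g. name="my_constant") are forwarded to the
    # operator instance, which is what this review comment asked for.
    args.update(kwargs)
    return ops.Constant(**args)()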

yield _test_scalar_constant_promotion, "cpu"
yield _test_scalar_constant_promotion, "gpu"

def main():
Contributor

Why not just let nose run the tests instead of duplicating that? Do we have to maintain both the main() and the test case definitions?

Contributor Author

It's much easier to filter it here for debugging purposes.
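
A minimal sketch of the kind of main() the author means; the filtering logic is illustrative, only the test names come from the snippet above:

import sys

def main():
    # Run the yield-style tests directly, optionally filtered by a substring
    # given on the command line; nose still discovers them normally in CI.
    tests = [
        (_test_scalar_constant_promotion, "cpu"),
        (_test_scalar_constant_promotion, "gpu"),
    ]
    pattern = sys.argv[1] if len(sys.argv) > 1 else ""
    for fn, device in tests:
        if pattern in fn.__name__ or pattern in device:
            fn(device)

if __name__ == "__main__":
    main()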

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@mzient
Contributor Author

mzient commented Jan 31, 2020

!build

@mzient mzient requested a review from klecki January 31, 2020 11:16
@dali-automaton
Collaborator

CI MESSAGE: [1101517]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [1101517]: BUILD PASSED

@mzient
Contributor Author

mzient commented Jan 31, 2020

@klecki Regarding docs:
I agree, we should update them, and we should also provide an example. But, as you said, the code is indeed backward-compatible, and we can adjust the docs in a separate PR. This one has grown big enough.

@mzient mzient merged commit ab36ff5 into NVIDIA:master Feb 4, 2020