Video reader resize #2097
Signed-off-by: Albert Wolant <awolant@nvidia.com>
!build
CI MESSAGE: [1455408]: BUILD STARTED
CI MESSAGE: [1455429]: BUILD STARTED
CI MESSAGE: [1455408]: BUILD FAILED
CI MESSAGE: [1455429]: BUILD FAILED
Signed-off-by: Albert Wolant <awolant@nvidia.com>
!build
CI MESSAGE: [1455477]: BUILD STARTED
!build
CI MESSAGE: [1455540]: BUILD STARTED
CI MESSAGE: [1455477]: BUILD FAILED
}

frames.set_type(sequence.type());
frames.Resize(shape);
Maybe you can use ShareData(void *ptr, size_t bytes, const TensorListShape<> &shape, const TypeInfo &type = {})?
Done
CI MESSAGE: [1455540]: BUILD FAILED
Signed-off-by: Albert Wolant <awolant@nvidia.com>
99db416 to e7d5513
build!
!build
CI MESSAGE: [1456327]: BUILD STARTED
CI MESSAGE: [1456327]: BUILD FAILED

namespace dali {

class VideoReaderResize : public VideoReader, protected ResizeAttr, protected ResizeBase {
Inheritance or composition?
I don't think it should inherit from ResizeAttr, and I'm not sure about ResizeBase either.
This is how Resize is built, so I wanted to follow this pattern. If we want to change it, let's change both, maybe in a separate PR?
Sure.
output.Resize(output_shape);
}

void ShareSingleOutput(int data_idx, TensorList<GPUBackend> &batch_output,
- void ShareSingleOutput(int data_idx, TensorList<GPUBackend> &batch_output,
+ void SequenceToTensorList(int data_idx, TensorList<GPUBackend> &batch_output,
I'm not sure if ShareSingleOutput tells what it does.
Done
for sample in range(batch_size):
    yield [sequences_out[sample]]

gt_pipeline = dali.pipeline.Pipeline(
Maybe you can use ElementExtract to extract all frames from a sequence as separate batches and call Resize on them instead of going through ExternalSource?
Maybe I'm mistaken, but I think that writing a pipeline with ElementExtract that would be robust to changes of batch_size and sequence_length throughout the tests might be somewhat involved. You need to adjust the number of outputs and the number of calls to Resize depending on sequence_length. If you feel strongly about it, I can do it, but maybe we can leave it as it is?
I think it would depend only on the sequence_length, and you can do it in a loop. So you can compare two pipelines:
VideoReader + seq_len * ElementExtract + seq_len * Resize vs VideoReaderResize + seq_len * ElementExtract. But it is just an idea. Let us wait for another opinion.
Done
Signed-off-by: Albert Wolant <awolant@nvidia.com>
!build
CI MESSAGE: [1457164]: BUILD STARTED
CI MESSAGE: [1457164]: BUILD FAILED
Signed-off-by: Albert Wolant <awolant@nvidia.com>
Signed-off-by: Albert Wolant <awolant@nvidia.com>
!build
CI MESSAGE: [1459098]: BUILD STARTED
for i in range(video_length):
    resized_frames[i] = self.resize(resized_frames[i])
I think it should do:
- for i in range(video_length):
-     resized_frames[i] = self.resize(resized_frames[i])
+ resized_frames = self.resize(resized_frames)
Done
Signed-off-by: Albert Wolant <awolant@nvidia.com>
CI MESSAGE: [1459098]: BUILD FAILED
@@ -35,6 +35,14 @@ enum t_idInfo : uint32_t {
t_mirrorVert
};

struct TransformMeta {
int H, W, C;
nitpick: non-standard indentation (we are using 2 spaces)
Done
void share_frames(TensorList<GPUBackend> &frames) {
void *current_sequence = sequence.raw_mutable_data();

TensorListShape<> shape;
You can:
auto shape = TensorListShape<>::make_uniform(count, frame_shape());
Done
struct TransformMeta {
int H, W, C;
int rsz_h, rsz_w;
std::pair<int, int> crop;
I'm not sure what this crop pair refers to. Sizes, assuming 0 anchors? I'd rather have a CropWindow instance here, but you can dismiss this change as out of scope if you consider it so.
I wanted to change existing resize related code as little as possible, so maybe let's leave it for now. AFAIK it will be heavily changed soon.
inline ~VideoReaderResize() override = default;

protected:
void SetupSharedSampleParams(DeviceWorkspace &ws) override {}
remove?
Done
TensorList<GPUBackend> &video_output,
SequenceWrapper &prefetched_video,
DeviceWorkspace &ws) override {
std::fill_n(
- std::fill_n(
+ auto params = detail::GetResamplingParams(...);
+ for (auto& entry : resample_params_)
+   entry = params;
Is it equivalent?
I think so. The STL version is more readable to me, though, so I would like to leave it as it is, if that is OK with you?
frame = sample[frame_id]
gt_frame = gt_batch[frame_id].at(sample_id)

if gt_frame.shape == frame.shape:
Why not:
assert (gt_frame.shape == frame.shape), "Shapes are not equal: {} != {}".format(
gt_frame.shape, frame.shape)
assert (gt_frame == frame).all(), "Images are not equal"
Done. I used an if so that I could set a breakpoint inside it and inspect things while working on it.
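For illustration, the assert-based check suggested above can be wrapped in a small helper. This is only a sketch in plain Python: it assumes frames are rectangular nested lists rather than the array objects used in the real test, and the names shape_of and assert_frames_equal are hypothetical, not part of the PR.

```python
def shape_of(frame):
    """Shape of a rectangular nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    shape = []
    while isinstance(frame, list):
        shape.append(len(frame))
        frame = frame[0] if frame else None
    return tuple(shape)


def assert_frames_equal(frame, gt_frame):
    """Fail with a descriptive message instead of silently skipping a mismatch."""
    frame_shape, gt_shape = shape_of(frame), shape_of(gt_frame)
    assert gt_shape == frame_shape, "Shapes are not equal: {} != {}".format(
        gt_shape, frame_shape)
    # Nested lists compare element-wise, standing in for (gt_frame == frame).all().
    assert gt_frame == frame, "Images are not equal"


# Identical stand-in frames pass without raising.
assert_frames_equal([[1, 2], [3, 4]], [[1, 2], [3, 4]])
```

Compared with the original if, a shape mismatch now fails the test with both shapes in the message rather than skipping the pixel comparison.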
batch, = pipeline.run()
batch = batch.as_cpu()
gt_batch = list(gt_pipeline.run())
Maybe:
gt_batch = [out.as_cpu() for out in gt_pipeline.run()]
Done
    return pipeline


def compare_video_resize_pipelines(pipeline, gt_pipeline, batch_size, video_length, iterations=16):
Can't you use compare_pipelines from test_utils.py?
It's not that straightforward, because gt_pipeline has a different layout. It returns sequence_length outputs, where the n-th output holds the n-th frame of every video.
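The layout mismatch described above amounts to a transposition between "per frame" and "per sample" ordering. A minimal sketch in plain Python, assuming gt_outputs[n][s] is frame n of sample s and using strings as stand-in frames; the helper name regroup_by_sample is hypothetical and is not part of test_utils.py:

```python
def regroup_by_sample(gt_outputs):
    """Turn per-frame outputs into per-sample sequences.

    gt_outputs[n][s] is frame n of sample s (the gt_pipeline layout);
    the result r[s][n] is frame n of sample s (the sequence layout).
    """
    if not gt_outputs:
        return []
    batch_size = len(gt_outputs[0])
    for n, output in enumerate(gt_outputs):
        assert len(output) == batch_size, \
            "Output {} has {} samples, expected {}".format(n, len(output), batch_size)
    # zip(*...) transposes the outer two levels of nesting.
    return [list(sample_frames) for sample_frames in zip(*gt_outputs)]


# Stand-in data: sequence_length = 3 outputs, batch_size = 2 samples each.
gt_outputs = [["v0f0", "v1f0"], ["v0f1", "v1f1"], ["v0f2", "v1f2"]]
sequences = regroup_by_sample(gt_outputs)
```

After regrouping, each entry of sequences lines up with one sample of the sequence-layout batch, so the two can be compared frame by frame.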
def test_video_resize(batch_size=2):
    for vp in video_reader_params:
        for rp in resize_params:
            yield run_for_params, batch_size, vp, rp
How is it printed when you run it with nosetests?
This runs 14 separate tests. All info from vp and rp is printed. Maybe not super pretty, but it's all there, so I guess it's fine.
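To make the expansion into separate test cases concrete, here is a minimal sketch of the generator pattern. The parameter lists are made-up stand-ins (the real video_reader_params and resize_params are not shown in this thread), run_for_params is a stub, and the generator is named video_resize_cases here only so that the snippet can run as a plain script.

```python
import itertools

# Hypothetical stand-ins; the real parameter lists are not shown in the thread.
video_reader_params = [{"sequence_length": 3}, {"sequence_length": 5}]
resize_params = [{"resize_x": 300}, {"resize_y": 200}, {"resize_shorter": 250}]


def run_for_params(batch_size, video_reader_params, resize_params):
    # Stub: the real test would build both pipelines and compare their outputs.
    return batch_size, video_reader_params, resize_params


def video_resize_cases(batch_size=2):
    # Each yielded tuple is (callable, *args); a nose-style runner executes
    # every tuple as a separate test and reports the arguments it was given.
    for vp, rp in itertools.product(video_reader_params, resize_params):
        yield run_for_params, batch_size, vp, rp


cases = list(video_resize_cases())
```

With these stand-in lists the generator produces one case per (vp, rp) pair, which is why the arguments show up in the test report for every case.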
label_output->set_type(TypeInfo::Create<int>());
label_output->Resize(label_shape_);
label_output_ = &ws.Output<GPUBackend>(output_index++);
label_output_->set_type(TypeInfo::Create<int>());
- label_output_->set_type(TypeInfo::Create<int>());
+ label_output_->set_type(TypeTable::GetTypeInfo(TypeTable::GetTypeID<int>()));
It's rather better to use type table than create a new type info.
BTW, TypeTable should definitely have a method to obtain a TypeInfo from a static type (basically those two methods in one).
Done, added this method as: TypeTable::GetTypeInfoFromStatic<int>()
frame_num_output->set_type(TypeInfo::Create<int>());
frame_num_output->Resize(frame_num_shape_);
frame_num_output_ = &ws.Output<GPUBackend>(output_index++);
frame_num_output_->set_type(TypeInfo::Create<int>());
- frame_num_output_->set_type(TypeInfo::Create<int>());
+ frame_num_output_->set_type(TypeTable::GetTypeInfo(TypeTable::GetTypeID<int>()));
Done
timestamp_output->set_type(TypeInfo::Create<double>());
timestamp_output->Resize(timestamp_shape_);
timestamp_output_ = &ws.Output<GPUBackend>(output_index++);
timestamp_output_->set_type(TypeInfo::Create<double>());
- timestamp_output_->set_type(TypeInfo::Create<double>());
+ timestamp_output_->set_type(TypeTable::GetTypeInfo(TypeTable::GetTypeID<double>()));
Done
if (dtype_ == DALI_FLOAT) {
tl_sequence_output.set_type(TypeInfo::Create<float>());
output.set_type(TypeInfo::Create<float>());
here, also GetTypeInfo
Done
} else {  // dtype_ == DALI_UINT8
tl_sequence_output.set_type(TypeInfo::Create<uint8>());
output.set_type(TypeInfo::Create<uint8>());
and here, GetTypeInfo
Done
Ok, with minor comments
CI MESSAGE: [1459098]: BUILD PASSED
Signed-off-by: Albert Wolant <awolant@nvidia.com>
!build
CI MESSAGE: [1459445]: BUILD STARTED
CI MESSAGE: [1459445]: BUILD PASSED
Why we need this PR?
What happened in this PR?
Create new operator VideoReaderResize that combines reading and resizing the videos
Code around video reader and resize
VideoReaderResize, refactoring of VideoReader
Added Python tests to compare with existing VideoReader and Resize ops
Spec for new op has docs, inherits docs from existing ops as well
JIRA TASK: [Use DALI-681]