
Random clip per video #4039

Open
dschoerk opened this issue Jul 7, 2022 · 12 comments
Assignees
Labels
enhancement (New feature or request) · Video (Video related feature/question)

Comments

dschoerk commented Jul 7, 2022

Hi, I'm currently experimenting with action recognition on the Kinetics 400 dataset. For training an action recognition model, I would like to extract a single (sub-)clip per video in the dataset. What I've come up with so far is the following pipeline.

@pipeline_def
def video_pipe(filenames, labels, shuffle):
    # step=9999 skips far past the end of each file, so only one clip is read per video
    x, l = fn.readers.video(device="gpu", filenames=filenames, labels=labels,
                            sequence_length=sequence_length, shard_id=0, num_shards=1,
                            random_shuffle=shuffle, initial_fill=initial_prefetch_size,
                            lazy_init=False, stride=6, step=9999)
    return x, l

Setting step to a high value ensures that I only get one clip per video, but it always starts at the beginning of the video. What I'm trying to achieve is a sub-clip of N=16 frames starting at a random time, similar to what is possible in PyTorchVideo. The clip_sampler here samples a random clip of _CLIP_DURATION length.

val_dataset = pytorchvideo.data.Kinetics(
    data_path=os.path.join(_DATA_PATH, "val"),
    clip_sampler=pytorchvideo.data.make_clip_sampler("uniform", self._CLIP_DURATION),
    decode_audio=False,
    transform=val_transform,
)
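For reference, the sampling behaviour being asked for can be stated in a few lines of plain Python (a sketch only; sample_clip_start is an illustrative helper, not pytorchvideo or DALI API):

```python
import random

def sample_clip_start(num_frames, clip_len=16, stride=6, rng=random):
    # Frames spanned by clip_len frames taken every `stride` frames.
    span = (clip_len - 1) * stride + 1
    if num_frames < span:
        return 0  # video too short: fall back to the beginning
    return rng.randrange(num_frames - span + 1)
```

With clip_len=16 and stride=6 a clip spans 91 frames, so for a 300-frame video the start can be anywhere in [0, 209].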

Best regards and thanks for your help

@awolant awolant assigned awolant and unassigned prak-nv Jul 7, 2022
@awolant awolant added the Video (Video related feature/question) and enhancement (New feature or request) labels Jul 8, 2022
awolant (Contributor) commented Jul 8, 2022

Hi, thanks for creating the issue.

Unfortunately, something like this is not currently supported in DALI.

We are working on improvements to video support in DALI. We introduced a new VideoReaderDecoder that will become the new way to consume video in DALI. For now it lives in the experimental module and is not yet ready for production use; this is still ongoing work.

I am adding the feature you requested as something to be considered for the new video reader. It is tracked internally as DALI-2881.

Are there any more features you would like to see in DALI with regard to video support? We are very interested in making the reworked video reader relevant for our users. Any suggestions and feedback are appreciated. Thanks!

@davidsvy

@awolant
This is off-topic for this issue, but since you are encouraging people to propose new features: could you implement a framewise argument for nvidia.dali.fn.flip (similar to horizontal, vertical, and depthwise)? This way, videos could be randomly reversed in the time dimension.

JanuszL (Contributor) commented Jul 10, 2022

Hi @davidsvy,

As @awolant already stated, this is not currently possible in DALI, but it is good to know about such a use case so we can account for it during the new video reader's development.
Regarding nvidia.dali.fn.flip, depthwise can be treated as the time dimension, so it should already be possible to randomly reverse the order of frames.
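As an illustration of the semantics only (plain NumPy, not the DALI operator): reversing a sequence in time is a flip along the frame axis, which is what a depthwise flip does when the frame dimension is treated as depth.

```python
import numpy as np

# Dummy video in FHWC layout: 4 frames of 2x2 RGB.
video = np.arange(4 * 2 * 2 * 3, dtype=np.uint8).reshape(4, 2, 2, 3)

# Treating the outermost (frame) axis as "depth" and flipping along it
# is exactly a reversal in time:
reversed_video = video[::-1]
```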

dschoerk (Author) commented

Hi @awolant, thank you for the response. Meanwhile I have read more of the DALI docs and became aware of the file_list parameter for nvidia.dali.fn.readers.video. Along with the option file_list_frame_num, it allows specifying a start and end time (in frames) for each video, which enables me to create a trimmed sample from each input video. By creating a new file list after each epoch, I can even avoid overfitting to a single trim per video.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/operations/nvidia.dali.fn.readers.video.html?highlight=file_list
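A sketch of generating such a per-epoch file list (the (filename, label, num_frames) tuples and the helper name are illustrative assumptions; the "file label start end" line format follows the file_list docs linked above):

```python
import random

def write_random_trim_list(path, videos, clip_len=16, stride=6, rng=random):
    # videos: iterable of (filename, label, num_frames); frame counts are
    # assumed to be known up front, e.g. probed once with ffprobe.
    span = (clip_len - 1) * stride + 1  # frames the sampled clip spans
    with open(path, "w") as f:
        for name, label, num_frames in videos:
            start = rng.randrange(max(num_frames - span, 0) + 1)
            end = min(start + span, num_frames)
            # One "file label start_frame end_frame" entry per video,
            # interpreted as frame numbers when file_list_frame_num=True.
            f.write(f"{name} {label} {start} {end}\n")
```

Regenerating this file before each epoch gives a fresh random trim per video.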

What I'm currently working on is integrating this with a PyTorch dataloader, so that DALI also works with multiple workers. I hope this can work out :)

Thank you for making me aware of the new VideoReaderDecoder. Will this be an additional feature or a replacement for fn.readers.video? I'm unsure how it is relevant to my issue. I read here that this new feature supports CFR and VFR videos, which would definitely be important when processing the original videos from the Kinetics dataset. For now I have worked around this issue by preprocessing the videos with ffmpeg.

JanuszL (Contributor) commented Jul 11, 2022

Hi @dschoerk,

what i'm currently working on is to integrate this with a pytorch dataloader, such that dali also works with multiple workers. i hope this can work out :)

DALI is a drop-in replacement for the PyTorch dataloader, so running it from multiple workers won't yield any benefit. Please check this part of the DALI documentation to see how to use DALI with PyTorch.
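To make the drop-in point concrete, the consumption pattern is an ordinary dataloader-style loop (a hedged sketch; the stand-in generator below only mimics the structure the DALI PyTorch iterator yields, so the loop shape is visible without a GPU):

```python
# With DALI installed, the real object would come from the PyTorch plugin:
#   from nvidia.dali.plugin.pytorch import DALIGenericIterator
#   loader = DALIGenericIterator(pipe, ["data", "label"])
def stand_in_iterator(num_batches=3):
    for i in range(num_batches):
        # One dict per pipeline; keys match the output_map above.
        yield [{"data": [[i]], "label": [i]}]

seen = []
for outputs in stand_in_iterator():
    batch = outputs[0]                      # single-pipeline case
    frames, labels = batch["data"], batch["label"]
    seen.append(labels[0])
```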

thank you for making me aware of the new VideoReaderDecoder. will this be an additional feature or a replacement for fn.readers.video?

Long term we plan to replace it, but we cannot commit to any timeline.

@bpleshakov

@awolant
Hello.
I am in desperate need of something like nvidia.dali.fn.decoders.image, but for video, to be able to read video from an external-source byte buffer using NVIDIA Triton. It would be much appreciated.

JanuszL (Contributor) commented Jul 14, 2022

@sapiosexual,

I think it would be best to check DeepStream (https://docs.nvidia.com/metropolis/deepstream/dev-guide/); it should support a video stream for inference.

@bpleshakov

@JanuszL
I believe DeepStream is not the kind of tool I need, since the ability to perform a variable number of video augmentations is crucial for me. It is not clear to me how to make DeepStream work that way.

JanuszL (Contributor) commented Jul 14, 2022

@sapiosexual,

Can you use DALI and DeepStream together? DeepStream to get the video and DALI to process it further (meaning two ensembles for data processing)?

@bpleshakov

@JanuszL

I am not sure; I need time to explore.
But I definitely don't want to bring one more dependency into the production environment.

Alwahsh commented Sep 9, 2023

Was DALI-2881 implemented or not yet? Do you have visibility into when it will be available?

JanuszL (Contributor) commented Sep 11, 2023

@Alwahsh,
I'm sorry, but it hasn't been. If you have spare cycles, we would be more than happy to review and accept an external contribution.
