Add video support to NeVA LLM conversation data [VideoNeVA] #8220

PannuMuthu · 2024-01-23T01:13:02Z

What does this PR do ?

Refactors preprocess_multimodal to support video tokens in LLM conversation data

Collection: Multimodal - NeVA

Changelog

Adds videoneva_dataset.py and conversation.py to /nemo/collections/multimodal/data/
videoneva_dataset.py implements TarOrFolderVisualLoader (modified from neva_dataset.py), tokenize, and preprocess_multimodal
- TarOrFolderVisualLoader is a modification of TarOrFolderImageLoader (from neva_dataset.py) which extends support to video files with the open_video function
- tokenize is identical to neva_dataset.py implementation
- preprocess_multimodal extends preprocess_multimodal from neva_dataset.py to include support for video tokens
videoneva/conversation.py is a modification of neva/conversation.py to include additional video-related constants

Usage

You can potentially add a usage example below

Jenkins CI

To run Jenkins, a NeMo User with write access must comment jenkins on the PR.

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

New Feature
Bugfix
Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

Related to # (issue)

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

for more information, see https://pre-commit.ci Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

for more information, see https://pre-commit.ci Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

PannuMuthu · 2024-01-25T23:44:43Z

@yaoyu-33

yaoyu-33 · 2024-01-26T21:04:41Z

nemo/collections/multimodal/data/videoneva/videoneva_dataset.py

@@ -0,0 +1,221 @@
+# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.


rename to video_neva_dataset

similar, too many copy paste from image neva, re-use some of the components?

Agreed. Removed tokenize because it is identical to neva_dataset.py implementation. preprocess_multimodal must be reimplemented in video_neva_dataset.py because of video token integration.

I have split TarOrFolderVisualLoader which processes images and videos together into TarOrFolderImageLoader and TarOrFolderVideoLoader, where TarOrFolderImageLoader is imported from neva_dataset.py and TarOrFolderVideoLoader is implemented in video_neva_dataset.py.

yaoyu-33 · 2024-01-26T21:05:41Z

examples/multimodal/multimodal_llm/neva/conf/neva_peft.yaml

@@ -193,6 +193,7 @@ model:
    data_path:
    lazy_preprocess: True
    is_multimodal: True
+    num_frames: 8


set default to null and add a comment plz, similar above

yaoyu-33 · 2024-01-26T21:09:41Z

nemo/collections/multimodal/data/videoneva/conversation.py

@@ -0,0 +1,411 @@
+# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.


why not reuse neva conversation.py?

Good point, will reuse nemo.collections.multimodal.data.neva.conversation

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

for more information, see https://pre-commit.ci

yaoyu-33 · 2024-02-02T20:43:48Z

nemo/collections/multimodal/data/videoneva/video_neva_dataset.py

+        return None
+
+
+def preprocess_multimodal(sources: dict, multimodal_cfg: dict, cur_token_len: int, use_plain: bool = False) -> Dict:


don't see where this class getting used yet. Is this a partial pr?

github-actions · 2024-02-17T01:43:24Z

This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.

github-actions · 2024-02-25T01:44:47Z

This PR was closed because it has been inactive for 7 days since being marked as stale.

nemo/collections/multimodal/data/videoneva/video_neva_dataset.py

+from torch.utils.data import Dataset, default_collate
+from transformers import CLIPImageProcessor
+
+import nemo.collections.multimodal.data.neva.conversation as conversation_lib


nemo/collections/multimodal/data/videoneva/video_neva_dataset.py

+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import copy


nemo/collections/multimodal/data/videoneva/video_neva_dataset.py

+# See the License for the specific language governing permissions and
+# limitations under the License.
+import copy
+import json


nemo/collections/multimodal/data/videoneva/video_neva_dataset.py

+# limitations under the License.
+import copy
+import json
+import logging


nemo/collections/multimodal/data/videoneva/video_neva_dataset.py

+import json
+import logging
+import os
+import re


nemo/collections/multimodal/data/videoneva/video_neva_dataset.py

+from omegaconf import DictConfig
+from PIL import Image
+from torch.utils.data import Dataset, default_collate
+from transformers import CLIPImageProcessor


nemo/collections/multimodal/data/videoneva/video_neva_dataset.py

+from transformers import CLIPImageProcessor
+
+import nemo.collections.multimodal.data.neva.conversation as conversation_lib
+from nemo.collections.multimodal.data.clip.augmentations.augmentations import image_transform


nemo/collections/multimodal/data/videoneva/video_neva_dataset.py

+from nemo.collections.multimodal.data.neva.conversation import (
+    DEFAULT_BOS_TOKEN,
+    DEFAULT_EOS_TOKEN,
+    DEFAULT_IM_END_TOKEN,
+    DEFAULT_IM_START_TOKEN,
+    DEFAULT_IMAGE_PATCH_TOKEN,
+    DEFAULT_IMAGE_TOKEN,
+    DEFAULT_LABELS_TOKEN,
+    DEFAULT_PAD_TOKEN,
+    DEFAULT_SEPARATOR_TOKEN,
+    DEFAULT_SYSTEM_TOKEN,
+    DEFAULT_UNK_TOKEN,
+)


nemo/collections/multimodal/data/videoneva/video_neva_dataset.py

+    DEFAULT_SYSTEM_TOKEN,
+    DEFAULT_UNK_TOKEN,
+)
+from nemo.collections.multimodal.data.neva.neva_dataset import TarOrFolderImageLoader


nemo/collections/multimodal/data/videoneva/video_neva_dataset.py

+    DEFAULT_UNK_TOKEN,
+)
+from nemo.collections.multimodal.data.neva.neva_dataset import TarOrFolderImageLoader
+from nemo.collections.nlp.modules.common.megatron.utils import get_ltor_masks_and_position_ids


github-actions · 2024-08-11T01:55:51Z

This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.

github-actions · 2024-08-19T01:52:52Z

This PR was closed because it has been inactive for 7 days since being marked as stale.

Pratyush Muthukumar and others added 6 commits January 25, 2024 09:42

add video support to preprocess_multimodal

f8f4046

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

da65867

for more information, see https://pre-commit.ci Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

videoneva preprocess_multimodal implementation

470506e

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

revert neva

1f85291

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

156c34c

for more information, see https://pre-commit.ci Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

signoff

59d52a4

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

PannuMuthu force-pushed the video-preprocess-multimodal branch from 88fbe60 to 59d52a4 Compare January 25, 2024 17:42

PannuMuthu marked this pull request as ready for review January 25, 2024 17:42

PannuMuthu changed the title ~~Add video support to preprocess_multimodal~~ Add video support to NeVA LLM Conversation Data [VideoNeVA] Jan 25, 2024

PannuMuthu changed the title ~~Add video support to NeVA LLM Conversation Data [VideoNeVA]~~ Add video support to NeVA LLM conversation data [VideoNeVA] Jan 25, 2024

yaoyu-33 reviewed Jan 26, 2024

View reviewed changes

requested changes

6cb1ab1

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

PannuMuthu force-pushed the video-preprocess-multimodal branch from cb75f89 to 6cb1ab1 Compare January 29, 2024 22:36

Pratyush Muthukumar and others added 2 commits January 29, 2024 14:37

separate video + image loader

1474f76

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

14efbf8

for more information, see https://pre-commit.ci

PannuMuthu requested a review from yaoyu-33 January 29, 2024 22:39

yaoyu-33 reviewed Feb 2, 2024

View reviewed changes

PannuMuthu marked this pull request as draft February 2, 2024 22:29

github-actions bot added the stale label Feb 17, 2024

github-actions bot closed this Feb 25, 2024

yaoyu-33 reopened this Apr 5, 2024

github-actions bot added the Multi Modal label Apr 5, 2024

github-advanced-security bot found potential problems Apr 5, 2024

View reviewed changes

github-actions bot removed the stale label Jul 27, 2024

github-actions bot added the stale label Aug 11, 2024

github-actions bot closed this Aug 19, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add video support to NeVA LLM conversation data [VideoNeVA] #8220

Add video support to NeVA LLM conversation data [VideoNeVA] #8220

PannuMuthu commented Jan 23, 2024 •

edited

Loading

PannuMuthu commented Jan 25, 2024

yaoyu-33 Jan 26, 2024

yaoyu-33 Jan 26, 2024

PannuMuthu Jan 29, 2024

yaoyu-33 Jan 26, 2024

yaoyu-33 Jan 26, 2024

PannuMuthu Jan 29, 2024

yaoyu-33 Feb 2, 2024

github-actions bot commented Feb 17, 2024

github-actions bot commented Feb 25, 2024

github-actions bot commented Aug 11, 2024

github-actions bot commented Aug 19, 2024

		@@ -0,0 +1,221 @@
		# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.

		@@ -0,0 +1,411 @@
		# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.

		return None


		def preprocess_multimodal(sources: dict, multimodal_cfg: dict, cur_token_len: int, use_plain: bool = False) -> Dict:

Add video support to NeVA LLM conversation data [VideoNeVA] #8220

Add video support to NeVA LLM conversation data [VideoNeVA] #8220

Conversation

PannuMuthu commented Jan 23, 2024 • edited Loading

What does this PR do ?

Changelog

Usage

Jenkins CI

Before your PR is "Ready for review"

Who can review?

Additional Information

PannuMuthu commented Jan 25, 2024

yaoyu-33 Jan 26, 2024

Choose a reason for hiding this comment

yaoyu-33 Jan 26, 2024

Choose a reason for hiding this comment

PannuMuthu Jan 29, 2024

Choose a reason for hiding this comment

yaoyu-33 Jan 26, 2024

Choose a reason for hiding this comment

yaoyu-33 Jan 26, 2024

Choose a reason for hiding this comment

PannuMuthu Jan 29, 2024

Choose a reason for hiding this comment

yaoyu-33 Feb 2, 2024

Choose a reason for hiding this comment

github-actions bot commented Feb 17, 2024

github-actions bot commented Feb 25, 2024

github-actions bot commented Aug 11, 2024

github-actions bot commented Aug 19, 2024

PannuMuthu commented Jan 23, 2024 •

edited

Loading