Add video support to NeVA LLM conversation data [VideoNeVA] #8220
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
for more information, see https://pre-commit.ci
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Branch updated from 88fbe60 to 59d52a4.
@@ -0,0 +1,221 @@
# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
rename to video_neva_dataset
Similarly, there is a lot of code copied from image NeVA. Can some of the components be reused?
Agreed. Removed `tokenize` because it is identical to the `neva_dataset.py` implementation. `preprocess_multimodal` must be reimplemented in `video_neva_dataset.py` because of video token integration.
I have split `TarOrFolderVisualLoader`, which processes images and videos together, into `TarOrFolderImageLoader` and `TarOrFolderVideoLoader`, where `TarOrFolderImageLoader` is imported from `neva_dataset.py` and `TarOrFolderVideoLoader` is implemented in `video_neva_dataset.py`.
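The reply above splits image loading (reused from `neva_dataset.py`) from video loading. As a rough illustration of the "tar or folder" half of a video loader, here is a minimal sketch that only returns raw video bytes from either a `.tar` archive or a plain folder. The class `TarOrFolderVideoLoaderSketch` and its `open_video_bytes` method are hypothetical and are not the PR's `TarOrFolderVideoLoader`; decoding the bytes into frames is deliberately left out because the PR does not say which video library it uses.

```python
import os
import tarfile
from typing import Optional


class TarOrFolderVideoLoaderSketch:
    """Hypothetical sketch: fetch raw video bytes from a .tar archive or a folder.

    Decoding the bytes into frames is intentionally out of scope here.
    """

    def __init__(self, video_folder: str) -> None:
        self.video_folder = video_folder
        self.is_tar = video_folder.endswith(".tar")
        self.tar_members = set()
        if self.is_tar:
            # Record regular-file member names once so missing files can be
            # rejected without reopening the archive.
            with tarfile.open(video_folder, "r") as tar:
                self.tar_members = {m.name for m in tar.getmembers() if m.isfile()}

    def open_video_bytes(self, file_name: str) -> Optional[bytes]:
        """Return the file's bytes, or None if it does not exist."""
        if self.is_tar:
            if file_name not in self.tar_members:
                return None
            with tarfile.open(self.video_folder, "r") as tar:
                member_file = tar.extractfile(file_name)
                return member_file.read() if member_file is not None else None
        path = os.path.join(self.video_folder, file_name)
        if not os.path.isfile(path):
            return None
        with open(path, "rb") as f:
            return f.read()
```

Keeping byte retrieval separate from decoding mirrors the split in the reply: the storage logic can be shared while the image and video paths decode differently.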
@@ -193,6 +193,7 @@ model:
data_path:
lazy_preprocess: True
is_multimodal: True
num_frames: 8
Please set the default to null and add a comment, same as above.
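For context on what `num_frames` might control, a common approach is to sample a fixed number of frames uniformly across each clip. The helper below is a hypothetical sketch, not code from the PR. It assumes that a null (`None`) `num_frames`, as suggested in review, means "keep every frame"; that interpretation is an assumption, not documented behavior.

```python
from typing import List, Optional


def sample_frame_indices(total_frames: int, num_frames: Optional[int]) -> List[int]:
    """Hypothetical helper: choose which decoded frames to keep.

    Assumption: num_frames=None means keep every frame; otherwise pick the
    centre frame of num_frames equal-length segments of the clip.
    """
    if total_frames <= 0:
        return []
    if num_frames is None or num_frames >= total_frames:
        return list(range(total_frames))
    if num_frames <= 0:
        raise ValueError("num_frames must be positive or None")
    segment = total_frames / num_frames
    # The centre of segment i is segment * i + segment / 2, always < total_frames.
    return [int(segment * i + segment / 2) for i in range(num_frames)]
```

With the config's value of 8 and a 100-frame clip, this picks frames 6, 18, 31, 43, 56, 68, 81, and 93. Using segment centres rather than segment starts avoids always including the very first frame.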
@@ -0,0 +1,411 @@
# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
Why not reuse NeVA's `conversation.py`?
Good point, will reuse nemo.collections.multimodal.data.neva.conversation
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Branch updated from cb75f89 to 6cb1ab1.
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
for more information, see https://pre-commit.ci
return None

def preprocess_multimodal(sources: dict, multimodal_cfg: dict, cur_token_len: int, use_plain: bool = False) -> Dict:
I don't see where this class is used yet. Is this a partial PR?
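`preprocess_multimodal` is where video tokens are integrated, but its body is not shown here. The following is only a sketch of one plausible approach, modeled on the LLaVA-style scheme in which an image placeholder is replaced by `cur_token_len` patch tokens, optionally wrapped in start/end tokens. Here each video placeholder expands into patch tokens for every sampled frame. The token strings, the `expand_video_tokens` name, and the `{"from": ..., "value": ...}` turn format are all assumptions, not the PR's code.

```python
from typing import Dict, List

# Hypothetical placeholder strings; NeVA's real constants may differ.
VIDEO_TOKEN = "<video>"
IMAGE_PATCH_TOKEN = "<patch>"
IM_START_TOKEN = "<im_start>"
IM_END_TOKEN = "<im_end>"


def expand_video_tokens(
    sources: List[Dict[str, str]],
    cur_token_len: int,
    num_frames: int,
    use_im_start_end: bool = True,
) -> List[Dict[str, str]]:
    """Hypothetical sketch: expand each video placeholder into per-frame patch tokens."""
    # One frame's worth of patch tokens, optionally wrapped in start/end markers.
    per_frame = IMAGE_PATCH_TOKEN * cur_token_len
    if use_im_start_end:
        per_frame = IM_START_TOKEN + per_frame + IM_END_TOKEN
    replacement = per_frame * num_frames
    expanded = []
    for turn in sources:
        new_turn = dict(turn)  # shallow copy so the caller's conversation is not mutated
        new_turn["value"] = turn["value"].replace(VIDEO_TOKEN, replacement)
        expanded.append(new_turn)
    return expanded
```

Wrapping each frame in its own start/end markers is one design choice; wrapping the whole clip once is equally plausible, and the choice has to match what the model expects.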
This PR is stale because it has been open for 14 days with no activity. Remove the stale label, comment, or push an update, or it will be closed in 7 days.
This PR was closed because it has been inactive for 7 days since being marked as stale.
Check notices from Code scanning / CodeQL (severity: Note) on the import block shown in the diff:
- Module is imported with 'import' and 'import from': `import nemo.collections.multimodal.data.neva.conversation as conversation_lib`
- Unused import: `import copy`
- Unused import: `import json`
- Unused import: `import logging`
- Unused import: `import re`
- Unused import: `from transformers import CLIPImageProcessor`
- Unused import: `from nemo.collections.multimodal.data.clip.augmentations.augmentations import image_transform`
- Unused import: the `from nemo.collections.multimodal.data.neva.conversation import (...)` block imports `DEFAULT_BOS_TOKEN`, `DEFAULT_EOS_TOKEN`, `DEFAULT_IM_END_TOKEN`, `DEFAULT_IM_START_TOKEN`, `DEFAULT_IMAGE_PATCH_TOKEN`, `DEFAULT_IMAGE_TOKEN`, `DEFAULT_LABELS_TOKEN`, `DEFAULT_PAD_TOKEN`, `DEFAULT_SEPARATOR_TOKEN`, `DEFAULT_SYSTEM_TOKEN`, and `DEFAULT_UNK_TOKEN`; the imports of `DEFAULT_EOS_TOKEN`, `DEFAULT_PAD_TOKEN`, `DEFAULT_SEPARATOR_TOKEN`, `DEFAULT_SYSTEM_TOKEN`, `DEFAULT_UNK_TOKEN`, and `DEFAULT_LABELS_TOKEN` are not used.
- Unused import: `from nemo.collections.multimodal.data.neva.neva_dataset import TarOrFolderImageLoader`
- Unused import: `from nemo.collections.nlp.modules.common.megatron.utils import get_ltor_masks_and_position_ids`
What does this PR do?
Refactors `preprocess_multimodal` to support video tokens in LLM conversation data.
Collection: Multimodal - NeVA
Changelog
- Add `videoneva_dataset.py` and `conversation.py` to `/nemo/collections/multimodal/data/`
- `videoneva_dataset.py` implements `TarOrFolderVisualLoader` (modified from `neva_dataset.py`), `tokenize`, and `preprocess_multimodal`:
  - `TarOrFolderVisualLoader` is a modification of `TarOrFolderImageLoader` (from `neva_dataset.py`) that extends support to video files with the `open_video` function.
  - `tokenize` is identical to the `neva_dataset.py` implementation.
  - `preprocess_multimodal` extends `preprocess_multimodal` from `neva_dataset.py` to include support for video tokens.
- `videoneva/conversation.py` is a modification of `neva/conversation.py` that includes additional video-related constants.
Usage
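Whether the code keeps one combined visual loader (as in this changelog) or splits image and video loaders (as in the review reply), each media file has to be routed to the right decoding path. A minimal sketch, assuming routing by file extension; the extension sets and the `media_kind` helper are hypothetical, since the PR does not list the supported formats.

```python
import os

# Hypothetical extension sets; the PR does not say which formats are supported.
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov", ".mkv", ".webm"}
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def media_kind(file_name: str) -> str:
    """Classify a media file by extension so it can be sent to an image or video loader."""
    ext = os.path.splitext(file_name)[1].lower()  # case-insensitive match
    if ext in VIDEO_EXTENSIONS:
        return "video"
    if ext in IMAGE_EXTENSIONS:
        return "image"
    raise ValueError(f"Unsupported media file: {file_name}")
```

Raising on unknown extensions, rather than silently treating them as images, surfaces data-preparation mistakes early.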
Jenkins CI
To run Jenkins, a NeMo user with write access must comment `jenkins` on the PR.
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items you can still open "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The contributor guidelines list specific people who can review PRs in various areas.
Additional Information