Skip to content

Enhance VideoReader initialization with path validation#1845

Merged
suiyoubi merged 6 commits intomainfrom
aot/video-pipeline-file-not-found-bug-fix
Apr 22, 2026
Merged

Enhance VideoReader initialization with path validation#1845
suiyoubi merged 6 commits intomainfrom
aot/video-pipeline-file-not-found-bug-fix

Conversation

@suiyoubi
Copy link
Copy Markdown
Contributor

Description

Usage

# Add snippet demonstrating usage

Checklist

  • I am familiar with the Contributing Guide.
  • New or Existing tests cover these changes.
  • The documentation is up to date with these changes.

Signed-off-by: Ao Tang <aot@nvidia.com>
@suiyoubi suiyoubi requested a review from abhinavg4 as a code owner April 21, 2026 17:02
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented Apr 21, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@suiyoubi
Copy link
Copy Markdown
Contributor Author

/ok to test 799c7a2

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 21, 2026

Greptile Summary

This PR adds path validation to VideoReader.__post_init__ for local paths: it checks existence, distinguishes single-file from directory inputs, validates supported video extensions, and skips all checks for remote URLs. Corresponding tests are added with an autouse fixture to keep pre-existing tests passing.

  • The single-file extension check is case-insensitive (.lower()), but the directory rglob patterns are lowercase-only — on Linux, a directory containing only VIDEO.MP4-style files would raise FileNotFoundError even though valid videos are present.

Confidence Score: 4/5

Safe to merge after addressing the case-sensitivity inconsistency in the directory rglob check.

One remaining P2 finding (case-sensitive rglob vs case-insensitive suffix check) describes a real inconsistency that could cause false negatives on Linux; prior P1/P2 concerns from earlier review rounds are mostly addressed (directory guard added, test iterator fixed with side_effect). The change is otherwise well-structured with good test coverage.

nemo_curator/stages/video/io/video_reader.py — specifically the rglob extension patterns on line 269

Important Files Changed

Filename Overview
nemo_curator/stages/video/io/video_reader.py Adds path validation in __post_init__: existence check, single-file extension guard, and directory emptiness guard via rglob. One remaining issue: rglob patterns are lowercase-only while the single-file suffix check is case-insensitive, creating inconsistent behaviour on case-sensitive filesystems.
tests/stages/video/io/test_video_reader.py Adds an autouse fixture that patches Path for all TestVideoReader tests (allowing pre-existing tests to continue passing after the new validation), plus five new targeted test cases covering nonexistent path, empty directory, valid single file, unsupported file, and remote URL. The fixture correctly uses side_effect for fresh iterators per call.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[VideoReader.__post_init__] --> B{is_remote_url?}
    B -->|Yes| Z[Skip validation - OK]
    B -->|No| C[path = Path input_video_path]
    C --> D{path.exists?}
    D -->|No| E[raise FileNotFoundError\nVideo directory does not exist]
    D -->|Yes| F{path.is_file?}
    F -->|Yes| G{suffix.lower in\nvideo_extensions?}
    G -->|No| H[raise FileNotFoundError\nNot a supported video file]
    G -->|Yes| Z
    F -->|No - is dir| I{any rglob match\nfor lowercase ext?}
    I -->|No| J[raise FileNotFoundError\nNo video files found]
    I -->|Yes| Z
Loading

Reviews (5): Last reviewed commit: "Merge branch 'main' into aot/video-pipel..." | Re-trigger Greptile

@sarahyurick sarahyurick added the r1.2.0 Pick this label for auto cherry-picking into r1.2.0 label Apr 21, 2026
Comment thread nemo_curator/stages/video/io/video_reader.py Outdated
Comment thread nemo_curator/stages/video/io/video_reader.py
Comment thread tests/stages/video/io/test_video_reader.py Outdated
video_extensions = (".mp4", ".mov", ".avi", ".mkv", ".webm")
if path.is_file():
if path.suffix.lower() not in video_extensions:
msg = f"Not a supported video file: {self.input_video_path}"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we extend this message. Support formats: .....

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good suggestion

Copy link
Copy Markdown
Contributor

@abhinavg4 abhinavg4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT comments

Signed-off-by: Ao Tang <aot@nvidia.com>
Comment on lines +266 to +267
msg = f"Not a supported video file: {self.input_video_path}. Supported formats: {', '.join(video_extensions)}"
raise FileNotFoundError(msg)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 FileNotFoundError is semantically wrong for a format mismatch

FileNotFoundError signals that the file doesn't exist, but here the file was found — it just has an unsupported extension. Any caller that catches FileNotFoundError to handle "path not found" cases (e.g., to create a path or retry) will silently swallow a format-mismatch error. ValueError (or a custom exception) is the appropriate type here.

Signed-off-by: Ao Tang <aot@nvidia.com>
Signed-off-by: Ao Tang <aot@nvidia.com>
@suiyoubi
Copy link
Copy Markdown
Contributor Author

/ok to test 98defc5

@suiyoubi
Copy link
Copy Markdown
Contributor Author

/ok to test 567bc1c

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

r1.2.0 Pick this label for auto cherry-picking into r1.2.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants