Support Unix shell-style wildcards in dataset filter configuration #1146

rapids-bot[bot] merged 3 commits into NVIDIA:develop from
Conversation
Walkthrough

Unix shell-style wildcard matching (fnmatch) was added to DatasetFilter via a new static method, `_match_wildcard_patterns`, used by both allowlist and denylist filtering.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Caller
    participant DatasetFilter.apply_filters
    participant DatasetFilter._match_wildcard_patterns
    participant fnmatch
    Caller->>DatasetFilter.apply_filters: apply_filters(df, allowlist, denylist)
    alt allowlist present
        DatasetFilter.apply_filters->>DatasetFilter._match_wildcard_patterns: (series, allowlist)
        DatasetFilter._match_wildcard_patterns->>fnmatch: fnmatch.fnmatch(str(value), pattern) per value
        fnmatch-->>DatasetFilter._match_wildcard_patterns: boolean masks
        DatasetFilter._match_wildcard_patterns-->>DatasetFilter.apply_filters: combined mask (any pattern)
        DatasetFilter.apply_filters->>DatasetFilter.apply_filters: df = df[mask] (keep matching rows)
    end
    alt denylist present
        DatasetFilter.apply_filters->>DatasetFilter._match_wildcard_patterns: (series, denylist)
        DatasetFilter._match_wildcard_patterns->>fnmatch: fnmatch.fnmatch(str(value), pattern) per value
        fnmatch-->>DatasetFilter._match_wildcard_patterns: boolean masks
        DatasetFilter._match_wildcard_patterns-->>DatasetFilter.apply_filters: combined mask (any pattern)
        DatasetFilter.apply_filters->>DatasetFilter.apply_filters: df = df[~mask] (remove matching rows)
    end
    DatasetFilter.apply_filters-->>Caller: filtered DataFrame
```
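The diagram's flow can be sketched dependency-free; plain lists stand in for the pandas Series/DataFrame, and the function bodies are illustrative approximations, not the toolkit's actual implementation.

```python
import fnmatch

def match_wildcard_patterns(values, patterns):
    # One boolean per value: True when the stringified value matches ANY pattern.
    return [any(fnmatch.fnmatch(str(v), p) for p in patterns) for v in values]

def apply_filters(rows, key, allowlist=None, denylist=None):
    if allowlist:  # keep only rows whose key matches an allowlist pattern
        mask = match_wildcard_patterns([r[key] for r in rows], allowlist)
        rows = [r for r, keep in zip(rows, mask) if keep]
    if denylist:  # then drop rows whose key matches a denylist pattern
        mask = match_wildcard_patterns([r[key] for r in rows], denylist)
        rows = [r for r, hit in zip(rows, mask) if not hit]
    return rows

rows = [{"id": "iproute2_1"}, {"id": "iproute2_62"}, {"id": "curl_1"}]
out = apply_filters(rows, "id", allowlist=["iproute2_*"], denylist=["iproute2_6[0-5]"])
print([r["id"] for r in out])  # ['iproute2_1']
```

`iproute2_62` survives the allowlist but is removed by the `iproute2_6[0-5]` denylist pattern, matching the PR's example semantics.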
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/nat/eval/dataset_handler/dataset_filter.py (1)
29-29: Consider adding more detail about supported wildcard patterns.

While the current documentation mentions wildcard support, consider expanding it to include examples of the supported patterns (`*`, `?`, `[seq]`, `[!seq]`) for better developer experience. Example addition to the docstring:

```diff
- - Supports Unix shell-style wildcards (*, ?, [seq], [!seq]) for string matching
+ - Supports Unix shell-style wildcards for string matching:
+   * `*` matches everything
+   * `?` matches any single character
+   * `[seq]` matches any character in seq
+   * `[!seq]` matches any character not in seq
```
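For reference, the four pattern forms listed above behave as follows with the standard library's `fnmatch` (this snippet is independent of the PR's code):

```python
import fnmatch

assert fnmatch.fnmatch("iproute2_abc", "iproute2_*")           # * matches everything
assert fnmatch.fnmatch("iproute2_7", "iproute2_?")             # ? matches one character
assert fnmatch.fnmatch("iproute2_63", "iproute2_6[0-5]")       # [seq] matches a char in seq
assert not fnmatch.fnmatch("iproute2_63", "iproute2_6[!0-5]")  # [!seq] matches a char NOT in seq
print("all four pattern forms behave as documented")
```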
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/nat/eval/dataset_handler/dataset_filter.py (4 hunks)
tests/nat/eval/dataset_handler/test_dataset_filter.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (8)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*.py: In code comments use the abbreviations: nat (API namespace/CLI), nvidia-nat (package), NAT (env var prefixes); never use these abbreviations in documentation
Follow PEP 20 and PEP 8 for Python style
Run yapf with column_limit=120; yapf is used for formatting (run second)
Indent with 4 spaces (no tabs) and end each file with a single trailing newline
Use ruff (ruff check --fix) as a linter (not formatter) per pyproject.toml; fix warnings unless explicitly ignored
Respect Python naming schemes: snake_case for functions/variables, PascalCase for classes, UPPER_CASE for constants
When re-raising exceptions, use bare raise to preserve stack trace; log with logger.error(), not logger.exception()
When catching and logging without re-raising, use logger.exception() to capture full stack trace
Provide Google-style docstrings for every public module, class, function, and CLI command
Docstring first line must be a concise description ending with a period
Validate and sanitize all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work (HTTP, DB, file I/O)
Cache expensive computations with functools.lru_cache or an external cache when appropriate
Leverage NumPy vectorized operations when beneficial and feasible
Files:
src/nat/eval/dataset_handler/dataset_filter.py
tests/nat/eval/dataset_handler/test_dataset_filter.py
{src/**/*.py,packages/*/src/**/*.py}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
{src/**/*.py,packages/*/src/**/*.py}: All importable Python code must live under src/ or packages/*/src/
All public APIs must have Python 3.11+ type hints on parameters and return values
Prefer typing/collections.abc abstractions (e.g., Sequence over list)
Use typing.Annotated for units or metadata when useful
Treat pyright warnings as errors during development
Files:
src/nat/eval/dataset_handler/dataset_filter.py
src/nat/**/*
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
src/nat/**/* contains core functionality; changes should prioritize backward compatibility
Files:
src/nat/eval/dataset_handler/dataset_filter.py
⚙️ CodeRabbit configuration file
This directory contains the core functionality of the toolkit. Changes should prioritize backward compatibility.
Files:
src/nat/eval/dataset_handler/dataset_filter.py
{**/*.py,**/*.sh,**/*.md,**/*.toml,**/*.y?(a)ml,**/*.json,**/*.txt,**/*.ini,**/*.cfg,**/*.ipynb}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
{**/*.py,**/*.sh,**/*.md,**/*.toml,**/*.y?(a)ml,**/*.json,**/*.txt,**/*.ini,**/*.cfg,**/*.ipynb}: Every file must start with the standard SPDX Apache-2.0 header
Confirm copyright years are up to date when a file is changed
All source files must include the SPDX Apache-2.0 header template (copy from an existing file)
Files:
src/nat/eval/dataset_handler/dataset_filter.py
tests/nat/eval/dataset_handler/test_dataset_filter.py
**/*
⚙️ CodeRabbit configuration file
**/*: # Code Review Instructions

- Ensure the code follows best practices and coding standards.
- For Python code, follow PEP 20 and PEP 8 for style guidelines.
- Check for security vulnerabilities and potential issues.
- Python methods should use type hints for all parameters and return values. Example: `def my_function(param1: int, param2: str) -> bool: pass`
- For Python exception handling, ensure proper stack trace preservation:
  - When re-raising exceptions: use bare `raise` statements to maintain the original stack trace, and use `logger.error()` (not `logger.exception()`) to avoid duplicate stack trace output.
  - When catching and logging exceptions without re-raising: always use `logger.exception()` to capture the full stack trace information.

# Documentation Review Instructions

- Verify that documentation and comments are clear and comprehensive.
- Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum".
- Verify that the documentation doesn't contain any offensive or outdated terms.
- Verify that documentation and comments are free of spelling mistakes; ensure the documentation doesn't contain any words listed in the `ci/vale/styles/config/vocabularies/nat/reject.txt` file. Words that might appear to be spelling mistakes but are listed in the `ci/vale/styles/config/vocabularies/nat/accept.txt` file are OK.

# Misc.

- All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0, and should contain an Apache License 2.0 header comment at the top of each file.
- Confirm that copyright years are up-to-date whenever a file is changed.
Files:
src/nat/eval/dataset_handler/dataset_filter.py
tests/nat/eval/dataset_handler/test_dataset_filter.py
{tests/**/*.py,examples/*/tests/**/*.py}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Unit tests live in tests/ or examples/*/tests and use markers defined in pyproject.toml (e.g., integration)
Files:
tests/nat/eval/dataset_handler/test_dataset_filter.py
{tests/**/*.py,examples/*/tests/**/*.py,packages/*/tests/**/*.py}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
{tests/**/*.py,examples/*/tests/**/*.py,packages/*/tests/**/*.py}: Use pytest with pytest-asyncio for async code
Test functions should be named with test_ prefix using snake_case
Extract frequently repeated test code into pytest fixtures
Pytest fixtures should define the name argument in @pytest.fixture and the function should be prefixed fixture_ using snake_case
Mock external services with pytest_httpserver or unittest.mock; avoid live endpoints
Mark slow tests with @pytest.mark.slow to skip in default test suite
Mark integration tests requiring external services with @pytest.mark.integration and follow nat-tests guidelines
Files:
tests/nat/eval/dataset_handler/test_dataset_filter.py
tests/**/*.py
⚙️ CodeRabbit configuration file
tests/**/*.py:
- Ensure that tests are comprehensive, cover edge cases, and validate the functionality of the code.
- Test functions should be named using the `test_` prefix, using snake_case.
- Any frequently repeated code should be extracted into pytest fixtures.
- Pytest fixtures should define the name argument when applying the pytest.fixture decorator. The fixture function being decorated should be named using the `fixture_` prefix, using snake_case. Example:

```python
@pytest.fixture(name="my_fixture")
def fixture_my_fixture():
    pass
```
Files:
tests/nat/eval/dataset_handler/test_dataset_filter.py
🧬 Code graph analysis (1)
tests/nat/eval/dataset_handler/test_dataset_filter.py (2)
src/nat/data_models/dataset_handler.py (2)
EvalFilterConfig (47-49), EvalFilterEntryConfig (42-44)

src/nat/eval/dataset_handler/dataset_filter.py (2)

DatasetFilter (23-83), apply_filters (65-83)
🪛 Ruff (0.14.3)
src/nat/eval/dataset_handler/dataset_filter.py
60-60: Function definition does not bind loop variable pattern_str
(B023)
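Ruff's B023 flags closures that bind a loop variable late. A generic illustration of the pitfall and the comprehension-based fix (the names here are illustrative, not the PR's actual code):

```python
import fnmatch

patterns = ["alpha_*", "beta_*"]

# B023 shape: each lambda captures `pattern` by reference, so every callable
# sees the loop's FINAL value ("beta_*") when invoked after the loop finishes.
checks = [lambda v: fnmatch.fnmatch(v, pattern) for pattern in patterns]
print([check("alpha_1") for check in checks])  # [False, False] - both test 'beta_*'

# Fix: evaluate fnmatch inside the comprehension so each pattern is consumed
# immediately and no deferred capture occurs.
mask = [any(fnmatch.fnmatch(v, p) for p in patterns) for v in ["alpha_1", "beta_2"]]
print(mask)  # [True, True]
```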
🔇 Additional comments (3)
src/nat/eval/dataset_handler/dataset_filter.py (2)
16-16: Good choice of library for wildcard matching.

Using `fnmatch` from the standard library is appropriate for implementing Unix shell-style wildcard patterns with no additional dependencies.

73-74: Wildcard matching integration looks correct.

The integration of wildcard pattern matching into both allowlist and denylist filtering is implemented correctly. The change maintains backward compatibility since exact values (without wildcards) still match as expected with `fnmatch`. Also applies to: 80-81
tests/nat/eval/dataset_handler/test_dataset_filter.py (1)
107-192: Excellent and comprehensive test coverage!

The test suite thoroughly validates the wildcard functionality with coverage for:

- Basic wildcard patterns (`*`) in both allowlist and denylist
- Single-character wildcards (`?`)
- Character sets (`[seq]`)
- Mixed wildcard and exact value patterns
- Numeric value handling (converted to strings)
- Backward compatibility with exact matches

All tests follow proper conventions with clear naming, descriptive docstrings, and specific assertions. This provides strong confidence in the implementation.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/nat/eval/dataset_handler/dataset_filter.py (1)
52-52: Consider documenting the string conversion behavior.

Converting all values to strings (line 52) enables wildcard matching but means that numeric values like `123` will be matched as `"123"`. While this behavior is necessary for wildcard patterns like `iproute2_6[0-5]` to work, it could be surprising for users filtering on numeric columns. Consider adding a note in the method's docstring or the class docstring that all values are converted to strings for pattern matching.
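The stringification caveat is easy to demonstrate directly with `fnmatch` (a standalone illustration, not the PR's code):

```python
import fnmatch

# Numeric values are matched against their decimal text form.
print(fnmatch.fnmatch(str(123), "12*"))   # True: '123' starts with '12'
print(fnmatch.fnmatch(str(123), "1?3"))   # True: '?' covers the middle digit
print(fnmatch.fnmatch(str(1230), "12?"))  # False: '?' is exactly one character
print(fnmatch.fnmatch(str(3.0), "3"))     # False: str(3.0) is '3.0', not '3'
```

The last case shows why a docstring note helps: a float column stringifies with a decimal point, so integer-looking patterns will not match it.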
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/nat/eval/dataset_handler/dataset_filter.py (4 hunks)
🔇 Additional comments (3)
src/nat/eval/dataset_handler/dataset_filter.py (3)
16-17: LGTM! Clean import of standard library module.

The `fnmatch` import is correctly positioned and necessary for Unix shell-style wildcard matching.

29-29: LGTM! Clear documentation of wildcard support.

The docstring update clearly documents the new Unix shell-style wildcard functionality.

39-64: Good resolution of the static analysis warning.

The list comprehension approach (lines 60-61) effectively addresses the B023 warning from the previous review by avoiding lambda capture of loop variables. The implementation is clean and the type hints and docstring follow the coding guidelines.
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/nat/eval/dataset_handler/dataset_filter.py (4 hunks)
🔇 Additional comments (4)
src/nat/eval/dataset_handler/dataset_filter.py (4)
16-17: LGTM: Appropriate import for wildcard matching.

The `fnmatch` import is correctly placed and is the standard library module for Unix shell-style pattern matching.

39-64: LGTM: Clean implementation that addresses previous static analysis warning.

The method correctly implements wildcard pattern matching using `fnmatch`. The list comprehension approach (lines 60-61) properly addresses the B023 static analysis warning from the previous review by avoiding lambda capture. Note: Converting values to strings (line 52) means numeric patterns are matched as strings (e.g., pattern "12*" matches numeric ID 123). This is appropriate for the intended use case with patterns like "iproute2_*".

74-75: LGTM: Allowlist filtering correctly updated for wildcard support.

The change from exact-match `isin()` to `_match_wildcard_patterns()` properly enables Unix shell-style wildcard matching for allowlist filters.

81-82: LGTM: Denylist filtering correctly updated for wildcard support.

The change from exact-match `isin()` to `_match_wildcard_patterns()` properly enables Unix shell-style wildcard matching for denylist filters. The negation operator `~` correctly excludes matching rows.
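The backward-compatibility claim rests on a pattern containing no wildcard metacharacters reducing to plain string equality under `fnmatch` (modulo OS case normalization); a quick check under that assumption:

```python
import fnmatch

ids = ["iproute2_1", "curl_1", "wget_2"]
exact = ["curl_1"]  # no *, ?, or [...] metacharacters

# Old behavior: membership test, as pandas isin() would compute per value.
isin_mask = [v in exact for v in ids]
# New behavior: fnmatch against the same literal strings.
fnmatch_mask = [any(fnmatch.fnmatch(v, p) for p in exact) for v in ids]

print(isin_mask == fnmatch_mask)  # True: exact values filter identically
```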
f97a47d to 49423e2 (Compare)
Signed-off-by: Patrick Chin <8509935+thepatrickchin@users.noreply.github.com>
Updated the pattern matching logic in the DatasetFilter class to utilize list comprehension instead of a lambda function for improved performance and readability. This change enhances the efficiency of checking patterns against the input series.

Signed-off-by: Patrick Chin <8509935+thepatrickchin@users.noreply.github.com>

- Also include types in docstring args and returns

Signed-off-by: Patrick Chin <8509935+thepatrickchin@users.noreply.github.com>
49423e2 to 71a0daf (Compare)
/merge

/ok to test 71a0daf
…VIDIA#1146)

This PR adds support for Unix shell-style wildcards `(*, ?, [seq], [!seq])` in dataset filter configurations. In the example below, dataset entries starting with `iproute2_` are included, except for those with a 3-digit suffix and those with suffixes from 60 to 65:

```yaml
filter:
  allowlist:
    field:
      id:
        - iproute2_*
  denylist:
    field:
      id:
        - iproute2_???
        - iproute2_6[0-5]
```

## By Submitting this PR I confirm:

- I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing.md).
- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
  - Any contribution which contains commits that are not Signed-Off will not be accepted.
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

## Summary by CodeRabbit

* **New Features**
  * Added Unix shell-style wildcard pattern matching to dataset filters (supports *, ?, and character classes); numeric values are treated as strings for pattern matching.
* **Tests**
  * Added comprehensive tests for allowlist, denylist, mixed patterns, single-character and character-class matching, numeric-string handling, and performance of exact-match paths.
* **Documentation**
  * Updated filter guidance to note wildcard support.

Authors:
- Patrick Chin (https://github.com/thepatrickchin)

Approvers:
- Will Killian (https://github.com/willkill07)

URL: NVIDIA#1146

Signed-off-by: Sangharsh Aglave <aglave@synopsys.com>
Description
This PR adds support for Unix shell-style wildcards `(*, ?, [seq], [!seq])` in dataset filter configurations. In the example configuration, dataset entries starting with `iproute2_` are included, except for those with a 3-digit suffix and those with suffixes from 60 to 65.

By Submitting this PR I confirm:
Summary by CodeRabbit