fix: use data_dir for directory paths in ShardedDataset #1301
yeyu-nvidia merged 1 commit into main
|
No actionable comments were generated in the recent review. 🎉
Review info: configuration path .coderabbit.yaml; review profile CHILL; plan Pro Plus; files selected for processing: 1
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
Pre-merge checks failed (1 error, 1 warning); 2 checks passed. Please resolve all errors before merging. Addressing warnings is optional.
datasets' resolve_pattern only matches entries with type=="file", so
passing a bare directory path as data_files results in FileNotFoundError
even when the directory exists on disk. Detect directory paths and pass
them via data_dir instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ye Yu <yeyu@nvidia.com>
cc6f899 to 45e866e
|
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1301      +/-   ##
==========================================
+ Coverage   75.39%   75.69%   +0.30%
==========================================
  Files         462      462
  Lines       49955    49960       +5
==========================================
+ Hits        37662    37817     +155
+ Misses      12293    12143     -150
|
Summary
- datasets' resolve_pattern only matches entries with type=="file", so passing a bare directory path as data_files to load_dataset results in FileNotFoundError even when the directory exists on disk
- Detect directory paths in ShardedDataset._load_dataset() and pass them via data_dir instead of data_files

Reproduction
Test plan
- […] data_files)
- data_files=None (no data_files arg) still works

🤖 Generated with Claude Code
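The directory check and the cases in the test plan could be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the code merged in this PR: `build_load_kwargs` and `demo_cases` are hypothetical names, only string paths are handled, and the actual call to `load_dataset` is left out so the sketch runs without the datasets package installed. The directory and file cases are assumed, since only the data_files=None item of the test plan is fully legible.

```python
import os
import tempfile
from typing import Optional


def build_load_kwargs(data_files: Optional[str]) -> dict:
    """Pick the keyword argument to forward to a load_dataset-style call.

    Hypothetical helper: a directory goes to data_dir, anything else stays
    in data_files, and None adds no argument at all.
    """
    if data_files is None:
        return {}
    if isinstance(data_files, str) and os.path.isdir(data_files):
        # A bare directory is not matched as a file pattern, so route it
        # through data_dir instead.
        return {"data_dir": data_files}
    return {"data_files": data_files}


def demo_cases() -> dict:
    """Report which keyword each assumed test-plan case would use."""
    with tempfile.TemporaryDirectory() as tmp:
        shard = os.path.join(tmp, "shard-00000.jsonl")
        with open(shard, "w", encoding="utf-8") as handle:
            handle.write("{}\n")
        return {
            "directory": sorted(build_load_kwargs(tmp)),
            "file": sorted(build_load_kwargs(shard)),
            "none": sorted(build_load_kwargs(None)),
        }
```

In a real caller the result would presumably be splatted into the loader, for example `load_dataset("json", **build_load_kwargs(path))`; the `"json"` builder name here is only an assumption for illustration.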
Summary by CodeRabbit
Bug Fixes