
Conversation

@mccle commented Nov 12, 2025

Fixes #8599.

Description

PersistentDataset currently casts all MetaTensor objects to torch.Tensor objects and forces the use of torch.load with weights_only=True. This makes it impossible to save or load metadata to cached files, which may be necessary for accurate post-transform operations.

To address this, the PR introduces track_meta and weights_only arguments directly on PersistentDataset. They are passed internally to convert_to_tensor and torch.load, respectively. A ValueError is raised when track_meta=True and weights_only=True, since MetaTensor objects cannot be loaded with weights_only=True and the cached files would be continually deleted and rewritten.

These changes restore the ability to cache MetaTensor objects by allowing explicit control over data casting and torch.load behavior. The default values of track_meta=False and weights_only=True will preserve the current behavior of PersistentDataset.
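As a usage illustration (the file names and transform pipeline below are hypothetical, not taken from the PR), the new arguments would be used along these lines:

    from monai.data import PersistentDataset
    from monai.transforms import Compose, EnsureChannelFirstd, LoadImaged

    # Hypothetical data and pipeline, for illustration only.
    data = [{"image": "subject_0.nii.gz"}, {"image": "subject_1.nii.gz"}]
    transform = Compose([LoadImaged(keys="image"), EnsureChannelFirstd(keys="image")])

    # Cache MetaTensor objects with their metadata. track_meta=True requires
    # weights_only=False, which is only appropriate when the cache directory
    # is trusted.
    dataset = PersistentDataset(
        data=data,
        transform=transform,
        cache_dir="./persistent_cache",
        track_meta=True,
        weights_only=False,
    )

    image = dataset[0]["image"]  # expected to be a MetaTensor, metadata intact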

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

Signed-off-by: Mason Cleveland <mccleve@umich.edu>
@coderabbitai bot (Contributor) commented Nov 12, 2025

Walkthrough

Added two public constructor options to PersistentDataset: track_meta: bool and weights_only: bool. The constructor raises ValueError if both are True. The options are stored as instance attributes and propagated to cache I/O: torch.load(..., weights_only=self.weights_only) when reading, and convert_to_tensor(..., track_meta=self.track_meta) applied to the item before torch.save when writing. Tests were extended to cover combinations of track_meta and weights_only, asserting error conditions or that returned items are MetaTensor when expected.
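In outline, the validation and propagation described above amount to the following sketch (a paraphrase of the PR's summary, not the exact MONAI source):

    # Paraphrase of the constructor logic summarized above; not the exact
    # MONAI source. Cache read/write calls are shown as comments.
    class _PersistentDatasetSketch:
        def __init__(self, track_meta: bool = False, weights_only: bool = True) -> None:
            if track_meta and weights_only:
                raise ValueError(
                    "track_meta=True is incompatible with weights_only=True: "
                    "MetaTensor caches cannot be loaded with "
                    "torch.load(..., weights_only=True)."
                )
            self.track_meta = track_meta
            self.weights_only = weights_only

    # Reading the cache:  item = torch.load(hashfile, weights_only=self.weights_only)
    # Writing the cache:  convert_to_tensor(item, track_meta=self.track_meta), then torch.save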

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Check constructor signature, validation, and assignment of track_meta and weights_only in monai/data/dataset.py
  • Verify _cachecheck uses self.weights_only for torch.load and passes track_meta=self.track_meta on save
  • Review new parameterized tests in tests/data/test_persistentdataset.py for correct setup and assertions (including MetaTensor)

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
  • Title check: ✅ Passed. Title clearly and specifically references the issue number and the main feature addition (track_meta and weights_only arguments for MetaTensor support).
  • Description check: ✅ Passed. Description covers all required template sections: issue reference, detailed explanation of changes and rationale, types-of-changes checkboxes properly marked, and validation of testing and documentation updates.
  • Linked Issues check: ✅ Passed. PR fully addresses #8599 objectives: restores metadata caching via track_meta/weights_only arguments, prevents continuous cache deletion, maintains backward-compatible defaults, and adds comprehensive test coverage.
  • Out of Scope Changes check: ✅ Passed. All changes are directly scoped to the linked issue: modifications to PersistentDataset parameters, validation logic, torch.load/torch.save behavior, and corresponding test coverage. No unrelated changes detected.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 85.71%, which is sufficient; the required threshold is 80.00%.


@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
tests/data/test_persistentdataset.py (1)

179-211: Consider validating metadata preservation.

The test correctly validates the type of the returned object, but doesn't verify that metadata is actually preserved when track_meta=True. Consider adding an assertion to check that the MetaTensor contains expected metadata (e.g., affine, filename).

Example enhancement:

             im = test_dataset[0]["image"]
             self.assertIsInstance(im, expected_type)
+            if track_meta and isinstance(im, MetaTensor):
+                self.assertIsNotNone(im.meta.get("filename_or_obj"))
monai/data/dataset.py (1)

446-503: Consider adding support for track_meta and weights_only in CacheNTransDataset.

CacheNTransDataset inherits _cachecheck from PersistentDataset, which uses torch.save/torch.load. Users may want to cache MetaTensors with this dataset type as well.

Add the parameters to the constructor:

 def __init__(
     self,
     data: Sequence,
     transform: Sequence[Callable] | Callable,
     cache_n_trans: int,
     cache_dir: Path | str | None,
     hash_func: Callable[..., bytes] = pickle_hashing,
     pickle_module: str = "pickle",
     pickle_protocol: int = DEFAULT_PROTOCOL,
     hash_transform: Callable[..., bytes] | None = None,
     reset_ops_id: bool = True,
+    track_meta: bool = False,
+    weights_only: bool = True,
 ) -> None:

Then pass them to super:

 super().__init__(
     data=data,
     transform=transform,
     cache_dir=cache_dir,
     hash_func=hash_func,
     pickle_module=pickle_module,
     pickle_protocol=pickle_protocol,
     hash_transform=hash_transform,
     reset_ops_id=reset_ops_id,
+    track_meta=track_meta,
+    weights_only=weights_only,
 )
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between a8a7e57 and 341538d.

📒 Files selected for processing (2)
  • monai/data/dataset.py (5 hunks)
  • tests/data/test_persistentdataset.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

Review the Python code for quality and correctness. Ensure variable names adhere to PEP8 style guides and are sensible and informative with regard to their function, though simple names are permitted for loop and comprehension variables. Ensure routine names are meaningful with regard to their function and use verbs, adjectives, and nouns in a semantically appropriate way. Docstrings should be present for all definitions and should describe each variable, return value, and raised exception in the appropriate section of the Google style of docstrings. Examine code for logical errors or inconsistencies, and suggest what may be changed to address these. Suggest any enhancements that improve efficiency, maintainability, comprehensibility, and correctness. Ensure new or modified definitions will be covered by existing or new unit tests.

Files:

  • tests/data/test_persistentdataset.py
  • monai/data/dataset.py
🪛 Ruff (0.14.4)
monai/data/dataset.py

295-298: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
  • GitHub Check: packaging
  • GitHub Check: quick-py3 (macOS-latest)
  • GitHub Check: build-docs
  • GitHub Check: flake8-py3 (codeformat)
  • GitHub Check: quick-py3 (ubuntu-latest)
  • GitHub Check: quick-py3 (windows-latest)
  • GitHub Check: flake8-py3 (mypy)
  • GitHub Check: flake8-py3 (pytype)
  • GitHub Check: min-dep-os (windows-latest)
  • GitHub Check: min-dep-os (macOS-latest)
  • GitHub Check: min-dep-py3 (3.10)
  • GitHub Check: min-dep-os (ubuntu-latest)
  • GitHub Check: min-dep-py3 (3.9)
  • GitHub Check: min-dep-pytorch (2.8.0)
  • GitHub Check: min-dep-py3 (3.12)
  • GitHub Check: min-dep-py3 (3.11)
  • GitHub Check: min-dep-pytorch (2.5.1)
  • GitHub Check: min-dep-pytorch (2.6.0)
  • GitHub Check: min-dep-pytorch (2.7.1)
🔇 Additional comments (7)
tests/data/test_persistentdataset.py (2)

23-23: LGTM!

MetaTensor import is necessary for type assertions in the new test cases.


46-52: LGTM!

Test cases comprehensively cover all combinations of track_meta and weights_only flags, including the invalid combination that should raise ValueError.

monai/data/dataset.py (5)

233-234: LGTM!

New parameters have appropriate defaults that preserve backward compatibility.


269-278: LGTM!

Documentation clearly explains the new parameters and their interaction.


294-300: Validation logic is correct.

The check prevents the invalid combination that would cause cache thrashing. Error message is clear.

Minor: Static analysis suggests defining exception messages as constants or within exception classes, but this is a style preference and can be deferred.
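For reference, the TRY003-friendly pattern this alludes to looks roughly like the following (the exception class name is hypothetical, illustrative only, and not part of the PR):

    # Keeping the long message inside a dedicated exception class satisfies
    # Ruff's TRY003 by leaving the raise site message-free.
    class IncompatibleCacheOptionsError(ValueError):
        def __init__(self) -> None:
            super().__init__(
                "track_meta=True is incompatible with weights_only=True; "
                "set weights_only=False to cache MetaTensor objects."
            )

    # At the call site:
    #     if track_meta and weights_only:
    #         raise IncompatibleCacheOptionsError()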


398-398: LGTM!

Correctly propagates weights_only to torch.load.


419-419: LGTM!

Correctly propagates track_meta to convert_to_tensor when writing cache.

@ericspod (Member) left a comment

Hi @mccle, thanks for this fix. We had changed things in the past in response to a security concern, so it's important to discuss the implications of loading more than just weights. Since torch.load still retains this functionality, it should be enough to do as they do and make weights_only True by default, as you have. We are investigating alternative storage representations that would be safe as well.
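For background, the behavior under discussion can be seen with plain torch (a minimal sketch; a trusted in-memory buffer stands in for a cache file):

    import io

    import torch

    buffer = io.BytesIO()
    torch.save(torch.ones(3), buffer)
    buffer.seek(0)

    # weights_only=True restricts unpickling to an allowlist of tensor-related
    # types, so an untrusted file cannot execute arbitrary code during load;
    # richer objects (e.g. MetaTensor with metadata) need weights_only=False.
    tensor = torch.load(buffer, weights_only=True)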

ericspod and others added 4 commits November 14, 2025 20:08
Co-authored-by: Eric Kerfoot <17726042+ericspod@users.noreply.github.com>
Signed-off-by: Mason C. Cleveland <104479423+mccle@users.noreply.github.com>
Signed-off-by: mccle <mccleve@umich.edu>
@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/data/test_persistentdataset.py (1)

178-201: Test logic is correct; consider verifying cache reload.

The test correctly implements the suggested contextlib pattern and validates parameter combinations. Consider adding a second PersistentDataset instantiation (after line 198) with the same cache_dir to verify that cached items reload correctly with the same settings, especially for track_meta=True.

Example enhancement:

                # Verify cache reload works with same settings
                test_dataset_reload = PersistentDataset(
                    data=test_data,
                    transform=transform,
                    cache_dir=cache_dir,
                    track_meta=track_meta,
                    weights_only=weights_only,
                )
                im_reload = test_dataset_reload[0]["image"]
                self.assertIsInstance(im_reload, expected_type)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 0bacb8b and facc345.

📒 Files selected for processing (1)
  • tests/data/test_persistentdataset.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file (same Python review instructions as quoted in the first review above)

Files:

  • tests/data/test_persistentdataset.py
🔇 Additional comments (2)
tests/data/test_persistentdataset.py (2)

14-14: LGTM on imports.

Both additions are necessary: contextlib for the nullcontext pattern and MetaTensor for type assertions in the new test.

Also applies to: 24-24


47-53: Test cases cover all critical combinations.

The four test cases correctly validate: MetaTensor with track_meta=True, ValueError when both flags are True, and torch.Tensor for default and non-tracking modes.

@mccle (Author) commented Nov 14, 2025

Hello @ericspod, thank you for your quick response! I completely understand your concern about the potential security implications of using torch.load with weights_only=False, so I agree that your warning in the documentation will be helpful for users. I have also verified that your modification to the new test runs successfully with the addition of import contextlib, so I have committed both of those changes as well.
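The contextlib pattern referenced here typically looks like the following sketch (names are illustrative, not the actual test code; a stand-in function mirrors the PR's validation rather than constructing a real PersistentDataset):

    import contextlib
    import unittest


    def make_dataset(track_meta: bool, weights_only: bool) -> None:
        # Stand-in for PersistentDataset construction, mirroring the PR's check.
        if track_meta and weights_only:
            raise ValueError("track_meta=True is incompatible with weights_only=True")


    class TestFlagCombinations(unittest.TestCase):
        def test_combinations(self) -> None:
            cases = [(True, False, None), (True, True, ValueError), (False, True, None)]
            for track_meta, weights_only, expected_error in cases:
                # nullcontext lets a single loop body handle both the raising
                # and the non-raising parameterizations.
                ctx = self.assertRaises(expected_error) if expected_error else contextlib.nullcontext()
                with ctx:
                    make_dataset(track_meta, weights_only)


    if __name__ == "__main__":
        unittest.main()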


Development

Successfully merging this pull request may close these issues.

PersistentDataset not usable anymore (v1.5.1)?
