Fix migration of assets with duplicate filenames and improve model logging #19
The copy phase used a dict keyed by fileName to store asset metadata, causing duplicate-named assets to be silently dropped. The download phase also saved duplicate-named non-remote assets to the same file path, overwriting earlier ones. Change assets_metadata from a dict to a list so all entries are preserved, and assign unique diskFileName values during download for non-remote assets with colliding names.
Updates code to use log_model/log_remote_model to log model assets, rather than the generic log_asset/log_remote_asset. This ensures that model elements roll up into the right parent model, and that the models can be registered to the registry if desired.
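As a minimal sketch (a hypothetical simplification, not the actual copy-phase code), the dict→list change looks like this:

```python
# Sketch of the data-loss bug and fix. Keying by fileName means a later
# asset with the same name silently replaces an earlier one; a list
# preserves every entry.

def collect_assets_metadata(assets):
    # Before (buggy):
    #   assets_metadata = {a["fileName"]: a for a in assets}
    # After: all entries survive, duplicates included.
    assets_metadata = []
    for asset in assets:
        assets_metadata.append(asset)
    return assets_metadata
```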
dsblank
left a comment
Review
The core fix (dict → list for assets_metadata) is solid and addresses a real data-loss bug. The _log_asset refactor and switch to log_model/log_remote_model are good improvements. A few issues to flag:
Bug: filename_counts initialized inside the metadata-write block but used outside it
In download_manager.py, filename_counts = {} is initialized inside the if self._should_write(filepath): block. However, the download loop that follows (which reads asset.get("diskFileName", ...)) runs regardless of that condition. If _should_write returns False (i.e., metadata already exists), filename_counts is never populated, and the diskFileName keys were never written to the metadata file either. This means on a re-download where metadata already exists, the code reads the old metadata file (which lacks diskFileName fields) and the in-memory assets list never got diskFileName set, so it falls back to fileName — duplicates would still overwrite each other on disk. Consider initializing filename_counts and assigning diskFileName unconditionally, before both the metadata-write and the download loop.
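A minimal sketch of the suggested restructuring (names are illustrative, not the actual download_manager.py code): de-duplicate before any conditional write, so both the metadata file and the download loop see the same diskFileName values.

```python
# Assign unique diskFileName values unconditionally, BEFORE the
# metadata-write check and the download loop, so re-downloads (where
# metadata already exists on disk) still get de-duplicated paths.

def assign_disk_filenames(assets):
    filename_counts = {}
    for asset in assets:
        name = asset["fileName"]
        count = filename_counts.get(name, 0)
        if count == 0:
            asset["diskFileName"] = name
        else:
            # "model (1)" style, matching the naming used in this PR
            asset["diskFileName"] = "%s (%d)" % (name, count)
        filename_counts[name] = count + 1
    return assets
```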
Potential issue: log_model drops step and epoch
The old code passed step, copy_to_tmp, and grouping_name to _log_asset. The new log_model() call drops step and epoch. If step/epoch metadata matters for model elements, this is a regression. Worth confirming whether log_model supports step/epoch — if it does, they should be passed through.
Minor: log_remote_model doesn't update asset_map
For remote model-element assets, the result of log_remote_model() is not captured, so asset_map won't contain a mapping for the old asset ID. If downstream datagrid/embedding rewriting depends on model asset IDs being in the map, this could be an issue.
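If the mapping does matter downstream, one illustrative pattern is a small wrapper that always records the result (the helper and the return shape here are assumed for illustration, not taken from the comet_ml API):

```python
# Hypothetical helper: call any log function and record the old -> new
# asset ID mapping so datagrid/embedding rewriting stays complete.

def log_and_map(asset_map, old_asset_id, log_fn, *args, **kwargs):
    result = log_fn(*args, **kwargs)
    if isinstance(result, dict) and "assetId" in result:
        asset_map[old_asset_id] = result["assetId"]
    return result
```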
Minor: metadata handling inconsistency for remote assets
For remote model-element assets, metadata is parsed with json.loads(). For remote non-model assets, metadata is passed raw (as a string) to log_remote_asset(). The inconsistency is worth checking — does log_remote_asset expect a string or a dict for its metadata parameter?
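One defensive option, sketched with a hypothetical helper, is to normalize before either call so both code paths pass the same type:

```python
import json

# Hypothetical helper: accept metadata as either a JSON string or an
# already-parsed dict and return a dict in both cases.

def normalize_metadata(metadata):
    if isinstance(metadata, str):
        return json.loads(metadata)
    return metadata
```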
Nit: spaces in diskFileName
The diskFileName uses "model (1)" style naming with spaces and parens. While functional, filenames with spaces can be annoying in CLI tools. Consider model_1 or model-1 instead.
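A sketch of the underscore-based alternative (hypothetical helper; it also preserves the file extension so "model.pkl" becomes "model_1.pkl" rather than "model.pkl_1"):

```python
import os

def make_disk_filename(name, count):
    # count 0 keeps the original name; later duplicates get a
    # shell-friendly "_N" suffix before the extension.
    if count == 0:
        return name
    stem, ext = os.path.splitext(name)
    return "%s_%d%s" % (stem, count, ext)
```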
Overall the fix is well-motivated and the approach is correct. The main actionable item is the filename_counts scoping issue — it should be moved outside the _should_write block so re-downloads also get unique disk filenames.
dsblank
left a comment
Inline comments on specific lines below.
Addressing Doug's comments in the review
Summary
Files changed
cometx/cli/copy.py — metadata loading, log_assets, and _log_asset refactored
cometx/framework/comet/download_manager.py — duplicate filename tracking during download
Tests Run:
[ ] Create an experiment with duplicate asset names (e.g., two log_remote_model calls with the same model_name, two log_remote_asset calls with the same remote_file_name, and two log_asset calls with the same file)
exp_duplicate_assets.py
[ ] Download the experiment with cometx download
[ ] Copy the experiment with cometx copy to a new project
[ ] Verify all assets (including duplicates) appear in the destination experiment
[ ] Verify remote models appear under the correct model folder (not as generic assets)
[ ] Verify non-duplicate assets also appear properly