Fix migration of assets with duplicate filenames and improve model logging #19
The copy phase used a dict keyed by fileName to store asset metadata, causing duplicate-named assets to be silently dropped. The download phase also saved duplicate-named non-remote assets to the same file path, overwriting earlier ones. Change assets_metadata from a dict to a list so all entries are preserved, and assign unique diskFileName values during download for non-remote assets with colliding names.
Updates code to use log_model/log_remote_model to log model assets, rather than the generic log_asset/log_remote_asset. This ensures that model elements roll up into the right parent model, and that the models can be registered to the registry if desired.
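As a minimal sketch (a hypothetical simplification, not the actual copy-phase code), the dict→list change looks like this:

```python
# Sketch of the data-loss bug and fix. Keying by fileName means a later
# asset with the same name silently replaces an earlier one; a list
# preserves every entry.

def collect_assets_metadata(assets):
    # Before (buggy):
    #   assets_metadata = {a["fileName"]: a for a in assets}
    # After: all entries survive, duplicates included.
    assets_metadata = []
    for asset in assets:
        assets_metadata.append(asset)
    return assets_metadata
```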
dsblank
left a comment
Review
The core fix (dict → list for assets_metadata) is solid and addresses a real data-loss bug. The _log_asset refactor and switch to log_model/log_remote_model are good improvements. A few issues to flag:
Bug: filename_counts initialized inside the metadata-write block but used outside it
In download_manager.py, filename_counts = {} is initialized inside the if self._should_write(filepath): block. However, the download loop that follows (which reads asset.get("diskFileName", ...)) runs regardless of that condition. If _should_write returns False (i.e., metadata already exists), filename_counts is never populated, and the diskFileName keys were never written to the metadata file either. This means on a re-download where metadata already exists, the code reads the old metadata file (which lacks diskFileName fields) and the in-memory assets list never got diskFileName set, so it falls back to fileName — duplicates would still overwrite each other on disk. Consider initializing filename_counts and assigning diskFileName unconditionally, before both the metadata-write and the download loop.
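A minimal sketch of the suggested restructuring (names are illustrative, not the actual download_manager.py code): de-duplicate before any conditional write, so both the metadata file and the download loop see the same diskFileName values.

```python
# Assign unique diskFileName values unconditionally, BEFORE the
# metadata-write check and the download loop, so re-downloads (where
# metadata already exists on disk) still get de-duplicated paths.

def assign_disk_filenames(assets):
    filename_counts = {}
    for asset in assets:
        name = asset["fileName"]
        count = filename_counts.get(name, 0)
        if count == 0:
            asset["diskFileName"] = name
        else:
            # "model (1)" style, matching the naming used in this PR
            asset["diskFileName"] = "%s (%d)" % (name, count)
        filename_counts[name] = count + 1
    return assets
```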
Potential issue: log_model drops step and epoch
The old code passed step, copy_to_tmp, and grouping_name to _log_asset. The new log_model() call drops step and epoch. If step/epoch metadata matters for model elements, this is a regression. Worth confirming whether log_model supports step/epoch — if it does, they should be passed through.
Minor: log_remote_model doesn't update asset_map
For remote model-element assets, the result of log_remote_model() is not captured, so asset_map won't contain a mapping for the old asset ID. If downstream datagrid/embedding rewriting depends on model asset IDs being in the map, this could be an issue.
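If the mapping does matter downstream, one illustrative pattern is a small wrapper that always records the result (the helper and the return shape here are assumed for illustration, not taken from the comet_ml API):

```python
# Hypothetical helper: call any log function and record the old -> new
# asset ID mapping so datagrid/embedding rewriting stays complete.

def log_and_map(asset_map, old_asset_id, log_fn, *args, **kwargs):
    result = log_fn(*args, **kwargs)
    if isinstance(result, dict) and "assetId" in result:
        asset_map[old_asset_id] = result["assetId"]
    return result
```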
Minor: metadata handling inconsistency for remote assets
For remote model-element assets, metadata is parsed with json.loads(). For remote non-model assets, metadata is passed raw (as a string) to log_remote_asset(). The inconsistency is worth checking — does log_remote_asset expect a string or a dict for its metadata parameter?
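One defensive option, sketched with a hypothetical helper, is to normalize before either call so both code paths pass the same type:

```python
import json

# Hypothetical helper: accept metadata as either a JSON string or an
# already-parsed dict and return a dict in both cases.

def normalize_metadata(metadata):
    if isinstance(metadata, str):
        return json.loads(metadata)
    return metadata
```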
Nit: spaces in diskFileName
The diskFileName uses "model (1)" style naming with spaces and parens. While functional, filenames with spaces can be annoying in CLI tools. Consider model_1 or model-1 instead.
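A sketch of the underscore-based alternative (hypothetical helper; it also preserves the file extension so "model.pkl" becomes "model_1.pkl" rather than "model.pkl_1"):

```python
import os

def make_disk_filename(name, count):
    # count 0 keeps the original name; later duplicates get a
    # shell-friendly "_N" suffix before the extension.
    if count == 0:
        return name
    stem, ext = os.path.splitext(name)
    return "%s_%d%s" % (stem, count, ext)
```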
Overall the fix is well-motivated and the approach is correct. The main actionable item is the filename_counts scoping issue — it should be moved outside the _should_write block so re-downloads also get unique disk filenames.
dsblank
left a comment
Inline comments on specific lines below.
Addressing Doug's comments in the review
Summary
Files changed
cometx/cli/copy.py — metadata loading, log_assets, and _log_asset refactored
cometx/framework/comet/download_manager.py — duplicate filename tracking during download
Tests Run:
[ ] Create an experiment with duplicate asset names (e.g., two log_remote_model calls with the same model_name, two log_remote_asset calls with the same remote_file_name, and two log_asset calls with the same file)
exp_duplicate_assets.py
[ ] Download the experiment with cometx download
[ ] Copy the experiment with cometx copy to a new project
[ ] Verify all assets (including duplicates) appear in the destination experiment
[ ] Verify remote models appear under the correct model folder (not as generic assets)
[ ] Verify non-duplicate assets also appear properly