Summary
ref datasets ingest --source-type cmip6 raises NotImplementedError("Removing files is not yet supported") from register_dataset (climate_ref/datasets/base.py:335) whenever any dataset in the input tree has fewer files on disk than the DB has tracked for it. This aborts the whole ingest run; no progress on the rest of the catalog is committed.
Reproduction
- Ingest a CMIP6 tree once: ref datasets ingest --source-type cmip6 /data/cmip6 (succeeds, registers N files for some dataset).
- Remove or rename one of the .nc files belonging to that dataset on disk.
- Re-run the same command.
Expected: the dataset is reconciled to match disk (file dropped from the registry), or at least the run continues for the rest of the catalog and the failing dataset is reported and skipped.
Actual: the run aborts on the first dataset where files_to_remove is non-empty.
Fixes
I'm not sure what the right behaviour is here. Does removing a file dirty any execution groups that use this dataset?
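
As a starting point for discussion, here is a minimal sketch of the "report and skip" fallback from the Expected behaviour above. It is not the climate_ref implementation: the register_dataset below is a stand-in that only mimics the current raise, and IngestReport, diff_files, ingest_all and the dict-based "db" are hypothetical names invented for illustration. It deliberately leaves the execution-group question open.

```python
from dataclasses import dataclass, field


@dataclass
class IngestReport:
    """Outcome of one ingest run (hypothetical, not part of climate_ref)."""

    registered: list[str] = field(default_factory=list)
    skipped: dict[str, str] = field(default_factory=dict)


def diff_files(tracked: set[str], on_disk: set[str]) -> tuple[set[str], set[str]]:
    """Return (files_to_add, files_to_remove) for a single dataset."""
    return on_disk - tracked, tracked - on_disk


def register_dataset(tracked: set[str], on_disk: set[str]) -> set[str]:
    """Stand-in for the real register_dataset: raises on removals, as it does today."""
    files_to_add, files_to_remove = diff_files(tracked, on_disk)
    if files_to_remove:
        raise NotImplementedError("Removing files is not yet supported")
    return tracked | files_to_add


def ingest_all(db: dict[str, set[str]], disk: dict[str, set[str]]) -> IngestReport:
    """Register every dataset, skipping (and reporting) the ones that fail."""
    report = IngestReport()
    for slug, on_disk in disk.items():
        try:
            # Assumption: each dataset is committed independently of the others.
            db[slug] = register_dataset(db.get(slug, set()), on_disk)
            report.registered.append(slug)
        except NotImplementedError as exc:
            # Leave the tracked state untouched and carry on with the rest.
            report.skipped[slug] = str(exc)
    return report
```

With this shape, a dataset that lost a file ends up in report.skipped with the error message, while the rest of the catalog is still registered. Whether the real fix should instead drop the file from the registry depends on the answer to the question above.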