ingest: NotImplementedError when files are removed from a dataset directory #676

@lewisjared

Summary

ref datasets ingest --source-type cmip6 raises NotImplementedError("Removing files is not yet supported") from register_dataset (climate_ref/datasets/base.py:335) whenever the DB tracks a file for a dataset in the input tree that is no longer present on disk (i.e. files_to_remove is non-empty). This aborts the whole ingest run; no progress on the rest of the catalog is committed.

Reproduction

  1. Ingest a CMIP6 tree once: ref datasets ingest --source-type cmip6 /data/cmip6 (succeeds, registers N files for some dataset).
  2. Remove or rename one of the .nc files belonging to that dataset on disk.
  3. Re-run the same command.

Expected: the dataset is reconciled to match disk (file dropped from the registry), or at least the run continues for the rest of the catalog and the failing dataset is reported and skipped.

Actual: the run aborts on the first dataset where files_to_remove is non-empty.
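
To make the two expected outcomes concrete, here is a minimal sketch in plain Python. It is not the climate_ref implementation: the registry and the disk scan are modelled as plain dicts mapping a dataset slug to a set of file paths, and IngestReport, files_to_remove() and ingest() are hypothetical names chosen for illustration. With allow_removal=True the dataset is reconciled to match disk; with allow_removal=False the affected dataset is reported and skipped while the rest of the catalog is still processed.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IngestReport:
    """Outcome of one ingest run (hypothetical, not part of climate_ref)."""

    reconciled: dict[str, list[str]] = field(default_factory=dict)  # slug -> paths dropped
    skipped: dict[str, str] = field(default_factory=dict)  # slug -> reason


def files_to_remove(tracked: set[str], on_disk: set[str]) -> set[str]:
    """Paths the registry still tracks that are no longer on disk."""
    return tracked - on_disk


def ingest(
    registry: dict[str, set[str]],
    disk: dict[str, set[str]],
    *,
    allow_removal: bool,
) -> IngestReport:
    """Update `registry` in place from `disk`, one dataset at a time."""
    report = IngestReport()
    for slug, on_disk in disk.items():
        stale = files_to_remove(registry.get(slug, set()), on_disk)
        if stale and not allow_removal:
            # Report and skip this dataset instead of aborting the whole run.
            report.skipped[slug] = f"{len(stale)} tracked file(s) missing on disk"
            continue
        # Reconcile: the registry entry becomes exactly what is on disk.
        registry[slug] = set(on_disk)
        if stale:
            report.reconciled[slug] = sorted(stale)
    return report


if __name__ == "__main__":
    registry = {"ds-a": {"a1.nc", "a2.nc"}, "ds-b": {"b1.nc"}}
    disk = {"ds-a": {"a1.nc"}, "ds-b": {"b1.nc", "b2.nc"}}
    print(ingest(registry, disk, allow_removal=False))
```

In this sketch a rename shows up as one stale path plus one new path, so it triggers the same branch as a deletion. Whether dropping a file should also invalidate downstream results is left open here, as it is in the issue.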

Fixes

I'm not actually sure what the right behaviour is here. Does removing a file dirty any execution groups that use this dataset?
