Mixed cache directories when ingesting PMP climatology #596

@minxu74

Run the following:

export REF_DATASET_CACHE_DIR="/pscratch/sd/m/minxu/mytest_ref_2026-03-14/cache"
export REF_CONFIGURATION="/pscratch/sd/m/minxu/mytest_ref_2026-03-14/config"
export REF_INSTALLATION_DIR="/pscratch/sd/m/minxu/mytest_ref_2026-03-14/climate-ref"

ref datasets ingest --source-type pmp-climatology "${REF_DATASET_CACHE_DIR}/datasets/pmp-climatology"

This produces the following error:

2026-03-14 14:40:04.493 -07:00 | WARNING  | climate_ref.datasets.base - Files to remove: ['/global/homes/m/minxu/.cache/climate_ref/PMP_obs4MIPsClims/ts/gr/v20250224/ts_mon_ERA-5_PCMDI_gr_198101-200412_AC_v20250224_2.5x2.5.nc']
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────── Traceback (most recent call last) ────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ /pscratch/sd/m/minxu/mytest_ref_2026-03-14/climate-ref/packages/climate-ref/src/climate_ref/cli/datasets.py:202 in ingest                                                                                                                                │
│                                                                                                                                                                                                                                                          │
│   199 │   │   │   │   │   │   logger.info(f"Would save dataset {instance_id} to the database")                                                                                                                                                           │
│   200 │   │   else:                                                                                                                                                                                                                                      │
│   201 │   │   │   # Use shared ingestion logic with pre-validated catalog                                                                                                                                                                                │
│ ❱ 202 │   │   │   stats = ingest_datasets(adapter, None, db, data_catalog=data_catalog, skip_i                                                                                                                                                           │
│   203 │   │   │   stats.log_summary()                                                                                                                                                                                                                    │
│   204 │                                                                                                                                                                                                                                                  │
│   205 │   if solve:                                                                                                                                                                                                                                      │
│                                                                                                                                                                                                                                                          │
│ /pscratch/sd/m/minxu/mytest_ref_2026-03-14/climate-ref/packages/climate-ref/src/climate_ref/datasets/__init__.py:138 in ingest_datasets                                                                                                                  │
│                                                                                                                                                                                                                                                          │
│   135 │   for instance_id, data_catalog_dataset in data_catalog.groupby(adapter.slug_column):                                                                                                                                                            │
│   136 │   │   logger.debug(f"Processing dataset {instance_id}")                                                                                                                                                                                          │
│   137 │   │   with db.session.begin():                                                                                                                                                                                                                   │
│ ❱ 138 │   │   │   results = adapter.register_dataset(db, data_catalog_dataset)                                                                                                                                                                           │
│   139 │   │   │                                                                                                                                                                                                                                          │
│   140 │   │   │   if results.dataset_state == ModelState.CREATED:                                                                                                                                                                                        │
│   141 │   │   │   │   stats.datasets_created += 1                                                                                                                                                                                                        │
│                                                                                                                                                                                                                                                          │
│ /pscratch/sd/m/minxu/mytest_ref_2026-03-14/climate-ref/packages/climate-ref/src/climate_ref/datasets/base.py:316 in register_dataset                                                                                                                     │
│                                                                                                                                                                                                                                                          │
│   313 │   │   if files_to_remove:                                                                                                                                                                                                                        │
│   314 │   │   │   files_removed = list(files_to_remove)                                                                                                                                                                                                  │
│   315 │   │   │   logger.warning(f"Files to remove: {files_removed}")                                                                                                                                                                                    │
│ ❱ 316 │   │   │   raise NotImplementedError("Removing files is not yet supported")                                                                                                                                                                       │
│   317 │   │                                                                                                                                                                                                                                              │
│   318 │   │   # Update existing files if any file-specific metadata has changed                                                                                                                                                                          │
│   319 │   │   for file_path, existing_file in current_file_paths.items():                                                                                                                                                                                │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
NotImplementedError: Removing files is not yet supported

The ingestion code tried to remove a file in the default cache directory ($HOME/.cache) and did not respect the cache directory that I set using the environment variables.
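
To make the suspected failure mode concrete, here is a minimal Python sketch. It is not the actual climate_ref code. It assumes, based only on the traceback above, that `register_dataset` compares the file paths already stored for a dataset with the paths in the newly ingested catalog and treats the difference as files to remove. The helper names `files_to_remove` and `is_under` are hypothetical, and the file location under the configured cache directory is a guess at the layout.

```python
from pathlib import PurePosixPath

# Cache roots taken from the log and the exported environment variable.
DEFAULT_CACHE = PurePosixPath("/global/homes/m/minxu/.cache/climate_ref")
CONFIGURED_CACHE = PurePosixPath("/pscratch/sd/m/minxu/mytest_ref_2026-03-14/cache")

FILENAME = "ts_mon_ERA-5_PCMDI_gr_198101-200412_AC_v20250224_2.5x2.5.nc"

# Path reported in the "Files to remove" warning (presumably registered earlier).
registered = [str(DEFAULT_CACHE / "PMP_obs4MIPsClims/ts/gr/v20250224" / FILENAME)]

# ASSUMPTION: where the same file might live under the configured cache.
# The sub-directory layout here is hypothetical.
incoming = [str(CONFIGURED_CACHE / "datasets/pmp-climatology/ts/gr/v20250224" / FILENAME)]


def files_to_remove(registered_paths, incoming_paths):
    """Paths already registered that do not appear in the new catalog."""
    return sorted(set(registered_paths) - set(incoming_paths))


def is_under(path, root):
    """True if `path` lies inside `root` (pure path comparison, no disk access)."""
    return PurePosixPath(path).is_relative_to(root)


stale = files_to_remove(registered, incoming)
for path in stale:
    if is_under(path, DEFAULT_CACHE):
        origin = "default cache"
    elif is_under(path, CONFIGURED_CACHE):
        origin = "configured cache"
    else:
        origin = "elsewhere"
    print(f"{origin}: {path}")
```

If this reading is right, the same file registered under two different roots looks like one removed file plus one new file, which would explain why ingestion reaches the `NotImplementedError` branch even though nothing was deleted on disk.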
