CORDEX tables missing some cordex data #146

Closed
paolap opened this issue Mar 4, 2021 · 2 comments

paolap commented Mar 4, 2021

I found an issue either with the clef.nci.org.au database or the database tables.sql file.

This is a simplified version of one of the queries done by clef:

clef=> SELECT * from cordex_dataset WHERE cordex_dataset.driving_experiment_name ='historical' AND cordex_dataset.frequency = 'mon' and cordex_dataset.model_id = 'CSIRO-CCAM';
 dataset_id | model_id | frequency | institute_id | cordex_domain | experiment_id | rcm_version_id | driving_model_id | driving_experiment_name | driving_model_ensemble_member
------------+----------+-----------+--------------+---------------+---------------+----------------+------------------+-------------------------+-------------------------------
(0 rows)

As you can see, with 'CSIRO-CCAM' as model_id the query returns no results.

With 'UNSW-WRF360L' as model_id it works:

clef=> SELECT * from cordex_dataset WHERE cordex_dataset.driving_experiment_name ='historical' AND cordex_dataset.frequency = 'mon' and cordex_dataset.model_id = 'UNSW-WRF360L';
              dataset_id              |   model_id   | frequency | institute_id | cordex_domain | experiment_id | rcm_version_id |  driving_model_id   | driving_experiment_name | driving_model_ensemble_member
--------------------------------------+--------------+-----------+--------------+---------------+---------------+----------------+---------------------+-------------------------+-------------------------------
 915f271a-2b7a-5ae4-56ea-05ac19d40aed | UNSW-WRF360L | mon       | UNSW         | AUS-44i       | historical    | v1             | CSIRO-BOM-ACCESS1-3 | historical              | r1i1p1
 e5e4d21e-6175-629b-8c32-41af705ae441 | UNSW-WRF360L | mon       | UNSW         | AUS-44        | historical    | v1             | CSIRO-BOM-ACCESS1-3 | historical              | r1i1p1
 32445df4-97fb-2287-4dfd-f467ffed8853 | UNSW-WRF360L | mon       | UNSW         | AUS-44i       | historical    | v1             | CSIRO-BOM-ACCESS1-0 | historical              | r1i1p1
 fbf3f02a-29f1-bc8e-6341-a1602f8588aa | UNSW-WRF360L | mon       | UNSW         | AUS-44        | historical    | v1             | CSIRO-BOM-ACCESS1-0 | historical              | r1i1p1
(4 rows)

Using the default clef query based on checksums, I can find the CCAM results:

clef cordex -e historical -v tas -m CCAM -d AUS-44i -f mon
/g/data/rr3/publications/CORDEX/output/AUS-44i/CSIRO/CNRM-CERFACS-CNRM-CM5/historical/r1i1p1/CSIRO-CCAM/v201312/mon/tas/files/d20170804/
/g/data/rr3/publications/CORDEX/output/AUS-44i/CSIRO/CNRM-CERFACS-CNRM-CM5/historical/r1i1p1/CSIRO-CCAM/v201312/mon/tas/v20170804/
/g/data/rr3/publications/CORDEX/output/AUS-44i/CSIRO/CSIRO-BOM-ACCESS1-0/historical/r1i1p1/CSIRO-CCAM/v201312/mon/tas/files/d20170804/
/g/data/rr3/publications/CORDEX/output/AUS-44i/CSIRO/CSIRO-BOM-ACCESS1-0/historical/r1i1p1/CSIRO-CCAM/v201312/mon/tas/v20170804/
.....

The latter first runs the same query on the ESGF, extracts the checksums, and then queries clef.nci.org.au for those specific checksums.

This means that these directories have been crawled, but possibly the metadata has not been extracted to cordex_dataset?
Or does tables.sql need refreshing?
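
The checksum-based lookup described above can be sketched roughly as follows. This is only an illustration of the idea, not clef's actual code: the function names, the record layout, and the paths are all hypothetical, and in-memory data stands in for both the ESGF response and the clef.nci.org.au database. It shows why such a lookup can still find files whose metadata never made it into cordex_dataset: matching is done on checksums alone.

```python
# Hypothetical stand-ins: a list of remote (ESGF-like) file records and a
# local index keyed by checksum. Neither reflects clef's real data structures.

def extract_checksums(esgf_results):
    """Collect the checksum of every remote file record that has one."""
    return {rec["checksum"] for rec in esgf_results if rec.get("checksum")}


def find_local_paths(checksums, local_index):
    """Return the local paths whose checksum was also found remotely."""
    return sorted(local_index[c] for c in checksums if c in local_index)


# Made-up example data.
esgf_results = [
    {"title": "tas_mon_a.nc", "checksum": "aaa111"},
    {"title": "tas_mon_b.nc", "checksum": "bbb222"},
    {"title": "tas_mon_c.nc", "checksum": None},  # record without a checksum
]
local_index = {
    "aaa111": "/hypothetical/local/tas_mon_a.nc",
    "ccc333": "/hypothetical/local/other.nc",
}

matches = find_local_paths(extract_checksums(esgf_results), local_index)
print(matches)
```

Because no attribute such as model_id or project_id is consulted locally, a file with non-standard attributes is still matched as long as its checksum was crawled.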


paolap commented Mar 22, 2021

Found the issue: many of these files have attributes that do not follow conventions. The project_id is CORDEX-Australia, which doesn't exist; the model name differs from the official CCAM name (CCAM-1391M); and the domain is not included at all. While we can't solve all of these issues, it makes sense to change this part of tables.sql

FROM metadata
WHERE md_type = 'netcdf'
AND md_json->'attributes'->>'project_id' = 'CORDEX'

to

AND md_json->'attributes'->>'project_id' LIKE 'CORDEX%'

This way, legitimate CORDEX sub-projects such as CORDEX-ESD are also included.
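
To illustrate the effect of switching from an exact match to a prefix match, here is a small self-contained sketch. It assumes a simplified, hypothetical metadata table with a plain project_id column in an in-memory SQLite database, rather than the PostgreSQL JSON column used in tables.sql, and the rows are made up. Note that SQLite's LIKE is case-insensitive for ASCII by default while PostgreSQL's is case-sensitive; the sample values are all upper case, so that difference does not matter here.

```python
import sqlite3

# Simplified, hypothetical stand-in for the real metadata table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (path TEXT, project_id TEXT)")
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?)",
    [
        ("a.nc", "CORDEX"),
        ("b.nc", "CORDEX-Australia"),  # non-standard value, as in the CCAM files
        ("c.nc", "CORDEX-ESD"),
        ("d.nc", "CMIP5"),
    ],
)

# Old filter: exact match on the project name.
exact = [
    row[0]
    for row in conn.execute(
        "SELECT path FROM metadata WHERE project_id = 'CORDEX' ORDER BY path"
    )
]

# New filter: anything whose project name starts with CORDEX.
prefix = [
    row[0]
    for row in conn.execute(
        "SELECT path FROM metadata WHERE project_id LIKE 'CORDEX%' ORDER BY path"
    )
]

print("exact match: ", exact)
print("prefix match:", prefix)
```

The prefix filter picks up both the sub-project and the non-standard CORDEX-Australia value, while still excluding unrelated projects.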

I also changed other parts of the code where a similar exclusion might apply. For example, in cli.py I changed
project='CORDEX'
to
project="CORDEX,CORDEX-Adjust,CORDEX-ESD,CORDEXReklies"
so that the ESGF queries search for data in any of these projects.
Before calling the local search, I change this back to 'CORDEX', because we put all the projects in one table in the database.
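
The split between the ESGF and local project constraints might look something like the sketch below. This is not the actual cli.py code: build_constraints, the target argument, and the constant names are hypothetical, and only the two project strings come from the change described above.

```python
# Project strings taken from the change described above; everything else
# (names, function, arguments) is a hypothetical illustration.
ESGF_CORDEX_PROJECTS = "CORDEX,CORDEX-Adjust,CORDEX-ESD,CORDEXReklies"
LOCAL_CORDEX_PROJECT = "CORDEX"


def build_constraints(user_args, target):
    """Return a copy of the search constraints with the project set per target.

    target is 'esgf' for the remote search or 'local' for the local database.
    """
    constraints = dict(user_args)  # copy so the caller's dict is not modified
    if target == "esgf":
        # Remote search: look in every CORDEX-related project.
        constraints["project"] = ESGF_CORDEX_PROJECTS
    elif target == "local":
        # Local search: all projects live in one table, so use the single name.
        constraints["project"] = LOCAL_CORDEX_PROJECT
    else:
        raise ValueError(f"unknown target: {target!r}")
    return constraints


args = {"experiment": "historical", "variable": "tas", "frequency": "mon"}
print(build_constraints(args, "esgf"))
print(build_constraints(args, "local"))
```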

Waiting for NCI to refresh the tables to do final tests.

@paolap paolap mentioned this issue Apr 29, 2021

paolap commented May 4, 2021

Solved in v1.3.0 release

@paolap paolap closed this as completed May 4, 2021