Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove unused attributes when processing CMIP6Plus datasets #723

Closed
mauzey1 opened this issue Jan 16, 2024 · 14 comments
Closed

Remove unused attributes when processing CMIP6Plus datasets #723

mauzey1 opened this issue Jan 16, 2024 · 14 comments
Milestone

Comments

@mauzey1
Copy link
Collaborator

mauzey1 commented Jan 16, 2024

As discussed in #718 (comment)

Some of these might be best as their own issues, but the small tweaks we've discussed are;

  • remove product field from output - currently defaults to model-output with the value at the top of a table overwriting this

  • avoid further_info_url automatic creation as this isn't going to be used beyond CMIP6 for now

  • (possibly) remove short_description from output -- it relies on mip_era from tables (now removed) and this field will do something strange when the CORDEX tables are picked up.

EDIT: The above attributes are to be removed when processing datasets for CMIP6Plus. CMIP6 datasets should retain these attributes.

To determine whether an attribute should be removed or not, we can check the CV file's required_global_attributes section. If attributes such as product and further_info_url are not present in required_global_attributes, then CMOR shouldn't add those attributes to the NetCDF.

@mauzey1 mauzey1 changed the title Remove unused global attributes Remove unused attributes when processing CMIP6Plus datasets Jan 17, 2024
@mauzey1
Copy link
Collaborator Author

mauzey1 commented Jan 26, 2024

Through a combination of setting the default value of product in CMOR from model_output to an empty string and removing the product attribute from the headers of mip-cmor-tables I was able to remove the attribute from the netCDF file.

Is product an attribute that we plan to keep in the headers of mip-cmor-tables. If not, then removing it will make this part of the task easy. Otherwise, CMOR will need to not set the product value from the table header if the attribute is not in required_global_attributes. Do we want attributes that are not required by the CV to not be set even if the tables contain values for them?

@taylor13
Copy link
Collaborator

product is meant to distinguish model_output from, for example, observations and forcing_dataset, so I think it should be included as a global attribute in the file and should be harvested by ESGF into its catalog. What would be the rationale for removing it?

@mauzey1
Copy link
Collaborator Author

mauzey1 commented Jan 26, 2024

@taylor13 That would be a question for @matthew-mizielinski to answer.

@mauzey1
Copy link
Collaborator Author

mauzey1 commented Jan 26, 2024

Are we all in agreement that further_info_url is an attribute should not be used beyond CMIP6?

@taylor13
Copy link
Collaborator

o.k. by me.

@durack1
Copy link
Contributor

durack1 commented Jan 26, 2024

@mauzey1 yes, at least for CMIP6Plus, input4MIPs and obs4MIPs this is not supported by the ES-DOC/IS-ENES folks, so we need it removed as a required global attribute. For CMIP7, in a couple of years, it may reappear, but not for the immediate future

@durack1
Copy link
Contributor

durack1 commented Jan 26, 2024

product is meant to distinguish model_output from, for example, observations and forcing_dataset, so I think it should be included as a global attribute in the file and should be harvested by ESGF into its catalog. What would be the rationale for removing it?

Thanks for chiming in @taylor13. You make a very good point about having this to delineate between projects (or is it mip_era, or activity_id). We do need to figure out how to manage all this, if we had CVs that captured these options in the mip-cmor-tables (highest level "multi-verse") repo it would make more sense to me - so model-output, observations, forcing which was never defined in the CMIP6_CVs, in input4MIPs and obs4MIPs we have derived, observations, reanalysis

@taylor13
Copy link
Collaborator

Yes, projects/activity_id/mip_era have sometimes been used to indicate something about "project", and under each project we can have multiple output types. (In CMIP5 we distinguished between output1 and output2, but folks found that confusing.). The terms should be clearly defined in CVs, but the allowed values would be project-specific.

@mauzey1
Copy link
Collaborator Author

mauzey1 commented Feb 2, 2024

@durack1 @taylor13 Can somebody explain the reason behind this section of cmor_setGblAttr?

cmor/Src/cmor.c

Lines 3049 to 3056 in c2f7450

//
// Copy block to ensure all attributes are set for obs4MIPs
// especially (source_label)
//
if ( cmor_current_dataset.furtherinfourl[0] != '\0') {
ierr += cmor_CV_checkSourceID(cmor_tables[nVarRefTblID].CV);
ierr += cmor_CV_checkFurtherInfoURL(nVarRefTblID);
}

The source_id and further_info_url check is done here if the further_info_url value of the dataset isn't an empty string. By default, CMOR sets the the value of further_info_url to a built-in template value. Unless the user explicitly sets further_info_url to be an empty string, these checks will happen. Shouldn't the source_id check happen regardless of what further_info_url is set to?

@taylor13
Copy link
Collaborator

taylor13 commented Feb 2, 2024

Yes, I think the check on source_id should always be done. Perhaps it is done elsewhere and then repeated here (for no reason I can think of) when there is a further_info_url defined. There's no reason to check it twice.

@durack1
Copy link
Contributor

durack1 commented Feb 2, 2024

I agree with @taylor13. We might have a further_info_url in the CMIP7 project, it depends if the ES-Doc folks manage to get supported in time for the specs to be defined. As it is, this would be a relatively trivial additional to the project CVs, but it would be useful to think about what structure we would need to put in place so that such attributes could be validated by CMOR in a generic way - i.e. a template that requires certain info to be defined in the CVs to check against

@taylor13
Copy link
Collaborator

taylor13 commented Feb 3, 2024

Yes, each new project (i.e., each CMIP phase, obs4MIPs phase, and input4MIPs phase, where phase implies a new set of required attributes) could "tell" CMOR which of its global attributes are required (and which of those are constrained by CVs or by templates), and then CMOR could check that the attribute had been defined (and that it conformed to the CV or template).

@durack1
Copy link
Contributor

durack1 commented Feb 14, 2024

@matthew-mizielinski @wolfiex can you both take a quick pass over the above, just to make sure no obvious issues are not being considered? @mauzey1 is nearing finalizing a CMOR 3.8.0 release, so wanted to double check whether we're good to start testing with the current main branches in CMIP6Plus_CVs and mip-cmor-tables.

@durack1 durack1 added this to the 3.8.0 milestone Apr 7, 2024
@mauzey1
Copy link
Collaborator Author

mauzey1 commented Apr 8, 2024

Marking this complete since the automatic creation of the further_info_url attribute has been removed, and the other attributes listed can be handled by the CV's required_global_attributes.

@mauzey1 mauzey1 closed this as completed Apr 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants