Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PrePARE missing _FillValue exit #367

Closed
wachsylon opened this issue Jul 6, 2018 · 14 comments · Fixed by #517
Closed

PrePARE missing _FillValue exit #367

wachsylon opened this issue Jul 6, 2018 · 14 comments · Fixed by #517
Projects

Comments

@wachsylon
Copy link
Collaborator

wachsylon commented Jul 6, 2018

Hi,
I have data files with variables which do not include _FillValue and _MissingValue attributes. This causes the following error wihtin PrePARE:

Traceback (most recent call last):
...
  File "/work/bm0021/miniconda2/envs/cmor/lib/python2.7/site-packages/cmip6_cv/PrePARE/PrePARE.py", line 446, in ControlVocab
    varmissing = infile.variables[var[0]]._FillValue[0]
AttributeError: _FillValue

However, I would like to have an overview about all missing or wrong variable attributes so PrePARE should not exit at this point without checking the other ones. I installed it via conda.
Is there already an option to ignore that error or is it a bug?
I have not looked into the src of PrePARE yet.
Best regards,
Fabi

@taylor13
Copy link
Collaborator

taylor13 commented Jul 7, 2018

As you know, PrePARE is tailored to examine files that are to be published on ESGF with metadata conforming to the CMIP6 specifications. For CMIP6 the variable attributes missing_value and FillValue are required, but I'm surprised that PrePARE checks that they are there, but if it does this is desirable for CMIP6. If you look at the code, let us know what you find.
thanks,
Karl

@wachsylon
Copy link
Collaborator Author

Just to be clear: I think it would be good for data providers - who want to publish data via ESGF - to know all attributes that are missing in the files instead of only _FillValue.

I saw that the code causing this error was changed since commit
6ce9ee2
so I will once more check whether I used an old version however I am pretty sure that I used the newest Conda. If I can install PrePare with the GitHub code it would be great if you can provide a short guidance about which python and so on..

@wachsylon
Copy link
Collaborator Author

Ok after an update to the most recent version the error is gone but instead I have:

Traceback (most recent call last):
  File "/work/bm0021/miniconda2/envs/cmor/bin/PrePARE", line 11, in <module>
    load_entry_point('CMOR==3.3.3', 'console_scripts', 'PrePARE')()
  File "/work/bm0021/miniconda2/envs/cmor/lib/python2.7/site-packages/cmip6_cv/PrePARE/PrePARE.py", line 857, in main
    errors += sequential_process(source)
  File "/work/bm0021/miniconda2/envs/cmor/lib/python2.7/site-packages/cmip6_cv/PrePARE/PrePARE.py", line 710, in sequential_process
    checker.ControlVocab(ncfile)
  File "/work/bm0021/miniconda2/envs/cmor/lib/python2.7/site-packages/cmip6_cv/PrePARE/PrePARE.py", line 482, in ControlVocab
    out_names_tests = json.loads(open(os.path.join(prepare_path, 'out_names_tests.json')).read())
IOError: [Errno 2] No such file or directory: '/mnt/lustre01/work/bm0021/miniconda2/envs/cmor/lib/python2.7/site-packages/cmip6_cv/PrePARE/out_names_tests.json'

Any ideas? Also, it is no longer mentioned in the output that the "_FillValue" is missing.

@wachsylon
Copy link
Collaborator Author

wachsylon commented Jul 16, 2018

And when I include the missing file from the github repo:
cp cmor-git/LibCV/PrePARE/out_names_tests.json /mnt/lustre01/work/bm0021/miniconda2/envs/cmor/lib/python2.7/site-packages/cmip6_cv/PrePARE/out_names_tests.json
I receive again:

Traceback (most recent call last):
  File "/work/bm0021/miniconda2/envs/cmor/bin/PrePARE", line 11, in <module>
    load_entry_point('CMOR==3.3.3', 'console_scripts', 'PrePARE')()
  File "/work/bm0021/miniconda2/envs/cmor/lib/python2.7/site-packages/cmip6_cv/PrePARE/PrePARE.py", line 857, in main
    errors += sequential_process(source)
  File "/work/bm0021/miniconda2/envs/cmor/lib/python2.7/site-packages/cmip6_cv/PrePARE/PrePARE.py", line 710, in sequential_process
    checker.ControlVocab(ncfile)
  File "/work/bm0021/miniconda2/envs/cmor/lib/python2.7/site-packages/cmip6_cv/PrePARE/PrePARE.py", line 495, in ControlVocab
    self.dictVar['_FillValue'][0],
KeyError: '_FillValue'

which corresponds to the previous error.

@doutriaux1 doutriaux1 added this to Needed in 3.4 Jul 18, 2018
@doutriaux1
Copy link
Collaborator

@wachsylon concerning #367 (comment) this is really issue #373

@mauzey1 mauzey1 removed this from Needed in 3.4 Nov 29, 2018
@mauzey1 mauzey1 added this to To Do in 3.5 Nov 29, 2018
@taylor13
Copy link
Collaborator

The request is that PrePARE should not error exit on encountering the first error. Rather it should continue on and complete all its checks. This will help users to more quickly modify their output files to be compliant with the specs.

@mauzey1
Copy link
Collaborator

mauzey1 commented Jun 26, 2019

@taylor13 @doutriaux1 Getting back to this, it looks like it tries to get the values for the attributes _FillValue and units from the file before it checks all attributes.

varid = cmip6_cv.setup_variable(variable_cmor_entry,
self.dictVar['units'],
self.dictVar['_FillValue'],
startime,
endtime,
startimebnds,
endtimebnds)

Why do we need _FillValue and units from the file if we are checking them against the attributes in the table?

@doutriaux1
Copy link
Collaborator

@mauzey1 I'm not positive but it looks like we are trying to validate that values stored in the file match what they should be? @taylor13 any idea?

@mauzey1
Copy link
Collaborator

mauzey1 commented Jun 26, 2019

@doutriaux1 The cmip6_cv.setup_variable line is setting up a variable that has attribute values from the table. However, it is getting _FillValue from the file and using that as the value for the table's _FillValue and missing_value. It doesn't change the table value of units though.

edit:
The variable varid is used to validate the attributes in the file in this section:

cv_attrs = cmip6_cv.list_variable_attributes(varid)
for key in cv_attrs:
if key == "long_name":
continue
if key == "comment":
continue
if key == "cell_measures":
if cv_attrs[key].find("OPT") != -1 or cv_attrs[key].find("MODEL") != -1:
continue
# Is this attribute in file?
if key in list(self.dictVar.keys()):
# Verify that attribute value is equal to file attribute
table_value = cv_attrs[key]
file_value = self.dictVar[key]
# PrePARE accept units of 1 or 1.0 so adjust the table_value
if key == "units":
if (table_value == "1") and (file_value == "1.0"):
table_value = "1.0"
if (table_value == "1.0") and (file_value == "1"):
table_value = "1"
if isinstance(table_value, str) and isinstance(file_value, numpy.ndarray):
if numpy.array([int(value) for value in table_value.split()] == file_value).all():
file_value = True
table_value = True
if isinstance(table_value, numpy.ndarray):
table_value = table_value[0]
if isinstance(file_value, numpy.ndarray):
file_value = file_value[0]
if isinstance(table_value, float):
if abs(table_value - file_value) <= 0.00001 * abs(table_value):
table_value = file_value
if key == "cell_methods":
idx = file_value.find(" (")
if idx != -1:
file_value = file_value[:idx]
table_value = table_value[:idx]
if key == "cell_measures":
pattern = re.compile('(?P<param>[\w.-]+): (?P<val1>[\w.-]+) OR (?P<val2>[\w.-]+)')
values = re.findall(pattern, table_value)
table_values = [""] # Empty string is allowed in case of useless attribute
if values:
tmp = dict()
for param, val1, val2 in values:
tmp[param] = [str('{}: {}'.format(param, val1)), str('{}: {}'.format(param, val2))]
table_values.extend([' '.join(i) for i in list(itertools.product(*list(tmp.values())))])
if str(file_value) not in list(map(str, table_values)):
print(BCOLORS.FAIL)
print("=====================================================================================")
print("Your file contains \"" + key + "\":\"" + str(file_value) + "\" and")
print("CMIP6 tables requires \"" + key + "\":\"" + str(table_value) + "\".")
print("=====================================================================================")
print(BCOLORS.ENDC)
self.errors += 1
continue
if str(table_value) != str(file_value):
print(BCOLORS.FAIL)
print("=====================================================================================")
print("Your file contains \"" + key + "\":\"" + str(file_value) + "\" and")
print("CMIP6 tables requires \"" + key + "\":\"" + str(table_value) + "\".")
print("=====================================================================================")
print(BCOLORS.ENDC)
self.errors += 1
else:
# That attribute is not in the file
table_value = cv_attrs[key]
if isinstance(table_value, numpy.ndarray):
table_value = table_value[0]
if isinstance(table_value, float):
table_value = "{0:.2g}".format(table_value)
print(BCOLORS.FAIL)
print("=====================================================================================")
print("CMIP6 variable " + variable + " requires \"" + key + "\":\"" + str(table_value) + "\".")
print("=====================================================================================")
print(BCOLORS.ENDC)
self.errors += 1

So when it's validating the file's _FillValue and missing_value values it is actually validating those attibutes againsts the file's _FillValue value.

@taylor13
Copy link
Collaborator

As we discussed, the suggestion is to either avoid passing any information about the file's units and _FillValue as arguments of cmip6_cv.setup_variable (i.e., either remove the arguments self.dictVar['units'], self.dictVar['_FillValue'], or set them to some value serving as a flag that tells cmip6_cv.setup_variable to skip any code where these arguments are currently required). There should be no need to read from the file _FillValue and units prior to checking them against the information in the table.

@mauzey1 mauzey1 moved this from To Do to In Progress in 3.5 Jun 26, 2019
@mauzey1
Copy link
Collaborator

mauzey1 commented Jun 26, 2019

@taylor13 The variable basin in table Ofx is the only variable I have found that has the type integer.

"basin": {
"frequency": "fx",
"modeling_realm": "ocean",
"standard_name": "region",
"units": "1",
"cell_methods": "area: mean",
"cell_measures": "area: areacello",
"long_name": "Region Selection Index",
"comment": "A variable with the standard name of region contains strings which indicate geographical regions. These strings must be chosen from the standard region list.",
"dimensions": "longitude latitude",
"out_name": "basin",
"type": "integer",
"positive": "",
"valid_min": "",
"valid_max": "",
"flag_values": "0 1 2 3 4 5 6 7 8 9 10",
"flag_meanings": "global_land southern_ocean atlantic_ocean pacific_ocean arctic_ocean indian_ocean mediterranean_sea black_sea hudson_bay baltic_sea red_sea",
"ok_min_mean_abs": "",
"ok_max_mean_abs": ""
},

I thought it would use -999 as its missing value.
"int_missing_value": "-999",

However, when I got a NetCDF file for basin from ESGF I found out that it was using 0 as its missing_value and _FillValue.

Should we treat basin as a special case? It's the only variable that has the attributes flag_values and flag_meanings.

@taylor13
Copy link
Collaborator

"0" should definitely not be declared as missing_value or _FillValue. "0" indicates the region is "global_land", according to flag_meanings and flag_values. Where is CMOR getting missing_value="0"? I suspect that somewhere along the line -999 gets transformed to float or something, and then gets transformed back to 0. The missing_value and _FillValue attributes in the file should be as specified in the table: -999.

I don't think this should be a special case (other than it is declared integer rather than float).

@mauzey1
Copy link
Collaborator

mauzey1 commented Jul 18, 2019

@doutriaux1 @taylor13

I submitted a pull request that will make PrePARE get the units and missing values from the table instead of the NetCDF file. I also added a parameter for the missing integer value when setting up the variable.

Alternatively, we could just get rid of the units and missing value parameters from the variable setting functions since cmor_CV_variable can get the values directly from the table. I don't think anything other than PrePARE uses these functions.

cmor/Src/cmor_CV.c

Lines 2292 to 2295 in 84fd521

int cmor_CV_variable(int *var_id, char *name, char *units,
float *missing, int *imissing,
double startime, double endtime,
double startimebnds, double endtimebnds)

@taylor13
Copy link
Collaborator

I'll try to check this tomorrow.

@mauzey1 mauzey1 moved this from In Progress to Done in 3.5 Jul 24, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
3.5
  
Done
Development

Successfully merging a pull request may close this issue.

4 participants