Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding ASCAT SM reader and matlab util for clim_stats #656

Merged
merged 107 commits into from
Dec 24, 2023

Conversation

amfox37
Copy link
Contributor

@amfox37 amfox37 commented Jun 2, 2023

This branch contains:

  1. Code for reading and scaling 25km ASCAT soil moisture product
  2. Matlab code for calculating climatology statistics to be used by the scaling on a regular 0.25 degree lat/lon grid and writing to a single netCDF file containing all 73 pentads.

The matlab code still needs minor attention as it contains some legacy of trying to be backwards compatible (which it isn't).

TO BE RESOLVED:
The new BUFR reader for the ASCAT obs is exceedingly slow (see in subroutine read_obs_sm_ASCAT_EUMET() in clsm_ensupd_read_obs.F90).

amfox37 and others added 14 commits November 22, 2023 15:40
…_clim_stats*.m):

- fixed indentation
- improved vertical alignment of equations
- minor edits of comments
- removed obsolete code
…tcdf_latlon_grid.m):

- fixed indentation
- improved vertical alignment of equations
- fixed typo in 'long_name' of 'lat', 'lon' variables
…" with "mean", "std", and "sum"; avoids dependency on Stats toolbox (various *.m)
Comment on lines 160 to 161
% remove tiles when there is no obs_fcst (obs_fcst == 0 in innov output when missing)
idx = find(obs_fcst == 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gmao-qliu @amfox37:
Not sure why the check here is for "obs_fcst==0". Do we not have a defined no-data-value for obs_fcst? The zero value might work for ASCAT sfds, but it's easy to imagine that zero is problematic if in future we're trying to generalize the subroutine for other variables (e.g., SWE).
My guess is that obs_fcst is -9999 (SMAP no-data-value) or possibly 1.e15 (MAPL_UNDEF).
Having said that, we probably don't even need to keep this block. The matlab reader for "ObsFcstAna" files produces NaN for no-data. I think this is inherited from "get_model_and_obs_clim_stats.m" and would need to be changed there as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amfox37, @gmao-qliu : With any luck, my commit 565c868 fixes this in both get_model_and_obs_clim_stats*.m scripts. Please let me know if you see any problems.

Comment on lines 129 to 139
varid_lon = netcdf.defVar(ncid, 'lon', float_precision, [dimid_lon]);
netcdf.putAtt(ncid, varid_lon, 'standard_name', 'longitude');
netcdf.putAtt(ncid, varid_lon, 'long_name', 'lower left longitude of gridcell');
netcdf.putAtt(ncid, varid_lon, 'units', 'degrees_east');
netcdf.putAtt(ncid, varid_lon, 'axis', 'X');

varid_lat = netcdf.defVar(ncid, 'lat', float_precision, [dimid_lat]);
netcdf.putAtt(ncid, varid_lat, 'standard_name', 'latitude');
netcdf.putAtt(ncid, varid_lat, 'long_name', 'lower left latitude of gridcell');
netcdf.putAtt(ncid, varid_lat, 'units', 'degrees_north');
netcdf.putAtt(ncid, varid_lat, 'axis', 'Y');
Copy link
Contributor

@gmao-rreichle gmao-rreichle Dec 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amfox37 : The variable names 'lon' and 'lat' (and the corresponding 'standard_name') here are not good practice. They are really 'll_lons' and 'll_lats', but the 'll_*' bit is not reflected in the variable name. If someone just looks at the nc file and doesn't check the long_name, they might not get this.
The grid here is a regular lat/lon grid, and we already have the scalar variables 'll_lon' and 'll_lat' in the file. Together with 'd_lon', 'd_lat', 'N_lon', and 'N_lat' we can readily construct 'll_lons' and 'll_lats'.
The simplest and safest thing to do is probably to not write 'lon' and 'lat' into the scaling parameter file.
Alternatively, we could rename the variables to 'll_lons' and 'll_lats'.
Or we could write the grid cell center 'lon' and 'lat' values (i.e., write lon=ll_lons+d_lon/2 and lat=ll_lats+d_lat/2).

Comment on lines 8187 to 8190
! double lon(lon) ;
! lon:standard_name = "longitude" ;
! double lat(lat) ;
! lat:standard_name = "latitude" ;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where the meaning of 'lon' and 'lat' as the lower boundaries of the grid cells is lost. See my comment re. write_netcdf_latlon_grid.m

Comment on lines 8376 to 8377
if ( abs(tile_coord(i)%com_lat-sclprm_lat(j_ind))>tol .or. &
abs(tile_coord(i)%com_lon-sclprm_lon(i_ind))>tol ) then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check compares the tile's center-of-mass with the lower boundary of the scaling parameter grid cell. It would make more sense to have 'lon' and 'lat' indicate the lon & lat of the center of the scaling parameter grid cells. I think this is the only place that uses 'lon' and 'lat' from the netcdf scaling param file. We may not even need this check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, after fairly close inspection, I don't think this test is actually doing anything useful. I must have misunderstood what it was doing when I copied it over from tskin_zscore. Given how j_in and i_in are calculated, all its effectively checking is if ll_lon/lat, dlon/lat and sclprm_lon/lat (all read from the netCDF file) are consistent with each other, and we don't actually use sclprm_lon/lat for anything. And given how observations are assigned to tiles using tile_coord(i)%com_lon/lat in get_tile_num_from_latlon its somewhat circular to be checking if tmp_lon/lat(i) agree with tile_coord(i)%com_lon/lat - only something wrong in the super ob-ing step would cause a large discrepancy (and if testing this was the goal you wouldn't do it like this anyway).

So I think we should remove this test and the read of lat/lon and the production of the lat/lon by the matlab script

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @amfox37, for taking another look at this. I agree with the assessment. When you get a chance, please remove the sanity check and remove lat/lon from the scaling parameter file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gmao-rreichle Ok, have done this in commit 544a119 removing lat/lon matlab write, lat/lon FORTRAN read and sanity check in read_obs.
Basic testing suggest this change, and all other changes in matlab scripts are good to go.

@gmao-rreichle
Copy link
Contributor

@amfox37 @gmao-qliu :
I cleaned up the code (mostly matlab) a bit more. Here's the summary diff of my changes (excl white space changes):
https://github.com/GEOS-ESM/GEOSldas/compare/c3fa251..4113033?w=1
Included in the above is the replacement of the deprecated nanmean, nanstd, and nansum functions in this single commit:
3ffee32
I also merged two lines into one line here:

data2D(1,:) = sum(o_data_sum( i,:,1:w_days), 3,"omitnan")./N_window;

Please inspect the changes and test that the revised matlab processing scripts still work as expected.

@amfox37
Copy link
Contributor Author

amfox37 commented Dec 19, 2023

Please inspect the changes and test that the revised matlab processing scripts still work as expected.

OK, will check it reproduces existing scaling parameters.

Not sure what to do for the best with the lat/lon thing. Seems like I should change to write center lat/lon to the netCDF file, keep that name and change long_name, or just forget the check which is inherited from elsewhere, and maybe not required and so not write these lat/lon at all.

@gmao-rreichle
Copy link
Contributor

Not sure what to do for the best with the lat/lon thing. Seems like I should change to write center lat/lon to the netCDF file, keep that name and change long_name, or just forget the check which is inherited from elsewhere, and maybe not required and so not write these lat/lon at all.

Maybe best to drop 'lat' and 'lon' from the scaling param netcdf file because these vectors are redundant, after all. We could still keep the sanity check in the zscore scaling subroutine by computing the (grid-cell center) sclprm_lat/lon coordinates on the fly (from ll_lat, ll_lon, d_lat, d_lon, N_lat, and N_lon) instead of reading them from the netcdf file.

@gmao-rreichle gmao-rreichle removed the help wanted Extra attention is needed label Dec 24, 2023
@gmao-rreichle gmao-rreichle marked this pull request as ready for review December 24, 2023 19:29
@gmao-rreichle gmao-rreichle requested review from a team as code owners December 24, 2023 19:29
Copy link
Contributor

@gmao-rreichle gmao-rreichle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passed full suite of GEOSldas nightly tests (@gmao-rreichle, 24 Dec 2023)

@gmao-rreichle gmao-rreichle merged commit 8459e1b into develop Dec 24, 2023
7 checks passed
@gmao-rreichle gmao-rreichle deleted the feature/amfox/metop_sm branch December 24, 2023 19:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants