Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GSW-R (beta) not agreeing with TEOS-10 for gsw_geo_strf_dyn_height() #47

Closed
dankelley opened this issue Jul 6, 2021 · 18 comments
Closed
Assignees
Labels
Projects

Comments

@dankelley
Copy link
Contributor

@efiring do you get the same as below, if you use the C code directly? (I've asked PB if the teos-10 website might be out-of-date, and I'll post status updates here as they come in.)

In doing updates for GSW-R/1.0-1, I am finding that I cannot reproduce the check values given at http://www.teos-10.org/pubs/gsw/html/gsw_geo_strf_dyn_height.html. Details are below, but a summary is that I am getting 12.295 for the first test, while the TEOS-10 website indicates 12.172 for that value.

@efiring do you get the same if you use the C code directly?

> library(gsw)
> SA <- c(34.7118, 34.8915, 35.0256, 34.8472, 34.7366, 34.7324)
> CT <- c(28.8099, 28.4392, 22.7862, 10.2262,  6.8272,  4.3236)
> p <- c(      10,      50,     125,     250,     600,    1000)
> p_ref <- 500
> dh <- gsw_geo_strf_dyn_height(SA, CT, p, p_ref)
> stopifnot(all.equal(dh, c(12.172172845782585, 9.797739925848624, 6.070940749148281,
+                           3.042891445395256, -1.078872239804912, -4.656953829254061)))
Error: dh and c(12.1721728457826, 9.79773992584862, 6.07094074914828, 3.04289144539526,  .... are not equal:
  Mean relative difference: 0.01048202
> 
> dh
[1] 12.295114  9.931936  6.165909  3.055219 -1.080634
[6] -4.633611

Values indicated at the TEOS-10 website

Screen Shot 2021-07-06 at 10 20 45 AM

@dankelley dankelley added the bug label Jul 6, 2021
@dankelley dankelley self-assigned this Jul 6, 2021
@dankelley
Copy link
Contributor Author

Sorry, I forgot to tag @PaulMBarker in previous comment.

@PaulMBarker
Copy link
Member

PaulMBarker commented Jul 6, 2021 via email

@dankelley
Copy link
Contributor Author

Thanks, Paul. I am relaxing the built-in tests for GSW-R. Dan.

@efiring
Copy link
Member

efiring commented Jul 6, 2021

See TEOS-10/GSW-C#36, which I merged just now. I had forgotten
that it was still sitting there unmerged. The current C code, and the Python code, are
using pchip interpolation, which is better than the buggy old interpolator in the original
C code, but which does not match either the old or the new Matlab code. Implementing
the newest Matlab algorithm in C will be a big job.

@dankelley
Copy link
Contributor Author

@efiring thanks for the note on the GSW-C update. Might I ask a question? I see (snapshot of git diff HEAD^1 below) that there are some comments about not using gsw_data_v3_0.nc, and I'm wondering if that means GSW-R should avoid that file. At the moment, I use the GSW-Fortran/test/gsw_data_v3_0.nc file and I'd like to know which (if either) I ought to use.

Please note that I do not incorporate this netcdf file directly into GSW-R. Doing so would put it over the package-size limit. Instead, I read it in and save a file in what's called "rda" R-data-archive format, which shrinks it from 4.7M to 1.4M. (I've no idea how it can compress so much better than netcdf, but I think it is doing it right, since all test values from matlab work, except for the one that this issue is about.)

Screen Shot 2021-07-06 at 11 06 11 AM

@efiring
Copy link
Member

efiring commented Jul 6, 2021

See also TEOS-10/GSW-C#35.
The current C version is using the v3_06_11 matfile data and check
values for everything except geo_strf_dyn_height; for the latter,
I patched in values that match the pchip interpolator being used.
In the C code, the check values are all stored in ascii as C source
code.
Can you use that C source? Or read it in and save it in your "rda"
format? Or make a modified version of "make_data_from_mat.py" that
would generate some other format you can read and then save as
"rda"?

@dankelley
Copy link
Contributor Author

@efiring Just to check, do you mean the following file, which I got from my minutes-old download from http://www.teos-10.org/software/gsw_matlab_v3_06_12.zip?

$ pwd
/Users/kelley/src/gsw_matlab_v3_06_12/library
$ md5 *mat
MD5 (gsw_data_v3_0.mat) = 25b840e6f8bdd72522e28d99eccd9e99

@efiring
Copy link
Member

efiring commented Jul 6, 2021

No, the base is from the Matlab version before that, gsw_matlab_v3_06_11/library/gsw_data_v3_0.mat.
And then the check values for strf_dyn_height have to be adjusted.

@efiring
Copy link
Member

efiring commented Jul 6, 2021

You can see the adjustment in https://github.com/TEOS-10/GSW-C/pull/36/files, in the diff
for gsw_check_data.c.

@dankelley
Copy link
Contributor Author

Any idea how I can obtain gsw_matlab_v3_06_11/library/gsw_data_v3_0.mat? The website only provides the more recent version, gsw_matlab_v3_06_12, at http://www.teos-10.org/software.

Things are quite tricky without git for versioning.

@dankelley
Copy link
Contributor Author

Oh, sorry @efiring -- I just realized that I can download http://www.teos-10.org/software/gsw_matlab_v3_06_11.zip even though it's not linked-to in that webpage. The rest of this comment refers to http://www.teos-10.org/software/gsw_matlab_v3_06_12.zip ...

As a matter of interest, I tried reading the matlab file within http://www.teos-10.org/software/gsw_matlab_v3_06_12.zip and comparing with the netcdf file in https://github.com/TEOS-10/GSW-Fortran and I get identical values for delta_SA, but differing values for SAAR_ref. The graph below shows a histogram of the difference in SAAR_ref between netcdf and matlab (top panel) and, for context, shows a histogram of SAAR_ref from netcdf. The top panel is trimmed of a very few values that are far from this central cloud. Also, as shown in the text above that top panel, the number of "bad" data is quite different between these two files.

In summary, the SAAR_ref data are somewhat different between the matlab file and the gsw-fortran netcdf file. A rough comparison of the two graphs shows that the difference is of order 2e-6, while the typical values are fairly broadly distributed from zero to about 3e-4. In other words, the difference in SAAR is of roughly a few percent.

My next step will be to compare http://www.teos-10.org/software/gsw_matlab_v3_06_11.zip with http://www.teos-10.org/software/gsw_matlab_v3_06_12.zip

create_saar_new

@dankelley
Copy link
Contributor Author

Hm, interesting. The matlab versions 3.06-11 and 3.06-12 are quite different. In 3.06-11, there are 46 latitude values, spanning the range -90 to 90, but in 3.06-12, there are 45 lats, spanning the range -86 to 90. And of course there are corresponding changes in the dimensionality of the arrays.

I thank @efiring for emphasizing that GSW-C is referenced to 3.06-11, not 3.06-12. It clearly will make a big difference, as C does not have a proper array type, so indexing is done by computing locations in a vector, knowing the dimensions of the array to which it corresponds. (For example, in GSW-C/gsw_saar_data.c line 5, we have a definition that gsw_ny is 46, which is right for 3.06-11 but would be wrong for 3.06-12. Indexing past the end of a vector is a big problem in C, and so it would have been a big mistake for me to have used a 3.06-12 matlab file with GSW-C code. (I won't bother checking the netcdf in GSW-Fortran ... at some point, it just gets too confusing, trying to keep up with all these files, and the decision of GSW-R is to key on GSW-C.

# Compare data in matlab versions 3.06-11 and 3.06-12

library(R.matlab)
#> R.matlab v3.6.2 (2018-09-26) successfully loaded. See ?R.matlab for help.
#> 
#> Attaching package: 'R.matlab'
#> The following objects are masked from 'package:base':
#> 
#>     getOption, isOpen
m11 <- readMat("~/src/gsw_matlab_v3_06_11/library/gsw_data_v3_0.mat")
str(m11, 1)
#> List of 20
#>  $ CT.ref.cast     : num [1:32, 1] 28.8 28.8 28.7 28.4 27.8 ...
#>  $ SAAR.ref        : num [1:45, 1:46, 1:91] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
#>  $ SA.ref.cast     : num [1:32, 1] 34.7 34.7 34.8 34.9 35 ...
#>  $ SR.ref          : num [1:45, 1:46, 1:91] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
#>  $ deltaSA.ref     : num [1:45, 1:46, 1:91] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
#>  $ deltaSA.ref2    : num [1:45, 1:46, 1:91] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
#>  $ gamma.n.ref.cast: num [1:32, 1] 21.8 21.8 21.9 22.1 22.4 ...
#>  $ gsw.cv          :List of 791
#>   ..- attr(*, "dim")= int [1:3] 791 1 1
#>   ..- attr(*, "dimnames")=List of 3
#>  $ gsw.demo.data   :List of 11
#>   ..- attr(*, "dim")= int [1:3] 11 1 1
#>   ..- attr(*, "dimnames")=List of 3
#>  $ lat.ref.cast    : num [1, 1] 4
#>  $ lats.ref        : num [1:46, 1] -90 -86 -82 -78 -74 -70 -66 -62 -58 -54 ...
#>  $ long.ref.cast   : num [1, 1] 188
#>  $ longs.ref       : num [1:91, 1] 0 4 8 12 16 20 24 28 32 36 ...
#>  $ ndepth.ref      : num [1:46, 1:91] NaN NaN NaN NaN NaN 39 41 42 43 38 ...
#>  $ ocean.ref       : num [1:46, 1:91] NaN NaN NaN NaN NaN 6 6 6 6 6 ...
#>  $ p.ref           : num [1:45, 1] 0 10 20 30 40 50 76 101 126 151 ...
#>  $ p.ref.cast      : num [1:32, 1] 10 20 30 50 75 100 125 150 200 250 ...
#>  $ sigma.2.ref.cast: num [1:32, 1] 30 30.1 30.1 30.3 30.6 ...
#>  $ version.date    : chr [1, 1] "15th_May_2011"
#>  $ version.number  : chr [1, 1] "3.0.11"
#>  - attr(*, "header")=List of 3

m12 <- readMat("~/src/gsw_matlab_v3_06_12/library/gsw_data_v3_0.mat")
str(m12, 1)
#> List of 20
#>  $ CT.ref.cast     : num [1:32, 1] 28.8 28.8 28.7 28.4 27.8 ...
#>  $ SAAR.ref        : num [1:45, 1:45, 1:91] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
#>  $ SA.ref          : num [1:45, 1:45, 1:91] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
#>  $ SA.ref.cast     : num [1:32, 1] 34.7 34.7 34.8 34.9 35 ...
#>  $ SR.ref          : num [1:45, 1:45, 1:91] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
#>  $ deltaSA.ref     : num [1:45, 1:45, 1:91] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
#>  $ gamma.n.ref.cast: num [1:32, 1] 21.8 21.8 21.9 22.1 22.4 ...
#>  $ gsw.cv          :List of 816
#>   ..- attr(*, "dim")= int [1:3] 816 1 1
#>   ..- attr(*, "dimnames")=List of 3
#>  $ gsw.demo.data   :List of 11
#>   ..- attr(*, "dim")= int [1:3] 11 1 1
#>   ..- attr(*, "dimnames")=List of 3
#>  $ lat.ref.cast    : num [1, 1] 4
#>  $ lats.ref        : num [1:45, 1] -86 -82 -78 -74 -70 -66 -62 -58 -54 -50 ...
#>  $ long.ref.cast   : num [1, 1] 188
#>  $ longs.ref       : num [1:91, 1] 0 4 8 12 16 20 24 28 32 36 ...
#>  $ ndepth.ref      : num [1:45, 1:91] NaN NaN NaN NaN 39 41 42 43 38 37 ...
#>  $ ocean.ref       : num [1:45, 1:91] NaN NaN NaN NaN 6 6 6 6 6 6 ...
#>  $ p.ref           : num [1:45, 1] 0 10 20 30 40 50 76 101 126 151 ...
#>  $ p.ref.cast      : num [1:32, 1] 10 20 30 50 75 100 125 150 200 250 ...
#>  $ sigma.2.ref.cast: num [1:32, 1] 30 30.1 30.1 30.3 30.6 ...
#>  $ version.date    : chr [1, 1] "15th_May_2011"
#>  $ version.number  : chr [1, 1] "3.06.12"
#>  - attr(*, "header")=List of 3

Created on 2021-07-06 by the reprex package (v2.0.0)

dankelley added a commit that referenced this issue Jul 6, 2021
@dankelley dankelley added this to To do in Issues Jul 6, 2021
@dankelley
Copy link
Contributor Author

I'm closing this now, because I now know that we cannot expect a high level of agreement, given that the Matlab system is different from the GSW-C system.

Issues automation moved this from To do to Done Jul 6, 2021
@dankelley dankelley mentioned this issue Jul 6, 2021
13 tasks
@ocefpaf
Copy link
Member

ocefpaf commented Jul 7, 2021

and the decision of GSW-R is to key on GSW-C.

Some for the Python wrapper. It is nice to share a common code base and hopefully that will make both libs in different languages more robust.

@dankelley
Copy link
Contributor Author

A bit along this line, and to @efiring -- the R system emits the following warnings. This is for code that I've modified to fit R, so the line numbers are wrong. In terms of GSW-C/gsw_saar.c, the relevant line numbers are 59, 62, 177 and 180. Indenting GSW-C/gsw_saar.c lines 59-60 and 177-178 will make this warning go away. (I can make a PR request if that's warranted.)

gsw_saar.c: In functiongsw_saar’:
gsw_saar.c:63:5: warning: thisifclause does not guard... [-Wmisleading-indentation]
   63 |     if (isnan(lat) || isnan(lon) || isnan(p))
      |     ^~
gsw_saar.c:66:9: note: ...this statement, but the latter is misleadingly indented as if it were guarded by theif66 |         if (lat  <  -86.0  ||  lat  >  90.0)
      |         ^~
gsw_saar.c: In functiongsw_deltasa_atlas’:
gsw_saar.c:181:5: warning: thisifclause does not guard... [-Wmisleading-indentation]
  181 |     if (isnan(lat) || isnan(lon) || isnan(p))
      |     ^~
gsw_saar.c:184:9: note: ...this statement, but the latter is misleadingly indented as if it were guarded by theif184 |         if (lat < -86.0  ||  lat  >  90.0)
      |         ^~

@dankelley
Copy link
Contributor Author

Oh, sorry, and I forgot to point out that the GSW-R version uses isnan, which is built-in to the R system, rather than the ISNAN macro that is defined in GSW-C/gsw_saar.c.

@efiring
Copy link
Member

efiring commented Jul 7, 2021

Dan, since the file is short and badly formatted, I made a PR with a complete reformatting:
TEOS-10/GSW-C#37. I will leave it out for a bit to see whether
anyone objects; if so, another PR with a minimal reformatting to remove the warnings
can be made.
Regarding isnan, I would think that using the macro would actually use plain isnan in
R--or does R define its own non-standard version of isnan for some reason?

@dankelley
Copy link
Contributor Author

Eric, thanks for the note.

R does some things to avoid conflicts between namespaces. I think you're right -- it would use the else part of the macro that starts at line 11 of GSW-C/gsw_saar.c, so likely I could have retained that part. I removed it because we know for sure that this isna() function is available in C code built into R, because of how the C-within-R system is set up.

I submitted GSW-R to the R build system. It has passed the two preliminary hurdles, and within a day I will likely either see it start appearing in the archive, or I'll get a message explaining a problem. The system is really quite nice. Submitted packages have to pass a lot of tests, on a lot of different machine types (see https://cran.r-project.org/web/checks/check_results_gsw.html for the previous version). Not only must things work on different machines, but also on the present version of R, the just-previous one and the upcoming one. And all packages that rely on the submitted package are also tested to be sure that they still work as intended. It takes about a week for a package to get into the system, with all this testing. And, sometimes, the tests reveal interesting problems, since there are lots of different C/C++ and Fortran compilers being used. All of this makes it a bit tricky to get a clean submission, but it also means that you can have a pretty high degree of trust in R packages (which number 17819 at the moment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Issues
  
Done
Development

No branches or pull requests

4 participants