
Validation against various sources #10

Open
cgrdn opened this issue Jul 9, 2020 · 23 comments
Labels: testing & validation (For testing, comparison, etc.)

cgrdn (Collaborator) commented Jul 9, 2020

Need to validate the python module against the following:

  • SAGE-O2 (SOCCOM) matlab output
  • Already DMQC'ed Argo floats
  • DOXY profile audit list (distributed by Josh Plant at MBARI, probably uses SAGE or analogous code)

Keep in mind that results not matching exactly may be due to a mismatch in quality flags - there is a human element to QC. This issue will be kept open to log validation results to date, likely only closing upon a full "release" following satisfactory validation against a range of sources.

cgrdn added the "testing & validation" label Jul 9, 2020
cgrdn self-assigned this Jul 9, 2020
cgrdn (Collaborator, Author) commented Jul 21, 2020

Ran a comparison against the latest DOXY audit and the results look pretty good. Now starting to look into (1) NaN values where the DOXY audit values are not NaN, and (2) the largest differences (absolute deviation > 0.2) to see why they occur.
DOXY_audit_comparison_breakdown
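For reference, a breakdown like the ones reported in this thread can be produced with a short pandas sketch. The bin edges follow the tables posted here; the function and column names are illustrative, not the package's actual comparison script.

```python
# Hypothetical sketch of the audit comparison binning; bin edges follow
# the breakdown tables posted in this thread.
import numpy as np
import pandas as pd

def breakdown(py_gain, audit_gain):
    """Tabulate absolute deviation (AD) between two sets of gains."""
    py = np.asarray(py_gain, dtype=float)
    audit = np.asarray(audit_gain, dtype=float)
    ad = np.abs(py - audit)
    both_inf = np.isinf(py) & np.isinf(audit)
    nan_valued = np.isnan(ad) & ~both_inf
    bins = {
        'AD < 0.01': ad < 0.01,
        '0.01 <= AD < 0.05': (ad >= 0.01) & (ad < 0.05),
        '0.05 <= AD < 0.2': (ad >= 0.05) & (ad < 0.2),
        'AD >= 0.2': np.isfinite(ad) & (ad >= 0.2),
        'NaN valued': nan_valued,
        'Both inf valued': both_inf,
    }
    df = pd.DataFrame({'N': {name: int(mask.sum()) for name, mask in bins.items()}})
    df['pct'] = 100 * df['N'] / len(ad)
    return df
```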

cgrdn (Collaborator, Author) commented Jul 24, 2020

Working on diagnosing the larger differences. Starting with the coriolis DAC because it appears to be disproportionately "bad" relative to the number of gains it contributes to the DOXY audit:

%nan (N)        %big (N)        %audit (N)       dac
26.20 (350)     27.22 (135)     31.77 (1715)    aoml
0.30 (4)        1.61 (8)        0.26 (14)       bodc
48.95 (654)     41.53 (206)     23.78 (1284)    coriolis
2.02 (27)       8.47 (42)       12.52 (676)     csiro
10.78 (144)     5.44 (27)       11.41 (616)     incois
0.22 (3)        0.60 (3)        0.74 (40)       kma
4.57 (61)       14.72 (73)      4.54 (245)      kordi
6.96 (93)       0.20 (1)        10.00 (540)     meds

For float 7900559, there seems to be an issue with the WOA longitude weights:

>>> syn.__WOAweights__
Out:
array([[ 6.88364583e-01,  6.39927469e-01,  5.90669367e-01,
         5.41263117e-01,  4.91562114e-01,  4.41393133e-01,
         3.91177469e-01,  3.39982253e-01,  2.89084491e-01,
         2.38097222e-01,  1.86896219e-01,  1.35644290e-01,
         8.40455247e-02,  3.30412809e-02,  1.01544406e+00,
         9.55925926e-01,  8.95963735e-01,  8.44268133e-01,
         7.92746914e-01,  7.41060185e-01,  6.91586420e-01,
         6.41908565e-01,  5.90193673e-01,  5.38507330e-01,
         5.10801312e-01,  5.60521605e-01,  6.09676312e-01,
         6.59154321e-01,  7.08733410e-01,  7.58305556e-01,
         8.08459105e-01,  8.60202932e-01,  9.10690586e-01,
         9.59998843e-01,  9.56616637e-01,  9.06692055e-01,
         8.57066906e-01,  8.10974836e-01,  7.65177718e-01,
         5.98092145e-01,  4.36322058e-01,  2.72361111e-01,
         1.12086694e-01,  9.47886734e-01,  4.30633381e-01,
         2.54879071e-01,  9.16140606e-01,  6.95106033e-01,
         6.29616189e-01,  5.64675179e-01,  5.00448402e-01,
         4.36080869e-01,  3.71301150e-01,  3.08304585e-01,
         2.42206168e-01,  1.77703480e-01,  1.13303091e-01,
         4.87974164e-02,  9.84925154e-01,  9.18016975e-01,
         8.50967593e-01,  7.83937886e-01,  7.17261188e-01,
         6.50378086e-01,  5.83856096e-01,  5.17262346e-01,
         4.50608025e-01,  3.84018904e-01,  3.17274691e-01,
         2.50585262e-01,  1.83907793e-01,  1.17221836e-01,
         5.04425154e-02,  9.84288008e-01,  9.19772252e-01,
         8.55593265e-01,  7.91158154e-01,  7.26203704e-01,
         6.61232826e-01,  5.96902255e-01,  5.31550926e-01,
         4.67561977e-01,  4.02976777e-01,  3.38758961e-01,
         2.74327210e-01,  2.09800627e-01,  1.45084752e-01,
         8.05921446e-02,  1.65486858e-02,  9.50020062e-01,
         8.83728781e-01,  8.16517361e-01,  7.50023148e-01,
         6.83105710e-01,  6.16665123e-01,  5.49926312e-01,
         4.83533565e-01,  4.16788580e-01,  2.83089120e-01,
         1.49821373e-01,  8.34606481e-02],
       [ 9.52820000e-01,  9.82950000e-01,  1.04400000e-02,
         3.36000000e-02,  6.06000000e-02,  4.12000000e-02,
         2.60000000e-04,  9.14980000e-01,  8.11350000e-01,
         7.02310000e-01,  5.45270000e-01,  3.93410000e-01,
         5.12000000e-02,  3.65400000e-02,  9.94000000e-01,
         8.72860000e-01,  7.35300000e-02,  3.20240000e-01,
         4.60330000e-01,  7.02410000e-01,  7.91210000e-01,
         7.98960000e-01,  9.76180000e-01,  9.24200000e-02,
         7.32100000e-02,  3.77400000e-02,  9.96510000e-01,
         2.78000000e-03,  1.44100000e-02,  4.35000000e-03,
         4.15000000e-03,  2.70000000e-03,  9.45020000e-01,
         9.58690000e-01,  9.82550000e-01,  8.62630000e-01,
         7.06750000e-01,  5.67160000e-01,  5.16050000e-01,
         5.09910000e-01,  4.82230000e-01,  4.76410000e-01,
         4.30960000e-01,  4.67420000e-01,  7.87290000e-01,
         8.77590000e-01,  6.26700000e-02,  1.84920000e-01,
         1.99160000e-01,  7.51000000e-02,  8.95300000e-02,
         1.65850000e-01,  2.05810000e-01,  1.89760000e-01,
         1.27170000e-01,  7.74800000e-02,  7.30400000e-02,
         9.93580000e-01,  9.64300000e-01,  9.25150000e-01,
         9.00980000e-01,  8.80340000e-01,  8.50110000e-01,
         8.40040000e-01,  8.52320000e-01,  8.92450000e-01,
         9.05380000e-01,  9.46920000e-01,  9.48600000e-01,
         9.51600000e-01,  9.29870000e-01,  9.03370000e-01,
         8.68820000e-01,  8.16600000e-01,  7.49510000e-01,
         7.40370000e-01,  7.37900000e-01,  7.51840000e-01,
         7.15660000e-01,  6.72510000e-01,  5.61910000e-01,
         4.80050000e-01,  2.94190000e-01,  1.42310000e-01,
         2.62500000e-02,  8.89810000e-01,  7.48830000e-01,
         6.31420000e-01,  4.32980000e-01,  3.20490000e-01,
         1.44120000e-01,  1.09040000e-01,  1.72930000e-01,
         3.17990000e-01,  3.31580000e-01,  2.63650000e-01,
         2.97670000e-01,  3.10560000e-01,  3.46080000e-01,
         2.29610000e-01,  1.13490000e-01],
       [-4.32640000e+00, -4.35086000e+00, -4.38054000e+00,
        -4.43682000e+00, -4.50559000e+00, -4.59104000e+00,
        -4.68564000e+00, -4.77660000e+00, -4.85162000e+00,
        -4.89363000e+00, -4.90572000e+00, -4.95583000e+00,
        -4.62640000e+00, -4.58491000e+00, -4.41805000e+00,
        -4.19289000e+00, -4.04815000e+00, -3.54761000e+00,
        -3.13531000e+00, -2.95750000e+00, -2.85714000e+00,
        -2.80712000e+00, -2.54943000e+00, -2.18277000e+00,
        -2.13530000e+00, -1.93603000e+00, -1.76176000e+00,
        -1.68385000e+00, -1.63593000e+00, -1.58291000e+00,
        -1.48541000e+00, -1.26075000e+00, -9.59140000e-01,
        -9.00270000e-01, -5.97960000e-01, -3.76530000e-01,
        -2.41380000e-01, -1.28500000e-01, -9.85600000e-02,
        -9.76700000e-02, -5.23500000e-02,  1.62800000e-02,
         2.53000000e-03, -1.37000000e-02, -1.60860000e-01,
        -1.95320000e-01, -2.42880000e-01, -3.68790000e-01,
        -5.98620000e-01, -7.58300000e-01, -8.35770000e-01,
        -9.69160000e-01, -1.03068000e+00, -1.08100000e+00,
        -1.04145000e+00, -1.01946000e+00, -9.63950000e-01,
        -9.76540000e-01, -1.04272000e+00, -1.12081000e+00,
        -1.18185000e+00, -1.26735000e+00, -1.38612000e+00,
        -1.34984000e+00, -1.39104000e+00, -1.42102000e+00,
        -1.54400000e+00, -1.55120000e+00, -1.63714000e+00,
        -1.69711000e+00, -1.64675000e+00, -1.68310000e+00,
        -1.55753000e+00, -1.49094000e+00, -1.45146000e+00,
        -1.47069000e+00, -1.50129000e+00, -1.54485000e+00,
        -1.59601000e+00, -1.66917000e+00, -1.69951000e+00,
        -1.72284000e+00, -1.68488000e+00, -1.56383000e+00,
        -1.48742000e+00, -1.40972000e+00, -1.31495000e+00,
        -1.18791000e+00, -1.08557000e+00, -9.28230000e-01,
        -7.21610000e-01, -5.07820000e-01, -4.17410000e-01,
        -2.70620000e-01, -2.17370000e-01, -1.34170000e-01,
         3.52800000e-02,  1.37160000e-01,  3.06270000e-01,
         3.44390000e-01,  3.80120000e-01]])
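Valid linear interpolation weights should lie in [0, 1], so the negative values in the third row above are a clear symptom of a bug. A minimal sketch of a guarded longitude-weight calculation; the 1-degree WOA-style grid and the function itself are assumptions for illustration, not the package's code:

```python
import numpy as np

def lon_weight(lon, grid=np.arange(-179.5, 180.5, 1.0)):
    """Weight for the bounding grid point west of lon; must be in [0, 1]."""
    i = np.searchsorted(grid, lon) - 1       # index of the point west of lon
    i = np.clip(i, 0, len(grid) - 2)         # stay inside the grid
    w = (grid[i + 1] - lon) / (grid[i + 1] - grid[i])
    assert 0.0 <= w <= 1.0, 'interpolation weights must be in [0, 1]'
    return w
```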

cgrdn (Collaborator, Author) commented Jul 24, 2020

Found a pretty major bug in how the longitude weights were being calculated. It was a simple indexing mistake, but it was causing negative weights and so probably some rather large errors. I am guessing the floats that were not affected by it (i.e. are in good agreement with the audit) were ones where the bounding climatology is missing data, so a simpler average of the available data is taken (this is the same in SAGE, so holes in the WOA data should result in the same gains in each).

As of commit 1a8d01c this is now fixed for the WOA interpolation, but I should check the NCEP interpolation as well to make sure the mistake is not repeated there. Will test whether it improves a couple of individual floats and, if so, re-run the entire audit comparison to see where else mistakes may be coming from.

cgrdn (Collaborator, Author) commented Jul 24, 2020

The longitude fix doesn't resolve the error, but I found a new indexing error in calc_gain(): it loops through cycles that have surface data, which is not necessarily all cycles, but the index counts by one. This offsets some of the gains from their proper cycle, which would make the comparison mismatch completely. Working on a fix now.
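The fix amounts to placing each gain at its cycle number rather than at the loop counter. A minimal sketch with hypothetical arrays (the variable names are illustrative, not the package's internals):

```python
import numpy as np

# cycles with usable surface data - not necessarily every cycle of the float
cycles = np.array([1, 2, 4, 5, 7])            # hypothetical cycle numbers
surf_gains = np.array([1.01, 1.02, 1.04, 1.05, 1.07])

n_cycles = 7
gains = np.full(n_cycles, np.nan)

# buggy version did gains[i] = surf_gains[i], which offsets every gain
# after a skipped cycle; indexing by cycle number keeps them aligned
for i, cyc in enumerate(cycles):
    gains[cyc - 1] = surf_gains[i]            # place gain at its proper cycle
```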

cgrdn (Collaborator, Author) commented Jul 24, 2020

Fixed the error described above in 6644c03. This should improve the match-ups quite a bit - going to re-run the gain comparison now.

cgrdn (Collaborator, Author) commented Jul 24, 2020

Far fewer large differences, but a reduction in "perfect" matches and an increase in infinite matches? Will have to look into this more.

Before longitude fix in 1a8d01c and index fix in 6644c03:

      N               name         pct
0  5399              Total  100.000000
1   804          AD < 0.01   14.891647
2  1880  0.01 <= AD < 0.05   34.821263
3   967   0.05 <= AD < 0.2   17.910724
4   496          AD >= 0.2    9.186886
5   532         NaN valued    9.853677
6   720    Both inf valued   13.335803

and after:

      N               name         pct
0  5399              Total  100.000000
1   225          AD < 0.01    4.167438
2  2333  0.01 <= AD < 0.05   43.211706
3   514   0.05 <= AD < 0.2    9.520282
4   136          AD >= 0.2    2.518985
5   631         NaN valued   11.687350
6  1560    Both inf valued   28.894240

DOXY_audit_comparison_breakdown_20200725

cgrdn (Collaborator, Author) commented Jul 24, 2020

Just did a quick check on the audit file alone, and there are only 634 infinite values, so I have a bug in the post-processing somewhere.

cgrdn (Collaborator, Author) commented Jul 24, 2020

Found it - things look much better:

      N               name         pct
0  5399              Total  100.000000
1  1560          AD < 0.01   28.894240
2  2333  0.01 <= AD < 0.05   43.211706
3   514   0.05 <= AD < 0.2    9.520282
4   136          AD >= 0.2    2.518985
5   225         NaN valued    4.167438
6   631    Both inf valued   11.687350

DOXY_audit_comparison_breakdown_20200725

cgrdn (Collaborator, Author) commented Jul 27, 2020

It seems that with the improvement, the coriolis DAC now accounts for an even higher proportion of the mismatches than before, while the other DACs' mismatches have all but disappeared, and aoml accounts for a little less than what you'd expect.

I suspect there's a good explanation for this, given that coriolis is such an important DAC.

%nan (N)        %big (N)        %audit (N)       dac
26.05 (223)     27.21 (37)      31.77 (1715)    aoml
0.00 (0)        0.74 (1)        0.26 (14)       bodc
73.83 (632)     40.44 (55)      23.78 (1284)    coriolis
0.00 (0)        0.00 (0)        3.41 (184)      csio
0.00 (0)        2.94 (4)        12.52 (676)     csiro
0.12 (1)        11.03 (15)      11.41 (616)     incois
0.00 (0)        0.74 (1)        1.57 (85)       jma
0.00 (0)        0.00 (0)        0.74 (40)       kma
0.00 (0)        16.91 (23)      4.54 (245)      kordi
0.00 (0)        0.00 (0)        10.00 (540)     meds

cgrdn (Collaborator, Author) commented Jul 29, 2020

Found the three non-matched infinite values (631 of 634 match right now):

    WMO  CYCLE       DAC            DATE  pyGAIN  sageGAIN  diffGAIN
3900531      0  coriolis  11/06/07 02:26    1.94       inf       inf
3900531    130  coriolis  05/08/11 03:08    1.35       inf       inf
3900791      0  coriolis  02/13/08 02:17    2.13       inf       inf

cgrdn (Collaborator, Author) commented Jul 30, 2020

For the NaN-valued results, most can be attributed to using sprof(wmo).clean() to remove any bad data (any flags that are not 1, 2, 3, 5, or 8). I'm not sure if this means the quality flags have been adjusted since the audit, or if Josh does not remove data in his analysis.

An issue I run into when not using clean() is that the fill value 99999 makes it into the calculation as a real value, which is obviously incorrect no matter what. Plan: write a filter_fillvalue() function that replaces any FillValues with NaN, but does not remove flagged data (<PARAM>_QC = 4, for example). That should reduce the number of NaN values and then we can see how things compare. I'll probably run that function by default when data are loaded, with an option to skip it in case the user really wants FillValues to be included.
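A hedged sketch of what such a filter could look like. The flat dict layout and the function body here are assumptions for illustration; only the idea (replace the fill value with NaN, leave QC flags untouched) comes from the plan above:

```python
import copy
import numpy as np

FILL_VALUE = 99999.0

def filter_fillvalue(floatdict, fill=FILL_VALUE):
    """Replace netCDF FillValues with NaN, leaving QC flag arrays untouched."""
    clean = copy.deepcopy(floatdict)
    for key, val in clean.items():
        if key.endswith('_QC'):
            continue                      # QC flags are kept as-is
        arr = np.asarray(val, dtype=float)
        arr[arr == fill] = np.nan         # only the fill value becomes NaN
        clean[key] = arr
    return clean
```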

cgrdn (Collaborator, Author) commented Jul 30, 2020

Wrote the base function dict_fillvalue_clean(floatdict) and the class methods sprof.rm_fillvalue() and profiles.rm_fillvalue() to behave as described above. The class methods run on __init__ unless the optional parameter keep_fillvalue is set to True.

Done in commits 71da833 and 0fbc9cf.

cgrdn (Collaborator, Author) commented Jul 30, 2020

Removing FillValues, but not filtering on quality flags, has eliminated all NaN values:

      N               name        pct
1  1586          AD < 0.01  29.375810
2  2397  0.01 <= AD < 0.05  44.397111
3   606   0.05 <= AD < 0.2  11.224301
4   179          AD >= 0.2   3.315429
5     0         NaN valued   0.000000
6   631    Both inf valued  11.687350

DOXY_audit_comparison_breakdown_20200730

Now - at this point, there is probably some good reason for the remaining differences. If quality flags have been changed based on the audit, that would obviously cause some differences. With ~85% in good agreement here, I'm pretty happy. I will still investigate the larger differences; ideally all gains would agree within 0.05, so that's still almost 800 gain calculations to diagnose/improve.

cgrdn (Collaborator, Author) commented Aug 20, 2020

Switching gears to comparing against already-DMQC'ed profiles. The comparison is a little more tedious because reading the gain from netCDF BD-files is not uniform. For a single profile it's quite straightforward - read SCIENTIFIC_CALIB_EQUATION, SCIENTIFIC_CALIB_COEFFICIENT, and SCIENTIFIC_CALIB_COMMENT - but efficiently reading the value across many floats is complicated, as there are often a variable number of calibrations and the equations are formatted differently. For the moment, the pseudocode for searching BD-files for this information is basically:

  • look for DOXY_ADJUSTED inside of SCIENTIFIC_CALIB_EQUATION
  • go to SCIENTIFIC_CALIB_COEFFICIENT at the index discovered above, and read that string
  • get the gain value by converting the substring after '=' as a float
  • record the comment to see how the calibration was done (WOA? In-air? Other?)

But I have a feeling this fails for many floats: when I ran it over ~12000 BD-files I only ended up with about 500 properly read gains, versus over 10000 reliably calculated gains from the python module using sprof(wmo).calc_gains(ref='WOA').
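The steps above can be sketched roughly as follows. The parallel string lists are a simplification of the real netCDF character-array layout, and the coefficient format shown is only one of the variants across DACs - exactly the variability where this approach breaks down:

```python
import re

def read_doxy_gain(equations, coefficients, comments):
    """Pull the gain G from parallel SCIENTIFIC_CALIB_* string lists.

    One string per calibration entry; returns (gain, comment) or (None, None).
    """
    for eq, coeff, comment in zip(equations, coefficients, comments):
        if 'DOXY_ADJUSTED' in eq:
            # e.g. coeff = "G = 1.0432" - formats differ between DACs
            m = re.search(r'G\s*=\s*([-+0-9.eE]+)', coeff)
            if m:
                return float(m.group(1)), comment
    return None, None
```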

Additionally, in some cases the BD-file comments say the data are corrected using in-air measurements, but no BRtraj file exists, so I am not able to calculate the in-air gain using NCEP data because I don't know where to find the PPOX_DOXY variable.

cgrdn (Collaborator, Author) commented Aug 20, 2020

Checked the NCEP interpolation weights and they seem fine. Worth noting, though, that because the NCEP interpolation interpolates to the matching time before doing the weighted average, the weights are always 0 since the time matches perfectly. This still yields the same result, but for the sake of performance those zero weights don't need to be calculated. Removed in commit a4a0716.

cgrdn (Collaborator, Author) commented Sep 18, 2020

Ran through plots for big differences today - observations/thoughts:

  • How does the gain calculation in bgcArgo handle repeated cycles, i.e. when Nprof > 1? I suspect the indexing uses cycle number, which would (incorrectly) lump the two together. Float 4902480 has two cycle 1's, so I can use that as an example to check.
  • Large negative values appear to be removed from Josh's analysis, but since the latest iteration of the validation, where I am only removing FillValues from the data, some large negative and positive values are skewing the data. They probably have QC=4, as the RTQC range test is on the interval (-5, 600). Will try modifying the clean() method to add a QC=4 option and an in-package range test.
  • Should double-check the audit README for the surface depth - pretty sure I made them match, but should verify.
  • There are a few (3) gains in the audit that have infinite values where bgcArgo returns finite values. In these cases there are zero concentration values, but also some other values that are causing the mean surface value to be finite. Why are the non-zero values apparently filtered out in Josh's analysis?
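A minimal sketch of the in-package range test mentioned above. The bounds (-5, 600) come from the comment; the flag values (4 = bad, 9 = missing) follow the standard Argo QC flag scale, and the function itself is an assumption, not the package's implementation:

```python
import numpy as np

def doxy_range_test(doxy, lo=-5.0, hi=600.0):
    """Return Argo-style QC flags, marking values outside (-5, 600) as bad."""
    doxy = np.asarray(doxy, dtype=float)
    flags = np.ones(doxy.shape, dtype=int)    # 1 = good by default
    flags[(doxy < lo) | (doxy > hi)] = 4      # 4 = bad data
    flags[np.isnan(doxy)] = 9                 # 9 = missing value
    return flags
```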

cgrdn (Collaborator, Author) commented Oct 15, 2020

Took a quick look at percent difference instead of absolute difference today. Haven't made figures yet, but the distribution is slightly different. Hard to say if it's better or worse - just a different way to break down the data:

      N             name         pct
0  5399            Total  100.000000
1  3019        %diff < 1   55.917763
2  1354   1 <= %diff < 5   25.078718
3   313  5 <= %diff < 20    5.797370
4    75      %diff >= 20    1.389146
5     9       NaN valued    0.166698
6   626  Both inf valued   11.594740

cgrdn (Collaborator, Author) commented Oct 15, 2020

The distribution of percent differences basically looks like the one for absolute differences, but the shape is more interesting when you make the bins smaller:

pct_diff_dist
pct_diff_dist_smallbins

cgrdn (Collaborator, Author) commented Oct 15, 2020

Have thought about it more: the small bins are producing a plotting artifact, because few gain values should produce percent differences less than ~0.75%. For example, most gain values are about 1.20 and have a precision of 0.01, so if there is any difference at all, the smallest percent difference will be 100*0.01/1.20 = 0.83%. Only large gain values can produce percent differences below 0.5%.
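The arithmetic can be checked directly:

```python
def min_pct_diff(gain, precision=0.01):
    """Smallest nonzero percent difference for a gain reported to `precision`."""
    return 100.0 * precision / gain

# for a typical gain of 1.20, any disagreement is at least ~0.83%;
# only gains above 2 can produce percent differences below 0.5%
```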

cgrdn (Collaborator, Author) commented Dec 30, 2020

I re-ran the comparison for the first time in a while, and found that while there are fewer "bad" comparisons, there is now a positive bias: the python module is slightly over-estimating the gain values compared to SAGE. Will look into the reason, starting with changes to the oxygen solubility calculation. If the source of the bias can be found, things will have improved considerably, but obviously this offset is not ideal.
scatter_and_dist_smalllims_20201230

cgrdn (Collaborator, Author) commented Jun 29, 2021

A unit issue in the solubility calculation was the reason for the above plot. This has been resolved, though there is still some scatter in how the WOA data are interpolated; these differences are small and generally centered about 0, so they do not lead to differences in the mean gain.

new_sage_comparison

cgrdn (Collaborator, Author) commented Jun 29, 2021

The latest distribution looks like the one from December of last year, but back on the 1:1 line, with the difference distribution centered about 0. The mean difference in gain is 0.017 with a standard deviation of 3.22. The mean absolute difference is 0.185.

[edit: If you remove 16 points with very large differences in gain (both the audit and the python package have some gains around -6000 - clearly problematic data), the mean difference is 0.0029, the mean absolute difference is 0.0347, and the standard deviation is 0.11 - quite reliable in my opinion!]

scatter_and_dist_smalllims_20210629
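The effect of dropping those extreme points can be reproduced with a simple threshold filter. The cutoff value here is an assumption - the comment only says 16 points with very large differences were removed:

```python
import numpy as np

def summarize(diff, max_abs=10.0):
    """Mean, mean absolute difference, and std after dropping extreme outliers."""
    diff = np.asarray(diff, dtype=float)
    kept = diff[np.abs(diff) < max_abs]       # drop clearly problematic points
    return {
        'n_dropped': int(diff.size - kept.size),
        'mean': kept.mean(),
        'mean_abs': np.abs(kept).mean(),
        'std': kept.std(),
    }
```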

cgrdn (Collaborator, Author) commented Jun 29, 2021

      N             name         pct
1  2403        %diff < 1   52.882923
2  1220   1 <= %diff < 5   26.848592
3   409  5 <= %diff < 20    9.000880
4    79      %diff >= 20    1.738556
5     6       NaN valued    0.132042
6   424  Both inf valued    9.330986

      N               name         pct
1  1379          AD < 0.01  30.347711
2  2102  0.01 <= AD < 0.05  46.258803
3   488   0.05 <= AD < 0.2  10.739437
4   145          AD >= 0.2   3.191021
5     6         NaN valued   0.132042
6   424    Both inf valued   9.330986

@cgrdn cgrdn closed this as completed Jun 29, 2021
@cgrdn cgrdn reopened this Jun 29, 2021