Validation against various sources #10
Working on diagnosing the larger differences. Starting with the coriolis DAC because it appears disproportionately "bad" relative to the number of gains it contributes to the DOXY audit:
For float 7900559, there seems to be an issue with the WOA longitude weights:

```
>>> syn.__WOAweights__
Out:
array([[ 6.88364583e-01,  6.39927469e-01,  5.90669367e-01,
         5.41263117e-01,  4.91562114e-01,  4.41393133e-01,
         3.91177469e-01,  3.39982253e-01,  2.89084491e-01,
         2.38097222e-01,  1.86896219e-01,  1.35644290e-01,
         8.40455247e-02,  3.30412809e-02,  1.01544406e+00,
         9.55925926e-01,  8.95963735e-01,  8.44268133e-01,
         7.92746914e-01,  7.41060185e-01,  6.91586420e-01,
         6.41908565e-01,  5.90193673e-01,  5.38507330e-01,
         5.10801312e-01,  5.60521605e-01,  6.09676312e-01,
         6.59154321e-01,  7.08733410e-01,  7.58305556e-01,
         8.08459105e-01,  8.60202932e-01,  9.10690586e-01,
         9.59998843e-01,  9.56616637e-01,  9.06692055e-01,
         8.57066906e-01,  8.10974836e-01,  7.65177718e-01,
         5.98092145e-01,  4.36322058e-01,  2.72361111e-01,
         1.12086694e-01,  9.47886734e-01,  4.30633381e-01,
         2.54879071e-01,  9.16140606e-01,  6.95106033e-01,
         6.29616189e-01,  5.64675179e-01,  5.00448402e-01,
         4.36080869e-01,  3.71301150e-01,  3.08304585e-01,
         2.42206168e-01,  1.77703480e-01,  1.13303091e-01,
         4.87974164e-02,  9.84925154e-01,  9.18016975e-01,
         8.50967593e-01,  7.83937886e-01,  7.17261188e-01,
         6.50378086e-01,  5.83856096e-01,  5.17262346e-01,
         4.50608025e-01,  3.84018904e-01,  3.17274691e-01,
         2.50585262e-01,  1.83907793e-01,  1.17221836e-01,
         5.04425154e-02,  9.84288008e-01,  9.19772252e-01,
         8.55593265e-01,  7.91158154e-01,  7.26203704e-01,
         6.61232826e-01,  5.96902255e-01,  5.31550926e-01,
         4.67561977e-01,  4.02976777e-01,  3.38758961e-01,
         2.74327210e-01,  2.09800627e-01,  1.45084752e-01,
         8.05921446e-02,  1.65486858e-02,  9.50020062e-01,
         8.83728781e-01,  8.16517361e-01,  7.50023148e-01,
         6.83105710e-01,  6.16665123e-01,  5.49926312e-01,
         4.83533565e-01,  4.16788580e-01,  2.83089120e-01,
         1.49821373e-01,  8.34606481e-02],
       [ 9.52820000e-01,  9.82950000e-01,  1.04400000e-02,
         3.36000000e-02,  6.06000000e-02,  4.12000000e-02,
         2.60000000e-04,  9.14980000e-01,  8.11350000e-01,
         7.02310000e-01,  5.45270000e-01,  3.93410000e-01,
         5.12000000e-02,  3.65400000e-02,  9.94000000e-01,
         8.72860000e-01,  7.35300000e-02,  3.20240000e-01,
         4.60330000e-01,  7.02410000e-01,  7.91210000e-01,
         7.98960000e-01,  9.76180000e-01,  9.24200000e-02,
         7.32100000e-02,  3.77400000e-02,  9.96510000e-01,
         2.78000000e-03,  1.44100000e-02,  4.35000000e-03,
         4.15000000e-03,  2.70000000e-03,  9.45020000e-01,
         9.58690000e-01,  9.82550000e-01,  8.62630000e-01,
         7.06750000e-01,  5.67160000e-01,  5.16050000e-01,
         5.09910000e-01,  4.82230000e-01,  4.76410000e-01,
         4.30960000e-01,  4.67420000e-01,  7.87290000e-01,
         8.77590000e-01,  6.26700000e-02,  1.84920000e-01,
         1.99160000e-01,  7.51000000e-02,  8.95300000e-02,
         1.65850000e-01,  2.05810000e-01,  1.89760000e-01,
         1.27170000e-01,  7.74800000e-02,  7.30400000e-02,
         9.93580000e-01,  9.64300000e-01,  9.25150000e-01,
         9.00980000e-01,  8.80340000e-01,  8.50110000e-01,
         8.40040000e-01,  8.52320000e-01,  8.92450000e-01,
         9.05380000e-01,  9.46920000e-01,  9.48600000e-01,
         9.51600000e-01,  9.29870000e-01,  9.03370000e-01,
         8.68820000e-01,  8.16600000e-01,  7.49510000e-01,
         7.40370000e-01,  7.37900000e-01,  7.51840000e-01,
         7.15660000e-01,  6.72510000e-01,  5.61910000e-01,
         4.80050000e-01,  2.94190000e-01,  1.42310000e-01,
         2.62500000e-02,  8.89810000e-01,  7.48830000e-01,
         6.31420000e-01,  4.32980000e-01,  3.20490000e-01,
         1.44120000e-01,  1.09040000e-01,  1.72930000e-01,
         3.17990000e-01,  3.31580000e-01,  2.63650000e-01,
         2.97670000e-01,  3.10560000e-01,  3.46080000e-01,
         2.29610000e-01,  1.13490000e-01],
       [-4.32640000e+00, -4.35086000e+00, -4.38054000e+00,
        -4.43682000e+00, -4.50559000e+00, -4.59104000e+00,
        -4.68564000e+00, -4.77660000e+00, -4.85162000e+00,
        -4.89363000e+00, -4.90572000e+00, -4.95583000e+00,
        -4.62640000e+00, -4.58491000e+00, -4.41805000e+00,
        -4.19289000e+00, -4.04815000e+00, -3.54761000e+00,
        -3.13531000e+00, -2.95750000e+00, -2.85714000e+00,
        -2.80712000e+00, -2.54943000e+00, -2.18277000e+00,
        -2.13530000e+00, -1.93603000e+00, -1.76176000e+00,
        -1.68385000e+00, -1.63593000e+00, -1.58291000e+00,
        -1.48541000e+00, -1.26075000e+00, -9.59140000e-01,
        -9.00270000e-01, -5.97960000e-01, -3.76530000e-01,
        -2.41380000e-01, -1.28500000e-01, -9.85600000e-02,
        -9.76700000e-02, -5.23500000e-02,  1.62800000e-02,
         2.53000000e-03, -1.37000000e-02, -1.60860000e-01,
        -1.95320000e-01, -2.42880000e-01, -3.68790000e-01,
        -5.98620000e-01, -7.58300000e-01, -8.35770000e-01,
        -9.69160000e-01, -1.03068000e+00, -1.08100000e+00,
        -1.04145000e+00, -1.01946000e+00, -9.63950000e-01,
        -9.76540000e-01, -1.04272000e+00, -1.12081000e+00,
        -1.18185000e+00, -1.26735000e+00, -1.38612000e+00,
        -1.34984000e+00, -1.39104000e+00, -1.42102000e+00,
        -1.54400000e+00, -1.55120000e+00, -1.63714000e+00,
        -1.69711000e+00, -1.64675000e+00, -1.68310000e+00,
        -1.55753000e+00, -1.49094000e+00, -1.45146000e+00,
        -1.47069000e+00, -1.50129000e+00, -1.54485000e+00,
        -1.59601000e+00, -1.66917000e+00, -1.69951000e+00,
        -1.72284000e+00, -1.68488000e+00, -1.56383000e+00,
        -1.48742000e+00, -1.40972000e+00, -1.31495000e+00,
        -1.18791000e+00, -1.08557000e+00, -9.28230000e-01,
        -7.21610000e-01, -5.07820000e-01, -4.17410000e-01,
        -2.70620000e-01, -2.17370000e-01, -1.34170000e-01,
         3.52800000e-02,  1.37160000e-01,  3.06270000e-01,
         3.44390000e-01,  3.80120000e-01]])
```
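Weights for a bounding grid cell should always fall in [0, 1], so a simple sanity check would have flagged both the values above 1 and the negative entries in the array above. A minimal sketch (`check_weights` and the toy arrays are illustrative, not part of the package's API):

```python
import numpy as np

def check_weights(weights):
    """Return True if all interpolation weights lie in the valid range [0, 1]."""
    w = np.asarray(weights, dtype=float)
    return bool(np.all((w >= 0.0) & (w <= 1.0)))

# toy examples: valid weights pass, values like the negative row above fail
good = np.array([[0.688, 0.523], [0.250, 0.750]])
bad = np.array([[0.688, 1.015], [-4.326, -4.351]])
```

An assertion like this at the end of the weight calculation would turn silent bad gains into an immediate, debuggable error.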
Found a pretty major bug in how the longitude weights were being calculated. A simple indexing mistake, but it was causing negative weights and so probably some rather large issues. I am guessing the floats that were not affected by it (i.e. those in good agreement with the audit) were ones where the bounding climatology is missing data, so a simpler average of the available data is taken instead (this is the same in SAGE, so holes in the WOA data should produce the same gains in each). Commit 1a8d01c fixes this for the WOA interpolation, but I should check the NCEP interpolation as well to make sure the mistake is not repeated there. Will test whether it improves a couple of individual floats and, if so, re-run the entire audit comparison to see where else mistakes may be coming from.
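For reference, a minimal sketch of how a fractional longitude weight between two bounding grid points can be computed (`lon_weight`, `lon_grid`, and `target` are illustrative names, not the package's actual API). An off-by-one in the lower-bound index `i` here is exactly the kind of mistake that produces negative weights or weights above 1:

```python
import numpy as np

def lon_weight(lon_grid, target):
    """Lower bounding index and fractional weight of the upper grid point."""
    i = int(np.searchsorted(lon_grid, target)) - 1  # index of lower bounding point
    w = (target - lon_grid[i]) / (lon_grid[i + 1] - lon_grid[i])
    return i, w  # upper point gets weight w, lower point gets (1 - w)

# 1-degree WOA-style grid: a point 3/4 of the way between -180 and -179
lon_grid = np.array([-180.0, -179.0, -178.0])
i, w = lon_weight(lon_grid, -179.25)
```

With the correct index, `w` is always in [0, 1]; shifting `i` by one makes the numerator or denominator the wrong sign, which matches the symptom described above.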
The longitude fix doesn't resolve the error on its own, but I found a new indexing error as well.
Fixed the error described above in 6644c03. Should improve the match-ups quite a bit - going to re-run the gain comparison now.
Far fewer large differences, but a reduction in "perfect" matches and an increase in infinite matches? Will have to look into this more. Distributions before the longitude fix in 1a8d01c and the index fix in 6644c03, and after: [figures not preserved]
Just did a quick check on the audit file alone, and there are only 634 infinite values, so I have a bug in the post-processing somewhere.
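That kind of check is a one-liner; a sketch (the toy `gains` list stands in for the audit's gain column):

```python
import numpy as np

def count_infinite(gains):
    """Count +inf/-inf gain values; NaNs are not counted as infinite."""
    g = np.asarray(gains, dtype=float)
    return int(np.isinf(g).sum())

# toy stand-in for the audit's gain column
gains = [1.02, float("inf"), float("-inf"), float("nan"), 0.98]
```

Running the same count on the audit file and on the post-processed comparison table separately is what exposes where the extra infinities are introduced.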
Seems like with the improvement, the coriolis DAC now accounts for an even larger share of the mismatches than before, while mismatches from the other DACs have all but disappeared, and aoml accounts for a little less than you'd expect. I suspect there's a good explanation for this, given that coriolis is such an important DAC.
Found the three non-matched infinite values (631 of 634 match right now).
For the NaN-valued results, most can be attributed to fill values being left in the data - an issue I run into when not filtering them out.
Removing fill values (but not yet filtering for quality flags) has removed all NaN values.
At this point, there is probably some good reason for the remaining differences: if quality flags were changed based on the audit, obviously that would cause some differences. With ~85% in good agreement, I'm pretty happy with that. Will still investigate the larger differences; ideally I'd like all gains to agree within 0.05, so that's still almost 800 gain calculations to diagnose/improve.
Switching gears to comparing against already-DMQC'ed profiles. The comparison is a little more tedious because reading the gain from netCDF BD-files is not so uniform. Working with a single profile, it's quite straightforward to read the gain from the file's calibration information.
But I have a feeling this is failing in many cases for many floats, because when I ran it for ~12,000 BD-files I only ended up with about 500 properly read gains, versus over 10,000 reliably calculated gains from the python module. Additionally, in some cases the BD-file comments say the data are corrected using in-air data, but no BRtraj file exists, so I am not able to calculate the in-air gain using NCEP data, as I don't know where to find the in-air data.
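The non-uniformity is essentially a string-parsing problem: calibration comment strings differ between DACs, so any single pattern misses many files. A sketch of the kind of extraction involved (the regex and the example comment are illustrative, not a real DAC's format):

```python
import re

def parse_gain(calib_comment):
    """Try to pull a DOXY gain factor out of a calibration comment string.

    Returns None when no 'G = <number>' pattern is found, which happens
    often because comment formats vary between DACs.
    """
    m = re.search(r"G\s*=\s*([0-9]*\.?[0-9]+)", calib_comment)
    return float(m.group(1)) if m else None

# hypothetical comment in the style of a DOXY adjustment entry
comment = "DOXY_ADJUSTED computed with G = 1.0634 from WOA surface data"
```

A robust reader would likely need a list of per-DAC patterns plus a fallback, which is consistent with only ~500 of ~12,000 files parsing cleanly with a single approach.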
Checked the NCEP interpolation weights and they seem fine. Worth noting, though, that because the NCEP interpolation first interpolates to the matching time before doing the weighted average, the weights are always 0, since the times match perfectly. This still yields the same result, but for the sake of performance those zero weights don't need to be calculated; removed in commit a4a0716.
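The observation is easy to demonstrate with made-up numbers: once the field has been interpolated to the profile time, the remaining time weight is identically zero, so the subsequent weighted average is a no-op (this sketch uses scalar stand-ins for the NCEP fields):

```python
import numpy as np

# two NCEP time slices (scalars for simplicity) and their times in hours
t0, t1 = 0.0, 6.0
f0, f1 = 1010.0, 1016.0   # e.g. a surface field at t0 and t1
t_prof = 2.0              # profile time

# step 1: interpolate the field to the profile time
f_at_t = np.interp(t_prof, [t0, t1], [f0, f1])

# step 2: the "time distance" of the interpolated field from t_prof is zero,
# so the weighted average collapses to f_at_t and can be skipped entirely
w = abs(t_prof - t_prof) / (t1 - t0)
result = (1 - w) * f_at_t + w * f_at_t
```

Skipping step 2 changes nothing numerically, which is why removing the zero weights is purely a performance cleanup.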
Ran through plots for the big differences today and noted some observations/thoughts.
Took a quick look at percent difference instead of absolute difference today. Haven't made figures yet, but it gives a slightly different distribution. Hard to say if it's better or worse - just a different way to break down the data.
Have thought about it more: the small bins are producing an artifact in the plotting, because few gain values can produce percent differences less than ~0.75%. For example, most gain values are about 1.20 and have a precision of 0.01, so if there is any difference at all, the smallest percent difference will be 100*0.01/1.20 = 0.83%. Only large gain values can produce percent differences below 0.5%.
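The arithmetic behind that floor, as a short sketch:

```python
# smallest nonzero percent difference allowed by the reporting precision
precision = 0.01       # gains are reported to two decimal places
typical_gain = 1.20

min_pct_diff = 100 * precision / typical_gain   # ~0.83% for a typical gain

# how large a gain must be before one precision step can fall below 0.5%:
# 100 * precision / gain < 0.5  =>  gain > 100 * precision / 0.5
gain_for_half_pct = 100 * precision / 0.5
```

So any histogram bin narrower than ~0.75% is nearly empty by construction, not because the gains agree that closely.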
The latest distribution looks like the one from December of last year, but back on the 1:1 line and distributed about 0. Mean difference in gain is 0.017 with standard deviation 3.22; mean absolute difference is 0.185. [edit: If you remove 16 points with very large differences in gain (both the audit and the python package have some gains of around -6000 - clearly problematic data), the mean difference is 0.0029, the mean absolute difference is 0.0347, and the standard deviation is 0.11 - quite reliable in my opinion!]
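The before/after summary described in the edit can be sketched like this (`diff` is a stand-in for the audit-minus-package gain differences; the cutoff of 100 is illustrative, chosen only to isolate obviously broken values like -6000):

```python
import numpy as np

def summarize(diff, max_abs=100.0):
    """Summary stats of gain differences after dropping extreme outliers."""
    diff = np.asarray(diff, dtype=float)
    keep = np.abs(diff) < max_abs   # drop clearly problematic points
    d = diff[keep]
    return {
        "n_dropped": int((~keep).sum()),
        "mean": float(d.mean()),
        "mean_abs": float(np.abs(d).mean()),
        "std": float(d.std()),
    }

# toy data: three plausible differences plus one broken -6000-style value
stats = summarize(np.array([0.01, -0.02, 0.005, 6000.0]))
```

Reporting both the raw and outlier-filtered statistics, as done above, makes it clear the bulk of the distribution is tight even when a handful of broken values dominate the raw standard deviation.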
Need to validate the python module against the following:
Keep in mind that results not matching exactly may be due to a mismatch in quality flags - there is a human element to QC. This issue will be kept open to log validation results to date, and will likely only close upon a full "release" following satisfactory validation against a range of sources.