ERROR [error.array.value_out_of_min_max_range] evaluation is not correct #529
@katecrombie Example data products would be great. Can you also provide the version of Validate you are running?
@katecrombie Also, if you could create an issue using the Bug template in the future, that would be great so it gets assigned to me right away! Otherwise this can slip through the cracks :-)
Using Validate 2.2.0
Roger that on the bug template (oops).
@katecrombie Do you have an example product where this is failing?
report.txt is the validation report for these files. A few pass, a few fail.
@katecrombie Do you have the XML labels for these products? The ZIP you sent only includes the LBLs.
erospdart_lblexample_20220429.zip This ZIP should have the .lbl, .fit, and .xml files. The report.txt file is a validation report that shows the failures.
@katecrombie Those two ZIPs do not appear to contain any XML labels, just the .lbl and .fit files.
D'oh... Third time is the charm.
Fix 1
-----

The original compare method confused precision and scale. We should be using scale when handling doubles with many significant digits. From the `BigDecimal` documentation:

> The precision is the number of digits in the unscaled value. For instance, for the number 123.45, the precision returned is 5.

> If zero or positive, the scale is the number of digits to the right of the decimal point. If negative, the unscaled value of the number is multiplied by ten to the power of the negation of the scale. For example, a scale of -3 means the unscaled value is multiplied by 1000.

Additionally, we need to handle scale appropriately. The previous functionality decreased the scale to the lesser of the min/max value's scale and the actual value's scale. This doesn't make sense, because it makes comparisons like this invalid:

* actual value: 0.0014194006
* scaled actual value: 0.0014194
* max value from label: 0.0014194

As you can see, if we compare the max value from the label to the scaled actual value, they are equal. But in reality they are not, and this should raise an error.

Fix 2
-----

That being said, we can't always trust the full scaled values that are calculated, because doubles can introduce false precision. If we simply changed the algorithm to use whichever scale (i.e., number of decimal places) is larger, we would take values that sometimes introduced false precision. Thanks to our regression test in #435, we were able to see that the solution is not as simple as increasing the scale.

After some reading, it looks like using the `BigDecimal(String)` constructor as much as possible, and avoiding calculations/conversions to double in memory, is the best way to avoid false precision. There are still some possible issues with false precision in the code, but this should resolve most of them.

Resolves #529

References:

* https://blogs.oracle.com/javamagazine/post/four-common-pitfalls-of-the-bigdecimal-class-and-how-to-avoid-them
* https://stackoverflow.com/questions/35435691/bigdecimal-precision-and-scale
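To make the quoted precision/scale definitions concrete, here is a minimal standalone sketch (illustrative only, not part of the Validate codebase) that prints both values for the examples given above:

```java
import java.math.BigDecimal;

public class PrecisionVsScale {
    public static void main(String[] args) {
        // Precision = number of digits in the unscaled value: 123.45 is
        // stored as unscaled value 12345, which has 5 digits.
        // Scale = number of digits to the right of the decimal point.
        BigDecimal d = new BigDecimal("123.45");
        System.out.println(d.precision()); // 5
        System.out.println(d.scale());     // 2

        // A negative scale means the unscaled value is multiplied by
        // 10^(-scale): 1.23E+5 is stored as unscaled value 123 with
        // scale -3, i.e. 123 * 10^3 = 123000.
        BigDecimal e = new BigDecimal("1.23E+5");
        System.out.println(e.unscaledValue()); // 123
        System.out.println(e.scale());         // -3
    }
}
```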
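And a sketch of the false-precision problem Fix 2 describes, using `0.1` as a stand-in value: the `BigDecimal(String)` constructor preserves exactly the digits that appeared in the label or data file, while the `BigDecimal(double)` constructor exposes the binary round-off of the nearest representable double:

```java
import java.math.BigDecimal;

public class FalsePrecision {
    public static void main(String[] args) {
        // Constructing from a double exposes the full binary expansion of
        // the nearest representable value -- "false precision":
        System.out.println(new BigDecimal(0.1));
        // 0.1000000000000000055511151231257827021181583404541015625

        // Constructing from the original String keeps exactly the digits
        // that were written in the label or the data file:
        System.out.println(new BigDecimal("0.1")); // 0.1

        // The two are not numerically equal, so any comparison that
        // round-trips through double can flag (or hide) spurious
        // range violations.
        System.out.println(new BigDecimal("0.1").compareTo(new BigDecimal(0.1))); // -1
    }
}
```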
@jordanpadams @nutjob4life @tloubrieu-jpl Is this an error that should be looked at or a missing file?
@katecrombie FYI, we are fairly certain
@jordanpadams Here's an extreme example I created: the two attached .FITs files differ by 1 bit at the cell in question. I think the bug is important. If the data values are like 123.45678, then this error won't fire much. But if the values are like 0.000012345678, this error will fire often. And some datasets will be in the latter domain.
Not surprised this is a problem when the min is represented in scientific notation. The original code, which was tweaked, compares using strings on the assumption they are in the form 0.00005, not 5.0E-5. The most probable reason for this is that floating-point representation is not perfect, so 0.00005 can turn into 0.0000501 or 0.0000499 or something else nearby. String comparisons were then used to prevent this representation error, but that makes processing forms other than direct decimal more difficult. No matter the approach, there will be problems, because the continuous number line cannot be accurately represented for every number in a computer. There are a couple of choices:
Any of those can be done, but they are fraught with peril, and I need to know which peril is the desired peril.
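For illustration, a small standalone sketch (not the original compare code) of how the two notations defeat a string comparison while a numeric comparison handles them:

```java
import java.math.BigDecimal;

public class NotationCompare {
    public static void main(String[] args) {
        // The same number written two ways: a string comparison sees them
        // as different, so string-based checks break on scientific notation.
        String plain = "0.00005";
        String sci   = "5.0E-5";
        System.out.println(plain.equals(sci)); // false

        // Numerically they are equal. Note that BigDecimal.equals() also
        // compares scale (5 vs 6 here), so compareTo() is the right test.
        BigDecimal a = new BigDecimal(plain);
        BigDecimal b = new BigDecimal(sci);
        System.out.println(a.equals(b));         // false (scales differ)
        System.out.println(a.compareTo(b) == 0); // true
    }
}
```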
I hope the attached file goes through. It's 700+ MB. It has a bundle that I've pared down to only those data files for which validate incorrectly flagged bad values. Below is validate's output.

```
Configuration:
Parameters:

Product Level Validation Results

PASS: file:/Users/rchen/Desktop/pds_review/bundle_nearmsi_gbl.xml
PASS: file:/Users/rchen/Desktop/pds_review/data/collection_data.xml
FAIL: file:/Users/rchen/Desktop/pds_review/data/m0154651923f6_2p_cif_gbl.xml
FAIL: file:/Users/rchen/Desktop/pds_review/data/m0156835546f4_2p_iof_gbl.xml
FAIL: file:/Users/rchen/Desktop/pds_review/data/m0154651921f4_2p_cif_gbl.xml
FAIL: file:/Users/rchen/Desktop/pds_review/data/m0154651915f1_2p_cif_gbl.xml
FAIL: file:/Users/rchen/Desktop/pds_review/data/m0125249036f2_2p_iof_gbl.xml

PDS4 Bundle Level Validation Results

FAIL: file:/Users/rchen/Desktop/pds_review/data/.DS_Store
PASS: file:/Users/rchen/Desktop/pds_review/data/collection_data.xml
FAIL: file:/Users/rchen/Desktop/pds_review/.DS_Store
PASS: file:/Users/rchen/Desktop/pds_review/bundle_nearmsi_gbl.xml
PASS: file:/Users/rchen/Desktop/pds_review/data/m0154651923f6_2p_cif_gbl.xml
PASS: file:/Users/rchen/Desktop/pds_review/data/m0154651921f4_2p_cif_gbl.xml
PASS: file:/Users/rchen/Desktop/pds_review/data/m0156835546f4_2p_iof_gbl.xml
PASS: file:/Users/rchen/Desktop/pds_review/data/m0154651915f1_2p_cif_gbl.xml
PASS: file:/Users/rchen/Desktop/pds_review/data/m0125249036f2_2p_iof_gbl.xml

Summary:

7 error(s)

Product Validation Summary:
Referential Integrity Check Summary:
Message Types:

End of Report
```
Cannot reproduce your errors. Please check that your validate has the code updates from #529 in it, and/or that golish.zip is correct. I may not have understood your checks, but I get no errors:
OK, 3.1.1 did validate both my last ZIPs correctly.
You are okay with closing this ticket again, then?
Yes, we can close this. I should have checked the latest version. OTOH, if this wasn't specifically addressed in a build, it might pop up again.
When validating a PDS4 folder of image-type arrays, several errors were generated during the evaluation of the min/max array values. Some were correct, others were not.
ERROR [error.array.value_out_of_min_max_range] array 1, location (349, 345): Value is greater than the maximum value in the label (max=0.0014194, got=0.0014194006).
This is technically correct, and we can fix it by increasing the significant digits in the data product.
ERROR [error.array.value_out_of_min_max_range] array 1, location (386, 185): Value is greater than the maximum value in the label (max=0.0702686, got=0.07026859).
This is not correct: 0.07026859 is less than 0.0702686.
My assumption is that validate is looking at the number of significant digits, and is giving incorrect results when the number of digits is different.
I can attach test case data products if needed.
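For reference, a standalone sketch (assuming the scale-truncation behavior described in the fix above; this is not the actual Validate source) that reproduces both reported cases with `BigDecimal`:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class MinMaxCheck {
    public static void main(String[] args) {
        // Case 1: a genuine violation (actual value exceeds the label maximum).
        BigDecimal max    = new BigDecimal("0.0014194");
        BigDecimal actual = new BigDecimal("0.0014194006");
        System.out.println(actual.compareTo(max) > 0); // true -> error is correct

        // If the actual value is first truncated to the label's scale
        // (7 decimal places), the violation disappears -- the masking
        // behavior described in the fix:
        BigDecimal truncated = actual.setScale(max.scale(), RoundingMode.HALF_UP);
        System.out.println(truncated.compareTo(max) > 0); // false -> error hidden

        // Case 2: the reported false positive (actual value is below the maximum).
        BigDecimal max2    = new BigDecimal("0.0702686");
        BigDecimal actual2 = new BigDecimal("0.07026859");
        System.out.println(actual2.compareTo(max2) > 0); // false -> no error expected
    }
}
```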
Engineering Details
Tightly coupled with #514 and #434
see #574 for additional info