Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ERROR [error.array.value_out_of_min_max_range] evaluation is not correct #529

Closed
katecrombie opened this issue Jul 21, 2022 · 20 comments · Fixed by #577
Closed

ERROR [error.array.value_out_of_min_max_range] evaluation is not correct #529

katecrombie opened this issue Jul 21, 2022 · 20 comments · Fixed by #577

Comments

@katecrombie
Copy link

katecrombie commented Jul 21, 2022

When validating a pds4.folder of image type arrays, several errors were generated during the evaluation of the min/max array values. Some were correct, others were not.

ERROR [error.array.value_out_of_min_max_range] array 1, location (349, 345): Value is greater than the maximum value in the label (max=0.0014194, got=0.0014194006).

This is technically correct, and we can fix it by increasing the significant digits in the data product.

ERROR [error.array.value_out_of_min_max_range] array 1, location (386, 185): Value is greater than the maximum value in the label (max=0.0702686, got=0.07026859).

This is not correct.

My assumption is that validate is looking at the number of significant digits, and is giving incorrect results when the number of digits is different.

I can attach test case data products if needed.

Engineering Details

Tightly coupled with #514 and #434

see #574 for additional info

@jordanpadams
Copy link
Member

@katecrombie example data products would be great.

can you also provide the version of Validate you are running?

validate -V

@jordanpadams
Copy link
Member

@katecrombie also, if you could create an issue using the Bug template in the future, that would be great so it gets assigned to me right away! otherwise this can slip through the cracks :-)

@jordanpadams jordanpadams added the bug Something isn't working label Jul 21, 2022
@jordanpadams jordanpadams self-assigned this Jul 21, 2022
@katecrombie
Copy link
Author

Using Validate 2.2.0

@katecrombie
Copy link
Author

roger that on the bug template (oops)

@jordanpadams
Copy link
Member

@katecrombie do you have an example product of where this is failing?

@katecrombie
Copy link
Author

erospdart_lblexample2.zip

report.txt is the validation report for these files. A few pass. a few fail.

@jordanpadams
Copy link
Member

@katecrombie do you have the XML labels for these products? the ZIP you sent only include the LBLs.

@katecrombie
Copy link
Author

erospdart_lblexample_20220429.zip

This zip should have the .lbl, .fit and .xml files. The report.txt file is a validation report that shows the failures.

@jordanpadams
Copy link
Member

@katecrombie those 2 zips do not appear to contain any XML labels? Just the lbl and .fit files.

@katecrombie
Copy link
Author

D'oh.... Third time is the charm.
erospdart_lblexample_20220429_2.zip

jordanpadams added a commit that referenced this issue Dec 13, 2022
Fix 1
-----
Original compare method had some confusions between precision and scale. We should be using scale for handling
doubles with many significant digits.

> The precision is the number of digits in the unscaled value. For instance, for the number 123.45, the precision returned is 5.

> If zero or positive, the scale is the number of digits to the right of the decimal point. If negative, the unscaled value of the number is multiplied by ten to the power of the negation of the scale. For example, a scale of -3 means the unscaled value is multiplied by 1000.

Additionally, we need to be able to handle scale appropriately. Previous functionality would decrease the scale to the least between the min/max value and the actual value. This obviously doesn't make sense because when we do this, comparisons like this will be invalid:

actual value: 0.0014194006
scaled actual value: 0.0014194
max value from label: 0.0014194

as you can see, if we compare the max value from the label to the scaled actual value, they are equal. but in reality they are not. this should raise an error.

Fix 2
-----
that being said, we can't always trust the full scaled values that are calculated because doubles can introduce false precision. if we just change the algorithm to use whichever is the largest scale (aka number of decimal places) we were taking values that sometimes introduced false precision. thanks for our regression test in #435, we were able to see that the solution is not as simple as increasing the scale. so after some reading, it looks like using the BigDecimal(String) initializer as much as possible, and avoiding calculations/conversions to double in memory is the best way to avoid false precision. there are still some possible issues with false precision in the code, but this should resolve most of them.

resolves #529

References:
* https://blogs.oracle.com/javamagazine/post/four-common-pitfalls-of-the-bigdecimal-class-and-how-to-avoid-them
* https://stackoverflow.com/questions/35435691/bigdecimal-precision-and-scale
jordanpadams added a commit that referenced this issue Dec 20, 2022
Fix 1
-----
Original compare method had some confusions between precision and scale. We should be using scale for handling
doubles with many significant digits.

> The precision is the number of digits in the unscaled value. For instance, for the number 123.45, the precision returned is 5.

> If zero or positive, the scale is the number of digits to the right of the decimal point. If negative, the unscaled value of the number is multiplied by ten to the power of the negation of the scale. For example, a scale of -3 means the unscaled value is multiplied by 1000.

Additionally, we need to be able to handle scale appropriately. Previous functionality would decrease the scale to the least between the min/max value and the actual value. This obviously doesn't make sense because when we do this, comparisons like this will be invalid:

actual value: 0.0014194006
scaled actual value: 0.0014194
max value from label: 0.0014194

as you can see, if we compare the max value from the label to the scaled actual value, they are equal. but in reality they are not. this should raise an error.

Fix 2
-----
that being said, we can't always trust the full scaled values that are calculated because doubles can introduce false precision. if we just change the algorithm to use whichever is the largest scale (aka number of decimal places) we were taking values that sometimes introduced false precision. thanks for our regression test in #435, we were able to see that the solution is not as simple as increasing the scale. so after some reading, it looks like using the BigDecimal(String) initializer as much as possible, and avoiding calculations/conversions to double in memory is the best way to avoid false precision. there are still some possible issues with false precision in the code, but this should resolve most of them.

resolves #529

References:
* https://blogs.oracle.com/javamagazine/post/four-common-pitfalls-of-the-bigdecimal-class-and-how-to-avoid-them
* https://stackoverflow.com/questions/35435691/bigdecimal-precision-and-scale
@al-niessner
Copy link
Contributor

al-niessner commented Dec 22, 2022

@jordanpadams @nutjob4life @tloubrieu-jpl

Is this an error that should be looked at or a missing file?

  FAIL: file:/tmp/test/m0157417198f3_2p_iof_gbl.xml
      ERROR  [error.label.context_ref_not_found]   line 84: 'Context product not found: urn:nasa:pds:context:instrument:near.msi
        8 product validation(s) completed

@jordanpadams
Copy link
Member

@katecrombie FYI, we are fairly certain max=0.0014194, got=0.0014194006 is a spurious error. 0.0014194006 is just an issue with the significant digits and how doubles are stored in memory.

@rchenatjpl
Copy link

rchenatjpl commented Feb 3, 2023

@jordanpadams Here's an extreme example I created:
ERROR [error.array.value_out_of_min_max_range] array 1, location (400, 246): Value is less than the minimum value in the label (min=5.0E-5, got=0.005).
If that's unclear, multiply both numbers by 10000, and validate is saying 5 is greater than 500.

The two attached .FITs files differ by 1 bit. At the cell in question,
m5under.fit: 0x3ba3d70a
m5over.fit: 0x3ba3d70b
which in decimal are 0.00499999988824 and 0.0050000003539. So yes, it's a rounding problem, but note that the value in both labels has exactly 1 significant digit (brackets removed):
minimum 0.00005 /minimum

I think the bug is important. If the data values are like 123.45678, then this error won't fire much. Buf if the values are like 0.000012345678, this error will fire often. And some datasets will be in the latter domain.
Archive.zip

@rchenatjpl rchenatjpl reopened this Feb 3, 2023
@al-niessner
Copy link
Contributor

@jordanpadams @rchenatjpl

Not surprised this is a problem when min is represented by scientific notation. The original code, which was tweaked, compares using strings on the assumption they are in the form 0.00005 not 5.0E-5. The most probable reason for this is floating point representation is not perfect so 0.00005 can turn into 0.0000501 or 0.0000499 or something else nearby. String comparisons were then used to prevent this representation error but it makes processing forms other than direct decimal more difficult.

No matter the approach, there will be problems because the continuous number line cannot be accurately represented for any number in a computer. There are a couple ov choices:

  1. convert things to doubles and live with the representation errors that introduced.
  2. convert things to doubles then forever fit the code for corner cases.
  3. convert all numbers to direct decimal formats (not scientific format)
    1. using Double to do the conversion introduces same problems as 1 and/or 2
    2. do string manipulation to move decimal point around padding with 0 as needed

Any of those can be done, but they are fraught with peril and I need to which peril is the desired peril.

@rchenatjpl
Copy link

golish.zip

I hope the attached file goes through. It's 700+ MB. It has a bundle that I've pared down to only those data files for which validate incorrectly flagged bad values. Below is validate's output.

`
PDS Validate Tool Report

Configuration:
Version 3.1.0-SNAPSHOT
Date 2023-02-07T06:47:27Z

Parameters:
Targets [file:/Users/rchen/Desktop/pds_review/]
Rule Type pds4.bundle
Severity Level WARNING
Recurse Directories true
File Filters Used [*.xml, *.XML]
Data Content Validation on
Product Level Validation on
Allow Unlabeled Files false
Max Errors 100000
Registered Contexts File /Users/rchen/PDS4tools/validate/resources/registered_context_products.json

Product Level Validation Results

PASS: file:/Users/rchen/Desktop/pds_review/bundle_nearmsi_gbl.xml
1 product validation(s) completed

PASS: file:/Users/rchen/Desktop/pds_review/data/collection_data.xml
2 product validation(s) completed

FAIL: file:/Users/rchen/Desktop/pds_review/data/m0154651923f6_2p_cif_gbl.xml
Begin Content Validation: file:/Users/rchen/Desktop/pds_review/data/m0154651923f6_2p_cif_gbl.fit
ERROR [error.array.value_out_of_min_max_range] array 1, location (400, 246): Value is less than the minimum value in the label (min=0.007827, got=0.007827956).
End Content Validation: file:/Users/rchen/Desktop/pds_review/data/m0154651923f6_2p_cif_gbl.fit
3 product validation(s) completed

FAIL: file:/Users/rchen/Desktop/pds_review/data/m0156835546f4_2p_iof_gbl.xml
Begin Content Validation: file:/Users/rchen/Desktop/pds_review/data/m0156835546f4_2p_iof_gbl.fit
ERROR [error.array.value_out_of_min_max_range] array 1, location (50, 365): Value is less than the minimum value in the label (min=0.009938, got=0.009938589).
End Content Validation: file:/Users/rchen/Desktop/pds_review/data/m0156835546f4_2p_iof_gbl.fit
4 product validation(s) completed

FAIL: file:/Users/rchen/Desktop/pds_review/data/m0154651921f4_2p_cif_gbl.xml
Begin Content Validation: file:/Users/rchen/Desktop/pds_review/data/m0154651921f4_2p_cif_gbl.fit
ERROR [error.array.value_out_of_min_max_range] array 1, location (165, 299): Value is less than the minimum value in the label (min=1.18E-4, got=1.18906144E-4).
End Content Validation: file:/Users/rchen/Desktop/pds_review/data/m0154651921f4_2p_cif_gbl.fit
5 product validation(s) completed

FAIL: file:/Users/rchen/Desktop/pds_review/data/m0154651915f1_2p_cif_gbl.xml
Begin Content Validation: file:/Users/rchen/Desktop/pds_review/data/m0154651915f1_2p_cif_gbl.fit
ERROR [error.array.value_out_of_min_max_range] array 1, location (322, 51): Value is less than the minimum value in the label (min=0.008009, got=0.00800981).
End Content Validation: file:/Users/rchen/Desktop/pds_review/data/m0154651915f1_2p_cif_gbl.fit
6 product validation(s) completed

FAIL: file:/Users/rchen/Desktop/pds_review/data/m0125249036f2_2p_iof_gbl.xml
Begin Content Validation: file:/Users/rchen/Desktop/pds_review/data/m0125249036f2_2p_iof_gbl.fit
ERROR [error.array.value_out_of_min_max_range] array 1, location (242, 287): Value is greater than the maximum value in the label (max=0.087356, got=0.08735539).
End Content Validation: file:/Users/rchen/Desktop/pds_review/data/m0125249036f2_2p_iof_gbl.fit
7 product validation(s) completed

PDS4 Bundle Level Validation Results

FAIL: file:/Users/rchen/Desktop/pds_review/data/.DS_Store
ERROR [error.file.name_has_invalid_characters] File name uses invalid character
WARNING [warning.file.not_referenced_in_label] File is not referenced by any label
1 integrity check(s) completed

PASS: file:/Users/rchen/Desktop/pds_review/data/collection_data.xml
2 integrity check(s) completed

FAIL: file:/Users/rchen/Desktop/pds_review/.DS_Store
ERROR [error.file.name_has_invalid_characters] File name uses invalid character
WARNING [warning.file.not_referenced_in_label] File is not referenced by any label
3 integrity check(s) completed

PASS: file:/Users/rchen/Desktop/pds_review/bundle_nearmsi_gbl.xml
4 integrity check(s) completed

PASS: file:/Users/rchen/Desktop/pds_review/data/m0154651923f6_2p_cif_gbl.xml
5 integrity check(s) completed

PASS: file:/Users/rchen/Desktop/pds_review/data/m0154651921f4_2p_cif_gbl.xml
6 integrity check(s) completed

PASS: file:/Users/rchen/Desktop/pds_review/data/m0156835546f4_2p_iof_gbl.xml
7 integrity check(s) completed

PASS: file:/Users/rchen/Desktop/pds_review/data/m0154651915f1_2p_cif_gbl.xml
8 integrity check(s) completed

PASS: file:/Users/rchen/Desktop/pds_review/data/m0125249036f2_2p_iof_gbl.xml
9 integrity check(s) completed

Summary:

7 error(s)
2 warning(s)

Product Validation Summary:
2 product(s) passed
5 product(s) failed
0 product(s) skipped

Referential Integrity Check Summary:
7 check(s) passed
2 check(s) failed
0 check(s) skipped

Message Types:
5 error.array.value_out_of_min_max_range
2 error.file.name_has_invalid_characters
2 warning.file.not_referenced_in_label

End of Report
Completed execution in 6256 ms

`

@al-niessner
Copy link
Contributor

@rchenatjpl

Cannot reproduce your errors. Please check your validate that it has code updates from 529 in them and/or that golish.zip is correct. I may not have understood your checks, but I get no errors:

validate /tmp/golish/bundle_nearmsi_gbl.xml

PDS Validate Tool Report

Configuration:
   Version                       3.2.0-SNAPSHOT
   Date                          2023-02-07T16:29:24Z

Parameters:
   Targets                       [file:/tmp/golish/bundle_nearmsi_gbl.xml]
   Severity Level                WARNING
   Recurse Directories           true
   File Filters Used             [*.xml, *.XML]
   Data Content Validation       on
   Product Level Validation      on
   Max Errors                    100000
   Registered Contexts File      /home/niessner/Projects/PDS/validate/src/main/resources/util/registered_context_products.json



Product Level Validation Results

  PASS: file:/tmp/golish/bundle_nearmsi_gbl.xml
        1 product validation(s) completed
Feb 07, 2023 8:29:27 AM gov.nasa.pds.tools.validate.task.ValidationTask execute
INFO: Validation complete for location 'file:/tmp/golish/bundle_nearmsi_gbl.xml'

Summary:

  0 error(s)
  0 warning(s)

  Product Validation Summary:
    1          product(s) passed
    0          product(s) failed
    0          product(s) skipped

  Referential Integrity Check Summary:
    0          check(s) passed
    0          check(s) failed
    0          check(s) skipped


End of Report
Completed execution in 4279 ms

@rchenatjpl
Copy link

ok, 3.1.1 did validate both my last zips correctly.

@al-niessner
Copy link
Contributor

@rchenatjpl

You are okay with closing this ticket again then?

@rchenatjpl
Copy link

yes, we can close this. I should have checked the latest version. OTOH, if this wasn't specifically addressed in a build, it might pop up again.

@miguelp1986
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants