Validate does not catch NaNs in Binary Tables #514

Closed
agicquelb opened this issue Jun 9, 2022 · 20 comments · Fixed by #588

@agicquelb

agicquelb commented Jun 9, 2022

I found NaN values in a Table_Binary, but the validation tool didn't give me an error. I noticed the NaNs while looking at the data. Isn't the validation tool supposed to return an error? Also, I can't upload the file, so I will see how I can send the example file to you.

Tightly coupled with #529 and #434

@agicquelb agicquelb added bug Something isn't working needs:triage labels Jun 9, 2022
@jordanpadams
Member

@agicquelb is this bug supposed to be for validate? if so, could you also add how you ran the tool and the version you are running?

also, for uploading data, if you zip/tar the data/label up you can attach it.

@agicquelb
Author

agicquelb commented Jun 9, 2022

@jordanpadams yes it's for validate.

I'm running validate */*xml and the version is 2.2.0.

attached are the zip files.

pep_0396683517_0x691_sci.zip

@jordanpadams jordanpadams transferred this issue from NASA-PDS/mi-label Jun 9, 2022
@jordanpadams
Member

@rchenatjpl can you help triage this for us? Is this an actual bug?

@rchenatjpl

Hi, @agicquelb , is there a PDS4 label to go with this? The .zip contained a .fit and a .lbl, which is PDS3. I believe the validate tool just ignores all the files in the directory and reports 0 errors.
@jordanpadams , I haven't been able to run the PDS3 vtool in a long time. validate -h claims that 'validate -R pds3.volume' does something. Does it actually validate PDS3 labels? If you don't know and think this is worth investigating, please tell me and I will test stuff out.

@agicquelb
Author

@jordanpadams @rchenatjpl attached is a zip file with the PDS4 label that I created. I never tried to validate PDS3 labels
20180818_039685.zip

@jordanpadams
Member

thanks @agicquelb

@rchenatjpl it runs the PDS3 Volume Validator (it does not validate individual labels)

@rchenatjpl

rchenatjpl commented Jun 28, 2022

Thanks, @agicquelb , that's great. Can you tell me where any NaN is - which Array_2D_Image (or Table_Binary), and within that, which Line and Sample (or record and field)? Or I could hunt down the binary.

Or maybe that's where the work lies. I'll start hunting.

@agicquelb
Author

@rchenatjpl For example the column L15S00FLUX in the EXTENSION_FLUX_TABLE

@rchenatjpl

rchenatjpl commented Jun 28, 2022

I'll look at your last comment later tonight, @agicquelb
Before I got it, I hacked up the .fit and .xml to insert NaN as well as a negative and a positive Infinity, and I'm happy to say that validate caught them. The two hacked files are in the attached archive, along with a screen capture of their hex values. Validate says:

% validate  -t x.xml
PDS Validate Tool Report

Configuration:
   Version                       2.2.0
   Date                          2022-06-28T22:18:31Z

Parameters:
   Targets                       [file:/Users/rchen/Desktop/x.xml]
   Severity Level                WARNING
   Recurse Directories           true
   File Filters Used             [*.xml, *.XML]
   Data Content Validation       on
   Product Level Validation      on
   Max Errors                    100000
   Registered Contexts File      /Users/rchen/PDS4tools/validate/resources/registered_context_products.json



Product Level Validation Results

  FAIL: file:/Users/rchen/Desktop/x.xml
    Begin Content Validation: file:/Users/rchen/Desktop/x.fit
      ERROR  [error.array.value_out_of_data_type_range]   array 1, location (1, 5): Value is not within the valid range of the data type 'IEEE754MSBSingle': Infinity
      ERROR  [error.array.value_out_of_data_type_range]   array 1, location (1, 9): Value is not within the valid range of the data type 'IEEE754MSBSingle': -Infinity
      ERROR  [error.array.value_out_of_data_type_range]   array 1, location (1, 13): Value is not within the valid range of the data type 'IEEE754MSBSingle': NaN
      ERROR  [error.array.value_out_of_data_type_range]   array 1, location (1, 15): Value is not within the valid range of the data type 'IEEE754MSBSingle': NaN
    End Content Validation: file:/Users/rchen/Desktop/x.fit
        1 product validation(s) completed

Summary:

  4 error(s)
  0 warning(s)

  Product Validation Summary:
    0          product(s) passed
    1          product(s) failed
    0          product(s) skipped

  Referential Integrity Check Summary:
    0          check(s) passed
    0          check(s) failed
    0          check(s) skipped

  Message Types:
    4            error.array.value_out_of_data_type_range

End of Report
Completed execution in 1872 ms

Archive.zip
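
For readers who want to reproduce this kind of hand-edited test file, here is a minimal Python sketch of the big-endian ("MSB") IEEE 754 single-precision byte patterns involved. It is only an illustration: the helper name `decode_msb_single` is hypothetical, and the quiet-NaN pattern shown is one common encoding, not necessarily the bytes used in the attached files.

```python
import math
import struct

# Big-endian ("MSB") IEEE 754 single-precision encodings of the special values.
POS_INF = struct.pack(">f", math.inf)     # bytes 7f 80 00 00
NEG_INF = struct.pack(">f", -math.inf)    # bytes ff 80 00 00
QUIET_NAN = bytes.fromhex("7fc00000")     # one common quiet-NaN pattern (assumption)


def decode_msb_single(raw: bytes) -> float:
    """Decode 4 bytes as a big-endian IEEE 754 single (hypothetical helper)."""
    return struct.unpack(">f", raw)[0]


if __name__ == "__main__":
    for name, raw in [("+Inf", POS_INF), ("-Inf", NEG_INF), ("NaN", QUIET_NAN)]:
        # Print the label, the hex bytes one would see in a hex editor,
        # and the decoded Python float.
        print(name, raw.hex(), decode_msb_single(raw))
```

Writing any of these 4-byte patterns over an existing value in a test data file should yield the corresponding special value when the field is read as IEEE754MSBSingle.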

@rchenatjpl

rchenatjpl commented Jun 28, 2022

yes, I see it (NaN) called out in pds4_viewer but not validate. I'll check this tonight

@rchenatjpl

@agicquelb You are right, validate has a bug. It validates Array correctly but Table incorrectly. In the attachment, 4array.xml reads 4.bin as an array; 4table.xml reads 4.bin as a table. 4.png shows what pds4_viewer (correctly) thinks for each. Validate's output:

% validate -t 4array.xml
PDS Validate Tool Report
Configuration:
   Version                       2.2.0
   Date                          2022-06-29T08:11:54Z
Parameters:
   Targets                       [file:/Users/rchen/Desktop/4array.xml]
   Severity Level                WARNING
   Recurse Directories           true
   File Filters Used             [*.xml, *.XML]
   Data Content Validation       on
   Product Level Validation      on
   Max Errors                    100000
   Registered Contexts File      /Users/rchen/PDS4tools/validate/resources/registered_context_products.json
Product Level Validation Results
  FAIL: file:/Users/rchen/Desktop/4array.xml
    Begin Content Validation: file:/Users/rchen/Desktop/4.bin
      ERROR  [error.array.value_out_of_data_type_range]   array 1, location (1, 1): Value is not within the valid range of the data type 'IEEE754MSBSingle': Infinity
      ERROR  [error.array.value_out_of_data_type_range]   array 1, location (1, 2): Value is not within the valid range of the data type 'IEEE754MSBSingle': Infinity
      ERROR  [error.array.value_out_of_data_type_range]   array 1, location (2, 1): Value is not within the valid range of the data type 'IEEE754MSBSingle': -Infinity
      ERROR  [error.array.value_out_of_data_type_range]   array 1, location (2, 2): Value is not within the valid range of the data type 'IEEE754MSBSingle': -Infinity
      ERROR  [error.array.value_out_of_data_type_range]   array 1, location (4, 1): Value is not within the valid range of the data type 'IEEE754MSBSingle': NaN
      ERROR  [error.array.value_out_of_data_type_range]   array 1, location (4, 2): Value is not within the valid range of the data type 'IEEE754MSBSingle': NaN
    End Content Validation: file:/Users/rchen/Desktop/4.bin
        1 product validation(s) completed
Summary:
  6 error(s)
  0 warning(s)
  Product Validation Summary:
    0          product(s) passed
    1          product(s) failed
    0          product(s) skipped
  Referential Integrity Check Summary:
    0          check(s) passed
    0          check(s) failed
    0          check(s) skipped
  Message Types:
    6            error.array.value_out_of_data_type_range
End of Report
Completed execution in 2680 ms
% 
% 
% validate -t 4table.xml
PDS Validate Tool Report
Configuration:
   Version                       2.2.0
   Date                          2022-06-29T08:12:03Z
Parameters:
   Targets                       [file:/Users/rchen/Desktop/4table.xml]
   Severity Level                WARNING
   Recurse Directories           true
   File Filters Used             [*.xml, *.XML]
   Data Content Validation       on
   Product Level Validation      on
   Max Errors                    100000
   Registered Contexts File      /Users/rchen/PDS4tools/validate/resources/registered_context_products.json
Product Level Validation Results
  PASS: file:/Users/rchen/Desktop/4table.xml
        1 product validation(s) completed
Summary:
  0 error(s)
  0 warning(s)
  Product Validation Summary:
    1          product(s) passed
    0          product(s) failed
    0          product(s) skipped
  Referential Integrity Check Summary:
    0          check(s) passed
    0          check(s) failed
    0          check(s) skipped
End of Report
Completed execution in 2346 ms
% 

4.zip
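
As an independent cross-check of what a table reader should see, the following Python sketch scans one floating-point field of a fixed-width binary table for non-finite values. It is not validate's implementation; the function name `find_non_finite`, its parameters, and the two-field record layout in the usage example are all assumptions made for illustration.

```python
import math
import struct


def find_non_finite(data: bytes, record_length: int, field_offset: int,
                    fmt: str = ">f"):
    """Return (record_number, value) for each NaN or +/-Inf in one field.

    record_number is 1-based. field_offset is the 0-based byte offset of the
    field within a record (assumed to be the label's field_location minus 1).
    fmt is a struct format: ">f" for a big-endian single, ">d" for a double.
    """
    size = struct.calcsize(fmt)
    if field_offset + size > record_length:
        raise ValueError("field does not fit inside the record")
    hits = []
    for rec in range(len(data) // record_length):
        start = rec * record_length + field_offset
        (value,) = struct.unpack_from(fmt, data, start)
        if not math.isfinite(value):
            hits.append((rec + 1, value))
    return hits


# Hypothetical 4-record table with two big-endian single fields per record.
records = [(1.0, 2.0), (math.inf, 3.0), (4.0, -math.inf), (math.nan, 5.0)]
data = b"".join(struct.pack(">ff", a, b) for a, b in records)

if __name__ == "__main__":
    print(find_non_finite(data, record_length=8, field_offset=0))
    print(find_non_finite(data, record_length=8, field_offset=4))
```

Running a scan like this over the same bytes that the array label describes gives a quick way to confirm that the table and array readings of a file disagree only in validate's reporting, not in the data.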

@rchenatjpl

Similarly for 8-byte values

% validate -t 8array.xml
PDS Validate Tool Report
Configuration:
   Version                       2.2.0
   Date                          2022-06-29T08:35:41Z
Parameters:
   Targets                       [file:/Users/rchen/Desktop/8array.xml]
   Severity Level                WARNING
   Recurse Directories           true
   File Filters Used             [*.xml, *.XML]
   Data Content Validation       on
   Product Level Validation      on
   Max Errors                    100000
   Registered Contexts File      /Users/rchen/PDS4tools/validate/resources/registered_context_products.json
Product Level Validation Results
  FAIL: file:/Users/rchen/Desktop/8array.xml
    Begin Content Validation: file:/Users/rchen/Desktop/8.bin
      ERROR  [error.array.value_out_of_data_type_range]   array 1, location (1, 1): Value is not within the valid range of the data type 'IEEE754MSBDouble': Infinity
      ERROR  [error.array.value_out_of_data_type_range]   array 1, location (1, 2): Value is not within the valid range of the data type 'IEEE754MSBDouble': -Infinity
      ERROR  [error.array.value_out_of_data_type_range]   array 1, location (2, 2): Value is not within the valid range of the data type 'IEEE754MSBDouble': NaN
    End Content Validation: file:/Users/rchen/Desktop/8.bin
        1 product validation(s) completed
Summary:
  3 error(s)
  0 warning(s)
  Product Validation Summary:
    0          product(s) passed
    1          product(s) failed
    0          product(s) skipped
  Referential Integrity Check Summary:
    0          check(s) passed
    0          check(s) failed
    0          check(s) skipped
  Message Types:
    3            error.array.value_out_of_data_type_range
End of Report
Completed execution in 2225 ms
% 
% 
% validate -t 8table.xml 
PDS Validate Tool Report
Configuration:
   Version                       2.2.0
   Date                          2022-06-29T08:35:59Z
Parameters:
   Targets                       [file:/Users/rchen/Desktop/8table.xml]
   Severity Level                WARNING
   Recurse Directories           true
   File Filters Used             [*.xml, *.XML]
   Data Content Validation       on
   Product Level Validation      on
   Max Errors                    100000
   Registered Contexts File      /Users/rchen/PDS4tools/validate/resources/registered_context_products.json
Product Level Validation Results
  PASS: file:/Users/rchen/Desktop/8table.xml
        1 product validation(s) completed
Summary:
  0 error(s)
  0 warning(s)
  Product Validation Summary:
    1          product(s) passed
    0          product(s) failed
    0          product(s) skipped
  Referential Integrity Check Summary:
    0          check(s) passed
    0          check(s) failed
    0          check(s) skipped
End of Report
Completed execution in 2284 ms
% 

8.zip
Ball is yours, @jordanpadams

@jordanpadams jordanpadams removed their assignment Jul 22, 2022
@jordanpadams jordanpadams changed the title from "nan in Binary Table" to "Validate does not catch NaNs in Binary Tables" Aug 25, 2022
@agicquelb
Author

@jordanpadams Any update on this issue, i.e. that validate doesn't catch NaN in binary tables?

@jordanpadams
Member

@agicquelb unfortunately, we are significantly behind in our backlog of bugs. this is in our plans for next build (June 2023), but as soon as we get this fixed, we can send you over a beta version to test out. sorry for the inconvenience.

@al-niessner
Contributor

@jordanpadams

Reading through this ticket, it seems that validate does not flag NaN and +/- Inf in tables the way it does in arrays. Am I reading this correctly?

@jordanpadams
Member

@al-niessner that sounds about right to me!

jordanpadams pushed a commit that referenced this issue Jan 26, 2023
* unit test to show fix works

* fix INF/NAN check

Co-authored-by: Al Niessner <Al.Niessner@xxx.xxx>
@rchenatjpl

@agicquelb @al-niessner @jordanpadams: Can we step back?

Are NaN and Inf illegal in binary data? I should have asked this a year ago. In the PDS4 Standards doc, Table 5A-3, the row for ASCII_Real says "PDS does not allow ... INF... -INF...NaN". So I think it only applies to the ASCII data types, not the binary ones. If this is debatable, we should call others in.
Assuming INF and NaN are legal for binary types, what should validate do? I'd say it should do nothing or throw a warning but definitely not throw an error.

@jordanpadams
Member

from Dick Simpson:

...NaN is embedded in the IEEE754 standard, which we have adopted; there's little to be gained by claiming it's not acceptable in PDS4, so we've not attempted to say so. Further, there are multiple bit patterns that designate specific flavors of 'NaN'; on the off-chance that some of our data providers want to use those, there may be some benefit in letting them take advantage of the capability.

As for what to do, I think the present practice is acceptable — that is, nothing. If we accept that 'NaN' is allowed under the IEEE754 specification for binary data, then there's no reason to issue either a warning or an error. We do not want people to use 'NaN' in ASCII_Real fields, but that's a separate issue.

I would use the same logic for accepting '±Inf'.
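
To illustrate the point about multiple bit patterns designating NaN, here is a small Python sketch. In IEEE 754 single precision, any 32-bit pattern with an all-ones exponent and a nonzero fraction is a NaN, so many distinct encodings exist. The helper names and the sample patterns below are hypothetical, chosen only for demonstration.

```python
import math
import struct


def single_bits_to_float(bits: int) -> float:
    """Interpret a 32-bit integer as a big-endian IEEE 754 single."""
    return struct.unpack(">f", bits.to_bytes(4, "big"))[0]


def is_nan_bits(bits: int) -> bool:
    """True if the pattern is a NaN: exponent all ones and fraction nonzero."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return exponent == 0xFF and fraction != 0


# Four different bit patterns, all of which are NaN (hypothetical samples).
nan_patterns = [0x7FC00000, 0x7FC00001, 0xFFC12345, 0x7F800001]

if __name__ == "__main__":
    for bits in nan_patterns:
        print(f"{bits:08x}", is_nan_bits(bits),
              math.isnan(single_bits_to_float(bits)))
    # An all-ones exponent with a zero fraction is Infinity, not NaN.
    print(f"{0x7F800000:08x}", is_nan_bits(0x7F800000))
```

A tool that wanted to preserve a provider's specific NaN flavor would need to compare raw bit patterns as above, since decoding to a float collapses them all into "NaN".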

@jordanpadams
Member

@al-niessner per your comment here, hopefully the discussion above resolves which is the right answer?

@rsjoyner

My recollection is that “NaN” and “Inf” and other such identifiers are NOT legal in either ASCII or binary data.
The God of PDS3 and PDS4 created a class of Special Constants where NaN / Inf / + each have to be represented by a number (of the same data type).

For example, the number “666.66” in the string of bits file could be defined to be “error_constant”. So, each time “666.66” is located by s/w, the s/w knows that it represents “NaN” or ???.

When identifying a specific number to represent one of the <special_constants>, the data supplier obviously must choose a number that is guaranteed NOT to be present in the “normal” range of numbers.

Does this make sense?

<xs:complexType name="Field_Binary">
  <xs:annotation>
    <xs:documentation>The Field_Binary class defines a field of a
    binary record or a field of a binary group.</xs:documentation>
  </xs:annotation>
  <xs:complexContent>
    <xs:extension base="pds:Field">
      <xs:sequence>
        <xs:element name="field_location" type="pds:field_location" minOccurs="1" maxOccurs="1"> </xs:element>
        <xs:element name="data_type" type="pds:data_type" minOccurs="1" maxOccurs="1"> </xs:element>
        <xs:element name="field_length" type="pds:field_length" minOccurs="1" maxOccurs="1"> </xs:element>
        <xs:element name="field_format" type="pds:field_format" minOccurs="0" maxOccurs="1"> </xs:element>
        <xs:element name="unit" type="pds:unit" minOccurs="0" maxOccurs="1"> </xs:element>
        <xs:element name="scaling_factor" type="pds:scaling_factor" minOccurs="0" maxOccurs="1"> </xs:element>
        <xs:element name="value_offset" type="pds:value_offset" minOccurs="0" maxOccurs="1"> </xs:element>
        <xs:element name="description" type="pds:description" minOccurs="0" maxOccurs="1"> </xs:element>
        <xs:element name="Special_Constants" type="pds:Special_Constants" minOccurs="0" maxOccurs="1"> </xs:element>
        <xs:element name="Field_Statistics" type="pds:Field_Statistics" minOccurs="0" maxOccurs="1"> </xs:element>
        <xs:element name="Packed_Data_Fields" type="pds:Packed_Data_Fields" minOccurs="0" maxOccurs="1"> </xs:element>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

<xs:complexType name="Special_Constants">
  <xs:annotation>
    <xs:documentation>The Special Constants class provides a set of
    values used to indicate special cases that occur in the
    data.</xs:documentation>
  </xs:annotation>
  <xs:sequence>
    <xs:element name="saturated_constant" type="pds:saturated_constant" minOccurs="0" maxOccurs="1"> </xs:element>
    <xs:element name="missing_constant" type="pds:missing_constant" minOccurs="0" maxOccurs="1"> </xs:element>
    <xs:element name="error_constant" type="pds:error_constant" minOccurs="0" maxOccurs="1"> </xs:element>
    <xs:element name="invalid_constant" type="pds:invalid_constant" minOccurs="0" maxOccurs="1"> </xs:element>
    <xs:element name="unknown_constant" type="pds:unknown_constant" minOccurs="0" maxOccurs="1"> </xs:element>
    <xs:element name="not_applicable_constant" type="pds:not_applicable_constant" minOccurs="0" maxOccurs="1"> </xs:element>
    <xs:element name="valid_maximum" type="pds:valid_maximum" minOccurs="0" maxOccurs="1"> </xs:element>
    <xs:element name="high_instrument_saturation" type="pds:high_instrument_saturation" minOccurs="0" maxOccurs="1"> </xs:element>
    <xs:element name="high_representation_saturation" type="pds:high_representation_saturation" minOccurs="0" maxOccurs="1"> </xs:element>
    <xs:element name="valid_minimum" type="pds:valid_minimum" minOccurs="0" maxOccurs="1"> </xs:element>
    <xs:element name="low_instrument_saturation" type="pds:low_instrument_saturation" minOccurs="0" maxOccurs="1"> </xs:element>
    <xs:element name="low_representation_saturation" type="pds:low_representation_saturation" minOccurs="0" maxOccurs="1"> </xs:element>
  </xs:sequence>
</xs:complexType>
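
The special-constants idea described in this comment can be sketched in Python as follows. This is only an illustration of the concept, not how validate or any PDS4 reader implements it: the function `classify` is hypothetical, the dictionary keys reuse element names from the Special_Constants class above, 666.66 is the example value from the comment, and -9999.0 is an invented sentinel.

```python
import math


def classify(value: float, special_constants: dict) -> str:
    """Label a decoded field value (hypothetical helper).

    Returns the name of the matching special constant, "non_finite" for
    NaN or +/-Inf, or "valid" for an ordinary number.
    """
    for name, constant in special_constants.items():
        if value == constant:
            return name
    if not math.isfinite(value):
        return "non_finite"
    return "valid"


# Sentinels as they might be declared in a label (values are assumptions).
special = {"error_constant": 666.66, "missing_constant": -9999.0}

if __name__ == "__main__":
    for v in (666.66, -9999.0, 1.5, math.nan):
        print(v, classify(v, special))
```

One caveat with this design: the comparison must be done in the field's own data type. A sentinel such as 666.66 stored as a 4-byte single decodes to a slightly different double, so a real reader would need to round the label value to single precision before comparing.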
