Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

validate does not check Header of a File_Area_Ancillary nor does not provide a meaningful error message for an incorrect Table_Character offset #361

Closed
mcostasitja opened this issue Jun 16, 2021 · 33 comments · Fixed by #393
Assignees
Labels
B12.0 bug Something isn't working s.medium

Comments

@mcostasitja
Copy link

🐛 Describe the bug

When fine tuning the generation of the label for a table file, I first struggled to debug my label because the error message I was obtaining was not pointing me towards the error. If the Table_Character offset attribute is incorrect validate interprets that there is only one record of length 1 (see attached validate_offset.log file).

Ultimately the error was that the offset was 1 byte off.

While debugging this error, I played around with the length of the Header attribute of the Table_Character and I found out that the object_length of the header is not checked and if an incorrect value (in my case it should correspond to the Table_Character offset minus one byte) is provided, the label validates.

📜 To Reproduce

The original tab file is attached (maven_orb_rec_210101_210401_v1.orb) along with the correct label (maven_orb_rec_210101_210401_v1.xml), a label with an incorrect header length that validates (maven_orb_rec_210101_210401_v1.xml.header) and a label with the incorrect offset that fails to validate (maven_orb_rec_210101_210401_v1.xml.offset).

The command used is:
$validate --local -t maven_orb_rec_210101_210401_v1.xml

🕵️ Expected behavior

I would expect an error message if the header length is incorrect.
I would expect a more meaningful error message if the offset is incorrect.

📚 Version of Software Used

Version 2.0.3

🩺 Test Data / Additional context

Test data is attached. Note that .txt has been added to the non .log extensions.

maven_orb_rec_210101_210401_v1.orb.txt
maven_orb_rec_210101_210401_v1.xml.header.txt
maven_orb_rec_210101_210401_v1.xml.offset.txt
maven_orb_rec_210101_210401_v1.xml.txt
validate_offset.log

🖥 System Info

  • OS: macOS Catalina 10.15.7

🦄 Applicable requirements**

@mcostasitja mcostasitja added bug Something isn't working needs:triage labels Jun 16, 2021
@mcostasitja mcostasitja changed the title validate does not check Header of a File_Area_Ancillary nor does not provide a meaningul error message for an incorrect Table_Character offset validate does not check Header of a File_Area_Ancillary nor does not provide a meaningful error message for an incorrect Table_Character offset Jun 16, 2021
@msbentley
Copy link

Hey @mcostasitja how would validate know the expected length of the header? I would hope it checks for overlapping byte offset+object length, but there could be unlabelled "stuff" between any data objects in the file. Just to understand the problem ;)

@jordanpadams
Copy link
Member

@mcostasitja to follow-on to @msbentley's comment, validate only knows what the label tells it, and from there, there are a bunch of bits in the data file that it tries to read. if I am understanding the issue correctly, the checks for all the possible reasons why a field does not line-up with the definition can almost be infinite.

should this maybe be a documentation update for common debugging steps when content validation fails?

@msbentley
Copy link

Yeah, I think sometimes the errors can be a bit cryptic - and byte offsets are a pain to track down - especially in binary formats like CDF where it can be really hard to work out what's going on. A document update, or even a wiki, or something would be really nice to help people figure out how to go from an error message to a solution :)

@mcostasitja
Copy link
Author

I agree with @msbentley comment.

@msbentley, @jordanpadams in this particular case, the label has all the information relevant for the header to validate:

offset
object_length
parsing_standard_id

I understand that there might be items in between the header and the table itself that do not belong to the header nor to the table, I had not thought that this is allowed: to me anything before the table should be considered a header. If this is not the case it would be nice to have it documented indeed.

WRT the table offset error, as @msbentley this was an easy catch :) but a more meaningful message would help. Maybe validate could try to "correct" an offset, based on the number of records, the records length and the record delimiters and then provide a suggestion of what is going on?

@jordanpadams
Copy link
Member

@mcostasitja per the header comment, I 100% agree, i don't believe this is explicitly noted as being allowed in PDS4, but it is also not explicitly noted that you cannot. that should be brought to the DDWG to update the DPH / SR to denote that, or add more strictness here (which would be preferred by validate).

per the improved error messaging, and poking around for possibilities, we will take that into consideration. we may be revamping our reporting output a bit with more information, and something like "debugging help" may be something we can look into flagging.

@jordanpadams jordanpadams removed their assignment Jun 16, 2021
@jordanpadams jordanpadams added enhancement New feature or request and removed bug Something isn't working labels Jun 16, 2021
@mcostasitja
Copy link
Author

I have just created an issue at the PDS JIRA https://pds-jira.jpl.nasa.gov/browse/CCB-332

@msbentley
Copy link

I'm still not fully appreciating what is required or desired here?

If we are required to label every bit and byte in a file then I think this either excludes some external formats where the tables and arrays are located throughout the file, or we have to mis-use the Header class to label every "thing" between the data. I'm not sure this is useful to the end user, although I appreciate that it would make validation easier...

@mcostasitja
Copy link
Author

@msbentley I believe that if one includes the header in the label, the information in the label should be correct. Whatever "correct" means. And if there is no way to ensure that, why then provide the header information after all?

I can perfectly understand that there is no need to specify a header at all, since all the relevant information that the user needs can be found in other fields in the label.

@msbentley
Copy link

@mcostasitja it's true that the header object is often not useful - however, it does provide one useful piece of information, and that is the parsing standard, which currently is the only way to let user software know that the described data object may be accessible via another standard (e.g. FITS, CDF, GeoTIFF etc.). I consider it "best practise" to label a "true" header in a PDS4 product, but I don't believe there is a requirement to do it.

Sorry for banging on about this, but I just want to make sure we don't head in a direction that would preclude external formats. If @jordanpadams can find a way to get validate to actually parse/validate the content of a header, then fine - but I would imagine this is too much effort for something which is skipped/invisible to PDS4 :)

@jordanpadams
Copy link
Member

+1 @msbentley @mcostasitja . we will see how we can improve this through at least documentation. but agreed that with all the flexibility in the IM with regard to headers and "hidden space" within data products, it is just a hard thing for validate to check accurately. Ideally, PDS4 would require you to describe every bit in the data file, and then validate could easily check all of that (it already does).

@radiosci
Copy link

radiosci commented Jun 23, 2021

Marc directed me to GitHub for additional comments after we exchanged messages on JIRA. I had not realized this was a problem until CCB-332 appeared on the DDWG agenda.
I agree with what I see here. The Header object is available to anyone who wants to provide information to a non-PDS system (subject to the limits of parsing_standard_id). Validate is only obligated to check that the Header does not overlap other objects in the file. It should throw an error if there is overlap; otherwise, it should move on. As Jordan says, to do more opens an infinite number of possible journeys ... and for what purpose?

@mcostasitja
Copy link
Author

I agree with your comment, but I would still like to point out that the error that Validate provided for the Table_Character offset error (with header overlap) could be more meaningful.

@radiosci
Copy link

I would still like to point out that the error that Validate provided for the Table_Character offset error (with header overlap) could be more meaningful.
I agree.

@mcostasitja
Copy link
Author

Following discussion during the DDWG meeting on Aug 5, 2021, it was concluded that the appropriate check by validate tool would be to ensure no overlaps exists in between classes, and if so, report that overlap. Although this behavior is seen in other classes it does not seem to apply to the header class.

@jordanpadams can you confirm this is an appropriate suggestion?

@msbentley
Copy link

If validate is not already checking that objects defined in the File classes don't overlap, this sounds like a straight-up bug to me ;-)

@jordanpadams
Copy link
Member

@mcostasitja looks like this is a bug in validate somewhere where it does not appear to be validating the content of these file. will add to the queue.

@jordanpadams jordanpadams added B12.0 bug Something isn't working and removed icebox enhancement New feature or request labels Aug 7, 2021
@qchaupds
Copy link
Contributor

qchaupds commented Aug 10, 2021

@mcostasitja I am not able to reproduce the offset error you had observed. Validate reports 'PASS' for status.

{pds-dev3.jpl.nasa.gov}/home/qchau/sandbox/validate 137 % validate -t src/test/resources/github361/maven_orb_rec_210101_210401_v1.offset.xml

PDS Validate Tool Report

Configuration:
   Version                       2.1.0-SNAPSHOT
   Date                          2021-08-10T20:58:56Z

Parameters:
   Targets                       [file:/home/qchau/sandbox/validate/src/test/resources/github361/maven_orb_rec_210101_210401_v1.offset.xml]
   Severity Level                WARNING
   Recurse Directories           true
   File Filters Used             [*.xml, *.XML]
   Data Content Validation       on
   Product Level Validation      on
   Max Errors                    100000
   Registered Contexts File      src/main/resources/util/registered_context_products.json



Product Level Validation Results

  PASS: file:/home/qchau/sandbox/validate/src/test/resources/github361/maven_orb_rec_210101_210401_v1.offset.xml
        1 product validation(s) completed
[main] INFO gov.nasa.pds.tools.validate.task.ValidationTask - Validation complete for location 'file:/home/qchau/sandbox/validate/src/test/resources/github361/maven_orb_rec_210101_210401_v1.offset.xml'

Summary:

  0 error(s)
  0 warning(s)

  Product Validation Summary:
    1          product(s) passed
    0          product(s) failed
    0          product(s) skipped

  Referential Integrity Check Summary:
    0          check(s) passed
    0          check(s) failed
    0          check(s) skipped


End of Report
Completed execution in 5643 ms

{pds-dev3.jpl.nasa.gov}/home/qchau/sandbox/validate/src/test/resources/github361 126 % diff maven_orb_rec_210101_210401_v1.header.xml maven_orb_rec_210101_210401_v1.offset.xml
70c70
<       <object_length unit="byte">1</object_length>
---
>       <object_length unit="byte">267</object_length>
75c75
<       <offset unit="byte">268</offset>
---
>       <offset unit="byte">267</offset>

The .header file did not produced any errors (which from the ticket, it should?). Will take a deeper dive of the reading of the header.

{pds-dev3.jpl.nasa.gov}/home/qchau/sandbox/validate 138 % validate -t src/test/resources/github361/maven_orb_rec_210101_210401_v1.header.xml

PDS Validate Tool Report

Configuration:
   Version                       2.1.0-SNAPSHOT
   Date                          2021-08-10T21:02:20Z

Parameters:
   Targets                       [file:/home/qchau/sandbox/validate/src/test/resources/github361/maven_orb_rec_210101_210401_v1.header.xml]
   Severity Level                WARNING
   Recurse Directories           true
   File Filters Used             [*.xml, *.XML]
   Data Content Validation       on
   Product Level Validation      on
   Max Errors                    100000
   Registered Contexts File      src/main/resources/util/registered_context_products.json



Product Level Validation Results

  PASS: file:/home/qchau/sandbox/validate/src/test/resources/github361/maven_orb_rec_210101_210401_v1.header.xml
        1 product validation(s) completed
[main] INFO gov.nasa.pds.tools.validate.task.ValidationTask - Validation complete for location 'file:/home/qchau/sandbox/validate/src/test/resources/github361/maven_orb_rec_210101_210401_v1.header.xml'

Summary:

  0 error(s)
  0 warning(s)

  Product Validation Summary:
    1          product(s) passed
    0          product(s) failed
    0          product(s) skipped

  Referential Integrity Check Summary:
    0          check(s) passed
    0          check(s) failed
    0          check(s) skipped


End of Report
Completed execution in 5216 ms

{pds-dev3.jpl.nasa.gov}/home/qchau/sandbox/validate 139 % 

@mcostasitja
Copy link
Author

mcostasitja commented Aug 10, 2021

@qchaupds the file that should fail is maven_orb_rec_210101_210401_v1.xml.offset, but please note that I was using validate v2.0.3

I tried to replicate the error but I can't either, shame on me... but the purpose of the issue still applies because I I change the Table_Character offset bytes there is no header <-> Table_Character overlap error being reported.

@jordanpadams
Copy link
Member

@mcostasitja should that offset one fail? object_length and offset should match. so in your file,

    <Header>
      <name>Orbit number header</name>
      <offset unit="byte">0</offset>
      <object_length unit="byte">267</object_length>

and

    <Table_Character>
      <name>Orbit number table</name>
      <offset unit="byte">267</offset>

which is correct. the offset is saying, "ignore this many bytes", which in this case we are saying "ignore the first 267 bytes"

@jordanpadams
Copy link
Member

the *.header.xml file was a bit of a concern for me because the offset is off by one:

    <Header>
      <name>Orbit number header</name>
      <offset unit="byte">0</offset>
      <object_length unit="byte">1</object_length>
      <parsing_standard_id>7-Bit ASCII Text</parsing_standard_id>
    </Header>
    <Table_Character>
      <name>Orbit number table</name>
      <offset unit="byte">268</offset>

so I am curious as to why that extra byte doesn't throw everything else off? @qchaupds can we verify we are checking each field / record length for Ancillary products? seems odd that that 1 byte doesn't throw the whole thing off

@qchaupds
Copy link
Contributor

Both of these tests are now catching the errors:

% validate -R pds4.label --skip-context-reference-check -r report_github361_label_invalid_mini_offset.json -s json -t src/test/resources/github361/maven_orb_rec_210101_210401_v1.minimal.offset.xml >& t11

{pds-dev3.jpl.nasa.gov}/home/qchau/sandbox/validate 156 % egrep "label|message" report_github361_label_invalid_mini_offset.json | head
"ruleType": "pds4.label",
"label": "file:/home/qchau/sandbox/validate/src/test/resources/github361/maven_orb_rec_210101_210401_v1.minimal.offset.xml",
"messages": [],
"messages": [
"message": "The Header object_length 267 plus the Header offset 0 should be the same as the the table offset 333"

% validate -R pds4.label --skip-context-reference-check -r report_github361_label_invalid_mini_header.json -s json -t src/test/resources/github361/maven_orb_rec_210101_210401_v1.minimal.header.xml >& t12

{pds-dev3.jpl.nasa.gov}/home/qchau/sandbox/validate 158 % egrep "label|message" report_github361_label_invalid_mini_header.json | head
"ruleType": "pds4.label",
"label": "file:/home/qchau/sandbox/validate/src/test/resources/github361/maven_orb_rec_210101_210401_v1.minimal.header.xml",
"messages": [],
"messages": [
"message": "The Header object_length 1 plus the Header offset 0 should be the same as the the table offset 268"
"message": "This field overlaps the next field. Current field ends at 6. Next field starts at 2 but should be at least at 7"

@qchaupds
Copy link
Contributor

Printing the offset and object_length values in both set of minimal labels:

{pds-dev3.jpl.nasa.gov}/home/qchau/sandbox/validate 163 % egrep "offset|object_length" src/test/resources/github361/maven_orb_rec_210101_210401_v1.minimal.offset.xml
0
<object_length unit="byte">267</object_length>

333

{pds-dev3.jpl.nasa.gov}/home/qchau/sandbox/validate 164 % egrep "offset|object_length" src/test/resources/github361/maven_orb_rec_210101_210401_v1.minimal.header.xml
0
<object_length unit="byte">1</object_length>
268

@jordanpadams
Copy link
Member

jordanpadams commented Aug 11, 2021

@qchaupds

The Header object_length 267 plus the Header offset 0 should be the same as the the table offset 333

That does not have to be true though. The table offset can include "empty space" or space not defined by the header. So you can have a header that is 127 bytes and a table that has an offser of 333. The thing we need to check (which I think we do) is if the object_length is 127 and the table offset is less than 127. That would be where the header and the table overlap, but extra padding is ok

The concern I noted above is, per your example, if label X has table offset is 333 and label Y has table offset 127 and they both refer to the same table file, how is the validation of the rows/columns of the table not failing on one of them? It seems like that should put us somewhere randomly inside the table, which would through all the content validation of the rows/columns off?

@qchaupds
Copy link
Contributor

@jordanpadams You are correct about the check for Header total length versus the table offset. Working on a fix.

@jordanpadams
Copy link
Member

@msbentley @mcostasitja how does this sound for the error message you would expect back:

The table offset 222 for object X is invalid. The previously defined object ends at byte 267.

@msbentley
Copy link

Sounds fine - if name attributes are present for one or both objects, it would be nice to use them, otherwise using a counter I guess.

@mcostasitja
Copy link
Author

@msbentley sounds fine to me.

qchaupds pushed a commit that referenced this issue Aug 26, 2021
…ages when the offset overlap between Header and table data or table fields overlap between each others.

1. Add test resources :src/test/resources/github361
2. Add meaningful error message when fields overlap :src/main/java/gov/nasa/pds/tools/validate/content/table/FieldValueValidator.java
3. Add logic to check for overlap of header total length with table offset, and add meaningful error message when offset is incorrect :src/main/java/gov/nasa/pds/tools/validate/rule/pds4/TableDataContentValidationRule.java
4. Modify logic to read Table_Character tags from all possible main tags not just Product_Observational :src/main/java/gov/nasa/pds/tools/validate/rule/pds4/TableFieldDefinitionRule.java

Refs:

#361 validate does not check Header of a File_Area_Ancillary nor does not provide a meaningful error message for an incorrect Table_Character offset
@jordanpadams
Copy link
Member

@msbentley #393 shows we can actually pull out the object name, so we include it (if it is included in the label)

jordanpadams added a commit that referenced this issue Aug 27, 2021
Check Header of File_Area_Ancillary and provide meaningful error mess…
@rchenatjpl
Copy link

Holy moly, that was a lot to read through. Is there supposed to be an ERROR if (for File_Area_Ancillary) header/offset + header/object_length > Table_Character/offset? That works. If something else is supposed to happen, please say so.

@rchenatjpl
Copy link

@qchaupds @msbentley @jordanpadams val361.zip

I poked a little more. What if the header overlapped an Array_2D_Image instead of a Table_Character? Validate allows that. If that is bad, let's start a new issue. In the attached, val361.xml is the current issue, and validate does the right thing. val361b.xml is what I described in this comment, and validate incorrectly (I presume) passes it. The Array_2D_Image is nonsense - it just allows an easy diff of the two .xml files.

% validate -t val361.xml
PDS Validate Tool Report
Configuration:
Version 2.2.0-SNAPSHOT
Date 2021-10-26T07:32:29Z
Parameters:
Targets [file:/Users/rchen/Desktop/test/val361.xml]
Severity Level WARNING
Recurse Directories true
File Filters Used [*.xml, .XML]
Data Content Validation on
Product Level Validation on
Max Errors 100000
Registered Contexts File /Users/rchen/PDS4tools/validate/resources/registered_context_products.json
Product Level Validation Results
FAIL: file:/Users/rchen/Desktop/test/val361.xml
Begin Content Validation: file:/Users/rchen/Desktop/test/val361.orb
ERROR [error.table.fields_mismatch] table 1: The table offset 268 for object 'Orbit number table' is invalid. The previously defined object ends at byte 99999
End Content Validation: file:/Users/rchen/Desktop/test/val361.orb
1 product validation(s) completed
Summary:
1 error(s)
0 warning(s)
Product Validation Summary:
0 product(s) passed
1 product(s) failed
0 product(s) skipped
Referential Integrity Check Summary:
0 check(s) passed
0 check(s) failed
0 check(s) skipped
Message Types:
1 error.table.fields_mismatch
End of Report
Completed execution in 3864 ms
%
%
% validate -t val361b.xml
PDS Validate Tool Report
Configuration:
Version 2.2.0-SNAPSHOT
Date 2021-10-26T07:32:11Z
Parameters:
Targets [file:/Users/rchen/Desktop/test/val361b.xml]
Severity Level WARNING
Recurse Directories true
File Filters Used [
.xml, *.XML]
Data Content Validation on
Product Level Validation on
Max Errors 100000
Registered Contexts File /Users/rchen/PDS4tools/validate/resources/registered_context_products.json
Product Level Validation Results
PASS: file:/Users/rchen/Desktop/test/val361b.xml
1 product validation(s) completed
Summary:
0 error(s)
0 warning(s)
Product Validation Summary:
1 product(s) passed
0 product(s) failed
0 product(s) skipped
Referential Integrity Check Summary:
0 check(s) passed
0 check(s) failed
0 check(s) skipped
End of Report
Completed execution in 3889 ms

@msbentley
Copy link

Agreed - no two objects should overlap, it doesn't matter if they are arrays, tables or headers. On the other hand, there may be space in between them. I think that's it!

@jordanpadams
Copy link
Member

thanks @rchenatjpl @msbentley . unfortunately, we merged the PR associated with this ticket, but probably should have created a separate ticket since it was only partially implemented for Table_*. we have a follow-on ticket at #425 to refactor the algorithm

@radiosci
Copy link

Mark is 100 percent right.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
B12.0 bug Something isn't working s.medium
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants