-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
validate does not check Header of a File_Area_Ancillary nor does not provide a meaningful error message for an incorrect Table_Character offset #361
Comments
Hey @mcostasitja how would validate know the expected length of the header? I would hope it checks for overlapping byte offset+object length, but there could be unlabelled "stuff" between any data objects in the file. Just to understand the problem ;) |
@mcostasitja to follow-on to @msbentley's comment, validate only knows what the label tells it, and from there, there are a bunch of bits in the data file that it tries to read. if I am understanding the issue correctly, the checks for all the possible reasons why a field does not line-up with the definition can almost be infinite. should this maybe be a documentation update for common debugging steps when content validation fails? |
Yeah, I think sometimes the errors can be a bit cryptic - and byte offsets are a pain to track down - especially in binary formats like CDF where it can be really hard to work out what's going on. A document update, or even a wiki, or something would be really nice to help people figure out how to go from an error message to a solution :) |
I agree with @msbentley comment. @msbentley, @jordanpadams in this particular case, the label has all the information relevant for the header to validate: offset I understand that there might be items in between the header and the table itself that do not belong to the header nor to the table, I had not thought that this is allowed: to me anything before the table should be considered a header. If this is not the case it would be nice to have it documented indeed. WRT the table offset error, as @msbentley this was an easy catch :) but a more meaningful message would help. Maybe validate could try to "correct" an offset, based on the number of records, the records length and the record delimiters and then provide a suggestion of what is going on? |
@mcostasitja per the header comment, I 100% agree, i don't believe this is explicitly noted as being allowed in PDS4, but it is also not explicitly noted that you cannot. that should be brought to the DDWG to update the DPH / SR to denote that, or add more strictness here (which would be preferred by validate). per the improved error messaging, and poking around for possibilities, we will take that into consideration. we may be revamping our reporting output a bit with more information, and something like "debugging help" may be something we can look into flagging. |
I have just created an issue at the PDS JIRA https://pds-jira.jpl.nasa.gov/browse/CCB-332 |
I'm still not fully appreciating what is required or desired here? If we are required to label every bit and byte in a file then I think this either excludes some external formats where the tables and arrays are located throughout the file, or we have to mis-use the |
@msbentley I believe that if one includes the header in the label, the information in the label should be correct. Whatever "correct" means. And if there is no way to ensure that, why then provide the header information after all? I can perfectly understand that there is no need to specify a header at all, since all the relevant information that the user needs can be found in other fields in the label. |
@mcostasitja it's true that the header object is often not useful - however, it does provide one useful piece of information, and that is the parsing standard, which currently is the only way to let user software know that the described data object may be accessible via another standard (e.g. FITS, CDF, GeoTIFF etc.). I consider it "best practise" to label a "true" header in a PDS4 product, but I don't believe there is a requirement to do it. Sorry for banging on about this, but I just want to make sure we don't head in a direction that would preclude external formats. If @jordanpadams can find a way to get validate to actually parse/validate the content of a header, then fine - but I would imagine this is too much effort for something which is skipped/invisible to PDS4 :) |
+1 @msbentley @mcostasitja . we will see how we can improve this through at least documentation. but agreed that with all the flexibility in the IM with regard to headers and "hidden space" within data products, it is just a hard thing for validate to check accurately. Ideally, PDS4 would require you to describe every bit in the data file, and then validate could easily check all of that (it already does). |
Marc directed me to GitHub for additional comments after we exchanged messages on JIRA. I had not realized this was a problem until CCB-332 appeared on the DDWG agenda. |
I agree with your comment, but I would still like to point out that the error that Validate provided for the Table_Character offset error (with header overlap) could be more meaningful. |
|
Following discussion during the DDWG meeting on Aug 5, 2021, it was concluded that the appropriate check by validate tool would be to ensure no overlaps exists in between classes, and if so, report that overlap. Although this behavior is seen in other classes it does not seem to apply to the header class. @jordanpadams can you confirm this is an appropriate suggestion? |
If validate is not already checking that objects defined in the File classes don't overlap, this sounds like a straight-up bug to me ;-) |
@mcostasitja looks like this is a bug in validate somewhere where it does not appear to be validating the content of these file. will add to the queue. |
@mcostasitja I am not able to reproduce the offset error you had observed. Validate reports 'PASS' for status.
The .header file did not produced any errors (which from the ticket, it should?). Will take a deeper dive of the reading of the header.
|
@qchaupds the file that should fail is maven_orb_rec_210101_210401_v1.xml.offset, but please note that I was using validate v2.0.3 I tried to replicate the error but I can't either, shame on me... but the purpose of the issue still applies because I I change the Table_Character offset bytes there is no header <-> Table_Character overlap error being reported. |
@mcostasitja should that offset one fail? <Header>
<name>Orbit number header</name>
<offset unit="byte">0</offset>
<object_length unit="byte">267</object_length> and <Table_Character>
<name>Orbit number table</name>
<offset unit="byte">267</offset> which is correct. the offset is saying, "ignore this many bytes", which in this case we are saying "ignore the first 267 bytes" |
the
so I am curious as to why that extra byte doesn't throw everything else off? @qchaupds can we verify we are checking each field / record length for Ancillary products? seems odd that that 1 byte doesn't throw the whole thing off |
Both of these tests are now catching the errors: % validate -R pds4.label --skip-context-reference-check -r report_github361_label_invalid_mini_offset.json -s json -t src/test/resources/github361/maven_orb_rec_210101_210401_v1.minimal.offset.xml >& t11 {pds-dev3.jpl.nasa.gov}/home/qchau/sandbox/validate 156 % egrep "label|message" report_github361_label_invalid_mini_offset.json | head % validate -R pds4.label --skip-context-reference-check -r report_github361_label_invalid_mini_header.json -s json -t src/test/resources/github361/maven_orb_rec_210101_210401_v1.minimal.header.xml >& t12 {pds-dev3.jpl.nasa.gov}/home/qchau/sandbox/validate 158 % egrep "label|message" report_github361_label_invalid_mini_header.json | head |
Printing the offset and object_length values in both set of minimal labels: {pds-dev3.jpl.nasa.gov}/home/qchau/sandbox/validate 163 % egrep "offset|object_length" src/test/resources/github361/maven_orb_rec_210101_210401_v1.minimal.offset.xml {pds-dev3.jpl.nasa.gov}/home/qchau/sandbox/validate 164 % egrep "offset|object_length" src/test/resources/github361/maven_orb_rec_210101_210401_v1.minimal.header.xml |
That does not have to be true though. The table offset can include "empty space" or space not defined by the header. So you can have a header that is 127 bytes and a table that has an offser of 333. The thing we need to check (which I think we do) is if the object_length is 127 and the table offset is less than 127. That would be where the header and the table overlap, but extra padding is ok The concern I noted above is, per your example, if label X has table offset is 333 and label Y has table offset 127 and they both refer to the same table file, how is the validation of the rows/columns of the table not failing on one of them? It seems like that should put us somewhere randomly inside the table, which would through all the content validation of the rows/columns off? |
@jordanpadams You are correct about the check for Header total length versus the table offset. Working on a fix. |
@msbentley @mcostasitja how does this sound for the error message you would expect back:
|
Sounds fine - if |
@msbentley sounds fine to me. |
…ages when the offset overlap between Header and table data or table fields overlap between each others. 1. Add test resources :src/test/resources/github361 2. Add meaningful error message when fields overlap :src/main/java/gov/nasa/pds/tools/validate/content/table/FieldValueValidator.java 3. Add logic to check for overlap of header total length with table offset, and add meaningful error message when offset is incorrect :src/main/java/gov/nasa/pds/tools/validate/rule/pds4/TableDataContentValidationRule.java 4. Modify logic to read Table_Character tags from all possible main tags not just Product_Observational :src/main/java/gov/nasa/pds/tools/validate/rule/pds4/TableFieldDefinitionRule.java Refs: #361 validate does not check Header of a File_Area_Ancillary nor does not provide a meaningful error message for an incorrect Table_Character offset
@msbentley #393 shows we can actually pull out the object name, so we include it (if it is included in the label) |
Check Header of File_Area_Ancillary and provide meaningful error mess…
Holy moly, that was a lot to read through. Is there supposed to be an ERROR if (for File_Area_Ancillary) header/offset + header/object_length > Table_Character/offset? That works. If something else is supposed to happen, please say so. |
@qchaupds @msbentley @jordanpadams val361.zip I poked a little more. What if the header overlapped an Array_2D_Image instead of a Table_Character? Validate allows that. If that is bad, let's start a new issue. In the attached, val361.xml is the current issue, and validate does the right thing. val361b.xml is what I described in this comment, and validate incorrectly (I presume) passes it. The Array_2D_Image is nonsense - it just allows an easy diff of the two .xml files. % validate -t val361.xml |
Agreed - no two objects should overlap, it doesn't matter if they are arrays, tables or headers. On the other hand, there may be space in between them. I think that's it! |
thanks @rchenatjpl @msbentley . unfortunately, we merged the PR associated with this ticket, but probably should have created a separate ticket since it was only partially implemented for |
Mark is 100 percent right. |
🐛 Describe the bug
When fine tuning the generation of the label for a table file, I first struggled to debug my label because the error message I was obtaining was not pointing me towards the error. If the Table_Character offset attribute is incorrect validate interprets that there is only one record of length 1 (see attached validate_offset.log file).
Ultimately the error was that the offset was 1 byte off.
While debugging this error, I played around with the length of the Header attribute of the Table_Character and I found out that the object_length of the header is not checked and if an incorrect value (in my case it should correspond to the Table_Character offset minus one byte) is provided, the label validates.
📜 To Reproduce
The original tab file is attached (maven_orb_rec_210101_210401_v1.orb) along with the correct label (maven_orb_rec_210101_210401_v1.xml), a label with an incorrect header length that validates (maven_orb_rec_210101_210401_v1.xml.header) and a label with the incorrect offset that fails to validate (maven_orb_rec_210101_210401_v1.xml.offset).
The command used is:
$validate --local -t maven_orb_rec_210101_210401_v1.xml
🕵️ Expected behavior
I would expect an error message if the header length is incorrect.
I would expect a more meaningful error message if the offset is incorrect.
📚 Version of Software Used
Version 2.0.3
🩺 Test Data / Additional context
Test data is attached. Note that .txt has been added to the non .log extensions.
maven_orb_rec_210101_210401_v1.orb.txt
maven_orb_rec_210101_210401_v1.xml.header.txt
maven_orb_rec_210101_210401_v1.xml.offset.txt
maven_orb_rec_210101_210401_v1.xml.txt
validate_offset.log
🖥 System Info
🦄 Applicable requirements**
The text was updated successfully, but these errors were encountered: