Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

validate does not correctly diagnose errors in the record_length value for fixed width Table_Character objects #680

Closed
jmafi opened this issue Aug 16, 2023 · 3 comments · Fixed by #688
Assignees

Comments

@jmafi
Copy link

jmafi commented Aug 16, 2023

Checked for duplicates

Yes - I've already checked

🐛 Describe the bug

For Table_Character labels containing a single error of an incorrect value for:

File_Area_Observational/Table_Character/Record_Character/record_length

Validate reports a large number of different errors, including multiple instances of records not ending in the indicated record delimiter and data values not matching the data type indicated in the label.

🕵️ Expected behavior

While the reported errors are technically true if Validate is reading the number of bytes reported in the record_length, they are not reporting the real problem. It would be better if Validate reported the actual error of the incorrect record_length.

📜 To Reproduce

  1. Take a valid PDS4 label containing a Table_Character object, and change the Record_Length value.
  2. Run Validate tool on the modified label and compare the validation report to that of the original label.

...
I've attached a zip file containing a sample data file with two PDS4 labels. ORB12_EUR_EPHIO_reclen96.xml is the original and has the correct record_length value of "96". The other, ORB12_EUR_EPHIO_reclen95.xml, has the record_length value changed to "95". The error report produced by Validate Tool is also included.

ORB12_EUR_EPHIO.zip

🖥 Environment Info

  • Validate version: 3.2.0
  • Operating System: Linux (Ubuntu 20)
    ...

📚 Version of Software Used

V.3.2.0 (2023-04-14 00:53:23)

🩺 Test Data / Additional context

No response

🦄 Related requirements

🦄 #xyz

⚙️ Engineering Details

No response

@jmafi jmafi added bug Something isn't working needs:triage labels Aug 16, 2023
@jordanpadams
Copy link
Member

@jmafi This is a little bit of dicey issue because we have to key off of something as we read the data in. We can't key off of the record_delimiter because you can actually have CRLF or LF within a character field, so we have to use record_length.

We could try to read the table again and try to figure out a better error, but I'm not sure that logic will be any better.

Would updating the error message to say check record length and record delimiter are valid sufficient?

@jmafi
Copy link
Author

jmafi commented Aug 17, 2023

@jordanpadams Your concern about scanning for instances of the record_delimiter is understandable. Another consideration for any potential solution is the impact that it would have on performance. A possible alternative approach might be do perform a check similar to what's done for binary records: sum up the field_location and field_length of all of the fields in the record (maybe just the last field would do), add the length of the field delimiter and compare that to the record_length. However, this approach might not work if there are "spare" bytes at the end of the record. Updating the missing record delimiter error message to include the additional warning would be helpful if none of the other options prove feasible.

@jordanpadams
Copy link
Member

copy. thanks @jmafi . we will investigate, but I think the error message update may be the best we are going to be able to do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants