Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

issue #593: fix regression introduced to #325 integration tests #594

Merged
merged 6 commits into from
Feb 14, 2023

Conversation

al-niessner
Copy link
Contributor

@al-niessner al-niessner commented Feb 8, 2023

🗒️ Summary

Many part fix. In this repository, change the table validator to push character tables to the correct test. However, it requires the changes waiting in NASA-PDS/pds4-jparser#86.

⚙️ Test Data and/or Report

Test github325 is enabled and should pass as well as all other unit tests.

♻️ Related Issues

fixes #593

@al-niessner al-niessner self-assigned this Feb 8, 2023
@al-niessner al-niessner requested a review from a team as a code owner February 8, 2023 23:07
@al-niessner
Copy link
Contributor Author

@jordanpadams @nutjob4life @tloubrieu-jpl

Can we update all pds4-jarser again and then let these unit tests run. I want to see if my changes break something else. Looks good on my laptop but it does not always translate to jenkins.

@jordanpadams
Copy link
Member

@al-niessner merged pds4-jparser and re-ran the integration tests

@jordanpadams
Copy link
Member

jordanpadams commented Feb 9, 2023

@al-niessner also can you update the PR description to use the GitHub keywords to close the ticket when we merge this (e.g. fixes #593). Just one less step for us to track down the ticket and close it out

@jordanpadams
Copy link
Member

@al-niessner looks like the tests failed with the latest pds4-jparser

@jordanpadams
Copy link
Member

@al-niessner
Copy link
Contributor Author

@jordanpadams

Yes, I got the email that it failed but it was another test. Thought that might be the case because that is what happened earlier in previous ticket that broke 325. Will see why it breaks and keep going until it all turns green. Thanks.

Also, I know about the keywords but thought you did not want to use them for some reason. I can use them easy enough.

@jordanpadams
Copy link
Member

Also, I know about the keywords but thought you did not want to use them for some reason. I can use them easy enough.

my bad. thanks!

@jordanpadams jordanpadams changed the title issue 593: return github325 to service issue #593: fix regression introduced to #325 integration tests Feb 9, 2023
@al-niessner
Copy link
Contributor Author

@jordanpadams @nutjob4life

Okay, I do not understand where the error is here (code, test, or concept). I will write it out and hopefully never post it because I figured it out along the way. The new table validation does a better job?? and this error now seems to be doing what it is supposed to do but not what it did. So what is right? I will say that it is because now reading based on a fixed length rather than delimiter and given the table is character that seems correct as well.

There is an error in the table that fields overlap (#379). In the old version, got this error and only this error:

      ERROR  [error.table.field_value_overlap]   data object 1, record 1, field 42: This field overlaps the next field. Current field ends at 347. Next field starts at 347 but should be at least at 348

However this is back when the character table was being processed as a delimited table. Either way of processing the table ends up here:

In the delimited case, it breaks out of row processing immediately, does not tell the user that it did not skip rows, and moves onward with that single error.

In the character case, it generates these errors:

      ERROR  [error.table.field_value_overlap]   data object 1, record 1, field 42: This field overlaps the next field. Current field ends at 347. Next field starts at 346 but should be at least at 348
      ERROR  [error.table.field_value_overlap]   data object 1, record 1, field 42: This field overlaps the next field. Current field ends at 347. Next field starts at 347 but should be at least at 348
      ERROR  [error.table.records_mismatch]   data object 1: Number of records read is not equal to the defined number of records in the label (expected 583, got 1).

The first error is the same, thank goodness since the code is identical. The second error is new for reasons that I cannot explain since the error is in the middle of the record (741 bytes) but maybe for the same reason as the third error and I have not discovered it yet: delimiter table and character table do not check all of the same stuff. The third error occurs back here:

addTableProblem(ExceptionType.WARNING, ProblemType.RECORDS_MISMATCH, message, dataFile,

There is no similar check anywhere in validateTableDelimitedContent(FieldValueValidator, URL, int, boolean).

While the third error seems to point to updating the validate.features file, the second error implies not so much - mostly because I cannot explain it. When I look at XML for test the overlap is clear and the two messages from character table seems correct. Just not sure why the delimited table is not showing it since it appears to be happening the the shared code.

So, what should be done? The shortest and maybe near-term-correct is to update validate.features for the two new errors but what about the contradictions with delimited table processing? Ignore it? Or is time for the real fix to all use the same table checking but just have different readers (way more work with scary unknown ripples)?

@jordanpadams
Copy link
Member

jordanpadams commented Feb 14, 2023

So, what should be done? The shortest and maybe near-term-correct is to update validate.features for the two new errors but what about the contradictions with delimited table processing? Ignore it? Or is time for the real fix to all use the same table checking but just have different readers (way more work with scary unknown ripples)?

@al-niessner I say we fix the test for now, and see if other issues arise... I agree that the second error may just occur because when the first error gets caught, the reader gets all thrown off and may just throw another? either way, I think we march forward and see if we break anything. once you update the validate.features I will merge this PR.

per the contradictions with delimited processing, are you saying there is no RECORDS_MISMATCH warning anywhere in the handling of delimited tables?

@al-niessner
Copy link
Contributor Author

@jordanpadams

per the contradictions with delimited processing, are you saying there is no RECORDS_MISMATCH warning anywhere in the handling of delimited tables?

Yes.

@al-niessner
Copy link
Contributor Author

@jordanpadams @nutjob4life

All yours to review. Seems that cleared up all of the regression tests.

remove unnecessary import
@@ -470,6 +469,8 @@ private void validateTableCharacterContent(FieldValueValidator fieldValueValidat
ArrayList<Integer> lineLengthsArray = new ArrayList<>(0);
ArrayList<Long> lineNumbersArray = new ArrayList<>(0);

line = tableIsFixedLength ? this.currentTableReader.readNextFixedLine() : this.currentTableReader.readNextLine();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like the class of currentTableReader should provide a single API readNextLine and it itself figures out if it needs to provide a fixed line or a next line rather than relying on clients of the object to do it.

But I know, I know, I sound like a broken record. This is just the way "validate" is architected! 😝

Copy link
Contributor Author

@al-niessner al-niessner Feb 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, no. It is not the reader that thinks it is fixed length. It is the XML definition that thinks it should be then the test code tells the reader to do a fixed length read rather delimited based on 3 values in the XML - touch more messed up than you were giving it credit for.

Your points are correct: Yes, some XML adapter should be asked if it is fixed length. Or, yes, the reader should be more complete and use the XML to do the necessary read (jparser). Yes, nothing will change because "do not fix it if it is not broken".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nutjob4life @al-niessner if you have a proposal for an architectural improvement to the software, feel free to create a ticket for it, and we can see if it becomes a higher priority in the future. I am all good with "do not fix it if it is not broken", until the technical debt becomes too high for us to manage. @al-niessner let me know when you think we have hit that tipping point.

Copy link
Member

@nutjob4life nutjob4life left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@nutjob4life nutjob4life merged commit 9b357f8 into main Feb 14, 2023
@nutjob4life nutjob4life deleted the issue_593 branch February 14, 2023 16:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Regression in validate no longer enabling CRLF to be embedded within a Table_Character record
3 participants