-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
validate doesn't flag a data file with only LF #499
Comments
@rchenatjpl shouldn't this be valid as of CCB-264? #292 |
oh, I forgot about that. I guess the problem is then what should validate do if the data file has only LF, but the label says record_delimiter is CRLF, which was the only value available for the selected IM model |
@rchenatjpl copy. That is a bug then |
@jordanpadams @nutjob4life @tloubrieu-jpl The failure is obscure and needs clarification. It starts here: validate/src/main/java/gov/nasa/pds/tools/validate/rule/pds4/FileReferenceValidationRule.java Lines 429 to 439 in 1979b01
In the XML file the doctype resolves to "". when at line 429 the problemType is retrieved with doctype being "" a null is returned. It is an inherent part of the call at: validate/src/main/java/gov/nasa/pds/tools/util/DocumentUtil.java Lines 117 to 137 in 1979b01
because "" is not part of the initialize() of the class. Have a bunch like EXCEL, HTML, GIF, etc. Not blank so the function returns null. Now at line 432, the null is detected and the generic document test is not performed (not to say that the LF would have been caught but it was never given the chance). So,
Please clarify which is the actual problem (XML and/or code) and what the actual problem is (malformed XML and/or not detecting previous error). |
✅ The 429-439 block of code is wrong and it should always do the generic test despite not having a doctype but I actually think that doctype code was kind of a wild ride that we wound up not really relying on too much. In this case, we should continue to perform validation. |
@al-niessner ☝️ |
@jordanpadams @nutjob4life @tloubrieu-jpl Yum, I do not think this is the right route. I think the XML is not sufficient. Applied patch to have problemType for the doctype = "" and it now sinks checking a little deeper: validate/src/main/java/gov/nasa/pds/tools/validate/rule/pds4/FileReferenceValidationRule.java Lines 838 to 846 in 15c37d4
The problem here is that documentStandardId = null causing the else portion to be execute instead of the actual check at line 839 not that it would matter to this ticket. If you look just down from the highlighted block it is obvious that the only check is the mime type - not too thrilling. Point is, if we are checking a random file then how do we know it should have LFCR instead of LF or CR or just randomly placed bytes of those values because it is a binary file. Without a proper doctype and mime type to tell us that lines should end LFCR and that LF or CR will not appear otherwise in the file then how can a check for them be meaningful? |
@jordanpadams @nutjob4life @tloubrieu-jpl Just ran the cucumber tests and 8 of them fail because they now pass files that do not resolve a doctype. It seems there is a much a larger design problem and less a line of code. |
@al-niessner which tests are failing and for which ticket? I would not be surprised if this was a larger design issue...
that is actually not true. LF and CRLF are defined in the labels:
this mimetype and doctype thing is totally arbitrary. as I noted, it was a bit of a rabbithole we went down to try to do some guessing to check that the document type defined in the label actually makes sense with the file name defined (e.g. label says Excel file and it ends in xls/xlsx). In the end, unless something looks totally out of the ordinary here, we should just ignore this check and keep going. It is only something glaring wrong where this error should ever be thrown (e.g. label says Excel file and it is a PDF). so for this case, I imagine the doctype check doesn't know what to do with a hopefully that makes sense. will try to poke at it some more. |
This is a valid file reference test not a table test. Are you saying all undefined doctype files are text tables? The tests that are failing are HTML, JPEG, GIF, EXCEL, etc (8 of them) that somehow leave doctype empty while specifying enough to be one of those 8 types. The error demonstrates that the block validate/src/main/java/gov/nasa/pds/tools/validate/rule/pds4/FileReferenceValidationRule.java Lines 429 to 439 in 1979b01
throws an error when productType is still null. It is clear that they are positive tests of a fail result. Not sure of the actual tests numbers as JUnit is hiding them very well from me. |
How can you tell that a file should have LFCR or CRLF? If given a JPEG does it need to have CRLF in it or just EOF? In other words, what in the XML is telling that this reference file should end with CRLF instead of LF or a stream of bytes until EOF? |
I think I am understanding it better now. The original error that doctype is "" forcing problemType to null is causing the base error. It then prevents the table checks from taking place when we think they should or that the table checks are not correct. Is that about right? |
this is described in the label and this check is already happening elsewhere in the code. for example, in the test data attached above, the label has:
so this only applies to products where you can define
|
the file reference check happens, which may throw an error or may not, but we should move onto doing the next check. nothing within that class should be a fatal error, stopping the remaining validation checks. |
I was thinking about this the other day - it could, theoretically, be possible that for some horribly esoteric reason a single file contains multiple tables with different line endings, right? :-/ And definitely possible that we have binary content mixed with ASCII, so I guess this check as to be at the data object level? |
Found it. Now there is a table line counting problem. Will fix soon. |
@msbentley yes! our code was refactored a bit to hopefully enable more ready support validation of things like this at the data object level. hopefully that works as expected. |
The attached data file has only LF but of course should have CRLF. validate passes it.
% validate -t ../M7_217_044546_N.xml
PDS Validate Tool Report
Configuration:
Version 2.1.5-SNAPSHOT
Date 2022-05-10T08:01:15Z
Parameters:
Targets [file:/Users/rchen/Desktop/M7_217_044546_N.xml]
Severity Level WARNING
Recurse Directories true
File Filters Used [*.xml, *.XML]
Data Content Validation on
Product Level Validation on
Max Errors 100000
Registered Contexts File /Users/rchen/PDS4tools/validate/resources/registered_context_products.json
Product Level Validation Results
PASS: file:/Users/rchen/Desktop/M7_217_044546_N.xml
1 product validation(s) completed
Summary:
0 error(s)
0 warning(s)
Product Validation Summary:
1 product(s) passed
0 product(s) failed
0 product(s) skipped
Referential Integrity Check Summary:
0 check(s) passed
0 check(s) failed
0 check(s) skipped
End of Report
Completed execution in 3190 ms
%
%
% validate -V
gov.nasa.pds:validate
Version 2.1.5-SNAPSHOT
Release Date: 2022-02-10 21:20:46
The text was updated successfully, but these errors were encountered: