-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issues with logic for reading latest version of collections, and file read logging lost with v2.* of validate #372
Comments
Initial inspection of the run reveals the 2nd collection file: spectra_doc_collection_inventory.xml was ignored by the crawler. Will revamp the crawling for "bundle" and "collection" files to make a decision of which latest "bundle" or "collection" should be processed. A possible logic is to collect all the "collection" files and group similar ones together and then decide which one is the latest of the similar group. A "similar" collection is the ones that share the same logical_identifier but different version_id. The affected code is: src/main/java/gov/nasa/pds/tools/validate/BundleManager.java where all the crawling is taking place. |
A correction to Jordan's description. There are only 7 .xml files: {pds-dev3.jpl.nasa.gov}/home/qchau/sandbox/validate 164 % find src/test/resources/github69/invalid -name "*.xml" | cat -n They are check for validity in "productLevelValidationResults" and "PDS4BundleLevelValidationResults" portions. |
Part 1 status: % validate -R pds4.bundle --skip-content-validation -r report_github372_bundle_invalid.json -s json -t src/test/resources/github372/invalid/bundle_kaguya_derived.xml The warning message warning.file.not_referenced_in_label for data file spectra_doc_collection_inventory.csv is no longer there because the code now recognize the label file spectra_doc_collection_inventory.xml {pds-dev3.jpl.nasa.gov}/home/qchau/sandbox/validate 165 % grep not_referenced report_github372_bundle_invalid.json Added the 2nd label file spectra_doc_collection_inventory.xml and both are recognized by validate. {pds-dev3.jpl.nasa.gov}/home/qchau/sandbox/validate 167 % egrep "label" report_github372_bundle_invalid.json | grep doc_collection |
Part 2 status: {pds-dev3.jpl.nasa.gov}/home/qchau/sandbox/validate 173 % validate -R pds4.bundle --skip-content-validation -r report_github69_bundle_invalid.json -s json -t src/test/resources/github69/invalid/bundle_kaguya_derived.xml >& t10 The label "spectra_doc_collection_inventory.xml" is now recognized by validate. {pds-dev3.jpl.nasa.gov}/home/qchau/sandbox/validate 175 % grep label report_github69_bundle_invalid.json | grep doc_collection The productLevelValidationResults now process all 7 labels: {pds-dev3.jpl.nasa.gov}/home/qchau/sandbox/validate 178 % egrep "Results|label" report_github69_bundle_invalid.json | egrep -v allowUnlabeledFiles | cat -n {pds-dev3.jpl.nasa.gov}/home/qchau/sandbox/validate 180 % find src/test/resources/github69/invalid -name "*.xml" | xargs ls -l | cat -n |
@jordanpadams , Qui is waiting for your feedback on that. |
1. Add extra bundle file to observe crawler ignoring it :src/test/resources/github69/invalid/bundle_kaguya_derived_2.xml \ 2. Add extra collection to observe crawler ignoring it :src/test/resources/github69/invalid/data_spectra/spectra_data_collection_inventory_2.xml \ 3. Add extra collection to observe crawler processing it :src/test/resources/github69/invalid/data_spectra/spectra_doc_collection_inventory_2.xml \ 4. Add logic to ignore certain file and keep certain files based on logical_identifier values :src/main/java/gov/nasa/pds/tools/validate/BundleManager.java 5. Remove unused functions findAllBundleFiles and findAllCollectionFiles() :src/main/java/gov/nasa/pds/tools/validate/BundleManager.java Refs: #372 Issues with logic for reading latest version of collections, and file read logging lost with v2.* of validate
…print "SKIP:" statements if debug is set. 1. Add severity variable to house keeping :src/main/java/gov/nasa/pds/tools/util/FlagsUtil.java 2. Add print statement of "SKIP:" :src/main/java/gov/nasa/pds/tools/validate/BundleManager.java 3. Add call to FlagsUtil.setSeverity() :src/main/java/gov/nasa/pds/validate/ValidateLauncher.java Refs: #372 Issues with logic for reading latest version of collections, and file read logging lost with v2.* of validate
…s skipped in report instead of standard out and correct log level from error to debug 1. Add setReport() so a problem can be added to report :src/main/java/gov/nasa/pds/tools/label/LocationValidator.java \ 2. Correct logging level from error to debug :src/main/java/gov/nasa/pds/tools/util/ReferentialIntegrityUtil.java \ 3. Add reporting of files skipped in report instead of standard out :src/main/java/gov/nasa/pds/tools/validate/BundleManager.java \ 4. Add new type UNREFERENCED_FILE for use when files are skipped :src/main/java/gov/nasa/pds/tools/validate/ProblemType.java \ 5. Add call to setReport() to set report in LocationValidator object :src/main/java/gov/nasa/pds/validate/ValidateLauncher.java \ 6. Add logging capability :src/main/java/gov/nasa/pds/validate/report/Report.java Refs: #372 Issues with logic for reading latest version of collections, and file read logging lost with v2.* of validate
…ction files skipped in report when the target is a directory 1. Add new function containsExistingTarget() to inspect from a list of Target :src/main/java/gov/nasa/pds/tools/validate/BundleManager.java 2. Modify parameter in findOtherCollectionFiles() to receive a list of Target instead of a single URL in case there are more than 1 latest collections :src/main/java/gov/nasa/pds/tools/validate/BundleManager.java 3. Modify variable from otherBundleFilesList to otherCollectionFilesList to match purpose of function :src/main/java/gov/nasa/pds/tools/validate/BundleManager.java 4. Add reporting of collection files skipped in report when the target is a directory: src/main/java/gov/nasa/pds/tools/validate/BundleManager.java Refs: #372 Issues with logic for reading latest version of collections, and file read logging lost with v2.* of validate
Fix bug with crawler ignoring certain collection files
closed per #389 |
@jordanpadams Can you share the data file with i&t? or let me know where I can download src/test/resources/github69/invalid/. Thanks. |
@gxtchen all the test resources that are referenced in tickets can be found here: https://github.com/NASA-PDS/validate/tree/main/src/test/resources |
🐛 Describe the bug
With v2+ release of validate, we are no longer reporting all the files being read by validate. This is most likely a quirk of the improvements made for #238 and #249.
Part 1
📜 To Reproduce
Executing the following with the latest 2.1.0-SNAPSHOT, I get a few quirky results:
I get this WARNING message about an unreferenced file:
This file is referenced by
spectra_doc_collection_inventory.xml
, but I think sincespectra_doc_collection_inventory.xml
is not referenced by the parent bundle, it is never read in and causes this odd output.🕵️ Expected behavior
spectra_doc_collection_inventory.xml
referencesspectra_doc_collection_inventory.csv
, but then we continue on with the logic as-is, which is to ignore performing validation on the actual product.Part 2
📜 To Reproduce
Executing the following with the latest 2.1.0-SNAPSHOT, I get a few quirky results:
The output logs only include 13 XML files being checked, while there are 15 in the bundle that used to be checked by older versions of validate. (see logs attached below)
🕵️ Expected behavior
SKIP: file:...
for those files we skip over because they are not referenced by the parent bundle / collection / etc. If possible, let's also include a note as to why we are skipping them. It can be generic across all skipped products, but we should include some text if possible.📚 Version of Software Used
🩺 Test Data / Additional context
v2.1.0_report.txt
v1.24.0_report.txt
The text was updated successfully, but these errors were encountered: