Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues with logic for reading latest version of collections, and file read logging lost with v2.* of validate #372

Closed
2 tasks
jordanpadams opened this issue Jul 17, 2021 · 8 comments
Assignees
Labels
B12.0 bug Something isn't working s.medium

Comments

@jordanpadams
Copy link
Member

🐛 Describe the bug

With v2+ release of validate, we are no longer reporting all the files being read by validate. This is most likely a quirk of the improvements made for #238 and #249.


Part 1

📜 To Reproduce

Executing the following with the latest 2.1.0-SNAPSHOT, I get a few quirky results:

$ validate-2.1.0-SNAPSHOT/bin/validate --skip-content-validation -R pds4.bundle -t src/test/resources/github69/invalid/bundle_kaguya_derived.xml

I get this WARNING message about an unreferenced file:

  PASS: file:/Users/jpadams/Documents/proj/pds/pdsen/workspace/validate/src/test/resources/github69/invalid/data_spectra/spectra_doc_collection_inventory.csv
      WARNING  [warning.file.not_referenced_in_label]   File is not referenced by any label
        3 integrity check(s) completed

This file is referenced by spectra_doc_collection_inventory.xml, but I think since spectra_doc_collection_inventory.xml is not referenced by the parent bundle, it is never read in and causes this odd output.

🕵️ Expected behavior

  • We should still be able to recognize that spectra_doc_collection_inventory.xml references spectra_doc_collection_inventory.csv, but then we continue on with the logic as-is, which is to ignore performing validation on the actual product.

Part 2

📜 To Reproduce

Executing the following with the latest 2.1.0-SNAPSHOT, I get a few quirky results:

$ validate-2.1.0-SNAPSHOT/bin/validate --skip-content-validation -R pds4.bundle -t src/test/resources/github69/invalid/bundle_kaguya_derived.xml

The output logs only include 13 XML files being checked, while there are 15 in the bundle that used to be checked by older versions of validate. (see logs attached below)

🕵️ Expected behavior

  • Even though we are skipping over collections / products that are not referenced by the bundle or collection, we should still output to the log that we are skipping them. Let's update our logging output to include a SKIP: file:... for those files we skip over because they are not referenced by the parent bundle / collection / etc. If possible, let's also include a note as to why we are skipping them. It can be generic across all skipped products, but we should include some text if possible.

📚 Version of Software Used

🩺 Test Data / Additional context

v2.1.0_report.txt
v1.24.0_report.txt

@jordanpadams jordanpadams added bug Something isn't working needs:triage labels Jul 17, 2021
@jordanpadams jordanpadams self-assigned this Jul 17, 2021
@jordanpadams jordanpadams added this to the 12.Roger.Bannister milestone Jul 17, 2021
@jordanpadams jordanpadams removed their assignment Jul 17, 2021
@qchaupds
Copy link
Contributor

qchaupds commented Aug 5, 2021

Initial inspection of the run reveals the 2nd collection file: spectra_doc_collection_inventory.xml was ignored by the crawler.

Will revamp the crawling for "bundle" and "collection" files to make a decision of which latest "bundle" or "collection" should be processed. A possible logic is to collect all the "collection" files and group similar ones together and then decide which one is the latest of the similar group. A "similar" collection is the ones that share the same logical_identifier but different version_id.

The affected code is:

src/main/java/gov/nasa/pds/tools/validate/BundleManager.java

where all the crawling is taking place.

@qchaupds
Copy link
Contributor

qchaupds commented Aug 5, 2021

A correction to Jordan's description. There are only 7 .xml files:

{pds-dev3.jpl.nasa.gov}/home/qchau/sandbox/validate 164 % find src/test/resources/github69/invalid -name "*.xml" | cat -n
1 src/test/resources/github69/invalid/bundle_kaguya_derived.xml
2 src/test/resources/github69/invalid/data_spectra/kgrs_calibrated_spectra_per1.xml
3 src/test/resources/github69/invalid/data_spectra/kgrs_calibrated_spectra_per2.xml
4 src/test/resources/github69/invalid/data_spectra/kgrs_calibrated_spectra_per3.xml
5 src/test/resources/github69/invalid/data_spectra/kgrs_calibrated_spectra_per4.xml
6 src/test/resources/github69/invalid/data_spectra/spectra_data_collection_inventory.xml
7 src/test/resources/github69/invalid/data_spectra/spectra_doc_collection_inventory.xml

They are check for validity in "productLevelValidationResults" and "PDS4BundleLevelValidationResults" portions.

@qchaupds
Copy link
Contributor

qchaupds commented Aug 6, 2021

Part 1 status:

% validate -R pds4.bundle --skip-content-validation -r report_github372_bundle_invalid.json -s json -t src/test/resources/github372/invalid/bundle_kaguya_derived.xml

The warning message warning.file.not_referenced_in_label for data file spectra_doc_collection_inventory.csv is no longer there because the code now recognize the label file spectra_doc_collection_inventory.xml

{pds-dev3.jpl.nasa.gov}/home/qchau/sandbox/validate 165 % grep not_referenced report_github372_bundle_invalid.json

Added the 2nd label file spectra_doc_collection_inventory.xml and both are recognized by validate.

{pds-dev3.jpl.nasa.gov}/home/qchau/sandbox/validate 167 % egrep "label" report_github372_bundle_invalid.json | grep doc_collection
"label": "file:/home/qchau/sandbox/validate/src/test/resources/github372/invalid/data_spectra/spectra_doc_collection_inventory.xml",
"label": "file:/home/qchau/sandbox/validate/src/test/resources/github372/invalid/data_spectra/spectra_doc_collection_inventory_2.xml",
"label": "file:/home/qchau/sandbox/validate/src/test/resources/github372/invalid/data_spectra/spectra_doc_collection_inventory.xml",
"label": "file:/home/qchau/sandbox/validate/src/test/resources/github372/invalid/data_spectra/spectra_doc_collection_inventory_2.xml",

@qchaupds
Copy link
Contributor

qchaupds commented Aug 6, 2021

Part 2 status:

{pds-dev3.jpl.nasa.gov}/home/qchau/sandbox/validate 173 % validate -R pds4.bundle --skip-content-validation -r report_github69_bundle_invalid.json -s json -t src/test/resources/github69/invalid/bundle_kaguya_derived.xml >& t10

The label "spectra_doc_collection_inventory.xml" is now recognized by validate.

{pds-dev3.jpl.nasa.gov}/home/qchau/sandbox/validate 175 % grep label report_github69_bundle_invalid.json | grep doc_collection
"label": "file:/home/qchau/sandbox/validate/src/test/resources/github69/invalid/data_spectra/spectra_doc_collection_inventory.xml",
"label": "file:/home/qchau/sandbox/validate/src/test/resources/github69/invalid/data_spectra/spectra_doc_collection_inventory.xml",
{pds-dev3.jpl.nasa.gov}/home/qchau/sandbox/validate 176 %

The productLevelValidationResults now process all 7 labels:

{pds-dev3.jpl.nasa.gov}/home/qchau/sandbox/validate 178 % egrep "Results|label" report_github69_bundle_invalid.json | egrep -v allowUnlabeledFiles | cat -n
1 "productLevelValidationResults": [
2 "label": "file:/home/qchau/sandbox/validate/src/test/resources/github69/invalid/bundle_kaguya_derived.xml",
3 "label": "file:/home/qchau/sandbox/validate/src/test/resources/github69/invalid/data_spectra/kgrs_calibrated_spectra_per1.xml",
4 "label": "file:/home/qchau/sandbox/validate/src/test/resources/github69/invalid/data_spectra/kgrs_calibrated_spectra_per2.xml",
5 "label": "file:/home/qchau/sandbox/validate/src/test/resources/github69/invalid/data_spectra/kgrs_calibrated_spectra_per3.xml",
6 "label": "file:/home/qchau/sandbox/validate/src/test/resources/github69/invalid/data_spectra/kgrs_calibrated_spectra_per4.xml",
7 "label": "file:/home/qchau/sandbox/validate/src/test/resources/github69/invalid/data_spectra/spectra_data_collection_inventory.xml",
8 "label": "file:/home/qchau/sandbox/validate/src/test/resources/github69/invalid/data_spectra/spectra_doc_collection_inventory.xml",

{pds-dev3.jpl.nasa.gov}/home/qchau/sandbox/validate 180 % find src/test/resources/github69/invalid -name "*.xml" | xargs ls -l | cat -n
1 -rw-r--r-- 1 qchau pds 6266 Jul 19 14:47 src/test/resources/github69/invalid/bundle_kaguya_derived.xml
2 -rw-r--r-- 1 qchau pds 12219 Jul 19 14:47 src/test/resources/github69/invalid/data_spectra/kgrs_calibrated_spectra_per1.xml
3 -rw-r--r-- 1 qchau pds 12500 Jul 19 14:47 src/test/resources/github69/invalid/data_spectra/kgrs_calibrated_spectra_per2.xml
4 -rw-r--r-- 1 qchau pds 12500 Jul 19 14:47 src/test/resources/github69/invalid/data_spectra/kgrs_calibrated_spectra_per3.xml
5 -rw-r--r-- 1 qchau pds 12500 Jul 19 14:47 src/test/resources/github69/invalid/data_spectra/kgrs_calibrated_spectra_per4.xml
6 -rw-r--r-- 1 qchau pds 3868 Jul 19 14:47 src/test/resources/github69/invalid/data_spectra/spectra_data_collection_inventory.xml
7 -rw-r--r-- 1 qchau pds 3867 Jul 19 14:47 src/test/resources/github69/invalid/data_spectra/spectra_doc_collection_inventory.xml

@tloubrieu-jpl
Copy link
Member

@jordanpadams , Qui is waiting for your feedback on that.

qchaupds pushed a commit that referenced this issue Aug 10, 2021
1. Add extra bundle file to observe crawler ignoring it :src/test/resources/github69/invalid/bundle_kaguya_derived_2.xml \
2. Add extra collection to observe crawler ignoring it :src/test/resources/github69/invalid/data_spectra/spectra_data_collection_inventory_2.xml \
3. Add extra collection to observe crawler processing it :src/test/resources/github69/invalid/data_spectra/spectra_doc_collection_inventory_2.xml \
4. Add logic to ignore certain file and keep certain files based on logical_identifier values :src/main/java/gov/nasa/pds/tools/validate/BundleManager.java
5. Remove unused functions findAllBundleFiles and findAllCollectionFiles() :src/main/java/gov/nasa/pds/tools/validate/BundleManager.java

Refs:

#372 Issues with logic for reading latest version of collections, and file read logging lost with v2.* of validate
qchaupds pushed a commit that referenced this issue Aug 16, 2021
…print "SKIP:" statements if debug is set.

1. Add severity variable to house keeping :src/main/java/gov/nasa/pds/tools/util/FlagsUtil.java
2. Add print statement of "SKIP:"  :src/main/java/gov/nasa/pds/tools/validate/BundleManager.java
3. Add call to FlagsUtil.setSeverity() :src/main/java/gov/nasa/pds/validate/ValidateLauncher.java

Refs:

#372 Issues with logic for reading latest version of collections, and file read logging lost with v2.* of validate
qchaupds pushed a commit that referenced this issue Aug 17, 2021
…s skipped in report instead of standard out and correct log level from error to debug

1. Add setReport() so a problem can be added to report :src/main/java/gov/nasa/pds/tools/label/LocationValidator.java \
2. Correct logging level from error to debug :src/main/java/gov/nasa/pds/tools/util/ReferentialIntegrityUtil.java \
3. Add reporting of files skipped in report instead of standard out :src/main/java/gov/nasa/pds/tools/validate/BundleManager.java \
4. Add new type UNREFERENCED_FILE for use when files are skipped :src/main/java/gov/nasa/pds/tools/validate/ProblemType.java \
5. Add call to setReport() to set report in LocationValidator object :src/main/java/gov/nasa/pds/validate/ValidateLauncher.java \
6. Add logging capability :src/main/java/gov/nasa/pds/validate/report/Report.java

Refs:

#372 Issues with logic for reading latest version of collections, and file read logging lost with v2.* of validate
qchaupds pushed a commit that referenced this issue Aug 25, 2021
…ction files skipped in report when the target is a directory

1. Add new function containsExistingTarget() to inspect from a list of Target :src/main/java/gov/nasa/pds/tools/validate/BundleManager.java
2. Modify parameter in findOtherCollectionFiles() to receive a list of Target instead of a single URL in case there are more than 1 latest collections :src/main/java/gov/nasa/pds/tools/validate/BundleManager.java
3. Modify variable from otherBundleFilesList to otherCollectionFilesList to match purpose of function :src/main/java/gov/nasa/pds/tools/validate/BundleManager.java
4. Add reporting of collection files skipped in report when the target is a directory: src/main/java/gov/nasa/pds/tools/validate/BundleManager.java

Refs:

#372 Issues with logic for reading latest version of collections, and file read logging lost with v2.* of validate
jordanpadams added a commit that referenced this issue Aug 25, 2021
jordanpadams added a commit that referenced this issue Aug 25, 2021
Fix bug with crawler ignoring certain collection files
@jordanpadams
Copy link
Member Author

closed per #389

@gxtchen
Copy link

gxtchen commented Aug 29, 2021

@jordanpadams Can you share the data file with i&t? or let me know where I can download src/test/resources/github69/invalid/. Thanks.

@jordanpadams
Copy link
Member Author

@gxtchen all the test resources that are referenced in tickets can be found here: https://github.com/NASA-PDS/validate/tree/main/src/test/resources

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
B12.0 bug Something isn't working s.medium
Projects
None yet
Development

No branches or pull requests

4 participants