Lid & vid validation, logging enhancement, fix skipped file counter #22

Closed
4 tasks done
tdddblog opened this issue Mar 30, 2020 · 4 comments
Labels
bug Something isn't working i&t.skip

Comments

@tdddblog
Contributor

tdddblog commented Mar 30, 2020

  • Validate empty lid and vid fields.
  • Log basic info (summary, etc.) even when log verbosity is set to WARN or ERROR.
  • Skipped file count should not count files filtered out by the product filter.
  • Add -stopOnError flag
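The four checklist items interact in one loop: files rejected by the product filter are not "skipped", and a label with an empty LID or VID is either skipped or aborts the run depending on -stopOnError. The sketch below is a minimal Python illustration of that logic under stated assumptions, not the harvest tool's actual code. `MissingIdError`, `Counters`, `validate_ids`, `harvest`, and the `stop_on_error` parameter (standing in for `-stopOnError`) are all hypothetical names; the error text is modeled on the `Missing '//Identification_Area/version_id'` log line quoted later in this thread, and "skipped" is assumed to mean a label rejected for a missing LID/VID while stop-on-error is off.

```python
from dataclasses import dataclass


class MissingIdError(Exception):
    """Hypothetical: raised when a label has an empty LID or VID."""


@dataclass
class Counters:
    processed: int = 0
    skipped: int = 0   # labels rejected for missing LID/VID (assumed meaning)
    filtered: int = 0  # tracked separately; NOT added to the skipped count


def validate_ids(lid, vid):
    # Treat None, "" and whitespace-only values as empty.
    if not lid or not lid.strip():
        raise MissingIdError("Missing '//Identification_Area/logical_identifier'")
    if not vid or not vid.strip():
        raise MissingIdError("Missing '//Identification_Area/version_id'")


def harvest(labels, product_filter, stop_on_error):
    counters = Counters()
    for label in labels:
        if not product_filter(label):
            counters.filtered += 1  # filtered out by the product filter
            continue
        try:
            validate_ids(label.get("lid"), label.get("vid"))
        except MissingIdError:
            if stop_on_error:
                raise  # abort the whole run at the offending label
            counters.skipped += 1
            continue
        counters.processed += 1
    return counters


# Hypothetical sample labels: one good, one with an empty VID,
# and one of a type the product filter rejects.
labels = [
    {"type": "Product_Ancillary", "lid": "urn:example:doc:a", "vid": "1.0"},
    {"type": "Product_Ancillary", "lid": "urn:example:doc:b", "vid": ""},
    {"type": "Product_Collection", "lid": "urn:example:coll", "vid": "1.0"},
]


def is_ancillary(label):
    return label["type"] == "Product_Ancillary"


if __name__ == "__main__":
    print(harvest(labels, is_ancillary, stop_on_error=False))
    # → Counters(processed=1, skipped=1, filtered=1)
```

With `stop_on_error=True` the same input raises `MissingIdError` at the second label, which matches the "harvest stops where it finds the record" behavior described in the next comment.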
@tdddblog tdddblog added the bug Something isn't working label Mar 30, 2020
@tdddblog tdddblog self-assigned this Mar 30, 2020
@tdddblog tdddblog added this to the PDS.14 (ends 2020-04-08) milestone Mar 30, 2020
@tloubrieu-jpl
Member

OK, tested with -stopOnError made the default behavior.
Registry records with an empty LID/VID are not created; the harvest process stops at the record that causes the error (e.g., in case of duplication).

@rchenatjpl

I might be misunderstanding the point of this issue, which is that someone harvested a label without a VID or perhaps a LID? If registry-manager load-data reads a solr_docs.xml with a doc with no VID, it halts, which I think is what's desired. However, if there's no such doc but there is a doc with no LID, it will fully process solr_docs.xml. Is that bad? That's a very unlikely scenario, but the title of this issue suggests that possibility.
Archive.zip

@rchenatjpl

Does build 11.1 want this behavior? In a directory with 1 good and 1 bad .xml file, neither gets harvested.

% registry-manager delete-registry ; rm -r /tmp/harvest ; registry-manager create-registry
Elasticsearch URL: http://localhost:9200
Deleting index registry
Deleting index registry-refs
Deleting index registry-dd
Done
Elasticsearch URL: http://localhost:9200
Creating index...
Index: registry
Schema: /Users/rchen/PDS4tools/registry-manager/elastic/registry.json
Shards: 1
Replicas: 0
Done
Creating index...
Index: registry-refs
Schema: /Users/rchen/PDS4tools/registry-manager/elastic/refs.json
Shards: 1
Replicas: 0
Done
Creating index...
Index: registry-dd
Schema: /Users/rchen/PDS4tools/registry-manager/elastic/data-dic.json
Shards: 1
Replicas: 0
Done
Loading file: /Users/rchen/PDS4tools/registry-manager/elastic/data-dic-data.jar:dd.json
Loaded 2505 document(s)
%
%
% harvest -c policy/hvt22a.cfg
[SUMMARY] Output directory: /tmp/harvest/out
[SUMMARY] Output format: json
[SUMMARY] Reading configuration from /Users/rchen/Desktop/test/testHvt/policy/hvt22a.cfg
[WARN] Registry is not configured
[INFO] Processing directory: /Users/rchen/Desktop/test/testHvt/hvt22a
[INFO] Processing /Users/rchen/Desktop/test/testHvt/hvt22a/hvt22.xml
[ERROR] Missing '//Identification_Area/version_id'
%
%
% ls -l /tmp/harvest/out
total 0
-rw-rw-r-- 1 rchen wheel 0 May 19 01:18 refs-docs.json
-rw-rw-r-- 1 rchen wheel 0 May 19 01:18 registry-docs.json
%
%
%
% mv hvt22a/hvt22.bad.xml .
%
%
%
% harvest -c policy/hvt22a.cfg
[SUMMARY] Output directory: /tmp/harvest/out
[SUMMARY] Output format: json
[SUMMARY] Reading configuration from /Users/rchen/Desktop/test/testHvt/policy/hvt22a.cfg
[WARN] Registry is not configured
[INFO] Processing directory: /Users/rchen/Desktop/test/testHvt/hvt22a
[INFO] Processing /Users/rchen/Desktop/test/testHvt/hvt22a/hvt22.xml
[SUMMARY] Summary:
[SUMMARY] Skipped files: 0
[SUMMARY] Processed files: 1
[SUMMARY] File counts by type:
[SUMMARY] Product_Ancillary: 1
[SUMMARY] Package ID: b6dc4914-8217-4e6a-962a-c14b1a8862a0
%
%
%
% ls -l /tmp/harvest/out
total 16
-rw-rw-r-- 1 rchen wheel 31 May 19 01:19 fields.txt
-rw-rw-r-- 1 rchen wheel 0 May 19 01:19 refs-docs.json
-rw-rw-r-- 1 rchen wheel 495 May 19 01:19 registry-docs.json
Archive.zip

@jordanpadams
Member

@rchenatjpl Ignoring this ticket. It should not have been in the RDD; tagged as untestable.


5 participants