Refactor harvest to operate with new multi-tenant, serverless OpenSearch architecture #146

Merged
merged 19 commits into from
Jan 24, 2024

al-niessner
Contributor

🗒️ Summary

Brief summary of changes if not sufficiently described by commit messages.

⚙️ Test Data and/or Report

One of the following should be included here:

  • Reference to regression test included in code (preferred wherever reasonable)
  • Attach test data here + outputs of tests

♻️ Related Issues

Closes #118

@al-niessner al-niessner requested a review from a team as a code owner January 11, 2024 20:26
@al-niessner al-niessner marked this pull request as draft January 11, 2024 20:26
@al-niessner al-niessner self-assigned this Jan 11, 2024
@al-niessner
Contributor Author

@alexdunnjpl @jordanpadams @nutjob4life @tloubrieu-jpl

The schema has been implemented and the examples updated. I would like a review because older "legacy" harvest config files cannot be used without minor changes. That should not be a problem because the URL is not necessary for non-testing harvest configs. Can the review be done during today's breakout? It would be good to keep going with this, but changing tag names or the like after this point will become harder and harder.

@al-niessner
Contributor Author

The current state of the schema and tests is:

$ xmllint --noout --schema configuration.xsd examples/bundles.xml examples/directories.xml examples/files.xml examples/xpaths.xml 
examples/bundles.xml:20: element harvest: Schemas validity error : Element 'harvest', attribute 'nodeName': [facet 'enumeration'] The value 'CHANGE_ME' is not an element of the set {'PDS_ATM', 'PDS_ENG', 'PDS_GEO', 'PDS_IMG', 'PDS_NAIF', 'PDS_PPI', 'PDS_RMS', 'PDS_SBN', 'PSA', 'JAXA', 'ROSCOSMOS'}.
examples/bundles.xml fails to validate
examples/directories.xml:17: element harvest: Schemas validity error : Element 'harvest', attribute 'nodeName': [facet 'enumeration'] The value 'CHANGE_ME' is not an element of the set {'PDS_ATM', 'PDS_ENG', 'PDS_GEO', 'PDS_IMG', 'PDS_NAIF', 'PDS_PPI', 'PDS_RMS', 'PDS_SBN', 'PSA', 'JAXA', 'ROSCOSMOS'}.
examples/directories.xml fails to validate
examples/files.xml:17: element harvest: Schemas validity error : Element 'harvest', attribute 'nodeName': [facet 'enumeration'] The value 'CHANGE_ME' is not an element of the set {'PDS_ATM', 'PDS_ENG', 'PDS_GEO', 'PDS_IMG', 'PDS_NAIF', 'PDS_PPI', 'PDS_RMS', 'PDS_SBN', 'PSA', 'JAXA', 'ROSCOSMOS'}.
examples/files.xml fails to validate
examples/xpaths.xml:3: element harvest: Schemas validity error : Element 'harvest', attribute 'nodeName': [facet 'enumeration'] The value 'CHANGE_ME' is not an element of the set {'PDS_ATM', 'PDS_ENG', 'PDS_GEO', 'PDS_IMG', 'PDS_NAIF', 'PDS_PPI', 'PDS_RMS', 'PDS_SBN', 'PSA', 'JAXA', 'ROSCOSMOS'}.
examples/xpaths.xml fails to validate

I left the nodeName broken in each example just to show that xmllint is working as expected.
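
For anyone who wants the same check from Java instead of xmllint, here is a minimal sketch using the JDK's javax.xml.validation API. The inline schema is a hypothetical, heavily reduced stand-in for configuration.xsd: it models only the harvest element and the nodeName enumeration listed in the errors above, nothing else from the real schema. The names NodeNameCheck, reducedSchema, and isValid are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class NodeNameCheck {
    // Enumeration values copied from the xmllint error messages above.
    static final String[] NODE_NAMES = {
        "PDS_ATM", "PDS_ENG", "PDS_GEO", "PDS_IMG", "PDS_NAIF", "PDS_PPI",
        "PDS_RMS", "PDS_SBN", "PSA", "JAXA", "ROSCOSMOS"
    };

    // ASSUMPTION: a reduced schema for illustration, not the real configuration.xsd.
    static String reducedSchema() {
        StringBuilder sb = new StringBuilder();
        sb.append("<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>")
          .append("<xs:element name='harvest'><xs:complexType>")
          .append("<xs:attribute name='nodeName' use='required'>")
          .append("<xs:simpleType><xs:restriction base='xs:string'>");
        for (String name : NODE_NAMES) {
            sb.append("<xs:enumeration value='").append(name).append("'/>");
        }
        sb.append("</xs:restriction></xs:simpleType></xs:attribute>")
          .append("</xs:complexType></xs:element></xs:schema>");
        return sb.toString();
    }

    // Returns true when the XML validates; false on any validity or
    // well-formedness error reported by the validator.
    static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(reducedSchema())));
        } catch (SAXException e) {
            throw new IllegalStateException("the reduced schema itself is broken", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<harvest nodeName='PDS_ENG'/>"));
        System.out.println(isValid("<harvest nodeName='CHANGE_ME'/>"));
    }
}
```

With the placeholder CHANGE_ME the validator rejects the document, which mirrors the enumeration facet errors that xmllint reports for the example files.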

@jordanpadams jordanpadams changed the title 118: move to multitenancy in the cloud Refactor harvest to operate with new multi-tenant, serverless OpenSearch architecture Jan 12, 2024
@jordanpadams
Member

@al-niessner @tloubrieu-jpl was this review completed at the breakout yesterday?

@al-niessner
Contributor Author

> @al-niessner @tloubrieu-jpl was this review completed at the breakout yesterday?

@jordanpadams No

Al Niessner added 5 commits January 18, 2024 15:09
These tests required files that are no longer around - as in /tmp. Also removed the dead code from remaining tests.
Member

@jordanpadams jordanpadams left a comment

@al-niessner a few comments. thoughts?

src/main/resources/conf/configuration.xsd (outdated, resolved)
src/main/resources/conf/configuration.xsd (outdated, resolved)
src/main/resources/conf/configuration.xsd (resolved)
<xs:all>
<xs:element minOccurs="0"
name="autogenFields" type="autogen_fields_type"/>
<xs:element name="do" type="do_type"/>
Member

@al-niessner Do we really need this? I would prefer to minimize the number of backwards-incompatible changes if we can.

But if you would really prefer that we encapsulate this, can we change it to data_config or load_config, or just data or load?

src/main/resources/conf/examples/bundles.xml (outdated, resolved)
@al-niessner
Contributor Author

@jordanpadams

Changed some of the items and suggest do -> load or maybe ingest. Changed direct_url to server_url since that is really what it is and you like self-documenting names. Now cognito_client_id for the same reason. Take a gander again and I will update the code again. I need to run the local test to make sure I have not broken anything, but I do not think I have.

@al-niessner
Contributor Author

At this point (52aec27), harvest can load the bundle from #143 with what seem to be the same results:

[SUMMARY] Summary:
[SUMMARY] Skipped files: 0
[SUMMARY] Loaded files: 11139
[SUMMARY]   Product_Bundle: 1
[SUMMARY]   Product_Collection: 6
[SUMMARY]   Product_Document: 5
[SUMMARY]   Product_Observational: 11127
[SUMMARY] Failed files: 0
[SUMMARY] Package ID: 45cd1442-256d-4c80-9dcf-c54f2191f749

Next is to start looking at modifying registry-common to have a better registry connection interface than a public bean with no setters/getters. The new interface should allow for polymorphism, which helps when there are different ways to contact different services, such as Cognito or direct, which was the cause of the original issue.
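
To make the polymorphism idea concrete, here is a rough sketch of what such a connection interface could look like. Everything in it is hypothetical: RegistryConnection, DirectConnection, CognitoConnection, the method names, and the use of a bearer-token header are assumptions for illustration only, not the registry-common API and not how harvest actually talks to Cognito.

```java
import java.net.URI;
import java.util.Collections;
import java.util.Map;

final class ConnectionSketch {
    // HYPOTHETICAL interface: callers depend on this, not on a concrete bean.
    interface RegistryConnection {
        URI endpoint();
        Map<String, String> requestHeaders();
    }

    // Direct access: just a server URL, no extra headers.
    static final class DirectConnection implements RegistryConnection {
        private final URI serverUrl;

        DirectConnection(String serverUrl) {
            this.serverUrl = URI.create(serverUrl);
        }

        @Override public URI endpoint() { return serverUrl; }

        @Override public Map<String, String> requestHeaders() {
            return Collections.emptyMap();
        }
    }

    // Cognito-style access. ASSUMPTION: the token is obtained elsewhere and
    // sent as a bearer token; the real flow may differ.
    static final class CognitoConnection implements RegistryConnection {
        private final URI gateway;
        private final String clientId;
        private final String token;

        CognitoConnection(String gateway, String clientId, String token) {
            this.gateway = URI.create(gateway);
            this.clientId = clientId;
            this.token = token;
        }

        String clientId() { return clientId; }

        @Override public URI endpoint() { return gateway; }

        @Override public Map<String, String> requestHeaders() {
            return Collections.singletonMap("Authorization", "Bearer " + token);
        }
    }

    // Code that loads data only sees the interface, so adding another way
    // to reach a service does not touch this method.
    static String describe(RegistryConnection connection) {
        return connection.endpoint() + " with " + connection.requestHeaders().size() + " header(s)";
    }

    public static void main(String[] args) {
        System.out.println(describe(new DirectConnection("https://localhost:9200")));
        System.out.println(describe(
            new CognitoConnection("https://example.invalid/registry", "placeholder-client-id", "placeholder-token")));
    }
}
```

The point of the design is that the choice between direct and Cognito access is made once, when the connection object is built from the configuration, rather than being checked throughout the loading code.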

@jordanpadams jordanpadams marked this pull request as ready for review January 24, 2024 17:57
src/test/java/tt/TestZip.java (resolved)
@jordanpadams
Member

@al-niessner note we needed to update the .secrets.baseline for the secrets detection to run successfully. You may want to add the pre-commit to your local git config so this can be caught and updated locally in the future.

The pre-commit and info about Detect Secrets are being added to Harvest README here: #150

But that pretty much links here: https://github.com/NASA-PDS/nasa-pds.github.io/wiki/Git-and-Github-Guide#detect-secrets

@jordanpadams jordanpadams merged commit 2ec805b into main Jan 24, 2024
2 checks passed
@jordanpadams jordanpadams deleted the issue_118 branch January 24, 2024 18:40
Successfully merging this pull request may close these issues.

Update to utilize new multi-tenancy approach
3 participants