Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switchover to using registry-common library to communicate with OpenSearch Serverless Registry #895

Open
jordanpadams opened this issue May 9, 2024 · 8 comments · May be fixed by #906
Open

Comments

@jordanpadams
Copy link
Member

💡 Description

Complete switchover for referential integrity checker

⚔️ Parent Epic / Related Tickets

No response

@al-niessner
Copy link
Contributor

@jordanpadams @tloubrieu-jpl

Design choice - I am split or would not be asking - on config file for validate-refs. I get that the most convenient thing for the user is to let them use harvest config. The new config is more complex and controlled with a schema. Great. For validate-ref, this means it now needs to:

  1. continue sax parsing and hope harvest config does not change too much more
  2. depend on harvest as well as registry-common
  3. not use harvest config as auth input but do like all other registry-mgr tools
  4. move validate-refs out of validate and into registry-mgr since it has nothing to do with validate other than the name.
  5. move validate-refs to harvest since it has more in common with it than validate

I think 2, 4 and 5 are up for bat. 2 seems the most direct and with maven maybe we do not care one wit about all the packages that we reference. 4 offers a cleaner solution since validate-refs uses nothing in validate at all but is probably there because the same user that runs harvest and validate also wants to run validate-refs but cares nothing for tools in registry-mgr. If true about users that use harvest want to do validate-refs, then 5 seems like the best solution since they share the config and both having nothing in common with validate itself.

@tloubrieu-jpl
Copy link
Member

Hi @al-niessner ,

I don't have a final answer on that, but to feed your internal discussion, I think we would like all the tools provided to the discipline node users to be available in a consistent way. What they need is:

  • validate
  • validate-ref
  • harvest
  • registry-mgr, archive status mostly

I don't think the same users need validate, validate-refs and harvest/registry-mgr. For example SDS development guys use validate, but not harvest or registry-mgr or validate-refs.

I don't know though where that would make more sense to move validate-refs.

We also have this ticket on which we could piggy back your suggested changes: NASA-PDS/registry-loader#11. If we move forward with that ticket, we would not have to choose in between harvest or registry-mgr to move validate-ref (options 4 or 5).

Can that decision wait for Jordan to be back, next week ? Thanks

@al-niessner
Copy link
Contributor

@jordanpadams @tloubrieu-jpl

No rush on decision. I can easily punt for now.

I would suggest not implementing NASA-PDS/registry-loader#11. With Java anyway, smaller repos are better -- Java bloat is really bad. Maybe a registry-loader that depends on all the other repos for the user that then gives them a better/simpler install 1 to get 5, but combining the java code will get unwieldy fast.

@tloubrieu-jpl
Copy link
Member

I am realizing: should we switch to using the search API directly, since Alex fixed the pagination performance issue.

@tloubrieu-jpl
Copy link
Member

@jordanpadams @al-niessner ☝️

@al-niessner
Copy link
Contributor

@alexdunnjpl @jordanpadams @tloubrieu-jpl

Short answer: update and march forward.

Longer answer:

We all blabber about the learning curve and it is more expensive to change later than now . However, we never include the forgot curve. The real effort is the area between these two curves. It can also be shown that the forgot curve always plummets faster than the learning curve rises. By the time you eventually upgrade you will have forgotten all of the workarounds, patches, cruft, and other bad code to use the older interface that now also need updating.

If the pagination stuff is broken in SDK, then opensearch will fix it because the rest of the world will apply pressure. If they do not fix it because we are using the wrong technology (rest of world thinks pagination works well enough as is), then, well, change technologies.

Having rewritten harvest registry-common, and registry-mgr to use SDK v2, it is wonderful compared to the API direct interface. v2 does mean a complete rethink and recode of registry-api.

@al-niessner
Copy link
Contributor

@jordanpadams @tloubrieu-jpl

I am going to move validate-refs to registry-mgr unless there is compelling reason to make a mess of validate (the repo) with depending on opensearch-java just to get this ticket done.

@jordanpadams
Copy link
Member Author

@al-niessner I would prefer we do not do that. validate-refs depends on the registry, but it is performing an integral step in validation. I know it is messier in terms of dependencies, but it is not really a function of managing the registry or loading data into the registry. I would prefer we avoid conflating the two.

jordanpadams added a commit to NASA-PDS/registry-common that referenced this issue Sep 26, 2024
These classes are needed elsewhere and cannot be called from a sprint boot app (harvest)

refs NASA-PDS/validate#895
jordanpadams added a commit to NASA-PDS/harvest that referenced this issue Sep 26, 2024
These classes are needed elsewhere and cannot be called from a sprint boot app

refs NASA-PDS/validate#895
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: In Progress
3 participants