Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migration HttpSolrClient to Http2SolrClient and ConcurrentUpdateHttp2SolrClient #10241

Open
wants to merge 16 commits into
base: develop
Choose a base branch
from

Conversation

jeromeroucou
Copy link
Contributor

@jeromeroucou jeromeroucou commented Jan 18, 2024

What this PR does / why we need it:

This PR replace deprecated HttpSolrClient with 2 new implementations :

  • Http2SolrClient to make queries to Solr
  • ConcurrentUpdateHttp2SolrClient to perform index to Solr

Which issue(s) this PR closes:

Closes #10161

Special notes for your reviewer:

I've been careful to centralize code into an abstract class, and delete local client on IndexServiceBean to use instead the 2 new service client (one for queries, one for indexation)

Suggestions on how to test this:

I run some local tests on my laptop : create collection, publish dataset with files, search on index page, harvesting other repository, call clear-orphan API. But I didn't index a large collection to measure if performance had improved.

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

N/A

Is there a release notes update needed for this change?:

If indexation performance is increase, I think it would be nice to mention it.

Additional documentation:

Configuration documentation has been updated with the new feature-flag to enable it.

jeromeroucou and others added 2 commits January 10, 2024 17:35
with refactoring in order to use one service to query solr, and another service to manage index
@coveralls
Copy link

coveralls commented Jan 18, 2024

Coverage Status

coverage: 20.867% (-0.008%) from 20.875%
when pulling 728efff on Recherche-Data-Gouv:10161-Http2SolrClient
into c44ad65 on IQSS:develop.

@qqmyers
Copy link
Member

qqmyers commented Jan 18, 2024

@jeromeroucou - thanks for the PR! A ~quick question - have you run this extensively?

I ask because I tried to make a simple switch from HttpSolrClient to Http2SolrClient in the SolrClientService class at QDR and started seeing errors w.r.t. the client being closed that started after some delay (~days). It's possible that there were restarts of solr going on at QDR so I just reverted and was going to check again at some point after the solr changes were done, but I haven't yet.

If the new Http2SolrClient does somehow timeout (or perhaps it is just worse at handling a solr restart?), we'd want to address that (e.g. with reinitialize). If you haven't tried running this branch for days, I may try again at QDR and/or will make sure that we test a longer duration in QA.

@jeromeroucou
Copy link
Contributor Author

Hi @qqmyers, thanks for this feedback.

No, I don't run Http2SolrClient some days, but I'll try to do it on my side too. I'll indicate the result.

@pdurbin
Copy link
Member

pdurbin commented Feb 28, 2024

@jeromeroucou hi! Any results to share? Thanks!

@pdurbin pdurbin added the Size: 10 A percentage of a sprint. 7 hours. label Feb 28, 2024
@jeromeroucou
Copy link
Contributor Author

Unfortunately no, our deployment environment is still broken. We're still working on restoring it. I hope to be able to give you an update within 2 weeks.

@poikilotherm
Copy link
Contributor

General remark on the PR: the HTTP2 classes are still flagged experimental, see https://solr.apache.org/guide/solr/latest/deployment-guide/solrj.html#types-of-solrclients. Do we really want to do this? Should we potentially use a feature-flag, so it keeps the experimental nature also in Dataverse?

@jeromeroucou
Copy link
Contributor Author

Hi @poikilotherm,

I think the documentation you're referring to specifies that the client is in progress (I understand that it's not out of the question for a method signature to be modified for exemple). However, Solr Javadoc clearly indicates HttpSolrClient is deprecated and Http2SolrClient should be used instead.

But your proposition to use a feature-flag may be a good thing until HttpSolrClient is really removed.

@qqmyers qqmyers mentioned this pull request May 13, 2024
@pdurbin pdurbin added the Type: Feature a feature request label Oct 9, 2024
@jeromeroucou jeromeroucou marked this pull request as draft October 10, 2024 19:24
@jeromeroucou jeromeroucou marked this pull request as ready for review October 25, 2024 16:31
@jeromeroucou
Copy link
Contributor Author

jeromeroucou commented Oct 30, 2024

The PR is finally ready for review again!

I've taken @poikilotherm feedback into account by using a feature flag.

Regarding @qqmyers comments, we deployed this branch in an environment that remained switched on for over a week, without observing any connection interruption. However, not really knowing how to check, we may not have noticed the errors?

@pdurbin pdurbin added Type: Bug a defect and removed Type: Feature a feature request labels Oct 31, 2024
@pdurbin
Copy link
Member

pdurbin commented Oct 31, 2024

Thanks @jeromeroucou.

I had mischaracterized this PR as a feature but it's really more of a bug fix (stopping the use of deprecated code) so the team will talk about it at our weekly review of "ready for triage" on Tuesday.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Size: 10 A percentage of a sprint. 7 hours. Type: Bug a defect
Projects
Status: Ready for Review ⏩
Status: 🚧 Dev by Recherche Data Gouv
Development

Successfully merging this pull request may close these issues.

Dataverse is using deprecated org.apache.solr.client.solrj.impl.HttpSolrClient;
7 participants