semaphore for async indexing and sync index in transaction after publish #10388
Merged
14 commits
ff16a49 - semaphore for async indexing and sync index in transaction after publish (ErykKul)
0002008 - fixed too short underline in config doc (ErykKul)
714f9f6 - fixed new property name scope (ErykKul)
1a3783a - added short release note (ErykKul)
b42983c - feat(index): add timing metrics to measure indexing load (poikilotherm)
77789db - reverted back to async index after publish, but now with em.flush() a… (ErykKul)
b830c82 - Merge pull request #10 from poikilotherm/10381-timers (ErykKul)
a631444 - Merge branch 'IQSS:develop' into 10381_index_after_publish (ErykKul)
8945ec8 - moved indexing after last dataset merge to assure indextime is not ov… (ErykKul)
7f7fd89 - Merge branch '10381_index_after_publish' of https://github.com/ErykKu… (ErykKul)
053ebdb - metric fix (ErykKul)
797bc38 - moved indexed time by 3 hours to prevent false negatives in isIndexed… (ErykKul)
0769ee9 - nullpointer fix (ErykKul)
1479403 - quick info on the new metrics added for indexing (ErykKul)
The new release note added by the PR:

@@ -0,0 +1,3 @@
+New release adds a new microprofile setting for maximum number of simultaneously running asynchronous dataset index operations that defaults to ``4``:
+
+dataverse.solr.concurrency.max-async-indexes
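The core mechanism named in the PR title, capping concurrent asynchronous index jobs with a counting semaphore, can be sketched roughly as below. This is a minimal illustration, not the actual Dataverse `IndexServiceBean` code: the class and method names here (`IndexPool`, `indexDataset`) are invented, and only the default permit count of 4 is taken from the release note.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: cap the number of dataset index jobs running at once.
// MAX_ASYNC_INDEXES mirrors the new setting's default of ``4``; everything
// else (names, the sleep standing in for the Solr call) is illustrative.
public class IndexPool {
    static final int MAX_ASYNC_INDEXES = 4;
    static final Semaphore permits = new Semaphore(MAX_ASYNC_INDEXES);
    static final AtomicInteger running = new AtomicInteger();
    static final AtomicInteger peak = new AtomicInteger();

    static void indexDataset(long datasetId) throws InterruptedException {
        permits.acquire();                    // block while 4 jobs are in flight
        try {
            int now = running.incrementAndGet();
            peak.accumulateAndGet(now, Math::max);
            Thread.sleep(50);                 // stand-in for the Solr round trip
            running.decrementAndGet();
        } finally {
            permits.release();                // always return the permit
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(16);
        for (long id = 0; id < 32; id++) {
            final long datasetId = id;
            pool.submit(() -> {
                try {
                    indexDataset(datasetId);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        // peak never exceeds MAX_ASYNC_INDEXES even with 16 worker threads
        System.out.println("peak concurrency: " + peak.get());
    }
}
```

Under the usual MicroProfile Config conventions, a setting like `dataverse.solr.concurrency.max-async-indexes` can typically also be supplied as an environment variable (non-alphanumeric characters mapped to underscores, uppercased), though the exact variable name accepted here is an assumption.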
I am trying to understand what's going on here. I understand why we may want to change indexing from asynchronous to synchronous. But why are we moving the indexing from onSuccess() back into the execute() method? Is this just to address the original issue with timestamps (#10381)?
Are we positive we want to do this? If I understand correctly, this would bring us back to a failure to index resulting in a failure to publish, which I thought we specifically wanted to avoid.
I may easily be missing something, of course.
@landreev @scolapasta I do not fully understand the issue myself. Moving the indexing back to execute() makes the indexing happen earlier, not later. Making it synchronous instead of async also makes it happen earlier, not later. If the issue is that the index did happen but its timestamp is before the publish moment, this PR would make the problem worse, not better. I just do not see yet why the publish date would be greater than the index moment, unless the index was interrupted (e.g., by a circuit breaker) or a wrong version of the dataset was indexed (e.g., the draft and not the published version). I will go through the code again to see if I am missing something; maybe the wrong version gets indexed, etc. As it turns out, we just migrated to version 6.1 (from 5.14) and a recently published dataset has the same problem. I think that the root cause is still not addressed. I will keep digging.
@landreev @scolapasta I did not see any errors on solr, no failure log in payara, etc. Also, the index looks fine, with the right version etc. I assume that indexing went fine after publish; only the "indextime" was not updated in the DB. I reverted to the async update in finalize after publish and added em.flush() after the indextime update, which should fix the problem.
I'm not sure if em.flush() changes anything; from what I understand, the change should have been written anyway. I think I am still missing something. In the example on our server, the publishing took 20+ hours due to the checksum checks on 20+ TiB. I got an e-mail that the dataset was published the next day after publishing started. From the code, the index should happen directly after sending the notifications. I would assume that it either succeeded but the indextime did not update, or it failed, but then I should see it in the process-failures log, which is empty.
Now I think that the index was done and the correct indextime was written, but then it got overwritten by the merge in …
I moved the indexing to the bottom; I am more confident now that it is fixed.
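The suspected failure mode, a later merge of a stale entity copy clobbering an indexTime that was already written, can be illustrated with plain Java objects (no JPA involved). The `Dataset` class and `merge` helper below are hypothetical stand-ins for the entity and for `em.merge()`; only the field name `indexTime` and the "index after the last merge" fix come from the thread.

```java
import java.sql.Timestamp;

// Plain-Java illustration of the suspected bug: the final merge in the
// publish command copies every field from a copy loaded before indexing ran,
// wiping out the indexTime that indexing had just set.
public class StaleMergeDemo {
    static class Dataset {
        Timestamp indexTime;   // when Solr last indexed this dataset
    }

    // Stand-in for em.merge(): copies fields from the given (possibly stale) copy.
    static void merge(Dataset managed, Dataset copy) {
        managed.indexTime = copy.indexTime;
    }

    public static void main(String[] args) {
        Dataset managed = new Dataset();
        Dataset staleCopy = new Dataset();   // loaded before indexing ran; indexTime is null

        // Buggy ordering: index first, merge afterwards.
        managed.indexTime = new Timestamp(System.currentTimeMillis());
        merge(managed, staleCopy);
        System.out.println("after late merge: " + managed.indexTime);   // prints null

        // Fix from the PR: run the indexing (and its timestamp write) only
        // after the last merge, so nothing can overwrite it.
        merge(managed, staleCopy);
        managed.indexTime = new Timestamp(System.currentTimeMillis());
        System.out.println("indexTime survives: " + (managed.indexTime != null)); // prints true
    }
}
```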
It looks to me that it is safe to merge the current code of this PR and leave it like this until we see how it behaves in production. I saved the link to this issue as it contains some interesting discussion, especially on running batches and metrics. We can revisit these ideas at some point. As for adding the sleep calls, I am against that, since it would make the queue holding indexing jobs longer than necessary. I like the idea of adding the index logic to the timer for fixing exports, but I agree it is out of scope for this PR. We will pick it up later, if needed.
I just asked a question in dv-tech slack about simplifying the whole scheme somewhat radically (but probably not in the context of this PR either).
I'm generally in favor of merging this PR as is (I just want to run a couple more experiments with multiple indexing attempts in parallel).
I would, however, consider including one small extra change: adding a few extra hours to the index timestamp in the test in the isIndexedVersion() method in DatasetPage.java, i.e., something like … instead?
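The suggested check can be sketched as follows: treat a version as indexed if its index timestamp plus a few hours of slack is not before the release time, so datasets whose indextime landed slightly before releasetime (e.g., because of the merge-ordering bug) are not flagged as stale. The 3-hour value and the method name isIndexedVersion() come from the thread; the exact comparison and the `java.time` types are assumptions, not the actual DatasetPage.java code.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of the timestamp-slack check suggested in the review.
public class IndexSlack {
    static final Duration SLACK = Duration.ofHours(3);

    static boolean isIndexedVersion(Instant indexTime, Instant releaseTime) {
        if (indexTime == null) {
            return false;                                // never indexed
        }
        // Allow indexTime to trail releaseTime by up to SLACK.
        return !indexTime.plus(SLACK).isBefore(releaseTime);
    }

    public static void main(String[] args) {
        Instant release = Instant.parse("2024-03-01T12:00:00Z");
        // Indexed 10 minutes before release: within slack, counts as indexed.
        System.out.println(isIndexedVersion(release.minus(Duration.ofMinutes(10)), release)); // true
        // Indexed a day before release: genuinely stale.
        System.out.println(isIndexedVersion(release.minus(Duration.ofHours(24)), release));   // false
    }
}
```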
I have looked at the differences, but nothing struck me as suspicious.
In solrconfig.xml some request handlers are disabled, but I do not think these were used anyway. Those are the /browse, /spell, /update/extract, /tvrh and /elevate endpoints.
In schema.xml I did find it strange that the schema version in 6.1 is lower than that of 5.14. But I do not know what the impact of that version number is; it could be just informative. The other differences are replacing the now obsolete solr.TrieXXXField with solr.XXXPointField, as one would expect for Solr 9. There are also some changes to the dynamic field definitions. AFAIK Dataverse does not use dynamic field types in Solr.
Could it be the move from Solr 8 to 9 itself that changed something dramatically?
From a brief look, I'm not seeing any information online suggesting 9 is dramatically slower than 8 in general. But it could be something Dataverse specifically relies on when indexing that is behaving differently.
I moved the indexed time in the isIndexedVersion test by 3 hours as suggested. This should fix the problem for already published datasets with indextime before releasetime.