Do not re-index records in private repositories unnecessarily #3037
Description
This is an efficiency improvement in the indexer. There is no need for the PUI Indexer to index individual archival objects (or anything else) belonging to a private repository. Currently it does, and on the production system I run, that adds half an hour to a full re-index, because there is a very large private repository used for staging new records prior to publishing. Maybe that is a niche case, but the principle remains: the indexer should skip such repositories outright, rather than retrieve every record from the database and then determine, one record at a time, that none of them should be published.
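The idea above can be illustrated with a minimal sketch. It is not the code in this PR: `Repository`, `fetch_records`, and `records_to_index` are hypothetical names invented for the illustration, and the record URIs are made up. It only shows the difference between filtering at the repository level and fetching everything first.

```ruby
# Hypothetical, simplified model; these names are not taken from the ArchivesSpace code.
Repository = Struct.new(:id, :publish, :fetch_records)

# Collects the records a PUI-style indexer has to examine.
# A private repository is filtered out before fetch_records is called,
# so none of its records are ever loaded.
def records_to_index(repositories)
  repositories.select(&:publish).flat_map { |repo| repo.fetch_records.call }
end

# Track which repositories actually had their records fetched.
fetched_from = []
repositories = [
  Repository.new(2, true,  -> { fetched_from << 2; ["/repositories/2/archival_objects/1"] }),
  Repository.new(3, false, -> { fetched_from << 3; ["/repositories/3/archival_objects/1"] })
]

puts records_to_index(repositories).inspect
puts fetched_from.inspect
```

In this sketch only the published repository's fetch callback runs, so the cost of the private repository does not depend on how many records it holds.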
The one scenario when PUI Indexer needs to do something is the first time the PUI Indexer runs after a public repository is unpublished. That one time, it needs to delete all the "pui only" documents from Solr (the ones whose id fields end with #pui) belonging to that repository. But, once done, those "pui only" records will not be recreated unless/until the repository is made public again. So, I've added a routine to delete them en masse, only when the repository itself has been modified.
Related JIRA Ticket or GitHub Issue
None
How Has This Been Tested?
types:pui_only, also filtered to the new repository, returns however many archival objects you created.
"Deleted PUI-only documents in private repository" info message.
types:pui_only in the new repository.
Types of changes
Checklist: