Do not re-index records in private repositories unnecessarily #3037

andrew-morrison
Contributor

Description

This is an efficiency improvement to the indexer. The PUI Indexer has no need to index individual archival objects (or anything else) belonging to a private repository. Currently it does, and on the production system I run this adds half an hour to a full re-index, because a very large private repository is used for staging new records prior to publishing. That may be a niche case, but the principle remains: the indexer should skip such repositories entirely, rather than retrieve every record from the database and then determine, one record at a time, that each should not be published.

The one scenario in which the PUI Indexer needs to do something is the first time it runs after a public repository is unpublished. That one time, it needs to delete from Solr all the "pui only" documents belonging to that repository (the ones whose id fields end with #pui). Once that is done, those "pui only" documents will not be recreated unless and until the repository is made public again. So I've added a routine that deletes them en masse, and only when the repository itself has been modified.
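To illustrate the behaviour described above, here is a minimal, self-contained sketch. It is not the code in this PR: `Repo`, `FakeSolr`, `repositories_to_index` and the `repository` field in the delete query are hypothetical names chosen for the example, and the real indexer works with backend records and a real Solr connection.

```ruby
# Hypothetical stand-in for a repository record (assumption, not the real model).
Repo = Struct.new(:id, :publish, :modified_at)

# Hypothetical in-memory stand-in for Solr that only records delete queries.
class FakeSolr
  attr_reader :delete_queries

  def initialize
    @delete_queries = []
  end

  def delete_by_query(query)
    @delete_queries << query
  end
end

# Returns the repositories whose records the PUI indexer should process.
# Private repositories are skipped without fetching any of their records.
# Their PUI-only documents are deleted in one query, and only when the
# repository itself was modified since the last indexing run.
def repositories_to_index(repos, last_indexed_at, solr)
  repos.select do |repo|
    next true if repo.publish

    if repo.modified_at > last_indexed_at
      # The "repository" field name is an assumption made for this sketch.
      solr.delete_by_query(%(repository:"/repositories/#{repo.id}" AND types:pui_only))
    end
    false
  end
end

solr = FakeSolr.new
last_run = Time.utc(2023, 8, 1)
repos = [
  Repo.new(2, true, Time.utc(2023, 8, 2)),   # public
  Repo.new(3, false, Time.utc(2023, 8, 2)),  # unpublished since the last run
  Repo.new(4, false, Time.utc(2023, 7, 1))   # private and unchanged
]
indexed = repositories_to_index(repos, last_run, solr)
```

In this sketch only repository 2 is returned for indexing, a single delete query is issued for repository 3, and repository 4 costs nothing at all, which is where the saving on a full re-index would come from.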

Related JIRA Ticket or GitHub Issue

None

How Has This Been Tested?

  1. Create a new repository, with its Publish? checkbox selected.
  2. Switch to that repository and create a resource and some archival objects (e.g. by importing an EAD file).
  3. Check that they appear in the public interface.
  4. Confirm in Solr that a search with a filter of types:pui_only, also filtered to the new repository, returns however many archival objects you created.
  5. Edit the repository, deselecting its Publish? checkbox, and click Save.
  6. Watch the indexer log, waiting for an info message reading "Deleted PUI-only documents in private repository".
  7. Check that the repository and its records have disappeared from the public interface.
  8. Confirm in Solr that there are no documents returned for types:pui_only in the new repository.
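For the Solr checks in steps 4 and 8, a query URL can be built along these lines. This is only a sketch: the base URL, the repository URI and the `repository` filter field are assumptions about a local setup, and `pui_only_count_url` is a hypothetical helper, not part of ArchivesSpace.

```ruby
require 'uri'

# Builds a Solr select URL that counts PUI-only documents in one repository.
# rows=0 asks Solr for the count only, without returning the documents.
def pui_only_count_url(solr_base_url, repo_uri)
  params = {
    'q' => '*:*',
    # Two filter queries: the record type named in the steps above, and the
    # repository (the "repository" field name is an assumption).
    'fq' => ['types:pui_only', %(repository:"#{repo_uri}")],
    'rows' => '0',
    'wt' => 'json'
  }
  "#{solr_base_url}/select?#{URI.encode_www_form(params)}"
end

# Hypothetical values; substitute your own Solr URL and repository URI.
url = pui_only_count_url('http://localhost:8983/solr/archivesspace', '/repositories/3')
puts url
```

Opening the printed URL in a browser, or fetching it with curl, should return a JSON response whose `numFound` value matches the number of archival objects created at step 4, and is zero at step 8.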

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have read the CONTRIBUTING document.
  • I have authority to submit this code.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@cdibella added the "community" label (code contributed by community members not on or contracted by the ArchivesSpace program team) Aug 28, 2023
andrew-morrison and others added 2 commits November 21, 2023 10:12
Accepting suggestion by @donaldjosephsmith

Co-authored-by: Donald Smith <dsmith@alum.rit.edu>
Collaborator

@donaldjosephsmith donaldjosephsmith left a comment

👍

@donaldjosephsmith donaldjosephsmith merged commit 5586449 into archivesspace:master Nov 21, 2023
8 checks passed
@cdibella cdibella added this to the 3.5.0 milestone Feb 7, 2024