Indexer efficiency improvements and fixes for side-effects of #3006 #3077

andrew-morrison · 2023-11-13T14:02:39Z

Description

I have observed the following side-effects of #3006:

Archival objects match keywords from their parent resource when searching in the PUI. This happens because the PUI indexer retrieves ancestors in order to inherit selected fields, but all the fields in those ancestors are now being included in the new “fullrecord_published” index.
Indexing is slower, because each record is being scanned for fields to include in the new “fullrecord_published” and “notes_published” indexes. It only adds hundredths of a second per record, but for very large institutions that could add hours to a full re-index.
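
To illustrate the first side-effect, here is a minimal sketch, not the actual indexer code. The record shape, the 'ancestors' and '_resolved' keys, and the collect_strings helper are assumptions made for this example only. It shows how a walk over every string in a record also picks up text from an embedded parent, and how dropping the ancestors first avoids that.

```ruby
# Hypothetical, simplified record: an archival object with its resolved
# parent resource embedded, as an indexer might hold it during inheritance.
record = {
  'title' => 'Letter to the editor',
  'ancestors' => [
    { 'ref' => '/repositories/2/resources/1',
      '_resolved' => { 'title' => 'Smith family papers' } }
  ]
}

# Hypothetical helper: a naive recursive walk that gathers every string
# value found anywhere in the structure.
def collect_strings(value)
  case value
  when Hash   then value.values.flat_map { |v| collect_strings(v) }
  when Array  then value.flat_map { |v| collect_strings(v) }
  when String then [value]
  else []
  end
end

# Walking the whole record also returns the parent's title.
leaky = collect_strings(record)

# Dropping the embedded ancestors before the walk keeps the parent's text out.
own_only = collect_strings(record.reject { |key, _| key == 'ancestors' })
```

In this sketch, leaky contains 'Smith family papers' while own_only contains only the archival object's own title, which is the behaviour a keyword search in the PUI would be expected to reflect.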

This pull request contains a possible approach to fixing these issues:

Change the extract_string_values method to extract both published and unpublished strings at the same time.
Off-load the merging of published and unpublished text in the "fullrecord" and "notes" fields to Solr, using copyField (which requires the fields to be changed to multi-valued).
Do not run build_fullrecord twice for the PUI indexer. Run it only once, from the hook defined in the PUIIndexer class, after the ancestor records of archival objects have been deleted.
Delete the code in build_fullrecord that adds finding_aid_subtitle, finding_aid_author, and agent names, which no longer appears to be necessary (those fields are already included in the "fullrecord" index).
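
The single-pass extraction in the first item could look roughly like the following. This is a sketch under stated assumptions rather than the code in this pull request: the split_strings helper, the record shape, and the rule that a subtree counts as unpublished when it or an enclosing hash has 'publish' set to false are all illustrative.

```ruby
# Hypothetical helper: walk a record once, sorting string values into
# published and unpublished buckets. A subtree is treated as unpublished
# when it, or any enclosing hash, carries 'publish' => false (an
# assumption made for this sketch).
def split_strings(value, published = true, out = { published: [], unpublished: [] })
  case value
  when Hash
    published &&= value['publish'] != false
    value.each_value { |v| split_strings(v, published, out) }
  when Array
    value.each { |v| split_strings(v, published, out) }
  when String
    out[published ? :published : :unpublished] << value
  end
  out
end

# Made-up test record with one published and one unpublished note.
record = {
  'title' => 'Correspondence',
  'publish' => true,
  'notes' => [
    { 'publish' => true,  'content' => 'Open to researchers.' },
    { 'publish' => false, 'content' => 'Donor address on file.' }
  ]
}

result = split_strings(record)

# Stand-in for the merge that the pull request delegates to Solr's
# copyField: the combined field is the union of both buckets.
merged = result[:published] + result[:unpublished]
```

Here result[:published] holds the title and the published note, result[:unpublished] holds only the unpublished note, and merged holds all three strings. Collecting both buckets in one traversal is what avoids scanning each record twice.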

Related JIRA Ticket or GitHub Issue

https://archivesspace.atlassian.net/browse/ANW-261

How Has This Been Tested?

Set the stored attribute on the modified fields in solr/schema.xml and re-indexed with test data containing a mixture of published and unpublished notes and records. All the unpublished text was restricted to "fullrecord" and "notes", and did not appear in "fullrecord_published" or "notes_published". Also, "fullrecord_published" no longer gets text from ancestors. Finally, the time required for a full re-index has come back down to a level comparable to that before the changes in #3006.

Screenshots (if appropriate):

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

My code follows the code style of this project.
My change requires a change to the documentation.
I have read the CONTRIBUTING document.
I have authority to submit this code.
I have added tests to cover my changes.
All new and existing tests passed.

donaldjosephsmith

I took a good look at this and I think I got my head around it. The detailed description was much appreciated. Good thinking on taking advantage of the copyField and multivalued field stuff in Solr.

frontend test failures are real in this case

indexer/app/lib/indexer_common.rb

Indexer efficiency improvements and fixes for side-effects of archive…

bedbc74

…sspace#3006

donaldjosephsmith previously approved these changes Nov 14, 2023

donaldjosephsmith reviewed Nov 14, 2023

indexer/app/lib/indexer_common.rb

make sure PUIIndexer is loaded for tests

b619c66

donaldjosephsmith approved these changes Nov 20, 2023

donaldjosephsmith merged commit 5a172d9 into archivesspace:master Nov 20, 2023
12 checks passed

cdibella added this to the 3.5.0 milestone Dec 8, 2023
