Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ANW-261 Prevent Public Search Queries Matching on Unpublished Data #3006

Conversation

quoideneuf
Copy link
Collaborator

@quoideneuf quoideneuf commented May 25, 2023

Addresses the problem of searchable non-published data by adding alternate fields notes_published and fullrecord_published to the solr schema. Adds logic to the backend so that search requests coming from the public user are marked :protect_published, which results in a Solr::Query instance producing a url with the alternate fields. Unfortunately, it gets a bit messy due to 1) the way keyword type searches are handled (see #2824). Since they are not prefixed in the query string, the _published suffix needs to be added to the qf parameter (which is ignored for search clauses with prefixed fields); 2) the way the Solr::Query object is instantiated after the advanced_query json object is parsed into a solr query string. Perhaps that parsing could be delayed until the entire query is exported to a url? Then the :protect_published option wouldn't need to be passed around so much.

Regarding 1) - perhaps PR 2824 was a bad idea. Could the same results be achieved by reverting keyword to map to fullrecord and using boost queries rather than qf to ensure results with identifier and title matches are at the top of the results list? Would also require some rework of the AdvancedSearch field mapping definitions to affect bq param in Solr::Query.

Note on Indexer Tests: I reworked the way the factories and JSONModel client get set up in the spec helper - maybe this could simplify the setup in other test workflows. The indexer tests run without any supporting backend thanks to the VCR gem. The workflow for a developer editing the tests (unless they don't need to change the existing fixtures) would be something like:

rm indexer/spec/cassettes/indexer_spec.yml
./build/run backend:devserver
ASPACE_TEST_BACKEND_URL=http://localhost:4567 ./build/run indexer:test
git add indexer/spec/cassettes/indexer_spec.yml

This workflow should also work fine with the docker test setup in build.xml.

@quoideneuf quoideneuf force-pushed the ANW-261-dont-search-unpublised-notes branch from 921447f to 0d4383e Compare May 25, 2023 19:20
@quoideneuf quoideneuf changed the title protect unpublished index data from public user ANW-261 Prevent Public Search Queries Matching on Unpublished Data May 25, 2023
@quoideneuf quoideneuf force-pushed the ANW-261-dont-search-unpublised-notes branch from 0d4383e to 029601d Compare May 26, 2023 10:58
@quoideneuf quoideneuf force-pushed the ANW-261-dont-search-unpublised-notes branch from 029601d to 9220d2b Compare May 26, 2023 12:38
@quoideneuf quoideneuf marked this pull request as ready for review May 26, 2023 13:48
Running this new indexer test configuration locally was ending up with a
"Connection to backend failed" error, which ended up being due to
ENV['ASPACE_TEST_BACKEND_URL'] being nil. This copies a pattern from
other helpers that just assigns a localhost address if that is the case.
@donaldjosephsmith donaldjosephsmith merged commit f8adbc9 into archivesspace:master Nov 3, 2023
12 checks passed
andrew-morrison added a commit to andrew-morrison/archivesspace that referenced this pull request Nov 13, 2023
donaldjosephsmith added a commit that referenced this pull request Nov 20, 2023
Indexer efficiency improvements and fixes for side-effects of #3006
@cdibella cdibella added this to the 3.5.0 milestone Feb 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants