Conversation
❌ 1 blocking issue (1 total)
|
Coverage Report for CI Build 24797199778Warning Build has drifted: This PR's base is out of sync with its target branch, so coverage data may include unrelated changes. Coverage increased (+0.003%) to 98.355%Details
Uncovered ChangesNo uncovered changes found. Coverage RegressionsNo coverage regressions found. Coverage Stats
💛 - Coveralls |
Why these changes are being introduced: We need a means to toggle the new semantic search query mode in the UI. Relevant ticket(s): - [USE-493](https://mitlibraries.atlassian.net/browse/USE-493) How this addresses that need: This adds a feature to toggle semantic search on and off. Lexical search remains the default. Side effects of this change: - This feature is explicitly disabled for geospatial queries. If we want semantic search for GeoData in the future, we will need to revisit the code. - There is no query param exposing this feature, so it is not currently possible to toggle on a per-query basis.
There was a problem hiding this comment.
Pull request overview
Adds a feature-flagged semantic search mode for TIMDEX queries by switching the GraphQL query document and (when enabled) supplying queryMode: semantic, while keeping lexical search as the default behavior.
Changes:
- Add
FEATURE_TIMDEX_SEMANTIC_SEARCHflag and documentation. - Extend the TIMDEX GraphQL schema/client queries to support
queryMode. - Add/adjust Minitest coverage around query building and controller behavior.
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| test_output.log | Adds a captured test run/coverage log output. |
| test/models/query_builder_test.rb | Tests for default lexical behavior and semantic mode when flag enabled. |
| test/controllers/search_controller_test.rb | Adds tests intended to cover BaseQuery vs SemanticBaseQuery selection. |
| config/schema/schema.json | Adds queryMode to the GraphQL schema JSON used by graphql-client. |
| app/models/timdex_search.rb | Introduces SemanticBaseQuery GraphQL document including queryMode. |
| app/models/query_builder.rb | Sets queryMode to semantic when feature flag is enabled. |
| app/models/feature.rb | Registers timdex_semantic_search as a valid feature flag. |
| app/controllers/search_controller.rb | Switches to selecting query document by mode; attempts to disable semantic mode for geospatial flows. |
| README.md | Documents the new FEATURE_TIMDEX_SEMANTIC_SEARCH env var. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| SemanticBaseQuery = TimdexBase::Client.parse <<-GRAPHQL | ||
| query( | ||
| $q: String | ||
| $citation: String | ||
| $contributors: String | ||
| $fundingInformation: String | ||
| $identifiers: String | ||
| $locations: String | ||
| $subjects: String | ||
| $title: String | ||
| $index: String | ||
| $from: String | ||
| $booleanType: String | ||
| $queryMode: String | ||
| $fulltext: Boolean | ||
| $perPage: Int | ||
| $accessToFilesFilter: [String!] | ||
| $contentTypeFilter: [String!] | ||
| $contributorsFilter: [String!] | ||
| $formatFilter: [String!] | ||
| $languagesFilter: [String!] | ||
| $literaryFormFilter: String | ||
| $placesFilter: [String!] | ||
| $sourceFilter: [String!] | ||
| $subjectsFilter: [String!] | ||
| ) { | ||
| search( | ||
| searchterm: $q | ||
| citation: $citation | ||
| contributors: $contributors | ||
| fundingInformation: $fundingInformation | ||
| identifiers: $identifiers | ||
| locations: $locations | ||
| subjects: $subjects | ||
| title: $title | ||
| index: $index | ||
| from: $from | ||
| booleanType: $booleanType | ||
| queryMode: $queryMode | ||
| fulltext: $fulltext | ||
| perPage: $perPage | ||
| accessToFilesFilter: $accessToFilesFilter | ||
| contentTypeFilter: $contentTypeFilter | ||
| contributorsFilter: $contributorsFilter | ||
| formatFilter: $formatFilter | ||
| languagesFilter: $languagesFilter | ||
| literaryFormFilter: $literaryFormFilter | ||
| placesFilter: $placesFilter | ||
| sourceFilter: $sourceFilter | ||
| subjectsFilter: $subjectsFilter | ||
| ) { | ||
| hits | ||
| records { | ||
| timdexRecordId | ||
| identifiers { | ||
| kind | ||
| value | ||
| } | ||
| title | ||
| source | ||
| contentType | ||
| contributors { | ||
| kind | ||
| value | ||
| } | ||
| publicationInformation | ||
| dates { | ||
| kind | ||
| value | ||
| range { | ||
| gte | ||
| lte | ||
| } | ||
| } | ||
| links { | ||
| kind | ||
| restrictions | ||
| text | ||
| url | ||
| } | ||
| notes { | ||
| kind | ||
| value | ||
| } | ||
| highlight { | ||
| matchedField | ||
| matchedPhrases | ||
| } | ||
| provider | ||
| rights { | ||
| kind | ||
| description | ||
| uri | ||
| } | ||
| sourceLink | ||
| summary | ||
| subjects { | ||
| kind | ||
| value | ||
| } | ||
| citation | ||
| } | ||
| aggregations { | ||
| accessToFiles { | ||
| key | ||
| docCount | ||
| } | ||
| contentType { | ||
| key | ||
| docCount | ||
| } | ||
| contributors { | ||
| key | ||
| docCount | ||
| } | ||
| format { | ||
| key | ||
| docCount | ||
| } | ||
| languages { | ||
| key | ||
| docCount | ||
| } | ||
| literaryForm { | ||
| key | ||
| docCount | ||
| } | ||
| places { | ||
| key | ||
| docCount | ||
| } | ||
| source { | ||
| key | ||
| docCount | ||
| } | ||
| subjects { | ||
| key | ||
| docCount | ||
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
SemanticBaseQuery duplicates the full BaseQuery document with only a small argument difference. This duplication increases the risk of future drift if fields/filters are added to one query but not the other. Consider factoring out the shared selection set / argument list into a shared template (or generating both query documents from a single source) so updates stay in sync.
| SemanticBaseQuery = TimdexBase::Client.parse <<-GRAPHQL | |
| query( | |
| $q: String | |
| $citation: String | |
| $contributors: String | |
| $fundingInformation: String | |
| $identifiers: String | |
| $locations: String | |
| $subjects: String | |
| $title: String | |
| $index: String | |
| $from: String | |
| $booleanType: String | |
| $queryMode: String | |
| $fulltext: Boolean | |
| $perPage: Int | |
| $accessToFilesFilter: [String!] | |
| $contentTypeFilter: [String!] | |
| $contributorsFilter: [String!] | |
| $formatFilter: [String!] | |
| $languagesFilter: [String!] | |
| $literaryFormFilter: String | |
| $placesFilter: [String!] | |
| $sourceFilter: [String!] | |
| $subjectsFilter: [String!] | |
| ) { | |
| search( | |
| searchterm: $q | |
| citation: $citation | |
| contributors: $contributors | |
| fundingInformation: $fundingInformation | |
| identifiers: $identifiers | |
| locations: $locations | |
| subjects: $subjects | |
| title: $title | |
| index: $index | |
| from: $from | |
| booleanType: $booleanType | |
| queryMode: $queryMode | |
| fulltext: $fulltext | |
| perPage: $perPage | |
| accessToFilesFilter: $accessToFilesFilter | |
| contentTypeFilter: $contentTypeFilter | |
| contributorsFilter: $contributorsFilter | |
| formatFilter: $formatFilter | |
| languagesFilter: $languagesFilter | |
| literaryFormFilter: $literaryFormFilter | |
| placesFilter: $placesFilter | |
| sourceFilter: $sourceFilter | |
| subjectsFilter: $subjectsFilter | |
| ) { | |
| hits | |
| records { | |
| timdexRecordId | |
| identifiers { | |
| kind | |
| value | |
| } | |
| title | |
| source | |
| contentType | |
| contributors { | |
| kind | |
| value | |
| } | |
| publicationInformation | |
| dates { | |
| kind | |
| value | |
| range { | |
| gte | |
| lte | |
| } | |
| } | |
| links { | |
| kind | |
| restrictions | |
| text | |
| url | |
| } | |
| notes { | |
| kind | |
| value | |
| } | |
| highlight { | |
| matchedField | |
| matchedPhrases | |
| } | |
| provider | |
| rights { | |
| kind | |
| description | |
| uri | |
| } | |
| sourceLink | |
| summary | |
| subjects { | |
| kind | |
| value | |
| } | |
| citation | |
| } | |
| aggregations { | |
| accessToFiles { | |
| key | |
| docCount | |
| } | |
| contentType { | |
| key | |
| docCount | |
| } | |
| contributors { | |
| key | |
| docCount | |
| } | |
| format { | |
| key | |
| docCount | |
| } | |
| languages { | |
| key | |
| docCount | |
| } | |
| literaryForm { | |
| key | |
| docCount | |
| } | |
| places { | |
| key | |
| docCount | |
| } | |
| source { | |
| key | |
| docCount | |
| } | |
| subjects { | |
| key | |
| docCount | |
| } | |
| } | |
| } | |
| } | |
| SemanticBaseQuery = BaseQuery |
There was a problem hiding this comment.
These feels worth logging into a deferred maintenance log for this app. This whole model is basically duplication with minor changes that we might have a better solution for now than we did when we wrote it.
There was a problem hiding this comment.
Looking at this again, do we actually need the SemanticBaseQuery or can we just include $queryMode: String in all queries and only set a value in semantic searches? I suspect this will require cassettes to be regenerated but it feels better unless I am missing something.
There was a problem hiding this comment.
Oh yeah, that's much cleaner. Good call!
| def execute_geospatial_query(query) | ||
| query = query.except('queryMode') | ||
|
|
||
| if query['geobox'] == 'true' && query[:geodistance] == 'true' |
There was a problem hiding this comment.
execute_geospatial_query checks query[:geodistance], but QueryBuilder populates the geodistance flag under the string key (query['geodistance']). This makes the combined geobox+geodistance branch unreachable and will route those requests to the wrong GraphQL query.
| if query['geobox'] == 'true' && query[:geodistance] == 'true' | |
| if query['geobox'] == 'true' && query['geodistance'] == 'true' |
There was a problem hiding this comment.
Huh. This is concerning.
There was a problem hiding this comment.
Yeah this feels worth confirming the code works as expected. We can do that under another ticket as this change set didn't introduce the potential problem.
There was a problem hiding this comment.
Yeah, it's not introduced here. I think we just never noticed it.
There was a problem hiding this comment.
| def execute_geospatial_query(query) | ||
| query = query.except('queryMode') | ||
|
|
||
| if query['geobox'] == 'true' && query[:geodistance] == 'true' | ||
| TimdexBase::Client.query(TimdexSearch::AllQuery, variables: query) | ||
| elsif query['geobox'] == 'true' | ||
| TimdexBase::Client.query(TimdexSearch::GeoboxQuery, variables: query) | ||
| elsif query['geodistance'] == 'true' | ||
| TimdexBase::Client.query(TimdexSearch::GeodistanceQuery, variables: query) | ||
| else | ||
| TimdexBase::Client.query(TimdexSearch::BaseQuery, variables: query) | ||
| TimdexBase::Client.query(base_query_for_mode, variables: query) | ||
| end |
There was a problem hiding this comment.
execute_geospatial_query removes queryMode unconditionally whenever the :geodata feature is enabled. That disables semantic search even for non-geospatial searches in geodata mode, and can also result in SemanticBaseQuery being used with $queryMode unset (passed as null), which bypasses the schema default ("lexical") because the argument is still present in the query document. Consider only stripping queryMode for actual geospatial queries (geobox/geodistance), and/or forcing BaseQuery for geospatial execution paths.
There was a problem hiding this comment.
This feels worth logging a ticket or a technical debt log.
There was a problem hiding this comment.
I'm a bit unsure where to ticket it. Is GeoData in scope of USE? Or would this go in the EngX kanban board?
There was a problem hiding this comment.
It's probably best in TIMX or in the maintenance log (which we might not have for TIMDEX UI yet)
There was a problem hiding this comment.
| test 'uses BaseQuery when semantic search feature is disabled' do | ||
| # When the feature flag is not enabled, base_query_for_mode returns BaseQuery (default tab is 'all') | ||
| mock_primo_search_all_tab | ||
| mock_timdex_search_all_tab | ||
|
|
||
| get '/results?q=test' | ||
|
|
||
| assert_response :success | ||
| end | ||
|
|
||
| test 'uses SemanticBaseQuery when semantic search feature is enabled' do | ||
| # When the feature flag is enabled, base_query_for_mode returns SemanticBaseQuery (default tab is 'all') | ||
| ClimateControl.modify FEATURE_TIMDEX_SEMANTIC_SEARCH: 'true' do | ||
| mock_primo_search_all_tab | ||
| mock_timdex_search_all_tab | ||
|
|
||
| get '/results?q=test' | ||
|
|
||
| assert_response :success | ||
| end |
There was a problem hiding this comment.
These tests are named as though they verify which GraphQL query document is used, but they only assert :success and the TIMDEX client mock does not assert the query constant or variables. This means the tests won’t fail if base_query_for_mode is wired incorrectly. Consider tightening the expectation to assert TimdexBase::Client.query is invoked with TimdexSearch::BaseQuery vs TimdexSearch::SemanticBaseQuery (and that variables include/omit queryMode as appropriate).
There was a problem hiding this comment.
I concur. Can you rework these tests to ensure we check the different paths are called?
There was a problem hiding this comment.
I ended up dropping those tests because I think it's adequately covered in the Query Builder test. A controller test is likely warranted if/when we implement queryMode as a param. Let me know if you feel differently.
| SemanticBaseQuery = TimdexBase::Client.parse <<-GRAPHQL | ||
| query( | ||
| $q: String | ||
| $citation: String | ||
| $contributors: String | ||
| $fundingInformation: String | ||
| $identifiers: String | ||
| $locations: String | ||
| $subjects: String | ||
| $title: String | ||
| $index: String | ||
| $from: String | ||
| $booleanType: String | ||
| $queryMode: String | ||
| $fulltext: Boolean | ||
| $perPage: Int | ||
| $accessToFilesFilter: [String!] | ||
| $contentTypeFilter: [String!] | ||
| $contributorsFilter: [String!] | ||
| $formatFilter: [String!] | ||
| $languagesFilter: [String!] | ||
| $literaryFormFilter: String | ||
| $placesFilter: [String!] | ||
| $sourceFilter: [String!] | ||
| $subjectsFilter: [String!] | ||
| ) { | ||
| search( | ||
| searchterm: $q | ||
| citation: $citation | ||
| contributors: $contributors | ||
| fundingInformation: $fundingInformation | ||
| identifiers: $identifiers | ||
| locations: $locations | ||
| subjects: $subjects | ||
| title: $title | ||
| index: $index | ||
| from: $from | ||
| booleanType: $booleanType | ||
| queryMode: $queryMode | ||
| fulltext: $fulltext | ||
| perPage: $perPage | ||
| accessToFilesFilter: $accessToFilesFilter | ||
| contentTypeFilter: $contentTypeFilter | ||
| contributorsFilter: $contributorsFilter | ||
| formatFilter: $formatFilter | ||
| languagesFilter: $languagesFilter | ||
| literaryFormFilter: $literaryFormFilter | ||
| placesFilter: $placesFilter | ||
| sourceFilter: $sourceFilter | ||
| subjectsFilter: $subjectsFilter | ||
| ) { | ||
| hits | ||
| records { | ||
| timdexRecordId | ||
| identifiers { | ||
| kind | ||
| value | ||
| } | ||
| title | ||
| source | ||
| contentType | ||
| contributors { | ||
| kind | ||
| value | ||
| } | ||
| publicationInformation | ||
| dates { | ||
| kind | ||
| value | ||
| range { | ||
| gte | ||
| lte | ||
| } | ||
| } | ||
| links { | ||
| kind | ||
| restrictions | ||
| text | ||
| url | ||
| } | ||
| notes { | ||
| kind | ||
| value | ||
| } | ||
| highlight { | ||
| matchedField | ||
| matchedPhrases | ||
| } | ||
| provider | ||
| rights { | ||
| kind | ||
| description | ||
| uri | ||
| } | ||
| sourceLink | ||
| summary | ||
| subjects { | ||
| kind | ||
| value | ||
| } | ||
| citation | ||
| } | ||
| aggregations { | ||
| accessToFiles { | ||
| key | ||
| docCount | ||
| } | ||
| contentType { | ||
| key | ||
| docCount | ||
| } | ||
| contributors { | ||
| key | ||
| docCount | ||
| } | ||
| format { | ||
| key | ||
| docCount | ||
| } | ||
| languages { | ||
| key | ||
| docCount | ||
| } | ||
| literaryForm { | ||
| key | ||
| docCount | ||
| } | ||
| places { | ||
| key | ||
| docCount | ||
| } | ||
| source { | ||
| key | ||
| docCount | ||
| } | ||
| subjects { | ||
| key | ||
| docCount | ||
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Looking at this again, do we actually need the SemanticBaseQuery or can we just include $queryMode: String in all queries and only set a value in semantic searches? I suspect this will require cassettes to be regenerated but it feels better unless I am missing something.
| test 'uses BaseQuery when semantic search feature is disabled' do | ||
| # When the feature flag is not enabled, base_query_for_mode returns BaseQuery (default tab is 'all') | ||
| mock_primo_search_all_tab | ||
| mock_timdex_search_all_tab | ||
|
|
||
| get '/results?q=test' | ||
|
|
||
| assert_response :success | ||
| end | ||
|
|
||
| test 'uses SemanticBaseQuery when semantic search feature is enabled' do | ||
| # When the feature flag is enabled, base_query_for_mode returns SemanticBaseQuery (default tab is 'all') | ||
| ClimateControl.modify FEATURE_TIMDEX_SEMANTIC_SEARCH: 'true' do | ||
| mock_primo_search_all_tab | ||
| mock_timdex_search_all_tab | ||
|
|
||
| get '/results?q=test' | ||
|
|
||
| assert_response :success | ||
| end |
There was a problem hiding this comment.
I concur. Can you rework these tests to ensure we check the different paths are called?
Why these changes are being introduced:
We need a means to toggle the new semantic search
query mode in the UI.
Relevant ticket(s):
How this addresses that need:
This adds a feature to toggle semantic search on
and off. Lexical search remains the default.
Side effects of this change:
the code.
per-query basis.
Developer
Accessibility
New ENV
Approval beyond code review
Additional context needed to review
The feature is enabled on this PR build. One way to test is to run a query in the PR build and another locally, and see if the result sets are different.
Code Reviewer
Code
added technical debt.
Documentation
(not just this pull request message).
Testing