Add feature to enable semantic search #381

Merged
jazairi merged 2 commits into main from use-493
Apr 22, 2026

Conversation

@jazairi jazairi commented Apr 16, 2026

Why these changes are being introduced:

We need a means to toggle the new semantic search
query mode in the UI.

Relevant ticket(s):

How this addresses that need:

This adds a feature to toggle semantic search on
and off. Lexical search remains the default.

Side effects of this change:

  • This feature is explicitly disabled for geospatial queries. If we want semantic search for GeoData in the future, we will need to revisit
    the code.
  • There is no query param exposing this feature, so it is not currently possible to toggle on a
    per-query basis.

Developer

Accessibility
  • ANDI or WAVE has been run in accordance with our guide.
  • This PR contains no changes to the view layer.
  • New issues flagged by ANDI or WAVE have been resolved.
  • New issues flagged by ANDI or WAVE have been ticketed (link in the Pull Request details above).
  • No new accessibility issues have been flagged.
New ENV
  • All new ENV is documented in README.
  • All new ENV has been added to Heroku Pipeline, Staging and Prod.
  • ENV has not changed.
Approval beyond code review
  • UXWS/stakeholder approval has been confirmed.
  • UXWS/stakeholder review will be completed retroactively.
  • UXWS/stakeholder review is not needed.
Additional context needed to review

The feature is enabled on this PR build. One way to test is to run the same query in the PR build and locally, and check whether the result sets differ.

Code Reviewer

Code
  • I have confirmed that the code works as intended.
  • Any CodeClimate issues have been fixed or confirmed as
    added technical debt.
Documentation
  • The commit message is clear and follows our guidelines
    (not just this pull request message).
  • The documentation has been updated or is unnecessary.
  • New dependencies are appropriate or there were no changes.
Testing
  • There are appropriate tests covering any new functionality.
  • No additional test coverage is required.

qltysh Bot commented Apr 16, 2026

❌ 1 blocking issue (1 total)

| Tool | Category | Rule | Count |
| --- | --- | --- | --- |
| rubocop | Lint | Class has too many lines. [533/100] | 1 |

Comment thread app/models/query_builder.rb

coveralls commented Apr 16, 2026

Coverage Report for CI Build 24797199778

Warning

Build has drifted: This PR's base is out of sync with its target branch, so coverage data may include unrelated changes.
Quick fix: rebase this PR.

Coverage increased (+0.003%) to 98.355%

Details

  • Coverage increased (+0.003%) from the base build.
  • Patch coverage: 2 of 2 lines across 2 files are fully covered (100%).
  • No coverage regressions found.

Uncovered Changes

No uncovered changes found.

Coverage Regressions

No coverage regressions found.


Coverage Stats

Coverage Status
Relevant Lines: 1398
Covered Lines: 1375
Line Coverage: 98.35%
Coverage Strength: 68.64 hits per line

💛 - Coveralls

@mitlib mitlib temporarily deployed to timdex-ui-pi-use-493-k4pwyo4uc April 16, 2026 22:35 Inactive
@jazairi jazairi temporarily deployed to timdex-ui-pi-use-493-k4pwyo4uc April 17, 2026 15:49 Inactive
Why these changes are being introduced:

We need a means to toggle the new semantic search
query mode in the UI.

Relevant ticket(s):

- [USE-493](https://mitlibraries.atlassian.net/browse/USE-493)

How this addresses that need:

This adds a feature to toggle semantic search on
and off. Lexical search remains the default.

Side effects of this change:

- This feature is explicitly disabled for
geospatial queries. If we want semantic search for
GeoData in the future, we will need to revisit
the code.
- There is no query param exposing this feature,
so it is not currently possible to toggle on a
per-query basis.
@jazairi jazairi temporarily deployed to timdex-ui-pi-use-493-k4pwyo4uc April 17, 2026 15:59 Inactive
@jazairi jazairi requested a review from Copilot April 17, 2026 16:11

Copilot AI left a comment

Pull request overview

Adds a feature-flagged semantic search mode for TIMDEX queries by switching the GraphQL query document and (when enabled) supplying queryMode: semantic, while keeping lexical search as the default behavior.

Changes:

  • Add FEATURE_TIMDEX_SEMANTIC_SEARCH flag and documentation.
  • Extend the TIMDEX GraphQL schema/client queries to support queryMode.
  • Add/adjust Minitest coverage around query building and controller behavior.
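
As a rough illustration of the flow this overview describes, the sketch below reads the flag from the environment and adds `queryMode` only when it is enabled. `semantic_search_enabled?` and `build_query` are hypothetical helpers, not the app's actual `Feature` or `QueryBuilder` code, and treating only the value `'true'` as enabled is an assumption.

```ruby
# Hypothetical helper: treats the flag as enabled only when the env var is 'true'.
# Accepts any Hash-like object so it can be exercised without touching ENV.
def semantic_search_enabled?(env = ENV)
  env.fetch('FEATURE_TIMDEX_SEMANTIC_SEARCH', 'false').to_s.downcase == 'true'
end

# Hypothetical builder: lexical search is the default, so queryMode is
# added only when the flag is on.
def build_query(search_term, env = ENV)
  query = { 'q' => search_term }
  query['queryMode'] = 'semantic' if semantic_search_enabled?(env)
  query
end

lexical  = build_query('data', {})
semantic = build_query('data', { 'FEATURE_TIMDEX_SEMANTIC_SEARCH' => 'true' })
```

With the flag unset, `lexical` carries no `queryMode` key at all, which matches the PR's statement that lexical search remains the default.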

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| test_output.log | Adds a captured test run/coverage log output. |
| test/models/query_builder_test.rb | Tests for default lexical behavior and semantic mode when flag enabled. |
| test/controllers/search_controller_test.rb | Adds tests intended to cover BaseQuery vs SemanticBaseQuery selection. |
| config/schema/schema.json | Adds queryMode to the GraphQL schema JSON used by graphql-client. |
| app/models/timdex_search.rb | Introduces SemanticBaseQuery GraphQL document including queryMode. |
| app/models/query_builder.rb | Sets queryMode to semantic when feature flag is enabled. |
| app/models/feature.rb | Registers timdex_semantic_search as a valid feature flag. |
| app/controllers/search_controller.rb | Switches to selecting query document by mode; attempts to disable semantic mode for geospatial flows. |
| README.md | Documents the new FEATURE_TIMDEX_SEMANTIC_SEARCH env var. |

Comment thread app/models/timdex_search.rb Outdated
Comment on lines +147 to +288
SemanticBaseQuery = TimdexBase::Client.parse <<-GRAPHQL
  query(
    $q: String
    $citation: String
    $contributors: String
    $fundingInformation: String
    $identifiers: String
    $locations: String
    $subjects: String
    $title: String
    $index: String
    $from: String
    $booleanType: String
    $queryMode: String
    $fulltext: Boolean
    $perPage: Int
    $accessToFilesFilter: [String!]
    $contentTypeFilter: [String!]
    $contributorsFilter: [String!]
    $formatFilter: [String!]
    $languagesFilter: [String!]
    $literaryFormFilter: String
    $placesFilter: [String!]
    $sourceFilter: [String!]
    $subjectsFilter: [String!]
  ) {
    search(
      searchterm: $q
      citation: $citation
      contributors: $contributors
      fundingInformation: $fundingInformation
      identifiers: $identifiers
      locations: $locations
      subjects: $subjects
      title: $title
      index: $index
      from: $from
      booleanType: $booleanType
      queryMode: $queryMode
      fulltext: $fulltext
      perPage: $perPage
      accessToFilesFilter: $accessToFilesFilter
      contentTypeFilter: $contentTypeFilter
      contributorsFilter: $contributorsFilter
      formatFilter: $formatFilter
      languagesFilter: $languagesFilter
      literaryFormFilter: $literaryFormFilter
      placesFilter: $placesFilter
      sourceFilter: $sourceFilter
      subjectsFilter: $subjectsFilter
    ) {
      hits
      records {
        timdexRecordId
        identifiers {
          kind
          value
        }
        title
        source
        contentType
        contributors {
          kind
          value
        }
        publicationInformation
        dates {
          kind
          value
          range {
            gte
            lte
          }
        }
        links {
          kind
          restrictions
          text
          url
        }
        notes {
          kind
          value
        }
        highlight {
          matchedField
          matchedPhrases
        }
        provider
        rights {
          kind
          description
          uri
        }
        sourceLink
        summary
        subjects {
          kind
          value
        }
        citation
      }
      aggregations {
        accessToFiles {
          key
          docCount
        }
        contentType {
          key
          docCount
        }
        contributors {
          key
          docCount
        }
        format {
          key
          docCount
        }
        languages {
          key
          docCount
        }
        literaryForm {
          key
          docCount
        }
        places {
          key
          docCount
        }
        source {
          key
          docCount
        }
        subjects {
          key
          docCount
        }
      }
    }
  }
Copilot AI Apr 17, 2026

SemanticBaseQuery duplicates the full BaseQuery document with only a small argument difference. This duplication increases the risk of future drift if fields/filters are added to one query but not the other. Consider factoring out the shared selection set / argument list into a shared template (or generating both query documents from a single source) so updates stay in sync.

Suggested change
- SemanticBaseQuery = TimdexBase::Client.parse <<-GRAPHQL (the full query document quoted above, lines +147 to +288)
+ SemanticBaseQuery = BaseQuery

Member

This feels worth logging in a deferred maintenance log for this app. This whole model is basically duplication with minor changes, and we might have a better solution for it now than we did when we wrote it.

Member

Looking at this again, do we actually need the SemanticBaseQuery or can we just include $queryMode: String in all queries and only set a value in semantic searches? I suspect this will require cassettes to be regenerated but it feels better unless I am missing something.

Contributor Author

Oh yeah, that's much cleaner. Good call!
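
A minimal sketch of the approach agreed on in this thread, assuming a single query document that always declares `$queryMode` and a caller that supplies the variable only for semantic searches. The document below is heavily trimmed and `variables_for` is a hypothetical helper; the real `BaseQuery` is parsed with `TimdexBase::Client.parse` and declares many more arguments.

```ruby
# Trimmed, illustrative query document (not the app's full BaseQuery).
BASE_QUERY_DOCUMENT = <<~GRAPHQL
  query($q: String, $queryMode: String) {
    search(searchterm: $q, queryMode: $queryMode) {
      hits
    }
  }
GRAPHQL

# Hypothetical helper: omit queryMode entirely for lexical searches so the
# same document serves both modes.
def variables_for(search_term, semantic: false)
  variables = { 'q' => search_term }
  variables['queryMode'] = 'semantic' if semantic
  variables
end

lexical_variables  = variables_for('whales')
semantic_variables = variables_for('whales', semantic: true)
```

Under the GraphQL specification, a variable that is not supplied should leave the argument unset, so a schema default would apply. Whether the client serializes a missing variable as absent or as an explicit null is worth verifying when the cassettes are regenerated.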

def execute_geospatial_query(query)
  query = query.except('queryMode')

  if query['geobox'] == 'true' && query[:geodistance] == 'true'
Copilot AI Apr 17, 2026

execute_geospatial_query checks query[:geodistance], but QueryBuilder populates the geodistance flag under the string key (query['geodistance']). This makes the combined geobox+geodistance branch unreachable and will route those requests to the wrong GraphQL query.

Suggested change
- if query['geobox'] == 'true' && query[:geodistance] == 'true'
+ if query['geobox'] == 'true' && query['geodistance'] == 'true'
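
The mismatch is easy to reproduce in plain Ruby, assuming `query` is an ordinary `Hash` with string keys (a `HashWithIndifferentAccess` or `ActionController::Parameters` object would behave differently). The hash contents here are illustrative.

```ruby
# Illustrative query hash with string keys, as the review comment describes.
query = { 'geobox' => 'true', 'geodistance' => 'true' }

symbol_lookup = query[:geodistance]   # the symbol key is absent from this Hash
string_lookup = query['geodistance']  # the string key matches

# The combined condition as currently written versus as suggested.
combined_with_symbol = query['geobox'] == 'true' && symbol_lookup == 'true'
combined_with_string = query['geobox'] == 'true' && string_lookup == 'true'
```

With a plain `Hash`, the symbol lookup returns `nil`, so the first condition can never be true even when both flags are set.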

Contributor Author

Huh. This is concerning.

Member

Yeah this feels worth confirming the code works as expected. We can do that under another ticket as this change set didn't introduce the potential problem.

Contributor Author

Yeah, it's not introduced here. I think we just never noticed it.

Comment on lines 219 to 230
  def execute_geospatial_query(query)
    query = query.except('queryMode')

    if query['geobox'] == 'true' && query[:geodistance] == 'true'
      TimdexBase::Client.query(TimdexSearch::AllQuery, variables: query)
    elsif query['geobox'] == 'true'
      TimdexBase::Client.query(TimdexSearch::GeoboxQuery, variables: query)
    elsif query['geodistance'] == 'true'
      TimdexBase::Client.query(TimdexSearch::GeodistanceQuery, variables: query)
    else
-     TimdexBase::Client.query(TimdexSearch::BaseQuery, variables: query)
+     TimdexBase::Client.query(base_query_for_mode, variables: query)
    end
Copilot AI Apr 17, 2026

execute_geospatial_query removes queryMode unconditionally whenever the :geodata feature is enabled. That disables semantic search even for non-geospatial searches in geodata mode, and can also result in SemanticBaseQuery being used with $queryMode unset (passed as null), which bypasses the schema default ("lexical") because the argument is still present in the query document. Consider only stripping queryMode for actual geospatial queries (geobox/geodistance), and/or forcing BaseQuery for geospatial execution paths.
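
One way to narrow the stripping, sketched under the assumption that `query` is a string-keyed `Hash`: drop `queryMode` only when a geobox or geodistance flag is set. `geospatial?` and `strip_query_mode_for_geo` are hypothetical names, and `reject` stands in for `Hash#except` so the snippet does not depend on ActiveSupport or a recent Ruby.

```ruby
# Hypothetical predicate: true only for actual geospatial queries.
def geospatial?(query)
  query['geobox'] == 'true' || query['geodistance'] == 'true'
end

# Hypothetical helper: remove queryMode for geospatial queries only,
# leaving non-geospatial searches free to use semantic mode.
def strip_query_mode_for_geo(query)
  return query unless geospatial?(query)

  query.reject { |key, _value| key == 'queryMode' }
end

keyword_query = strip_query_mode_for_geo({ 'q' => 'maps', 'queryMode' => 'semantic' })
geobox_query  = strip_query_mode_for_geo({ 'q' => 'maps', 'geobox' => 'true', 'queryMode' => 'semantic' })
```

Here the keyword search keeps its `queryMode`, while the geobox search loses it before being sent.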

Member

This feels worth logging as a ticket or in a technical debt log.

Contributor Author

I'm a bit unsure where to ticket it. Is GeoData in scope of USE? Or would this go in the EngX kanban board?

Member

It's probably best in TIMX or in the maintenance log (which we might not have for TIMDEX UI yet)

Comment on lines +1224 to +1243
test 'uses BaseQuery when semantic search feature is disabled' do
  # When the feature flag is not enabled, base_query_for_mode returns BaseQuery (default tab is 'all')
  mock_primo_search_all_tab
  mock_timdex_search_all_tab

  get '/results?q=test'

  assert_response :success
end

test 'uses SemanticBaseQuery when semantic search feature is enabled' do
  # When the feature flag is enabled, base_query_for_mode returns SemanticBaseQuery (default tab is 'all')
  ClimateControl.modify FEATURE_TIMDEX_SEMANTIC_SEARCH: 'true' do
    mock_primo_search_all_tab
    mock_timdex_search_all_tab

    get '/results?q=test'

    assert_response :success
  end
Copilot AI Apr 17, 2026

These tests are named as though they verify which GraphQL query document is used, but they only assert :success and the TIMDEX client mock does not assert the query constant or variables. This means the tests won’t fail if base_query_for_mode is wired incorrectly. Consider tightening the expectation to assert TimdexBase::Client.query is invoked with TimdexSearch::BaseQuery vs TimdexSearch::SemanticBaseQuery (and that variables include/omit queryMode as appropriate).
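
A tighter check could record what the client is asked to run and assert on it. The sketch below uses a hand-rolled recording double in plain Ruby rather than the app's mocks; `RecordingClient`, `run_search`, and the `:base_query` placeholder are all hypothetical stand-ins for `TimdexBase::Client`, the controller path, and `TimdexSearch::BaseQuery`.

```ruby
# Hypothetical test double that records every query call it receives.
class RecordingClient
  attr_reader :calls

  def initialize
    @calls = []
  end

  # Mirrors the shape of a client.query(document, variables: ...) call.
  def query(document, variables:)
    @calls << { document: document, variables: variables }
    :stubbed_response
  end
end

# Hypothetical stand-in for the code under test.
def run_search(client, search_term, semantic:)
  variables = { 'q' => search_term }
  variables['queryMode'] = 'semantic' if semantic
  client.query(:base_query, variables: variables)
end

client = RecordingClient.new
run_search(client, 'test', semantic: false)
run_search(client, 'test', semantic: true)

lexical_call, semantic_call = client.calls
```

Asserting on `lexical_call` and `semantic_call` would fail if the wrong document or variables were passed, which a bare `assert_response :success` cannot detect.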

Member

I concur. Can you rework these tests to ensure we check that the different paths are called?

Contributor Author

I ended up dropping those tests because I think it's adequately covered in the Query Builder test. A controller test is likely warranted if/when we implement queryMode as a param. Let me know if you feel differently.

Comment thread test/models/query_builder_test.rb
@jazairi jazairi assigned jazairi and unassigned jazairi Apr 17, 2026
@JPrevost JPrevost self-assigned this Apr 22, 2026
Comment thread app/models/timdex_search.rb Outdated
Comment thread test_output.log Outdated
Comment thread test/models/query_builder_test.rb
@jazairi jazairi requested a review from JPrevost April 22, 2026 19:11
@jazairi jazairi merged commit 60e878f into main Apr 22, 2026
6 checks passed
@jazairi jazairi deleted the use-493 branch April 22, 2026 19:23
Labels

None yet

Projects

None yet

5 participants