Adds support for AND, OR, custom keyword matching
Why are these changes being introduced:

* TIMDEX currently ORs all keyword search terms together, which returns a
  large number of results, many of them irrelevant. In the extreme case,
  none of the returned records contain all of the terms, even though that
  is sometimes what users expect.

Relevant ticket(s):

* https://mitlibraries.atlassian.net/browse/GDT-281

How does this address that need:

* Leverages OpenSearch `minimum_should_match` [feature](https://opensearch.org/docs/latest/query-dsl/minimum-should-match/#valid-values)
  along with a GraphQL feature to choose a preconfigured value or supply your own
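
A minimal sketch of how a client might use the new argument. It assumes graphql-ruby's default camelization exposes `boolean_type` as `booleanType`, and that the `search` field returns a `hits` count; `keyword_search_query` is a hypothetical helper, not part of TIMDEX.

```ruby
require 'json'

# Hypothetical client-side helper (not part of TIMDEX): builds a GraphQL
# query string for a keyword search with a chosen boolean type.
# Assumption: the schema exposes `boolean_type` as `booleanType` and the
# search result has a `hits` field.
def keyword_search_query(searchterm, boolean_type: 'OR')
  <<~GRAPHQL
    {
      search(searchterm: #{searchterm.to_json}, booleanType: #{boolean_type.to_json}) {
        hits
      }
    }
  GRAPHQL
end

# Preconfigured value: every term must match.
puts keyword_search_query('climate change policy', boolean_type: 'AND')

# Custom value: raw OpenSearch minimum_should_match syntax is passed through.
puts keyword_search_query('climate change policy', boolean_type: '2<75%')
```

Because unrecognized values are handed to OpenSearch unchanged, a malformed custom value would presumably surface as an OpenSearch error rather than being rejected by the GraphQL layer.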

Document any side effects to this change:

* Once we get feedback from stakeholders, we will likely want to adjust the available options, give them
  meaningful names, document them more thoroughly in the API docs, and probably remove some of them
JPrevost committed Apr 19, 2024
1 parent 76b2b9b commit b9e31c4
Showing 2 changed files with 35 additions and 4 deletions.
9 changes: 6 additions & 3 deletions app/graphql/types/query_type.rb
@@ -59,6 +59,8 @@ def record_id(id:, index:)
 'you with one for your specific use case'

 argument :source, String, required: false, default_value: 'All', deprecation_reason: 'Use `sourceFilter`'
+argument :boolean_type, String, required: false, default_value: 'OR',
+         description: 'How to join multiword queries. Defaults to "OR", which means any of the words must match. Options include: "OR", "AND", "experiment_a", "experiment_b", "experiment_c"'

 # applied filters
 argument :access_to_files_filter, [String],
@@ -96,9 +98,9 @@ def record_id(id:, index:)
 end

 def search(searchterm:, citation:, contributors:, funding_information:, geodistance:, geobox:, identifiers:,
-           locations:, subjects:, title:, index:, source:, from:, **filters)
+           locations:, subjects:, title:, index:, source:, from:, boolean_type:, **filters)
   query = construct_query(searchterm, citation, contributors, funding_information, geodistance, geobox, identifiers,
-                          locations, subjects, title, source, filters)
+                          locations, subjects, title, source, boolean_type, filters)

   results = Opensearch.new.search(from, query, Timdex::OSClient, highlight_requested?, index)

@@ -128,9 +130,10 @@ def inject_hits_fields_into_source(hits)
 end

 def construct_query(searchterm, citation, contributors, funding_information, geodistance, geobox, identifiers,
-                    locations, subjects, title, source, filters)
+                    locations, subjects, title, source, boolean_type, filters)
   query = {}
   query[:q] = searchterm
+  query[:boolean_type] = boolean_type
   query[:citation] = citation
   query[:contributors] = contributors
   query[:funding_information] = funding_information
30 changes: 29 additions & 1 deletion app/models/opensearch.rb
@@ -105,6 +105,33 @@ def multisearch
     ]
   end

+  # https://opensearch.org/docs/latest/query-dsl/minimum-should-match/#valid-values
+  # checks for preconfigured cases or uses whatever is supplied (i.e. we currently accept OpenSearch syntax for
+  # minimum_should_match)
+  def minimum_should_match
+    case @params[:boolean_type]
+    when 'OR'
+      '0%'
+    when 'AND'
+      '100%'
+    # 5 or fewer terms: match all (AND)
+    # More than 5 terms: match all but one
+    when 'experiment_a'
+      '4<100% 5<-1'
+    # 4 or fewer terms: match all (AND)
+    # More than 4 terms: match all but one
+    when 'experiment_b'
+      '3<100% 4<-1'
+    # 9 or fewer terms: match all (AND)
+    # Exactly 10 terms: match all but one
+    # More than 10 terms: match 90%
+    when 'experiment_c'
+      '3<100% 9<-1 10<90%'
+    else
+      @params[:boolean_type]
+    end
+  end
+
   def matches
     m = []
     if @params[:q].present?
@@ -113,7 +140,8 @@ def matches
           query: @params[:q].downcase,
           fields: ['alternate_titles', 'call_numbers', 'citation', 'contents', 'contributors.value', 'dates.value',
                    'edition', 'funding_information.*', 'identifiers.value', 'languages', 'locations.value',
-                   'notes.value', 'numbering', 'publication_information', 'subjects.value', 'summary', 'title']
+                   'notes.value', 'numbering', 'publication_information', 'subjects.value', 'summary', 'title'],
+          minimum_should_match:
         }
       }
     end
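
The conditional `minimum_should_match` strings used by the experiment presets in `app/models/opensearch.rb` are easy to misread, so here is a rough sketch of how they resolve for a given number of query terms. It follows my reading of the documented rules (each `n<value` condition applies only when the term count is greater than `n`, and percentages round down); `required_terms` is a hypothetical helper, not part of TIMDEX or OpenSearch, and it ignores analysis details such as stop words or per-field tokenization.

```ruby
# Hypothetical helper (not part of TIMDEX or OpenSearch): estimates how many
# of `term_count` optional terms must match under a minimum_should_match spec.
def required_terms(term_count, spec)
  spec = spec.strip

  if spec.include?('<')
    # Conditional spec(s), e.g. '3<100% 9<-1 10<90%'. Each condition only
    # applies when the term count is greater than its bound.
    result = term_count
    spec.split(/\s+/).each do |condition|
      upper_bound, value = condition.split('<', 2)
      return result if term_count <= Integer(upper_bound)

      result = required_terms(term_count, value)
    end
    return result
  end

  result =
    if spec.end_with?('%')
      percent = Integer(spec.delete_suffix('%'))
      share = (term_count * percent.abs) / 100 # integer division rounds down
      percent.negative? ? term_count - share : share
    else
      count = Integer(spec)
      count.negative? ? term_count + count : count
    end

  [result, 0].max # never require a negative number of terms
end

spec = '3<100% 9<-1 10<90%' # the experiment_c value from the diff
[3, 9, 10, 11, 20].each do |n|
  puts "#{n} terms -> #{required_terms(n, spec)} required"
end
```

Under these assumptions, `experiment_c` requires every term for queries of up to 9 terms, 9 of 10 terms for a 10-term query, and 90% (rounded down) beyond that, for example 18 of 20.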