Conversation
❌ 3 blocking issues (4 total)
|
|
|
||
| results = Opensearch.new.search(from, query, Timdex::OSClient, highlight: highlight_requested?, index: index, fulltext: fulltext, query_mode: query_mode) | ||
| results = Opensearch.new.search(from, query, Timdex::OSClient, highlight: highlight_requested?, index: index, | ||
| fulltext: fulltext, query_mode: query_mode, requested_aggregations: requested_aggregations) |
| MAX_SIZE = 200 | ||
|
|
||
| def search(from, params, client, highlight: false, index: nil, fulltext: false, query_mode: 'keyword') | ||
| def search(from, params, client, highlight: false, index: nil, fulltext: false, query_mode: 'keyword', |
There was a problem hiding this comment.
Pull request overview
This PR aims to improve search efficiency by only computing OpenSearch aggregations when the GraphQL query actually requests them, using GraphQL field-usage analysis to determine which aggregations to include.
Changes:
- Add
requested_aggregationsdetection in GraphQLQueryType#searchand pass it through to the OpenSearch query layer. - Add
Aggregations.for_requestto filter the aggregation definitions down to only requested ones, and updateOpensearch#build_queryto conditionally include aggregations. - Add model tests covering conditional inclusion/exclusion of aggregations and
for_requestfiltering behavior.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| app/graphql/types/query_type.rb | Computes requested aggregation fields from tracer used_fields, passes them to OpenSearch, and makes bucket-collapsing nil-safe. |
| app/models/opensearch.rb | Adds requested_aggregations: param and only injects aggregations into the OpenSearch request when present. |
| app/models/aggregations.rb | Introduces for_request to select only a subset of known aggregations. |
| test/models/opensearch_test.rb | Adds tests asserting aggregations are included/excluded in built query based on requested list. |
| test/models/aggregations_test.rb | Adds tests for Aggregations.for_request selection/validation behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def requested_aggregations | ||
| used_fields = context[:tracers].first.log_data[:used_fields] | ||
| used_fields.select { |field| field.start_with?('Aggregations.') } | ||
| .map { |field| field.sub('Aggregations.', '').to_sym } |
There was a problem hiding this comment.
requested_aggregations is derived from used_fields entries which (by default in graphql-ruby) are GraphQL names like Aggregations.accessToFiles / Aggregations.contentType. Converting the suffix directly to to_sym will produce symbols like :accessToFiles / :contentType, which won't match the snake_case keys used by Aggregations.all (e.g., :access_to_files, :content_type). Additionally, the GraphQL field format corresponds to the OpenSearch aggregation key content_format, so :format will never be requested with the current mapping. Consider underscoring the extracted field name (and mapping format -> content_format) before passing it into Opensearch/Aggregations.for_request, and add/adjust tests to cover at least one camelCase field like contentType or accessToFiles and the format/content_format mapping.
| def requested_aggregations | |
| used_fields = context[:tracers].first.log_data[:used_fields] | |
| used_fields.select { |field| field.start_with?('Aggregations.') } | |
| .map { |field| field.sub('Aggregations.', '').to_sym } | |
| def requested_aggregation_field(field_name) | |
| return :content_format if field_name == 'format' | |
| field_name.underscore.to_sym | |
| end | |
| def requested_aggregations | |
| used_fields = context[:tracers].first.log_data[:used_fields] | |
| used_fields.select { |field| field.start_with?('Aggregations.') } | |
| .map { |field| requested_aggregation_field(field.sub('Aggregations.', '')) } |
There was a problem hiding this comment.
This seems critical, but it's also confusing to me. I'm not sure why the aggregations model has a different name for the format aggregation. I'm going to look at it tomorrow with fresh eyes to see if it makes more sense.
There was a problem hiding this comment.
After some discussion, we think this is related to format being a stop word in the REST API. Out of caution, I've decided to leave this mapping as-is.
Why these changes are being introduced: Aggregations are currently calculated even when they are not requested in the GraphQL query. It would be more efficient to calculate them only when they are needed. Relevant ticket(s): - [USE-491](https://mitlibraries.atlassian.net/browse/USE-491) How this addresses that need: This adds a `requested_aggregations` method to Query Type that evaluates which aggregations are requested in the query. This information is then used in the Aggregations and Opensearch models to calculate only the requested aggregations. Side effects of this change: None.
Why these changes are being introduced:
Aggregations are currently calculated even when
they are not requested in the GraphQL query. It
would be more efficient to calculate them only
when they are needed.
Relevant ticket(s):
How this addresses that need:
This adds a
requested_aggregationsmethod toQuery Type that evaluates which aggregations are
requested in the query. This information is then
used in the Aggregations and Opensearch models
to calculate only the requested aggregations.
Side effects of this change:
None.
Developer
our guide and
all issues introduced by these changes have been resolved or opened as new
issues (link to those issues in the Pull Request details above)
Code Reviewer
(not just this pull request message)
Requires database migrations?
NO
Includes new or updated dependencies?
NO