Enabler: Support multiple sites for DOC collection #85

Closed
4 of 6 tasks
blairlearn opened this issue Apr 20, 2022 · 3 comments · Fixed by #108

blairlearn commented Apr 20, 2022

Allow searches using the DOC collection to request results from multiple sites.

As presently implemented, when the DOC collection is specified, the user must supply exactly one site to return results for.

Some DOCs have more than one site that should appear in their results (e.g. search results on a DCEG microsite should include pages from both dceg.cancer.gov and www.cancer.gov/connect-prevention-study). There should be no hard limit on the number of sites.

Existing site configurations must not require updates in order to continue working.

A preliminary list of tasks:

  • Create an updated query.
  • Create unit tests.
  • Create integration tests for the new behavior without breaking or modifying the existing tests.
    This will require coordination with product owners to determine success criteria.
  • Update the code.

ESTIMATE TBD

Resources:

Prerequisites

Sub-Tasks

  • Short Task Description - Issue #9999
  • Short Task Description - Issue #9999

Notes

DCCPS Site to index in Search
DCCPS Search - Sites to Index.xlsx

blairlearn commented Jul 13, 2023

The DOC search queries for both English and Spanish use the same sub-query to eliminate results which aren't from sites matching the site parameter.

    "bool": {
        "must": [
            { "exists": { "field": "searchtitle" } },
            { "prefix": { "searchurl.raw": { "value": "<SITE>" } } }
        ]
    }

where <SITE> is the hostname and path to match (e.g. "physics.cancer.gov" or "www.cancer.gov/rare-brain-spine-tumor/espanol").

To allow multiple sites, this query is modified to

  1. Remove the prefix query from the must block.
  2. Add a should block, containing a minimum of one prefix query.
  3. Add "minimum_should_match": 1 to require that at least one* of the prefix queries be matched.

* By default, if a bool query contains a must (or filter) clause, none of its should clauses is required to match. Specifying "minimum_should_match": 1 requires at least one of them to match, so a document is returned only if its URL starts with at least one of the values passed in the site parameter.

Match for a single site

    "bool": {
        "must": [
            { "exists": { "field": "searchtitle" } }
        ],
        "should": [
            { "prefix": { "searchurl.raw": { "value": "dceg.cancer.gov" } } }
        ],
        "minimum_should_match": 1
    }

Match for multiple sites

    "bool": {
        "must": [
            { "exists": { "field": "searchtitle" } }
        ],
        "should": [
            { "prefix": { "searchurl.raw": { "value": "dceg.cancer.gov" } } },
            { "prefix": { "searchurl.raw": { "value": "www.cancer.gov/connect-prevention-study" } } }
        ],
        "minimum_should_match": 1
    }
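The two query shapes above differ only in the number of prefix clauses, so one builder can produce both. The following is a minimal Python sketch of that idea, not the project's actual code; the function name `build_site_filter` is hypothetical, and only the field names (`searchtitle`, `searchurl.raw`) come from the queries above.

```python
import json


def build_site_filter(sites):
    """Build the bool sub-query that limits results to the given sites.

    `sites` is a list of hostname (and optional path) prefixes, such as
    "dceg.cancer.gov". The function name is hypothetical.
    """
    if not sites:
        raise ValueError("at least one site is required")
    return {
        "bool": {
            # Unchanged from the original query: the document must have a title.
            "must": [{"exists": {"field": "searchtitle"}}],
            # One prefix clause per requested site.
            "should": [
                {"prefix": {"searchurl.raw": {"value": site}}} for site in sites
            ],
            # Require at least one of the prefix clauses to match.
            "minimum_should_match": 1,
        }
    }


if __name__ == "__main__":
    query = build_site_filter(
        ["dceg.cancer.gov", "www.cancer.gov/connect-prevention-study"]
    )
    print(json.dumps(query, indent=4))
```

Passing a one-element list yields the single-site form, which is why existing single-site configurations should not need changes under this approach.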

@blairlearn blairlearn self-assigned this Jul 14, 2023
@blairlearn

The URL change for this is straightforward. The site argument is already retrieved from the query string, so passing multiple values is just a matter of repeating the query string parameter (e.g. site=dceg.cancer.gov&site=www.cancer.gov/connect-prevention-study).
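As a language-neutral illustration of how a repeated parameter becomes a list of values, here is a small Python sketch using the standard library's `urllib.parse.parse_qs`. The helper name `read_sites` is hypothetical, and this is not how the API itself binds the parameter; it only shows the general behavior of repeated query string keys.

```python
from urllib.parse import parse_qs


def read_sites(query_string):
    """Return every value supplied for the `site` parameter, in order."""
    params = parse_qs(query_string)
    # parse_qs collects repeated keys into a list; a missing key yields [].
    return params.get("site", [])


if __name__ == "__main__":
    print(read_sites("site=physics.cancer.gov"))
    print(read_sites(
        "site=dceg.cancer.gov&site=www.cancer.gov/connect-prevention-study"
    ))
```

A single `site` value still arrives as a one-element list, so callers that pass only one site keep working.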

In SearchController::Get:

  • The site parameter changes from string to string[], declared as string[] site = null (it is already an optional parameter, so it must remain one).
  • Additional validation logic is required to handle the cases where site itself is null, or where any of the values passed as site is null or whitespace.
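The validation cases listed above can be sketched as follows. This is an illustrative Python version under stated assumptions: the function name `find_site_problems` is hypothetical, an empty list is treated the same as a missing parameter, and the function only reports problems, since the comment does not say how the controller should respond to them.

```python
def find_site_problems(sites):
    """Return a list of human-readable problems with the `site` values.

    An empty result means the values are usable. What to do with the
    problems (reject the request, ignore bad entries, etc.) is left to
    the caller; that decision is not specified in this thread.
    """
    if not sites:
        # Covers both a missing parameter (None) and an empty list.
        return ["site is missing"]
    problems = []
    for index, value in enumerate(sites):
        if value is None or not value.strip():
            problems.append(f"site[{index}] is null or whitespace")
    return problems


if __name__ == "__main__":
    print(find_site_problems(None))
    print(find_site_problems(["dceg.cancer.gov", "   "]))
    print(find_site_problems(["dceg.cancer.gov"]))
```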

Additional logic is required in the service classes to change their siteFilter from string to string[] and incorporate the query change outlined in the previous comment.

blairl-nih added a commit to blairl-nih/sitewide-search-api that referenced this issue Jul 18, 2023
blairl-nih added a commit to blairl-nih/sitewide-search-api that referenced this issue Jul 18, 2023
blairl-nih added a commit to blairl-nih/sitewide-search-api that referenced this issue Jul 20, 2023
Tweak description of controller's collection parameter.

 Closes NCIOCPL#85
@blairlearn

QA Notes:

This update changes the way search results are filtered.

Previously, your options were "All the results for the search string" or "Only the results that came from a specific site" (e.g. "Only results from physics.cancer.gov" or "Only results from www.cancer.gov/nano").

This change allows results to be filtered to multiple sites (e.g. "Only results that came from either dceg.cancer.gov or www.cancer.gov/connect-prevention-study").

Given param site = "all"
Then expect results to be from multiple sites

Given param site = "www.cancer.gov/nano"
And param collection = "doc"
Then expect results.url to only start with "www.cancer.gov/nano"

Given param site = "physics.cancer.gov"
And param collection = "doc"
Then expect results.url to only start with "physics.cancer.gov"

Given params { site: ["dceg.cancer.gov", "www.cancer.gov/connect-prevention-study"] }
And param collection = "doc"
Then expect results.url to only start with "dceg.cancer.gov" or "www.cancer.gov/connect-prevention-study"
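The "only start with" expectations in the scenarios above amount to a prefix check over every returned URL. A minimal Python sketch of such a check follows; the function name `urls_match_sites` and the sample URLs are made up for illustration, and it assumes result URLs are compared in the same scheme-less form as the site values.

```python
def urls_match_sites(urls, sites):
    """Return True if every URL starts with at least one of the sites."""
    prefixes = tuple(sites)
    # str.startswith accepts a tuple and matches any of its entries.
    return all(url.startswith(prefixes) for url in urls)


if __name__ == "__main__":
    sites = ["dceg.cancer.gov", "www.cancer.gov/connect-prevention-study"]
    # Illustrative URLs only; not real search results.
    good = [
        "dceg.cancer.gov/example-page",
        "www.cancer.gov/connect-prevention-study/example-page",
    ]
    bad = good + ["physics.cancer.gov/example-page"]
    print(urls_match_sites(good, sites))  # True
    print(urls_match_sites(bad, sites))   # False
```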

It's a bit weedy, but for that fourth case, the query string should be site=dceg.cancer.gov&site=www.cancer.gov%2Fconnect-prevention-study
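For anyone scripting the fourth case, the encoded query string can be generated rather than typed by hand. This Python sketch uses the standard library's `urllib.parse.urlencode`; the helper name `build_query_string` is hypothetical.

```python
from urllib.parse import urlencode


def build_query_string(sites):
    """Encode a list of sites as repeated, percent-encoded site parameters."""
    # doseq=True emits one key=value pair per list element.
    return urlencode({"site": sites}, doseq=True)


if __name__ == "__main__":
    print(build_query_string(
        ["dceg.cancer.gov", "www.cancer.gov/connect-prevention-study"]
    ))
    # site=dceg.cancer.gov&site=www.cancer.gov%2Fconnect-prevention-study
```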

blairlearn pushed a commit that referenced this issue Aug 4, 2023
Tweak description of controller's collection parameter.

 Closes #85