token search should be case-insensitive when the system being searched is case-insensitive #1551

Closed
lmsurpre opened this issue Oct 1, 2020 · 4 comments
Labels: bug (Something isn't working), P2 (Priority 2 - Should Have), reindex (Resolution of issue will require a $reindex during upgrade), search

Comments

@lmsurpre
Member

lmsurpre commented Oct 1, 2020

Describe the bug
https://www.hl7.org/fhir/search.html#token says

Match is case sensitive unless the underlying semantics for the context indicate that the token should be interpreted case-insensitively (see, e.g. CodeSystem.caseSensitive)

Currently, our token search is ALWAYS case-sensitive.

To Reproduce
Steps to reproduce the behavior:

  1. create a resource with a codeable concept that
    • has a system that refers to a CodeSystem that is registered with the server and is marked as case-insensitive; and
    • has a search parameter defined on it
  2. perform a search for that code's value with different casing

Expected behavior
the resource is found

Additional context
https://www.hl7.org/fhir/codesystem-definitions.html#CodeSystem.caseSensitive says this:

If this value is missing, then it is not specified whether a code system is case sensitive or not. When the rule is not known, Postel's law should be followed: produce codes with the correct case, and accept codes in any case. This element is primarily provided to support validation software.

In my opinion, that conflicts with the guidance from https://www.hl7.org/fhir/search.html#token which says search should be case-sensitive by default. However, I think we should still use the guidance from https://www.hl7.org/fhir/search.html#token and only use case-insensitive search when it is explicitly set in the corresponding CodeSystem.
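The rule proposed in this comment can be sketched as follows. This is a minimal illustration, not the server's code: it assumes the CodeSystem resource is available as a plain dict, and the names `CaseSensitivity`, `resolve_case_sensitivity`, and `search_ignores_case` are hypothetical.

```python
from enum import Enum
from typing import Optional


class CaseSensitivity(Enum):
    SENSITIVE = "case-sensitive"
    INSENSITIVE = "case-insensitive"
    UNSPECIFIED = "not specified"


def resolve_case_sensitivity(code_system: Optional[dict]) -> CaseSensitivity:
    """Read CodeSystem.caseSensitive from a CodeSystem resource given as a dict.

    A missing resource or a missing element both count as "not specified".
    """
    if code_system is None or "caseSensitive" not in code_system:
        return CaseSensitivity.UNSPECIFIED
    if code_system["caseSensitive"]:
        return CaseSensitivity.SENSITIVE
    return CaseSensitivity.INSENSITIVE


def search_ignores_case(code_system: Optional[dict]) -> bool:
    """Proposal from this comment: ignore case only on an explicit caseSensitive=false."""
    return resolve_case_sensitivity(code_system) is CaseSensitivity.INSENSITIVE
```

Under this proposal a CodeSystem without the element is still searched case-sensitively, which follows the search guidance rather than Postel's law.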

@lmsurpre lmsurpre added bug Something isn't working search labels Oct 1, 2020
@lmsurpre
Member Author

lmsurpre commented Oct 1, 2020

A few implementation considerations:

  • token search allows you to search without specifying the system (via the param=code syntax). It would be very difficult to implement that in a performant way if you don't know whether the code is case-sensitive or not. If there's not a good way to do it, are we OK saying this variant is always case-sensitive?
  • one way to support case-insensitive codes would be to capitalize the token value "on the way in" (i.e. during extraction). This is how this issue differs from #1535 (Investigate options for reducing index costs of standard/lower case string search parameters): in that case we need to support searching each value as either case-sensitive or case-insensitive, whereas here we should know up front and only store it one way. HOWEVER, a ramification of this approach is that we'd have a bad index if a given CodeSystem flipped from "unknown" (e.g. missing CodeSystem.caseSensitive) to "case-insensitive". That would need to be considered as part of #789 (FHIR Search - Search Re-indexing).
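To make the first consideration concrete, here is a small sketch of the token value forms defined by the FHIR search specification (`code`, `system|code`, `|code`, `system|`). The names `TokenQuery`, `parse_token`, and `case_rule_known` are hypothetical, and escaping of `|` inside values is not handled.

```python
from typing import NamedTuple, Optional


class TokenQuery(NamedTuple):
    system: Optional[str]  # None: no system given; "": explicitly "no system"
    code: Optional[str]    # None: any code within the given system


def parse_token(value: str) -> TokenQuery:
    """Split a token search value into its system and code parts."""
    if "|" not in value:
        # param=code: the system is unknown at search time
        return TokenQuery(system=None, code=value)
    system, code = value.split("|", 1)
    return TokenQuery(system=system, code=code or None)


def case_rule_known(query: TokenQuery) -> bool:
    """Only a query that names a system can look up CodeSystem.caseSensitive."""
    return bool(query.system)
```

For `param=code`, `case_rule_known` is false, which is exactly the variant the first bullet asks about.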

@kmbarton423 kmbarton423 added the P2 Priority 2 - Should Have label Feb 18, 2021
@michaelwschroeder michaelwschroeder self-assigned this Apr 7, 2021
@michaelwschroeder
Contributor

After discussion with the team, it was agreed that the general approach would be to store indexed token values as they were specified for code systems which specify caseSensitive=true, and as normalized strings for code systems which specify caseSensitive=false, or which don't specify case sensitivity, or where no code system was specified. The following tables summarize the actions taken during indexing and searching for each code system case sensitivity setting:

When indexing tokens:

| code system case-sensitivity setting | how it gets stored |
| --- | --- |
| case-sensitive | store as we do today (case-sensitive) |
| case-insensitive | store as normalized value |
| case sensitivity not specified | store as normalized value |
| code system not specified | store as normalized value |
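The indexing rules above reduce to a single decision. The sketch below assumes that "normalized" means lowercased, which is an assumption made for illustration (the server's actual normalization may do more), and `stored_token_value` is a hypothetical name.

```python
from typing import Optional


def normalize(value: str) -> str:
    # Assumption: normalization is plain lowercasing.
    return value.lower()


def stored_token_value(code: str, case_sensitive: Optional[bool]) -> str:
    """Return the value to write to the token index.

    case_sensitive is True or False when CodeSystem.caseSensitive is set,
    and None when it is not specified or the token has no code system.
    """
    return code if case_sensitive is True else normalize(code)
```

Only an explicit `caseSensitive=true` preserves the original casing; the other three rows of the table all store the normalized value.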

When searching tokens:

| code system case-sensitivity setting | how it gets searched |
| --- | --- |
| case-sensitive | search as we do today: match on code system ID + case-sensitive token value + search parameter name |
| case-insensitive | search against normalized values: match on code system ID + normalized token value + search parameter name |
| case sensitivity not specified | search against normalized values: match on code system ID + normalized token value + search parameter name |
| code system not specified | search both ways: match on (case-sensitive token value OR normalized token value) + search parameter name |
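The search rules can be sketched as a function that returns the system filter and the set of stored values that count as a match. As before, lowercasing stands in for normalization and `search_criteria` is a hypothetical name; the search parameter name is left out for brevity.

```python
from typing import FrozenSet, Optional, Tuple


def normalize(value: str) -> str:
    # Assumption: normalization is plain lowercasing.
    return value.lower()


def search_criteria(
    code: str, system: Optional[str], case_sensitive: Optional[bool]
) -> Tuple[Optional[str], FrozenSet[str]]:
    """Return (system filter, acceptable stored values) for a token search.

    system is None when the query did not name a code system.
    """
    if system is None:
        # Search both ways: the value as given OR its normalized form.
        return None, frozenset({code, normalize(code)})
    if case_sensitive is True:
        return system, frozenset({code})
    return system, frozenset({normalize(code)})
```

Note that the "search both ways" case does not make the search fully case-insensitive: a value stored as `MR` under a case-sensitive system is matched by `MR` but not by `mr`.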

Search results:

| | indexed case-sensitive | indexed case-insensitive | indexed case sensitivity not specified | indexed code system not specified |
| --- | --- | --- | --- | --- |
| search case-sensitive | works as today - correct results returned | no matches as expected - code system ID would not match | no matches as expected - code system ID would not match | no matches as expected - code system ID would not match |
| search case-insensitive | no matches as expected - code system ID would not match | correct results returned | no matches as expected - code system ID would not match | no matches as expected - code system ID would not match |
| search case sensitivity not specified | no matches as expected - code system ID would not match | no matches as expected - code system ID would not match | correct results returned | no matches as expected - code system ID would not match |
| search code system not specified | match against non-normalized token value - correct results returned | match against normalized token value - correct results returned | match against normalized token value - correct results returned | match against normalized token value - correct results returned |
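The results matrix can be reproduced with a small in-memory model. This is only a sketch: the system URLs are made up, normalization is assumed to be lowercasing, and the same code value is used for indexing and searching, as the matrix implies.

```python
from typing import List, Optional, Tuple

# (label, hypothetical system URL or None, caseSensitive flag or None)
SETTINGS: List[Tuple[str, Optional[str], Optional[bool]]] = [
    ("case-sensitive", "http://example.org/cs-sensitive", True),
    ("case-insensitive", "http://example.org/cs-insensitive", False),
    ("case sensitivity not specified", "http://example.org/cs-unspecified", None),
    ("code system not specified", None, None),
]


def normalize(value: str) -> str:
    return value.lower()  # assumption: normalization is lowercasing


def index_row(system: Optional[str], flag: Optional[bool], code: str):
    """Build the (system, stored value) pair written at indexing time."""
    return system, (code if flag is True else normalize(code))


def matches(row, system: Optional[str], flag: Optional[bool], code: str) -> bool:
    """Apply the search rule for one query against one indexed row."""
    row_system, row_value = row
    if system is None:
        return row_value in {code, normalize(code)}
    wanted = code if flag is True else normalize(code)
    return row_system == system and row_value == wanted


def result_matrix(code: str) -> List[List[bool]]:
    """Rows are search settings, columns are indexing settings."""
    rows = [index_row(system, flag, code) for _, system, flag in SETTINGS]
    return [[matches(r, system, flag, code) for r in rows] for _, system, flag in SETTINGS]


for (label, _, _), line in zip(SETTINGS, result_matrix("MR")):
    print(f"search {label}: {line}")
```

In this model the first three search rows match only on the diagonal, and the last row matches every column, which agrees with the table.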

@michaelwschroeder michaelwschroeder added this to the Sprint 2021-06 milestone Apr 19, 2021
@lmsurpre lmsurpre added the reindex Resolution of issue will require a $reindex during upgrade label Apr 20, 2021
michaelwschroeder added a commit that referenced this issue Apr 23, 2021
Signed-off-by: Mike Schroeder <mschroed@us.ibm.com>
michaelwschroeder added a commit that referenced this issue Apr 26, 2021
Signed-off-by: Mike Schroeder <mschroed@us.ibm.com>
michaelwschroeder added a commit that referenced this issue Apr 27, 2021
Signed-off-by: Mike Schroeder <mschroed@us.ibm.com>
michaelwschroeder added a commit that referenced this issue Apr 27, 2021
Issue #1551 - perform token search based on codeSystem case-sensitivity
@tbieste
Contributor

tbieste commented May 13, 2021

FYI. I was able to verify the following searches:

Patient?identifier:of-type=MR|12345 [FOUND]
Patient?identifier:of-type=mr|12345 [FOUND]
Patient?identifier:of-type=http://terminology.hl7.org/CodeSystem/v2-0203|MR|12345 [FOUND]
Patient?identifier:of-type=http://terminology.hl7.org/CodeSystem/v2-0203|mr|12345 [FOUND]
Observation?category=vital-signs [FOUND]
Observation?category=Vital-Signs [FOUND]
Observation?category=http://terminology.hl7.org/CodeSystem/observation-category|vital-signs [FOUND]
Observation?category=http://terminology.hl7.org/CodeSystem/observation-category|Vital-Signs [NOT FOUND, since CodeSystem is case-sensitive]

@kmbarton423
Contributor

kmbarton423 commented May 13, 2021

Confirmed Troy's search coverage of the following cases:
- case-sensitive ( CodeSystem "caseSensitive: true")
- case sensitivity not specified
- code system not specified

Verified one additional case:
- case-insensitive ( CodeSystem "caseSensitive: false")

- required configuration setting ... fhirServer/core/serverRegistryResourceProviderEnabled=true
- created CodeSystem resource initially with "caseSensitive: true"
- created Patient with identifier (system and value) tied to CodeSystem
- confirmed FOUND/NOT FOUND behavior on search Patient?identifier=system|value
- updated CodeSystem resource "caseSensitive: false"
- updated Patient to force reindex
- confirmed FOUND/FOUND behavior on search Patient?identifier=system|value
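The sequence above, including why the Patient had to be updated, can be modeled in a few lines. This is a toy in-memory index, not the server's implementation: `TokenIndex` and its methods are hypothetical, the system URL and identifier value are made up, and normalization is assumed to be lowercasing. The stale state between flipping the flag and re-saving the resource is a consequence of this model and was not one of the reported observations.

```python
from typing import Dict, Optional, Tuple


def normalize(value: str) -> str:
    return value.lower()  # assumption: normalization is lowercasing


class TokenIndex:
    def __init__(self) -> None:
        self.flags: Dict[str, Optional[bool]] = {}   # system -> caseSensitive
        self.rows: Dict[str, Tuple[str, str]] = {}   # resource id -> (system, stored value)

    def set_code_system(self, system: str, case_sensitive: Optional[bool]) -> None:
        # Changing the flag does NOT rewrite rows that are already indexed.
        self.flags[system] = case_sensitive

    def save(self, resource_id: str, system: str, value: str) -> None:
        """Create or update a resource; its token value is re-extracted."""
        flag = self.flags.get(system)
        self.rows[resource_id] = (system, value if flag is True else normalize(value))

    def search(self, system: str, value: str) -> bool:
        flag = self.flags.get(system)
        wanted = value if flag is True else normalize(value)
        return (system, wanted) in self.rows.values()


index = TokenIndex()
system = "http://example.org/ids"
index.set_code_system(system, True)
index.save("p1", system, "ABC123")
print(index.search(system, "ABC123"), index.search(system, "abc123"))  # FOUND / NOT FOUND

index.set_code_system(system, False)   # flip to case-insensitive
index.save("p1", system, "ABC123")     # update the resource to force a reindex
print(index.search(system, "ABC123"), index.search(system, "abc123"))  # FOUND / FOUND
```

Without the second `save`, the row would still hold the original casing while searches look for the normalized value, which is why the issue carries the reindex label.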
