Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sum of age groups doesn't match with the result from Count #2528

Closed
DorianQuell opened this issue Jun 18, 2021 · 6 comments
Closed

Sum of age groups doesn't match with the result from Count #2528

DorianQuell opened this issue Jun 18, 2021 · 6 comments
Assignees
Labels
bug Something isn't working

Comments

@DorianQuell
Copy link

DorianQuell commented Jun 18, 2021

Hey,
I just made several search queries on my FHIR server:
[base]\Patient?_has:Condition:patient:code=C49.0&_summary=count
=> Result: 1500

[base]\Patient?_has:Condition:patient:code=C49.0&birthdate=gt1961&_summary=count
=> Result: 369
[base]\Patient?_has:Condition:patient:code=C49.0&birthdate=le1961&_summary=count
=> Result: 1255
=> Total: 1624

The total of both queries should result in the same as the general count, right?
If I split the same search by gender the numbers work out again.
Is there something I'm doing wrong with the search?

I'm using version 4.7.1

@DorianQuell DorianQuell added the bug Something isn't working label Jun 18, 2021
@lmsurpre
Copy link
Member

lmsurpre commented Jun 18, 2021

The total of both queries should result in the same as the general count, right?

Not necessarily. We use the range definitioins of gt and le as defined at https://www.hl7.org/fhir/search.html#prefix

Try changing your gt1961 to sa1961. The sa is for "starts after" and I believe this one has the semantics that your after. Let us know how it goes.

Alternatively you could change 1961 to 1961-12-31 in your search query and then the overlap between your two queries would be a single day.

@lmsurpre
Copy link
Member

lmsurpre commented Jun 18, 2021

Actually, let me have a look deeper look at this to see if we have something wrong here. The only real overlap should be from target values that have a range that spans across the upper bound of the two searches and in the case of a birthday we really shouldn't have that.

@DorianQuell
Copy link
Author

I did another set of tests on a different subsection of my patients.

general count to have a reference value
[base]\Patient?_has:Condition:patient:code=C49.0&_summary=count
Result: 1495

gt1800 to check if it counts any patients that might not have a birthdate set.
[base]\Patient?_has:Condition:patient:code=C49.0&birthdate=gt1800&_summary=count
Result 1495 => All patients seem to have a birth year

Now I did one lt1961, le1961, gt1961, ge1961 and =1961.

[base]\Patient?_has:Condition:patient:code=C49.0&birthdate=lt1961&_summary=count
Results: 1167

[base]\Patient?_has:Condition:patient:code=C49.0&birthdate=le1961&_summary=count
Results: 1167

[base]\Patient?_has:Condition:patient:code=C49.0&birthdate=gt1961&_summary=count
Results: 364

[base]\Patient?_has:Condition:patient:code=C49.0&birthdate=ge1961&_summary=count
Results: 364

[base]\Patient?_has:Condition:patient:code=C49.0&birthdate=1961&_summary=count
Results: 36

This means that there is something going wrong with the le and ge modifiers. It seems like those actually do the same as lt and gt

lmsurpre added a commit that referenced this issue Jun 18, 2021
We use the range interpretation of the search prefixes for both date and
number/quantity searches.  The spec says `lt` is interpreted as
"the range above the search value intersects (i.e. overlaps) with the
range of the target value"

Previously, we interpreted that to mean that a search like `gt2016`
should return any value above `2016-01-01 00:00:00` in the server's
timezone.

However, this interpretation leads to confusing and unintuitive results.
Instead, we will now interpret the "search value" of 2016 to be the
range `[2016-01-01 00:00:00, 2017-01-01 00:00:00)` and therefor it will
only find target values above 2017-01-01 00:00:00 (inclusive).

Signed-off-by: Lee Surprenant <lmsurpre@us.ibm.com>
lmsurpre added a commit that referenced this issue Jun 18, 2021
We use the range interpretation of the search prefixes for both date and
number/quantity searches.  The spec says `lt` means
"the range below the search value intersects (i.e. overlaps) with the
range of the target value" and `gt` means "the range above the search
value intersects (i.e. overlaps) with the range of the target value"

Previously, we interpreted that to mean that a search like `gt2016`
should return any value above `2016-01-01 00:00:00` in the server's
timezone.

However, this interpretation leads to confusing and unintuitive results.
Instead, we will now interpret the "search value" of 2016 to be the
range `[2016-01-01 00:00:00, 2017-01-01 00:00:00)` and therefor it will
only find target values above 2017-01-01 00:00:00 (inclusive).

And similarly for `lt`.

Signed-off-by: Lee Surprenant <lmsurpre@us.ibm.com>
lmsurpre added a commit that referenced this issue Jun 18, 2021
We use the range interpretation of the search prefixes for both date and
number/quantity searches.  The spec says `lt` means
"the range below the search value intersects (i.e. overlaps) with the
range of the target value" and `gt` means "the range above the search
value intersects (i.e. overlaps) with the range of the target value"

Previously, we interpreted that to mean that a search like `gt2016`
should return any value above `2016-01-01 00:00:00` in the server's
timezone.

However, this interpretation leads to confusing and unintuitive results.
Instead, we will now interpret the "search value" of 2016 to be the
range `[2016-01-01 00:00:00, 2017-01-01 00:00:00)` and therefor it will
only find target values above 2017-01-01 00:00:00 (inclusive).

And similarly for `lt`.

Signed-off-by: Lee Surprenant <lmsurpre@us.ibm.com>
lmsurpre added a commit that referenced this issue Jun 18, 2021
We use the range interpretation of the search prefixes for both date and
number/quantity searches.  The spec says `lt` means
"the range below the search value intersects (i.e. overlaps) with the
range of the target value" and `gt` means "the range above the search
value intersects (i.e. overlaps) with the range of the target value"

Previously, we interpreted that to mean that a search like `gt2016`
should return any value above `2016-01-01 00:00:00` in the server's
timezone.

However, this interpretation leads to confusing and unintuitive results.
Instead, we will now interpret the "search value" of 2016 to be the
range `[2016-01-01 00:00:00, 2017-01-01 00:00:00)` and therefor it will
only find target values above 2017-01-01 00:00:00 (inclusive).

And similarly for `lt`.

Signed-off-by: Lee Surprenant <lmsurpre@us.ibm.com>
lmsurpre added a commit that referenced this issue Jun 18, 2021
We use the range interpretation of the search prefixes for both date and
number/quantity searches.  The spec says `lt` means
"the range below the search value intersects (i.e. overlaps) with the
range of the target value" and `gt` means "the range above the search
value intersects (i.e. overlaps) with the range of the target value"

Previously, we interpreted that to mean that a search like `gt2016`
should return any value above `2016-01-01 00:00:00` in the server's
timezone.

However, this interpretation leads to confusing and unintuitive results.
Instead, we will now interpret the "search value" of 2016 to be the
range `[2016-01-01 00:00:00, 2017-01-01 00:00:00)` and therefor it will
only find target values above 2017-01-01 00:00:00 (inclusive).

And similarly for `lt`.

Signed-off-by: Lee Surprenant <lmsurpre@us.ibm.com>
lmsurpre added a commit that referenced this issue Jun 18, 2021
We use the range interpretation of the search prefixes for both date and
number/quantity searches.  The spec says `lt` means
"the range below the search value intersects (i.e. overlaps) with the
range of the target value" and `gt` means "the range above the search
value intersects (i.e. overlaps) with the range of the target value"

Previously, we interpreted that to mean that a search like `gt2016`
should return any value above `2016-01-01 00:00:00` in the server's
timezone.

However, this interpretation leads to confusing and unintuitive results.
Instead, we will now interpret the "search value" of 2016 to be the
range `[2016-01-01 00:00:00, 2017-01-01 00:00:00)` and therefor it will
only find target values above 2017-01-01 00:00:00 (inclusive).

And similarly for `lt`.

Signed-off-by: Lee Surprenant <lmsurpre@us.ibm.com>
lmsurpre added a commit that referenced this issue Jun 19, 2021
For both date and number searches.

We use the range interpretation of the search prefixes. The spec says
`lt` means "the range below the search value intersects (i.e. overlaps)
with the range of the target value" and `gt` means "the range above the
search value intersects (i.e. overlaps) with the range of the target
value"

Previously, we interpreted that to mean that a search like `gt2016`
should return any value above `2016-01-01 00:00:00` in the server's
timezone.

However, this interpretation leads to confusing and unintuitive results.
Instead, we will now interpret the "search value" of 2016 to be the
range `[2016-01-01 00:00:00, 2017-01-01 00:00:00)` and therefor it will
only find target values above 2017-01-01 00:00:00 (inclusive).

And similarly for `lt`.

Signed-off-by: Lee Surprenant <lmsurpre@us.ibm.com>
lmsurpre added a commit that referenced this issue Jun 19, 2021
For both date and number searches.

We use the range interpretation of the search prefixes. The spec says
`lt` means "the range below the search value intersects (i.e. overlaps)
with the range of the target value" and `gt` means "the range above the
search value intersects (i.e. overlaps) with the range of the target
value"

Previously, we interpreted that to mean that a search like `gt2016`
should return any value above `2016-01-01 00:00:00` in the server's
timezone.

However, this interpretation leads to confusing and unintuitive results.
Instead, we will now interpret the "search value" of 2016 to be the
range `[2016-01-01 00:00:00, 2017-01-01 00:00:00)` and therefor it will
only find target values above 2017-01-01 00:00:00 (inclusive).

And similarly for `lt`.

Signed-off-by: Lee Surprenant <lmsurpre@us.ibm.com>
@lmsurpre
Copy link
Member

We use the range interpretation of the search prefixes for both date and
number/quantity searches. The spec says:

  • lt means "the range below the search value intersects (i.e. overlaps) with the
    range of the target value"
  • gt means "the range above the search value intersects (i.e. overlaps) with the
    range of the target value"

For numbers, the spec additionally says:

  • When a comparison prefix in the set gt, lt, ge, le, sa & eb is provided, the implicit precision of the number is ignored, and they are treated as if they have arbitrarily high precision

Previously, we tried to interpret date search in the same way and so a search like gt2016
would return any value above 2016-01-01 00:00:00 in the server's
timezone. However, we were not consistent with this because, for some reason,
we decided that lt2016 should return any value below 2017-01-01 00:00:00.

However, even if we properly implemented this "arbitrarily high precision" clause for date search,
this interpretation leads to confusing and unintuitive results like gt2016 returning dates in 2016.

Instead, I have opened #2531 to interpret the "search value" of
a date query to be its implicit range. For example, the search value of 2016 would be the
range [2016-01-01 00:00:00, 2017-01-01 00:00:00) and therefor a search for gt2016 will
only find target values above 2017-01-01 00:00:00 (inclusive)...in the timezone of the server
(which defaults to the recommended value of UTC).

@lmsurpre lmsurpre self-assigned this Jun 21, 2021
@lmsurpre lmsurpre added this to the Sprint 2021-08 milestone Jun 21, 2021
lmsurpre added a commit that referenced this issue Jun 22, 2021
I meant to include this in my last commit for #2528...

Signed-off-by: Lee Surprenant <lmsurpre@us.ibm.com>
@d0roppe
Copy link
Collaborator

d0roppe commented Jul 1, 2021

should be verified with issue 1624 1653

@d0roppe d0roppe modified the milestones: Sprint 2021-08, Sprint 2021-09 Jul 1, 2021
@tbieste
Copy link
Contributor

tbieste commented Jul 1, 2021

Verified by creating 6 patients (2 per year) with birthdates in 1961, 1962, and 1963.

https://localhost:9443/fhir-server/api/v4/Patient
Returns: 6 patients (all)
https://localhost:9443/fhir-server/api/v4/Patient?birthdate=1961
Returns: 2 patients (w/ birthdates in 1961)
https://localhost:9443/fhir-server/api/v4/Patient?birthdate=1962
Returns: 2 patients (w/ birthdates in 1962)
https://localhost:9443/fhir-server/api/v4/Patient?birthdate=1963
Returns: 2 patients (w/ birthdates in 1963)
https://localhost:9443/fhir-server/api/v4/Patient?birthdate=lt1962
Returns: 2 patients (w/ birthdates in 1961)
https://localhost:9443/fhir-server/api/v4/Patient?birthdate=le1962
Returns: 4 patients (w/ birthdates in 1961 and 1962)
https://localhost:9443/fhir-server/api/v4/Patient?birthdate=ge1962
Returns: 4 patients (w/ birthdates in 1962 and 1963)
https://localhost:9443/fhir-server/api/v4/Patient?birthdate=gt1962
Returns: 2 patients (w/ birthdates in 1963)

@tbieste tbieste closed this as completed Jul 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants