Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support _total parameter in search requests #1353

Closed
punktilious opened this issue Jul 23, 2020 · 9 comments
Closed

Support _total parameter in search requests #1353

punktilious opened this issue Jul 23, 2020 · 9 comments
Assignees
Labels
bulk-data enhancement New feature or request P2 Priority 2 - Should Have performance performance search showcase Used to Identify End-of-Sprint Demos trial-use

Comments

@punktilious
Copy link
Collaborator

punktilious commented Jul 23, 2020

Is your feature request related to a problem? Please describe.
https://www.hl7.org/fhir/search.html#total

The _total parameter is trial-use, but could be useful in cases where the client wants to improve performance by bypassing the resource count calculation (which requires a separate database query).

Per the spec:

none        There is no need to populate the total count; the client will not use it
estimate    A rough estimate of the number of matching resources is sufficient
accurate	   The client requests that the server provide an exact total of the number of matching resources

Team Discussion:
In our implementation we are only going to support 'estimate' as 'accurate'

Describe the solution you'd like
Implement _total as described in the spec.

Describe alternatives you've considered
Ignore _total and rely on existing count query implementation.

Additional context
N/A

@prb112
Copy link
Contributor

prb112 commented Mar 1, 2021

Added the label to bulkdata, it would cut the queries in nearly a half. For a 12K export that would mean about 24 queries at 100ms each is 2.4 seconds of saved time. In very large environments, this adds up.
More specifically this benefits PatientChunkReader

@lmsurpre
Copy link
Member

lmsurpre commented Mar 5, 2021

We may want to add support for this _total parameter on the history operations as well. For whole-system history, we think it should continue to default to no total, because that is the current behavior and its the only way to make it performant enough for a "changes feed".

@tbieste tbieste self-assigned this Mar 9, 2021
@tbieste tbieste added this to the Sprint 2021-04 milestone Mar 9, 2021
@tbieste
Copy link
Contributor

tbieste commented Mar 9, 2021

There is a little bit of validation with paging and the calculation of the next page link that detects if a request would be beyond the last page. If the _total is not "accurate", then if we skip the total page calculation to save on performance, then we would need to ensure that requests beyond the last page behave in a reasonable matter, since we would no longer know which page is the last without having the total.

  1. If we don't have the total, then we don't know if we are on the last page. Today, if we know we are on the last page, we don't return a "next" link. Without the total, seems like we would always return it if the current page is full (or maybe just always).
  2. If we don't have the total, then we don't know if the caller is requesting a page beyond the last page. As long as the internal query can handle it, I think we would just try to run the query and get nothing back. Unless there's some problem with requesting _page=12345678, when there's only 1 total record.

@tbieste
Copy link
Contributor

tbieste commented Mar 10, 2021

I dug into it a bit more.

  1. If the total is known, then the _page automatically gets reduced back to the last page if you pick too big of a number. It generates an OperationOutcome for it, but just throws it away.
  2. If we did not do this, then the OFFSET used in the internal query would just return no results back, so that looks good.

@tbieste
Copy link
Contributor

tbieste commented Mar 11, 2021

In FHIRPersistenceJDBCImpl.search(…) the total is used to determine matched vs included, if the _include/_revcinclude parameters are used. So may need to disallow _total in combination with _include/_revinclude.

@tbieste
Copy link
Contributor

tbieste commented Mar 11, 2021

I think there would be a way out needing to do the match vs included calculation at all by adding "includeType" to the ResourceDTO since the underlying query does use a SORT_ORDER specifically for that. If it's worth doing, that is.

tbieste added a commit that referenced this issue Mar 11, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
@prb112
Copy link
Contributor

prb112 commented Mar 11, 2021

There is a little bit of validation with paging and the calculation of the next page link that detects if a request would be beyond the last page. If the _total is not "accurate", then if we skip the total page calculation to save on performance, then we would need to ensure that requests beyond the last page behave in a reasonable matter, since we would no longer know which page is the last without having the total.

I think we would always return a next page blindly.
For BulkData I would run a total=accurate enabled search, and switch to _total=none, and control the paging in the bulkdata code.

1. If we don't have the total, then we don't know if we are on the last page. Today, if we know we are on the last page, we don't return a "next" link.  Without the total, seems like we would always return it if the current page is full (or maybe just always).

We could just return the next page, and the next page would be empty.

2. If we don't have the total, then we don't know if the caller is requesting a page beyond the last page. As long as the internal query can handle it, I think we would just try to run the query and get nothing back.  Unless there's some problem with requesting _page=12345678, when there's only 1 total record.

👍🏻 I like this approach.

I dug into it a bit more.

1. If the total is known, then the _page automatically gets reduced back to the last page if you pick too big of a number.  It generates an OperationOutcome for it, but just throws it away.

Interesting, did not know we did that....

2. If we did not do this, then the OFFSET used in the internal query would just return no results back, so that looks good.

👍🏻

In FHIRPersistenceJDBCImpl.search(…) the total is used to determine matched vs included, if the _include/_revcinclude parameters are used. So may need to disallow _total in combination with _include/_revinclude.

👍🏻 I totally agree on the disallowed combination and we'd just update our conformance.md

I think there would be a way out needing to do the match vs included calculation at all by adding "includeType" to the ResourceDTO since the underlying query does use a SORT_ORDER specifically for that. If it's worth doing, that is.
It's intriguing. It may be beyond the scope here?

@tbieste
Copy link
Contributor

tbieste commented Mar 11, 2021

Since there's an issue for allowing the currently-disallowed combination of _sort + _include/_revinclude, #1915, I think deferring this as well. Then the combination of _total + _include/_revinclude and be considered a future enhancement. I opened issue #2070 to size and prioritize that enhancement.

tbieste added a commit that referenced this issue Mar 11, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Mar 11, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Mar 11, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Mar 12, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Mar 12, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Mar 12, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Mar 12, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Mar 15, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Mar 15, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Mar 16, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Mar 16, 2021
Issue #1353 - Add support for _total search parameter
@punktilious
Copy link
Collaborator Author

punktilious commented Mar 23, 2021

Verified:

?_total=nonsense             error, as expected
?_total=                     error, as expected
?_total=none                 no total
?_total=estimated            total calculated and included in bundle
?_total=accurate             total calculated and included in bundle
?_total=none&_include...     error as expected
?_total=accurate&_include... error, but ought to succeed. Can be addressed in #2070

@tbieste tbieste added the showcase Used to Identify End-of-Sprint Demos label Apr 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bulk-data enhancement New feature or request P2 Priority 2 - Should Have performance performance search showcase Used to Identify End-of-Sprint Demos trial-use
Projects
None yet
Development

No branches or pull requests

4 participants