whole-system-search: use global parameter tables in whole system search join #1922

punktilious · 2021-02-08T18:05:55Z

Related to #264. Once resource ingestion populates the global parameter tables (parameter tables which are tied directly to the global LOGICAL_RESOURCES table, the whole system search query can be rewritten to greatly simplify the join.

The best implementation to retrieve the data payload for each resource type needs to be investigated. Options include a WITH+UNION or perhaps an outer join with a large coalesce statement. To simplify the join as much as possible, it may be necessary to hold the payload at the global layer, which would help to align the design with the ability to offload to the payload to external storage. These options should be evaluated for their impact on performance before any changes are made to the search query build code.

For example, the following structure may perform reasonably well:

        FROM fhirdata.resource_types rt,
             fhirdata.logical_resources lr,
             fhirdata.str_values AS param0
       WHERE rt.resource_type_id = lr.resource_type_id
         AND param0.parameter_name_id = 2582160
         AND param0.str_value = 'abc123'
         AND lr.logical_resource_id = param0.logical_resource_id
     ) 
         SELECT lres.logical_resource_id
           FROM driver
LEFT OUTER JOIN fhirdata.patient_logical_resources AS lres
             ON (lres.logical_resource_id = driver.logical_resource_id
            AND driver.resource_type = 'Patient')
UNION ALL
         SELECT lres.logical_resource_id
           FROM driver
LEFT OUTER JOIN fhirdata.observation_logical_resources AS lres
             ON (lres.logical_resource_id = driver.logical_resource_id
            AND driver.resource_type = 'Observation')
UNION ALL
         SELECT lres.logical_resource_id
           FROM driver
LEFT OUTER JOIN fhirdata.claim_logical_resources AS lres
             ON (lres.logical_resource_id = driver.logical_resource_id
            AND driver.resource_type = 'Claim')
;

Although this form requires the common table expression (CTE) result to be scanned multiple times even though it is evaluated only once. The 'resource_type = 'Claim' predicate in the outer join helps to avoid any logical reads for rows not matching the particular resource type for that sub-query which should keep CPU usage lower.

An alternative is to outer-join each of the resource-specific tables to the main whole-system-search subquery:

SELECT COALESCE(patients.logical_resource_id,
                claims.logical_resource_id,
                observations.logical_resource_id) AS logical_resource_id
  FROM (SELECT rt.resource_type, lr.logical_resource_id
        FROM fhirdata.resource_types rt,
             fhirdata.logical_resources lr,
             fhirdata.str_values AS param0
       WHERE rt.resource_type_id = lr.resource_type_id
         AND param0.parameter_name_id = 2582160
         AND param0.str_value = 'abc123'
         AND lr.logical_resource_id = param0.logical_resource_id
     ) AS driver
LEFT OUTER JOIN fhirdata.patient_logical_resources AS patients
             ON patients.logical_resource_id = driver.logical_resource_id
LEFT OUTER JOIN fhirdata.observation_logical_resources AS observations
             ON observations.logical_resource_id = driver.logical_resource_id
LEFT OUTER JOIN fhirdata.claim_logical_resources AS claims
             ON claims.logical_resource_id = driver.logical_resource_id
;

The downside of this is we make one logical read on every resource-specific logical-resources table for every matching row from the search (the driver sub-query). There are 139 resource-types in the FHIR-R4 model, requiring 139 logical reads per matching row (only one of which will actually match).

The text was updated successfully, but these errors were encountered:

Signed-off-by: Mike Schroeder <mschroed@us.ibm.com>

Issue #1922 - implement whole-system search in new query builder

Signed-off-by: Mike Schroeder <mschroed@us.ibm.com>

Issue #1922 - don't run reindex tests during search tests

michaelwschroeder · 2021-06-28T13:34:00Z

There are four different scenarios for whole-system search, each of which will generate different SQL:

_type parameter is not specified, and only global search parameters are specified
_type parameter is specified, and only global search parameters are specified
_type parameter is not specified, and non-global search parameters are specified (in this case would be user-defined whole-system search parameters)
_type parameter is specified, and non-global search parameters are specified

Global search parameters are search parameters defined for all resources, and for which we have system-wide tables in which to store the indexed values. We currently support the following global search parameters:
_id
_lastUpdated
_profile
_security
_source
_tag

For scenarios 1 and 2, where only global search parameters are specified, we'll execute a "filter" query to get the list of logical resource IDs which match the search criteria, and then we'll execute a "data" query to get a page worth of resource data, which will be returned to the caller. Here's a filter query for scenario 1:

      SELECT LR0.RESOURCE_TYPE_ID, LR0.LOGICAL_RESOURCE_ID 
        FROM LOGICAL_RESOURCES AS LR0
       WHERE LR0.IS_DELETED = 'N'
         AND EXISTS (
      SELECT 1 
        FROM LOGICAL_RESOURCES AS LR1 
  INNER JOIN LOGICAL_RESOURCE_TAGS AS P2 ON P2.LOGICAL_RESOURCE_ID = LR1.LOGICAL_RESOURCE_ID
         AND ((P2.COMMON_TOKEN_VALUE_ID IN (9580,9517)))
       WHERE LR1.LOGICAL_RESOURCE_ID = LR0.LOGICAL_RESOURCE_ID)
    ORDER BY LR0.LOGICAL_RESOURCE_ID
 FETCH FIRST 10 ROWS ONLY

For scenario 2, the filter query is the same as above, only with an additional where clause to select only the resource types specified on the _type parameter:

      SELECT LR0.RESOURCE_TYPE_ID, LR0.LOGICAL_RESOURCE_ID 
        FROM LOGICAL_RESOURCES AS LR0
       WHERE LR0.IS_DELETED = 'N'
         AND LR0.RESOURCE_TYPE_ID IN (104,97)
         AND EXISTS (
      SELECT 1 
        FROM LOGICAL_RESOURCES AS LR1 
  INNER JOIN LOGICAL_RESOURCE_TAGS AS P2 ON P2.LOGICAL_RESOURCE_ID = LR1.LOGICAL_RESOURCE_ID
         AND ((P2.COMMON_TOKEN_VALUE_ID IN (9580,9517)))
       WHERE LR1.LOGICAL_RESOURCE_ID = LR0.LOGICAL_RESOURCE_ID)
    ORDER BY LR0.LOGICAL_RESOURCE_ID
 FETCH FIRST 10 ROWS ONLY

In both scenarios 1 and 2, the data query will be a UNION of type-specific resource tables based on the resource types of the logical resource IDs returned by the filter query. Since we're only getting one page worth of data, this will typically be a small number of resource types. Here's an example:

      SELECT RESOURCE_ID, LOGICAL_RESOURCE_ID, VERSION_ID, LAST_UPDATED, IS_DELETED, DATA, LOGICAL_ID 
        FROM (
      SELECT R.RESOURCE_ID, R.LOGICAL_RESOURCE_ID, R.VERSION_ID, R.LAST_UPDATED, R.IS_DELETED, R.DATA, LR.LOGICAL_ID 
        FROM (
      SELECT LR.LOGICAL_RESOURCE_ID, LR.LOGICAL_ID, LR.CURRENT_RESOURCE_ID 
        FROM Patient_LOGICAL_RESOURCES AS LR
       WHERE LR.IS_DELETED = 'N'
         AND LR.LOGICAL_RESOURCE_ID IN (4706,4718,4720,4722,4724)) AS LR 
  INNER JOIN Patient_RESOURCES AS R ON LR.CURRENT_RESOURCE_ID = R.RESOURCE_ID
      UNION ALL
      SELECT R.RESOURCE_ID, R.LOGICAL_RESOURCE_ID, R.VERSION_ID, R.LAST_UPDATED, R.IS_DELETED, R.DATA, LR.LOGICAL_ID 
        FROM (
      SELECT LR.LOGICAL_RESOURCE_ID, LR.LOGICAL_ID, LR.CURRENT_RESOURCE_ID 
        FROM Observation_LOGICAL_RESOURCES AS LR
       WHERE LR.IS_DELETED = 'N'
         AND LR.LOGICAL_RESOURCE_ID IN (1560,1570,1572,1574,1576)) AS LR 
  INNER JOIN Observation_RESOURCES AS R ON LR.CURRENT_RESOURCE_ID = R.RESOURCE_ID) AS COMBINED_RESULTS
    ORDER BY COMBINED_RESULTS.LOGICAL_RESOURCE_ID

For scenarios 3 and 4, we'll build one large UNION'd query to get the data, like we do today with the old query builder. We have to do it that way since the search parameter index values will be in the resource type-specific values tables rather than system-wide values tables. Here is an example query for scenario 4:

      SELECT RESOURCE_ID, LOGICAL_RESOURCE_ID, VERSION_ID, LAST_UPDATED, IS_DELETED, DATA, LOGICAL_ID 
        FROM (
      SELECT R.RESOURCE_ID, R.LOGICAL_RESOURCE_ID, R.VERSION_ID, R.LAST_UPDATED, R.IS_DELETED, R.DATA, LR.LOGICAL_ID 
        FROM (
      SELECT LR0.LOGICAL_RESOURCE_ID, LR0.LOGICAL_ID, LR0.CURRENT_RESOURCE_ID 
        FROM Condition_LOGICAL_RESOURCES AS LR0
       WHERE LR0.IS_DELETED = 'N'
         AND EXISTS (
      SELECT 1 
        FROM Condition_LOGICAL_RESOURCES AS LR1 
  INNER JOIN Condition_RESOURCE_TOKEN_REFS AS P2 ON P2.LOGICAL_RESOURCE_ID = LR1.LOGICAL_RESOURCE_ID
         AND P2.PARAMETER_NAME_ID = 1396
         AND (P2.COMMON_TOKEN_VALUE_ID = 30)
       WHERE LR1.LOGICAL_RESOURCE_ID = LR0.LOGICAL_RESOURCE_ID)) AS LR 
  INNER JOIN Condition_RESOURCES AS R ON LR.CURRENT_RESOURCE_ID = R.RESOURCE_ID
      UNION ALL 
      SELECT R.RESOURCE_ID, R.LOGICAL_RESOURCE_ID, R.VERSION_ID, R.LAST_UPDATED, R.IS_DELETED, R.DATA, LR.LOGICAL_ID 
        FROM (
      SELECT LR0.LOGICAL_RESOURCE_ID, LR0.LOGICAL_ID, LR0.CURRENT_RESOURCE_ID 
        FROM Observation_LOGICAL_RESOURCES AS LR0
       WHERE LR0.IS_DELETED = 'N'
         AND EXISTS (
      SELECT 1 
        FROM Observation_LOGICAL_RESOURCES AS LR3 
  INNER JOIN Observation_RESOURCE_TOKEN_REFS AS P4 ON P4.LOGICAL_RESOURCE_ID = LR3.LOGICAL_RESOURCE_ID
         AND P4.PARAMETER_NAME_ID = 1396
         AND (P4.COMMON_TOKEN_VALUE_ID = 30)
       WHERE LR3.LOGICAL_RESOURCE_ID = LR0.LOGICAL_RESOURCE_ID)) AS LR 
  INNER JOIN Observation_RESOURCES AS R ON LR.CURRENT_RESOURCE_ID = R.RESOURCE_ID
      UNION ALL 
      SELECT R.RESOURCE_ID, R.LOGICAL_RESOURCE_ID, R.VERSION_ID, R.LAST_UPDATED, R.IS_DELETED, R.DATA, LR.LOGICAL_ID 
        FROM (
      SELECT LR0.LOGICAL_RESOURCE_ID, LR0.LOGICAL_ID, LR0.CURRENT_RESOURCE_ID 
        FROM Procedure_LOGICAL_RESOURCES AS LR0
       WHERE LR0.IS_DELETED = 'N'
         AND EXISTS (
      SELECT 1 
        FROM Procedure_LOGICAL_RESOURCES AS LR5 
  INNER JOIN Procedure_RESOURCE_TOKEN_REFS AS P6 ON P6.LOGICAL_RESOURCE_ID = LR5.LOGICAL_RESOURCE_ID
         AND P6.PARAMETER_NAME_ID = 1396
         AND (P6.COMMON_TOKEN_VALUE_ID = 30)
       WHERE LR5.LOGICAL_RESOURCE_ID = LR0.LOGICAL_RESOURCE_ID)) AS LR 
  INNER JOIN Procedure_RESOURCES AS R ON LR.CURRENT_RESOURCE_ID = R.RESOURCE_ID) AS COMBINED_RESULTS
    ORDER BY COMBINED_RESULTS.LOGICAL_RESOURCE_ID
 FETCH FIRST 10 ROWS ONLY

For scenario 3, the query would be the same, except there would be a UNION'd sub-query for all supported resource types.

d0roppe · 2021-07-02T13:49:05Z

Ran the search bucket against this latest code as suggested. Some new limitations in the search that used to be supported. Not sure if that was intentional, suspected a bug. Moving this back to in progress.
Search like

[base]/Organization?_id=id-abc-123&address=a

now gets

{
    "resourceType": "OperationOutcome",
    "id": "id-abc-123",
    "issue": [
        {
            "code": "invalid",
            "details": {
                "text": "Search parameter 'address' is not supported by read."
            },
            "expression": [
                "<empty>"
            ],
            "severity": "fatal"
        }
    ]
}

d0roppe · 2021-07-02T17:06:09Z

My mistake I ran an outdated search bucket. I reran the current search bucket where those statements had been updated. and the tests pass now. Closing issue.

prb112 added search persistence performance performance labels Feb 8, 2021

lmsurpre added the P2 Priority 2 - Should Have label Mar 11, 2021

lmsurpre assigned michaelwschroeder Jun 1, 2021

lmsurpre added this to the Sprint 2021-08 milestone Jun 1, 2021

michaelwschroeder added a commit that referenced this issue Jun 9, 2021

Issue #1922 - implement whole-system search in new query builder

adc48d9

Signed-off-by: Mike Schroeder <mschroed@us.ibm.com>

michaelwschroeder added a commit that referenced this issue Jun 10, 2021

Issue #1922 - updates for old query builder to work with new tables

f32ffa4

Signed-off-by: Mike Schroeder <mschroed@us.ibm.com>

michaelwschroeder added a commit that referenced this issue Jun 14, 2021

Issue #1922 - add union support, implement whole-system with _type

3b81116

Signed-off-by: Mike Schroeder <mschroed@us.ibm.com>

michaelwschroeder added a commit that referenced this issue Jun 15, 2021

Issue #1922 - support _tag text and sort searches

f886717

Signed-off-by: Mike Schroeder <mschroed@us.ibm.com>

michaelwschroeder added a commit that referenced this issue Jun 16, 2021

Issue #1922 - fix tag value search (:in, :not-in, :above, :below)

ed17cb6

Signed-off-by: Mike Schroeder <mschroed@us.ibm.com>

michaelwschroeder added a commit that referenced this issue Jun 18, 2021

Issue #1922 - support _security global tables, address review comments

8165713

Signed-off-by: Mike Schroeder <mschroed@us.ibm.com>

michaelwschroeder added a commit that referenced this issue Jun 18, 2021

Issue #1922 - sorting fix

4cd7c94

Signed-off-by: Mike Schroeder <mschroed@us.ibm.com>

michaelwschroeder added a commit that referenced this issue Jun 22, 2021

Issue #1922 - implement whole-system search in new query builder

aab551b

Signed-off-by: Mike Schroeder <mschroed@us.ibm.com>

michaelwschroeder added a commit that referenced this issue Jun 22, 2021

Issue #1922 - updates for old query builder to work with new tables

90265ae

Signed-off-by: Mike Schroeder <mschroed@us.ibm.com>

michaelwschroeder added a commit that referenced this issue Jun 22, 2021

Issue #1922 - add union support, implement whole-system with _type

6e2d0ae

Signed-off-by: Mike Schroeder <mschroed@us.ibm.com>

michaelwschroeder added a commit that referenced this issue Jun 22, 2021

Issue #1922 - support _tag text and sort searches

cc105f3

Signed-off-by: Mike Schroeder <mschroed@us.ibm.com>

michaelwschroeder added a commit that referenced this issue Jun 22, 2021

Issue #1922 - fix tag value search (:in, :not-in, :above, :below)

07a99bb

Signed-off-by: Mike Schroeder <mschroed@us.ibm.com>

michaelwschroeder added a commit that referenced this issue Jun 22, 2021

Issue #1922 - updates after rebasing

21fee8b

Signed-off-by: Mike Schroeder <mschroed@us.ibm.com>

michaelwschroeder added a commit that referenced this issue Jun 22, 2021

Issue #1922 - support _source parm, address review comments, add tests

cb3ad44

Signed-off-by: Mike Schroeder <mschroed@us.ibm.com>

michaelwschroeder added a commit that referenced this issue Jun 23, 2021

Issue #1922 - implement whole-system search in new query builder

d5131be

Signed-off-by: Mike Schroeder <mschroed@us.ibm.com>

michaelwschroeder added a commit that referenced this issue Jun 23, 2021

Issue #1922 - updates for old query builder to work with new tables

1b06d61

Signed-off-by: Mike Schroeder <mschroed@us.ibm.com>

michaelwschroeder added a commit that referenced this issue Jun 23, 2021

Issue #1922 - add union support, implement whole-system with _type

2ccfb79

Signed-off-by: Mike Schroeder <mschroed@us.ibm.com>

michaelwschroeder added a commit that referenced this issue Jun 23, 2021

Issue #1922 - support _tag text and sort searches

44bd4b6

Signed-off-by: Mike Schroeder <mschroed@us.ibm.com>

michaelwschroeder added a commit that referenced this issue Jun 23, 2021

Issue #1922 - fix tag value search (:in, :not-in, :above, :below)

00dd116

Signed-off-by: Mike Schroeder <mschroed@us.ibm.com>

michaelwschroeder added a commit that referenced this issue Jun 23, 2021

Issue #1922 - updates after rebasing

19ca8d3

Signed-off-by: Mike Schroeder <mschroed@us.ibm.com>

michaelwschroeder added a commit that referenced this issue Jun 23, 2021

Issue #1922 - support _source parm, address review comments, add tests

ff2edf3

Signed-off-by: Mike Schroeder <mschroed@us.ibm.com>

punktilious added a commit that referenced this issue Jun 23, 2021

Merge pull request #2490 from IBM/issue-1922

5ff0b83

Issue #1922 - implement whole-system search in new query builder

michaelwschroeder added a commit that referenced this issue Jun 24, 2021

Issue #1922 - don't run reindex tests during search tests

39ffcfe

Signed-off-by: Mike Schroeder <mschroed@us.ibm.com>

michaelwschroeder added a commit that referenced this issue Jun 24, 2021

Issue #1922 - fix reindex test

ee4e10b

Signed-off-by: Mike Schroeder <mschroed@us.ibm.com>

michaelwschroeder added a commit that referenced this issue Jun 25, 2021

Merge pull request #2552 from IBM/issue-1922b

3cf996c

Issue #1922 - don't run reindex tests during search tests

michaelwschroeder mentioned this issue Jun 28, 2021

poor performance for whole-system search with _lastUpdated #1654

Closed

d0roppe modified the milestones: Sprint 2021-08, Sprint 2021-09 Jul 2, 2021

d0roppe closed this as completed Jul 2, 2021

lmsurpre mentioned this issue Jul 13, 2021

Previously-working search queries should continue to work after migration before a complete reindex #2609

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

whole-system-search: use global parameter tables in whole system search join #1922

whole-system-search: use global parameter tables in whole system search join #1922

punktilious commented Feb 8, 2021 •

edited

Loading

michaelwschroeder commented Jun 28, 2021

d0roppe commented Jul 2, 2021

d0roppe commented Jul 2, 2021

whole-system-search: use global parameter tables in whole system search join #1922

whole-system-search: use global parameter tables in whole system search join #1922

Comments

punktilious commented Feb 8, 2021 • edited Loading

michaelwschroeder commented Jun 28, 2021

d0roppe commented Jul 2, 2021

d0roppe commented Jul 2, 2021

punktilious commented Feb 8, 2021 •

edited

Loading