Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Date range searches include parameter table twice #1624

Closed
punktilious opened this issue Oct 26, 2020 · 3 comments
Closed

Date range searches include parameter table twice #1624

punktilious opened this issue Oct 26, 2020 · 3 comments
Assignees
Labels
bug Something isn't working P2 Priority 2 - Should Have performance performance search

Comments

@punktilious
Copy link
Collaborator

Describe the bug
Date range search joins the _date_values table twice to get the date_start and date_end values. It should be joined once. Minor performance impact. If there were multiple matching rows for the given parameter name, this would be a functional defect.

curl -k -i \
-H 'Content-Type: application/json' \
-u '<username>:<password>' 'https://localhost:9443/fhir-server/api/v4/Claim?created=lt2015-10-17&created=gt2015-10-16'
SELECT R.RESOURCE_ID, R.LOGICAL_RESOURCE_ID, R.VERSION_ID, R.LAST_UPDATED, R.IS_DELETED, R.DATA, LR.LOGICAL_ID  
  FROM CLAIM_RESOURCES R 
  JOIN (SELECT DISTINCT LR.LOGICAL_RESOURCE_ID, LR.LOGICAL_ID, LR.CURRENT_RESOURCE_ID 
          FROM Claim_LOGICAL_RESOURCES LR  
          JOIN Claim_RESOURCES R 
            ON R.LOGICAL_RESOURCE_ID = LR.LOGICAL_RESOURCE_ID   
           AND R.RESOURCE_ID = LR.CURRENT_RESOURCE_ID   
           AND R.IS_DELETED <> 'Y'  
          JOIN Claim_DATE_VALUES  AS param0 
            ON (param0.PARAMETER_NAME_ID=10041 
           AND ((param0.DATE_START < ?))) 
           AND LR.LOGICAL_RESOURCE_ID = param0.LOGICAL_RESOURCE_ID 
          JOIN Claim_DATE_VALUES  AS param1 
            ON (param1.PARAMETER_NAME_ID=10041 
           AND ((param1.DATE_END > ?))) 
           AND LR.LOGICAL_RESOURCE_ID = param1.LOGICAL_RESOURCE_ID) AS LR  
    ON      R.LOGICAL_RESOURCE_ID = LR.LOGICAL_RESOURCE_ID  
   AND R.RESOURCE_ID = LR.CURRENT_RESOURCE_ID  
   AND R.IS_DELETED <> 'Y' 
ORDER BY RESOURCE_ID ASC  OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY;

To Reproduce
Enable tracing and run the curl search. Inspect trace.log to see the query.

Expected behavior
The query should be rewritten to join against the date-values table just once.

Additional context
Implementing this is likely to be non-trivial given the current complexity of the query-builder. I would not consider implementing this unless:

  1. This issue is tied to a functional issue or significant performance problem; or
  2. The query builder code undergoes a refactor.
@punktilious punktilious added the bug Something isn't working label Oct 26, 2020
@prb112
Copy link
Contributor

prb112 commented Oct 26, 2020

It'll happen on any parameter table where the code is included in multiples - e.g. AND with itself.

@michaelwschroeder
Copy link
Contributor

If a DATE search parameter is specified more than once on a search query, we will attempt to simplify the search by consolidating the search parameters internally in two ways:

  1. If multiple search parameters are specified that constrain an upper or lower bound of the date, we will attempt to process only the most constraining parameter. For example, given the following search request:
.../Patient?birthdate=le2001-01-01&birthdate=lt2012-06-01

We will process only the birthdate=le2001-01-01 parameter, since it is the most constraining.

  1. If multiple search parameters are specified that define a range, we will generate SQL using a single JOIN to the xx_DATE_VALUES table rather than multiple JOINS. For example, given the following search request:
 .../Patient?birthdate=ge2001-01-01&birthdate=le2012-06-01

Previously, the generated SQL for this would contain 2 JOINS to the xx_DATE_VALUES table (as documented in the comments above), one for the birthdate=ge2001-01-01 parameter and one for the birthdate=le2012-06-01 parameter. Now, the generated SQL will contain a single JOIN to the xx_DATE_VALUES table, like this:

      SELECT R.RESOURCE_ID, R.LOGICAL_RESOURCE_ID, R.VERSION_ID, R.LAST_UPDATED, R.IS_DELETED, R.DATA, LR.LOGICAL_ID 
        FROM (
      SELECT LR0.LOGICAL_RESOURCE_ID, LR0.LOGICAL_ID, LR0.CURRENT_RESOURCE_ID 
        FROM Patient_LOGICAL_RESOURCES AS LR0
       WHERE LR0.IS_DELETED = 'N'
         AND EXISTS (
      SELECT 1 
        FROM Patient_LOGICAL_RESOURCES AS LR1 
  INNER JOIN Patient_DATE_VALUES AS P2 ON P2.LOGICAL_RESOURCE_ID = LR1.LOGICAL_RESOURCE_ID
         AND P2.PARAMETER_NAME_ID = 1054
         AND (P2.DATE_END > ?
         AND P2.DATE_START < ?)
       WHERE LR1.LOGICAL_RESOURCE_ID = LR0.LOGICAL_RESOURCE_ID)) AS LR 
  INNER JOIN Patient_RESOURCES AS R ON LR.CURRENT_RESOURCE_ID = R.RESOURCE_ID
    ORDER BY LR.LOGICAL_RESOURCE_ID
 FETCH FIRST 10 ROWS ONLY

NOTES:

  1. Functionally, date searches should not change due to these updates
  2. These updates were only made in the new query builder.

michaelwschroeder added a commit that referenced this issue Jun 30, 2021
Signed-off-by: Mike Schroeder <mschroed@us.ibm.com>
michaelwschroeder added a commit that referenced this issue Jun 30, 2021
Signed-off-by: Mike Schroeder <mschroed@us.ibm.com>
michaelwschroeder added a commit that referenced this issue Jul 1, 2021
Signed-off-by: Mike Schroeder <mschroed@us.ibm.com>
michaelwschroeder added a commit that referenced this issue Jul 1, 2021
Signed-off-by: Mike Schroeder <mschroed@us.ibm.com>
michaelwschroeder added a commit that referenced this issue Jul 1, 2021
Issue #1624 and #1653 - date range search improvements
@prb112
Copy link
Contributor

prb112 commented Jul 4, 2021

Confirmed using a test case that it isn't added twice.

                         
      SELECT COUNT(*) AS CNT 
        FROM Patient_LOGICAL_RESOURCES AS LR0
       WHERE LR0.IS_DELETED = 'N'
         AND EXISTS (
      SELECT 1 
        FROM Patient_LOGICAL_RESOURCES AS LR1 
  INNER JOIN Patient_STR_VALUES AS P2 ON P2.LOGICAL_RESOURCE_ID = LR1.LOGICAL_RESOURCE_ID
         AND P2.PARAMETER_NAME_ID = 1172
         AND (P2.STR_VALUE_LCASE LIKE ? ESCAPE '+') 
  INNER JOIN Patient_STR_VALUES AS P3 ON P3.LOGICAL_RESOURCE_ID = LR1.LOGICAL_RESOURCE_ID
         AND P3.PARAMETER_NAME_ID = 284001
         AND (P3.STR_VALUE_LCASE LIKE ? ESCAPE '+') 
  INNER JOIN Patient_NUMBER_VALUES AS P4 ON P4.LOGICAL_RESOURCE_ID = LR1.LOGICAL_RESOURCE_ID
         AND P4.PARAMETER_NAME_ID = 284003
         AND (P4.NUMBER_VALUE_LOW >= ?
         AND P4.NUMBER_VALUE_HIGH <= ?) 
  INNER JOIN Patient_RESOURCE_TOKEN_REFS AS P5 ON P5.LOGICAL_RESOURCE_ID = LR1.LOGICAL_RESOURCE_ID
         AND P5.PARAMETER_NAME_ID = 284000
         AND ((P5.COMMON_TOKEN_VALUE_ID IN (172394,178988))) 
  INNER JOIN Patient_STR_VALUES AS P6 ON P6.LOGICAL_RESOURCE_ID = LR1.LOGICAL_RESOURCE_ID
         AND P6.PARAMETER_NAME_ID = 284005
         AND (P6.STR_VALUE = ?) 
  INNER JOIN Patient_QUANTITY_VALUES AS P7 ON P7.LOGICAL_RESOURCE_ID = LR1.LOGICAL_RESOURCE_ID
         AND P7.PARAMETER_NAME_ID = 284004
         AND ((P7.QUANTITY_VALUE_LOW >= ?
         AND P7.QUANTITY_VALUE_HIGH <= ?)
         AND P7.CODE = ?) 
  INNER JOIN Patient_DATE_VALUES AS P8 ON P8.LOGICAL_RESOURCE_ID = LR1.LOGICAL_RESOURCE_ID
         AND P8.PARAMETER_NAME_ID = 284002
         AND (P8.DATE_START >= ?
         AND P8.DATE_START <= ?
         AND P8.DATE_END <= ?
         AND P8.DATE_END >= ?)
       WHERE LR1.LOGICAL_RESOURCE_ID = LR0.LOGICAL_RESOURCE_ID)

@prb112 prb112 closed this as completed Jul 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working P2 Priority 2 - Should Have performance performance search
Projects
None yet
Development

No branches or pull requests

4 participants