Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve reindex performance by skipping resources with no search parameter changes #2155

Closed
punktilious opened this issue Mar 26, 2021 · 6 comments
Assignees
Labels
P2 Priority 2 - Should Have reindex Resolution of issue will require a $reindex during upgrade schema-change a schema change showcase Used to Identify End-of-Sprint Demos

Comments

@punktilious
Copy link
Collaborator

punktilious commented Mar 26, 2021

The $reindex operation reads every resource, extracts the search parameters, deletes the current search parameters from the database and inserts the new set.

This is expensive and unnecessary if the new search parameter set is identical to the search parameters currently stored. However, reading the existing search parameters for comparison would also be expensive and non-trivial. A possible solution is to store a fingerprint (hash) of the parameter set and use this as a comparison. If the stored hash matches the hash computed from the newly extracted parameter set then the delete/insert step can be skipped.

This optimization is also applicable to new versions of a resource.

Care is needed when computing the hash to make sure it is deterministic - meaning that the parameters must be sorted first before the hash is computed.

A hash such as SHA-256 is statistically reliable for such a feature.

A new column is required on the xx_logical_resources table. No data migration is needed because a default value of null is suitable. The next time a reindex is performed, the hash will be computed and stored. Future reindex operations will then benefit. New resources will be stored with the hash and therefore immediately benefit.

@lmsurpre
Copy link
Member

Could be combined with a similar schema-change needed for #2417

@lmsurpre
Copy link
Member

lmsurpre commented May 24, 2021

Need to make sure we consider the impact of global search parameters as well.
Just using the code + value may not be enough in the hash (if we move the target of where they are stored).

Also need to consider search parameter disambiguation...two parameters can have the same code.
In cases like this we recommend use of search parameter filtering in fhir-server-config. Thus, we probably want to use the search parameter url (not just the code) in the hash.

@lmsurpre
Copy link
Member

should be done in conjunction with #1751 to ensure good e2e test coverage.

@lmsurpre lmsurpre added P2 Priority 2 - Should Have reindex Resolution of issue will require a $reindex during upgrade labels May 24, 2021
@lmsurpre lmsurpre added this to the Sprint 2021-08 milestone Jun 1, 2021
@tbieste tbieste self-assigned this Jun 14, 2021
@tbieste
Copy link
Contributor

tbieste commented Jun 14, 2021

After looking over the code involved, here's an overall design:

DB changes:

  • Add a new column (e.g. PARAMETERS_HASH) to the LOGICAL_RESOURCES table (32-byte VARBINARY column or VARCHAR???) for a SHA256 value.
  • Update add_any_resource stored procedure to populate new column.

Class changes:

  • Add an "existing search parameters hash" field to the ResourceIndexRecord class; which is populated from the value of the new column in the LOGICAL_RESOURCES table.
  • Add a toSearchParameterHash method to ExtractedParameterValue. This would get the SHA256 hash of the "searchParameterCode|searchParameterUri|searchParameterVersion|extractedValue", with extractedValue implemented appropriately for each ParameterType (token, composite, string, etc).

Logic changes after calling extractSearchParameters(...):

  • Call toSearchParameterHash on each ExtractedParameterValue to get a list of hashes.
  • Sort the list of hashes using normal string sorting.
  • Hash the toString of the ExtractedParameterValue list to obtain the "new search parameters hash" for the resource.
  • If reindex:
    • Check if the "new search parameters hash" is different than the "existing search parameters hash".
    • If different:
      • Update the search parameters in the DB.
      • Store the "new search parameters hash" in the new column in the LOGICAL_RESOURCES table.
    • If same:
      • Do not update DB.
  • If resource create/update:
    - Store the "new search parameters hash" in the new column in the LOGICAL_RESOURCES table (via the 'add_any_resource' stored procedure per DB type) and store the search parameters in the DB.

tbieste added a commit that referenced this issue Jun 14, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Jun 15, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Jun 15, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Jun 16, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Jun 16, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Jun 16, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Jun 16, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Jun 16, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Jun 16, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Jun 16, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Jun 17, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Jun 17, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Jun 18, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Jun 18, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Jun 18, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Jun 18, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Jun 18, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Jun 21, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
@tbieste tbieste added the showcase Used to Identify End-of-Sprint Demos label Jun 22, 2021
tbieste added a commit that referenced this issue Jun 25, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Jun 25, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Jun 28, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Jun 28, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Jun 28, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Jun 28, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Jun 29, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Jun 30, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Jun 30, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Jun 30, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Jun 30, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Jun 30, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Jun 30, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Jun 30, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Jun 30, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Jun 30, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Jun 30, 2021
Signed-off-by: Troy Biesterfeld <tbieste@us.ibm.com>
tbieste added a commit that referenced this issue Jun 30, 2021
Issue #2155 - Use hash to determine if search parameters need update
@lmsurpre
Copy link
Member

lmsurpre commented Jul 7, 2021

When I first ran this I tried with 20 reindex-max-requests (lookup param name) and I hit this:

INSERT INTO common_token_values (token_value, code_system_id)       SELECT v.token_value, v.code_system_id         FROM (VALUES (CAST(? AS VARCHAR(1024)),CAST(? AS INT)), (CAST(? AS VARCHAR(1024)),CAST(? AS INT)) ) AS v(token_value, code_system_id)     ORDER BY v.token_value, v.code_system_id  ON CONFLICT DO NOTHING 
org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 125628 waits for ShareLock on transaction 469547549; blocked by process 125650.
Process 125650 waits for ShareLock on transaction 469547553; blocked by process 125653.
Process 125653 waits for ShareLock on transaction 469547548; blocked by process 125649.
Process 125649 waits for ShareLock on transaction 469547542; blocked by process 125641.
Process 125641 waits for ShareLock on transaction 469547565; blocked by process 125628.
  Hint: See server log for query details.
  Where: while inserting index tuple (521996,89) in relation "common_token_values"
	at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2553)
	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2285)
	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:323)
	at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:473)
	at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:393)
	at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:164)
	at org.postgresql.jdbc.PgPreparedStatement.executeUpdate(PgPreparedStatement.java:130)
	at jdk.internal.reflect.GeneratedMethodAccessor56.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.postgresql.ds.PGPooledConnection$StatementHandler.invoke(PGPooledConnection.java:441)
	at com.sun.proxy.$Proxy117.executeUpdate(Unknown Source)
	at com.ibm.ws.rsadapter.jdbc.WSJdbcPreparedStatement.executeUpdate(WSJdbcPreparedStatement.java:520)
	at com.ibm.fhir.persistence.jdbc.postgres.PostgresResourceReferenceDAO.doCommonTokenValuesUpsert(PostgresResourceReferenceDAO.java:140)
	at com.ibm.fhir.persistence.jdbc.dao.impl.ResourceReferenceDAO.upsertCommonTokenValues(ResourceReferenceDAO.java:710)
	at com.ibm.fhir.persistence.jdbc.dao.impl.ResourceReferenceDAO.persist(ResourceReferenceDAO.java:786)
	at com.ibm.fhir.persistence.jdbc.dao.impl.ResourceReferenceDAO.addNormalizedValues(ResourceReferenceDAO.java:182)
	at com.ibm.fhir.persistence.jdbc.dao.impl.ParameterVisitorBatchDAO.close(ParameterVisitorBatchDAO.java:626)
	at com.ibm.fhir.persistence.jdbc.dao.ReindexResourceDAO.updateParameters(ReindexResourceDAO.java:354)
	at com.ibm.fhir.persistence.jdbc.impl.FHIRPersistenceJDBCImpl.updateParameters(FHIRPersistenceJDBCImpl.java:2637)
	at com.ibm.fhir.persistence.jdbc.impl.FHIRPersistenceJDBCImpl.reindex(FHIRPersistenceJDBCImpl.java:2568)
	at com.ibm.fhir.server.util.FHIRRestHelper.doReindex(FHIRRestHelper.java:3083)
	at com.ibm.fhir.operation.reindex.ReindexOperation.doInvoke(ReindexOperation.java:169)
	at com.ibm.fhir.server.operation.spi.AbstractOperation.invoke(AbstractOperation.java:66)
	at com.ibm.fhir.server.util.FHIRRestHelper.doInvoke(FHIRRestHelper.java:1163)
	at com.ibm.fhir.server.resources.Operation.invoke(Operation.java:126)
	at com.ibm.fhir.server.resources.Operation$Proxy$_$$_WeldClientProxy.invoke(Unknown Source)
	at jdk.internal.reflect.GeneratedMethodAccessor75.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at com.ibm.ws.jaxrs20.cdi.component.JaxRsFactoryImplicitBeanCDICustomizer.serviceInvoke(JaxRsFactoryImplicitBeanCDICustomizer.java:342)
	at com.ibm.ws.jaxrs20.server.LibertyJaxRsServerFactoryBean.performInvocation(LibertyJaxRsServerFactoryBean.java:641)
	at com.ibm.ws.jaxrs20.server.LibertyJaxRsInvoker.performInvocation(LibertyJaxRsInvoker.java:160)
	at org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:101)
	at com.ibm.ws.jaxrs20.server.LibertyJaxRsInvoker.invoke(LibertyJaxRsInvoker.java:273)
	at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:213)
	at com.ibm.ws.jaxrs20.server.LibertyJaxRsInvoker.invoke(LibertyJaxRsInvoker.java:444)
	at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:112)
	at org.apache.cxf.interceptor.ServiceInvokerInterceptor$1.run(ServiceInvokerInterceptor.java:59)
	at org.apache.cxf.interceptor.ServiceInvokerInterceptor.handleMessage(ServiceInvokerInterceptor.java:96)
	at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)
	at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:123)
	at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:274)
	at com.ibm.ws.jaxrs20.endpoint.AbstractJaxRsWebEndpoint.invoke(AbstractJaxRsWebEndpoint.java:137)
	at com.ibm.websphere.jaxrs.server.IBMRestServlet.handleRequest(IBMRestServlet.java:146)
	at com.ibm.websphere.jaxrs.server.IBMRestServlet.doPost(IBMRestServlet.java:104)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:706)
	at com.ibm.websphere.jaxrs.server.IBMRestServlet.service(IBMRestServlet.java:96)
	at com.ibm.ws.webcontainer.servlet.ServletWrapper.service(ServletWrapper.java:1253)
	at com.ibm.ws.webcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:746)
	at com.ibm.ws.webcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:443)
	at com.ibm.ws.webcontainer.filter.WebAppFilterChain.invokeTarget(WebAppFilterChain.java:183)
	at com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:94)
	at com.ibm.fhir.server.filter.rest.FHIRRestServletFilter.doFilter(FHIRRestServletFilter.java:155)
	at javax.servlet.http.HttpFilter.doFilter(HttpFilter.java:127)
	at com.ibm.ws.webcontainer.filter.FilterInstanceWrapper.doFilter(FilterInstanceWrapper.java:201)
	at com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:91)
	at com.ibm.ws.security.jaspi.JaspiServletFilter.doFilter(JaspiServletFilter.java:56)
	at com.ibm.ws.webcontainer.filter.FilterInstanceWrapper.doFilter(FilterInstanceWrapper.java:201)
	at com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:91)
	at com.ibm.ws.webcontainer.filter.WebAppFilterManager.doFilter(WebAppFilterManager.java:1002)
	at com.ibm.ws.webcontainer.filter.WebAppFilterManager.invokeFilters(WebAppFilterManager.java:1140)
	at com.ibm.ws.webcontainer.filter.WebAppFilterManager.invokeFilters(WebAppFilterManager.java:1011)
	at com.ibm.ws.webcontainer.servlet.CacheServletWrapper.handleRequest(CacheServletWrapper.java:75)
	at com.ibm.ws.webcontainer40.servlet.CacheServletWrapper40.handleRequest(CacheServletWrapper40.java:85)
	at com.ibm.ws.webcontainer.WebContainer.handleRequest(WebContainer.java:938)
	at com.ibm.ws.webcontainer.osgi.DynamicVirtualHost$2.run(DynamicVirtualHost.java:279)
	at com.ibm.ws.http.dispatcher.internal.channel.HttpDispatcherLink$TaskWrapper.run(HttpDispatcherLink.java:1159)
	at com.ibm.ws.http.dispatcher.internal.channel.HttpDispatcherLink.wrapHandlerAndExecute(HttpDispatcherLink.java:428)
	at com.ibm.ws.http.dispatcher.internal.channel.HttpDispatcherLink.ready(HttpDispatcherLink.java:387)
	at com.ibm.ws.http.channel.internal.inbound.HttpInboundLink.handleDiscrimination(HttpInboundLink.java:566)
	at com.ibm.ws.http.channel.internal.inbound.HttpInboundLink.handleNewRequest(HttpInboundLink.java:500)
	at com.ibm.ws.http.channel.internal.inbound.HttpInboundLink.processRequest(HttpInboundLink.java:360)
	at com.ibm.ws.http.channel.internal.inbound.HttpICLReadCallback.complete(HttpICLReadCallback.java:70)
	at com.ibm.ws.channel.ssl.internal.SSLReadServiceContext$SSLReadCompletedCallback.complete(SSLReadServiceContext.java:1824)
	at com.ibm.ws.tcpchannel.internal.WorkQueueManager.requestComplete(WorkQueueManager.java:504)
	at com.ibm.ws.tcpchannel.internal.WorkQueueManager.attemptIO(WorkQueueManager.java:574)
	at com.ibm.ws.tcpchannel.internal.WorkQueueManager.workerRun(WorkQueueManager.java:958)
	at com.ibm.ws.tcpchannel.internal.WorkQueueManager$Worker.run(WorkQueueManager.java:1047)
	at com.ibm.ws.threading.internal.ExecutorServiceImpl$RunnableWrapper.run(ExecutorServiceImpl.java:238)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:836)

I then turned it down to 10 and it worked just fine.

@d0roppe
Copy link
Collaborator

d0roppe commented Jul 8, 2021

verified that if the search parameters did not change that the update of the search parameters is skipped for the update of the tables. Closing issue.

@d0roppe d0roppe closed this as completed Jul 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 Priority 2 - Should Have reindex Resolution of issue will require a $reindex during upgrade schema-change a schema change showcase Used to Identify End-of-Sprint Demos
Projects
None yet
Development

No branches or pull requests

4 participants