Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fast export impl 'txTimeoutMillis' should be set relative to the configured transaction timeout #2150

Closed
lmsurpre opened this issue Mar 25, 2021 · 4 comments
Assignees
Labels
bug Something isn't working P2 Priority 2 - Should Have

Comments

@lmsurpre
Copy link
Member

lmsurpre commented Mar 25, 2021

Describe the bug
The fast export impl has a hardcoded "read window" of 90 seconds:

    // Assume for now we have a transaction timeout of 120s, but need to give the upload some time to work
    // TODO configuration
    private long txTimeoutMillis = 90000L;

If users have overridden the transaction timeout, this could result in failure.

To Reproduce
Steps to reproduce the behavior:

  1. set the FHIR_TRANSACTION_MANAGER_TIMEOUT environment variable to something less than 90s (maybe 10s)
  2. perform a system export

Expected behavior
the system export succeeds

Alternative idea
Check if it's possible to override the configured transaction timeout value (programatically or via bulk datasource config).
If so, we could set it to 120s for just the export transaction (without affecting anything else).

Additional context
Perhaps related to #2149

@prb112
Copy link
Contributor

prb112 commented Mar 25, 2021

@lmsurpre I added a configuration element for this, I just didn't plug it in.

@prb112 prb112 added enhancement New feature or request bug Something isn't working and removed enhancement New feature or request labels Mar 25, 2021
@prb112
Copy link
Contributor

prb112 commented Mar 26, 2021

I spent some time looking at this...

One, we could read from the variable FHIR_TRANSACTION_MANGER_TIMEOUT, however that's not a guarantee to be a number.

We could interpolate the timeout using an SPI, I played with this a bit, it's not super clean, and requires reflections.

Some alternatives that were mentioned on StackOverflow:

@lmsurpre lmsurpre added the P2 Priority 2 - Should Have label Mar 29, 2021
@lmsurpre lmsurpre added this to the Sprint 2021-05 milestone Mar 29, 2021
@lmsurpre
Copy link
Member Author

I think we should just go with the user-configured approach and document that it must be less than the configured transaction timeout.
Also, I think we should remove the name timeout from the variable name. Its not really a timeout...its just the point in time at which we'll stop fetching resources for a given "chunk".
Maybe maxChunkReadTime or some such?

prb112 added a commit that referenced this issue Mar 30, 2021
Signed-off-by: Paul Bastide <pbastide@us.ibm.com>
prb112 added a commit that referenced this issue Mar 30, 2021
configure the maxChunkReadTime for Fast Export and configurable timeout for presigned URL #2150
@lmsurpre
Copy link
Member Author

lmsurpre commented Mar 31, 2021

I set maxChunkReadTime to "15000" (15 seconds) and set FHIR_TRANSACTION_MANGER_TIMEOUT to "20s" and confirmed it worked as expected (based on the following output).

2021-03-31 14:27:53.140 INFO bulkexportfastjob[203] processed 190804 resources in 15.02 seconds (rate=12701.1 resources/second)
...
2021-03-31 14:32:05.778 INFO CWWKO1660I: The application fhir-openapi was processed and an OpenAPI document was produced.
2021-03-31 14:42:44.885 INFO bulkexportfastjob[204] processed 197040 resources in 15.34 seconds (rate=12846.6 resources/second)
2021-03-31 14:42:56.607 INFO bulkexportfastjob[204] processed 163336 resources in 11.60 seconds (rate=14082.4 resources/second)

I also purposefully misconfigured the system by keeping maxChunkReadTime at 15s and setting FHIR_TRANSACTION_MANGER_TIMEOUT to "15s" and confirmed we get a reasonable error message:

2021-03-31 14:46:41.044 INFO bulkexportfastjob[205] processed 229439 resources in 15.00 seconds (rate=15295.8 resources/second)
2021-03-31 14:46:41.049 INFO WTRN0006W: Transaction 00000178899BB3FA00000001266B2CDC1F971F9CB9F06CE4485563EF55F0FB11EEE4E65A00000178899BB3FA00000001266B2CDC1F971F9CB9F06CE4485563EF55F0FB11EEE4E65A00000001 has timed out after 15 seconds.
2021-03-31 14:46:41.053 INFO WTRN0124I: When the timeout occurred the thread with which the transaction is, or was most recently, associated was Thread[Default Executor-thread-7,5,Default Executor Thread Group]. The stack trace of this thread when the timeout occurred was: 
	sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	java.lang.reflect.Method.invoke(Method.java:498)
	java.io.ObjectStreamClass.invokeWriteReplace(ObjectStreamClass.java:1244)
	java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1136)
	java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
	java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
	java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
	java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
	java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
	com.ibm.jbatch.container.persistence.CheckpointData.getBytesForDB(CheckpointData.java:92)
	com.ibm.jbatch.container.persistence.jpa.StepThreadInstanceEntity.setCheckpointData(StepThreadInstanceEntity.java:137)
	com.ibm.jbatch.container.persistence.CheckpointManager.checkpoint(CheckpointManager.java:104)
	com.ibm.jbatch.container.controller.impl.ChunkStepControllerImpl.invokeChunk(ChunkStepControllerImpl.java:648)
	com.ibm.jbatch.container.controller.impl.ChunkStepControllerImpl.invokeCoreStep(ChunkStepControllerImpl.java:795)
	com.ibm.jbatch.container.controller.impl.BaseStepControllerImpl.execute(BaseStepControllerImpl.java:295)
	com.ibm.jbatch.container.controller.impl.ExecutionTransitioner.doExecutionLoop(ExecutionTransitioner.java:118)
	com.ibm.jbatch.container.controller.impl.WorkUnitThreadControllerImpl.executeCoreTransitionLoop(WorkUnitThreadControllerImpl.java:96)
	com.ibm.jbatch.container.controller.impl.WorkUnitThreadControllerImpl.executeWorkUnit(WorkUnitThreadControllerImpl.java:178)
	com.ibm.jbatch.container.controller.impl.WorkUnitThreadControllerImpl$AbstractControllerHelper.runExecutionOnThread(WorkUnitThreadControllerImpl.java:503)
	com.ibm.jbatch.container.controller.impl.WorkUnitThreadControllerImpl.runExecutionOnThread(WorkUnitThreadControllerImpl.java:92)
	com.ibm.jbatch.container.util.BatchWorkUnit.run(BatchWorkUnit.java:113)
	com.ibm.ws.context.service.serializable.ContextualRunnable.run(ContextualRunnable.java:79)
	com.ibm.ws.threading.internal.ExecutorServiceImpl$RunnableWrapper.run(ExecutorServiceImpl.java:238)
	java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	java.util.concurrent.FutureTask.run(FutureTask.java:266)
	java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	java.lang.Thread.run(Thread.java:823)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working P2 Priority 2 - Should Have
Projects
None yet
Development

No branches or pull requests

2 participants