Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Under certain conditions, bulk import fails to acquire a db connection #2077

Closed
lmsurpre opened this issue Mar 12, 2021 · 1 comment
Closed
Assignees
Labels
bug Something isn't working bulk-data

Comments

@lmsurpre
Copy link
Member

Describe the bug
A clear and concise description of what the bug is.

To Reproduce
I've seen it twice in a row by roughly following these steps, but I'm not sure if its always repeatable or not.

  1. dropin a datasource-postgres.xml similar to the following (in my case it points to an IBM Cloud databases for PostgreSQL instance):
    <!-- ============================================================== -->
    <!-- TENANT: default; DSID: default; TYPE: read-write               -->
    <!-- ============================================================== -->
    <dataSource id="fhirDefaultDefault" jndiName="jdbc/fhir_default_default" type="javax.sql.XADataSource" statementCacheSize="200" syncQueryTimeoutWithTransactionTimeout="true">
        <jdbcDriver javax.sql.XADataSource="org.postgresql.xa.PGXADataSource" libraryRef="sharedLibPostgres"/>
        <properties.postgresql
             serverName="${env.FHIR_DB_HOSTNAME}"
             portNumber="${env.FHIR_DB_PORT}"
             ssl="true"
             sslmode="verify-full"
             sslrootcert="/opt/ol/wlp/usr/servers/fhir-server/resources/security/postgres.cert"
             databaseName="${env.FHIR_DB_NAME}"
             user="${env.FHIR_DB_USERNAME}"
             password="${env.FHIR_DB_PASSWORD}"
             currentSchema="${env.FHIR_DB_SCHEMA}"
         />
        <connectionManager maxPoolSize="200" minPoolSize="40"/>
    </dataSource>

    <dataSource id="fhirbatchDS" jndiName="jdbc/fhirbatchDB" type="javax.sql.XADataSource" statementCacheSize="200" syncQueryTimeoutWithTransactionTimeout="true">
        <jdbcDriver javax.sql.XADataSource="org.postgresql.xa.PGXADataSource" libraryRef="sharedLibPostgres"/>
        <properties.postgresql
            serverName="${env.BATCH_DB_HOSTNAME}"
            portNumber="${env.BATCH_DB_PORT}"
            ssl="true"
            sslmode="verify-full"
            sslrootcert="/opt/ol/wlp/usr/servers/fhir-server/resources/security/postgres.cert"
            databaseName="${env.BATCH_DB_NAME}"
            user="${env.BATCH_DB_USER}"
            password="${env.BATCH_DB_PASSWORD}" />
    </dataSource>
</server>
  1. Perform a bulk import (it succeeds)
  2. Wait for a long time (in my case, a couple hours)
  3. Perform another bulk import

It fails with an error like this:

2021-03-11 23:03:31.625 fhir-internal INFO Completed request[0.115 secs]: tenantId:[default] dsId:[default] user:[fhiruser] method:[GET] uri:[https://****/fhir-server/api/v4/$bulkdata-status?job=sZ9W5bhJ1_n5_Z7xqumBLA] originalUri:[https://****/basic/$bulkdata-status?job=sZ9W5bhJ1_n5_Z7xqumBLA] status:[200]
2021-03-12 02:47:31.048 fhir-internal INFO ObjectStorge Object -'zAaECq2rdj6sQJVx3TYN_CpkJvs3sjz5HBGywcRIP2M=/Condition_1.ndjson' - '40545858' bytes.
2021-03-12 02:47:32.701 fhir-internal AUDIT J2CA0056I: The Connection Manager received a fatal connection error from the Resource Adapter for resource jdbc/fhir_default_default. The exception is: org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.:javax.net.ssl.SSLException: Connection reset:java.net.SocketException: Connection reset
2021-03-12 02:47:32.707 fhir-internal SEVERE Failure retrieving all Resource type names.
org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:350)
	at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:473)
	at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:393)
	at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:164)
	at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:114)
	at sun.reflect.GeneratedMethodAccessor50.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.postgresql.ds.PGPooledConnection$StatementHandler.invoke(PGPooledConnection.java:441)
	at com.sun.proxy.$Proxy106.executeQuery(Unknown Source)
	at com.ibm.ws.rsadapter.jdbc.WSJdbcPreparedStatement.executeQuery(WSJdbcPreparedStatement.java:475)
	at com.ibm.fhir.persistence.jdbc.dao.impl.ResourceDAOImpl.readAllResourceTypeNames(ResourceDAOImpl.java:376)
	at com.ibm.fhir.persistence.jdbc.cache.FHIRPersistenceJDBCCacheUtil.prefill(FHIRPersistenceJDBCCacheUtil.java:37)
	at com.ibm.fhir.persistence.jdbc.impl.FHIRPersistenceJDBCImpl.doCachePrefill(FHIRPersistenceJDBCImpl.java:1862)
	at com.ibm.fhir.persistence.jdbc.impl.FHIRPersistenceJDBCImpl.makeResourceDAO(FHIRPersistenceJDBCImpl.java:443)
	at com.ibm.fhir.persistence.jdbc.impl.FHIRPersistenceJDBCImpl.update(FHIRPersistenceJDBCImpl.java:503)
	at com.ibm.fhir.bulkdata.jbatch.load.ChunkWriter.writeItems(ChunkWriter.java:194)
	at com.ibm.jbatch.container.artifact.proxy.ItemWriterProxy.writeItems(ItemWriterProxy.java:80)
	at com.ibm.jbatch.container.controller.impl.ChunkStepControllerImpl.writeChunk(ChunkStepControllerImpl.java:481)
	at com.ibm.jbatch.container.controller.impl.ChunkStepControllerImpl.invokeChunk(ChunkStepControllerImpl.java:636)
	at com.ibm.jbatch.container.controller.impl.ChunkStepControllerImpl.invokeCoreStep(ChunkStepControllerImpl.java:795)
	at com.ibm.jbatch.container.controller.impl.BaseStepControllerImpl.execute(BaseStepControllerImpl.java:295)
	at com.ibm.jbatch.container.controller.impl.ExecutionTransitioner.doExecutionLoop(ExecutionTransitioner.java:118)
	at com.ibm.jbatch.container.controller.impl.WorkUnitThreadControllerImpl.executeCoreTransitionLoop(WorkUnitThreadControllerImpl.java:96)
	at com.ibm.jbatch.container.controller.impl.WorkUnitThreadControllerImpl.executeWorkUnit(WorkUnitThreadControllerImpl.java:178)
	at com.ibm.jbatch.container.controller.impl.WorkUnitThreadControllerImpl$AbstractControllerHelper.runExecutionOnThread(WorkUnitThreadControllerImpl.java:503)
	at com.ibm.jbatch.container.controller.impl.WorkUnitThreadControllerImpl.runExecutionOnThread(WorkUnitThreadControllerImpl.java:92)
	at com.ibm.jbatch.container.util.BatchWorkUnit.run(BatchWorkUnit.java:113)
	at com.ibm.ws.context.service.serializable.ContextualRunnable.run(ContextualRunnable.java:79)
	at com.ibm.ws.threading.internal.ExecutorServiceImpl$RunnableWrapper.run(ExecutorServiceImpl.java:239)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:823)
Caused by: javax.net.ssl.SSLException: Connection reset
	at sun.security.ssl.Alert.createSSLException(Alert.java:127)
	at sun.security.ssl.TransportContext.fatal(TransportContext.java:324)
	at sun.security.ssl.TransportContext.fatal(TransportContext.java:267)
	at sun.security.ssl.TransportContext.fatal(TransportContext.java:262)
	at sun.security.ssl.SSLSocketImpl.handleException(SSLSocketImpl.java:1303)
	at sun.security.ssl.SSLSocketImpl.access$300(SSLSocketImpl.java:72)
	at sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:831)
	at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:161)
	at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:128)
	at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:113)
	at org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:73)
	at org.postgresql.core.PGStream.receiveChar(PGStream.java:441)
	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2057)
	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:323)
	... 34 more
	Suppressed: java.net.SocketException: Broken pipe (Write failed)
		at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
		at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
		at sun.security.ssl.SSLSocketOutputRecord.encodeAlert(SSLSocketOutputRecord.java:81)
		at sun.security.ssl.TransportContext.fatal(TransportContext.java:355)
		... 46 more
Caused by: java.net.SocketException: Connection reset
	at java.net.SocketInputStream.read(SocketInputStream.java:210)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:457)
	at sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:68)
	at sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1095)
	at sun.security.ssl.SSLSocketImpl.access$200(SSLSocketImpl.java:72)
	at sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:815)
	... 41 more
2021-03-12 02:47:32.707 fhir-internal WARNING Failed to import '17598bed126-6cf5cdc4-1106-4385-93e1-e9bd09d7e993' due to error: Failure retrieving all Resource type names.
2021-03-12 02:47:32.711 fhir-internal INFO FFDC1015I: An FFDC Incident has been created: "org.postgresql.util.PSQLException: Connection has been closed. com.ibm.ws.rsadapter.jdbc.WSJdbcDataSource.getConnection 280" at ffdc_21.03.12_02.47.32.0.log
2021-03-12 02:47:32.713 fhir-internal Connection has been closed.
2021-03-12 02:47:32.715 fhir-internal SEVERE Failure acquiring connection for datasource: jdbc/fhir_default_default [probeId=ac-11-20-72-08502bc3-9a2e-4ffb-a85a-621f3f094795]
org.postgresql.util.PSQLException: Connection has been closed.
	at org.postgresql.ds.PGPooledConnection$ConnectionHandler.invoke(PGPooledConnection.java:326)
	at com.sun.proxy.$Proxy76.clearWarnings(Unknown Source)
	at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.postgresql.xa.PGXAConnection$ConnectionHandler.invoke(PGXAConnection.java:153)
	at com.sun.proxy.$Proxy76.clearWarnings(Unknown Source)
	at com.ibm.ws.rsadapter.jdbc.WSJdbcConnection.initialize(WSJdbcConnection.java:242)
	at com.ibm.ws.rsadapter.jdbc.WSJdbcDataSource.getConnection(WSJdbcDataSource.java:153)
	at com.ibm.ws.rsadapter.jdbc.WSJdbcDataSource.getConnection(WSJdbcDataSource.java:111)
	at com.ibm.fhir.persistence.jdbc.connection.FHIRDbConnectionStrategyBase.getConnection(FHIRDbConnectionStrategyBase.java:132)
	at com.ibm.fhir.persistence.jdbc.connection.FHIRDbTenantDatasourceConnectionStrategy.getConnection(FHIRDbTenantDatasourceConnectionStrategy.java:168)
	at com.ibm.fhir.persistence.jdbc.impl.FHIRPersistenceJDBCImpl.openConnection(FHIRPersistenceJDBCImpl.java:1634)
	at com.ibm.fhir.persistence.jdbc.impl.FHIRPersistenceJDBCImpl.update(FHIRPersistenceJDBCImpl.java:502)
	at com.ibm.fhir.bulkdata.jbatch.load.ChunkWriter.writeItems(ChunkWriter.java:194)
	at com.ibm.jbatch.container.artifact.proxy.ItemWriterProxy.writeItems(ItemWriterProxy.java:80)
	at com.ibm.jbatch.container.controller.impl.ChunkStepControllerImpl.writeChunk(ChunkStepControllerImpl.java:481)
	at com.ibm.jbatch.container.controller.impl.ChunkStepControllerImpl.invokeChunk(ChunkStepControllerImpl.java:636)
	at com.ibm.jbatch.container.controller.impl.ChunkStepControllerImpl.invokeCoreStep(ChunkStepControllerImpl.java:795)
	at com.ibm.jbatch.container.controller.impl.BaseStepControllerImpl.execute(BaseStepControllerImpl.java:295)
	at com.ibm.jbatch.container.controller.impl.ExecutionTransitioner.doExecutionLoop(ExecutionTransitioner.java:118)
	at com.ibm.jbatch.container.controller.impl.WorkUnitThreadControllerImpl.executeCoreTransitionLoop(WorkUnitThreadControllerImpl.java:96)
	at com.ibm.jbatch.container.controller.impl.WorkUnitThreadControllerImpl.executeWorkUnit(WorkUnitThreadControllerImpl.java:178)
	at com.ibm.jbatch.container.controller.impl.WorkUnitThreadControllerImpl$AbstractControllerHelper.runExecutionOnThread(WorkUnitThreadControllerImpl.java:503)
	at com.ibm.jbatch.container.controller.impl.WorkUnitThreadControllerImpl.runExecutionOnThread(WorkUnitThreadControllerImpl.java:92)
	at com.ibm.jbatch.container.util.BatchWorkUnit.run(BatchWorkUnit.java:113)
	at com.ibm.ws.context.service.serializable.ContextualRunnable.run(ContextualRunnable.java:79)
	at com.ibm.ws.threading.internal.ExecutorServiceImpl$RunnableWrapper.run(ExecutorServiceImpl.java:239)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:823)
2021-03-12 02:47:32.716 fhir-internal WARNING Failed to import '17598bed127-ef12e66b-0216-4d6e-89ee-451e2c561e27' due to error: Failure acquiring connection for datasource

Expected behavior
It succeeds.

Additional context
I also have a $healthceck call that runs every 10 seconds or so.

@lmsurpre lmsurpre added the bug Something isn't working label Mar 12, 2021
@lmsurpre lmsurpre changed the title Under certain conditions Under certain conditions, bulk import fails to acquire a db connection Mar 12, 2021
@lmsurpre lmsurpre self-assigned this Mar 12, 2021
@lmsurpre lmsurpre added this to the Sprint 2021-04 milestone Mar 12, 2021
lmsurpre added a commit that referenced this issue Mar 12, 2021
We used to have this attribute in our fhirProxyDatasource configuration, but we didn't carry it forward.
After adding it to my deploy, I followed the steps to reproduce the issue described in #2077 and was no longer able to produce it.

From the Liberty docs, here is what this setting does:
> Specifies a timeout for validation of pooled connections. When specified, pooled connections are validated before being reused from the connection pool. The validation timeout is also used when the connection manager validates connections in response to a stale connection for PurgePolicy=ValidateAllConnections. A value of 0 means that connections are validated without applying any timeout. Validation timeout requires a JDBC driver that complies with the JDBC 4.0 specification or higher. Specify a positive integer followed by a unit of time, which can be hours (h), minutes (m), or seconds (s). For example, specify 30 seconds as 30s. You can include multiple values in a single entry. For example, 1m30s is equivalent to 90 seconds.

Signed-off-by: Lee Surprenant <lmsurpre@us.ibm.com>
lmsurpre added a commit that referenced this issue Mar 12, 2021
We used to have this attribute in our fhirProxyDatasource configuration, but we didn't carry it forward.
After adding it to my deploy, I followed the steps to reproduce the issue described in #2077 and was no longer able to produce it.

From the Liberty docs, here is what this setting does:
> Specifies a timeout for validation of pooled connections. When specified, pooled connections are validated before being reused from the connection pool. The validation timeout is also used when the connection manager validates connections in response to a stale connection for PurgePolicy=ValidateAllConnections. A value of 0 means that connections are validated without applying any timeout. Validation timeout requires a JDBC driver that complies with the JDBC 4.0 specification or higher. Specify a positive integer followed by a unit of time, which can be hours (h), minutes (m), or seconds (s). For example, specify 30 seconds as 30s. You can include multiple values in a single entry. For example, 1m30s is equivalent to 90 seconds.

Signed-off-by: Lee Surprenant <lmsurpre@us.ibm.com>
lmsurpre added a commit that referenced this issue Mar 15, 2021
issue #2077 - add validationTimeout to datasource dropins
@prb112 prb112 closed this as completed Mar 17, 2021
@prb112
Copy link
Contributor

prb112 commented Mar 17, 2021

I created a number of exports and imports. I waited over night. I did not hit an issue when executing again. I have tested with Db2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working bulk-data
Projects
None yet
Development

No branches or pull requests

2 participants