Skip to content

[BEAM-7230] Fixes NPE When Using JdbcIO.PoolableDataSourceProvider#9904

Closed
mehdimas wants to merge 1 commit intoapache:masterfrom
mehdimas:fix-jdbc-pool-npe
Closed

[BEAM-7230] Fixes NPE When Using JdbcIO.PoolableDataSourceProvider#9904
mehdimas wants to merge 1 commit intoapache:masterfrom
mehdimas:fix-jdbc-pool-npe

Conversation

@mehdimas
Copy link

This PR attempts to address a null pointer exception caused by the PoolableDataSourceProvider.

When using a simple PoolableDataSourceProvider in the Dataflow Runner I get a null pointer exception at runtime.

JdbcIO.<~>write()
  .withDataSourceProviderFn(
    JdbcIO.PoolableDataSourceProvider.of(
      JdbcIO.DataSourceConfiguration
        .create("org.postgresql.Driver", jdbcUrl)
          .withUsername(jdbcUsername)
          .withPassword(jdbcPassword)
    )
  )

Other users seem to have a similar issue: https://issues.apache.org/jira/browse/BEAM-7230?focusedCommentId=16845769&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16845769.

The stack trace is below.


java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: java.lang.NullPointerException
        org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$1.typedApply(IntrinsicMapTaskExecutorFactory.java:194)
        org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$1.typedApply(IntrinsicMapTaskExecutorFactory.java:165)
        org.apache.beam.runners.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:63)
        org.apache.beam.runners.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:50)
        org.apache.beam.runners.dataflow.worker.graph.Networks.replaceDirectedNetworkNodes(Networks.java:87)
        org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory.create(IntrinsicMapTaskExecutorFactory.java:125)
        org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1232)
        org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:149)
        org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1049)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.beam.sdk.util.UserCodeException: java.lang.NullPointerException
        org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:34)
        org.apache.beam.sdk.io.jdbc.JdbcIO$WriteVoid$WriteFn$DoFnInvoker.invokeSetup(Unknown Source)
        org.apache.beam.runners.dataflow.worker.DoFnInstanceManagers$ConcurrentQueueInstanceManager.deserializeCopy(DoFnInstanceManagers.java:80)
        org.apache.beam.runners.dataflow.worker.DoFnInstanceManagers$ConcurrentQueueInstanceManager.peek(DoFnInstanceManagers.java:62)
        org.apache.beam.runners.dataflow.worker.UserParDoFnFactory.create(UserParDoFnFactory.java:95)
        org.apache.beam.runners.dataflow.worker.DefaultParDoFnFactory.create(DefaultParDoFnFactory.java:75)
        org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory.createParDoOperation(IntrinsicMapTaskExecutorFactory.java:264)
        org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory.access$000(IntrinsicMapTaskExecutorFactory.java:86)
        org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$1.typedApply(IntrinsicMapTaskExecutorFactory.java:183)
        org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$1.typedApply(IntrinsicMapTaskExecutorFactory.java:165)
        org.apache.beam.runners.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:63)
        org.apache.beam.runners.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:50)
        org.apache.beam.runners.dataflow.worker.graph.Networks.replaceDirectedNetworkNodes(Networks.java:87)
        org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory.create(IntrinsicMapTaskExecutorFactory.java:125)
        org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1232)
        org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:149)
        org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1049)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
        org.apache.beam.sdk.io.jdbc.JdbcIO$PoolableDataSourceProvider.buildDataSource(JdbcIO.java:1363)
        org.apache.beam.sdk.io.jdbc.JdbcIO$PoolableDataSourceProvider.apply(JdbcIO.java:1358)
        org.apache.beam.sdk.io.jdbc.JdbcIO$PoolableDataSourceProvider.apply(JdbcIO.java:1338)
        org.apache.beam.sdk.io.jdbc.JdbcIO$WriteVoid$WriteFn.setup(JdbcIO.java:1221)

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- Build Status --- --- Build Status
Java Build Status Build Status Build Status Build Status
Build Status
Build Status
Build Status Build Status Build Status
Build Status
Python Build Status
Build Status
Build Status
Build Status
--- Build Status
Build Status
Build Status
Build Status
--- --- Build Status
XLang --- --- --- Build Status --- --- ---

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website
Non-portable Build Status Build Status
Build Status
Build Status Build Status
Portable --- Build Status --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@mehdimas
Copy link
Author

R: @lukecwik or @kennknowles

@kennknowles kennknowles requested review from aromanenko-dev, iemejia, kennknowles and lukecwik and removed request for aromanenko-dev and iemejia October 28, 2019 16:52
@kennknowles
Copy link
Member

Thanks! Can you add a test that reproduces the NPE?

@kennknowles
Copy link
Member

It would actually be great to either tag this with BEAM-7230 if that is appropriate, or file a new issue just about the NPE. That will make the bug show up in automatically generated release notes, and gather the info for anyone looking into the problem.

static synchronized DataSource buildDataSource(Void input) {
if (source == null) {
DataSource basicSource = dataSourceProviderFn.apply(input);
DataSource basicSource = dataSourceProviderFn.apply(null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH this is a bit mysterious if it fixes anything, since null is the only constructible value of type Void.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I'm not convinced this is a solution. Going to do some testing.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that the existing code seems to work when using the direct runner made me think it was something strange.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this change has no impact. I think dataSourceProviderFn is null when buildDataSource(Void input) is called via apply. There is an explicit test for that: https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/src/test/java/org/apache/beam/sdk/io/jdbc/JdbcIOTest.java#L184-L190

Copy link
Member

@lukecwik lukecwik Oct 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that PoolableDataSourceProvider stores dataSourceProviderFn as a static which means that it will never be serialized and is lost when ever executed within a runner remotely.

@mehdimas mehdimas changed the title Fixes NPE When Using JdbcIO.PoolableDataSourceProvider [BEAM-8501] Fixes NPE When Using JdbcIO.PoolableDataSourceProvider Oct 28, 2019
@lukecwik lukecwik changed the title [BEAM-8501] Fixes NPE When Using JdbcIO.PoolableDataSourceProvider [BEAM-7230] Fixes NPE When Using JdbcIO.PoolableDataSourceProvider Oct 29, 2019
Copy link
Member

@lukecwik lukecwik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The connection factory here assumes that only one JdbcIO connector using a PoolableDataSourceProvider and only one using a DataSourceProviderFromDataSourceConfiguration will ever be used within a pipeline since the connection pool will return the "first" DataSource that is initialized since those classes all share the same instance static variable

@iemejia
Copy link
Member

iemejia commented Oct 29, 2019

Just as a minor comment on this, the reason why the impl went into the static approach (that now I see has clear issues) was because of the opposite case the original reported issue happened because JdbcIO was building a pool of connections per DoFn instantiation which was reported to be overwhelming the target db on max number of connections on streaming pipelines.

@lukecwik
Copy link
Member

This is superseded by #9927

@lukecwik lukecwik closed this Oct 29, 2019
@mehdimas mehdimas deleted the fix-jdbc-pool-npe branch October 30, 2019 13:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants