[BEAM-7230] Fixes NPE When Using JdbcIO.PoolableDataSourceProvider#9904
[BEAM-7230] Fixes NPE When Using JdbcIO.PoolableDataSourceProvider#9904mehdimas wants to merge 1 commit intoapache:masterfrom
Conversation
|
R: @lukecwik or @kennknowles |
|
Thanks! Can you add a test that reproduces the NPE? |
|
It would actually be great to either tag this with BEAM-7230 if that is appropriate, or file a new issue just about the NPE. That will make the bug show up in automatically generated release notes, and gather the info for anyone looking into the problem. |
| static synchronized DataSource buildDataSource(Void input) { | ||
| if (source == null) { | ||
| DataSource basicSource = dataSourceProviderFn.apply(input); | ||
| DataSource basicSource = dataSourceProviderFn.apply(null); |
There was a problem hiding this comment.
TBH this is a bit mysterious if it fixes anything, since null is the only constructible value of type Void.
There was a problem hiding this comment.
Agreed. I'm not convinced this is a solution. Going to do some testing.
There was a problem hiding this comment.
The fact that the existing code seems to work when using the direct runner made me think it was something strange.
There was a problem hiding this comment.
Looks like this change has no impact. I think dataSourceProviderFn is null when buildDataSource(Void input) is called via apply. There is an explicit test for that: https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/src/test/java/org/apache/beam/sdk/io/jdbc/JdbcIOTest.java#L184-L190
There was a problem hiding this comment.
The issue is that PoolableDataSourceProvider stores dataSourceProviderFn as a static which means that it will never be serialized and is lost when ever executed within a runner remotely.
There was a problem hiding this comment.
The connection factory here assumes that only one JdbcIO connector using a PoolableDataSourceProvider and only one using a DataSourceProviderFromDataSourceConfiguration will ever be used within a pipeline since the connection pool will return the "first" DataSource that is initialized since those classes all share the same instance static variable
|
Just as a minor comment on this, the reason why the impl went into the static approach (that now I see has clear issues) was because of the opposite case the original reported issue happened because JdbcIO was building a pool of connections per DoFn instantiation which was reported to be overwhelming the target db on max number of connections on streaming pipelines. |
|
This is superseded by #9927 |
This PR attempts to address a null pointer exception caused by the PoolableDataSourceProvider.
When using a simple PoolableDataSourceProvider in the Dataflow Runner I get a null pointer exception at runtime.
Other users seem to have a similar issue: https://issues.apache.org/jira/browse/BEAM-7230?focusedCommentId=16845769&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16845769.
The stack trace is below.
Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.