
[Java] Incorrect results from JDBC Adapter from Postgres of non-nullable column through left join #20312

Closed
asfimport opened this issue Jul 7, 2022 · 9 comments

Unsure whether to consider this a bug or a wish, but the JDBC-to-Arrow adapter produces incorrect results when wrapping the Postgres driver in certain cases.

If you LEFT JOIN against a non-nullable column, the column effectively becomes nullable (rows where the join condition matches nothing are filled with NULL). However, the Postgres ResultSetMetaData lies to you and still reports the column as non-nullable.

When iterating through the data, results come back as null (wasNull() will return true).

However, because of the way the JDBCConsumer is created, a non-nullable consumer is constructed, and it will not check the nullability of these results.

Unfortunately, this results in incorrect data or errors depending on the data types returned. 

The Postgres JDBC team has closed a ticket about this, indicating that it would be impossible for them to report the correct nullability through the JDBC metadata. See: pgjdbc/pgjdbc#2079

An example: 

Table: 

| t1.id |
|-------|
| 2     |
| 3     |

```sql
CREATE TABLE t1 (id integer NOT NULL);
INSERT INTO t1 VALUES (2), (3);
```

Query:

```sql
WITH t2 AS (SELECT 1 AS id UNION SELECT 2)
SELECT
  t1.id
FROM t2
LEFT JOIN t1 ON t1.id = t2.id;
```

This returns the result set:
| id   |
|------|
| 2    |
| null |



The ResultSetMetaData indicates that the column is non-nullable (as t1.id is non-nullable) but there is null data in the result. 



The Arrow Vector that is present after the result set is consumed, looks like this: 
| id |
|----|
| 2  |
| 0  |



ResultSet.getInt(1) will return 0 when the source value is NULL, with the expectation that callers check wasNull().
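This contract can be demonstrated without a real database. The sketch below (class name `WasNullDemo` is mine, not from the issue) uses a `Proxy`-based stand-in for `ResultSet` whose single "row" holds SQL NULL: `getInt` maps the NULL to 0, and only `wasNull()` distinguishes it from a genuine zero.

```java
import java.lang.reflect.Proxy;
import java.sql.ResultSet;

final class WasNullDemo {
  /** Builds a fake ResultSet whose only "row" holds SQL NULL in column 1. */
  static ResultSet nullRow() {
    return (ResultSet) Proxy.newProxyInstance(
        ResultSet.class.getClassLoader(),
        new Class<?>[] {ResultSet.class},
        (proxy, method, args) -> {
          switch (method.getName()) {
            case "getInt":  return 0;    // JDBC maps SQL NULL to 0 for getInt
            case "wasNull": return true; // the only way to detect the NULL
            default: throw new UnsupportedOperationException(method.getName());
          }
        });
  }

  public static void main(String[] args) throws Exception {
    ResultSet rs = nullRow();
    int value = rs.getInt(1);      // 0 -- indistinguishable from a real zero...
    boolean isNull = rs.wasNull(); // ...unless the consumer checks wasNull()
    System.out.println(value + " " + isNull); // prints "0 true"
  }
}
```

A consumer that skips the `wasNull()` check, as the non-nullable JDBCConsumer does here, reads that 0 as real data.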



The data is incorrect and the failure is silent, potentially leading clients and consumers to receive bad data.

In other cases, such as UUID (mapped to UTF-8 vectors), the value fails to load into Arrow entirely: the consumer does not expect null data and throws an NPE when converting to byte arrays.

I was able to work around this problem by wrapping the Postgres JDBC ResultSetMetaData and always forcing the nullability to nullable (or nullability unknown).
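A minimal sketch of that workaround (the class name `ForceNullableMetaData` is hypothetical, not from the issue or Arrow): a dynamic proxy delegates every call to the driver's `ResultSetMetaData`, but answers `isNullable` with `columnNullable` unconditionally.

```java
import java.lang.reflect.Proxy;
import java.sql.ResultSetMetaData;

final class ForceNullableMetaData {
  /**
   * Returns a view of {@code delegate} that reports every column as
   * nullable, regardless of what the underlying driver claims.
   */
  static ResultSetMetaData wrap(ResultSetMetaData delegate) {
    return (ResultSetMetaData) Proxy.newProxyInstance(
        ResultSetMetaData.class.getClassLoader(),
        new Class<?>[] {ResultSetMetaData.class},
        (proxy, method, args) -> "isNullable".equals(method.getName())
            ? ResultSetMetaData.columnNullable  // override the driver's (wrong) answer
            : method.invoke(delegate, args));   // everything else passes through
  }
}
```

Forcing `columnNullableUnknown` instead would be equally safe; the cost either way is that the adapter allocates validity buffers even for columns that really are non-nullable.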



Unfortunately, I don't think there is a great way to solve this, but some way to configure or override the JDBCConsumer creation would allow users of this library to override the behavior. Even so, the silent failure and incorrect data might lead to users not noticing.

Reporter: Jonathan Swenson
Assignee: David Li / @lidavidm

PRs and other links:

Note: This issue was originally created as ARROW-17005. Please see the migration documentation for further details.


David Li / @lidavidm:
There's already JdbcFieldInfo, maybe we should add nullability there.

I was also thinking we should extend it to let us choose the output Arrow type independently of the actual type, or provide a custom conversion for a custom type, but that will be a separate Jira (if that is a useful thing to have).

Also cc @lwhite1  


Jonathan Swenson:
It's possible to override the Arrow output type currently (using setJdbcToArrowTypeConverter), but it seems like what you are describing is a generalization that would allow writing a custom JDBCConsumer (or am I misinterpreting "custom conversion for a custom type")?

I think that would also generalize some of the work that @toddfarmer has done to help with the BigDecimal issues I've run into: 

https://issues.apache.org/jira/browse/ARROW-16600

https://issues.apache.org/jira/browse/ARROW-16427

The postgres driver has really been a bit of a pain for me 🤦🏼‍♂️


David Li / @lidavidm:
Ah, yes, basically I would like to be able to make this fully general. Possibly even be able to override per-column instead of per-type (not sure if that granularity is useful)


David Li / @lidavidm:
Also gosh, I keep pinging the wrong person, thanks for linking the related Jiras


Larry White / @lwhite1:
I think forcing the result set to be nullable (or nullability unknown) is a reasonable workaround in this situation, or at least less-bad than returning the incorrect nullability status.


David Li / @lidavidm:
I'm rooting around here now so I'll pick this up


David Li / @lidavidm:
[~jswenson] the linked PR lets you explicitly set nullability now when configuring things; hopefully this resolves your use case.

Another thought: should we set up integration tests with the Postgres driver, and possibly other drivers? It'll require some CI work…


Jonathan Swenson:
Excellent, I'll play with removing my workaround and moving to this in the near future. 

On another note, the Redshift JDBC driver also exhibits this same problem (likely because it is very close to the Postgres JDBC driver implementation).


David Li / @lidavidm:
Issue resolved by pull request 13558
#13558
