[SPARK-47628][SQL] Fix Postgres bit array issue 'Cannot cast to boolean' #45751

Closed
yaooqinn wants to merge 2 commits into apache:master from yaooqinn:SPARK-47628

@yaooqinn
Member

What changes were proposed in this pull request?

This PR fixes the error below, which occurs when reading bit arrays from Postgres:

[info]   Cause: org.postgresql.util.PSQLException: Cannot cast to boolean: "10101"
[info]   at org.postgresql.jdbc.BooleanTypeUtil.cannotCoerceException(BooleanTypeUtil.java:99)
[info]   at org.postgresql.jdbc.BooleanTypeUtil.fromString(BooleanTypeUtil.java:67)
[info]   at org.postgresql.jdbc.ArrayDecoding$7.parseValue(ArrayDecoding.java:267)
[info]   at org.postgresql.jdbc.ArrayDecoding$AbstractObjectStringArrayDecoder.populateFromString(ArrayDecoding.java:128)
[info]   at org.postgresql.jdbc.ArrayDecoding.readStringArray(ArrayDecoding.java:763)
[info]   at org.postgresql.jdbc.PgArray.buildArray(PgArray.java:320)
[info]   at org.postgresql.jdbc.PgArray.getArrayImpl(PgArray.java:179)
[info]   at org.postgresql.jdbc.PgArray.getArray(PgArray.java:116)
[info]   at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$25(JdbcUtils.scala:548)
[info]   at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.nullSafeConvert(JdbcUtils.scala:561)
[info]   at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$24(JdbcUtils.scala:548)
[info]   at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$24$adapted(JdbcUtils.scala:545)
[info]   at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:365)
[info]   at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:346)
[info]   at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
[info]   at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) 

The issue is caused by both an upstream limitation and an improper mapping on our side.

On the Postgres side, the JDBC driver does not distinguish bit(1)[] from bit(n>1)[] and decodes both as boolean arrays, so reading a bit(n>1)[] value fails with a cast error during task execution.
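
The trace shows the driver decoding each element through BooleanTypeUtil.fromString. The Scala sketch below is a simplified, hypothetical model of that step, not pgjdbc source: the object name and the accepted-string sets are assumptions, and an IllegalArgumentException stands in for PSQLException. It illustrates why bit(1)[] elements such as "1" decode while bit(n>1)[] elements such as "10101" cannot.

```scala
// Hypothetical model of the driver's boolean element coercion (not pgjdbc code).
object BoolCoercionModel {
  // Accepted spellings are an assumption; the real driver's set may differ.
  private val trueValues  = Set("1", "t", "true", "y", "yes", "on")
  private val falseValues = Set("0", "f", "false", "n", "no", "off")

  // Coerce one element; anything else fails like the error in the trace.
  def fromString(s: String): Boolean = {
    val v = s.trim.toLowerCase
    if (trueValues.contains(v)) true
    else if (falseValues.contains(v)) false
    else throw new IllegalArgumentException("Cannot cast to boolean: \"" + s + "\"")
  }

  // Decode the text form of a one-dimensional array, e.g. "{1,0,1}".
  def decodeBooleanArray(literal: String): Array[Boolean] = {
    val body = literal.stripPrefix("{").stripSuffix("}")
    if (body.isEmpty) Array.empty[Boolean]
    else body.split(",").map(fromString)
  }
}
```

With this model, a bit(1)[] value like "{1,0,1}" decodes to three booleans, while a bit(5)[] value like "{10101}" throws on its first element.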

The issue on our side is similar: Spark maps both bit(1)[] and bit(n>1)[] to ArrayType(BinaryType), which is exactly the opposite of the driver's behaviour.

This PR fixes the mapping and adds a dedicated getter for bit(n>1)[] values, which resolves both problems.
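
A minimal sketch of the two parts of the fix, under stated assumptions: the array mapping reuses the atomic rule (bit(1) to boolean, bit(n>1) to binary, as in the review diff), and the getter parses the array's text form itself instead of relying on the driver's boolean decoding. CatalystLike, bitArray, bitsToBytes and parseBitArray are hypothetical names; the MSB-first, zero-padded byte packing is an assumed encoding; none of this is claimed to be the PR's actual code.

```scala
// Self-contained sketch; CatalystLike stands in for Spark's DataType hierarchy.
object BitArraySketch {
  sealed trait CatalystLike
  case object BooleanT extends CatalystLike
  case object BinaryT extends CatalystLike
  final case class ArrayT(elem: CatalystLike) extends CatalystLike

  // Atomic rule: bit(1) -> boolean, bit(n>1) -> binary.
  def atomicBit(precision: Int): CatalystLike =
    if (precision == 1) BooleanT else BinaryT

  // Arrays reuse the atomic rule, so bit(1)[] stays consistent with bit(1).
  def bitArray(precision: Int): CatalystLike = ArrayT(atomicBit(precision))

  // Assumed encoding: left-pad with zeros to whole bytes, then pack MSB-first.
  def bitsToBytes(bits: String): Array[Byte] = {
    require(bits.forall(c => c == '0' || c == '1'), s"not a bit string: $bits")
    val padded = "0" * ((8 - bits.length % 8) % 8) + bits
    padded.grouped(8).map(g => Integer.parseInt(g, 2).toByte).toArray
  }

  // Hypothetical getter for bit(n>1)[]: parse text such as "{10101,011}"
  // ourselves; NULL elements become null entries.
  def parseBitArray(literal: String): Array[Array[Byte]] = {
    val body = literal.stripPrefix("{").stripSuffix("}")
    if (body.isEmpty) Array.empty[Array[Byte]]
    else body.split(",").map(e => if (e == "NULL") null else bitsToBytes(e))
  }
}
```

For example, parseBitArray("{10101,011}") yields two single-byte arrays holding 21 and 3 under this packing.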

Why are the changes needed?

bugfix

Does this PR introduce any user-facing change?

no

How was this patch tested?

new tests

Was this patch authored or co-authored using generative AI tooling?

no

@github-actions bot added the SQL label Mar 28, 2024
@yaooqinn
Member Author

yaooqinn commented Mar 28, 2024

cc @dongjoon-hyun @cloud-fan, thanks

Member

@dongjoon-hyun left a comment


+1, LGTM. Thank you, @yaooqinn.
Merged to master for Apache Spark 4.0.0.

md: MetadataBuilder): Option[DataType] = typeName match {
case "bool" => Some(BooleanType)
case "bit" => Some(BinaryType)
case "bit" if precision == 1 => Some(BooleanType)
Contributor


Is it possible to keep the previous type mapping and only fix the read code? It seems a bit weird to me to have different type mappings based on the bit length.

Member Author


Member


Do you have any other concerns, @cloud-fan?

Contributor


This aligns atomic bit(1), see ...

I see, internal consistency is important.

Member Author


Thank you for the double-check and post-review, @cloud-fan.

@yaooqinn deleted the SPARK-47628 branch March 27, 2026 11:44

3 participants