revert [SPARK-22785][SQL] remove ColumnVector.anyNullsSet
## What changes were proposed in this pull request?

In #19980, we thought `anyNullsSet` could simply be implemented as `numNulls() > 0`. This is logically true, but may cause performance problems. `OrcColumnVector` is an example: it has no `numNulls` property, only a `noNulls` property, so we would lose a lot of performance if we used `numNulls() > 0` to check for nulls.

This PR simply reverts #19980, renaming the method to `hasNull`. Better name suggestions are welcome, e.g. `nullable`?

## How was this patch tested?

Existing tests.

Author: Wenchen Fan <email@example.com>

Closes #20452 from cloud-fan/null.

(cherry picked from commit 48dd6a4)
Signed-off-by: Wenchen Fan <firstname.lastname@example.org>
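The performance concern in the description can be illustrated with a small sketch. The class below is a hypothetical, simplified model and not Spark's actual `OrcColumnVector`: the name `FlagBackedVector`, the `rowsScanned` counter, and the constructor are assumptions made for illustration. It assumes a vector that stores only per-row null markers plus a summary `noNulls` flag, so a null count has to be derived by scanning while a presence check is a single field read.

```java
// Hypothetical, simplified model of a vector that keeps a `noNulls` flag
// but no running null count. Not Spark's real OrcColumnVector.
final class FlagBackedVector {
    private final boolean[] isNull; // per-row null markers
    private final boolean noNulls;  // summary flag; a real producer would set it while writing
    private long rowsScanned = 0;   // instrumentation for this sketch only

    FlagBackedVector(boolean[] isNull) {
        this.isNull = isNull.clone();
        boolean any = false;
        for (boolean b : this.isNull) {
            any |= b;
        }
        this.noNulls = !any;
    }

    // Constant time: only reads the summary flag.
    boolean hasNull() {
        return !noNulls;
    }

    // Linear time whenever nulls may be present: walks every row marker.
    int numNulls() {
        if (noNulls) {
            return 0;
        }
        int count = 0;
        for (boolean b : isNull) {
            rowsScanned++;
            if (b) {
                count++;
            }
        }
        return count;
    }

    // Number of row markers visited by numNulls() so far.
    long rowsScanned() {
        return rowsScanned;
    }
}
```

In this model, calling `hasNull()` never touches the row markers, whereas `numNulls() > 0` visits every row of any batch that contains a null. Under these assumptions, that is the cost a dedicated `hasNull` method avoids.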
Showing with 44 additions and 3 deletions.
- +5 −0 sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnVector.java
- +1 −1 sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java
- +1 −1 sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java
- +6 −1 sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java
- +5 −0 sql/core/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
- +5 −0 sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java
- +12 −0 sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ArrowColumnVectorSuite.scala
- +9 −0 sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala