[SPARK-23989][SQL] exchange should copy data before non-serialized shuffle

## What changes were proposed in this pull request?

In Spark SQL, we usually reuse the `UnsafeRow` instance, so the data must be copied wherever non-serialized objects are buffered. Shuffle may buffer objects if it takes neither the bypass merge shuffle path nor the unsafe shuffle path.

`ShuffleExchangeExec.needToCopyObjectsBeforeShuffle` misses one case: if `spark.sql.shuffle.partitions` is large enough, unsafe shuffle cannot be used and the non-serialized shuffle runs instead.

This bug is very hard to hit, since users would not set such a large number of partitions (16 million) for a Spark SQL exchange.

TODO: test

## How was this patch tested?

todo.

Author: Wenchen Fan <firstname.lastname@example.org>

Closes #21101 from cloud-fan/shuffle.

(cherry picked from commit 6e19f76)
Signed-off-by: Herman van Hovell <email@example.com>
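
To illustrate the reasoning in the description, here is a minimal, self-contained sketch in plain Scala. It is not the code from this patch. `MutableRow`, `buffered`, and `needToCopy` are hypothetical names, a sort-based shuffle manager is assumed, the bypass threshold of 200 is the assumed default of `spark.shuffle.sort.bypassMergeThreshold`, and the serialized-mode limit is assumed to be 2^24 (16,777,216) partitions, matching the "16 million" mentioned above.

```scala
import scala.collection.mutable.ArrayBuffer

object CopyBeforeShuffleSketch {
  // Hypothetical stand-in for a reused UnsafeRow: one mutable instance per iterator.
  final class MutableRow(var value: Int) {
    def copy(): MutableRow = new MutableRow(value)
  }

  // Produces n "rows" by overwriting and returning the same instance each time.
  def rows(n: Int): Iterator[MutableRow] = {
    val reused = new MutableRow(0)
    Iterator.range(0, n).map { i => reused.value = i; reused }
  }

  // Buffers rows as live objects, the way a non-serialized shuffle would.
  def buffered(n: Int, copyRows: Boolean): List[Int] = {
    val buf = ArrayBuffer.empty[MutableRow]
    rows(n).foreach(r => buf += (if (copyRows) r.copy() else r))
    buf.map(_.value).toList
  }

  // Assumed values, not taken from the commit message.
  val BypassMergeThreshold: Int = 200
  val MaxSerializedModePartitions: Int = 1 << 24 // 16,777,216

  // Hypothetical decision helper, assuming a sort-based shuffle manager:
  // copying is only needed when neither the bypass merge path nor the
  // serialized (unsafe) path can be taken.
  def needToCopy(numPartitions: Int): Boolean =
    if (numPartitions <= BypassMergeThreshold) false
    else if (numPartitions <= MaxSerializedModePartitions) false
    else true

  def main(args: Array[String]): Unit = {
    println(buffered(3, copyRows = false)) // every entry aliases the same row
    println(buffered(3, copyRows = true))  // each entry keeps its own value
    println(needToCopy((1 << 24) + 1))     // beyond the serialized-mode limit
  }
}
```

Without the copy, every buffered entry points at the same reused instance, so the buffer ends up holding only the last value written. The partition-count check in `needToCopy` is the case the description says was missed.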
Showing with 10 additions and 11 deletions.