-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-27416][SQL]UnsafeMapData & UnsafeArrayData Kryo serialization … #24357
Conversation
…breaks when two machines have different Oops size
I have put Unsafe data class in KryoSerializer.newKryo. Please recheck when available. Thanks |
CC @cloud-fan |
*/ | ||
final class UnsafeDataUtils { | ||
|
||
private UnsafeDataUtils() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: use 2-indent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing it out.
import org.apache.spark.sql.catalyst.util.MapData; | ||
import org.apache.spark.unsafe.Platform; | ||
|
||
import java.io.Externalizable; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks incorrect import order. org.apache
should be last.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/UnsafeMapSuite.scala
Show resolved
Hide resolved
val valueArray = new UnsafeArrayData() | ||
Platform.putLong(baseObject, offset + 8 + keyArray.getSizeInBytes, 1) | ||
valueArray.pointTo(baseObject, offset + 8 + keyArray.getSizeInBytes, keyArraySize) | ||
valueArray.setLong(0, 19285) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be good to use different values among key and value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I don't think that makes any different, can you please explain a little more?
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We hope so. For testing typical use cases, I think that it is good to allocate a separate array for map and value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I am missing sth, the key/value array are currently different arrays. The value set is the same (0 -> 19285).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using a different value for key and value, not just 19285 both times? that would make a slightly better test. Otherwise you'd possibly miss a weird bug where key and value arrays are swapped
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok to test |
} | ||
byte[] bytes = new byte[sizeInBytes]; | ||
Platform.copyMemory(baseObject, baseOffset, bytes, Platform.BYTE_ARRAY_OFFSET, | ||
sizeInBytes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super nit: probably 2-indent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Thanks
Test build #104590 has finished for PR 24357 at commit
|
retest this please. |
Test build #104621 has finished for PR 24357 at commit
|
thanks, merging to master! |
Finish the rest work of apache#24317, apache#9030 a. Implement Kryo serialization for UnsafeArrayData b. fix UnsafeMapData Java/Kryo Serialization issue when two machines have different Oops size c. Move the duplicate code "getBytes()" to Utils. According Units has been added & tested Closes apache#24357 from pengbo/SPARK-27416_new. Authored-by: pengbo <bo.peng1019@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…erialization … This is a Spark 2.4.x backport of #24357 by pengbo. Original description follows below: --- ## What changes were proposed in this pull request? Finish the rest work of #24317, #9030 a. Implement Kryo serialization for UnsafeArrayData b. fix UnsafeMapData Java/Kryo Serialization issue when two machines have different Oops size c. Move the duplicate code "getBytes()" to Utils. ## How was this patch tested? According Units has been added & tested Closes #25223 from JoshRosen/SPARK-27416-2.4. Authored-by: pengbo <bo.peng1019@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
…erialization … This is a Spark 2.4.x backport of apache#24357 by pengbo. Original description follows below: --- ## What changes were proposed in this pull request? Finish the rest work of apache#24317, apache#9030 a. Implement Kryo serialization for UnsafeArrayData b. fix UnsafeMapData Java/Kryo Serialization issue when two machines have different Oops size c. Move the duplicate code "getBytes()" to Utils. ## How was this patch tested? According Units has been added & tested Closes apache#25223 from JoshRosen/SPARK-27416-2.4. Authored-by: pengbo <bo.peng1019@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
…erialization … This is a Spark 2.4.x backport of apache#24357 by pengbo. Original description follows below: --- ## What changes were proposed in this pull request? Finish the rest work of apache#24317, apache#9030 a. Implement Kryo serialization for UnsafeArrayData b. fix UnsafeMapData Java/Kryo Serialization issue when two machines have different Oops size c. Move the duplicate code "getBytes()" to Utils. ## How was this patch tested? According Units has been added & tested Closes apache#25223 from JoshRosen/SPARK-27416-2.4. Authored-by: pengbo <bo.peng1019@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
…erialization This is a Spark 2.3.x backport of a 2.4.x backport of apache#24357 by pengbo. Original description follows below: --- Finish the rest work of apache#24317, apache#9030 a. Implement Kryo serialization for UnsafeArrayData b. fix UnsafeMapData Java/Kryo Serialization issue when two machines have different Oops size c. Move the duplicate code "getBytes()" to Utils. According Units has been added & tested Authored-by: pengbo <bo.peng1019@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Jason Altekruse <json@apache.org>
What changes were proposed in this pull request?
Finish the rest work of #24317, #9030
a. Implement Kryo serialization for UnsafeArrayData
b. fix UnsafeMapData Java/Kryo Serialization issue when two machines have different Oops size
c. Move the duplicate code "getBytes()" to Utils.
How was this patch tested?
According Units has been added & tested