Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-27416][SQL]UnsafeMapData & UnsafeArrayData Kryo serialization … #24357

Closed
wants to merge 4 commits into from

Conversation

pengbo
Copy link
Contributor

@pengbo pengbo commented Apr 12, 2019

What changes were proposed in this pull request?

Finish the rest work of #24317, #9030
a. Implement Kryo serialization for UnsafeArrayData
b. fix UnsafeMapData Java/Kryo Serialization issue when two machines have different Oops size
c. Move the duplicate code "getBytes()" to Utils.

How was this patch tested?

According Units has been added & tested

…breaks when two machines have different Oops size
@pengbo
Copy link
Contributor Author

pengbo commented Apr 12, 2019

@srowen

I have put Unsafe data class in KryoSerializer.newKryo.
I reopen the PR with only 1 commit for further discussion.

Please recheck when available. Thanks

@srowen
Copy link
Member

srowen commented Apr 12, 2019

CC @cloud-fan

*/
final class UnsafeDataUtils {

private UnsafeDataUtils() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use 2-indent

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing it out.

import org.apache.spark.sql.catalyst.util.MapData;
import org.apache.spark.unsafe.Platform;

import java.io.Externalizable;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks incorrect import order. org.apache should be last.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

val valueArray = new UnsafeArrayData()
Platform.putLong(baseObject, offset + 8 + keyArray.getSizeInBytes, 1)
valueArray.pointTo(baseObject, offset + 8 + keyArray.getSizeInBytes, keyArraySize)
valueArray.setLong(0, 19285)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be good to use different values among key and value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I don't think that makes any different, can you please explain a little more?
Thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We hope so. For testing typical use cases, I think that it is good to allocate a separate array for map and value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I am missing sth, the key/value array are currently different arrays. The value set is the same (0 -> 19285).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using a different value for key and value, not just 19285 both times? that would make a slightly better test. Otherwise you'd possibly miss a weird bug where key and value arrays are swapped

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Thanks @kiszk @srowen

@cloud-fan
Copy link
Contributor

ok to test

}
byte[] bytes = new byte[sizeInBytes];
Platform.copyMemory(baseObject, baseOffset, bytes, Platform.BYTE_ARRAY_OFFSET,
sizeInBytes);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: probably 2-indent

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks

@SparkQA
Copy link

SparkQA commented Apr 15, 2019

Test build #104590 has finished for PR 24357 at commit ca2bf06.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@pengbo
Copy link
Contributor Author

pengbo commented Apr 16, 2019

retest this please.

@SparkQA
Copy link

SparkQA commented Apr 16, 2019

Test build #104621 has finished for PR 24357 at commit 357144c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan cloud-fan changed the title [Spark-27416][SQL]UnsafeMapData & UnsafeArrayData Kryo serialization … [SPARK-27416][SQL]UnsafeMapData & UnsafeArrayData Kryo serialization … Apr 16, 2019
@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 54b0d1e Apr 17, 2019
JoshRosen pushed a commit to JoshRosen/spark that referenced this pull request Jul 22, 2019
Finish the rest work of apache#24317, apache#9030
a. Implement Kryo serialization for UnsafeArrayData
b. fix UnsafeMapData Java/Kryo Serialization issue when two machines have different Oops size
c. Move the duplicate code "getBytes()" to Utils.

According Units has been added & tested

Closes apache#24357 from pengbo/SPARK-27416_new.

Authored-by: pengbo <bo.peng1019@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
dongjoon-hyun pushed a commit that referenced this pull request Jul 22, 2019
…erialization …

This is a Spark 2.4.x backport of #24357 by pengbo. Original description follows below:

---

## What changes were proposed in this pull request?
Finish the rest work of #24317, #9030
a. Implement Kryo serialization for UnsafeArrayData
b. fix UnsafeMapData Java/Kryo Serialization issue when two machines have different Oops size
c. Move the duplicate code "getBytes()" to Utils.

## How was this patch tested?
According Units has been added & tested

Closes #25223 from JoshRosen/SPARK-27416-2.4.

Authored-by: pengbo <bo.peng1019@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
rluta pushed a commit to rluta/spark that referenced this pull request Sep 17, 2019
…erialization …

This is a Spark 2.4.x backport of apache#24357 by pengbo. Original description follows below:

---

## What changes were proposed in this pull request?
Finish the rest work of apache#24317, apache#9030
a. Implement Kryo serialization for UnsafeArrayData
b. fix UnsafeMapData Java/Kryo Serialization issue when two machines have different Oops size
c. Move the duplicate code "getBytes()" to Utils.

## How was this patch tested?
According Units has been added & tested

Closes apache#25223 from JoshRosen/SPARK-27416-2.4.

Authored-by: pengbo <bo.peng1019@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Sep 26, 2019
…erialization …

This is a Spark 2.4.x backport of apache#24357 by pengbo. Original description follows below:

---

## What changes were proposed in this pull request?
Finish the rest work of apache#24317, apache#9030
a. Implement Kryo serialization for UnsafeArrayData
b. fix UnsafeMapData Java/Kryo Serialization issue when two machines have different Oops size
c. Move the duplicate code "getBytes()" to Utils.

## How was this patch tested?
According Units has been added & tested

Closes apache#25223 from JoshRosen/SPARK-27416-2.4.

Authored-by: pengbo <bo.peng1019@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
jaltekruse added a commit to jaltekruse/spark that referenced this pull request Dec 10, 2019
…erialization

This is a Spark 2.3.x backport of a 2.4.x backport of apache#24357 by pengbo. Original description follows below:

---

Finish the rest work of apache#24317, apache#9030
a. Implement Kryo serialization for UnsafeArrayData
b. fix UnsafeMapData Java/Kryo Serialization issue when two machines have different Oops size
c. Move the duplicate code "getBytes()" to Utils.

According Units has been added & tested

Authored-by: pengbo <bo.peng1019@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>

Signed-off-by: Jason Altekruse <json@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants