[SPARK-21133][CORE] Fix HighlyCompressedMapStatus#writeExternal throws NPE #18343

Status: Closed · wants to merge 6 commits · changes shown from 3 commits
@@ -141,7 +141,7 @@ private[spark] class HighlyCompressedMapStatus private (
     private[this] var numNonEmptyBlocks: Int,
     private[this] var emptyBlocks: RoaringBitmap,
     private[this] var avgSize: Long,
-    @transient private var hugeBlockSizes: Map[Int, Byte])
+    private[this] var hugeBlockSizes: Map[Int, Byte])
@cloud-fan (Contributor) commented on Jun 19, 2017:
We do want to serialize hugeBlockSizes, but with customized logic; that's why we marked it @transient.

I think the correct fix is to make this class implement KryoSerializable and copy the customized serialization logic for hugeBlockSizes into the Kryo serialization hooks.
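As a rough, standalone sketch of that suggestion (not the change in this PR, and only approximating the count-then-pairs layout of the customized writeExternal logic in MapStatus.scala), a class can keep the field @transient for the Java/Externalizable path and still write it explicitly through the KryoSerializable hooks:

import com.esotericsoftware.kryo.{Kryo, KryoSerializable}
import com.esotericsoftware.kryo.io.{Input, Output}
import scala.collection.mutable

// Standalone sketch, not Spark code: hugeBlockSizes stays @transient for the
// Java serializer but is serialized explicitly by the Kryo hooks below.
class DemoStatus(@transient private var hugeBlockSizes: Map[Int, Byte])
  extends KryoSerializable with Serializable {

  def this() = this(Map.empty)  // no-arg constructor for deserialization

  def sizes: Map[Int, Byte] = hugeBlockSizes

  override def write(kryo: Kryo, output: Output): Unit = {
    // Same shape as the customized Externalizable logic: count, then (id, size) pairs.
    output.writeInt(hugeBlockSizes.size)
    hugeBlockSizes.foreach { case (block, size) =>
      output.writeInt(block)
      output.writeByte(size)
    }
  }

  override def read(kryo: Kryo, input: Input): Unit = {
    val count = input.readInt()
    val buf = mutable.Map.empty[Int, Byte]
    var i = 0
    while (i < count) {
      buf(input.readInt()) = input.readByte()
      i += 1
    }
    hugeBlockSizes = buf.toMap
  }
}

With this approach the Java serializer keeps its existing customized writeExternal/readExternal path, while Kryo never relies on field reflection for the @transient map, so the field survives a Kryo round trip.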

@viirya (Member) commented on Jun 19, 2017:

Sounds good to me. However, the customized serialization logic looks similar to Kryo's map serializer, so I'm not sure it's worth duplicating the customized logic for Kryo.

Contributor commented:

If you can figure out a way to make it serializable with Kryo and still keep the customized serialization logic for the Java serializer, I'm OK with it.

Contributor commented:
Oh, it seems it is now; LGTM then.

extends MapStatus with Externalizable {

// loc could be null when the default constructor is called during deserialization
@@ -175,6 +175,7 @@ class KryoSerializer(conf: SparkConf)
     kryo.register(None.getClass)
     kryo.register(Nil.getClass)
     kryo.register(Utils.classForName("scala.collection.immutable.$colon$colon"))
+    kryo.register(Utils.classForName("scala.collection.immutable.Map$EmptyMap$"))
Contributor commented:
why Map$EmptyMap$?

kryo.register(classOf[ArrayBuffer[Any]])

kryo.setClassLoader(classLoader)
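For context on that registration (a plain-Kryo illustration outside Spark, reflecting one reading of why it is needed): Map.empty[Int, Byte] is the scala.collection.immutable.Map$EmptyMap$ singleton, a distinct concrete class, so when registration is required (as with spark.kryo.registrationRequired=true) that class must be registered before an empty map can be written.

import com.esotericsoftware.kryo.Kryo
import com.esotericsoftware.kryo.io.{Input, Output}

// Minimal sketch: with registration required, writing an empty immutable Map
// fails unless its concrete class (Map$EmptyMap$) has been registered.
object EmptyMapRegistrationDemo {
  def main(args: Array[String]): Unit = {
    val kryo = new Kryo()
    kryo.setRegistrationRequired(true)
    kryo.register(Class.forName("scala.collection.immutable.Map$EmptyMap$"))

    val output = new Output(1024)
    // Without the register(...) call above, this write would throw because the
    // empty map's class is not registered.
    kryo.writeClassAndObject(output, Map.empty[Int, Byte])

    val input = new Input(output.toBytes)
    val restored = kryo.readClassAndObject(input).asInstanceOf[Map[Int, Byte]]
    println(restored.isEmpty)  // prints: true
  }
}

Assuming hugeBlockSizes can be an empty map, this is the class Kryo would need to know about once the field is no longer skipped as @transient.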