
[SPARK-12591][Streaming]Register OpenHashMapBasedStateMap for Kryo #10609

Closed

wants to merge 7 commits into from

Conversation

zsxwing
Member

@zsxwing zsxwing commented Jan 5, 2016

The default serializer in Kryo is FieldSerializer, which ignores transient fields and never calls writeObject or readObject. So we should register OpenHashMapBasedStateMap using @DefaultSerializer to make it work with Kryo.
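The failure mode can be illustrated with plain JDK serialization (a minimal sketch: `StateHolder` is a hypothetical stand-in for the OpenHashMapBasedStateMap pattern, not Spark code):

```scala
import java.io._

// Hypothetical stand-in for the OpenHashMapBasedStateMap pattern: the derived
// state lives in a transient field and is rebuilt by a custom readObject hook.
// Kryo's default FieldSerializer copies fields directly, skipping transient
// fields and never invoking writeObject/readObject -- so after a Kryo round
// trip `rebuilt` would be null. Java serialization runs the hooks, as shown.
class StateHolder(val keys: Array[String]) extends Serializable {
  @transient var rebuilt: Set[String] = keys.toSet

  private def readObject(in: ObjectInputStream): Unit = {
    in.defaultReadObject()
    rebuilt = keys.toSet // FieldSerializer would never execute this line
  }
}

object TransientDemo {
  def roundTrip[T](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    new ObjectOutputStream(bytes).writeObject(obj)
    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
      .readObject().asInstanceOf[T]
  }

  def main(args: Array[String]): Unit = {
    val restored = roundTrip(new StateHolder(Array("a", "b")))
    println(restored.rebuilt) // Set(a, b): readObject repopulated the transient field
  }
}
```

Registering a serializer that honors these hooks (as this PR does via @DefaultSerializer) avoids silently dropping the transient state.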

@SparkQA

SparkQA commented Jan 6, 2016

Test build #48798 has finished for PR 10609 at commit 7466a50.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 6, 2016

Test build #48808 has finished for PR 10609 at commit 0228eef.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 6, 2016

Test build #48843 has finished for PR 10609 at commit d587f0a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Member Author

zsxwing commented Jan 6, 2016

CC @tdas

// Add classes for Streaming
try {
  kryo.register(
    Utils.classForName("org.apache.spark.streaming.util.OpenHashMapBasedStateMap"),
Member Author

There is a tricky thing here.

Kryo assigns a unique id to each registered class and writes only that id, so classes must be registered in the same order on both sides. Otherwise the ids won't match and deserialization will fail.

However, for tests that start a local cluster, the executors register OpenHashMapBasedStateMap but the driver doesn't. So I registered OpenHashMapBasedStateMap as the last class, to make sure the other classes' ids stay the same.
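The ordering constraint can be sketched with a toy model (an assumption-laden simplification of Kryo's registry, not its actual implementation; the class names below are placeholders):

```scala
// Toy model of registration ids: each registered class gets the next integer
// id, and only the id goes on the wire. If the reader registered the classes
// in a different order, the wire id resolves to the wrong class.
object RegistrationOrder {
  def register(classes: Seq[String]): Map[Int, String] =
    classes.zipWithIndex.map { case (c, i) => i -> c }.toMap

  def main(args: Array[String]): Unit = {
    val writer = Seq("ByteBuffer", "BlockId", "OpenHashMapBasedStateMap")
    val wireId = writer.indexOf("BlockId") // writer sends id 1

    // Appending the streaming class last keeps earlier ids stable, so a
    // reader that never registered it still decodes id 1 correctly.
    val readerAppended = register(Seq("ByteBuffer", "BlockId"))
    println(readerAppended(wireId)) // BlockId

    // Registering it first would shift every id and break deserialization.
    val readerShifted =
      register(Seq("OpenHashMapBasedStateMap", "ByteBuffer", "BlockId"))
    println(readerShifted(wireId)) // ByteBuffer -- wrong class
  }
}
```

This is why appending the new class at the end of the registration list is the safe choice when one side may not register it at all.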

Member Author

Maybe we should add an internal API to let other projects register their classes with KryoSerializer.

@SparkQA

SparkQA commented Jan 7, 2016

Test build #48884 has finished for PR 10609 at commit bf5632e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Member Author

zsxwing commented Jan 7, 2016

@tdas updated as we discussed offline.

Utils.serialize(map), Thread.currentThread().getContextClassLoader)
assertMap(deserMap, map, 1, msg)
deserMap
val deserMaps = Array(new JavaSerializer(conf), new KryoSerializer(conf)).map {
Contributor

Easier to read if this is made a function and called with two different serializers.

import org.apache.spark.streaming.util.OpenHashMapBasedStateMap._
import org.apache.spark.util.collection.OpenHashMap

/** Internal interface for defining the map that keeps track of sessions. */
private[streaming] abstract class StateMap[K: ClassTag, S: ClassTag] extends Serializable {
Member Author

I removed ClassTag because EmptyStateMap doesn't need it. With it removed, we don't need to add any code for EmptyStateMap, since it doesn't contain any fields.

Contributor

Good idea.

@zsxwing
Member Author

zsxwing commented Jan 7, 2016

retest this please

@SparkQA

SparkQA commented Jan 7, 2016

Test build #48958 has finished for PR 10609 at commit a65ab45.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 7, 2016

Test build #48960 has finished for PR 10609 at commit a65ab45.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}

private[serializer] class KryoOutputDataOutputBridge(output: KryoOutput) extends DataOutput {
private[spark] class KryoOutputObjectOutputBridge(
Contributor

Can you put some docs on this class to explain what this does? Same for the above class.

@tdas
Contributor

tdas commented Jan 7, 2016

LGTM, except one minor comment.

@@ -426,6 +439,7 @@ private[serializer] class KryoOutputDataOutputBridge(output: KryoOutput) extends
override def writeChar(v: Int): Unit = output.writeChar(v.toChar)
override def writeLong(v: Long): Unit = output.writeLong(v)
override def writeByte(v: Int): Unit = output.writeByte(v)
override def writeObject(obj: AnyRef): Unit = kryo.writeClassAndObject(output, obj)
Contributor

Should there be a new unit test in the KryoSerializerSuite to test this?

Member Author

done
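The bridge in the diff above can be sketched against plain JDK streams (a simplified analogue, not Spark's actual KryoOutputObjectOutputBridge; here writeObject falls back to Java serialization, whereas the real class delegates to kryo.writeClassAndObject on Kryo's Output):

```scala
import java.io._

// Sketch: an ObjectOutput built on a plain DataOutputStream. The real bridge
// wraps Kryo's Output so that code written against java.io.ObjectOutput
// (e.g. the writeObject logic in OpenHashMapBasedStateMap) can run on Kryo.
class ObjectOutputBridge(out: OutputStream) extends ObjectOutput {
  private val data = new DataOutputStream(out)
  override def write(b: Int): Unit = data.write(b)
  override def write(b: Array[Byte]): Unit = data.write(b)
  override def write(b: Array[Byte], off: Int, len: Int): Unit = data.write(b, off, len)
  override def writeBoolean(v: Boolean): Unit = data.writeBoolean(v)
  override def writeByte(v: Int): Unit = data.writeByte(v)
  override def writeShort(v: Int): Unit = data.writeShort(v)
  override def writeChar(v: Int): Unit = data.writeChar(v)
  override def writeInt(v: Int): Unit = data.writeInt(v)
  override def writeLong(v: Long): Unit = data.writeLong(v)
  override def writeFloat(v: Float): Unit = data.writeFloat(v)
  override def writeDouble(v: Double): Unit = data.writeDouble(v)
  override def writeBytes(s: String): Unit = data.writeBytes(s)
  override def writeChars(s: String): Unit = data.writeChars(s)
  override def writeUTF(s: String): Unit = data.writeUTF(s)
  // Writes a full stream header per call; fine for a sketch, not for production.
  override def writeObject(obj: AnyRef): Unit = new ObjectOutputStream(data).writeObject(obj)
  override def flush(): Unit = data.flush()
  override def close(): Unit = data.close()
}

object BridgeDemo {
  def main(args: Array[String]): Unit = {
    val buf = new ByteArrayOutputStream()
    val bridge = new ObjectOutputBridge(buf)
    bridge.writeInt(42)
    bridge.writeUTF("state")
    bridge.flush()
    println(buf.size) // 4 bytes for the int + 2-byte length + 5 UTF bytes = 11
  }
}
```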

@SparkQA

SparkQA commented Jan 7, 2016

Test build #48969 has finished for PR 10609 at commit 4e4e9a1.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Member Author

zsxwing commented Jan 7, 2016

retest this please


@SparkQA

SparkQA commented Jan 7, 2016

Test build #48963 has finished for PR 10609 at commit bf0892c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 7, 2016

Test build #48967 has finished for PR 10609 at commit 33368be.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tdas
Contributor

tdas commented Jan 7, 2016

retest this please

@zsxwing
Member Author

zsxwing commented Jan 7, 2016

By the way, I will send another PR for branch 1.6 due to conflicts in MimaExcludes.scala.

@zsxwing
Member Author

zsxwing commented Jan 7, 2016

retest this please

@SparkQA

SparkQA commented Jan 8, 2016

Test build #48979 has finished for PR 10609 at commit 4e4e9a1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tdas
Contributor

tdas commented Jan 8, 2016

LGTM. Merging this to master. Please send another PR for 1.6 ASAP. Thanks for catching and fixing this bug.

@asfgit asfgit closed this in 28e0e50 Jan 8, 2016
@zsxwing zsxwing deleted the SPARK-12591 branch January 8, 2016 03:09
asfgit pushed a commit that referenced this pull request Jan 8, 2016
…branch 1.6)

backport #10609 to branch 1.6

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #10656 from zsxwing/SPARK-12591-branch-1.6.