[SPARK-6368][SQL] Build a specialized serializer for Exchange operator. #5497

Closed. yhuai wants to merge 14 commits into base: master from yhuai/serializer2

SparkQA Apr 13, 2015

Test build #30194 has finished for PR 5497 at commit 39704ab.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

SparkQA Apr 13, 2015

Test build #30196 has finished for PR 5497 at commit 2379eeb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

val key = if (keySchema != null) new SpecificMutableRow(keySchema) else null
val value = if (valueSchema != null) new SpecificMutableRow(valueSchema) else null
val readKey = SparkSqlSerializer2.createDeserializationFunction(keySchema, rowIn, key)

chenghao-intel Apr 13, 2015

Contributor

Should readKey always be () => {} if the keySchema is null? The same for readValue?

yhuai Apr 13, 2015

Contributor

If the schema is null, we just have a function that does nothing. Is it what you were asking for?
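
A minimal sketch of the behaviour described in this reply, assuming a tiny stand-in type system in place of Spark SQL's DataType and a plain Array[Any] in place of SpecificMutableRow. The names NullSchemaSketch, FieldType and createReader are hypothetical and are not part of the patch; the point is only that a null schema yields a function that does nothing, so the caller can invoke it unconditionally.

```scala
import java.io.DataInputStream

object NullSchemaSketch {
  // Hypothetical stand-in for Spark SQL's DataType; only two types are modelled.
  sealed trait FieldType
  case object IntField extends FieldType
  case object StringField extends FieldType

  // Builds the reader once. A null schema produces a no-op function,
  // so no bytes are consumed and the row is never touched.
  def createReader(
      schema: Array[FieldType],
      in: DataInputStream,
      row: Array[Any]): () => Unit =
    if (schema == null) {
      () => ()
    } else {
      () => {
        var i = 0
        while (i < schema.length) {
          schema(i) match {
            case IntField    => row(i) = in.readInt()
            case StringField => row(i) = in.readUTF()
          }
          i += 1
        }
      }
    }
}
```

With this shape, readKey and readValue can both be called for every record, and whichever side has no schema simply does nothing.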

val writeKey = SparkSqlSerializer2.createSerializationFunction(keySchema, rowOut)
val writeValue = SparkSqlSerializer2.createSerializationFunction(valueSchema, rowOut)
def writeObject[T: ClassTag](t: T): SerializationStream = {

chenghao-intel Apr 13, 2015

Contributor

Can we use Product2[Row, Row] instead of T here?

yhuai Apr 13, 2015

Contributor

Seems we cannot change it since the SerializationStream defines the interface like this.
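
To illustrate the constraint, here is a hedged sketch that uses a simplified stand-in for the stream interface rather than Spark's real SerializationStream. The names WriteObjectSketch, Stream and PairStream are hypothetical. Because the abstract method is generic in T, an override has to keep that signature and can only recover the key-value pair with a cast inside the body.

```scala
import scala.reflect.ClassTag

object WriteObjectSketch {
  // Hypothetical, simplified stand-in for the stream interface discussed above.
  abstract class Stream {
    def writeObject[T: ClassTag](t: T): Stream
  }

  // The override must keep the generic signature, so the pair is
  // recovered with a cast instead of a narrower parameter type.
  class PairStream(out: StringBuilder) extends Stream {
    override def writeObject[T: ClassTag](t: T): Stream = {
      val kv = t.asInstanceOf[Product2[Any, Any]]
      out.append(kv._1).append('=').append(kv._2).append(';')
      this
    }
  }
}
```

The cast fails at runtime if a caller passes something other than a pair, which is the price of keeping the inherited signature.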

.../src/main/scala/org/apache/spark/sql/execution/SparkSqlSerializer2.scala
case array: ArrayType => return false
case map: MapType => return false
case struct: StructType => return false
case decimal: DecimalType => return false

chenghao-intel Apr 13, 2015

Contributor

If all of the primitive types were supported in this PR, it would probably be easier to fully utilize SparkSqlSerializer2 in some of the benchmarks, but it's OK to leave that for a future implementation.

yhuai Apr 13, 2015

Contributor

I will make it support DecimalType in my next update.
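
A rough sketch of the kind of schema check the snippet above belongs to, with the decimal case already treated as supported. It assumes a much smaller stand-in hierarchy than Spark SQL's DataType; SupportCheckSketch, FieldType and its cases are hypothetical names, and treating a null schema as supported is an assumption of this sketch rather than something stated in the thread.

```scala
object SupportCheckSketch {
  // Hypothetical, much smaller stand-in for Spark SQL's DataType hierarchy.
  sealed trait FieldType
  case object IntField extends FieldType
  case object StringField extends FieldType
  case object DecimalField extends FieldType
  final case class ArrayField(element: FieldType) extends FieldType
  final case class StructField(fields: Seq[FieldType]) extends FieldType

  // Returns true only if every column can take the specialized path.
  // Assumption: a null schema (nothing to write) counts as supported.
  def support(schema: Array[FieldType]): Boolean =
    schema == null || schema.forall {
      case _: ArrayField | _: StructField => false
      case _                              => true
    }
}
```

A caller would fall back to the general-purpose serializer whenever this check returns false for either the key or the value schema.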

case NullType => // Write nothing.
case BooleanType =>

chenghao-intel Apr 13, 2015

Contributor

Should we use the nullable property here?

chenghao-intel Apr 13, 2015

Contributor

Is it possible to return a function instead? That would probably reduce some of the overhead of pattern matching at runtime.
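
One way to read this suggestion, sketched under assumptions: resolve the type match once per column when the serializer is built, and keep only an array of prebuilt functions on the per-row path. PrebuiltWriterSketch, FieldType, columnWriter and rowWriter are hypothetical names, the row is a plain Array[Any], and null handling is left out to keep the idea visible. This is not the code in the patch.

```scala
import java.io.DataOutputStream

object PrebuiltWriterSketch {
  // Hypothetical stand-in for Spark SQL's DataType hierarchy.
  sealed trait FieldType
  case object BoolField extends FieldType
  case object IntField extends FieldType
  case object LongField extends FieldType

  // The pattern match runs once per column, when the writer is built.
  private def columnWriter(t: FieldType, out: DataOutputStream): Any => Unit = t match {
    case BoolField => v => out.writeBoolean(v.asInstanceOf[Boolean])
    case IntField  => v => out.writeInt(v.asInstanceOf[Int])
    case LongField => v => out.writeLong(v.asInstanceOf[Long])
  }

  // The returned function only walks the prebuilt array for each row.
  def rowWriter(schema: Array[FieldType], out: DataOutputStream): Array[Any] => Unit = {
    val writers = schema.map(columnWriter(_, out))
    row => {
      var i = 0
      while (i < writers.length) {
        writers(i)(row(i))
        i += 1
      }
    }
  }
}
```

Whether this beats a match inside the loop depends on how well the JIT handles each shape, so it would need to be measured rather than assumed.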

sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala
serializer
}

chenghao-intel Apr 13, 2015

Contributor

Rewrite Exchange.toString to also print the serializer, which will provide more detailed information for troubleshooting.

with Logging
with Serializable{
def newInstance(): SerializerInstance = new ShuffleSerializerInstance(keySchema, valueSchema)

chenghao-intel Apr 13, 2015

Contributor

Instead of using Array[DataType], how about using Array[Expression]? For example, it's not necessary to serialize and deserialize a Literal.

sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala
@@ -139,6 +141,8 @@ private[sql] class SQLConf extends Serializable {
*/
private[spark] def codegenEnabled: Boolean = getConf(CODEGEN_ENABLED, "false").toBoolean
private[spark] def useSqlSerializer2: Boolean = getConf(USE_SQL_SERIALIZER2, "false").toBoolean

chenghao-intel Apr 13, 2015

Contributor

Instead of using a Boolean value, how about using a string that gives the class name of the Serializer? We will probably add other Serializers, e.g. a codegen version.

marmbrus Apr 17, 2015

Contributor

This seems pretty hard as there is no standard interface to the serializer constructor.

Perhaps we should document this and say it is experimental?

marmbrus Apr 17, 2015

Contributor

Also, do we want to turn it on by default? It's easy to turn off if we find bugs.

chenghao-intel Apr 13, 2015

Contributor

@yhuai this is a really cool improvement and will definitely improve performance a lot. I have some comments about future improvements (of course, we can leave them for later). My main concern is that using Seq[Expression] is probably better than Array[DataType] for constructing the Serializer, as we could then avoid serializing Literals.

SparkQA Apr 14, 2015

Test build #30215 has finished for PR 5497 at commit c9373c8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

SparkQA Apr 14, 2015

Test build #30255 has finished for PR 5497 at commit 8297732.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch removes the following dependencies:
    • RoaringBitmap-0.4.5.jar
    • activation-1.1.jar
    • akka-actor_2.10-2.3.4-spark.jar
    • akka-remote_2.10-2.3.4-spark.jar
    • akka-slf4j_2.10-2.3.4-spark.jar
    • aopalliance-1.0.jar
    • arpack_combined_all-0.1.jar
    • avro-1.7.7.jar
    • breeze-macros_2.10-0.11.2.jar
    • breeze_2.10-0.11.2.jar
    • chill-java-0.5.0.jar
    • chill_2.10-0.5.0.jar
    • commons-beanutils-1.7.0.jar
    • commons-beanutils-core-1.8.0.jar
    • commons-cli-1.2.jar
    • commons-codec-1.10.jar
    • commons-collections-3.2.1.jar
    • commons-compress-1.4.1.jar
    • commons-configuration-1.6.jar
    • commons-digester-1.8.jar
    • commons-httpclient-3.1.jar
    • commons-io-2.1.jar
    • commons-lang-2.5.jar
    • commons-lang3-3.3.2.jar
    • commons-math-2.1.jar
    • commons-math3-3.1.1.jar
    • commons-net-2.2.jar
    • compress-lzf-1.0.0.jar
    • config-1.2.1.jar
    • core-1.1.2.jar
    • curator-client-2.4.0.jar
    • curator-framework-2.4.0.jar
    • curator-recipes-2.4.0.jar
    • gmbal-api-only-3.0.0-b023.jar
    • grizzly-framework-2.1.2.jar
    • grizzly-http-2.1.2.jar
    • grizzly-http-server-2.1.2.jar
    • grizzly-http-servlet-2.1.2.jar
    • grizzly-rcm-2.1.2.jar
    • groovy-all-2.3.7.jar
    • guava-14.0.1.jar
    • guice-3.0.jar
    • hadoop-annotations-2.2.0.jar
    • hadoop-auth-2.2.0.jar
    • hadoop-client-2.2.0.jar
    • hadoop-common-2.2.0.jar
    • hadoop-hdfs-2.2.0.jar
    • hadoop-mapreduce-client-app-2.2.0.jar
    • hadoop-mapreduce-client-common-2.2.0.jar
    • hadoop-mapreduce-client-core-2.2.0.jar
    • hadoop-mapreduce-client-jobclient-2.2.0.jar
    • hadoop-mapreduce-client-shuffle-2.2.0.jar
    • hadoop-yarn-api-2.2.0.jar
    • hadoop-yarn-client-2.2.0.jar
    • hadoop-yarn-common-2.2.0.jar
    • hadoop-yarn-server-common-2.2.0.jar
    • ivy-2.4.0.jar
    • jackson-annotations-2.4.0.jar
    • jackson-core-2.4.4.jar
    • jackson-core-asl-1.8.8.jar
    • jackson-databind-2.4.4.jar
    • jackson-jaxrs-1.8.8.jar
    • jackson-mapper-asl-1.8.8.jar
    • jackson-module-scala_2.10-2.4.4.jar
    • jackson-xc-1.8.8.jar
    • jansi-1.4.jar
    • javax.inject-1.jar
    • javax.servlet-3.0.0.v201112011016.jar
    • javax.servlet-3.1.jar
    • javax.servlet-api-3.0.1.jar
    • jaxb-api-2.2.2.jar
    • jaxb-impl-2.2.3-1.jar
    • jcl-over-slf4j-1.7.10.jar
    • jersey-client-1.9.jar
    • jersey-core-1.9.jar
    • jersey-grizzly2-1.9.jar
    • jersey-guice-1.9.jar
    • jersey-json-1.9.jar
    • jersey-server-1.9.jar
    • jersey-test-framework-core-1.9.jar
    • jersey-test-framework-grizzly2-1.9.jar
    • jets3t-0.7.1.jar
    • jettison-1.1.jar
    • jetty-util-6.1.26.jar
    • jline-0.9.94.jar
    • jline-2.10.4.jar
    • jodd-core-3.6.3.jar
    • json4s-ast_2.10-3.2.10.jar
    • json4s-core_2.10-3.2.10.jar
    • json4s-jackson_2.10-3.2.10.jar
    • jsr305-1.3.9.jar
    • jtransforms-2.4.0.jar
    • jul-to-slf4j-1.7.10.jar
    • kryo-2.21.jar
    • log4j-1.2.17.jar
    • lz4-1.2.0.jar
    • management-api-3.0.0-b012.jar
    • mesos-0.21.0-shaded-protobuf.jar
    • metrics-core-3.1.0.jar
    • metrics-graphite-3.1.0.jar
    • metrics-json-3.1.0.jar
    • metrics-jvm-3.1.0.jar
    • minlog-1.2.jar
    • netty-3.8.0.Final.jar
    • netty-all-4.0.23.Final.jar
    • objenesis-1.2.jar
    • opencsv-2.3.jar
    • oro-2.0.8.jar
    • paranamer-2.6.jar
    • parquet-column-1.6.0rc3.jar
    • parquet-common-1.6.0rc3.jar
    • parquet-encoding-1.6.0rc3.jar
    • parquet-format-2.2.0-rc1.jar
    • parquet-generator-1.6.0rc3.jar
    • parquet-hadoop-1.6.0rc3.jar
    • parquet-jackson-1.6.0rc3.jar
    • protobuf-java-2.4.1.jar
    • protobuf-java-2.5.0-spark.jar
    • py4j-0.8.2.1.jar
    • pyrolite-2.0.1.jar
    • quasiquotes_2.10-2.0.1.jar
    • reflectasm-1.07-shaded.jar
    • scala-compiler-2.10.4.jar
    • scala-library-2.10.4.jar
    • scala-reflect-2.10.4.jar
    • scalap-2.10.4.jar
    • scalatest_2.10-2.2.1.jar
    • slf4j-api-1.7.10.jar
    • slf4j-log4j12-1.7.10.jar
    • snappy-java-1.1.1.6.jar
    • spark-bagel_2.10-1.4.0-SNAPSHOT.jar
    • spark-catalyst_2.10-1.4.0-SNAPSHOT.jar
    • spark-core_2.10-1.4.0-SNAPSHOT.jar
    • spark-graphx_2.10-1.4.0-SNAPSHOT.jar
    • spark-launcher_2.10-1.4.0-SNAPSHOT.jar
    • spark-mllib_2.10-1.4.0-SNAPSHOT.jar
    • spark-network-common_2.10-1.4.0-SNAPSHOT.jar
    • spark-network-shuffle_2.10-1.4.0-SNAPSHOT.jar
    • spark-repl_2.10-1.4.0-SNAPSHOT.jar
    • spark-sql_2.10-1.4.0-SNAPSHOT.jar
    • spark-streaming_2.10-1.4.0-SNAPSHOT.jar
    • spire-macros_2.10-0.7.4.jar
    • spire_2.10-0.7.4.jar
    • stax-api-1.0.1.jar
    • stream-2.7.0.jar
    • tachyon-0.5.0.jar
    • tachyon-client-0.5.0.jar
    • uncommons-maths-1.2.2a.jar
    • unused-1.0.0.jar
    • xmlenc-0.52.jar
    • xz-1.0.jar
    • zookeeper-3.4.5.jar

SparkQA Apr 14, 2015

Test build #30259 has finished for PR 5497 at commit 43b9fb4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

SparkQA Apr 15, 2015

Test build #30288 has finished for PR 5497 at commit 3e09655.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch adds the following new dependencies:
    • snappy-java-1.1.1.7.jar
  • This patch removes the following dependencies:
    • snappy-java-1.1.1.6.jar

yhuai added some commits Apr 15, 2015

Merge remote-tracking branch 'upstream/master' into serializer2
Conflicts:
	sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala

SparkQA Apr 16, 2015

Test build #30382 has finished for PR 5497 at commit 791b96a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class UnresolvedAttribute(nameParts: Seq[String])
    • trait CaseConversionExpression
    • final class UTF8String extends Ordered[UTF8String] with Serializable
    • case class Exchange(
    • case class SortMergeJoin(
  • This patch does not change any dependencies.

SparkQA Apr 16, 2015

Test build #30384 has finished for PR 5497 at commit 09e587a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class UnresolvedAttribute(nameParts: Seq[String])
    • trait CaseConversionExpression
    • final class UTF8String extends Ordered[UTF8String] with Serializable
    • case class Exchange(
    • case class SortMergeJoin(
  • This patch does not change any dependencies.

sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala
SparkSqlSerializer2.support(valueSchema)
val serializer = if (useSqlSerializer2) {
logInfo("Use SparkSqlSerializer2.")

marmbrus Apr 17, 2015

Contributor

Nit: "Using". Same below.

sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala
valueSchema: Array[DataType],
numPartitions: Int): Serializer = {
val useSqlSerializer2 =
!(sortBasedShuffleOn && numPartitions > bypassMergeThreshold) &&

marmbrus Apr 17, 2015

Contributor

Comment why this is a condition.
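
For readers following the condition under review, here is a hedged sketch of its shape as a pure function, so that its truth table can be checked without Spark. The parameter names sortBasedShuffleOn, numPartitions and bypassMergeThreshold come from the snippet above; confEnabled, keySupported and valueSupported are hypothetical stand-ins for the configuration flag and the two SparkSqlSerializer2.support calls. The sketch does not explain why the shuffle clause is needed, which is exactly what the reviewer asks to have documented.

```scala
object SerializerChoiceSketch {
  // Mirrors the shape of the reviewed condition. Every input is a plain
  // value here, so the logic can be exercised without a SparkContext.
  def useSqlSerializer2(
      sortBasedShuffleOn: Boolean,
      numPartitions: Int,
      bypassMergeThreshold: Int,
      confEnabled: Boolean,
      keySupported: Boolean,
      valueSupported: Boolean): Boolean =
    !(sortBasedShuffleOn && numPartitions > bypassMergeThreshold) &&
      confEnabled && keySupported && valueSupported
}
```

Written this way, the specialized serializer is ruled out only when sort-based shuffle is on and the partition count exceeds the threshold; at or below the threshold the remaining flags decide.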

.../src/main/scala/org/apache/spark/sql/execution/SparkSqlSerializer2.scala
import org.apache.spark.sql.types._
/**
* The serialization stream for SparkSqlSerializer2.

marmbrus Apr 17, 2015

Contributor

Nit: this comment is pretty content-free. Maybe discuss the benefits and limitations instead?

SparkQA Apr 18, 2015

Test build #30513 has finished for PR 5497 at commit da562c5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

marmbrus Apr 21, 2015

Contributor

Thanks! Merged to master.

@asfgit asfgit closed this in ce7ddab Apr 21, 2015

aihex added a commit to aihex/spark that referenced this pull request Apr 21, 2015

[SPARK-6368][SQL] Build a specialized serializer for Exchange operator.
JIRA: https://issues.apache.org/jira/browse/SPARK-6368

Author: Yin Huai <yhuai@databricks.com>

Closes #5497 from yhuai/serializer2 and squashes the following commits:

da562c5 [Yin Huai] Merge remote-tracking branch 'upstream/master' into serializer2
50e0c3d [Yin Huai] When no filed is emitted to shuffle, use SparkSqlSerializer for now.
9f1ed92 [Yin Huai] Merge remote-tracking branch 'upstream/master' into serializer2
6d07678 [Yin Huai] Address comments.
4273b8c [Yin Huai] Enabled SparkSqlSerializer2.
09e587a [Yin Huai] Remove TODO.
791b96a [Yin Huai] Use UTF8String.
60a1487 [Yin Huai] Merge remote-tracking branch 'upstream/master' into serializer2
3e09655 [Yin Huai] Use getAs for Date column.
43b9fb4 [Yin Huai] Test.
8297732 [Yin Huai] Fix test.
c9373c8 [Yin Huai] Support DecimalType.
2379eeb [Yin Huai] ASF header.
39704ab [Yin Huai] Specialized serializer for Exchange.

nemccarthy added a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
