
@davies
Contributor

@davies davies commented Apr 1, 2015

This PR changes the internal representation of StringType from java.lang.String to UTF8String, which is implemented using an Array[Byte] (encoded in UTF-8).
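A minimal sketch of the idea, assuming a hypothetical class `SimpleUTF8String` (not Spark's actual `UTF8String`). The text is held as UTF-8 bytes, converted to and from `java.lang.String` only at the edges, and compared byte by byte. Treating each byte as unsigned matters: for valid UTF-8, unsigned byte order matches code-point order.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical, simplified stand-in for a UTF-8 backed string type:
// the text is stored as UTF-8 encoded bytes, not as a java.lang.String.
final class SimpleUTF8String private (val bytes: Array[Byte])
    extends Ordered[SimpleUTF8String] with Serializable {

  // Decode back to java.lang.String, e.g. at a public API boundary.
  override def toString: String = new String(bytes, StandardCharsets.UTF_8)

  // Length in bytes; this differs from the character count for non-ASCII text.
  def numBytes: Int = bytes.length

  // Lexicographic comparison of bytes treated as unsigned values (0..255).
  override def compare(that: SimpleUTF8String): Int = {
    val len = math.min(bytes.length, that.bytes.length)
    var i = 0
    while (i < len) {
      val a = bytes(i) & 0xff
      val b = that.bytes(i) & 0xff
      if (a != b) return a - b
      i += 1
    }
    bytes.length - that.bytes.length
  }

  // Equality and hashing are based on the byte contents.
  override def equals(other: Any): Boolean = other match {
    case s: SimpleUTF8String => java.util.Arrays.equals(bytes, s.bytes)
    case _                   => false
  }

  override def hashCode: Int = java.util.Arrays.hashCode(bytes)
}

object SimpleUTF8String {
  // Encode once on the way in.
  def fromString(s: String): SimpleUTF8String =
    new SimpleUTF8String(s.getBytes(StandardCharsets.UTF_8))
}
```

Note that `"héllo"` occupies 6 bytes here, since `é` is encoded as two bytes in UTF-8.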

This PR should not break any public API: Row.getString() will still return java.lang.String.
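To illustrate how the internal representation can change while the public getter keeps its type, here is a sketch assuming a hypothetical `BytesBackedRow` (not Spark's `Row`): string fields are stored as UTF-8 bytes and decoded only when `getString` is called.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical row that keeps string columns as UTF-8 bytes internally.
final class BytesBackedRow(values: Array[Any]) {
  // Internal accessor: returns the raw UTF-8 bytes without decoding.
  def getUTF8Bytes(i: Int): Array[Byte] = values(i).asInstanceOf[Array[Byte]]

  // Public accessor: decodes to java.lang.String, so callers see no change.
  def getString(i: Int): String =
    new String(getUTF8Bytes(i), StandardCharsets.UTF_8)
}

object BytesBackedRow {
  // Encode String fields to UTF-8 on the way in; other values pass through.
  def fromValues(vs: Any*): BytesBackedRow =
    new BytesBackedRow(vs.map {
      case s: String => s.getBytes(StandardCharsets.UTF_8)
      case other     => other
    }.toArray)
}
```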

This is the first step toward improving the performance of String in SQL.

cc @rxin

@SparkQA

SparkQA commented Apr 1, 2015

Test build #29512 has started for PR 5303 at commit a85fb27.

@SparkQA

SparkQA commented Apr 1, 2015

Test build #29512 has finished for PR 5303 at commit a85fb27.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • //final class MutableString extends MutableValue
    • case class Literal(var value: Any, dataType: DataType) extends LeafExpression
    • trait CaseConversionExpression
  • This patch does not change any dependencies.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29512/
Test FAILed.

Contributor


We should create an object Literal with an apply function that converts String into UTF8String, rather than making this a var.
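A sketch of the suggested shape, using heavily simplified hypothetical stand-ins for `DataType`, `StringType`, `IntegerType` and a `Utf8` wrapper (none of these are Spark's definitions). `value` stays an immutable `val`, and a one-argument `apply` on the companion object performs the String to UTF-8 conversion once, when the literal is constructed.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical, simplified types; not Spark's actual definitions.
sealed trait DataType
case object StringType extends DataType
case object IntegerType extends DataType

// Hypothetical UTF-8 wrapper; Seq[Byte] gives structural equality.
final case class Utf8(bytes: Seq[Byte]) {
  override def toString: String = new String(bytes.toArray, StandardCharsets.UTF_8)
}

// `value` is an immutable val instead of a var.
case class Literal(value: Any, dataType: DataType)

object Literal {
  // Convenience constructor that converts the value into its internal form.
  def apply(v: Any): Literal = v match {
    case s: String => new Literal(Utf8(s.getBytes(StandardCharsets.UTF_8).toSeq), StringType)
    case i: Int    => new Literal(i, IntegerType)
    case other     => throw new IllegalArgumentException(s"Unsupported literal: $other")
  }
}
```

The compiler-generated two-argument `Literal(value, dataType)` remains available alongside the one-argument overload.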

@SparkQA

SparkQA commented Apr 1, 2015

Test build #29522 has started for PR 5303 at commit 6b499ac.

@SparkQA

SparkQA commented Apr 1, 2015

Test build #29522 has finished for PR 5303 at commit 6b499ac.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • final class MutableString extends MutableValue
    • case class Literal(var value: Any, dataType: DataType) extends LeafExpression
    • trait CaseConversionExpression
  • This patch does not change any dependencies.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29522/
Test FAILed.

@SparkQA

SparkQA commented Apr 1, 2015

Test build #29526 has started for PR 5303 at commit 5f9e120.

@SparkQA

SparkQA commented Apr 1, 2015

Test build #29526 has finished for PR 5303 at commit 5f9e120.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • final class MutableString extends MutableValue
    • case class Literal(var value: Any, dataType: DataType) extends LeafExpression
    • trait CaseConversionExpression
  • This patch does not change any dependencies.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29526/
Test FAILed.

@SparkQA

SparkQA commented Apr 1, 2015

Test build #29527 has started for PR 5303 at commit 38c303e.

@SparkQA

SparkQA commented Apr 1, 2015

Test build #29527 has finished for PR 5303 at commit 38c303e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • final class MutableString extends MutableValue
    • case class Literal(var value: Any, dataType: DataType) extends LeafExpression
    • trait CaseConversionExpression
  • This patch does not change any dependencies.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29527/
Test FAILed.

@davies davies changed the title [SPARK-6638] [SQL] Improve performance of StringType in SQL [WIP] [SPARK-6638] [SQL] Improve performance of StringType in SQL Apr 1, 2015
@SparkQA

SparkQA commented Apr 1, 2015

Test build #29552 has started for PR 5303 at commit c7dd4d2.

@SparkQA

SparkQA commented Apr 1, 2015

Test build #29552 has finished for PR 5303 at commit c7dd4d2.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • final class MutableString extends MutableValue
    • case class Literal(var value: Any, dataType: DataType) extends LeafExpression
    • trait CaseConversionExpression
    • final class UTF8String extends Ordered[UTF8String] with Serializable
  • This patch removes the following dependencies:
    • RoaringBitmap-0.4.5.jar
    • activation-1.1.jar
    • akka-actor_2.10-2.3.4-spark.jar
    • akka-remote_2.10-2.3.4-spark.jar
    • akka-slf4j_2.10-2.3.4-spark.jar
    • aopalliance-1.0.jar
    • arpack_combined_all-0.1.jar
    • avro-1.7.7.jar
    • breeze-macros_2.10-0.11.2.jar
    • breeze_2.10-0.11.2.jar
    • chill-java-0.5.0.jar
    • chill_2.10-0.5.0.jar
    • commons-beanutils-1.7.0.jar
    • commons-beanutils-core-1.8.0.jar
    • commons-cli-1.2.jar
    • commons-codec-1.10.jar
    • commons-collections-3.2.1.jar
    • commons-compress-1.4.1.jar
    • commons-configuration-1.6.jar
    • commons-digester-1.8.jar
    • commons-httpclient-3.1.jar
    • commons-io-2.1.jar
    • commons-lang-2.5.jar
    • commons-lang3-3.3.2.jar
    • commons-math-2.1.jar
    • commons-math3-3.1.1.jar
    • commons-net-2.2.jar
    • compress-lzf-1.0.0.jar
    • config-1.2.1.jar
    • core-1.1.2.jar
    • curator-client-2.4.0.jar
    • curator-framework-2.4.0.jar
    • curator-recipes-2.4.0.jar
    • gmbal-api-only-3.0.0-b023.jar
    • grizzly-framework-2.1.2.jar
    • grizzly-http-2.1.2.jar
    • grizzly-http-server-2.1.2.jar
    • grizzly-http-servlet-2.1.2.jar
    • grizzly-rcm-2.1.2.jar
    • groovy-all-2.3.7.jar
    • guava-14.0.1.jar
    • guice-3.0.jar
    • hadoop-annotations-2.2.0.jar
    • hadoop-auth-2.2.0.jar
    • hadoop-client-2.2.0.jar
    • hadoop-common-2.2.0.jar
    • hadoop-hdfs-2.2.0.jar
    • hadoop-mapreduce-client-app-2.2.0.jar
    • hadoop-mapreduce-client-common-2.2.0.jar
    • hadoop-mapreduce-client-core-2.2.0.jar
    • hadoop-mapreduce-client-jobclient-2.2.0.jar
    • hadoop-mapreduce-client-shuffle-2.2.0.jar
    • hadoop-yarn-api-2.2.0.jar
    • hadoop-yarn-client-2.2.0.jar
    • hadoop-yarn-common-2.2.0.jar
    • hadoop-yarn-server-common-2.2.0.jar
    • ivy-2.4.0.jar
    • jackson-annotations-2.4.0.jar
    • jackson-core-2.4.4.jar
    • jackson-core-asl-1.8.8.jar
    • jackson-databind-2.4.4.jar
    • jackson-jaxrs-1.8.8.jar
    • jackson-mapper-asl-1.8.8.jar
    • jackson-module-scala_2.10-2.4.4.jar
    • jackson-xc-1.8.8.jar
    • jansi-1.4.jar
    • javax.inject-1.jar
    • javax.servlet-3.0.0.v201112011016.jar
    • javax.servlet-3.1.jar
    • javax.servlet-api-3.0.1.jar
    • jaxb-api-2.2.2.jar
    • jaxb-impl-2.2.3-1.jar
    • jcl-over-slf4j-1.7.10.jar
    • jersey-client-1.9.jar
    • jersey-core-1.9.jar
    • jersey-grizzly2-1.9.jar
    • jersey-guice-1.9.jar
    • jersey-json-1.9.jar
    • jersey-server-1.9.jar
    • jersey-test-framework-core-1.9.jar
    • jersey-test-framework-grizzly2-1.9.jar
    • jets3t-0.7.1.jar
    • jettison-1.1.jar
    • jetty-util-6.1.26.jar
    • jline-0.9.94.jar
    • jline-2.10.4.jar
    • jodd-core-3.6.3.jar
    • json4s-ast_2.10-3.2.10.jar
    • json4s-core_2.10-3.2.10.jar
    • json4s-jackson_2.10-3.2.10.jar
    • jsr305-1.3.9.jar
    • jtransforms-2.4.0.jar
    • jul-to-slf4j-1.7.10.jar
    • kryo-2.21.jar
    • log4j-1.2.17.jar
    • lz4-1.2.0.jar
    • management-api-3.0.0-b012.jar
    • mesos-0.21.0-shaded-protobuf.jar
    • metrics-core-3.1.0.jar
    • metrics-graphite-3.1.0.jar
    • metrics-json-3.1.0.jar
    • metrics-jvm-3.1.0.jar
    • minlog-1.2.jar
    • netty-3.8.0.Final.jar
    • netty-all-4.0.23.Final.jar
    • objenesis-1.2.jar
    • opencsv-2.3.jar
    • oro-2.0.8.jar
    • paranamer-2.6.jar
    • parquet-column-1.6.0rc3.jar
    • parquet-common-1.6.0rc3.jar
    • parquet-encoding-1.6.0rc3.jar
    • parquet-format-2.2.0-rc1.jar
    • parquet-generator-1.6.0rc3.jar
    • parquet-hadoop-1.6.0rc3.jar
    • parquet-jackson-1.6.0rc3.jar
    • protobuf-java-2.4.1.jar
    • protobuf-java-2.5.0-spark.jar
    • py4j-0.8.2.1.jar
    • pyrolite-2.0.1.jar
    • quasiquotes_2.10-2.0.1.jar
    • reflectasm-1.07-shaded.jar
    • scala-compiler-2.10.4.jar
    • scala-library-2.10.4.jar
    • scala-reflect-2.10.4.jar
    • scalap-2.10.4.jar
    • scalatest_2.10-2.2.1.jar
    • slf4j-api-1.7.10.jar
    • slf4j-log4j12-1.7.10.jar
    • snappy-java-1.1.1.6.jar
    • spark-bagel_2.10-1.4.0-SNAPSHOT.jar
    • spark-catalyst_2.10-1.4.0-SNAPSHOT.jar
    • spark-core_2.10-1.4.0-SNAPSHOT.jar
    • spark-graphx_2.10-1.4.0-SNAPSHOT.jar
    • spark-launcher_2.10-1.4.0-SNAPSHOT.jar
    • spark-mllib_2.10-1.4.0-SNAPSHOT.jar
    • spark-network-common_2.10-1.4.0-SNAPSHOT.jar
    • spark-network-shuffle_2.10-1.4.0-SNAPSHOT.jar
    • spark-repl_2.10-1.4.0-SNAPSHOT.jar
    • spark-sql_2.10-1.4.0-SNAPSHOT.jar
    • spark-streaming_2.10-1.4.0-SNAPSHOT.jar
    • spire-macros_2.10-0.7.4.jar
    • spire_2.10-0.7.4.jar
    • stax-api-1.0.1.jar
    • stream-2.7.0.jar
    • tachyon-0.5.0.jar
    • tachyon-client-0.5.0.jar
    • uncommons-maths-1.2.2a.jar
    • unused-1.0.0.jar
    • xmlenc-0.52.jar
    • xz-1.0.jar
    • zookeeper-3.4.5.jar

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29552/
Test FAILed.

@SparkQA

SparkQA commented Apr 1, 2015

Test build #29557 has started for PR 5303 at commit dbfa1ed.

@SparkQA

SparkQA commented Apr 2, 2015

Test build #29632 has started for PR 5303 at commit ccaf78e.

@SparkQA

SparkQA commented Apr 2, 2015

Test build #29633 has started for PR 5303 at commit 28d6f32.

Conflicts:
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
	sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala
	sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
	sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala
	sql/core/src/main/scala/org/apache/spark/sql/sources/commands.scala
@davies
Contributor Author

davies commented Apr 3, 2015

@rxin @marmbrus This PR is ready for review.

@SparkQA

SparkQA commented Apr 3, 2015

Test build #29634 has started for PR 5303 at commit 28f3d81.

@SparkQA

SparkQA commented Apr 3, 2015

Test build #29637 has started for PR 5303 at commit e5fa5b8.

@rxin
Contributor

rxin commented Apr 3, 2015

Can we open a new PR and close this one to get rid of all the Jenkins messages?

@SparkQA

SparkQA commented Apr 3, 2015

Test build #29632 has finished for PR 5303 at commit ccaf78e.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
    • final class MutableString extends MutableValue
    • trait CaseConversionExpression
    • final class UTF8String extends Ordered[UTF8String] with Serializable
  • This patch does not change any dependencies.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29632/
Test FAILed.

Contributor


Was this function defined somewhere before?

Contributor Author


No, but we have a similar one: convertToCatalyst(a: Any, dt: DataType).
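For context, a type-directed converter in the spirit of convertToCatalyst(a: Any, dt: DataType) might look like the following. This is a sketch only: `toInternal`, `CatalystConversions`, `Utf8` and the small `DataType` hierarchy are hypothetical simplifications, not the functions in this PR or in Spark. It walks a value alongside its DataType and swaps java.lang.String for a UTF-8 internal form, recursing into arrays.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical, simplified types standing in for Catalyst's.
sealed trait DataType
case object StringType extends DataType
case object IntegerType extends DataType
final case class ArrayType(elementType: DataType) extends DataType

// Hypothetical UTF-8 internal string form.
final case class Utf8(bytes: Seq[Byte])

object CatalystConversions {
  // Convert an external Scala value into its internal form, guided by dt.
  def toInternal(a: Any, dt: DataType): Any = (a, dt) match {
    case (null, _)                    => null
    case (s: String, StringType)      => Utf8(s.getBytes(StandardCharsets.UTF_8).toSeq)
    case (seq: Seq[_], ArrayType(et)) => seq.map(toInternal(_, et))
    case (other, _)                   => other // already in internal form
  }
}
```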

@SparkQA

SparkQA commented Apr 3, 2015

Test build #29633 has finished for PR 5303 at commit 28d6f32.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
    • final class MutableString extends MutableValue
    • trait CaseConversionExpression
    • final class UTF8String extends Ordered[UTF8String] with Serializable
  • This patch does not change any dependencies.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29633/
Test FAILed.

@SparkQA

SparkQA commented Apr 3, 2015

Test build #29634 has finished for PR 5303 at commit 28f3d81.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • final class MutableString extends MutableValue
    • trait CaseConversionExpression
    • final class UTF8String extends Ordered[UTF8String] with Serializable
  • This patch does not change any dependencies.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29634/
Test FAILed.

@SparkQA

SparkQA commented Apr 3, 2015

Test build #29637 has finished for PR 5303 at commit e5fa5b8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • final class MutableString extends MutableValue
    • trait CaseConversionExpression
    • final class UTF8String extends Ordered[UTF8String] with Serializable
  • This patch does not change any dependencies.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29637/
Test FAILed.

@davies
Contributor Author

davies commented Apr 3, 2015

@rxin I will create a new PR after fixing the remaining tests.

@SparkQA

SparkQA commented Apr 3, 2015

Test build #635 has started for PR 5303 at commit e5fa5b8.

@SparkQA

SparkQA commented Apr 3, 2015

Test build #29658 has started for PR 5303 at commit 8d17f21.

@SparkQA

SparkQA commented Apr 3, 2015

Test build #29658 has finished for PR 5303 at commit 8d17f21.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • final class MutableString extends MutableValue
    • trait CaseConversionExpression
    • final class UTF8String extends Ordered[UTF8String] with Serializable
  • This patch does not change any dependencies.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29658/
Test FAILed.

@SparkQA

SparkQA commented Apr 3, 2015

Test build #635 has finished for PR 5303 at commit e5fa5b8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@davies
Contributor Author

davies commented Apr 3, 2015

Closing this one to get rid of all the Jenkins comments.

@davies davies closed this Apr 3, 2015