[SPARK-3930] [SPARK-3933] Support fixed-precision decimal in SQL, and some optimizations #2983

Closed
wants to merge 11 commits into from

Contributor

@mateiz mateiz commented Oct 28, 2014

This is still marked WIP because there are a few TODOs, but I'll remove that tag when done.

@SparkQA

SparkQA commented Oct 28, 2014

Test build #22389 has started for PR 2983 at commit 4e4bf3f.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 28, 2014

Test build #22389 has finished for PR 2983 at commit 4e4bf3f.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class UnscaledValue(child: Expression) extends UnaryExpression
    • case class MakeDecimal(child: Expression, precision: Int, scale: Int) extends UnaryExpression
    • case class MutableLiteral(var value: Any, dataType: DataType, nullable: Boolean = true)
    • case class PrecisionInfo(precision: Int, scale: Int)
    • case class DecimalType(precisionInfo: Option[PrecisionInfo]) extends FractionalType
    • final class Decimal extends Ordered[Decimal] with Serializable
    • trait DecimalIsConflicted extends Numeric[Decimal]

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22389/

@SparkQA

SparkQA commented Oct 28, 2014

Test build #22396 has started for PR 2983 at commit 7d3178b.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 29, 2014

Test build #22396 has finished for PR 2983 at commit 7d3178b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class UnscaledValue(child: Expression) extends UnaryExpression
    • case class MakeDecimal(child: Expression, precision: Int, scale: Int) extends UnaryExpression
    • case class MutableLiteral(var value: Any, dataType: DataType, nullable: Boolean = true)
    • case class PrecisionInfo(precision: Int, scale: Int)
    • case class DecimalType(precisionInfo: Option[PrecisionInfo]) extends FractionalType
    • final class Decimal extends Ordered[Decimal] with Serializable
    • trait DecimalIsConflicted extends Numeric[Decimal]

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22396/

@SparkQA

SparkQA commented Oct 29, 2014

Test build #22406 has started for PR 2983 at commit f136942.

  • This patch merges cleanly.

@mateiz mateiz changed the title from "[SPARK-3930] [SPARK-3933] [WIP] Support fixed-precision decimal in SQL, and some optimizations" to "[SPARK-3930] [SPARK-3933] Support fixed-precision decimal in SQL, and some optimizations" Oct 29, 2014
@mateiz
Contributor Author

mateiz commented Oct 29, 2014

I've marked this as no longer WIP, because the main TODOs left are in the Hive support. I intend to send that as a separate patch, though I can also add it here. Right now this maps every Hive decimal type to the unlimited-precision decimal, whereas we should respect the precision and scale set in the Hive metastore in Hive 13. The previous Spark SQL code doesn't respect that either, so this is not a regression.

@SparkQA

SparkQA commented Oct 29, 2014

Test build #22409 has started for PR 2983 at commit 44301f6.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 29, 2014

Test build #22406 has finished for PR 2983 at commit f136942.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class SparkContext(config: SparkConf) extends SparkStatusAPI with Logging
    • abstract class Broadcast[T: ClassTag](val id: Long) extends Serializable with Logging
    • class JobUIData(
    • public final class JavaStatusAPIDemo
    • public static final class IdentityWithDelay<T> implements Function<T, T>
    • class VectorTransformer(object):
    • class Normalizer(VectorTransformer):
    • class JavaModelWrapper(VectorTransformer):
    • class StandardScalerModel(JavaModelWrapper):
    • class StandardScaler(object):
    • class HashingTF(object):
    • class IDFModel(JavaModelWrapper):
    • class IDF(object):
    • class Word2VecModel(JavaModelWrapper):
    • class DateType(PrimitiveType):
    • case class BitwiseAnd(left: Expression, right: Expression) extends BinaryArithmetic
    • case class BitwiseOr(left: Expression, right: Expression) extends BinaryArithmetic
    • case class BitwiseXor(left: Expression, right: Expression) extends BinaryArithmetic
    • case class BitwiseNot(child: Expression) extends UnaryExpression
    • case class UnscaledValue(child: Expression) extends UnaryExpression
    • case class MakeDecimal(child: Expression, precision: Int, scale: Int) extends UnaryExpression
    • case class MutableLiteral(var value: Any, dataType: DataType, nullable: Boolean = true)
    • case class CreateTableAsSelect[T](
    • case class PrecisionInfo(precision: Int, scale: Int)
    • case class DecimalType(precisionInfo: Option[PrecisionInfo]) extends FractionalType
    • final class Decimal extends Ordered[Decimal] with Serializable
    • trait DecimalIsConflicted extends Numeric[Decimal]
    • logDebug("Found class for $serdeName")

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22406/

@SparkQA

SparkQA commented Oct 29, 2014

Test build #22409 has finished for PR 2983 at commit 44301f6.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class UnscaledValue(child: Expression) extends UnaryExpression
    • case class MakeDecimal(child: Expression, precision: Int, scale: Int) extends UnaryExpression
    • case class MutableLiteral(var value: Any, dataType: DataType, nullable: Boolean = true)
    • case class PrecisionInfo(precision: Int, scale: Int)
    • case class DecimalType(precisionInfo: Option[PrecisionInfo]) extends FractionalType
    • final class Decimal extends Ordered[Decimal] with Serializable
    • trait DecimalIsConflicted extends Numeric[Decimal]

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22409/

@mateiz
Contributor Author

mateiz commented Oct 29, 2014

Jenkins, test this please

@SparkQA

SparkQA commented Oct 29, 2014

Test build #22414 has started for PR 2983 at commit 44301f6.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 29, 2014

Test build #22414 has finished for PR 2983 at commit 44301f6.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class UnscaledValue(child: Expression) extends UnaryExpression
    • case class MakeDecimal(child: Expression, precision: Int, scale: Int) extends UnaryExpression
    • case class MutableLiteral(var value: Any, dataType: DataType, nullable: Boolean = true)
    • case class PrecisionInfo(precision: Int, scale: Int)
    • case class DecimalType(precisionInfo: Option[PrecisionInfo]) extends FractionalType
    • final class Decimal extends Ordered[Decimal] with Serializable
    • trait DecimalIsConflicted extends Numeric[Decimal]

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22414/

@SparkQA

SparkQA commented Oct 29, 2014

Test build #22418 has started for PR 2983 at commit 4ca62cd.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 29, 2014

Test build #22418 has finished for PR 2983 at commit 4ca62cd.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class UnscaledValue(child: Expression) extends UnaryExpression
    • case class MakeDecimal(child: Expression, precision: Int, scale: Int) extends UnaryExpression
    • case class MutableLiteral(var value: Any, dataType: DataType, nullable: Boolean = true)
    • case class PrecisionInfo(precision: Int, scale: Int)
    • case class DecimalType(precisionInfo: Option[PrecisionInfo]) extends FractionalType
    • final class Decimal extends Ordered[Decimal] with Serializable
    • trait DecimalIsConflicted extends Numeric[Decimal]
    • class DeferredObjectAdapter(oi: ObjectInspector) extends DeferredObject

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22418/

@mateiz
Contributor Author

mateiz commented Oct 29, 2014

Jenkins, test this please

@SparkQA

SparkQA commented Oct 29, 2014

Test build #22443 has started for PR 2983 at commit 4ca62cd.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 29, 2014

Test build #22443 has finished for PR 2983 at commit 4ca62cd.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class UnscaledValue(child: Expression) extends UnaryExpression
    • case class MakeDecimal(child: Expression, precision: Int, scale: Int) extends UnaryExpression
    • case class MutableLiteral(var value: Any, dataType: DataType, nullable: Boolean = true)
    • case class PrecisionInfo(precision: Int, scale: Int)
    • case class DecimalType(precisionInfo: Option[PrecisionInfo]) extends FractionalType
    • final class Decimal extends Ordered[Decimal] with Serializable
    • trait DecimalIsConflicted extends Numeric[Decimal]

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22443/

@mateiz
Contributor Author

mateiz commented Oct 29, 2014

Jenkins, test this please

@SparkQA

SparkQA commented Oct 29, 2014

Test build #22459 has started for PR 2983 at commit 4ca62cd.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 29, 2014

Test build #22459 has finished for PR 2983 at commit 4ca62cd.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class UnscaledValue(child: Expression) extends UnaryExpression
    • case class MakeDecimal(child: Expression, precision: Int, scale: Int) extends UnaryExpression
    • case class MutableLiteral(var value: Any, dataType: DataType, nullable: Boolean = true)
    • case class PrecisionInfo(precision: Int, scale: Int)
    • case class DecimalType(precisionInfo: Option[PrecisionInfo]) extends FractionalType
    • final class Decimal extends Ordered[Decimal] with Serializable
    • trait DecimalIsConflicted extends Numeric[Decimal]

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22459/

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22620/

| "TimestampType" ^^^ TimestampType
)

protected lazy val fixedDecimalType: Parser[DataType] =
Contributor

I think this is technically not required. This parser is only for reading data from old Parquet files that were encoded with old versions of Spark SQL. Hopefully we can drop it completely some day.

Contributor Author

I see. Wouldn't it be less confusing to leave it in for now, though? I can also remove it if you prefer.

@marmbrus
Contributor

marmbrus commented Nov 1, 2014

Hey Matei, this is pretty awesome! A few minor comments and this needs to be merged. Otherwise LGTM.

…r now

Implement CAST to fixed-precision decimal

Add rules for propagating precision through decimal calculations

These work by casting things to Decimal.Unlimited to do the actual
operation, then adding a cast on the result. They will result in more
casts than needed, but on the other hand they avoid having each
arithmetic operator know about decimal precision rules. We might be able
to add more rules later to eliminate some intermediate casts.
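
A minimal sketch of the cast, compute, cast-back idea described in the commit message above, using `java.math.BigDecimal` as a stand-in for the unlimited-precision type. The `Fixed` case class, the `DecimalRulesSketch` object, and the result-type formulas (modeled on the Hive-style convention) are assumptions for illustration. The thread does not spell out the rules, and this is not the patch's code.

```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

// Hypothetical stand-in for a (precision, scale) pair; not Spark's actual class.
final case class Fixed(precision: Int, scale: Int)

object DecimalRulesSketch {
  // Assumed rule for e1 + e2: keep the larger scale and the larger
  // integer part, plus one extra digit for a possible carry.
  def addType(a: Fixed, b: Fixed): Fixed = {
    val scale = math.max(a.scale, b.scale)
    val intDigits = math.max(a.precision - a.scale, b.precision - b.scale)
    Fixed(intDigits + scale + 1, scale)
  }

  // Assumed rule for e1 * e2.
  def multiplyType(a: Fixed, b: Fixed): Fixed =
    Fixed(a.precision + b.precision + 1, a.scale + b.scale)

  // The "cast on the result": round to the target scale, and return None
  // when the rounded value has more digits than the target precision.
  // Returning None on overflow is a choice made for this sketch.
  def cast(v: JBigDecimal, t: Fixed): Option[JBigDecimal] = {
    val rounded = v.setScale(t.scale, RoundingMode.HALF_UP)
    if (rounded.precision() <= t.precision) Some(rounded) else None
  }

  // The operator itself only ever sees unlimited-precision values;
  // the precision rule is applied afterwards by the cast.
  def add(x: JBigDecimal, xt: Fixed, y: JBigDecimal, yt: Fixed): Option[JBigDecimal] =
    cast(x.add(y), addType(xt, yt))
}
```

Under these assumed rules, adding a decimal of precision 5 and scale 2 to one of precision 4 and scale 1 gives a result type of precision 6 and scale 2. The arithmetic code stays unaware of precision, which is the trade-off the commit message describes: more casts, simpler operators.
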

Optimize sums and averages on fixed-precision Decimals
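
One plausible reading of this optimization, given the `UnscaledValue` and `MakeDecimal` expressions listed in the SparkQA output earlier in the thread, is to aggregate the unscaled `Long` values and rebuild the decimal once at the end. The sketch below illustrates that idea with `java.math.BigDecimal`. The `UnscaledSumSketch` object and its helper names are hypothetical, and it assumes every input is exactly representable at the given scale and that the running total fits in a `Long`.

```scala
import java.math.{BigDecimal => JBigDecimal}

object UnscaledSumSketch {
  // Analogue of an "unscaled value" step: value * 10^scale as a Long.
  // setScale without a rounding mode throws ArithmeticException if the
  // value is not exactly representable at that scale, and
  // longValueExact throws if the unscaled value does not fit in a Long.
  def unscaled(v: JBigDecimal, scale: Int): Long =
    v.setScale(scale).unscaledValue().longValueExact()

  // Analogue of a "make decimal" step: rebuild a decimal from a Long.
  def makeDecimal(unscaledValue: Long, scale: Int): JBigDecimal =
    JBigDecimal.valueOf(unscaledValue, scale)

  // Sum using plain Long addition; addExact fails loudly on overflow
  // instead of silently wrapping around.
  def sum(values: Seq[JBigDecimal], scale: Int): JBigDecimal = {
    val total = values.foldLeft(0L)((acc, v) => Math.addExact(acc, unscaled(v, scale)))
    makeDecimal(total, scale)
  }
}
```

The appeal of this shape is that the inner loop does `Long` arithmetic only, and a decimal object is allocated once per result rather than once per row.
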
@SparkQA

SparkQA commented Nov 1, 2014

Test build #22705 has started for PR 2983 at commit 35e6b02.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 1, 2014

Test build #22705 has finished for PR 2983 at commit 35e6b02.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22705/

@SparkQA

SparkQA commented Nov 2, 2014

Test build #504 has started for PR 2983 at commit 35e6b02.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 2, 2014

Test build #504 has finished for PR 2983 at commit 35e6b02.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
