Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-8635][SQL] improve performance of CatalystTypeConverters #7018

Closed
wants to merge 2 commits into from

Conversation

cloud-fan
Copy link
Contributor

In CatalystTypeConverters.createToCatalystConverter, we add special handling for primitive types. We can apply this strategy to more places to improve performance.

@cloud-fan
Copy link
Contributor Author

the benchmark (ScalaUdf will convert from catalyst to scala and back again):

case class Floor(child: Expression) extends UnaryExpression with Predicate {
  override def toString = s"Floor $child"

  override def eval(input: InternalRow): Any = {
    child.eval(input) match {
      case null => null
      case s: Seq[Int] => s.sum
    }
  }
}

object T {
  def benchmark(count: Int, expr: Expression): Unit = {
    var i = 0
    val row = new GenericRow(Array[Any]((1 to 10).toSeq))
    val s = System.currentTimeMillis()
    while (i < count) {
      expr.eval(row)
      i += 1
    }
    val e = System.currentTimeMillis()

    println (s"${expr.getClass.getSimpleName}  -- ${e - s} ms")
  }
  def main(args: Array[String]) {
    def func(s: Seq[Int]) = s.sum
    val attr = BoundReference(0, ArrayType(IntegerType), true)
    val udf0 = ScalaUdf(func _, IntegerType, attr :: Nil)
    val udf1 = Floor(attr)

    benchmark(1000000, udf0)
    benchmark(1000000, udf0)
    benchmark(1000000, udf0)

    benchmark(1000000, udf1)
    benchmark(1000000, udf1)
    benchmark(1000000, udf1)
  }
}

before:
ScalaUdf -- 321 ms
ScalaUdf -- 313 ms
ScalaUdf -- 232 ms
Floor -- 40 ms
Floor -- 7 ms
Floor -- 7 ms

after:
ScalaUdf -- 73 ms
ScalaUdf -- 26 ms
ScalaUdf -- 23 ms
Floor -- 34 ms
Floor -- 7 ms
Floor -- 7 ms

@@ -258,16 +273,13 @@ object CatalystTypeConverters {
toScala(row(column).asInstanceOf[InternalRow])
}

private object StringConverter extends CatalystTypeConverter[Any, String, Any] {
private object StringConverter extends CatalystTypeConverter[Any, String, UTF8String] {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The internal type of StringType should always be UTF8String.

@@ -90,7 +90,7 @@ private[sql] object FrequentItems extends Logging {
(name, originalSchema.fields(index).dataType)
}

val freqItems = df.select(cols.map(Column(_)) : _*).rdd.aggregate(countMaps)(
val freqItems = df.select(cols.map(Column(_)) : _*).internalRowRdd.aggregate(countMaps)(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we calculate singlePassFreqItems, we don't need to convert catalyst types to scala types before calculation. DataFrame.rdd is RDD[Row] and we need RDD[InternalRow] here.

@cloud-fan
Copy link
Contributor Author

cc @rxin @marmbrus

@SparkQA
Copy link

SparkQA commented Jun 25, 2015

Test build #35787 has finished for PR 7018 at commit 326c82c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 25, 2015

Test build #35789 has finished for PR 7018 at commit 8b16630.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* Typical use case would be converting a collection of rows that have the same schema. You will
* call this function once to get a converter, and apply it to every row.
*/
private[sql] def createToScalaConverter(dataType: DataType): Any => Any = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you update the StructConverter to use createToScalaConverter?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

never mind, the StructConverter also need toCatalystConverter.

@davies
Copy link
Contributor

davies commented Jun 26, 2015

LGTM, only one minor comment.

@davies
Copy link
Contributor

davies commented Jun 26, 2015

merged this into master, thanks!

@asfgit asfgit closed this in 1a79f0e Jun 26, 2015
@cloud-fan cloud-fan deleted the converter branch June 26, 2015 05:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants