Skip to content

Conversation

@maropu
Copy link
Member

@maropu maropu commented Mar 12, 2017

What changes were proposed in this pull request?

This pr removed unnecessary type conversions per call in Hive: https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala#L116

How was this patch tested?

Existing tests

@maropu
Copy link
Member Author

maropu commented Mar 12, 2017

I'm not sure this makes sense, so could you check? cc: @hvanhovell

@maropu
Copy link
Member Author

maropu commented Mar 12, 2017

I quickly did check performance changes;

public class TestGenericUDF extends GenericUDF {
  @Override
  public ObjectInspector initialize(ObjectInspector[] objectInspectors) throws UDFArgumentException {
      return PrimitiveObjectInspectorFactory.javaLongObjectInspector;
  }

  @Override
  public Object evaluate(DeferredObject[] args) throws HiveException {
      final long a1 = PrimitiveObjectInspectorFactory.javaLongObjectInspector.get(args[0].get());
      final long a2 = PrimitiveObjectInspectorFactory.javaLongObjectInspector.get(args[1].get());
      final long a3 = PrimitiveObjectInspectorFactory.javaLongObjectInspector.get(args[2].get());
      final long a4 = PrimitiveObjectInspectorFactory.javaLongObjectInspector.get(args[3].get());
      return a1 + a2 + a3 + a4;
  }
}
$./bin/spark-shell --master local[1]  --conf spark.sql.shuffle.partitions=1 -v

scala> sql("CREATE TEMPORARY FUNCTION testUdf AS 'hivemall.ftvec.TestGenericUDF'")
scala> :paste
def timer[R](block: => R): R = {
  val t0 = System.nanoTime()
  val result = block
  val t1 = System.nanoTime()
  println("Elapsed time: " + ((t1 - t0 + 0.0) / 1000000000.0)+ "s")
  result
}

scala> spark.range(3000000).createOrReplaceTempView("t")
scala> timer { sql("SELECT testUdf(id, id, id, id) FROM t").queryExecution.executedPlan.execute().foreach(x => {}) }
# performance w/o this patch
Elapsed time: 1.901269167s

# performance w/ this patch
Elapsed time: 0.492860666s

@SparkQA
Copy link

SparkQA commented Mar 12, 2017

Test build #74396 has finished for PR 17264 at commit 6468fde.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

row.get(i, dataSchema(i).dataType),
fieldRefs.get(i).getFieldObjectInspector,
dataSchema(i).dataType))
wrappers(i)(row.get(i, dataSchema(i).dataType))
Copy link
Member

@dongjoon-hyun dongjoon-hyun Mar 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @maropu .
This seems to be improvement, too. Could you add some description about this?
The current description only mentions hiveUDFs.scala.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private[this] val wrappers = dataSchema.zip(structOI.getAllStructFieldRefs().asScala.toSeq)
.map { case (f, i) =>
wrapperFor(i.getFieldObjectInspector, f.dataType)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(my personal taste that might be worth considering..

private[this] val wrappers = dataSchema
  .zip(structOI.getAllStructFieldRefs.asScala.toSeq)
  .map { case (f, i) => wrapperFor(i.getFieldObjectInspector, f.dataType) }

or maybe

private[this] val wrappers = 
  structOI.getAllStructFieldRefs.asScala.toSeq.zip(dataSchema).map { case (ref, field) =>
    wrapperFor(ref.getFieldObjectInspector, field.dataType)
  }

which resembles

val wrappers = ssoi.getAllStructFieldRefs.asScala.zip(structType).map {
case (ref, tpe) => wrapperFor(ref.getFieldObjectInspector, tpe.dataType)
}

)

@maropu
Copy link
Member Author

maropu commented Mar 12, 2017

Thanks for valuable comments! I'll update soon.

@hvanhovell
Copy link
Contributor

@maropu this looks like a nice improvement. It is an idea to check all uses of wrap, and see if we can replace them by wrapperFor?

@maropu
Copy link
Member Author

maropu commented Mar 14, 2017

@hvanhovell I checked all the related code and then I think this pr includes all the place we could replace wrap with wrapperFor.

@SparkQA
Copy link

SparkQA commented Mar 14, 2017

Test build #74537 has finished for PR 17264 at commit efd3628.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Copy link
Contributor

LGTM merging to master. Thanks!

@asfgit asfgit closed this in 6325a2f Mar 14, 2017
@rxin
Copy link
Contributor

rxin commented Mar 14, 2017

In the future can we put the perf result in PR descriptions?

@maropu
Copy link
Member Author

maropu commented Mar 14, 2017

@rxin okay!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants