Skip to content

[SPARK-16674][SQL] Avoid per-record type dispatch in JDBC when reading#14313

Closed
HyukjinKwon wants to merge 9 commits intoapache:masterfrom
HyukjinKwon:SPARK-16674
Closed

[SPARK-16674][SQL] Avoid per-record type dispatch in JDBC when reading#14313
HyukjinKwon wants to merge 9 commits intoapache:masterfrom
HyukjinKwon:SPARK-16674

Conversation

@HyukjinKwon
Copy link
Member

What changes were proposed in this pull request?

Currently, JDBCRDD.compute is doing type dispatch for each row to read appropriate values.
It might not have to be done like this because the schema is already kept in JDBCRDD.

So, appropriate converters can be created first according to the schema, and then apply them to each row.

How was this patch tested?

Existing tests should cover this.

@HyukjinKwon
Copy link
Member Author

HyukjinKwon commented Jul 22, 2016

Could you please take a look here @cloud-fan and @yhuai ? This is happening for writing too. I would like to open new one for writing later.

@SparkQA
Copy link

SparkQA commented Jul 22, 2016

Test build #62705 has finished for PR 14313 at commit ad64483.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 22, 2016

Test build #62708 has finished for PR 14313 at commit 5eae0e6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

(rs: ResultSet, pos: Int) =>
// DateTimeUtils.fromJavaDate does not handle null value, so we need to check it.
val dateVal = rs.getDate(pos)
if (dateVal != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Option(dateVal).map(...).orNull?

Copy link
Member Author

@HyukjinKwon HyukjinKwon Jul 24, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this can be a critical path. I think we don't need to introduce extra function calls.

@SparkQA
Copy link

SparkQA commented Jul 24, 2016

Test build #62760 has finished for PR 14313 at commit 5335093.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

case class ArrayConversion(elementConversion: JDBCConversion) extends JDBCConversion
// A `JDBCConversion` is responsible for converting a value from `ResultSet`
// to a value in a field for `InternalRow`.
private type JDBCConversion = (ResultSet, Int) => Any
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also explain what's the 2 arguments in the comment?

@HyukjinKwon
Copy link
Member Author

@cloud-fan I just addressed your comments. I added another argument in JDBCConversion for MutableRow so that we can avoid type-boxing.

case object TimestampConversion extends JDBCConversion
case object BinaryConversion extends JDBCConversion
case class ArrayConversion(elementConversion: JDBCConversion) extends JDBCConversion
// A `JDBCConversion` is responsible for converting and setting a value from `ResultSet`
Copy link
Contributor

@cloud-fan cloud-fan Jul 25, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JDBCConversion seems not a good name now, do you have any better ideas?

@cloud-fan
Copy link
Contributor

LGTM except some style comments, thanks for working on it!

@SparkQA
Copy link

SparkQA commented Jul 25, 2016

Test build #62789 has finished for PR 14313 at commit 8ac66b1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// A `JDBCValueSetter` is responsible for converting and setting a value from `ResultSet`
// into a field for `MutableRow`. The last argument `Int` means the index for the
// value to be set in the row and also used for the value to retrieve from `ResultSet`.
private type ValueSetter = (ResultSet, MutableRow, Int) => Unit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JDBCValueSetter?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure!

@HyukjinKwon
Copy link
Member Author

Thank you for your review @cloud-fan. Sorry for too many nits. I will try to be more careful for the next time.

@SparkQA
Copy link

SparkQA commented Jul 25, 2016

Test build #62792 has finished for PR 14313 at commit a385318.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


/**
* Maps a StructType to a type tag list.
* Creates a StructType to setters for each type.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update this doc?

@SparkQA
Copy link

SparkQA commented Jul 25, 2016

Test build #62801 has finished for PR 14313 at commit c336382.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 25, 2016

Test build #62804 has finished for PR 14313 at commit 486dabd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 25, 2016

Test build #62811 has finished for PR 14313 at commit 25894f1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit asfgit closed this in 7ffd99e Jul 25, 2016
@cloud-fan
Copy link
Contributor

thanks, merging to master!

zzcclp added a commit to zzcclp/spark that referenced this pull request Jul 27, 2016
@HyukjinKwon HyukjinKwon deleted the SPARK-16674 branch January 2, 2018 03:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants