[SPARK-16674][SQL] Avoid per-record type dispatch in JDBC when reading#14313
HyukjinKwon wants to merge 9 commits into apache:master from
Could you please take a look, @cloud-fan and @yhuai? The same per-record dispatch happens when writing, too. I would like to open a separate PR for writing later. |
|
Test build #62705 has finished for PR 14313 at commit
|
|
Test build #62708 has finished for PR 14313 at commit
|
| (rs: ResultSet, pos: Int) => | ||
| // DateTimeUtils.fromJavaDate does not handle null value, so we need to check it. | ||
| val dateVal = rs.getDate(pos) | ||
| if (dateVal != null) { |
Option(dateVal).map(...).orNull?
I guess this can be on a critical path, so I think we should not introduce extra function calls here.
|
Test build #62760 has finished for PR 14313 at commit
|
| case class ArrayConversion(elementConversion: JDBCConversion) extends JDBCConversion | ||
| // A `JDBCConversion` is responsible for converting a value from `ResultSet` | ||
| // to a value in a field for `InternalRow`. | ||
| private type JDBCConversion = (ResultSet, Int) => Any |
Could you also explain what the two arguments are in the comment?
|
@cloud-fan I just addressed your comments. I added another argument in |
| case object TimestampConversion extends JDBCConversion | ||
| case object BinaryConversion extends JDBCConversion | ||
| case class ArrayConversion(elementConversion: JDBCConversion) extends JDBCConversion | ||
| // A `JDBCConversion` is responsible for converting and setting a value from `ResultSet` |
`JDBCConversion` no longer seems like a good name. Do you have any better ideas?
|
LGTM except some style comments, thanks for working on it! |
|
Test build #62789 has finished for PR 14313 at commit
|
| // A `JDBCValueSetter` is responsible for converting and setting a value from `ResultSet` | ||
| // into a field for `MutableRow`. The last argument `Int` means the index for the | ||
| // value to be set in the row and also used for the value to retrieve from `ResultSet`. | ||
| private type ValueSetter = (ResultSet, MutableRow, Int) => Unit |
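To make the shape of this setter type concrete, here is a minimal, self-contained sketch. It does not use Spark or a real `java.sql.ResultSet`: `FakeResultSet`, `FakeMutableRow`, `intSetter`, `stringSetter`, and `fill` are hypothetical stand-ins invented for illustration. The `pos + 1` offset is an assumption that follows from JDBC column indices being 1-based while row fields are 0-based.

```scala
object SetterSketch {
  // Hypothetical stand-in for java.sql.ResultSet: columns are 1-based, as in JDBC.
  final class FakeResultSet(values: Array[Any]) {
    def getInt(column: Int): Int = values(column - 1).asInstanceOf[Int]
    def getString(column: Int): String = values(column - 1).asInstanceOf[String]
  }

  // Hypothetical stand-in for a mutable row: fields are 0-based.
  final class FakeMutableRow(size: Int) {
    private val fields = new Array[Any](size)
    def update(i: Int, v: Any): Unit = { fields(i) = v }
    def get(i: Int): Any = fields(i)
  }

  // Same shape as the type alias in the diff: (source, target row, field index) => Unit.
  type ValueSetter = (FakeResultSet, FakeMutableRow, Int) => Unit

  // Each setter reads column pos + 1 from the source and writes field pos of the row.
  val intSetter: ValueSetter = (rs, row, pos) => row.update(pos, rs.getInt(pos + 1))
  val stringSetter: ValueSetter = (rs, row, pos) => row.update(pos, rs.getString(pos + 1))

  // Applies the i-th setter to the i-th field; no type matching happens in this loop.
  def fill(rs: FakeResultSet, setters: Array[ValueSetter]): FakeMutableRow = {
    val row = new FakeMutableRow(setters.length)
    var i = 0
    while (i < setters.length) {
      setters(i)(rs, row, i)
      i += 1
    }
    row
  }
}
```

Because the setter writes straight into the row and returns `Unit`, the per-field loop only invokes a function that was already chosen; whether this matches the merged code exactly is not confirmed by the snippet above.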
|
Thank you for your review, @cloud-fan. Sorry for so many nits. I will try to be more careful next time. |
|
Test build #62792 has finished for PR 14313 at commit
|
|
|
||
| /** | ||
| * Maps a StructType to a type tag list. | ||
| * Creates a StructType to setters for each type. |
|
Test build #62801 has finished for PR 14313 at commit
|
|
Test build #62804 has finished for PR 14313 at commit
|
|
Test build #62811 has finished for PR 14313 at commit
|
|
thanks, merging to master! |
What changes were proposed in this pull request?
Currently, `JDBCRDD.compute` does type dispatch for each row to read the appropriate values. This does not have to be done per row, because the schema is already kept in `JDBCRDD`. So, appropriate converters can be created first according to the schema and then applied to each row.
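The idea can be sketched without Spark as follows. This is only an illustration under stated assumptions: `FieldType`, `makeConverter`, and `convertAll` are hypothetical names, and raw rows are modeled as plain `Array[Any]` rather than a JDBC `ResultSet`. The pattern match on the field type runs once per column when the converters are built, and the per-row loop only calls the functions that were already selected.

```scala
// Hypothetical stand-in for a schema's field types; not Spark's DataType.
sealed trait FieldType
case object IntField extends FieldType
case object StringField extends FieldType
case object DoubleField extends FieldType

object ConverterSketch {
  type Converter = Any => Any

  // Type dispatch happens here, once per column, not once per record.
  def makeConverter(t: FieldType): Converter = t match {
    case IntField    => v => v.toString.toInt
    case StringField => v => v.toString
    case DoubleField => v => v.toString.toDouble
  }

  def convertAll(schema: Seq[FieldType], rows: Iterator[Array[Any]]): Iterator[Array[Any]] = {
    // Build the converters up front from the schema.
    val converters = schema.map(makeConverter).toArray
    rows.map { raw =>
      val out = new Array[Any](converters.length)
      var i = 0
      // The per-row loop applies pre-built functions; it never matches on the type.
      while (i < converters.length) {
        out(i) = converters(i)(raw(i))
        i += 1
      }
      out
    }
  }
}
```

For example, `ConverterSketch.convertAll(Seq(IntField, StringField), Iterator(Array[Any]("1", "a")))` yields one row holding the integer `1` and the string `"a"`.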
How was this patch tested?
Existing tests should cover this.