[SPARK-19318][SQL] Fix to treat JDBC connection properties specified by the user in case-sensitive manner. #16891
Conversation
Test build #72731 has finished for PR 16891 at commit
    .executeUpdate()
  conn.prepareStatement("INSERT INTO datetime VALUES ("
    + "1, {d '1991-11-09'}, {ts '1996-01-01 01:23:45'})").executeUpdate()
  conn.prepareStatement("CREATE TABLE datetime1 (id NUMBER(10), d DATE, t TIMESTAMP)")
do we need to clean up these 2 tables?
Thank you for reviewing the patch. I think cleanup is not required; these tables are not persistent across test runs, and they are cleaned up when the docker container is removed at the end of the test. Also, I did not notice any setup in afterAll() that cleans up after the test.
I moved the creation of the temporary views up to the same place as well, to keep them together, so any future tests can also use these tables.
  test("SPARK-19318: connection property keys should be case-sensitive") {
    sql(
      s"""
         |CREATE TEMPORARY TABLE datetime
please use CREATE TEMPORARY VIEW, CREATE TEMPORARY TABLE is deprecated
done.
         |USING org.apache.spark.sql.jdbc
         |OPTIONS (url '$jdbcUrl', dbTable 'datetime', oracle.jdbc.mapDateToTimestamp 'false')
       """.stripMargin.replaceAll("\n", " "))
    val row = sql("SELECT * FROM datetime where id = 1").collect()(0)
nit: use .head instead of (0)
done.
@@ -23,16 +23,30 @@ package org.apache.spark.sql.catalyst.util
class CaseInsensitiveMap(map: Map[String, String]) extends Map[String, String]
why not we just expose this original map?
Good question. For some reason I was hung up on making only the case-sensitive keys available to the caller. Changed the code to expose the original map; it made the code simpler. Thank you very much for the suggestion.
@@ -75,7 +75,7 @@ class JDBCWriteSuite extends SharedSQLContext with BeforeAndAfter {
      s"""
        |CREATE OR REPLACE TEMPORARY VIEW PEOPLE1
        |USING org.apache.spark.sql.jdbc
-       |OPTIONS (url '$url1', dbtable 'TEST.PEOPLE1', user 'testUser', password 'testPass')
+       |OPTIONS (url '$url1', dbTable 'TEST.PEOPLE1', user 'testUser', password 'testPass')
why change this? I think the spark specific properties should still be case insensitive
Yes, they should be case-insensitive. This is just an additional case-sensitivity test case.
While testing my fix I did not notice any test in the write suite that checks case-sensitivity for a data source table during insert. I flipped the casing of "dbTable" to make sure case-insensitivity is not broken in this case.
This is not obvious; we might remove it in the future. How about adding a dedicated test case in JDBCWriteSuite.scala? https://github.com/sureshthalamati/spark/blob/a1560742f2196ba04c14ad50e955bdcc839c4ad8/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala#L249-L259
You can make the key name more obvious, something like DbTaBlE.
sure. Added a new test case.
Test build #72807 has started for PR 16891 at commit
Thank you for reviewing the PR @cloud-fan. Addressed the review comments, please let me know if it requires any further changes.
retest this please
 */
-class CaseInsensitiveMap(map: Map[String, String]) extends Map[String, String]
+class CaseInsensitiveMap(val caseSensitiveMap: Map[String, String]) extends Map[String, String]
nit: let's name it originalMap, and rename baseMap to keyLowercasedMap
done.
-  override def + [B1 >: String](kv: (String, B1)): Map[String, B1] =
-    baseMap + kv.copy(_1 = kv._1.toLowerCase)
+  override def +[B1 >: String](kv: (String, B1)): Map[String, B1] = {
+    new CaseInsensitiveMap(caseSensitiveMap + kv.copy(_2 = kv._2.asInstanceOf[String]))
why kv.copy(_2 = kv._2.asInstanceOf[String])?
copy is unnecessary, but I do need the cast, otherwise I get a compiler error:
Error:(34, 47) type mismatch;
 found   : (String, B1)
 required: (String, String)
    new CaseInsensitiveMap(caseSensitiveMap + kv)
                                              ^
I am thinking of changing it to the following:
new CaseInsensitiveMap(caseSensitiveMap + kv.asInstanceOf[(String, String)])
Thank you for the feedback.
how about
class CaseInsensitiveMap[T](originalMap: Map[String, T]) extends Map[String, T] {
  ...
  override def +[B1 >: T](kv: (String, B1)): Map[String, B1] = {
    new CaseInsensitiveMap(originalMap + kv)
  }
}
I made this change and it worked. It does touch more files; I hope that is ok.
Thanks a lot for the suggestion.
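The generic approach settled on above can be condensed into a small standalone sketch (hypothetical names such as CaseInsensitiveMapSketch; Spark's real class additionally extends Map[String, T]). The point is that Scala's immutable Map defines `+` for any value type `B1 >: V`, so the generic type parameter makes the `asInstanceOf` cast unnecessary:

```scala
// Minimal sketch of the generic idea: the value type parameter T lets
// `originalMap + kv` type-check directly, with no asInstanceOf cast.
class CaseInsensitiveMapSketch[T](val originalMap: Map[String, T]) {
  // Lookups go through a lower-cased copy of the keys.
  private val keyLowerCasedMap: Map[String, T] =
    originalMap.map { case (k, v) => (k.toLowerCase, v) }

  def get(key: String): Option[T] = keyLowerCasedMap.get(key.toLowerCase)

  // Map[K, +V].+ accepts any V1 >: V, so this compiles without a cast.
  def +[B1 >: T](kv: (String, B1)): CaseInsensitiveMapSketch[B1] =
    new CaseInsensitiveMapSketch(originalMap + kv)
}
```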
  def this(params: Map[String, String]) = {
    this(params match {
      case cMap: CaseInsensitiveMap => cMap
      case _ => new CaseInsensitiveMap(params)
I think this is a general problem, let's create an object CaseInsensitiveMap and put this logic there.
Moved it to a CaseInsensitiveMap companion object and marked the constructor private to avoid any nested creation of case-insensitive maps.
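The private-constructor-plus-factory pattern described here could look like the following sketch (names are assumptions; Spark's real class extends Map and pattern matches on the incoming map rather than overloading apply):

```scala
// Sketch: a private constructor forces callers through the companion object,
// which hands back an existing wrapper unchanged instead of nesting wrappers.
class CIWrapperSketch[T] private (val originalMap: Map[String, T]) {
  def get(key: String): Option[T] =
    originalMap.collectFirst { case (k, v) if k.equalsIgnoreCase(key) => v }
}

object CIWrapperSketch {
  // A plain map gets wrapped exactly once...
  def apply[T](params: Map[String, T]): CIWrapperSketch[T] =
    new CIWrapperSketch(params)

  // ...while an already-wrapped map is returned as-is.
  def apply[T](params: CIWrapperSketch[T]): CIWrapperSketch[T] = params
}
```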
    )
    assert(new JDBCOptions(parameters).asConnectionProperties.keySet()
      .toArray()(0) == "oracle.jdbc.mapDateToTimestamp")
how about we test it with xxx.keySet().contains("xxx")?
removed as part of test case cleanup.
  override def iterator: Iterator[(String, String)] = baseMap.iterator

-  override def -(key: String): Map[String, String] = baseMap - key.toLowerCase
+  override def -(key: String): Map[String, String] = {
+    new CaseInsensitiveMap(caseSensitiveMap.filterKeys(k => k.toLowerCase != key.toLowerCase))
nit: String.equalsIgnoreCase can help here.
done.
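The equalsIgnoreCase variant agreed on here can be shown as a tiny standalone helper (hypothetical name, not Spark's actual method):

```scala
// Sketch: case-insensitive key removal via String.equalsIgnoreCase, keeping
// the original spelling of the keys that remain.
def removeKeyIgnoreCase[T](originalMap: Map[String, T], key: String): Map[String, T] =
  originalMap.filter { case (k, _) => !k.equalsIgnoreCase(key) }
```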
    assert(connProps.get("connTimeOut") == "60")
    assert(caseInsensitiveMap1.get("dbtable").get == "t1")

    // remove key from case-insensitive map
key removal is case-insensitive; this test doesn't reflect it.
done.
@@ -925,4 +925,53 @@ class JDBCSuite extends SparkFunSuite
    assert(res.generatedRows.isEmpty)
    assert(res.outputRows === foobarCnt :: Nil)
  }

  test("SPARK-19318: Connection properties keys should be case-sensitivie.")
can you simplify this test a bit? e.g.

val parameters = Map("dbTAblE" -> "t1", "customKey" -> "a-value")

def testJdbcOptions(options: JDBCOptions): Unit = {
  // Spark JDBC data source options are case-insensitive
  assert(options.table == "t1")
  // When we convert it to properties, it should be case-sensitive.
  assert(options.asProperties.get("customkey") == null)
  assert(options.asProperties.get("customKey") == "a-value")
  assert(options.asConnectionProperties.get("customkey") == null)
  assert(options.asConnectionProperties.get("customKey") == "a-value")
}

testJdbcOptions(new JDBCOptions(parameters))
testJdbcOptions(new JDBCOptions(new CaseInsensitiveMap(parameters)))

then we can test add/remove key-value from the case-insensitive map ...
then we can remove https://github.com/apache/spark/pull/16891/files#diff-dc4b58851b084b274df6fe6b189db84dR960
Thank you, that looks much better. Updated the test case.
Test build #72820 has finished for PR 16891 at commit
    assert(row1.getInt(0) == 1)
    assert(row1.getDate(1).equals(Date.valueOf("1991-11-09")))
    assert(row1.getTimestamp(2).equals(Timestamp.valueOf("1996-01-01 01:23:45")))
  }
After addressing all the comments, I will run the docker tests in my local environment. Thanks!
Thank you.
… the user specified options in case-sensitive manner.
… generic type parameter
Test build #72866 has finished for PR 16891 at commit
Thank you for the feedback @cloud-fan @gatorsmile. Addressed the review comments, please let me know if it requires any other changes.
        |CREATE TEMPORARY VIEW people_view
        |USING org.apache.spark.sql.jdbc
        |OPTIONS (uRl '$url1', DbTaBlE 'TEST.PEOPLE1', User 'testUser', PassWord 'testPass')
      """.stripMargin.replaceAll("\n", " "))
The indents are not right. Could you fix all of them?
s"""
|CREATE TEMPORARY VIEW people_view
|USING org.apache.spark.sql.jdbc
|OPTIONS (uRl '$url1', DbTaBlE 'TEST.PEOPLE1', User 'testUser', PassWord 'testPass')
""".stripMargin.replaceAll("\n", " "))
Fixed. Thanks
I ran the docker tests on my local computer. Now, finally, all the tests can pass! :)
LGTM
Test build #72892 has finished for PR 16891 at commit
thanks, merging to master!
Thank you @cloud-fan , @gatorsmile
…t case failure: `: General data types to be mapped to Oracle` ## What changes were proposed in this pull request? This PR is backport of #16891 to Spark 2.1. ## How was this patch tested? unit tests Author: Yuming Wang <wgyumg@gmail.com> Closes #19259 from wangyum/SPARK-22041-BACKPORT-2.1.
What changes were proposed in this pull request?
The reason for the test failure is that the property “oracle.jdbc.mapDateToTimestamp” set by the test was getting converted to all lower case, while the Oracle database expects this property in a case-sensitive manner.
This test was passing in previous releases because connection properties were sent exactly as the user specified them. The fix that handled all options uniformly in a case-insensitive manner also converted the JDBC connection properties to lower case.
This PR enhances CaseInsensitiveMap to keep track of the case-sensitive input keys, and uses those when creating the connection properties that are passed to the JDBC connection.
An alternative approach, PR #16847, is to pass the original input keys to the JDBC data source by adding a check in the data source class and handling case-insensitivity in the JDBC source code.
How was this patch tested?
Added new test cases to JdbcSuite and OracleIntegrationSuite. Ran the docker integration tests on my laptop; all tests passed successfully.
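The behavior the PR aims for can be illustrated with a simplified standalone sketch (hypothetical names, not Spark's actual implementation): option lookup on the Spark side stays case-insensitive, while the key spelling handed to the JDBC driver is preserved.

```scala
import java.util.Properties

// Sketch: case-insensitive lookup for Spark options, original-case keys for
// the driver-side connection properties.
object JdbcCaseSketch {
  val userOptions = Map(
    "dbTable" -> "datetime",
    "oracle.jdbc.mapDateToTimestamp" -> "false")

  // Spark-side lookup: compare keys ignoring case.
  def lookup(key: String): Option[String] =
    userOptions.collectFirst { case (k, v) if k.equalsIgnoreCase(key) => v }

  // Driver-side: build Properties from the original map so case survives,
  // instead of from a lower-cased copy (the bug this PR fixes).
  def connectionProperties: Properties = {
    val props = new Properties()
    userOptions.foreach { case (k, v) => props.setProperty(k, v) }
    props
  }
}
```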