
[SPARK-19318][SQL] Fix to treat JDBC connection properties specified by the user in case-sensitive manner. #16891

Conversation

sureshthalamati (Contributor)

What changes were proposed in this pull request?

The test was failing because the property “oracle.jdbc.mapDateToTimestamp” set by the test was being converted to all lower case, while the Oracle database expects this property name in a case-sensitive manner.

The test passed in previous releases because connection properties were sent exactly as the user specified them. A fix that handled all options uniformly in a case-insensitive manner also converted the JDBC connection properties to lower case.

This PR enhances CaseInsensitiveMap to keep track of the original case-sensitive input keys, and uses those keys when creating the connection properties that are passed to the JDBC connection.

An alternative approach, PR #16847, passes the original input keys to the JDBC data source by adding a check in the data source class and handling case-insensitivity in the JDBC source code.

How was this patch tested?

Added new test cases to JDBCSuite and OracleIntegrationSuite. Ran the Docker integration tests on my laptop; all tests passed successfully.
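The idea behind the fix can be sketched as follows. This is a hypothetical, simplified model of the approach, not Spark's actual CaseInsensitiveMap: option lookups ignore case, while the user's original key casing is retained so it can be forwarded to the JDBC driver unchanged.

```scala
// Simplified sketch (not Spark's actual class): lookups are case-insensitive,
// but the user's original key casing is retained for the JDBC driver.
class CaseInsensitiveMap(val originalMap: Map[String, String]) {
  private val keyLowerCasedMap: Map[String, String] =
    originalMap.map { case (k, v) => (k.toLowerCase, v) }

  // Spark options such as "dbtable" match regardless of the caller's casing.
  def get(key: String): Option[String] = keyLowerCasedMap.get(key.toLowerCase)

  // Driver-specific properties like "oracle.jdbc.mapDateToTimestamp" keep
  // their exact casing when connection properties are built from this map.
  def asCaseSensitive: Map[String, String] = originalMap
}
```

With a map like this, `get("DBTABLE")` and `get("dbtable")` resolve to the same value, while `asCaseSensitive` still yields the keys exactly as the user typed them.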

@SparkQA

SparkQA commented Feb 11, 2017

Test build #72731 has finished for PR 16891 at commit 41d3362.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

.executeUpdate()
conn.prepareStatement("INSERT INTO datetime VALUES ("
+ "1, {d '1991-11-09'}, {ts '1996-01-01 01:23:45'})").executeUpdate()
conn.prepareStatement("CREATE TABLE datetime1 (id NUMBER(10), d DATE, t TIMESTAMP)")
Contributor:

do we need to clean up these 2 tables?

Contributor Author:

Thank you for reviewing the patch. I think cleanup is not required; these tables are not persistent across test runs, and they are cleaned up when the Docker container is removed at the end of the test. Currently I did not notice any setup in afterAll() to do it after the test.

I also moved the creation of the temporary views to the same place, to keep them together; any future tests can use these tables as well.

test("SPARK-19318: connection property keys should be case-sensitive") {
sql(
s"""
|CREATE TEMPORARY TABLE datetime
Contributor:

use CREATE TEMPORARY VIEW please, CREATE TEMPORARY TABLE is deprecated

Contributor Author:

done.

|USING org.apache.spark.sql.jdbc
|OPTIONS (url '$jdbcUrl', dbTable 'datetime', oracle.jdbc.mapDateToTimestamp 'false')
""".stripMargin.replaceAll("\n", " "))
val row = sql("SELECT * FROM datetime where id = 1").collect()(0)
Contributor:

nit: use .head instead of (0)

Contributor Author:

done.

@@ -23,16 +23,30 @@ package org.apache.spark.sql.catalyst.util
class CaseInsensitiveMap(map: Map[String, String]) extends Map[String, String]
Contributor:

why not we just expose this original map?

Contributor Author:

Good question. For some reason I was hung up on making only the case-sensitive keys available to the caller. Changed the code to expose the original map, which made the code simpler. Thank you very much for the suggestion.

@@ -75,7 +75,7 @@ class JDBCWriteSuite extends SharedSQLContext with BeforeAndAfter {
s"""
|CREATE OR REPLACE TEMPORARY VIEW PEOPLE1
|USING org.apache.spark.sql.jdbc
|OPTIONS (url '$url1', dbtable 'TEST.PEOPLE1', user 'testUser', password 'testPass')
|OPTIONS (url '$url1', dbTable 'TEST.PEOPLE1', user 'testUser', password 'testPass')
Contributor:

why change this? I think the spark specific properties should still be case insensitive

Contributor Author:

Yes, they should be case-insensitive; this is just an additional case-sensitivity test case.
During testing of my fix I did not notice a test in the write suite that checks case-insensitivity of data source table options during insert, so I flipped "dbtable" to "dbTable" to make sure case-insensitivity is not broken in this case.

Member:

This is not obvious. We might remove it in the future. How about adding a dedicated test case in JDBCWriteSuite.scala? https://github.com/sureshthalamati/spark/blob/a1560742f2196ba04c14ad50e955bdcc839c4ad8/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala#L249-L259

You can make the key name more obvious. Something like DbTaBlE

Contributor Author:

Sure. Added a new test case.

@sureshthalamati sureshthalamati force-pushed the jdbc_case_senstivity_props_fix-SPARK-19318 branch from 41d3362 to a156074 Compare February 13, 2017 07:38
@SparkQA

SparkQA commented Feb 13, 2017

Test build #72807 has started for PR 16891 at commit a156074.

@sureshthalamati (Contributor Author)

Thank you for reviewing the PR @cloud-fan. I addressed the review comments; please let me know if any further changes are required.

@cloud-fan (Contributor)

retest this please

*/
class CaseInsensitiveMap(map: Map[String, String]) extends Map[String, String]
class CaseInsensitiveMap(val caseSensitiveMap: Map[String, String]) extends Map[String, String]
Contributor:

nit: let's name it originalMap, and rename baseMap to keyLowercasedMap

Contributor Author:

done.

override def + [B1 >: String](kv: (String, B1)): Map[String, B1] =
baseMap + kv.copy(_1 = kv._1.toLowerCase)
override def +[B1 >: String](kv: (String, B1)): Map[String, B1] = {
new CaseInsensitiveMap(caseSensitiveMap + kv.copy(_2 = kv._2.asInstanceOf[String]))
Contributor:

why kv.copy(_2 = kv._2.asInstanceOf[String])?

Contributor Author:

The copy is unnecessary, but I do need the cast; otherwise I get a compiler error:

    Error:(34, 47) type mismatch;
     found   : (String, B1)
     required: (String, String)
        new CaseInsensitiveMap(caseSensitiveMap + kv)
                                                  ^

I am thinking of changing it to the following:

    new CaseInsensitiveMap(caseSensitiveMap + kv.asInstanceOf[(String, String)])

Thank you for the feedback.

Contributor:

how about

class CaseInsensitiveMap[T](originalMap: Map[String, T]) extends Map[String, T] {
  ...
  override def +[B1 >: T](kv: (String, B1)): Map[String, B1] = {
    new CaseInsensitiveMap(originalMap + kv)
  }
}

Contributor Author:

I made this change and it worked. It does touch more files; I hope that is OK.
Thanks a lot for the suggestion.
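For illustration, the generic version discussed above can be sketched as a standalone class. The names here are illustrative, and unlike Spark's real class this sketch does not extend scala.collection.Map, to stay independent of collection-API versions; the key point is that with the value type generic in T, `originalMap + kv` is already a `Map[String, B1]`, so the asInstanceOf cast disappears.

```scala
// Hypothetical standalone sketch of a generic CaseInsensitiveMap[T].
class CaseInsensitiveMap[T](val originalMap: Map[String, T]) {
  private val keyLowerCasedMap: Map[String, T] =
    originalMap.map { case (k, v) => (k.toLowerCase, v) }

  // Lookups ignore the caller's key casing.
  def get(key: String): Option[T] = keyLowerCasedMap.get(key.toLowerCase)

  // For any B1 >: T, Map[String, T] + (String, B1) yields Map[String, B1],
  // so no cast is needed.
  def +[B1 >: T](kv: (String, B1)): CaseInsensitiveMap[B1] =
    new CaseInsensitiveMap(originalMap + kv)

  // Removal is case-insensitive, using String.equalsIgnoreCase as suggested
  // in the review.
  def -(key: String): CaseInsensitiveMap[T] =
    new CaseInsensitiveMap(originalMap.filter { case (k, _) => !k.equalsIgnoreCase(key) })
}
```

Note that `+` preserves the added key's original casing in `originalMap`, and `-` removes the matching entry regardless of how the key is cased.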

def this(params: Map[String, String]) = {
this(params match {
case cMap: CaseInsensitiveMap => cMap
case _ => new CaseInsensitiveMap(params)
Contributor:

I think this is a general problem, let's create an object CaseInsensitiveMap and put this logic there.

Contributor Author:

Moved it to the CaseInsensitiveMap companion object and marked the constructor private to avoid any nested creation of case-insensitive maps.
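The companion-object pattern described here can be sketched as below. In Spark the class extends Map[String, T], so the factory can pattern match a plain Map input; this simplified standalone sketch (names are illustrative) models the same "never nest" guarantee with a private constructor plus factory overloads.

```scala
// Hypothetical sketch: the constructor is private, so all construction goes
// through the companion object, which never wraps a wrapper.
class CaseInsensitiveMap[T] private (val originalMap: Map[String, T]) {
  private val keyLowerCasedMap: Map[String, T] =
    originalMap.map { case (k, v) => (k.toLowerCase, v) }

  def get(key: String): Option[T] = keyLowerCasedMap.get(key.toLowerCase)
}

object CaseInsensitiveMap {
  // Wrap a plain map exactly once.
  def apply[T](params: Map[String, T]): CaseInsensitiveMap[T] =
    new CaseInsensitiveMap(params)

  // An already case-insensitive map is returned as-is, never re-wrapped.
  def apply[T](params: CaseInsensitiveMap[T]): CaseInsensitiveMap[T] = params
}
```

Callers then write `CaseInsensitiveMap(params)` and get the same instance back if `params` was already case-insensitive.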

)

assert(new JDBCOptions(parameters).asConnectionProperties.keySet()
.toArray()(0) == "oracle.jdbc.mapDateToTimestamp")
Contributor:

how about we test it with xxx.keySet().contains("xxx")?

Contributor Author:

Removed as part of the test case cleanup.


override def iterator: Iterator[(String, String)] = baseMap.iterator

override def -(key: String): Map[String, String] = baseMap - key.toLowerCase
override def -(key: String): Map[String, String] = {
new CaseInsensitiveMap(caseSensitiveMap.filterKeys(k => k.toLowerCase != key.toLowerCase))
Contributor:

nit: String.equalsIgnoreCase can help here.

Contributor Author:

done.

assert(connProps.get("connTimeOut") == "60")
assert(caseInsensitiveMap1.get("dbtable").get == "t1")

// remove key from case-insensitive map
Contributor:

Key removal is case-insensitive; this test doesn't reflect that.

Contributor Author:

done.

@@ -925,4 +925,53 @@ class JDBCSuite extends SparkFunSuite
assert(res.generatedRows.isEmpty)
assert(res.outputRows === foobarCnt :: Nil)
}

test("SPARK-19318: Connection properties keys should be case-sensitivie.") {
Contributor:

can you simplify this test a bit? e.g.

    val parameters = Map("dbTAblE" -> "t1", "customKey" -> "a-value")

    def testJdbcOptions(options: JDBCOptions): Unit = {
      // Spark JDBC data source options are case-insensitive
      assert(options.table == "t1")
      // When we convert them to properties, they should be case-sensitive
      assert(options.asProperties.get("customkey") == null)
      assert(options.asProperties.get("customKey") == "a-value")
      assert(options.asConnectionProperties.get("customkey") == null)
      assert(options.asConnectionProperties.get("customKey") == "a-value")
    }

    testJdbcOptions(new JDBCOptions(parameters))
    testJdbcOptions(new JDBCOptions(new CaseInsensitiveMap(parameters)))
    // then test add/remove of key-value pairs on the case-insensitive map ...

Contributor:

then we can remove https://github.com/apache/spark/pull/16891/files#diff-dc4b58851b084b274df6fe6b189db84dR960

Contributor Author:

Thank you, that looks much better. Updated the test case.

@SparkQA

SparkQA commented Feb 13, 2017

Test build #72820 has finished for PR 16891 at commit a156074.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class CaseInsensitiveMap(val caseSensitiveMap: Map[String, String]) extends Map[String, String]

assert(row1.getInt(0) == 1)
assert(row1.getDate(1).equals(Date.valueOf("1991-11-09")))
assert(row1.getTimestamp(2).equals(Timestamp.valueOf("1996-01-01 01:23:45")))
}
Member:

After addressing all the comments, I will run the docker tests in my local environment. Thanks!

Contributor Author:

Thank you.

@sureshthalamati sureshthalamati force-pushed the jdbc_case_senstivity_props_fix-SPARK-19318 branch from a156074 to 9e31ec3 Compare February 14, 2017 09:12
@SparkQA

SparkQA commented Feb 14, 2017

Test build #72866 has finished for PR 16891 at commit 9e31ec3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class FileStreamOptions(parameters: CaseInsensitiveMap[String]) extends Logging

@sureshthalamati (Contributor Author)

Thank you for the feedback @cloud-fan @gatorsmile. I addressed the review comments; please let me know if any other changes are required.

|CREATE TEMPORARY VIEW people_view
|USING org.apache.spark.sql.jdbc
|OPTIONS (uRl '$url1', DbTaBlE 'TEST.PEOPLE1', User 'testUser', PassWord 'testPass')
""".stripMargin.replaceAll("\n", " "))
Member:

The indents are not right. Could you fix all of them?

Member:

      s"""
        |CREATE TEMPORARY VIEW people_view
        |USING org.apache.spark.sql.jdbc
        |OPTIONS (uRl '$url1', DbTaBlE 'TEST.PEOPLE1', User 'testUser', PassWord 'testPass')
       """.stripMargin.replaceAll("\n", " "))

Contributor Author:

Fixed. Thanks

@gatorsmile (Member)

I ran the docker tests on my local computer. Now, finally, all the tests pass! :)

@cloud-fan (Contributor)

LGTM

@SparkQA

SparkQA commented Feb 14, 2017

Test build #72892 has finished for PR 16891 at commit 27338c9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor)

thanks, merging to master!

@asfgit asfgit closed this in f48c5a5 Feb 14, 2017
@sureshthalamati (Contributor Author)

Thank you @cloud-fan , @gatorsmile

cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017
…by the user in case-sensitive manner.


Author: sureshthalamati <suresh.thalamati@gmail.com>

Closes apache#16891 from sureshthalamati/jdbc_case_senstivity_props_fix-SPARK-19318.
asfgit pushed a commit that referenced this pull request Sep 19, 2017
…t case failure: `: General data types to be mapped to Oracle`

## What changes were proposed in this pull request?

This PR is backport of #16891 to Spark 2.1.

## How was this patch tested?

unit tests

Author: Yuming Wang <wgyumg@gmail.com>

Closes #19259 from wangyum/SPARK-22041-BACKPORT-2.1.