
[SPARK-19318][SQL] Fix to treat JDBC connection properties specified by the user in case-sensitive manner. #16891

Conversation

sureshthalamati (Contributor)

What changes were proposed in this pull request?

The test was failing because the property “oracle.jdbc.mapDateToTimestamp” set by the test was being converted to all lower case, while the Oracle database expects this property name in a case-sensitive manner.

The test passed in previous releases because connection properties were sent exactly as the user specified them. A fix that handled all options uniformly in a case-insensitive manner also converted the JDBC connection properties to lower case.

This PR enhances CaseInsensitiveMap to keep track of the original case-sensitive input keys, and uses those keys when creating the connection properties that are passed to the JDBC connection.

An alternative approach, PR #16847, passes the original input keys to the JDBC data source by adding a check in the data source class and handling case-insensitivity in the JDBC source code.

How was this patch tested?

Added new test cases to JDBCSuite and OracleIntegrationSuite. Ran the Docker integration tests on my laptop; all tests passed successfully.
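The idea behind the fix can be sketched as follows. This is a hypothetical, simplified model of the approach, not Spark's actual CaseInsensitiveMap: option lookups ignore case, while the user's original key casing is retained so it can be forwarded to the JDBC driver unchanged.

```scala
// Simplified sketch (not Spark's actual class): lookups are case-insensitive,
// but the user's original key casing is retained for the JDBC driver.
class CaseInsensitiveMap(val originalMap: Map[String, String]) {
  private val keyLowerCasedMap: Map[String, String] =
    originalMap.map { case (k, v) => (k.toLowerCase, v) }

  // Spark options such as "dbtable" match regardless of the caller's casing.
  def get(key: String): Option[String] = keyLowerCasedMap.get(key.toLowerCase)

  // Driver-specific properties like "oracle.jdbc.mapDateToTimestamp" keep
  // their exact casing when connection properties are built from this map.
  def asCaseSensitive: Map[String, String] = originalMap
}
```

With a map like this, `get("DBTABLE")` and `get("dbtable")` resolve to the same value, while `asCaseSensitive` still yields the keys exactly as the user typed them.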

@SparkQA

SparkQA commented Feb 11, 2017

Test build #72731 has finished for PR 16891 at commit 41d3362.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

.executeUpdate()
conn.prepareStatement("INSERT INTO datetime VALUES ("
+ "1, {d '1991-11-09'}, {ts '1996-01-01 01:23:45'})").executeUpdate()
conn.prepareStatement("CREATE TABLE datetime1 (id NUMBER(10), d DATE, t TIMESTAMP)")
Contributor:

do we need to clean up these 2 tables?

Contributor Author:

Thank you for reviewing the patch. I think cleanup is not required; these tables are not persistent across test runs, and they are cleaned up when the Docker container is removed at the end of the test. Currently I did not notice any setup in afterAll() to do it after the test.

I also moved the creation of the temporary views to the same place, to keep them together; any future tests can use these tables as well.

test("SPARK-19318: connection property keys should be case-sensitive") {
sql(
s"""
|CREATE TEMPORARY TABLE datetime
Contributor:

use CREATE TEMPORARY VIEW please, CREATE TEMPORARY TABLE is deprecated

Contributor Author:

done.

|USING org.apache.spark.sql.jdbc
|OPTIONS (url '$jdbcUrl', dbTable 'datetime', oracle.jdbc.mapDateToTimestamp 'false')
""".stripMargin.replaceAll("\n", " "))
val row = sql("SELECT * FROM datetime where id = 1").collect()(0)
Contributor:

nit: use .head instead of (0)

Contributor Author:

done.

@@ -23,16 +23,30 @@ package org.apache.spark.sql.catalyst.util
class CaseInsensitiveMap(map: Map[String, String]) extends Map[String, String]
Contributor:

why not we just expose this original map?

Contributor Author:

Good question. For some reason I was hung up on making only the case-sensitive keys available to the caller. Changed the code to expose the original map, which made the code simpler. Thank you very much for the suggestion.

@@ -75,7 +75,7 @@ class JDBCWriteSuite extends SharedSQLContext with BeforeAndAfter {
s"""
|CREATE OR REPLACE TEMPORARY VIEW PEOPLE1
|USING org.apache.spark.sql.jdbc
|OPTIONS (url '$url1', dbtable 'TEST.PEOPLE1', user 'testUser', password 'testPass')
|OPTIONS (url '$url1', dbTable 'TEST.PEOPLE1', user 'testUser', password 'testPass')
Contributor:

why change this? I think the spark specific properties should still be case insensitive

Contributor Author:

Yes, they should be case-insensitive; this is just an additional case-sensitivity test case.
During testing of my fix I did not notice a test in the write suite that checks case-insensitivity of data source table options during insert, so I flipped "dbtable" to "dbTable" to make sure case-insensitivity is not broken in this case.

Member:

This is not obvious. We might remove it in the future. How about adding a dedicated test case in JDBCWriteSuite.scala? https://github.com/sureshthalamati/spark/blob/a1560742f2196ba04c14ad50e955bdcc839c4ad8/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala#L249-L259

You can make the key name more obvious. Something like DbTaBlE

Contributor Author:

Sure. Added a new test case.

@sureshthalamati sureshthalamati force-pushed the jdbc_case_senstivity_props_fix-SPARK-19318 branch from 41d3362 to a156074 Compare February 13, 2017 07:38
@SparkQA

SparkQA commented Feb 13, 2017

Test build #72807 has started for PR 16891 at commit a156074.

@sureshthalamati (Contributor Author)

Thank you for reviewing the PR @cloud-fan. I addressed the review comments; please let me know if any further changes are required.

@cloud-fan (Contributor)

retest this please

*/
class CaseInsensitiveMap(map: Map[String, String]) extends Map[String, String]
class CaseInsensitiveMap(val caseSensitiveMap: Map[String, String]) extends Map[String, String]
Contributor:

nit: let's name it originalMap, and rename baseMap to keyLowercasedMap

Contributor Author:

done.

override def + [B1 >: String](kv: (String, B1)): Map[String, B1] =
baseMap + kv.copy(_1 = kv._1.toLowerCase)
override def +[B1 >: String](kv: (String, B1)): Map[String, B1] = {
new CaseInsensitiveMap(caseSensitiveMap + kv.copy(_2 = kv._2.asInstanceOf[String]))
Contributor:

why kv.copy(_2 = kv._2.asInstanceOf[String])?

Contributor Author:

The copy is unnecessary, but I do need the cast; otherwise I get a compiler error:

    Error:(34, 47) type mismatch;
     found   : (String, B1)
     required: (String, String)
        new CaseInsensitiveMap(caseSensitiveMap + kv)
                                                  ^

I am thinking of changing it to the following:

    new CaseInsensitiveMap(caseSensitiveMap + kv.asInstanceOf[(String, String)])

Thank you for the feedback.

Contributor:

how about

class CaseInsensitiveMap[T](originalMap: Map[String, T]) extends Map[String, T] {
  ...
  override def +[B1 >: T](kv: (String, B1)): Map[String, B1] = {
    new CaseInsensitiveMap(originalMap + kv)
  }
}

Contributor Author:

I made this change and it worked. It does touch more files; I hope that is OK.
Thanks a lot for the suggestion.
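For illustration, the generic version discussed above can be sketched as a standalone class. The names here are illustrative, and unlike Spark's real class this sketch does not extend scala.collection.Map, to stay independent of collection-API versions; the key point is that with the value type generic in T, `originalMap + kv` is already a `Map[String, B1]`, so the asInstanceOf cast disappears.

```scala
// Hypothetical standalone sketch of a generic CaseInsensitiveMap[T].
class CaseInsensitiveMap[T](val originalMap: Map[String, T]) {
  private val keyLowerCasedMap: Map[String, T] =
    originalMap.map { case (k, v) => (k.toLowerCase, v) }

  // Lookups ignore the caller's key casing.
  def get(key: String): Option[T] = keyLowerCasedMap.get(key.toLowerCase)

  // For any B1 >: T, Map[String, T] + (String, B1) yields Map[String, B1],
  // so no cast is needed.
  def +[B1 >: T](kv: (String, B1)): CaseInsensitiveMap[B1] =
    new CaseInsensitiveMap(originalMap + kv)

  // Removal is case-insensitive, using String.equalsIgnoreCase as suggested
  // in the review.
  def -(key: String): CaseInsensitiveMap[T] =
    new CaseInsensitiveMap(originalMap.filter { case (k, _) => !k.equalsIgnoreCase(key) })
}
```

Note that `+` preserves the added key's original casing in `originalMap`, and `-` removes the matching entry regardless of how the key is cased.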

def this(params: Map[String, String]) = {
this(params match {
case cMap: CaseInsensitiveMap => cMap
case _ => new CaseInsensitiveMap(params)
Contributor:

I think this is a general problem, let's create an object CaseInsensitiveMap and put this logic there.

Contributor Author:

Moved it to the CaseInsensitiveMap companion object and marked the constructor private to avoid any nested creation of case-insensitive maps.
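The companion-object pattern described here can be sketched as below. In Spark the class extends Map[String, T], so the factory can pattern match a plain Map input; this simplified standalone sketch (names are illustrative) models the same "never nest" guarantee with a private constructor plus factory overloads.

```scala
// Hypothetical sketch: the constructor is private, so all construction goes
// through the companion object, which never wraps a wrapper.
class CaseInsensitiveMap[T] private (val originalMap: Map[String, T]) {
  private val keyLowerCasedMap: Map[String, T] =
    originalMap.map { case (k, v) => (k.toLowerCase, v) }

  def get(key: String): Option[T] = keyLowerCasedMap.get(key.toLowerCase)
}

object CaseInsensitiveMap {
  // Wrap a plain map exactly once.
  def apply[T](params: Map[String, T]): CaseInsensitiveMap[T] =
    new CaseInsensitiveMap(params)

  // An already case-insensitive map is returned as-is, never re-wrapped.
  def apply[T](params: CaseInsensitiveMap[T]): CaseInsensitiveMap[T] = params
}
```

Callers then write `CaseInsensitiveMap(params)` and get the same instance back if `params` was already case-insensitive.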

)

assert(new JDBCOptions(parameters).asConnectionProperties.keySet()
.toArray()(0) == "oracle.jdbc.mapDateToTimestamp")
Contributor:

how about we test it with xxx.keySet().contains("xxx")?

Contributor Author:

Removed as part of the test case cleanup.


override def iterator: Iterator[(String, String)] = baseMap.iterator

override def -(key: String): Map[String, String] = baseMap - key.toLowerCase
override def -(key: String): Map[String, String] = {
new CaseInsensitiveMap(caseSensitiveMap.filterKeys(k => k.toLowerCase != key.toLowerCase))
Contributor:

nit: String.equalsIgnoreCase can help here.

Contributor Author:

done.

assert(connProps.get("connTimeOut") == "60")
assert(caseInsensitiveMap1.get("dbtable").get == "t1")

// remove key from case-insensitive map
Contributor:

Key removal is case-insensitive; this test doesn't reflect that.

Contributor Author:

done.

@@ -925,4 +925,53 @@ class JDBCSuite extends SparkFunSuite
assert(res.generatedRows.isEmpty)
assert(res.outputRows === foobarCnt :: Nil)
}

test("SPARK-19318: Connection properties keys should be case-sensitivie.") {
Contributor:

can you simplify this test a bit? e.g.

    val parameters = Map("dbTAblE" -> "t1", "customKey" -> "a-value")

    def testJdbcOptions(options: JDBCOptions): Unit = {
      // Spark JDBC data source options are case-insensitive
      assert(options.table == "t1")
      // When we convert them to properties, they should be case-sensitive
      assert(options.asProperties.get("customkey") == null)
      assert(options.asProperties.get("customKey") == "a-value")
      assert(options.asConnectionProperties.get("customkey") == null)
      assert(options.asConnectionProperties.get("customKey") == "a-value")
    }

    testJdbcOptions(new JDBCOptions(parameters))
    testJdbcOptions(new JDBCOptions(new CaseInsensitiveMap(parameters)))
    // then test add/remove of key-value pairs on the case-insensitive map ...

Contributor:

then we can remove https://github.com/apache/spark/pull/16891/files#diff-dc4b58851b084b274df6fe6b189db84dR960

Contributor Author:

Thank you, that looks much better. Updated the test case.

@SparkQA

SparkQA commented Feb 13, 2017

Test build #72820 has finished for PR 16891 at commit a156074.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class CaseInsensitiveMap(val caseSensitiveMap: Map[String, String]) extends Map[String, String]

assert(row1.getInt(0) == 1)
assert(row1.getDate(1).equals(Date.valueOf("1991-11-09")))
assert(row1.getTimestamp(2).equals(Timestamp.valueOf("1996-01-01 01:23:45")))
}
Member:

After addressing all the comments, I will run the docker tests in my local environment. Thanks!

Contributor Author:

Thank you.

@sureshthalamati sureshthalamati force-pushed the jdbc_case_senstivity_props_fix-SPARK-19318 branch from a156074 to 9e31ec3 Compare February 14, 2017 09:12
@SparkQA

SparkQA commented Feb 14, 2017

Test build #72866 has finished for PR 16891 at commit 9e31ec3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class FileStreamOptions(parameters: CaseInsensitiveMap[String]) extends Logging

@sureshthalamati (Contributor Author)

Thank you for the feedback @cloud-fan @gatorsmile. I addressed the review comments; please let me know if any other changes are required.

|CREATE TEMPORARY VIEW people_view
|USING org.apache.spark.sql.jdbc
|OPTIONS (uRl '$url1', DbTaBlE 'TEST.PEOPLE1', User 'testUser', PassWord 'testPass')
""".stripMargin.replaceAll("\n", " "))
Member:

The indents are not right. Could you fix all of them?

Member:

      s"""
        |CREATE TEMPORARY VIEW people_view
        |USING org.apache.spark.sql.jdbc
        |OPTIONS (uRl '$url1', DbTaBlE 'TEST.PEOPLE1', User 'testUser', PassWord 'testPass')
       """.stripMargin.replaceAll("\n", " "))

Contributor Author:

Fixed. Thanks

@gatorsmile (Member)

I ran the docker tests on my local computer. Now, finally, all the tests pass! :)

@cloud-fan (Contributor)

LGTM

@SparkQA

SparkQA commented Feb 14, 2017

Test build #72892 has finished for PR 16891 at commit 27338c9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor)

thanks, merging to master!

@asfgit asfgit closed this in f48c5a5 Feb 14, 2017
@sureshthalamati (Contributor Author)

Thank you @cloud-fan , @gatorsmile

cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017
…by the user in case-sensitive manner.


Author: sureshthalamati <suresh.thalamati@gmail.com>

Closes apache#16891 from sureshthalamati/jdbc_case_senstivity_props_fix-SPARK-19318.
asfgit pushed a commit that referenced this pull request Sep 19, 2017
…t case failure: `: General data types to be mapped to Oracle`

## What changes were proposed in this pull request?

This PR is backport of #16891 to Spark 2.1.

## How was this patch tested?

unit tests

Author: Yuming Wang <wgyumg@gmail.com>

Closes #19259 from wangyum/SPARK-22041-BACKPORT-2.1.