Skip to content

[SPARK-15025][SQL] fix duplicate of PATH key in datasource table options#12804

Closed
xwu0226 wants to merge 6 commits intoapache:masterfrom
xwu0226:SPARK-15025
Closed

[SPARK-15025][SQL] fix duplicate of PATH key in datasource table options#12804
xwu0226 wants to merge 6 commits intoapache:masterfrom
xwu0226:SPARK-15025

Conversation

@xwu0226
Copy link
Contributor

@xwu0226 xwu0226 commented Apr 30, 2016

What changes were proposed in this pull request?

The issue is that when the user provides the path option with uppercase "PATH" key, options contains PATH key and will get into the non-external case in the following code in createDataSourceTables.scala, where a new key "path" is created with a default path.

val optionsWithPath =
      if (!options.contains("path")) {
        isExternal = false
        options + ("path" -> sessionState.catalog.defaultTablePath(tableIdent))
      } else {
        options
      }

So before creating hive table, serdeInfo.parameters will contain both "PATH" and "path" keys and different directories. and Hive table's dataLocation contains the value of "path".

The fix in this PR is to convert options in the code above to CaseInsensitiveMap before checking for containing "path" key.

How was this patch tested?

A testcase is added

@xwu0226
Copy link
Contributor Author

xwu0226 commented May 4, 2016

@yhuai @liancheng @liancheng @hvanhovell Can any of you help take a quick look at this change? Thank you very much!

@yhuai
Copy link
Contributor

yhuai commented May 4, 2016

ok to test.

@SparkQA
Copy link

SparkQA commented May 5, 2016

Test build #57819 has finished for PR 12804 at commit 633c850.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

xwu0226 added 2 commits May 8, 2016 00:11
…ause following up PR12974"

This reverts commit 98a1f804d7343ba77731f9aa400c00f1a26c03fe.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid of calling metadataHive.runSqlHive? For this case, we want to check that there is a single entry in table properties with the key as PATH and the location is the one we want, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yhuai Thanks for your input!. Yes, you are right. I can get the CatalogTable.storage.serdeProperties to check the 'PATH' key. I will modify the test case.

@xwu0226
Copy link
Contributor Author

xwu0226 commented May 9, 2016

@yhuai I updated the testcase to check the key path from CatalogTable.storage.serdeProperties. In this case, we are supposed to see only PATH key-value. Let me know if you have any question. Thanks!

@SparkQA
Copy link

SparkQA commented May 9, 2016

Test build #58167 has finished for PR 12804 at commit 8c3fbe6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 9, 2016

Test build #58166 has finished for PR 12804 at commit 1f6264e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Copy link
Contributor

yhuai commented May 9, 2016

test this please

@SparkQA
Copy link

SparkQA commented May 9, 2016

Test build #58174 has finished for PR 12804 at commit 8c3fbe6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Copy link
Contributor

yhuai commented May 10, 2016

LGTM. Merging to master and branch 2.0.

asfgit pushed a commit that referenced this pull request May 10, 2016
## What changes were proposed in this pull request?
The issue is that when the user provides the path option with uppercase "PATH" key, `options` contains `PATH` key and will get into the non-external case in the following code in `createDataSourceTables.scala`, where a new key "path" is created with a default path.
```
val optionsWithPath =
      if (!options.contains("path")) {
        isExternal = false
        options + ("path" -> sessionState.catalog.defaultTablePath(tableIdent))
      } else {
        options
      }
```
So before creating hive table, serdeInfo.parameters will contain both "PATH" and "path" keys and different directories. and Hive table's dataLocation contains the value of "path".

The fix in this PR is to convert `options` in the code above to `CaseInsensitiveMap` before checking for containing "path" key.

## How was this patch tested?
A testcase is added

Author: xin Wu <xinwu@us.ibm.com>

Closes #12804 from xwu0226/SPARK-15025.

(cherry picked from commit 980bba0)
Signed-off-by: Yin Huai <yhuai@databricks.com>
@asfgit asfgit closed this in 980bba0 May 10, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants