[SPARK-25993][SQL][TESTS] Add test cases for CREATE EXTERNAL TABLE with subdirectories #27130

Closed
wants to merge 5 commits into from

Contributor

@kevinyu98 kevinyu98 commented Jan 8, 2020

What changes were proposed in this pull request?

This PR adds test cases for the resolution of ORC table locations reported in SPARK-25993,
and adds corresponding test cases for Parquet tables.

Why are the changes needed?

The current behavior is complex; these test suites are designed to prevent accidental behavior changes. This PR is rebased on master; the original PR is 23108.
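
To make the "complex" behavior more concrete, here is a plain-JVM sketch (not Spark code) that contrasts listing only the files directly under a table location with walking its subdirectories. It mirrors the shape of the expectations quoted later in this review, for example rows 1 to 2 versus rows 1 to 6 for the l1 location in the ORC suite. The object name, the file names, and the l1/l2/l3 layout with one file per level are assumptions made for illustration, and the sketch does not claim to show how Spark or Hive actually list files.

```scala
import java.nio.file.{Files, Path}
import java.util.stream.{Stream => JStream}
import scala.jdk.CollectionConverters._ // requires Scala 2.13+

object ListingSketch {
  // Collects the names of regular files from a java.util.stream of paths, then closes it.
  private def fileNames(stream: JStream[Path]): Seq[String] =
    try stream.iterator().asScala
      .filter(p => Files.isRegularFile(p))
      .map(_.getFileName.toString)
      .toList
      .sorted
    finally stream.close()

  // Files directly under `dir`; subdirectories are not descended into.
  def directFiles(dir: Path): Seq[String] = fileNames(Files.list(dir))

  // Files anywhere under `dir`, at any depth.
  def recursiveFiles(dir: Path): Seq[String] = fileNames(Files.walk(dir))

  // Assumed layout: one placeholder data file per directory level.
  def buildLayout(): Path = {
    val top = Files.createTempDirectory("subdir-sketch")
    val l1 = Files.createDirectories(top.resolve("l1"))
    val l2 = Files.createDirectories(l1.resolve("l2"))
    val l3 = Files.createDirectories(l2.resolve("l3"))
    Files.createFile(l1.resolve("rows-1-2.dat"))
    Files.createFile(l2.resolve("rows-3-4.dat"))
    Files.createFile(l3.resolve("rows-5-6.dat"))
    top
  }

  def main(args: Array[String]): Unit = {
    val top = buildLayout()
    val l1 = top.resolve("l1")
    assert(directFiles(top).isEmpty)                 // top directory holds no files itself
    assert(directFiles(l1) == Seq("rows-1-2.dat"))   // direct children only
    assert(recursiveFiles(l1).size == 3)             // all three levels
  }
}
```

Whether a given reader behaves like `directFiles` or `recursiveFiles` for a table location is exactly what the new test cases pin down.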

Does this PR introduce any user-facing change?

No. This adds test cases only.

How was this patch tested?

This is a new test case.

@HyukjinKwon
Member

ok to test

@HyukjinKwon HyukjinKwon changed the title [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERNAL TABLE with subdirectories [SPARK-25993][SQL][TESTS] Add test cases for CREATE EXTERNAL TABLE with subdirectories Jan 8, 2020

SparkQA commented Jan 8, 2020

Test build #116300 has finished for PR 27130 at commit 9722230.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

|LOCATION '${s"${path.getCanonicalPath}"}'""".stripMargin
sql(topDirStatement)
if (parquetConversion == "true") {
checkAnswer(sql("select * from tbl1"), Nil)
Member

Shall we capitalize the SQL statement like SELECT * FROM tbl1?

if (parquetConversion == "true") {
checkAnswer(sql("select * from tbl1"), Nil)
} else {
intercept[IOException](sql("select * from tbl1").show())
Member

We had better check the exception message.

|LOCATION '${s"${path.getCanonicalPath}/l1/"}'""".stripMargin
sql(l1DirStatement)
if (parquetConversion == "true") {
checkAnswer(sql("select * from tbl2"),
Member

SELECT * FROM tbl2.

Contributor Author

changed

checkAnswer(sql("select * from tbl2"),
(1 to 2).map(i => Row(i, i, s"parq$i")))
} else {
intercept[IOException](sql("select * from tbl2").show())
Member

Please check the exception message.

Contributor Author

The whole exception message is
Not a file: file:/Users/qianyangyu/IdeaProjects/spark/target/tmp/spark-abc8c1ad-4a3a-420f-b4fc-58d995be9bb0/l1
I will check the first part, Not a file:.
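
Because the path in the message contains a random temporary directory, only the stable prefix can be asserted. A minimal, self-contained sketch of that pattern follows. The `intercept` helper here is a hypothetical stand-in for the ScalaTest method of the same name, and `readTable` with its path is a made-up substitute for the failing query; neither is the code from this PR.

```scala
import java.io.IOException
import scala.reflect.ClassTag

object InterceptSketch {
  // Hypothetical stand-in for ScalaTest's `intercept`: runs `body` and returns the
  // thrown exception if it is a T; fails if nothing, or something else, is thrown.
  def intercept[T <: Throwable](body: => Any)(implicit ct: ClassTag[T]): T = {
    val caught: Option[Throwable] =
      try { body; None } catch { case t: Throwable => Some(t) }
    caught match {
      case Some(t) if ct.runtimeClass.isInstance(t) => t.asInstanceOf[T]
      case Some(t) =>
        throw new AssertionError(s"Expected ${ct.runtimeClass.getName}, got $t")
      case None =>
        throw new AssertionError(s"Expected ${ct.runtimeClass.getName}, nothing was thrown")
    }
  }

  // Hypothetical stand-in for a query that fails on a directory location.
  def readTable(location: String): Unit =
    throw new IOException(s"Not a file: file:$location")

  def main(args: Array[String]): Unit = {
    val msg = intercept[IOException] {
      readTable("/tmp/example/l1") // made-up path; the real one is random per run
    }.getMessage
    // Assert only the stable prefix, not the environment-specific path.
    assert(msg.contains("Not a file:"))
  }
}
```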

|LOCATION '${s"${path.getCanonicalPath}/l1/l2/"}'""".stripMargin
sql(l2DirStatement)
if (parquetConversion == "true") {
checkAnswer(sql("select * from tbl3"),
Member

SELECT * FROM tbl3.

Contributor Author

Changed.

checkAnswer(sql("select * from tbl3"),
(3 to 4).map(i => Row(i, i, s"parq$i")))
} else {
intercept[IOException](sql("select * from tbl3").show())
Member

Please check the exception message.

Contributor Author

Added the check.

withTempDir { dir =>
try {
hiveClient.runSqlHive("USE default")
hiveClient.runSqlHive(
Member

Do we need to use runSqlHive?

Contributor Author

sure, I will change to sql.

Member

@dongjoon-hyun dongjoon-hyun left a comment

Thank you for continuing to work on this. Sorry for the long wait. This time, I hope we can merge your PR.


SparkQA commented Jan 10, 2020

Test build #116421 has finished for PR 27130 at commit 9bc32ab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

if (parquetConversion == "true") {
checkAnswer(sql("SELECT * FROM tbl1"), Nil)
} else {
val msg = intercept[IOException] {sql("SELECT * FROM tbl1").show()
Member

sql("SELECT * FROM tbl1").show() should be on the next line.

Contributor Author

thanks, changed.

} else {
val msg = intercept[IOException] {sql("SELECT * FROM tbl1").show()
}.getMessage
assert(msg.contains("Not a file:"))
Member

indentation?

Contributor Author

👍

sql(l1DirStatement)
if (parquetConversion == "true") {
checkAnswer(sql("SELECT * FROM tbl2"),
(1 to 2).map(i => Row(i, i, s"parq$i")))
Member

Can we merge 269 and 270 into one line here?

Contributor Author

👍
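
For readers unfamiliar with the expected-row expression in the snippet above, the sketch below shows what `(1 to 2).map(i => Row(i, i, s"parq$i"))` expands to, written on one line as the reviewer suggests. The `Row` case class and its field names are hypothetical stand-ins for Spark's `Row`, fixed to the three columns used in these tests.

```scala
object ExpectedRowsSketch {
  // Hypothetical stand-in for Spark's Row; field names are assumptions.
  final case class Row(c1: Int, c2: Int, c3: String)

  // Builds the expected rows for a range of ids and a file-format prefix.
  def expectedRows(ids: Range, prefix: String): Seq[Row] =
    ids.map(i => Row(i, i, s"$prefix$i"))

  def main(args: Array[String]): Unit = {
    val rows = (1 to 2).map(i => Row(i, i, s"parq$i"))
    assert(rows == Seq(Row(1, 1, "parq1"), Row(2, 2, "parq2")))
    assert(expectedRows(3 to 4, "orc") == Seq(Row(3, 3, "orc3"), Row(4, 4, "orc4")))
  }
}
```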

checkAnswer(sql("SELECT * FROM tbl3"),
(3 to 4).map(i => Row(i, i, s"parq$i")))
} else {
val msg = intercept[IOException] {sql("SELECT * FROM tbl3").show()
Member

sql("SELECT * FROM tbl3").show()?

Contributor Author

moved to next line.

checkAnswer(sql("SELECT * FROM tbl5"),
(1 to 4).map(i => Row(i, i, s"parq$i")))
} else {
val msg = intercept[IOException] {sql("SELECT * FROM tbl5").show()
Member

ditto.

Contributor Author

👍

sql("USE default")
sql(
"""
|CREATE EXTERNAL TABLE hive_orc(
Member

I'm a little confused here.
@kevinyu98, do you want to get a table created by Hive here?
Usually, we use the table name, hive_orc, for that kind of table. Please see https://github.com/apache/spark/pull/27130/files#diff-a8c26a35def87a13e6b59db19d9fb8a1R68 .

Also, you are still using hiveClient.runSqlHive at line 192. I'm wondering what the test target of this PR is.

Contributor Author

@dongjoon-hyun Thanks for pointing this out. I was following other test cases without thinking too much. I have changed the name. I also replaced hiveClient.runSqlHive for the insert statement.


SparkQA commented Jan 12, 2020

Test build #116548 has finished for PR 27130 at commit 2cf3e26.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kevinyu98
Contributor Author

@dongjoon-hyun can we re-test it? I ran HiveParquetSourceSuite and HiveOrcSourceSuite locally, and they work fine. Thanks.

@dilipbiswal
Contributor

retest this please


SparkQA commented Jan 13, 2020

Test build #116657 has finished for PR 27130 at commit 2cf3e26.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@dongjoon-hyun dongjoon-hyun left a comment

Hi, @kevinyu98.
I agree with HiveOrcSourceSuite.scala. Could you update HiveParquetSourceSuite.scala to match HiveOrcSourceSuite.scala? There is no need for them to differ.

@kevinyu98
Contributor Author

@dongjoon-hyun sure, thanks.

val l1DirSqlStatement = s"SELECT * FROM tbl2"
if (convertMetastore) {
checkAnswer(sql(l1DirSqlStatement),
(1 to 2).map(i => Row(i, i, s"orc$i")))
Member

Let's merge lines 250 and 251 into one line, like HiveParquetSourceSuite.

Contributor Author

sure

(1 to 2).map(i => Row(i, i, s"orc$i")))
} else {
checkAnswer(sql(l1DirSqlStatement),
(1 to 6).map(i => Row(i, i, s"orc$i")))
Member

ditto.

Contributor Author

done

val l2DirSqlStatement = s"SELECT * FROM tbl3"
if (convertMetastore) {
checkAnswer(sql(l2DirSqlStatement),
(3 to 4).map(i => Row(i, i, s"orc$i")))
Member

ditto.

(3 to 4).map(i => Row(i, i, s"orc$i")))
} else {
checkAnswer(sql(l2DirSqlStatement),
(3 to 6).map(i => Row(i, i, s"orc$i")))
Member

ditto.

Contributor Author

👍

val wildcardTopDirSqlStatement = s"SELECT * FROM tbl4"
if (convertMetastore) {
checkAnswer(sql(wildcardTopDirSqlStatement),
(1 to 2).map(i => Row(i, i, s"orc$i")))
Member

ditto.

Contributor Author

👍

val wildcardL1DirSqlStatement = s"SELECT * FROM tbl5"
if (convertMetastore) {
checkAnswer(sql(wildcardL1DirSqlStatement),
(1 to 4).map(i => Row(i, i, s"orc$i")))
Member

ditto.

Contributor Author

👍

val wildcardL2SqlStatement = s"SELECT * FROM tbl6"
if (convertMetastore) {
checkAnswer(sql(wildcardL2SqlStatement),
(3 to 6).map(i => Row(i, i, s"orc$i")))
Member

ditto.

checkAnswer(sql(wildcardL2SqlStatement), Nil)
}
}
} finally {
Member

Shall we replace this try..finally with withTable, like HiveParquetSourceSuite?

Contributor Author

👍
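
The withTable suggestion replaces hand-written cleanup in try..finally with a helper that runs the test body and then drops the named tables, even when the body throws. A minimal sketch of that loan pattern follows. It is not Spark's implementation: the in-memory `catalog` set is a hypothetical stand-in for the metastore, and removing a name from it stands in for dropping the table.

```scala
import scala.collection.mutable

object WithTableSketch {
  // Hypothetical in-memory stand-in for a catalog of table names.
  val catalog: mutable.Set[String] = mutable.Set.empty

  // Runs `body`, then removes the named tables whether or not `body` threw.
  def withTable(tableNames: String*)(body: => Unit): Unit =
    try body
    finally tableNames.foreach(catalog.remove)

  def main(args: Array[String]): Unit = {
    withTable("tbl1", "tbl2") {
      catalog += "tbl1"
      catalog += "tbl2"
      assert(catalog.size == 2) // tables exist while the body runs
    }
    assert(catalog.isEmpty)     // cleaned up afterwards

    // Cleanup also happens when the body fails.
    try withTable("tbl3") { catalog += "tbl3"; sys.error("boom") }
    catch { case _: RuntimeException => () }
    assert(catalog.isEmpty)
  }
}
```

Keeping the cleanup inside one helper means every test lists its tables once, at the top, instead of repeating drop statements in a finally block.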


SparkQA commented Jan 17, 2020

Test build #116894 has finished for PR 27130 at commit 0bb628f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Jan 18, 2020

Test build #116967 has finished for PR 27130 at commit 39f271f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Thank you, @kevinyu98 .
Merged to master.
