-
Notifications
You must be signed in to change notification settings - Fork 28k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-25993][SQL][TESTS] Add test cases for CREATE EXTERNAL TABLE with subdirectories #27130
Conversation
ok to test |
Test build #116300 has finished for PR 27130 at commit
|
|LOCATION '${s"${path.getCanonicalPath}"}'""".stripMargin | ||
sql(topDirStatement) | ||
if (parquetConversion == "true") { | ||
checkAnswer(sql("select * from tbl1"), Nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we capitalize the SQL statement like SELECT * FROM tbl1
?
if (parquetConversion == "true") { | ||
checkAnswer(sql("select * from tbl1"), Nil) | ||
} else { | ||
intercept[IOException](sql("select * from tbl1").show()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had better check the exception message.
|LOCATION '${s"${path.getCanonicalPath}/l1/"}'""".stripMargin | ||
sql(l1DirStatement) | ||
if (parquetConversion == "true") { | ||
checkAnswer(sql("select * from tbl2"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SELECT * FROM tbl2
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed
checkAnswer(sql("select * from tbl2"), | ||
(1 to 2).map(i => Row(i, i, s"parq$i"))) | ||
} else { | ||
intercept[IOException](sql("select * from tbl2").show()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check the exception message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the whole exception message is
Not a file: file:/Users/qianyangyu/IdeaProjects/spark/target/tmp/spark-abc8c1ad-4a3a-420f-b4fc-58d995be9bb0/l1
, I will check the first part Not a file:
.
|LOCATION '${s"${path.getCanonicalPath}/l1/l2/"}'""".stripMargin | ||
sql(l2DirStatement) | ||
if (parquetConversion == "true") { | ||
checkAnswer(sql("select * from tbl3"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SELECT * FROM tbl3
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed.
checkAnswer(sql("select * from tbl3"), | ||
(3 to 4).map(i => Row(i, i, s"parq$i"))) | ||
} else { | ||
intercept[IOException](sql("select * from tbl3").show()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check the exception message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added the checking
withTempDir { dir => | ||
try { | ||
hiveClient.runSqlHive("USE default") | ||
hiveClient.runSqlHive( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to use runSqlHive
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, I will change to sql.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for keeping working on this. Sorry for your long waiting. At this time, I hope we can merge your PR.
Test build #116421 has finished for PR 27130 at commit
|
if (parquetConversion == "true") { | ||
checkAnswer(sql("SELECT * FROM tbl1"), Nil) | ||
} else { | ||
val msg = intercept[IOException] {sql("SELECT * FROM tbl1").show() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sql("SELECT * FROM tbl1").show()
seems to need to be in the next line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, changed.
} else { | ||
val msg = intercept[IOException] {sql("SELECT * FROM tbl1").show() | ||
}.getMessage | ||
assert(msg.contains("Not a file:")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indentation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
sql(l1DirStatement) | ||
if (parquetConversion == "true") { | ||
checkAnswer(sql("SELECT * FROM tbl2"), | ||
(1 to 2).map(i => Row(i, i, s"parq$i"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we merge 269 and 270 into one line here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
checkAnswer(sql("SELECT * FROM tbl3"), | ||
(3 to 4).map(i => Row(i, i, s"parq$i"))) | ||
} else { | ||
val msg = intercept[IOException] {sql("SELECT * FROM tbl3").show() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sql("SELECT * FROM tbl3").show()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved to next line.
checkAnswer(sql("SELECT * FROM tbl5"), | ||
(1 to 4).map(i => Row(i, i, s"parq$i"))) | ||
} else { | ||
val msg = intercept[IOException] {sql("SELECT * FROM tbl5").show() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
sql("USE default") | ||
sql( | ||
""" | ||
|CREATE EXTERNAL TABLE hive_orc( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little confused here.
@kevinyu98 . Do you want to get a table created by Hive here?
Usually, we use the table name, hive_orc
, for that table. Please see https://github.com/apache/spark/pull/27130/files#diff-a8c26a35def87a13e6b59db19d9fb8a1R68 .
And, you still using hiveClient.runSqlHive
at line 192. I'm wondering what is the test target in this PR~.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dongjoon-hyun Thanks for pointing out this. I was using other test cases without thinking too much. I have changed the name. I also replaced the hiveClient.runSqlHive for the insert stmt.
Test build #116548 has finished for PR 27130 at commit
|
@dongjoon-hyun can we re-test it? I ran |
retest this please |
Test build #116657 has finished for PR 27130 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @kevinyu98 .
I agree with HiveOrcSourceSuite.scala
. Could you update HiveParquetSourceSuite.scala
like HiveOrcSourceSuite.scala
? There is no need to be different.
@dongjoon-hyun sure, thanks. |
val l1DirSqlStatement = s"SELECT * FROM tbl2" | ||
if (convertMetastore) { | ||
checkAnswer(sql(l1DirSqlStatement), | ||
(1 to 2).map(i => Row(i, i, s"orc$i"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make line 250 and 251 as one line like HiveParquetSourceSuite
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
(1 to 2).map(i => Row(i, i, s"orc$i"))) | ||
} else { | ||
checkAnswer(sql(l1DirSqlStatement), | ||
(1 to 6).map(i => Row(i, i, s"orc$i"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
val l2DirSqlStatement = s"SELECT * FROM tbl3" | ||
if (convertMetastore) { | ||
checkAnswer(sql(l2DirSqlStatement), | ||
(3 to 4).map(i => Row(i, i, s"orc$i"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
(3 to 4).map(i => Row(i, i, s"orc$i"))) | ||
} else { | ||
checkAnswer(sql(l2DirSqlStatement), | ||
(3 to 6).map(i => Row(i, i, s"orc$i"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
val wildcardTopDirSqlStatement = s"SELECT * FROM tbl4" | ||
if (convertMetastore) { | ||
checkAnswer(sql(wildcardTopDirSqlStatement), | ||
(1 to 2).map(i => Row(i, i, s"orc$i"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
val wildcardL1DirSqlStatement = s"SELECT * FROM tbl5" | ||
if (convertMetastore) { | ||
checkAnswer(sql(wildcardL1DirSqlStatement), | ||
(1 to 4).map(i => Row(i, i, s"orc$i"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
val wildcardL2SqlStatement = s"SELECT * FROM tbl6" | ||
if (convertMetastore) { | ||
checkAnswer(sql(wildcardL2SqlStatement), | ||
(3 to 6).map(i => Row(i, i, s"orc$i"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
checkAnswer(sql(wildcardL2SqlStatement), Nil) | ||
} | ||
} | ||
} finally { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we convert this try..finally
with withTable
like HiveParquetSourceSuite
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Test build #116894 has finished for PR 27130 at commit
|
Test build #116967 has finished for PR 27130 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Thank you, @kevinyu98 .
Merged to master.
What changes were proposed in this pull request?
This PR aims to add these test cases for resolution of ORC table location reported by SPARK-25993
also add corresponding test cases for Parquet table.
Why are the changes needed?
The current behavior is complex, this test case suites are designed to prevent the accidental behavior change. This pr is rebased on master, the original pr is 23108
Does this PR introduce any user-facing change?
No. This adds test cases only.
How was this patch tested?
This is a new test case.