[SPARK-49094][SQL] Fix ignoreCorruptFiles non-functioning for hive orc impl with mergeSchema off #47583
yaooqinn wants to merge 4 commits into apache:master
cc @cloud-fan @dongjoon-hyun @HyukjinKwon thanks in advance
    }
  }

  test("SPARK-49094: ignoreCorruptFiles works for hive orc") {
Shall we mention mergeSchema?
sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcQuerySuite.scala
        SQLConf.IGNORE_CORRUPT_FILES.key -> "false",
        SQLConf.ORC_IMPLEMENTATION.key -> "hive") {
        checkAnswer(spark.read
          .option("mergeSchema", value = false)
To be complete, could you add another test case, SQLConf.ORC_SCHEMA_MERGING_ENABLED.key -> "false" without this line?
Seq(true, false).foreach { mergeSchema =>
  checkAnswer(spark.read
    .option("mergeSchema", value = mergeSchema)
    .option("ignoreCorruptFiles", value = true)
    .orc(basePath), Row(0L, 1))
}

Iterating mergeSchema over true and false also covers your request. I will address it this way.
Unfortunately, no, it's different because the previous code doesn't care of OrcOption. I expect the exact requested test case, @yaooqinn .
> the previous code doesn't care of OrcOption

mergeSchema actually comes from OrcOption in both the if and else code branches, see
Oh, I realized the source of our disagreement. I overlooked the outermost withSQLConf wrapper, which is redundant.
Does everything seem to be okay? The test now tests both mergeSchema on and off.
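The disagreement above turns on the two ways a setting such as mergeSchema can reach the reader: as a per-read option or as a session-level SQL conf. The sketch below is a simplified, hypothetical model of that precedence in plain Scala, not Spark's actual code; `OptionResolution` and `resolveFlag` are names invented for illustration, and the keys passed in are only examples.

```scala
// Simplified, hypothetical model (not Spark's actual code) of resolving a boolean
// read setting: an explicit per-read option wins over the session-level SQL conf.
object OptionResolution {
  def resolveFlag(
      readOptions: Map[String, String], // e.g. values passed via .option(...)
      sessionConf: Map[String, String], // e.g. values set via withSQLConf(...)
      optionKey: String,
      confKey: String,
      default: Boolean = false): Boolean =
    readOptions.get(optionKey)          // 1. explicit read option, if present
      .orElse(sessionConf.get(confKey)) // 2. otherwise the session conf
      .map(_.toBoolean)
      .getOrElse(default)               // 3. otherwise the assumed default
}
```

Under this model, a test that always passes the read option never exercises the session-conf path, which is why a case that sets only the conf covers different ground.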
dongjoon-hyun left a comment
Thank you for fixing this. If we add one more test case, this PR looks good to me.
dongjoon-hyun left a comment
New code looks cleaner and correct. Thank you. Yes, I thought that redundant part was your intention of test coverage. We are all good. :)
BTW, I believe we can convert SPARK-49094 issue to

It makes sense to me. I will bring this to master/3.5/3.4
…c impl with mergeSchema off

### What changes were proposed in this pull request?
ignoreCorruptFiles now applies to all file data sources except for hive orc implementation with mergeSchema off

### Why are the changes needed?
bugfix

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
new tests

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #47583 from yaooqinn/SPARK-49094.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit 6631abc)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Thank you, @yaooqinn.

late LGTM

BTW, let's be more mindful of the behavior change definition. Even if it's a good and safe user-facing change, we should still document it in the
What changes were proposed in this pull request?
Currently, ignoreCorruptFiles applies to all file data sources except the hive orc implementation with mergeSchema off. This PR makes it take effect in that case as well.
Why are the changes needed?
bugfix
Does this PR introduce any user-facing change?
no
How was this patch tested?
new tests
Was this patch authored or co-authored using generative AI tooling?
no
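
To make the intended behavior concrete, here is a minimal sketch in plain Scala of what ignoreCorruptFiles means for schema inference when schemas are not merged. It is a simplified, hypothetical model and not the Spark implementation: `CorruptFileModel`, `FakeFile`, and `inferSchema` are invented names, and the assumption that the schema comes from the first readable file is part of the model.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

// Simplified, hypothetical model (not Spark's actual code): without schema merging,
// take the schema of the first file whose footer can be read.
object CorruptFileModel {
  // A "file" is a path plus either a schema string or the failure raised on read.
  final case class FakeFile(path: String, footer: Try[String])

  @tailrec
  def inferSchema(files: List[FakeFile], ignoreCorruptFiles: Boolean): Option[String] =
    files match {
      case Nil => None // no readable file was found
      case FakeFile(_, Success(schema)) :: _ => Some(schema)
      case FakeFile(_, Failure(_)) :: rest if ignoreCorruptFiles =>
        inferSchema(rest, ignoreCorruptFiles) // skip the corrupt file and continue
      case FakeFile(path, Failure(e)) :: _ =>
        // With the flag off, a corrupt file fails the whole read.
        throw new RuntimeException(s"Could not read footer for file: $path", e)
    }
}
```

In this model, a corrupt file listed ahead of a valid one is skipped when the flag is true and raises an error when it is false, which is the distinction the new tests are meant to check.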