[SPARK-32142][SQL][TESTS] Keep the original tests and codes to avoid potential conflicts in dev #28955
HyukjinKwon wants to merge 1 commit into apache:master
cc @viirya, @dbtsai, @MaxGekk, @cloud-fan
Force-pushed from ffe1583 to 7a36dd3.
val data = (1 to 4).map(i => Tuple1(Option(i.b)))
import testImplicits._
withNestedDataFrame(data.toDF()) { case (inputDF, colName, resultFun) =>
I didn't review these line by line; I'm assuming they just remove the outer `withNestedDataFrame`.
withSQLConf(SQLConf.DATETIME_JAVA8API_ENABLED.key -> java8Api.toString) {
withSQLConf(
  SQLConf.DATETIME_JAVA8API_ENABLED.key -> java8Api.toString,
  SQLConf.LEGACY_PARQUET_REBASE_MODE_IN_WRITE.key -> "CORRECTED") {
There's one diff here.
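For readers unfamiliar with the helper touched in this diff: the change passes two key/value pairs to a single `withSQLConf` call instead of one. The scoped set-and-restore pattern behind such a helper can be sketched roughly as follows. This is only an illustration and assumes nothing about Spark's real implementation; `ToyConf`, `withToyConf`, and the key strings are hypothetical stand-ins, not Spark APIs.

```scala
import scala.collection.mutable

object ConfScopeSketch {
  // Hypothetical stand-in for a session configuration; NOT Spark's SQLConf.
  object ToyConf {
    private val settings = mutable.Map.empty[String, String]
    def get(key: String): Option[String] = settings.get(key)
    def set(key: String, value: String): Unit = settings.update(key, value)
    def unset(key: String): Unit = { settings.remove(key); () }
  }

  // Sets every pair, runs the body, then restores the previous values,
  // even if the body throws.
  def withToyConf(pairs: (String, String)*)(body: => Unit): Unit = {
    // Remember what each key held before the scope was entered.
    val previous = pairs.map { case (k, _) => k -> ToyConf.get(k) }
    pairs.foreach { case (k, v) => ToyConf.set(k, v) }
    try body finally {
      previous.foreach {
        case (k, Some(old)) => ToyConf.set(k, old)
        case (k, None)      => ToyConf.unset(k)
      }
    }
  }
}
```

With a varargs signature like this, adding a second setting is a one-argument change at the call site rather than another level of nesting, e.g. `withToyConf("toy.java8Api" -> "true", "toy.rebaseModeInWrite" -> "CORRECTED") { ... }`.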
protected def withParquetDataFrame(df: DataFrame, testVectorized: Boolean = true)
    (f: DataFrame => Unit): Unit = {
  withTempPath { file =>
  withSQLConf(SQLConf.LEGACY_PARQUET_REBASE_MODE_IN_WRITE.key -> "CORRECTED") {
withTempPath { file =>
  millisData.map(i => Tuple1(Timestamp.valueOf(i))).toDF
    .write.format(dataSourceName).save(file.getCanonicalPath)
  readParquetFile(file.getCanonicalPath) { df =>
From 2 lines to 4 lines? This looks like an exception. Is this inevitable?
Yup. I couldn't find a better way without having another method.
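To illustrate the trade-off discussed here, the following is a minimal sketch of what "another method" might look like: a single helper that composes a config scope with a temporary-path scope, so call sites do not grow an extra level of nesting. It is not the code in this PR. `withProperty`, `withTempPath` (as defined here), and `withPropertyAndTempPath` are hypothetical, and JVM system properties are used only as a stand-in for a SQL configuration.

```scala
import java.io.File
import java.nio.file.Files

object LoanSketch {
  // Stand-in for a config scope: sets a JVM system property and restores it afterwards.
  def withProperty(key: String, value: String)(body: => Unit): Unit = {
    val previous = Option(System.getProperty(key))
    System.setProperty(key, value)
    try body finally {
      previous match {
        case Some(old) => System.setProperty(key, old)
        case None      => System.clearProperty(key)
      }
    }
  }

  // Simplified temp-path scope: hands the body a path that does not exist yet
  // and deletes whatever file was written there once the body finishes.
  def withTempPath(body: File => Unit): Unit = {
    val path = Files.createTempFile("loan-sketch", ".tmp").toFile
    path.delete() // the body receives a fresh, non-existent path
    try body(path) finally path.delete()
  }

  // The combined helper: hides the two-level nesting from call sites.
  def withPropertyAndTempPath(key: String, value: String)(body: File => Unit): Unit =
    withProperty(key, value) { withTempPath(body) }
}
```

The cost of such a helper is one more method to maintain, which is the reason given above for keeping the explicit nesting instead.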
Thank you for refactoring. It looks neater. I guess you are assuming a backport to your internal branch, but Apache Spark will not backport this to branch-3.0, and this only adds an additional commit. So "minimize the diff" as a follow-up to the existing commits doesn't make sense for Apache Spark.
In short, this is just a normal commit doing refactoring for future PRs. So, please remove "minimizes the diff" from the title and PR description. That's not a benefit to the Apache Spark master branch (AS-IS) because the commit log always grows monotonically.
Also, we had better use a new JIRA ID because all of those (SPARK-25556, SPARK-17636, SPARK-31026, SPARK-31060) already shipped as part of 3.0.0. Otherwise, we will lose traceability for this improvement commit because it will not land on branch-3.0.
Test build #124647 has finished for PR 28955 at commit
retest this please
Test build #124690 has finished for PR 28955 at commit
retest this please
Oh sure @dongjoon-hyun. Let's use a new JIRA ID. But just to give you a bit more context, I said "minimize the diff" because it will minimize the diff at #28761 (comment), and in case other code is matched against it. I was thinking about backporting this, @dongjoon-hyun, to remove the unnecessary diff when you backport. It's a test-only PR, so I guess it's fine to backport. For example, you can backport a test from This isn't related to any internal branch stuff :-). It's just from #28761 (comment).
BTW @dbtsai, let's consider blocking a PR even when the outstanding comments are only about tests, particularly when a release is close. It seems this can become an issue, as in this case, and I definitely want to avoid a situation like the current one, which complicates backporting and matching with other code.
Thank you for updating.
Test build #124700 has finished for PR 28955 at commit
…potential conflicts in dev

### What changes were proposed in this pull request?

This PR proposes to partially revert the tests and some code changes from #27728 without touching any behaviors.

Most of the test changes are restored to their state before #27728 by combining `withNestedDataFrame` and `withParquetDataFrame`.

Basically, it addresses the comments at #27728 (comment), and my own comment in another PR at #28761 (comment).

### Why are the changes needed?

For maintenance purposes and to avoid potential conflicts during backports, and also in case other code is matched against this.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Manually tested.

Closes #28955 from HyukjinKwon/SPARK-25556-followup.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(cherry picked from commit 8194d9e)
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
Thank you guys. Merged to master and branch-3.0.
What changes were proposed in this pull request?
This PR proposes to partially revert the tests and some code changes from #27728 without touching any behaviors.
Most of the test changes are restored to their state before #27728 by combining `withNestedDataFrame` and `withParquetDataFrame`.
Basically, it addresses the comments at #27728 (comment), and my own comment in another PR at #28761 (comment).
Why are the changes needed?
For maintenance purposes and to avoid potential conflicts during backports, and also in case other code is matched against this.
Does this PR introduce any user-facing change?
No, dev-only.
How was this patch tested?
Manually tested.