Support more unit tests from Spark 3.3/3.4 #1336

Closed
zhouyuan opened this issue Apr 12, 2023 · 5 comments
Labels
enhancement New feature or request

Comments

@zhouyuan
Contributor

zhouyuan commented Apr 12, 2023

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Gluten has ported most of the SQL-based unit tests from the Spark 3.2 branch. However, upstream Spark has added more unit tests in the 3.3 and 3.4 branches, and Gluten does not yet run those new tests.

Describe the solution you'd like
Gluten should also run the new tests from Spark 3.3/3.4.

The existing unit tests ported from Spark 3.2 should be refactored into common unit tests:

  • For Spark 3.2, Gluten should run the common unit tests.
  • For Spark 3.3, Gluten should run the common unit tests plus the new tests added in 3.3.
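A minimal sketch of the proposed layout, assuming each test is identified only by its name: one shared set of common tests, plus a per-version set of additions. `TestPlan`, `testsFor`, `TestPlanExample`, and the test names are hypothetical illustrations and do not reflect Gluten's actual test-settings classes.

```scala
// Hypothetical model of the "common + per-version" test layout proposed above.
// Not Gluten's real API; tests are identified by name only.
final case class TestPlan(
    common: Set[String],                       // tests shared by every Spark version
    versionSpecific: Map[String, Set[String]]  // extra tests per Spark version, e.g. "3.3"
) {
  // Tests to run for a given Spark version: the common set plus that
  // version's additions. An unknown version falls back to the common set.
  def testsFor(sparkVersion: String): Set[String] =
    common ++ versionSpecific.getOrElse(sparkVersion, Set.empty[String])
}

object TestPlanExample {
  // Hypothetical test names, for illustration only.
  val plan: TestPlan = TestPlan(
    common = Set("common-test-a", "common-test-b"),
    versionSpecific = Map(
      "3.2" -> Set.empty[String],
      "3.3" -> Set("new-in-3.3-test")
    )
  )

  def main(args: Array[String]): Unit = {
    println(plan.testsFor("3.2").toList.sorted) // List(common-test-a, common-test-b)
    println(plan.testsFor("3.3").toList.sorted) // List(common-test-a, common-test-b, new-in-3.3-test)
  }
}
```

Keeping the common set separate means a test only has to be ported once; each version module lists just its delta.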

Describe alternatives you've considered
N/A

Additional context
N/A

CC: @zzcclp

@zhouyuan zhouyuan added the enhancement New feature or request label Apr 12, 2023
@zhouyuan
Contributor Author

zhouyuan commented Apr 13, 2023

Here's the list of tests that differ between Spark 3.3 and Spark 3.2:
https://gist.github.com/zhouyuan/8c41cb1b579b3ca5bb5879ff7260c139
So there are two kinds of changes:

  • Newly added tests: put the new code in the Spark 3.3 module.
  • Tests modified relative to Spark 3.2 because of API changes:
    1) For removed unit tests, disable them in the common module and enable them only for Spark 3.2.
    2) For modified tests, refactor them so that part moves into the common UT, part into Spark 3.2, and part into Spark 3.3.
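The classification above can be sketched as a set comparison over test names, assuming the names from both Spark versions are available (for example, extracted from the gist). `TestDiff`, `split`, and the sample names are hypothetical. Note that tests whose name is unchanged but whose body was modified cannot be detected by name alone and still need a manual review of the diff.

```scala
// Hypothetical helper: split test names from two Spark versions into the
// groups described above. Only name-level changes are detected; modified
// test bodies with unchanged names end up in `common` and need manual review.
object TestDiff {
  final case class Split(
      common: Set[String],       // present in both versions -> common module
      addedInNew: Set[String],   // only in the newer version -> e.g. the Spark 3.3 module
      removedInNew: Set[String]  // only in the older version -> enable for Spark 3.2 only
  )

  def split(older: Set[String], newer: Set[String]): Split =
    Split(
      common = older intersect newer,
      addedInNew = newer diff older,
      removedInNew = older diff newer
    )

  def main(args: Array[String]): Unit = {
    // Hypothetical test names, for illustration only.
    val spark32 = Set("test-a", "test-b", "test-removed")
    val spark33 = Set("test-a", "test-b", "test-new")
    val s = split(spark32, spark33)
    println(s.common.toList.sorted)       // List(test-a, test-b)
    println(s.addedInNew.toList.sorted)   // List(test-new)
    println(s.removedInNew.toList.sorted) // List(test-removed)
  }
}
```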

@rui-mo
Contributor

rui-mo commented Jun 19, 2023

Excluded "should be able to resolve a persistent view" from the Spark 3.3 tests in #1996 due to the issue below.

[Screenshot of the issue]

@zhouyuan
Contributor Author

The tests below need to be fixed for Spark 3.3:

execution/adaptive/GlutenAdaptiveQueryExecSuite.scala
execution/benchmarks/ParquetReadBenchmark.scala
execution/datasources/json/GlutenJsonSuite.scala
```
- Various partition value types
- Various inferred partition value types
- Various partition value types
- Various inferred partition value types
- SPARK-32908: maximum target error in percentile_approx

- SPARK-36825, SPARK-36854: year-month/day-time intervals written and read as INT32/INT64
- support batch reads for schema
- SPARK-36182: read TimestampNTZ as TimestampLTZ
- SPARK-36797: Union should resolve nested columns as top-level columns
- SPARK-37371: UnionExec should support columnar if all children support columnar
- SPARK-36280: Remove redundant aliases after RewritePredicateSubquery
- SPARK-36182: read TimestampNTZ as TimestampLTZ
- SPARK-39833: pushed filters with project without filter columns
- SPARK-36825, SPARK-36854: year-month/day-time intervals written and read as INT32/INT64
- support batch reads for schema
- SPARK-36794: Ignore duplicated key when building relation for semi/anti hash join
```

@PengleiShi
Contributor

Spark 3.4.1 is now a stable release. Will Gluten support 3.4 in the near future?

@zhouyuan
Contributor Author

Closing this issue; new issues have been opened to track the remaining work.

3 participants