[VL] Failures need fix for UTs which are newly added in vanilla spark3.3 #2169

Open
3 of 63 tasks
yma11 opened this issue Jul 1, 2023 · 1 comment
Labels
bug Something isn't working

Comments

yma11 (Contributor) commented Jul 1, 2023

  • SPARK-35675: EnsureRequirements remove shuffle should respect PartitioningCollection
    --------GlutenBroadcastJoinSuite-------
  • replace partial hash aggregate with sort aggregate
  • replace partial and final hash aggregate together with sort aggregate
  • do not replace hash aggregate if child does not have sort order
  • do not replace hash aggregate if there is no group-by column
  • Merge runtime bloom filters
  • GlutenParquetDeltaByteArrayEncodingSuite
  • GlutenParquetDeltaLengthByteArrayEncodingSuite
  • GlutenParquetFieldIdIOSuite
  • Parquet reads infer fields using field ids correctly
  • absence of field ids
  • multiple id matches
  • read parquet file without ids
  • global read/write flag should work correctly
    ----- GlutenParquetVectorizedSuite ------
  • metadata struct (parquet): read partial/all metadata struct fields
  • metadata struct (parquet): read metadata struct fields with random ordering
  • metadata struct (parquet): read metadata struct fields with expressions
  • metadata struct (parquet): select only metadata
  • metadata struct (parquet): select and re-select
  • metadata struct (parquet): alias
  • metadata struct (parquet): upper/lower case when case sensitive is true
  • metadata struct (parquet): read metadata with offheap set to true
  • metadata struct (parquet): read metadata with offheap set to false
  • metadata struct (parquet): read metadata withnestedSchemaPruning set to true
  • metadata struct (parquet): read metadata withnestedSchemaPruning set to false
  • metadata struct (parquet): prune metadata schema in projects
  • metadata struct (parquet): write _metadata in parquet and read back
  • aggregate push down - different data types
    -------- GlutenParquetV2AggregatePushDownSuite --------
  • nested column: Count(top level column) push down
  • Count(partition column): push down
  • filter alias over aggregate
  • alias over aggregate
  • aggregate over alias push down
  • aggregate with partition filter can be pushed down
  • aggregate with partition group by can be pushed down
  • aggregate with multi partition group by columns can be pushed down
  • aggregate push down - MIN/MAX/COUNT
  • aggregate push down - different data types
  • column name case sensitivity
  • aggregate push down - different data types
    -------- GlutenOrcV2AggregatePushDownSuite --------
  • nested column: Count(top level column) push down
  • Count(partition column): push down
  • filter alias over aggregate
  • alias over aggregate
  • aggregate over alias push down
  • aggregate with partition filter can be pushed down
  • aggregate with partition group by can be pushed down
  • aggregate with multi partition group by columns can be pushed down
  • aggregate push down - MIN/MAX/COUNT
  • aggregate push down - different data types
  • column name case sensitivity
  • replace partial hash aggregate with sort aggregate
  • replace partial and final hash aggregate together with sort aggregate
  • do not replace hash aggregate if child does not have sort order
  • do not replace hash aggregate if there is no group-by column
  • Merge runtime bloom filters
  • determining the number of reducers: aggregate operator
  • determining the number of reducers: join operator
  • determining the number of reducers: complex query 1
  • determining the number of reducers: complex query 2
  • SPARK-24705 adaptive query execution works correctly when exchange reuse enabled
  • Union two datasets with different pre-shuffle partition number
  • SPARK-34790: enable IO encryption in AQE partition coalescing
    ---- GlutenEnsureRequirementsSuite: reorder should handle PartitioningCollection
yma11 added the bug label on Jul 1, 2023
zhouyuan changed the title from "Failures need fix for UTs which are newly added in vanilla spark3.3" to "[VL] Failures need fix for UTs which are newly added in vanilla spark3.3" on Jul 3, 2023
gaoyangxiaozhu (Contributor) commented Aug 9, 2023

Just FYI since I can't update the task status.

My 3 PRs cover the suites below:

GlutenParquetDeltaByteArrayEncodingSuite
GlutenParquetDeltaLengthByteArrayEncodingSuite
GlutenParquetFieldIdIOSuite
GlutenParquetVectorizedSuite
GlutenFileMetadataStructSuite
GlutenParquetV2AggregatePushDownSuite
GlutenOrcV1AggregatePushDownSuite
GlutenOrcV2AggregatePushDownSuite
GlutenReplaceHashWithSortAggSuite
GlutenBroadcastJoinSuite

The remaining suites need to be fixed by Baidu:
GlutenEnsureRequirementsSuite
GlutenCoalesceShufflePartitionsSuite
GlutenInjectRuntimeFilterSuite
