Skip to content

Comments

[SPARK-38843][SQL] Fix translate metadata col filters#36126

Closed
wangyum wants to merge 1 commit intoapache:masterfrom
wangyum:SPARK-38843
Closed

[SPARK-38843][SQL] Fix translate metadata col filters#36126
wangyum wants to merge 1 commit intoapache:masterfrom
wangyum:SPARK-38843

Conversation

@wangyum
Copy link
Member

@wangyum wangyum commented Apr 9, 2022

What changes were proposed in this pull request?

Fix translate metadata col filters because it can't be pushed down.

  1. The FileSourceStrategy log is incorrect:
    Before this PR:
    07:24:45.131 ERROR org.apache.spark.sql.execution.datasources.FileSourceStrategy: Pushed Filters: 
    IsNotNull(_metadata.file_path),StringContains(_metadata.file_path,data/f0)
    
    After this PR:
    07:24:45.144 ERROR org.apache.spark.sql.execution.datasources.FileSourceStrategy: Pushed Filters: 
    
  2. It should be also pushable in other places:
    val translated = DataSourceStrategy.translateFilter(filterExpr, true)

    https://github.com/apache/spark/blob/ac4f2af8d9184db3962427133ef2940c8b515c26/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L705

Why are the changes needed?

Fix bug.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing UT.

@github-actions github-actions bot added the SQL label Apr 9, 2022
@wangyum
Copy link
Member Author

wangyum commented Apr 10, 2022

cc @Yaohua628 @cloud-fan

val pushedFilters = dataFilters
.flatMap(DataSourceStrategy.translateFilter(_, supportNestedPredicatePushdown))
val pushedFilters = DataSourceStrategy.pushedDownFilters(dataFilters, fsRelation)
logInfo(s"Pushed Filters: ${pushedFilters.mkString(",")}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only used for logging. Can we move the logging into FileSourceScanExec?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DataSourceStrategy.scala#L701-L706 also needs to ensure filter is pushable.

@cloud-fan
Copy link
Contributor

The code looks a bit messy. How many times do we translate the filters during the entire query planning phase?

@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Sep 24, 2022
@github-actions github-actions bot closed this Sep 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants