[SQL][SPARK-39528] Use V2 Filter in SupportsRuntimeFiltering #36918

Closed
wants to merge 10 commits into from

huaxingao
Contributor

What changes were proposed in this pull request?

Use V2 Filter in runtime filtering for V2 tables.

Why are the changes needed?

We should use V2 Filter in DS V2.
#32921 (comment)

Does this PR introduce any user-facing change?

Yes. This PR adds a new interface, SupportsRuntimeV2Filtering.

How was this patch tested?

Added a new test suite.

@github-actions github-actions bot added the SQL label Jun 20, 2022
@huaxingao huaxingao changed the title Support runtime V2 filtering [SQL][SPARK-39528] Use V2 Filter in SupportsRuntimeFiltering Jun 20, 2022
@@ -55,16 +57,27 @@ case class BatchScanExec(

   @transient private lazy val filteredPartitions: Seq[Seq[InputPartition]] = {
     val dataSourceFilters = runtimeFilters.flatMap {
-      case DynamicPruningExpression(e) => DataSourceStrategy.translateRuntimeFilter(e)
+      case DynamicPruningExpression(e) =>
+        if (scan.isInstanceOf[SupportsRuntimeFiltering]) {

What about using a match expression here?

Contributor Author

Changed. Thanks

@huaxingao
Contributor Author

cc @cloud-fan Could you please take a look when you have a moment? Thanks!

scan match {
case _: SupportsRuntimeFiltering =>
DataSourceStrategy.translateRuntimeFilter(e)
case _: SupportsRuntimeV2Filtering =>
Contributor

Shall we give SupportsRuntimeV2Filtering higher priority than SupportsRuntimeFiltering? We also need to document the behavior when a source implements both of them.

Contributor Author

It seems unlikely to me that a data source would implement both SupportsRuntimeV2Filtering and SupportsRuntimeFiltering.
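
For illustration, the priority question in this thread comes down to case order in the match. Below is a minimal sketch that uses stand-in traits (not the real Spark interfaces) and a hypothetical pathFor helper. Whether the merged code orders the cases this way is not shown in this thread.

```scala
// Stand-in traits for illustration only; the real interfaces are in
// org.apache.spark.sql.connector.read.
trait Scan
trait SupportsRuntimeFiltering extends Scan
trait SupportsRuntimeV2Filtering extends Scan

object RuntimeFilterDispatch {
  // Hypothetical helper: names the translation path a scan would take.
  def pathFor(scan: Scan): String = scan match {
    // Checked first, so V2 wins when a scan mixes in both traits.
    case _: SupportsRuntimeV2Filtering => "v2"
    case _: SupportsRuntimeFiltering   => "v1"
    case _                             => "none"
  }
}

object RuntimeFilterDispatchDemo {
  def main(args: Array[String]): Unit = {
    val v1Only = new SupportsRuntimeFiltering {}
    val both = new SupportsRuntimeFiltering with SupportsRuntimeV2Filtering {}
    println(RuntimeFilterDispatch.pathFor(v1Only)) // prints "v1"
    println(RuntimeFilterDispatch.pathFor(both))   // prints "v2"
  }
}
```

With the cases in the opposite order, a scan that mixes in both traits would silently take the V1 path, which is why the order would need to be documented.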

}
val literals = values.map { value =>
val literal = Literal(value)
LiteralValue(literal.value, literal.dataType)
Contributor

We don't need to infer the data type by creating a catalyst Literal. The type must be in.child.dataType

Contributor Author

Fixed. Thanks

if (partitioning.length == 1 && partitioning.head.references().length == 1) {
val ref = partitioning.head.references().head
filters.foreach {
case p : Predicate if p.name().equals("IN") =>

It feels like an unapply method that extracts what you need would be preferable.

Contributor Author

Predicate is a Java class. I don't think unapply can be used.
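
For reference, a Scala extractor is an ordinary object with an unapply method, so in general one can be written for a class defined elsewhere, including in Java. Below is a minimal sketch with a stand-in Predicate class and a hypothetical In extractor. Neither comes from this PR, and the stand-in models children as plain strings for brevity.

```scala
// Stand-in for a Java-style class that exposes accessor methods only.
// The real class is org.apache.spark.sql.connector.expressions.filter.Predicate.
final class Predicate(n: String, cs: Array[String]) {
  def name(): String = n
  def children(): Array[String] = cs
}

// Hypothetical extractor: matches an IN predicate and yields the
// referenced column plus the candidate values.
object In {
  def unapply(p: Predicate): Option[(String, Seq[String])] =
    if (p.name() == "IN" && p.children().nonEmpty)
      Some((p.children().head, p.children().tail.toSeq))
    else None
}

object ExtractorDemo {
  // Uses the extractor in a pattern match instead of a guard on name().
  def describe(p: Predicate): String = p match {
    case In(column, values) => s"$column in (${values.mkString(", ")})"
    case other              => s"unsupported: ${other.name()}"
  }

  def main(args: Array[String]): Unit = {
    println(describe(new Predicate("IN", Array("id", "1", "2")))) // prints "id in (1, 2)"
    println(describe(new Predicate("=", Array("id", "1"))))       // prints "unsupported: ="
  }
}
```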

@huaxingao
Contributor Author

The test failure is unrelated.

@cloud-fan
Contributor

The GA failure is unrelated. Merging to master, thanks!

@cloud-fan cloud-fan closed this in f49411e Jul 28, 2022
@huaxingao
Contributor Author

Thanks @cloud-fan @zinking

@@ -1805,3 +1805,21 @@ class DynamicPartitionPruningV2SuiteAEOff extends DynamicPartitionPruningV2Suite

class DynamicPartitionPruningV2SuiteAEOn extends DynamicPartitionPruningV2Suite
with EnableAdaptiveExecutionSuite

abstract class DynamicPartitionPruningV2FilterSuite
extends DynamicPartitionPruningDataSourceSuiteBase {
Contributor

Shall we extend DynamicPartitionPruningV2Suite here? Then we can drop the override protected def runAnalyzeColumnCommands: Boolean = false, and the catalog configs will be overwritten.

Contributor Author

Sounds good. I have a follow-up here

@LorenzoMartini
Contributor

Hi @huaxingao.

We are trying to use Spark DataSource V2 and noticed that the built-in V2 data sources (e.g. Parquet; see ParquetScan) support neither SupportsRuntimeFiltering nor SupportsRuntimeV2Filtering by default, which creates a large out-of-the-box performance gap between the V1 and V2 data sources.

Is there a plan to have them support this? It would be really beneficial for file scans to be able to do this, and given that they already benefit from some pushdowns, we were wondering why runtime filtering is not implemented. Or maybe I am missing something? If so, it would be great to understand how Spark file sources can take advantage of DPP.

Thanks!

4 participants