[SPARK-48925] Add interface which prevents DataSourceV2Strategy to do planning for scan nodes #47443

Closed
urosstan-db wants to merge 1 commit into apache:master from urosstan-db:SPARK-48925-Add-interface-which-prohibit-scan-planning-using-data-source-v2-strategy

@urosstan-db (Contributor) commented Jul 22, 2024

What changes were proposed in this pull request?

Add a new interface (ExternallyPlannedV1Scan) for V1Scan that prevents DataSourceV2Strategy from planning the optimized scan node.

Why are the changes needed?

Sometimes we want to extend Spark with new strategies (by adding a strategy to the extraStrategies property of SparkSession) and have such a strategy plan the DataSourceV2ScanRelation node. Although extra strategies are applied before the other strategies and so take priority, they often try to push down (or otherwise optimize) Filter and Project logical plans. If that pushdown fails, the extra strategy produces no plan and DataSourceV2Strategy plans the node instead, because it can handle Filter/Project for any expression.
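
The fallback described above can be illustrated with a small Spark-free model. This is only a sketch under stated assumptions: `Logical`, `CustomStrategy`, `BuiltInStrategy`, `planFirst`, and the `respectMarker` flag are hypothetical names invented for illustration, not Spark classes, and `ExternallyPlannedScan` merely stands in for the proposed marker interface.

```scala
// Toy model of strategy ordering; none of these types are Spark's own.
object StrategyFallbackSketch {
  sealed trait Scan
  case object PlainScan extends Scan
  // Stand-in for the proposed marker: this scan should be planned elsewhere.
  case object ExternallyPlannedScan extends Scan

  // A logical node: a scan with filter predicates stacked on top of it.
  final case class Logical(scan: Scan, filters: Seq[String])

  // A strategy returns candidate physical plans (modeled as strings);
  // an empty result means "this strategy cannot plan the node".
  trait Strategy { def apply(plan: Logical): Seq[String] }

  // Custom strategy: plans the node only if every filter can be pushed down.
  final class CustomStrategy(pushable: Set[String]) extends Strategy {
    def apply(plan: Logical): Seq[String] = {
      val fs = plan.filters.mkString(",")
      if (plan.filters.forall(pushable)) Seq(s"CustomScan(pushed=$fs)") else Nil
    }
  }

  // Built-in strategy: accepts any filter, unless it respects the marker.
  final class BuiltInStrategy(respectMarker: Boolean) extends Strategy {
    def apply(plan: Logical): Seq[String] = plan.scan match {
      case ExternallyPlannedScan if respectMarker => Nil
      case _ =>
        val fs = plan.filters.mkString(",")
        Seq(s"BuiltInScan(postFilters=$fs)")
    }
  }

  // Try strategies in order and keep the first candidate produced.
  def planFirst(strategies: Seq[Strategy], plan: Logical): Option[String] =
    strategies.iterator.map(_.apply(plan)).collectFirst { case p +: _ => p }

  def main(args: Array[String]): Unit = {
    val node = Logical(ExternallyPlannedScan, Seq("a > 1", "udf(b)"))
    val custom = new CustomStrategy(pushable = Set("a > 1"))
    // Without the marker check, the built-in strategy takes over the node.
    println(planFirst(Seq(custom, new BuiltInStrategy(respectMarker = false)), node))
    // With the marker check, the built-in strategy steps aside.
    println(planFirst(Seq(custom, new BuiltInStrategy(respectMarker = true)), node))
  }
}
```

In this model the first call yields `Some(BuiltInScan(postFilters=a > 1,udf(b)))` because the custom strategy cannot push down `udf(b)`, while the second yields `None`, so some other strategy would then have to plan the marked scan.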

Does this PR introduce any user-facing change?

No

How was this patch tested?

No tests were added; existing tests cover the current planning logic and confirm that there is no regression.

Was this patch authored or co-authored using generative AI tooling?

No

github-actions bot added the SQL label Jul 22, 2024
}

override def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
case PhysicalOperation(_, _, DataSourceV2ScanRelation(
Contributor Author

This problem would be solvable with plan costs, but AFAIK the first physical plan is always used for execution.

Contributor

Let's protect this code with a SQLConf.

@HyukjinKwon
Member

Mind filing a Jira and linking it to the PR title?

urosstan-db changed the title from "Add interface which prevents DataSourceV2Strategy to do planning for scan nodes" to "[SPARK-48925] Add interface which prevents DataSourceV2Strategy to do planning for scan nodes" Jul 23, 2024
@urosstan-db
Contributor Author

> Mind filing a Jira and linking it to the PR title?

Sorry for not linking it; I had already created the ticket.

@urosstan-db
Contributor Author

@cloud-fan Per the agreement made with Wenchen, we will close this PR for now.


3 participants