[CORE] Basic runnable version of ACBO (Advanced CBO) #5058

Merged (10 commits) on Mar 27, 2024

Member

@zhztheplayer zhztheplayer commented Mar 21, 2024

See proposal #5057

This is the first runnable version of ACBO. TPC-H SF 1.0 and TPC-DS SF 10.0 both pass.

After this patch, ACBO can be enabled by setting spark.gluten.sql.advanced.cbo.enabled=true. It is disabled by default.
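
For illustration, a minimal sketch of passing the flag at submit time. Only the configuration key comes from this PR; the object name AcboConfSketch and the helper toSubmitArgs are hypothetical and are not part of Gluten.

```scala
object AcboConfSketch {
  // Key taken from the PR description; ACBO stays disabled unless it is set to true.
  val AcboEnabledKey = "spark.gluten.sql.advanced.cbo.enabled"

  // Hypothetical helper: renders settings as spark-submit style `--conf k=v` arguments.
  def toSubmitArgs(conf: Map[String, String]): Seq[String] =
    conf.toSeq.sortBy(_._1).flatMap { case (k, v) => Seq("--conf", s"$k=$v") }

  def main(args: Array[String]): Unit =
    println(toSubmitArgs(Map(AcboEnabledKey -> "true")).mkString(" "))
}
```

The CI job added in this PR passes the same key through the --extra-conf option of gluten-it.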

Issues:

  1. For now it only replaces TransformPreOverrides(), using a rough cost model to decide on fallback;
  2. It has not been tested with the CH backend yet. @baibaichen @zzcclp, would you like to evaluate it for CH? Otherwise I can temporarily disable it for CH in the next patch;
  3. It may generate slow plans, since some operators such as filter and aggregation are not yet considered in the first version of ACBO. Some refactoring is required, and the side effects of this issue will be amplified in performance tests;
  4. The first version is only meant to be runnable. Further integration work is required to make it ready for production.

The following improvements are on the way:

  1. Enable constraint propagation (for operators like AQEShuffleReadExec, which can propagate a child's convention to its parent);
  2. Enable pattern-based rule matching.

The facilities required for both have already been added but are not enabled yet. I will enable and test them in separate PRs.

Member Author

@zhztheplayer zhztheplayer left a comment


I will keep filling in the code documentation during further development.

For now, I'll leave some notes on the PR.

Comment on lines 453 to 462
- name: TPC-H SF1.0 && TPC-DS SF10.0 Parquet local spark3.2 with advanced CBO
  run: |
    $PATH_TO_GLUTEN_TE/$OS_IMAGE_NAME/gha/gha-checkout/exec.sh 'cd /opt/gluten/tools/gluten-it && \
    mvn clean install -Pspark-3.2 \
    && GLUTEN_IT_JVM_ARGS=-Xmx5G sbin/gluten-it.sh queries-compare \
      --local --preset=velox --benchmark-type=h --error-on-memleak --off-heap-size=10g -s=1.0 --threads=16 --iterations=1 \
      --extra-conf=spark.gluten.sql.advanced.cbo.enabled=true \
    && GLUTEN_IT_JVM_ARGS=-Xmx20G sbin/gluten-it.sh queries-compare \
      --local --preset=velox --benchmark-type=ds --error-on-memleak --off-heap-size=40g -s=10.0 --threads=32 --iterations=1 \
      --extra-conf=spark.gluten.sql.advanced.cbo.enabled=true'
Member Author

CI Job for ACBO + Velox + TPC-H SF1 + TPC-DS SF10

}
}

class Cbo[T <: AnyRef] private (
Member Author

Cbo is a stateless optimization context consisting of configs and utilities.

assert(!notThrew, message)
}

private def validateModels(): Unit = {
Member Author

Validates the user's implementations of the API.

}
}

trait CboCluster[T <: AnyRef] {
Member Author

CboCluster is a set of nodes sharing the same context (the so-called "logical properties") in the original input plan. One cluster can derive its own set of CboGroups. Nodes in one CboGroup share the same ("physical") properties.
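
A minimal sketch of the two-level grouping described above, under stated assumptions: Node, logicalKey, props and partition are hypothetical stand-ins, not the PR's actual types. A string key plays the role of the "logical properties" and a set of strings plays the role of the "physical" property set.

```scala
object ClusterSketch {
  // Hypothetical stand-ins: logicalKey identifies the cluster,
  // props identifies the group inside that cluster.
  final case class Node(name: String, logicalKey: String, props: Set[String])

  // Result shape: cluster key -> (property set -> nodes in that group).
  def partition(nodes: Seq[Node]): Map[String, Map[Set[String], Seq[Node]]] =
    nodes.groupBy(_.logicalKey).map { case (key, ns) => key -> ns.groupBy(_.props) }
}
```

With this layout, two alternatives for the same operator that differ only in their property set end up in the same cluster but in different groups.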

Contributor

Add comments?

Member Author

I will raise an independent PR to add code documentation. The code is still under frequent modification.

Comment on lines +21 to +30
case class CboConfig(
    plannerType: PlannerType = PlannerType.Dp
)

object CboConfig {
  sealed trait PlannerType
  object PlannerType {
    case object Exhaustive extends PlannerType
    case object Dp extends PlannerType
  }
Member Author

Dp is the default planner implementation, while Exhaustive is currently used only for testing. In the future, parallelized optimization is expected to be comparatively easier to implement on the exhaustive planner than on the dp planner.
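
A self-contained sketch of how the planner type could be selected from the config. CboConfig and PlannerType mirror the excerpt above; the wrapper object PlannerTypeSketch and the describe function are hypothetical, and the returned strings only stand in for real planner instances.

```scala
object PlannerTypeSketch {
  final case class CboConfig(
      plannerType: CboConfig.PlannerType = CboConfig.PlannerType.Dp)

  object CboConfig {
    sealed trait PlannerType
    object PlannerType {
      case object Exhaustive extends PlannerType
      case object Dp extends PlannerType
    }
  }

  // Hypothetical dispatch. Because PlannerType is sealed, the compiler can
  // warn if a newly added planner type is not handled here.
  def describe(config: CboConfig): String = config.plannerType match {
    case CboConfig.PlannerType.Dp => "dp planner (default)"
    case CboConfig.PlannerType.Exhaustive => "exhaustive planner (testing only for now)"
  }
}
```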


import io.glutenproject.cbo.memo.MemoStore

trait CboGroup[T <: AnyRef] {
Member Author

A set of nodes that share the same property set in the same cluster.

}
}

trait CanonicalNode[T <: AnyRef] extends CboNode[T] {
Member Author

Canonical node is a node with all children replaced by resident groups.
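
A rough sketch of canonicalization under one reading of this comment: each child subtree is replaced by the id of the group that holds it. Plan, Canonical, canonicalize and the groupOf lookup are hypothetical names, not the PR's API.

```scala
object CanonicalSketch {
  // Hypothetical plan tree; not the PR's actual types.
  final case class Plan(op: String, children: List[Plan] = Nil)

  // A canonical node keeps its own operator but refers to children by group id only.
  final case class Canonical(op: String, childGroupIds: List[Int])

  // groupOf is an assumed lookup from a child subtree to the id of its group.
  def canonicalize(plan: Plan, groupOf: Plan => Int): Canonical =
    Canonical(plan.op, plan.children.map(groupOf))
}
```

Two plans whose children fall into the same groups then canonicalize to equal values, which lets equivalent alternatives be detected by plain equality.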

extends CanonicalNode[T]
}

trait GroupNode[T <: AnyRef] extends CboNode[T] {
Member Author

A node that exactly represents a group.

*/
package io.glutenproject.cbo

trait CboNode[T <: AnyRef] {
Member Author

An immutable wrapper around a single node that hides the node's tree structure.

To represent tree structure, use CboPath.

Comment on lines 46 to 56
trait Best[T <: AnyRef] {
  import Best._
  def rootGroupId(): Int
  def bestNodes(): Set[InGroupNode[T]]
  def winnerNodes(): Set[InGroupNode[T]]
  def costs(): InGroupNode[T] => Option[Cost]
  def path(allGroups: Int => CboGroup[T]): KnownCostPath[T]
}
Member Author

Best is essentially the output of a single planning run.
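
As a rough illustration of the kind of result such an output could carry, the sketch below keeps the lowest-cost candidate per group. It is an assumption, not the PR's algorithm: InGroupNode is simplified to a group id plus a name, Cost is modelled as a Long, and cheapestPerGroup is a hypothetical helper. The exact distinction between bestNodes() and winnerNodes() is not covered here.

```scala
object BestSketch {
  // Simplified stand-in for InGroupNode[T]: a node together with its owning group.
  final case class InGroupNode(groupId: Int, name: String)

  // For each group, keep the candidate with the lowest known cost.
  // Candidates for which costs returns None are skipped.
  def cheapestPerGroup(
      candidates: Seq[InGroupNode],
      costs: InGroupNode => Option[Long]): Map[Int, InGroupNode] =
    candidates
      .flatMap(n => costs(n).map(c => (n, c)))
      .groupBy(_._1.groupId)
      .map { case (gid, ns) => gid -> ns.minBy(_._2)._1 }
}
```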

Comment on lines +126 to +127
trait PlannerState[T <: AnyRef] {
  def cbo(): Cbo[T]
  def memoState(): MemoState[T]
  def rootGroupId(): Int
  def best(): Best[T]
}
Member Author

An immutable snapshot of the planner's state.


import scala.collection.mutable

class ForwardMemoTable[T <: AnyRef] private (override val cbo: Cbo[T])
Member Author

The memo table that handles cluster merging / forwarding internally.

def defineEquiv(node: CanonicalNode[T], newNode: T): Unit
}

trait Memo[T <: AnyRef] extends Closure[T] {
Member Author

Memo is the basic structure that stores the planner's whole search space. All nodes stored in it are canonicalized.
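
A deliberately simplified sketch of why storing canonical nodes helps: structurally identical nodes are stored once and resolve to the same id. The Memo class, memorize and Canonical below are hypothetical. The sketch gives every distinct node its own group, so it leaves out equivalence between different nodes (what defineEquiv handles) and the cluster merging done by ForwardMemoTable.

```scala
import scala.collection.mutable

object MemoSketch {
  final case class Canonical(op: String, childGroupIds: List[Int])

  // Hypothetical memo: each distinct canonical node is stored once and mapped to a group id.
  final class Memo {
    private val groupOfNode = mutable.LinkedHashMap.empty[Canonical, Int]

    // Returns the existing group id for a known node, or allocates the next free id.
    def memorize(node: Canonical): Int =
      groupOfNode.getOrElseUpdate(node, groupOfNode.size)

    def size: Int = groupOfNode.size
  }
}
```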

This reverts commit 8fce2b9.
This reverts commit f23e095.

@zhztheplayer zhztheplayer merged commit c1e1cca into apache:main Mar 27, 2024
31 of 32 checks passed
3 participants