[SPARK-19149] [SQL] Unify two sets of statistics in LogicalPlan #16529

Closed
wzhfy wants to merge 2 commits into apache:master from wzhfy:unifyStats

wzhfy
Contributor

@wzhfy wzhfy commented Jan 10, 2017

What changes were proposed in this pull request?

Currently LogicalPlan has two sets of statistics: a simple stats and a stats estimated by the CBO (cost-based optimizer). The computing logic and naming of the two are confusing, so this PR unifies them into a single set of stats.

How was this patch tested?

Only existing tests were modified.

@wzhfy
Contributor Author

wzhfy commented Jan 10, 2017

cc @rxin @cloud-fan

@cloud-fan
Contributor

LGTM

@SparkQA

SparkQA commented Jan 10, 2017

Test build #71124 has finished for PR 16529 at commit 3c75602.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 10, 2017

Test build #71129 has finished for PR 16529 at commit cf50c4a.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wzhfy
Contributor Author

wzhfy commented Jan 10, 2017

retest this please

@SparkQA

SparkQA commented Jan 10, 2017

Test build #71131 has finished for PR 16529 at commit cf50c4a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -81,44 +81,36 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging {
}
}

/** A cache for the estimated statistics, such that it will only be computed once. */
private val statsCache = new ThreadLocal[Option[Statistics]] {
Contributor

Why does this need to be thread-local?

Contributor Author

@wzhfy wzhfy Jan 11, 2017

Is it possible for a LogicalPlan to be shared by different threads? May I ask what your concern about multi-threading was in #16401 (comment)?

Contributor

OK, I was thinking you could just use an AtomicReference.
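
For readers following the thread, here is a minimal sketch of what an AtomicReference-based cache could look like. It is not the code from this PR: `Statistics` is a one-field stand-in for Spark's class, and `CachedPlan`, `compute`, and `invalidateStatsCache` are hypothetical names used only for illustration. A ThreadLocal keeps a separate copy per thread, so each thread would recompute the value once; an AtomicReference holds a single slot that all threads share.

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical stand-in for Spark's Statistics; only one field is modelled.
final case class Statistics(sizeInBytes: BigInt)

// Hypothetical plan node (not Spark's LogicalPlan). `compute` is the
// potentially expensive estimation whose result we want to cache.
class CachedPlan(compute: () => Statistics) {
  // A single slot shared by all threads; None means "not computed yet".
  private val statsCache = new AtomicReference[Option[Statistics]](None)

  def stats: Statistics = statsCache.get() match {
    case Some(cached) => cached
    case None =>
      val computed = compute()
      // Publish only if the slot is still empty. If another thread won the
      // race, reuse its value; fall back to ours if the slot was cleared.
      if (statsCache.compareAndSet(None, Some(computed))) computed
      else statsCache.get().getOrElse(computed)
  }

  // Clears the cached value so the next call to `stats` recomputes it.
  def invalidateStatsCache(): Unit = statsCache.set(None)
}
```

Under contention two threads may both run `compute`, but only one result is published, which is usually acceptable when the computation is deterministic.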

@rxin
Contributor

rxin commented Jan 11, 2017

Merging into master. I will fix the thread-local issue in a follow-up PR.

@asfgit asfgit closed this in a615513 Jan 11, 2017
@rxin
Contributor

rxin commented Jan 11, 2017

Actually the multi-threaded issue probably doesn't matter. I will just change it to the original Option implementation.
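
As a rough illustration of the plain Option approach mentioned here, the sketch below caches the value in an ordinary `Option` field with no synchronization. It assumes the plan is only used from one thread at a time (or that a duplicate computation is harmless). `Statistics`, `PlanNode`, `computeStats`, and `invalidateStatsCache` are hypothetical names for this example, not necessarily those used in the follow-up patch.

```scala
// Hypothetical stand-in for Spark's Statistics; only one field is modelled.
final case class Statistics(sizeInBytes: BigInt)

// Hypothetical plan node (not Spark's LogicalPlan).
abstract class PlanNode {
  // Plain mutable field: None until the statistics are first requested.
  private var statsCache: Option[Statistics] = None

  // Subclasses supply the actual (possibly expensive) estimation.
  protected def computeStats(): Statistics

  // Computes on first access, then returns the cached value.
  final def stats: Statistics = statsCache.getOrElse {
    val computed = computeStats()
    statsCache = Some(computed)
    computed
  }

  // Clears the cached value so the next call to `stats` recomputes it.
  final def invalidateStatsCache(): Unit = {
    statsCache = None
  }
}
```

The trade-off is simplicity over thread safety: without synchronization, two threads racing on the first access could each compute the statistics once, which matters only if the computation is expensive or not deterministic.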

ghost pushed a commit to dbtsai/spark that referenced this pull request Jan 11, 2017
## What changes were proposed in this pull request?
This patch slightly simplifies the logical plan statistics cache implementation, as discussed in apache#16529.

## How was this patch tested?
N/A - this has no behavior change.

Author: Reynold Xin <rxin@databricks.com>

Closes apache#16544 from rxin/SPARK-19149.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
## What changes were proposed in this pull request?

Currently LogicalPlan has two sets of statistics: a simple stats and a stats estimated by the CBO (cost-based optimizer). The computing logic and naming of the two are confusing, so this PR unifies them into a single set of stats.

## How was this patch tested?

Only existing tests were modified.

Author: wangzhenhua <wangzhenhua@huawei.com>
Author: Zhenhua Wang <wzh_zju@163.com>

Closes apache#16529 from wzhfy/unifyStats.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017
cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017