-
Notifications
You must be signed in to change notification settings - Fork 435
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Gluten-986] Support task input metrics, fix wrong output metrics. #997
Conversation
Thanks for opening a pull request! Could you open an issue for this pull request on Github Issues? https://github.com/oap-project/gluten/issues Then could you also rename commit message and pull request title in the following format?
See also: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @Yohahaha, for the output metrics issue, previously we found the ouputRows
and outputVectors
metrics from some Velox nodes are zero. That's why we use Scala side updateOutputMetrics
to compensate for that. It would be great if we can make sure all the affected operators work well now on the two output metrics after removal.
@@ -158,6 +159,7 @@ class CHIteratorApi extends IIteratorApi with Logging with LogLevelUtil { | |||
context: TaskContext, | |||
pipelineTime: SQLMetric, | |||
updateOutputMetrics: (Long, Long) => Unit, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be removed if it is not needed?
throw new UnsupportedOperationException(s"This operator doesn't support getStreamedLeafPlan.") | ||
override def getStreamedLeafPlan: SparkPlan = child match { | ||
case c: TransformSupport => | ||
c.getStreamedLeafPlan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this change?
I would follow this problem if it is happened again. |
Got it. I agree to remove this workaround, and if we find the output metrics being 0, we need to check that issue in Velox. |
@@ -157,7 +158,7 @@ class CHIteratorApi extends IIteratorApi with Logging with LogLevelUtil { | |||
outputAttributes: Seq[Attribute], | |||
context: TaskContext, | |||
pipelineTime: SQLMetric, | |||
updateOutputMetrics: (Long, Long) => Unit, | |||
updateInputMetrics: (InputMetricsWrapper) => Unit, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zzcclp we currently use the same metrics for Velox and Clickhouse. But it definitely isn't possible. Any good way to seperate these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zzcclp we currently use the same metrics for Velox and Clickhouse. But it definitely isn't possible. Any good way to seperate these?
I will raise a pr to seperate these metrics for velox and ch backends in the next week.
Hi @rui-mo, could you help review again? |
@@ -179,7 +180,6 @@ class CHIteratorApi extends IIteratorApi with Logging with LogLevelUtil { | |||
|
|||
override def next(): Any = { | |||
val cb = resIter.next() | |||
updateOutputMetrics(1, cb.numRows()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @zzcclp, is this function call still needed on CH backend?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this first, if there is any issue, I will fix it later.
…apache#997) * Support task input metrics, fix wrong output metrics. * Implement getStreamedLeafPlan for some operators. --------- Co-authored-by: yangchuan <yangchuan.zy@alibaba-inc.com>
What changes were proposed in this pull request?
Current
MetricsUpdater::updateOutputMetrics
is unnecessary and has bug, it will count twice output metrics.Implement task input metrics, only Scan operator could update that.