Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Gluten-986]Align and enrich Scan operator metrics #987

Merged
merged 4 commits into from
Feb 21, 2023

Conversation

Yohahaha
Copy link
Contributor

What changes were proposed in this pull request?

Velox has supports to count scan IO time in this pr facebookincubator/velox#4026.

Gluten's Scan operator lack of scanTime metric, this pr will fix this problem.
Use cpu nanos instead of cpu count.

How was this patch tested?

image
image

@github-actions
Copy link

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename commit message and pull request title in the following format?

[Gluten-${ISSUES_ID}] ${detailed message}

See also:

@Yohahaha Yohahaha changed the title Align and enrich Scan operator metrics [Gluten-986]Align and enrich Scan operator metrics Feb 20, 2023
@github-actions
Copy link

#986

@rui-mo rui-mo self-requested a review February 20, 2023 05:34
@rui-mo
Copy link
Contributor

rui-mo commented Feb 20, 2023

@Yohahaha Thank you. Could you explain a bit why we need to use cpu nanos instead of cpu count ?
cc @FelixYBW

@Yohahaha
Copy link
Contributor Author

@rui-mo Thanks your reply.
Velox use DeltaCpuWallTimer to calculate the elapsed time in Driver.cpp, count is invoked count, cpuNanos and wallNanos is elapsed time.
https://github.com/facebookincubator/velox/blob/e4a2c99bbf8c61b4cb3edabd5ce1ac25509f0652/velox/common/time/CpuWallTimer.h#L72-L80
https://github.com/facebookincubator/velox/blob/e4a2c99bbf8c61b4cb3edabd5ce1ac25509f0652/velox/common/time/CpuWallTimer.h#L24-L28

@zhouyuan zhouyuan added the velox backend works for Velox backend label Feb 20, 2023
@rui-mo rui-mo merged commit cfbb867 into apache:main Feb 21, 2023
@FelixYBW
Copy link
Contributor

@Yohahaha just moted the PR, we use totaltime_xxx to breakdown the operator elapsed time as below. The PR changed all to "cpu time". Can you revert them?

image

"wallNanos" -> SQLMetrics.createNanoTimingMetric(sparkContext, "totaltime_window"),

"count" -> SQLMetrics.createMetric(sparkContext, "cpu wall time count"),
"wallNanos" -> SQLMetrics.createNanoTimingMetric(sparkContext, "totaltime_filter"),
"wallNanos" -> SQLMetrics.createNanoTimingMetric(sparkContext, "wall time"),
"cpuNanos" -> SQLMetrics.createNanoTimingMetric(sparkContext, "cpu time"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

totaltime_filter

"count" -> SQLMetrics.createMetric(sparkContext, "cpu wall time count"),
"wallNanos" -> SQLMetrics.createNanoTimingMetric(sparkContext, "totaltime_project"),
"wallNanos" -> SQLMetrics.createNanoTimingMetric(sparkContext, "wall time"),
"cpuNanos" -> SQLMetrics.createNanoTimingMetric(sparkContext, "cpu time"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

totaltime_project

"wallNanos" -> SQLMetrics.createNanoTimingMetric(sparkContext, "totaltime_batchscan"),
"scanTime" -> SQLMetrics.createTimingMetric(sparkContext, "total scan time"),
"wallNanos" -> SQLMetrics.createNanoTimingMetric(sparkContext, "wall time"),
"cpuNanos" -> SQLMetrics.createNanoTimingMetric(sparkContext, "cpu time"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

totaltime_batchscan

"count" -> SQLMetrics.createMetric(sparkContext, "cpu wall time count"),
"wallNanos" -> SQLMetrics.createNanoTimingMetric(sparkContext, "totaltime_expand"),
"wallNanos" -> SQLMetrics.createNanoTimingMetric(sparkContext, "wall time"),
"cpuNanos" -> SQLMetrics.createNanoTimingMetric(sparkContext, "cpu time"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

totaltime_expand

"wallNanos" -> SQLMetrics.createNanoTimingMetric(sparkContext, "totaltime_filescan"),
"scanTime" -> SQLMetrics.createTimingMetric(sparkContext, "total scan time"),
"scanTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "scan time"),
"wallNanos" -> SQLMetrics.createNanoTimingMetric(sparkContext, "wall time"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here

"count" -> SQLMetrics.createMetric(sparkContext, "cpu wall time count"),
"wallNanos" -> SQLMetrics.createNanoTimingMetric(sparkContext, "totaltime_input"),
"wallNanos" -> SQLMetrics.createNanoTimingMetric(sparkContext, "wall time"),
"cpuNanos" -> SQLMetrics.createNanoTimingMetric(sparkContext, "cpu time"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

totaltime_input

"count" -> SQLMetrics.createMetric(sparkContext, "cpu wall time count"),
"wallNanos" -> SQLMetrics.createNanoTimingMetric(sparkContext, "totaltime_limit"),
"wallNanos" -> SQLMetrics.createNanoTimingMetric(sparkContext, "wall time"),
"cpuNanos" -> SQLMetrics.createNanoTimingMetric(sparkContext, "cpu time"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here

"count" -> SQLMetrics.createMetric(sparkContext, "cpu wall time count"),
"wallNanos" -> SQLMetrics.createNanoTimingMetric(sparkContext, "totaltime_sort"),
"wallNanos" -> SQLMetrics.createNanoTimingMetric(sparkContext, "wall time"),
"cpuNanos" -> SQLMetrics.createNanoTimingMetric(sparkContext, "cpu time"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here

"count" -> SQLMetrics.createMetric(sparkContext, "cpu wall time count"),
"wallNanos" -> SQLMetrics.createNanoTimingMetric(sparkContext, "totaltime_window"),
"wallNanos" -> SQLMetrics.createNanoTimingMetric(sparkContext, "wall time"),
"cpuNanos" -> SQLMetrics.createNanoTimingMetric(sparkContext, "cpu time"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here

@rui-mo
Copy link
Contributor

rui-mo commented Feb 27, 2023

@Yohahaha just moted the PR, we use totaltime_xxx to breakdown the operator elapsed time as below. The PR changed all to "cpu time". Can you revert them?

image

"wallNanos" -> SQLMetrics.createNanoTimingMetric(sparkContext, "totaltime_window"),

@FelixYBW Sorry for the inconvenience, and I will open a PR to change them back.

@Yohahaha
Copy link
Contributor Author

The PR changed all to "cpu time". Can you revert them?

@FelixYBW Revert wall time or cpu time?

@FelixYBW
Copy link
Contributor

No, let's submit another PR just rename the totaltime related. Other modification are OK

The PR changed all to "cpu time". Can you revert them?

@FelixYBW Revert wall time or cpu time?

@FelixYBW
Copy link
Contributor

The metrics just needs to starts with "totaltime" we can use totaltime to xxx or totaltime of xxx

@FelixYBW
Copy link
Contributor

The tool is upstreamed to Gazelle already, you can find it here: https://github.com/oap-project/gazelle_plugin/tree/main/tools

But Gluten needs some modification, will do later.

@Yohahaha
Copy link
Contributor Author

@FelixYBW Is any one will submit a pr to do these rename work? I haven't seen the relevant pr, and I could do it.

@rui-mo
Copy link
Contributor

rui-mo commented Feb 27, 2023

@FelixYBW Is any one will submit a pr to do these rename work? I haven't seen the relevant pr, and I could do it.

@Yohahaha I haven't started this PR. Feel free to do that if it is OK for you. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
velox backend works for Velox backend
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants