Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-48037][CORE][3.4] Fix SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data #46464

Closed
wants to merge 3 commits into from

Conversation

cxzl25
Copy link
Contributor

@cxzl25 cxzl25 commented May 8, 2024

What changes were proposed in this pull request?

This PR aims to fix SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data.

Why are the changes needed?

When the shuffle writer is SortShuffleWriter, it does not use SQLShuffleWriteMetricsReporter to update metrics, which causes AQE to obtain runtime statistics and the rowCount obtained is 0.

Some optimization rules rely on rowCount statistics, such as EliminateLimits. Because rowCount is 0, it removes the limit operator. At this time, we get data results without limit.

override def runtimeStatistics: Statistics = {
val dataSize = metrics("dataSize").value
val rowCount = metrics(SQLShuffleWriteMetricsReporter.SHUFFLE_RECORDS_WRITTEN).value
Statistics(dataSize, Some(rowCount))
}

object EliminateLimits extends Rule[LogicalPlan] {
private def canEliminate(limitExpr: Expression, child: LogicalPlan): Boolean = {
limitExpr.foldable && child.maxRows.exists { _ <= limitExpr.eval().asInstanceOf[Int] }
}

Does this PR introduce any user-facing change?

Yes

How was this patch tested?

Production environment verification.

master metrics
image

PR metrics

image

Was this patch authored or co-authored using generative AI tooling?

No

cxzl25 added 2 commits May 8, 2024 14:14
… metrics resulting in potentially inaccurate data

### What changes were proposed in this pull request?
This PR aims to fix SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data.

### Why are the changes needed?
When the shuffle writer is SortShuffleWriter, it does not use SQLShuffleWriteMetricsReporter to update metrics, which causes AQE to obtain runtime statistics and the rowCount obtained is 0.

Some optimization rules rely on rowCount statistics, such as `EliminateLimits`. Because rowCount is 0, it removes the limit operator. At this time, we get data results without limit.

https://github.com/apache/spark/blob/59d5946cfd377e9203ccf572deb34f87fab7510c/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala#L168-L172

https://github.com/apache/spark/blob/59d5946cfd377e9203ccf572deb34f87fab7510c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L2067-L2070

### Does this PR introduce _any_ user-facing change?
Yes

### How was this patch tested?
Production environment verification.

**master metrics**
<img width="296" alt="image" src="https://github.com/apache/spark/assets/3898450/dc9b6e8a-93ec-4f59-a903-71aa5b11962c">

**PR metrics**

<img width="276" alt="image" src="https://github.com/apache/spark/assets/3898450/2d73b773-2dcc-4d23-81de-25dcadac86c1">

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#46273 from cxzl25/SPARK-48037.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>

(cherry picked from commit e24f896)
…AndVerifyResult` to skip check results

### What changes were proposed in this pull request?
This PR aims to support AdaptiveQueryExecSuite to skip check results.

### Why are the changes needed?
apache#46273 (comment)

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
GA

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#46316 from cxzl25/SPARK-48070.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>

(cherry picked from commit 35767bb)
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you, @cxzl25 .

@github-actions github-actions bot added the INFRA label May 8, 2024
@@ -644,6 +644,7 @@ jobs:
python3.9 -m pip install 'sphinx<3.1.0' mkdocs pydata_sphinx_theme 'sphinx-copybutton==0.5.2' nbsphinx numpydoc 'jinja2<3.0.0' 'markupsafe==2.0.1' 'pyzmq<24.0.0' 'sphinxcontrib-applehelp==1.0.4' 'sphinxcontrib-devhelp==1.0.2' 'sphinxcontrib-htmlhelp==2.0.1' 'sphinxcontrib-qthelp==1.0.3' 'sphinxcontrib-serializinghtml==1.1.5' 'nest-asyncio==1.5.8' 'rpds-py==0.16.2' 'alabaster==0.7.13'
python3.9 -m pip install ipython_genutils # See SPARK-38517
python3.9 -m pip install sphinx_plotly_directive 'numpy>=1.20.0' 'pyarrow==12.0.1' pandas 'plotly>=4.8'
python3.9 -m pip install 'nbsphinx==0.9.3'
Copy link
Contributor Author

@cxzl25 cxzl25 May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/cxzl25/spark/actions/runs/8997219681/job/24725778387#step:24:6423

Exception occurred:
  File "/usr/local/lib/python3.9/dist-packages/nbsphinx/__init__.py", line 1316, in apply
    for section in self.document.findall(docutils.nodes.section):
AttributeError: 'document' object has no attribute 'findall'

The failed CI uses nbsphinx 0.9.4 version, which requires docutils >= 0.18.1.

https://github.com/spatialaudio/nbsphinx/releases/tag/0.9.4

Release
0.9.4 May 7, 2024
0.9.3 Aug 27, 2023

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to pin the nbsphinx version in the master branch as well? Similar to SPARK-39421 . cc @HyukjinKwon

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. I missed this line. My bad. We should proceed this separately because this this the following, @cxzl25 .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dongjoon-hyun! I didn’t know why CI failed on the 3.4 branch at first, so I tested it in my own way.

Do we need to backport SPARK-48179 to branch 3.4?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to backport SPARK-48179 to branch 3.4?

I believe it's too late and would be redundant.

dongjoon-hyun pushed a commit that referenced this pull request May 8, 2024
…lated metrics resulting in potentially inaccurate data

### What changes were proposed in this pull request?
This PR aims to fix SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data.

### Why are the changes needed?
When the shuffle writer is SortShuffleWriter, it does not use SQLShuffleWriteMetricsReporter to update metrics, which causes AQE to obtain runtime statistics and the rowCount obtained is 0.

Some optimization rules rely on rowCount statistics, such as `EliminateLimits`. Because rowCount is 0, it removes the limit operator. At this time, we get data results without limit.

https://github.com/apache/spark/blob/59d5946cfd377e9203ccf572deb34f87fab7510c/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala#L168-L172

https://github.com/apache/spark/blob/59d5946cfd377e9203ccf572deb34f87fab7510c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L2067-L2070

### Does this PR introduce _any_ user-facing change?
Yes

### How was this patch tested?
Production environment verification.

**master metrics**
<img width="296" alt="image" src="https://github.com/apache/spark/assets/3898450/dc9b6e8a-93ec-4f59-a903-71aa5b11962c">

**PR metrics**

<img width="276" alt="image" src="https://github.com/apache/spark/assets/3898450/2d73b773-2dcc-4d23-81de-25dcadac86c1">

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #46464 from cxzl25/SPARK-48037-3.4.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
@dongjoon-hyun
Copy link
Member

Merged to branch-3.4.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants