[BugFix] Having-clause causes DecodeNode wedges into two-stage Agg mistakenly #19906

Merged


satanson
Contributor

@satanson satanson commented Mar 21, 2023

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Which issues does this PR fix:

Fixes #19901

Problem Summary(Required) :

For the query select count(distinct c1), count(distinct c2) from t0 having count(1) > 0, a wrong plan is generated when CTE optimization is disabled, low-cardinality dict optimization is enabled, and two-stage aggregation is adopted: a DecodeNode is wedged between the two stages of an Agg whose aggregate function is multi_distinct_count. The 1st agg (below the DecodeNode) aggregates dict-encoded input data into a Set, serializes it, and sends it to the 2nd agg. The 2nd agg (above the DecodeNode) deserializes the data and treats it as a Set. Because the Set was built from dict-encoded data while the 2nd agg has not been rewritten to expect that, the BE crashes.

The root cause is that, when dict optimization is applied, the 1st agg is rewritten before the 2nd agg. The 2nd agg then fails to be rewritten because it has a having-clause that references an aggregate, so dict optimization cannot propagate upwards and a DecodeNode is inserted between the two aggs.
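
The failure mode can be sketched with a toy model. This is a minimal illustration in Python, not the StarRocks optimizer code: the Op class, the dict_rewritable flag, and all function names are hypothetical, and the guard shown is only one possible way to keep the two stages consistent, not necessarily the change made in this PR.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Op:
    """One operator in a linear plan, listed bottom (scan) to top."""
    name: str
    # Hypothetical flag: can this operator be rewritten to consume dict codes?
    dict_rewritable: bool = True


def apply_dict_rewrite(plan: List[Op]) -> List[str]:
    """Rewrite operators bottom-up; insert a Decode where propagation stops."""
    out: List[str] = []
    encoded = True  # the scan emits dictionary codes
    for op in plan:
        if encoded and not op.dict_rewritable:
            out.append("Decode")  # codes must be turned back into values here
            encoded = False
        out.append(op.name + ("[dict]" if encoded else ""))
    if encoded:
        out.append("Decode")  # nothing blocked propagation: decode at the top
    return out


def apply_dict_rewrite_guarded(plan: List[Op]) -> List[str]:
    """Same rewrite, but the two agg stages are rewritten together or not at all."""
    guarded = [Op(op.name, op.dict_rewritable) for op in plan]
    for lower, upper in zip(guarded, guarded[1:]):
        if lower.name == "LocalAgg" and upper.name == "GlobalAgg":
            # If the upper stage cannot be rewritten, skip the lower one too.
            if not upper.dict_rewritable:
                lower.dict_rewritable = False
    return apply_dict_rewrite(guarded)


def decode_between_aggs(names: List[str]) -> bool:
    """True if a Decode sits directly between the two aggregation stages."""
    for i in range(1, len(names) - 1):
        if (names[i] == "Decode"
                and names[i - 1].startswith("LocalAgg")
                and names[i + 1].startswith("GlobalAgg")):
            return True
    return False


# The having-clause references an aggregate, so the 2nd agg is not rewritable.
plan = [Op("Scan"), Op("LocalAgg"), Op("GlobalAgg", dict_rewritable=False)]
buggy = apply_dict_rewrite(plan)
fixed = apply_dict_rewrite_guarded(plan)
```

In this model, buggy places the Decode between LocalAgg[dict] and GlobalAgg, which mirrors the wrong plan described above, while fixed places it below both stages so that they exchange the same kind of Set.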

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr will affect users' behaviors
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto backported to target branch
    • 3.0
    • 2.5
    • 2.4
    • 2.3

stdpain
stdpain previously approved these changes Mar 21, 2023
…stakenly

Signed-off-by: satanson <ranpanf@gmail.com>
@sonarcloud

sonarcloud bot commented Mar 21, 2023

Kudos, SonarCloud Quality Gate passed!

  • 0 Bugs (rating A)
  • 0 Vulnerabilities (rating A)
  • 0 Security Hotspots (rating A)
  • 1 Code Smell (rating A)
  • 0.0% Coverage
  • 0.0% Duplication

@satanson satanson merged commit 5f46ec7 into StarRocks:main Mar 21, 2023
@wanpengfei-git
Collaborator

@Mergifyio backport branch-3.0

@wanpengfei-git
Collaborator

@Mergifyio backport branch-2.5

@github-actions github-actions bot removed the 2.5 label Mar 21, 2023
@mergify
Contributor

mergify bot commented Mar 21, 2023

backport branch-3.0

✅ Backports have been created

@mergify
Contributor

mergify bot commented Mar 21, 2023

backport branch-2.5

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Mar 21, 2023
…stakenly (#19906)

Signed-off-by: satanson <ranpanf@gmail.com>
(cherry picked from commit 5f46ec7)
mergify bot pushed a commit that referenced this pull request Mar 21, 2023
…stakenly (#19906)

Signed-off-by: satanson <ranpanf@gmail.com>
(cherry picked from commit 5f46ec7)
@satanson
Contributor Author

@Mergifyio backport branch-2.3
@Mergifyio backport branch-2.4

@mergify
Contributor

mergify bot commented Mar 22, 2023

backport branch-2.3

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Mar 22, 2023
…stakenly (#19906)

Signed-off-by: satanson <ranpanf@gmail.com>
(cherry picked from commit 5f46ec7)
wanpengfei-git pushed a commit that referenced this pull request Mar 22, 2023
…stakenly (#19906)

Signed-off-by: satanson <ranpanf@gmail.com>
(cherry picked from commit 5f46ec7)
satanson added a commit that referenced this pull request Mar 22, 2023
…stakenly (#19906)

Signed-off-by: satanson <ranpanf@gmail.com>
(cherry picked from commit 5f46ec7)
@satanson
Contributor Author

https://github.com/Mergifyio backport branch-2.4

@mergify
Contributor

mergify bot commented Mar 22, 2023

backport branch-2.4

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Mar 22, 2023
…stakenly (#19906)

Signed-off-by: satanson <ranpanf@gmail.com>
(cherry picked from commit 5f46ec7)
wanpengfei-git pushed a commit that referenced this pull request Mar 22, 2023
…stakenly (#19906)

Signed-off-by: satanson <ranpanf@gmail.com>
(cherry picked from commit 5f46ec7)
wanpengfei-git pushed a commit that referenced this pull request Mar 22, 2023
…stakenly (#19906)

Signed-off-by: satanson <ranpanf@gmail.com>
(cherry picked from commit 5f46ec7)
wanpengfei-git pushed a commit that referenced this pull request Mar 22, 2023
…stakenly (#19906)

Signed-off-by: satanson <ranpanf@gmail.com>
(cherry picked from commit 5f46ec7)
wanpengfei-git pushed a commit that referenced this pull request Mar 22, 2023
…stakenly (#19906)

Signed-off-by: satanson <ranpanf@gmail.com>
(cherry picked from commit 5f46ec7)
numbernumberone pushed a commit to numbernumberone/starrocks that referenced this pull request May 31, 2023
…stakenly (StarRocks#19906)

Signed-off-by: satanson <ranpanf@gmail.com>
abc982627271 pushed a commit to abc982627271/starrocks that referenced this pull request Jun 5, 2023
…stakenly (StarRocks#19906)

Signed-off-by: satanson <ranpanf@gmail.com>
4 participants