[fix](DECIMALV3)fix cumulative precision when literal and DECIMALV3 operations in Legacy #20354

Merged (4 commits) on Jun 9, 2023

@Mryange Mryange commented Jun 2, 2023

Proposed changes

before

mysql [test]>set enable_nereids_planner=false;

mysql [test]>select v1,1.00/v1 from divtest;
+------+----------------+
| v1   | 1.00 / `v1`    |
+------+----------------+
| 2.00 | 0.500000000000 |
| 3.00 | 0.333333333333 |
| 1.00 | 1.000000000000 |
+------+----------------+

after

mysql [test]>select v1,1.00/v1 from divtest;
+------+-------------+
| v1   | (1.00 / v1) |
+------+-------------+
| 2.00 |    0.500000 |
| 1.00 |    1.000000 |
| 3.00 |    0.333333 |
+------+-------------+

For DECIMALV3 division, the result type is derived as follows (ignoring the cases where division itself increases precision):

(p1, s1) / (p2, s2) ----> (p1 + s2, s1)

Because division loses precision, the left operand is first widened by the scale of the right operand:

(p1, s1) / (p2, s2) =====> (p1 + s2, s1 + s2) / (p2, s2) ----> (p1 + s2, s1)
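
As a rough illustration of the two rules above, the following Java sketch computes the widened left operand type and the result type from (precision, scale) pairs. The class and method names (DecimalDivTypeSketch, DecType, widenLeft, resultType) are hypothetical and are not taken from the Doris code base; the sketch only restates the formulas and ignores any upper limit on precision.

```java
// Illustrative model only; these names are hypothetical, not Doris classes.
final class DecimalDivTypeSketch {
    /** A (precision, scale) pair. */
    static final class DecType {
        final int precision;
        final int scale;

        DecType(int precision, int scale) {
            this.precision = precision;
            this.scale = scale;
        }

        @Override
        public String toString() {
            return "(" + precision + ", " + scale + ")";
        }
    }

    /** Widen the left operand by the right operand's scale: (p1 + s2, s1 + s2). */
    static DecType widenLeft(DecType left, DecType right) {
        return new DecType(left.precision + right.scale, left.scale + right.scale);
    }

    /** Result type of the division as stated in the description: (p1 + s2, s1). */
    static DecType resultType(DecType left, DecType right) {
        return new DecType(left.precision + right.scale, left.scale);
    }

    public static void main(String[] args) {
        DecType left = new DecType(16, 2);
        DecType right = new DecType(16, 2);
        System.out.println("widened left: " + widenLeft(left, right));  // (18, 4)
        System.out.println("result type:  " + resultType(left, right)); // (18, 2)
    }
}
```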

However, the legacy optimizer runs the analyze and substitute steps on an expression more than once, so the widening can be applied repeatedly and the precision accumulates:

(p1, s1) / (p2, s2) =====> (p1 + s2, s1 + s2) / (p2, s2) =====> (p1 + s2 + s2, s1 + s2 + s2) / (p2, s2)
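
The accumulation can be sketched as a loop in which every analyze pass widens the already widened left operand again. This is a simplified model under the assumption that each pass adds s2 to both precision and scale; the class and method names are hypothetical.

```java
// Simplified model of repeated widening; not the Doris implementation.
final class PrecisionAccumulationSketch {
    /**
     * Returns {precision, scale} of the left operand after the given number of
     * analyze passes, assuming each pass widens the current type by s2 again.
     */
    static int[] leftTypeAfterPasses(int p1, int s1, int s2, int passes) {
        int precision = p1;
        int scale = s1;
        for (int i = 0; i < passes; i++) {
            precision += s2;
            scale += s2;
        }
        return new int[] {precision, scale};
    }

    public static void main(String[] args) {
        // Left operand (16, 2), right operand scale 2.
        for (int passes = 1; passes <= 3; passes++) {
            int[] t = leftTypeAfterPasses(16, 2, 2, passes);
            System.out.println(passes + " pass(es): (" + t[0] + ", " + t[1] + ")");
        }
        // Prints (18, 4), then (20, 6), then (22, 8).
    }
}
```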

To address this, the previous approach was to wrap the left operand of a DECIMALV3 operation in a cast, so the expression is rewritten as:

(p1, s1) / (p2, s2) =====> cast((p1, s1) as (p1 + s2, s1 + s2)) / (p2, s2)

Then, during the substitute step, the expression is checked: if it is an implicit cast, the cast is dropped and the expression it wraps is used instead:

cast((p1, s1) as (p1 + s2, s1 + s2)) =====> (p1, s1)

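protected Expr substituteImpl(ExprSubstitutionMap smap, ExprSubstitutionMap disjunctsMap, Analyzer analyzer) {
    // Drop an implicit cast and substitute its child, so the next analyze pass starts from the original type.
    if (isImplicitCast()) {
        return getChild(0).substituteImpl(smap, disjunctsMap, analyzer);
    }
    // ... (rest of the method omitted)
}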
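
To make the effect of stripping the implicit cast concrete, here is a toy model of an analyze, substitute, analyze sequence. Node, analyzeLeft, substitute, and analyzeTwice are hypothetical names invented for this sketch; the real Expr classes in Doris are considerably more involved.

```java
// Toy model; Node, analyzeLeft and substitute are hypothetical, not Doris classes.
final class ImplicitCastStripSketch {
    static final class Node {
        final int precision;
        final int scale;
        final Node child; // non-null only when this node is an implicit cast

        Node(int precision, int scale, Node child) {
            this.precision = precision;
            this.scale = scale;
            this.child = child;
        }

        boolean isImplicitCast() {
            return child != null;
        }
    }

    /** Analyze step: wrap the left operand in an implicit cast to (p + s2, s + s2). */
    static Node analyzeLeft(Node left, int s2) {
        return new Node(left.precision + s2, left.scale + s2, left);
    }

    /** Substitute step: optionally unwrap implicit casts before the next analyze pass. */
    static Node substitute(Node expr, boolean stripImplicitCast) {
        if (stripImplicitCast && expr.isImplicitCast()) {
            return substitute(expr.child, true);
        }
        return expr;
    }

    /** Run analyze, substitute, analyze and return the final left operand. */
    static Node analyzeTwice(Node column, int s2, boolean stripImplicitCast) {
        Node first = analyzeLeft(column, s2);
        Node substituted = substitute(first, stripImplicitCast);
        return analyzeLeft(substituted, s2);
    }

    public static void main(String[] args) {
        Node column = new Node(16, 2, null);
        Node kept = analyzeTwice(column, 2, false);
        Node stripped = analyzeTwice(column, 2, true);
        System.out.println("cast kept:     (" + kept.precision + ", " + kept.scale + ")");         // (20, 6)
        System.out.println("cast stripped: (" + stripped.precision + ", " + stripped.scale + ")"); // (18, 4)
    }
}
```

In this model, keeping the cast lets the second analyze pass widen an already widened type, while stripping it restarts from the column's own type.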

With the implicit cast stripped during substitution, the widening is not applied again on the next analysis, which prevents the precision from growing continuously. However, if the left expression is a constant (literal), the precision would in theory still keep increasing. Unfortunately, the following code, which was removed in PR #19926, had been hiding this issue:

for (Expr child : children) {
    if (child instanceof DecimalLiteral && child.getType().isDecimalV3()) {
      ((DecimalLiteral)child).tryToReduceType();
    }
}

This code tries to reduce the precision of decimal literals in the expression. However, it can itself cause a bug such as the following:

mysql [test]>select cast(1 as DECIMALV3(16, 2)) /  cast(3 as DECIMALV3(16, 2));
+-----------------------------------------------------------+
| CAST(1 AS DECIMALV3(16, 2)) / CAST(3 AS DECIMALV3(16, 2)) |
+-----------------------------------------------------------+
|                                                      0.00 |
+-----------------------------------------------------------+

Because the precision of the literals is reduced, 1.00 / 3.00 is evaluated as 1 / 3.
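
One plausible way to see why the reduced types give 0.00 is to model decimal division on unscaled integer values: the left value is multiplied by 10^s2 and then integer-divided by the right value. This is an assumption about the mechanism made for illustration, not a description of the Doris execution engine, and the names below are hypothetical.

```java
// Integer model of decimal division; an illustrative assumption, not Doris code.
final class ReducedLiteralDivisionSketch {
    static long pow10(int n) {
        long result = 1;
        for (int i = 0; i < n; i++) {
            result *= 10;
        }
        return result;
    }

    /**
     * Divide two unscaled decimal values. The left value is first widened by
     * 10^rightScale; the quotient then carries the left operand's scale.
     */
    static long divideUnscaled(long leftUnscaled, long rightUnscaled, int rightScale) {
        return (leftUnscaled * pow10(rightScale)) / rightUnscaled;
    }

    public static void main(String[] args) {
        // 1.00 / 3.00 with both operands at scale 2: unscaled values 100 and 300.
        long fullScale = divideUnscaled(100, 300, 2);
        // Literals reduced to scale 0: unscaled values 1 and 3, nothing to widen by.
        long reduced = divideUnscaled(1, 3, 0);
        System.out.println("scale 2 operands: quotient = " + fullScale); // 33, i.e. 0.33
        System.out.println("scale 0 operands: quotient = " + reduced);   // 0
    }
}
```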

Mryange commented Jun 2, 2023

run buildall

@Mryange Mryange marked this pull request as draft June 2, 2023 01:42

Mryange commented Jun 2, 2023

run buildall

@github-actions github-actions bot added the area/planner Issues or PRs related to the query planner label Jun 2, 2023

Mryange commented Jun 2, 2023

run p0

@Mryange Mryange changed the title from "[fix](regression) update some regression test" to "[fix](DECIMALV3)fix cumulative precision when literal and DECIMALV3 operations in Legacy" Jun 3, 2023

Mryange commented Jun 3, 2023

run buildall

@Mryange Mryange marked this pull request as ready for review June 3, 2023 13:14

Mryange commented Jun 3, 2023

run buildall

@Gabriel39 Gabriel39 left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jun 6, 2023

github-actions bot commented Jun 6, 2023

PR approved by at least one committer and no changes requested.

github-actions bot commented Jun 6, 2023

PR approved by anyone and no changes requested.

@yiguolei yiguolei merged commit 4c6df90 into apache:master Jun 9, 2023
xiaokang pushed a commit that referenced this pull request Jun 9, 2023
…perations in Legacy (#20354)

Labels
approved (Indicates a PR has been approved by one committer.), area/planner (Issues or PRs related to the query planner), dev/2.0-beta-merged, kind/test, reviewed