Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #674: Fix issues with semantics of parenthesis removal #675

Merged
merged 7 commits into from
Aug 23, 2023

Conversation

magbak
Copy link
Contributor

@magbak magbak commented Aug 13, 2023

PR Info

This is my attempt at fixing #674

Bug Fixes

Parenthesis placement is now done based on positive matches with known associativity and precedence rules, and otherwise defaults to placing parentheses.

  • #674 identified issues where SQL parenthesis omission did not respect associativity and precedence rules. sea-query only removes parentheses now where it is certain that this respects such rules.

Changes

  • Parentheses are now dropped from expressions such as
    (`character` LIKE 'D') AND (`character` LIKE 'E') since LIKE has stronger precedence than the logical ops. Previously this happened only some times.

@tyt2y3
Copy link
Member

tyt2y3 commented Aug 13, 2023

Very cool! I'll look into it closely tomorrow

@magbak
Copy link
Contributor Author

magbak commented Aug 14, 2023

I am thinking that the (notational, not necessarily mathematical) (left) associativity of a binary operator could be encoded on the binary operator in types.rs instead as a function:

So that:

 //Figure out if left associativity rules allow us to drop the left parenthesis
        let drop_left_assoc = left.is_binary()
            && op == left.get_bin_oper().unwrap()
            && (matches!(
                op,
                BinOper::And
                    | BinOper::Or
                    | BinOper::Add
                    | BinOper::Sub
                    | BinOper::Mul
                    | BinOper::Mod
                    | BinOper::PgOperator(PgBinOper::Concatenate)
            );

Becomes:

 //Figure out if left associativity rules allow us to drop the left parenthesis
        let drop_left_assoc = left.is_binary()
            && op == left.get_bin_oper().unwrap()
            && op.notationally_left_associative();

Where notationally_left_associative(&self) -> bool returns true if it is known to be notationally left associative in all the db backends (and people know about it so the parentheses would be distracting), and false otherwise, e.g. does not have notational associativity such as when chaining the comparison operators.

Notational left associativity is mostly the case.

Edit: I think the name should be well_known_notational_left_associativity(), since it is crucial that people find it easier to read expressions with parentheses dropped by this property.

@magbak
Copy link
Contributor Author

magbak commented Aug 14, 2023

@tyt2y3: I will fix those failing checks once I know if the approach is in principle ok.

@tyt2y3
Copy link
Member

tyt2y3 commented Aug 14, 2023

Thank you. I am not sure I fully understood the notion of notational, so may be we can decide by examples?

e.g.

(("character" LIKE 'D') AND ("character" LIKE 'E')) # then
("character" LIKE 'D' AND "character" LIKE 'E') # now

A AND B AND C AND D # no change

I am trying to form a mental model for the existing behaviour, i.e. why did we want to have parens surrounding LIKE, i.e. (("character" LIKE 'D') AND ("character" LIKE 'E')). And if it's possible, I'd like to minimize the changes if there is no semantic difference.

@magbak
Copy link
Contributor Author

magbak commented Aug 15, 2023

Thank you. I am not sure I fully understood the notion of notational, so may be we can decide by examples?

e.g.

(("character" LIKE 'D') AND ("character" LIKE 'E')) # then
("character" LIKE 'D' AND "character" LIKE 'E') # now

A AND B AND C AND D # no change

I am trying to form a mental model for the existing behaviour, i.e. why did we want to have parens surrounding LIKE, i.e. (("character" LIKE 'D') AND ("character" LIKE 'E')). And if it's possible, I'd like to minimize the changes if there is no semantic difference.

Let me try to explain notational left associativity.
Notational left associativity means that for a given binary operator OP:
The SQL parser procesesses "a OP b OP c" the same as "(a OP b) OP c"
This means that when we encounter a BinaryExpr(BinaryExpr(a, OP, b)), OP, c) we can choose either of the representations above. But it only makes sense to do so when people are used to reading it without parentheses. The SQL parser will be equally happy.

I am using the term notational associativity, since there is also mathematical associativity which is about the results of such expressions when they are evaluated when you move parentheses around.

If we zoom out a bit, we see that in the whole expression, there is actually a bit of inconsistent behaviour:
Before:

"SELECT "character" FROM "character" WHERE ("character" LIKE 'C' OR (("character" LIKE 'D') AND ("character" LIKE 'E'))) AND (("character" LIKE 'F') OR ("character" LIKE 'G'))"

Why did sea-query remove the parentheses here? "character" LIKE 'C' OR and not from the other LIKE-expressions? Stay tuned.

Now:

"SELECT "character" FROM "character" WHERE ("character" LIKE 'C' OR ("character" LIKE 'D' AND "character" LIKE 'E')) AND ("character" LIKE 'F' OR "character" LIKE 'G')"

The reason the parentheses are dropped now around the binary LIKE-expressions in "character" LIKE 'D' AND "character" LIKE 'E' now is due to the relative precedence of the AND and LIKE operators. In all popular SQL engines, LIKE has higher precedence than the logical operators AND and OR. This means that LIKE finds the closest arguments before AND does so. Or in other words, AND has to wait until the LIKE expression is parsed, and has to relate to "character" LIKE 'D' as a whole.
So that means that an SQL parser will process ("character" LIKE 'D') AND ("character" LIKE 'E') in exactly the same way as "character" LIKE 'D' AND "character" LIKE 'E'. Again, we can choose between either representation when processing BinaryExpr(BinaryExpr("character", LIKE, 'D'), AND, BinaryExpr("character", LIKE, 'E')). But in my opinion we should only choose to drop the parentheses if people have an intuitive grasp of the relative precedence of the operators, such as with = having greater precedence than OR, AND and NOT, and of course where all SQL parsers agree to have a mostly uniform SQL string across databases.

I could not quite see why the solution behaved inconsistently in the past. After your question I decided to look into the technical reason why. The reason sea-query dropped the parentheses earlier was the fact that the first OR is encoded as a condition, and not as a binary OR.

Query::select()
            .column(Char::Character)
            .from(Char::Table)
            .cond_where(
                Cond::all()
                    .add(
                        Cond::any().add(Expr::col(Char::Character).like("C")).add(
                            Expr::col(Char::Character)
                                .like("D")
                                .and(Expr::col(Char::Character).like("E"))
                        )
                    )
                    .add(
                        Expr::col(Char::Character)
                            .like("F")
                            .or(Expr::col(Char::Character).like("G"))
                    )
            )

Having two code paths to writing the same kind of expression is not a good idea, since this opens the door to inconsistent behaviour which is hard to maintain over time (as is now becoming evident). IMHO, conditions, which are syntactic sugar should be rewritten as nested binary expressions before being written to the SQL string. Doing so can delete the whole function below, and more, which has subtle inconsistencies compared to how binary_expr works now and worked previously. I think this should be done before merging the PR.

/// Translate part of a condition to part of a "WHERE" clause.
    fn prepare_condition_where(&self, condition: &Condition, sql: &mut dyn SqlWriter) {
        if condition.negate {
            write!(sql, "NOT ").unwrap()
        }

        if condition.is_empty() {
            match condition.condition_type {
                ConditionType::All => self.prepare_constant_true(sql),
                ConditionType::Any => self.prepare_constant_false(sql),
            }
            return;
        }

        if condition.negate {
            write!(sql, "(").unwrap();
        }
        condition.conditions.iter().fold(true, |first, cond| {
            if !first {
                match condition.condition_type {
                    ConditionType::Any => write!(sql, " OR ").unwrap(),
                    ConditionType::All => write!(sql, " AND ").unwrap(),
                }
            }
            match cond {
                ConditionExpression::Condition(c) => {
                    if condition.conditions.len() > 1 {
                        write!(sql, "(").unwrap();
                    }
                    self.prepare_condition_where(c, sql);
                    if condition.conditions.len() > 1 {
                        write!(sql, ")").unwrap();
                    }
                }
                ConditionExpression::SimpleExpr(e) => {
                    if condition.conditions.len() > 1 && (e.is_logical() || e.is_between()) {
                        write!(sql, "(").unwrap();
                    }
                    self.prepare_simple_expr(e, sql);
                    if condition.conditions.len() > 1 && (e.is_logical() || e.is_between()) {
                        write!(sql, ")").unwrap();
                    }
                }
            }
            false
        });
        if condition.negate {
            write!(sql, ")").unwrap();
        }
    }

@tyt2y3
Copy link
Member

tyt2y3 commented Aug 15, 2023

Thank you for your detailed explanation. You convinced me that a revamp is needed.

IMHO, conditions, which are syntactic sugar should be rewritten as nested binary expressions before being written to the SQL string.

This is a great idea if we can get the logic right. However, I'm concerned there is a loss of information. Consider Cond::all([a, b, Cond::all(c, d), e]) if we lower it into (((a and b) and (c and d)) and e), we'd lost the original intention that a, b, e are 'at the same level of depthness'. So we'd output (a and b and c and d and e), am I right?

My concern is, there are many testcases e.g. in SeaORM that we directly assert the SQL output. So I am in a position to want to minimize the impact. If we can find a point of incremental improvement, we'd be able to ship it sooner.

@magbak
Copy link
Contributor Author

magbak commented Aug 15, 2023

Thank you for your detailed explanation. You convinced me that a revamp is needed.

Excellent!

This is a great idea if we can get the logic right. However, I'm concerned there is a loss of information. Consider Cond::all([a, b, Cond::all(c, d), e]) if we lower it into (((a and b) and (c and d)) and e), we'd lost the original intention that a, b, e are 'at the same level of depthness'. So we'd output (a and b and c and d and e), am I right?

Yes, this information would definitely be lost, and the output would be like you describe.

My concern is, there are many testcases e.g. in SeaORM that we directly assert the SQL output. So I am in a position to want to minimize the impact. If we can find a point of incremental improvement, we'd be able to ship it sooner.

I will do a run of the tests in sea-orm and set the dependency to the git branch, listing any broken tests and an explanation why here. Perhaps the impact is small, and then we can have a think about the revamp. The revamp will not take much time - we are talking ETA a day or two in any case.

@tyt2y3
Copy link
Member

tyt2y3 commented Aug 15, 2023

Okay cool, then may be we make the changes first and see.

@magbak
Copy link
Contributor Author

magbak commented Aug 15, 2023

in Cargo.toml:
sea-query = { git = "https://github.com/magbak/sea-query", branch = "parentheses_fix", default-features = false, features = ["thread-safe", "hashable-value", "backend-mysql", "backend-postgres", "backend-sqlite"] }

in sea-orm-codegen/Cargo.toml:
sea-query = { git = "https://github.com/magbak/sea-query", branch = "parentheses_fix", default-features = false, features = ["thread-safe"] }

All tests pass. Doing the revamp tomorrow morning.
Will do a slight refactoring to get rid of the large closures / lambdas, which are not pretty.

@magbak
Copy link
Contributor Author

magbak commented Aug 16, 2023

Hope this didn't break too much - have tuned it so as to break as little as possible.

@tyt2y3
Copy link
Member

tyt2y3 commented Aug 16, 2023

It seems to be very brilliant. Allow me more time to read through the logic in details.

@tyt2y3
Copy link
Member

tyt2y3 commented Aug 17, 2023

Nice refactoring!

@tyt2y3
Copy link
Member

tyt2y3 commented Aug 17, 2023

This is indeed very brilliant! I was looking through the test cases, and trying to find an example that the old way was better. But so far, it seems like it made more sense the new way.
If the CI passes, I will invite niteave to have a look as well. NB: he contributed the Condition API.

@tyt2y3
Copy link
Member

tyt2y3 commented Aug 17, 2023

@nitnelave we are working on a very interesting change that unify the Condition and BinOper query building backend. Can you take a look and review the changes?

@magbak
Copy link
Contributor Author

magbak commented Aug 17, 2023

This is indeed very brilliant! I was looking through the test cases, and trying to find an example that the old way was better. But so far, it seems like it made more sense the new way. If the CI passes, I will invite niteave to have a look as well. NB: he contributed the Condition API.

Great! Interested to hear what he thinks :-)

Copy link
Contributor

@nitnelave nitnelave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice! A couple of comments here and there, but I like the precedence-based approach. There's just one case that I commented on that seems dubious, but overall it's a great change.

src/backend/query_builder.rs Outdated Show resolved Hide resolved
src/backend/query_builder.rs Outdated Show resolved Hide resolved
src/backend/query_builder.rs Outdated Show resolved Hide resolved
src/backend/query_builder.rs Outdated Show resolved Hide resolved
src/backend/query_builder.rs Show resolved Hide resolved
src/expr.rs Show resolved Hide resolved
src/expr.rs Show resolved Hide resolved
src/query/condition.rs Outdated Show resolved Hide resolved
@tyt2y3
Copy link
Member

tyt2y3 commented Aug 18, 2023

Thank you @nitnelave for reviewing!
@magbak Can you add a special case for Custom Expressions?
I think we are going to proceed to merge this into master, to be included in the next release.

@magbak
Copy link
Contributor Author

magbak commented Aug 18, 2023

@tyt2y3 Did you want custom expressions to always have parentheses no matter where they occur, even as the sole expr in a WHERE? At present they get parentheses whenever they are included in a different expression, this was a consequence of the changes I made, but we could play it even safer?

@tyt2y3
Copy link
Member

tyt2y3 commented Aug 18, 2023

I would say always have parentheses when combined with any operator. i.e.
Expr::cust("1 + 1").add(Expr::value(2)) -> (1 + 1) + 2

For sole expression in where may be we can leave them naked?

select()..and_where(Expr::cust("a = 1")) -> SELECT .. FROM .. WHERE a = 1

select()..and_where(Expr::cust("a = 1").or(Expr::value(1).eq(1))) -> SELECT .. FROM .. WHERE (a = 1) OR 1 = 1

@magbak
Copy link
Contributor Author

magbak commented Aug 18, 2023

I would say always have parentheses when combined with any operator. i.e. Expr::cust("1 + 1").add(Expr::value(2)) -> (1 + 1) + 2

For sole expression in where may be we can leave them naked?

select()..and_where(Expr::cust("a = 1")) -> SELECT .. FROM .. WHERE a = 1

select()..and_where(Expr::cust("a = 1").or(Expr::value(1).eq(1))) -> SELECT .. FROM .. WHERE (a = 1) OR 1 = 1

That is how it works now :-)

@tyt2y3
Copy link
Member

tyt2y3 commented Aug 18, 2023

Oh right! I might have confused with the change set. Should be all good now

@prakol16
Copy link

prakol16 commented Aug 23, 2023

Here is a case of unnecessary parentheses:

#[test]
fn test_sea_query_case_when_eq() {
    let case_statement: SimpleExpr = Expr::case(
        Expr::col(Alias::new("col")).lt(5),
        Expr::col(Alias::new("othercol"))
    )
        .finally(Expr::col(Alias::new("finalcol")))
        .into();

    let result = Query::select()
        .column(sea_query::Asterisk)
        .from(Alias::new("tbl"))
        .and_where(
            case_statement.eq(10)
        ).to_string(PostgresQueryBuilder);
    assert_eq!(result, r#"SELECT * FROM "tbl" WHERE (CASE WHEN ("col" < 5) THEN "othercol" ELSE "finalcol" END) = 10"#);
}

The test fails because result actually ends up being:

SELECT * FROM "tbl" WHERE ((CASE WHEN ("col" < 5) THEN "othercol" ELSE "finalcol" END)) = 10

The parentheses around the case statement got doubled (this was not the case previously) and I'm not sure why.

Still, if it's not a simple fix, I'd prefer to leave it like this and be more conservative with parentheses rather than try to fix this and introduce a soundness bug.

@tyt2y3
Copy link
Member

tyt2y3 commented Aug 23, 2023

@prakol16 is your test case based on the latest code in this PR?

@tyt2y3 tyt2y3 changed the base branch from master to parentheses_fix August 23, 2023 19:46
@tyt2y3 tyt2y3 merged commit ae2c01d into SeaQL:parentheses_fix Aug 23, 2023
20 checks passed
@tyt2y3
Copy link
Member

tyt2y3 commented Aug 23, 2023

Merged into a new branch. @prakol16 can you open a PR on parentheses_fix and add your test there?
We can collaborate in a new PR.

@tyt2y3 tyt2y3 mentioned this pull request Aug 25, 2023
tyt2y3 added a commit that referenced this pull request Aug 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug in logic for dropping parentheses
4 participants