Invalid optimization of CASE-WHEN expressions #33751
Comments
There is a further constraint around Assuming that SELECT "b"."BlogId", "b"."Url", CASE
WHEN "b"."Url" = 'http://efcore/' THEN 1
WHEN "b"."Url" LIKE 'http://%' THEN 2
WHEN "b"."Url" LIKE '%/' THEN 3
END = 1 AS "Category"
FROM "Blogs" AS "b" classifies all of the rows in one of the 3 categories:
while the (incorrectly) optimized version SELECT "b"."BlogId", "b"."Url", "b"."Url" = 'http://efcore/' AS "Category"
FROM "Blogs" AS "b" classifies all of the rows as:
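The difference between the two projections can be reproduced with a minimal sketch, assuming an in-memory SQLite database and three made-up sample URLs (the rows are illustrative only and are not taken from the original report). The last URL matches none of the WHEN branches, so the original expression yields NULL while the optimized one yields false:

```python
import sqlite3

# Hypothetical sample data: the third URL matches no WHEN branch.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Blogs" ("BlogId" INTEGER PRIMARY KEY, "Url" TEXT)')
conn.executemany(
    'INSERT INTO "Blogs" ("Url") VALUES (?)',
    [("http://efcore/",), ("http://example.com/",), ("ftp://efcore",)],
)

# Original projection: CASE ... END = 1
original_sql = '''
SELECT CASE
    WHEN "b"."Url" = 'http://efcore/' THEN 1
    WHEN "b"."Url" LIKE 'http://%' THEN 2
    WHEN "b"."Url" LIKE '%/' THEN 3
END = 1 AS "Category"
FROM "Blogs" AS "b" ORDER BY "b"."BlogId"
'''

# (Incorrectly) optimized projection: only the first condition is kept.
optimized_sql = '''
SELECT "b"."Url" = 'http://efcore/' AS "Category"
FROM "Blogs" AS "b" ORDER BY "b"."BlogId"
'''

original = [row[0] for row in conn.execute(original_sql)]
optimized = [row[0] for row in conn.execute(optimized_sql)]

print(original)   # three-valued: true, false, NULL
print(optimized)  # two-valued: true, false, false
```

With this data the two projections agree on the first two rows and differ only on the row where no branch matches.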
@ranma42 all makes sense, thanks for the minimal repro and the bug analysis! /cc @maumar
Shouldn't the optimization simply collect all WHEN blocks with the specific value, and string them together with ORs? In other words: CASE
WHEN x THEN 1
WHEN y THEN 2
WHEN z THEN 1
END = 1 ... would get simplified to
Can you provide an example for this? At least in the above example, if none of the conditions hold, the CASE/WHEN would return NULL, and the simplification should return the same rows - or am I missing something? (Though there may indeed be issues around NULL vs. false; we'd need to think about this a bit more.) BTW, an unrelated note: I'm not sure why the optimization is limited to constants only - it seems we could widen it to arbitrary SqlExpressions (though keep in mind the cost of the deep comparisons). |
CASE
WHEN "b"."Url" = 'http://efcore/' THEN 1
WHEN "b"."Url" LIKE 'http://%' THEN 2
WHEN "b"."Url" LIKE '%/' THEN 1
END = 1 should not simplify to "b"."Url" = 'http://efcore/' OR "b"."Url" LIKE '%/' The If we want to transform it, it should get simplified to "b"."Url" = 'http://efcore/' OR (NOT("b"."Url" LIKE 'http://%') AND "b"."Url" LIKE '%/') |
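To check the claim in a filter context, here is a small sketch that runs the three predicates as WHERE clauses against an in-memory SQLite table. The four sample URLs are assumptions chosen so that each branch (and the no-match case) is exercised; `matching_urls` is a hypothetical helper, not part of EF Core:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Blogs" ("BlogId" INTEGER PRIMARY KEY, "Url" TEXT)')
# Hypothetical sample rows, one per interesting case.
urls = ["http://efcore/", "http://example.com/", "https://efcore/", "ftp://efcore"]
conn.executemany('INSERT INTO "Blogs" ("Url") VALUES (?)', [(u,) for u in urls])

def matching_urls(predicate):
    """Return the URLs selected when `predicate` is used as the WHERE clause."""
    sql = f'SELECT "b"."Url" FROM "Blogs" AS "b" WHERE {predicate} ORDER BY "b"."BlogId"'
    return [row[0] for row in conn.execute(sql)]

case_when = '''
CASE
    WHEN "b"."Url" = 'http://efcore/' THEN 1
    WHEN "b"."Url" LIKE 'http://%' THEN 2
    WHEN "b"."Url" LIKE '%/' THEN 1
END = 1
'''

# Naive rewrite: ignores that the second WHEN shadows the third.
naive_or = '''
"b"."Url" = 'http://efcore/' OR "b"."Url" LIKE '%/'
'''

# Guarded rewrite: the third condition only counts if the second did not match.
guarded = '''
"b"."Url" = 'http://efcore/'
OR (NOT("b"."Url" LIKE 'http://%') AND "b"."Url" LIKE '%/')
'''

print(matching_urls(case_when))
print(matching_urls(naive_or))
print(matching_urls(guarded))
```

On this data the naive OR additionally selects `http://example.com/`, which the CASE assigns to category 2, while the guarded form selects the same rows as the CASE. Note that this only compares the predicates as filters; in a projection, the NULL vs. false difference still applies.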
I think you missed the part about "which cannot be noticed while filtering, but would affect projection" ;) For a concrete example, if we simplify CASE
WHEN x > 0 THEN 1
WHEN y > 0 THEN 2
END = 1 to x > 0 the values for the
You can try it directly as: WITH T(x, y) AS (VALUES (0, 0),(1, 0),(0, 1))
SELECT x, y, CASE
WHEN x > 0 THEN 1
WHEN y > 0 THEN 2
END = 1 AS c,
x > 0 AS simple FROM T; |
I agree that it would make sense to handle a more general case, but some care should be taken to avoid introducing subtle issues (see my two previous comments). From my point of view, instead, it makes sense to limit this optimization to constants (as it makes it easy to control its complexity), but it would be natural to extend it to other binary operators. For example, we could distribute the binop and perform constant folding. This could open some more possibilities:
example of generic transform:
CASE
WHEN a THEN 2
WHEN b THEN 1
WHEN c THEN 1
WHEN d THEN 3
END = 1 --- distribute & constant fold --> CASE
WHEN a THEN FALSE
WHEN b THEN TRUE
WHEN c THEN TRUE
WHEN d THEN FALSE
END --- merge --> CASE
WHEN a THEN FALSE
WHEN b OR c THEN TRUE
WHEN d THEN FALSE
END --- if we are in a filter --> CASE
WHEN a THEN FALSE
WHEN b OR c THEN TRUE
END
example of the "happy path":
CASE
WHEN a THEN 2
WHEN b THEN 1
WHEN c THEN 1
WHEN d THEN 3
END < 3 --- distribute & constant fold --> CASE
WHEN a THEN TRUE
WHEN b THEN TRUE
WHEN c THEN TRUE
WHEN d THEN FALSE
END --- merge --> CASE
WHEN a OR b OR c THEN TRUE
WHEN d THEN FALSE
END --- if we are in a filter --> a OR b OR c |
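The three steps above (distribute & constant fold, merge, filter-only trimming) can be sketched over a toy representation in which a CASE is a list of `(condition, value)` pairs and conditions are plain strings. This is only an illustration of the idea; the function names are hypothetical and none of this is EF Core's actual implementation. Only adjacent branches are merged, so branch order (and therefore shadowing) is preserved:

```python
import operator

def distribute_and_fold(branches, op, const):
    """Push `<op> const` into every THEN value and fold it to a boolean."""
    return [(cond, op(value, const)) for cond, value in branches]

def merge_adjacent(branches):
    """Merge neighbouring branches that yield the same result with OR."""
    merged = []
    for cond, result in branches:
        if merged and merged[-1][1] == result:
            prev_cond, _ = merged[-1]
            merged[-1] = (f"{prev_cond} OR {cond}", result)
        else:
            merged.append((cond, result))
    return merged

def simplify_for_filter(branches):
    """In a filter, NULL and FALSE both reject the row, so trailing FALSE
    branches can be dropped; a single remaining TRUE branch is just its
    condition."""
    trimmed = list(branches)
    while trimmed and trimmed[-1][1] is False:
        trimmed.pop()
    if len(trimmed) == 1:
        return trimmed[0][0]
    return trimmed

case = [("a", 2), ("b", 1), ("c", 1), ("d", 3)]

# Generic transform: CASE ... END = 1
generic = simplify_for_filter(merge_adjacent(distribute_and_fold(case, operator.eq, 1)))
print(generic)

# "Happy path": CASE ... END < 3
happy = simplify_for_filter(merge_adjacent(distribute_and_fold(case, operator.lt, 3)))
print(happy)
```

The generic case still needs a CASE with two branches because the leading FALSE branch for `a` shadows `b OR c`, whereas the happy path collapses to a plain boolean expression.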
fun fact: while trying to write the test cases in https://github.com/ranma42/efcore/tree/fix-case-when-9 I hit another (apparently unrelated) issue around nullability 😅 EDIT: posted the issue as #33752 |
You're absolutely right, we indeed can't ignore WHENs with other result values since they may match and therefore cause a later WHEN to not apply. I don't think it's feasible for us to recognize when WHENs are truly disjoint, and I'm not sure that rewriting a CASE/WHEN to e.g. the above We may want to consider removing this optimization, or maybe reduce its scope considerably (in case it answers some specific need we have..). Leaving to @maumar to investigate and think about it... |
In general checking whether the cases are disjoint is hard, but some special cases might be easier, for example:
As mentioned in #33751 (comment) there are also some optimization opportunities on the value expressions (opposed to the test expressions).
Except for the (already supported) |
I also agree that the most valuable optimization is the compare one. I would not like to strip case just for the sake of doing it (i.e. @ranma42 a PR with some |
One thought... We generally don't try to go too far with optimizations which help with "badly-written" LINQ queries; there's an infinite number of ways in which a query can be expressed via inefficient LINQ, and we generally avoid adding a large amount of complexity (and also compilation time!) in trying to compensate for that. However, the same optimization which may help with badly-written LINQ queries can also help with SQL generated internally with the query pipeline; of course, when that's the case the optimization can be quite important. So I think we should be generally pragmatic here; if we know of internally-produced SQL which is improved by an optimization, that's definitely a good reason to do it. If, however, we don't think there's such a case, and the optimization would only help with badly-written user queries, then unless the optimization is trivial/easy, IMHO we shouldn't necessarily do it. @ranma42 I'd suggest opening an issue for the planned optimization before spending too much time in implementation, so we can discuss. |
If it makes sense to you I would
* Add tests for `CASE WHEN END = const` optimization
* Support optimization of `CompareTo(a, b) == {-1,0,1}`
* Remove invalid optimization of `CASE WHEN ... END = const`
Fixes #33751
fixed by @ranma42 |
The SqlExpressionSimplifyingExpressionVisitor performs an invalid optimization of CASE-WHEN expressions:
The simplification has these additional requirements that are currently not being checked:
* sqlConstantComponent?.Value only appears once in the list of case results
* matchingCaseBlock is disjoint from all of the previous cases
* ELSE is never reached, aka at least one of the conditions holds
An example program that showcases the bug (and can be conveniently run on https://dotnetfiddle.net/ ;) ) is:
The first query is correctly translated as:
The second query is simplified to
which misses all of the Urls that would match the 3rd-case (example: https://efcore/).
The third query is simplified to
which incorrectly also matches http://efcore/.
Include provider and version information
EF Core version: 8.0.5
Database provider: Microsoft.EntityFrameworkCore.Sqlite
Target framework: .NET 8.0
Operating system: Linux (/WSL)
IDE: Visual Studio Code 1.89.1
EDIT: added condition about ELSE/NULL