Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transform OUTER joins into INNER ones if the WHERE condition violates the outer join rules #6992

Closed
dyemanov opened this issue Oct 6, 2021 · 13 comments

Comments

@dyemanov
Copy link
Member

dyemanov commented Oct 6, 2021

Example:

SELECT *
FROM T1 LEFT JOIN T2 ON T1.ID = T2.ID
WHERE T2.FIELD1 = 0

In this case the condition T2.FIELD1 = 0 effectively removes all the "fake NULL" rows of T2, so the result is the same as for the INNER JOIN. However, the optimizer is forced to use the T1->T2 join order while T2->T1 could also be considered. Also, the optimizer cannot use hash/merge algorithms for joining. It makes sense to detect this case during join processing and internally replace LEFT with INNER before optimization starts.

This is primarily intended to improve "ad hoc" and machine-generated (e.g. ORM) queries. However, it may also somewhat (from the performance POV) break user applications, as many people used to "hint" the optimizer by writing LEFT intentionally, exactly to "pin" the manually chosen join order instead of relying on the optimizer. Usually, the join is performed by the PK/FK reference and the fake WHERE condition explicitly checks that PK (which is known to never be NULL by its own) to produce the INNER result.

The proposed improvement, however, may work around this compatibility issue. In particular, checks for NULL, e.g. WHERE T2.ID IS NOT NULL for the join example above, would not transform LEFT into INNER, because theoretically they may also be used for checking fake NULLs inside T2 after LEFT JOIN. However, regular comparisons that ignore NULLs by their nature, will cause the LEFT->INNER transformation. So only hints using very artificial checks like T2.ID > 0 will be affected.

This patch has been tested in production for a while and nobody stepped on problems yet. However, in order to minimize possible risks, perhaps it should be added to the major release (v5) only, or we could provide a per-database compatibility switch in firebird.conf.

@dyemanov dyemanov self-assigned this Oct 6, 2021
@omachtandras
Copy link

Hi Dmitry!

I recommend that this could be a per-database compatibility switch in firebird.conf, and turn off as default setting.
We intentionally use this method in places where Firebird uses a bad plan for very complex querys.
It would be a serious job to explore all the places where we use this solution.
And we don't know for sure whether every colleague always uses a primary key or unique key field in the where condition to check not null.

András

@mrotteveel
Copy link
Member

In my opinion, this should not be off by default, new features or improvements that are off by default don't get used.

@aafemt
Copy link
Contributor

aafemt commented Oct 6, 2021

Per-database options is a huge pain for support. In the major release it would be better to register cases of generated bad plans as bugs for separate optimizer improvements.

@omachtandras
Copy link

@mrotteveel : We have clients who do not have physical access to the firebird config files (or firebird is used by other applications with different needs). Databases are created dynamically using our program without modifying and fierbird config files... If this behaviour can be modified in an on connect trigger, that is a sufficient solution too.

@aafemt : we have not reported these cases so far because

  • it takes too much time a test database
  • the plan seems logical (analyzing the index statistics), but
  • cause a significant record reading surplus, leading to a serious slowdown

Nevertheless, if such an anomaly occurs again, I promise we will try to bring it into a reportable format.

@dyemanov
Copy link
Member Author

dyemanov commented Oct 6, 2021

@omachtandras, it's not really a question whether PK/UK field is used or some other fields. If that field is not nullable and you check it via FIELD IS NOT NULL -- everything is OK, nothing will be changed for you. If you use some other conditions (like FIELD > 0 mentioned in this ticket), then plans may change. Can you check that?

@omachtandras
Copy link

@dyemanov : thanks for the clarification. For sure, we always use a not nullable field with field is not null in the where conditions.

@dyemanov
Copy link
Member Author

dyemanov commented Oct 6, 2021

I think it's worth adding that verification (that IS NOT NULL does not break plans) to our QA tests if/when this improvement is committed.

@EPluribusUnum
Copy link

@omachtandras , @aafemt this is the problematic case for us : #1476
In this case we use the LEFT OUTER workaround.

@sim1984
Copy link

sim1984 commented Oct 7, 2021

If I understand correctly, it can be disabled like this

SET QUERY TRANSFORMATION OFF

in the BEFORE CONNECT trigger or just before the query is executed

@aafemt
Copy link
Contributor

aafemt commented Oct 7, 2021 via email

@sim1984
Copy link

sim1984 commented Oct 7, 2021

@aafemt
I agree, but the globally disabling of some optimizer algorithms would also not hurt, since it is not always possible to rewrite the query in such a way as to assign hints to it.

@aafemt
Copy link
Contributor

aafemt commented Oct 7, 2021 via email

@sim1984
Copy link

sim1984 commented Oct 7, 2021

I vote for hints and for global switches. The PLAN clause is too limited, and in my opinion, even harmful.

@dyemanov dyemanov changed the title Transform LEFT joins into INNER ones if the WHERE condition violates the outer join rules Transform OUTER joins into INNER ones if the WHERE condition violates the outer join rules Oct 16, 2022
dyemanov added a commit that referenced this issue Oct 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants