Optimize schema managers' ::listTables() methods #5268
This looks great 👍
Please document deprecations in UPGRADE.md
use function array_shift;

/**
 * Base class for schema managers that improves database schema introspection performance.
Speaking of improvements makes sense in a commit message or PR, but in code, one might wonder what is the slow version of this.
- * Base class for schema managers that improves database schema introspection performance.
+ * Base class for schema managers that provides good database schema introspection performance.
Maybe, just avoid qualitative statements at all
Base class for schema managers that provides database schema introspection.
IMO, the original "improves" is more appropriate than "provides good". But I believe I want to get rid of this class entirely as follows:
- Declare the new abstract methods in the existing class as non-abstract with the throw new NotImplemented() body.
- Create temporary deprecated methods that leverage the new API (e.g. protected doListTableDetails()).
- In those classes that want to opt into the new API (all bundled schema managers), implement the abstract methods (already done) and, instead of inheriting the default implementation of listTableDetails() from the base class, call doListTableDetails().
Otherwise, if we want to do more rework like this in the next minor release (e.g. listing sequences, users, etc.), we'll have to introduce another temporary abstract class, turning it into a mess.
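The opt-in approach listed in this comment could look roughly like the sketch below. It is only an illustration under stated assumptions: BaseSchemaManager, LegacySchemaManager, BundledSchemaManager and fetchAllTableColumns() are hypothetical names, and NotImplemented is a local stand-in for the exception mentioned in the comment, not a class taken from the library.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for the NotImplemented exception mentioned in the comment.
final class NotImplemented extends LogicException
{
}

// Hypothetical simplified base class; the real schema manager has many more methods.
abstract class BaseSchemaManager
{
    /** Default (legacy) behavior inherited by managers that do not opt in. */
    public function listTableDetails(): array
    {
        return ['strategy' => 'per-table'];
    }

    /** New API method (hypothetical name), non-abstract so existing subclasses keep working. */
    protected function fetchAllTableColumns(): array
    {
        throw new NotImplemented(static::class . ' does not implement ' . __FUNCTION__);
    }

    /** Temporary helper that builds the result through the new API. */
    protected function doListTableDetails(): array
    {
        return ['strategy' => 'bulk', 'columns' => $this->fetchAllTableColumns()];
    }
}

// A third-party manager that knows nothing about the new API still works.
final class LegacySchemaManager extends BaseSchemaManager
{
}

// A bundled manager opts in: it implements the new method and delegates to the helper.
final class BundledSchemaManager extends BaseSchemaManager
{
    public function listTableDetails(): array
    {
        return $this->doListTableDetails();
    }

    protected function fetchAllTableColumns(): array
    {
        return ['users' => ['id', 'email']];
    }
}

$legacy  = (new LegacySchemaManager())->listTableDetails();
$bundled = (new BundledSchemaManager())->listTableDetails();
```

With this shape, no intermediate abstract class is needed: a manager that does not override listTableDetails() never reaches the method that throws.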
Co-authored-by: mondrake <mondrake.org@gmail.com>
Well, this is an advancement! 🚀🚀🚀 Thank you
Fixes #2676
Closes #2766
Closes #4882
Change summary
If an entire schema needs to be introspected, instead of listing the tables and then introspecting each table individually, the new implementation fetches all schema objects of each type in one query and then groups them into tables. This reduces the number of queries used for introspection from O(N) (where N is the number of tables in the schema) to O(1).
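A minimal sketch of the grouping step, assuming the schema-wide query returns one row per column with the owning table name alongside it. The function name, the row keys and the sample data are hypothetical and only illustrate the idea; they are not the code from this PR.

```php
<?php

declare(strict_types=1);

/**
 * Groups rows from a single schema-wide introspection query by table name.
 *
 * @param list<array{table_name: string, column_name: string}> $rows
 *
 * @return array<string, list<string>> column names keyed by table name
 */
function groupColumnsByTable(array $rows): array
{
    $columnsByTable = [];

    foreach ($rows as $row) {
        // Appending to a missing key creates the per-table list on first use.
        $columnsByTable[$row['table_name']][] = $row['column_name'];
    }

    return $columnsByTable;
}

// Hypothetical result of one query that covers every table in the schema.
$rows = [
    ['table_name' => 'users', 'column_name' => 'id'],
    ['table_name' => 'users', 'column_name' => 'email'],
    ['table_name' => 'posts', 'column_name' => 'id'],
];

$columnsByTable = groupColumnsByTable($rows);
```

The number of queries stays constant per object type (columns, indexes, foreign keys), and the per-table work moves into an in-memory loop.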
Design considerations
From the design standpoint, schema introspection deserves a separate API. Similar to the wrapper connection and the platform classes, the schema manager is already a God object.
Introducing a schema introspection API is the next logical step in improving the quality of the codebase but I decided not to do it now:
Proposed design
All common logic for introspecting the entire schema in one shot is implemented in the abstract schema manager class. In order to be able to introduce new abstract methods, we introduce a temporary internal AbstractIntrospectingSchemaManager and extend all concrete schema managers from it. This temporary class will be merged into AbstractSchemaManager in the next major release. See #5268 (comment).
Instead of using the now deprecated AbstractPlatform::getList*SQL() methods, the schema managers build the queries themselves. This has the following advantages:
SELECT COLUMN_NAME AS field from the schema introspection queries since each schema manager will process the result of the query that it builds itself for itself.
Performance considerations
From my previous experience with Oracle, the time a query against a schema introspection view takes doesn't depend that much on whether a single table or the entire schema is introspected. Introspecting N tables via an individual query each would take N times longer than introspecting them all at once.
In my development environment, the integration test suite takes ~40% less time to run with this change:
Compatibility considerations
Depending on whether the AbstractPlatform::getList*SQL() methods are considered part of the SPI, the proposed changes may imply a BC break. If an API consumer uses a bundled platform and a schema manager and overrides one of these methods, the overridden logic will no longer be taken into account since the schema managers now use their own implementation.
Until the next major release, the implementation of the AbstractPlatform::getList*SQL() methods remains intact and tested but is no longer used.
Deprecations
The public Platform methods that generated SQL for introspecting a single table have been deprecated.
Implementation details
Changes in schema introspection logic:
COLSEQ when joining referencing and referenced columns of a foreign key constraint. Otherwise, the query would return a product of the two sets and result in columns being returned out of order:
dbal/src/Platforms/DB2Platform.php
Lines 410 to 413 in 699aad8
After:
dbal/src/Schema/DB2SchemaManager.php
Lines 350 to 354 in 699aad8
WHERE TABSCHEMA = <database> clause to the column, index, etc. introspection queries. Otherwise, they might return objects from other schemas.
DISTINCT clause in the query (which is rarely a good idea) which in turn was used to select the columns that weren't used anywhere.
TODO:
AbstractIntrospectingSchemaManager::normalizeIdentifier().
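To illustrate the DB2 foreign key point above (matching referencing and referenced columns on COLSEQ), here is a small in-memory sketch. It does not reproduce the SQL from this PR; the function name, array shapes and column names are hypothetical and only show why a join without the position condition yields a product of the two column sets.

```php
<?php

declare(strict_types=1);

/**
 * Pairs referencing and referenced columns of a composite foreign key.
 *
 * @param list<array{colseq: int, name: string}> $local   referencing columns
 * @param list<array{colseq: int, name: string}> $foreign referenced columns
 *
 * @return list<string>
 */
function joinColumns(array $local, array $foreign, bool $matchPosition): array
{
    $pairs = [];

    foreach ($local as $l) {
        foreach ($foreign as $f) {
            // Without the position condition, every combination is kept.
            if ($matchPosition && $l['colseq'] !== $f['colseq']) {
                continue;
            }

            $pairs[] = $l['name'] . ' -> ' . $f['name'];
        }
    }

    return $pairs;
}

// Hypothetical two-column foreign key.
$local   = [['colseq' => 1, 'name' => 'order_id'], ['colseq' => 2, 'name' => 'line_no']];
$foreign = [['colseq' => 1, 'name' => 'id'], ['colseq' => 2, 'name' => 'no']];

$product = joinColumns($local, $foreign, false); // all four combinations
$matched = joinColumns($local, $foreign, true);  // one pair per column position
```

Matching on the position keeps exactly one pair per key column, in key order.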