New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: Sanitize function namespaces. #13404
Conversation
Please review the commits separately -- they are fully independent. |
0267461
to
7cd4dff
Compare
Nice cleanup!
Does this mean that we've abandoned the idea of a Reviewed 4 of 4 files at r1, 3 of 3 files at r2, 3 of 3 files at r3, 3 of 3 files at r4, 7 of 7 files at r5. pkg/sql/set.go, line 150 at r3 (raw file):
pkg/sql/table.go, line 516 at r1 (raw file):
Could you add to the commit message your source to defend that this change is adding "compatibility with pg"? Including the link you shared later would be sufficient. pkg/sql/table.go, line 521 at r1 (raw file):
Since we're now modifying in place, how should this method behave if Comments from Reviewable |
Note to self: adjust in light of #13512 when it's merged. |
Prior to this patch the table lookup code would search first the search path and only then the session's current database. For compatibility with pg the session's database must be searched first. References: - The definition of `current_schema` (not `current_schemas!`) in https://www.postgresql.org/docs/9.6/static/functions-info.html - https://www.postgresql.org/docs/9.6/static/ddl-schemas.html#DDL-SCHEMAS-PATH
The rest of this commit message uses the following definition: a db name is in the search path *explicitly* if and only if it is part of the default search path (the one initialized with the session) or it has been mentioned explicitly in a `SET SEARCH_PATH = ...` statement. Some context: - the (explicit) default search path in PostgreSQL is `{$user,public}` (and pg_catalog is not listed in it). - PostgreSQL does some magic. If the name `pg_catalog` is not explicitly in the search path, then PostgreSQL will start all name lookups by searching in `pg_catalog`. This special behavior is not active if the name `pg_catalog` is mentioned explicitly in the search path. Because of this magic, the idea of a search path entails, *in PostgreSQL specifically*, a difference between an "explicit search path" (the list of names that were put in the search path explicitly, as per the definition above), and an "implicit search path" (the same list, but with `pg_catalog` prepended to it if not already listed in the explicit search path). The user can choose which one is returned, in PostgreSQL, by `current_schemas()` using a boolean argument. In CockroachDB there is also some magic, but it's different magic. CockroachDB's magic is twofold. 1. *if a current session database has been set, then that is searched first, implicitly, before the explicit search path.* That's different from PostgreSQL, as explained below. 2. There is no special treatment for `pg_catalog` during lookups -- or rather, we have decided there would be no special treatment, because the idea to have to search and check whether `pg_catalog` is already in the search path or not (to decide whether to apply further magic) on *every lookup* sounds like unwanted overhead. Yet we deem it useful for compatibility to have pg-specific names searched in `pg_catalog` in similar circumstances. So CockroachDB applies different magic to make it work: a) the default search path in CockroachDB is `{pg_catalog}` and b) upon encountering `SET SEARCH_PATH = ...`, if the statement does not list `pg_catalog` explicitly, then it is *prepended* to the values stored in the search path. (Why prepended? So as to match pg's behavior when `pg_catalog` is not listed explicitly.) (Note: PostgreSQL has no special treatment for the current database -- that's because it does not provide a way to "set the current database", and instead the idea of "current database" in PostgreSQL is exactly "the first value in the explicit search path".) Because of point 2 above, the idea of "explicit" vs "implicit" search path evaporates in CockroachDB and instead there may or may not be an extra `pg_catalog` inserted in the search path automatically upon `SET SEARCH_PATH`. By the time the `current_schemas()` function is evaluated, the information about whether a leading `pg_catalog`, if any, was part of the original `SET SEARCH_PATH` statement or was inserted automatically has been lost already. So instead this patch drops any pretense that there is a difference between "implicit" and "explicit", makes `current_schemas()` ignore its boolean argument entirely and always returns the full value of the search path -- always containing `pg_catalog`, wherever `SET SEARCH_PATH` may have decided to put it.
7cd4dff
to
d218669
Compare
No it's still there, except that the new namespace is now called If/after we introduce support for both user-defined functions and custom namespaces (schemas), we will need to revisit this indeed. At that point we can safely rename the Review status: 2 of 16 files reviewed at latest revision, 3 unresolved discussions. pkg/sql/set.go, line 150 at r3 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Done. pkg/sql/table.go, line 516 at r1 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Done. pkg/sql/table.go, line 521 at r1 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Done. Comments from Reviewable |
Reviewed 14 of 14 files at r6, 3 of 3 files at r7, 3 of 3 files at r8, 3 of 3 files at r9, 7 of 7 files at r10. pkg/sql/parser/builtins.go, line 1488 at r9 (raw file):
I'd keep this. The section comments are useful. Comments from Reviewable |
Prior to this patch, there were two sorts of functions: - functions defined with a "`pg_catalog.`" prefix. These functions could be found either when used with their prefix, or without their prefix as long as `pg_catalog` was in the search path (and the session was defined to include `pg_catalog` in the search path by default). - functions defined without a "`pg_catalog.`" prefix. These functions could only be found when used without a prefix. This created a lurking bug with regards to compatibility: in PostgreSQL, *all* built-in functions reside in namespace `pg_catalog`, not only those fews that have motivated the "`pg_catalog.`" prefix in CockroachDB so far. For example, `SELECT pg_catalog.count(*) FROM foo` is valid, and so is `SELECT pg_catalog.length(name) FROM users`. It was just a matter of time before a user would demand compatibility with a tool producing pg queries automatically using this form. To address this, three solutions were considered: - move all built-in functions supported by CockroachDB to the namespace `pg_catalog`. Since `pg_catalog` is always searched (it's always part of the search path), unqualified names would still be found. This would achieve compatibility but make the name "`pg_catalog`" much more prominent in error messages, which use the fully qualified name of the function in all cases. - move all built-in functions supported by CockroachDB to a new namespace with a more generic name, say "`sql`", then tweak the lookup rules so that whenever the prefix "`pg_catalog`" is used, it would be substituted implicitly by "`sql`". This would achieve compatibility and also make the error messages less pg-flavored. The drawback of this approach is that it introduces a mandatory prefix "`sql.`" in many places in the code and error messages where it was not present before. - move all built-in functions supported by both CockroachDB and PostgreSQL to the global namespace (the one with no prefix), then tweak the lookup rules so that whenever the prefix "`pg_catalog`" is used, it would be removed to look up the remainder of the name in the global namespace. This also achieves compatibility and has the least impact on the code and error messages. The drawback of this approach is that those functions that are really pg-specific and merely present to please ORMs looking at `pg_proc` now also appear in the auto-generated docs for the global namespace. The last solution was deemed to offer the best trade-off and was thus adopted for the time being. Remaining issues with documentation, if any, can be addressed in a subsequent patch by adding a boolean to mark the builtin definition of those pg compatibility functions we do not want to promote.
d218669
to
4ee37ed
Compare
TFYR! Review status: 15 of 16 files reviewed at latest revision, 1 unresolved discussion. pkg/sql/parser/builtins.go, line 1488 at r9 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Done. Comments from Reviewable |
Fixes #13313.
tl;dr: this introduces the following user-visible changes:
SET SEARCH_PATH = ...
SHOW ALL
current_schemas()
now behaves the same regardless of which argument is given to itpg_catalog.
" prefix disappears from the built-in function definitions, thus also from the docs generated automatically from them.Details:
sql: fix the table namespace resolution order.
Prior to this patch the table lookup code would search first the
search path and only then the session's current database.
For compatibility with pg the session's database must be searched
first.
sql: fix the semantics of current_schemas()
The rest of this commit message uses the following definition: a db
name is in the search path explicitly if and only if it is part of
the default search path (the one initialized with the session) or it
has been mentioned explicitly in a
SET SEARCH_PATH = ...
statement.Some context:
the (explicit) default search path in PostgreSQL is
{$user,public}
(and pg_catalog is not listed in it).
PostgreSQL does some magic. If the name
pg_catalog
is notexplicitly in the search path, then PostgreSQL will start all name
lookups by searching in
pg_catalog
. This special behavior is notactive if the name
pg_catalog
is mentioned explicitly in thesearch path.
Because of this magic, the idea of a search path entails, in
PostgreSQL specifically, a difference between an "explicit search
path" (the list of names that were put in the search path explicitly,
as per the definition above), and an "implicit search path" (the same
list, but with
pg_catalog
prepended to it if not already listed inthe explicit search path). The user can choose which one is returned,
in PostgreSQL, by
current_schemas()
using a boolean argument.In CockroachDB there is also some magic, but it's different magic.
CockroachDB's magic is twofold.
if a current session database has been set, then that is
searched first, implicitly, before the explicit search
path. That's different from PostgreSQL, as explained below.
There is no special treatment for
pg_catalog
during lookups -- orrather, we have decided there would be no special treatment,
because the idea to have to search and check whether
pg_catalog
is already in the search path or not (to decide whether to apply
further magic) on every lookup sounds like unwanted overhead.
Yet we deem it useful for compatibility to have pg-specific names
searched in
pg_catalog
in similar circumstances. So CockroachDBapplies different magic to make it work: a) the default search path
in CockroachDB is
{pg_catalog}
and b) upon encounteringSET SEARCH_PATH = ...
, if the statement does not listpg_catalog
explicitly, then it is prepended to the values stored in the
search path.
(Why prepended? So as to match pg's behavior when
pg_catalog
isnot listed explicitly.)
(Note: PostgreSQL has no special treatment for the current database --
that's because it does not provide a way to "set the current
database", and instead the idea of "current database" in PostgreSQL is
exactly "the first value in the explicit search path".)
Because of point 2 above, the idea of "explicit" vs "implicit" search
path evaporates in CockroachDB and instead there may or may not be an
extra
pg_catalog
inserted in the search path automatically uponSET SEARCH_PATH
. By the time thecurrent_schemas()
function isevaluated, the information about whether a leading
pg_catalog
, ifany, was part of the original
SET SEARCH_PATH
statement or wasinserted automatically has been lost already.
So instead this patch drops any pretense that there is a difference
between "implicit" and "explicit", makes
current_schemas()
ignoreits boolean argument entirely and always returns the full value of the
search path -- always containing
pg_catalog
, whereverSET SEARCH_PATH
may have decided to put it.sql: make the lookup of built-in functions smarter wrt
pg_catalog
.Prior to this patch, there were two sorts of functions:
functions defined with a "
pg_catalog.
" prefix. These functionscould be found either when used with their prefix, or without their
prefix as long as
pg_catalog
was in the search path (and thesession was defined to include
pg_catalog
in the search path bydefault).
functions defined without a "
pg_catalog.
" prefix. These functionscould only be found when used without a prefix.
This created a lurking bug with regards to compatibility: in
PostgreSQL, all built-in functions reside in namespace
pg_catalog
,not only those fews that have motivated the "
pg_catalog.
" prefix inCockroachDB so far. For example,
SELECT pg_catalog.count(*) FROM foo
is valid, and so is
SELECT pg_catalog.length(name) FROM users
. Itwas just a matter of time before a user would demand compatibility
with a tool producing pg queries automatically using this form.
To address this, three solutions were considered:
move all built-in functions supported by CockroachDB to the
namespace
pg_catalog
. Sincepg_catalog
is always searched (it'salways part of the search path), unqualified names would still be
found. This would achieve compatibility but make the name
"
pg_catalog
" much more prominent in error messages, which use thefully qualified name of the function in all cases.
move all built-in functions supported by CockroachDB to a new
namespace with a more generic name, say "
sql
", then tweak thelookup rules so that whenever the prefix "
pg_catalog
" is used, itwould be substituted implicitly by "
sql
". This would achievecompatibility and also make the error messages less pg-flavored. The
drawback of this approach is that it introduces a mandatory prefix
"
sql.
" in many places in the code and error messages where it wasnot present before.
move all built-in functions supported by both CockroachDB and
PostgreSQL to the global namespace (the one with no prefix), then
tweak the lookup rules so that whenever the prefix "
pg_catalog
" isused, it would be removed to look up the remainder of the name in
the global namespace. This also achieves compatibility and has the
least impact on the code and error messages. The drawback of this
approach is that those functions that are really pg-specific and
merely present to please ORMs looking at
pg_proc
now also appearin the auto-generated docs for the global namespace.
The last solution was deemed to offer the best trade-off and was thus
adopted for the time being. Remaining issues with documentation, if
any, can be addressed in a subsequent patch by adding a boolean to
mark the builtin definition of those pg compatibility functions we do
not want to promote.
This change is