Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: Sanitize function namespaces. #13404

Merged
merged 5 commits into from Feb 10, 2017

Conversation

knz
Copy link
Contributor

@knz knz commented Feb 3, 2017

Fixes #13313.

tl;dr: this introduces the following user-visible changes:

  1. the ability to tweak the search path with SET SEARCH_PATH = ...
  2. the current user is now reported in the output of SHOW ALL
  3. current_schemas() now behaves the same regardless of which argument is given to it
  4. the "pg_catalog." prefix disappears from the built-in function definitions, thus also from the docs generated automatically from them.

Details:

sql: fix the table namespace resolution order.

Prior to this patch the table lookup code would search first the
search path and only then the session's current database.

For compatibility with pg the session's database must be searched
first.

sql: fix the semantics of current_schemas()

The rest of this commit message uses the following definition: a db
name is in the search path explicitly if and only if it is part of
the default search path (the one initialized with the session) or it
has been mentioned explicitly in a SET SEARCH_PATH = ... statement.

Some context:

  • the (explicit) default search path in PostgreSQL is {$user,public}
    (and pg_catalog is not listed in it).

  • PostgreSQL does some magic. If the name pg_catalog is not
    explicitly in the search path, then PostgreSQL will start all name
    lookups by searching in pg_catalog. This special behavior is not
    active if the name pg_catalog is mentioned explicitly in the
    search path.

Because of this magic, the idea of a search path entails, in
PostgreSQL specifically
, a difference between an "explicit search
path" (the list of names that were put in the search path explicitly,
as per the definition above), and an "implicit search path" (the same
list, but with pg_catalog prepended to it if not already listed in
the explicit search path). The user can choose which one is returned,
in PostgreSQL, by current_schemas() using a boolean argument.

In CockroachDB there is also some magic, but it's different magic.

CockroachDB's magic is twofold.

  1. if a current session database has been set, then that is
    searched first, implicitly, before the explicit search
    path.
    That's different from PostgreSQL, as explained below.

  2. There is no special treatment for pg_catalog during lookups -- or
    rather, we have decided there would be no special treatment,
    because the idea to have to search and check whether pg_catalog
    is already in the search path or not (to decide whether to apply
    further magic) on every lookup sounds like unwanted overhead.

    Yet we deem it useful for compatibility to have pg-specific names
    searched in pg_catalog in similar circumstances. So CockroachDB
    applies different magic to make it work: a) the default search path
    in CockroachDB is {pg_catalog} and b) upon encountering SET SEARCH_PATH = ..., if the statement does not list pg_catalog
    explicitly, then it is prepended to the values stored in the
    search path.

    (Why prepended? So as to match pg's behavior when pg_catalog is
    not listed explicitly.)

(Note: PostgreSQL has no special treatment for the current database --
that's because it does not provide a way to "set the current
database", and instead the idea of "current database" in PostgreSQL is
exactly "the first value in the explicit search path".)

Because of point 2 above, the idea of "explicit" vs "implicit" search
path evaporates in CockroachDB and instead there may or may not be an
extra pg_catalog inserted in the search path automatically upon SET SEARCH_PATH. By the time the current_schemas() function is
evaluated, the information about whether a leading pg_catalog, if
any, was part of the original SET SEARCH_PATH statement or was
inserted automatically has been lost already.

So instead this patch drops any pretense that there is a difference
between "implicit" and "explicit", makes current_schemas() ignore
its boolean argument entirely and always returns the full value of the
search path -- always containing pg_catalog, wherever SET SEARCH_PATH may have decided to put it.

sql: make the lookup of built-in functions smarter wrt pg_catalog.

Prior to this patch, there were two sorts of functions:

  • functions defined with a "pg_catalog." prefix. These functions
    could be found either when used with their prefix, or without their
    prefix as long as pg_catalog was in the search path (and the
    session was defined to include pg_catalog in the search path by
    default).

  • functions defined without a "pg_catalog." prefix. These functions
    could only be found when used without a prefix.

This created a lurking bug with regards to compatibility: in
PostgreSQL, all built-in functions reside in namespace pg_catalog,
not only those fews that have motivated the "pg_catalog." prefix in
CockroachDB so far. For example, SELECT pg_catalog.count(*) FROM foo
is valid, and so is SELECT pg_catalog.length(name) FROM users. It
was just a matter of time before a user would demand compatibility
with a tool producing pg queries automatically using this form.

To address this, three solutions were considered:

  • move all built-in functions supported by CockroachDB to the
    namespace pg_catalog. Since pg_catalog is always searched (it's
    always part of the search path), unqualified names would still be
    found. This would achieve compatibility but make the name
    "pg_catalog" much more prominent in error messages, which use the
    fully qualified name of the function in all cases.

  • move all built-in functions supported by CockroachDB to a new
    namespace with a more generic name, say "sql", then tweak the
    lookup rules so that whenever the prefix "pg_catalog" is used, it
    would be substituted implicitly by "sql". This would achieve
    compatibility and also make the error messages less pg-flavored. The
    drawback of this approach is that it introduces a mandatory prefix
    "sql." in many places in the code and error messages where it was
    not present before.

  • move all built-in functions supported by both CockroachDB and
    PostgreSQL to the global namespace (the one with no prefix), then
    tweak the lookup rules so that whenever the prefix "pg_catalog" is
    used, it would be removed to look up the remainder of the name in
    the global namespace. This also achieves compatibility and has the
    least impact on the code and error messages. The drawback of this
    approach is that those functions that are really pg-specific and
    merely present to please ORMs looking at pg_proc now also appear
    in the auto-generated docs for the global namespace.

The last solution was deemed to offer the best trade-off and was thus
adopted for the time being. Remaining issues with documentation, if
any, can be addressed in a subsequent patch by adding a boolean to
mark the builtin definition of those pg compatibility functions we do
not want to promote.


This change is Reviewable

@knz
Copy link
Contributor Author

knz commented Feb 3, 2017

Please review the commits separately -- they are fully independent.

@knz knz force-pushed the sanitize-function-namespaces branch 2 times, most recently from 0267461 to 7cd4dff Compare February 4, 2017 00:07
@nvanbenschoten
Copy link
Member

Nice cleanup!

s/smarted/smarter/ in the fifth commit title.

Does this mean that we've abandoned the idea of a sql. namespace for all builtin functions, and an implciit mapping from pg_catalog -> sql? From the fifth commit message, that sounds like the case.


Reviewed 4 of 4 files at r1, 3 of 3 files at r2, 3 of 3 files at r3, 3 of 3 files at r4, 7 of 7 files at r5.
Review status: all files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.


pkg/sql/set.go, line 150 at r3 (raw file):

					name, v, val.ResolvedType())
			}
			if s == "pg_catalog" {

s/"pg_catalog"/pgCatalogName/


pkg/sql/table.go, line 516 at r1 (raw file):

}

// searchAndQualifyDatabase augments the table name with the database

Could you add to the commit message your source to defend that this change is adding "compatibility with pg"? Including the link you shared later would be sufficient.


pkg/sql/table.go, line 521 at r1 (raw file):

// provided TableName is modified in-place in case of success, and
// left unchanged otherwise.
func (p *planner) searchAndQualifyDatabase(tn *parser.TableName) error {

Since we're now modifying in place, how should this method behave if tn is already qualified? That should be included in the method's contract.


Comments from Reviewable

@knz
Copy link
Contributor Author

knz commented Feb 9, 2017

Note to self: adjust in light of #13512 when it's merged.

Prior to this patch the table lookup code would search first the
search path and only then the session's current database.

For compatibility with pg the session's database must be searched
first.

References:

- The definition of `current_schema` (not `current_schemas!`) in
  https://www.postgresql.org/docs/9.6/static/functions-info.html
- https://www.postgresql.org/docs/9.6/static/ddl-schemas.html#DDL-SCHEMAS-PATH
The rest of this commit message uses the following definition: a db
name is in the search path *explicitly* if and only if it is part of
the default search path (the one initialized with the session) or it
has been mentioned explicitly in a `SET SEARCH_PATH = ...` statement.

Some context:

- the (explicit) default search path in PostgreSQL is `{$user,public}`
  (and pg_catalog is not listed in it).

- PostgreSQL does some magic. If the name `pg_catalog` is not
  explicitly in the search path, then PostgreSQL will start all name
  lookups by searching in `pg_catalog`. This special behavior is not
  active if the name `pg_catalog` is mentioned explicitly in the
  search path.

Because of this magic, the idea of a search path entails, *in
PostgreSQL specifically*, a difference between an "explicit search
path" (the list of names that were put in the search path explicitly,
as per the definition above), and an "implicit search path" (the same
list, but with `pg_catalog` prepended to it if not already listed in
the explicit search path). The user can choose which one is returned,
in PostgreSQL, by `current_schemas()` using a boolean argument.

In CockroachDB there is also some magic, but it's different magic.

CockroachDB's magic is twofold.

1. *if a current session database has been set, then that is
   searched first, implicitly, before the explicit search
   path.* That's different from PostgreSQL, as explained below.

2. There is no special treatment for `pg_catalog` during lookups -- or
   rather, we have decided there would be no special treatment,
   because the idea to have to search and check whether `pg_catalog`
   is already in the search path or not (to decide whether to apply
   further magic) on *every lookup* sounds like unwanted overhead.

   Yet we deem it useful for compatibility to have pg-specific names
   searched in `pg_catalog` in similar circumstances. So CockroachDB
   applies different magic to make it work: a) the default search path
   in CockroachDB is `{pg_catalog}` and b) upon encountering `SET
   SEARCH_PATH = ...`, if the statement does not list `pg_catalog`
   explicitly, then it is *prepended* to the values stored in the
   search path.

   (Why prepended? So as to match pg's behavior when `pg_catalog` is
   not listed explicitly.)

(Note: PostgreSQL has no special treatment for the current database --
that's because it does not provide a way to "set the current
database", and instead the idea of "current database" in PostgreSQL is
exactly "the first value in the explicit search path".)

Because of point 2 above, the idea of "explicit" vs "implicit" search
path evaporates in CockroachDB and instead there may or may not be an
extra `pg_catalog` inserted in the search path automatically upon `SET
SEARCH_PATH`. By the time the `current_schemas()` function is
evaluated, the information about whether a leading `pg_catalog`, if
any, was part of the original `SET SEARCH_PATH` statement or was
inserted automatically has been lost already.

So instead this patch drops any pretense that there is a difference
between "implicit" and "explicit", makes `current_schemas()` ignore
its boolean argument entirely and always returns the full value of the
search path -- always containing `pg_catalog`, wherever `SET
SEARCH_PATH` may have decided to put it.
@knz knz force-pushed the sanitize-function-namespaces branch from 7cd4dff to d218669 Compare February 9, 2017 23:21
@knz
Copy link
Contributor Author

knz commented Feb 9, 2017

Does this mean that we've abandoned the idea of a sql. namespace for all builtin functions, and an implciit mapping from pg_catalog -> sql?

No it's still there, except that the new namespace is now called "" instead of "sql". Why? Because otherwise I'd have needed also to patch up the doc generator to keep its output simple.

If/after we introduce support for both user-defined functions and custom namespaces (schemas), we will need to revisit this indeed. At that point we can safely rename the "" namespace to "sql".


Review status: 2 of 16 files reviewed at latest revision, 3 unresolved discussions.


pkg/sql/set.go, line 150 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/"pg_catalog"/pgCatalogName/

Done.


pkg/sql/table.go, line 516 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Could you add to the commit message your source to defend that this change is adding "compatibility with pg"? Including the link you shared later would be sufficient.

Done.


pkg/sql/table.go, line 521 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Since we're now modifying in place, how should this method behave if tn is already qualified? That should be included in the method's contract.

Done.


Comments from Reviewable

@nvanbenschoten
Copy link
Member

:lgtm:


Reviewed 14 of 14 files at r6, 3 of 3 files at r7, 3 of 3 files at r8, 3 of 3 files at r9, 7 of 7 files at r10.
Review status: all files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


pkg/sql/parser/builtins.go, line 1488 at r9 (raw file):

	},

	// pg_catalog functions.

I'd keep this. The section comments are useful.


Comments from Reviewable

Prior to this patch, there were two sorts of functions:

- functions defined with a "`pg_catalog.`" prefix. These functions
  could be found either when used with their prefix, or without their
  prefix as long as `pg_catalog` was in the search path (and the
  session was defined to include `pg_catalog` in the search path by
  default).

- functions defined without a "`pg_catalog.`" prefix. These functions
  could only be found when used without a prefix.

This created a lurking bug with regards to compatibility: in
PostgreSQL, *all* built-in functions reside in namespace `pg_catalog`,
not only those fews that have motivated the "`pg_catalog.`" prefix in
CockroachDB so far. For example, `SELECT pg_catalog.count(*) FROM foo`
is valid, and so is `SELECT pg_catalog.length(name) FROM users`. It
was just a matter of time before a user would demand compatibility
with a tool producing pg queries automatically using this form.

To address this, three solutions were considered:

- move all built-in functions supported by CockroachDB to the
  namespace `pg_catalog`. Since `pg_catalog` is always searched (it's
  always part of the search path), unqualified names would still be
  found. This would achieve compatibility but make the name
  "`pg_catalog`" much more prominent in error messages, which use the
  fully qualified name of the function in all cases.

- move all built-in functions supported by CockroachDB to a new
  namespace with a more generic name, say "`sql`", then tweak the
  lookup rules so that whenever the prefix "`pg_catalog`" is used, it
  would be substituted implicitly by "`sql`". This would achieve
  compatibility and also make the error messages less pg-flavored. The
  drawback of this approach is that it introduces a mandatory prefix
  "`sql.`" in many places in the code and error messages where it was
  not present before.

- move all built-in functions supported by both CockroachDB and
  PostgreSQL to the global namespace (the one with no prefix), then
  tweak the lookup rules so that whenever the prefix "`pg_catalog`" is
  used, it would be removed to look up the remainder of the name in
  the global namespace. This also achieves compatibility and has the
  least impact on the code and error messages. The drawback of this
  approach is that those functions that are really pg-specific and
  merely present to please ORMs looking at `pg_proc` now also appear
  in the auto-generated docs for the global namespace.

The last solution was deemed to offer the best trade-off and was thus
adopted for the time being. Remaining issues with documentation, if
any, can be addressed in a subsequent patch by adding a boolean to
mark the builtin definition of those pg compatibility functions we do
not want to promote.
@knz knz force-pushed the sanitize-function-namespaces branch from d218669 to 4ee37ed Compare February 10, 2017 07:49
@knz
Copy link
Contributor Author

knz commented Feb 10, 2017

TFYR!


Review status: 15 of 16 files reviewed at latest revision, 1 unresolved discussion.


pkg/sql/parser/builtins.go, line 1488 at r9 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I'd keep this. The section comments are useful.

Done.


Comments from Reviewable

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: wrong name resolution order
3 participants