Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use search_path instead of db name on pg connections #13502

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

proddata
Copy link
Member

@proddata proddata commented Jan 19, 2023

Summary of the changes / Why this improves CrateDB

  • Changed the behavior of PostgreSQL wire protocol connections: Instead of using the database name provided in the connection as the search_path, CrateDB now uses the provided search_path session setting. This change is in line with PostgreSQL and improves tool compatibility. To set the search_path, one can now e.g. use the JDBC driver's currentSchema property.

doc is the default schema in CrateDB, similar to public in PostgreSQL. On http-connections this gets automatically set in the search_path. On pg-connections, if a database/catalog name is supplied, the database/catalog name is used instead. Some tools require this to be set to the correct and only database/catalog name in CrateDB crate. However this alters the search_path to crate as well. Tools expecting to be able to set the search path, by providing a search_path property on connection - as e.g. possible with PostgreSQL JDBC using currentSchema - will likely end up with a deviating search_path. This is e.g. problematic in tools like DBeaver.


This is likely a left-over from the MySQL-comaptibility days

relates to ##13472

Checklist

  • Added an entry in CHANGES.txt for user facing changes
  • Updated documentation & sql_features table for user facing changes
  • Touched code is covered by tests
  • CLA is signed
  • This does not contain breaking changes, or if it does:
    • It is released within a major release
    • It is recorded in CHANGES.txt
    • It was marked as deprecated in an earlier release if possible
    • You've thought about the consequences and other components are adapted
      (E.g. AdminUI)

@proddata
Copy link
Member Author

This would probably need some additional changes in the documentation (also the clients specific one), but I would like to get feedback if we go ahead with this.

@seut
Copy link
Member

seut commented Mar 16, 2023

@proddata
Sorry for delay, this somehow slipped through.
I think we should do this change, but should carefully check all drivers compatibility before merging.
Beside of documentation, tests should also be added to validate the behaviour.

@proddata
Copy link
Member Author

@proddata Sorry for delay, this somehow slipped through. I think we should do this change, but should carefully check all drivers compatibility before merging. Beside of documentation, tests should also be added to validate the behaviour.

No worries, I wanted to get feedback first if this is feasible, before spending more time on it.
This change does break some existing tests. I will try to come up with something, also in regards what might need to be tested.

@BaurzhanSakhariev
Copy link
Contributor

BaurzhanSakhariev commented Aug 8, 2023

Hi @proddata, thanks for the PR!

We added it to 5.5 board as we should proceed with this change. We need to test that crash, crate-python work and all tests are passing and we don't break tools compatibility.

@proddata
Copy link
Member Author

proddata commented Aug 8, 2023

Hi @proddata, thanks for the PR!

We added it to 5.5 board as we should proceed with this change. We need to test that crash, crate-python and all tests are passing and we don't break tools compatibility.

This will improve compatibility with some tools, as the setting of a search_path on connection by e.g. JDBC based, but also other tools has been ignored. Right now you need to put a "wrong" database name to set a search path to a session, however this also implies, that if you use "crate" (i.e. the correct database/catalog name) you end up with crate in the search_path. While this is documented in some places, it also is contradicting.

@matriv
Copy link
Contributor

matriv commented Aug 8, 2023

This will improve compatibility with some tools, as the setting of a search_path on connection by e.g. JDBC based, but also other tools has been ignored. Right now you need to put a "wrong" database name to set a search path to a session, however this also implies, that if you use "crate" (i.e. the correct database/catalog name) you end up with crate in the search_path. While this is documented in some places, it also is contradicting.

I agree, and I guess that tools for PG wouldn't expect the db name (jdbc:postgresql:host:port/<dbname>) to become part of search_path so I think we don't have an issue of breaking compatibility there, so it's mainly to check and maybe adjust our tools as @BaurzhanSakhariev suggested (crash, crate-python, etc.). right?

@proddata
Copy link
Member Author

proddata commented Aug 8, 2023

our tools as @BaurzhanSakhariev suggested (crash, crate-python, etc.). right?

This should neither effect crash, nor crate-python as they both use the http-endpoint which is unaffected.

@amotl
Copy link
Member

amotl commented Aug 8, 2023

Hi. Do you think we would need to check the JavaScript node-postgres driver, and the Python asyncpg and psycopg drivers?

@hlcianfagna
Copy link
Contributor

For people that may have applications configured relying on the current behavior (used "database name" to point the application to tables on a certain schema) it would be useful to have the ability to set default session settings for a user as proposed in #12109 so that they can upgrade CrateDB without having to reconfigure the applications (the applications may not have settings available to set the default schema/search_path or they may be under the control of a different team or require a different change control process).

@proddata
Copy link
Member Author

proddata commented Aug 9, 2023

Hi. Do you think we would need to check the JavaScript node-postgres driver, and the Python asyncpg and psycopg drivers?

What do you want to check? :)
A change in behaviour is expected for both.

For people that may have applications configured relying on the current behavior (used "database name" to point the application to tables on a certain schema)

I think it is a rather theoretical problem. Any tool, where one would expect PostgreSQL compatibility potentially currently breaks (e.g. DBeaver). Any custom tool could use a SET search_path = 'my_value' right after opening a connection (already possible today). Also this would only affect users using a db-name other than doc

@hammerhead
Copy link
Member

In PostgreSQL, search_path defaults to "$user", public. If we stick to the same pattern and have it include doc by default (as our equivalent to public), I imagine this will help limit the impact on tool compatibility. Querying tables in doc would then work independently of the database name provided in the connection URL.

@proddata
Copy link
Member Author

proddata commented Aug 9, 2023

In PostgreSQL, search_path defaults to "$user", public. If we stick to the same pattern and have it include doc by default (as our equivalent to public), I imagine this will help limit the impact on tool compatibility. Querying tables in doc would then work independently of the database name provided in the connection URL.

Agreed. That is how this change would behave. If search_path is not set on connection it will default to doc

% psql -h 'localhost' -U 'crate' -W 
crate=> SHOW search_path;
 setting 
---------
 doc
(1 row)

and i.e. will have the same behaviour as crash

% crash
cr>  SHOW search_path;
+---------+
| setting |
+---------+
| doc     |
+---------+

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Status: Should
Development

Successfully merging this pull request may close these issues.

None yet

7 participants