New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Select from system tables when table based on table function. #55540
Select from system tables when table based on table function. #55540
Conversation
This is an automated comment for commit 517ce45 with description of existing statuses. It's updated for the latest CI running ❌ Click here to open a full report in a separate page Successful checks
|
src/Storages/StorageProxy.h
Outdated
@@ -152,7 +152,7 @@ class StorageProxy : public IStorage | |||
CheckResults checkData(const ASTPtr & query, ContextPtr context) override { return getNested()->checkData(query, context); } | |||
void checkTableCanBeDropped([[ maybe_unused ]] ContextPtr query_context) const override { getNested()->checkTableCanBeDropped(query_context); } | |||
bool storesDataOnDisk() const override { return getNested()->storesDataOnDisk(); } | |||
Strings getDataPaths() const override { return getNested()->getDataPaths(); } | |||
Strings getDataPaths() const override { return {}; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it will return wrong result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understood StorageProxy
keeps tables based on table functions and with StorageMaterializedMySQL
engine. Seems like the second one might have its own data. Can we always return empty value for StorageTableFunctionProxy
class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure, see, for example, table function file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
file
function have not empty value:
CREATE TABLE test_file AS file('test.tsv', 'TSV', 'column1 UInt32, column2 UInt32, column3 UInt32')
SELECT * FROM system.tables WHERE name='test_file'\G
Query id: a3de9511-6340-463a-a349-fabe502285c9
Row 1:
──────
data_paths: ['/var/lib/clickhouse/user_files/test.tsv']
Will look for different solution.
I am trying to understand the change, but the test succeeds on my machine even without your changes. The I am not super familiar with the implementation of postgres/mysql table functions, that's why I was happy to review this PR, so I can learn. As far as I can see in case of PostgreSQL/MySQL, the The sources I used to collect the above informations:
Do I miss something? |
The motivation of this PR is to make selecting the
This is strange. Have tested with build from the master branch:
Also with the latest ClickHouse build from
Yes. The problem is to get the |
Okay, based on your example I assume you mean when the data source is not available, right? Because in that case I noticed that issue. It wasn't clear for me from the description of the PR. Thanks for the clarification. I will try to think about this, but cannot promise anything. |
@antaljanosbenjamin Hi! Could you review the PR? Thx in advance! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I missed your last commit. Thanks for pinging.
src/Storages/StorageTableFunction.h
Outdated
if (nested) | ||
return nested->getName(); | ||
return StorageProxy::getName(); | ||
return getNested()->getName(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change really necessary? It has the consequence of starting up the nested storage and also call renameInMemory
compared to previous version. What is the reasoning behind it?
Second, if it stays as is, you don't need to lock the nested_mutex
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reasoning behind it?
Without these changes, the first selection of the engine
row from system.tables will return StorageProxy
, instead of the actual engine name, because nested class does not created. And also does not matter remote source accessible or not.
Example:
CREATE table test_pg (id Int, value Int) AS postgresql('localhost:5432', 'test_db', 'test_table', 'test1', '123456')
SELECT engine FROM system.tables WHERE name='test_pg'\G
Row 1:
──────
engine: StorageProxy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second, if it stays as is, you don't need to lock the nested_mutex.
Yeh, will fix this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIR it was intentionally done like this to avoid constructing a nested storage just to get its type. mostly because we may fail to construct it because the remote source may be unavailable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted it in the last commit
@@ -0,0 +1,34 @@ | |||
-- Tags: no-fasttest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why it is not suitable for fasttest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understood table function disabled in the Fasttest. In the first commit CI failed because of:
executeQuery: Code: 46. DB::Exception: Unknown table function postgresql. (UNKNOWN_FUNCTION)
DETACH TABLE {CLICKHOUSE_DATABASE:Identifier}.tablefunc02; | ||
|
||
|
||
ATTACH TABLE {CLICKHOUSE_DATABASE:Identifier}.tablefunc01; | ||
ATTACH TABLE {CLICKHOUSE_DATABASE:Identifier}.tablefunc02; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DETACH TABLE {CLICKHOUSE_DATABASE:Identifier}.tablefunc02; | |
ATTACH TABLE {CLICKHOUSE_DATABASE:Identifier}.tablefunc01; | |
ATTACH TABLE {CLICKHOUSE_DATABASE:Identifier}.tablefunc02; | |
DETACH TABLE {CLICKHOUSE_DATABASE:Identifier}.tablefunc02; | |
DETACH TABLE {CLICKHOUSE_DATABASE:Identifier}.tablefunc03; | |
ATTACH TABLE {CLICKHOUSE_DATABASE:Identifier}.tablefunc01; | |
ATTACH TABLE {CLICKHOUSE_DATABASE:Identifier}.tablefunc02; | |
ATTACH TABLE {CLICKHOUSE_DATABASE:Identifier}.tablefunc03; | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you modified also TableFunctionSQLite
, it is good to check that also in my opinion. It shouldn't hurt the test but the extra checks can find some errors (if there is any).
CREATE TABLE {CLICKHOUSE_DATABASE:Identifier}.tablefunc04 (a int) AS mongodb('127.0.0.1:27017','test', 'my_collection', 'test_user', 'password', 'a Int'); | ||
CREATE TABLE {CLICKHOUSE_DATABASE:Identifier}.tablefunc05 (a int) AS redis('127.0.0.1:6379', 'key', 'key UInt32'); | ||
|
||
CREATE TABLE {CLICKHOUSE_DATABASE:Identifier}.tablefunc01_without_schema AS postgresql('127.121.0.1:5432', 'postgres_db', 'postgres_table', 'postgres_user', '124444'); -- { serverError 614 } | ||
CREATE TABLE {CLICKHOUSE_DATABASE:Identifier}.tablefunc02_without_schema AS mysql('127.123.0.1:3306', 'mysql_db', 'mysql_table', 'mysql_user','123123'); -- {serverError 279 } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CREATE TABLE {CLICKHOUSE_DATABASE:Identifier}.tablefunc04 (a int) AS mongodb('127.0.0.1:27017','test', 'my_collection', 'test_user', 'password', 'a Int'); | |
CREATE TABLE {CLICKHOUSE_DATABASE:Identifier}.tablefunc05 (a int) AS redis('127.0.0.1:6379', 'key', 'key UInt32'); | |
CREATE TABLE {CLICKHOUSE_DATABASE:Identifier}.tablefunc01_without_schema AS postgresql('127.121.0.1:5432', 'postgres_db', 'postgres_table', 'postgres_user', '124444'); -- { serverError 614 } | |
CREATE TABLE {CLICKHOUSE_DATABASE:Identifier}.tablefunc02_without_schema AS mysql('127.123.0.1:3306', 'mysql_db', 'mysql_table', 'mysql_user','123123'); -- {serverError 279 } | |
CREATE TABLE {CLICKHOUSE_DATABASE:Identifier}.tablefunc04 (a int) AS mongodb('127.0.0.1:27017','test', 'my_collection', 'test_user', 'password', 'a Int'); | |
CREATE TABLE {CLICKHOUSE_DATABASE:Identifier}.tablefunc05 (a int) AS redis('127.0.0.1:6379', 'key', 'key UInt32'); | |
CREATE TABLE {CLICKHOUSE_DATABASE:Identifier}.tablefunc01_without_schema AS postgresql('127.121.0.1:5432', 'postgres_db', 'postgres_table', 'postgres_user', '124444'); -- { serverError 614 } | |
CREATE TABLE {CLICKHOUSE_DATABASE:Identifier}.tablefunc02_without_schema AS mysql('127.123.0.1:3306', 'mysql_db', 'mysql_table', 'mysql_user','123123'); -- {serverError 279 } |
Just tiny cosmetic changes, only necessary if you modify anything, otherwise it doesn't worth to run the CI again just because of these.
Maybe one more thing. The changelog item is not the best in my opinion:
|
@antaljanosbenjamin ping |
@tavplubix, propose to back port the fix to supported ClickHouse versions. Thanks in advance! |
…462f6499f076cbeee887046694e3e Cherry pick #55540 to 23.8: Select from system tables when table based on table function.
…462f6499f076cbeee887046694e3e Cherry pick #55540 to 23.9: Select from system tables when table based on table function.
…3462f6499f076cbeee887046694e3e Cherry pick #55540 to 23.10: Select from system tables when table based on table function.
Backport #55540 to 23.10: Select from system tables when table based on table function.
Backport #55540 to 23.9: Select from system tables when table based on table function.
Backport #55540 to 23.8: Select from system tables when table based on table function.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Prevent reference to a remote data source for the
data_paths
column insystem.tables
if the table is created with a table function using explicit column description.Example:
Result:
Documentation entry for user-facing changes