Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem with query result on db different than the default #12489

Open
natalia-hw opened this issue Jul 14, 2020 · 7 comments
Open

Problem with query result on db different than the default #12489

natalia-hw opened this issue Jul 14, 2020 · 7 comments
Labels
comp-distributed Distributed tables st-discussion The story requires discussion /research / expert help / design & decomposition before will be taken unexpected behaviour

Comments

@natalia-hw
Copy link

natalia-hw commented Jul 14, 2020

We have a problem with data counted for a specific query run in DB another than default in our ClickHouse cluster (sharding and replication).

Describe the bug
We have two DBs: default and test.
In each DB we have the same tables:

  • foo (ReplicatedMergeTree)
  • bar (Distributed on bar_shard) and bar_shard (ReplicatedMergeTree)

We fill tables with the same data:

  • for foo, separately on each shard
  • for bar, by a distributed table

We run the query:
SELECT count(something_else) FROM bar WHERE something_else IN (SELECT something FROM foo);

  • on DB default result count looks as expected
  • on DB test result count is equal to count of items from bar_shard (not bar as we expect)

Maybe there is some misunderstanding from our side, but for DB default works just as we expected, only with DB different than default we face the issue.

How to reproduce
I've scripted the reproduction steps on the LTS version 20.3.10.75. Please see https://github.com/nfpp/another_db_issue_repro. It creates the DBs and tables, inserts some data, and performs the queries.

Expected behavior
The same result of query regardless of DB in usage.

Additional context
For us, it looks like the query is partially counted on the default shard (without default DB in the cluster, the query fails).
Temporary we fix it by replacing foo by "{db_name}".foo in the query.
Worth to mention that with GLOBAL IN result is correct, but we by design put those data on each shard separately.

@natalia-hw natalia-hw added the bug Confirmed user-visible misbehaviour in official release label Jul 14, 2020
@den-crane den-crane added the comp-distributed Distributed tables label Jul 14, 2020
@den-crane
Copy link
Contributor

den-crane commented Jul 14, 2020

probably the same as #10471
default database is used on shards
Technically it's not a bug, because this behavior existed from the beginning, though I believe it's not documented.

use system

select * from remote(two_shards,system,one) where dummy in (select dummy from one);
(version 19.13.7): DB::Exception: Table default.one doesn't exist..


select * from remote(two_shards,system,one) where dummy in (select dummy from system.one);
┌─dummy─┐
│     0 │
└───────┘
┌─dummy─┐
│     0 │
└───────┘

select * from remote(two_shards,system,one) where dummy global in (select dummy from one);
┌─dummy─┐
│     0 │
└───────┘
┌─dummy─┐
│     0 │
└───────┘

@natalia-hw
Copy link
Author

@den-crane I understand, but from our point of view, this looks like rather as a critical bug, because the query is a run in the wrong database. Especially when some kind of ORM is in usage is really inconvenient to replace the usual table name with table and database name. Are you planning to change this behavior? If this is a duplicate I could close it and watch appropriate issue.

@alexey-milovidov
Copy link
Member

It's non obvious how to solve it because there is no guarantee that the database exists on the remote server.

@alexey-milovidov alexey-milovidov added the st-discussion The story requires discussion /research / expert help / design & decomposition before will be taken label Nov 18, 2020
@alexey-milovidov
Copy link
Member

Maybe we can always substitute the current database when sending a query to remote server?
It will break some queries but will look more logical.

@dgzdot
Copy link

dgzdot commented Jan 20, 2021

@alexey-milovidov Are there any plans for this issue?

@alexey-milovidov
Copy link
Member

No. It is in "discussion" stage.

@filimonov
Copy link
Contributor

One more option is to pass the current database with query and make the target server look fist in that db, and later in it's default database.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp-distributed Distributed tables st-discussion The story requires discussion /research / expert help / design & decomposition before will be taken unexpected behaviour
Projects
None yet
Development

No branches or pull requests

5 participants