Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected result in a distributed query #7666

Closed
SaltTan opened this issue Nov 6, 2019 · 17 comments
Closed

Unexpected result in a distributed query #7666

SaltTan opened this issue Nov 6, 2019 · 17 comments
Labels
st-wontfix Known issue, no plans to fix it currenlty

Comments

@SaltTan
Copy link
Contributor

SaltTan commented Nov 6, 2019

I installed 19.16.2 on one node of the cluster that runs 19.13.7, and I observe unexpected behaviour when I try to run a distributed query across the cluster:

Connected to ClickHouse server version 19.16.2 revision 54427.

select version(), timezone(), uptime() from cluster(segmented,system,one)

┌─version()─┬─timezone()─┬─uptime()─┐
│ 19.16.2.2 │ Etc/UTC    │    24322 │
│ 19.16.2.2 │ Etc/UTC    │    24322 │
│ 19.16.2.2 │ Etc/UTC    │    24322 │
│ 19.16.2.2 │ Etc/UTC    │    24322 │
│ 19.16.2.2 │ Etc/UTC    │    24322 │
│ 19.16.2.2 │ Etc/UTC    │    24322 │
└───────────┴────────────┴──────────┘

Connected to ClickHouse server version 19.13.7 revision 54425.

select version(), timezone(), uptime(), dummy from cluster(segmented,system,one)

┌─version()──┬─timezone()─┬─uptime()─┬─dummy─┐
│ 19.13.7.57 │ Etc/UTC    │    28025 │     0 │
│ 19.13.7.57 │ Etc/UTC    │    28231 │     0 │
│ 19.13.7.57 │ Etc/UTC    │    28204 │     0 │
│ 19.13.7.57 │ Etc/UTC    │    28215 │     0 │
│ 19.13.7.57 │ Etc/UTC    │    28195 │     0 │
│ 19.16.2.2  │ Etc/UTC    │    24321 │     0 │
└────────────┴────────────┴──────────┴───────┘

Sometimes when I run this query on 19.16.2 I get an exception:

2019.11.06 16:50:44.050025 [ 92 ] {b622279b-5799-4313-bfb5-ed5f3af698de} <Error> executeQuery: Code: 171, e.displayText() = DB::Exception: Cannot convert column `uptime()` because it is constant but values of constants are different in source and result (version 19.16.2.2 (official build)) (from [::1]:35134) (in query: select version(), timezone(), uptime() from cluster(segmented,system,one)), Stack trace:

0. 0x55588af4c7b0 StackTrace::StackTrace() /usr/bin/clickhouse
1. 0x55588af4c585 DB::Exception::Exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int) /usr/bin/clickhouse
2. 0x55588ad49828 ? /usr/bin/clickhouse
3. 0x55588e2b9d44 DB::InterpreterSelectQuery::unifyStreams(DB::InterpreterSelectQuery::Pipeline&, DB::Block) /usr/bin/clickhouse
4. 0x55588e2c68d6 DB::InterpreterSelectQuery::executeWithMultipleStreams() /usr/bin/clickhouse
5. 0x55588e2e3b73 DB::InterpreterSelectWithUnionQuery::executeWithMultipleStreams() /usr/bin/clickhouse
6. 0x55588e2e5a58 DB::InterpreterSelectWithUnionQuery::execute() /usr/bin/clickhouse
7. 0x55588e3f1586 ? /usr/bin/clickhouse
8. 0x55588e3f274e DB::executeQuery(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, DB::Context&, bool, DB::QueryProcessingStage::Enum, bool, bool) /usr/bin/clickhouse
9. 0x55588afd4964 DB::TCPHandler::runImpl() /usr/bin/clickhouse
10. 0x55588afd539b DB::TCPHandler::run() /usr/bin/clickhouse
11. 0x55588ee6dfe0 Poco::Net::TCPServerConnection::start() /usr/bin/clickhouse
12. 0x55588ee6e6fd Poco::Net::TCPServerDispatcher::run() /usr/bin/clickhouse
13. 0x5558905448d1 Poco::PooledThread::run() /usr/bin/clickhouse
14. 0x55589054267c Poco::ThreadImpl::runnableEntry(void*) /usr/bin/clickhouse
15. 0x555890cb41e0 ? /usr/bin/clickhouse
16. 0x7f10171cc4a4 start_thread /lib/x86_64-linux-gnu/libpthread-2.24.so
17. 0x7f1016b02d0f clone /lib/x86_64-linux-gnu/libc-2.24.so

@amosbird @KochetovNicolai could you please help with investigation?

@SaltTan SaltTan added the bug Confirmed user-visible misbehaviour in official release label Nov 6, 2019
@amosbird
Copy link
Collaborator

amosbird commented Nov 7, 2019

It seems some scalar function is not suitable for scalar optimization. Try setting enable_scalar_subquery_optimization to false for the moment.

@SaltTan
Copy link
Contributor Author

SaltTan commented Nov 7, 2019

It seems some scalar function is not suitable for scalar optimization. Try setting enable_scalar_subquery_optimization to false for the moment.

There is no such setting in 19.16.2

@SaltTan
Copy link
Contributor Author

SaltTan commented Nov 14, 2019

I tried 19.17.2.4 and got an exception for both version() and uptime():

DB::Exception: Block structure mismatch in MergingSorted stream: different values of constants, actual: '19.13.7.57', expected: '19.17.2.4'

DB::Exception: Block structure mismatch in MergingSorted stream: different values of constants, actual: 705665, expected: 424

2019.11.14 17:47:27.664726 [ 84 ] {19d373a7-2d4c-4c51-9783-58870d6068d4} <Error> HTTPHandler: Code: 171, e.displayText() = DB::Exception: Block structure mismatch in MergingSorted stream: different values of constants, actual: '19.13.7.57', expected: '19.17.2.4', Stack trace:

0. 0x5633aae35c40 StackTrace::StackTrace() /usr/bin/clickhouse
1. 0x5633aae35a15 DB::Exception::Exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int) /usr/bin/clickhouse
2. 0x5633aab2486a ? /usr/bin/clickhouse
3. 0x5633aab2548c ? /usr/bin/clickhouse
4. 0x5633ae7fbeff DB::MergingSortedBlockInputStream::init(std::vector<COW<DB::IColumn>::mutable_ptr<DB::IColumn>, std::allocator<COW<DB::IColumn>::mutable_ptr<DB::IColumn> > >&) /usr/bin/clickhouse
5. 0x5633ae7fd78f DB::MergingSortedBlockInputStream::readImpl() /usr/bin/clickhouse
6. 0x5633ae08bbda DB::IBlockInputStream::read() /usr/bin/clickhouse
7. 0x5633ae7d1a6b DB::ExpressionBlockInputStream::readImpl() /usr/bin/clickhouse
8. 0x5633ae08bbda DB::IBlockInputStream::read() /usr/bin/clickhouse
9. 0x5633ae0aa0cb DB::copyData(DB::IBlockInputStream&, DB::IBlockOutputStream&, std::atomic<bool>*) /usr/bin/clickhouse
10. 0x5633ae31df97 DB::executeQuery(DB::ReadBuffer&, DB::WriteBuffer&, bool, DB::Context&, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>) /usr/bin/clickhouse
11. 0x5633aaeca12e DB::HTTPHandler::processQuery(Poco::Net::HTTPServerRequest&, HTMLForm&, Poco::Net::HTTPServerResponse&, DB::HTTPHandler::Output&) /usr/bin/clickhouse
12. 0x5633aaecc4a9 DB::HTTPHandler::handleRequest(Poco::Net::HTTPServerRequest&, Poco::Net::HTTPServerResponse&) /usr/bin/clickhouse
13. 0x5633aedbb219 Poco::Net::HTTPServerConnection::run() /usr/bin/clickhouse
14. 0x5633aedb7790 Poco::Net::TCPServerConnection::start() /usr/bin/clickhouse
15. 0x5633aedb7ead Poco::Net::TCPServerDispatcher::run() /usr/bin/clickhouse
16. 0x5633b048e021 Poco::PooledThread::run() /usr/bin/clickhouse
17. 0x5633b048bdcc Poco::ThreadImpl::runnableEntry(void*) /usr/bin/clickhouse
18. 0x5633b0c01f40 ? /usr/bin/clickhouse
19. 0x7ff6d6f1a4a4 start_thread /lib/x86_64-linux-gnu/libpthread-2.24.so
20. 0x7ff6d6850d0f clone /lib/x86_64-linux-gnu/libc-2.24.so
 (version 19.17.2.4 (official build))

@KochetovNicolai
Copy link
Member

Suppose I need to continue fixing it.
@alexey-milovidov maybe the correct way to solve it is to expect that version() and uptime() are not constant even in scope of query?

@SaltTan
Copy link
Contributor Author

SaltTan commented Dec 20, 2019

I have just realized that the exception only happens when I sort the data:

SELECT
    version(),
    uptime()
FROM cluster(segmented, system, one)

┌─version()──┬─uptime()─┐
│ 19.16.6.17 │  1383195 │
│ 19.13.7.57 │  3813312 │
│ 19.13.7.57 │  1744510 │
│ 19.13.7.57 │  3813483 │
│ 19.13.7.57 │  1742709 │
│ 19.16.9.37 │     5344 │
└────────────┴──────────┘

6 rows in set. Elapsed: 0.403 sec.

:) select version(), uptime() from cluster(segmented,system,one) order by rand()

SELECT
    version(),
    uptime()
FROM cluster(segmented, system, one)
ORDER BY rand() ASC

↗ Progress: 5.00 rows, 5.00 B (46.34 rows/s., 46.34 B/s.) Received exception from server (version 19.16.6):
Code: 171. DB::Exception: Received from localhost:9000. DB::Exception: Block structure mismatch in MergingSorted stream: different values of constants, actual: '19.13.7.57', expected: '19.16.6.17'.

The same happens with dictGet called with a constant argument:

:) select dictGetString('node', 'name', toUInt64(1)) from cluster(segmented,system,one) order by rand()

SELECT dictGetString('node', 'name', toUInt64(1))
FROM cluster(segmented, system, one)
ORDER BY rand() ASC

↘ Progress: 5.00 rows, 5.00 B (46.10 rows/s., 46.10 B/s.) Received exception from server (version 19.16.6):
Code: 171. DB::Exception: Received from localhost:9000. DB::Exception: Block structure mismatch in MergingSorted stream: different values of constants, actual: 's794', expected: 's797'.

@vapongit
Copy link

vapongit commented Jan 3, 2020

some more examples (19.17.6.36).

  1. Correct
SELECT 
    version() AS v, 
    hostName() AS h, 
    uptime() AS u
FROM cluster(cluster, system, one)
┌─v──────────┬─h────────────┬──────u─┐
│ 19.17.6.36 │ clickhouse05 │ 609636 │
└────────────┴──────────────┴────────┘
┌─v──────────┬─h────────────┬──────u─┐
│ 19.17.6.36 │ clickhouse04 │ 610319 │
└────────────┴──────────────┴────────┘
┌─v──────────┬─h────────────┬──────u─┐
│ 19.17.6.36 │ clickhouse06 │ 610290 │
└────────────┴──────────────┴────────┘
  1. Add sorting. Incorrect
SELECT 
    version() AS v, 
    hostName() AS h, 
    uptime() AS u
FROM cluster(cluster, system, one)
ORDER BY h ASC
Code: 171. DB::Exception: Received from localhost:9000. DB::Exception: Block structure mismatch in MergingSorted stream: different values of constants, actual: 610324, expected: 609641. 
  1. Add GROUP BY. Incorrect
SELECT 
    version() AS v, 
    hostName() AS h, 
    uptime() AS u
FROM cluster(cluster, system, one)
GROUP BY h
┌─v──────────┬─h────────────┬──────u─┐
│ 19.17.6.36 │ clickhouse04 │ 609981 │
│ 19.17.6.36 │ clickhouse06 │ 609981 │
│ 19.17.6.36 │ clickhouse05 │ 609981 │
└────────────┴──────────────┴────────┘

NOTE: incorrect result and there are no errors 'Column X is not under aggregate function and not in GROUP BY' for columns v and u

  1. Using COUNT() and system.one. Correct
SELECT 
   hostName() AS h, 
   count()
FROM cluster(cluster, system, one)
GROUP BY h
┌─h────────────┬─count()─┐
│ clickhouse04 │       1 │
│ clickhouse06 │       1 │
│ clickhouse05 │       1 │
└──────────────┴─────────┘
  1. Using COUNT() and non-sytem table. Incorrect
SELECT 
    hostName() AS h, 
    count()
FROM cluster(cluster, default, raw_events_log_shard)
GROUP BY h
Code: 8. DB::Exception: Received from localhost:9000. DB::Exception: Cannot find column `hostName()` in source stream. 
  1. Replace COUNT() with SUM(1). Correct
SELECT 
    hostName() AS h, 
    sum(1)
FROM cluster(cluster, default, raw_events_log_shard)
GROUP BY h
┌─h────────────┬──────sum(1)─┐
│ clickhouse04 │ 20740513053 │
│ clickhouse06 │ 20732537512 │
│ clickhouse05 │ 20764740906 │
└──────────────┴─────────────┘
  1. uptime() + COUNT()+ GROUP BY. Incorrect for system and non-sytem tables
    a) system table
SELECT 
    uptime() AS u, 
    count()
FROM cluster(cluster, system, one)
GROUP BY u
┌──────u─┬─count()─┐
│ 611037 │       3 │
└────────┴─────────┘

b) non-system table

SELECT 
    uptime() AS u, 
    count()
FROM cluster(cluster, default, raw_events_log_shard)
GROUP BY u
┌──────u─┬─────count()─┐
│ 611288 │ 62243852157 │
└────────┴─────────────┘

NOTE: correct result for count() but incorrect for grouping column

  1. Add materialize to uptime() + COUNT()+ GROUP BY. Correct for system table, incorrect for non-system table
    a) system table
SELECT 
    materialize(uptime()) AS u, 
    count()
FROM cluster(cluster, system, one)
GROUP BY u
┌──────u─┬─count()─┐
│ 611706 │       1 │
│ 611052 │       1 │
│ 611735 │       1 │
└────────┴─────────┘

b) non-system table

SELECT 
    materialize(uptime()) AS u, 
    count()
FROM cluster(cluster, default, raw_events_log_shard)
GROUP BY u
Code: 8. DB::Exception: Received from localhost:9000. DB::Exception: Cannot find column `materialize(uptime())` in source stream.

@SaltTan
Copy link
Contributor Author

SaltTan commented Aug 6, 2020

I cannot reproduce the sorting issue anymore (DB::Exception: Block structure mismatch in MergingSorted stream: different values of constants).
I tried 20.3.11, 20.3.13, 20.4.7, 20.5.2, 20.5.4

@SaltTan
Copy link
Contributor Author

SaltTan commented Aug 6, 2020

The grouping issue:

SELECT
    version() AS h,
    count()
FROM remote('localhost,remotehost', system, one)
GROUP BY h

┌─h─────────┬─count()─┐
│ 20.5.4.40 │       2 │
└───────────┴─────────┘


SELECT
    materialize(version()) AS h,
    count()
FROM remote('localhost,remotehost', system, one)
GROUP BY h

┌─h─────────┬─count()─┐
│ 20.5.2.7  │       1 │
│ 20.5.4.40 │       1 │
└───────────┴─────────┘

@KochetovNicolai
Copy link
Member

@alexey-milovidov the problem is the same as I showed for randConstant and now64.
Maybe just remove isDeterministicInScopeOfQuery for it?

@alexey-milovidov
Copy link
Member

@KochetovNicolai Ok, let's do it.

I see that the isDeterministicInScopeOfQuery property is only used for JIT and RedundantFunctionsInOrderByVisitor.

@SaltTan
Copy link
Contributor Author

SaltTan commented Oct 15, 2020

The grouping issue is still relevant in 20.9.3.45

@SaltTan SaltTan changed the title 19.16 Unexpected result or exception in a distributed query Unexpected result or exception in a distributed query Oct 28, 2020
@azat
Copy link
Collaborator

azat commented Nov 19, 2020

Looks like has been fixed in 983aaa3/93e4802fa4919894 (#16618, 20.12+)

@alexey-milovidov the problem is the same as I showed for randConstant and now64.
Maybe just remove isDeterministicInScopeOfQuery for it?

@KochetovNicolai but isn't it isSuitableForConstantFolding?

@KochetovNicolai
Copy link
Member

but isn't it isSuitableForConstantFolding?

I think it is not suitable. Because otherwise we could replace this function calculation by constant (it has not arguments and return constant result). It is hard to detect if we have some other node which returns different version result.

@azat
Copy link
Collaborator

azat commented Nov 20, 2020

I think it is not suitable

I'm talking about that, that query, with GROUP BY version(), does not work because version returns true for isSuitableForConstantFolding before 93e4802.

In other words isSuitableForConstantFolding returning true looks like a fix for the problem.

@SaltTan
Copy link
Contributor Author

SaltTan commented May 10, 2021

This is broken again in 21.3

select version(), uptime() from cluster(segmented,system,one)

version() | uptime()
----------+---------
21.4.3.21 |   344985
20.9.7.11 |   344985
21.4.3.21 |   344985
20.9.7.11 |   344985
21.4.3.21 |   344985

@alexey-milovidov alexey-milovidov changed the title Unexpected result or exception in a distributed query Unexpected result in a distributed query May 10, 2021
@alexey-milovidov
Copy link
Member

Workaround:
SELECT any(version()), any(uptime()) FROM cluster(segmented, system, one) GROUP BY hostName()

@alexey-milovidov alexey-milovidov added st-wontfix Known issue, no plans to fix it currenlty and removed bug Confirmed user-visible misbehaviour in official release labels Jun 14, 2021
@alexey-milovidov
Copy link
Member

The place where the expression in SQL query is executed is not specified for distributed queries.
That's why I remove the "bug" label.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
st-wontfix Known issue, no plans to fix it currenlty
Projects
None yet
Development

No branches or pull requests

8 participants