Fix buffer size for trace collection #21020

Merged

Conversation

azat
Collaborator

@azat azat commented Feb 20, 2021

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

@robot-clickhouse robot-clickhouse added the pr-not-for-changelog This PR should not be mentioned in the changelog label Feb 20, 2021
@alexey-milovidov
Member

BTW, the query profiler can profile itself (for example, the real-time profiler can measure how frequently the CPU-time profiler was active). This is a somewhat useful property.

@alexey-milovidov alexey-milovidov self-assigned this Feb 20, 2021
@alexey-milovidov alexey-milovidov added the st-discussion The story requires discussion /research / expert help / design & decomposition before will be taken label Feb 21, 2021
@azat
Collaborator Author

azat commented Feb 22, 2021

BTW, the query profiler can profile itself (for example, the real-time profiler can measure how frequently the CPU-time profiler was active). This is a somewhat useful property.

I see, thanks. Okay, let's truncate the buffer size to 4K then.
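
For context, the 4K limit matches `PIPE_BUF` on Linux: POSIX guarantees that a single `write()` of at most `PIPE_BUF` bytes to a pipe is atomic, so messages from concurrent writers cannot interleave. The following is a minimal sketch of that constraint, not the code from this PR; `TraceMessage`, `fitsAtomicPipeWrite` and `roundTrip` are hypothetical names, and the message layout is an assumption made for illustration.

```cpp
#include <cassert>
#include <climits>   // PIPE_BUF
#include <cstddef>
#include <cstring>
#include <unistd.h>  // pipe, read, write, close

// Hypothetical message layout; NOT the actual ClickHouse trace format.
struct TraceMessage
{
    unsigned long long thread_id;
    unsigned char payload[256];
};

// Fail at compile time if one message could exceed the atomic write limit.
static_assert(sizeof(TraceMessage) <= PIPE_BUF, "message must fit into one atomic pipe write");

// True if a write of `size` bytes to a pipe is guaranteed to be atomic.
bool fitsAtomicPipeWrite(size_t size)
{
    return size <= PIPE_BUF;
}

// Sends one message through a pipe with a single write() and reads it back.
bool roundTrip(unsigned long long thread_id)
{
    int fds[2];
    if (pipe(fds) != 0)
        return false;

    TraceMessage out{};
    out.thread_id = thread_id;
    std::memset(out.payload, 0xAB, sizeof(out.payload));
    assert(fitsAtomicPipeWrite(sizeof(out)));

    // One write() call per message: this is what makes the message atomic.
    bool ok = write(fds[1], &out, sizeof(out)) == static_cast<ssize_t>(sizeof(out));

    TraceMessage in{};
    ok = ok && read(fds[0], &in, sizeof(in)) == static_cast<ssize_t>(sizeof(in));
    ok = ok && in.thread_id == out.thread_id && in.payload[255] == 0xAB;

    close(fds[0]);
    close(fds[1]);
    return ok;
}
```

Writes larger than `PIPE_BUF` may be split by the kernel, and that is when data from different writers can overlap in the pipe.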

@azat azat changed the title Prohibit recursive trace collections Do not exceed PIPE_BUF for trace collector (to avoid overlaps) Feb 22, 2021
@azat azat force-pushed the trace-collector-non-recursive branch from 072521c to 88ebfa2 Compare February 22, 2021 18:53
@azat azat changed the title Do not exceed PIPE_BUF for trace collector (to avoid overlaps) Fix buffer size for trace collection Feb 22, 2021
@azat azat force-pushed the trace-collector-non-recursive branch from 88ebfa2 to 7525103 Compare February 22, 2021 19:42
@azat azat force-pushed the trace-collector-non-recursive branch from 7525103 to 89c3119 Compare February 22, 2021 19:45
@alexey-milovidov
Member

#21137

@alexey-milovidov
Member

@azat Do you know why the fuzzer did not stop after the first segfault?

@azat
Collaborator Author

azat commented Feb 24, 2021

@azat Do you know why the fuzzer did not stop after the first segfault?

This is probably because the SIGSEGV handler does not call _exit. When a thread gets SIGSEGV, the handler only sends information about the signal to the pipe, and the server is terminated only once that message has been read from the pipe. This may take a while, and in the meantime a SIGSEGV may happen in another handler.
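
The behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the BaseDaemon code: `signal_pipe`, `work_after_signal`, `reportingHandler` and `demo` are hypothetical names, and `SIGUSR1` stands in for `SIGSEGV` so that returning from the handler is safe in a demo.

```cpp
#include <atomic>
#include <cassert>
#include <signal.h>  // sigaction, raise
#include <unistd.h>  // pipe, read, write, close

namespace
{
int signal_pipe[2] = {-1, -1};          // hypothetical pipe between the handler and a listener
std::atomic<int> work_after_signal{0};  // counts work done after the handler returned
}

// Handler that only reports the signal number into the pipe.
// It does not call _exit(), so the process keeps running after it returns.
extern "C" void reportingHandler(int sig)
{
    ssize_t ignored = write(signal_pipe[1], &sig, sizeof(sig));
    (void)ignored;
}

// Returns the signal number the listener side eventually reads, or -1 on error.
int demo()
{
    if (pipe(signal_pipe) != 0)
        return -1;
    assert(signal_pipe[0] != -1 && signal_pipe[1] != -1);

    struct sigaction sa{};
    sa.sa_handler = reportingHandler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, nullptr) != 0)
        return -1;

    raise(SIGUSR1);       // stands in for the fatal signal
    ++work_after_signal;  // the process is still alive and can do more work

    // Only now does the "listener" read the message and decide to terminate.
    int received = 0;
    bool ok = read(signal_pipe[0], &received, sizeof(received)) == static_cast<ssize_t>(sizeof(received));

    signal(SIGUSR1, SIG_DFL);
    close(signal_pipe[0]);
    close(signal_pipe[1]);
    return ok ? received : -1;
}
```

Everything between `raise` and the `read` is the window in which other threads can keep running and hit another fatal signal.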

@alexey-milovidov
Member

alexey-milovidov commented Feb 24, 2021

But they are from different queries (check query_id), and the fuzzer sends queries sequentially.

@alexey-milovidov alexey-milovidov merged commit b20efdc into ClickHouse:master Feb 24, 2021
@alexey-milovidov alexey-milovidov removed the st-discussion The story requires discussion /research / expert help / design & decomposition before will be taken label Feb 24, 2021
@azat
Collaborator Author

azat commented Feb 24, 2021

But they are from different queries (check query_id), and the fuzzer sends queries sequentially.

Yep, but the problem is that the first query failed with a different error, "Data compressed with different methods":

Error on processing query 'SELECT item_id FROM (SELECT item_id FROM t GROUP BY item_id WITH TOTALS) AS l FULL OUTER JOIN (SELECT item_id FROM t GROUP BY item_id WITH TOTALS) AS r USING (item_id)': Code: 271, e.displayText() = DB::Exception: Data compressed with different methods, given method byte 0x6e, previous method byte 0x82: while receiving packet from localhost:9000, Stack trace (when copying this message, always include the lines below):
Error on processing query 'SELECT item_id FROM (SELECT item_id FROM t GROUP BY item_id WITH TOTALS) AS l FULL OUTER JOIN (SELECT substring(randomString(0), NULL), item_id FROM t GROUP BY item_id WITH TOTALS) AS r USING (item_id)': Code: 271, e.displayText() = DB::Exception: Data compressed with different methods, given method byte 0x6e, previous method byte 0x82: while receiving packet from localhost:9000, Stack trace (when copying this message, always include the lines below):
Error on processing query 'SELECT item_id FROM (SELECT item_id FROM t GROUP BY item_id WITH TOTALS) AS l FULL OUTER JOIN (SELECT arrayMap(x -> reinterpretAsUInt8(substring(2, randomString(NULL), x + 65536)), range(1048577)), item_id FROM t GROUP BY item_id WITH TOTALS) AS r USING (item_id)': Code: 271, e.displayText() = DB::Exception: Data compressed with different methods, given method byte 0x6e, previous method byte 0x82: while receiving packet from localhost:9000, Stack trace (when copying this message, always include the lines below):
Error on processing query 'SELECT item_id FROM (SELECT item_id FROM t GROUP BY arrayJoin(arrayMap(x -> reinterpretAsUInt8(substring(randomString(1048576), x + 256, NULL)), range(256))), item_id WITH TOTALS) AS l FULL OUTER JOIN (SELECT item_id FROM t GROUP BY item_id WITH TOTALS) AS r USING (item_id)': Code: 271, e.displayText() = DB::Exception: Data compressed with different methods, given method byte 0x6e, previous method byte 0x82: while receiving packet from localhost:9000, Stack trace (when copying this message, always include the lines below):

This is because the fatal signal handler was called while the TCPHandler was sending totals:

4  0x00007ffff7671909 in sleepForSeconds (seconds=20) at sleep.cpp:64
5  0x00007ffff7e97674 in signalHandler (sig=6, info=0x7fff4effecf0, context=0x7fff4effebc0) at BaseDaemon.cpp:155
6  <signal handler called>
7  0x00007ffff5fb5615 in raise () from /usr/lib/libc.so.6
8  0x00007ffff5f9e862 in abort () from /usr/lib/libc.so.6
9  0x00007ffff796b182 in DB::handle_error_code (msg="Bad cast from type DB::ColumnVector<char8_t> to DB::ColumnNullable", code=49) at Exception.cpp:47
10 0x00007ffff796b26e in DB::Exception::Exception (this=0x7fff415d9c80, msg="Bad cast from type DB::ColumnVector<char8_t> to DB::ColumnNullable", code=49, remote_=false) at Exception.cpp:57
11 0x00007ffff3639f3f in assert_cast<DB::ColumnNullable const&, DB::IColumn const&> (from=...) at assert_cast.h:47
12 0x00007fffe270edbc in DB::DataTypeNullable::serializeBinaryBulkWithMultipleStreamsImpl () at DataTypeNullable.cpp:92
13 0x00007fffe2779680 in DB::IDataType::serializeBinaryBulkWithMultipleStreams () at IDataType.cpp:286
14 0x00007fffe2af2485 in DB::NativeBlockOutputStream::writeData () at NativeBlockOutputStream.cpp:58
15 0x00007fffe2af2956 in DB::NativeBlockOutputStream::write () at NativeBlockOutputStream.cpp:124
16 0x00007fffdcabc820 in DB::TCPHandler::sendTotals () at TCPHandler.cpp:763

So the client has 20 seconds to send other queries, because the signal handler sleeps in sleepForSeconds (seconds=20), frame 4 above.
