Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not mark file for distributed send as broken on EOF #19290

Merged
merged 2 commits into from Jan 21, 2021

Conversation

azat
Copy link
Collaborator

@azat azat commented Jan 19, 2021

Changelog category (leave one):

  • Bug Fix

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Do not mark file for distributed send as broken on EOF

Detailed description / Documentation draft:
ATTEMPT_TO_READ_AFTER_EOF should not be ignored if we receive it from remote (receiver), since:

2021.01.15 08:59:57.379143 [ 8271 ] {} <Error> default.dist.DirectoryMonitor: Failed to send batch due to: Code: 32, e.displayText() = DB::Exception: Attempt to read after eof: while receiving packet from ch2.net:9000, Stack trace (when copying this message, always include the lines below): 
0. DB::throwReadAfterEOF() @ 0x86e5020 in /usr/bin/clickhouse
1. ? @ 0x870934d in /usr/bin/clickhouse
2. DB::Connection::receivePacket(std::__1::function<void (Poco::Net::Socket&)>) @ 0xf965556 in /usr/bin/clickhouse
3. DB::RemoteBlockOutputStream::RemoteBlockOutputStream(DB::Connection&, DB::ConnectionTimeouts const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::Settings const&, DB::ClientInfo const&) @ 0xf6ecdbc in /usr/bin/clickhou
se
4. DB::StorageDistributedDirectoryMonitor::Batch::send() @ 0xf6e7775 in /usr/bin/clickhouse
5. DB::StorageDistributedDirectoryMonitor::processFilesWithBatching(std::__1::map<unsigned long, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::less<unsigned long>, std::__1::allocator<std::__1::pair<unsigned long const, std 
::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > > const&) @ 0xf6e2e04 in /usr/bin/clickhouse
6. DB::StorageDistributedDirectoryMonitor::run() @ 0xf6e0fd6 in /usr/bin/clickhouse
7. DB::BackgroundSchedulePoolTaskInfo::execute() @ 0xecc31e0 in /usr/bin/clickhouse
8. DB::BackgroundSchedulePool::threadFunction() @ 0xecc51d2 in /usr/bin/clickhouse
9. ? @ 0xecc5f92 in /usr/bin/clickhouse
10. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0x870df5e in /usr/bin/clickhouse
11. ? @ 0x87119d3 in /usr/bin/clickhouse
12. start_thread @ 0x8ea7 in /lib/x86_64-linux-gnu/libpthread-2.31.so
13. clone @ 0xfdeaf in /lib/x86_64-linux-gnu/libc-2.31.so
 (version 21.1.1.5646)
2021.01.15 08:59:57.379211 [ 8271 ] {} <Error> default.dist.DirectoryMonitor: Marking a batch of 28 files as broken.

Cc: @alexey-milovidov (maybe you have some details on why ATTEMPT_TO_READ_AFTER_EOF has been added?)

@robot-clickhouse robot-clickhouse added the pr-bugfix Pull request with bugfix, not backported by default label Jan 19, 2021
@azat azat marked this pull request as draft January 19, 2021 19:55
@azat
Copy link
Collaborator Author

azat commented Jan 19, 2021

and I don't see how ATTEMPT_TO_READ_AFTER_EOF can be received while reading local file

Ok, I see them now (tons of throwReadAfterEOF in ReadHelpers.h and ReadBuffer.h), will rewrite the patch a little

- the sender will got ATTEMPT_TO_READ_AFTER_EOF (added in
  946c275) when the client just go
  away, i.e. server had been restarted, and this is incorrect to mark the
  file as broken in this case.

- since ClickHouse#18853 the file will be checked on the sender locally, and
  in case the file was truncated CANNOT_READ_ALL_DATA will be thrown.
  But before ClickHouse#18853 the sender will not receive
  ATTEMPT_TO_READ_AFTER_EOF from the client in case of file was truncated
  on the sender, since the client will just wait for more data, IOW just hang.

- and I don't see how ATTEMPT_TO_READ_AFTER_EOF can be received while
  reading local file.
@azat azat marked this pull request as ready for review January 19, 2021 22:17
@alexey-milovidov alexey-milovidov self-assigned this Jan 21, 2021
@alexey-milovidov
Copy link
Member

AST fuzzer (ASan) — Lost connection to server. See the logs

#19108

@alexey-milovidov alexey-milovidov merged commit 062f00a into ClickHouse:master Jan 21, 2021
@azat azat deleted the dist-broken-on-EOF-fix branch January 22, 2021 18:14
vitlibar pushed a commit that referenced this pull request Feb 1, 2021
…94f500ff2ed55f45936f65a16c224c

Cherry pick #19290 to 20.12: Do not mark file for distributed send as broken on EOF
vitlibar pushed a commit that referenced this pull request Feb 1, 2021
…94f500ff2ed55f45936f65a16c224c

Cherry pick #19290 to 20.11: Do not mark file for distributed send as broken on EOF
vitlibar pushed a commit that referenced this pull request Feb 1, 2021
…4f500ff2ed55f45936f65a16c224c

Cherry pick #19290 to 21.1: Do not mark file for distributed send as broken on EOF
robot-clickhouse pushed a commit that referenced this pull request Feb 17, 2021
robot-clickhouse pushed a commit that referenced this pull request Feb 17, 2021
robot-clickhouse pushed a commit that referenced this pull request Feb 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-bugfix Pull request with bugfix, not backported by default
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants