ClickHouse raise an Assertion when execute select on hdfs engine on 21.8.3-lts #29251

Closed
mo-avatar opened this issue Sep 22, 2021 · 3 comments · Fixed by #29276
Labels
comp-hdfs potential bug To be reviewed by developers and confirmed/rejected.

Comments

@mo-avatar
Contributor


Describe what's wrong
ClickHouse fails an assertion when executing a SELECT on a table with the HDFS engine.

Does it reproduce on recent release?

It can very likely be reproduced on recent releases.

How to reproduce
Execute a SELECT on a table with the HDFS engine on 21.8.3-lts with Kerberos authentication.

  • Which ClickHouse server version to use : v21.8.3.44-lts
  • Which interface to use, if matters : clickhouse-client
  • Non-default settings, if any:
<hdfs>
    <hadoop_kerberos_keytab>/opt/keytab/user.keytab</hadoop_kerberos_keytab>
    <hadoop_kerberos_principal>test@HADOOP.COM</hadoop_kerberos_principal>
    <hadoop_security_authentication>kerberos</hadoop_security_authentication>
</hdfs>

<hdfs_root>
    <hadoop_kerberos_principal>test@HADOOP.COM</hadoop_kerberos_principal>
</hdfs_root>
  • CREATE TABLE statements for all tables involved
create TABLE hdfs_table
(
    `id` UInt64,
    `city` String
)
ENGINE = HDFS('hdfs://ip:port/test_dir/clickhouse_hdfs_file3', 'CSV')
  • Sample data for all these tables, use [clickhouse-obfuscator]
    Any data in CSV format that fits the table above.
  • Queries to run that lead to unexpected result
select * from hdfs_table;

Expected behavior

Expect the select statement to work normally.

Error message and/or stacktrace

2021.09.22 15:33:25.261907 [ 4011704 ] {797885a2-410f-436e-9894-26897d78e2e8} <Trace> ParallelParsingInputFormat: Parallel parsing is used
clickhouse-server: ../src/IO/ReadBuffer.h:58: bool DB::ReadBuffer::next(): Assertion `!hasPendingData()' failed.
2021.09.22 15:33:25.272138 [ 3986937 ] {} <Trace> BaseDaemon: Received signal 6
2021.09.22 15:33:25.272675 [ 4012596 ] {} <Fatal> BaseDaemon: ########################################
2021.09.22 15:33:25.273419 [ 4012596 ] {} <Fatal> BaseDaemon: (version 21.8.3.1, build id: 2FED2C94CBD7F5B606308079A9DE55766D7C0B90) (from thread 4011707) (query_id: 797885a2-410f-436e-9894-26897d78e2e8) Received signal Aborted (6)
2021.09.22 15:33:25.273838 [ 4012596 ] {} <Fatal> BaseDaemon:
2021.09.22 15:33:25.274214 [ 4012596 ] {} <Fatal> BaseDaemon: Stack trace: 0x7f0144d0377b 0x7f0144d04aa1 0x7f0144cfc03a 0x7f0144cfc0b2 0x7f0147c68c31 0x7f012ee73418 0x7f0147c68cb5 0x7f014754c61e 0x7f01251a7863 0x7f01251ae0c7 0x7f01251ad801 0x7f01252a92ea 0x7f01252a08ba 0x7f01252a9774 0x7f01252b1d10 0x7f01252af98b 0x7f014748a26d 0x7f01474842a4 0x7f0147487026 0x7f0147493b5b 0x7f014510ff4b 0x7f0144dc27ef
2021.09.22 15:33:25.274671 [ 4012596 ] {} <Fatal> BaseDaemon: 4. gsignal @ 0x3977b in /usr/lib64/libc-2.28.so
2021.09.22 15:33:25.275039 [ 4012596 ] {} <Fatal> BaseDaemon: 5. abort @ 0x3aaa1 in /usr/lib64/libc-2.28.so
2021.09.22 15:33:25.275152 [ 4012596 ] {} <Fatal> BaseDaemon: 6. ? @ 0x3203a in /usr/lib64/libc-2.28.so
2021.09.22 15:33:25.275481 [ 4012596 ] {} <Fatal> BaseDaemon: 7. ? @ 0x320b2 in /usr/lib64/libc-2.28.so
2021.09.22 15:33:26.000203 [ 3987456 ] {} <Trace> AsynchronousMetrics: MemoryTracking: was 735.31 MiB, peak 735.31 MiB, will set to 744.02 MiB (RSS), difference: 8.70 MiB
2021.09.22 15:33:26.268361 [ 4012596 ] {} <Fatal> BaseDaemon: 8. /opt/gitCode/ClickHouse-21.8.3.44-lts/build-with-gcc/../src/IO/ReadBuffer.h:59: DB::ReadBuffer::next() @ 0xe0c31 in /opt/gitCode/ClickHouse-21.8.3.44-lts/build-with-gcc/programs/server/libclickhouse-server-libd.so
2021.09.22 15:33:26.307842 [ 4012596 ] {} <Fatal> BaseDaemon: 9. /opt/gitCode/ClickHouse-21.8.3.44-lts/build-with-gcc/../src/Storages/HDFS/ReadBufferFromHDFS.cpp:132: DB::ReadBufferFromHDFS::nextImpl() @ 0xb25418 in /opt/gitCode/ClickHouse-21.8.3.44-lts/build-with-gcc/src/libdbmsd.so
2021.09.22 15:33:27.000206 [ 3987456 ] {} <Trace> AsynchronousMetrics: MemoryTracking: was 744.02 MiB, peak 744.02 MiB, will set to 756.39 MiB (RSS), difference: 12.38 MiB
2021.09.22 15:33:27.308603 [ 4012596 ] {} <Fatal> BaseDaemon: 10. /opt/gitCode/ClickHouse-21.8.3.44-lts/build-with-gcc/../src/IO/ReadBuffer.h:62: DB::ReadBuffer::next() @ 0xe0cb5 in /opt/gitCode/ClickHouse-21.8.3.44-lts/build-with-gcc/programs/server/libclickhouse-server-libd.so
2021.09.22 15:33:27.365392 [ 4012596 ] {} <Fatal> BaseDaemon: 11.1. inlined from /opt/gitCode/ClickHouse-21.8.3.44-lts/build-with-gcc/../src/IO/ReadBuffer.h:93: DB::ReadBuffer::eof()
2021.09.22 15:33:27.365617 [ 4012596 ] {} <Fatal> BaseDaemon: 11. ../src/IO/ReadHelpers.cpp:1151: DB::loadAtPosition(DB::ReadBuffer&, DB::Memory<Allocator<false, false> >&, char*&) @ 0x40e61e in /opt/gitCode/ClickHouse-21.8.3.44-lts/build-with-gcc/src/libclickhouse_common_iod.so
2021.09.22 15:33:27.496870 [ 4012596 ] {} <Fatal> BaseDaemon: 12. /opt/gitCode/ClickHouse-21.8.3.44-lts/build-with-gcc/../src/Processors/Formats/Impl/CSVRowInputFormat.cpp:481: DB::fileSegmentationEngineCSVImpl(DB::ReadBuffer&, DB::Memory<Allocator<false, false> >&, unsigned long) @ 0x348863 in /opt/gitCode/ClickHouse-21.8.3.44-lts/build-with-gcc/src/libclickhouse_processors_formats_impld.so
2021.09.22 15:33:27.600645 [ 4012596 ] {} <Fatal> BaseDaemon: 13.1. inlined from /opt/gitCode/ClickHouse-21.8.3.44-lts/build-with-gcc/../contrib/libcxx/include/type_traits:3676: decltype(forward<std::__1::pair<bool, unsigned long> (*&)(DB::ReadBuffer&, DB::Memory<Allocator<false, false> >&, unsigned long)>(fp)(forward<DB::ReadBuffer&>(fp0), forward<DB::Memory<Allocator<false, false> >&>(fp0), forward<unsigned long>(fp0))) std::__1::__invoke<std::__1::pair<bool, unsigned long> (*&)(DB::ReadBuffer&, DB::Memory<Allocator<false, false> >&, unsigned long), DB::ReadBuffer&, DB::Memory<Allocator<false, false> >&, unsigned long>(std::__1::pair<bool, unsigned long> (*&)(DB::ReadBuffer&, DB::Memory<Allocator<false, false> >&, unsigned long), DB::ReadBuffer&, DB::Memory<Allocator<false, false> >&, unsigned long&&)
2021.09.22 15:33:27.600934 [ 4012596 ] {} <Fatal> BaseDaemon: 13. ../contrib/libcxx/include/__functional_base:317: std::__1::pair<bool, unsigned long> std::__1::__invoke_void_return_wrapper<std::__1::pair<bool, unsigned long> >::__call<std::__1::pair<bool, unsigned long> (*&)(DB::ReadBuffer&, DB::Memory<Allocator<false, false> >&, unsigned long), DB::ReadBuffer&, DB::Memory<Allocator<false, false> >&, unsigned long>(std::__1::pair<bool, unsigned long> (*&)(DB::ReadBuffer&, DB::Memory<Allocator<false, false> >&, unsigned long), DB::ReadBuffer&, DB::Memory<Allocator<false, false> >&, unsigned long&&) @ 0x34f0c7 in /opt/gitCode/ClickHouse-21.8.3.44-lts/build-with-gcc/src/libclickhouse_processors_formats_impld.so
2021.09.22 15:33:27.707490 [ 4012596 ] {} <Fatal> BaseDaemon: 14.1. inlined from /opt/gitCode/ClickHouse-21.8.3.44-lts/build-with-gcc/../contrib/libcxx/include/functional:1608: std::__1::__function::__default_alloc_func<std::__1::pair<bool, unsigned long> (*)(DB::ReadBuffer&, DB::Memory<Allocator<false, false> >&, unsigned long), std::__1::pair<bool, unsigned long> (DB::ReadBuffer&, DB::Memory<Allocator<false, false> >&, unsigned long)>::operator()(DB::ReadBuffer&, DB::Memory<Allocator<false, false> >&, unsigned long&&)
2021.09.22 15:33:27.707734 [ 4012596 ] {} <Fatal> BaseDaemon: 14. ../contrib/libcxx/include/functional:2089: std::__1::pair<bool, unsigned long> std::__1::__function::__policy_invoker<std::__1::pair<bool, unsigned long> (DB::ReadBuffer&, DB::Memory<Allocator<false, false> >&, unsigned long)>::__call_impl<std::__1::__function::__default_alloc_func<std::__1::pair<bool, unsigned long> (*)(DB::ReadBuffer&, DB::Memory<Allocator<false, false> >&, unsigned long), std::__1::pair<bool, unsigned long> (DB::ReadBuffer&, DB::Memory<Allocator<false, false> >&, unsigned long)> >(std::__1::__function::__policy_storage const*, DB::ReadBuffer&, DB::Memory<Allocator<false, false> >&, unsigned long) @ 0x34e801 in /opt/gitCode/ClickHouse-21.8.3.44-lts/build-with-gcc/src/libclickhouse_processors_formats_impld.so
2021.09.22 15:33:27.956323 [ 4012596 ] {} <Fatal> BaseDaemon: 15.1. inlined from /opt/gitCode/ClickHouse-21.8.3.44-lts/build-with-gcc/../contrib/libcxx/include/functional:2221: std::__1::__function::__policy_func<std::__1::pair<bool, unsigned long> (DB::ReadBuffer&, DB::Memory<Allocator<false, false> >&, unsigned long)>::operator()(DB::ReadBuffer&, DB::Memory<Allocator<false, false> >&, unsigned long&&) const
2021.09.22 15:33:27.956652 [ 4012596 ] {} <Fatal> BaseDaemon: 15. ../contrib/libcxx/include/functional:2560: std::__1::function<std::__1::pair<bool, unsigned long> (DB::ReadBuffer&, DB::Memory<Allocator<false, false> >&, unsigned long)>::operator()(DB::ReadBuffer&, DB::Memory<Allocator<false, false> >&, unsigned long) const @ 0x44a2ea in /opt/gitCode/ClickHouse-21.8.3.44-lts/build-with-gcc/src/libclickhouse_processors_formats_impld.so
2021.09.22 15:33:28.000211 [ 3987456 ] {} <Trace> AsynchronousMetrics: MemoryTracking: was 756.39 MiB, peak 756.39 MiB, will set to 781.14 MiB (RSS), difference: 24.75 MiB
2021.09.22 15:33:28.216025 [ 4012596 ] {} <Fatal> BaseDaemon: 16. /opt/gitCode/ClickHouse-21.8.3.44-lts/build-with-gcc/../src/Processors/Formats/Impl/ParallelParsingInputFormat.cpp:41: DB::ParallelParsingInputFormat::segmentatorThreadFunction(std::__1::shared_ptr<DB::ThreadGroupStatus>) @ 0x4418ba in /opt/gitCode/ClickHouse-21.8.3.44-lts/build-with-gcc/src/libclickhouse_processors_formats_impld.so
2021.09.22 15:33:28.482083 [ 4012596 ] {} <Fatal> BaseDaemon: 17.1. inlined from /opt/gitCode/ClickHouse-21.8.3.44-lts/build-with-gcc/../contrib/libcxx/include/type_traits:3624: decltype(*(forward<DB::ParallelParsingInputFormat*&>(fp0)).*fp(forward<std::__1::shared_ptr<DB::ThreadGroupStatus>&>(fp1))) std::__1::__invoke_constexpr<void (DB::ParallelParsingInputFormat::*&)(std::__1::shared_ptr<DB::ThreadGroupStatus>), DB::ParallelParsingInputFormat*&, std::__1::shared_ptr<DB::ThreadGroupStatus>&, void>(void (DB::ParallelParsingInputFormat::*&)(std::__1::shared_ptr<DB::ThreadGroupStatus>), DB::ParallelParsingInputFormat*&, std::__1::shared_ptr<DB::ThreadGroupStatus>&)

Additional context

After reading the code, I found that ReadBufferFromHDFS is a subclass of ReadBuffer, and ReadBuffer is a subclass of BufferBase. At the same time, ReadBufferFromHDFS has an internal class, ReadBufferFromHDFSImpl, which is also a subclass of ReadBuffer and which it holds as impl.
So things become tricky when we execute loadAtPosition in ReadHelpers.cpp.

bool loadAtPosition(ReadBuffer & in, DB::Memory<> & memory, char * & current)
{
    assert(current <= in.buffer().end());

    if (current < in.buffer().end())
        return true;

    saveUpToPosition(in, memory, current);

    bool loaded_more = !in.eof();
    // A sanity check. Buffer position may be in the beginning of the buffer
    // (normal case), or have some offset from it (AIO).
    assert(in.position() >= in.buffer().begin());
    assert(in.position() <= in.buffer().end());
    current = in.position();

    return loaded_more;
}

saveUpToPosition only updates the pos of the BufferBase part of ReadBufferFromHDFS, not the pos of the BufferBase part of ReadBufferFromHDFSImpl.
So when eof() runs, hasPendingData and next end up being checked against different objects. hasPendingData is invoked on ReadBufferFromHDFS, whose pos has been advanced, but the next that does the actual read is invoked on ReadBufferFromHDFSImpl (frames 8 to 10 of the stack trace), whose pos was never advanced. Its own hasPendingData is therefore still true, which triggers the assertion.

    bool ALWAYS_INLINE eof()
    {
        return !hasPendingData() && !next();
    }

Maybe we can solve the problem by rewriting ReadBufferFromHDFS::nextImpl as below. But as I am not an expert on this, I can't be sure it is the right way.
bool ReadBufferFromHDFS::nextImpl()
{
    auto result = impl->next();

    if (result)
    {
        working_buffer = internal_buffer = impl->buffer();
        // Mark impl's data as consumed so that its next call to next() passes the assertion.
        impl->position() = working_buffer.end();
        pos = working_buffer.begin();
    }
    else
        return false;
    return true;
}
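The analysis and the proposed change above can be tried outside ClickHouse with a small model. This is only a sketch, not ClickHouse code: `MiniReadBuffer`, `MiniSource`, `MiniWrapper`, and `drainTripsInnerAssertion` are hypothetical names, the buffer is reduced to three pointers, and the place where `ReadBuffer::next()` would assert is replaced by a flag so that the result can be inspected instead of aborting. The `sync_impl_position` switch models the `impl->position() = working_buffer.end()` line proposed above; the change that actually closed this issue (#29276) is not reproduced here and may work differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for DB::ReadBuffer: a window [begin, end) plus a cursor.
struct MiniReadBuffer
{
    const char * begin = nullptr;
    const char * end = nullptr;
    const char * pos = nullptr;
    bool invariant_violated = false; // set where the real code would assert

    virtual ~MiniReadBuffer() = default;

    bool hasPendingData() const { return pos != end; }

    bool next()
    {
        if (hasPendingData())        // the real next() asserts !hasPendingData()
            invariant_violated = true;
        bool res = nextImpl();
        if (res)
            pos = begin;             // fresh window, cursor at its start
        else
            begin = end = pos;       // empty window on end of data
        return res;
    }

    bool eof() { return !hasPendingData() && !next(); }

protected:
    virtual bool nextImpl() = 0;
};

// Plays the role of ReadBufferFromHDFSImpl: refills its window from a string.
struct MiniSource : MiniReadBuffer
{
    std::string data;
    std::size_t offset = 0;
    std::vector<char> memory;

    MiniSource(std::string d, std::size_t chunk) : data(std::move(d)), memory(chunk)
    {
        assert(chunk > 0);
    }

protected:
    bool nextImpl() override
    {
        std::size_t n = std::min(memory.size(), data.size() - offset);
        if (n == 0)
            return false;
        std::copy_n(data.data() + offset, n, memory.data());
        offset += n;
        begin = memory.data();
        end = begin + n;
        return true;
    }
};

// Plays the role of ReadBufferFromHDFS: exposes impl's window as its own.
struct MiniWrapper : MiniReadBuffer
{
    MiniSource impl;
    bool sync_impl_position;

    MiniWrapper(std::string d, std::size_t chunk, bool sync)
        : impl(std::move(d), chunk), sync_impl_position(sync) {}

protected:
    bool nextImpl() override
    {
        if (!impl.next())
            return false;
        begin = impl.begin;
        end = impl.end;
        if (sync_impl_position)
            impl.pos = impl.end;     // the workaround proposed in this issue
        return true;
    }
};

// Reads everything through the wrapper's cursor only, as a format parser would,
// and reports whether impl's next() was ever called with pending data.
bool drainTripsInnerAssertion(bool sync_impl_position)
{
    MiniWrapper in("1,Berlin\n2,Paris\n", 8, sync_impl_position);
    while (!in.eof())
        in.pos = in.end;             // consume the whole window
    return in.impl.invariant_violated;
}
```

In this model `drainTripsInnerAssertion(false)` returns `true`: the second refill reaches `impl.next()` while impl's cursor is still at the start of its window. `drainTripsInnerAssertion(true)` returns `false`, because the wrapper marks impl's window as consumed after every successful refill.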

@mo-avatar mo-avatar added the potential bug To be reviewed by developers and confirmed/rejected. label Sep 22, 2021
@kssenii kssenii self-assigned this Sep 22, 2021
@mo-avatar
Contributor Author

mo-avatar commented Sep 23, 2021

More information:
The problem can be reproduced in debug mode with the cmake options below.

 cmake .. -DENABLE_READLINE=1 -DENABLE_PARQUET=1 -DENABLE_ORC=1 -DENABLE_PROTOBUF=1 -DENABLE_SSL=1 -DENABLE_JEMALLOC=ON -DENABLE_MYSQL=1 -DENABLE_DATA_SQLITE=0 -DPOCO_ENABLE_SQL_SQLITE=0 -DENABLE_ODBC=1 -DENABLE_CLICKHOUSE_ODBC_BRIDGE=0 -DUSE_INTERNAL_BOOST_LIBRARY=1 -DUSE_INTERNAL_ODBC_LIBRARY=1 -DENABLE_EMBEDDED_COMPILER=0 -DNO_WERROR=1 -DCMAKE_CXX_COMPILER=/usr/local/bin/clang++ -DCMAKE_C_COMPILER=/usr/local/bin/clang -DCMAKE_BUILD_TYPE=Debug -DUSE_STATIC_LIBRARIES=0 -DMAKE_STATIC_LIBRARIES=0 -DSPLIT_SHARED_LIBRARIES=1 -DCLICKHOUSE_SPLIT_BINARY=1 -DDISABLE_COLORED_BUILD=0 -DCOMPILER_PIPE=1  -DENABLE_HDFS=1 -DUSE_INTERNAL_HDFS3_LIBRARY=1 -G Ninja

Both clang and gcc reproduce the problem in debug mode, but strangely it can't be reproduced with gcc in release mode on my machine, and I haven't tried clang in release mode yet (maybe because the assertion is removed in release mode).
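The guess about release mode fits how the standard `assert` macro behaves: when `NDEBUG` is defined, `assert(expr)` expands to nothing and `expr` is never evaluated. CMake's default Release flags for gcc and clang include `-DNDEBUG`; whether that is the whole explanation for this particular build is an assumption. A minimal sketch with hypothetical function names:

```cpp
#define NDEBUG            // what a typical release build passes as -DNDEBUG
#include <cassert>

// With NDEBUG defined, assert(expr) expands to ((void)0): expr is not evaluated.
bool checkRanWithNdebug()
{
    bool ran = false;
    assert((ran = true));
    return ran;
}

#undef NDEBUG             // what a debug build looks like
#include <cassert>        // re-including <cassert> redefines assert for the current NDEBUG state

bool checkRanWithoutNdebug()
{
    bool ran = false;
    assert((ran = true)); // evaluated; passes because the expression is true
    return ran;
}
```

Here `checkRanWithNdebug()` returns `false` and `checkRanWithoutNdebug()` returns `true`, so a failed `!hasPendingData()` check would go unnoticed in the first kind of build.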

Also, I have changed some code in libhdfs and libgsasl in order to extend the functionality, but I think it is irrelevant to this problem.

So my recommendation is: fix it only if you can reproduce it as I do (I still believe the bug exists, based on the analysis above). And if it's convenient, please let me know whether it can be reproduced or not. Thanks!
@kssenii

@mo-avatar mo-avatar reopened this Sep 23, 2021
@kssenii
Member

kssenii commented Sep 23, 2021

Actually, it can be reproduced much more easily than that: a small modification of libhdfs3 (ClickHouse/libhdfs3#13) and a debug build are enough. In this case all our integration tests with HDFS failed with the assertion.

> Also I have change some code of libhdfs in order to extend the functionality

btw, if you want you can make a PR with these changes here https://github.com/ClickHouse-Extras/libhdfs3. Extending functionality of libhdfs might be useful.

@mo-avatar
Contributor Author

Glad to hear that. The work is still in progress; we will consider making a PR once we get things done. And thanks for your efforts in solving this bug. It looks much more professional than updating impl->position() directly. 😆
