
modify impala-shell introduction when using LDAP #2

Closed
davidxdh wants to merge 1 commit into apache:master from davidxdh:mydev

Conversation

@davidxdh commented Jun 12, 2017

When using impala-shell with LDAP, the introduction contains a redundant '\n', as follows:
\nLDAP authentication is enabled, but the connection to Impala is not secured by TLS.
ALL PASSWORDS WILL BE SENT IN THE CLEAR TO IMPALA.

This PR deletes the redundant '\n' from the introduction.

Change-Id: Ic44fe3a72a3a86c85b807959e33b31923c031ab0

@davidxdh (Author)

@lekv Hello. I find that I can't submit an issue for Cloudera Impala because "Impala" is not available in the project input field. Is there a limit?
Would you please submit this issue and attach this PR?
Thanks a lot.

@timarmstrong (Contributor)

We don't accept PRs through GitHub. We track issues at https://issues.apache.org/jira/projects/IMPALA and do code reviews through gerrit.cloudera.org (see https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala).

@davidxdh (Author)

@timarmstrong, thank you. I have submitted the code to https://gerrit.cloudera.org

@pono closed this Jun 13, 2017
michaelhkw pushed a commit to michaelhkw/impala that referenced this pull request Jun 21, 2017
This change fixes three issues:
1. File handle caching is expected to be disabled for
remote files (using exclusive HDFS file handles),
however the file handles are still being cached.
2. The retry logic for exclusive file handles is broken,
leading the number of open files to be incorrect.
3. There is no test coverage for disabling the file
handle cache.

To fix issue #1, when a scan range requests an
exclusive file handle from the cache, it will always
get a newly opened file handle. It will also destroy
the file handle when the scan range is closed.

To fix issue #2, exclusive file handles will no longer
retry IOs. Since the exclusive file handle is always
a fresh file handle, it will never have a bad file
handle from the cache. This returns the logic to
its state before IMPALA-4623 in these cases. If a
file handle is borrowed from the cache, then the
code will continue to retry once with a fresh handle.

To fix issue #3, custom_cluster/test_hdfs_fd_caching.py
now does both positive and negative tests for the file
handle cache. It verifies that setting
max_cached_file_handles to zero disables caching. It
also verifies that caching is disabled on remote
files. (This change will resolve IMPALA-5390.)

Change-Id: I4c03696984285cc9ce463edd969c5149cd83a861
Reviewed-on: http://gerrit.cloudera.org:8080/7181
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
ttmahdy pushed a commit to ttmahdy/incubator-impala that referenced this pull request Jul 14, 2017
Patch set #2 to bring this gerrit review
up to date with subsequent changes to .ditamap
file in master.

Also incorporate <topicref> and corresponding
<keydef> for SCRATCH_LIMIT query option.

Change-Id: I4d3098356e1b112ba08bfaf7386c3a1f30306223
Reviewed-on: http://gerrit.cloudera.org:8080/5599
Reviewed-by: John Russell <jrussell@cloudera.com>
Tested-by: Impala Public Jenkins
ttmahdy pushed a commit to ttmahdy/incubator-impala that referenced this pull request Jul 14, 2017
Here is a basic summary of the changes:
The frontend looks for conjuncts that operate on a single slot and passes a
map from slot id to the conjunct index through thrift to the backend.
The conjunct indices are the indices into the normal PlanNode conjuncts list.
The conjuncts need to satisfy certain conditions:
1. They are bound on a single slot
2. They are deterministic (no random functions)
3. They evaluate to FALSE on a NULL input. This is because the dictionary
does not include NULLs, so any condition that evaluates to TRUE on NULL
cannot be evaluated by looking only at the dictionary.

The backend converts the indices into ExprContexts. These are cloned in
the scanner threads.

The dictionary read codepath has been moved out of ReadDataPage into its
own function, InitDictionary. This has also been turned into its own step
in row group initialization. ReadDataPage will not see any dictionary
pages unless the parquet file is invalid.

For dictionary filtering, we initialize dictionaries only as needed to evaluate
the conjuncts. The Parquet scanner evaluates the dictionary filter conjuncts on the
dictionary to see if any dictionary entry passes. If no entry passes, the row
group is eliminated. If the row group passes the dictionary filtering, then we
initialize all remaining dictionaries.

Dictionary filtering is controlled by a new query option,
parquet_dictionary_filtering, which is on by default.

Since column chunks can have a mixture of encodings, dictionary filtering
uses three tests to determine whether this is purely dictionary encoded:
1. If encoding_stats is present in the parquet file, then use it to determine if
there are only dictionary encoded pages (i.e. there are no data pages with
an encoding other than PLAIN_DICTIONARY).
-OR-
2. If the encoding stats are not present, then look at the encodings. The column
is purely dictionary encoded if:
a) PLAIN_DICTIONARY is present
AND
b) Only PLAIN_DICTIONARY, RLE, or BIT_PACKED encodings are listed
-OR-
3. If this file was written by an older version of Impala, then we know that
dictionary failover happens when the dictionary reaches 40,000 values.
Dictionary filtering can proceed as long as the dictionary is smaller than
that.

parquet-mr writes the encoding list correctly in the current version in our
environment (1.5.0). This means that check #2 works on some existing files
(potentially most existing parquet-mr files).
parquet-mr writes the encoding stats starting in 1.9.0. This is the version
where check #1 will start working.

Impala's parquet writer now implements both, so either check above will work.

Change-Id: I3a7cc3bd0523fbf3c79bd924219e909ef671cfd7
Reviewed-on: http://gerrit.cloudera.org:8080/5904
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Impala Public Jenkins
ttmahdy pushed a commit to ttmahdy/incubator-impala that referenced this pull request Jul 14, 2017
This suppresses a Thrift undefined behavior error in which a negative
value is left-shifted. See THRIFT-2026 for tracking.

One example backtrace from the expr-test backend test:

    /thrift/protocol/TCompactProtocol.tcc:375:13: runtime error: left shift of negative value -1
    #0 0x2b02fe247996 in apache::thrift::protocol::TCompactProtocolT<apache::thrift::transport::TMemoryBuffer>::i64ToZigzag(long) /thrift/protocol/TCompactProtocol.tcc:375:13
    #1 0x2b02fe247674 in apache::thrift::protocol::TCompactProtocolT<apache::thrift::transport::TMemoryBuffer>::writeI64(long) /thrift/protocol/TCompactProtocol.tcc:242:24
    #2 0x2b02fe239504 in apache::thrift::protocol::TVirtualProtocol<apache::thrift::protocol::TCompactProtocolT<apache::thrift::transport::TMemoryBuffer>, apache::thrift::protocol::TProtocolDefaults>::writeI64_virt(long) /thrift/protocol/TVirtualProtocol.h:409:12
    #3 0x2b03014a3a06 in apache::thrift::protocol::TProtocol::writeI64(long) /thrift/protocol/TProtocol.h:453:12
    #4 0x2b0301d55c02 in impala::TRuntimeProfileNode::write(apache::thrift::protocol::TProtocol*) const /generated-sources/gen-cpp/RuntimeProfile_types.cpp:827:11
    #5 0x2b0301d59e34 in impala::TRuntimeProfileTree::write(apache::thrift::protocol::TProtocol*) const /generated-sources/gen-cpp/RuntimeProfile_types.cpp:1017:15
    #6 0x2b02f6f8c7be in impala::Status impala::ThriftSerializer::Serialize<impala::TRuntimeProfileTree>(impala::TRuntimeProfileTree*, unsigned int*, unsigned char**) /src/rpc/thrift-util.h:67:7
    #7 0x2b02f6f0d23e in impala::Status impala::ThriftSerializer::Serialize<impala::TRuntimeProfileTree>(impala::TRuntimeProfileTree*, std::vector<unsigned char, std::allocator<unsigned char> >*) /src/rpc/thrift-util.h:54:31
    #8 0x2b02f6eec934 in impala::RuntimeProfile::SerializeToArchiveString(std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >*) const /src/util/runtime-profile.cc:727:19
    #9 0x2b02f6eec640 in impala::RuntimeProfile::SerializeToArchiveString() const /src/util/runtime-profile.cc:718:3
    #10 0x2b02feda626c in impala::ImpalaServer::ArchiveQuery(impala::ImpalaServer::QueryExecState const&) /src/service/impala-server.cc:671:39
    #11 0x2b02fedb57b0 in impala::ImpalaServer::UnregisterQuery(impala::TUniqueId const&, bool, impala::Status const*) /src/service/impala-server.cc:972:3
    #12 0x2b02ff15b666 in impala::ImpalaServer::close(beeswax::QueryHandle const&) /src/service/impala-beeswax-server.cc:236:29
    #13 0x2b030177dc14 in beeswax::BeeswaxServiceProcessor::process_close(int, apache::thrift::protocol::TProtocol*, apache::thrift::protocol::TProtocol*, void*) /generated-sources/gen-cpp/BeeswaxService.cpp:3543:5
    #14 0x2b0301763825 in beeswax::BeeswaxServiceProcessor::dispatchCall(apache::thrift::protocol::TProtocol*, apache::thrift::protocol::TProtocol*, std::string const&, int, void*) /generated-sources/gen-cpp/BeeswaxService.cpp:2952:3
    #15 0x2b03016bc2d6 in impala::ImpalaServiceProcessor::dispatchCall(apache::thrift::protocol::TProtocol*, apache::thrift::protocol::TProtocol*, std::string const&, int, void*) /generated-sources/gen-cpp/ImpalaService.cpp:1673:12
    #16 0x2b02f650138e in apache::thrift::TDispatchProcessor::process(boost::shared_ptr<apache::thrift::protocol::TProtocol>, boost::shared_ptr<apache::thrift::protocol::TProtocol>, void*) /thrift/TDispatchProcessor.h:121:12
    #17 0x1d4080a in apache::thrift::server::TThreadPoolServer::Task::run() (/build/debug/exprs/expr-test+0x1d4080a)
    #18 0x1d23e38 in apache::thrift::concurrency::ThreadManager::Worker::run() (/build/debug/exprs/expr-test+0x1d23e38)
    #19 0x2b02fe2be520 in impala::ThriftThread::RunRunnable(boost::shared_ptr<apache::thrift::concurrency::Runnable>, impala::Promise<unsigned long>*) /src/rpc/thrift-thread.cc:64:3
    #20 0x2b02fe2c4c6b in boost::_mfi::mf2<void, impala::ThriftThread, boost::shared_ptr<apache::thrift::concurrency::Runnable>, impala::Promise<unsigned long>*>::operator()(impala::ThriftThread*, boost::shared_ptr<apache::thrift::concurrency::Runnable>, impala::Promise<unsigned long>*) const /boost/bind/mem_fn_template.hpp:280:16
    #21 0x2b02fe2c498a in void boost::_bi::list3<boost::_bi::value<impala::ThriftThread*>, boost::_bi::value<boost::shared_ptr<apache::thrift::concurrency::Runnable> >, boost::_bi::value<impala::Promise<unsigned long>*> >::operator()<boost::_mfi::mf2<void, impala::ThriftThread, boost::shared_ptr<apache::thrift::concurrency::Runnable>, impala::Promise<unsigned long>*>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf2<void, impala::ThriftThread, boost::shared_ptr<apache::thrift::concurrency::Runnable>, impala::Promise<unsigned long>*>&, boost::_bi::list0&, int) /boost/bind/bind.hpp:392:9
    #22 0x2b02fe2c444b in boost::_bi::bind_t<void, boost::_mfi::mf2<void, impala::ThriftThread, boost::shared_ptr<apache::thrift::concurrency::Runnable>, impala::Promise<unsigned long>*>, boost::_bi::list3<boost::_bi::value<impala::ThriftThread*>, boost::_bi::value<boost::shared_ptr<apache::thrift::concurrency::Runnable> >, boost::_bi::value<impala::Promise<unsigned long>*> > >::operator()() /boost/bind/bind_template.hpp:20:16
    #23 0x2b02fe2c3709 in boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf2<void, impala::ThriftThread, boost::shared_ptr<apache::thrift::concurrency::Runnable>, impala::Promise<unsigned long>*>, boost::_bi::list3<boost::_bi::value<impala::ThriftThread*>, boost::_bi::value<boost::shared_ptr<apache::thrift::concurrency::Runnable> >, boost::_bi::value<impala::Promise<unsigned long>*> > >, void>::invoke(boost::detail::function::function_buffer&) /boost/function/function_template.hpp:153:11
    #24 0x2b02f70085d4 in boost::function0<void>::operator()() const /boost/function/function_template.hpp:766:14
    #25 0x2b02f6ff9710 in impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*) /src/util/thread.cc:325:3
    #26 0x2b02f7021783 in void boost::_bi::list4<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::Promise<long>*> >::operator()<void (*)(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*), boost::_bi::list0>(boost::_bi::type<void>, void (*&)(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*), boost::_bi::list0&, int) /boost/bind/bind.hpp:457:9
    #27 0x2b02f70210ab in boost::_bi::bind_t<void, void (*)(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::Promise<long>*> > >::operator()() /boost/bind/bind_template.hpp:20:16
    #28 0x2b02f701faf5 in boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::Promise<long>*> > > >::run() /boost/thread/detail/thread.hpp:116:17
    #29 0xfaa189 in thread_proxy (/build/debug/exprs/expr-test+0xfaa189)
    #30 0x2b0305c5e183 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x8183)
    #31 0x2b0305f6e37c in clone (/lib/x86_64-linux-gnu/libc.so.6+0xfa37c)

Change-Id: I74ece2157048e8cd24c2536c0a292d9c21f719b9
Reviewed-on: http://gerrit.cloudera.org:8080/6425
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
smukil referenced this pull request in smukil/incubator-impala Oct 30, 2017
A recent patch for IMPALA-5129 introduced a use-after-free bug in
thrift-server-test. It is fixed in this patch.

Change-Id: I2cd434757de2cd384def5b360a479e51812cccca
smukil referenced this pull request in smukil/incubator-impala Oct 31, 2017
asfgit pushed a commit that referenced this pull request Jan 11, 2018
…ter dynamic linking.

Replaces boost::algorithm::trim() with std::string methods when parsing
/proc/self/smaps and adds a trivial unit test for MemInfo::ParseSmaps().

I did *not* replace other uses of trim() with equivalents from
be/src/gutil/strings/strip.h at this moment.

The backstory here is that
TestAdmissionControllerStress::test_admission_controller_with_flags
fails occasionally on dynamically linked builds of Impala. I was able
to reproduce the failure reliably (within 3 tries) with the following:

  $ ./buildall.sh -notests -so -noclean
  $ bin/start-impala-cluster.py  --impalad_args="--memory_maintenance_sleep_time_ms=1"
  $ impala-shell.sh --query 'select max(t.c1), avg(t.c2), min(t.c3), avg(c4), avg(c5), avg(c6) from (select max(tinyint_col) over (order by int_col) c1, avg(tinyint_col) over (order by smallint_col) c2, min(tinyint_col) over (order by smallint_col desc) c3, rank() over (order by int_col desc) c4, dense_rank() over (order by bigint_col) c5, first_value(tinyint_col) over (order by bigint_col desc) c6 from functional.alltypes) t;'

The stack trace looks like:

  (gdb) bt
  #0  0x00007fe230df2428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
  #1  0x00007fe230df402a in __GI_abort () at abort.c:89
  #2  0x00007fe23312026d in __gnu_cxx::__verbose_terminate_handler() () at ../../../../gcc-4.9.2/libstdc++-v3/libsupc++/vterminate.cc:95
  #3  0x00007fe2330d8b66 in __cxxabiv1::__terminate(void (*)()) (handler=<optimized out>) at ../../../../gcc-4.9.2/libstdc++-v3/libsupc++/eh_terminate.cc:47
  #4  0x00007fe2330d8bb1 in std::terminate() () at ../../../../gcc-4.9.2/libstdc++-v3/libsupc++/eh_terminate.cc:57
  #5  0x00007fe2330d8cb8 in __cxxabiv1::__cxa_throw(void*, std::type_info*, void (*)(void*)) (obj=0x8e54080, tinfo=0x7fe233356210 <typeinfo for std::bad_cast>, dest=0x7fe23311ea70 <std::bad_cast::~bad_cast()>) at ../../../../gcc-4.9.2/libstdc++-v3/libsupc++/eh_throw.cc:87
  #6  0x00007fe233110332 in std::__throw_bad_cast() () at ../../../../../gcc-4.9.2/libstdc++-v3/src/c++11/functexcept.cc:63
  #7  0x00007fe2330e8ad7 in std::use_facet<std::ctype<char> >(std::locale const&) (__loc=...) at /data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-ubuntu-16-04/toolchain/source/gcc/build-4.9.2/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/locale_classes.tcc:137
  #8  0x00000000008d2cdf in void boost::algorithm::trim<std::string>(std::string&, std::locale const&) ()
  #9  0x00007fe2396d5057 in impala::MemInfo::ParseSmaps() () at /home/philip/src/Impala/be/src/util/mem-info.cc:132
  ...

My best theory is that there's a race/bug wherein the std::locale* static
initialization work is somehow getting 'reset' by the dynamic linker when more
libraries are linked in as a result of the query. My evidence to support this theory is scant, but
I do notice that LD_DEBUG=all prints the following when the query is executed
(but not right at startup):

  binding file /home/philip/src/Impala/toolchain/gcc-4.9.2/lib64/libstdc++.so.6 [0] to
  /home/philip/src/Impala/toolchain/gflags-2.2.0-p1/lib/libgflags.so.2.2 [0]:
  normal symbol `std::locale::facet::_S_destroy_c_locale(__locale_struct*&)'

Note that there are BSS segments for some of std::locale::facet::* inside
of libgflags.so.

  $ nm toolchain/gflags-2.2.0-p1/lib/libgflags.so | c++filt | grep facet | grep ' B '
  00000000002e2d10 B std::locale::facet::_S_c_locale
  00000000002e2d0c B std::locale::facet::_S_once

I'm not the first to run into variants of these issues, though the results
are fairly unhelpful:

  http://www.boost.org/doc/libs/1_58_0/libs/locale/doc/html/faq.html
  https://stackoverflow.com/questions/26990412/c-boost-crashes-while-using-locale
  https://svn.boost.org/trac10/ticket/4671
  http://clang-developers.42468.n3.nabble.com/std-use-facet-lt-std-ctype-lt-char-gt-gt-crashes-on-linux-td4033967.html
  https://unix.stackexchange.com/questions/719/can-we-get-compiler-information-from-an-elf-binary
  https://stackoverflow.com/questions/42376100/linking-with-library-causes-collate-facet-to-be-missing-from-char
  http://lists.llvm.org/pipermail/cfe-dev/2012-July/023289.html
  https://gcc.gnu.org/ml/libstdc++/2014-11/msg00122.html

Change-Id: I8dd807f869a9359d991ba515177fb2298054520e
Reviewed-on: http://gerrit.cloudera.org:8080/8888
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Impala Public Jenkins
asfgit pushed a commit that referenced this pull request Jul 23, 2018
memset has undefined behavior when its first argument is NULL. The
instance fixed here was found by Clang's undefined behavior
sanitizer.

It was found in the end-to-end tests. The interesting part of the
stack trace is:

/exec/data-source-scan-node.cc:152:10: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:62:79: note: nonnull attribute specified here
    #0 0x482fd8e in DataSourceScanNode::GetNextInputBatch() /exec/data-source-scan-node.cc:152:3
    #1 0x482fb40 in DataSourceScanNode::Open(RuntimeState*) /exec/data-source-scan-node.cc:124:10
    #2 0x47ef854 in AggregationNode::Open(RuntimeState*) /exec/aggregation-node.cc:71:49
    #3 0x23506a4 in FragmentInstanceState::Open() /runtime/fragment-instance-state.cc:266:53
    #4 0x234b6a8 in FragmentInstanceState::Exec() /runtime/fragment-instance-state.cc:81:12
    #5 0x236ee52 in QueryState::ExecFInstance(FragmentInstanceState*) /runtime/query-state.cc:401:24
    #6 0x237093e in QueryState::StartFInstances()::$_0::operator()() const /runtime/query-state.cc:341:44

Change-Id: I18fa02dc887a42a94c6f81e4923d17568f2184f2
Reviewed-on: http://gerrit.cloudera.org:8080/10948
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
jbapple-cloudera added a commit to jbapple-cloudera/incubator-impala that referenced this pull request Jul 25, 2018
In expr.shift, the C++ standard says of right shifts:

    The behavior is undefined if the right operand is negative, or
    greater than or equal to the length in bits of the promoted left
    operand.

In HdfsAvroScannerTest.DecimalTest, this is triggered, and the
interesting part of the backtrace is:

exec/hdfs-avro-scanner-ir.cc:272:18: runtime error: shift exponent 32 is too large for 32-bit type 'int32_t' (aka 'int')
    #0 0x1786f65 in HdfsAvroScanner::ReadAvroDecimal(int, unsigned char**, unsigned char*, bool, void*, MemPool*) exec/hdfs-avro-scanner-ir.cc:272:18
    #1 0x1617778 in void HdfsAvroScannerTest::TestReadAvroType<DecimalValue<int>, bool (HdfsAvroScanner::*)(int, unsigned char**, unsigned char*, bool, void*, MemPool*), unsigned long>(bool (HdfsAvroScanner::*)(int, unsigned char**, unsigned char*, bool, void*, MemPool*), unsigned long, unsigned char*, long, DecimalValue<int>, int, TErrorCode::type) exec/hdfs-avro-scanner-test.cc:88:20
    #2 0x1605705 in void HdfsAvroScannerTest::TestReadAvroDecimal<int>(unsigned char*, long, DecimalValue<int>, int, TErrorCode::type) exec/hdfs-avro-scanner-test.cc:184:5

Change-Id: Ic5be92912198af2a5e00053ceb9a4fab43ca6bb8
asfgit pushed a commit that referenced this pull request Jul 25, 2018
memset has undefined behavior when its first argument is NULL. The
instance fixed here was found by Clang's undefined behavior sanitizer.

It was found in the end-to-end tests. The interesting part of the
stack trace is:

be/src/util/bitmap.h:78:12: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:62:79: note: nonnull attribute specified here
    #0 0x2ccb59b in Bitmap::SetAllBits(bool) be/src/util/bitmap.h:78:5
    #1 0x2cb6b9e in NestedLoopJoinNode::ResetMatchingBuildRows(RuntimeState*, long) be/src/exec/nested-loop-join-node.cc:176:27
    #2 0x2cb5ad6 in NestedLoopJoinNode::Open(RuntimeState*) be/src/exec/nested-loop-join-node.cc:90:43

Change-Id: I804f642f4be3b74c24f871f656c5147ee226d2c8
Reviewed-on: http://gerrit.cloudera.org:8080/11042
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
asfgit pushed a commit that referenced this pull request Jul 27, 2018
In expr.shift, the C++ standard says of right shifts:

    The behavior is undefined if the right operand is negative, or
    greater than or equal to the length in bits of the promoted left
    operand.

In HdfsAvroScannerTest.DecimalTest, this is triggered, and the
interesting part of the backtrace is:

exec/hdfs-avro-scanner-ir.cc:272:18: runtime error: shift exponent 32 is too large for 32-bit type 'int32_t' (aka 'int')
    #0 0x1786f65 in HdfsAvroScanner::ReadAvroDecimal(int, unsigned char**, unsigned char*, bool, void*, MemPool*) exec/hdfs-avro-scanner-ir.cc:272:18
    #1 0x1617778 in void HdfsAvroScannerTest::TestReadAvroType<DecimalValue<int>, bool (HdfsAvroScanner::*)(int, unsigned char**, unsigned char*, bool, void*, MemPool*), unsigned long>(bool (HdfsAvroScanner::*)(int, unsigned char**, unsigned char*, bool, void*, MemPool*), unsigned long, unsigned char*, long, DecimalValue<int>, int, TErrorCode::type) exec/hdfs-avro-scanner-test.cc:88:20
    #2 0x1605705 in void HdfsAvroScannerTest::TestReadAvroDecimal<int>(unsigned char*, long, DecimalValue<int>, int, TErrorCode::type) exec/hdfs-avro-scanner-test.cc:184:5

Change-Id: Ic5be92912198af2a5e00053ceb9a4fab43ca6bb8
Reviewed-on: http://gerrit.cloudera.org:8080/11047
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
asfgit pushed a commit that referenced this pull request Aug 7, 2018
In expr.add, the standard says:

    When an expression that has integral type is added to or
    subtracted from a pointer, the result has the type of the pointer
    operand. ... If both the pointer operand and the result point to
    elements of the same array object, or one past the last element of
    the array object, the evaluation shall not produce an overflow;
    otherwise, the behavior is undefined.

In the end-to-end tests this is triggered, and the interesting part of
the backtrace is:

include/c++/4.9.2/bits/stl_iterator.h:782:45: runtime error: pointer index expression with base 0x000000000000 overflowed to 0xffffffffffffffe8
    #0 __normal_iterator<MemPool::ChunkInfo*, vector<MemPool::ChunkInfo>>::operator+(long) const stl_iterator.h:782:45
    #1 MemPool::AcquireData(MemPool*, bool) runtime/mem-pool.cc:190:62
    #2 RowBatch::TransferResourceOwnership(RowBatch*) runtime/row-batch.cc:444:26
    #3 RowBatch::AcquireState(RowBatch*) runtime/row-batch.cc:505:8
    #4 HdfsScanNode::GetNextInternal(RuntimeState*, RowBatch*, bool*) exec/hdfs-scan-node.cc:105:16
    #5 HdfsScanNode::GetNext(RuntimeState*, RowBatch*, bool*) exec/hdfs-scan-node.cc:81:19
    #6 StreamingAggregationNode::GetRowsStreaming(RuntimeState*, RowBatch*) exec/streaming-aggregation-node.cc:116:51
    #7 StreamingAggregationNode::GetNext(RuntimeState*, RowBatch*, bool*) exec/streaming-aggregation-node.cc:92:41

Change-Id: I3d28a80763adb62572b3dd81ea732d18d957d248
Reviewed-on: http://gerrit.cloudera.org:8080/11118
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Jim Apple <jbapple-impala@apache.org>
asfgit pushed a commit that referenced this pull request Aug 14, 2018
This was found in ExprTest.LiteralExprs. The rules violated are:

1. "reference binding to null pointer of type 'long'". This violates
the standard's [dcl.ref]:

    a null reference cannot exist in a well-defined program, because
    the only way to create such a reference would be to bind it to the
    "object" obtained by indirection through a null pointer, which
    causes undefined behavior.

The interesting part of the backtrace is:

    #1 RuntimeProfile::TimeSeriesCounter::ToThrift(TTimeSeriesCounter*) util/runtime-profile.cc:1117:11
    #2 RuntimeProfile::ToThrift(std::vector<TRuntimeProfileNode>*) const util/runtime-profile.cc:905:21
    #3 RuntimeProfile::ToThrift(TRuntimeProfileTree*) const util/runtime-profile.cc:847:3
    #4 QueryState::ReportExecStatusAux(bool, Status const&, FragmentInstanceState*, bool) runtime/query-state.cc:281:21
    #5 QueryState::ReportExecStatus(bool, Status const&, FragmentInstanceState*) runtime/query-state.cc:250:3
    #6 FragmentInstanceState::SendReport(bool, Status const&) runtime/fragment-instance-state.cc:406:17
    #7 FragmentInstanceState::Finalize(Status const&) runtime/fragment-instance-state.cc:496:3

2. The use of a null pointer when calling memcpy. According to "7.1.4
Use of library functions" in the C99 standard (which is incorporated into
C++14 via section [intro.refs]):

    If an argument to a function has an invalid value (such as a value
    outside the domain of the function, or a pointer outside the
    address space of the program, or a null pointer, or a pointer to
    non-modifiable storage when the corresponding parameter is not
    const-qualified) or a type (after promotion) not expected by a
    function with variable number of arguments, the behavior is
    undefined.

The interesting part of the backtrace is the same as above.

Change-Id: I3c8a6624918389396789a83b32dbf068b9327f76
Reviewed-on: http://gerrit.cloudera.org:8080/11195
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
twmarshall pushed a commit to twmarshall/impala that referenced this pull request Aug 22, 2018
twmarshall pushed a commit to twmarshall/impala that referenced this pull request Aug 22, 2018
twmarshall pushed a commit to twmarshall/impala that referenced this pull request Aug 22, 2018
In expr.shift, the C++ standard says of right shifts:

    The behavior is undefined if the right operand is negative, or
    greater than or equal to the length in bits of the promoted left
    operand.

In HdfsAvroScannerTest.DecimalTest, this is triggered, and the
interesting part of the backtrace is:

exec/hdfs-avro-scanner-ir.cc:272:18: runtime error: shift exponent 32 is too large for 32-bit type 'int32_t' (aka 'int')
    #0 0x1786f65 in HdfsAvroScanner::ReadAvroDecimal(int, unsigned char**, unsigned char*, bool, void*, MemPool*) exec/hdfs-avro-scanner-ir.cc:272:18
    apache#1 0x1617778 in void HdfsAvroScannerTest::TestReadAvroType<DecimalValue<int>, bool (HdfsAvroScanner::*)(int, unsigned char**, unsigned char*, bool, void*, MemPool*), unsigned long>(bool (HdfsAvroScanner::*)(int, unsigned char**, unsigned char*, bool, void*, MemPool*), unsigned long, unsigned char*, long, DecimalValue<int>, int, TErrorCode::type) exec/hdfs-avro-scanner-test.cc:88:20
    apache#2 0x1605705 in void HdfsAvroScannerTest::TestReadAvroDecimal<int>(unsigned char*, long, DecimalValue<int>, int, TErrorCode::type) exec/hdfs-avro-scanner-test.cc:184:5

Change-Id: Ic5be92912198af2a5e00053ceb9a4fab43ca6bb8
Reviewed-on: http://gerrit.cloudera.org:8080/11047
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
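The shift-exponent rule above means any decode path whose shift amount can reach the full bit width needs an explicit guard. A minimal sketch of that pattern (the `SignExtend` helper and its shape are hypothetical, not the actual `ReadAvroDecimal` code):

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low 'bytes' bytes of a 32-bit value. When bytes == 4 the
// naive "shift up then shift down" would use a shift exponent of 32, which
// is undefined for a 32-bit type, so that case returns early. The left
// shift is done on an unsigned value to avoid signed-overflow UB.
int32_t SignExtend(int32_t value, int bytes) {
  int shift = 32 - bytes * 8;
  if (shift <= 0) return value;  // full width already; shifting by 32 is UB
  return static_cast<int32_t>(static_cast<uint32_t>(value) << shift) >> shift;
}
```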
twmarshall pushed a commit to twmarshall/impala that referenced this pull request Aug 22, 2018
In expr.add, the standard says:

    When an expression that has integral type is added to or
    subtracted from a pointer, the result has the type of the pointer
    operand. ... If both the pointer operand and the result point to
    elements of the same array object, or one past the last element of
    the array object, the evaluation shall not produce an overflow;
    otherwise, the behavior is undefined.

In the end-to-end tests this is triggered, and the interesting part of
the backtrace is:

include/c++/4.9.2/bits/stl_iterator.h:782:45: runtime error: pointer index expression with base 0x000000000000 overflowed to 0xffffffffffffffe8
    #0 __normal_iterator<MemPool::ChunkInfo*, vector<MemPool::ChunkInfo>>::operator+(long) const stl_iterator.h:782:45
    #1 MemPool::AcquireData(MemPool*, bool) runtime/mem-pool.cc:190:62
    #2 RowBatch::TransferResourceOwnership(RowBatch*) runtime/row-batch.cc:444:26
    #3 RowBatch::AcquireState(RowBatch*) runtime/row-batch.cc:505:8
    #4 HdfsScanNode::GetNextInternal(RuntimeState*, RowBatch*, bool*) exec/hdfs-scan-node.cc:105:16
    #5 HdfsScanNode::GetNext(RuntimeState*, RowBatch*, bool*) exec/hdfs-scan-node.cc:81:19
    #6 StreamingAggregationNode::GetRowsStreaming(RuntimeState*, RowBatch*) exec/streaming-aggregation-node.cc:116:51
    #7 StreamingAggregationNode::GetNext(RuntimeState*, RowBatch*, bool*) exec/streaming-aggregation-node.cc:92:41

Change-Id: I3d28a80763adb62572b3dd81ea732d18d957d248
Reviewed-on: http://gerrit.cloudera.org:8080/11118
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Jim Apple <jbapple-impala@apache.org>
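Per the quoted rule, iterator or pointer arithmetic is only defined while the result stays inside (or one past) the same array, so code that indexes into a possibly-empty container has to guard that case rather than rely on arithmetic over a null base pointer. A hedged sketch of the guard pattern (not the actual `MemPool::AcquireData` fix):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Return an iterator k elements into v, guarding the empty case: forming
// begin() + k over a vector whose storage pointer is null is the kind of
// pointer-index expression UBSAN flagged above.
template <typename T>
typename std::vector<T>::iterator IterAt(std::vector<T>& v, std::size_t k) {
  if (v.empty()) return v.end();  // no arithmetic over null storage
  assert(k <= v.size());          // keep the result within [begin, end]
  return v.begin() + k;
}
```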
asfgit pushed a commit that referenced this pull request Sep 5, 2018
This implements cache invalidation inside CatalogdMetaProvider. The
design is as follows:

- when the catalogd collects updates into the statestore topic, it now
  adds an additional entry for each table and database. These additional
  entries are minimal - they only include the object's name, but no
  metadata. This new behavior is conditional on a new flag
  --catalog_topic_mode. The default mode is to keep the old style, but
  it can be configured to mixed (support both v1 and v2) or v2-only.

- the old-style topic entries are prefixed with a '1:' whereas the new
  minimal entries are prefixed with a '2:'. The impalad will subscribe
  to one or the other prefix depending on whether it is running with
  --use_local_catalog. Thus, old impalads will not be confused by the
  new entries and vice versa.

- when the impalad gets these topic updates, it forwards them through to
  the catalog implementation. The LocalCatalog implementation forwards
  them to the CatalogdMetaProvider, which uses them to invalidate
  cached metadata as appropriate.

This patch includes some basic unit tests. I also did some manual
testing by connecting to different impalads and verifying that a session
connected to impalad #1 saw the effects of DDLs made by impalad #2
within a short period of time (the statestore topic update frequency).

Existing end-to-end tests cover these code paths pretty thoroughly:

- if we didn't automatically invalidate the cache on a coordinator
  in response to DDL operations, then any test which expects to
  "read its own writes" (eg access a table after creating one)
  would fail
- if we didn't propagate invalidations via the statestore, then
  all of the tests that use sync_ddl would fail.

I verified the test coverage above using some of the tests in
test_ddl.py -- I selectively commented out a few of the invalidation
code paths in the new code and verified that tests failed until I
re-introduced them. Along the way I also improved test_ddl so that, when
this code is broken, it properly fails with a timeout. It also has a bit
of expanded coverage for both the SYNC_DDL and non-SYNC cases.

I also wrote a new custom-cluster test for LocalCatalog that verifies
a few of the specific edge cases like detecting catalogd restart,
SYNC_DDL behavior in mixed mode, etc.

One notable exception here is the implementation of INVALIDATE METADATA
This turned out to be complex to implement, so I left a lengthy TODO
describing the issue and filed a JIRA.

Change-Id: I615f9e6bd167b36cd8d93da59426dd6813ae4984
Reviewed-on: http://gerrit.cloudera.org:8080/11280
Reviewed-by: Todd Lipcon <todd@apache.org>
Tested-by: Todd Lipcon <todd@apache.org>
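The prefix scheme described above can be illustrated with a small sketch (function name and key layout are hypothetical, for illustration only): v1 full entries carry a `1:` prefix, minimal v2 invalidation entries carry `2:`, and each impalad filters on the prefix matching its `--use_local_catalog` mode.

```cpp
#include <cassert>
#include <string>

// Decide whether an impalad consumes a statestore catalog-topic entry,
// based on the version prefix described in the commit message above.
bool SubscribesTo(const std::string& key, bool use_local_catalog) {
  const std::string prefix = use_local_catalog ? "2:" : "1:";
  return key.compare(0, prefix.size(), prefix) == 0;
}
```

In mixed mode the catalogd would publish both entry styles, so old and new impalads can coexist against the same topic.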
asfgit pushed a commit that referenced this pull request Oct 2, 2018
This patch fixes all remaining UBSAN "null pointer passed as argument"
errors in the backend tests. These are undefined behavior according to
"7.1.4 Use of library functions" in the C99 standard (which is
included in C++14 in section [intro.refs]):

    If an argument to a function has an invalid value (such as a value
    outside the domain of the function, or a pointer outside the
    address space of the program, or a null pointer, or a pointer to
    non-modifiable storage when the corresponding parameter is not
    const-qualified) or a type (after promotion) not expected by a
    function with variable number of arguments, the behavior is
    undefined.

The interesting parts of the backtraces for the errors fixed in this
patch are below:

exprs/string-functions-ir.cc:311:17: runtime error: null pointer passed as argument 2, which is declared to never be null
/usr/include/string.h:43:45: note: nonnull attribute specified here
    #0 StringFunctions::Replace(impala_udf::FunctionContext*, impala_udf::StringVal const&, impala_udf::StringVal const&, impala_udf::StringVal const&) exprs/string-functions-ir.cc:311:5
    #1 impala_udf::StringVal ScalarFnCall::InterpretEval<impala_udf::StringVal>(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:485:580
    #2 ScalarFnCall::GetStringVal(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:599:44
    #3 ScalarExprEvaluator::GetValue(ScalarExpr const&, TupleRow const*) exprs/scalar-expr-evaluator.cc:299:38
    #4 ScalarExprEvaluator::GetValue(TupleRow const*) exprs/scalar-expr-evaluator.cc:250:10
    #5 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, ScalarExprEvaluator* const*, MemPool*, StringValue**, int*, int*) runtime/tuple.cc:222:27
    #6 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, vector<ScalarExprEvaluator*> const&, MemPool*, vector<StringValue*>*, int*) runtime/tuple.h:174:5
    #7 UnionNode::MaterializeExprs(vector<ScalarExprEvaluator*> const&, TupleRow*, unsigned char*, RowBatch*) exec/union-node-ir.cc:29:14
    #8 UnionNode::GetNextConst(RuntimeState*, RowBatch*) exec/union-node.cc:263:5
    #9 UnionNode::GetNext(RuntimeState*, RowBatch*, bool*) exec/union-node.cc:296:45
    #10 FragmentInstanceState::ExecInternal() runtime/fragment-instance-state.cc:310:59
    #11 FragmentInstanceState::Exec() runtime/fragment-instance-state.cc:95:14
    #12 QueryState::ExecFInstance(FragmentInstanceState*) runtime/query-state.cc:488:24
    #13 QueryState::StartFInstances()::$_0::operator()() const runtime/query-state.cc:416:35
    #20 thread_proxy (exprs/expr-test+0x55ca939)

exprs/string-functions-ir.cc:868:15: runtime error: null pointer passed as argument 2, which is declared to never be null
/usr/include/string.h:43:45: note: nonnull attribute specified here
    #0 StringFunctions::ConcatWs(impala_udf::FunctionContext*, impala_udf::StringVal const&, int, impala_udf::StringVal const*) exprs/string-functions-ir.cc:868:3
    #1 impala_udf::StringVal ScalarFnCall::InterpretEval<impala_udf::StringVal>(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:510:270
    #2 ScalarFnCall::GetStringVal(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:599:44
    #3 ScalarExprEvaluator::GetValue(ScalarExpr const&, TupleRow const*) exprs/scalar-expr-evaluator.cc:299:38
    #4 ScalarExprEvaluator::GetValue(TupleRow const*) exprs/scalar-expr-evaluator.cc:250:10
    #5 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, ScalarExprEvaluator* const*, MemPool*, StringValue**, int*, int*) runtime/tuple.cc:222:27
    #6 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, vector<ScalarExprEvaluator*> const&, MemPool*, vector<StringValue*>*, int*) runtime/tuple.h:174:5
    #7 UnionNode::MaterializeExprs(vector<ScalarExprEvaluator*> const&, TupleRow*, unsigned char*, RowBatch*) exec/union-node-ir.cc:29:14
    #8 UnionNode::GetNextConst(RuntimeState*, RowBatch*) exec/union-node.cc:263:5
    #9 UnionNode::GetNext(RuntimeState*, RowBatch*, bool*) exec/union-node.cc:296:45
    #10 FragmentInstanceState::ExecInternal() runtime/fragment-instance-state.cc:310:59
    #11 FragmentInstanceState::Exec() runtime/fragment-instance-state.cc:95:14
    #12 QueryState::ExecFInstance(FragmentInstanceState*) runtime/query-state.cc:488:24
    #13 QueryState::StartFInstances()::$_0::operator()() const runtime/query-state.cc:416:35
    #20 thread_proxy (exprs/expr-test+0x55ca939)

exprs/string-functions-ir.cc:871:17: runtime error: null pointer passed as argument 2, which is declared to never be null
/usr/include/string.h:43:45: note: nonnull attribute specified here
    #0 StringFunctions::ConcatWs(impala_udf::FunctionContext*, impala_udf::StringVal const&, int, impala_udf::StringVal const*) exprs/string-functions-ir.cc:871:5
    #1 StringFunctions::Concat(impala_udf::FunctionContext*, int, impala_udf::StringVal const*) exprs/string-functions-ir.cc:843:10
    #2 impala_udf::StringVal ScalarFnCall::InterpretEval<impala_udf::StringVal>(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:510:95
    #3 ScalarFnCall::GetStringVal(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:599:44
    #4 ScalarExprEvaluator::GetValue(ScalarExpr const&, TupleRow const*) exprs/scalar-expr-evaluator.cc:299:38
    #5 ScalarExprEvaluator::GetValue(TupleRow const*) exprs/scalar-expr-evaluator.cc:250:10
    #6 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, ScalarExprEvaluator* const*, MemPool*, StringValue**, int*, int*) runtime/tuple.cc:222:27
    #7 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, vector<ScalarExprEvaluator*> const&, MemPool*, vector<StringValue*>*, int*) runtime/tuple.h:174:5
    #8 UnionNode::MaterializeExprs(vector<ScalarExprEvaluator*> const&, TupleRow*, unsigned char*, RowBatch*) exec/union-node-ir.cc:29:14
    #9 UnionNode::GetNextConst(RuntimeState*, RowBatch*) exec/union-node.cc:263:5
    #10 UnionNode::GetNext(RuntimeState*, RowBatch*, bool*) exec/union-node.cc:296:45
    #11 FragmentInstanceState::ExecInternal() runtime/fragment-instance-state.cc:310:59
    #12 FragmentInstanceState::Exec() runtime/fragment-instance-state.cc:95:14
    #13 QueryState::ExecFInstance(FragmentInstanceState*) runtime/query-state.cc:488:24
    #14 QueryState::StartFInstances()::$_0::operator()() const runtime/query-state.cc:416:35
    #21 thread_proxy (exprs/expr-test+0x55ca939)

exprs/string-functions-ir.cc:873:17: runtime error: null pointer passed as argument 2, which is declared to never be null
/usr/include/string.h:43:45: note: nonnull attribute specified here
    #0 StringFunctions::ConcatWs(impala_udf::FunctionContext*, impala_udf::StringVal const&, int, impala_udf::StringVal const*) exprs/string-functions-ir.cc:873:5
    #1 StringFunctions::Concat(impala_udf::FunctionContext*, int, impala_udf::StringVal const*) exprs/string-functions-ir.cc:843:10
    #2 impala_udf::StringVal ScalarFnCall::InterpretEval<impala_udf::StringVal>(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:510:95
    #3 ScalarFnCall::GetStringVal(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:599:44
    #4 ScalarExprEvaluator::GetValue(ScalarExpr const&, TupleRow const*) exprs/scalar-expr-evaluator.cc:299:38
    #5 ScalarExprEvaluator::GetValue(TupleRow const*) exprs/scalar-expr-evaluator.cc:250:10
    #6 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, ScalarExprEvaluator* const*, MemPool*, StringValue**, int*, int*) runtime/tuple.cc:222:27
    #7 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, vector<ScalarExprEvaluator*> const&, MemPool*, vector<StringValue*>*, int*) runtime/tuple.h:174:5
    #8 UnionNode::MaterializeExprs(vector<ScalarExprEvaluator*> const&, TupleRow*, unsigned char*, RowBatch*) exec/union-node-ir.cc:29:14
    #9 UnionNode::GetNextConst(RuntimeState*, RowBatch*) exec/union-node.cc:263:5
    #10 UnionNode::GetNext(RuntimeState*, RowBatch*, bool*) exec/union-node.cc:296:45
    #11 FragmentInstanceState::ExecInternal() runtime/fragment-instance-state.cc:310:59
    #12 FragmentInstanceState::Exec() runtime/fragment-instance-state.cc:95:14
    #13 QueryState::ExecFInstance(FragmentInstanceState*) runtime/query-state.cc:488:24
    #14 QueryState::StartFInstances()::$_0::operator()() const runtime/query-state.cc:416:35
    #21 thread_proxy (exprs/expr-test+0x55ca939)

runtime/raw-value.cc:159:27: runtime error: null pointer passed as argument 2, which is declared to never be null
/usr/include/string.h:43:45: note: nonnull attribute specified here
    #0 RawValue::Write(void const*, void*, ColumnType const&, MemPool*) runtime/raw-value.cc:159:9
    #1 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, ScalarExprEvaluator* const*, MemPool*, StringValue**, int*, int*) runtime/tuple.cc:225:7
    #2 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, vector<ScalarExprEvaluator*> const&, MemPool*, vector<StringValue*>*, int*) runtime/tuple.h:174:5
    #3 UnionNode::MaterializeExprs(vector<ScalarExprEvaluator*> const&, TupleRow*, unsigned char*, RowBatch*) exec/union-node-ir.cc:29:14
    #4 UnionNode::GetNextConst(RuntimeState*, RowBatch*) exec/union-node.cc:263:5
    #5 UnionNode::GetNext(RuntimeState*, RowBatch*, bool*) exec/union-node.cc:296:45
    #6 FragmentInstanceState::ExecInternal() runtime/fragment-instance-state.cc:310:59
    #7 FragmentInstanceState::Exec() runtime/fragment-instance-state.cc:95:14
    #8 QueryState::ExecFInstance(FragmentInstanceState*) runtime/query-state.cc:488:24
    #9 QueryState::StartFInstances()::$_0::operator()() const runtime/query-state.cc:416:35
    #16 thread_proxy (exprs/expr-test+0x55ca939)

udf/udf.cc:521:24: runtime error: null pointer passed as argument 2, which is declared to never be null
/usr/include/string.h:43:45: note: nonnull attribute specified here
    #0 impala_udf::StringVal::CopyFrom(impala_udf::FunctionContext*, unsigned char const*, unsigned long) udf/udf.cc:521:5
    #1 AnyValUtil::FromBuffer(impala_udf::FunctionContext*, char const*, int) exprs/anyval-util.h:241:12
    #2 StringFunctions::RegexpExtract(impala_udf::FunctionContext*, impala_udf::StringVal const&, impala_udf::StringVal const&, impala_udf::BigIntVal const&) exprs/string-functions-ir.cc:726:10
    #3 impala_udf::StringVal ScalarFnCall::InterpretEval<impala_udf::StringVal>(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:485:580
    #4 ScalarFnCall::GetStringVal(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:599:44
    #5 ScalarExprEvaluator::GetValue(ScalarExpr const&, TupleRow const*) exprs/scalar-expr-evaluator.cc:299:38
    #6 ScalarExprEvaluator::GetValue(TupleRow const*) exprs/scalar-expr-evaluator.cc:250:10
    #7 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, ScalarExprEvaluator* const*, MemPool*, StringValue**, int*, int*) runtime/tuple.cc:222:27
    #8 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, vector<ScalarExprEvaluator*> const&, MemPool*, vector<StringValue*>*, int*) runtime/tuple.h:174:5
    #9 UnionNode::MaterializeExprs(vector<ScalarExprEvaluator*> const&, TupleRow*, unsigned char*, RowBatch*) exec/union-node-ir.cc:29:14
    #10 UnionNode::GetNextConst(RuntimeState*, RowBatch*) exec/union-node.cc:263:5
    #11 UnionNode::GetNext(RuntimeState*, RowBatch*, bool*) exec/union-node.cc:296:45
    #12 FragmentInstanceState::ExecInternal() runtime/fragment-instance-state.cc:310:59
    #13 FragmentInstanceState::Exec() runtime/fragment-instance-state.cc:95:14
    #14 QueryState::ExecFInstance(FragmentInstanceState*) runtime/query-state.cc:488:24
    #15 QueryState::StartFInstances()::$_0::operator()() const runtime/query-state.cc:416:35
    #22 thread_proxy (exprs/expr-test+0x55ca939)

util/coding-util-test.cc:45:10: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:43:45: note: nonnull attribute specified here
    #0 TestUrl(string const&, string const&, bool) util/coding-util-test.cc:45:3
    #1 UrlCodingTest_BlankString_Test::TestBody() util/coding-util-test.cc:88:3
    #2 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (util/coding-util-test+0x6630f42)
    #8 main util/coding-util-test.cc:123:192

util/decompress-test.cc:126:261: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:66:58: note: nonnull attribute specified here
    #0 DecompressorTest::CompressAndDecompress(Codec*, Codec*, long, unsigned char*) util/decompress-test.cc:126:254
    #1 DecompressorTest::RunTest(THdfsCompression::type) util/decompress-test.cc:84:9
    #2 DecompressorTest_Default_Test::TestBody() util/decompress-test.cc:373:3
    #3 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (util/decompress-test+0x6642bb2)
    #9 main util/decompress-test.cc:479:47

util/decompress-test.cc:148:261: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:66:58: note: nonnull attribute specified here
    #0 DecompressorTest::CompressAndDecompress(Codec*, Codec*, long, unsigned char*) util/decompress-test.cc:148:254
    #1 DecompressorTest::RunTest(THdfsCompression::type) util/decompress-test.cc:84:9
    #2 DecompressorTest_Default_Test::TestBody() util/decompress-test.cc:373:3
    #3 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (util/decompress-test+0x6642bb2)
    #9 main util/decompress-test.cc:479:47

util/decompress-test.cc:269:261: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:66:58: note: nonnull attribute specified here
    #0 DecompressorTest::CompressAndDecompressNoOutputAllocated(Codec*, Codec*, long, unsigned char*) util/decompress-test.cc:269:254
    #1 DecompressorTest::RunTest(THdfsCompression::type) util/decompress-test.cc:71:7
    #2 DecompressorTest_LZ4_Test::TestBody() util/decompress-test.cc:381:3
    #3 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (util/decompress-test+0x6642bb2)
    #9 main util/decompress-test.cc:479:47

util/decompress-test.cc:221:329: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:66:58: note: nonnull attribute specified here
    #0 DecompressorTest::StreamingDecompress(Codec*, long, unsigned char*, long, unsigned char*, bool, long*) util/decompress-test.cc:221:322
    #1 DecompressorTest::CompressAndStreamingDecompress(Codec*, Codec*, long, unsigned char*) util/decompress-test.cc:245:35
    #2 DecompressorTest::RunTestStreaming(THdfsCompression::type) util/decompress-test.cc:104:5
    #3 DecompressorTest_Gzip_Test::TestBody() util/decompress-test.cc:386:3
    #4 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (util/decompress-test+0x6642bb2)
    #10 main util/decompress-test.cc:479:47

util/streaming-sampler.h:55:22: runtime error: null pointer passed as argument 2, which is declared to never be null
/usr/include/string.h:43:45: note: nonnull attribute specified here
    #0 StreamingSampler<long, 64>::StreamingSampler(int, vector<long> const&) util/streaming-sampler.h:55:5
    #1 RuntimeProfile::TimeSeriesCounter::TimeSeriesCounter(string const&, TUnit::type, int, vector<long> const&) util/runtime-profile-counters.h:401:53
    #2 RuntimeProfile::Update(vector<TRuntimeProfileNode> const&, int*) util/runtime-profile.cc:310:28
    #3 RuntimeProfile::Update(TRuntimeProfileTree const&) util/runtime-profile.cc:245:3
    #4 Coordinator::BackendState::InstanceStats::Update(TFragmentInstanceExecStatus const&, Coordinator::ExecSummary*, ProgressUpdater*) runtime/coordinator-backend-state.cc:473:13
    #5 Coordinator::BackendState::ApplyExecStatusReport(TReportExecStatusParams const&, Coordinator::ExecSummary*, ProgressUpdater*) runtime/coordinator-backend-state.cc:286:21
    #6 Coordinator::UpdateBackendExecStatus(TReportExecStatusParams const&) runtime/coordinator.cc:678:22
    #7 ClientRequestState::UpdateBackendExecStatus(TReportExecStatusParams const&) service/client-request-state.cc:1253:18
    #8 ImpalaServer::ReportExecStatus(TReportExecStatusResult&, TReportExecStatusParams const&) service/impala-server.cc:1343:18
    #9 ImpalaInternalService::ReportExecStatus(TReportExecStatusResult&, TReportExecStatusParams const&) service/impala-internal-service.cc:87:19
    #24 thread_proxy (exprs/expr-test+0x55ca939)

Change-Id: I317ccc99549744a26d65f3e07242079faad0355a
Reviewed-on: http://gerrit.cloudera.org:8080/11545
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
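All of the string-function errors above share one cause: an empty string may be represented by a null data pointer, and `string.h` functions declare their pointer arguments `nonnull`, so passing that null is UB even with a zero length. A hedged sketch of the guard pattern (the `StrEq` helper is illustrative, not Impala's actual code):

```cpp
#include <cassert>
#include <cstring>

// Compare two length-prefixed strings. Lengths are compared first, and
// memcmp is only called when there is at least one byte, so a null
// pointer for an empty string never reaches the nonnull-declared memcmp.
bool StrEq(const char* a, int a_len, const char* b, int b_len) {
  if (a_len != b_len) return false;
  if (a_len == 0) return true;  // never hand memcmp a null pointer
  return std::memcmp(a, b, a_len) == 0;
}
```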
asfgit pushed a commit that referenced this pull request Nov 15, 2018
This removes two signed integer overflows when using the 'conv'
builtin. Signed integer overflow is undefined behavior according to
the C++ standard. The interesting parts of the backtraces are:

    exprs/math-functions-ir.cc:405:13: runtime error: signed integer overflow: 4738381338321616896 * 36 cannot be represented in type 'long'
    exprs/math-functions-ir.cc:404:24: runtime error: signed integer overflow: 2 * 4738381338321616896 cannot be represented in type 'long'

    #0 MathFunctions::DecimalInBaseToDecimal(long, signed char, long*) exprs/math-functions-ir.cc:404:24
    #1 MathFunctions::ConvInt(impala_udf::FunctionContext*, impala_udf::BigIntVal const&, impala_udf::TinyIntVal const&, impala_udf::TinyIntVal const&) exprs/math-functions-ir.cc:327:10
    #2 impala_udf::StringVal ScalarFnCall::InterpretEval<impala_udf::StringVal>(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:485:580
    #3 ScalarFnCall::GetStringVal(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:599:44
    #8 UnionNode::MaterializeExprs(vector<ScalarExprEvaluator*> const&, TupleRow*, unsigned char*, RowBatch*) exec/union-node-ir.cc:29:14
    #9 UnionNode::GetNextConst(RuntimeState*, RowBatch*) exec/union-node.cc:263:5
    #10 UnionNode::GetNext(RuntimeState*, RowBatch*, bool*) exec/union-node.cc:296:45

These were triggered in the backend test
ExprTest.MathConversionFunctions.

Change-Id: I0d97dfcf42072750c16e41175765cd9a468a3c39
Reviewed-on: http://gerrit.cloudera.org:8080/11876
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
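A common way to fix this class of bug is to detect the overflow explicitly with the GCC/Clang checked-arithmetic builtins instead of letting the signed multiply wrap. A sketch under that assumption (hypothetical helper, not the actual `DecimalInBaseToDecimal` change):

```cpp
#include <cassert>
#include <cstdint>

// Accumulate one digit of a base-N number: out = acc * base + digit.
// __builtin_mul_overflow / __builtin_add_overflow report overflow via
// the return value instead of invoking undefined behavior.
bool MulBaseAdd(int64_t acc, int64_t base, int64_t digit, int64_t* out) {
  int64_t tmp;
  if (__builtin_mul_overflow(acc, base, &tmp)) return false;
  if (__builtin_add_overflow(tmp, digit, out)) return false;
  return true;
}
```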
asfgit pushed a commit that referenced this pull request Nov 20, 2018
This patch fixes a signed overflow in the backend test
DecimalTest.Overflow. The interesting part of the backtrace is:

runtime/decimal-value.inline.h:254:17: runtime error: signed integer
         overflow: 0x4b3b4ca85a86c47a098a223fffffffff +
         0x4b3b4ca85a86c47a098a223fffffffff cannot be represented in
         type '__int128'
    #0 detail::AddLarge(__int128, int, __int128, int, int, bool,
         bool*) runtime/decimal-value.inline.h:254:17
    #1 DecimalValue<__int128> DecimalValue<__int128>::Add<__int128>(
         int, DecimalValue<__int128> const&, int, int, int, bool,
         bool*) const runtime/decimal-value.inline.h:371:14
    #2 DecimalTest_Overflow_Test::TestBody()
         runtime/decimal-test.cc:540:9

Change-Id: I146bcf35d34cc0e14be0633427d3e4bd0e5a261e
Reviewed-on: http://gerrit.cloudera.org:8080/11917
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
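When checked-arithmetic builtins are not available for a type, the builtin-free guard is to compare against the distance to the maximum before adding. Sketched here for `int64_t` with nonnegative operands (the same check applies to `__int128` with its own maximum):

```cpp
#include <cassert>
#include <cstdint>

// Add two nonnegative signed values, reporting overflow instead of
// wrapping: for a, b >= 0, a + b overflows exactly when a > max - b.
bool SafeAddPositive(int64_t a, int64_t b, int64_t* out) {
  assert(a >= 0 && b >= 0);           // precondition of this sketch
  if (a > INT64_MAX - b) return false;  // would overflow; don't compute it
  *out = a + b;
  return true;
}
```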
asfgit pushed a commit that referenced this pull request Feb 10, 2019
UBSAN finds the following in ParquetBoolDecoder.TestDecodeAndSkipping:

util/bit-stream-utils.inline.h:156:25: runtime error: left shift of 42 by 28 places cannot be represented in type 'int'
    #0 BatchedBitReader::GetUleb128Int(unsigned int*) util/bit-stream-utils.inline.h:156:25
    #1 RleBatchDecoder<bool>::NextCounts() util/rle-encoding.h:778:40
    #2 RleBatchDecoder<bool>::NextNumRepeats() util/rle-encoding.h:622:28
    #3 RleBatchDecoder<bool>::GetValues(int, bool*) util/rle-encoding.h:858:27
    #4 bool ParquetBoolDecoder::DecodeValue<(parquet::Encoding::type)3>(bool*) exec/parquet/parquet-bool-decoder.h:85:24
    #5 TestSkipping(parquet::Encoding::type, unsigned char*, int, vector<bool> const&, int, int)::$_0::operator()() const exec/parquet/parquet-bool-decoder-test.cc:59
    #6 TestSkipping(parquet::Encoding::type, unsigned char*, int, vector<bool> const&, int, int) exec/parquet/parquet-bool-decoder-test.cc:69:221
    #7 ParquetBoolDecoder_TestDecodeAndSkipping_Test::TestBody() exec/parquet/parquet-bool-decoder-test.cc:85:5
    #9 testing::Test::Run() (/home/ubuntu/Impala/be/build/debug/exec/parquet/parquet-bool-decoder-test+0x6ee4f09)

The problem is the line

    *v |= (byte & 0x7F) << shift;

byte is an uint8_t and 0x7F is an int. The standard section
[expr.bit.and] then applies the "usual arithmetic conversions"
specified in [expr], which applies "if the type of the operand with
signed integer type can represent all of the values of the type of the
operand with unsigned integer type, the operand with unsigned integer
type shall be converted to the type of the operand with signed integer
type." That makes byte & 0x7F a signed integer type, and [expr.shift]
says that "if E1 has a signed type and non-negative value, and E1×2^E2
is representable in the corresponding unsigned type of the result
type, then that value, converted to the result type, is the resulting
value; otherwise, the behavior is undefined."

Change-Id: Ie6e0b956751090f3f8aadd6783b5e06e55e57abe
Reviewed-on: http://gerrit.cloudera.org:8080/12346
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
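The fix pattern for the decode line quoted above is to perform the shift on an unsigned 32-bit value so the usual arithmetic conversions cannot turn it into a signed shift. A minimal sketch (helper name is illustrative; the real code is in `bit-stream-utils.inline.h`):

```cpp
#include <cassert>
#include <cstdint>

// One ULEB128 accumulation step: OR in the low 7 bits of 'byte' at the
// given shift. Casting to uint32_t before shifting keeps the shift
// unsigned, where out-of-range high bits are discarded, not UB.
uint32_t OrShifted(uint32_t v, uint8_t byte, int shift) {
  return v | (static_cast<uint32_t>(byte & 0x7F) << shift);
}
```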
asfgit pushed a commit that referenced this pull request Feb 12, 2019
This patch fixes a signed overflow in the test
StringToDecimal.LargeDecimals. The interesting part of the backtrace
is:

util/string-parser.h:397:14: runtime error: signed integer overflow:
  0x4b3b4ca85a86c47a098a223fffffffff * 10 cannot be represented in
  type '__int128'
    #0 void StringParser::ApplyExponent<__int128>(int, int,
      signed char, __int128*, int*, int*) util/string-parser.h:397:14
    #1 DecimalValue<__int128> StringParser::StringToDecimal<__int128>
      (char const*, int, int, int, bool, StringParser::ParseResult*)
      util/string-parser.h:221:5
    #2 void VerifyParse<__int128>(string const&, int, int, bool,
      DecimalValue<__int128> const&, StringParser::ParseResult)
      runtime/decimal-test.cc:53:25
    #3 void VerifyParse<__int128>(string const&, int, int,
      DecimalValue<__int128> const&, StringParser::ParseResult)
      runtime/decimal-test.cc:65:3
    #4 StringToDecimal_LargeDecimals_Test::TestBody()
      runtime/decimal-test.cc:443:3

Change-Id: Ifb4effa291e1e4dac62b84251c74c483d99b06e7
Reviewed-on: http://gerrit.cloudera.org:8080/12426
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
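The overflow in `ApplyExponent` comes from repeatedly scaling a parsed value by ten. One guard (a sketch only, using `int64_t` rather than the `__int128` of the real code) is to check that another multiply stays in range before performing it:

```cpp
#include <cassert>
#include <cstdint>

// Multiply v by 10 'times' times, reporting overflow instead of letting
// the signed multiply wrap: v * 10 is safe exactly when v is within
// [min / 10, max / 10].
bool ScaleByTen(int64_t v, int times, int64_t* out) {
  for (int i = 0; i < times; ++i) {
    if (v > INT64_MAX / 10 || v < INT64_MIN / 10) return false;
    v *= 10;
  }
  *out = v;
  return true;
}
```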
anuragmantri pushed a commit to anuragmantri/impala that referenced this pull request Feb 12, 2019
This patch fixes a signed overflow in the test
StringToDecimal.LargeDecimals. The interesting part of the backtrace
is:

util/string-parser.h:397:14: runtime error: signed integer overflow:
  0x4b3b4ca85a86c47a098a223fffffffff * 10 cannot be represented in
  type '__int128'
    #0 void StringParser::ApplyExponent<__int128>(int, int,
      signed char, __int128*, int*, int*) util/string-parser.h:397:14
    #1 DecimalValue<__int128> StringParser::StringToDecimal<__int128>
      (char const*, int, int, int, bool, StringParser::ParseResult*)
      util/string-parser.h:221:5
    #2 void VerifyParse<__int128>(string const&, int, int, bool,
      DecimalValue<__int128> const&, StringParser::ParseResult)
      runtime/decimal-test.cc:53:25
    #3 void VerifyParse<__int128>(string const&, int, int,
      DecimalValue<__int128> const&, StringParser::ParseResult)
      runtime/decimal-test.cc:65:3
    #4 StringToDecimal_LargeDecimals_Test::TestBody()
      runtime/decimal-test.cc:443:3
      runtime/decimal-test.cc:443:3

Change-Id: Ifb4effa291e1e4dac62b84251c74c483d99b06e7
Reviewed-on: http://gerrit.cloudera.org:8080/12426
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
asfgit pushed a commit that referenced this pull request Feb 20, 2019
Standard section [expr.shift] says that E1 << E2 is undefined if E1 is
of signed type and the result cannot be represented in the
corresponding unsigned type. We can't simply change 1 << bit_width to
1u << bit_width, though, because it is the second argument of the
modulo operator, and following [expr.mul], "If the second operand of /
or % is zero the behavior is undefined."

This expression is tripped in RleTest.ValueSkippingFuzzy, with the
following backtrace:

util/rle-test.cc:304:29: runtime error: shift exponent 32 is too large
      for 32-bit type 'int'
    #0 RleTest::MakeRandomSequence(unsigned int, int, int, int)::
         {lambda(int)#1}::operator()(int) const util/rle-test.cc:304:29
    #1 RleTest::MakeRandomSequence(unsigned int, int, int, int)
         util/rle-test.cc:315:15
    #2 RleTest_ValueSkippingFuzzy_Test::TestBody()
         util/rle-test.cc:392:25

Change-Id: I7debbd1ca5bd3ae640701ce0e95a12c5059abfd7
Reviewed-on: http://gerrit.cloudera.org:8080/12514
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
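One way out of the double bind described above (shifting `1` by the full bit width is UB, but a zero divisor for `%` is also UB) is to widen the shift to 64 bits so `bit_width == 32` is still defined and the divisor is never zero. A hedged sketch of that pattern:

```cpp
#include <cassert>
#include <cstdint>

// Reduce 'raw' into [0, 2^bit_width). Computing the range as a uint64_t
// keeps the shift defined for bit_width up to 63, so bit_width == 32
// neither shifts out of range nor produces a zero modulus.
uint32_t RandomValueMask(uint32_t raw, int bit_width) {
  uint64_t range = uint64_t{1} << bit_width;
  return static_cast<uint32_t>(raw % range);
}
```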
asfgit pushed a commit that referenced this pull request Jun 3, 2019
This undefined behavior was caught with UBSAN in the end-to-end
tests. The interesting part of the backtrace is:

    gutil/atomicops-internals-x86.h:283:15: runtime error: signed
       integer overflow: -9223370395229620599 + -9223371946660462582
       cannot be represented in type 'long'
    #0 base::subtle::Barrier_AtomicIncrement(long volatile*, long)
       gutil/atomicops-internals-x86.h:283:15
    #1 internal::AtomicInt<long>::Add(long) common/atomic.h:93:12
    #2 RuntimeProfile::Counter::Add(long) util/runtime-profile.h:93
    #3 HdfsOrcScanner::AssembleRows(RowBatch*)
       exec/hdfs-orc-scanner.cc:636:50
    #4 HdfsOrcScanner::GetNextInternal(RowBatch*)
       exec/hdfs-orc-scanner.cc:507:19
    #5 HdfsOrcScanner::ProcessSplit() exec/hdfs-orc-scanner.cc:426:21
    #6 HdfsScanNode::ProcessSplit(vector<FilterContext> const&,
       MemPool*, io::ScanRange*, long*) exec/hdfs-scan-node.cc:514:21
    #7 HdfsScanNode::ScannerThread(bool, long)
       exec/hdfs-scan-node.cc:415:7
    #8 HdfsScanNode::ThreadTokenAvailableCb(ThreadResourcePool*)::
       $_0::operator()() const exec/hdfs-scan-node.cc:337:13

Change-Id: Ic638ff4959eaaffc79caa3453dbccaaabcbe95c9
Reviewed-on: http://gerrit.cloudera.org:8080/13433
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
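One common mitigation for this class of error, sketched below (not necessarily the change applied in this patch): signed overflow is UB, but unsigned wraparound is fully defined, so perform the addition in `uint64_t` and convert back.

```cpp
#include <cstdint>

// Sketch of a common fix pattern (helper name ours): convert to unsigned,
// add with well-defined wraparound semantics, and convert back.
inline int64_t WrappingAdd(int64_t a, int64_t b) {
  return static_cast<int64_t>(
      static_cast<uint64_t>(a) + static_cast<uint64_t>(b));
}
```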
asfgit pushed a commit that referenced this pull request Jun 3, 2019
Counterintuitively, even passing 0 as the third argument of memcpy
does not avoid undefined behavior. This occurred during an end-to-end
test. The interesting part of the backtrace is:

    util/dict-encoding.h:451:20: runtime error: null pointer passed
       as argument 2, which is declared to never be null
    /usr/include/string.h:43:45: note: nonnull attribute specified
       here
    #0 DictEncoder<StringValue>::AddToTable(StringValue const&,
       unsigned short*) util/dict-encoding.h:451:3
    #1 DictEncoder<StringValue>::Put(StringValue const&)
       util/dict-encoding.h:422:10
    #2 HdfsParquetTableWriter::ColumnWriter<StringValue>::
       ProcessValue(void*, long*)
       exec/parquet/hdfs-parquet-table-writer.cc:436:38
    #3 HdfsParquetTableWriter::BaseColumnWriter::AppendRow(TupleRow*)
       exec/parquet/hdfs-parquet-table-writer.cc:662:9
    #4 HdfsParquetTableWriter::AppendRows(RowBatch*,
       vector<int> const&, bool*)
       exec/parquet/hdfs-parquet-table-writer.cc:1192:60
    #5 HdfsTableSink::WriteRowsToPartition(RuntimeState*, RowBatch*,
       pair<unique_ptr<OutputPartition>, vector<int>>*)
       exec/hdfs-table-sink.cc:253:71
    #6 HdfsTableSink::Send(RuntimeState*, RowBatch*)
       exec/hdfs-table-sink.cc:588:45

Change-Id: I2e8e57c34c2848f0dc7dbf32892cc6e86df63506
Reviewed-on: http://gerrit.cloudera.org:8080/13434
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
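The usual fix for this pattern, sketched (assumed shape, not Impala's exact change): skip the library call entirely when there is nothing to copy, so `memcpy` never receives a null pointer, even paired with a zero length.

```cpp
#include <cstring>
#include <cstddef>

// Sketch of the guard idiom (helper name ours): memcpy's arguments are
// declared nonnull, so a null src is UB even when len == 0. The guard
// makes the zero-length case a no-op instead.
inline void SafeCopy(void* dst, const void* src, std::size_t len) {
  if (len > 0) std::memcpy(dst, src, len);  // null dst/src only legal via this guard
}
```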
asfgit pushed a commit that referenced this pull request Jun 3, 2019
This patch fixes UBSAN "null pointer passed as argument" errors in the
end-to-end tests. These are undefined behavior according to "7.1.4 Use
of library functions" in the C99 standard (which is included in C++14
in section [intro.refs]):

    If an argument to a function has an invalid value (such as a value
    outside the domain of the function, or a pointer outside the
    address space of the program, or a null pointer, or a pointer to
    non-modifiable storage when the corresponding parameter is not
    const-qualified) or a type (after promotion) not expected by a
    function with variable number of arguments, the behavior is
    undefined.

The interesting parts of the backtraces are:

    runtime/sorter.cc:575:18: runtime error: null pointer passed as
       argument 2, which is declared to never be null
    /usr/include/string.h:43:45: note: nonnull attribute specified
       here
    #0 Sorter::Run::CopyVarLenData(vector<StringValue*> const&,
       unsigned char*) runtime/sorter.cc:575:5
    #1 Status Sorter::Run::AddBatchInternal<true, true>(RowBatch*,
       int, int*) runtime/sorter.cc:232:11
    #2 Sorter::Run::AddInputBatch(RowBatch*, int, int*)
       runtime/sorter.cc:660:12
    #3 Sorter::AddBatchNoSpill(RowBatch*, int, int*)
       runtime/sorter.cc:882:58
    #4 Sorter::AddBatch(RowBatch*) runtime/sorter.cc:862:45
    #5 SortNode::SortInput(RuntimeState*) exec/sort-node.cc:177:54
    #6 SortNode::Open(RuntimeState*) exec/sort-node.cc:90:43

    runtime/tuple.cc:105:25: runtime error: null pointer passed as
       argument 2, which is declared to never be null
    /usr/include/string.h:43:45: note: nonnull attribute specified
       here
    #0 Tuple::DeepCopyVarlenData(TupleDescriptor const&, MemPool*)
       runtime/tuple.cc:105:5
    #1 Tuple::DeepCopy(Tuple*, TupleDescriptor const&, MemPool*)
       runtime/tuple.cc:94:35
    #2 Tuple::DeepCopy(TupleDescriptor const&, MemPool*)
       runtime/tuple.cc:85:3
    #3 KrpcDataStreamSender::Channel::AddRow(TupleRow*)
       runtime/krpc-data-stream-sender.cc:509:43
    #4 KrpcDataStreamSender::AddRowToChannel(int, TupleRow*)
       runtime/krpc-data-stream-sender.cc:846
    #5 (<unknown module>)

    runtime/tuple.cc:146:19: runtime error: null pointer passed as
       argument 2, which is declared to never be null
    /usr/include/string.h:43:45: note: nonnull attribute specified
       here
    #0 Tuple::DeepCopyVarlenData(TupleDescriptor const&, char**, int*,
       bool) runtime/tuple.cc:146:5
    #1 Tuple::DeepCopy(TupleDescriptor const&, char**, int*, bool)
       runtime/tuple.cc:135:35
    #2 RowBatch::SerializeInternal(long, FixedSizeHashTable<Tuple*,
       int>*, vector<int>*, string*) runtime/row-batch.cc:392:14
    #3 RowBatch::Serialize(bool, vector<int>*, string*, long*, bool*)
       runtime/row-batch.cc:290:45
    #4 RowBatch::Serialize(OutboundRowBatch*)
       runtime/row-batch.cc:259:43
    #5 KrpcDataStreamSender::SerializeBatch(RowBatch*,
       OutboundRowBatch*, int) runtime/krpc-data-stream-sender.cc:955:50
    #6 KrpcDataStreamSender::Send(RuntimeState*, RowBatch*)
       runtime/krpc-data-stream-sender.cc:870:45

    runtime/tuple.h:106:12: runtime error: null pointer passed as
       argument 1, which is declared to never be null
    /usr/include/string.h:62:79: note: nonnull attribute specified
       here
    #0 Tuple::ClearNullBits(int, int) runtime/tuple.h:106:5
    #1 HdfsScanner::InitTuple(TupleDescriptor const*, Tuple*, Tuple*)
       exec/hdfs-scanner.h:512:14
    #2 HdfsOrcScanner::AssembleCollection(OrcComplexColumnReader
       const&, int, CollectionValueBuilder*)
       exec/hdfs-orc-scanner.cc:742:7
    #3 OrcCollectionReader::ReadValue(int, Tuple*, MemPool*)
       exec/orc-column-readers.cc:375:20
    #4 OrcStructReader::ReadValue(int, Tuple*, MemPool*)
       exec/orc-column-readers.cc:322:52
    #5 OrcListReader::ReadChildrenValue(int, int, Tuple*, MemPool*)
       const exec/orc-column-readers.cc:473:52
    #6 HdfsOrcScanner::AssembleCollection(OrcComplexColumnReader
       const&, int, CollectionValueBuilder*)
       exec/hdfs-orc-scanner.cc:743:60
    #7 OrcCollectionReader::ReadValue(int, Tuple*, MemPool*)
       exec/orc-column-readers.cc:375:20
    #8 OrcStructReader::TransferTuple(Tuple*, MemPool*)
       exec/orc-column-readers.cc:346:52
    #9 HdfsOrcScanner::TransferTuples(OrcComplexColumnReader*,
       RowBatch*) exec/hdfs-orc-scanner.cc:669:58
    #10 HdfsOrcScanner::AssembleRows(RowBatch*)
        exec/hdfs-orc-scanner.cc:629:45
    #11 HdfsOrcScanner::GetNextInternal(RowBatch*)
        exec/hdfs-orc-scanner.cc:507:19
    #12 HdfsOrcScanner::ProcessSplit() exec/hdfs-orc-scanner.cc:426:21
    #13 HdfsScanNode::ProcessSplit(vector<FilterContext> const&,
        MemPool*, io::ScanRange*, long*) exec/hdfs-scan-node.cc:514:21
    #14 HdfsScanNode::ScannerThread(bool, long)
        exec/hdfs-scan-node.cc:415:7
    #15 HdfsScanNode::ThreadTokenAvailableCb(ThreadResourcePool*)
        ::$_0::operator()() const exec/hdfs-scan-node.cc:337:13

    runtime/collection-value-builder.h:75:25: runtime error: null
       pointer passed as argument 2, which is declared to never be null
    /usr/include/string.h:43:28: note: nonnull attribute specified
       here
    #0 CollectionValueBuilder::GetFreeMemory(Tuple**, int*)
       runtime/collection-value-builder.h:75:9
    #1 HdfsScanner::GetCollectionMemory(CollectionValueBuilder*,
       MemPool**, Tuple**, TupleRow**, long*)
       exec/hdfs-scanner.cc:194:3
    #2 HdfsOrcScanner::AssembleCollection(OrcComplexColumnReader
       const&, int, CollectionValueBuilder*)
       exec/hdfs-orc-scanner.cc:733:9
    #3 HdfsOrcScanner::AssembleCollection(OrcComplexColumnReader
       const&, int, CollectionValueBuilder*)
       exec/hdfs-orc-scanner.cc:710:7
    #4 HdfsOrcScanner::AssembleCollection(OrcComplexColumnReader
       const&, int, CollectionValueBuilder*)
       exec/hdfs-orc-scanner.cc:710:7
    #5 OrcCollectionReader::ReadValue(int, Tuple*, MemPool*)
       exec/orc-column-readers.cc:375:20
    #6 OrcStructReader::TransferTuple(Tuple*, MemPool*)
       exec/orc-column-readers.cc:346:5
    #7 HdfsOrcScanner::TransferTuples(OrcComplexColumnReader*,
       RowBatch*) exec/hdfs-orc-scanner.cc:669:5
    #8 HdfsOrcScanner::AssembleRows(RowBatch*)
       exec/hdfs-orc-scanner.cc:629:5
    #9 HdfsOrcScanner::GetNextInternal(RowBatch*)
       exec/hdfs-orc-scanner.cc:507:19
    #10 HdfsScanner::GetNext(RowBatch*) exec/hdfs-scanner.h:133:12
    #11 HdfsScanNodeMt::GetNext(RuntimeState*, RowBatch*, bool*)
        exec/hdfs-scan-node-mt.cc:106:29
    #12 SubplanNode::GetNext(RuntimeState*, RowBatch*, bool*)
        exec/subplan-node.cc:129:7
    #13 AggregationNode::Open(RuntimeState*)
        exec/aggregation-node.cc:67:5

Change-Id: I9362ce6b9ba470ed90e5bd2dc313b66ebd8c6af5
Reviewed-on: http://gerrit.cloudera.org:8080/13436
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
asfgit pushed a commit that referenced this pull request Jun 10, 2019
Fix remaining signed overflow undefined behaviors in end-to-end
tests. The interesting part of the backtraces:

    exprs/aggregate-functions-ir.cc:464:25: runtime error: signed
       integer overflow: 0x5a4728ca063b522c0b728f8000000000 +
       0x3c2f7086aed236c807a1b50000000000 cannot be represented in
       type '__int128'
    #0 AggregateFunctions::DecimalAvgMerge(
       impala_udf::FunctionContext*, impala_udf::StringVal const&,
       impala_udf::StringVal*) exprs/aggregate-functions-ir.cc:464:25
    #1 AggFnEvaluator::Update(TupleRow const*, Tuple*, void*)
       exprs/agg-fn-evaluator.cc:327:7
    #2 AggFnEvaluator::Add(TupleRow const*, Tuple*)
       exprs/agg-fn-evaluator.h:257:3
    #3 Aggregator::UpdateTuple(AggFnEvaluator**, Tuple*, TupleRow*, bool)
       exec/aggregator.cc:167:24
    #4 NonGroupingAggregator::AddBatchImpl(RowBatch*)
       exec/non-grouping-aggregator-ir.cc:27:5
    #5 NonGroupingAggregator::AddBatch(RuntimeState*, RowBatch*)
       exec/non-grouping-aggregator.cc:124:45
    #6 AggregationNode::Open(RuntimeState*)
       exec/aggregation-node.cc:70:57

    exprs/aggregate-functions-ir.cc:513:12: runtime error: signed
       integer overflow: -8282081183197145958 + -4473782455107795527
       cannot be represented in type 'long'
    #0 void AggregateFunctions::SumUpdate<impala_udf::BigIntVal,
       impala_udf::BigIntVal>(impala_udf::FunctionContext*,
       impala_udf::BigIntVal const&, impala_udf::BigIntVal*)
       exprs/aggregate-functions-ir.cc:513:12
    #1 AggFnEvaluator::Update(TupleRow const*, Tuple*, void*)
       exprs/agg-fn-evaluator.cc:327:7
    #2 AggFnEvaluator::Add(TupleRow const*, Tuple*)
       exprs/agg-fn-evaluator.h:257:3
    #3 Aggregator::UpdateTuple(AggFnEvaluator**, Tuple*, TupleRow*,
       bool) exec/aggregator.cc:167:24
    #4 NonGroupingAggregator::AddBatchImpl(RowBatch*)
       exec/non-grouping-aggregator-ir.cc:27:5
    #5 NonGroupingAggregator::AddBatch(RuntimeState*, RowBatch*)
       exec/non-grouping-aggregator.cc:124:45
    #6 AggregationNode::Open(RuntimeState*)
       exec/aggregation-node.cc:70:57

    exprs/aggregate-functions-ir.cc:585:14: runtime error: signed
       integer overflow: 0x5a4728ca063b522c0b728f8000000000 +
       0x3c2f7086aed236c807a1b50000000000 cannot be represented in
       type '__int128'
    #0 AggregateFunctions::SumDecimalMerge(
       impala_udf::FunctionContext*, impala_udf::DecimalVal const&,
       impala_udf::DecimalVal*) exprs/aggregate-functions-ir.cc:585:14
    #1 AggFnEvaluator::Update(TupleRow const*, Tuple*, void*)
       exprs/agg-fn-evaluator.cc:327:7
    #2 AggFnEvaluator::Add(TupleRow const*, Tuple*)
       exprs/agg-fn-evaluator.h:257:3
    #3 Aggregator::UpdateTuple(AggFnEvaluator**, Tuple*, TupleRow*, bool)
       exec/aggregator.cc:167:24
    #4 NonGroupingAggregator::AddBatchImpl(RowBatch*)
       exec/non-grouping-aggregator-ir.cc:27:5
    #5 NonGroupingAggregator::AddBatch(RuntimeState*, RowBatch*)
       exec/non-grouping-aggregator.cc:124:45
    #6 AggregationNode::Open(RuntimeState*)
       exec/aggregation-node.cc:70:57

    runtime/decimal-value.inline.h:145:12: runtime error: signed
       integer overflow: 18 * 0x0785ee10d5da46d900f436a000000000 cannot
       be represented in type '__int128'
    #0 DecimalValue<__int128>::ScaleTo(int, int, int, bool*) const
       runtime/decimal-value.inline.h:145:12
    #1 DecimalOperators::ScaleDecimalValue(
      impala_udf::FunctionContext*, DecimalValue<int> const&, int,
      int, int) exprs/decimal-operators-ir.cc:132:41
    #2 DecimalOperators::RoundDecimal(impala_udf::FunctionContext*,
       impala_udf::DecimalVal const&, int, int, int, int,
       DecimalOperators::DecimalRoundOp const&)
       exprs/decimal-operators-ir.cc:465:16
    #3 DecimalOperators::RoundDecimal(impala_udf::FunctionContext*,
       impala_udf::DecimalVal const&, DecimalOperators::DecimalRoundOp
       const&) exprs/decimal-operators-ir.cc:519:10
    #4 DecimalOperators::CastToDecimalVal(
       impala_udf::FunctionContext*, impala_udf::DecimalVal const&)
       exprs/decimal-operators-ir.cc:529:10
    #5 impala_udf::DecimalVal ScalarFnCall::InterpretEval
       <impala_udf::DecimalVal>(ScalarExprEvaluator*, TupleRow const*)
       const exprs/scalar-fn-call.cc:485:208
    #6 ScalarFnCall::GetDecimalVal(ScalarExprEvaluator*, TupleRow
       const*) const exprs/scalar-fn-call.cc:618:44
    #7 ScalarExprEvaluator::GetValue(ScalarExpr const&, TupleRow
       const*) exprs/scalar-expr-evaluator.cc:321:27
    #8 ScalarExprEvaluator::GetValue(TupleRow const*)
       exprs/scalar-expr-evaluator.cc:251:10
    #9 Java_org_apache_impala_service_FeSupport_NativeEvalExprsWithoutRow
       service/fe-support.cc:246:26
    #10 (<unknown module>)

    runtime/multi-precision.h:116:21: runtime error: negation of
       0x80000000000000000000000000000000 cannot be represented in
       type 'int128_t' (aka '__int128'); cast to an unsigned type to
       negate this value to itself
    #0 ConvertToInt128(boost::multiprecision::number
       <boost::multiprecision::backends::cpp_int_backend<256u, 256u,
       (boost::multiprecision::cpp_integer_type)1,
       (boost::multiprecision::cpp_int_check_type)0, void>,
       (boost::multiprecision::expression_template_option)0>,
       __int128, bool*) runtime/multi-precision.h:116:21
    #1 DecimalValue<__int128>
       DecimalValue<__int128>::Multiply<__int128>(int,
       DecimalValue<__int128> const&, int, int, int, bool, bool*) const
       runtime/decimal-value.inline.h:438:16
    #2 DecimalOperators::Multiply_DecimalVal_DecimalVal(
       impala_udf::FunctionContext*, impala_udf::DecimalVal const&,
       impala_udf::DecimalVal const&)
       exprs/decimal-operators-ir.cc:859:3336
    #3 impala_udf::DecimalVal ScalarFnCall::InterpretEval
       <impala_udf::DecimalVal>(ScalarExprEvaluator*, TupleRow const*)
       const exprs/scalar-fn-call.cc:485:376
    #4 ScalarFnCall::GetDecimalVal(ScalarExprEvaluator*, TupleRow
       const*) const exprs/scalar-fn-call.cc:618:44
    #5 ScalarExprEvaluator::GetValue(ScalarExpr const&, TupleRow
       const*) exprs/scalar-expr-evaluator.cc:321:27
    #6 ScalarExprEvaluator::GetValue(TupleRow const*)
       exprs/scalar-expr-evaluator.cc:251:10
    #7 Java_org_apache_impala_service_FeSupport_NativeEvalExprsWithoutRow
       service/fe-support.cc:246:26
    #8 (<unknown module>)

    util/runtime-profile-counters.h:194:24: runtime error: signed
       integer overflow: -1263418397011577524 + -9223370798768111350
       cannot be represented in type 'long'
    #0 RuntimeProfile::AveragedCounter::UpdateCounter
       (RuntimeProfile::Counter*)
       util/runtime-profile-counters.h:194:24
    #1 RuntimeProfile::UpdateAverage(RuntimeProfile*)
       util/runtime-profile.cc:199:20
    #2 RuntimeProfile::UpdateAverage(RuntimeProfile*)
       util/runtime-profile.cc:245:14
    #3 Coordinator::BackendState::UpdateExecStats
       (vector<Coordinator::FragmentStats*,
       allocator<Coordinator::FragmentStats*> > const&)
       runtime/coordinator-backend-state.cc:429:22
    #4 Coordinator::ComputeQuerySummary()
       runtime/coordinator.cc:775:20
    #5 Coordinator::HandleExecStateTransition(Coordinator::ExecState,
       Coordinator::ExecState) runtime/coordinator.cc:567:3
    #6 Coordinator::SetNonErrorTerminalState(Coordinator::ExecState)
       runtime/coordinator.cc:484:3
    #7 Coordinator::GetNext(QueryResultSet*, int, bool*)
       runtime/coordinator.cc:657:53
    #8 ClientRequestState::FetchRowsInternal(int, QueryResultSet*)
       service/client-request-state.cc:943:34
    #9 ClientRequestState::FetchRows(int, QueryResultSet*)
       service/client-request-state.cc:835:36
    #10 ImpalaServer::FetchInternal(TUniqueId const&, bool, int,
        beeswax::Results*) service/impala-beeswax-server.cc:545:40
    #11 ImpalaServer::fetch(beeswax::Results&, beeswax::QueryHandle
        const&, bool, int) service/impala-beeswax-server.cc:178:19
    #12 beeswax::BeeswaxServiceProcessor::process_fetch(int,
        apache::thrift::protocol::TProtocol*,
        apache::thrift::protocol::TProtocol*, void*)
        generated-sources/gen-cpp/BeeswaxService.cpp:3398:13
    #13 beeswax::BeeswaxServiceProcessor::dispatchCall
        (apache::thrift::protocol::TProtocol*,
        apache::thrift::protocol::TProtocol*, string const&, int,
        void*) generated-sources/gen-cpp/BeeswaxService.cpp:3200:3
    #14 ImpalaServiceProcessor::dispatchCall
        (apache::thrift::protocol::TProtocol*,
        apache::thrift::protocol::TProtocol*, string const&, int,
        void*) generated-sources/gen-cpp/ImpalaService.cpp:1824:48
    #15 apache::thrift::TDispatchProcessor::process
        (boost::shared_ptr<apache::thrift::protocol::TProtocol>,
        boost::shared_ptr<apache::thrift::protocol::TProtocol>, void*)
        toolchain/thrift-0.9.3-p5/include/thrift/TDispatchProcessor.h:121:12

Change-Id: I73dd6802ec1023275d09a99a2950f3558313fc8e
Reviewed-on: http://gerrit.cloudera.org:8080/13437
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
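An alternative mitigation for these sums, sketched with the GCC/Clang overflow builtin (not necessarily what this patch used): detect the overflow instead of letting it happen, so the caller can saturate or report an error.

```cpp
#include <cstdint>

// Sketch (helper name ours): __builtin_add_overflow performs the addition
// in infinite precision and reports whether the result fits, so the UB
// of a wrapping signed add never occurs.
inline bool CheckedAdd(int64_t a, int64_t b, int64_t* out) {
  return !__builtin_add_overflow(a, b, out);  // true when no overflow
}
```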
stiga-huang pushed a commit to stiga-huang/impala that referenced this pull request Jun 16, 2019
memset has undefined behavior when its first argument is NULL. The
instance fixed here was found by Clang's undefined behavior
sanitizer.

It was found in the end-to-end tests. The interesting part of the
stack trace is:

/exec/data-source-scan-node.cc:152:10: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:62:79: note: nonnull attribute specified here
    #0 0x482fd8e in DataSourceScanNode::GetNextInputBatch() /exec/data-source-scan-node.cc:152:3
    #1 0x482fb40 in DataSourceScanNode::Open(RuntimeState*) /exec/data-source-scan-node.cc:124:10
    #2 0x47ef854 in AggregationNode::Open(RuntimeState*) /exec/aggregation-node.cc:71:49
    #3 0x23506a4 in FragmentInstanceState::Open() /runtime/fragment-instance-state.cc:266:53
    #4 0x234b6a8 in FragmentInstanceState::Exec() /runtime/fragment-instance-state.cc:81:12
    #5 0x236ee52 in QueryState::ExecFInstance(FragmentInstanceState*) /runtime/query-state.cc:401:24
    #6 0x237093e in QueryState::StartFInstances()::$_0::operator()() const /runtime/query-state.cc:341:44

Change-Id: I18fa02dc887a42a94c6f81e4923d17568f2184f2
Reviewed-on: http://gerrit.cloudera.org:8080/10948
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
stiga-huang pushed a commit to stiga-huang/impala that referenced this pull request Jun 16, 2019
memset has undefined behavior when its first argument is NULL. The
instance fixed here was found by Clang's undefined behavior sanitizer.

It was found in the end-to-end tests. The interesting part of the
stack trace is:

be/src/util/bitmap.h:78:12: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:62:79: note: nonnull attribute specified here
    #0 0x2ccb59b in Bitmap::SetAllBits(bool) be/src/util/bitmap.h:78:5
    #1 0x2cb6b9e in NestedLoopJoinNode::ResetMatchingBuildRows(RuntimeState*, long) be/src/exec/nested-loop-join-node.cc:176:27
    #2 0x2cb5ad6 in NestedLoopJoinNode::Open(RuntimeState*) be/src/exec/nested-loop-join-node.cc:90:43

Change-Id: I804f642f4be3b74c24f871f656c5147ee226d2c8
Reviewed-on: http://gerrit.cloudera.org:8080/11042
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
asfgit pushed a commit that referenced this pull request Jun 25, 2019
memset has undefined behavior when its first argument is NULL. The
instance fixed here was found by Clang's undefined behavior
sanitizer.

It was found in the end-to-end tests. The interesting part of the
stack trace is:

/exec/data-source-scan-node.cc:152:10: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:62:79: note: nonnull attribute specified here
    #0 0x482fd8e in DataSourceScanNode::GetNextInputBatch() /exec/data-source-scan-node.cc:152:3
    #1 0x482fb40 in DataSourceScanNode::Open(RuntimeState*) /exec/data-source-scan-node.cc:124:10
    #2 0x47ef854 in AggregationNode::Open(RuntimeState*) /exec/aggregation-node.cc:71:49
    #3 0x23506a4 in FragmentInstanceState::Open() /runtime/fragment-instance-state.cc:266:53
    #4 0x234b6a8 in FragmentInstanceState::Exec() /runtime/fragment-instance-state.cc:81:12
    #5 0x236ee52 in QueryState::ExecFInstance(FragmentInstanceState*) /runtime/query-state.cc:401:24
    #6 0x237093e in QueryState::StartFInstances()::$_0::operator()() const /runtime/query-state.cc:341:44

Change-Id: I18fa02dc887a42a94c6f81e4923d17568f2184f2
Reviewed-on: http://gerrit.cloudera.org:8080/10948
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
asfgit pushed a commit that referenced this pull request Jun 25, 2019
memset has undefined behavior when its first argument is NULL. The
instance fixed here was found by Clang's undefined behavior sanitizer.

It was found in the end-to-end tests. The interesting part of the
stack trace is:

be/src/util/bitmap.h:78:12: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:62:79: note: nonnull attribute specified here
    #0 0x2ccb59b in Bitmap::SetAllBits(bool) be/src/util/bitmap.h:78:5
    #1 0x2cb6b9e in NestedLoopJoinNode::ResetMatchingBuildRows(RuntimeState*, long) be/src/exec/nested-loop-join-node.cc:176:27
    #2 0x2cb5ad6 in NestedLoopJoinNode::Open(RuntimeState*) be/src/exec/nested-loop-join-node.cc:90:43

Change-Id: I804f642f4be3b74c24f871f656c5147ee226d2c8
Reviewed-on: http://gerrit.cloudera.org:8080/11042
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
asfgit pushed a commit that referenced this pull request Jul 16, 2019
This eliminates an out-of-range enum value in the end-to-end
tests. The interesting part of the backtrace is:

exec/parquet/parquet-column-readers.cc:1530:112: runtime error: load
  of value 38, which is not a valid value for type 'Encoding::type'
    #0 BaseScalarColumnReader::ReadDataPage()
       exec/parquet/parquet-column-readers.cc:1530:112
    #1 BaseScalarColumnReader::NextPage()
       exec/parquet/parquet-column-readers.cc:1769:28
    #2 bool ScalarColumnReader<long, (parquet::Type::type)2, true>
       ::ReadValueBatch<false>(int, int, unsigned char*, int*)
       exec/parquet/parquet-column-readers.cc:459:12
    #3 ScalarColumnReader<long, (parquet::Type::type)2, true>
       ::ReadNonRepeatedValueBatch(MemPool*, int, int, unsigned char*,
       int*) exec/parquet/parquet-column-readers.cc:106:12
    #4 HdfsParquetScanner::AssembleRows(vector<ParquetColumnReader*>
       const&, RowBatch*, bool*)
       exec/parquet/hdfs-parquet-scanner.cc:1113:42
    #5 HdfsParquetScanner::GetNextInternal(RowBatch*)
       exec/parquet/hdfs-parquet-scanner.cc:456:19
    #6 HdfsParquetScanner::ProcessSplit()
       exec/parquet/hdfs-parquet-scanner.cc:353:21
    #7 HdfsScanNode::ProcessSplit(vector<FilterContext> const&,
       MemPool*, io::ScanRange*, long*) exec/hdfs-scan-node.cc:514:21
    #8 HdfsScanNode::ScannerThread(bool, long)
       exec/hdfs-scan-node.cc:415:7
    #9 HdfsScanNode::ThreadTokenAvailableCb(ThreadResourcePool*)::$_0
       ::operator()() const exec/hdfs-scan-node.cc:337:13

Change-Id: Ia86de44daaf56a941fb95b15d5dfd7b5a2752129
Reviewed-on: http://gerrit.cloudera.org:8080/13804
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
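The validation pattern for this kind of bug can be sketched as below (the enumerator list is abbreviated from parquet's thrift-generated `Encoding`, and the helper is ours): check the raw integer read off the wire before casting, so an out-of-range value like 38 is rejected instead of being loaded into the enum.

```cpp
// Abbreviated stand-in for the thrift-generated enum (illustration only).
enum class Encoding { PLAIN = 0, RLE = 3, DELTA_BINARY_PACKED = 5 };

// Sketch (helper name ours): validate first, cast second. The cast is
// only reached for values that are valid enumerators.
inline bool ParseEncoding(int raw, Encoding* out) {
  if (raw != 0 && raw != 3 && raw != 5) return false;  // e.g. 38 rejected
  *out = static_cast<Encoding>(raw);
  return true;
}
```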
asfgit pushed a commit that referenced this pull request Jul 21, 2019
According to [expr.post] in the C++14 standard, a call to a member
function like a->b() is interpreted as (a->b)(). In other words, the
dereferencing is done separately from the call. This makes calling
member functions on nullptr undefined behavior, since the dereference
invokes undefined behavior.

This fixes an error in hdfs-scanner.h in the end-to-end tests. The
interesting part of the backtrace is:

exec/hdfs-scanner.h:512:14: runtime error: member call on null pointer
  of type 'Tuple'
    #0 HdfsScanner::InitTuple(TupleDescriptor const*, Tuple*, Tuple*)
       exec/hdfs-scanner.h:512:14
    #1 HdfsOrcScanner::AssembleCollection(OrcComplexColumnReader
       const&, int, CollectionValueBuilder*)
       exec/hdfs-orc-scanner.cc:743:7
    #2 OrcCollectionReader::ReadValue(int, Tuple*, MemPool*)
       exec/orc-column-readers.cc:375:20
    #3 OrcStructReader::ReadValue(int, Tuple*, MemPool*)
       exec/orc-column-readers.cc:322:52
    #4 OrcStructReader::ReadValue(int, Tuple*, MemPool*)
       exec/orc-column-readers.cc:322:52
    #5 OrcStructReader::TransferTuple(Tuple*, MemPool*)
       exec/orc-column-readers.cc:346:52
    #6 HdfsOrcScanner::TransferTuples(OrcComplexColumnReader*,
       RowBatch*) exec/hdfs-orc-scanner.cc:670:58
    #7 HdfsOrcScanner::AssembleRows(RowBatch*)
       exec/hdfs-orc-scanner.cc:630:45
    #8 HdfsOrcScanner::GetNextInternal(RowBatch*)
       exec/hdfs-orc-scanner.cc:508:19
    #9 HdfsOrcScanner::ProcessSplit() exec/hdfs-orc-scanner.cc:427:21
    #10 HdfsScanNode::ProcessSplit(vector<FilterContext> const&,
        MemPool*, io::ScanRange*, long*) exec/hdfs-scan-node.cc:514:21
    #11 HdfsScanNode::ScannerThread(bool, long)
        exec/hdfs-scan-node.cc:415:7
    #12 HdfsScanNode::ThreadTokenAvailableCb(ThreadResourcePool*)::$_0
        ::operator()() const exec/hdfs-scan-node.cc:337:13

Change-Id: I7e5b130848a3c8f11d9010b3378f4054a35e1612
Reviewed-on: http://gerrit.cloudera.org:8080/13803
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
asfgit pushed a commit that referenced this pull request Jul 21, 2019
According to [expr.post] in the C++14 standard, a call to a member
function like a->b() is interpreted as (a->b)(). In other words, the
dereferencing is done separately from the call. This makes calling
member functions on nullptr undefined behavior, since the dereference
invokes undefined behavior.

This fixes such an error in exec-node.cc in the end-to-end tests. The
interesting part of the backtrace is:

exec/exec-node.cc:396:27: runtime error: member call on null pointer
  of type 'MemTracker'
    #0 in ExecNode::ExecDebugActionImpl(TExecNodePhase::type,
       RuntimeState*) exec/exec-node.cc:396:27
    #1 in ExecNode::ExecDebugAction(TExecNodePhase::type,
       RuntimeState*) exec/exec-node.h:379:12
    #2 in ExecNode::Prepare(RuntimeState*) exec/exec-node.cc:106:43
    #3 in TopNNode::Prepare(RuntimeState*) exec/topn-node.cc:75:53

Change-Id: Id62d1c504a273451dc1be6831a473f6c7115b403
Reviewed-on: http://gerrit.cloudera.org:8080/13769
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
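The guard pattern here can be sketched as follows (`MemTracker` is stubbed with a hypothetical accessor; the real class lives elsewhere in the codebase): because `a->b()` means `(a->b)()`, the dereference happens before the call, so the pointer must be tested first even if the method body would never touch `*this`.

```cpp
// Stub with a hypothetical accessor, for illustration only.
struct MemTracker {
  long consumption() const { return bytes_; }
  long bytes_ = 0;
};

// Sketch (helper name ours): test the pointer before any member call.
inline long SafeConsumption(const MemTracker* t) {
  return t != nullptr ? t->consumption() : 0;  // 0 when no tracker attached
}
```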
asfgit pushed a commit that referenced this pull request Jul 21, 2019
This patch fixes an out-of-range enum value in the end-to-end tests.

The [expr] section of the C++14 standard indicates that out-of-range
enum values are undefined behavior: "If during the evaluation of an
expression, the result is not mathematically defined or not in the
range of representable values for its type, the behavior is
undefined."

The [decl.enum] section explains what values are "in the range of
representable values for its type": "[F]or an enumeration where emin
is the smallest enumerator and emax is the largest, the values of the
enumeration are the values in the range bmin to bmax, defined as
follows: Let K be 1 for a two's complement representation and 0 for a
one's complement or sign-magnitude representation. bmax is the
smallest value greater than or equal to max(|emin| - K, |emax|) and
equal to 2^M - 1, where M is a non-negative integer. bmin is zero if emin
is non-negative and -(bmax+K) otherwise."

The Parquet PageType enum has emin = 0 and emax = 3, so bmin = 0 and
bmax = 3. The out-of-range value in the tests is 4294967249, and is
therefore undefined behavior. The interesting part of the backtrace
is:

parquet/parquet-column-readers.cc:1269:24: runtime error: load of
  value 4294967249, which is not a valid value for type
  'PageType::type'
    #0 BaseScalarColumnReader::InitDictionary()
       parquet/parquet-column-readers.cc:1269:24
    #1 BaseScalarColumnReader::InitDictionaries(
       vector<BaseScalarColumnReader*>)
       parquet/parquet-column-readers.cc:1381:53
    #2 HdfsParquetScanner::NextRowGroup()
       parquet/hdfs-parquet-scanner.cc:678:14
    #3 HdfsParquetScanner::GetNextInternal(RowBatch*)
       parquet/hdfs-parquet-scanner.cc:437:45
    #4 HdfsParquetScanner::ProcessSplit()
       parquet/hdfs-parquet-scanner.cc:353:21
    #5 HdfsScanNode::ProcessSplit(vector<FilterContext> const&,
       MemPool*, io::ScanRange*, long*) exec/hdfs-scan-node.cc:514:21
    #6 HdfsScanNode::ScannerThread(bool, long)
       hdfs-scan-node.cc:415:7
    #7 HdfsScanNode::ThreadTokenAvailableCb(ThreadResourcePool*)
       ::$_0::operator()() const hdfs-scan-node.cc:337:13

Change-Id: I2d126a8f3e5910d23088a3f916c4cf31aac28d95
Reviewed-on: http://gerrit.cloudera.org:8080/13805
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
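The range rule quoted above can be checked mechanically. A small sketch (helper name ours) that computes bmax for an enum whose enumerators are all non-negative, so bmin = 0:

```cpp
#include <cstdint>

// Per [decl.enum]: for all-non-negative enumerators, bmax is the smallest
// 2^M - 1 that is >= emax. For parquet's PageType, emax = 3 gives bmax = 3,
// so the observed value 4294967249 is far outside the representable range.
inline uint64_t BMax(uint64_t emax) {
  uint64_t b = 1;                // 2^M, starting at M = 0
  while (b - 1 < emax) b <<= 1;  // grow until 2^M - 1 >= emax
  return b - 1;
}
```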
asfgit pushed a commit that referenced this pull request Jul 25, 2019
In expr.add, the standard says:

    When an expression that has integral type is added to or
    subtracted from a pointer, the result has the type of the pointer
    operand. ... If both the pointer operand and the result point to
    elements of the same array object, or one past the last element of
    the array object, the evaluation shall not produce an overflow;
    otherwise, the behavior is undefined.

This is triggered in the end-to-end tests. The interesting part of
the backtrace is:

exec/parquet/hdfs-parquet-scanner.cc:1405:45: runtime error: pointer
  index expression with base 0x00001300e0e9 overflowed to
  0xffffffff1300e0ea
    #0 HdfsParquetScanner::ProcessFooter()
       exec/parquet/hdfs-parquet-scanner.cc:1405:45
    #1 HdfsParquetScanner::Open(ScannerContext*)
       exec/parquet/hdfs-parquet-scanner.cc:186:26
    #2 HdfsScanNodeBase::CreateAndOpenScannerHelper(
       HdfsPartitionDescriptor*, ScannerContext*,
       scoped_ptr<HdfsScanner>*) exec/hdfs-scan-node-base.cc:721:59
    #3 HdfsScanNodeMt::CreateAndOpenScanner(HdfsPartitionDescriptor*,
       ScannerContext*, scoped_ptr<HdfsScanner>*)
       exec/hdfs-scan-node-mt.cc:127:19
    #4 HdfsScanNodeMt::GetNext(RuntimeState*, RowBatch*, bool*)
       exec/hdfs-scan-node-mt.cc:97:21

Change-Id: I81c7db75b564045106edf3d46e2c4a62be77359f
Reviewed-on: http://gerrit.cloudera.org:8080/13889
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
asfgit pushed a commit that referenced this pull request Sep 9, 2019
This fixes an instance of undefined behavior in the end-to-end tests
in which an enum value is outside of the allowable values for that
enum according to the C++14 standard.

Representative backtrace:

exec/parquet/parquet-metadata-utils.cc:293:26: runtime error: load of
  value 49, which is not a valid value for type 'Type::type'
    #0 ParquetMetadataUtils::ValidateRowGroupColumn(
       parquet::FileMetaData const&, char const*, int, int,
       parquet::SchemaElement const&, RuntimeState*)
       exec/parquet/parquet-metadata-utils.cc:293:26
    #1 BaseScalarColumnReader::Reset(HdfsFileDesc const&,
       parquet::ColumnChunk const&, int)
       exec/parquet/parquet-column-readers.cc:1077:43
    #2 HdfsParquetScanner::InitScalarColumns()
       exec/parquet/hdfs-parquet-scanner.cc:1679:60
    #3 HdfsParquetScanner::NextRowGroup()
       exec/parquet/hdfs-parquet-scanner.cc:648:45
    #4 HdfsParquetScanner::GetNextInternal(RowBatch*)
       exec/parquet/hdfs-parquet-scanner.cc:437:45
    #5 HdfsParquetScanner::ProcessSplit()
       exec/parquet/hdfs-parquet-scanner.cc:353:21
    #6 HdfsScanNode::ProcessSplit(vector<FilterContext> const&,
       MemPool*, io::ScanRange*, long*) exec/hdfs-scan-node.cc:514:21
    #7 HdfsScanNode::ScannerThread(bool, long)
       exec/hdfs-scan-node.cc:415:7
    #8 HdfsScanNode::ThreadTokenAvailableCb(ThreadResourcePool*)::
       $_0::operator()() const exec/hdfs-scan-node.cc:337:13
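A common fix for this class of error (a sketch with hypothetical names, not the actual Impala patch) is to validate the raw integer against the known enumerators before casting, so a corrupt file is rejected instead of a `load of value 49` being treated as a valid enum:

```cpp
#include <cassert>

// Hypothetical mirror of a few Thrift-generated parquet::Type values.
enum class ParquetType { BOOLEAN = 0, INT32 = 1, INT64 = 2, BYTE_ARRAY = 6 };

// Converts a raw integer read from file metadata into the enum.
// Returns false for values with no matching enumerator, so the caller
// can report a corrupt file rather than carry an out-of-range enum.
bool ToParquetType(int raw, ParquetType* out) {
  switch (raw) {
    case 0: *out = ParquetType::BOOLEAN; return true;
    case 1: *out = ParquetType::INT32; return true;
    case 2: *out = ParquetType::INT64; return true;
    case 6: *out = ParquetType::BYTE_ARRAY; return true;
    default: return false;
  }
}
```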

Change-Id: I48090e8e0c6c6f18bb1ad3c32c1f5fbffc908844
Reviewed-on: http://gerrit.cloudera.org:8080/13940
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
asfgit pushed a commit that referenced this pull request Jun 19, 2020
Fixes the following TSAN data races that come up when running custom
cluster tests. The immediate goal is to fix all remaining data races in
custom cluster tests and then enable custom cluster tests in the TSAN
builds. This patch fixes about half of the remaining data races reported
during a TSAN build of custom cluster tests.

SUMMARY: ThreadSanitizer: data race util/stopwatch.h:186:9 in impala::MonotonicStopWatch::RunningTime() const
  Read of size 8 at 0x7b580000dba8 by thread T342:
    #0 impala::MonotonicStopWatch::RunningTime() const util/stopwatch.h:186:9
    #1 impala::MonotonicStopWatch::Reset() util/stopwatch.h:136:20
    #2 impala::StatestoreSubscriber::Heartbeat(impala::TUniqueId const&) statestore/statestore-subscriber.cc:358:35
  Previous write of size 8 at 0x7b580000dba8 by thread T341:
    #0 impala::MonotonicStopWatch::Reset() util/stopwatch.h:139:21 (impalad+0x1f744ab)
    #1 impala::StatestoreSubscriber::Heartbeat(impala::TUniqueId const&) statestore/statestore-subscriber.cc:358:35

SUMMARY: ThreadSanitizer: data race status.h:220:10 in impala::Status::operator=(impala::Status&&)
  Write of size 8 at 0x7b50002e01e0 by thread T341 (mutexes: write M17919):
    #0 impala::Status::operator=(impala::Status&&) common/status.h:220:10
    #1 impala::RuntimeState::SetQueryStatus(std::string const&) runtime/runtime-state.h:250
    #2 impala_udf::FunctionContext::SetError(char const*) udf/udf.cc:423:47
  Previous read of size 8 at 0x7b50002e01e0 by thread T342:
    #0 impala::Status::ok() const common/status.h:236:42
    #1 impala::RuntimeState::GetQueryStatus() runtime/runtime-state.h:15
    #2 impala::HdfsScanner::CommitRows(int, impala::RowBatch*) exec/hdfs-scanner.cc:218:3
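The usual repair for a race like the Status one above is to make the reader take the same lock as the writer; a minimal sketch (a hypothetical class, not Impala's RuntimeState) looks like:

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Both the writer (SetQueryStatus-style) and the readers must take the
// same lock; otherwise TSAN reports a race on the underlying members.
class QueryStatus {
 public:
  void Set(const std::string& msg) {
    std::lock_guard<std::mutex> l(lock_);
    if (ok_) {  // keep only the first error, as query status usually does
      ok_ = false;
      msg_ = msg;
    }
  }
  bool ok() const {
    std::lock_guard<std::mutex> l(lock_);
    return ok_;
  }
  std::string message() const {
    std::lock_guard<std::mutex> l(lock_);
    return msg_;
  }

 private:
  mutable std::mutex lock_;  // mutable so const readers can lock it
  bool ok_ = true;
  std::string msg_;
};
```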

SUMMARY: ThreadSanitizer: data race hashtable.h:370:58
  Read of size 8 at 0x7b2400091df8 by thread T338 (mutexes: write M106814410723061456):
...
    #3 impala::MetricGroup::CMCompatibleCallback() util/metrics.cc:185:40
...
    #9 impala::Webserver::RenderUrlWithTemplate() util/webserver.cc:801:3
    #10 impala::Webserver::BeginRequestCallback(sq_connection*, sq_request_info*) util/webserver.cc:696:5
  Previous write of size 8 at 0x7b2400091df8 by thread T364 (mutexes: write M600803201008047112, write M1046659357959855584):
...
    #4 impala::AtomicMetric<(impala::TMetricKind::type)0>* impala::MetricGroup::RegisterMetric<> >() util/metrics.h:366:5
    #5 impala::MetricGroup::AddGauge(std::string const&, long, std::string const&) util/metrics.h:384:12
    #6 impala::AdmissionController::PoolStats::InitMetrics() scheduling/admission-controller.cc:1714:55

Testing:
* Ran core tests
* Re-ran TSAN tests and made sure issues were resolved
* Ran single_node_perf_run for workload TPC-H scale factor 30;
  no regressions detected

+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(30) | parquet / none / none | 7.36    | -1.77%     | 5.01       | -1.61%         |
+----------+-----------------------+---------+------------+------------+----------------+

Change-Id: Id4244c9a7f971c96b8b8dc7d5262904a0a4b77c1
Reviewed-on: http://gerrit.cloudera.org:8080/16079
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
asfgit pushed a commit that referenced this pull request Sep 10, 2020
This patch fixes the remaining TSAN errors reported while running custom
cluster tests. After this patch, TSAN can be enabled for custom cluster
tests (currently it is only run for be tests).

Adds a data race suppression for
HdfsColumnarScanner::ProcessScratchBatchCodegenOrInterpret, which
usually calls a codegen function. TSAN currently does not support
codegen functions, so this warning needs to be suppressed. The call
stack of this warning is:

    #0 kudu::BlockBloomFilter::Find(unsigned int) const kudu/util/block_bloom_filter.cc:257:7
    #1 <null> <null> (0x7f19af1c74cd)
    #2 impala::HdfsColumnarScanner::ProcessScratchBatchCodegenOrInterpret(impala::RowBatch*) exec/hdfs-columnar-scanner.cc:106:10
    #3 impala::HdfsColumnarScanner::TransferScratchTuples(impala::RowBatch*) exec/hdfs-columnar-scanner.cc:66:34

Fixes a data race in DmlExecState::FinalizeHdfsInsert where a local
HdfsFsCache::HdfsFsMap is unsafely passed between threads of a
HdfsOperationSet. HdfsOperationSet instances are run in a
HdfsOpThreadPool and each operation is run in one of the threads from
the pool. Each operation uses HdfsFsCache::GetConnection to get a hdfsFs
instance. GetConnection can take in a 'local_cache' of hdfsFs instances
before using the global map. The race condition is that the same local
cache is used for all operations in HdfsOperationSet.
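The shape of that fix can be sketched as follows (hypothetical names, not the actual patch): give each operation its own cache rather than sharing one map across the pool's threads.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for HdfsFsCache::HdfsFsMap in this sketch.
using FsMap = std::map<std::string, int>;

// Looks up 'name' in the caller-owned local cache; on a miss it "opens"
// a connection (stand-in: the name length) and caches it.
int GetConnection(const std::string& name, FsMap* local_cache) {
  auto it = local_cache->find(name);
  if (it != local_cache->end()) return it->second;
  int conn = static_cast<int>(name.size());  // stand-in for a real open
  local_cache->emplace(name, conn);
  return conn;
}

// Each operation owns its cache. In the real thread pool, each worker
// would hold one of these instead of a pointer into shared state, so
// no unsynchronized cross-thread mutation can occur.
int RunOperation(const std::string& name) {
  FsMap per_op_cache;
  return GetConnection(name, &per_op_cache);
}
```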

Testing:
* Re-ran TSAN tests and confirmed the data races have disappeared

Change-Id: If1658a9b56d220e2cfd1f8b958604edcdf7757f4
Reviewed-on: http://gerrit.cloudera.org:8080/16426
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
ColdZoo pushed a commit to ColdZoo/impala that referenced this pull request Mar 30, 2021
Fixes the following TSAN data races that come up when running custom
cluster tests. The immediate goal is to fix all remaining data races in
custom cluster tests and then enable custom cluster tests in the TSAN
builds. This patch fixes about half of the remaining data races reported
during a TSAN build of custom cluster tests.

SUMMARY: ThreadSanitizer: data race util/stopwatch.h:186:9 in impala::MonotonicStopWatch::RunningTime() const
  Read of size 8 at 0x7b580000dba8 by thread T342:
    #0 impala::MonotonicStopWatch::RunningTime() const util/stopwatch.h:186:9
    #1 impala::MonotonicStopWatch::Reset() util/stopwatch.h:136:20
    #2 impala::StatestoreSubscriber::Heartbeat(impala::TUniqueId const&) statestore/statestore-subscriber.cc:358:35
  Previous write of size 8 at 0x7b580000dba8 by thread T341:
    #0 impala::MonotonicStopWatch::Reset() util/stopwatch.h:139:21 (impalad+0x1f744ab)
    #1 impala::StatestoreSubscriber::Heartbeat(impala::TUniqueId const&) statestore/statestore-subscriber.cc:358:35

SUMMARY: ThreadSanitizer: data race status.h:220:10 in impala::Status::operator=(impala::Status&&)
  Write of size 8 at 0x7b50002e01e0 by thread T341 (mutexes: write M17919):
    #0 impala::Status::operator=(impala::Status&&) common/status.h:220:10
    #1 impala::RuntimeState::SetQueryStatus(std::string const&) runtime/runtime-state.h:250
    #2 impala_udf::FunctionContext::SetError(char const*) udf/udf.cc:423:47
  Previous read of size 8 at 0x7b50002e01e0 by thread T342:
    #0 impala::Status::ok() const common/status.h:236:42
    #1 impala::RuntimeState::GetQueryStatus() runtime/runtime-state.h:15
    #2 impala::HdfsScanner::CommitRows(int, impala::RowBatch*) exec/hdfs-scanner.cc:218:3

SUMMARY: ThreadSanitizer: data race hashtable.h:370:58
  Read of size 8 at 0x7b2400091df8 by thread T338 (mutexes: write M106814410723061456):
...
    #3 impala::MetricGroup::CMCompatibleCallback() util/metrics.cc:185:40
...
    #9 impala::Webserver::RenderUrlWithTemplate() util/webserver.cc:801:3
    #10 impala::Webserver::BeginRequestCallback(sq_connection*, sq_request_info*) util/webserver.cc:696:5
  Previous write of size 8 at 0x7b2400091df8 by thread T364 (mutexes: write M600803201008047112, write M1046659357959855584):
...
    #4 impala::AtomicMetric<(impala::TMetricKind::type)0>* impala::MetricGroup::RegisterMetric<> >() util/metrics.h:366:5
    #5 impala::MetricGroup::AddGauge(std::string const&, long, std::string const&) util/metrics.h:384:12
    #6 impala::AdmissionController::PoolStats::InitMetrics() scheduling/admission-controller.cc:1714:55

Testing:
* Ran core tests
* Re-ran TSAN tests and made sure issues were resolved
* Ran single_node_perf_run for workload TPC-H scale factor 30;
  no regressions detected

+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(30) | parquet / none / none | 7.36    | -1.77%     | 5.01       | -1.61%         |
+----------+-----------------------+---------+------------+------------+----------------+

Change-Id: Id4244c9a7f971c96b8b8dc7d5262904a0a4b77c1
Reviewed-on: http://gerrit.cloudera.org:8080/16079
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
ColdZoo pushed a commit to ColdZoo/impala that referenced this pull request Mar 30, 2021
This patch fixes the remaining TSAN errors reported while running custom
cluster tests. After this patch, TSAN can be enabled for custom cluster
tests (currently it is only run for be tests).

Adds a data race suppression for
HdfsColumnarScanner::ProcessScratchBatchCodegenOrInterpret, which
usually calls a codegen function. TSAN currently does not support
codegen functions, so this warning needs to be suppressed. The call
stack of this warning is:

    #0 kudu::BlockBloomFilter::Find(unsigned int) const kudu/util/block_bloom_filter.cc:257:7
    #1 <null> <null> (0x7f19af1c74cd)
    #2 impala::HdfsColumnarScanner::ProcessScratchBatchCodegenOrInterpret(impala::RowBatch*) exec/hdfs-columnar-scanner.cc:106:10
    #3 impala::HdfsColumnarScanner::TransferScratchTuples(impala::RowBatch*) exec/hdfs-columnar-scanner.cc:66:34

Fixes a data race in DmlExecState::FinalizeHdfsInsert where a local
HdfsFsCache::HdfsFsMap is unsafely passed between threads of a
HdfsOperationSet. HdfsOperationSet instances are run in a
HdfsOpThreadPool and each operation is run in one of the threads from
the pool. Each operation uses HdfsFsCache::GetConnection to get a hdfsFs
instance. GetConnection can take in a 'local_cache' of hdfsFs instances
before using the global map. The race condition is that the same local
cache is used for all operations in HdfsOperationSet.

Testing:
* Re-ran TSAN tests and confirmed the data races have disappeared

Change-Id: If1658a9b56d220e2cfd1f8b958604edcdf7757f4
Reviewed-on: http://gerrit.cloudera.org:8080/16426
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
sourabh912 pushed a commit to sourabh912/impala that referenced this pull request Dec 9, 2021
In the current design, catalogd cache gets updated from 2 sources:
1. Impala shell
2. MetastoreEventProcessor

The updates from the Impala shell are applied in place whereas
MetastoreEventProcessor runs as a background thread which polls HMS
events from the notifications log table and applies them asynchronously.
These two streams of updates cause consistency issues. For example
consider the following sequence of alter table events on a table
t1 as per HMS:

1. alter table t1 from source s1 say other Impala cluster
2. alter table t1 from source s2 say other Hive cluster
3. alter table t1 from local Impala cluster

The #3 alter table ddl operation would get reflected in the local
cache immediately. However, later on the event processor would process
events from #1 and #2 above and try to alter the table. In an ideal
scenario, these alters should have been applied before #3, i.e. in the
same order as they appear in HMS notification log. This leaves table
t1 in an inconsistent state.

Proposed solution:

The main idea of the solution is to keep track of the last event id
for a given table as eventId which the catalogd has synced to in the
Db/Table object. The events processor ignores any event whose EVENT_ID
is less than or equal to the eventId stored in the table. Once the
events processor successfully processes a given event, it updates the
value of eventId in the table before releasing the table lock. Also,
any DDL or refresh operation on the catalogd (from both the catalog HMS
metastore server and the Impala shell) will follow these steps
to update the event id for the table:

1. Acquire write lock on the table
2. Perform ddl operation in HMS
3. Sync table till the latest event id (as per HMS) since its last
   synced event id

The above steps ensure that any concurrent updates applied on the same
db/table from multiple sources like Hive, Impala or say multiple
Impala clusters, get reflected in the local catalogd cache in the
same order as they appear in HMS thus removing any inconsistencies.
Also the solution relies on the existing locking mechanism in the
catalogd to prevent any other concurrent updates to the table (even
via EventsProcessor). In case of database objects, we will also have
a similar eventId which represents the events on the database object
(CREATE, DROP, ALTER database) to which the catalogd has synced.
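The skip rule described above can be sketched as follows (hypothetical names, assuming only the event-id watermark; the real code also takes the table's write lock and applies the HMS change):

```cpp
#include <cassert>
#include <cstdint>

// Minimal stand-in for the Db/Table object carrying the watermark:
// the last HMS event id this catalogd has synced the object to.
struct Table {
  int64_t last_synced_event_id = 0;
};

// Applies an event only if it is newer than the watermark; a successful
// apply advances the watermark before the table lock would be released.
// Returns true if the event was applied, false if it was skipped.
bool ProcessEvent(Table* t, int64_t event_id) {
  if (event_id <= t->last_synced_event_id) return false;  // already synced
  // ... apply the metadata change under the table's write lock ...
  t->last_synced_event_id = event_id;
  return true;
}
```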

This patch addresses the following:
1. Add a new flag enable_sync_to_latest_event_on_ddls to enable/disable
   this improvement. It is turned off by default.
2. If the flag in #1 is enabled then, apart from the Impala shell and
   MetastoreEventProcessor, the cache would also get updated for ddls
   executed via catalog HMS endpoints. And while executing a ddl,
   db/table will be synced till latest event id.
3. Event processor skips processing an event if db/table is already
   synced till that event id. Sets that event id in db/table if
   the event is processed.
4. When EventProcessor detects a self event, it sets the last synced
   event id in db/table before skipping the processing of an event.
5. Full table refresh sets the last event processed in table cache.

Future Work:
1. Sync db/table to latest event id for ddls executed from Impala
   shell (execDdlRequest() in catalogOpExecutor)

Testing:

1. Added new unit tests and modified existing ones
2. Ran exhaustive tests with flag both turned on and off

Change-Id: I36364e401911352c4666674eb98c8d61bbaae9b9
asfgit pushed a commit that referenced this pull request Dec 15, 2021
In the current design, catalogd cache gets updated from 2 sources:
1. Impala shell
2. MetastoreEventProcessor

The updates from the Impala shell are applied in place whereas
MetastoreEventProcessor runs as a background thread which polls HMS
events from notifications log table and apply them asynchronously.
These two stream of updates cause consistency issues. For example
consider the following sequence of alter table events on a table
t1 as per HMS:

1. alter table t1 from source s1 say other Impala cluster
2. alter table t1 from source s2 say other Hive cluster
3. alter table t1 from local Impala cluster

The #3 alter table ddl operation would get reflected in the local
cache immediately. However, later on the event processor would process
events from #1 and #2 above and try to alter the table. In an ideal
scenario, these alters should have been applied before #3, i.e. in the
same order as they appear in HMS notification log. This leaves table
t1 in an inconsistent state.

Proposed solution:

The main idea of the solution is to keep track of the last event id
for a given table as eventId which the catalogd has synced to in the
Db/Table object. The events processor ignores any event whose EVENT_ID
is less than or equal to the eventId stored in the table. Once the
events processor successfully processes a given event, it updates the
value of eventId in the table before releasing the table lock. Also,
any DDL or refresh operation on the catalogd (from both the catalog HMS
metastore server and the Impala shell) will follow these steps
to update the event id for the table:

1. Acquire write lock on the table
2. Perform ddl operation in HMS
3. Sync table till the latest event id (as per HMS) since its last
   synced event id

The above steps ensure that any concurrent updates applied on the same
db/table from multiple sources like Hive, Impala or say multiple
Impala clusters, get reflected in the local catalogd cache in the
same order as they appear in HMS thus removing any inconsistencies.
Also the solution relies on the existing locking mechanism in the
catalogd to prevent any other concurrent updates to the table (even
via EventsProcessor). In case of database objects, we will also have
a similar eventId which represents the events on the database object
(CREATE, DROP, ALTER database) to which the catalogd has synced.

This patch addresses the following:
1. Add a new flag enable_sync_to_latest_event_on_ddls to enable/disable
   this improvement. It is turned off by default.
2. If the flag in #1 is enabled then, apart from the Impala shell and
   MetastoreEventProcessor, the cache would also get updated for ddls
   executed via catalog HMS endpoints. And while executing a ddl,
   db/table will be synced till latest event id.
3. Event processor skips processing an event if db/table is already
   synced till that event id. Sets that event id in db/table if
   the event is processed.
4. When EventProcessor detects a self event, it sets the last synced
   event id in db/table before skipping the processing of an event.
5. Full table refresh sets the last event processed in table cache.

Future Work:
1. Sync db/table to latest event id for ddls executed from Impala
   shell (execDdlRequest() in catalogOpExecutor)

Testing:

1. Added new unit tests and modified existing ones
2. Ran exhaustive tests with flag both turned on and off

Change-Id: I36364e401911352c4666674eb98c8d61bbaae9b9
Reviewed-on: http://gerrit.cloudera.org:8080/17859
Reviewed-by: Vihang Karajgaonkar <vihang@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
asfgit pushed a commit that referenced this pull request Jan 7, 2022
As part of moving to a newer protobuf, this updates the Kudu version
to get the fix for KUDU-3334. With this newer Kudu version, Clang
builds hit an error while linking:
lib/libLLVMCodeGen.a(TargetPassConfig.cpp.o):TargetPassConfig.cpp:
  function llvm::TargetPassConfig::createRegAllocPass(bool):
    error: relocation refers to global symbol "std::call_once<void (&)()>(std::once_flag&, void (&)())::{lambda()#2}::_FUN()",
    which is defined in a discarded section
  section group signature: "_ZZSt9call_onceIRFvvEJEEvRSt9once_flagOT_DpOT0_ENKUlvE0_clEv"
  prevailing definition is from ../../build/debug/security/libsecurity.a(openssl_util.cc.o)
(This is from a newer binutils that will be pursued separately.)

As a hack to get around this error, this adds the calloncehack
shared library. The shared library publicly defines the symbol that
was coming from kudu_client. By linking it ahead of kudu_client, the
linker uses that rather than the one from kudu_client. This fixes
the Clang builds.

The new Kudu also requires a minor change to the flags for tserver
startup.

Testing:
 - Ran debug tests and verified calloncehack is not used
 - Ran ASAN tests

Change-Id: Ieccbe284f11445e1de792352ebc7c9e1fa2ca0c3
Reviewed-on: http://gerrit.cloudera.org:8080/18129
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
AoJ pushed a commit to AoJ/impala that referenced this pull request Mar 14, 2024
This was found in ExprTest.LiteralExprs. The rules violated are:

1. "reference binding to null pointer of type 'long'". This violates
the standard's [dcl.ref]:

    a null reference cannot exist in a well-defined program, because
    the only way to create such a reference would be to bind it to the
    "object" obtained by indirection through a null pointer, which
    causes undefined behavior.

The interesting part of the backtrace is:

    #1 RuntimeProfile::TimeSeriesCounter::ToThrift(TTimeSeriesCounter*) util/runtime-profile.cc:1117:11
    #2 RuntimeProfile::ToThrift(std::vector<TRuntimeProfileNode>*) const util/runtime-profile.cc:905:21
    #3 RuntimeProfile::ToThrift(TRuntimeProfileTree*) const util/runtime-profile.cc:847:3
    #4 QueryState::ReportExecStatusAux(bool, Status const&, FragmentInstanceState*, bool) runtime/query-state.cc:281:21
    #5 QueryState::ReportExecStatus(bool, Status const&, FragmentInstanceState*) runtime/query-state.cc:250:3
    #6 FragmentInstanceState::SendReport(bool, Status const&) runtime/fragment-instance-state.cc:406:17
    #7 FragmentInstanceState::Finalize(Status const&) runtime/fragment-instance-state.cc:496:3

2. The use of a null pointer when calling memcpy. According to "7.1.4
Use of library functions" in the C99 standard (which is included in
C++14 in section [intro.refs]):

    If an argument to a function has an invalid value (such as a value
    outside the domain of the function, or a pointer outside the
    address space of the program, or a null pointer, or a pointer to
    non-modifiable storage when the corresponding parameter is not
    const-qualified) or a type (after promotion) not expected by a
    function with variable number of arguments, the behavior is
    undefined.

The interesting part of the backtrace is the same as above.
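The standard fix for this class of violation (a sketch, not the exact Impala change) is to guard the library call on the length, so a null pointer is never passed even for a zero-byte copy:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// memcpy declares its pointer arguments nonnull, so even memcpy(p, nullptr, 0)
// is undefined behavior. Skipping the call when len == 0 means a null
// source or destination never reaches the library function.
void SafeCopy(void* dst, const void* src, size_t len) {
  if (len == 0) return;  // nothing to copy; avoid the nonnull-argument UB
  memcpy(dst, src, len);
}
```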

Change-Id: I3c8a6624918389396789a83b32dbf068b9327f76
Reviewed-on: http://gerrit.cloudera.org:8080/11195
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
AoJ pushed a commit to AoJ/impala that referenced this pull request Mar 14, 2024
This implements cache invalidation inside CatalogdMetaProvider. The
design is as follows:

- when the catalogd collects updates into the statestore topic, it now
  adds an additional entry for each table and database. These additional
  entries are minimal - they only include the object's name, but no
  metadata. This new behavior is conditional on a new flag
  --catalog_topic_mode. The default mode is to keep the old style, but
  it can be configured to mixed (support both v1 and v2) or v2-only.

- the old-style topic entries are prefixed with a '1:' whereas the new
  minimal entries are prefixed with a '2:'. The impalad will subscribe
  to one or the other prefix depending on whether it is running with
  --use_local_catalog. Thus, old impalads will not be confused by the
  new entries and vice versa.

- when the impalad gets these topic updates, it forwards them through to
  the catalog implementation. The LocalCatalog implementation forwards
  them to the CatalogdMetaProvider, which uses them to invalidate
  cached metadata as appropriate.
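The versioned-prefix scheme from the bullets above can be sketched as follows (hypothetical helper names): v1 entries carry a '1:' prefix and the new minimal v2 entries carry '2:', and a subscriber only consumes keys with its own prefix, so mixed-version clusters ignore each other's entries.

```cpp
#include <cassert>
#include <string>

// Builds a topic key with the catalog-protocol version prefix.
std::string TopicKey(int version, const std::string& object_name) {
  return std::to_string(version) + ":" + object_name;
}

// An impalad running with --use_local_catalog subscribes to the '2:'
// prefix; a legacy impalad subscribes to '1:'. Entries with the other
// prefix are simply skipped.
bool SubscriberAccepts(bool use_local_catalog, const std::string& key) {
  const char* prefix = use_local_catalog ? "2:" : "1:";
  return key.rfind(prefix, 0) == 0;  // starts-with check
}
```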

This patch includes some basic unit tests. I also did some manual
testing by connecting to different impalads and verifying that a session
connected to impalad #1 saw the effects of DDLs made by impalad #2
within a short period of time (the statestore topic update frequency).

Existing end-to-end tests cover these code paths pretty thoroughly:

- if we didn't automatically invalidate the cache on a coordinator
  in response to DDL operations, then any test which expects to
  "read its own writes" (eg access a table after creating one)
  would fail
- if we didn't propagate invalidations via the statestore, then
  all of the tests that use sync_ddl would fail.

I verified the test coverage above using some of the tests in
test_ddl.py -- I selectively commented out a few of the invalidation
code paths in the new code and verified that tests failed until I
re-introduced them. Along the way I also improved test_ddl so that, when
this code is broken, it properly fails with a timeout. It also has a bit
of expanded coverage for both the SYNC_DDL and non-SYNC cases.

I also wrote a new custom-cluster test for LocalCatalog that verifies
a few of the specific edge cases like detecting catalogd restart,
SYNC_DDL behavior in mixed mode, etc.

One notable exception here is the implementation of INVALIDATE METADATA
This turned out to be complex to implement, so I left a lengthy TODO
describing the issue and filed a JIRA.

Change-Id: I615f9e6bd167b36cd8d93da59426dd6813ae4984
Reviewed-on: http://gerrit.cloudera.org:8080/11280
Reviewed-by: Todd Lipcon <todd@apache.org>
Tested-by: Todd Lipcon <todd@apache.org>
AoJ pushed a commit to AoJ/impala that referenced this pull request Mar 14, 2024
This patch fixes all remaining UBSAN "null pointer passed as argument"
errors in the backend tests. These are undefined behavior according to
"7.1.4 Use of library functions" in the C99 standard (which is
included in C++14 in section [intro.refs]):

    If an argument to a function has an invalid value (such as a value
    outside the domain of the function, or a pointer outside the
    address space of the program, or a null pointer, or a pointer to
    non-modifiable storage when the corresponding parameter is not
    const-qualified) or a type (after promotion) not expected by a
    function with variable number of arguments, the behavior is
    undefined.

The interesting parts of the backtraces for the errors fixed in this
patch are below:

exprs/string-functions-ir.cc:311:17: runtime error: null pointer passed as argument 2, which is declared to never be null
/usr/include/string.h:43:45: note: nonnull attribute specified here
    #0 StringFunctions::Replace(impala_udf::FunctionContext*, impala_udf::StringVal const&, impala_udf::StringVal const&, impala_udf::StringVal const&) exprs/string-functions-ir.cc:311:5
    #1 impala_udf::StringVal ScalarFnCall::InterpretEval<impala_udf::StringVal>(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:485:580
    #2 ScalarFnCall::GetStringVal(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:599:44
    #3 ScalarExprEvaluator::GetValue(ScalarExpr const&, TupleRow const*) exprs/scalar-expr-evaluator.cc:299:38
    #4 ScalarExprEvaluator::GetValue(TupleRow const*) exprs/scalar-expr-evaluator.cc:250:10
    #5 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, ScalarExprEvaluator* const*, MemPool*, StringValue**, int*, int*) runtime/tuple.cc:222:27
    #6 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, vector<ScalarExprEvaluator*> const&, MemPool*, vector<StringValue*>*, int*) runtime/tuple.h:174:5
    #7 UnionNode::MaterializeExprs(vector<ScalarExprEvaluator*> const&, TupleRow*, unsigned char*, RowBatch*) exec/union-node-ir.cc:29:14
    #8 UnionNode::GetNextConst(RuntimeState*, RowBatch*) exec/union-node.cc:263:5
    #9 UnionNode::GetNext(RuntimeState*, RowBatch*, bool*) exec/union-node.cc:296:45
    #10 FragmentInstanceState::ExecInternal() runtime/fragment-instance-state.cc:310:59
    #11 FragmentInstanceState::Exec() runtime/fragment-instance-state.cc:95:14
    #12 QueryState::ExecFInstance(FragmentInstanceState*) runtime/query-state.cc:488:24
    #13 QueryState::StartFInstances()::$_0::operator()() const runtime/query-state.cc:416:35
    #20 thread_proxy (exprs/expr-test+0x55ca939)

exprs/string-functions-ir.cc:868:15: runtime error: null pointer passed as argument 2, which is declared to never be null
/usr/include/string.h:43:45: note: nonnull attribute specified here
    #0 StringFunctions::ConcatWs(impala_udf::FunctionContext*, impala_udf::StringVal const&, int, impala_udf::StringVal const*) exprs/string-functions-ir.cc:868:3
    #1 impala_udf::StringVal ScalarFnCall::InterpretEval<impala_udf::StringVal>(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:510:270
    #2 ScalarFnCall::GetStringVal(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:599:44
    #3 ScalarExprEvaluator::GetValue(ScalarExpr const&, TupleRow const*) exprs/scalar-expr-evaluator.cc:299:38
    #4 ScalarExprEvaluator::GetValue(TupleRow const*) exprs/scalar-expr-evaluator.cc:250:10
    #5 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, ScalarExprEvaluator* const*, MemPool*, StringValue**, int*, int*) runtime/tuple.cc:222:27
    #6 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, vector<ScalarExprEvaluator*> const&, MemPool*, vector<StringValue*>*, int*) runtime/tuple.h:174:5
    #7 UnionNode::MaterializeExprs(vector<ScalarExprEvaluator*> const&, TupleRow*, unsigned char*, RowBatch*) exec/union-node-ir.cc:29:14
    #8 UnionNode::GetNextConst(RuntimeState*, RowBatch*) exec/union-node.cc:263:5
    #9 UnionNode::GetNext(RuntimeState*, RowBatch*, bool*) exec/union-node.cc:296:45
    #10 FragmentInstanceState::ExecInternal() runtime/fragment-instance-state.cc:310:59
    #11 FragmentInstanceState::Exec() runtime/fragment-instance-state.cc:95:14
    #12 QueryState::ExecFInstance(FragmentInstanceState*) runtime/query-state.cc:488:24
    #13 QueryState::StartFInstances()::$_0::operator()() const runtime/query-state.cc:416:35
    #20 thread_proxy (exprs/expr-test+0x55ca939)

exprs/string-functions-ir.cc:871:17: runtime error: null pointer passed as argument 2, which is declared to never be null
/usr/include/string.h:43:45: note: nonnull attribute specified here
    #0 StringFunctions::ConcatWs(impala_udf::FunctionContext*, impala_udf::StringVal const&, int, impala_udf::StringVal const*) exprs/string-functions-ir.cc:871:5
    #1 StringFunctions::Concat(impala_udf::FunctionContext*, int, impala_udf::StringVal const*) exprs/string-functions-ir.cc:843:10
    #2 impala_udf::StringVal ScalarFnCall::InterpretEval<impala_udf::StringVal>(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:510:95
    #3 ScalarFnCall::GetStringVal(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:599:44
    #4 ScalarExprEvaluator::GetValue(ScalarExpr const&, TupleRow const*) exprs/scalar-expr-evaluator.cc:299:38
    #5 ScalarExprEvaluator::GetValue(TupleRow const*) exprs/scalar-expr-evaluator.cc:250:10
    #6 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, ScalarExprEvaluator* const*, MemPool*, StringValue**, int*, int*) runtime/tuple.cc:222:27
    #7 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, vector<ScalarExprEvaluator*> const&, MemPool*, vector<StringValue*>*, int*) runtime/tuple.h:174:5
    #8 UnionNode::MaterializeExprs(vector<ScalarExprEvaluator*> const&, TupleRow*, unsigned char*, RowBatch*) exec/union-node-ir.cc:29:14
    #9 UnionNode::GetNextConst(RuntimeState*, RowBatch*) exec/union-node.cc:263:5
    #10 UnionNode::GetNext(RuntimeState*, RowBatch*, bool*) exec/union-node.cc:296:45
    #11 FragmentInstanceState::ExecInternal() runtime/fragment-instance-state.cc:310:59
    #12 FragmentInstanceState::Exec() runtime/fragment-instance-state.cc:95:14
    #13 QueryState::ExecFInstance(FragmentInstanceState*) runtime/query-state.cc:488:24
    #14 QueryState::StartFInstances()::$_0::operator()() const runtime/query-state.cc:416:35
    #21 thread_proxy (exprs/expr-test+0x55ca939)

exprs/string-functions-ir.cc:873:17: runtime error: null pointer passed as argument 2, which is declared to never be null
/usr/include/string.h:43:45: note: nonnull attribute specified here
    #0 StringFunctions::ConcatWs(impala_udf::FunctionContext*, impala_udf::StringVal const&, int, impala_udf::StringVal const*) exprs/string-functions-ir.cc:873:5
    #1 StringFunctions::Concat(impala_udf::FunctionContext*, int, impala_udf::StringVal const*) exprs/string-functions-ir.cc:843:10
    #2 impala_udf::StringVal ScalarFnCall::InterpretEval<impala_udf::StringVal>(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:510:95
    #3 ScalarFnCall::GetStringVal(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:599:44
    #4 ScalarExprEvaluator::GetValue(ScalarExpr const&, TupleRow const*) exprs/scalar-expr-evaluator.cc:299:38
    #5 ScalarExprEvaluator::GetValue(TupleRow const*) exprs/scalar-expr-evaluator.cc:250:10
    #6 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, ScalarExprEvaluator* const*, MemPool*, StringValue**, int*, int*) runtime/tuple.cc:222:27
    #7 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, vector<ScalarExprEvaluator*> const&, MemPool*, vector<StringValue*>*, int*) runtime/tuple.h:174:5
    #8 UnionNode::MaterializeExprs(vector<ScalarExprEvaluator*> const&, TupleRow*, unsigned char*, RowBatch*) exec/union-node-ir.cc:29:14
    #9 UnionNode::GetNextConst(RuntimeState*, RowBatch*) exec/union-node.cc:263:5
    #10 UnionNode::GetNext(RuntimeState*, RowBatch*, bool*) exec/union-node.cc:296:45
    #11 FragmentInstanceState::ExecInternal() runtime/fragment-instance-state.cc:310:59
    #12 FragmentInstanceState::Exec() runtime/fragment-instance-state.cc:95:14
    #13 QueryState::ExecFInstance(FragmentInstanceState*) runtime/query-state.cc:488:24
    #14 QueryState::StartFInstances()::$_0::operator()() const runtime/query-state.cc:416:35
    #21 thread_proxy (exprs/expr-test+0x55ca939)

runtime/raw-value.cc:159:27: runtime error: null pointer passed as argument 2, which is declared to never be null
/usr/include/string.h:43:45: note: nonnull attribute specified here
    #0 RawValue::Write(void const*, void*, ColumnType const&, MemPool*) runtime/raw-value.cc:159:9
    #1 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, ScalarExprEvaluator* const*, MemPool*, StringValue**, int*, int*) runtime/tuple.cc:225:7
    #2 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, vector<ScalarExprEvaluator*> const&, MemPool*, vector<StringValue*>*, int*) runtime/tuple.h:174:5
    #3 UnionNode::MaterializeExprs(vector<ScalarExprEvaluator*> const&, TupleRow*, unsigned char*, RowBatch*) exec/union-node-ir.cc:29:14
    #4 UnionNode::GetNextConst(RuntimeState*, RowBatch*) exec/union-node.cc:263:5
    #5 UnionNode::GetNext(RuntimeState*, RowBatch*, bool*) exec/union-node.cc:296:45
    #6 FragmentInstanceState::ExecInternal() runtime/fragment-instance-state.cc:310:59
    #7 FragmentInstanceState::Exec() runtime/fragment-instance-state.cc:95:14
    #8 QueryState::ExecFInstance(FragmentInstanceState*) runtime/query-state.cc:488:24
    #9 QueryState::StartFInstances()::$_0::operator()() const runtime/query-state.cc:416:35
    #16 thread_proxy (exprs/expr-test+0x55ca939)

udf/udf.cc:521:24: runtime error: null pointer passed as argument 2, which is declared to never be null
/usr/include/string.h:43:45: note: nonnull attribute specified here
    #0 impala_udf::StringVal::CopyFrom(impala_udf::FunctionContext*, unsigned char const*, unsigned long) udf/udf.cc:521:5
    #1 AnyValUtil::FromBuffer(impala_udf::FunctionContext*, char const*, int) exprs/anyval-util.h:241:12
    #2 StringFunctions::RegexpExtract(impala_udf::FunctionContext*, impala_udf::StringVal const&, impala_udf::StringVal const&, impala_udf::BigIntVal const&) exprs/string-functions-ir.cc:726:10
    #3 impala_udf::StringVal ScalarFnCall::InterpretEval<impala_udf::StringVal>(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:485:580
    #4 ScalarFnCall::GetStringVal(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:599:44
    #5 ScalarExprEvaluator::GetValue(ScalarExpr const&, TupleRow const*) exprs/scalar-expr-evaluator.cc:299:38
    #6 ScalarExprEvaluator::GetValue(TupleRow const*) exprs/scalar-expr-evaluator.cc:250:10
    #7 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, ScalarExprEvaluator* const*, MemPool*, StringValue**, int*, int*) runtime/tuple.cc:222:27
    #8 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, vector<ScalarExprEvaluator*> const&, MemPool*, vector<StringValue*>*, int*) runtime/tuple.h:174:5
    #9 UnionNode::MaterializeExprs(vector<ScalarExprEvaluator*> const&, TupleRow*, unsigned char*, RowBatch*) exec/union-node-ir.cc:29:14
    #10 UnionNode::GetNextConst(RuntimeState*, RowBatch*) exec/union-node.cc:263:5
    #11 UnionNode::GetNext(RuntimeState*, RowBatch*, bool*) exec/union-node.cc:296:45
    #12 FragmentInstanceState::ExecInternal() runtime/fragment-instance-state.cc:310:59
    #13 FragmentInstanceState::Exec() runtime/fragment-instance-state.cc:95:14
    #14 QueryState::ExecFInstance(FragmentInstanceState*) runtime/query-state.cc:488:24
    #15 QueryState::StartFInstances()::$_0::operator()() const runtime/query-state.cc:416:35
    #22 thread_proxy (exprs/expr-test+0x55ca939)

util/coding-util-test.cc:45:10: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:43:45: note: nonnull attribute specified here
    #0 TestUrl(string const&, string const&, bool) util/coding-util-test.cc:45:3
    #1 UrlCodingTest_BlankString_Test::TestBody() util/coding-util-test.cc:88:3
    #2 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (util/coding-util-test+0x6630f42)
    #8 main util/coding-util-test.cc:123:192

util/decompress-test.cc:126:261: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:66:58: note: nonnull attribute specified here
    #0 DecompressorTest::CompressAndDecompress(Codec*, Codec*, long, unsigned char*) util/decompress-test.cc:126:254
    #1 DecompressorTest::RunTest(THdfsCompression::type) util/decompress-test.cc:84:9
    #2 DecompressorTest_Default_Test::TestBody() util/decompress-test.cc:373:3
    #3 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (util/decompress-test+0x6642bb2)
    #9 main util/decompress-test.cc:479:47

util/decompress-test.cc:148:261: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:66:58: note: nonnull attribute specified here
    #0 DecompressorTest::CompressAndDecompress(Codec*, Codec*, long, unsigned char*) util/decompress-test.cc:148:254
    #1 DecompressorTest::RunTest(THdfsCompression::type) util/decompress-test.cc:84:9
    #2 DecompressorTest_Default_Test::TestBody() util/decompress-test.cc:373:3
    #3 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (util/decompress-test+0x6642bb2)
    #9 main util/decompress-test.cc:479:47

util/decompress-test.cc:269:261: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:66:58: note: nonnull attribute specified here
    #0 DecompressorTest::CompressAndDecompressNoOutputAllocated(Codec*, Codec*, long, unsigned char*) util/decompress-test.cc:269:254
    #1 DecompressorTest::RunTest(THdfsCompression::type) util/decompress-test.cc:71:7
    #2 DecompressorTest_LZ4_Test::TestBody() util/decompress-test.cc:381:3
    #3 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (util/decompress-test+0x6642bb2)
    #9 main util/decompress-test.cc:479:47

util/decompress-test.cc:221:329: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:66:58: note: nonnull attribute specified here
    #0 DecompressorTest::StreamingDecompress(Codec*, long, unsigned char*, long, unsigned char*, bool, long*) util/decompress-test.cc:221:322
    #1 DecompressorTest::CompressAndStreamingDecompress(Codec*, Codec*, long, unsigned char*) util/decompress-test.cc:245:35
    #2 DecompressorTest::RunTestStreaming(THdfsCompression::type) util/decompress-test.cc:104:5
    #3 DecompressorTest_Gzip_Test::TestBody() util/decompress-test.cc:386:3
    #4 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (util/decompress-test+0x6642bb2)
    #10 main util/decompress-test.cc:479:47

util/streaming-sampler.h:55:22: runtime error: null pointer passed as argument 2, which is declared to never be null
/usr/include/string.h:43:45: note: nonnull attribute specified here
    #0 StreamingSampler<long, 64>::StreamingSampler(int, vector<long> const&) util/streaming-sampler.h:55:5
    #1 RuntimeProfile::TimeSeriesCounter::TimeSeriesCounter(string const&, TUnit::type, int, vector<long> const&) util/runtime-profile-counters.h:401:53
    #2 RuntimeProfile::Update(vector<TRuntimeProfileNode> const&, int*) util/runtime-profile.cc:310:28
    #3 RuntimeProfile::Update(TRuntimeProfileTree const&) util/runtime-profile.cc:245:3
    #4 Coordinator::BackendState::InstanceStats::Update(TFragmentInstanceExecStatus const&, Coordinator::ExecSummary*, ProgressUpdater*) runtime/coordinator-backend-state.cc:473:13
    #5 Coordinator::BackendState::ApplyExecStatusReport(TReportExecStatusParams const&, Coordinator::ExecSummary*, ProgressUpdater*) runtime/coordinator-backend-state.cc:286:21
    #6 Coordinator::UpdateBackendExecStatus(TReportExecStatusParams const&) runtime/coordinator.cc:678:22
    #7 ClientRequestState::UpdateBackendExecStatus(TReportExecStatusParams const&) service/client-request-state.cc:1253:18
    #8 ImpalaServer::ReportExecStatus(TReportExecStatusResult&, TReportExecStatusParams const&) service/impala-server.cc:1343:18
    #9 ImpalaInternalService::ReportExecStatus(TReportExecStatusResult&, TReportExecStatusParams const&) service/impala-internal-service.cc:87:19
    #24 thread_proxy (exprs/expr-test+0x55ca939)

Change-Id: I317ccc99549744a26d65f3e07242079faad0355a
Reviewed-on: http://gerrit.cloudera.org:8080/11545
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
AoJ pushed a commit to AoJ/impala that referenced this pull request Mar 14, 2024
This removes two signed integer overflows when using the 'conv'
builtin. Signed integer overflow is undefined behavior according to
the C++ standard. The interesting parts of the backtraces are:

    exprs/math-functions-ir.cc:405:13: runtime error: signed integer overflow: 4738381338321616896 * 36 cannot be represented in type 'long'
    exprs/math-functions-ir.cc:404:24: runtime error: signed integer overflow: 2 * 4738381338321616896 cannot be represented in type 'long'

    #0 MathFunctions::DecimalInBaseToDecimal(long, signed char, long*) exprs/math-functions-ir.cc:404:24
    #1 MathFunctions::ConvInt(impala_udf::FunctionContext*, impala_udf::BigIntVal const&, impala_udf::TinyIntVal const&, impala_udf::TinyIntVal const&) exprs/math-functions-ir.cc:327:10
    #2 impala_udf::StringVal ScalarFnCall::InterpretEval<impala_udf::StringVal>(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:485:580
    #3 ScalarFnCall::GetStringVal(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:599:44
    #8 UnionNode::MaterializeExprs(vector<ScalarExprEvaluator*> const&, TupleRow*, unsigned char*, RowBatch*) exec/union-node-ir.cc:29:14
    #9 UnionNode::GetNextConst(RuntimeState*, RowBatch*) exec/union-node.cc:263:5
    #10 UnionNode::GetNext(RuntimeState*, RowBatch*, bool*) exec/union-node.cc:296:45

These were triggered in the backend test
ExprTest.MathConversionFunctions.

Change-Id: I0d97dfcf42072750c16e41175765cd9a468a3c39
Reviewed-on: http://gerrit.cloudera.org:8080/11876
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
AoJ pushed a commit to AoJ/impala that referenced this pull request Mar 14, 2024
This patch fixes a signed overflow in the backend test
DecimalTest.Overflow. The interesting part of the backtrace is:

runtime/decimal-value.inline.h:254:17: runtime error: signed integer
         overflow: 0x4b3b4ca85a86c47a098a223fffffffff +
         0x4b3b4ca85a86c47a098a223fffffffff cannot be represented in
         type '__int128'
    #0 detail::AddLarge(__int128, int, __int128, int, int, bool,
         bool*) runtime/decimal-value.inline.h:254:17
    #1 DecimalValue<__int128> DecimalValue<__int128>::Add<__int128>(
         int, DecimalValue<__int128> const&, int, int, int, bool,
         bool*) const runtime/decimal-value.inline.h:371:14
    #2 DecimalTest_Overflow_Test::TestBody()
         runtime/decimal-test.cc:540:9

Change-Id: I146bcf35d34cc0e14be0633427d3e4bd0e5a261e
Reviewed-on: http://gerrit.cloudera.org:8080/11917
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
AoJ pushed a commit to AoJ/impala that referenced this pull request Mar 14, 2024
UBSAN finds the following in ParquetBoolDecoder.TestDecodeAndSkipping:

util/bit-stream-utils.inline.h:156:25: runtime error: left shift of 42 by 28 places cannot be represented in type 'int'
    #0 BatchedBitReader::GetUleb128Int(unsigned int*) util/bit-stream-utils.inline.h:156:25
    #1 RleBatchDecoder<bool>::NextCounts() util/rle-encoding.h:778:40
    #2 RleBatchDecoder<bool>::NextNumRepeats() util/rle-encoding.h:622:28
    #3 RleBatchDecoder<bool>::GetValues(int, bool*) util/rle-encoding.h:858:27
    #4 bool ParquetBoolDecoder::DecodeValue<(parquet::Encoding::type)3>(bool*) exec/parquet/parquet-bool-decoder.h:85:24
    #5 TestSkipping(parquet::Encoding::type, unsigned char*, int, vector<bool> const&, int, int)::$_0::operator()() const exec/parquet/parquet-bool-decoder-test.cc:59
    #6 TestSkipping(parquet::Encoding::type, unsigned char*, int, vector<bool> const&, int, int) exec/parquet/parquet-bool-decoder-test.cc:69:221
    #7 ParquetBoolDecoder_TestDecodeAndSkipping_Test::TestBody() exec/parquet/parquet-bool-decoder-test.cc:85:5
    #9 testing::Test::Run() (/home/ubuntu/Impala/be/build/debug/exec/parquet/parquet-bool-decoder-test+0x6ee4f09)

The problem is the line

    *v |= (byte & 0x7F) << shift;

byte is an uint8_t and 0x7F is an int. The standard section
[expr.bit.and] then applies the "usual arithmetic conversions"
specified in [expr], which applies "if the type of the operand with
signed integer type can represent all of the values of the type of the
operand with unsigned integer type, the operand with unsigned integer
type shall be converted to the type of the operand with signed integer
type." That makes byte & 0x7F a signed integer type, and [expr.shift]
says that "if E1 has a signed type and non-negative value, and E1×2^E2
is representable in the corresponding unsigned type of the result
type, then that value, converted to the result type, is the resulting
value; otherwise, the behavior is undefined."

Change-Id: Ie6e0b956751090f3f8aadd6783b5e06e55e57abe
Reviewed-on: http://gerrit.cloudera.org:8080/12346
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
asfgit pushed a commit that referenced this pull request Dec 2, 2024
As agreed in JIRA discussions, the current PR extends existing TRIM
functionality with the support of SQL-standardized TRIM-FROM syntax:
TRIM({[LEADING / TRAILING / BOTH] | [STRING characters]} FROM expr).
Implemented based on the existing LTRIM / RTRIM / BTRIM family of
functions prepared earlier in IMPALA-6059 and extended for UTF-8 in
IMPALA-12718. Besides, partly based on abandoned PR
https://gerrit.cloudera.org/#/c/4474 and similar EXTRACT-FROM
functionality from 543fa73f3a846f0e4527514c993cb0985912b06c.

Supported syntaxes:
Syntax #1 TRIM(<where> FROM <string>);
Syntax #2 TRIM(<charset> FROM <string>);
Syntax #3 TRIM(<where> <charset> FROM <string>);

"where": Case-insensitive trim direction. Valid options are "leading",
  "trailing", and "both". "leading" means trimming characters from the
  start; "trailing" means trimming characters from the end; "both" means
  trimming characters from both sides. For Syntax #2, since no "where"
  is specified, the option "both" is implied by default.

"charset": Case-sensitive characters to be removed. This argument is
  regarded as a character set going to be removed. The occurrence order
  of each character doesn't matter and duplicated instances of the same
  character will be ignored. NULL argument implies " " (standard space)
  by default. Empty argument ("" or '') makes TRIM return the string
  untouched. For Syntax #1, since no "charset" is specified, it trims
  " " (standard space) by default.

"string": Case-sensitive target string to trim. This argument can be
  NULL.

The UTF8_MODE query option is honored by TRIM-FROM, similarly to
existing TRIM().

UTF8_TRIM-FROM can be used to force UTF8 mode regardless of the query
option.

Design Notes:
1. No-BE. Since the existing LTRIM / RTRIM / BTRIM functions fully cover
all needed use-cases, no backend logic is required. This differs from
similar EXTRACT-FROM.

2. Syntax wrapper. TrimFromExpr class was introduced as a syntax
wrapper around FunctionCallExpr, which instantiates one of the regular
LTRIM / RTRIM / BTRIM functions. TrimFromExpr's role is to maintain
the integrity of the "phantom" TRIM-FROM built-in function.

3. No TRIM keyword. Following EXTRACT-FROM, no "TRIM" keyword was
added to the language. Although a keyword would generally allow easier
and better parsing, it would also restrict the token's usage in
general contexts. However, leading/trailing/both, which were
previously reserved words, are now added as keywords so they can be
used without escaping.

Change-Id: I3c4fa6d0d8d0684c4b6d8dac8fd531d205e4f7b4
Reviewed-on: http://gerrit.cloudera.org:8080/21825
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Csaba Ringhofer <csringhofer@cloudera.com>
asf-gitbox-commits pushed a commit that referenced this pull request May 15, 2025
…quences

This adds support for a summarized textual representation of timestamps
for the event sequences present in the aggregated profile.

With the verbose format present in profile V1 and V2, it becomes
difficult to analyze an event's timestamps across instances.

The event sequences are now displayed in a histogram format when the
number of timestamps present exceeds the limit
(i.e. json_profile_event_timestamp_limit), in order to support an
easier view for skew analysis and other possible use cases.

The summary generated from aggregated instance-level timestamps
(i.e. IMPALA-13304) is used to achieve this within the profile V2,
which covers the possibility of missing events.

Example,
  Verbosity::DEFAULT
  json_profile_event_timestamp_limit = 5 (default)

  Case #1, Number of instances exceeded limit
    Node Lifecycle Event Timeline Summary :
     - Open Started (4s880ms):
        Min: 2s312ms, Avg: 3s427ms, Max: 4s880ms, Count: 12
        HistogramCount: 4, 4, 0, 0, 4

  Case #2, Number of instances within the limit

    Node Lifecycle Event Timeline:
     - Open Started: 5s885ms, 1s708ms, 3s434ms
     - Open Finished: 5s885ms, 1s708ms, 3s435ms
     - First Batch Requested: 5s885ms, 1s708ms, 3s435ms
     - First Batch Returned: 6s319ms, 2s123ms, 3s570ms
     - Last Batch Returned: 7s878ms, 2s123ms, 3s570ms

With Verbosity::EXTENDED or more, all events and timestamps are printed
with full verbosity as before.

Tests:
For test_profile_tool.py, updated the generated outputs for text
and JSON profiles.

Change-Id: I4bcc0e2e7fccfa8a184cfa8a3a96d68bfe6035c0
Reviewed-on: http://gerrit.cloudera.org:8080/22245
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
asf-gitbox-commits pushed a commit that referenced this pull request Sep 10, 2025
With this patch we create Iceberg file descriptors from
LocatedFileStatus objects during IcebergFileMetadataLoader's
parallelListing(). This has the following benefits:
 * We parallelize the creation of Iceberg file descriptor objects
 * We don't need to maintain a large hash map with all the
   LocatedFileStatus objects at once. Now we only need to keep a few
   LocatedFileStatus objects per partition in memory while we are
   converting them to Iceberg file descriptors. I.e., the GC is free to
   destroy the LocatedFileStatus objects we don't use anymore.

This patch retires startup flag 'iceberg_reload_new_files_threshold'.
Since IMPALA-13254 we only list partitions that have new data files,
and we load them in parallel, i.e. efficient incremental table loading
is already covered. From that point the startup flag only added
unnecessary code complexity.

Measurements

I created two tables (from tpcds.store_sales) to measure table loading
times for large tables:

Table #1:
  PARTITIONED BY SPEC(ss_item_sk, BUCKET(5, ss_sold_time_sk))
  partitions: 107818
  files: 754726

Table #2:
  PARTITIONED BY SPEC(ss_item_sk)
  partitions: 18000
  files: 504224

Time taken in IcebergFileMetadataLoader.load() during full table reload:
+----------+-------+------+---------+
|          | Base  | New  | Speedup |
+----------+-------+------+---------+
| Table #1 | 17.3s | 8.1s |    2.14 |
| Table #2 |  7.8s | 4.3s |     1.8 |
+----------+-------+------+---------+

I measured incremental table loading only for Table #2 (since there are
more files per partition this is the worst-case scenario for the new
code, as it only uses file listings, and each new file was created in a
separate partition)

Time taken in IcebergFileMetadataLoader.load() during incremental table
reload:
+------------+------+------+---------+
| #new files | Base | New  | Speedup |
+------------+------+------+---------+
|          1 | 1.4s | 1.6s |     0.9 |
|        100 | 1.5s | 1.9s |     0.8 |
|        200 | 1.5s | 1.5s |       1 |
+------------+------+------+---------+

We lose a few tenths of a second, but I think the simplified code
justifies it.

Testing:
 * some tests were updated because we don't have the
   startup flag 'iceberg_reload_new_files_threshold' anymore

Change-Id: Ia1c2a7119d76db7ce7c43caec2ccb122a014851b
Reviewed-on: http://gerrit.cloudera.org:8080/23363
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>