fix doc about parquet under incubation by ebuildy · Pull Request #3 · apache/impala

ebuildy · 2017-10-20T20:30:39Z

No description provided.

jbapple-cloudera · 2017-10-20T20:39:43Z

Thank you for your interest in contributing to Impala! We do code review via gerrit rather than GitHub PRs; you can see the process described at https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala

jbapple-cloudera · 2017-10-25T15:51:48Z

Posted to gerrit; doing pre-commit Jenkins job now:

https://gerrit.cloudera.org/#/c/8378/

This change fixes three issues: 1. File handle caching is expected to be disabled for remote files (using exclusive HDFS file handles), however the file handles are still being cached. 2. The retry logic for exclusive file handles is broken, leading the number of open files to be incorrect. 3. There is no test coverage for disabling the file handle cache. To fix issue apache#1, when a scan range is requesting an exclusive file handle from the cache, it will always request a newly opened file handle. It also will destroy the file handle when the scan range is closed. To fix issue apache#2, exclusive file handles will no longer retry IOs. Since the exclusive file handle is always a fresh file handle, it will never have a bad file handle from the cache. This returns the logic to its state before IMPALA-4623 in these cases. If a file handle is borrowed from the cache, then the code will continue to retry once with a fresh handle. To fix issue apache#3, custom_cluster/test_hdfs_fd_caching.py now does both positive and negative tests for the file handle cache. It verifies that setting max_cached_file_handles to zero disables caching. It also verifies that caching is disabled on remote files. (This change will resolve IMPALA-5390.) Change-Id: I4c03696984285cc9ce463edd969c5149cd83a861 Reviewed-on: http://gerrit.cloudera.org:8080/7181 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Impala Public Jenkins

A recent patch for IMPALA-5129 introduced a use-after-free bug in thrift-server-test. It is fixed in this patch. Change-Id: I2cd434757de2cd384def5b360a479e51812cccca

…ter dynamic linking. Replaces boost::algorithm::trim() with std::string methods when parsing /proc/self/smaps and adds a trivial unit test for MemInfo::ParseSmaps(). I did *not* replace other uses of trim() with equivalents from be/src/gutil/strings/strip.h at this moment. The backstory here is that TestAdmissionControllerStress::test_admission_controller_with_flags fails occasionally on dynamically linked builds of Impala. I was able to reproduce the failure reliably (within 3 tries) with the following: $ ./buildall.sh -notests -so -noclean $ bin/start-impala-cluster.py --impalad_args="--memory_maintenance_sleep_time_ms=1" $ impala-shell.sh --query 'select max(t.c1), avg(t.c2), min(t.c3), avg(c4), avg(c5), avg(c6) from (select max(tinyint_col) over (order by int_col) c1, avg(tinyint_col) over (order by smallint_col) c2, min(tinyint_col) over (order by smallint_col desc) c3, rank() over (order by int_col desc) c4, dense_rank() over (order by bigint_col) c5, first_value(tinyint_col) over (order by bigint_col desc) c6 from functional.alltypes) t;' The stack trace looks like: (gdb) bt #0 0x00007fe230df2428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54 #1 0x00007fe230df402a in __GI_abort () at abort.c:89 #2 0x00007fe23312026d in __gnu_cxx::__verbose_terminate_handler() () at ../../../../gcc-4.9.2/libstdc++-v3/libsupc++/vterminate.cc:95 #3 0x00007fe2330d8b66 in __cxxabiv1::__terminate(void (*)()) (handler=<optimized out>) at ../../../../gcc-4.9.2/libstdc++-v3/libsupc++/eh_terminate.cc:47 #4 0x00007fe2330d8bb1 in std::terminate() () at ../../../../gcc-4.9.2/libstdc++-v3/libsupc++/eh_terminate.cc:57 #5 0x00007fe2330d8cb8 in __cxxabiv1::__cxa_throw(void*, std::type_info*, void (*)(void*)) (obj=0x8e54080, tinfo=0x7fe233356210 <typeinfo for std::bad_cast>, dest=0x7fe23311ea70 <std::bad_cast::~bad_cast()>) at ../../../../gcc-4.9.2/libstdc++-v3/libsupc++/eh_throw.cc:87 #6 0x00007fe233110332 in std::__throw_bad_cast() () at ../../../../../gcc-4.9.2/libstdc++-v3/src/c++11/functexcept.cc:63 #7 0x00007fe2330e8ad7 in std::use_facet<std::ctype<char> >(std::locale const&) (__loc=...) at /data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-ubuntu-16-04/toolchain/source/gcc/build-4.9.2/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/locale_classes.tcc:137 #8 0x00000000008d2cdf in void boost::algorithm::trim<std::string>(std::string&, std::locale const&) () #9 0x00007fe2396d5057 in impala::MemInfo::ParseSmaps() () at /home/philip/src/Impala/be/src/util/mem-info.cc:132 ... My best theory is that there's a race/bug, wherein the std::locale* static initialization work is getting somehow 'reset' by the dynamic linker, when more libraries are linked in as a result of the query. My evidence to support this theory is scant, but I do notice that LD_DEBUG=all prints the following when the query is executed (but not right at startup): binding file /home/philip/src/Impala/toolchain/gcc-4.9.2/lib64/libstdc++.so.6 [0] to /home/philip/src/Impala/toolchain/gflags-2.2.0-p1/lib/libgflags.so.2.2 [0]: normal symbol `std::locale::facet::_S_destroy_c_locale(__locale_struct*&)' Note that there are BSS segments for some of std::locale::facet::* inside of libgflags.so. $nm toolchain/gflags-2.2.0-p1/lib/libgflags.so | c++filt | grep facet | grep ' B ' 00000000002e2d10 B std::locale::facet::_S_c_locale 00000000002e2d0c B std::locale::facet::_S_once I'm not the first to run into variants of these issues, though the results are fairly unhelpful: http://www.boost.org/doc/libs/1_58_0/libs/locale/doc/html/faq.html https://stackoverflow.com/questions/26990412/c-boost-crashes-while-using-locale https://svn.boost.org/trac10/ticket/4671 http://clang-developers.42468.n3.nabble.com/std-use-facet-lt-std-ctype-lt-char-gt-gt-crashes-on-linux-td4033967.html https://unix.stackexchange.com/questions/719/can-we-get-compiler-information-from-an-elf-binary https://stackoverflow.com/questions/42376100/linking-with-library-causes-collate-facet-to-be-missing-from-char http://lists.llvm.org/pipermail/cfe-dev/2012-July/023289.html https://gcc.gnu.org/ml/libstdc++/2014-11/msg00122.html Change-Id: I8dd807f869a9359d991ba515177fb2298054520e Reviewed-on: http://gerrit.cloudera.org:8080/8888 Reviewed-by: Philip Zeyliger <philip@cloudera.com> Tested-by: Impala Public Jenkins

memset has undefined behavior when its first argument is NULL. The instance fixed here was found by Clang's undefined behavior sanitizer. It was found in the end-to-end tests. The interesting part of the stack trace is: /exec/data-source-scan-node.cc:152:10: runtime error: null pointer passed as argument 1, which is declared to never be null /usr/include/string.h:62:79: note: nonnull attribute specified here #0 0x482fd8e in DataSourceScanNode::GetNextInputBatch() /exec/data-source-scan-node.cc:152:3 #1 0x482fb40 in DataSourceScanNode::Open(RuntimeState*) /exec/data-source-scan-node.cc:124:10 #2 0x47ef854 in AggregationNode::Open(RuntimeState*) /exec/aggregation-node.cc:71:49 #3 0x23506a4 in FragmentInstanceState::Open() /runtime/fragment-instance-state.cc:266:53 #4 0x234b6a8 in FragmentInstanceState::Exec() /runtime/fragment-instance-state.cc:81:12 #5 0x236ee52 in QueryState::ExecFInstance(FragmentInstanceState*) /runtime/query-state.cc:401:24 #6 0x237093e in QueryState::StartFInstances()::$_0::operator()() const /runtime/query-state.cc:341:44 Change-Id: I18fa02dc887a42a94c6f81e4923d17568f2184f2 Reviewed-on: http://gerrit.cloudera.org:8080/10948 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

In expr.add, the standard says: When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. ... If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. In the end-to-end tests this is triggered, and the interesting part of the backtrace is: include/c++/4.9.2/bits/stl_iterator.h:782:45: runtime error: pointer index expression with base 0x000000000000 overflowed to 0xffffffffffffffe8 #0 __normal_iterator<MemPool::ChunkInfo*, vector<MemPool::ChunkInfo>>::operator+(long) const stl_iterator.h:782:45 #1 MemPool::AcquireData(MemPool*, bool) runtime/mem-pool.cc:190:62 #2 RowBatch::TransferResourceOwnership(RowBatch*) runtime/row-batch.cc:444:26 #3 RowBatch::AcquireState(RowBatch*) runtime/row-batch.cc:505:8 #4 HdfsScanNode::GetNextInternal(RuntimeState*, RowBatch*, bool*) exec/hdfs-scan-node.cc:105:16 #5 HdfsScanNode::GetNext(RuntimeState*, RowBatch*, bool*) exec/hdfs-scan-node.cc:81:19 #6 StreamingAggregationNode::GetRowsStreaming(RuntimeState*, RowBatch*) exec/streaming-aggregation-node.cc:116:51 #7 StreamingAggregationNode::GetNext(RuntimeState*, RowBatch*, bool*) exec/streaming-aggregation-node.cc:92:41 Change-Id: I3d28a80763adb62572b3dd81ea732d18d957d248 Reviewed-on: http://gerrit.cloudera.org:8080/11118 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Jim Apple <jbapple-impala@apache.org>

This was found in ExprTest.LiteralExprs. The rules vioalted are: 1. "reference binding to null pointer of type 'long'". This violates the standard's [dcl.ref]: a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the "object" obtained by indirection through a null pointer, which causes undefined behavior. The interesting part of the backtrace is: #1 RuntimeProfile::TimeSeriesCounter::ToThrift(TTimeSeriesCounter*) util/runtime-profile.cc:1117:11 #2 RuntimeProfile::ToThrift(std::vector<TRuntimeProfileNode>*) const util/runtime-profile.cc:905:21 #3 RuntimeProfile::ToThrift(TRuntimeProfileTree*) const util/runtime-profile.cc:847:3 #4 QueryState::ReportExecStatusAux(bool, Status const&, FragmentInstanceState*, bool) runtime/query-state.cc:281:21 #5 QueryState::ReportExecStatus(bool, Status const&, FragmentInstanceState*) runtime/query-state.cc:250:3 #6 FragmentInstanceState::SendReport(bool, Status const&) runtime/fragment-instance-state.cc:406:17 #7 FragmentInstanceState::Finalize(Status const&) runtime/fragment-instance-state.cc:496:3 2. The use of a null pointer when calling memcpy. According to "7.1.4 Use of library functions" in the C99 standard (which is included in C++14 in section [intro.refs]: If an argument to a function has an invalid value (such as a value outside the domain of the function, or a pointer outside the address space of the program, or a null pointer, or a pointer to non-modifiable storage when the corresponding parameter is not const-qualified) or a type (after promotion) not expected by a function with variable number of arguments, the behavior is undefined. The interesting part of the backtrace is the same as above. Change-Id: I3c8a6624918389396789a83b32dbf068b9327f76 Reviewed-on: http://gerrit.cloudera.org:8080/11195 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

memset has undefined behavior when its first argument is NULL. The instance fixed here was found by Clang's undefined behavior sanitizer. It was found in the end-to-end tests. The interesting part of the stack trace is: /exec/data-source-scan-node.cc:152:10: runtime error: null pointer passed as argument 1, which is declared to never be null /usr/include/string.h:62:79: note: nonnull attribute specified here #0 0x482fd8e in DataSourceScanNode::GetNextInputBatch() /exec/data-source-scan-node.cc:152:3 apache#1 0x482fb40 in DataSourceScanNode::Open(RuntimeState*) /exec/data-source-scan-node.cc:124:10 apache#2 0x47ef854 in AggregationNode::Open(RuntimeState*) /exec/aggregation-node.cc:71:49 apache#3 0x23506a4 in FragmentInstanceState::Open() /runtime/fragment-instance-state.cc:266:53 apache#4 0x234b6a8 in FragmentInstanceState::Exec() /runtime/fragment-instance-state.cc:81:12 apache#5 0x236ee52 in QueryState::ExecFInstance(FragmentInstanceState*) /runtime/query-state.cc:401:24 apache#6 0x237093e in QueryState::StartFInstances()::$_0::operator()() const /runtime/query-state.cc:341:44 Change-Id: I18fa02dc887a42a94c6f81e4923d17568f2184f2 Reviewed-on: http://gerrit.cloudera.org:8080/10948 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

In expr.add, the standard says: When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. ... If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. In the end-to-end tests this is triggered, and the interesting part of the backtrace is: include/c++/4.9.2/bits/stl_iterator.h:782:45: runtime error: pointer index expression with base 0x000000000000 overflowed to 0xffffffffffffffe8 #0 __normal_iterator<MemPool::ChunkInfo*, vector<MemPool::ChunkInfo>>::operator+(long) const stl_iterator.h:782:45 apache#1 MemPool::AcquireData(MemPool*, bool) runtime/mem-pool.cc:190:62 apache#2 RowBatch::TransferResourceOwnership(RowBatch*) runtime/row-batch.cc:444:26 apache#3 RowBatch::AcquireState(RowBatch*) runtime/row-batch.cc:505:8 apache#4 HdfsScanNode::GetNextInternal(RuntimeState*, RowBatch*, bool*) exec/hdfs-scan-node.cc:105:16 apache#5 HdfsScanNode::GetNext(RuntimeState*, RowBatch*, bool*) exec/hdfs-scan-node.cc:81:19 apache#6 StreamingAggregationNode::GetRowsStreaming(RuntimeState*, RowBatch*) exec/streaming-aggregation-node.cc:116:51 apache#7 StreamingAggregationNode::GetNext(RuntimeState*, RowBatch*, bool*) exec/streaming-aggregation-node.cc:92:41 Change-Id: I3d28a80763adb62572b3dd81ea732d18d957d248 Reviewed-on: http://gerrit.cloudera.org:8080/11118 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Jim Apple <jbapple-impala@apache.org>

This patch fixes all remaining UBSAN "null pointer passed as argument" errors in the backend tests. These are undefined behavior according to "7.1.4 Use of library functions" in the C99 standard (which is included in C++14 in section [intro.refs]): If an argument to a function has an invalid value (such as a value outside the domain of the function, or a pointer outside the address space of the program, or a null pointer, or a pointer to non-modifiable storage when the corresponding parameter is not const-qualified) or a type (after promotion) not expected by a function with variable number of arguments, the behavior is undefined. The interesting parts of the backtraces for the errors fixed in this patch are below: exprs/string-functions-ir.cc:311:17: runtime error: null pointer passed as argument 2, which is declared to never be null /usr/include/string.h:43:45: note: nonnull attribute specified here #0 StringFunctions::Replace(impala_udf::FunctionContext*, impala_udf::StringVal const&, impala_udf::StringVal const&, impala_udf::StringVal const&) exprs/string-functions-ir.cc:311:5 #1 impala_udf::StringVal ScalarFnCall::InterpretEval<impala_udf::StringVal>(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:485:580 #2 ScalarFnCall::GetStringVal(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:599:44 #3 ScalarExprEvaluator::GetValue(ScalarExpr const&, TupleRow const*) exprs/scalar-expr-evaluator.cc:299:38 #4 ScalarExprEvaluator::GetValue(TupleRow const*) exprs/scalar-expr-evaluator.cc:250:10 #5 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, ScalarExprEvaluator* const*, MemPool*, StringValue**, int*, int*) runtime/tuple.cc:222:27 #6 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, vector<ScalarExprEvaluator*> const&, MemPool*, vector<StringValue*>*, int*) runtime/tuple.h:174:5 #7 UnionNode::MaterializeExprs(vector<ScalarExprEvaluator*> const&, TupleRow*, unsigned char*, RowBatch*) exec/union-node-ir.cc:29:14 #8 UnionNode::GetNextConst(RuntimeState*, RowBatch*) exec/union-node.cc:263:5 #9 UnionNode::GetNext(RuntimeState*, RowBatch*, bool*) exec/union-node.cc:296:45 #10 FragmentInstanceState::ExecInternal() runtime/fragment-instance-state.cc:310:59 #11 FragmentInstanceState::Exec() runtime/fragment-instance-state.cc:95:14 #12 QueryState::ExecFInstance(FragmentInstanceState*) runtime/query-state.cc:488:24 #13 QueryState::StartFInstances()::$_0::operator()() const runtime/query-state.cc:416:35 #20 thread_proxy (exprs/expr-test+0x55ca939) exprs/string-functions-ir.cc:868:15: runtime error: null pointer passed as argument 2, which is declared to never be null /usr/include/string.h:43:45: note: nonnull attribute specified here #0 StringFunctions::ConcatWs(impala_udf::FunctionContext*, impala_udf::StringVal const&, int, impala_udf::StringVal const*) exprs/string-functions-ir.cc:868:3 #1 impala_udf::StringVal ScalarFnCall::InterpretEval<impala_udf::StringVal>(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:510:270 #2 ScalarFnCall::GetStringVal(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:599:44 #3 ScalarExprEvaluator::GetValue(ScalarExpr const&, TupleRow const*) exprs/scalar-expr-evaluator.cc:299:38 #4 ScalarExprEvaluator::GetValue(TupleRow const*) exprs/scalar-expr-evaluator.cc:250:10 #5 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, ScalarExprEvaluator* const*, MemPool*, StringValue**, int*, int*) runtime/tuple.cc:222:27 #6 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, vector<ScalarExprEvaluator*> const&, MemPool*, vector<StringValue*>*, int*) runtime/tuple.h:174:5 #7 UnionNode::MaterializeExprs(vector<ScalarExprEvaluator*> const&, TupleRow*, unsigned char*, RowBatch*) exec/union-node-ir.cc:29:14 #8 UnionNode::GetNextConst(RuntimeState*, RowBatch*) exec/union-node.cc:263:5 #9 UnionNode::GetNext(RuntimeState*, RowBatch*, bool*) exec/union-node.cc:296:45 #10 FragmentInstanceState::ExecInternal() runtime/fragment-instance-state.cc:310:59 #11 FragmentInstanceState::Exec() runtime/fragment-instance-state.cc:95:14 #12 QueryState::ExecFInstance(FragmentInstanceState*) runtime/query-state.cc:488:24 #13 QueryState::StartFInstances()::$_0::operator()() const runtime/query-state.cc:416:35 #20 thread_proxy (exprs/expr-test+0x55ca939) exprs/string-functions-ir.cc:871:17: runtime error: null pointer passed as argument 2, which is declared to never be null /usr/include/string.h:43:45: note: nonnull attribute specified here #0 StringFunctions::ConcatWs(impala_udf::FunctionContext*, impala_udf::StringVal const&, int, impala_udf::StringVal const*) exprs/string-functions-ir.cc:871:5 #1 StringFunctions::Concat(impala_udf::FunctionContext*, int, impala_udf::StringVal const*) exprs/string-functions-ir.cc:843:10 #2 impala_udf::StringVal ScalarFnCall::InterpretEval<impala_udf::StringVal>(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:510:95 #3 ScalarFnCall::GetStringVal(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:599:44 #4 ScalarExprEvaluator::GetValue(ScalarExpr const&, TupleRow const*) exprs/scalar-expr-evaluator.cc:299:38 #5 ScalarExprEvaluator::GetValue(TupleRow const*) exprs/scalar-expr-evaluator.cc:250:10 #6 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, ScalarExprEvaluator* const*, MemPool*, StringValue**, int*, int*) runtime/tuple.cc:222:27 #7 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, vector<ScalarExprEvaluator*> const&, MemPool*, vector<StringValue*>*, int*) runtime/tuple.h:174:5 #8 UnionNode::MaterializeExprs(vector<ScalarExprEvaluator*> const&, TupleRow*, unsigned char*, RowBatch*) exec/union-node-ir.cc:29:14 #9 UnionNode::GetNextConst(RuntimeState*, RowBatch*) exec/union-node.cc:263:5 #10 UnionNode::GetNext(RuntimeState*, RowBatch*, bool*) exec/union-node.cc:296:45 #11 FragmentInstanceState::ExecInternal() runtime/fragment-instance-state.cc:310:59 #12 FragmentInstanceState::Exec() runtime/fragment-instance-state.cc:95:14 #13 QueryState::ExecFInstance(FragmentInstanceState*) runtime/query-state.cc:488:24 #14 QueryState::StartFInstances()::$_0::operator()() const runtime/query-state.cc:416:35 #21 thread_proxy (exprs/expr-test+0x55ca939) exprs/string-functions-ir.cc:873:17: runtime error: null pointer passed as argument 2, which is declared to never be null /usr/include/string.h:43:45: note: nonnull attribute specified here #0 StringFunctions::ConcatWs(impala_udf::FunctionContext*, impala_udf::StringVal const&, int, impala_udf::StringVal const*) exprs/string-functions-ir.cc:873:5 #1 StringFunctions::Concat(impala_udf::FunctionContext*, int, impala_udf::StringVal const*) exprs/string-functions-ir.cc:843:10 #2 impala_udf::StringVal ScalarFnCall::InterpretEval<impala_udf::StringVal>(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:510:95 #3 ScalarFnCall::GetStringVal(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:599:44 #4 ScalarExprEvaluator::GetValue(ScalarExpr const&, TupleRow const*) exprs/scalar-expr-evaluator.cc:299:38 #5 ScalarExprEvaluator::GetValue(TupleRow const*) exprs/scalar-expr-evaluator.cc:250:10 #6 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, ScalarExprEvaluator* const*, MemPool*, StringValue**, int*, int*) runtime/tuple.cc:222:27 #7 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, vector<ScalarExprEvaluator*> const&, MemPool*, vector<StringValue*>*, int*) runtime/tuple.h:174:5 #8 UnionNode::MaterializeExprs(vector<ScalarExprEvaluator*> const&, TupleRow*, unsigned char*, RowBatch*) exec/union-node-ir.cc:29:14 #9 UnionNode::GetNextConst(RuntimeState*, RowBatch*) exec/union-node.cc:263:5 #10 UnionNode::GetNext(RuntimeState*, RowBatch*, bool*) exec/union-node.cc:296:45 #11 FragmentInstanceState::ExecInternal() runtime/fragment-instance-state.cc:310:59 #12 FragmentInstanceState::Exec() runtime/fragment-instance-state.cc:95:14 #13 QueryState::ExecFInstance(FragmentInstanceState*) runtime/query-state.cc:488:24 #14 QueryState::StartFInstances()::$_0::operator()() const runtime/query-state.cc:416:35 #21 thread_proxy (exprs/expr-test+0x55ca939) runtime/raw-value.cc:159:27: runtime error: null pointer passed as argument 2, which is declared to never be null /usr/include/string.h:43:45: note: nonnull attribute specified here #0 RawValue::Write(void const*, void*, ColumnType const&, MemPool*) runtime/raw-value.cc:159:9 #1 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, ScalarExprEvaluator* const*, MemPool*, StringValue**, int*, int*) runtime/tuple.cc:225:7 #2 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, vector<ScalarExprEvaluator*> const&, MemPool*, vector<StringValue*>*, int*) runtime/tuple.h:174:5 #3 UnionNode::MaterializeExprs(vector<ScalarExprEvaluator*> const&, TupleRow*, unsigned char*, RowBatch*) exec/union-node-ir.cc:29:14 #4 UnionNode::GetNextConst(RuntimeState*, RowBatch*) exec/union-node.cc:263:5 #5 UnionNode::GetNext(RuntimeState*, RowBatch*, bool*) exec/union-node.cc:296:45 #6 FragmentInstanceState::ExecInternal() runtime/fragment-instance-state.cc:310:59 #7 FragmentInstanceState::Exec() runtime/fragment-instance-state.cc:95:14 #8 QueryState::ExecFInstance(FragmentInstanceState*) runtime/query-state.cc:488:24 #9 QueryState::StartFInstances()::$_0::operator()() const runtime/query-state.cc:416:35 #16 thread_proxy (exprs/expr-test+0x55ca939) udf/udf.cc:521:24: runtime error: null pointer passed as argument 2, which is declared to never be null /usr/include/string.h:43:45: note: nonnull attribute specified here #0 impala_udf::StringVal::CopyFrom(impala_udf::FunctionContext*, unsigned char const*, unsigned long) udf/udf.cc:521:5 #1 AnyValUtil::FromBuffer(impala_udf::FunctionContext*, char const*, int) exprs/anyval-util.h:241:12 #2 StringFunctions::RegexpExtract(impala_udf::FunctionContext*, impala_udf::StringVal const&, impala_udf::StringVal const&, impala_udf::BigIntVal const&) exprs/string-functions-ir.cc:726:10 #3 impala_udf::StringVal ScalarFnCall::InterpretEval<impala_udf::StringVal>(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:485:580 #4 ScalarFnCall::GetStringVal(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:599:44 #5 ScalarExprEvaluator::GetValue(ScalarExpr const&, TupleRow const*) exprs/scalar-expr-evaluator.cc:299:38 #6 ScalarExprEvaluator::GetValue(TupleRow const*) exprs/scalar-expr-evaluator.cc:250:10 #7 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, ScalarExprEvaluator* const*, MemPool*, StringValue**, int*, int*) runtime/tuple.cc:222:27 #8 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, vector<ScalarExprEvaluator*> const&, MemPool*, vector<StringValue*>*, int*) runtime/tuple.h:174:5 #9 UnionNode::MaterializeExprs(vector<ScalarExprEvaluator*> const&, TupleRow*, unsigned char*, RowBatch*) exec/union-node-ir.cc:29:14 #10 UnionNode::GetNextConst(RuntimeState*, RowBatch*) exec/union-node.cc:263:5 #11 UnionNode::GetNext(RuntimeState*, RowBatch*, bool*) exec/union-node.cc:296:45 #12 FragmentInstanceState::ExecInternal() runtime/fragment-instance-state.cc:310:59 #13 FragmentInstanceState::Exec() runtime/fragment-instance-state.cc:95:14 #14 QueryState::ExecFInstance(FragmentInstanceState*) runtime/query-state.cc:488:24 #15 QueryState::StartFInstances()::$_0::operator()() const runtime/query-state.cc:416:35 #22 thread_proxy (exprs/expr-test+0x55ca939) util/coding-util-test.cc:45:10: runtime error: null pointer passed as argument 1, which is declared to never be null /usr/include/string.h:43:45: note: nonnull attribute specified here #0 TestUrl(string const&, string const&, bool) util/coding-util-test.cc:45:3 #1 UrlCodingTest_BlankString_Test::TestBody() util/coding-util-test.cc:88:3 #2 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (util/coding-util-test+0x6630f42) #8 main util/coding-util-test.cc:123:192 util/decompress-test.cc:126:261: runtime error: null pointer passed as argument 1, which is declared to never be null /usr/include/string.h:66:58: note: nonnull attribute specified here #0 DecompressorTest::CompressAndDecompress(Codec*, Codec*, long, unsigned char*) util/decompress-test.cc:126:254 #1 DecompressorTest::RunTest(THdfsCompression::type) util/decompress-test.cc:84:9 #2 DecompressorTest_Default_Test::TestBody() util/decompress-test.cc:373:3 #3 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (util/decompress-test+0x6642bb2) #9 main util/decompress-test.cc:479:47 util/decompress-test.cc:148:261: runtime error: null pointer passed as argument 1, which is declared to never be null /usr/include/string.h:66:58: note: nonnull attribute specified here #0 DecompressorTest::CompressAndDecompress(Codec*, Codec*, long, unsigned char*) util/decompress-test.cc:148:254 #1 DecompressorTest::RunTest(THdfsCompression::type) util/decompress-test.cc:84:9 #2 DecompressorTest_Default_Test::TestBody() util/decompress-test.cc:373:3 #3 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (util/decompress-test+0x6642bb2) #9 main util/decompress-test.cc:479:47 util/decompress-test.cc:269:261: runtime error: null pointer passed as argument 1, which is declared to never be null /usr/include/string.h:66:58: note: nonnull attribute specified here #0 DecompressorTest::CompressAndDecompressNoOutputAllocated(Codec*, Codec*, long, unsigned char*) util/decompress-test.cc:269:254 #1 DecompressorTest::RunTest(THdfsCompression::type) util/decompress-test.cc:71:7 #2 DecompressorTest_LZ4_Test::TestBody() util/decompress-test.cc:381:3 #3 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (util/decompress-test+0x6642bb2) #9 main util/decompress-test.cc:479:47 util/decompress-test.cc:221:329: runtime error: null pointer passed as argument 1, which is declared to never be null /usr/include/string.h:66:58: note: nonnull attribute specified here #0 DecompressorTest::StreamingDecompress(Codec*, long, unsigned char*, long, unsigned char*, bool, long*) util/decompress-test.cc:221:322 #1 DecompressorTest::CompressAndStreamingDecompress(Codec*, Codec*, long, unsigned char*) util/decompress-test.cc:245:35 #2 DecompressorTest::RunTestStreaming(THdfsCompression::type) util/decompress-test.cc:104:5 #3 DecompressorTest_Gzip_Test::TestBody() util/decompress-test.cc:386:3 #4 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (util/decompress-test+0x6642bb2) #10 main util/decompress-test.cc:479:47 util/streaming-sampler.h:55:22: runtime error: null pointer passed as argument 2, which is declared to never be null /usr/include/string.h:43:45: note: nonnull attribute specified here #0 StreamingSampler<long, 64>::StreamingSampler(int, vector<long> const&) util/streaming-sampler.h:55:5 #1 RuntimeProfile::TimeSeriesCounter::TimeSeriesCounter(string const&, TUnit::type, int, vector<long> const&) util/runtime-profile-counters.h:401:53 #2 RuntimeProfile::Update(vector<TRuntimeProfileNode> const&, int*) util/runtime-profile.cc:310:28 #3 RuntimeProfile::Update(TRuntimeProfileTree const&) util/runtime-profile.cc:245:3 #4 Coordinator::BackendState::InstanceStats::Update(TFragmentInstanceExecStatus const&, Coordinator::ExecSummary*, ProgressUpdater*) runtime/coordinator-backend-state.cc:473:13 #5 Coordinator::BackendState::ApplyExecStatusReport(TReportExecStatusParams const&, Coordinator::ExecSummary*, ProgressUpdater*) runtime/coordinator-backend-state.cc:286:21 #6 Coordinator::UpdateBackendExecStatus(TReportExecStatusParams const&) runtime/coordinator.cc:678:22 #7 ClientRequestState::UpdateBackendExecStatus(TReportExecStatusParams const&) service/client-request-state.cc:1253:18 #8 ImpalaServer::ReportExecStatus(TReportExecStatusResult&, TReportExecStatusParams const&) service/impala-server.cc:1343:18 #9 ImpalaInternalService::ReportExecStatus(TReportExecStatusResult&, TReportExecStatusParams const&) service/impala-internal-service.cc:87:19 #24 thread_proxy (exprs/expr-test+0x55ca939) Change-Id: I317ccc99549744a26d65f3e07242079faad0355a Reviewed-on: http://gerrit.cloudera.org:8080/11545 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

This removes two signed integer overflows when using the 'conv' builtin. Signed integer overflow is undefined behavior according to the C++ standard. The interesting parts of the backtraces are: exprs/math-functions-ir.cc:405:13: runtime error: signed integer overflow: 4738381338321616896 * 36 cannot be represented in type 'long' exprs/math-functions-ir.cc:404:24: runtime error: signed integer overflow: 2 * 4738381338321616896 cannot be represented in type 'long' #0 MathFunctions::DecimalInBaseToDecimal(long, signed char, long*) exprs/math-functions-ir.cc:404:24 #1 MathFunctions::ConvInt(impala_udf::FunctionContext*, impala_udf::BigIntVal const&, impala_udf::TinyIntVal const&, impala_udf::TinyIntVal const&) exprs/math-functions-ir.cc:327:10 #2 impala_udf::StringVal ScalarFnCall::InterpretEval<impala_udf::StringVal>(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:485:580 #3 ScalarFnCall::GetStringVal(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:599:44 #8 UnionNode::MaterializeExprs(vector<ScalarExprEvaluator*> const&, TupleRow*, unsigned char*, RowBatch*) exec/union-node-ir.cc:29:14 #9 UnionNode::GetNextConst(RuntimeState*, RowBatch*) exec/union-node.cc:263:5 #10 UnionNode::GetNext(RuntimeState*, RowBatch*, bool*) exec/union-node.cc:296:45 These were triggered in the backend test ExprTest.MathConversionFunctions. Change-Id: I0d97dfcf42072750c16e41175765cd9a468a3c39 Reviewed-on: http://gerrit.cloudera.org:8080/11876 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

test_scanners.py has seen several flaky failures on s3 due to eventual consistency. The symptom is Impala being unable to read a file that it just loaded to s3. A large number of tables used in test_scanners.py use the file_utils helper functions for creating the tables. These follow the pattern: 1. Copy files to temporary directory in HDFS/S3/etc 2. Create table 3. Run LOAD DATA to move the files to the table In step #3, LOAD DATA gets the metadata for the table before it runs the move statement on the files. Subsequent queries on the table will not need to reload metadata and can access the file quickly after the move. This changes the ordering to put the files in place before loading metadata. This may improve the likelihood that the filesystem is consistent by the time we read it. Specifically, we now do: 1. Put the files in directory that the table will use when it is created. 2. Create table Neither of these steps load metadata, so the next query that runs will load metadata. Change-Id: Id042496beabe0d0226b347e0653b820fee369f4e Reviewed-on: http://gerrit.cloudera.org:8080/11959 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

UBSAN finds the following in ParquetBoolDecoder.TestDecodeAndSkipping: util/bit-stream-utils.inline.h:156:25: runtime error: left shift of 42 by 28 places cannot be represented in type 'int' #0 BatchedBitReader::GetUleb128Int(unsigned int*) util/bit-stream-utils.inline.h:156:25 #1 RleBatchDecoder<bool>::NextCounts() util/rle-encoding.h:778:40 #2 RleBatchDecoder<bool>::NextNumRepeats() util/rle-encoding.h:622:28 #3 RleBatchDecoder<bool>::GetValues(int, bool*) util/rle-encoding.h:858:27 #4 bool ParquetBoolDecoder::DecodeValue<(parquet::Encoding::type)3>(bool*) exec/parquet/parquet-bool-decoder.h:85:24 #5 TestSkipping(parquet::Encoding::type, unsigned char*, int, vector<bool> const&, int, int)::$_0::operator()() const exec/parquet/parquet-bool-decoder-test.cc:59 #6 TestSkipping(parquet::Encoding::type, unsigned char*, int, vector<bool> const&, int, int) exec/parquet/parquet-bool-decoder-test.cc:69:221 #7 ParquetBoolDecoder_TestDecodeAndSkipping_Test::TestBody() exec/parquet/parquet-bool-decoder-test.cc:85:5 #9 testing::Test::Run() (/home/ubuntu/Impala/be/build/debug/exec/parquet/parquet-bool-decoder-test+0x6ee4f09) The problem is the line *v |= (byte & 0x7F) << shift; byte is an uint8_t and 0x7F is an int. The standard section [expr.bit.and] then applies the "usual arithmetic conversions" specified in [expr], which applies "if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, the operand with unsigned integer type shall be converted to the type of the operand with signed integer type." That makes byte & 0x7F a signed integer type, and [expr.shift] says that "if E1 has a signed type and non-negative value, and E1×2^E2 is representable in the corresponding unsigned type of the result type, then that value, converted to the result type, is the resulting value; otherwise, the behavior is undefined." Change-Id: Ie6e0b956751090f3f8aadd6783b5e06e55e57abe Reviewed-on: http://gerrit.cloudera.org:8080/12346 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

Tis patch fixes a signed overflow in the test StringToDecimal.LargeDecimals. The interesting part of the backtrace is: util/string-parser.h:397:14: runtime error: signed integer overflow: 0x4b3b4ca85a86c47a098a223fffffffff * 10 cannot be represented in type '__int128' #0 void StringParser::ApplyExponent<__int128>(int, int, signed char, __int128*, int*, int*) util/string-parser.h:397:14 #1 DecimalValue<__int128> StringParser::StringToDecimal<__int128> (char const*, int, int, int, bool, StringParser::ParseResult*) util/string-parser.h:221:5 #2 void VerifyParse<__int128>(string const&, int, int, bool, DecimalValue<__int128> const&, StringParser::ParseResult) runtime/decimal-test.cc:53:25 #3 void VerifyParse<__int128>(string const&, int, int, DecimalValue<__int128> const&, StringParser::ParseResult) runtime/decimal-test.cc:65:3 #4 StringToDecimal_LargeDecimals_Test::TestBody() runtime/decimal-test.cc:443:3 Change-Id: Ifb4effa291e1e4dac62b84251c74c483d99b06e7 Reviewed-on: http://gerrit.cloudera.org:8080/12426 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

Tis patch fixes a signed overflow in the test StringToDecimal.LargeDecimals. The interesting part of the backtrace is: util/string-parser.h:397:14: runtime error: signed integer overflow: 0x4b3b4ca85a86c47a098a223fffffffff * 10 cannot be represented in type '__int128' #0 void StringParser::ApplyExponent<__int128>(int, int, signed char, __int128*, int*, int*) util/string-parser.h:397:14 apache#1 DecimalValue<__int128> StringParser::StringToDecimal<__int128> (char const*, int, int, int, bool, StringParser::ParseResult*) util/string-parser.h:221:5 apache#2 void VerifyParse<__int128>(string const&, int, int, bool, DecimalValue<__int128> const&, StringParser::ParseResult) runtime/decimal-test.cc:53:25 apache#3 void VerifyParse<__int128>(string const&, int, int, DecimalValue<__int128> const&, StringParser::ParseResult) runtime/decimal-test.cc:65:3 apache#4 StringToDecimal_LargeDecimals_Test::TestBody() runtime/decimal-test.cc:443:3 Change-Id: Ifb4effa291e1e4dac62b84251c74c483d99b06e7 Reviewed-on: http://gerrit.cloudera.org:8080/12426 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

The standard says that overflow for signed arithmetic operations is undefined behavior; see [expr]: If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined. This patch fixes a signed overflow with the folowing backtrace (uninteresting parts elided): runtime/timestamp-value.inline.h:67:13: runtime error: signed integer overflow: -9223372036854775808 + -9223372037 cannot be represented in type 'long' #0 TimestampValue::FromUnixTimeNanos(long, long, cctz::time_zone const&) runtime/timestamp-value.inline.h:67:13 #1 TimestampValue::FromSubsecondUnixTime(double, cctz::time_zone const&) runtime/timestamp-value.inline.h:62:10 #2 CastFunctions::CastToTimestampVal(impala_udf::FunctionContext*, impala_udf::FloatVal const&) exprs/cast-functions-ir.cc:248:172 #3 impala_udf::TimestampVal ScalarFnCall::InterpretEval<impala_udf::TimestampVal>(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:485:208 #4 ScalarFnCall::GetTimestampVal(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:608:44 #5 ScalarExprEvaluator::GetValue(ScalarExpr const&, TupleRow const*) exprs/scalar-expr-evaluator.cc:314:41 #6 ScalarExprEvaluator::GetValue(TupleRow const*) exprs/scalar-expr-evaluator.cc:250:10 #7 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, ScalarExprEvaluator* const*, MemPool*, StringValue**, int*, int*) runtime/tuple.cc:222:27 #8 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, vector<ScalarExprEvaluator*> const&, MemPool*, vector<StringValue*>*, int*) runtime/tuple.h:174:5 #9 UnionNode::MaterializeExprs(vector<ScalarExprEvaluator*> const&, TupleRow*, unsigned char*, RowBatch*) exec/union-node-ir.cc:29:14 #10 UnionNode::GetNextConst(RuntimeState*, RowBatch*) exec/union-node.cc:263:5 #11 UnionNode::GetNext(RuntimeState*, RowBatch*, bool*) exec/union-node.cc:296:45 This was seen in the backend test ExprTest.CastExprs. Change-Id: Iaad158e6634314a5690a43a0cc04426c1aba8f41 Reviewed-on: http://gerrit.cloudera.org:8080/11919 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

This patch handles an undefined behavior in the custom cluster tests in which a reference to an invalid enum value is used. To prevent the invalid value, the bytes are first copied into their underlying integer type. The undefined behavior happens in LdapJdbcTest. The relevant backtrace is: include/c++/4.9.2/bits/stl_algobase.h:199:11: runtime error: load of value 8, which is not a valid value for type 'const TProtocolVersion::type' #0 TProtocolVersion::type const& min<TProtocolVersion::type>(TProtocolVersion::type const&, TProtocolVersion::type const&) include/c++/4.9.2/bits/stl_algobase.h:199:11 #1 ImpalaServer::OpenSession(TOpenSessionResp&, TOpenSessionReq const&) service/impala-hs2-server.cc:304:24 #2 TCLIServiceProcessor::process_OpenSession(int, TProtocol*, TProtocol*, void*) generated-sources/gen-cpp/TCLIService.cpp:4953:13 #3 TCLIServiceProcessor::dispatchCall(TProtocol*, TProtocol*, string const&, int, void*) generated-sources/gen-cpp/TCLIService.cpp:4926:3 #4 ImpalaHiveServer2ServiceProcessor::dispatchCall(TProtocol*, TProtocol*, string const&, int, void*) generated-sources/gen-cpp/ImpalaHiveServer2Service.cpp:505:73 #5 thrift::TDispatchProcessor::process (boost::shared_ptr<TProtocol>, boost::shared_ptr<TProtocol>, void*) toolchain/thrift-0.9.3-p5/include/thrift/TDispatchProcessor.h:121:12 #6 thrift::server::TAcceptQueueServer::Task::run() rpc/TAcceptQueueServer.cpp:74:26 #7 ThriftThread::RunRunnable(boost::shared_ptr <thrift::concurrency::Runnable>, Promise<unsigned long, (PromiseMode)0>*) rpc/thrift-thread.cc:74:13 Change-Id: I63379b4c9d2e4738e729a556108c77fed85e6b64 Reviewed-on: http://gerrit.cloudera.org:8080/13438 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

This undefined behavior was caught with UBSAN in the end-to-end tests. The interesting part of the backtrace is: gutil/atomicops-internals-x86.h:283:15: runtime error: signed integer overflow: -9223370395229620599 + -9223371946660462582 cannot be represented in type 'long' #0 base::subtle::Barrier_AtomicIncrement(long volatile*, long) gutil/atomicops-internals-x86.h:283:15 #1 internal::AtomicInt<long>::Add(long) common/atomic.h:93:12 #2 RuntimeProfile::Counter::Add(long) util/runtime-profile.h:93 #3 HdfsOrcScanner::AssembleRows(RowBatch*) exec/hdfs-orc-scanner.cc:636:50 #4 HdfsOrcScanner::GetNextInternal(RowBatch*) exec/hdfs-orc-scanner.cc:507:19 #5 HdfsOrcScanner::ProcessSplit() exec/hdfs-orc-scanner.cc:426:21 #6 HdfsScanNode::ProcessSplit(vector<FilterContext> const&, MemPool*, io::ScanRange*, long*) exec/hdfs-scan-node.cc:514:21 #7 HdfsScanNode::ScannerThread(bool, long) exec/hdfs-scan-node.cc:415:7 #8 HdfsScanNode::ThreadTokenAvailableCb(ThreadResourcePool*):: $_0::operator()() const exec/hdfs-scan-node.cc:337:13 Change-Id: Ic638ff4959eaaffc79caa3453dbccaaabcbe95c9 Reviewed-on: http://gerrit.cloudera.org:8080/13433 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

Counterintuitively, even passing 0 as the third argument of memcpy does not avoid undefined behavior. This occurred during an end-to-end test. The interesting part of the backtrace is: util/dict-encoding.h:451:20: runtime error: null pointer passed as argument 2, which is declared to never be null /usr/include/string.h:43:45: note: nonnull attribute specified here #0 DictEncoder<StringValue>::AddToTable(StringValue const&, unsigned short*) util/dict-encoding.h:451:3 #1 DictEncoder<StringValue>::Put(StringValue const&) util/dict-encoding.h:422:10 #2 HdfsParquetTableWriter::ColumnWriter<StringValue>:: ProcessValue(void*, long*) exec/parquet/hdfs-parquet-table-writer.cc:436:38 #3 HdfsParquetTableWriter::BaseColumnWriter::AppendRow(TupleRow*) exec/parquet/hdfs-parquet-table-writer.cc:662:9 #4 HdfsParquetTableWriter::AppendRows(RowBatch*, vector<int> const&, bool*) exec/parquet/hdfs-parquet-table-writer.cc:1192:60 #5 HdfsTableSink::WriteRowsToPartition(RuntimeState*, RowBatch*, pair<unique_ptr<OutputPartition>, vector<int>>*) exec/hdfs-table-sink.cc:253:71 #6 HdfsTableSink::Send(RuntimeState*, RowBatch*) exec/hdfs-table-sink.cc:588:45 Change-Id: I2e8e57c34c2848f0dc7dbf32892cc6e86df63506 Reviewed-on: http://gerrit.cloudera.org:8080/13434 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

This patch fixes UBSAN "null pointer passed as argument" errors in the end-to-end tests. These are undefined behavior according to "7.1.4 Use of library functions" in the C99 standard (which is included in C++14 in section [intro.refs]): If an argument to a function has an invalid value (such as a value outside the domain of the function, or a pointer outside the address space of the program, or a null pointer, or a pointer to non-modifiable storage when the corresponding parameter is not const-qualified) or a type (after promotion) not expected by a function with variable number of arguments, the behavior is undefined. The interesting parts of the backtraces are: runtime/sorter.cc:575:18: runtime error: null pointer passed as argument 2, which is declared to never be null /usr/include/string.h:43:45: note: nonnull attribute specified here #0 Sorter::Run::CopyVarLenData(vector<StringValue*> const&, unsigned char*) runtime/sorter.cc:575:5 #1 Status Sorter::Run::AddBatchInternal<true, true>(RowBatch*, int, int*) runtime/sorter.cc:232:11 #2 Sorter::Run::AddInputBatch(RowBatch*, int, int*) runtime/sorter.cc:660:12 #3 Sorter::AddBatchNoSpill(RowBatch*, int, int*) runtime/sorter.cc:882:58 #4 Sorter::AddBatch(RowBatch*) runtime/sorter.cc:862:45 #5 SortNode::SortInput(RuntimeState*) exec/sort-node.cc:177:54 #6 SortNode::Open(RuntimeState*) exec/sort-node.cc:90:43 runtime/tuple.cc:105:25: runtime error: null pointer passed as argument 2, which is declared to never be null /usr/include/string.h:43:45: note: nonnull attribute specified here #0 Tuple::DeepCopyVarlenData(TupleDescriptor const&, MemPool*) runtime/tuple.cc:105:5 #1 Tuple::DeepCopy(Tuple*, TupleDescriptor const&, MemPool*) runtime/tuple.cc:94:35 #2 Tuple::DeepCopy(TupleDescriptor const&, MemPool*) runtime/tuple.cc:85:3 #3 KrpcDataStreamSender::Channel::AddRow(TupleRow*) runtime/krpc-data-stream-sender.cc:509:43 #4 KrpcDataStreamSender::AddRowToChannel(int, TupleRow*) runtime/krpc-data-stream-sender.cc:846 #5 (<unknown module>) runtime/tuple.cc:146:19: runtime error: null pointer passed as argument 2, which is declared to never be null /usr/include/string.h:43:45: note: nonnull attribute specified here #0 Tuple::DeepCopyVarlenData(TupleDescriptor const&, char**, int*, bool) runtime/tuple.cc:146:5 #1 Tuple::DeepCopy(TupleDescriptor const&, char**, int*, bool) runtime/tuple.cc:135:35 #2 RowBatch::SerializeInternal(long, FixedSizeHashTable<Tuple*, int>*, vector<int>*, string*) runtime/row-batch.cc:392:14 #3 RowBatch::Serialize(bool, vector<int>*, string*, long*, bool*) runtime/row-batch.cc:290:45 #4 RowBatch::Serialize(OutboundRowBatch*) runtime/row-batch.cc:259:43 #5 KrpcDataStreamSender::SerializeBatch(RowBatch*, OutboundRowBatch*, int) runtime/krpc-data-stream-sender.cc:955:50 #6 KrpcDataStreamSender::Send(RuntimeState*, RowBatch*) runtime/krpc-data-stream-sender.cc:870:45 runtime/tuple.h:106:12: runtime error: null pointer passed as argument 1, which is declared to never be null /usr/include/string.h:62:79: note: nonnull attribute specified here #0 Tuple::ClearNullBits(int, int) runtime/tuple.h:106:5 #1 HdfsScanner::InitTuple(TupleDescriptor const*, Tuple*, Tuple*) exec/hdfs-scanner.h:512:14 #2 HdfsOrcScanner::AssembleCollection(OrcComplexColumnReader const&, int, CollectionValueBuilder*) exec/hdfs-orc-scanner.cc:742:7 #3 OrcCollectionReader::ReadValue(int, Tuple*, MemPool*) exec/orc-column-readers.cc:375:20 #4 OrcStructReader::ReadValue(int, Tuple*, MemPool*) exec/orc-column-readers.cc:322:52 #5 OrcListReader::ReadChildrenValue(int, int, Tuple*, MemPool*) const exec/orc-column-readers.cc:473:52 #6 HdfsOrcScanner::AssembleCollection(OrcComplexColumnReader const&, int, CollectionValueBuilder*) exec/hdfs-orc-scanner.cc:743:60 #7 OrcCollectionReader::ReadValue(int, Tuple*, MemPool*) exec/orc-column-readers.cc:375:20 #8 OrcStructReader::TransferTuple(Tuple*, MemPool*) exec/orc-column-readers.cc:346:52 #9 HdfsOrcScanner::TransferTuples(OrcComplexColumnReader*, RowBatch*) exec/hdfs-orc-scanner.cc:669:58 #10 HdfsOrcScanner::AssembleRows(RowBatch*) exec/hdfs-orc-scanner.cc:629:45 #11 HdfsOrcScanner::GetNextInternal(RowBatch*) exec/hdfs-orc-scanner.cc:507:19 #12 HdfsOrcScanner::ProcessSplit() exec/hdfs-orc-scanner.cc:426:21 #13 HdfsScanNode::ProcessSplit(vector<FilterContext> const&, MemPool*, io::ScanRange*, long*) exec/hdfs-scan-node.cc:514:21 #14 HdfsScanNode::ScannerThread(bool, long) exec/hdfs-scan-node.cc:415:7 #15 HdfsScanNode::ThreadTokenAvailableCb(ThreadResourcePool*) ::$_0::operator()() const exec/hdfs-scan-node.cc:337:13 runtime/collection-value-builder.h:75:25: runtime error: null pointer passed as argument 2, which is declared to never be null /usr/include/string.h:43:28: note: nonnull attribute specified here #0 CollectionValueBuilder::GetFreeMemory(Tuple**, int*) runtime/collection-value-builder.h:75:9 #1 HdfsScanner::GetCollectionMemory(CollectionValueBuilder*, MemPool**, Tuple**, TupleRow**, long*) exec/hdfs-scanner.cc:194:3 #2 HdfsOrcScanner::AssembleCollection(OrcComplexColumnReader const&, int, CollectionValueBuilder*) exec/hdfs-orc-scanner.cc:733:9 #3 HdfsOrcScanner::AssembleCollection(OrcComplexColumnReader const&, int, CollectionValueBuilder*) exec/hdfs-orc-scanner.cc:710:7 #4 HdfsOrcScanner::AssembleCollection(OrcComplexColumnReader const&, int, CollectionValueBuilder*) exec/hdfs-orc-scanner.cc:710:7 #5 OrcCollectionReader::ReadValue(int, Tuple*, MemPool*) exec/orc-column-readers.cc:375:20 #6 OrcStructReader::TransferTuple(Tuple*, MemPool*) exec/orc-column-readers.cc:346:5 #7 HdfsOrcScanner::TransferTuples(OrcComplexColumnReader*, RowBatch*) exec/hdfs-orc-scanner.cc:669:5 #8 HdfsOrcScanner::AssembleRows(RowBatch*) exec/hdfs-orc-scanner.cc:629:5 #9 HdfsOrcScanner::GetNextInternal(RowBatch*) exec/hdfs-orc-scanner.cc:507:19 #10 HdfsScanner::GetNext(RowBatch*) exec/hdfs-scanner.h:133:12 #11 HdfsScanNodeMt::GetNext(RuntimeState*, RowBatch*, bool*) exec/hdfs-scan-node-mt.cc:106:29 #12 SubplanNode::GetNext(RuntimeState*, RowBatch*, bool*) exec/subplan-node.cc:129:7 #13 AggregationNode::Open(RuntimeState*) exec/aggregation-node.cc:67:5 Change-Id: I9362ce6b9ba470ed90e5bd2dc313b66ebd8c6af5 Reviewed-on: http://gerrit.cloudera.org:8080/13436 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

Fix remaining signed overflow undefined behaviors in end-to-end tests. The interesting part of the backtraces: exprs/aggregate-functions-ir.cc:464:25: runtime error: signed integer overflow: 0x5a4728ca063b522c0b728f8000000000 + 0x3c2f7086aed236c807a1b50000000000 cannot be represented in type '__int128' #0 AggregateFunctions::DecimalAvgMerge( impala_udf::FunctionContext*, impala_udf::StringVal const&, impala_udf::StringVal*) exprs/aggregate-functions-ir.cc:464:25 #1 AggFnEvaluator::Update(TupleRow const*, Tuple*, void*) exprs/agg-fn-evaluator.cc:327:7 #2 AggFnEvaluator::Add(TupleRow const*, Tuple*) exprs/agg-fn-evaluator.h:257:3 #3 Aggregator::UpdateTuple(AggFnEvaluator**, Tuple*, TupleRow*, bool) exec/aggregator.cc:167:24 #4 NonGroupingAggregator::AddBatchImpl(RowBatch*) exec/non-grouping-aggregator-ir.cc:27:5 #5 NonGroupingAggregator::AddBatch(RuntimeState*, RowBatch*) exec/non-grouping-aggregator.cc:124:45 #6 AggregationNode::Open(RuntimeState*) exec/aggregation-node.cc:70:57 exprs/aggregate-functions-ir.cc:513:12: runtime error: signed integer overflow: -8282081183197145958 + -4473782455107795527 cannot be represented in type 'long' #0 void AggregateFunctions::SumUpdate<impala_udf::BigIntVal, impala_udf::BigIntVal>(impala_udf::FunctionContext*, impala_udf::BigIntVal const&, impala_udf::BigIntVal*) exprs/aggregate-functions-ir.cc:513:12 #1 AggFnEvaluator::Update(TupleRow const*, Tuple*, void*) exprs/agg-fn-evaluator.cc:327:7 #2 AggFnEvaluator::Add(TupleRow const*, Tuple*) exprs/agg-fn-evaluator.h:257:3 #3 Aggregator::UpdateTuple(AggFnEvaluator**, Tuple*, TupleRow*, bool) exec/aggregator.cc:167:24 #4 NonGroupingAggregator::AddBatchImpl(RowBatch*) exec/non-grouping-aggregator-ir.cc:27:5 #5 NonGroupingAggregator::AddBatch(RuntimeState*, RowBatch*) exec/non-grouping-aggregator.cc:124:45 #6 AggregationNode::Open(RuntimeState*) exec/aggregation-node.cc:70:57 exprs/aggregate-functions-ir.cc:585:14: runtime error: signed integer overflow: 0x5a4728ca063b522c0b728f8000000000 + 0x3c2f7086aed236c807a1b50000000000 cannot be represented in type '__int128' #0 AggregateFunctions::SumDecimalMerge( impala_udf::FunctionContext*, impala_udf::DecimalVal const&, impala_udf::DecimalVal*) exprs/aggregate-functions-ir.cc:585:14 #1 AggFnEvaluator::Update(TupleRow const*, Tuple*, void*) exprs/agg-fn-evaluator.cc:327:7 #2 AggFnEvaluator::Add(TupleRow const*, Tuple*) exprs/agg-fn-evaluator.h:257:3 #3 Aggregator::UpdateTuple(AggFnEvaluator**, Tuple*, TupleRow*, bool) exec/aggregator.cc:167:24 #4 NonGroupingAggregator::AddBatchImpl(RowBatch*) exec/non-grouping-aggregator-ir.cc:27:5 #5 NonGroupingAggregator::AddBatch(RuntimeState*, RowBatch*) exec/non-grouping-aggregator.cc:124:45 #6 AggregationNode::Open(RuntimeState*) exec/aggregation-node.cc:70:57 runtime/decimal-value.inline.h:145:12: runtime error: signed integer overflow: 18 * 0x0785ee10d5da46d900f436a000000000 cannot be represented in type '__int128' #0 DecimalValue<__int128>::ScaleTo(int, int, int, bool*) const runtime/decimal-value.inline.h:145:12 #1 DecimalOperators::ScaleDecimalValue( impala_udf::FunctionContext*, DecimalValue<int> const&, int, int, int) exprs/decimal-operators-ir.cc:132:41 #2 DecimalOperators::RoundDecimal(impala_udf::FunctionContext*, impala_udf::DecimalVal const&, int, int, int, int, DecimalOperators::DecimalRoundOp const&) exprs/decimal-operators-ir.cc:465:16 #3 DecimalOperators::RoundDecimal(impala_udf::FunctionContext*, impala_udf::DecimalVal const&, DecimalOperators::DecimalRoundOp const&) exprs/decimal-operators-ir.cc:519:10 #4 DecimalOperators::CastToDecimalVal( impala_udf::FunctionContext*, impala_udf::DecimalVal const&) exprs/decimal-operators-ir.cc:529:10 #5 impala_udf::DecimalVal ScalarFnCall::InterpretEval <impala_udf::DecimalVal>(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:485:208 #6 ScalarFnCall::GetDecimalVal(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:618:44 #7 ScalarExprEvaluator::GetValue(ScalarExpr const&, TupleRow const*) exprs/scalar-expr-evaluator.cc:321:27 #8 ScalarExprEvaluator::GetValue(TupleRow const*) exprs/scalar-expr-evaluator.cc:251:10 #9 Java_org_apache_impala_service_FeSupport_NativeEvalExprsWithoutRow service/fe-support.cc:246:26 #10 (<unknown module>) runtime/multi-precision.h:116:21: runtime error: negation of 0x80000000000000000000000000000000 cannot be represented in type 'int128_t' (aka '__int128'); cast to an unsigned type to negate this value to itself #0 ConvertToInt128(boost::multiprecision::number <boost::multiprecision::backends::cpp_int_backend<256u, 256u, (boost::multiprecision::cpp_integer_type)1, (boost::multiprecision::cpp_int_check_type)0, void>, (boost::multiprecision::expression_template_option)0>, __int128, bool*) runtime/multi-precision.h:116:21 #1 DecimalValue<__int128> DecimalValue<__int128>::Multiply<__int128>(int, DecimalValue<__int128> const&, int, int, int, bool, bool*) const runtime/decimal-value.inline.h:438:16 #2 DecimalOperators::Multiply_DecimalVal_DecimalVal( impala_udf::FunctionContext*, impala_udf::DecimalVal const&, impala_udf::DecimalVal const&) exprs/decimal-operators-ir.cc:859:3336 #3 impala_udf::DecimalVal ScalarFnCall::InterpretEval <impala_udf::DecimalVal>(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:485:376 #4 ScalarFnCall::GetDecimalVal(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:618:44 #5 ScalarExprEvaluator::GetValue(ScalarExpr const&, TupleRow const*) exprs/scalar-expr-evaluator.cc:321:27 #6 ScalarExprEvaluator::GetValue(TupleRow const*) exprs/scalar-expr-evaluator.cc:251:10 #7 Java_org_apache_impala_service_FeSupport_NativeEvalExprsWithoutRow service/fe-support.cc:246:26 #8 (<unknown module>) util/runtime-profile-counters.h:194:24: runtime error: signed integer overflow: -1263418397011577524 + -9223370798768111350 cannot be represented in type 'long' #0 RuntimeProfile::AveragedCounter::UpdateCounter (RuntimeProfile::Counter*) util/runtime-profile-counters.h:194:24 #1 RuntimeProfile::UpdateAverage(RuntimeProfile*) util/runtime-profile.cc:199:20 #2 RuntimeProfile::UpdateAverage(RuntimeProfile*) util/runtime-profile.cc:245:14 #3 Coordinator::BackendState::UpdateExecStats (vector<Coordinator::FragmentStats*, allocator<Coordinator::FragmentStats*> > const&) runtime/coordinator-backend-state.cc:429:22 #4 Coordinator::ComputeQuerySummary() runtime/coordinator.cc:775:20 #5 Coordinator::HandleExecStateTransition(Coordinator::ExecState, Coordinator::ExecState) runtime/coordinator.cc:567:3 #6 Coordinator::SetNonErrorTerminalState(Coordinator::ExecState) runtime/coordinator.cc:484:3 #7 Coordinator::GetNext(QueryResultSet*, int, bool*) runtime/coordinator.cc:657:53 #8 ClientRequestState::FetchRowsInternal(int, QueryResultSet*) service/client-request-state.cc:943:34 #9 ClientRequestState::FetchRows(int, QueryResultSet*) service/client-request-state.cc:835:36 #10 ImpalaServer::FetchInternal(TUniqueId const&, bool, int, beeswax::Results*) service/impala-beeswax-server.cc:545:40 #11 ImpalaServer::fetch(beeswax::Results&, beeswax::QueryHandle const&, bool, int) service/impala-beeswax-server.cc:178:19 #12 beeswax::BeeswaxServiceProcessor::process_fetch(int, apache::thrift::protocol::TProtocol*, apache::thrift::protocol::TProtocol*, void*) generated-sources/gen-cpp/BeeswaxService.cpp:3398:13 #13 beeswax::BeeswaxServiceProcessor::dispatchCall (apache::thrift::protocol::TProtocol*, apache::thrift::protocol::TProtocol*, string const&, int, void*) generated-sources/gen-cpp/BeeswaxService.cpp:3200:3 #14 ImpalaServiceProcessor::dispatchCall (apache::thrift::protocol::TProtocol*, apache::thrift::protocol::TProtocol*, string const&, int, void*) generated-sources/gen-cpp/ImpalaService.cpp:1824:48 #15 apache::thrift::TDispatchProcessor::process (boost::shared_ptr<apache::thrift::protocol::TProtocol>, boost::shared_ptr<apache::thrift::protocol::TProtocol>, void*) toolchain/thrift-0.9.3-p5/include/thrift/TDispatchProcessor.h:121:12 Change-Id: I73dd6802ec1023275d09a99a2950f3558313fc8e Reviewed-on: http://gerrit.cloudera.org:8080/13437 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

memset has undefined behavior when its first argument is NULL. The instance fixed here was found by Clang's undefined behavior sanitizer. It was found in the end-to-end tests. The interesting part of the stack trace is: /exec/data-source-scan-node.cc:152:10: runtime error: null pointer passed as argument 1, which is declared to never be null /usr/include/string.h:62:79: note: nonnull attribute specified here #0 0x482fd8e in DataSourceScanNode::GetNextInputBatch() /exec/data-source-scan-node.cc:152:3 apache#1 0x482fb40 in DataSourceScanNode::Open(RuntimeState*) /exec/data-source-scan-node.cc:124:10 apache#2 0x47ef854 in AggregationNode::Open(RuntimeState*) /exec/aggregation-node.cc:71:49 apache#3 0x23506a4 in FragmentInstanceState::Open() /runtime/fragment-instance-state.cc:266:53 apache#4 0x234b6a8 in FragmentInstanceState::Exec() /runtime/fragment-instance-state.cc:81:12 apache#5 0x236ee52 in QueryState::ExecFInstance(FragmentInstanceState*) /runtime/query-state.cc:401:24 apache#6 0x237093e in QueryState::StartFInstances()::$_0::operator()() const /runtime/query-state.cc:341:44 Change-Id: I18fa02dc887a42a94c6f81e4923d17568f2184f2 Reviewed-on: http://gerrit.cloudera.org:8080/10948 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

memset has undefined behavior when its first argument is NULL. The instance fixed here was found by Clang's undefined behavior sanitizer. It was found in the end-to-end tests. The interesting part of the stack trace is: /exec/data-source-scan-node.cc:152:10: runtime error: null pointer passed as argument 1, which is declared to never be null /usr/include/string.h:62:79: note: nonnull attribute specified here #0 0x482fd8e in DataSourceScanNode::GetNextInputBatch() /exec/data-source-scan-node.cc:152:3 #1 0x482fb40 in DataSourceScanNode::Open(RuntimeState*) /exec/data-source-scan-node.cc:124:10 #2 0x47ef854 in AggregationNode::Open(RuntimeState*) /exec/aggregation-node.cc:71:49 #3 0x23506a4 in FragmentInstanceState::Open() /runtime/fragment-instance-state.cc:266:53 #4 0x234b6a8 in FragmentInstanceState::Exec() /runtime/fragment-instance-state.cc:81:12 #5 0x236ee52 in QueryState::ExecFInstance(FragmentInstanceState*) /runtime/query-state.cc:401:24 #6 0x237093e in QueryState::StartFInstances()::$_0::operator()() const /runtime/query-state.cc:341:44 Change-Id: I18fa02dc887a42a94c6f81e4923d17568f2184f2 Reviewed-on: http://gerrit.cloudera.org:8080/10948 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

This eliminates an out-of-range enum value in the end-to-end tests. The interesting part of the backtrace is: exec/parquet/parquet-column-readers.cc:1530:112: runtime error: load of value 38, which is not a valid value for type 'Encoding::type' #0 BaseScalarColumnReader::ReadDataPage() exec/parquet/parquet-column-readers.cc:1530:112 #1 BaseScalarColumnReader::NextPage() exec/parquet/parquet-column-readers.cc:1769:28 #2 bool ScalarColumnReader<long, (parquet::Type::type)2, true> ::ReadValueBatch<false>(int, int, unsigned char*, int*) exec/parquet/parquet-column-readers.cc:459:12 #3 ScalarColumnReader<long, (parquet::Type::type)2, true> ::ReadNonRepeatedValueBatch(MemPool*, int, int, unsigned char*, int*) exec/parquet/parquet-column-readers.cc:106:12 #4 HdfsParquetScanner::AssembleRows(vector<ParquetColumnReader*> const&, RowBatch*, bool*) exec/parquet/hdfs-parquet-scanner.cc:1113:42 #5 HdfsParquetScanner::GetNextInternal(RowBatch*) exec/parquet/hdfs-parquet-scanner.cc:456:19 #6 HdfsParquetScanner::ProcessSplit() exec/parquet/hdfs-parquet-scanner.cc:353:21 #7 HdfsScanNode::ProcessSplit(vector<FilterContext> const&, MemPool*, io::ScanRange*, long*) exec/hdfs-scan-node.cc:514:21 #8 HdfsScanNode::ScannerThread(bool, long) exec/hdfs-scan-node.cc:415:7 #9 HdfsScanNode::ThreadTokenAvailableCb(ThreadResourcePool*)::$_0 ::operator()() const exec/hdfs-scan-node.cc:337:13 Change-Id: Ia86de44daaf56a941fb95b15d5dfd7b5a2752129 Reviewed-on: http://gerrit.cloudera.org:8080/13804 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

According to [expr.post] in the C++14 standard, a call to a member function like a->b() is interpreted as (a->b)(). In other words, the dereferencing is done separately from the call. This makes calling member functions on nullptr undefined behavior, since the dereference invokes undefined behavior. This fixes an error in hdfs-scanner.h in the end-to-end tests. The interesting part of the backtrace is: exec/hdfs-scanner.h:512:14: runtime error: member call on null pointer of type 'Tuple' #0 HdfsScanner::InitTuple(TupleDescriptor const*, Tuple*, Tuple*) exec/hdfs-scanner.h:512:14 #1 HdfsOrcScanner::AssembleCollection(OrcComplexColumnReader const&, int, CollectionValueBuilder*) exec/hdfs-orc-scanner.cc:743:7 #2 OrcCollectionReader::ReadValue(int, Tuple*, MemPool*) exec/orc-column-readers.cc:375:20 #3 OrcStructReader::ReadValue(int, Tuple*, MemPool*) exec/orc-column-readers.cc:322:52 #4 OrcStructReader::ReadValue(int, Tuple*, MemPool*) exec/orc-column-readers.cc:322:52 #5 OrcStructReader::TransferTuple(Tuple*, MemPool*) exec/orc-column-readers.cc:346:52 #6 HdfsOrcScanner::TransferTuples(OrcComplexColumnReader*, RowBatch*) exec/hdfs-orc-scanner.cc:670:58 #7 HdfsOrcScanner::AssembleRows(RowBatch*) exec/hdfs-orc-scanner.cc:630:45 #8 HdfsOrcScanner::GetNextInternal(RowBatch*) exec/hdfs-orc-scanner.cc:508:19 #9 HdfsOrcScanner::ProcessSplit() exec/hdfs-orc-scanner.cc:427:21 #10 HdfsScanNode::ProcessSplit(vector<FilterContext> const&, MemPool*, io::ScanRange*, long*) exec/hdfs-scan-node.cc:514:21 #11 HdfsScanNode::ScannerThread(bool, long) exec/hdfs-scan-node.cc:415:7 #12 HdfsScanNode::ThreadTokenAvailableCb(ThreadResourcePool*)::$_0 ::operator()() const exec/hdfs-scan-node.cc:337:13 Change-Id: I7e5b130848a3c8f11d9010b3378f4054a35e1612 Reviewed-on: http://gerrit.cloudera.org:8080/13803 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

According to [expr.post] in the C++14 standard, a call to a member function like a->b() is interpreted as (a->b)(). In other words, the dereferencing is done separately from the call. This makes calling member functions on nullptr undefined behavior, since the dereference invokes undefined behavior. This fixes such an error in exec-node.cc in the end-to-end tests. The interesting part of the backtrace is: exec/exec-node.cc:396:27: runtime error: member call on null pointer of type 'MemTracker' #0 in ExecNode::ExecDebugActionImpl(TExecNodePhase::type, RuntimeState*) exec/exec-node.cc:396:27 #1 in ExecNode::ExecDebugAction(TExecNodePhase::type, RuntimeState*) exec/exec-node.h:379:12 #2 in ExecNode::Prepare(RuntimeState*) exec/exec-node.cc:106:43 #3 in TopNNode::Prepare(RuntimeState*) exec/topn-node.cc:75:53 Change-Id: Id62d1c504a273451dc1be6831a473f6c7115b403 Reviewed-on: http://gerrit.cloudera.org:8080/13769 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

This patch fixes an out-of-range enum value in the end-to-end tests. The [expr] section of the C++14 standard indicates that out-of-range enum values are undefined behavior: "If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined." The [decl.enum] section explains what values are "in the range of representable values for its type": "[F]or an enumeration where emin is the smallest enumerator and emax is the largest, the values of the enumeration are the values in the range bmin to bmax, defined as follows: Let K be 1 for a two's complement representation and 0 for a one's complement or sign-magnitude representation. bmax is the smallest value greater than or equal to max(|emin| - K, |emax|) and equal to 2M-1, where M is a non-negative integer. bmin is zero if emin is non-negative and -(bmax+K) otherwise." The Parquet PageType enum has emin = 0 and emax = 3, so bmin = 0 and bmax = 3. The out-of-range value in the tests is 4294967249, and is therefore undefined behavior. The interesting part of the backtrace is: parquet/parquet-column-readers.cc:1269:24: runtime error: load of value 4294967249, which is not a valid value for type 'PageType::type' #0 BaseScalarColumnReader::InitDictionary() parquet/parquet-column-readers.cc:1269:24 #1 BaseScalarColumnReader::InitDictionaries( vector<BaseScalarColumnReader*>) parquet/parquet-column-readers.cc:1381:53 #2 HdfsParquetScanner::NextRowGroup() parquet/hdfs-parquet-scanner.cc:678:14 #3 HdfsParquetScanner::GetNextInternal(RowBatch*) parquet/hdfs-parquet-scanner.cc:437:45 #4 HdfsParquetScanner::ProcessSplit() parquet/hdfs-parquet-scanner.cc:353:21 #5 HdfsScanNode::ProcessSplit(vector<FilterContext> const&, MemPool*, io::ScanRange*, long*) exec/hdfs-scan-node.cc:514:21 #6 HdfsScanNode::ScannerThread(bool, long) hdfs-scan-node.cc:415:7 #7 HdfsScanNode::ThreadTokenAvailableCb(ThreadResourcePool*) ::$_0::operator()() const hdfs-scan-node.cc:337:13 Change-Id: I2d126a8f3e5910d23088a3f916c4cf31aac28d95 Reviewed-on: http://gerrit.cloudera.org:8080/13805 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

In expr.add, the standard says: When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. ... If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. This is triggered in the end-to-end tests.h The interesting part of the backtrace is: exec/parquet/hdfs-parquet-scanner.cc:1405:45: runtime error: pointer index expression with base 0x00001300e0e9 overflowed to 0xffffffff1300e0ea #0 HdfsParquetScanner::ProcessFooter() exec/parquet/hdfs-parquet-scanner.cc:1405:45 #1 HdfsParquetScanner::Open(ScannerContext*) exec/parquet/hdfs-parquet-scanner.cc:186:26 #2 HdfsScanNodeBase::CreateAndOpenScannerHelper( HdfsPartitionDescriptor*, ScannerContext*, scoped_ptr<HdfsScanner>*) exec/hdfs-scan-node-base.cc:721:59 #3 HdfsScanNodeMt::CreateAndOpenScanner(HdfsPartitionDescriptor*, ScannerContext*, scoped_ptr<HdfsScanner>*) exec/hdfs-scan-node-mt.cc:127:19 #4 HdfsScanNodeMt::GetNext(RuntimeState*, RowBatch*, bool*) exec/hdfs-scan-node-mt.cc:97:21 Change-Id: I81c7db75b564045106edf3d46e2c4a62be77359f Reviewed-on: http://gerrit.cloudera.org:8080/13889 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

This fixes an instance of undefined behavior in the end-to-end tests in which an enum value is outside of the allowable values for that enum according to the C++14 standard. Representative backtrace: exec/parquet/parquet-metadata-utils.cc:293:26: runtime error: load of value 49, which is not a valid value for type 'Type::type' #0 ParquetMetadataUtils::ValidateRowGroupColumn( parquet::FileMetaData const&, char const*, int, int, parquet::SchemaElement const&, RuntimeState*) exec/parquet/parquet-metadata-utils.cc:293:26 #1 BaseScalarColumnReader::Reset(HdfsFileDesc const&, parquet::ColumnChunk const&, int) exec/parquet/parquet-column-readers.cc:1077:43 #2 HdfsParquetScanner::InitScalarColumns() exec/parquet/hdfs-parquet-scanner.cc:1679:60 #3 HdfsParquetScanner::NextRowGroup() exec/parquet/hdfs-parquet-scanner.cc:648:45 #4 HdfsParquetScanner::GetNextInternal(RowBatch*) exec/parquet/hdfs-parquet-scanner.cc:437:45 #5 HdfsParquetScanner::ProcessSplit() exec/parquet/hdfs-parquet-scanner.cc:353:21 #6 HdfsScanNode::ProcessSplit(vector<FilterContext> const&, MemPool*, io::ScanRange*, long*) exec/hdfs-scan-node.cc:514:21 #7 HdfsScanNode::ScannerThread(bool, long) exec/hdfs-scan-node.cc:415:7 #8 HdfsScanNode::ThreadTokenAvailableCb(ThreadResourcePool*):: $_0::operator()() const exec/hdfs-scan-node.cc:337:13 Change-Id: I48090e8e0c6c6f18bb1ad3c32c1f5fbffc908844 Reviewed-on: http://gerrit.cloudera.org:8080/13940 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

The error could occur in the following scenario, where thread A is executing a join build fragment and thread B is cancelling the fragment instance. 1. Thread A is in HandoffToProbesAndWait(), reads is_cancelled_ and sees false. 2. Thread B in RuntimeState::Cancel() sets is_cancelled_ = true, acquires cancellation_cvs_lock_, then calls NotifyAll() on the condition variable 3. Thread A calls Wait() on the condition variable, blocks forever because cancellation already happened. The fix is for thread B to acquire the lock that thread A is holding. That prevents the race because #1 and #3 above are in the same critical section and thread B won't be able to signal the condition variable until thread A has released it. Testing: Added metric check to test_failpoints to make it easier to detect hangs caused by those tests in future. Looped test_failpoints.py overnight, which was previously enough to reproduce the failure within a couple of hours. Ran exhaustive tests. Change-Id: I996ad2055d6542eb57e12c663b89de5f84208f77 Reviewed-on: http://gerrit.cloudera.org:8080/15672 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

Fixes the following TSAN data races that come up when running custom cluster tests. The immediate goal is to fix all remaining data races in custom cluster tests and then enable custom cluster tests in the TSAN builds. This patch fixes about half of the remaining data races reported during a TSAN build of custom cluster tests. SUMMARY: ThreadSanitizer: data race util/stopwatch.h:186:9 in impala::MonotonicStopWatch::RunningTime() const Read of size 8 at 0x7b580000dba8 by thread T342: #0 impala::MonotonicStopWatch::RunningTime() const util/stopwatch.h:186:9 #1 impala::MonotonicStopWatch::Reset() util/stopwatch.h:136:20 #2 impala::StatestoreSubscriber::Heartbeat(impala::TUniqueId const&) statestore/statestore-subscriber.cc:358:35 Previous write of size 8 at 0x7b580000dba8 by thread T341: #0 impala::MonotonicStopWatch::Reset() util/stopwatch.h:139:21 (impalad+0x1f744ab) #1 impala::StatestoreSubscriber::Heartbeat(impala::TUniqueId const&) statestore/statestore-subscriber.cc:358:35 SUMMARY: ThreadSanitizer: data race status.h:220:10 in impala::Status::operator=(impala::Status&&) Write of size 8 at 0x7b50002e01e0 by thread T341 (mutexes: write M17919): #0 impala::Status::operator=(impala::Status&&) common/status.h:220:10 #1 impala::RuntimeState::SetQueryStatus(std::string const&) runtime/runtime-state.h:250 #2 impala_udf::FunctionContext::SetError(char const*) udf/udf.cc:423:47 Previous read of size 8 at 0x7b50002e01e0 by thread T342: #0 impala::Status::ok() const common/status.h:236:42 #1 impala::RuntimeState::GetQueryStatus() runtime/runtime-state.h:15 #2 impala::HdfsScanner::CommitRows(int, impala::RowBatch*) exec/hdfs-scanner.cc:218:3 SUMMARY: ThreadSanitizer: data race hashtable.h:370:58 Read of size 8 at 0x7b2400091df8 by thread T338 (mutexes: write M106814410723061456): ... #3 impala::MetricGroup::CMCompatibleCallback() util/metrics.cc:185:40 ... #9 impala::Webserver::RenderUrlWithTemplate() util/webserver.cc:801:3 #10 impala::Webserver::BeginRequestCallback(sq_connection*, sq_request_info*) util/webserver.cc:696:5 Previous write of size 8 at 0x7b2400091df8 by thread T364 (mutexes: write M600803201008047112, write M1046659357959855584): ... #4 impala::AtomicMetric<(impala::TMetricKind::type)0>* impala::MetricGroup::RegisterMetric<> >() util/metrics.h:366:5 #5 impala::MetricGroup::AddGauge(std::string const&, long, std::string const&) util/metrics.h:384:12 #6 impala::AdmissionController::PoolStats::InitMetrics() scheduling/admission-controller.cc:1714:55 Testing: * Ran core tests * Re-ran TSAN tests and made sure issues were resolved * Ran single_node_perf_run for workload TPC-H scale factor 30; no regressions detected +----------+-----------------------+---------+------------+------------+----------------+ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +----------+-----------------------+---------+------------+------------+----------------+ | TPCH(30) | parquet / none / none | 7.36 | -1.77% | 5.01 | -1.61% | +----------+-----------------------+---------+------------+------------+----------------+ Change-Id: Id4244c9a7f971c96b8b8dc7d5262904a0a4b77c1 Reviewed-on: http://gerrit.cloudera.org:8080/16079 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

This patch fixes the remaining TSAN errors reported while running custom cluster tests. After this patch, TSAN can be enabled for custom cluster tests (currently it is only run for be tests). Adds a data race suppression for HdfsColumnarScanner::ProcessScratchBatchCodegenOrInterpret, which usually calls a codegen function. TSAN currently does not support codegen functions, so this warning needs to be suppressed. The call stack of this warning is: #0 kudu::BlockBloomFilter::Find(unsigned int) const kudu/util/block_bloom_filter.cc:257:7 #1 <null> <null> (0x7f19af1c74cd) #2 impala::HdfsColumnarScanner::ProcessScratchBatchCodegenOrInterpret(impala::RowBatch*) exec/hdfs-columnar-scanner.cc:106:10 #3 impala::HdfsColumnarScanner::TransferScratchTuples(impala::RowBatch*) exec/hdfs-columnar-scanner.cc:66:34 Fixes a data race in DmlExecState::FinalizeHdfsInsert where a local HdfsFsCache::HdfsFsMap is unsafely passed between threads of a HdfsOperationSet. HdfsOperationSet instances are run in a HdfsOpThreadPool and each operation is run in one of the threads from the pool. Each operation uses HdfsFsCache::GetConnection to get a hdfsFs instance. GetConnection can take in a 'local_cache' of hdfsFs instances before using the global map. The race condition is that the same local cache is used for all operations in HdfsOperationSet. Testing: * Re-ran TSAN tests and confirmed the data races have disappeared Change-Id: If1658a9b56d220e2cfd1f8b958604edcdf7757f4 Reviewed-on: http://gerrit.cloudera.org:8080/16426 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

Fixes the following TSAN data races that come up when running custom cluster tests. The immediate goal is to fix all remaining data races in custom cluster tests and then enable custom cluster tests in the TSAN builds. This patch fixes about half of the remaining data races reported during a TSAN build of custom cluster tests. SUMMARY: ThreadSanitizer: data race util/stopwatch.h:186:9 in impala::MonotonicStopWatch::RunningTime() const Read of size 8 at 0x7b580000dba8 by thread T342: #0 impala::MonotonicStopWatch::RunningTime() const util/stopwatch.h:186:9 #1 impala::MonotonicStopWatch::Reset() util/stopwatch.h:136:20 apache#2 impala::StatestoreSubscriber::Heartbeat(impala::TUniqueId const&) statestore/statestore-subscriber.cc:358:35 Previous write of size 8 at 0x7b580000dba8 by thread T341: #0 impala::MonotonicStopWatch::Reset() util/stopwatch.h:139:21 (impalad+0x1f744ab) #1 impala::StatestoreSubscriber::Heartbeat(impala::TUniqueId const&) statestore/statestore-subscriber.cc:358:35 SUMMARY: ThreadSanitizer: data race status.h:220:10 in impala::Status::operator=(impala::Status&&) Write of size 8 at 0x7b50002e01e0 by thread T341 (mutexes: write M17919): #0 impala::Status::operator=(impala::Status&&) common/status.h:220:10 #1 impala::RuntimeState::SetQueryStatus(std::string const&) runtime/runtime-state.h:250 apache#2 impala_udf::FunctionContext::SetError(char const*) udf/udf.cc:423:47 Previous read of size 8 at 0x7b50002e01e0 by thread T342: #0 impala::Status::ok() const common/status.h:236:42 #1 impala::RuntimeState::GetQueryStatus() runtime/runtime-state.h:15 apache#2 impala::HdfsScanner::CommitRows(int, impala::RowBatch*) exec/hdfs-scanner.cc:218:3 SUMMARY: ThreadSanitizer: data race hashtable.h:370:58 Read of size 8 at 0x7b2400091df8 by thread T338 (mutexes: write M106814410723061456): ... apache#3 impala::MetricGroup::CMCompatibleCallback() util/metrics.cc:185:40 ... apache#9 impala::Webserver::RenderUrlWithTemplate() util/webserver.cc:801:3 apache#10 impala::Webserver::BeginRequestCallback(sq_connection*, sq_request_info*) util/webserver.cc:696:5 Previous write of size 8 at 0x7b2400091df8 by thread T364 (mutexes: write M600803201008047112, write M1046659357959855584): ... apache#4 impala::AtomicMetric<(impala::TMetricKind::type)0>* impala::MetricGroup::RegisterMetric<> >() util/metrics.h:366:5 apache#5 impala::MetricGroup::AddGauge(std::string const&, long, std::string const&) util/metrics.h:384:12 apache#6 impala::AdmissionController::PoolStats::InitMetrics() scheduling/admission-controller.cc:1714:55 Testing: * Ran core tests * Re-ran TSAN tests and made sure issues were resolved * Ran single_node_perf_run for workload TPC-H scale factor 30; no regressions detected +----------+-----------------------+---------+------------+------------+----------------+ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +----------+-----------------------+---------+------------+------------+----------------+ | TPCH(30) | parquet / none / none | 7.36 | -1.77% | 5.01 | -1.61% | +----------+-----------------------+---------+------------+------------+----------------+ Change-Id: Id4244c9a7f971c96b8b8dc7d5262904a0a4b77c1 Reviewed-on: http://gerrit.cloudera.org:8080/16079 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

This patch fixes the remaining TSAN errors reported while running custom cluster tests. After this patch, TSAN can be enabled for custom cluster tests (currently it is only run for be tests). Adds a data race suppression for HdfsColumnarScanner::ProcessScratchBatchCodegenOrInterpret, which usually calls a codegen function. TSAN currently does not support codegen functions, so this warning needs to be suppressed. The call stack of this warning is: #0 kudu::BlockBloomFilter::Find(unsigned int) const kudu/util/block_bloom_filter.cc:257:7 #1 <null> <null> (0x7f19af1c74cd) apache#2 impala::HdfsColumnarScanner::ProcessScratchBatchCodegenOrInterpret(impala::RowBatch*) exec/hdfs-columnar-scanner.cc:106:10 apache#3 impala::HdfsColumnarScanner::TransferScratchTuples(impala::RowBatch*) exec/hdfs-columnar-scanner.cc:66:34 Fixes a data race in DmlExecState::FinalizeHdfsInsert where a local HdfsFsCache::HdfsFsMap is unsafely passed between threads of a HdfsOperationSet. HdfsOperationSet instances are run in a HdfsOpThreadPool and each operation is run in one of the threads from the pool. Each operation uses HdfsFsCache::GetConnection to get a hdfsFs instance. GetConnection can take in a 'local_cache' of hdfsFs instances before using the global map. The race condition is that the same local cache is used for all operations in HdfsOperationSet. Testing: * Re-ran TSAN tests and confirmed the data races have disappeared Change-Id: If1658a9b56d220e2cfd1f8b958604edcdf7757f4 Reviewed-on: http://gerrit.cloudera.org:8080/16426 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

In the current design, catalogd cache gets updated from 2 sources: 1. Impala shell 2. MetastoreEventProcessor The updates from the Impala shell are applied in place whereas MetastoreEventProcessor runs as a background thread which polls HMS events from notifications log table and apply them asynchronously. These two stream of updates cause consistency issues. For example consider the following sequence of alter table events on a table t1 as per HMS: 1. alter table t1 from source s1 say other Impala cluster 2. alter table t1 from source s2 say other Hive cluster 3. alter table t1 from local Impala cluster The apache#3 alter table ddl operation would get reflected in the local cache immediately. However, later on event processor would process events from apache#1 and apache#2 above and try to alter the table. In an ideal scenario, these alters should have been applied before apache#3 i.e in the same order as they appear in HMS notification log. This leaves table t1 in an inconsistent state. Proposed solution: The main idea of the solution is to keep track of the last event id for a given table as eventId which the catalogd has synced to in the Db/Table object. The events processor ignores any event whose EVENT_ID is less than or equal to the eventId stored in the table. Once the events processor successfully processes a given event, it updates the value of eventId in the table before releasing the table lock. Also, any DDL or refresh operation on the catalogd from both catalog HMS metastore server and Impala shell) will follow the following steps to update the event id for the table: 1. Acquire write lock on the table 2. Perform ddl operation in HMS 3. Sync table till the latest event id (as per HMS) since its last synced event id The above steps ensure that any concurrent updates applied on a same db/table from multiple sources like Hive, Impala or say multiple Impala clusters, get reflected in the local catalogd cache in the same order as they appear in HMS thus removing any inconsistencies. Also the solution relies on the existing locking mechanism in the catalogd to prevent any other concurrent updates to the table (even via EventsProcessor). In case of database objects, we will also have a similar eventId which represents the events on the database object (CREATE, DROP, ALTER database) to which the catalogd as synced to. This patch addresses the following: 1. Add a new flag enable_sync_to_latest_event_on_ddls to enable/disable this improvement. It is turned off by default. 2. If flag in apache#1 is enabled then apart from Impala shell and MetastoreEventProcessor the cache would also get updated for ddls executed via catalog HMS endpoints. And while excuting a ddl, db/table will be synced till latest event id. 3. Event processor skips processing an event if db/table is already synced till that event id. Sets that event id in db/table if the event is processed. 4. When EventProcessor detects a self event, it sets the last synced event id in db/table before skipping the processing of an event. 5. Full table refresh sets the last event processed in table cache. Future Work: 1. Sync db/table to latest event id for ddls executed from Impala shell (execDdlRequest() in catalogOpExecutor) Testing: 1. Added new unit tests and modified existing ones 2. Ran exhaustive tests with flag both turned on and off Change-Id: I36364e401911352c4666674eb98c8d61bbaae9b9

In the current design, catalogd cache gets updated from 2 sources: 1. Impala shell 2. MetastoreEventProcessor The updates from the Impala shell are applied in place whereas MetastoreEventProcessor runs as a background thread which polls HMS events from notifications log table and apply them asynchronously. These two stream of updates cause consistency issues. For example consider the following sequence of alter table events on a table t1 as per HMS: 1. alter table t1 from source s1 say other Impala cluster 2. alter table t1 from source s2 say other Hive cluster 3. alter table t1 from local Impala cluster The #3 alter table ddl operation would get reflected in the local cache immediately. However, later on event processor would process events from #1 and #2 above and try to alter the table. In an ideal scenario, these alters should have been applied before #3 i.e in the same order as they appear in HMS notification log. This leaves table t1 in an inconsistent state. Proposed solution: The main idea of the solution is to keep track of the last event id for a given table as eventId which the catalogd has synced to in the Db/Table object. The events processor ignores any event whose EVENT_ID is less than or equal to the eventId stored in the table. Once the events processor successfully processes a given event, it updates the value of eventId in the table before releasing the table lock. Also, any DDL or refresh operation on the catalogd from both catalog HMS metastore server and Impala shell) will follow the following steps to update the event id for the table: 1. Acquire write lock on the table 2. Perform ddl operation in HMS 3. Sync table till the latest event id (as per HMS) since its last synced event id The above steps ensure that any concurrent updates applied on a same db/table from multiple sources like Hive, Impala or say multiple Impala clusters, get reflected in the local catalogd cache in the same order as they appear in HMS thus removing any inconsistencies. Also the solution relies on the existing locking mechanism in the catalogd to prevent any other concurrent updates to the table (even via EventsProcessor). In case of database objects, we will also have a similar eventId which represents the events on the database object (CREATE, DROP, ALTER database) to which the catalogd as synced to. This patch addresses the following: 1. Add a new flag enable_sync_to_latest_event_on_ddls to enable/disable this improvement. It is turned off by default. 2. If flag in #1 is enabled then apart from Impala shell and MetastoreEventProcessor the cache would also get updated for ddls executed via catalog HMS endpoints. And while excuting a ddl, db/table will be synced till latest event id. 3. Event processor skips processing an event if db/table is already synced till that event id. Sets that event id in db/table if the event is processed. 4. When EventProcessor detects a self event, it sets the last synced event id in db/table before skipping the processing of an event. 5. Full table refresh sets the last event processed in table cache. Future Work: 1. Sync db/table to latest event id for ddls executed from Impala shell (execDdlRequest() in catalogOpExecutor) Testing: 1. Added new unit tests and modified existing ones 2. Ran exhaustive tests with flag both turned on and off Change-Id: I36364e401911352c4666674eb98c8d61bbaae9b9 Reviewed-on: http://gerrit.cloudera.org:8080/17859 Reviewed-by: Vihang Karajgaonkar <vihang@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

This was found in ExprTest.LiteralExprs. The rules vioalted are: 1. "reference binding to null pointer of type 'long'". This violates the standard's [dcl.ref]: a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the "object" obtained by indirection through a null pointer, which causes undefined behavior. The interesting part of the backtrace is: #1 RuntimeProfile::TimeSeriesCounter::ToThrift(TTimeSeriesCounter*) util/runtime-profile.cc:1117:11 apache#2 RuntimeProfile::ToThrift(std::vector<TRuntimeProfileNode>*) const util/runtime-profile.cc:905:21 apache#3 RuntimeProfile::ToThrift(TRuntimeProfileTree*) const util/runtime-profile.cc:847:3 apache#4 QueryState::ReportExecStatusAux(bool, Status const&, FragmentInstanceState*, bool) runtime/query-state.cc:281:21 apache#5 QueryState::ReportExecStatus(bool, Status const&, FragmentInstanceState*) runtime/query-state.cc:250:3 apache#6 FragmentInstanceState::SendReport(bool, Status const&) runtime/fragment-instance-state.cc:406:17 apache#7 FragmentInstanceState::Finalize(Status const&) runtime/fragment-instance-state.cc:496:3 2. The use of a null pointer when calling memcpy. According to "7.1.4 Use of library functions" in the C99 standard (which is included in C++14 in section [intro.refs]: If an argument to a function has an invalid value (such as a value outside the domain of the function, or a pointer outside the address space of the program, or a null pointer, or a pointer to non-modifiable storage when the corresponding parameter is not const-qualified) or a type (after promotion) not expected by a function with variable number of arguments, the behavior is undefined. The interesting part of the backtrace is the same as above. Change-Id: I3c8a6624918389396789a83b32dbf068b9327f76 Reviewed-on: http://gerrit.cloudera.org:8080/11195 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

This patch fixes all remaining UBSAN "null pointer passed as argument" errors in the backend tests. These are undefined behavior according to "7.1.4 Use of library functions" in the C99 standard (which is included in C++14 in section [intro.refs]): If an argument to a function has an invalid value (such as a value outside the domain of the function, or a pointer outside the address space of the program, or a null pointer, or a pointer to non-modifiable storage when the corresponding parameter is not const-qualified) or a type (after promotion) not expected by a function with variable number of arguments, the behavior is undefined. The interesting parts of the backtraces for the errors fixed in this patch are below: exprs/string-functions-ir.cc:311:17: runtime error: null pointer passed as argument 2, which is declared to never be null /usr/include/string.h:43:45: note: nonnull attribute specified here #0 StringFunctions::Replace(impala_udf::FunctionContext*, impala_udf::StringVal const&, impala_udf::StringVal const&, impala_udf::StringVal const&) exprs/string-functions-ir.cc:311:5 #1 impala_udf::StringVal ScalarFnCall::InterpretEval<impala_udf::StringVal>(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:485:580 apache#2 ScalarFnCall::GetStringVal(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:599:44 apache#3 ScalarExprEvaluator::GetValue(ScalarExpr const&, TupleRow const*) exprs/scalar-expr-evaluator.cc:299:38 apache#4 ScalarExprEvaluator::GetValue(TupleRow const*) exprs/scalar-expr-evaluator.cc:250:10 apache#5 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, ScalarExprEvaluator* const*, MemPool*, StringValue**, int*, int*) runtime/tuple.cc:222:27 apache#6 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, vector<ScalarExprEvaluator*> const&, MemPool*, vector<StringValue*>*, int*) runtime/tuple.h:174:5 apache#7 UnionNode::MaterializeExprs(vector<ScalarExprEvaluator*> const&, TupleRow*, unsigned char*, RowBatch*) exec/union-node-ir.cc:29:14 apache#8 UnionNode::GetNextConst(RuntimeState*, RowBatch*) exec/union-node.cc:263:5 apache#9 UnionNode::GetNext(RuntimeState*, RowBatch*, bool*) exec/union-node.cc:296:45 apache#10 FragmentInstanceState::ExecInternal() runtime/fragment-instance-state.cc:310:59 apache#11 FragmentInstanceState::Exec() runtime/fragment-instance-state.cc:95:14 apache#12 QueryState::ExecFInstance(FragmentInstanceState*) runtime/query-state.cc:488:24 apache#13 QueryState::StartFInstances()::$_0::operator()() const runtime/query-state.cc:416:35 apache#20 thread_proxy (exprs/expr-test+0x55ca939) exprs/string-functions-ir.cc:868:15: runtime error: null pointer passed as argument 2, which is declared to never be null /usr/include/string.h:43:45: note: nonnull attribute specified here #0 StringFunctions::ConcatWs(impala_udf::FunctionContext*, impala_udf::StringVal const&, int, impala_udf::StringVal const*) exprs/string-functions-ir.cc:868:3 #1 impala_udf::StringVal ScalarFnCall::InterpretEval<impala_udf::StringVal>(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:510:270 apache#2 ScalarFnCall::GetStringVal(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:599:44 apache#3 ScalarExprEvaluator::GetValue(ScalarExpr const&, TupleRow const*) exprs/scalar-expr-evaluator.cc:299:38 apache#4 ScalarExprEvaluator::GetValue(TupleRow const*) exprs/scalar-expr-evaluator.cc:250:10 apache#5 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, ScalarExprEvaluator* const*, MemPool*, StringValue**, int*, int*) runtime/tuple.cc:222:27 apache#6 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, vector<ScalarExprEvaluator*> const&, MemPool*, vector<StringValue*>*, int*) runtime/tuple.h:174:5 apache#7 UnionNode::MaterializeExprs(vector<ScalarExprEvaluator*> const&, TupleRow*, unsigned char*, RowBatch*) exec/union-node-ir.cc:29:14 apache#8 UnionNode::GetNextConst(RuntimeState*, RowBatch*) exec/union-node.cc:263:5 apache#9 UnionNode::GetNext(RuntimeState*, RowBatch*, bool*) exec/union-node.cc:296:45 apache#10 FragmentInstanceState::ExecInternal() runtime/fragment-instance-state.cc:310:59 apache#11 FragmentInstanceState::Exec() runtime/fragment-instance-state.cc:95:14 apache#12 QueryState::ExecFInstance(FragmentInstanceState*) runtime/query-state.cc:488:24 apache#13 QueryState::StartFInstances()::$_0::operator()() const runtime/query-state.cc:416:35 apache#20 thread_proxy (exprs/expr-test+0x55ca939) exprs/string-functions-ir.cc:871:17: runtime error: null pointer passed as argument 2, which is declared to never be null /usr/include/string.h:43:45: note: nonnull attribute specified here #0 StringFunctions::ConcatWs(impala_udf::FunctionContext*, impala_udf::StringVal const&, int, impala_udf::StringVal const*) exprs/string-functions-ir.cc:871:5 #1 StringFunctions::Concat(impala_udf::FunctionContext*, int, impala_udf::StringVal const*) exprs/string-functions-ir.cc:843:10 apache#2 impala_udf::StringVal ScalarFnCall::InterpretEval<impala_udf::StringVal>(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:510:95 apache#3 ScalarFnCall::GetStringVal(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:599:44 apache#4 ScalarExprEvaluator::GetValue(ScalarExpr const&, TupleRow const*) exprs/scalar-expr-evaluator.cc:299:38 apache#5 ScalarExprEvaluator::GetValue(TupleRow const*) exprs/scalar-expr-evaluator.cc:250:10 apache#6 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, ScalarExprEvaluator* const*, MemPool*, StringValue**, int*, int*) runtime/tuple.cc:222:27 apache#7 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, vector<ScalarExprEvaluator*> const&, MemPool*, vector<StringValue*>*, int*) runtime/tuple.h:174:5 apache#8 UnionNode::MaterializeExprs(vector<ScalarExprEvaluator*> const&, TupleRow*, unsigned char*, RowBatch*) exec/union-node-ir.cc:29:14 apache#9 UnionNode::GetNextConst(RuntimeState*, RowBatch*) exec/union-node.cc:263:5 apache#10 UnionNode::GetNext(RuntimeState*, RowBatch*, bool*) exec/union-node.cc:296:45 apache#11 FragmentInstanceState::ExecInternal() runtime/fragment-instance-state.cc:310:59 apache#12 FragmentInstanceState::Exec() runtime/fragment-instance-state.cc:95:14 apache#13 QueryState::ExecFInstance(FragmentInstanceState*) runtime/query-state.cc:488:24 apache#14 QueryState::StartFInstances()::$_0::operator()() const runtime/query-state.cc:416:35 apache#21 thread_proxy (exprs/expr-test+0x55ca939) exprs/string-functions-ir.cc:873:17: runtime error: null pointer passed as argument 2, which is declared to never be null /usr/include/string.h:43:45: note: nonnull attribute specified here #0 StringFunctions::ConcatWs(impala_udf::FunctionContext*, impala_udf::StringVal const&, int, impala_udf::StringVal const*) exprs/string-functions-ir.cc:873:5 #1 StringFunctions::Concat(impala_udf::FunctionContext*, int, impala_udf::StringVal const*) exprs/string-functions-ir.cc:843:10 apache#2 impala_udf::StringVal ScalarFnCall::InterpretEval<impala_udf::StringVal>(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:510:95 apache#3 ScalarFnCall::GetStringVal(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:599:44 apache#4 ScalarExprEvaluator::GetValue(ScalarExpr const&, TupleRow const*) exprs/scalar-expr-evaluator.cc:299:38 apache#5 ScalarExprEvaluator::GetValue(TupleRow const*) exprs/scalar-expr-evaluator.cc:250:10 apache#6 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, ScalarExprEvaluator* const*, MemPool*, StringValue**, int*, int*) runtime/tuple.cc:222:27 apache#7 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, vector<ScalarExprEvaluator*> const&, MemPool*, vector<StringValue*>*, int*) runtime/tuple.h:174:5 apache#8 UnionNode::MaterializeExprs(vector<ScalarExprEvaluator*> const&, TupleRow*, unsigned char*, RowBatch*) exec/union-node-ir.cc:29:14 apache#9 UnionNode::GetNextConst(RuntimeState*, RowBatch*) exec/union-node.cc:263:5 apache#10 UnionNode::GetNext(RuntimeState*, RowBatch*, bool*) exec/union-node.cc:296:45 apache#11 FragmentInstanceState::ExecInternal() runtime/fragment-instance-state.cc:310:59 apache#12 FragmentInstanceState::Exec() runtime/fragment-instance-state.cc:95:14 apache#13 QueryState::ExecFInstance(FragmentInstanceState*) runtime/query-state.cc:488:24 apache#14 QueryState::StartFInstances()::$_0::operator()() const runtime/query-state.cc:416:35 apache#21 thread_proxy (exprs/expr-test+0x55ca939) runtime/raw-value.cc:159:27: runtime error: null pointer passed as argument 2, which is declared to never be null /usr/include/string.h:43:45: note: nonnull attribute specified here #0 RawValue::Write(void const*, void*, ColumnType const&, MemPool*) runtime/raw-value.cc:159:9 #1 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, ScalarExprEvaluator* const*, MemPool*, StringValue**, int*, int*) runtime/tuple.cc:225:7 apache#2 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, vector<ScalarExprEvaluator*> const&, MemPool*, vector<StringValue*>*, int*) runtime/tuple.h:174:5 apache#3 UnionNode::MaterializeExprs(vector<ScalarExprEvaluator*> const&, TupleRow*, unsigned char*, RowBatch*) exec/union-node-ir.cc:29:14 apache#4 UnionNode::GetNextConst(RuntimeState*, RowBatch*) exec/union-node.cc:263:5 apache#5 UnionNode::GetNext(RuntimeState*, RowBatch*, bool*) exec/union-node.cc:296:45 apache#6 FragmentInstanceState::ExecInternal() runtime/fragment-instance-state.cc:310:59 apache#7 FragmentInstanceState::Exec() runtime/fragment-instance-state.cc:95:14 apache#8 QueryState::ExecFInstance(FragmentInstanceState*) runtime/query-state.cc:488:24 apache#9 QueryState::StartFInstances()::$_0::operator()() const runtime/query-state.cc:416:35 apache#16 thread_proxy (exprs/expr-test+0x55ca939) udf/udf.cc:521:24: runtime error: null pointer passed as argument 2, which is declared to never be null /usr/include/string.h:43:45: note: nonnull attribute specified here #0 impala_udf::StringVal::CopyFrom(impala_udf::FunctionContext*, unsigned char const*, unsigned long) udf/udf.cc:521:5 #1 AnyValUtil::FromBuffer(impala_udf::FunctionContext*, char const*, int) exprs/anyval-util.h:241:12 apache#2 StringFunctions::RegexpExtract(impala_udf::FunctionContext*, impala_udf::StringVal const&, impala_udf::StringVal const&, impala_udf::BigIntVal const&) exprs/string-functions-ir.cc:726:10 apache#3 impala_udf::StringVal ScalarFnCall::InterpretEval<impala_udf::StringVal>(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:485:580 apache#4 ScalarFnCall::GetStringVal(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:599:44 apache#5 ScalarExprEvaluator::GetValue(ScalarExpr const&, TupleRow const*) exprs/scalar-expr-evaluator.cc:299:38 apache#6 ScalarExprEvaluator::GetValue(TupleRow const*) exprs/scalar-expr-evaluator.cc:250:10 apache#7 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, ScalarExprEvaluator* const*, MemPool*, StringValue**, int*, int*) runtime/tuple.cc:222:27 apache#8 void Tuple::MaterializeExprs<false, false>(TupleRow*, TupleDescriptor const&, vector<ScalarExprEvaluator*> const&, MemPool*, vector<StringValue*>*, int*) runtime/tuple.h:174:5 apache#9 UnionNode::MaterializeExprs(vector<ScalarExprEvaluator*> const&, TupleRow*, unsigned char*, RowBatch*) exec/union-node-ir.cc:29:14 apache#10 UnionNode::GetNextConst(RuntimeState*, RowBatch*) exec/union-node.cc:263:5 apache#11 UnionNode::GetNext(RuntimeState*, RowBatch*, bool*) exec/union-node.cc:296:45 apache#12 FragmentInstanceState::ExecInternal() runtime/fragment-instance-state.cc:310:59 apache#13 FragmentInstanceState::Exec() runtime/fragment-instance-state.cc:95:14 apache#14 QueryState::ExecFInstance(FragmentInstanceState*) runtime/query-state.cc:488:24 apache#15 QueryState::StartFInstances()::$_0::operator()() const runtime/query-state.cc:416:35 apache#22 thread_proxy (exprs/expr-test+0x55ca939) util/coding-util-test.cc:45:10: runtime error: null pointer passed as argument 1, which is declared to never be null /usr/include/string.h:43:45: note: nonnull attribute specified here #0 TestUrl(string const&, string const&, bool) util/coding-util-test.cc:45:3 #1 UrlCodingTest_BlankString_Test::TestBody() util/coding-util-test.cc:88:3 apache#2 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (util/coding-util-test+0x6630f42) apache#8 main util/coding-util-test.cc:123:192 util/decompress-test.cc:126:261: runtime error: null pointer passed as argument 1, which is declared to never be null /usr/include/string.h:66:58: note: nonnull attribute specified here #0 DecompressorTest::CompressAndDecompress(Codec*, Codec*, long, unsigned char*) util/decompress-test.cc:126:254 #1 DecompressorTest::RunTest(THdfsCompression::type) util/decompress-test.cc:84:9 apache#2 DecompressorTest_Default_Test::TestBody() util/decompress-test.cc:373:3 apache#3 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (util/decompress-test+0x6642bb2) apache#9 main util/decompress-test.cc:479:47 util/decompress-test.cc:148:261: runtime error: null pointer passed as argument 1, which is declared to never be null /usr/include/string.h:66:58: note: nonnull attribute specified here #0 DecompressorTest::CompressAndDecompress(Codec*, Codec*, long, unsigned char*) util/decompress-test.cc:148:254 #1 DecompressorTest::RunTest(THdfsCompression::type) util/decompress-test.cc:84:9 apache#2 DecompressorTest_Default_Test::TestBody() util/decompress-test.cc:373:3 apache#3 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (util/decompress-test+0x6642bb2) apache#9 main util/decompress-test.cc:479:47 util/decompress-test.cc:269:261: runtime error: null pointer passed as argument 1, which is declared to never be null /usr/include/string.h:66:58: note: nonnull attribute specified here #0 DecompressorTest::CompressAndDecompressNoOutputAllocated(Codec*, Codec*, long, unsigned char*) util/decompress-test.cc:269:254 #1 DecompressorTest::RunTest(THdfsCompression::type) util/decompress-test.cc:71:7 apache#2 DecompressorTest_LZ4_Test::TestBody() util/decompress-test.cc:381:3 apache#3 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (util/decompress-test+0x6642bb2) apache#9 main util/decompress-test.cc:479:47 util/decompress-test.cc:221:329: runtime error: null pointer passed as argument 1, which is declared to never be null /usr/include/string.h:66:58: note: nonnull attribute specified here #0 DecompressorTest::StreamingDecompress(Codec*, long, unsigned char*, long, unsigned char*, bool, long*) util/decompress-test.cc:221:322 #1 DecompressorTest::CompressAndStreamingDecompress(Codec*, Codec*, long, unsigned char*) util/decompress-test.cc:245:35 apache#2 DecompressorTest::RunTestStreaming(THdfsCompression::type) util/decompress-test.cc:104:5 apache#3 DecompressorTest_Gzip_Test::TestBody() util/decompress-test.cc:386:3 apache#4 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (util/decompress-test+0x6642bb2) apache#10 main util/decompress-test.cc:479:47 util/streaming-sampler.h:55:22: runtime error: null pointer passed as argument 2, which is declared to never be null /usr/include/string.h:43:45: note: nonnull attribute specified here #0 StreamingSampler<long, 64>::StreamingSampler(int, vector<long> const&) util/streaming-sampler.h:55:5 #1 RuntimeProfile::TimeSeriesCounter::TimeSeriesCounter(string const&, TUnit::type, int, vector<long> const&) util/runtime-profile-counters.h:401:53 apache#2 RuntimeProfile::Update(vector<TRuntimeProfileNode> const&, int*) util/runtime-profile.cc:310:28 apache#3 RuntimeProfile::Update(TRuntimeProfileTree const&) util/runtime-profile.cc:245:3 apache#4 Coordinator::BackendState::InstanceStats::Update(TFragmentInstanceExecStatus const&, Coordinator::ExecSummary*, ProgressUpdater*) runtime/coordinator-backend-state.cc:473:13 apache#5 Coordinator::BackendState::ApplyExecStatusReport(TReportExecStatusParams const&, Coordinator::ExecSummary*, ProgressUpdater*) runtime/coordinator-backend-state.cc:286:21 apache#6 Coordinator::UpdateBackendExecStatus(TReportExecStatusParams const&) runtime/coordinator.cc:678:22 apache#7 ClientRequestState::UpdateBackendExecStatus(TReportExecStatusParams const&) service/client-request-state.cc:1253:18 apache#8 ImpalaServer::ReportExecStatus(TReportExecStatusResult&, TReportExecStatusParams const&) service/impala-server.cc:1343:18 apache#9 ImpalaInternalService::ReportExecStatus(TReportExecStatusResult&, TReportExecStatusParams const&) service/impala-internal-service.cc:87:19 apache#24 thread_proxy (exprs/expr-test+0x55ca939) Change-Id: I317ccc99549744a26d65f3e07242079faad0355a Reviewed-on: http://gerrit.cloudera.org:8080/11545 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

This removes two signed integer overflows when using the 'conv' builtin. Signed integer overflow is undefined behavior according to the C++ standard. The interesting parts of the backtraces are: exprs/math-functions-ir.cc:405:13: runtime error: signed integer overflow: 4738381338321616896 * 36 cannot be represented in type 'long' exprs/math-functions-ir.cc:404:24: runtime error: signed integer overflow: 2 * 4738381338321616896 cannot be represented in type 'long' #0 MathFunctions::DecimalInBaseToDecimal(long, signed char, long*) exprs/math-functions-ir.cc:404:24 #1 MathFunctions::ConvInt(impala_udf::FunctionContext*, impala_udf::BigIntVal const&, impala_udf::TinyIntVal const&, impala_udf::TinyIntVal const&) exprs/math-functions-ir.cc:327:10 apache#2 impala_udf::StringVal ScalarFnCall::InterpretEval<impala_udf::StringVal>(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:485:580 apache#3 ScalarFnCall::GetStringVal(ScalarExprEvaluator*, TupleRow const*) const exprs/scalar-fn-call.cc:599:44 apache#8 UnionNode::MaterializeExprs(vector<ScalarExprEvaluator*> const&, TupleRow*, unsigned char*, RowBatch*) exec/union-node-ir.cc:29:14 apache#9 UnionNode::GetNextConst(RuntimeState*, RowBatch*) exec/union-node.cc:263:5 apache#10 UnionNode::GetNext(RuntimeState*, RowBatch*, bool*) exec/union-node.cc:296:45 These were triggered in the backend test ExprTest.MathConversionFunctions. Change-Id: I0d97dfcf42072750c16e41175765cd9a468a3c39 Reviewed-on: http://gerrit.cloudera.org:8080/11876 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

test_scanners.py has seen several flaky failures on s3 due to eventual consistency. The symptom is Impala being unable to read a file that it just loaded to s3. A large number of tables used in test_scanners.py use the file_utils helper functions for creating the tables. These follow the pattern: 1. Copy files to temporary directory in HDFS/S3/etc 2. Create table 3. Run LOAD DATA to move the files to the table In step apache#3, LOAD DATA gets the metadata for the table before it runs the move statement on the files. Subsequent queries on the table will not need to reload metadata and can access the file quickly after the move. This changes the ordering to put the files in place before loading metadata. This may improve the likelihood that the filesystem is consistent by the time we read it. Specifically, we now do: 1. Put the files in directory that the table will use when it is created. 2. Create table Neither of these steps load metadata, so the next query that runs will load metadata. Change-Id: Id042496beabe0d0226b347e0653b820fee369f4e Reviewed-on: http://gerrit.cloudera.org:8080/11959 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

UBSAN finds the following in ParquetBoolDecoder.TestDecodeAndSkipping: util/bit-stream-utils.inline.h:156:25: runtime error: left shift of 42 by 28 places cannot be represented in type 'int' #0 BatchedBitReader::GetUleb128Int(unsigned int*) util/bit-stream-utils.inline.h:156:25 #1 RleBatchDecoder<bool>::NextCounts() util/rle-encoding.h:778:40 apache#2 RleBatchDecoder<bool>::NextNumRepeats() util/rle-encoding.h:622:28 apache#3 RleBatchDecoder<bool>::GetValues(int, bool*) util/rle-encoding.h:858:27 apache#4 bool ParquetBoolDecoder::DecodeValue<(parquet::Encoding::type)3>(bool*) exec/parquet/parquet-bool-decoder.h:85:24 apache#5 TestSkipping(parquet::Encoding::type, unsigned char*, int, vector<bool> const&, int, int)::$_0::operator()() const exec/parquet/parquet-bool-decoder-test.cc:59 apache#6 TestSkipping(parquet::Encoding::type, unsigned char*, int, vector<bool> const&, int, int) exec/parquet/parquet-bool-decoder-test.cc:69:221 apache#7 ParquetBoolDecoder_TestDecodeAndSkipping_Test::TestBody() exec/parquet/parquet-bool-decoder-test.cc:85:5 apache#9 testing::Test::Run() (/home/ubuntu/Impala/be/build/debug/exec/parquet/parquet-bool-decoder-test+0x6ee4f09) The problem is the line *v |= (byte & 0x7F) << shift; byte is an uint8_t and 0x7F is an int. The standard section [expr.bit.and] then applies the "usual arithmetic conversions" specified in [expr], which applies "if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, the operand with unsigned integer type shall be converted to the type of the operand with signed integer type." That makes byte & 0x7F a signed integer type, and [expr.shift] says that "if E1 has a signed type and non-negative value, and E1×2^E2 is representable in the corresponding unsigned type of the result type, then that value, converted to the result type, is the resulting value; otherwise, the behavior is undefined." Change-Id: Ie6e0b956751090f3f8aadd6783b5e06e55e57abe Reviewed-on: http://gerrit.cloudera.org:8080/12346 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

As agreed in JIRA discussions, the current PR extends existing TRIM functionality with the support of SQL-standardized TRIM-FROM syntax: TRIM({[LEADING / TRAILING / BOTH] | [STRING characters]} FROM expr). Implemented based on the existing LTRIM / RTRIM / BTRIM family of functions prepared earlier in IMPALA-6059 and extended for UTF-8 in IMPALA-12718. Besides, partly based on abandoned PR https://gerrit.cloudera.org/#/c/4474 and similar EXTRACT-FROM functionality from 543fa73f3a846 f0e4527514c993cb0985912b06c. Supported syntaxes: Syntax #1 TRIM(<where> FROM <string>); Syntax #2 TRIM(<charset> FROM <string>); Syntax #3 TRIM(<where> <charset> FROM <string>); "where": Case-insensitive trim direction. Valid options are "leading", "trailing", and "both". "leading" means trimming characters from the start; "trailing" means trimming characters from the end; "both" means trimming characters from both sides. For Syntax #2, since no "where" is specified, the option "both" is implied by default. "charset": Case-sensitive characters to be removed. This argument is regarded as a character set going to be removed. The occurrence order of each character doesn't matter and duplicated instances of the same character will be ignored. NULL argument implies " " (standard space) by default. Empty argument ("" or '') makes TRIM return the string untouched. For Syntax #1, since no "charset" is specified, it trims " " (standard space) by default. "string": Case-sensitive target string to trim. This argument can be NULL. The UTF8_MODE query option is honored by TRIM-FROM, similarly to existing TRIM(). UTF8_TRIM-FROM can be used to force UTF8 mode regardless of the query option. Design Notes: 1. No-BE. Since the existing LTRIM / RTRIM / BTRIM functions fully cover all needed use-cases, no backend logic is required. This differs from similar EXTRACT-FROM. 2. Syntax wrapper. TrimFromExpr class was introduced as a syntax wrapper around FunctionCallExpr, which instantiates one of the regular LTRIM / RTRIM / BTRIM functions. TrimFromExpr's role is to maintain the integrity of the "phantom" TRIM-FROM built-in function. 3. No TRIM keyword. Following EXTRACT-FROM, no "TRIM" keyword was added to the language. Although generally a keyword would allow easier and better parsing, on the negative side it restricts token's usage in general context. However, leading/trailing/both, being previously saved as reserved words, are now added as keywords to make possible their usage with no escaping. Change-Id: I3c4fa6d0d8d0684c4b6d8dac8fd531d205e4f7b4 Reviewed-on: http://gerrit.cloudera.org:8080/21825 Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com> Tested-by: Csaba Ringhofer <csringhofer@cloudera.com>

fix doc about parquet under incubation

802409b

Humbedooh closed this Oct 25, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix doc about parquet under incubation#3

fix doc about parquet under incubation#3
ebuildy wants to merge 1 commit intoapache:masterfrom
ebuildy:patch-1

ebuildy commented Oct 20, 2017

Uh oh!

jbapple-cloudera commented Oct 20, 2017

Uh oh!

jbapple-cloudera commented Oct 25, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

ebuildy commented Oct 20, 2017

Uh oh!

jbapple-cloudera commented Oct 20, 2017

Uh oh!

jbapple-cloudera commented Oct 25, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants