[fix](BufferedReader) fix BufferedReader::_read_once calling memcpy with a null pointer _buffer #27775
Proposed changes
I got a core dump with the following stack trace:
#0 _mm_loadu_si128(long long __vector(2) const*) (P=0x576ff) at /var/local/ldb-toolchain/lib/gcc/x86_64-linux-gnu/11/include/emmintrin.h:703
#1 inline_memcpy (size=77472, src=, dst=0x7f76a5252d00) at /data/doris-1.x/be/src/glibc-compatibility/memcpy/memcpy_x86_64.cpp:187
#2 memcpy (dst=0x7f76a5252d00, src=, size=size@entry=77472) at /data/doris-1.x/be/src/glibc-compatibility/memcpy/memcpy_x86_64.cpp:219
#3 0x0000557c566d3abd in memcpy (__len=77472, __src=, __dest=) at /var/local/ldb-toolchain/usr/include/x86_64-linux-gnu/bits/string_fortified.h:34
#4 doris::BufferedReader::_read_once (this=0x7f7647e98180, position=, nbytes=, bytes_read=0x7f7a19088178, out=)
at /data/doris-1.x/be/src/io/buffered_reader.cpp:137
#5 0x0000557c566d3c42 in doris::BufferedReader::readat (this=0x7f7647e98180, position=358146, nbytes=77472, bytes_read=0x7f7a19088178, out=0x7f76a5252d00)
at /data/doris-1.x/be/src/io/buffered_reader.cpp:94
#6 0x0000557c56817a27 in doris::ArrowFile::ReadAt (this=0x7f765b4ec4b0, position=, nbytes=77472, out=0x7f76a5252d00)
at /data/doris-1.x/be/src/exec/arrow/arrow_reader.cpp:228
#7 0x0000557c5d07a6d0 in ?? ()
#8 0x0000557c5dd4e724 in orc::SeekableFileInputStream::Next(void const**, int*) ()
#9 0x0000557c5dd4fd0c in orc::DecompressionStream::readBuffer(bool) ()
#10 0x0000557c5dd4fa4f in orc::DecompressionStream::readHeader() ()
#11 0x0000557c5dd4fe30 in orc::DecompressionStream::Next(void const**, int*) ()
#12 0x0000557c5dd5aed1 in orc::StringDirectColumnReader::next(orc::ColumnVectorBatch&, unsigned long, char*) ()
#13 0x0000557c5dd5a207 in orc::StructColumnReader::next(orc::ColumnVectorBatch&, unsigned long, char*) ()
#14 0x0000557c5dd4679b in orc::RowReaderImpl::next(orc::ColumnVectorBatch&) ()
#15 0x0000557c5d07b243 in ?? ()
#16 0x0000557c5ce7403f in arrow::RecordBatchReader::ReadAll(std::vector<std::shared_ptr<arrow::RecordBatch>, std::allocator<std::shared_ptr<arrow::RecordBatch> > >*) ()
#17 0x0000557c57bed85d in doris::ORCReaderWrap::read_batches (this=0x7f7a0f5a0400, batches=..., current_group=)
at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:1290
#18 0x0000557c5681907f in doris::ArrowReaderWrap::prefetch_batch (this=0x7f7a0f5a0400) at /data/doris-1.x/be/src/exec/arrow/arrow_reader.cpp:183
#19 0x0000557c5e7933e0 in execute_native_thread_routine ()
#20 0x00007f7aa0215ea5 in start_thread () from /lib64/libpthread.so.0
#21 0x00007f7aa05289fd in clone () from /lib64/libc.so.6
I inspected the core file with gdb and found that the BE crashed because BufferedReader::_read_once called memcpy with a null _buffer pointer.
I read the relevant code and could not find any place that sets _buffer to a null pointer other than BufferedReader::close(). close() is only called when the BufferedReader is destructed, and I did not find any place where a BufferedReader is still used after it is destructed. Since the root cause is still unclear, this PR adds a null pointer check before the memcpy as a temporary safeguard.
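For illustration, the kind of guard described above might look roughly like the sketch below. This is not the actual patch: `SketchBufferedReader`, `guarded_copy`, and the field names are hypothetical stand-ins, and returning an error string instead of a Doris `Status` is an assumption made to keep the example self-contained.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical stand-in for the reader's buffer state; not the Doris class.
struct SketchBufferedReader {
    char* buffer = nullptr;     // assumed to become nullptr once the reader is closed
    int64_t buffer_offset = 0;  // file offset of buffer[0]
    int64_t buffer_limit = 0;   // file offset one past the last buffered byte
};

// Copies up to nbytes starting at file offset `position` into `out`.
// Returns an empty string on success, or an error message instead of crashing.
std::string guarded_copy(const SketchBufferedReader& r, int64_t position,
                         int64_t nbytes, void* out, int64_t* bytes_read) {
    *bytes_read = 0;
    // The guard itself: refuse to call memcpy with a null source buffer.
    if (r.buffer == nullptr) {
        return "buffer is null, the reader may already be closed";
    }
    if (out == nullptr || nbytes < 0) {
        return "invalid output buffer or length";
    }
    if (position < r.buffer_offset || position >= r.buffer_limit) {
        return "position is outside the buffered range";
    }
    // Copy only the bytes that are actually buffered.
    const int64_t len = std::min(nbytes, r.buffer_limit - position);
    std::memcpy(out, r.buffer + (position - r.buffer_offset),
                static_cast<std::size_t>(len));
    *bytes_read = len;
    return "";
}
```

With a guard like this, a read against a closed reader surfaces as an error to the caller rather than a segfault inside memcpy, which should also make the unexpected state easier to log and track down.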
Further comments
If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...