Speed up loading of data parts at startup. #4699

Closed
alexey-milovidov opened this issue Mar 14, 2019 · 5 comments · Fixed by #42181
Labels
feature, performance, st-accepted (The issue is in our backlog, ready to take)

Comments

@alexey-milovidov
Member

  1. Do not load inactive data parts that are old enough (they will be deleted anyway), but still use them if a broken part has to be repaired at startup. Delete old inactive data parts at startup.

  2. Use multiple threads for loading data parts to mitigate latency.
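Item 2 can be illustrated with a fixed set of worker threads that pull part names from a shared atomic cursor, so a slow disk read for one part does not hold up the others. This is a minimal sketch, not ClickHouse's implementation (which uses its own thread pool); `LoadedPart`, `loadPart` and `loadPartsInParallel` are hypothetical names, and `loadPart` only simulates the work of reading a part's metadata from disk.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a loaded data part (not ClickHouse's IMergeTreeDataPart).
struct LoadedPart {
    std::string name;
    std::size_t marks = 0;
};

// Hypothetical loader: a real server would read checksums, columns and the index
// granularity from disk here. This sketch just derives a fake mark count.
LoadedPart loadPart(const std::string & name) {
    return LoadedPart{name, name.size()};
}

// Load all parts with `num_threads` workers. Each worker repeatedly claims the next
// unclaimed index with fetch_add, so every slot of `result` is written by one thread only.
std::vector<LoadedPart> loadPartsInParallel(const std::vector<std::string> & names, std::size_t num_threads) {
    assert(num_threads > 0);
    std::vector<LoadedPart> result(names.size());
    std::atomic<std::size_t> next{0};

    auto worker = [&] {
        for (std::size_t i = next.fetch_add(1); i < names.size(); i = next.fetch_add(1))
            result[i] = loadPart(names[i]);
    };

    std::vector<std::thread> threads;
    for (std::size_t t = 0; t < num_threads; ++t)
        threads.emplace_back(worker);
    for (auto & thread : threads)
        thread.join();  // join() also makes the workers' writes visible to the caller
    return result;
}
```

Because the slots are disjoint, no mutex is needed; results keep the input order regardless of which thread loaded each part.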

@alexey-milovidov
Member Author

№2 was implemented.
№1 is still worth doing.

@alexey-milovidov added the st-accepted label (The issue is in our backlog, ready to take) on Nov 1, 2019
@amosbird
Collaborator

Perhaps we should also separate table loading from table startup; otherwise background merges and fetches will interfere with loading the data parts.
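A minimal sketch of that two-phase startup, assuming a hypothetical `Table` class with `loadDataParts()` and `startup()` methods (illustrative names, not ClickHouse's storage interface): every table finishes loading its parts before background merges or fetches start on any of them.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical table with an explicit load phase and a separate startup phase.
class Table {
public:
    explicit Table(std::string name_) : name(std::move(name_)) {}

    // Phase 1: read part metadata from disk. No background work runs yet, so merges
    // and fetches cannot compete for disk I/O or change the set of parts being loaded.
    void loadDataParts() { loaded = true; }

    // Phase 2: start background merges/fetches; only allowed after phase 1.
    void startup() {
        assert(loaded && "startup() called before loadDataParts()");
        background_started = true;
    }

    const std::string & getName() const { return name; }
    bool isLoaded() const { return loaded; }
    bool isStarted() const { return background_started; }

private:
    std::string name;
    bool loaded = false;
    bool background_started = false;
};

// Load every table first, and only then start background activity.
void openDatabase(std::vector<std::unique_ptr<Table>> & tables) {
    for (auto & table : tables)
        table->loadDataParts();
    for (auto & table : tables)
        table->startup();
}
```

Keeping the two phases in separate loops means no table begins merging while another is still reading its parts from disk.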

@alexey-milovidov
Member Author

Parts are not loaded in parallel :(

Thread 53 (Thread 0x7f33a65fd700 (LWP 9216)):
#0  __libc_pread64 (offset=<optimized out>, count=3288, buf=0x7f33b9e44000, fd=25) at ../sysdeps/unix/sysv/linux/pread64.c:29
#1  __libc_pread64 (fd=25, buf=0x7f33b9e44000, count=3288, offset=0) at ../sysdeps/unix/sysv/linux/pread64.c:27
#2  0x000000000a729dc1 in DB::ReadBufferFromFileDescriptor::nextImpl (this=0x7f33b9e53000) at ./src/IO/ReadBufferFromFileDescriptor.cpp:73
#3  0x000000001319b7ac in DB::ReadBuffer::next (this=0x7f33b9e53000) at ./src/IO/ReadBuffer.h:62
#4  DB::ReadBuffer::eof (this=0x7f33b9e53000) at ./src/IO/ReadBuffer.h:96
#5  DB::MergeTreeDataPartWide::loadIndexGranularity (this=0x7f331384cc18) at ./src/Storages/MergeTree/MergeTreeDataPartWide.cpp:129
#6  0x00000000130d9f88 in DB::IMergeTreeDataPart::loadColumnsChecksumsIndexes (this=0x7f331384cc18, require_columns_checksums=<optimized out>, check_consistency=true) at ./src/Storages/MergeTree/IMergeTreeDataPart.cpp:618
#7  0x0000000013173458 in DB::MergeTreeData::loadDataPartsFromDisk(std::__1::vector<std::__1::shared_ptr<DB::IMergeTreeDataPart const>, std::__1::allocator<std::__1::shared_ptr<DB::IMergeTreeDataPart const> > >&, std::__1::vector<std::__1::shared_ptr<DB::IMergeTreeDataPart const>, std::__1::allocator<std::__1::shared_ptr<DB::IMergeTreeDataPart const> > >&, ThreadPoolImpl<ThreadFromGlobalPool>&, unsigned long, std::__1::queue<std::__1::vector<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::shared_ptr<DB::IDisk> >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::shared_ptr<DB::IDisk> > > >, std::__1::deque<std::__1::vector<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::shared_ptr<DB::IDisk> >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::shared_ptr<DB::IDisk> > > >, std::__1::allocator<std::__1::vector<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::shared_ptr<DB::IDisk> >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::shared_ptr<DB::IDisk> > > > > > >&, bool, std::__1::shared_ptr<DB::MergeTreeSettings const> const&)::$_11::operator()(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::shared_ptr<DB::IDisk> const&) const (this=0x7f33a6df2f50, part_name=..., part_disk_ptr=...) at ./src/Storages/MergeTree/MergeTreeData.cpp:1016
#8  DB::MergeTreeData::loadDataPartsFromDisk(std::__1::vector<std::__1::shared_ptr<DB::IMergeTreeDataPart const>, std::__1::allocator<std::__1::shared_ptr<DB::IMergeTreeDataPart const> > >&, std::__1::vector<std::__1::shared_ptr<DB::IMergeTreeDataPart const>, std::__1::allocator<std::__1::shared_ptr<DB::IMergeTreeDataPart const> > >&, ThreadPoolImpl<ThreadFromGlobalPool>&, unsigned long, std::__1::queue<std::__1::vector<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::shared_ptr<DB::IDisk> >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::shared_ptr<DB::IDisk> > > >, std::__1::deque<std::__1::vector<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::shared_ptr<DB::IDisk> >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::shared_ptr<DB::IDisk> > > >, std::__1::allocator<std::__1::vector<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::shared_ptr<DB::IDisk> >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::shared_ptr<DB::IDisk> > > > > > >&, bool, std::__1::shared_ptr<DB::MergeTreeSettings const> const&)::$_12::operator()() const (this=<optimized out>) at ./src/Storages/MergeTree/MergeTreeData.cpp:1110
#9  0x000000000a76db09 in std::__1::__function::__policy_func<void ()>::operator()() const (this=0x7f33a65f3cd0) at ./contrib/libcxx/include/functional:2221
#10 std::__1::function<void ()>::operator()() const (this=0x7f33a65f3cd0) at ./contrib/libcxx/include/functional:2560
#11 ThreadPoolImpl<ThreadFromGlobalPool>::worker (this=this@entry=0x7f33a6df3260, thread_it=thread_it@entry=...) at ./src/Common/ThreadPool.cpp:274
#12 0x000000000a76f38c in ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::{lambda()#2}::operator()() const (this=<optimized out>) at ./src/Common/ThreadPool.cpp:139
#13 std::__1::__invoke_constexpr<ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::{lambda()#2}&> (__f=...) at ./contrib/libcxx/include/type_traits:3682
#14 std::__1::__apply_tuple_impl<ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::{lambda()#2}&, std::__1::tuple<>&>(ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::{lambda()#2}&, std::__1::tuple<>&, std::__1::__tuple_indices<>) (__f=..., __t=...) at ./contrib/libcxx/include/tuple:1415
#15 std::__1::apply<ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::{lambda()#2}&, std::__1::tuple<>&>(ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::{lambda()#2}&, std::__1::tuple<>&) (__f=..., __t=...) at ./contrib/libcxx/include/tuple:1424
#16 ThreadFromGlobalPool::ThreadFromGlobalPool<ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::{lambda()#2}>(ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::{lambda()#2}&&)::{lambda()#1}::operator()() (this=0x7f33bd8ec940) at ./src/Common/ThreadPool.h:188
#17 0x000000000a76c0aa in std::__1::__function::__policy_func<void ()>::operator()() const (this=0x7f33a65f4010) at ./contrib/libcxx/include/functional:2221
#18 std::__1::function<void ()>::operator()() const (this=0x7f33a65f4010) at ./contrib/libcxx/include/functional:2560
#19 ThreadPoolImpl<std::__1::thread>::worker (this=0x7f344903ea00, thread_it=...) at ./src/Common/ThreadPool.cpp:274
#20 0x000000000a76e50e in ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::{lambda()#2}::operator()() const (this=0x7f34490847e8) at ./src/Common/ThreadPool.cpp:139
#21 std::__1::__invoke<ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::{lambda()#2}> (__f=...) at ./contrib/libcxx/include/type_traits:3676
#22 std::__1::__thread_execute<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::{lambda()#2}>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::{lambda()#2}>&, std::__1::__tuple_indices<>) (__t=...) at ./contrib/libcxx/include/thread:280
#23 std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::{lambda()#2}> >(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::{lambda()#2}>) (__vp=0x7f34490847e0) at ./contrib/libcxx/include/thread:291
#24 0x00007f344a125609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#25 0x00007f344a04c293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

@alexey-milovidov
Member Author

Parallel loading was broken here: #6489

@alexey-milovidov
Member Author

@CurtizJ said he can implement №1.
It is nontrivial: it requires building a tree of data parts according to their names and then loading them in breadth-first order.
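A minimal sketch of such a tree and traversal, under stated assumptions: part names are simplified to `<partition_id>_<min_block>_<max_block>_<level>` (mutation versions ignored, no `_` inside the partition id); one part covers another when it is in the same partition, spans its block range and has at least its level; and each part's parent is the tightest part covering it. `PartInfo`, `parsePartName`, `tighter` and `breadthFirstLoadOrder` are hypothetical helpers, not ClickHouse code. Breadth-first order yields the uncovered (active) parts first; deeper levels are the old inactive parts that item №1 would only need when repairing a broken part.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <sstream>
#include <string>
#include <vector>

// Simplified part name: <partition_id>_<min_block>_<max_block>_<level>.
struct PartInfo {
    std::string name;
    std::string partition_id;
    int64_t min_block = 0;
    int64_t max_block = 0;
    uint32_t level = 0;

    // True if this part contains every block of `other` and is a different part.
    bool covers(const PartInfo & other) const {
        return partition_id == other.partition_id
            && min_block <= other.min_block && other.max_block <= max_block
            && level >= other.level && name != other.name;
    }
};

PartInfo parsePartName(const std::string & name) {
    std::vector<std::string> tokens;
    std::stringstream stream(name);
    for (std::string token; std::getline(stream, token, '_');)
        tokens.push_back(token);
    assert(tokens.size() == 4);  // assumption: no mutation version suffix
    return PartInfo{name, tokens[0], std::stoll(tokens[1]), std::stoll(tokens[2]),
                    static_cast<uint32_t>(std::stoul(tokens[3]))};
}

// True if `a` is a tighter cover than `b`: narrower block range, then lower level.
bool tighter(const PartInfo & a, const PartInfo & b) {
    const int64_t width_a = a.max_block - a.min_block;
    const int64_t width_b = b.max_block - b.min_block;
    return width_a != width_b ? width_a < width_b : a.level < b.level;
}

// Build a forest (parent = tightest covering part) and return names in BFS order.
std::vector<std::string> breadthFirstLoadOrder(const std::vector<std::string> & names) {
    std::vector<PartInfo> parts;
    for (const auto & name : names)
        parts.push_back(parsePartName(name));

    const std::size_t n = parts.size();
    std::vector<std::vector<std::size_t>> children(n);
    std::vector<std::size_t> roots;
    for (std::size_t i = 0; i < n; ++i) {
        std::size_t parent = n;  // n means "not covered by any part"
        for (std::size_t j = 0; j < n; ++j)
            if (parts[j].covers(parts[i]) && (parent == n || tighter(parts[j], parts[parent])))
                parent = j;
        if (parent == n)
            roots.push_back(i);
        else
            children[parent].push_back(i);
    }

    std::vector<std::string> order;
    std::queue<std::size_t> queue;
    for (std::size_t root : roots)
        queue.push(root);
    while (!queue.empty()) {
        const std::size_t current = queue.front();
        queue.pop();
        order.push_back(parts[current].name);
        for (std::size_t child : children[current])
            queue.push(child);
    }
    return order;
}
```

Choosing the narrowest cover as the parent relies on parts in a valid set never partially overlapping, so any two parts covering the same part are nested. The quadratic scan keeps the sketch short; a production version would likely sort parts by block range instead.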
