too many open files when pressure test #2196

Closed
kangpinghuang opened this issue Nov 14, 2019 · 2 comments
@kangpinghuang
Contributor

Describe the bug
Running segment v2 under a pressure test produces the following warning:
failed to init segment writer: IO error: /home/disk3/palo-service/9130/data/57/27138/169995813/020000000009a0511541d25ef1896ca2b3e5fd0c02d15585_0.dat: Too many open files

This causes segment writer initialization to fail, and then a null pointer dereference (NPE) when a row is added.
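
"Too many open files" is the message for `EMFILE`, which a process gets once it reaches its per-process descriptor limit (`RLIMIT_NOFILE`). As a minimal sketch, assuming a POSIX system, the limit can be read with `getrlimit`. The helper names `query_soft_fd_limit` and `exceeds_fd_limit` are hypothetical and are not part of Doris.

```cpp
#include <sys/resource.h>  // getrlimit, RLIMIT_NOFILE (POSIX)

// Reads the soft limit on open file descriptors for the current process.
// Returns false if the limit cannot be queried or is unlimited.
bool query_soft_fd_limit(unsigned long long* limit) {
    struct rlimit lim;
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) {
        return false;
    }
    if (lim.rlim_cur == RLIM_INFINITY) {
        return false;
    }
    *limit = static_cast<unsigned long long>(lim.rlim_cur);
    return true;
}

// Hypothetical helper: true when `needed` descriptors would not fit
// under `limit`.
bool exceeds_fd_limit(unsigned long long needed, unsigned long long limit) {
    return needed > limit;
}
```

Comparing the value returned here against the number of files the process is expected to hold open gives a rough idea of how much headroom is left before `open` starts failing.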

@kangpinghuang
Contributor Author

core dump message:

Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/opt/compiler/gcc-4.8.2/lib/libthread_db.so.1".
Core was generated by `/home/palo-service/test/9130/PALO-BE/be/lib/palo_be'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 put_varint32<doris::faststring> (v=, dst=0x70) at /home/palo-service/doris/core/be/src/util/coding.h:135
135 /home/palo-service/doris/core/be/src/util/coding.h: No such file or directory.
(gdb) bt
#0 put_varint32<doris::faststring> (v=, dst=0x70) at /home/palo-service/doris/core/be/src/util/coding.h:135
#1 doris::ShortKeyIndexBuilder::add_item (this=0x0, key=...) at /home/palo-service/doris/core/be/src/olap/short_key_index.cpp:30
#2 0x000000000157aeb4 in doris::segment_v2::SegmentWriter::append_row<doris::RowCursor> (this=0x7d2ac380, row=...)
at /home/palo-service/doris/core/be/src/olap/rowset/segment_v2/segment_writer.cpp:87
#3 0x0000000000f4f68a in doris::BetaRowsetWriter::_add_row<doris::RowCursor> (this=0x5fdb810e0, row=...)
at /home/palo-service/doris/core/be/src/olap/rowset/beta_rowset_writer.cpp:89
#4 0x000000000155c2c1 in doris::RowBlockMerger::merge (this=this@entry=0x7f578e103eb0, row_block_arr=..., rowset_writer=0x5fdb810e0,
merged_rows=merged_rows@entry=0x7f578e103e20) at /home/palo-service/doris/core/be/src/olap/schema_change.cpp:604
#5 0x000000000155d699 in doris::SchemaChangeWithSorting::_internal_sorting (this=this@entry=0x4f3c81a80, row_block_arr=..., version=...,
version_hash=, new_tablet=..., new_rowset_type=new_rowset_type@entry=doris::BETA_ROWSET, rowset=0x7f578e104050)
at /home/palo-service/doris/core/be/src/olap/schema_change.cpp:1110
#6 0x000000000155e2e6 in doris::SchemaChangeWithSorting::process (this=0x4f3c81a80, rowset_reader=..., new_rowset_writer=0x354051440, new_tablet=...,
base_tablet=...) at /home/palo-service/doris/core/be/src/olap/schema_change.cpp:1002
#7 0x00000000015610c8 in doris::SchemaChangeHandler::_convert_historical_rowsets (sc_params=...)
at /home/palo-service/doris/core/be/src/olap/schema_change.cpp:1683
#8 0x000000000156359d in doris::SchemaChangeHandler::_do_process_alter_tablet_v2 (this=this@entry=0x7f578e1045c0, request=...)
at /home/palo-service/doris/core/be/src/olap/schema_change.cpp:1338
#9 0x00000000015646f2 in doris::SchemaChangeHandler::process_alter_tablet_v2 (this=this@entry=0x7f578e1045c0, request=...)
at /home/palo-service/doris/core/be/src/olap/schema_change.cpp:1163
#10 0x0000000001594b09 in doris::EngineAlterTabletTask::execute (this=0x7f578e104800)
at /home/palo-service/doris/core/be/src/olap/task/engine_alter_tablet_task.cpp:39
#11 0x0000000000e648d5 in doris::StorageEngine::execute_task (this=0x589c580, task=task@entry=0x7f578e104800)
at /home/palo-service/doris/core/be/src/olap/storage_engine.cpp:926
#12 0x00000000013adf41 in doris::TaskWorkerPool::_alter_tablet (this=this@entry=0x75ad7a0, worker_pool_this=worker_pool_this@entry=0x75ad7a0, agent_task_req=...,
signature=signature@entry=15898, task_type=task_type@entry=doris::TTaskType::ALTER, finish_task_request=finish_task_request@entry=0x7f578e1048d0)
at /home/palo-service/doris/core/be/src/agent/task_worker_pool.cpp:614
#13 0x00000000013b7c3a in doris::TaskWorkerPool::_alter_tablet_worker_thread_callback (arg_this=0x75ad7a0)
at /home/palo-service/doris/core/be/src/agent/task_worker_pool.cpp:562
#14 0x00007f581bcf01c3 in start_thread () from /opt/compiler/gcc-4.8.2/lib64/libpthread.so.0
#15 0x00007f581bfed12d in clone () from /opt/compiler/gcc-4.8.2/lib64/libc.so.6
(gdb)

@kangpinghuang
Contributor Author

There are two problems behind this issue:

  1. Too many open files: the table has many tablets, and each tablet has hundreds of rowsets (600+). Each rowset holds the file descriptors of its segment files.
  2. When segment writer initialization fails (because of too many open files or for any other reason), the segment writer pointer in BetaRowsetWriter is not null, but the ShortKeyIndexBuilder pointer in SegmentWriter is null, which leads to the NPE core dump.
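
The second problem can be illustrated with a minimal sketch. It is not the actual Doris code or the actual fix: `Status`, `IndexBuilder`, `SegmentWriter`, and `RowsetWriter` below are simplified, hypothetical stand-ins, and the `open_succeeds` flag only simulates the failing file open. The sketch shows two guards against the crash in the backtrace: dropping a writer whose `init()` failed instead of keeping it, and checking the builder pointer before using it.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified status type (not the Doris Status class).
struct Status {
    Status(bool ok_flag, std::string message)
        : _ok(ok_flag), _msg(std::move(message)) {}
    static Status OK() { return Status(true, ""); }
    static Status IOError(std::string m) { return Status(false, std::move(m)); }
    bool ok() const { return _ok; }
    const std::string& msg() const { return _msg; }
private:
    bool _ok;
    std::string _msg;
};

// Stand-in for the short key index builder.
struct IndexBuilder {
    std::vector<std::string> keys;
    void add_item(const std::string& key) { keys.push_back(key); }
};

class SegmentWriter {
public:
    // Simulated init(): when the file open fails, the builder is never created.
    Status init(bool open_succeeds) {
        if (!open_succeeds) {
            return Status::IOError("Too many open files");
        }
        _index_builder.reset(new IndexBuilder());
        return Status::OK();
    }

    Status append_row(const std::string& key) {
        // Guard: a writer whose init() failed has no builder to write to.
        if (_index_builder == nullptr) {
            return Status::IOError("segment writer not initialized");
        }
        _index_builder->add_item(key);
        return Status::OK();
    }

private:
    std::unique_ptr<IndexBuilder> _index_builder;
};

class RowsetWriter {
public:
    Status create_segment_writer(bool open_succeeds) {
        std::unique_ptr<SegmentWriter> writer(new SegmentWriter());
        Status st = writer->init(open_succeeds);
        if (!st.ok()) {
            // Guard: do not keep a half-initialized writer around.
            _segment_writer.reset();
            return st;
        }
        _segment_writer = std::move(writer);
        assert(_segment_writer != nullptr);
        return Status::OK();
    }

    Status add_row(const std::string& key) {
        if (_segment_writer == nullptr) {
            return Status::IOError("no segment writer available");
        }
        return _segment_writer->append_row(key);
    }

private:
    std::unique_ptr<SegmentWriter> _segment_writer;
};
```

With either guard in place, a failed `init()` surfaces as an error status from `add_row` rather than a segmentation fault inside the index builder.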

kangpinghuang pushed a commit to kangpinghuang/incubator-doris that referenced this issue Nov 14, 2019
@imay imay added the kind/fix Categorizes issue or PR as related to a bug. label Nov 14, 2019
@imay imay added this to To do in Add Beta Rowset via automation Nov 14, 2019
@imay imay added this to the 0.12.0 milestone Nov 14, 2019
@imay imay closed this as completed Nov 14, 2019
Add Beta Rowset automation moved this from To do to Done Nov 14, 2019