
Limit the memory usage of Loading process #1954

Merged: 10 commits merged into apache:master on Oct 15, 2019

Conversation

morningman (Contributor):

The current load framework uses a memtable to receive incoming load data and flushes it to disk when a size limit is reached (default 100MB).
Each tablet has its own memtable, so if there are many tablets on a Backend and the load data is distributed evenly across them, the total memory consumption can be very large, because no memtable is flushed until it reaches the 100MB limit.
For example, with 100 tablets on a Backend, peak memory consumption can reach 10GB (100 * 100MB), which may get the process killed by the system OOM killer.

This CL tries to resolve that problem.

ISSUE #1951

be/src/runtime/load_channel_mgr.h (review thread, resolved)
// return Status::OK if mem is reduced.
Status reduce_mem_usage();

int64_t mem_consumption() { return _mem_tracker->consumption(); }

Reviewer:

Suggested change
int64_t mem_consumption() { return _mem_tracker->consumption(); }
int64_t mem_consumption() const { return _mem_tracker->consumption(); }

Author:

Done

if (handle != nullptr && request.has_eos() && request.eos()) {
_lastest_success_channel->release(handle);
return Status::OK();
}

Reviewer:

Should release the handle whenever it is not nullptr.

Author:

Done

Status LoadChannelMgr::start_bg_worker() {
_load_channels_clean_thread = std::thread(
[this] {
#ifdef GOOGLE_PROFILER

Reviewer:

indent

Author:

What's wrong? It looks good~

Reviewer:

Macro definitions do not require spaces

#endif

uint32_t interval = 60;
while (true) {

Reviewer:

Better to join this thread in the destructor to avoid invalid access to destroyed members.

Author:

OK, I will not detach this thread; I will join it when destructing LoadChannelMgr.

Reviewer:

With while (true), how does this thread exit?

// index id -> tablets channel
std::unordered_map<int64_t, std::shared_ptr<TabletsChannel>> _tablets_channels;

Cache* _lastest_success_channel = nullptr;

Reviewer:

std::unordered_set<int64_t> is simple and sufficient, and renaming it to _finished_channels makes the intent easier to understand.

Author:

OK

if (it == _tablets_channels.end()) {
auto handle = _lastest_success_channel->lookup(std::to_string(index_id));
// success only when eos be true
if (handle != nullptr && request.has_eos() && request.eos()) {

Reviewer:

If you use _finished_channels, there is no need to check whether the request is eos.

Author:

OK


RETURN_IF_ERROR(channel->open(params));

if (!_opened) {

Reviewer:

no need to check

Author:

OK

return st;
}

bool LoadChannel::_find_largest_max_consumption_tablets_channel(std::shared_ptr<TabletsChannel>* channel) {

Reviewer:

Please add a comment that the lock should be held here.
Also, 1 + 1 + 80 < 50 + 50: why not choose 80?

Author:

It may happen, but with low probability. And even if we choose 50, it is not a big deal.
I will leave it unchanged. If it becomes a real problem, we can change it later.

CONF_Int32(write_buffer_size, "104857600");
CONF_Int64(write_buffer_size, "104857600");

// followin 2 configs limit the memory consumption of load process on a Backend.

Reviewer:

Suggested change
// followin 2 configs limit the memory consumption of load process on a Backend.
// following 2 configs limit the memory consumption of load process on a Backend.

Author:

OK

@@ -347,6 +347,13 @@ Status StreamLoadAction::_process_put(HttpRequest* http_req, StreamLoadContext*
if (!http_req->header(HTTP_TIMEZONE).empty()) {
request.__set_timezone(http_req->header(HTTP_TIMEZONE));
}
if (!http_req->header(HTTP_EXEC_MEM_LIMIT).empty()) {
try {
request.__set_execMemLimit(std::stoi(http_req->header(HTTP_EXEC_MEM_LIMIT)));

Reviewer:

std::stoi returns int; it should be std::stoll.

Author:

OK

@@ -283,6 +283,9 @@ class OlapTableSink : public DataSink {

// BE id -> add_batch method counter
std::unordered_map<int64_t, AddBatchCounter> _node_add_batch_counter_map;

// load mem limit is for remote load channel
int64_t _load_mem_limit;

Reviewer:

Suggested change
int64_t _load_mem_limit;
int64_t _load_mem_limit = 0;

better to give a default value

Author:

I will set the default to -1, which means unlimited.

@@ -51,9 +52,16 @@ void FlushHandler::on_flush_finished(const FlushResult& res) {
_stats.flush_count.fetch_add(1);
_counter_cond.dec();
}

#if 0

Reviewer:

Why is this commented out?

Author:

No need, I will remove it. The MemTracker will be released when the MemTable is destructed.

@@ -54,6 +55,7 @@ struct MemTableFlushContext {
struct FlushResult {
OLAPStatus flush_status;
int64_t flush_time_ns;
int64_t flush_size_bytes;

Reviewer:

better to give a default value.

Author:

OK

}

LoadChannel::~LoadChannel() {
LOG(INFO) << "load channel mem peak usage: " << _mem_tracker->peak_consumption()

Reviewer:

Do we need this log?

Author:

I just want to observe it for now. It won't print too many logs; it may be removed later.

@@ -57,7 +57,7 @@ class MemTable {
};

RowCursorComparator _row_comparator;
std::unique_ptr<MemTracker> _tracker;
std::unique_ptr<MemTracker> _mem_tracker;

Reviewer:

That is quite a few levels of memory trackers.
It will hurt performance when concurrency is high.

Author:

We use a 5-level hierarchy of mem trackers:
LoadChannelMgr -> LoadChannel -> TabletsChannel -> DeltaWriter -> MemTable.

I think it is OK because:

  1. The load process may not have such high concurrency that updating the mem trackers impacts it.
  2. The MemTracker is designed for the query process, which ought to support at least 5 levels: process -> pool -> query -> fragment -> sub fragment. (This is what MemTracker's comment says.)

Reviewer:

OK

@imay (Reviewer):

LGTM

@imay imay closed this Oct 14, 2019
@imay imay reopened this Oct 14, 2019
@imay imay merged commit 62acf5d into apache:master Oct 15, 2019
wuyunfeng pushed a commit to wuyunfeng/incubator-doris that referenced this pull request Oct 22, 2019
SWJTU-ZhangLei pushed a commit to SWJTU-ZhangLei/incubator-doris that referenced this pull request Jul 25, 2023