
yb-master can occasionally allocate lots (~100GB) of ram #862

Closed
aphyr opened this Issue Feb 12, 2019 · 7 comments

@aphyr commented Feb 12, 2019

Despite --memory_limit_hard_bytes 4294967296 (4 GB), Yugabyte 1.1.10.0 community edition can, during Jepsen tests involving network partitions, have a single yb-master process allocate hundreds of gigabytes of RAM, eventually causing the box to run out of memory and processes to start crashing. I don't know yet whether this was specifically triggered by a partition, but I do have a reproduction case for you, if you've got Jepsen available:

git checkout 2330b2efdeee435943784f95aa3a52b1356a18db
lein run test -o debian --version 1.1.11.0 --concurrency 4n -w single-key-acid --nemesis partition-half --nemesis-interval 20 --nemesis-schedule fixed --time-limit 600 --test-count 10

I know this isn't much to go on, but I do have a tarball of the data files on each node. The invocation parameters for yb-master and yb-tserver are in the log directories, in files called stdout.

@aphyr (Author) commented Feb 13, 2019

Update: this is also present in 1.1.13.0-b2, and doesn't require a network partition; I just hit this problem in a test without any failures at all. By the time I noticed the fault, n2 had crashed on its own; n3 was using most of my RAM. Here are the data files for all nodes in the cluster.

@aphyr changed the title from "Network partitions can cause yb-master to allocate lots (~100GB) of ram" to "yb-master can occasionally allocate lots (~100GB) of ram" Feb 13, 2019

@aphyr (Author) commented Feb 14, 2019

I have another failing case, and this time I caught it in the act! One master node, n3, was allocating about 2 GB/second before I was forced to kill the process. I've got all kinds of debugging information here, though I wasn't able to move fast enough to get a core dump.

This was running Jepsen 0b1af10d382f705162671d0151ba58ad19d14bda, with the following command:

lein run test-all -o debian --version 1.1.13.0-b2 --url https://downloads.yugabyte.com/new/yugabyte-ce-1.1.13.0-b2-centos-x86_64.tar.gz --concurrency 4n --time-limit 300 --test-count 3 --only-workloads-expected-to-pass -w bank

It's sporadic: I'm hitting memory leaks about once a day in my tests right now. This particular test involved network partitions (--nemesis partition-half,partition-one,partition-ring), but we've seen in earlier tests that this can happen even without any partitions, process kills, restarts, etc.

Here are the log files from all nodes, as well as the Jepsen logs for the test, including the time window when the memory leak started happening: memory-leak.zip.

Here are the complete data files, as well as atop's state files, from each node: memleak.tar.bz2

At this point the process had already chewed up hundreds of GB of RAM, but that didn't show up in /memz, which fits with the theory that it might be something outside tcmalloc:

------------------------------------------------
MALLOC:      225565800 (  215.1 MiB) Bytes in use by application
MALLOC: +       876544 (    0.8 MiB) Bytes in page heap freelist
MALLOC: +      3056032 (    2.9 MiB) Bytes in central cache freelist
MALLOC: +      1249600 (    1.2 MiB) Bytes in transfer cache freelist
MALLOC: +     17690808 (   16.9 MiB) Bytes in thread cache freelists
MALLOC: +      2568352 (    2.4 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =    251007136 (  239.4 MiB) Actual memory used (physical + swap)
MALLOC: +            0 (    0.0 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =    251007136 (  239.4 MiB) Virtual address space used
MALLOC:
MALLOC:          11573              Spans in use
MALLOC:            301              Thread heaps in use
MALLOC:           8192              Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.

rpcz.txt
varz.txt
threadz.txt
metrics.json.txt

screenshot from 2019-02-14 13-20-12
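
(Aside: one way to sanity-check the "outside tcmalloc" theory is to compare tcmalloc's own accounting with the RSS the kernel reports for the process. The snippet below is a minimal illustrative sketch, not code from this issue; it assumes the process is linked against gperftools.)

```
// Rough check: if the kernel-reported RSS is far larger than what tcmalloc
// thinks the application has allocated, the extra memory is likely coming
// from allocations that bypass tcmalloc (e.g. raw mmap calls).
#include <gperftools/malloc_extension.h>
#include <unistd.h>

#include <fstream>
#include <iostream>

int main() {
  size_t tcmalloc_allocated = 0;
  // Standard gperftools property: bytes handed out by tcmalloc and not yet freed.
  MallocExtension::instance()->GetNumericProperty(
      "generic.current_allocated_bytes", &tcmalloc_allocated);

  // /proc/self/statm reports sizes in pages: size resident shared text lib data dt.
  std::ifstream statm("/proc/self/statm");
  size_t size_pages = 0, resident_pages = 0;
  statm >> size_pages >> resident_pages;
  const long long os_rss =
      static_cast<long long>(resident_pages) * sysconf(_SC_PAGESIZE);

  std::cout << "tcmalloc allocated: " << tcmalloc_allocated << " bytes\n"
            << "kernel RSS:         " << os_rss << " bytes\n"
            << "unaccounted:        "
            << os_rss - static_cast<long long>(tcmalloc_allocated) << " bytes\n";
  return 0;
}
```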

yugabyte-ci pushed a commit that referenced this issue Feb 17, 2019

[ENG-4666] [#862] Implement a global memory limit for the master and stop the master if exceeded

Summary:
In case of a memory leak, this change prevents the master process from consuming all memory on the
system, and forces it to shut down and eventually be restarted by its supervisor script. We use
the `/proc/<pid>/statm` Linux file to compute the RSS (resident set size), not including
memory-mapped files and shared memory. If that amount exceeds the root memtracker limit multiplied
by the given factor (200% by default), we shut down the master server. This mechanism is also
generalizable to other programs in case we need it. The macOS implementation is currently a no-op.

Test Plan:
- Jenkins
- Manually add a memory leak to the master and verify that it shuts down when memory usage grows high.

Reviewers: bogdan, sergei, kannan, hector

Reviewed By: kannan, hector

Subscribers: ybase, bharat

Differential Revision: https://phabricator.dev.yugabyte.com/D6187
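
(For illustration, a rough sketch of the statm-based check described in the commit summary above; the function name and parameters are hypothetical, not the actual YugaByte implementation.)

```
// Sketch only. /proc/self/statm fields, in pages: size resident shared text lib data dt.
// "Non-shared RSS" here means resident pages minus shared (file-backed and shm) pages.
#include <unistd.h>

#include <cstddef>
#include <fstream>

bool NonSharedRssExceedsLimit(size_t root_limit_bytes, double factor /* e.g. 2.0 */) {
  std::ifstream statm("/proc/self/statm");
  size_t size = 0, resident = 0, shared = 0;
  if (!(statm >> size >> resident >> shared)) {
    return false;  // statm unavailable (e.g. non-Linux): treat the check as a no-op.
  }
  const size_t page_size = sysconf(_SC_PAGESIZE);
  const size_t non_shared_rss = (resident - shared) * page_size;
  return non_shared_rss > static_cast<size_t>(root_limit_bytes * factor);
}

// A watcher would call this periodically and terminate the process on a breach,
// so that the supervisor script can restart it.
```

(The numbers in the watcher log later in this thread are consistent with this definition: VmRSS minus (RssFile + RssShmem), i.e. 4466307072 - 110878720 = 4355428352 bytes.)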

yugabyte-ci pushed a commit that referenced this issue Feb 19, 2019

[#862] Upgrade to gperftools 2.7 to pick up tcmalloc bug fixes
Summary:
Upgrade to gperftools 2.7 to pick up tcmalloc bug fixes.

The full concatenation of release notes between gperftools 2.2.1 (exclusive) and gperftools 2.7 (inclusive) is available here:
<doc-link>

It is not clear which bug could actually be causing #862,
but it seems very similar to the bug described at gperftools/gperftools#976,
which got fixed by upgrading gperftools from 2.2 to 2.6.90.

Also fixing the behavior of automatically adding a third-party archive checksum (`--add_checksum`) in `thirdparty/build_thirdparty.sh`.

Test Plan: Jenkins

Reviewers: kannan, hector, sergei

Reviewed By: sergei

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D6188

@ttyusupov (Contributor) commented Feb 21, 2019

I was able to reproduce the issue on the build after upgrading gperftools to 2.7, but now we have more stats:

E0221 09:17:13.444824 30711 total_mem_watcher.cc:82] Memory usage exceeded configured limit, terminating the process: Non-shared RSS (4355428352 bytes, 4153 MiB) has exceeded the configured limit (4294967296 bytes, 4096 MiB)
Details:
------------------------------------------------
MALLOC:      139826960 (  133.3 MiB) Bytes in use by application
MALLOC: +       565248 (    0.5 MiB) Bytes in page heap freelist
MALLOC: +       927696 (    0.9 MiB) Bytes in central cache freelist
MALLOC: +      1988608 (    1.9 MiB) Bytes in transfer cache freelist
MALLOC: +      8735008 (    8.3 MiB) Bytes in thread cache freelists
MALLOC: +      2883584 (    2.8 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =    154927104 (  147.8 MiB) Actual memory used (physical + swap)
MALLOC: +            0 (    0.0 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =    154927104 (  147.8 MiB) Virtual address space used
MALLOC:
MALLOC:           1880              Spans in use
MALLOC:             61              Thread heaps in use
MALLOC:           8192              Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.

---------------------
OS-level memory stats
---------------------
VmSize:             5674741760 bytes (5411 MiB)
VmRSS:              4466307072 bytes (4259 MiB)
RssFile + RssShmem: 110878720 bytes (105 MiB)
Text (code):        53248 bytes (0 MiB)
Data + stack:       4975906816 bytes (4745 MiB)

@ttyusupov (Contributor) commented Feb 21, 2019

Tried cutting out the backtrace library by making GetStackTrace return an empty string. The issue has not reproduced on this custom build for the last 4 hours; previously it took 2-3 hours to reproduce on my setup.

@ttyusupov (Contributor) commented Feb 21, 2019

Found a ticket about memory issues with libbacktrace: ianlancetaylor/libbacktrace#13. I will investigate further in this direction and will also try http://valgrind.org.

@ttyusupov (Contributor) commented Feb 24, 2019

According to the valgrind memory usage report, most of the memory is allocated by backtrace_alloc:

  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
--------------------------------------------------------------------------------
 89  5,689,038,549    5,657,395,200    5,657,395,200             0            0
100.00% (5,657,395,200B) (page allocation syscalls) mmap/mremap/brk, --alloc-fns, etc.
->95.42% (5,398,278,144B) 0xBDFE2EB: mmap (in /home/yugabyte/lib/yb-thirdparty/libtcmalloc.so.4.5.3)
| ->75.71% (4,283,052,032B) 0xDE6AF3D: backtrace_alloc (in /home/yugabyte/lib/yb-thirdparty/libbacktrace.so.0.0.0)
| | ->61.83% (3,497,799,680B) 0xDE67006: read_function_entry (in /home/yugabyte/lib/yb-thirdparty/libbacktrace.so.0.0.0)
| | | ->47.00% (2,658,967,552B) 0xDE6717B: read_function_entry (in /home/yugabyte/lib/yb-thirdparty/libbacktrace.so.0.0.0)
| | | | ->34.45% (1,948,717,056B) 0xDE6717B: read_function_entry (in /home/yugabyte/lib/yb-thirdparty/libbacktrace.so.0.0.0)
| | | | | ->31.76% (1,796,767,744B) 0xDE6717B: read_function_entry (in /home/yugabyte/lib/yb-thirdparty/libbacktrace.so.0.0.0)
| | | | | | ->23.14% (1,309,134,848B) 0xDE6879E: dwarf_lookup_pc (in /home/yugabyte/lib/yb-thirdparty/libbacktrace.so.0.0.0)
| | | | | | | ->23.14% (1,309,134,848B) 0xDE68A7D: dwarf_fileline (in /home/yugabyte/lib/yb-thirdparty/libbacktrace.so.0.0.0)
| | | | | | |   ->20.58% (1,164,136,448B) 0xDE69BC5: unwind (in /home/yugabyte/lib/yb-thirdparty/libbacktrace.so.0.0.0)
| | | | | | |   | ->20.58% (1,164,136,448B) 0x101F95D7: ??? (in /home/yugabyte/linuxbrew/lib/libgcc_s.so.1)
| | | | | | |   |   ->20.58% (1,164,136,448B) 0xDE69C66: backtrace_full (in /home/yugabyte/lib/yb-thirdparty/libbacktrace.so.0.0.0)
| | | | | | |   |     ->20.58% (1,164,136,448B) 0xBA3E5E9: yb::GetStackTrace(yb::StackTraceLineFormat, int) (debug-util.cc:537)
| | | | | | |   |       ->20.58% (1,164,136,448B) 0x4D09B73: yb::master::CatalogManager::ScopedLeaderSharedLock::Unlock() (catalog_manager.cc:6895)
| | | | | | |   |         ->20.58% (1,164,136,448B) 0x4D553BF: yb::master::CatalogManager::ScopedLeaderSharedLock::~ScopedLeaderSharedLock() (catalog_manager.h:719)
| | | | | | |   |           ->20.58% (1,164,136,448B) 0x4D9861B: yb::master::MasterServiceImpl::GetMasterRegistration(yb::master::GetMasterRegistrationRequestPB const*, yb::master::GetMasterRegistrationResponsePB*, yb::rpc::RpcContext) (master_service.cc:477)
| | | | | | |   |             ->20.58% (1,164,136,448B) 0x9927F00: yb::master::MasterServiceIf::Handle(std::shared_ptr<yb::rpc::InboundCall>) (master.service.cc:885)
| | | | | | |   |               ->20.58% (1,164,136,448B) 0x9CCD9A8: yb::rpc::ServicePoolImpl::Handle(std::shared_ptr<yb::rpc::InboundCall>) (service_pool.cc:214)
| | | | | | |   |                 ->20.58% (1,164,136,448B) 0x9CCBF18: yb::rpc::TasksPool<yb::rpc::(anonymous namespace)::InboundCallTask>::WrappedTask::Run() (service_pool.cc:252)
| | | | | | |   |                   ->20.58% (1,164,136,448B) 0x9CD3F4C: yb::rpc::(anonymous namespace)::Worker::Execute() (thread_pool.cc:101)
| | | | | | |   |                     ->20.58% (1,164,136,448B) 0xBAEFBF4: yb::Thread::SuperviseThread(void*) (functional:2267)
| | | | | | |   |                       ->20.58% (1,164,136,448B) 0x52BF692: start_thread (pthread_create.c:333)
| | | | | | |   |                         ->20.58% (1,164,136,448B) 0x104E841B: clone (clone.S:109)

@ttyusupov (Contributor) commented Feb 24, 2019

This is often reproducible and looks like a bug in libbacktrace. It is not reproducible if we use libbacktrace from a single thread under a lock, but libbacktrace might still have other memory issues, so we will use google::glog_internal_namespace_::DumpStackTraceToString instead.
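
(A minimal sketch of the "one thread under lock" workaround mentioned above; the wrapper and its collector parameter are illustrative, not YugaByte's actual code.)

```
// Serialize all stack-trace collection through a single mutex so the
// libbacktrace-based collector is never entered from multiple threads at once.
#include <functional>
#include <mutex>
#include <string>

std::string GetStackTraceSerialized(
    const std::function<std::string()>& collect_trace) {  // e.g. wraps backtrace_full()
  static std::mutex mu;  // One global lock guarding libbacktrace's internal state.
  std::lock_guard<std::mutex> lock(mu);
  return collect_trace();
}
```

(In release builds, the fix in the commit below avoids libbacktrace entirely and uses glog's symbolization instead.)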

yugabyte-ci pushed a commit that referenced this issue Feb 25, 2019

#862: ENG-4728: Use libbacktrace only in debug builds and in single-threaded mode to avoid high memory consumption

Summary: When using `libbacktrace` concurrently from multiple threads, it can start allocating a lot of memory and cause an out-of-memory issue. Even in single-threaded mode, there are some open questions regarding memory leaks and high memory consumption. Instead, for release builds, we use google::glog_internal_namespace_::DumpStackTraceToString(), which is similar but doesn't print line numbers.

Test Plan:
Run Jepsen testing in a loop with the following command:
```
lein run test -o debian --url "$URL" \
-w set --concurrency 20 -w single-key-acid --time-limit 600 \
--nemesis partition-half,partition-one,partition-ring
```
Use `--v=3` for the `yb-master` process to trigger stack trace printing on long catalog manager locks.

Reviewers: mikhail, kannan, sergei

Reviewed By: kannan, sergei

Subscribers: kannan, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D6229

@ttyusupov closed this Feb 25, 2019

YBase features automation moved this from To Do to Done Feb 25, 2019

Jepsen Testing automation moved this from To do to Done Feb 25, 2019
