TServer memory monitor could never pick up old inactive tablet to flush #1672

ttyusupov · 2019-07-02T10:31:01Z

TSTabletManager has a global memory monitor that flushes tablets when RocksDB memory usage exceeds global_memstore_size_mb_max (or global_memstore_size_percentage if global_memstore_size_mb_max is not set).
The tablet to flush is selected based on the oldest write to its memstore. For this purpose, each tablet has a TabletFlushStats instance that monitors writes and flushes and tracks oldest_write_in_memstore. TabletFlushStats::OnFlushScheduled resets oldest_write_in_memstore to the max value; this was supposed to prevent empty tablets from being selected for flush. In reality, however, TabletFlushStats::OnFlushScheduled is called from DBImpl::SchedulePendingFlush even if there is no pending flush to schedule, and DBImpl::SchedulePendingFlush can be called after a compaction, after opening RocksDB, or after deleting obsolete SST files. This resets oldest_write_in_memstore to the max value for the tablet, and if there are no further writes to the tablet, the tablet will never be picked for flush by the tserver memory monitor.
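The failure mode can be reproduced with a small Python model. This is a simplified sketch, not YugabyteDB code: the class only mirrors the TabletFlushStats behavior described above, while `schedule_pending_flush`, `pick_oldest`, the byte counts, and the numeric timestamps are hypothetical stand-ins for DBImpl::SchedulePendingFlush and the monitor's selection step.

```python
import math
from typing import Dict, Optional


class TabletFlushStats:
    """Simplified model of the per-tablet stats described in the issue (not the real class)."""

    def __init__(self) -> None:
        self.oldest_write_in_memstore = math.inf  # inf means "nothing to flush"
        self.memstore_bytes = 0

    def on_write(self, ts: float, nbytes: int) -> None:
        # Keep the timestamp of the oldest write that has not been flushed yet.
        self.oldest_write_in_memstore = min(self.oldest_write_in_memstore, ts)
        self.memstore_bytes += nbytes

    def on_flush_scheduled(self) -> None:
        # Buggy behavior: reset unconditionally, even if no flush is actually pending.
        self.oldest_write_in_memstore = math.inf


def schedule_pending_flush(stats: TabletFlushStats, flush_pending: bool) -> None:
    """Hypothetical stand-in for DBImpl::SchedulePendingFlush."""
    stats.on_flush_scheduled()  # called even when flush_pending is False
    if flush_pending:
        stats.memstore_bytes = 0  # pretend the flush completes immediately


def pick_oldest(tablets: Dict[str, TabletFlushStats]) -> Optional[str]:
    """Monitor's choice: the tablet with the oldest unflushed write, skipping 'empty' ones."""
    candidates = {tid: s.oldest_write_in_memstore
                  for tid, s in tablets.items()
                  if s.oldest_write_in_memstore != math.inf}
    return min(candidates, key=candidates.get) if candidates else None


tablets = {"idle": TabletFlushStats(), "busy": TabletFlushStats()}
tablets["idle"].on_write(ts=1.0, nbytes=64 << 20)  # old write, then the tablet goes quiet
schedule_pending_flush(tablets["idle"], flush_pending=False)  # e.g. after a compaction
tablets["busy"].on_write(ts=100.0, nbytes=1 << 20)
print(pick_oldest(tablets))  # prints: busy
```

In this model the `idle` tablet still holds 64 MB in its memstore, but the spurious reset makes it look empty, so the monitor keeps choosing other tablets for as long as `idle` receives no new writes.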


Summary: `TSTabletManager` has a global memory monitor that flushes tablets when RocksDB memory usage exceeds `global_memstore_size_mb_max` (or `global_memstore_size_percentage` if `global_memstore_size_mb_max` is not set). The tablet to flush is selected based on the oldest write to its memstore. For this purpose, each tablet has a `TabletFlushStats` instance that monitors writes and flushes and tracks `oldest_write_in_memstore`. `TabletFlushStats::OnFlushScheduled` resets `oldest_write_in_memstore` to the max value; this was supposed to prevent empty tablets from being selected for flush. In reality, however, `TabletFlushStats::OnFlushScheduled` is called from `DBImpl::SchedulePendingFlush` even if there is no pending flush to schedule, and `DBImpl::SchedulePendingFlush` can be called after a compaction, after opening RocksDB, or after deleting obsolete SST files. This resets `oldest_write_in_memstore` to the max value for the tablet, and if there are no further writes to the tablet, the tablet will never be picked for flush by the tserver memory monitor. The fix is to use memtable frontiers to get the oldest hybrid time written to the memtable and to select the tablet to flush based on that.

Test Plan: Jenkins

Reviewers: amitanand, mikhail, sergei

Reviewed By: sergei

Subscribers: kannan, bogdan, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D6846
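The idea behind the fix can be sketched with a hedged Python model. Here the oldest time is derived from what the memtables actually contain instead of from a separately maintained counter, so a spurious SchedulePendingFlush call has nothing to reset. `MemTable`, `smallest_ht`, `oldest_hybrid_time`, and `pick_tablet_to_flush` are hypothetical names and hybrid times are plain integers; the actual change reads RocksDB memtable frontiers in YugabyteDB's C++ code, which this sketch does not reproduce.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MemTable:
    """Hypothetical memtable that tracks the hybrid-time frontiers of the data it holds."""
    smallest_ht: Optional[int] = None  # oldest hybrid time written; None when empty
    largest_ht: Optional[int] = None   # newest hybrid time written; None when empty

    def add(self, hybrid_time: int) -> None:
        if self.smallest_ht is None or self.largest_ht is None:
            self.smallest_ht = self.largest_ht = hybrid_time
        else:
            self.smallest_ht = min(self.smallest_ht, hybrid_time)
            self.largest_ht = max(self.largest_ht, hybrid_time)

    def flush(self) -> None:
        # After a flush the data lives in an SST file, so the memtable has no frontiers.
        self.smallest_ht = self.largest_ht = None


def oldest_hybrid_time(memtables: List[MemTable]) -> Optional[int]:
    """Oldest hybrid time across a tablet's memtables, read from their frontiers."""
    times = [m.smallest_ht for m in memtables if m.smallest_ht is not None]
    return min(times) if times else None


def pick_tablet_to_flush(tablets: Dict[str, List[MemTable]]) -> Optional[str]:
    """Pick the tablet whose memtables hold the oldest hybrid time."""
    oldest: Dict[str, int] = {}
    for tablet_id, memtables in tablets.items():
        ht = oldest_hybrid_time(memtables)
        if ht is not None:  # tablets with empty memtables are skipped
            oldest[tablet_id] = ht
    return min(oldest, key=oldest.get) if oldest else None


idle, busy = MemTable(), MemTable()
idle.add(1)  # old write on a tablet that then goes quiet
busy.add(100)
busy.add(150)
tablets = {"idle": [idle], "busy": [busy]}
print(pick_tablet_to_flush(tablets))  # prints: idle
```

Because eligibility is now a function of what the memtables hold, the idle tablet stays a candidate until a real flush empties them; after `idle.flush()` the model moves on to `busy`.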

ttyusupov self-assigned this Jul 2, 2019

ttyusupov added this to To Do in YBase features via automation Jul 2, 2019

rkarthik007 added the area/docdb (YugabyteDB core features) and kind/bug (This issue is a bug) labels Jul 2, 2019

ttyusupov closed this as completed Jul 4, 2019

YBase features automation moved this from To Do to Done Jul 4, 2019


ttyusupov commented Jul 2, 2019 (edited by kmuthukk)
