MB-60971: Avoiding work on persister side on no-op notifs from merger#2006

Merged
abhinavdangeti merged 5 commits into master from noOpMerge on Apr 4, 2024
Conversation

@abhinavdangeti
Member

@abhinavdangeti abhinavdangeti commented Mar 27, 2024

Authored-by: @Thejas-bhat
Original: #2003

  • There are chances that the merger doesn't see any eligible segments to
    merge in the current iteration, which leaves the tasks list empty. In
    this situation the merger, which didn't update the root snapshot, would
    still notify the persister.
  • Now, if the persister was napping at this point in time, and assuming
    there were mutations coming into the system (so the root snapshot would
    be updated by the introducer), the persister would be awoken and start
    flushing the in-memory segments out to disk.
  • A better behaviour would be to let the persister nap for the remaining
    duration and only then do some work. This also lets the merger wait for the
    notification reply (an interrupt-style wait) rather than doing something more
    expensive like continuing to do work (which the earlier commits of this PR
    were doing).
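A minimal sketch of the intended behaviour (the names `handleNotify` and `epochWatcher` here are illustrative stand-ins, not the actual scorch types):

```go
package main

import (
	"fmt"
	"time"
)

// epochWatcher mimics the notification the merger sends to the persister
// (illustrative, simplified from the actual scorch implementation).
type epochWatcher struct {
	epoch uint64
}

// handleNotify sketches the patched behaviour: on a notification whose
// epoch has not fallen behind the current root epoch (i.e. a no-op
// notification from a merger that had nothing to merge), nap for the
// remaining pause duration instead of flushing; otherwise wake up and
// persist the new root snapshot.
func handleNotify(ew epochWatcher, currRootEpoch uint64, remaining time.Duration) string {
	if ew.epoch >= currRootEpoch {
		// no new root snapshot since the pause began: finish the nap.
		if remaining > 0 {
			time.Sleep(remaining)
		}
		return "nap_completed"
	}
	// root moved while we were napping: wake up and persist.
	return "persist"
}

func main() {
	// merger notified with the same epoch as root: no-op, keep napping.
	fmt.Println(handleNotify(epochWatcher{epoch: 5}, 5, 10*time.Millisecond))
	// root advanced past the watcher's epoch: do persister work.
	fmt.Println(handleNotify(epochWatcher{epoch: 5}, 6, 0))
}
```

The two branches correspond roughly to the `num_persister_nap_pause_completed` and flush paths reflected in the stats below.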

Some numbers from local testing (~4.18M dataset with lorem ipsum content):

With patch

"num_bytes_used_ram": 224100280,
"num_files_on_disk": 34,
"test_bucket:test_bucket._default.travel:num_file_merge_ops": 6,
"test_bucket:test_bucket._default.travel:num_file_merge_plan": 0,
"test_bucket:test_bucket._default.travel:num_files_on_disk": 34,
"test_bucket:test_bucket._default.travel:num_mem_merge_ops": 70,
"test_bucket:test_bucket._default.travel:num_persister_nap_merger_break": 4,
"test_bucket:test_bucket._default.travel:num_persister_nap_pause_completed": 69,
"test_bucket:test_bucket._default.travel:num_root_filesegments": 16,
"test_bucket:test_bucket._default.travel:num_root_memorysegments": 0,

"TotFileMergePlan": 45,
"TotFileMergePlanErr": 0,
"TotFileMergePlanNone": 1,
"TotFileMergePlanOk": 44,

Without patch

"num_bytes_used_ram": 252726152,
"num_files_on_disk": 45,
"test_bucket:test_bucket._default.travel:num_file_merge_ops": 11,
"test_bucket:test_bucket._default.travel:num_file_merge_plan": 0,
"test_bucket:test_bucket._default.travel:num_files_on_disk": 45,
"test_bucket:test_bucket._default.travel:num_mem_merge_ops": 129,
"test_bucket:test_bucket._default.travel:num_persister_nap_merger_break": 85,
"test_bucket:test_bucket._default.travel:num_persister_nap_pause_completed": 44,
"test_bucket:test_bucket._default.travel:num_root_filesegments": 33,
"test_bucket:test_bucket._default.travel:num_root_memorysegments": 0,

"TotFileMergePlan": 96,
"TotFileMergePlanErr": 0,
"TotFileMergePlanNone": 1,
"TotFileMergePlanOk": 95,

Comment thread index/scorch/persister.go Outdated
Comment thread index/scorch/persister.go Outdated
@Thejas-bhat Thejas-bhat changed the title MB-60971: Avoiding unnecessary persister notifs from merger MB-60971: Avoiding work on persister side on no-op notifs from merger Apr 2, 2024
Comment thread index/scorch/persister.go

case ew := <-s.persisterNotifier:
s.rootLock.RLock()
currRootEpoch := s.root.epoch
Member Author
@Thejas-bhat Shouldn't we be checking the epoch from the persisterNotifier's epochWatcher instead, or somewhere else?

Member Author

I understand why we're comparing this epoch here, but is there any possibility of a race here -

  1. the merger updates the root epoch and sets up an epoch watcher
  2. the persister then records the lastRootEpoch in line 271 above
  3. the comparison in line 289 below passes because nothing else has changed since,
     and the persister sleeps longer than it should have.

Member
@Thejas-bhat Apr 4, 2024

From what I understand, that shouldn't happen, because:

  1. when the merger has changes to introduce, the introducer receives the newly
     merged segments to be introduced and updates the root epoch after acquiring a
     lock. Only after this does the merger notify the persister, so updating the
     rootEpoch and then notifying the persister happens in a sequential manner.
  2. the lastRootEpoch value at line 271 is still the pre-merge snapshot's epoch,
     and at line 289, when we read currRootEpoch at that instant under the read
     lock, it would be the new root epoch - so the check would fail and the
     persister continues to do its work (instead of sleeping).
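The sequential ordering described above can be sketched roughly like this (a simplified, hypothetical model of the introducer/persister interaction; `index`, `introduce`, and `currEpoch` are illustrative names, with the mutex playing the role of scorch's rootLock):

```go
package main

import (
	"fmt"
	"sync"
)

// index holds a shared root epoch guarded by a lock, a stand-in for the
// real root snapshot protected by rootLock.
type index struct {
	mu        sync.RWMutex
	rootEpoch uint64
}

// introduce models the introducer applying merged segments: the root
// epoch is bumped under the write lock.
func (i *index) introduce() {
	i.mu.Lock()
	i.rootEpoch++
	i.mu.Unlock()
}

// currEpoch models the persister reading the root epoch under the read lock.
func (i *index) currEpoch() uint64 {
	i.mu.RLock()
	defer i.mu.RUnlock()
	return i.rootEpoch
}

func main() {
	idx := &index{rootEpoch: 1}

	// persister records the pre-merge epoch (the "line 271" step).
	lastRootEpoch := idx.currEpoch()

	// when the merger has merged segments, the introducer applies them
	// (updating the root epoch) strictly before the merger notifies the
	// persister.
	idx.introduce()

	// the persister's check (the "line 289" step): the epochs differ,
	// so it does NOT sleep and instead continues persisting.
	if idx.currEpoch() == lastRootEpoch {
		fmt.Println("no-op notification: keep napping")
	} else {
		fmt.Println("root epoch advanced: persister continues working")
	}
}
```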

@abhinavdangeti
Member Author

Ok, let's go ahead with this.

@abhinavdangeti abhinavdangeti merged commit 2d81bf0 into master Apr 4, 2024
@abhinavdangeti abhinavdangeti deleted the noOpMerge branch April 4, 2024 15:22
abhinavdangeti added a commit that referenced this pull request Apr 8, 2024
abhinavdangeti added a commit that referenced this pull request Apr 8, 2024
@abhinavdangeti abhinavdangeti restored the noOpMerge branch April 8, 2024 23:52
abhinavdangeti added a commit that referenced this pull request Apr 9, 2024
…m merger" (#2010)

This reverts commit 2d81bf0 (
#2006 ) on account of the
regression highlighted with MB-61447.
@CascadingRadium CascadingRadium deleted the noOpMerge branch April 22, 2024 08:14
@CascadingRadium CascadingRadium restored the noOpMerge branch April 22, 2024 08:17
@CascadingRadium CascadingRadium deleted the noOpMerge branch June 13, 2024 03:09
