Browse files

Snapshot streaming:

Typically a follower would ask a leader to send a snapshot. There's a
rare edge that we witnessed where the leader would switch after asking
the follower to get a snapshot. In that case, if the follower sends a
snapshot request to another follower, Dgraph ends up in an infinitely
failing loop.

We skip this loop by not checking if the receiver of the request is a
leader. Any follower should be able to service the request, once it is
past the read timestamp. That's what this PR does.
  • Loading branch information...
manishrjain committed Jan 14, 2019
1 parent 2a68d54 commit c736b19f6b4502688580327d076271c6a27f593e
Showing with 9 additions and 6 deletions.
  1. +9 −6 worker/snapshot.go
@@ -26,7 +26,6 @@ import (

const (
@@ -108,6 +107,15 @@ func doStreamSnapshot(snap *pb.Snapshot, out pb.Worker_StreamSnapshotServer) err
// might be OK. Otherwise, we'd want to version the schemas as well. Currently, they're stored
// at timestamp=1.

// We no longer check if this node is the leader, because the leader can switch between snapshot
// requests. Therefore, we wait until this node has reached snap.ReadTs, before servicing the
// request. Any other node in the group should have the same data as the leader, once it is past
// the read timestamp.
glog.Infof("Waiting to reach timestamp: %d", snap.ReadTs)
if err := posting.Oracle().WaitForTs(out.Context(), snap.ReadTs); err != nil {
return err

var num int
stream := pstore.NewStreamAt(snap.ReadTs)
stream.LogPrefix = "Sending Snapshot"
@@ -150,11 +158,6 @@ func (w *grpcWorker) StreamSnapshot(stream pb.Worker_StreamSnapshotServer) error
atomic.AddInt32(&n.streaming, 1)
defer atomic.AddInt32(&n.streaming, -1)

if !x.IsTestRun() {
if !n.AmLeader() {
return errNotLeader
snap, err := stream.Recv()
if err != nil {
// If we don't even receive a request (here or if no StreamSnapshot is called), we can't

0 comments on commit c736b19

Please sign in to comment.