handoff of fallback data can be delayed or never performed #154

Closed
rzezeski opened this Issue Mar 21, 2012 · 1 comment


rzezeski commented Mar 21, 2012

Summary

During startup, a node with fallback data on its secondary partitions
will not start those secondary partitions. This means that nodes with
fallback data may delay handing off that data, or never hand it off at
all. Depending on the ring and preflist, replicas may then exist on the
same node for a longer period of time than required.

Delayed Handoff

If the home node of the fallback partitions is still down, read/write
load will cause the secondaries to spin back up, and handoff will
commence when the home node comes alive.

Missed Handoff

If the home node of the fallback partitions is already up when the
node(s) holding the fallback partitions come up, the secondaries will
not be started by read/write load, since all the primaries are up.
This means that, depending on the ring and preflist, two replicas
could exist on the same node until read repair or a write occurs.
That is, since the secondary partitions with fallback data are never
started, handoff is missed, and read repair plus new writes must
account for all of the fallback data that was written while the home
node was down.

Steps to Reproduce

  1. create devrel - make devrel && for d in dev/dev*; do $d/bin/riak start; done
  2. join nodes - for d in dev/dev*; do $d/bin/riak-admin join dev1@127.0.0.1; done
  3. wait for everything to settle (members + transfers)
  4. stop dev4 - ./dev/dev4/bin/riak stop
  5. insert data - for i in $(seq 1 100); do curl -X PUT -H 'content-type: text/plain' http://localhost:8091/riak/test/$i -d "hello"; done
  6. check data - for i in $(seq 1 100); do curl http://localhost:8091/riak/test/$i 2> /dev/null && echo; done | wc -l
  7. stop dev1-dev3 - for d in dev/dev{1,2,3}; do $d/bin/riak stop; done
  8. start dev4 - ./dev/dev4/bin/riak start
  9. verify keys don't exist - for i in $(seq 1 100); do curl http://localhost:8094/riak/test/$i 2> /dev/null && echo; done | less
  10. start dev1-dev3 - for d in dev/dev*; do $d/bin/riak start; done
  11. wait a little, check transfers - ./dev/dev1/bin/riak-admin transfers
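
The data checks in steps 6 and 9 pipe response bodies to wc or less, which does not distinguish a stored value from an error body. As a rough sketch (not part of the original report), the helper below counts only the keys that come back with HTTP 200. The function names fetch_status and count_present are hypothetical; the port and the test/$i key layout are taken from the steps above, and the sketch assumes a present key returns 200.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Riak.

# Print the HTTP status code for one key on the given port.
# Assumes the bucket/key layout used in the steps above (test/$i).
fetch_status() {
    curl -s -o /dev/null -w '%{http_code}' "http://localhost:$1/riak/test/$2"
}

# Count how many of the keys 1..N answer with 200 on the given port.
# $1 = HTTP port, $2 = number of keys (defaults to 100).
count_present() {
    port=$1
    total=${2:-100}
    found=0
    i=1
    while [ "$i" -le "$total" ]; do
        if [ "$(fetch_status "$port" "$i")" = "200" ]; then
            found=$((found + 1))
        fi
        i=$((i + 1))
    done
    echo "$found"
}
```

For example, count_present 8091 after step 5 and count_present 8094 after step 8 give a single number to compare instead of a page of output.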

After step 11 there will be some transfers listed, but after a while
they will complete. Those transfers are for data other than the
fallback data. To be sure, you can wait until no transfers are
reported, stop dev1 - dev3, and then try to read the keys from
dev4; they won't be there.
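
Waiting "until no transfers are reported" can be scripted rather than checked by hand. The following is only a sketch: wait_for_transfers is a hypothetical helper, and the default pattern "waiting to handoff" is an assumption about how riak-admin transfers words a pending transfer, so adjust it to whatever your version actually prints.

```shell
#!/bin/sh
# Hypothetical helper: poll a transfers command until its output no
# longer matches a "pending" pattern, or give up after a number of tries.
# $1 = command to run (defaults to dev1's riak-admin transfers)
# $2 = pattern marking a pending transfer (ASSUMED wording, adjust as needed)
# $3 = maximum number of polls (defaults to 60)
wait_for_transfers() {
    cmd=${1:-./dev/dev1/bin/riak-admin transfers}
    pattern=${2:-waiting to handoff}
    tries=${3:-60}
    while [ "$tries" -gt 0 ]; do
        # Only treat the cluster as settled if the command itself succeeded
        # and none of its output matches the pending pattern.
        if out=$($cmd 2>/dev/null) && ! printf '%s\n' "$out" | grep -q "$pattern"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "${POLL_INTERVAL:-5}"
    done
    return 1
}
```

Calling wait_for_transfers with no arguments before stopping dev1 - dev3 returns 0 once nothing matches the pattern, and 1 if transfers are still listed after the last poll.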

  1. stop dev4 - ./dev/dev4/bin/riak stop
  2. read the keys - for i in $(seq 1 100); do curl http://localhost:8091/riak/test/$i 2> /dev/null && echo; done | less
  3. check for transfers - ./dev/dev1/bin/riak-admin transfers

There should be multiple partitions pending transfer to dev4 now.
This is because, with dev4 down, the reads pulled the secondary vnodes
into the preflist, which caused them to be spun up.

jtuple was assigned Apr 3, 2012

@jtuple jtuple added a commit that referenced this issue Apr 4, 2012

@jtuple jtuple Change vnode manager to periodically start never-before-started vnodes
Ensures vnodes responsible for fallback data are eventually started.
Resolves issue #154
a724e7f

jonmeredith was assigned Apr 5, 2012


jtuple commented Apr 6, 2012

The issue here is that, after changes to how Riak starts up in version 1.1, vnodes that held fallback data were never started. This has been addressed by the above pull request, which ensures that all vnodes are eventually started after a reboot.

Fix merged in, and will be available in Riak 1.1.2.

jtuple closed this Apr 6, 2012

jtuple was assigned Apr 6, 2012

@Licenser Licenser pushed a commit to Licenser/riak_test_core that referenced this issue Apr 7, 2013

@jtuple jtuple Add test for issue basho/riak_core#154 1e06fc7

jtuple was unassigned by ooshlablu Apr 25, 2015
