Hinted handoff relies upon the VNode entering "inactive" state. #715

Open
emauton opened this issue Feb 26, 2015 · 2 comments

emauton commented Feb 26, 2015

The riak_core_vnode carries an inactivity timeout that is reset each time it handles a message:
https://github.com/basho/riak_core/blob/develop/src/riak_core_vnode.erl#L277

The timeout is necessary for handoffs of the "hinted" type to occur, via coordination of the VNode:
https://github.com/basho/riak_core/blob/develop/src/riak_core_vnode.erl#L463

... and the VNode manager (using maybe_trigger_handoff):
https://github.com/basho/riak_core/blob/develop/src/riak_core_vnode_manager.erl#L468

The upshot is that if you are sending periodic messages to your application's VNode (with a period less than the inactivity_timeout), hinted handoff never happens.
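
To make the failure mode concrete, here is a minimal Python model of a timer that is reset on every handled message. It is only a sketch of the behaviour described above, not the riak_core implementation: the function name `inactivity_fires`, the re-arming after a fire, and the use of plain numbers for time are all assumptions made for illustration.

```python
def inactivity_fires(message_times, timeout, horizon):
    """Return the times at which an inactivity timer would fire.

    Simplified model (an assumption, not riak_core code): the timer
    starts at time 0, is reset whenever a message is handled, and is
    re-armed after it fires. Times are plain numbers, e.g. seconds.
    """
    fired = []
    deadline = timeout
    events = sorted(t for t in message_times if t < horizon)
    i = 0
    while True:
        next_msg = events[i] if i < len(events) else None
        if next_msg is not None and next_msg < deadline:
            # A message arrived before the deadline: reset the timer.
            deadline = next_msg + timeout
            i += 1
        elif deadline <= horizon:
            # Nothing arrived in time: the vnode would be seen as inactive.
            fired.append(deadline)
            deadline += timeout
        else:
            break
    return fired


# Messages every 30 s against a 60 s timeout: the timer never fires.
busy = inactivity_fires(range(30, 600, 30), timeout=60, horizon=600)
# No messages at all: the timer fires every 60 s.
quiet = inactivity_fires([], timeout=60, horizon=600)
```

In this model, any message period shorter than the timeout keeps pushing the deadline forward, so the "inactive" event that hinted handoff waits for never happens.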

Sending regular messages is a reasonable thing to do - in my case, I had been periodically exporting statistics that way - and it took a lot of tracing down to understand that the VNode's never becoming inactive was the reason hinted handoff was not happening.

Since riak_core_vnode_manager:maybe_trigger_handoff/4 appears not to be an expensive operation (at least by my reading of the code), I think there's no particularly good reason to rely on inactivity here - I believe this could just be tried regularly in the manager's management_tick loop.

What do you think?
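
As a rough illustration of the proposal, the sketch below compares two ways a periodic manager tick could pick handoff candidates: one that still requires the vnode to be idle, and one that does not. All names here (`VNode`, `handoff_candidates`, `owners`, `require_idle`) are hypothetical and stand in for whatever the vnode manager actually tracks. The real `maybe_trigger_handoff` logic is not reproduced.

```python
from dataclasses import dataclass


@dataclass
class VNode:
    partition: int
    idle: bool  # whether the inactivity timeout has fired for this vnode


def handoff_candidates(vnodes, owners, local_node, require_idle):
    """Return partitions a manager tick would try to hand off.

    Hypothetical model: a vnode is a fallback when the ring says its
    partition is owned by some other node. With require_idle=True the
    tick mimics the current reliance on inactivity; with
    require_idle=False it tries every fallback on every tick.
    """
    selected = []
    for vnode in vnodes:
        is_fallback = owners[vnode.partition] != local_node
        if is_fallback and (vnode.idle or not require_idle):
            selected.append(vnode.partition)
    return selected


vnodes = [VNode(0, idle=False), VNode(1, idle=False), VNode(2, idle=True)]
owners = {0: "a", 1: "b", 2: "b"}

# Partition 1 is a busy fallback: it is skipped only when idleness is required.
with_idle = handoff_candidates(vnodes, owners, "a", require_idle=True)
without_idle = handoff_candidates(vnodes, owners, "a", require_idle=False)
```

Under these assumptions the busy fallback for partition 1 is only picked up when the idle requirement is dropped, which is the case the issue is about.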

@jonmeredith
Contributor

Using the vnode manager is my preferred way of handling it. Periodically the vnode mgr should ping the vnode (using a proxy process so it doesn't block) and ask how many requests it has handled - if the vnode mgr sees that it is inactive and a fallback, it can trigger handoff/shutdown.
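
One way to picture the polling idea is a small tracker on the manager side that remembers the last request count reported by each vnode. This is a sketch under assumptions: `ActivityTracker` and `observe` are invented names, and the non-blocking proxy process is left out.

```python
class ActivityTracker:
    """Hypothetical manager-side bookkeeping of per-vnode request counts."""

    def __init__(self):
        self._last_count = {}

    def observe(self, partition, handled_count):
        """Record the latest count for a partition.

        Returns True when the count is unchanged since the previous
        observation, i.e. the vnode handled no requests in between.
        The first observation always returns False because there is
        nothing to compare against yet.
        """
        previous = self._last_count.get(partition)
        self._last_count[partition] = handled_count
        return previous is not None and handled_count == previous


tracker = ActivityTracker()
first = tracker.observe(1, 10)   # no baseline yet
second = tracker.observe(1, 10)  # unchanged since last poll: inactive
third = tracker.observe(1, 12)   # two more requests handled: active
```

Whether periodic housekeeping messages, such as the statistics export mentioned in the issue, should increment this counter is a design decision this sketch leaves open. If they do, the vnode would still never look inactive.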

Ideally we'd remove any circular dependencies between the vnode and the vnode mgr as it would be very helpful for constructing unit/integration tests if instances of vnodes could be spun up in tests, combined with the Pid form of preflists so that requests can be sent to the vnodes.

@DeadZen
Contributor

DeadZen commented Feb 26, 2015

The underlying question, I think, is: could handoffs get stalled if a vnode is never considered idle? Is a management timer alone enough to determine this reliably, or would a timer plus a maximum number of requests be a better option?
