Develop 2.0 merge. #658

Merged
merged 14 commits into from
Feb 24, 2015

cmeiklejohn
Contributor

Develop 2.0 merge. Required 2.0/develop merged riak_kv. Do not merge until complete.

macintux and others added 14 commits August 19, 2014 10:30
…t were unintentionally attributed to the first function
Various fixes for edoc

Reviewed-by: reiddraper
Problem: transient failures of aae, such as trees not yet built or locks not
being acquired, would cause an aae fullsync process to exit abnormally. This
could happen several times in a row, creating log spam.

Resolution: the concept of soft_exit. A soft_exit is a message sent from a soon
to be exiting process to a soft_linked process. The exiting process would then
exit normally, while any soft_linked processes could handle the soft_exit
message in a similar fashion as an exit message. This would indicate an exit
reason that should be handled, but not bad enough to have the system logger
know about it.

The soft_exit message sent from the aae worker to the fscoordinator is
as simple as `{soft_exit, pid(), term()}'.

The current implementation is not generic. There can only be one soft_link to
the aae, and there's no general mechanism to use soft_links or soft_exits
elsewhere in the code base. Sorry.
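
A minimal sketch of the soft_exit idea described above, not the code from this pull request. The module name `soft_exit_sketch`, the `worker/2` and `demo/0` functions, and the `not_built` reason are hypothetical; only the `{soft_exit, pid(), term()}` message shape comes from the commit message. The worker reports a transient failure to the process it is soft-linked to and then exits with reason `normal`, so nothing is logged as a crash.

```erlang
-module(soft_exit_sketch).
-export([demo/0]).

%% Hypothetical worker. On a transient failure it sends a soft_exit
%% message to its soft-linked coordinator, then exits normally.
worker(Coordinator, Outcome) ->
    case Outcome of
        ok ->
            Coordinator ! {done, self()};
        {transient, Reason} ->
            Coordinator ! {soft_exit, self(), Reason}
    end,
    exit(normal).

%% Hypothetical coordinator side. It handles the soft_exit message much
%% like an exit message, and separately observes the normal exit.
demo() ->
    Old = process_flag(trap_exit, true),
    Self = self(),
    Pid = spawn_link(fun() -> worker(Self, {transient, not_built}) end),
    Soft = receive
               {soft_exit, Pid, Reason} -> {retry, Reason}
           after 1000 -> timeout
           end,
    Exit = receive
               {'EXIT', Pid, ExitReason} -> ExitReason
           after 1000 -> timeout
           end,
    process_flag(trap_exit, Old),
    {Soft, Exit}.
```

Calling `soft_exit_sketch:demo()` returns the coordinator's decision together with the worker's real exit reason, which is `normal` even though the work failed.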

Another change rolled into this is consistent use of a #partition_info record
in the fscoordinator, and error tracking in the fscoordinator's state. Swapping
to a single data structure for the partition queue, whereis waiting list, and
purgatory queues makes the fscoordinator easier to understand (as there is less
code to modify structures).

This is a forward port of the fix done for 1.4. Conflicts favor existing code
where it does not directly affect the fix.

Conflicts:
	Makefile
	rebar.config
	src/riak_repl2_fssource.erl
	src/riak_repl2_rtq_proxy.erl
	src/riak_repl_aae_source.erl
	test/riak_core_cluster_mgr_tests.erl
increment_error_dict expects the partition, the element N of the error dict,
and the state. It pulls the dict out of the state and puts it back in place,
returning just the state. So this call, which passed the dict in, was wrong.
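
A rough sketch of that calling convention, under loudly stated assumptions: the `#counters{}` and `#state{}` records, the `error_tracker` field, and the module name are hypothetical and are not taken from riak_repl2_fscoordinator. It only illustrates why the third argument has to be the state rather than the dict: the function reads the dict from the state, bumps element N for the partition, and returns the updated state.

```erlang
-module(error_dict_sketch).
-export([increment_error_dict/3, demo/0]).

%% Hypothetical shapes; the real records may differ.
-record(counters, {retry_exits = 0, error_exits = 0}).
-record(state, {error_tracker = dict:new()}).

%% Takes the partition, the tuple position of the counter to bump, and
%% the state. Returns the state with the updated dict stored back in it.
increment_error_dict(Partition, ElementN, State = #state{error_tracker = Dict0}) ->
    Counters0 = case dict:find(Partition, Dict0) of
                    {ok, C} -> C;
                    error -> #counters{}
                end,
    Counters1 = setelement(ElementN, Counters0,
                           element(ElementN, Counters0) + 1),
    State#state{error_tracker = dict:store(Partition, Counters1, Dict0)}.

%% Two retries followed by one final error for a made-up partition 42.
demo() ->
    S0 = #state{},
    S1 = increment_error_dict(42, #counters.retry_exits, S0),
    S2 = increment_error_dict(42, #counters.retry_exits, S1),
    S3 = increment_error_dict(42, #counters.error_exits, S2),
    dict:fetch(42, S3#state.error_tracker).
```

Passing the dict as the third argument would fail the `#state{}` pattern match in the function head, which is the kind of mistake the commit above describes.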
When a partition is not available, perhaps after a number of retries,
the error exits stat should be incremented. Also, the retry exits stat
should be incremented on each retry. This was discovered when
backporting the repl_location_failures riak_test.
The one in riak_repl2_fssource is a legit bug in the code
…nsient-aae-fs-failures

Implement soft_exit, primarily for aae_fullsync.

Reviewed-by: engelsanchez
Conflicts:
	dialyzer.ignore-warnings
	rebar.config
Conflicts:
	src/riak_repl2_fscoordinator.erl
@seancribbs

👍 8c6b159

borshop added a commit that referenced this pull request Feb 24, 2015
Develop 2.0 merge.

Reviewed-by: seancribbs
@cmeiklejohn
Contributor Author

@borshop merge

@borshop borshop merged commit 8c6b159 into develop Feb 24, 2015
@cmeiklejohn cmeiklejohn deleted the develop-2.0-merge branch February 24, 2015 16:29
8 participants