
Make couch_peruser a proper Erlang app #756

Merged
wohali merged 25 commits into master from 749-fix-couch_peruser-app-structure on Oct 11, 2017

Conversation

@chewbranca
Contributor

chewbranca commented Aug 17, 2017

Overview

The couch_peruser app depends on couch having been started, yet it's also set as one of the [daemons] in default.ini, so couch_peruser gets started before all of couch has started up. It then explodes when calling out to mem3 before the mem3 application has been started, which causes various ETS issues because the tables do not exist yet.

This PR removes couch_peruser from [daemons] and adds the necessary application and supervision logic to bootstrap the app appropriately.
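
For readers less familiar with OTP, here is a minimal sketch of what "a proper Erlang app" involves: an application callback module plus a supervisor that owns the couch_peruser worker, so it only starts once its dependencies (couch, mem3) are up and gets restarted if it crashes. Module and child-spec names here are illustrative, not necessarily the exact ones added by this PR.

%% Sketch only: module and child names are illustrative.
-module(couch_peruser_app).
-behaviour(application).
-export([start/2, stop/1]).

%% Called by the application controller after the applications listed
%% under `applications` in couch_peruser.app (couch, mem3, ...) are up.
start(_StartType, _StartArgs) ->
    couch_peruser_sup:start_link().

stop(_State) ->
    ok.

%% Companion supervisor, shown in the same sketch for brevity.
-module(couch_peruser_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% A single permanent worker: if couch_peruser crashes, this
    %% supervisor restarts it instead of the failure taking down couch.
    ChildSpec = {couch_peruser,
                 {couch_peruser, start_link, []},
                 permanent, 5000, worker, [couch_peruser]},
    {ok, {{one_for_one, 5, 10}, [ChildSpec]}}.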

Testing recommendations

GitHub issue number

Fixes #749

Related Pull Requests

Checklist

  • Code is written and works correctly;
  • Changes are covered by tests;
  • Documentation reflects the changes;
@chewbranca


Contributor

chewbranca commented Aug 17, 2017

Well the changes in 863e8fd get the test suite a bit further, but for some reason it appears that

delete_user(AuthDb, Name) ->
    Url = lists:concat([get_cluster_base_url(), "/", ?b2l(AuthDb),
        "/org.couchdb.user:", Name]),
    {ok, 200, _, Body} = do_request(get, Url),
    {DocProps} = jiffy:decode(Body),
    Rev = proplists:get_value(<<"_rev">>, DocProps),
    {ok, 200, _, _} = do_request(delete, Url ++ "?rev=" ++ ?b2l(Rev)),
    % let's proceed after giving couch_peruser some time to delete the user db
    timer:sleep(2000).

results in the couch_peruser pid disappearing.

@wohali


Member

wohali commented Aug 23, 2017

@chewbranca are we going to keep this PR open until the tests can be fixed, or does couch_peruser need a complete rework?

@arcadius


arcadius commented Sep 7, 2017

Hello @chewbranca.
Anything trivial that I can help with to get this moving?
Thanks

@janl


Member

janl commented Oct 1, 2017

it appears that couchdb/src/couch_peruser/test/couch_peruser_test.erl#L98-106 results in the couch_peruser pid disappearing.

@chewbranca how to reproduce this?

make eunit suites=couch_peruser works fine locally, and Travis seems to agree; the failing build fails on unrelated mango tests.

@janl


Member

janl commented Oct 1, 2017

the failing build fails on unrelated mango tests.

Nevertheless, I pushed a commit that makes the mango test runner more reliable on slow build nodes.

@janl


Member

janl commented Oct 5, 2017

@chewbranca on IRC:

[19:36:46] jan____: Wohali: those couch per user changes I made weren't sufficient to fix the things btw, hopefully that's not getting merged in its current form. It needs to have some type of singleton thing like replication pids, otherwise you'll end up creating the user database N times which will result in conflicts on the dbs db which is no bueno

@janl


Member

janl commented Oct 6, 2017

More @chewbranca on IRC:

jan____: Wohali: as far as I'm aware, the only place in the code that does anything along the lines of a singleton/leader election type thing is the couch_replicator logic to ensure only one replication per replicator doc is active. Perhaps rnewson/nick have some thoughts on generalizing that logic to a reusable behavior or some such

@janl


Member

janl commented Oct 6, 2017

Bit of help from @nickva

[21:25:20] the logic is mostly isolated to https://github.com/apache/couchdb/blob/master/src/couch_replicator/src/couch_replicator_clustering.erl
[21:27:26] there are 2 main bits to consider : how to distribute the things across the nodes (see the owner functions) and what to do if node configuration changes (nodes added or removed)
[21:29:39] the insight with the first point (how to distribute things) is that it's an algorithm that runs on all the nodes concurrently, and the result on all nodes should all agree who the owner is (given they have the same list of connected nodes)
[21:30:34] so node1 sees the thing and says, "not mine, node2 owns, I'll ignore it", node3 does the same and node2 sees and say "yap I own, I'll take care of it"
[21:38:19] <+Wohali> not just nodes added/removed, also "node fails or is temporarily gone" right?
[21:39:11] yap
[21:40:56] the list of nodes is an input to the algorithm, that list might include nodes in maintenance or might not, or may only include nodes where the shard of a particular db/docid live
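
To make the ownership idea above concrete, here is a rough sketch of such a deterministic owner computation; the function name, arguments, and the use of erlang:phash2/2 are illustrative, not the actual couch_replicator_clustering code.

%% Rough sketch of a deterministic owner computation, assuming every
%% node sees the same list of live nodes. Not the real implementation.
-spec owner(binary(), binary(), [node()]) -> node().
owner(DbName, DocId, LiveNodes) ->
    Sorted = lists:sort(LiveNodes),
    %% same inputs on every node => same index => same owner
    Index = erlang:phash2({DbName, DocId}, length(Sorted)),
    lists:nth(Index + 1, Sorted).

With this shape, node1, node2 and node3 each run the computation locally for the same change and arrive at the same answer, so exactly one of them acts on it (as long as they agree on the node list, which is exactly the caveat about maintenance mode and membership changes above).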

@janl


Member

janl commented Oct 7, 2017

I’ve done a bit of further research on this code and I think I’ve come up with a decent plan forward. I’m writing it down here for my own benefit, to see if I have all the missing parts, and for @chewbranca and @nickva to +1 the plan.

I’ve identified two independent issues with this module; both stem from it being fundamentally designed for CouchDB 1.x, with the 2.x port having been done rather haphazardly in the past (by me).

Problem No. 1: this module gets started on each node in a cluster, each listening to _changes on _users and each trying to create the associated db. This can lead to conflicts on the _dbs db, which isn’t a fun situation as per @chewbranca above.

Problem No. 2: this module doesn’t handle the case of the node it is running on failing and restarting. This is less of an issue while Problem No. 1 is unsolved, as in a typical cluster there are at least two more instances of this module running that could create the database.

But say we fix Problem No. 1 (as outlined above) so that each user creation will only ever result in a single attempt to create the associated database. Then, what should happen if in between picking up the notification to create the database and doing the DB creation, the current node becomes unavailable?

The module currently opens _changes on startup and asserts all user/db creations corresponding to the _users db. That is, if there are 100k users, on startup, the module will try to create 100k databases, even if they already exist.

A solution to this would be to add per-node high-watermark _local/ docs, so we can more efficiently resume, but I’ll keep this as out of scope for this PR.


Solution

The solution to Problem No. 1, as outlined by @nickva, is to re-use the replicator’s code that makes sure a replication is only run on a single node (specifically, on the node that handles the shard that the _replicator document is part of). All of this is neatly encapsulated in the couch_replicator_clustering module.

That module is structured in a way that it can tell, given a DbName and a DocId, which active node in a cluster should handle a given action (replication by default, but we want user creation). All nodes participating will independently come to the same conclusion, ensuring an operation is only handled once.

I propose to keep the behaviour of running the peruser module on each node in the cluster, but to modify its changes handler to make use of couch_replicator_clustering’s ability to decide which node should handle an incoming user creation/deletion, since at that point we have the same input as the replicator code has.

One concern is the use of couch_replicator_clustering outside of the replicator. We should make this a standalone module, but at the moment it depends on some other replicator infrastructure (couch_replicator_notifier) and I’d say it is out of scope to make things more generic. That probably means we must make sure that peruser is loaded after the replicator.
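
A sketch of the gating proposed above, assuming the owner/3 that later moves into mem3 takes (DbName, DocId, Nodes); the record and helper names are stand-ins, not the code in this PR.

%% Sketch of gating the changes handler on ownership. The callback shape
%% mirrors a couch_changes fun/3 handler; #state{} and handle_user_change/2
%% are stand-ins for the real peruser state and logic.
-record(state, {db_name :: binary()}).

changes_handler({change, {Doc}, _Prepend}, _ResType, #state{} = State) ->
    DocId = couch_util:get_value(<<"id">>, Doc),
    case node() =:= mem3:owner(State#state.db_name, DocId, mem3:nodes()) of
        true ->
            %% this node owns the doc: create/delete the per-user db
            handle_user_change(DocId, State);
        false ->
            %% another node owns it: do nothing here
            State
    end;
changes_handler(_Event, _ResType, State) ->
    State.

handle_user_change(_DocId, State) ->
    State.  %% stub: the real code creates/deletes the userdb database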

@janl janl referenced this pull request Oct 7, 2017

Open

make peruser resumable #872

@janl


Member

janl commented Oct 7, 2017

Latest push solves Problem No. 1. Would appreciate a brief review by @chewbranca and/or @nickva

@nickva


Contributor

nickva commented Oct 7, 2017

@janl I think you have the right idea there.

Technically you don't have to use the clustering ownership function; you could make a copy of it, since it's pretty small and we may want to customize it in the future. Even better (!): if you want, we can extract the functions and put them in mem3, which would be the cleanest way to do it:

These two:

https://github.com/apache/couchdb/blob/master/src/couch_replicator/src/couch_replicator_clustering.erl#L198 (would be owner/2 perhaps there).

https://github.com/apache/couchdb/blob/master/src/couch_replicator/src/couch_replicator_clustering.erl#L102 (would be owner/3 then in mem3).

So then you don't depend on the replicator app.

Then another thing you'd do is handle the stable/unstable state locally in your app, just like replicator clustering does: keep a boolean flag and update it as you get stable/unstable events.

https://github.com/apache/couchdb/blob/master/src/couch_replicator/src/couch_replicator_clustering.erl#L124-L136

https://github.com/apache/couchdb/blob/master/src/couch_replicator/src/couch_replicator_clustering.erl#L165-L169

Another reason not to reuse the replicator code is that we might want a different setting for the stability checks for peruser than what the replicator uses. The replicator uses a minute; we might want something less than that.

Here is how that configuration is passed to the stability monitoring gen_server:

https://github.com/apache/couchdb/blob/master/src/couch_replicator/src/couch_replicator_clustering.erl#L143-L149

Note that's a gen_server that gets linked against your gen_server; a separate instance of it would be running in couch_replicator, rexi and your code:

(rexi example as well: https://github.com/apache/couchdb/blob/master/src/rexi/src/rexi_server_mon.erl#L69-L70)

Also @chewbranca gets the credit for the idea of re-using replicator code, that was a good insight and should solve the problem nicely.
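
A sketch of tracking the stable/unstable flag locally, per the suggestion above; how the notifications are delivered (here plain casts named cluster_stable / cluster_unstable) and the restart_listeners/1 helper are assumptions, only the shape of the bookkeeping is the point.

%% Sketch: keep cluster stability as local gen_server state.
-record(state, {cluster_stable = false :: boolean()}).

handle_cast(cluster_unstable, State) ->
    %% membership changed: stop trusting ownership decisions until
    %% the quiet period has passed again
    {noreply, State#state{cluster_stable = false}};
handle_cast(cluster_stable, State) ->
    %% quiet period elapsed: safe to (re)start the _users listeners
    {noreply, restart_listeners(State#state{cluster_stable = true})}.

restart_listeners(State) ->
    State.  %% stub: the real code restarts the per-shard changes followers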

@janl


Member

janl commented Oct 7, 2017

@nickva excellent comments, I’ll get that sorted :)

@janl


Member

janl commented Oct 7, 2017

@nickva I think I found the reason owner/2 lives in couch_replicator_clustering for now: it depends on is_stable(), which depends on the gen_server’s state. I wouldn’t feel comfortable moving that over to mem3.

@janl

Member

janl commented Oct 8, 2017

@nickva latest push has peruser untangled from couch_replicator_clustering and owner/3 moved into mem3.

I occasionally see this in the logs, which is somewhat concerning as we don’t want databases with lax security, but haven’t found the source of this yet.

{{badmatch,false},
 [{couch_peruser,ensure_security,3,[{file,"src/couch_peruser.erl"},{line,248}]},
  {couch_peruser,changes_handler,3,[{file,"src/couch_peruser.erl"},{line,136}]},
  {couch_changes,changes_enumerator,2,[{file,"src/couch_changes.erl"},{line,784}]},
  {couch_btree,stream_kv_node2,8,[{file,"src/couch_btree.erl"},{line,848}]},
  {couch_btree,fold,4,[{file,"src/couch_btree.erl"},{line,222}]},
  {couch_db,changes_since,5,[{file,"src/couch_db.erl"},{line,1400}]},
  {couch_changes,keep_sending_changes,3,[{file,"src/couch_changes.erl"},{line,637}]},
  {couch_changes,'-handle_changes/4-fun-3-',8,[{file,"src/couch_changes.erl"},{line,144}]}]}
@janl


Member

janl commented Oct 8, 2017

The one additional thing I want to do before getting this merged is handling the case where there are multiple shards of the _users db on the same node. That results in two or more _changes followers for those shards initiating the db create/delete sequence concurrently.

My idea so far is to use a trick similar to the consistent node hashing: hash the change notification (sans shard name) against the listener pids. That should ensure that only one create/delete sequence is initiated.

This also might help with the ensure_security issue above.
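
Although the next comment shows this excursion turned out to be unnecessary, the trick described is roughly the following; all names and the exact notification shape are illustrative.

%% Sketch of "only one local shard listener wins": every listener for a
%% local _users shard runs the same computation and only the winner
%% initiates the create/delete sequence.
should_handle(DocId, ListenerPids) ->
    Sorted = lists:sort(ListenerPids),
    %% hash only the doc id (i.e. the notification sans shard name),
    %% so all listeners pick the same winner for the same change
    Winner = lists:nth(erlang:phash2(DocId, length(Sorted)) + 1, Sorted),
    Winner =:= self().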

@janl


Member

janl commented Oct 8, 2017

Okay, that was a nice excursion into implementing singletons, but it turns out the issue was different after all: there was a duplicate call to start_listening() in there, which I’ve eliminated in the next push. Haven’t seen the security issue since, so I’m assuming that’s fixed 😇.

@janl


Member

janl commented Oct 8, 2017

Tests pass and this is rebased on master; if I can get a +1 here, I’ll squash & merge.

@wohali

Member

wohali commented Oct 8, 2017

One test failure on one platform in Travis. Normally I'd say "re-run it" but this is a brand new type of failure I've never seen before in mem3...

mem3_sync_security_test: go_test (module 'mem3_sync_security_test')...*timed out*
@janl


Member

janl commented Oct 8, 2017

Good call @wohali. I touched code in mem3, so we must make sure I didn't break anything.

@nickva


Contributor

nickva commented Oct 9, 2017

@janl Good work!

A few comments here and there, mostly cleanup and such. I'll add them one by one below.

@nickva


Contributor

nickva commented Oct 9, 2017

Update the README file since it's not a daemon anymore. If you want, add a bit more about how it works: monitor _users, create the db, but only on one node; if the cluster configuration changes, this might happen... that kind of stuff.

    DbName = ?l2b(config:get(
        "couch_httpd_auth", "authentication_db", "_users")),
    DeleteDbs = config:get_boolean("couch_peruser", "delete_dbs", false),
    ClusterState = #clusterState{
    % set up cluster-stable listener
    Period = abs(config:get_integer("couch_peruser", "cluster_quiet_period",


@nickva

nickva Oct 9, 2017

Contributor

Maybe add these to default.ini (commented out if you wish) to make sure they are documented and visible there.
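
Based on the config keys visible in the diff above, the entries in question would look roughly like this in default.ini; the default values shown are illustrative, not quoted from the merged file.

; [couch_peruser] entries, commented out, as suggested above.
[couch_peruser]
; If enabled, couch_peruser ensures that a private per-user database
; exists for each document in _users.
;enable = false
; If set to true and a user is deleted, the respective per-user
; database gets deleted as well.
;delete_dbs = false
; How long the cluster must be quiet before peruser (re)starts its
; _users changes listeners.
;cluster_quiet_period = 60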

@wohali wohali merged commit d71ce9f into master Oct 11, 2017

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
continuous-integration/travis-ci/push The Travis CI build passed

@wohali wohali deleted the 749-fix-couch_peruser-app-structure branch Oct 11, 2017

wohali added a commit that referenced this pull request Oct 19, 2017

Make couch_peruser a proper Erlang app (#756)
* Make couch_peruser a proper Erlang app
* Start and stop couch_peruser in the test suite
* feat: mango test runner: do not rely on timeout for CouchDB start alone

On slow build nodes, 10 seconds might not be enough of a wait.

* Ensure a user creation is handled on one node only

This patch makes use of the mechanism that ensures that replications
are only run on one node.

When the cluster has nodes added/removed all changes listeners are
restarted.

* track cluster state in gen_server state and get notified from mem3 directly

* move couch_replicator_clustering:owner/3 to mem3.erl

* remove reliance on couch_replicator_clustering, handle cluster state internally

* make sure peruser listeners are only initialised once per node

* add type specs

* fix tests

* simplify couch_peruser.app definition

* add registered modules

* remove leftover code from old notification system

* s/clusterState/state/ && s/state/changes_state/

* s,init/0,init_state/0,

* move function declaration around for internal consistency

* whitespace

* update README

* document ini entries

* unlink changes listeners before exiting them so we survive

* fix state call

* fix style

* fix state

* whitespace and more state fixes

* 80 cols

Closes #749

willholley added a commit to willholley/couchdb that referenced this pull request May 22, 2018

Make couch_peruser a proper Erlang app (apache#756)