Enable cluster auto-assembly through a seedlist #1658

Merged
merged 9 commits into master from mem3-seedlist on Nov 11, 2018

Conversation

3 participants
@kocolosk
Member

kocolosk commented Oct 16, 2018

Overview

This introduces a new config setting which allows an administrator to configure an initial list of nodes that should be contacted when a node boots up:

[cluster]
seedlist = couchdb@node1.example.com,couchdb@node2.example.com,couchdb@node3.example.com

If configured, CouchDB will add every node in the seedlist to the _nodes DB automatically, which will trigger a distributed Erlang connection and a replication of the internal system databases to the local node. This eliminates the need to explicitly add each node using the HTTP API.
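For orientation, the seedlist value is just a comma-separated list of Erlang node names. A sketch of how such a value can be parsed (in Python, purely as illustration; the actual implementation is Erlang, and `parse_seedlist` is a hypothetical helper, not CouchDB code):

```python
def parse_seedlist(value):
    """Split a comma-separated seedlist value into node names.

    Mirrors the [cluster] seedlist format shown above; whitespace
    around entries is tolerated and empty entries are dropped.
    """
    return [node.strip() for node in value.split(",") if node.strip()]


seeds = parse_seedlist(
    "couchdb@node1.example.com,couchdb@node2.example.com,couchdb@node3.example.com"
)
# seeds == ["couchdb@node1.example.com",
#           "couchdb@node2.example.com",
#           "couchdb@node3.example.com"]
```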

We also modify the /_up endpoint to reflect the progress of the initial seeding of the node. If a seedlist is configured the endpoint will return 404 until the local node has updated its local replica of each of the system databases from one of the members of the seedlist. The body of the HTTP response now looks like

{
  "status": "seeding",
  "seeds": {
    "couchdb@node1.example.com": {
      "timestamp": "2018-10-16T19:58:03+00:00",
      "last_replication_status": "ok",
      "pending_updates": {"_nodes": 0, "_dbs": 101, "_users": 42}
    },
    "couchdb@node2.example.com": { ...
}

Once the status flips to "ok" the endpoint will return 200 and it's safe to direct requests to the new node.
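For deployment automation, the response can be interpreted along these lines (a minimal Python sketch; the field names follow the example body above, and `is_ready` is a hypothetical helper, not part of CouchDB):

```python
import json


def is_ready(status_code, body):
    """Decide whether a node is safe to route traffic to,
    based on the /_up status code and JSON body described above."""
    if status_code != 200:
        return False
    return json.loads(body).get("status") == "ok"


seeding_body = json.dumps({
    "status": "seeding",
    "seeds": {
        "couchdb@node1.example.com": {
            "timestamp": "2018-10-16T19:58:03+00:00",
            "last_replication_status": "ok",
            "pending_updates": {"_nodes": 0, "_dbs": 101, "_users": 42},
        }
    },
})

assert not is_ready(404, seeding_body)          # still seeding
assert is_ready(200, json.dumps({"status": "ok"}))  # safe to route
```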

Testing recommendations

  • Configure the seedlist for a new 3 node cluster with the names of the 3 nodes and check /_membership to confirm that the nodes connect to each other automatically
  • On a cluster with lots of databases or users, add a node to the cluster and check that /_up returns 404 while the initial internal replication takes place.
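The second check lends itself to scripting. A sketch of a wait-until-ready loop (Python, for illustration; `fetch_status` is an injected callable that would wrap an HTTP GET against /_up and return the status code):

```python
import time


def wait_until_ready(fetch_status, timeout=300, interval=5):
    """Poll /_up (via the supplied fetch_status callable, which
    returns an HTTP status code) until it reports 200, or raise
    TimeoutError if the node does not finish seeding in time."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if fetch_status() == 200:
            return True
        time.sleep(interval)
    raise TimeoutError("node did not finish seeding in time")


# Example with a fake fetcher that flips from 404 (seeding) to 200 (ok):
codes = iter([404, 404, 200])
assert wait_until_ready(lambda: next(codes), timeout=10, interval=0)
```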

You'll notice that the PR currently has no tests. I wanted to put it up for review while I familiarize myself with the latest bits of the test suite and see what I can contribute.

Checklist

  • Code is written and works correctly;
  • Changes are covered by tests;
  • Documentation reflects the changes;

kocolosk added some commits Jun 28, 2018

Enable cluster auto-assembly through a seedlist
This introduces a new config setting which allows an administrator to
configure an initial list of nodes that should be contacted when a node
boots up:

[cluster]
seedlist = couchdb@node1.example.com,couchdb@node2.example.com,couchdb@node3.example.com

If configured, CouchDB will add every node in the seedlist to the _nodes
DB automatically, which will trigger a distributed Erlang connection and
a replication of the internal system databases to the local node. This
eliminates the need to explicitly add each node using the HTTP API.
Pull local system DBs from seed on startup
This patch adds a new gen_server whose only job is to download the
system DBs (_nodes, _dbs, _users) from the nodes in the seedlist, and
then set a flag once it has downloaded a complete copy. Once the flag
is set we can confidently allow the node to handle HTTP requests.
Add some unit tests for seedlist configuration
Missing from this test suite is anything that actually triggers an
internal replication between nodes in a cluster, because I don't know
how to do that (or if it is even possible).

@asfgit asfgit force-pushed the mem3-seedlist branch from b85a9e2 to 6f35073 Oct 17, 2018

@kocolosk kocolosk referenced this pull request Oct 17, 2018

Merged

Document the new seedlist config setting #339

% "Pull" is a bit of a misnomer here, as what we're actually doing is
% issuing an RPC request and telling the remote node to push updates to
% us. This lets us reuse all of the battle-tested machinery of mem3_rpc.
pull_from_seed(Seed) ->


@nickva

nickva Oct 25, 2018

Contributor

This seems useful in general. Maybe rename this to pull_replication so it matches the pull_replication_rpc local callback?


@kocolosk

kocolosk Oct 25, 2018

Member

Good idea, will do.

gen_server:call(?MODULE, get_status).
init([]) ->
Seeds = get_seeds(),


@nickva

nickva Oct 25, 2018

Contributor

Is there any use case where a seed would be added to the config after the node is started? In that case get_seeds() would need to be called every time before start_replication(Seeds) is called.


@kocolosk

kocolosk Oct 25, 2018

Member

I don’t expect that use case as the whole seed list feature is really built to make the node initialization process more robust. Would adding a seed cause _up to flip back to 404 if no seed had previously been contacted? Lots of weird stuff there.

init([]) ->
Seeds = get_seeds(),
timer:send_interval(?REPLICATION_INTERVAL, start_replication),


@nickva

nickva Oct 25, 2018

Contributor

This will send start_replication forever, every minute. What is the idea behind it? Something like, "once we have a seed list, we'll try to continuously replicate dbs from the seed list to this node"? Once we do it one time, wouldn't mem3_sync take care of this afterwards? Or is this just to handle retries if there are failures?


@kocolosk

kocolosk Oct 25, 2018

Member

Hmmm ... I honestly don't remember. I probably added it as a safeguard against failures, with the intention to cancel the timer once we hit a "ready" status, but never did. I can add that.

@wohali


Member

wohali commented Nov 7, 2018

@kocolosk any progress on the test suite? Also, is there any need for #1337 to land for this to function?

@kocolosk


Member

kocolosk commented Nov 7, 2018

Yes, 6f35073 added some tests. Could do more but I think I'd need to figure out how to mock internal replication.

No, #1337 is an independent thought. Both can coexist. Of the two I would personally consider this one to be a higher priority for production automation and operations.

@wohali


Member

wohali commented Nov 7, 2018

@kocolosk thanks. Do you think you could submit a documentation PR for this new feature?

Also, you must update default.ini and/or local.ini under etc/rel/overlay before this PR lands. ;)

@kocolosk


Member

kocolosk commented Nov 7, 2018

I already took care of the documentation PR -- submitted and approved: apache/couchdb-documentation#339

I will update the default config and address Nick's other comment, hopefully tomorrow.

kocolosk added some commits Nov 9, 2018

Remove superfluous start_replication messages
This is a holdover from an initial prototype; the current version is
already equipped to run start_replication only as often as necessary to
get the node into a ready state.
@nickva

nickva approved these changes Nov 9, 2018

Looks good. Nice work!

I tested by starting a disconnected cluster:

./dev/run --admin=adm:pass --no-join

Created some dbs on node1. Then stopped it, edited the seedlist with node1 as the only seed, and restarted the disconnected cluster.

Cluster connected as expected and _dbs was synchronized.

http http://adm:pass@localhost:15984/_dbs
http http://adm:pass@localhost:25984/_dbs
http http://adm:pass@localhost:35984/_dbs

All show:

"update_seq": "5196-g2wAAAABaANkAA9ub2RlMUAxMjcuMC4wLjFsAAAAAmEAbgQA_____2piAAAUTGo"

While seeding response on node2 and node3 was 404:

HTTP/1.1 404 Object Not Found
Cache-Control: must-revalidate
Content-Length: 127
Content-Type: application/json
Date: Fri, 09 Nov 2018 22:56:26 GMT
Server: CouchDB/2.2.0-6f3507303 (Erlang OTP/20)
X-Couch-Request-ID: 74f82375be
X-CouchDB-Body-Time: 0

{
    "seeds": {
        "node1@127.0.0.1": {
            "last_replication_status": "error",
            "timestamp": "2018-11-09T22:56:07.608284Z"
        }
    },
    "status": "seeding"
}

It did actually show an error. I think I might have seen a rexi_DOWN in the log, possibly related to it.

But it did finish correctly and _up started showing:

{
    "seeds": {
        "node1@127.0.0.1": {
            "last_replication_status": "ok",
            "pending_updates": {
                "_dbs": 0,
                "_nodes": 0,
                "_users": 0
            },
            "timestamp": "2018-11-09T22:56:28.126795Z"
        }
    },
    "status": "ok"
}

kocolosk added some commits Nov 10, 2018

@kocolosk kocolosk merged commit 0302e9d into master Nov 11, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed

@kocolosk kocolosk deleted the mem3-seedlist branch Nov 11, 2018
