
New couchup 1.x -> 2.x database migration tool #483

Merged 1 commit on Apr 25, 2017

Conversation

@wohali (Member) commented Apr 19, 2017

This commit adds a new Python-based database migration tool, couchup.
It is intended to be used at the command-line on the server being
upgraded, before bringing the node (or cluster) into service.

couchup provides 4 sub-commands to assist in the migration process:

  • list - lists all CouchDB 1.x databases
  • replicate - replicates one or more 1.x databases to CouchDB 2.x
  • rebuild - rebuilds one or more CouchDB 2.x views
  • delete - deletes one or more CouchDB 1.x databases

A typical workflow for a single-node upgrade process would look like:

$ couchup list
$ couchup replicate -a
$ couchup rebuild -a
$ couchup delete -a

A clustered upgrade process would be the same, but must be preceded by
setting up all the nodes in the cluster first.

Various optional arguments provide for admin login/password, overriding
ports, quiet mode and so on.

Of special note is that couchup rebuild supports an optional flag,
-f, to filter deleted documents during the replication process.

I struggled some with the naming convention. For those in the know, a
'1.x database' is a node-local database appearing only on port 5986, and
a '2.x database' is a clustered database appearing on port 5984, and in
raw, sharded form on port 5986.

Testing recommendations

  1. Copy a bunch of CouchDB 1.x .couch files into the CouchDB 2.x data/ directory.
  2. Start CouchDB 2.x.
  3. Use the 4 couchup commands to list, replicate, rebuild and delete the databases as desired.
  4. Test your application against the migrated data.
  5. Report any bugs on this PR please 👍

@wohali wohali requested a review from janl April 20, 2017 05:10
@janl (Member) left a comment

Very well done Joan, flexible, but to the point and easy to follow.

I only have a few nits and wiggles. I’m confident in merging this, so +1.

I just haven’t yet had the chance to actually run this locally.

except ImportError:
    HAVE_BAR = False

def _tojson(req):
janl (Member):

📃 a comment about what this does would be nice, it looks like a compatibility feature?

wohali (Member Author):

Yup, this is how I support python-requests 0.x. Will do.


def _do_list(args):
    port = str(args['local_port'])
    req = requests.get('http://127.0.0.1:' + port + '/_all_dbs',
janl (Member):

🤔 Might be useful to make the IP/host configurable like the port, but I'm happy to keep this open as a feature request; it doesn't block the merge. Might be a good first-timer issue.

Also applies to all other instances of 127.0.0.1, not marking them up explicitly here.

wohali (Member Author):

I presumed this might be one of the first requests, and agree that it'd be a good thing for a first timer to tackle, though it's a trivial refactor.
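The refactor discussed above could be sketched like this (flag names and defaults here are assumptions, not couchup's final interface): promote the hardcoded `127.0.0.1` to a `--host` option, mirroring how the ports are already configurable.

```python
import argparse

# Hypothetical sketch of a --host option replacing the hardcoded IP.
parser = argparse.ArgumentParser(prog='couchup')
parser.add_argument('--host', default='127.0.0.1',
                    help='IP or hostname of the CouchDB server')
parser.add_argument('--local-port', type=int, default=5986,
                    help='node-local (1.x) port')

args = parser.parse_args(['--host', 'couch.example.com'])
url = 'http://{}:{}/_all_dbs'.format(args.host, args.local_port)
```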

    ))
    if not args['include_system_dbs']:
        local_dbs = [x for x in local_dbs if x[0] != '_']
        clustered_dbs = [x for x in clustered_dbs if x[0] != '_']
janl (Member):

📃 somewhat terse code, comments would be cool, no blocker.
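A commented version of the filter above, with illustrative data: CouchDB system databases such as `_users` and `_replicator` begin with an underscore, so any name whose first character is `'_'` is dropped unless `--include-system-dbs` was given.

```python
# Illustrative input; real values come from _all_dbs.
local_dbs = ['mydb', '_users', 'accounts', '_replicator']

# Hide system databases: their names start with an underscore.
local_dbs = [x for x in local_dbs if x[0] != '_']
```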

local_port=5986,
clustered_port=5984,
creds=None,
no_progress_bar=False,
janl (Member):

🎨 style: name boolean flags in their True state, so as to avoid brain-twisting double negatives like the one here. Not a blocker.

wohali (Member Author):

This is because of how argparse works; it's annoying to get it to handle `--flag={true,false}`, but very easy to get it to handle `--flag` (and store either True or False accordingly). I want the progress bar to default to on, so the CLI flag needs to turn it off.

janl (Member):

What about hide_progress_bar instead?

wohali (Member Author):

Sure, ok.
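The argparse pattern described above, with the flag renamed per the review, might look like this sketch: a bare `store_true` switch is trivial to define, so a feature that defaults to on is disabled via a negative flag.

```python
import argparse

# store_true stores False by default and True when the flag is given,
# which is why the default-on progress bar gets a "hide" flag.
parser = argparse.ArgumentParser(prog='couchup')
parser.add_argument('--hide-progress-bar', action='store_true',
                    help='do not display a progress bar')

args = parser.parse_args([])
show_bar = not args.hide_progress_bar   # default: bar is shown
```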

        raise Exception('Cannot retrieve {} doc_count!'.format(db))
    if local_size == 0:
        return
    if HAVE_BAR and not no_progress_bar and not quiet:
janl (Member):

🎨 as per above, this is already very nice, but would be even nicer as:

if HAVE_BAR and not progress_bar and not quiet:

    req = requests.get(url, auth=creds)
    req.raise_for_status()
    req = _tojson(req)
    local_docs = req['doc_count']
janl (Member):

🎨 tripped me up; I think you mean the number of docs in the source db, not actual CouchDB `_local/` docs, right? Not a blocker.

wohali (Member Author):

Good point, will change. I meant "node-local" docs. Could just make this more generic with source/target rather than local_/clus_.

janl (Member):

gotcha

clus_size = local_size
progbar.update(clus_size)
count = clus_count
time.sleep(1)
janl (Member):

🤔 sleep value is a good candidate for an option. Not a blocker.

wohali (Member Author):

Disagree... this is only used for the `--timeout=XX` option, and needs to stay at 1s granularity; otherwise the timeout would have to be forced to be a multiple of the per-loop value.

janl (Member):

aye.

    del doc['_rev']
    if doc != ddoc:
        if not args['quiet']:
            print('Source replication filter does not match! Aborting.')
janl (Member):

👍 Good catch.

    if doc != ddoc:
        if not args['quiet']:
            print('Source replication filter does not match! Aborting.')
        exit(1)
janl (Member):

🎨 I’m partial to returning different exit codes depending on the condition, and then using named constants to refer to them. E.g. here it could be `exit(E_DDOC_MISMATCH)`, with `E_DDOC_MISMATCH` set to 1 further up, and then line 167 gets another, e.g. `E_DDOC_PUT_ERROR` set to 2. And similar for all the other lines with `exit(1)` that I’m not referencing here individually.

Absolutely not a blocker.

wohali (Member Author):

I did consider this, yeah. Initially I was raising unique Exceptions but switched over to this to make it easier for shell scripting. Going on memory I think the only function that has multiple exit()s is this one, so this might be the only place that needs more exit codes.

        help='show clustered (2.x) databases instead')
    parser_list.add_argument('-i', '--include-system-dbs',
        action='store_true',
        help='include system databases (_users, _replicator, etc.)')
janl (Member):

🤔 Might be worth considering doing separate options for _users and _replicator (the only system dbs in 1.x), I might wanna do _users but not _replicator.

Not a blocker, but would raise as open issue after merge. Would make a good first time contributor issue.

wohali (Member Author):

🤔 Not sure I agree, but maybe. Line 352 is just for couchup list which probably doesn't need explicit flags for those 2 DBs.

I suspect you mean line 387 or line 427 where action can be taken on them instead. Right now you'd do that with couchup {replicate|rebuild} _users, the only time -i makes sense is in couchup -a -i. Is it worth adding --users and --replicator flags when couchup replicate -a && couchup replicate _users would suffice? I'm unconvinced, but not super opinionated.

janl (Member):

ah sorry, yes, ignore me for now :)

@wohali (Member Author) commented Apr 21, 2017

Thanks @janl. I'll correct a few of the more straightforward tweaks today.

I'd like to get feedback from at least a couple of people trying the script (other than me!) before merging, but if no one responds on user@ or dev@ by middle of next week, I'll merge anyway.

@wohali wohali merged commit 1111e60 into master Apr 25, 2017
@wohali wohali deleted the feat-couchup branch April 25, 2017 21:08
@jvabob commented Aug 14, 2017

Trying to use the replicate function to migrate my 686GB collection of databases...

The first 3-4 databases (about 5GB) worked great, but now I get something like this when I try it. I first hit it using the -a feature, and am now trying individual dbs to see if I can find a pattern:

/usr/local/couchdb/bin/couchup replicate -l USER -p PASSWORD -t 3600 -f eridu
Starting replication for eridu...
{"error":"not_found","reason":"Could not open source database http://127.0.0.1:5986/eridu/: {db_not_found,<<"http://127.0.0.1:5986/eridu/\">>}"}

if I curl http://127.0.0.1:5986/eridu/ ( and add the user and pass ) I get

{"db_name":"eridu","doc_count":7438871,"doc_del_count":511625,"update_seq":8168757,"purge_seq":0,"compact_running":false,"disk_size":81804304630,"other":{"data_size":109},"data_size":12273332816,"sizes":{"file":81804304630,"active":12273332816,"external":109},"instance_start_time":"1502734780758722","disk_format_version":6,"committed_update_seq":8168757,"compacted_seq":0,"uuid":"4a84be53b812cac9b71915eab09f3be9"}

So the db sure seems to be there... I can go to 5986/_utils and see the db... can view data etc

Any pointers?

@wohali (Member Author) commented Aug 14, 2017

Please file this as a new issue so we can properly follow this request, not as a comment on the pull request.

Please be sure to paste the exact output. If possible, please wrap your paste in triple-backticks, like this:

    ```
    /usr/local/couchdb/bin/couchup replicate...
    ```

@jvabob commented Aug 14, 2017 via email

nickva pushed a commit to nickva/couchdb that referenced this pull request Sep 7, 2022
Fix typo in `_scheduler/docs`