
New couchup 1.x -> 2.x database migration tool #483

Merged 1 commit on Apr 25, 2017

Conversation

@wohali (Member) commented Apr 19, 2017

This commit adds a new Python-based database migration tool, couchup.
It is intended to be used at the command-line on the server being
upgraded, before bringing the node (or cluster) into service.

couchup provides 4 sub-commands to assist in the migration process:

  • list - lists all CouchDB 1.x databases
  • replicate - replicates one or more 1.x databases to CouchDB 2.x
  • rebuild - rebuilds one or more CouchDB 2.x views
  • delete - deletes one or more CouchDB 1.x databases

A typical workflow for a single-node upgrade process would look like:

$ couchup list
$ couchup replicate -a
$ couchup rebuild -a
$ couchup delete -a

A clustered upgrade process would be the same, but must be preceded by
setting up all the nodes in the cluster first.

Various optional arguments provide for admin login/password, overriding
ports, quiet mode and so on.

Of special note is that couchup rebuild supports an optional flag,
-f, to filter deleted documents during the replication process.

I struggled some with the naming convention. For those in the know, a
'1.x database' is a node-local database appearing only on port 5986, and
a '2.x database' is a clustered database appearing on port 5984, and in
raw, sharded form on port 5986.

Testing recommendations

  1. Copy a bunch of CouchDB 1.x .couch files into the CouchDB 2.x data/ directory.
  2. Start CouchDB 2.x.
  3. Use the 4 couchup commands to list, replicate, rebuild and delete the databases as desired.
  4. Test your application against the migrated data.
  5. Report any bugs on this PR please 👍

@wohali wohali requested a review from janl April 20, 2017 05:10
@janl (Member) left a comment

Very well done Joan, flexible, but to the point and easy to follow.

I only have a few nits and wiggles. I’m confident in merging this, so +1.

I just haven’t yet had the chance to actually run this locally.

except ImportError:
    HAVE_BAR = False

def _tojson(req):
janl (Member):

📃 a comment about what this does would be nice, it looks like a compatibility feature?

wohali (Member Author):

Yup, this is how I support python-requests 0.x. Will do.


def _do_list(args):
    port = str(args['local_port'])
    req = requests.get('http://127.0.0.1:' + port + '/_all_dbs',
janl (Member):

🤔 Might be useful to make the IP/host configurable like the port, but I'm happy to keep this open as a feature request; it doesn't block the merge. Might be a good first-timer issue.

Also applies to all other instances of 127.0.0.1, not marking them up explicitly here.

wohali (Member Author):

I presumed this might be one of the first requests, and agree that it'd be a good thing for a first timer to tackle, though it's a trivial refactor.
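The refactor discussed above could be sketched like this (flag names and defaults here are assumptions, not couchup's final interface): promote the hardcoded `127.0.0.1` to a `--host` option, mirroring how the ports are already configurable.

```python
import argparse

# Hypothetical sketch of a --host option replacing the hardcoded IP.
parser = argparse.ArgumentParser(prog='couchup')
parser.add_argument('--host', default='127.0.0.1',
                    help='IP or hostname of the CouchDB server')
parser.add_argument('--local-port', type=int, default=5986,
                    help='node-local (1.x) port')

args = parser.parse_args(['--host', 'couch.example.com'])
url = 'http://{}:{}/_all_dbs'.format(args.host, args.local_port)
```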

    ))
    if not args['include_system_dbs']:
        local_dbs = [x for x in local_dbs if x[0] != '_']
        clustered_dbs = [x for x in clustered_dbs if x[0] != '_']
janl (Member):

📃 somewhat terse code, comments would be cool, no blocker.
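A commented version of the filter above, with illustrative data: CouchDB system databases such as `_users` and `_replicator` begin with an underscore, so any name whose first character is `'_'` is dropped unless `--include-system-dbs` was given.

```python
# Illustrative input; real values come from _all_dbs.
local_dbs = ['mydb', '_users', 'accounts', '_replicator']

# Hide system databases: their names start with an underscore.
local_dbs = [x for x in local_dbs if x[0] != '_']
```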

local_port=5986,
clustered_port=5984,
creds=None,
no_progress_bar=False,
janl (Member):

🎨 style: name boolean flags in their True state, so as to avoid brain-twisting double negatives like the one here. Not a blocker.

wohali (Member Author):

This is because of how argparse works; it's annoying to get it to handle `--flag={true,false}`, but very easy to get it to handle `--flag` (and store either True or False accordingly). I want the progress bar to default to on, so the CLI flag needs to turn it off.

janl (Member):

What about hide_progress_bar instead?

wohali (Member Author):

Sure, ok.
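The argparse pattern described above, with the flag renamed per the review, might look like this sketch: a bare `store_true` switch is trivial to define, so a feature that defaults to on is disabled via a negative flag.

```python
import argparse

# store_true stores False by default and True when the flag is given,
# which is why the default-on progress bar gets a "hide" flag.
parser = argparse.ArgumentParser(prog='couchup')
parser.add_argument('--hide-progress-bar', action='store_true',
                    help='do not display a progress bar')

args = parser.parse_args([])
show_bar = not args.hide_progress_bar   # default: bar is shown
```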

        raise Exception('Cannot retrieve {} doc_count!'.format(db))
    if local_size == 0:
        return
    if HAVE_BAR and not no_progress_bar and not quiet:
janl (Member):

🎨 as per above, this is already very nice, but would be even nicer as:

if HAVE_BAR and not progress_bar and not quiet:

    req = requests.get(url, auth=creds)
    req.raise_for_status()
    req = _tojson(req)
    local_docs = req['doc_count']
janl (Member):

🎨 tripped me up; I think you mean the number of docs in the source db, not actual CouchDB `_local/` docs, right? Not a blocker.

wohali (Member Author):

Good point, will change. I meant "node-local" docs. Could just make this more generic with source/target rather than local_/clus_.

janl (Member):

gotcha

clus_size = local_size
progbar.update(clus_size)
count = clus_count
time.sleep(1)
janl (Member):

🤔 sleep value is a good candidate for an option. Not a blocker.

wohali (Member Author):

Disagree... this is only used for the `--timeout=XX` option, and needs to stay at 1s granularity; otherwise the timeout would have to be forced to be a multiple of the per-loop value.

janl (Member):

aye.

    del doc['_rev']
    if doc != ddoc:
        if not args['quiet']:
            print('Source replication filter does not match! Aborting.')
janl (Member):

👍 Good catch.

    if doc != ddoc:
        if not args['quiet']:
            print('Source replication filter does not match! Aborting.')
        exit(1)
janl (Member):

🎨 I’m partial to returning different exit codes depending on the condition, and then using named constants to refer to them. E.g. here it could be `exit(E_DDOC_MISMATCH)`, with `E_DDOC_MISMATCH` set to 1 further up, and then line 167 gets another, e.g. `E_DDOC_PUT_ERROR` set to 2. And similar for all the other lines with `exit(1)` that I’m not referencing here individually.

Absolutely not a blocker.

wohali (Member Author):

I did consider this, yeah. Initially I was raising unique Exceptions but switched over to this to make it easier for shell scripting. Going on memory I think the only function that has multiple exit()s is this one, so this might be the only place that needs more exit codes.

        help='show clustered (2.x) databases instead')
    parser_list.add_argument('-i', '--include-system-dbs',
        action='store_true',
        help='include system databases (_users, _replicator, etc.)')
janl (Member):

🤔 Might be worth considering doing separate options for _users and _replicator (the only system dbs in 1.x), I might wanna do _users but not _replicator.

Not a blocker, but would raise as open issue after merge. Would make a good first time contributor issue.

wohali (Member Author):

🤔 Not sure I agree, but maybe. Line 352 is just for couchup list which probably doesn't need explicit flags for those 2 DBs.

I suspect you mean line 387 or line 427 where action can be taken on them instead. Right now you'd do that with couchup {replicate|rebuild} _users, the only time -i makes sense is in couchup -a -i. Is it worth adding --users and --replicator flags when couchup replicate -a && couchup replicate _users would suffice? I'm unconvinced, but not super opinionated.

janl (Member):

ah sorry, yes, ignore me for now :)

@wohali (Member Author) commented Apr 21, 2017

Thanks @janl. I'll correct a few of the more straightforward tweaks today.

I'd like to get feedback from at least a couple of people trying the script (other than me!) before merging, but if no one responds on user@ or dev@ by middle of next week, I'll merge anyway.

@wohali wohali merged commit 1111e60 into master Apr 25, 2017
@wohali wohali deleted the feat-couchup branch April 25, 2017 21:08
@jvabob commented Aug 14, 2017

Trying to use the replicate function to migrate my 686GB collection of databases...

The first 3-4 databases (about 5GB) worked great, but now I get something like this when I try it. I first hit it using the -a feature, and am now trying individual dbs to see if I can find a pattern:

/usr/local/couchdb/bin/couchup replicate -l USER -p PASSWORD -t 3600 -f eridu
Starting replication for eridu...
{"error":"not_found","reason":"Could not open source database http://127.0.0.1:5986/eridu/: {db_not_found,<<"http://127.0.0.1:5986/eridu/\">>}"}

if I curl http://127.0.0.1:5986/eridu/ ( and add the user and pass ) I get

{"db_name":"eridu","doc_count":7438871,"doc_del_count":511625,"update_seq":8168757,"purge_seq":0,"compact_running":false,"disk_size":81804304630,"other":{"data_size":109},"data_size":12273332816,"sizes":{"file":81804304630,"active":12273332816,"external":109},"instance_start_time":"1502734780758722","disk_format_version":6,"committed_update_seq":8168757,"compacted_seq":0,"uuid":"4a84be53b812cac9b71915eab09f3be9"}

So the db sure seems to be there... I can go to 5986/_utils and see the db... can view data etc

Any pointers?

@wohali (Member Author) commented Aug 14, 2017

Please file this as a new issue so we can properly follow this request, not as a comment on the pull request.

Please be sure to paste the exact output. If possible, please wrap your paste in triple-backticks, like this:

    ```
    /usr/local/couchdb/bin/couchup replicate...
    ```

@jvabob commented Aug 14, 2017 via email

nickva pushed a commit to nickva/couchdb that referenced this pull request Sep 7, 2022
Fix typo in `_scheduler/docs`