-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New couchup 1.x -> 2.x database migration tool #483
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very well done Joan, flexible, but to the point and easy to follow.
I only have a few nits and wiggles. I’m confident in merging this, so +1.
I just haven’t yet had the chance to actually run this locally.
except ImportError: | ||
HAVE_BAR = False | ||
|
||
def _tojson(req): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
📃 a comment about what this does would be nice, it looks like a compatibility feature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, this is how I support python-requests 0.x. Will do.
|
||
def _do_list(args): | ||
port = str(args['local_port']) | ||
req = requests.get('http://127.0.0.1:' + port + '/_all_dbs', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔Might be useful to make the ip/host configurable like with port
, but happy to keep this open as a feature request, it doesn’t block the merge. Might be a good first timer issue.
Also applies to all other instances of 127.0.0.1
, not marking them up explicitly here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I presumed this might be one of the first requests, and agree that it'd be a good thing for a first timer to tackle, though it's a trivial refactor.
rel/overlay/bin/couchup
Outdated
)) | ||
if not args['include_system_dbs']: | ||
local_dbs = [x for x in local_dbs if x[0] != '_'] | ||
clustered_dbs = [x for x in clustered_dbs if x[0] != '_'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
📃 somewhat terse code, comments would be cool, no blocker.
rel/overlay/bin/couchup
Outdated
local_port=5986, | ||
clustered_port=5984, | ||
creds=None, | ||
no_progress_bar=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎨 style: boolean flag names in True
state, as to avoid brain-wicking double negatives, as here. Not a blocker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because of how argparse works; it's annoying to get it to handle "--flag={true,false}" but very easy to get it to handle "--flag" (and store either True or False depending) I want the default for the progress bar to be on, so the CLI flag needs to turn it off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about hide_progress_bar
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, ok.
rel/overlay/bin/couchup
Outdated
raise Exception('Cannot retrieve {} doc_count!'.format(db)) | ||
if local_size == 0: | ||
return | ||
if HAVE_BAR and not no_progress_bar and not quiet: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎨 as per above, this is already very nice, but would be even nicer as:
if HAVE_BAR and not progress_bar and not quiet:
rel/overlay/bin/couchup
Outdated
req = requests.get(url, auth=creds) | ||
req.raise_for_status() | ||
req = _tojson(req) | ||
local_docs = req['doc_count'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎨 tripped me up, I think you mean number docs in the source db, not actual CouchDB-_local/
docs, right? Not a blocker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, will change. I meant "node-local" docs. Could just make this more generic with source/target rather than local_/clus_.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gotcha
clus_size = local_size | ||
progbar.update(clus_size) | ||
count = clus_count | ||
time.sleep(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 sleep value is a good candidate for an option. Not a blocker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Disagree...this is only used for the --timeout=XX option, and needs to stay at 1s granularity otherwise the timeout needs to be forced to be a multiple of the per-loop value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aye.
del doc['_rev'] | ||
if doc != ddoc: | ||
if not args['quiet']: | ||
print('Source replication filter does not match! Aborting.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Good catch.
if doc != ddoc: | ||
if not args['quiet']: | ||
print('Source replication filter does not match! Aborting.') | ||
exit(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎨 I’m impartial to returning different exit codes depending on the condition, and then use named constants to refer to the different exit codes. E.g. here it could be exit(E_DDOC_MISMATCH)
that is set to 1
further up and then line 167 gets anther, e.g. E_DDOC_PUT_ERROR
set to 2). And similar for all other lines with
exit(1)` that I’m not referencing here individually.
Absolutely not a blocker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did consider this, yeah. Initially I was raising unique Exceptions but switched over to this to make it easier for shell scripting. Going on memory I think the only function that has multiple exit()s is this one, so this might be the only place that needs more exit codes.
help='show clustered (2.x) databases instead') | ||
parser_list.add_argument('-i', '--include-system-dbs', | ||
action='store_true', | ||
help='include system databases (_users, _replicator, etc.)') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 Might be worth considering doing separate options for _users
and _replicator
(the only system dbs in 1.x), I might wanna do _users
but not _replicator
.
Not a blocker, but would raise as open issue after merge. Would make a good first time contributor issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 Not sure I agree, but maybe. Line 352 is just for couchup list
which probably doesn't need explicit flags for those 2 DBs.
I suspect you mean line 387 or line 427 where action can be taken on them instead. Right now you'd do that with couchup {replicate|rebuild} _users
, the only time -i
makes sense is in couchup -a -i
. Is it worth adding --users
and --replicator
flags when couchup replicate -a && couchup replicate _users
would suffice? I'm unconvinced, but not super opinionated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah sorry, yes, ignore me for now :)
Thanks @janl. I'll correct a few of the more straightforward tweaks today. I'd like to get feedback from at least a couple of people trying the script (other than me!) before merging, but if no one responds on user@ or dev@ by middle of next week, I'll merge anyway. |
592b7fa
to
8f6391c
Compare
This commit adds a new Python-based database migration tool, couchup. It is intended to be used at the command-line on the server being upgraded, before bringing the node (or cluster) into service. couchup provides 4 subcommands to assist in the migration process: * list - lists all CouchDB 1.x databases * replicate - replicates one or more 1.x databases to CouchDB 2.x * rebuild - rebuilds one or more CouchDB 2.x views * delete - deletes one or more CouchDB 1.x databases A typical workflow for a single-node upgrade process would look like: ```sh $ couchdb list $ couchdb replicate -a $ couchdb rebuild -a $ couchdb delete -a ``` A clustered upgrade process would be the same, but must be preceded by setting up all the nodes in the cluster first. Various optional arguments provide for admin login/password, overriding ports, quiet mode and so on. Of special note is that `couchup rebuild` supports an optional flag, `-f`, to filter deleted documents during the replication process. I struggled some with the naming convention. For those in the know, a '1.x database' is a node-local database appearing only on port 5986, and a '2.x database' is a clustered database appearing on port 5984, and in raw, sharded form on port 5986.
Trying to use the replicate function to migrate my 686GB collection of databases... First 3-4 databases (about 5GB) work great... but now I get something like this when I try it /usr/local/couchdb/bin/couchup replicate -l USER -p PASSWORD -t 3600 -f eridu if I curl http://127.0.0.1:5986/eridu/ ( and add the user and pass ) I get {"db_name":"eridu","doc_count":7438871,"doc_del_count":511625,"update_seq":8168757,"purge_seq":0,"compact_running":false,"disk_size":81804304630,"other":{"data_size":109},"data_size":12273332816,"sizes":{"file":81804304630,"active":12273332816,"external":109},"instance_start_time":"1502734780758722","disk_format_version":6,"committed_update_seq":8168757,"compacted_seq":0,"uuid":"4a84be53b812cac9b71915eab09f3be9"} So the db sure seems to be there... I can go to 5986/_utils and see the db... can view data etc Any pointers? |
Please file this as a new issue so we can properly follow this request, not as a comment on the pull request. Please be sure to paste the exact results of the output. If possible please wrap your paste in triple-backticks, like this:
|
Sorry for the confusion on my part…
I have submitted the new issue.
Thanks!
Bob Jones
850.251.7485
… On Aug 14, 2017, at 3:36:53:000PM, Joan Touzet ***@***.***> wrote:
Please file this as a new issue <https://github.com/apache/couchdb/issues/new> so we can properly follow this request, not as a comment on the pull request.
Please be sure to paste the exact results of the output. If possible please wrap your paste in triple-backticks, like this:
```
/usr/local/couchdb/bin/couchup replicate...
```
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub <#483 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AApIKoyAT5dwGmCTw6G8SnwuOgwFlZnCks5sYKHUgaJpZM4NCSOx>.
|
Fix typo in `_scheduler/docs`
This commit adds a new Python-based database migration tool,
couchup
.It is intended to be used at the command-line on the server being
upgraded, before bringing the node (or cluster) into service.
couchup
provides 4 sub-commands to assist in the migration process:list
- lists all CouchDB 1.x databasesreplicate
- replicates one or more 1.x databases to CouchDB 2.xrebuild
- rebuilds one or more CouchDB 2.x viewsdelete
- deletes one or more CouchDB 1.x databasesA typical workflow for a single-node upgrade process would look like:
A clustered upgrade process would be the same, but must be preceded by
setting up all the nodes in the cluster first.
Various optional arguments provide for admin login/password, overriding
ports, quiet mode and so on.
Of special note is that
couchup rebuild
supports an optional flag,-f
, to filter deleted documents during the replication process.I struggled some with the naming convention. For those in the know, a
'1.x database' is a node-local database appearing only on port 5986, and
a '2.x database' is a clustered database appearing on port 5984, and in
raw, sharded form on port 5986.
Testing recommendations
.couch
files into the CouchDB 2.xdata/
directory.couchup
commands to list, replicate, rebuild and delete the databases as desired.