Best way to copy datasets to another instance? #52

Closed
timwis opened this issue Jul 12, 2015 · 5 comments

Comments

@timwis

timwis commented Jul 12, 2015

Hello, I'm trying to copy all the dataset metadata from opendataphilly.org to a local CKAN instance I have running on a DigitalOcean droplet. I'm using the command from the README:

$ ckanapi dump datasets --all -q -r https://opendataphilly.org | ckanapi load datasets -c $CKAN_INI

But I repeatedly get the error create ValidationError {"owner_org":["Organization does not exist"]}. I've tried creating the organization on the local instance, but it still doesn't work. I assume that's because owner_org references the organization's ID instead of its slug/name, and the one I just created has a brand-new unique ID.
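
The suspected mismatch can be checked on a single dumped record. This is only a minimal sketch, assuming jq is installed; the sample record and its values are made up for illustration, and show_org_ref is a hypothetical helper name, not part of ckanapi:

```shell
# Hypothetical sample; a real dumped dataset record has many more fields.
sample='{"name":"bike-network","owner_org":"1a2b3c","organization":{"id":"1a2b3c","name":"example-org"}}'

# Print owner_org (an ID from the source site) next to the organization's name,
# separated by a tab.
show_org_ref() {
  jq --raw-output '"\(.owner_org)\t\(.organization.name)"'
}

printf '%s\n' "$sample" | show_org_ref
```

If the first column is an ID from the source site, an organization created by hand on the target will carry a different ID, which would be consistent with the error above.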

I've also tried a dump & load of the organizations, but I get a Python stack trace:

Traceback (most recent call last):
  File "/usr/lib/ckan/default/bin/ckanapi", line 9, in <module>
    load_entry_point('ckanapi==3.4', 'console_scripts', 'ckanapi')()
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/ckanapi/cli/main.py", line 70, in main
    return _switch_to_paster(arguments)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/ckanapi/cli/main.py", line 112, in _switch_to_paster
    sys.exit(load_entry_point('PasteScript', 'console_scripts', 'paster')())
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/paste/script/command.py", line 104, in run
    invoke(command, command_name, options, args[1:])
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/paste/script/command.py", line 143, in invoke
    exit_code = runner.run(args)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/paste/script/command.py", line 238, in run
    result = self.command()
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/ckanapi/cli/paster.py", line 29, in command
    return main.main(running_with_paster=True)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/ckanapi/cli/main.py", line 98, in main
    return load_things(ckan, thing[0], arguments)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/ckanapi/cli/load.py", line 41, in load_things
    return load_things_worker(ckan, thing, arguments)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/ckanapi/cli/load.py", line 184, in load_things_worker
    r = ckan.call_action(thing_create, obj)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/ckanapi/localckan.py", line 50, in call_action
    return self._get_action(action)(dict(context), dict(data_dict))
  File "/usr/lib/ckan/default/src/ckan/ckan/logic/__init__.py", line 424, in wrapped
    result = _action(context, data_dict, **kw)
  File "/usr/lib/ckan/default/src/ckan/ckan/logic/action/create.py", line 857, in group_create
    return _group_or_org_create(context, data_dict)
  File "/usr/lib/ckan/default/src/ckan/ckan/logic/action/create.py", line 723, in _group_or_org_create
    group = model_save.group_dict_save(data, context)
  File "/usr/lib/ckan/default/src/ckan/ckan/lib/dictization/model_save.py", line 389, in group_dict_save
    group = d.table_dict_save(group_dict, Group, context)
  File "/usr/lib/ckan/default/src/ckan/ckan/lib/dictization/__init__.py", line 139, in table_dict_save
    setattr(obj, key, value)
AttributeError: can't set attribute

Any idea what the best way to copy these datasets over is?

Also note that it doesn't have to be perfect as this is just for testing a script before using it on the production version of opendataphilly.

@deniszgonjanin

You can strip the owner_org key from each package object before importing:

ckanapi dump datasets --all -q -r https://opendataphilly.org | jq --compact-output 'del(.owner_org)' | ckanapi load datasets -c $CKAN_INI
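
To see what that filter does before running it against a live instance, it can be tried on one record. A minimal sketch, assuming jq is installed and that the dump emits one JSON object per line; the sample record is made up and strip_owner_org is a hypothetical helper name:

```shell
# Hypothetical sample record (made-up values).
sample='{"name":"bike-network","title":"Bike Network","owner_org":"1a2b3c"}'

# Same filter as in the pipeline above: drop owner_org, keep everything else.
strip_owner_org() {
  jq --compact-output 'del(.owner_org)'
}

printf '%s\n' "$sample" | strip_owner_org
```

Because jq applies the filter to each input object in turn, the same function should work unchanged in the middle of the dump/load pipeline.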

Afterwards, if you also want to import organizations, they should have references to all the datasets, and the mappings will be preserved.

@timwis
Author

timwis commented Jul 13, 2015

Thanks, I'll give that a shot; it sounds like it would work. Any idea why the organizations import failed?

@deniszgonjanin

Probably for the same reason datasets were failing - datasets reference their organizations and vice versa. If the first command works you should be able to import all the orgs with:

ckanapi dump organizations --all -r https://opendataphilly.org | ckanapi load organizations -c $CKAN_INI

I think that will preserve dataset mappings. In any case, it would be good for ckanapi cli to have a command that migrates all datasets, groups, and orgs at once to another CKAN instance. If the above doesn't work, you could open an issue to that effect.

@timwis
Author

timwis commented Jul 25, 2015

For anyone else with this issue, I had to delete the following 3 properties on the dataset dump records in order for ckanapi load datasets to work:

$ jq --compact-output 'del(.owner_org,.resources[].revision_id,.organization.revision_id)'
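
The effect of deleting several paths at once, including one that reaches into every element of the resources array, can be checked on a single record. A minimal sketch, assuming jq is installed; the sample record is made up and strip_source_ids is a hypothetical helper name:

```shell
# Hypothetical sample with two resources (made-up values).
sample='{"name":"bike-network","owner_org":"1a2b3c","organization":{"name":"example-org","revision_id":"r1"},"resources":[{"url":"http://example.com/a.csv","revision_id":"r2"},{"url":"http://example.com/b.csv","revision_id":"r3"}]}'

# Same filter as above: .resources[].revision_id matches the key in each resource.
strip_source_ids() {
  jq --compact-output 'del(.owner_org, .resources[].revision_id, .organization.revision_id)'
}

printf '%s\n' "$sample" | strip_source_ids
```

Both resources keep their url, and the organization object keeps its name; only the three listed paths are removed.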

Also, FYI @deniszgonjanin I'm still getting the can't set attribute error when importing organizations. Probably need to delete attributes like I've done above.

Update: I got both organizations and groups to load using the following jq command:

$ jq --compact-output '{title: .title, name: .name}'
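
This filter builds a new object from two fields instead of deleting fields from the old one. In jq, {title, name} is shorthand for {title: .title, name: .name}. A minimal sketch, assuming jq is installed; the sample organization record is made up and both helper names are hypothetical:

```shell
# Hypothetical organization record (made-up values).
sample='{"id":"1a2b3c","name":"example-org","title":"Example Org","revision_id":"r1"}'

# Explicit form, as in the command above.
keep_explicit() {
  jq --compact-output '{title: .title, name: .name}'
}

# Shorthand form; builds the same object.
keep_shorthand() {
  jq --compact-output '{title, name}'
}

printf '%s\n' "$sample" | keep_explicit
printf '%s\n' "$sample" | keep_shorthand
```

Any field not named in the braces, such as id or revision_id here, is left out of the result.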

@timwis
Author

timwis commented Jul 26, 2015

Got it even more concise - rather than deleting specific properties, you can use jq to keep only the properties you need. I've compiled the 3 commands to copy organizations, groups, and datasets so that the datasets stay associated with their organizations and groups:

To copy the organizations

$ ckanapi dump organizations --all -q -r https://opendataphilly.org \
   | jq --compact-output '{name, title, description}' \
   | ckanapi load organizations -c $CKAN_INI

To copy the groups/topics

$ ckanapi dump groups --all -q -r https://opendataphilly.org \
   | jq --compact-output '{name, title, description}' \
   | ckanapi load groups -c $CKAN_INI

To copy the datasets

$ ckanapi dump datasets --all -q -r https://opendataphilly.org \
   | jq --compact-output '{name, title, notes, license_id, license_title, maintainer, maintainer_email, owner_org: .organization.name, extras, groups: [.groups[] | {name}], resources: [.resources[] | {url, name, format, description}]}' \
   | ckanapi load datasets -c $CKAN_INI
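
The dataset projection is the densest of the three filters, so it may help to run it on one record first. A minimal sketch, assuming jq is installed; the sample record is made up, and dataset_filter and project_dataset are hypothetical names:

```shell
# Same projection as the dataset command above, kept in a variable for reuse.
dataset_filter='{name, title, notes, license_id, license_title, maintainer, maintainer_email, owner_org: .organization.name, extras, groups: [.groups[] | {name}], resources: [.resources[] | {url, name, format, description}]}'

# Hypothetical sample record (made-up values, far fewer fields than a real dump).
sample='{"id":"d1","name":"bike-network","title":"Bike Network","owner_org":"1a2b3c","organization":{"id":"1a2b3c","name":"example-org","revision_id":"r1"},"groups":[{"id":"g1","name":"transportation"}],"resources":[{"id":"x1","url":"http://example.com/a.csv","name":"CSV","format":"CSV","description":"","revision_id":"r2"}],"extras":[]}'

# Apply the projection to each JSON object on stdin.
project_dataset() {
  jq --compact-output "$dataset_filter"
}

printf '%s\n' "$sample" | project_dataset
```

In the result, owner_org holds the organization's name rather than the source site's ID, and groups and resources are reduced to the listed fields. That is presumably why the organizations and groups are loaded before the datasets. Fields missing from a record come out as null, and a dataset with no organization would get owner_org: null, so such records may need separate handling.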
