dump datasets performance: use package_search for ckan >= 2.2 #44

Closed
wardi opened this issue May 17, 2015 · 1 comment

Comments

Contributor

wardi commented May 17, 2015

with package_search we can dump all datasets in far fewer API calls.

issues:

  • ckan < 2.2 returns different dataset data from package_search than from package_show, so we'll need to keep the old code path for those versions as well
  • we need to request the datasets ordered by id, not modification date, so that we know we have a complete dump and to replicate the current behaviour
  • ckan sites may limit the number of packages returned from package_search in different ways. Maybe detect the limit and work with what we're given, or just fall back to the package_show method?
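
A rough sketch of how the paging could handle the ordering and limit concerns above, not a finished implementation. The name `dump_via_search` and the `search` callable are hypothetical; `search` is assumed to accept package_search's `sort`, `rows` and `start` parameters and return its usual result dict with a `results` list. Instead of trusting the requested `rows`, the loop advances by however many results actually came back, so a site that caps the page size still gets dumped completely.

```python
from typing import Callable, Dict, Iterator


def dump_via_search(search: Callable[..., Dict], rows: int = 1000) -> Iterator[Dict]:
    """Yield every dataset by paging through a package_search-like action.

    `search` is a hypothetical stand-in for the package_search action.
    """
    start = 0
    while True:
        # Order by id so edits made during the dump do not reshuffle pages.
        page = search(sort="id asc", rows=rows, start=start)
        results = page["results"]
        if not results:
            break  # an empty page means we are past the last dataset
        yield from results
        # The site may return fewer than `rows`; advance by what we received.
        start += len(results)
```

With ckanapi this would presumably be called as `dump_via_search(RemoteCKAN(url).action.package_search)`. The fallback to package_show for ckan < 2.2 is not covered here.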
Contributor Author

wardi commented May 17, 2015

@amercader we discussed this at a dev meeting last week. I now think it's impossible to reliably get all the datasets from package_search with the default sort="metadata_modified desc". Any dataset modified while we're dumping moves to the front of the ordering, so it will be missed, and duplicates will appear as we page through because the end of one page gets pushed into the next.

Setting sort="id asc" should work though.
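
To illustrate the difference, here is a toy in-memory simulation (no CKAN involved; `make_datasets` and `paged_dump` are made-up names, and the data is invented). It pages through four datasets two at a time and edits dataset "a" right after the first page has been fetched.

```python
def make_datasets():
    # Four toy datasets; a larger metadata_modified means more recently edited.
    return [{"id": i, "metadata_modified": t} for t, i in enumerate("abcd", start=1)]


def paged_dump(sort_key, reverse, rows=2, touch="a"):
    """Page through the toy datasets, editing `touch` after the first page."""
    datasets = make_datasets()
    seen, start = [], 0
    while True:
        ordered = sorted(datasets, key=lambda d: d[sort_key], reverse=reverse)
        ids = [d["id"] for d in ordered[start:start + rows]]
        if not ids:
            return seen
        seen.extend(ids)
        start += len(ids)
        if start == rows:
            # Simulate someone editing dataset `touch` mid-dump.
            for d in datasets:
                if d["id"] == touch:
                    d["metadata_modified"] = 99


print(paged_dump("metadata_modified", reverse=True))  # ['d', 'c', 'c', 'b']
print(paged_dump("id", reverse=False))                # ['a', 'b', 'c', 'd']
```

Sorting by modification date returns "c" twice and never returns "a", while sorting by id is unaffected by the edit. This only models edits to existing datasets; datasets created or deleted during the dump are a separate question.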

wardi changed the title from "dump datasets performance: use package_search" to "dump datasets performance: use package_search for ckan >= 2.2" on May 17, 2015
wardi closed this as completed on Aug 24, 2023