with package_search we can dump all datasets in far fewer API calls.
issues:
CKAN < 2.2 returns different dataset data from package_search than from package_show, so we'll need to maintain the old code as well
we need to request the datasets ordered by id, not by modification date, so that we know the dump is complete and so that it replicates the current behaviour
CKAN sites may have limited the number of packages returned from package_search in different ways. Maybe detect the limit and work with what we're given, or just revert to the package_show method?
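One way to choose between the two code paths is a small version check. The following is only a sketch: `supports_search_dump` is a hypothetical helper name, and it assumes the caller has already obtained the site's version string (for example the `ckan_version` field reported by the `status_show` action). Unrecognised version strings fall back to the old package_show method.

```python
import re


def supports_search_dump(ckan_version):
    """Return True when the CKAN version string is 2.2 or newer.

    Hypothetical helper: only the leading "major.minor" part is
    inspected, so suffixes such as "a" or "b1" are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)", ckan_version)
    if match is None:
        # Unknown format: be conservative and use the package_show path.
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (2, 2)
```

Comparing the version as a tuple of integers avoids the string-comparison trap where "2.10" would sort before "2.2".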
@amercader we discussed this at a dev meeting last week. I now think it's impossible to reliably get all the datasets from package_search with the default sort="metadata_modified desc". Any datasets modified while we're dumping will be missed and duplicates will appear as we're paging through because the end of one page will get pushed into the next.
Setting sort="id asc" should work though.
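A rough sketch of what paging with a stable sort could look like. `call_action` here is an assumed callable that takes an action name and a parameter dict and returns the action's result dict (ckanapi's `RemoteCKAN.call_action` has this shape); `iter_datasets` and `page_size` are names made up for illustration. The loop advances `start` by the number of rows actually returned, so it keeps working when a site caps `rows` below what was requested.

```python
def iter_datasets(call_action, page_size=1000):
    """Yield every dataset by paging through package_search ordered by id.

    call_action is an assumed callable: (action_name, params) -> result dict.
    """
    start = 0
    while True:
        result = call_action(
            "package_search",
            {"rows": page_size, "start": start, "sort": "id asc"},
        )
        page = result["results"]
        if not page:
            # An empty page means we have walked past the last dataset.
            break
        for dataset in page:
            yield dataset
        # Step by what the site really returned, not by what we asked for,
        # in case the site limits the page size.
        start += len(page)
```

Because ids do not change when a dataset is edited, modifications made during the dump do not reorder the pages. Datasets created or deleted mid-dump can still shift offsets, so this is not a guarantee of a perfectly consistent snapshot.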
wardi changed the title from "dump datasets performance: use package_search" to "dump datasets performance: use package_search for ckan >= 2.2" on May 17, 2015