CKAN commands
To create a system administrator account for CKAN, the user must already exist in the database, which happens the first time they log in to CKAN (through MAX.gov). Users must first request access by following the Account Procedures. Run this command from one of the harvesters, e.g. catalog-harvester2p.
$ sudo ckan sysadmin add <email-address>
All harvester commands should be run from one of the harvesters, usually catalog-harvester1p.
The harvest run command runs every few minutes to manage pending and in-progress harvest jobs. It will (not necessarily in this order):
- Queue jobs that have been scheduled
- Start jobs that have been queued
- Clean up jobs that have completed or errored
- Email job results to points of contact
Run the job through supervisor.
$ sudo supervisorctl start harvest-run
The job is logged to /var/log/ckan/harvest-run.log.
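As a minimal sketch, the start command and the log check can be wrapped in one shell function. The function name and the HARVEST_LOG variable are illustrative assumptions, not part of the deployed tooling; the supervisor command and the default log path are the ones given above.

```shell
# Sketch: start the harvest-run job, then show the tail of its log.
# start_harvest_run and HARVEST_LOG are illustrative names; the supervisor
# command and the default log path come from the text above.
start_harvest_run() {
  log="${HARVEST_LOG:-/var/log/ckan/harvest-run.log}"
  lines="${1:-20}"

  # Stop here if supervisor could not start the job.
  sudo supervisorctl start harvest-run || return 1

  if [ -r "$log" ]; then
    tail -n "$lines" "$log"
  else
    echo "cannot read $log" >&2
    return 1
  fi
}
```

Calling start_harvest_run 50 would show the last 50 log lines. The job may not have written anything new by the time the log is read, so the output can still reflect the previous run.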
ckan --plugin=ckanext-geodatagov geodatagov harvest-job-cleanup
Harvest jobs can get stuck in the Running state indefinitely. This command resets them and fixes any harvest object issues they caused.
ckan --plugin=ckanext-qa qa update_sel
Starts QA analysis on all datasets whose last-modified timestamp is greater than or equal to the timestamp stored in /var/log/qa-metadata-modified.log.
ckan --plugin=ckanext-qa qa collect-ids && ckan --plugin=ckanext-qa qa update
Compared to qa update_sel, qa update runs the analysis on ALL datasets and takes a very long time to finish.
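The two QA modes above can be put behind a single entry point so the expensive full run is always an explicit choice. This is only a sketch: the function name and the selective/full arguments are illustrative assumptions, while the ckan commands are the ones listed above.

```shell
# Sketch: choose between the selective and the full QA run.
# run_qa and its "selective"/"full" arguments are illustrative names;
# the ckan commands are the ones listed above.
run_qa() {
  case "${1:-selective}" in
    selective)
      ckan --plugin=ckanext-qa qa update_sel
      ;;
    full)
      # collect-ids must succeed before the full update starts.
      ckan --plugin=ckanext-qa qa collect-ids &&
        ckan --plugin=ckanext-qa qa update
      ;;
    *)
      echo "usage: run_qa [selective|full]" >&2
      return 2
      ;;
  esac
}
```

With no argument the function defaults to the selective run, so the slow analysis of every dataset only happens when run_qa full is requested.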
ckan --plugin=ckanext-geodatagov geodatagov clean-deleted
CKAN keeps deleted packages in the DB. This clean command makes sure they are really gone.
ckan tracking update
Run this periodically to analyze the raw tracking data and generate the summarized page-view data that CKAN and Solr can use.
ckan --plugin=ckanext-report report generate
This generates the /report/broken-links page, which shows broken link statistics for dataset resources by organization.
ckan --plugin=ckanext-geodatagov geodatagov db_solr_sync
Over time, Solr can get out of sync with the DB because of various glitches. This command brings them back in sync.
ckan --plugin=ckanext-spatial ckan-pycsw set_keywords -p /etc/ckan/pycsw-collection.cfg
This takes the top 20 tags from CKAN and writes them into /etc/ckan/pycsw-collection.cfg as CSW service metadata keywords.
ckan --plugin=ckanext-spatial ckan-pycsw set_keywords -p /etc/ckan/pycsw-all.cfg
This takes the top 20 tags from CKAN and writes them into /etc/ckan/pycsw-all.cfg as CSW service metadata keywords.
ckan --plugin=ckanext-spatial ckan-pycsw load -p /etc/ckan/pycsw-all.cfg
Uses the CKAN API to load CKAN datasets into the pycsw database.
/usr/lib/ckan/bin/python /usr/lib/ckan/bin/pycsw-db-admin.py vacuumdb /etc/ckan/pycsw-all.cfg
Runs vacuumdb on the pycsw database.
/usr/lib/ckan/bin/python /usr/lib/ckan/bin/pycsw-db-admin.py reindex_fts /etc/ckan/pycsw-all.cfg
Rebuilds the GIN index on the pycsw records table to speed up full-text search.
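The pycsw-all.cfg commands above could be chained so that each step runs only if the previous one succeeded. This is a sketch under assumptions: the source lists the commands separately and does not say they must run together or in this order, and the function name and the CKAN_PYTHON, PYCSW_ADMIN, and PYCSW_CFG variables are illustrative, with defaults taken from the paths above.

```shell
# Sketch: run the pycsw-all.cfg steps in the order they are listed above.
# refresh_pycsw, CKAN_PYTHON, PYCSW_ADMIN and PYCSW_CFG are illustrative
# names; their defaults are the paths used in the commands above.
refresh_pycsw() {
  py="${CKAN_PYTHON:-/usr/lib/ckan/bin/python}"
  admin="${PYCSW_ADMIN:-/usr/lib/ckan/bin/pycsw-db-admin.py}"
  cfg="${PYCSW_CFG:-/etc/ckan/pycsw-all.cfg}"

  # Each step runs only if the previous one succeeded.
  ckan --plugin=ckanext-spatial ckan-pycsw set_keywords -p "$cfg" &&
    ckan --plugin=ckanext-spatial ckan-pycsw load -p "$cfg" &&
    "$py" "$admin" vacuumdb "$cfg" &&
    "$py" "$admin" reindex_fts "$cfg"
}
```

The function returns the exit status of the first failing step, so a failed load leaves the database untouched by vacuumdb and reindex_fts.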
ckan --plugin=ckanext-geodatagov geodatagov combine-feeds
This gathers 20 pages of CKAN feeds from /feeds/dataset.atom and generates /usasearch-custom-feed.xml for USAsearch. USAsearch uses the Bing index as its backend, which does not understand pagination in Atom feeds.
ckan --plugin=ckanext-geodatagov geodatagov export-csv
This records all datasets that are tagged with Topic and Topic Categories and generates /csv/topic_datasets.csv.