CKAN commands

System administrator accounts

To create a system administrator account for CKAN, the user must first exist in the database by first logging into CKAN (through MAX.gov). Users must request access by following the Account Procedures first. This command should be run from one of the harvesters, e.g. catalog-harvester2p.

$ sudo ckan sysadmin add <email-address>

Catalog commands

Harvester commands

All harvester commands should be run from one of the harvesters, usually catalog-harvester1p.

Harvest run

The harvest run command runs every few minutes to manage pending and in-progress harvest jobs. It will (not necessarily in this order):

Queue jobs that have been scheduled
Starts jobs that have been queued
Clean up jobs that have completed or errored
Email job results to points of contact

Run the job through supervisor.

$ sudo supervisorctl start harvest-run

The job is logged to /var/log/ckan/harvest-run.log.

Other commands

ckan --plugin=ckanext-geodatagov geodatagov harvest-job-cleanup

Harvest jobs can get stuck at Running state and stay that way forever. This will reset them and fix any harvest object issues they cause.

ckan --plugin=ckanext-qa qa update_sel

Start QA analysis on all datasets whose 'last modified timestamp' is >= timestamp embedded in the following file: /var/log/qa-metadata-modified.log

ckan --plugin=ckanext-qa qa collect-ids && ckan --plugin=ckanext-qa qa update

Compare to qa update_sel, this qa update will run analysis on ALL datasets. It will take loooooooong to finish.

ckan --plugin=ckanext-geodatagov geodatagov clean-deleted

CKAN keeps deleted package in the DB. This clean command makes sure they are really gone.

ckan tracking update

This needs to be run periodically in order to run analysis on raw data and generate summarized page view tracking data that ckan/solr can use.

ckan --plugin=ckanext-report report generate

This generates /report/broken-links page showing broken link statistics for dataset resources by organization.

ckan --plugin=ckanext-geodatagov geodatagov db_solr_sync

Over time solr can get out of sync from db due to all kind of glitches. This brings them back in sync.

ckan --plugin=ckanext-spatial ckan-pycsw set_keywords -p /etc/ckan/pycsw-collection.cfg*

This grabs top 20 tags from CKAN and put them into /etc/ckan/pycsw-collection.cfg as CSW service metadata keywords.

ckan --plugin=ckanext-spatial ckan-pycsw set_keywords -p /etc/ckan/pycsw-all.cfg

This grabs top 20 tags from ckan and put them into /etc/ckan/pycsw-all.cfg as CSW service metadata keywords.

ckan --plugin=ckanext-spatial ckan-pycsw load -p /etc/ckan/pycsw-all.cfg

Accesses CKAN api to load CKAN datasets into pycsw database.

/usr/lib/ckan/bin/python /usr/lib/ckan/bin/pycsw-db-admin.py vacuumdb /etc/ckan/pycsw-all.cfg

Does vacuumdb job on pycsw database.

/usr/lib/ckan/bin/python /usr/lib/ckan/bin/pycsw-db-admin.py reindex_fts /etc/ckan/pycsw-all.cfg

Rebuilds GIN index on pycsw records table to speed up full text search.

ckan --plugin=ckanext-geodatagov geodatagov combine-feeds

This gathers 20 pages of CKAN feeds from /feeds/dataset.atom and generates /usasearch-custom-feed.xml to feed USAsearch. USAsearch uses Bing index as backend which does not understand pagination in atom feeds.

ckan --plugin=ckanext-geodatagov geodatagov export-csv

This keeps records of all datasets that are tagged with Topic and Topic Categories, and generates /csv/topic_datasets.csv

Uh oh!

CKAN commands

System administrator accounts

Catalog commands

Harvester commands

Harvest run

Other commands

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally