This section describes how to install CKAN with docker-compose. The scenario shown here is one of many possible scenarios and environments in which CKAN can be used with Docker.
We will use an Ubuntu environment. So far, this has been tested on:
- Amazon AWS EC2 Ubuntu 14.04 LTS
- Ubuntu 16.04 LTS on a desktop
Docker is installed system-wide following the official Docker CE installation guidelines.
Tip
When using a cloud-based VM, note that external storage volumes are cheaper than VMs and easy to back up. In our use case, we use an EC2 instance with 16 GB storage, have mounted a 100 GB btrfs-formatted external storage volume, and symlinked /var/lib/docker to the external volume. This allows us to store the bulky and precious cargo (Docker images, and Docker data volumes containing the CKAN databases, filestore, and config) on a cheaper service. In addition, a snapshotting filesystem like btrfs is ideal for backups and makes it easy to maintain rolling backups. The same cost considerations might apply to other cloud providers.
To verify a successful Docker installation, run docker run hello-world. The command docker version should output versions for both client and server.
In this step, we will create a virtualenv, clone the CKAN repository and install docker-compose.
- virtualenvwrapper
While not essential, virtualenvwrapper provides convenience methods to manage virtualenvs.
If using virtualenvwrapper, append the following lines (using your preferred and existing locations for virtualenvs and projects) to ~/.bashrc:
export WORKON_HOME=$HOME/.venvs
export PROJECT_HOME=/path/to/projects
source /usr/local/bin/virtualenvwrapper.sh
export PIP_VIRTUALENV_BASE=$WORKON_HOME
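Re-running these instructions would append the same lines to ~/.bashrc a second time. Below is a minimal sketch that appends each line only once. It assumes a bash shell, and append_once is a hypothetical helper, not part of virtualenvwrapper:

```shell
#!/usr/bin/env bash
# append_once FILE LINE: hypothetical helper that appends LINE to FILE
# unless FILE already contains exactly that line.
append_once() {
  local file="$1" line="$2"
  touch "$file"
  # -q: quiet, -x: whole-line match, -F: fixed string (no regex)
  if grep -qxF -- "$line" "$file"; then
    return 0
  fi
  # Add a newline first if the file does not end with one
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  printf '%s\n' "$line" >> "$file"
}

# Single quotes keep $HOME and $WORKON_HOME unexpanded until .bashrc runs:
# append_once ~/.bashrc 'export WORKON_HOME=$HOME/.venvs'
# append_once ~/.bashrc 'export PROJECT_HOME=/path/to/projects'
# append_once ~/.bashrc 'source /usr/local/bin/virtualenvwrapper.sh'
# append_once ~/.bashrc 'export PIP_VIRTUALENV_BASE=$WORKON_HOME'
```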
- Create virtualenv for CKAN
If using virtualenvwrapper, create a virtualenv for CKAN with mkproject ckan; otherwise, follow the virtualenv setup instructions in install-from-source.
To verify a successful virtualenvwrapper setup for CKAN, check that workon ckan changes into the CKAN project directory (/path/to/projects/ckan in our example) and activates the CKAN virtualenv (~/.venvs/ckan).
- Clone CKAN source
With the virtualenv activated, clone CKAN into the CKAN project directory:
workon ckan
git clone git@github.com:ckan/ckan.git .
- Install docker-compose
In the activated virtualenv, install and verify docker-compose using pip:
workon ckan
pip install docker-compose
docker-compose version
In this step we will build the Docker images and create Docker data volumes with user-defined, sensitive settings (e.g. for database passwords).
- Sensitive settings
In a production environment, copy contrib/docker/.env.template to contrib/docker/.env and follow the instructions within to set passwords and other sensitive or user-defined variables. The very unimaginative defaults will work fine in a development environment.
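For a production .env, the passwords can be generated rather than typed. The following sketch rests on several assumptions: .env uses plain KEY=value lines, the variable names POSTGRES_PASSWORD and DATASTORE_READONLY_PASSWORD are the ones listed in .env.template, and random_secret and set_env_var are hypothetical helpers:

```shell
#!/usr/bin/env bash
# random_secret: print 32 random hex characters (16 bytes from /dev/urandom)
random_secret() {
  head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# set_env_var FILE KEY VALUE: hypothetical helper that replaces the KEY=... line
# in FILE, or appends KEY=VALUE if there is none. awk is used instead of sed so
# that / and & in VALUE need no escaping (backslashes in VALUE are still
# interpreted by awk -v, so avoid them).
set_env_var() {
  local file="$1" key="$2" value="$3" tmp
  tmp="$(mktemp)"
  awk -v k="$key" -v v="$value" '
    index($0, k "=") == 1 { print k "=" v; found = 1; next }
    { print }
    END { if (!found) print k "=" v }
  ' "$file" > "$tmp" && mv "$tmp" "$file"
}

# Usage, run inside contrib/docker:
# cp -n .env.template .env
# set_env_var .env POSTGRES_PASSWORD "$(random_secret)"
# set_env_var .env DATASTORE_READONLY_PASSWORD "$(random_secret)"
```

Because set_env_var rewrites the file through mktemp, the resulting .env is readable only by its owner, which suits a file holding passwords.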
- Build the images
With an activated virtualenv:
workon ckan
cd contrib/docker
docker-compose build
docker-compose up -d
For the remainder of this chapter, we assume that docker-compose commands are all run inside contrib/docker with the ckan virtualenv activated.
On the first run, the db (PostgreSQL) container may need longer to initialise the database cluster than the ckan container waits for. In this case, simply restart the ckan container a few times, checking its status and logs:
docker-compose restart ckan
docker ps | grep ckan
docker-compose logs -f ckan
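Instead of restarting by hand, a small retry loop can wait until the database accepts connections and then restart ckan once. This sketch assumes bash, the container name db used in the datastore commands of this chapter, and a db image based on an official postgres image (which ships pg_isready). retry is a hypothetical helper:

```shell
#!/usr/bin/env bash
# retry ATTEMPTS DELAY COMMAND...: hypothetical helper that runs COMMAND up to
# ATTEMPTS times, sleeping DELAY seconds between failed attempts. Returns 0 on
# the first success, otherwise the exit status of the last attempt.
retry() {
  local attempts="$1" delay="$2" n=1 status
  shift 2
  while true; do
    "$@" && return 0
    status=$?
    if [ "$n" -ge "$attempts" ]; then
      return "$status"
    fi
    echo "attempt $n/$attempts failed (exit $status), retrying in ${delay}s" >&2
    n=$((n + 1))
    sleep "$delay"
  done
}

# Usage: wait up to about a minute for PostgreSQL, then restart ckan once
# retry 20 3 docker exec db pg_isready -U ckan && docker-compose restart ckan
```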
After this step, CKAN should be running at CKAN_SITE_URL. There should be four containers running (docker ps):
- ckan: CKAN with standard extensions
- db: CKAN's database, later also running CKAN's datastore database
- redis
- solr
There should be four named Docker volumes (docker volume ls | grep docker). They will be prefixed with the Docker project name (default: docker):
- docker_ckan_config: home of ckan.ini
- docker_ckan_home: home of the ckan venv and source, later also additional CKAN extensions
- docker_ckan_storage: home of CKAN's filestore (resource files)
- docker_pg_data: home of the database files for CKAN's default and datastore databases
The contents of these named volumes need to be backed up in a production environment. To migrate CKAN data between different hosts, simply transfer the contents of the named volumes. A detailed use case of data transfer will be discussed later.
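A common way to back up a named volume is to mount it into a throwaway container and archive it onto the host. This sketch assumes bash and the alpine image (any image that provides tar would do). backup_volume is a hypothetical helper, and setting DOCKER=echo turns it into a dry run that only prints the command:

```shell
#!/usr/bin/env bash
# backup_volume VOLUME DEST_DIR: hypothetical helper that writes
# DEST_DIR/VOLUME-<timestamp>.tar.gz from a named Docker volume.
# The volume is mounted read-only; DOCKER=echo prints instead of running.
backup_volume() {
  local volume="$1" dest_dir="$2"
  local archive="${volume}-$(date +%Y%m%d%H%M%S).tar.gz"
  "${DOCKER:-docker}" run --rm \
    -v "${volume}:/volume:ro" \
    -v "${dest_dir}:/backup" \
    alpine tar czf "/backup/${archive}" -C /volume .
}

# Usage, inside contrib/docker (stop the containers so the files are consistent):
# mkdir -p "$PWD/backups"
# docker-compose stop
# for v in docker_ckan_config docker_ckan_home docker_ckan_storage docker_pg_data; do
#   backup_volume "$v" "$PWD/backups"
# done
# docker-compose start
```

Stopping the containers first matters for docker_pg_data: copying the files of a running PostgreSQL cluster does not give a consistent backup. On a btrfs volume, as in the tip above, a filesystem snapshot is an alternative.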
To enable the datastore, the datastore database and database users have to be created before enabling the datastore and datapusher settings in ckan.ini.
- Create and configure datastore database
With the CKAN containers running, execute two prepared scripts against the db container:
docker exec -it db psql -U ckan -f 00_create_datastore.sql
docker exec -it db psql -U ckan -f 10_set_permissions.sql
The first script will create the datastore database and the datastore read-only user in the db container. The second script is the output of paster --plugin=ckan datastore set-permissions. The effect of these scripts is persisted in the named volume docker_pg_data.
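To confirm that the first script worked, list the databases in the db container. This sketch assumes bash. has_database is a hypothetical helper that parses the output of psql -lqt (-l lists databases, -q suppresses informational messages, -t prints rows only, with columns separated by "|"):

```shell
#!/usr/bin/env bash
# has_database NAME: hypothetical helper; reads `psql -lqt` output on stdin
# and succeeds if a database called NAME appears in the first column.
has_database() {
  local name="$1"
  cut -d '|' -f 1 | tr -d ' ' | grep -qxF -- "$name"
}

# Usage against the running db container:
# docker exec db psql -U ckan -lqt | has_database datastore && echo "datastore exists"
# The read-only role can be checked too (prints 1 if it exists):
# docker exec db psql -U ckan -tAc "SELECT 1 FROM pg_roles WHERE rolname='datastore_ro'"
```

The docker exec calls deliberately omit -it: a TTY is not needed when piping output, and it would add carriage returns that break the parsing.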
Note
We re-use the already privileged default user of the CKAN database as the read/write user for the datastore. The database user (ckan) is hard-coded; the password is supplied through the .env variable POSTGRES_PASSWORD. A new user datastore_ro is created (and also hard-coded) as the read-only user, with the password DATASTORE_READONLY_PASSWORD. Hard-coding the database and user names makes it possible to prepare the set-permissions SQL script in advance, while not exposing sensitive information to the world outside the Docker host environment.
After this step, the datastore database is ready to be enabled in ckan.ini.
- Enable datastore and datapusher in ckan.ini
Find the path to ckan.ini within the named volume:
docker volume inspect docker_ckan_config | grep Mountpoint
# "Mountpoint": "/var/lib/docker/volumes/docker_ckan_config/_data",
Edit ckan.ini (note: requires sudo):
sudo vim /var/lib/docker/volumes/docker_ckan_config/_data/ckan.ini
# With your own POSTGRES_PASSWORD and DATASTORE_READONLY_PASSWORD
ckan.datastore.write_url = postgresql://ckan:POSTGRES_PASSWORD@db/datastore
ckan.datastore.read_url = postgresql://datastore_ro:DATASTORE_READONLY_PASSWORD@db/datastore
# add datastore and datapusher to plugins
# enable datapusher options
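Typing the passwords into ckan.ini by hand is error-prone. The following sketch prints the two datastore URL lines from the values in .env. It assumes bash, plain KEY=value lines in .env for POSTGRES_PASSWORD and DATASTORE_READONLY_PASSWORD, passwords that need no URL-encoding, and the read-only user datastore_ro described in the note above. env_value and datastore_urls are hypothetical helpers:

```shell
#!/usr/bin/env bash
# env_value FILE KEY: print the value of the last KEY= line in FILE
env_value() {
  local file="$1" key="$2"
  awk -v k="$key" 'index($0, k "=") == 1 { v = substr($0, length(k) + 2) } END { print v }' "$file"
}

# datastore_urls ENV_FILE: print the datastore URL settings for ckan.ini,
# using the hard-coded users ckan (read/write) and datastore_ro (read-only)
datastore_urls() {
  local env_file="$1" pg_pass ro_pass
  pg_pass="$(env_value "$env_file" POSTGRES_PASSWORD)"
  ro_pass="$(env_value "$env_file" DATASTORE_READONLY_PASSWORD)"
  echo "ckan.datastore.write_url = postgresql://ckan:${pg_pass}@db/datastore"
  echo "ckan.datastore.read_url = postgresql://datastore_ro:${ro_pass}@db/datastore"
}

# Usage, inside contrib/docker:
# datastore_urls .env
```

The output can be pasted into ckan.ini in place of the two URL lines with placeholder passwords.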
Restart the ckan container to apply the changes to ckan.ini:
docker-compose restart ckan
With all four containers up and running, create the CKAN admin user (johndoe in this example):
docker exec -it ckan /usr/local/bin/ckan-paster --plugin=ckan sysadmin -c /etc/ckan/default/ckan.ini add johndoe
Now you should be able to log in to the new, empty CKAN. The admin user's API key will be instrumental in transferring data from other instances.
This section is a stub. Pending testing, and presuming the same dataset schema (ckanext-scheming) on the source and target CKANs, the process should be as simple as:
- rsync the contents of the filestore into the storage volume.
- Use the Python package ckanapi to dump organizations, groups and datasets from the source CKAN instance, then load them into the new instance.
- Let DataPusher populate the datastore.
- Trigger a Solr index rebuild.
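The steps above could be scripted roughly as follows. Like the steps themselves, this is an untested sketch. It assumes ckanapi is installed in the active virtualenv, that the filestore path on the source side is known, and that the target admin's API key is at hand. migrate_ckan and run are hypothetical helpers, and DRY_RUN=1 prints the commands instead of executing them. Organizations and groups are loaded before datasets because datasets refer to them:

```shell
#!/usr/bin/env bash
# run COMMAND...: execute COMMAND, or only print it when DRY_RUN is set
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# migrate_ckan SRC_URL DST_URL DST_APIKEY SRC_FILESTORE DST_STORAGE_MOUNT
migrate_ckan() {
  local src="$1" dst="$2" apikey="$3" src_files="$4" dst_files="$5" kind
  # 1. Copy the filestore contents (trailing slashes: copy contents, not the dir)
  run sudo rsync -a "${src_files}/" "${dst_files}/"
  # 2. Dump from the source, load into the target; datasets come last
  for kind in organizations groups datasets; do
    run ckanapi dump "$kind" --all -O "${kind}.jsonl" -r "$src"
    run ckanapi load "$kind" -I "${kind}.jsonl" -r "$dst" -a "$apikey"
  done
  # 3. DataPusher fills the datastore as resources are created (nothing to run)
  # 4. Rebuild the Solr index inside the ckan container
  run docker exec ckan /usr/local/bin/ckan-paster --plugin=ckan \
    search-index rebuild -c /etc/ckan/default/ckan.ini
}

# Usage (preview first; the target mountpoint comes from docker volume inspect):
# DRY_RUN=1 migrate_ckan http://old.example.org http://localhost:5000 API_KEY \
#   /path/to/old/storage "$(docker volume inspect -f '{{.Mountpoint}}' docker_ckan_storage)"
```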
This section is a stub.
There are two use cases for adding extensions:
- Developers will want to have access to the extensions' source.
- Maintainers of production instances will want extensions to be part of the ckan image and an easy way to enable them in ckan.ini.
This section is targeted at the CKAN developer seeking to factor out settings as new .env variables.
The flow of variable substitution is as follows:
- .env.template holds the defaults and the usage instructions for variables.
- .env is copied and modified from .env.template with values chosen by the maintainer.
- docker-compose interpolates variables in docker-compose.yml from .env.
- docker-compose can pass on these variables to the containers as build time variables (when building the images) and / or as run time variables (when running the containers).
- ckan-entrypoint.sh has access to all run time variables of the ckan service.
- ckan-entrypoint.sh injects some variables (e.g. CKAN_SQLALCHEMY_URL) into the running ckan container, supplementing the CKAN config variables from ckan.ini.
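When factoring out a new variable, it helps if the container fails loudly when the variable never arrives. For the first half of the flow, docker-compose config prints docker-compose.yml with the values from .env interpolated. For the run-time half, below is a sketch of a check an entrypoint could run. It assumes the entrypoint is a bash script; require_vars is a hypothetical helper and the variable names in the usage line are illustrative:

```shell
#!/usr/bin/env bash
# require_vars NAME...: hypothetical helper that reports every listed variable
# that is unset or empty, and fails if there was at least one.
require_vars() {
  local name missing=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named $name
    if [ -z "${!name:-}" ]; then
      echo "ERROR: required variable $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage near the top of an entrypoint script (bash only, because of ${!name}):
# require_vars CKAN_SITE_URL CKAN_SQLALCHEMY_URL || exit 1
```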
After adding new .env variables or changing the value of existing ones, locally built images and volumes need to be dropped and rebuilt. Otherwise, Docker will re-use cached images with old values or without the new variables:
docker-compose down
docker rmi $(docker images -f dangling=true -q)
docker volume rm $(docker volume ls -f dangling=true -q)
docker-compose build
docker-compose up -d
Warning
Removing named volumes will destroy data. Back up all data when doing this in a production setting.