Installing CKAN with docker-compose

This chapter describes how to install the latest CKAN master with Docker Compose. The scenario shown here is one of many possible scenarios and environments in which CKAN can be used with Docker.

This chapter aims to provide a simple yet fully customizable deployment: easier to configure than a source install, more customizable than a package install.

The discussed setup is useful as a development or staging environment; additional care must be taken before using it in production.

Note

Some design decisions are opinionated (see notes), which does not mean that the alternatives are any worse. Some decisions may or may not be suitable for production scenarios, e.g. the use of CKAN master. Notably, this tutorial does not use Docker Swarm; additional steps may need to be taken to adapt the setup to use Docker Swarm.

1. Environment

In this tutorial, we will use an Ubuntu environment, which has been tested on:

  • Amazon AWS EC2 Ubuntu 14.04 LTS
  • Ubuntu 16.04 LTS (desktop)

The choice of cloud provider and operating system is arbitrary; the setup should work on any system that can run Docker.

Specific instructions for other cloud providers and operating systems would make a welcome contribution!

  a. Storage

On a cloud-based VM, external storage volumes are cheaper than VM disk space and easier to back up. In our use case, we use an AWS EC2 instance with 16 GB of storage, have mounted a 100 GB btrfs-formatted external storage volume, and have symlinked /var/lib/docker to the external volume. This allows us to store the bulky and precious cargo -- Docker images, Docker data volumes containing the CKAN databases, filestore, and config -- on the cheaper service. In addition, a snapshotting filesystem like btrfs is well suited for rolling backups. The same cost considerations may apply to other cloud providers.
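
As a minimal sketch of that storage setup (assuming the external volume appears as /dev/xvdf and Docker is managed by systemd; device names and init systems differ per provider and OS):

sudo mkfs.btrfs /dev/xvdf
sudo mkdir /mnt/docker
sudo mount /dev/xvdf /mnt/docker

# Move Docker's data directory onto the external volume and symlink it back
sudo systemctl stop docker
sudo mv /var/lib/docker /mnt/docker/docker
sudo ln -s /mnt/docker/docker /var/lib/docker
sudo systemctl start docker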

Note

This setup stores data in named volumes, mapped to folder locations which can be networked or local storage. An alternative would be to re-write docker-compose.yml to map local storage directly, bypassing named volumes. Both solutions will save data to a specified location.

Further reading: Docker Volumes.

  b. Docker

Docker is installed system-wide following the official Docker CE installation guidelines.

To verify a successful Docker installation, run docker run hello-world. docker version should output versions for client and server.
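
If docker commands require sudo, a common post-installation step is to add your user to the docker group (log out and back in for the change to take effect):

sudo usermod -aG docker `whoami`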

  c. Docker Compose

Docker Compose is installed system-wide following the official Docker Compose installation guidelines.

To verify a successful Docker Compose installation, run docker-compose version.

  d. CKAN source

Clone CKAN into a directory of your choice:

cd /path/to/my/projects
git clone git@github.com:ckan/ckan.git .
# This will use the latest CKAN master.

# To use a stable version, checkout the respective tag, e.g.:
git checkout tags/ckan-2.6.2

Note

Using master may not be stable enough for production use.

2. Build Docker images

In this step we will build the Docker images and create Docker data volumes with user-defined, sensitive settings (e.g. for database passwords).

  a. Sensitive settings and environment variables

In a production environment, copy contrib/docker/.env.template to contrib/docker/.env and follow instructions within to set passwords and other sensitive or user-defined variables. The defaults will work fine in a development environment.
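
A minimal sketch of this step, run from the CKAN checkout:

cp contrib/docker/.env.template contrib/docker/.env
# Follow the instructions inside the file:
vim contrib/docker/.env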

Note

Related reading:

Docker-compose .env file

Environment variables in Compose

Newcomers to Docker should read the excellent write-up on Docker variables by Vladislav Supalov (GitHub @th4t).

  b. Build the images

Inside the CKAN directory:

cd contrib/docker
docker-compose up -d --build

For the remainder of this chapter, we assume that docker-compose commands are all run inside contrib/docker, where docker-compose.yml and .env are located.

On first runs, the postgres container may need longer to initialize the database cluster than the ckan container will wait for it. How long depends heavily on available system resources. If the CKAN logs show problems connecting to the database, restart the ckan container a few times:

docker-compose restart ckan
docker ps | grep ckan
docker-compose logs -f ckan

Note

Earlier versions of ckan-entrypoint.sh used to wait and ping the db container using detailed db credentials (host, port, user, password). While this delay sometimes worked, it also obfuscated other possible problems. However, the db cluster needs to initialize only once and starts up quickly on subsequent runs, so this setup takes the very opinionated option of doing away with the delay altogether in favour of failing early.

After this step, CKAN should be running at CKAN_SITE_URL.
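
As a quick smoke test from the host (assuming the default CKAN_SITE_URL of http://localhost:5000 from .env.template; substitute your own value):

curl -I http://localhost:5000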

There should be five containers running (docker ps):

  • ckan: CKAN with standard extensions
  • db: CKAN's database, later also running CKAN's datastore database
  • redis: a pre-built Redis image
  • solr: a pre-built Solr image set up for CKAN
  • datapusher: a pre-built CKAN DataPusher image

There should be four named Docker volumes (docker volume ls | grep docker). They will be prefixed with the Docker Compose project name (default: docker, or the value of the host environment variable COMPOSE_PROJECT_NAME).

  • docker_ckan_config: home of ckan.ini
  • docker_ckan_home: home of ckan venv and source, later also additional CKAN extensions
  • docker_ckan_storage: home of CKAN's filestore (resource files)
  • docker_pg_data: home of the database files for CKAN's default and datastore databases

The locations of these named volumes need to be backed up in a production environment. To migrate CKAN data between different hosts, simply transfer the contents of the named volumes. A detailed use case of such a transfer is discussed in step 5.
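
As an illustrative sketch of such a backup (the alpine image and the archive name are arbitrary choices, not part of this setup):

# Archive the contents of a named volume into the current directory
docker run --rm -v docker_ckan_storage:/data -v "$PWD":/backup alpine tar czf /backup/ckan_storage.tar.gz -C /data .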

3. Datastore and datapusher

To enable the datastore, the datastore database and its database users have to be created before the datastore and datapusher settings are enabled in ckan.ini.

  a. Create and configure datastore database

With running CKAN containers, execute the built-in setup scripts against the db container:

docker exec -it db psql -U ckan -f 00_create_datastore.sql
docker exec -it db psql -U ckan -f 10_set_permissions.sql

The first script will create the datastore database and the datastore readonly user in the db container. The second script is the output of paster ckan set-permissions. The effect of these scripts is persisted in the named volume docker_pg_data.

Note

We re-use the already privileged default user of the CKAN database as the read/write user for the datastore. The database user (ckan) is hard-coded; its password is supplied through the .env variable POSTGRES_PASSWORD. A new user datastore_ro is created (and also hard-coded) as the readonly user, with the password supplied through DATASTORE_READONLY_PASSWORD. Hard-coding the database names and usernames allows us to prepare the set-permissions SQL script ahead of time, while not exposing sensitive information to the world outside the Docker host environment.

After this step, the datastore database is ready to be enabled in the ckan.ini.

  b. Enable datastore and datapusher in ckan.ini

Find the path to the ckan.ini within the named volume:

docker volume inspect docker_ckan_config | jq -c '.[] | .Mountpoint'

# "/var/lib/docker/volumes/docker_ckan_config/_data"

# Convenience: set named volumes as env variables on host.
export VOL_CKAN_HOME=`docker volume inspect docker_ckan_home | jq -r -c '.[] | .Mountpoint'`
echo $VOL_CKAN_HOME

export VOL_CKAN_CONFIG=`docker volume inspect docker_ckan_config | jq -r -c '.[] | .Mountpoint'`
echo $VOL_CKAN_CONFIG

Note

We export the folder locations of data inside named volumes as environment variables. We use a prefix VOL_ to avoid overriding variables in docker-compose.yml.

Edit the ckan.ini (note: requires sudo):

sudo vim /var/lib/docker/volumes/docker_ckan_config/_data/ckan.ini

Add datastore datapusher to ckan.plugins and enable the datapusher option ckan.datapusher.formats.
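
For illustration only (the plugin list varies per instance, and the formats list is abbreviated):

ckan.plugins = stats text_view image_view recline_view datastore datapusher
ckan.datapusher.formats = csv xls xlsx tsv application/csv application/vnd.ms-excel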

The remaining settings required for datastore and datapusher are already taken care of:

  • ckan.storage_path (/var/lib/ckan) is hard-coded in ckan-entrypoint.sh, docker-compose.yml and CKAN's Dockerfile. This path is hard-coded as it remains internal to the containers, and changing it would have no effect on the host system.
  • ckan.datastore.write_url = postgresql://ckan:POSTGRES_PASSWORD@db/datastore and ckan.datastore.read_url = postgresql://datastore_ro:DATASTORE_READONLY_PASSWORD@db/datastore are provided by docker-compose.yml.

Restart the ckan container to apply changes to the ckan.ini:

docker-compose restart ckan

Now the datastore API should return content when visiting:

CKAN_SITE_URL/api/3/action/datastore_search?resource_id=_table_metadata
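
For example, substituting the actual site URL for CKAN_SITE_URL:

curl "CKAN_SITE_URL/api/3/action/datastore_search?resource_id=_table_metadata"
# A JSON response with "success": true indicates a working datastore.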

4. Create CKAN admin user

With all five containers up and running, create the CKAN admin user (johndoe in this example):

docker exec -it ckan /usr/local/bin/ckan-paster --plugin=ckan sysadmin -c /etc/ckan/ckan.ini add johndoe

Now you should be able to log in to the new, empty CKAN. The admin user's API key will be instrumental in transferring data from other instances.

5. Migrate data

This section illustrates the migration of data from an existing CKAN instance. All commands are run on the target host, which is assumed to have ssh access to the source host.

  a. Transfer resource files

Assuming the CKAN storage directory on SOURCE_CKAN is located at /path/to/files (containing resource files and uploaded images in resources and storage), we'll simply rsync those into the named volume docker_ckan_storage:

export VOL_CKAN_STORAGE=`docker volume inspect docker_ckan_storage | jq -r -c '.[] | .Mountpoint'`
sudo rsync -Pavvr USER@SOURCE_CKAN:/path/to/files/ $VOL_CKAN_STORAGE

  b. Transfer users

Users could be exported using the python package ckanapi, but their password hashes will be excluded. To transfer users preserving their passwords, we need to dump and restore the user table.

On the source CKAN host, with access to the source database ckan_default, export the user table:

pg_dump -h CKAN_DBHOST -p CKAN_DBPORT -U CKAN_DBUSER -a -O -t user -f user.sql ckan_default

On the target host, make user.sql accessible to the ckan container. Transfer user.sql into the named volume docker_ckan_home and chown it to the ckan user:

rsync -Pavvr user@ckan-source-host:/path/to/user.sql $VOL_CKAN_HOME/venv/src

# $VOL_CKAN_HOME is owned by the user "ckan" (UID 900) created in the CKAN Dockerfile
sudo ls -l $VOL_CKAN_HOME
# drwxr-xr-x 1 900 900 62 Jul 17 16:13 venv

# Chown user.sql to the owner of $VOL_CKAN_HOME (ckan, UID 900)
sudo chown 900:900 $VOL_CKAN_HOME/venv/src/user.sql

Now the file user.sql is accessible from within the ckan container:

docker exec -it ckan bash

ckan@eca111c06788:/$ psql -U ckan -h db -f $CKAN_VENV/src/user.sql

  c. Export and upload groups, orgs, datasets

Using the python package ckanapi we will dump orgs, groups and datasets from the source CKAN instance, then use ckanapi to load the exported data into the target instance. The datapusher will automatically ingest CSV resources into the datastore.
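
A sketch using ckanapi's command line interface (hostnames and file names are placeholders; the API key is that of the admin user created in step 4):

# Dump from the source instance
ckanapi dump organizations --all -O orgs.jsonl -r http://source-ckan.example.com
ckanapi dump groups --all -O groups.jsonl -r http://source-ckan.example.com
ckanapi dump datasets --all -O datasets.jsonl -r http://source-ckan.example.com

# Load into the target instance
ckanapi load organizations -I orgs.jsonl -r http://target-ckan.example.com -a ADMIN_API_KEY
ckanapi load groups -I groups.jsonl -r http://target-ckan.example.com -a ADMIN_API_KEY
ckanapi load datasets -I datasets.jsonl -r http://target-ckan.example.com -a ADMIN_API_KEY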

  d. Rebuild search index

Trigger a Solr index rebuild:

docker exec -it ckan /usr/local/bin/ckan-paster --plugin=ckan search-index rebuild -c /etc/ckan/ckan.ini

6. Add extensions

There are two scenarios to add extensions:

  • Maintainers of production instances need extensions to be part of the ckan image and an easy way to enable them in the ckan.ini. Automating the installation of existing extensions (without needing to change their source) requires customizing CKAN's Dockerfile and scripted post-processing of the ckan.ini.
  • Developers need to read, modify and use version control on the extensions' source.

For maintainers, the process is:

  • Run a bash shell inside the running ckan container, then download and install the extension. Alternative: insert the pip install step into a custom CKAN Dockerfile.
  • Edit ckan.ini. Alternative: use ckanext-envvars to configure ckan.ini through environment variables, which can be passed into docker-compose via .env.
  • Restart the ckan service and read the logs.

  a. Download and install extension from inside ckan container into docker_ckan_home volume

The process is very similar to installing extensions in a source install. The difference is that the installation steps happen inside the running container, using the virtualenv created inside the ckan image by CKAN's Dockerfile (not a virtualenv on the host machine).

The downloaded and installed files will be persisted in the named volume docker_ckan_home.

In this example, we'll enter the running ckan container to install ckanext-geoview from source, ckanext-showcase from GitHub, and ckanext-envvars from PyPI:

# Enter the running ckan container:
docker exec -it ckan bash

# Inside the running container, activate the virtualenv
source $CKAN_VENV/bin/activate && cd $CKAN_VENV/src/

# Option 1: From source
git clone https://github.com/ckan/ckanext-geoview.git
cd ckanext-geoview
pip install -r pip-requirements.txt
python setup.py install
python setup.py develop

# Option 2: Pip install from GitHub
pip install -e "git+https://github.com/ckan/ckanext-showcase.git#egg=ckanext-showcase"

# Option 3: Pip install from PyPI
pip install ckanext-envvars

# exit the ckan container:
exit

Some extensions require database upgrades, often through paster scripts. E.g., ckanext-spatial:

# Enter the running ckan container:
docker exec -it ckan bash

# Inside the running ckan container
source $CKAN_VENV/bin/activate && cd $CKAN_VENV/src/
git clone https://github.com/ckan/ckanext-spatial.git
cd ckanext-spatial
pip install -r pip-requirements.txt
python setup.py install && python setup.py develop
exit

# On the host
docker exec -it db psql -U ckan -f 20_postgis_permissions.sql
docker exec -it ckan /usr/local/bin/ckan-paster --plugin=ckanext-spatial spatial initdb -c /etc/ckan/ckan.ini

sudo vim $VOL_CKAN_CONFIG/ckan.ini

# Inside ckan.ini, add to ckan.plugins:
spatial_metadata spatial_query

ckanext.spatial.search_backend = solr

  b. Modify CKAN config

Follow the respective extension's instructions to set CKAN config variables:

sudo vim $VOL_CKAN_CONFIG/ckan.ini

Alternatively, ckan.ini settings can be supplied through environment variables using ckanext-envvars, as sketched below.
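
An illustrative sketch, following ckanext-envvars' documented naming convention (options are upper-cased and dots become double underscores; the values shown are placeholders). Such variables could be added to .env and passed to the ckan service via docker-compose.yml:

CKAN__PLUGINS=envvars stats text_view image_view recline_view datastore datapusher
CKANEXT__SPATIAL__SEARCH_BACKEND=solr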

  c. Reload and debug

docker-compose restart ckan
docker-compose logs ckan

  d. Develop extensions: modify source, install, use version control

While maintainers will prefer to use stable versions of existing extensions, developers of extensions will need access to the extensions' source, and be able to use version control.

The use of Docker, with its inherent encapsulation of files and processes, makes developing extensions harder than in a CKAN source install.

Since we have chosen to use named volumes instead of mounted host folders, we have to make the write-protected volumes accessible to a system user. The Ubuntu package bindfs helps here:

sudo apt-get install bindfs
mkdir ~/VOL_CKAN_HOME
sudo chown -R `whoami`:docker $VOL_CKAN_HOME
sudo bindfs --map=900/`whoami` $VOL_CKAN_HOME ~/VOL_CKAN_HOME

cd ~/VOL_CKAN_HOME/venv/src

# Do this with your own extension fork
# Assumption: the host user running git clone (you) has write access to the repository
git clone git@github.com:parksandwildlife/ckanext-datawagovautheme.git

# ... change files, use version control...

Changes to templates and CSS will be visible right away. For changes to code, we'll need to unmount the directory, change ownership back to the ckan user, follow the previous steps to run python setup.py install and pip install -r requirements.txt from within the running container, modify the ckan.ini, and restart the container:

sudo umount ~/VOL_CKAN_HOME
sudo chown -R 900:900 $VOL_CKAN_HOME
# Follow steps a-c

Note

Mounting host folders as volumes instead of using named volumes may result in a simpler development workflow.

7. Environment variables

Sensitive settings can be managed in (at least) two ways: as environment variables or as Docker secrets. This section illustrates the use of environment variables provided through the docker-compose .env file.

This section is targeted at CKAN maintainers seeking a deeper understanding of variables, and at CKAN developers seeking to factor out settings as new .env variables.

Variable substitution propagates as follows:

  • .env.template holds the defaults and the usage instructions for variables.
  • The maintainer copies .env from .env.template and modifies it following the instructions.
  • Docker Compose interpolates variables in docker-compose.yml from .env.
  • Docker Compose can pass these variables on to the containers as build time variables (when building the images) and/or as run time variables (when running the containers).
  • ckan-entrypoint.sh has access to all run time variables of the ckan service.
  • ckan-entrypoint.sh injects environment variables (e.g. CKAN_SQLALCHEMY_URL) into the running ckan container, overriding the CKAN config variables from ckan.ini.

See /maintaining/configuration for a list of environment variables (e.g. CKAN_SQLALCHEMY_URL) which CKAN will accept to override ckan.ini.
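
An illustrative excerpt of that chain (the file names are real; the values and exact compose keys are placeholders, not copied from the repository):

# .env, edited by the maintainer
POSTGRES_PASSWORD=a_secure_password

# docker-compose.yml, interpolated by Docker Compose and passed to the
# ckan container as a run time variable:
#   environment:
#     - CKAN_SQLALCHEMY_URL=postgresql://ckan:${POSTGRES_PASSWORD}@db/ckan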

After adding new .env variables or changing existing ones, locally built images and volumes may need to be dropped and rebuilt. Otherwise, Docker will re-use cached images with old or missing variables:

docker-compose down
docker-compose up -d --build

# if that didn't work, try:
docker rmi $(docker images -f dangling=true -q)
docker-compose up -d --build

# if that didn't work, try:
docker rmi $(docker images -f dangling=true -q)
docker volume rm $(docker volume ls -f dangling=true -q)
docker-compose up -d --build

Warning

Removing named volumes will destroy data. Back up all data before doing this in a production setting.

8. Steps towards production

As mentioned above, some design decisions may not be suitable for a production setup.

A possible path towards a production-ready environment is:

  • Use the above setup to build docker images.
  • Add and configure extensions.
  • Make sure that no sensitive settings are hard-coded inside the images.
  • Push the images to a docker repository.
  • Build a separate "production" docker-compose.yml which uses the custom built images.
  • Run the "production" docker-compose.yml on the production server with appropriate settings.
  • Transfer production data into the new server as described above.
  • Bonus: contribute a write-up of working production setups to the CKAN documentation.