Docker installation: how to customise and add extensions? #3649

florianm · 2017-06-29T01:51:15Z

CKAN Version if known (or site URL)

My fork of master (2.8.0a) built with docker-compose, incorporates #3651

docker-compose version 1.14.0, build c7bdf9e
docker-py version: 2.3.0
CPython version: 2.7.13
OpenSSL version: OpenSSL 1.0.1t  3 May 2016

docker version
Client:
 Version:      17.06.0-ce
 API version:  1.30
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:19:16 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.06.0-ce
 API version:  1.30 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:17:13 2017
 OS/Arch:      linux/amd64
 Experimental: false

Please describe the expected behaviour

A working, up-to-date docker-compose setup with custom variables and persistence, together with instructions targeted at non-expert software maintainers like me.
Some explicit explanations that separate CKAN concerns from Docker concerns from devops concerns. These are probably trivial to CKAN developers but hard to tell apart for first-time maintainers.

Please describe the actual behaviour

Datacats works great, but is sadly no longer supported.
The CKAN docs are inconclusive (the Docker install page just links to datacats, #2820).
Docker is now at v17.06, whereas the legacy CKAN Docker docs were written for Docker v1.0.1.
The current docker-compose.yml uses file format v1.

Solution

I've got a working docker-compose setup, documented in docs/maintaining/install-from-docker.rst, in my fork:

git clone git@github.com:parksandwildlife/ckan.git
cd ckan
git checkout 3649-docker-upgrade


wardi · 2017-06-29T15:04:00Z

@mattfullerton what do you think about the state of the Docker installation? Is it polished enough to include in the install docs?

mattfullerton · 2017-06-30T09:12:05Z

Yes, I think it is. We should recommend, and at this stage possibly only document, installing with docker-compose, as people are almost always going to need the other services (solr/redis/postgres) and docker-compose is quite mature by now (even if we are only using a very early version of the config file format).

I would hold off on documenting other situations for now (manual creation of containers and setting links, or using an external Postgres, for example), though they should eventually be documented too.

florianm · 2017-06-30T09:27:06Z

@mattfullerton I've updated my working steps in the original issue above. Happy to contribute to docs if you could help me figure out the remaining steps.

florianm · 2017-07-06T09:27:46Z

@mattfullerton @wardi I'm experimenting in my fork at https://github.com/parksandwildlife/ckan/tree/3649-docker-upgrade (this includes PR #3651) but as I'm documenting above, docker-compose is a world of pain in a desert of confusing docs.
Update: docker-compose v3 kinda working. Some questions remaining.
Update: a few days' work left, looks all manageable now.

florianm · 2017-07-13T07:32:15Z

Update: I'm drafting docs for docker-compose here. Any feedback highly appreciated.
TODO: datapusher and extensions.
Update: datapusher works, thanks to Docker image provided by @clementmouchet

Vanuan · 2017-07-16T13:50:34Z

@florianm
Here's my feedback:

I would move the installation of docker, virtualenv and docker-compose out of this doc. The official way to install docker-compose is:

curl -L "https://github.com/docker/compose/releases/download/$dockerComposeVersion/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose

so virtualenv isn't needed, and section 2 should be removed completely.

In a production environment, sensitive settings should be extracted into docker secrets. I would create a wrapper script that puts these settings into the corresponding sections of the template.
docker-compose up has a --build option, so a separate docker-compose build step isn't necessary.
Restart containers a few times? That is a smell that something is wrong with initialization. Maybe increase the timeouts?
Instead of named volumes I would recommend mounting real directories. In production you'll need some kind of network storage anyway.
Customization, such as enabling plugins and setting up an administrative user, can be handled with environment variables plus a wrapper script.
I feel that a docker installation is better suited to a ready-to-use, fully customized CKAN instance than to instructions on how to run a general, non-customized instance. In that case it would only require 2 steps:

Docker part: install docker, set up docker swarm, create/mount directories, set up secrets
CKAN part: docker stack deploy -c docker-compose.yml ckan
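
To make the "environment variables plus a wrapper script" idea concrete, here is a minimal Python sketch. It is not taken from the PR: the template text and the environment variable names are assumptions chosen for illustration, although sqlalchemy.url, ckan.site_url and ckan.plugins are real CKAN config options.

```python
from string import Template

# Hypothetical config template. The placeholder names are illustrative only
# and are not necessarily the ones used by CKAN's Docker setup.
INI_TEMPLATE = Template("""\
[app:main]
sqlalchemy.url = $CKAN_SQLALCHEMY_URL
ckan.site_url = $CKAN_SITE_URL
ckan.plugins = $CKAN_PLUGINS
""")


def render_config(settings):
    """Fill the template from a mapping such as os.environ.

    Raises KeyError if a required setting is missing, so a misconfigured
    container fails at startup instead of running with a broken config.
    """
    return INI_TEMPLATE.substitute(settings)


# Example values; in a container these would come from os.environ.
EXAMPLE_SETTINGS = {
    "CKAN_SQLALCHEMY_URL": "postgresql://ckan:secret@db/ckan",
    "CKAN_SITE_URL": "http://localhost:5000",
    "CKAN_PLUGINS": "stats text_view",
}
print(render_config(EXAMPLE_SETTINGS))
```

A wrapper like this would run in the entrypoint before CKAN starts, writing the rendered text to the config file path that the container expects.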

florianm · 2017-07-17T05:10:54Z

@Vanuan thanks for the detailed feedback!

I've pushed an update to the docs integrating your feedback.
Notably I've made some opinionated choices (which make perfect sense on my end), but mileage may vary -- I've added some notes to inform readers of rationale and other options.

re named volumes: As I'm already mounting /var/lib/docker to a backed up network drive, are there any other advantages of mounting local folders over using named volumes?

re secrets vs. .env: is an .env file insecure? would it work with docker swarm or are docker secrets the only way? Since docker compose supports sensitive settings via .env I assumed it was not too un-idiomatic to use .env. I might be wrong?

"Ready-to use and fully customized" sounds very good to me, but I'm cautious of hard-coding too many assumptions, like which extensions to add, into the build.
I would like to see docker-compose docs helping the target audience getting a vanilla CKAN up and running with less trouble than a source install, and being able to then experiment with extensions until a deployment candidate is found. You have a very good point in wanting to then automate all build steps. I've left a note that this would be a good point for contributions.

mattfullerton · 2017-07-17T06:49:02Z

@florianm Thanks for forging ahead with this, if there's anything you need from me, just ask (doesn't look like it though!)

florianm · 2017-07-17T07:44:25Z

@mattfullerton thanks! There are a few points I still need to get my head around - pointers welcome:

1. docker swarm vs docker compose
I'm trying to get my head around using CKAN in swarm mode. Currently I'm not discussing that at all in the PR for the docs.
My own setup is a local machine for development and two EC2 VMs for UAT/prod. The VMs sit behind my Department's firewall/proxy/auth. I have ssh access to both and can do my own deployments.
I'm not sure how docker swarm would fit into that picture.

The rest, as mentioned above:

2. .env vs docker secrets
Is one significantly better than the other? Should I re-write docker-compose.yml to use secrets instead?
I used .env because I got it to work, had used .env files in other projects, and docker secrets were yet another concept to wrap my head around. None of those points are valid arguments for .env, though.
3. local mounts vs named volumes
Here I went "the docker way" with named volumes as it appeared cleaner than mounting local folders. Note I mount the top level /var/lib/docker to network storage in my VMs (not on local machine though).
4. dropping the ping&wait loop from the entrypoint
In addition to the reasons already mentioned, this removed the need to keep the db host/port/user/password separately alongside the complete sqlalchemy url. That avoids double-handling of db credentials, but loses the ability to run the ping&wait loop. Is there a clean way to ping&wait for the db container using only the sqlalchemy url?
5. extensions impacting ckan install
Example ckanext-spatial: it requires system-level packages (CKAN Dockerfile), postgis (a different postgres FROM image), db updates (change ownership of postgis tables), paster commands (add db migration), and .ini changes. Would it be OK to modify the CKAN and postgres Dockerfiles to cater for the most-used CKAN extensions?
Update on 5: just pushed the modifications required for ckanext-spatial to my related PR.
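
On the ping&wait question, one possible approach is to parse the host and port out of the sqlalchemy url and retry a plain TCP connection. The sketch below is an illustration, not the entrypoint from the PR: the function names, retry counts and the default port of 5432 are assumptions, and it is written for Python 3 (on Python 2.7 the parser lives in the urlparse module). A successful TCP connection only shows that the port is open, not that postgres is ready to serve queries.

```python
import socket
import time
from urllib.parse import urlsplit


def db_host_port(sqlalchemy_url, default_port=5432):
    """Extract (host, port) from a URL like postgresql://user:pw@db:5432/ckan.

    Assumes any special characters in the password are percent-encoded.
    """
    parts = urlsplit(sqlalchemy_url)
    return parts.hostname, parts.port or default_port


def wait_for_db(sqlalchemy_url, retries=30, delay=2.0, timeout=3.0):
    """Return True once a TCP connection to the db succeeds, else False."""
    host, port = db_host_port(sqlalchemy_url)
    for _ in range(retries):
        try:
            # Open and immediately close a connection to probe the port.
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            time.sleep(delay)
    return False
```

An entrypoint could call wait_for_db with the configured url and exit with an error if it returns False, which keeps the credentials in a single place.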

Vanuan · 2017-07-17T12:34:16Z

re named volumes: As I'm already mounting /var/lib/docker to a backed up network drive, are there any other advantages of mounting local folders over using named volumes?

The rationale is the following:
When you have multiple servers (for high availability/fault tolerance), it is easier when all of them are homogeneous, i.e. you don't care which service is on which host. Since docker doesn't automatically migrate volumes between machines, you usually end up with some global filesystem which is reachable from all hosts in a cluster. AFAIK, "/var/lib/docker" can't be shared between multiple docker engines; it's unique to each machine. So network storage is a must. The alternative is to pin every service that needs persistence to a specific machine, which impairs reliability.

re secrets vs. .env: is an .env file insecure? would it work with docker swarm or are docker secrets the only way? Since docker compose supports sensitive settings via .env I assumed it was not too un-idiomatic to use .env. I might be wrong?

The rationale is that you can include all the settings in the image except external secrets (as an image is usually considered a public resource). The question is how the container will access these secrets. As I described above, you can mount a settings (.env) file from network storage. That's fine. But in that case you must remember not to mount that file, or a folder containing it, into other services. Otherwise those services can access the secrets. So docker secrets are a feature that lets you store secrets in a more secure and explicit way than in a plain-text file sitting on network storage.

The complexity might not be worth it, it's just a suggestion.
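
For illustration, a container can support both approaches by preferring a docker secret and falling back to an environment variable. Docker mounts secrets as files under /run/secrets/<secret_name>; the helper name and the choice to use the same name for the secret and the variable are assumptions of this Python sketch.

```python
import os
from pathlib import Path


def read_setting(name, secrets_dir="/run/secrets", environ=None):
    """Return a setting from a docker secret file, else from the environment.

    Returns None if neither source defines the setting.
    """
    environ = os.environ if environ is None else environ
    secret_file = Path(secrets_dir) / name
    if secret_file.is_file():
        # Secret files often end with a trailing newline; strip it.
        return secret_file.read_text().strip()
    return environ.get(name)
```

With this pattern the same image works under docker-compose with an .env file and under swarm with secrets, so the choice can be deferred to deployment time.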

"Ready-to use and fully customized" sounds very good to me, but I'm cautious of hard-coding too many assumptions, like which extensions to add, into the build.

I would like to see docker-compose docs helping the target audience getting a vanilla CKAN up and running with less trouble than a source install, and being able to then experiment with extensions until a deployment candidate is found. You have a very good point in wanting to then automate all build steps. I've left a note that this would be a good point for contributions.

I see. Yeah, I was impressed by how simple it is to set up udata: https://github.com/opendatateam/docker-udata
It looks like they include everything by default rather than letting the user cherry-pick the needed parts. The only thing the user chooses is which of the predefined configurations to use.

docker swarm vs docker compose

I use this rule of thumb: if it's production, use docker swarm; if it's development, use docker compose. For development you'd want to mount the source directory so that you can change it without rebuilding. There's also a third use case, a "QA instance", where you don't want to change the source code but only want to test it. In that case docker compose is fine too.

Is one significantly better than the other? Should I re-write docker-compose.yml to use secrets instead?

No, of course it is not necessary, was just a suggestion.

Here I went "the docker way" with named volumes as it appeared cleaner than mounting local folders. Note I mount the top level /var/lib/docker to network storage in my VMs (not on local machine though).

That explains the choices you made. Unfortunately, a production environment is still quite different from development. Named volumes were invented long before docker swarm, and it looks like they've abandoned that idea. The idea was that you could easily move named volumes from one host to another.
I couldn't find a claim that supports or denies this: https://docs.docker.com/engine/swarm/services/#give-a-service-access-to-volumes-or-bind-mounts

I.e. there's no documentation on what happens when a service with a named volume is moved to another host. It might be that in future versions volumes will be moved automatically to where they're needed. But I can't imagine how that could be done quickly for gigabytes of data.

florianm · 2017-07-17T13:53:42Z

@Vanuan thanks for the detailed explanations, that makes a lot of sense to me. Let me digest those and see what I can factor in.

Vanuan · 2017-08-07T04:57:03Z

I've created this somewhat simplified production-oriented instruction and dockerfile: https://github.com/Vanuan/ckan-base

That is a level of simplicity I think would be great for production-like environments.

amercader assigned wardi Jun 29, 2017

florianm changed the title ~~Docker installation~~ Docker installation: how to customise and add extensions? Jul 7, 2017

florianm mentioned this issue Jul 14, 2017

3649 docker upgrade #3692

Merged

5 tasks

wardi closed this as completed in #3692 Oct 17, 2017
