Docker installation: how to customise and add extensions? #3649

Closed
florianm opened this issue Jun 29, 2017 · 12 comments · Fixed by #3692

@florianm
Contributor

florianm commented Jun 29, 2017

CKAN Version if known (or site URL)

My fork of master (2.8.0a) built with docker-compose, incorporates #3651

docker-compose version 1.14.0, build c7bdf9e
docker-py version: 2.3.0
CPython version: 2.7.13
OpenSSL version: OpenSSL 1.0.1t  3 May 2016

docker version
Client:
 Version:      17.06.0-ce
 API version:  1.30
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:19:16 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.06.0-ce
 API version:  1.30 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:17:13 2017
 OS/Arch:      linux/amd64
 Experimental: false

Please describe the expected behaviour

  • A working, up-to-date docker-compose setup with custom variables and persistence, together with instructions targeted at non-expert software maintainers like me
  • Some explicit explanations to delineate CKAN concerns from Docker concerns from devops concerns. These are probably trivial to CKAN developers but hard to distinguish for first-time maintainers.

Please describe the actual behaviour

Solution

I've got a working docker-compose setup with docs/maintaing/install-from-docker.rst using my fork:

  • git clone git@github.com:parksandwildlife/ckan.git
  • git checkout 3649-docker-upgrade
@wardi
Contributor

wardi commented Jun 29, 2017

@mattfullerton what do you think about the state of the docker installation? Is it polished enough to include in the install docs?

@mattfullerton
Contributor

mattfullerton commented Jun 30, 2017

Yes, I think it is. We should recommend, and at this stage possibly only document, installation with docker-compose: people are almost always going to need the other services (solr/redis/postgres), and docker-compose is quite mature by now (even if we are only using a very early version of the config file format).

I would hold off on documenting other situations (manual creation of containers and setting links, or having an external Postgres, for example), although they should eventually be documented too.

@florianm
Contributor Author

@mattfullerton I've updated my working steps in the original issue above. Happy to contribute to docs if you could help me figure out the remaining steps.

@florianm
Contributor Author

florianm commented Jul 6, 2017

@mattfullerton @wardi I'm experimenting in my fork at https://github.com/parksandwildlife/ckan/tree/3649-docker-upgrade (this includes PR #3651) but as I'm documenting above, docker-compose is a world of pain in a desert of confusing docs.
Update: docker-compose v3 kinda working. Some questions remaining.
Update: a few days' work left, looks all manageable now.

florianm changed the title from "Docker installation" to "Docker installation: how to customise and add extensions?" on Jul 7, 2017
@florianm
Contributor Author

florianm commented Jul 13, 2017

Update: I'm drafting docs for docker-compose here. Any feedback highly appreciated.
TODO: datapusher and extensions.
Update: datapusher works, thanks to Docker image provided by @clementmouchet

@Vanuan

Vanuan commented Jul 16, 2017

@florianm
Here's my feedback:

  1. I would put installing docker, virtualenv and docker-compose out of this doc. An official way of installation is this:

     curl -L https://github.com/docker/compose/releases/download/$dockerComposeVersion/docker-compose-`uname -s`-`uname -m` > /usr/local/bin/docker-compose

     so virtualenv isn't needed, and section 2 should be removed completely.

  2. In the production environment, sensitive settings should be extracted to docker secrets. So I would create a wrapper script which puts these settings into the corresponding sections of the template.

  3. There's a --build option (docker-compose up --build), so a separate docker-compose build isn't necessary.

  4. Restart containers a few times? This is a smell that something is wrong with initialization. Maybe increase timeouts?

  5. Instead of named volumes I would recommend mounting real directories. In production you'll need some kind of network storage anyway.

  6. Customization, like enabling plugins and setting up an administrative user, can be moved into environment variables plus a wrapper script.

  7. I feel that a docker installation is more suitable for a ready-to-use, fully customized CKAN instance than as an instruction on how to run a general non-customized instance. In that case it would only require 2 steps:

  • Docker part: Install docker, setup docker swarm, create/mount directories, set up secrets
  • CKAN part: docker deploy -c docker-compose.yml ckan
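
The wrapper-script idea for secrets and environment variables mentioned above could look roughly like the sketch below. It is only an illustration and not part of CKAN: the function name `load_secrets` and the naming convention (file name upper-cased, `-` and `.` mapped to `_`) are assumptions made for this example. The only Docker behaviour it relies on is that secrets are mounted as files under /run/secrets by default.

```shell
#!/bin/sh
# Hypothetical helper (not part of CKAN): export every file in a secrets
# directory as an environment variable named after the file.
# Docker mounts secrets under /run/secrets by default.
load_secrets() {
    secrets_dir=${1:-/run/secrets}
    for secret_file in "$secrets_dir"/*; do
        # Skip the unexpanded glob when the directory is empty or missing.
        [ -f "$secret_file" ] || continue
        # ckan-sqlalchemy-url -> CKAN_SQLALCHEMY_URL (naming is an assumption).
        name=$(basename "$secret_file" | tr '[:lower:]' '[:upper:]' | tr '.-' '__')
        # Command substitution strips the trailing newline of the secret file.
        export "$name=$(cat "$secret_file")"
    done
}

# Intended use as a container entrypoint wrapper (commented out so the
# sketch can be sourced safely):
#   load_secrets && exec "$@"
```

With this approach the image itself carries no credentials; the same entrypoint works with a plain .env file in development, because variables that are already set are simply left alone when no matching secret file exists.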

@florianm
Contributor Author

@Vanuan thanks for the detailed feedback!

I've pushed an update to the docs integrating your feedback.
Notably I've made some opinionated choices (which make perfect sense on my end), but mileage may vary -- I've added some notes to inform readers of rationale and other options.

re named volumes: As I'm already mounting /var/lib/docker to a backed up network drive, are there any other advantages of mounting local folders over using named volumes?

re secrets vs. .env: is an .env file insecure? would it work with docker swarm or are docker secrets the only way? Since docker compose supports sensitive settings via .env I assumed it was not too un-idiomatic to use .env. I might be wrong?

"Ready-to use and fully customized" sounds very good to me, but I'm cautious of hard-coding too many assumptions, like which extensions to add, into the build.
I would like to see docker-compose docs helping the target audience getting a vanilla CKAN up and running with less trouble than a source install, and being able to then experiment with extensions until a deployment candidate is found. You have a very good point in wanting to then automate all build steps. I've left a note that this would be a good point for contributions.

@mattfullerton
Contributor

@florianm Thanks for forging ahead with this, if there's anything you need from me, just ask (doesn't look like it though!)

@florianm
Contributor Author

florianm commented Jul 17, 2017

@mattfullerton thanks! There are a few points I still need to get my head around - pointers welcome:

  1. docker swarm vs docker compose
    I'm trying to get my head around using CKAN in swarm mode. Currently I'm not discussing that at all in the PR for the docs.
    My own setup is a local machine for development and two EC2 VMs for UAT/prod. The VMs sit behind my Department's firewall/proxy/auth. I have ssh access to both and can do my own deployments.
    I'm not sure how docker swarm would fit into that picture.

the rest as mentioned above:

  2. .env vs docker secrets
    Is one significantly better than the other? Should I re-write docker-compose.yml to use secrets instead?
    I used .env because I got it to work, had used the concept of .env files in other projects, and docker secrets were yet another concept to wrap my head around. None of those points are valid arguments pro .env though.

  3. local mounts vs named volumes
    Here I went "the docker way" with named volumes as it appeared cleaner than mounting local folders. Note I mount the top level /var/lib/docker to network storage in my VMs (not on local machine though).

  4. dropping the ping&wait loop from the entrypoint
    In addition to the reasons already mentioned, this removed the need to keep db host/port/user/pw separately alongside the complete sqlalchemy url. This removes the double-handling of db credentials, but loses the ability to do the ping&wait loop. Is there any clean way to ping&wait for the db container using only the sqlalchemy url?

  5. extensions impacting ckan install
    Example ckanext-spatial: requires system-level packages installed (CKAN Dockerfile), postgis (different postgres FROM image), db updates (change ownership of postgis tables), paster commands (add db migration), .ini changes. Would it be ok to modify the CKAN and postgres Dockerfiles to cater for the most used ckan extensions?
    update on 5: just pushed modifications required for ckanext-spatial to my related PR
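
On the ping&wait question, one possible approach is to derive host and port from the sqlalchemy url itself, so no separate db settings are needed. The following is a minimal sketch under stated assumptions, not what the CKAN entrypoint actually does: the function names `db_host_port` and `wait_for_db` are made up for this example, `pg_isready` (a PostgreSQL client tool) is assumed to be installed in the image, and IPv6 host literals are not handled.

```shell
#!/bin/sh
# Hypothetical helpers (not part of CKAN): wait for the database using only
# the SQLAlchemy URL, so host and port need not be configured separately.

# Print "host port" for a URL like postgresql://user:pass@host:port/dbname.
db_host_port() {
    rest=${1#*://}               # drop the scheme
    authority=${rest%%/*}        # drop the /dbname path
    authority=${authority%%\?*}  # drop a query string when there is no path
    hostport=${authority##*@}    # drop user:password@, if present
    case "$hostport" in
        *:*) echo "${hostport%%:*} ${hostport##*:}" ;;
        *)   echo "$hostport 5432" ;;   # PostgreSQL's default port
    esac
}

# Poll until the server accepts connections or the attempts run out.
# Assumes pg_isready is available in the container.
wait_for_db() {
    target=$(db_host_port "$1")
    attempts=${2:-30}
    while [ "$attempts" -gt 0 ]; do
        pg_isready -q -h "${target% *}" -p "${target#* }" && return 0
        attempts=$((attempts - 1))
        sleep 1
    done
    return 1
}

# Example call from an entrypoint (variable name is an assumption):
#   wait_for_db "$CKAN_SQLALCHEMY_URL" || exit 1
```

Only host and port are extracted, so the credentials stay in one place (the url) and are never echoed by the wait loop.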

@Vanuan

Vanuan commented Jul 17, 2017

> re named volumes: As I'm already mounting /var/lib/docker to a backed up network drive, are there any other advantages of mounting local folders over using named volumes?

The rationale is the following:
When you have multiple servers (for high availability/fault tolerance), it is easier when all of them are homogeneous, i.e. you don't care which service is on which host. Since docker doesn't automatically migrate volumes between machines, you usually end up with some global filesystem which is reachable from all hosts in a cluster. AFAIK, "/var/lib/docker" can't be shared between multiple docker engines; it's unique to each machine. So network storage is a must. The alternative is to pin every service with persistent data to a specific machine, which impairs reliability.

> re secrets vs. .env: is an .env file insecure? would it work with docker swarm or are docker secrets the only way? Since docker compose supports sensitive settings via .env I assumed it was not too un-idiomatic to use .env. I might be wrong?

The rationale is that you can include all the settings in the image except external secrets (as an image is usually considered a public resource). The question is how the container will access these secrets. As I described above, you can mount a settings (.env) file from network storage. That's fine. But in that case you must remember not to mount that file, or a folder containing it, into other services. Otherwise those services can access the secrets. So docker secrets is a feature which lets you store secrets in a more secure and explicit way than in a plain-text file sitting on network storage.

The complexity might not be worth it; it's just a suggestion.

> "Ready-to use and fully customized" sounds very good to me, but I'm cautious of hard-coding too many assumptions, like which extensions to add, into the build.
>
> I would like to see docker-compose docs helping the target audience getting a vanilla CKAN up and running with less trouble than a source install, and being able to then experiment with extensions until a deployment candidate is found. You have a very good point in wanting to then automate all build steps. I've left a note that this would be a good point for contributions.

I see. Yeah, I was impressed by how simple it is to set up udata: https://github.com/opendatateam/docker-udata
It looks like they include everything by default rather than letting the user cherry-pick the needed parts. The only thing the user chooses is which of the predefined configurations to use.

> docker swarm vs docker compose

I use this rule of thumb: if it's production, use docker swarm; if it's development, use docker compose. For development you'd want to mount the source directory so that you can change it without rebuilding. There's also a third use case, a "QA instance", where you don't want to change the source code but only want to test it. In that case docker compose is fine too.
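
As a rough illustration of that rule of thumb, a small dispatcher could print the command to run for each kind of environment. This is a sketch only: the function name `deploy_command` and the environment labels are assumptions made for this example, the stack name "ckan" and the file name docker-compose.yml are taken from earlier in this thread, and the production branch uses the `docker stack deploy` form of the swarm deploy command.

```shell
#!/bin/sh
# Hypothetical helper: print the command suggested by the rule of thumb.
# Environment labels are assumptions made for this sketch.
deploy_command() {
    case "$1" in
        production)
            # Swarm mode: deploy the compose file as a stack named "ckan".
            echo "docker stack deploy -c docker-compose.yml ckan" ;;
        development|qa)
            # Single host: build images if needed and start in the background.
            echo "docker-compose up -d --build" ;;
        *)
            echo "unknown environment: $1" >&2
            return 1 ;;
    esac
}

# Example: show what would be run for a QA instance.
#   deploy_command qa
```

Printing the command instead of running it keeps the sketch safe to try on a machine without docker installed.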

> Is one significantly better than the other? Should I re-write docker-compose.yml to use secrets instead?

No, of course it is not necessary, was just a suggestion.

> Here I went "the docker way" with named volumes as it appeared cleaner than mounting local folders. Note I mount the top level /var/lib/docker to network storage in my VMs (not on local machine though).

That explains the choices you made. Unfortunately, a production environment is still quite different from development. Named volumes were invented long before docker swarm. The idea was that you could easily move named volumes from one host to another, but it looks like they've abandoned that idea.
I couldn't find anything in the docs that confirms or refutes this: https://docs.docker.com/engine/swarm/services/#give-a-service-access-to-volumes-or-bind-mounts

I.e. there's no documentation on what happens when a service with a named volume is moved to another host. It might be that in future versions volumes will be moved automatically to where they're needed, but I can't imagine how that could be done quickly for gigabytes of data.

@florianm
Contributor Author

@Vanuan thanks for the detailed explanations, that makes a lot of sense to me. Let me digest those and see what I can factor in.

@Vanuan

Vanuan commented Aug 7, 2017

I've created a somewhat simplified, production-oriented set of instructions and a Dockerfile: https://github.com/Vanuan/ckan-base

That is the level of simplicity I think would be great for production-like environments.
