Separate docker-compose files for test run and persistent db #37

Merged: 20 commits merged into dev from test-compose on Apr 19, 2021

Conversation

@datarttu (Owner) commented Apr 8, 2021

This PR creates docker-compose files for two different cases:

  1. docker-compose.test.yml: a one-off test run that launches a database instance, runs the DDL commands and imports the example data from scratch, so the developer can see whether all of this completes without errors. No database data is left hanging around on volumes.
  2. docker-compose.db.yml: launches a database instance for more continuous use. This version uses a separate db_volume Docker volume, so the database data is kept even if the services are removed and restarted later. The database is also made accessible via localhost. This can be used for exploring with psql or QGIS, for example.

Later on, a separate docker-compose file should probably be created for production use too, but for now we will do well with these two.
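Schematically, the two files differ mainly in data persistence and port exposure. A minimal sketch (not the literal file contents; the service name is illustrative, the volume and variable names follow the PR description):

```yaml
# docker-compose.test.yml: no named volume, so all database data is
# discarded when the services are removed.
# docker-compose.db.yml: named volume plus a host port, roughly:
services:
  db:
    ports:
      - "${POSTGRES_LOCAL_PORT}:5432"   # reachable via localhost, e.g. for psql or QGIS
    volumes:
      - db_volume:/var/lib/postgresql/data
volumes:
  db_volume:
```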

There are still some problems and todos with this PR; I will comment on them in more detail.

  • For some reason 'timescaledb' will not be written to shared_preload_libraries of postgresql.conf if 001_timescaledb_tune.sh is copied to /docker-entrypoint-initdb.d/ by the Dockerfile, but it works if the sh script is brought there by mapping it from db_test/.
  • depends_on does not work like we would like: v3 does not support a condition: ... mapping under depends_on.
  • This instance will be accessible through POSTGRES_LOCAL_PORT and can be used for exploration, e.g. with QGIS.
  • Document test run and persistent db separately.
@datarttu (Owner, Author) commented Apr 8, 2021

  • I was not able to include condition: service_healthy under depends_on of the dataimporter service. Apparently this is no longer supported in docker-compose file v3. Using v2 caused a whole lot of other problems, so I wanted to stick to v3. To prevent dataimporter from running the imports before the db service has run the DDL commands, I simply postponed the command: command: sh -c "sleep 5 && psql -f /data/import.sql". But this is not a neat way to do it...
  • It would be nice to run timescaledb-tune straight away when the image is built, but it is not possible because the /var/lib/postgresql/data directory does not even exist before the container is run for the first time and docker-entrypoint.sh is run. So now the TimescaleDB configuration is done and the database is restarted every time the container is started.
  • Originally, 001_timescaledb_tune.sh was copied to /docker-entrypoint-initdb.d already in the Dockerfile. It was then run there before other scripts brought there by the volume mapping in docker-compose files. But this eventually resulted in timescaledb not existing in postgresql.conf shared_preload_libraries, which in turn caused CREATE EXTENSION timescaledb to fail, and I could not figure out why. Now that 001_timescaledb_tune.sh is mapped from db/db_test and run like the DDL scripts there, the TimescaleDB extension seems to load without problems...
  • As for the "persistent" version of the database, the dataimporter service fails if the database has already been hydrated, i.e. the db service is started with a pre-existing db_volume. Of course, in that case there is no need to re-import the data that is already in the database. Could we come up with a neat solution that would allow us to run the imports automatically if the database is created from scratch, and omit the imports if the database already exists on db_volume?
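For context, an init script like 001_timescaledb_tune.sh is typically a thin wrapper around timescaledb-tune. This is a hypothetical sketch, not the actual file contents from the PR:

```shell
#!/bin/sh
# Hypothetical sketch of 001_timescaledb_tune.sh. Scripts under
# /docker-entrypoint-initdb.d/ are run by docker-entrypoint.sh on the
# first start of the container, after $PGDATA has been initialized.
timescaledb-tune --quiet --yes --conf-path="${PGDATA}/postgresql.conf"
# timescaledb-tune writes shared_preload_libraries = 'timescaledb' into
# postgresql.conf; the setting only takes effect after a server (re)start,
# which is why CREATE EXTENSION timescaledb fails if the preload is missing.
```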

@datarttu datarttu marked this pull request as ready for review April 8, 2021 16:14
@datarttu datarttu requested a review from haphut April 8, 2021 17:03
Works well with docker-compose version 1.29.0 now.
@datarttu (Owner, Author) commented:

Added a commit that implements condition: service_healthy; the problem was solved by upgrading docker-compose to version 1.29.0.
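The Compose v3 pattern that became usable after the upgrade looks roughly like this (a sketch; the healthcheck parameters are illustrative, not taken from the PR):

```yaml
services:
  db:
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5
  dataimporter:
    depends_on:
      db:
        condition: service_healthy
    command: psql -f /data/import.sql   # no "sleep 5" workaround needed
```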

@haphut (Collaborator) commented Apr 16, 2021

Great stuff!

@haphut (Collaborator) commented Apr 16, 2021

"It would be nice to run timescaledb-tune straight away when the image is built --"

This would work out only as long as the image is not pushed into a Docker image repository and is not used by others.

On the other hand, maybe the tuning is not necessary at all. We're not measuring performance on our development laptops. And the tests are probably not that much slower untuned.

On the third hand, we might gain some time when we run the tuning for the persistent development database.

We could tune one Compose setup and not the other. Yet we wish to catch bugs early, and we can aid that by keeping environments similar to each other. The current approach facilitates that.

I would keep it as it is but it's your call.

@haphut (Collaborator) commented Apr 16, 2021

"Could we come up with a neat solution that would allow us to run the imports automatically if the database is created from scratch, and omit the imports if the database already exists on db_volume?"

Some alternatives off the top of my head:

  1. Remove the dataimporter service from the local development Compose file. Run it manually as a one-off docker run --rm script, instead.
  2. Use IF NOT EXISTS etc. diligently and run the migrations every time.
  3. Write migrations conditionally by probing for sentinel values: "If this schema exists, we are probably done".

I would go for alternative 1.

Alternative 2 wastes our time by running the migrations every time. IF NOT EXISTS makes sense for idempotent migrations but we are not there yet regarding our migrations. Even with idempotent schema changes I would go for alternative 1.

Alternative 3 sounds messy and error-prone.
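Alternative 1 might look roughly like this on the command line (image name, network name and paths are illustrative, not taken from the PR):

```shell
# One-off import against the running db service; the container is removed
# afterwards and nothing extra is left in the Compose file.
docker run --rm \
  --network <project>_default \
  -v "$(pwd)/example_data:/data:ro" \
  postgres:13 \
  psql -h db -U postgres -f /data/import.sql
```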

@haphut (Collaborator) commented Apr 16, 2021

Individual commits

I prefer to see individual commits that each make a coherent contribution towards the goal of the PR branch. Once the review starts, we should aim to no longer have WIP commits in the PR. In this case it made sense, as we built the WIP commits together.

GitHub has a feature called draft PR that lets the developer toggle whether a PR is, from their point of view, ready for review and merging. Setting the draft boolean allows one to run CI checks without signalling that the work is finished.

Ideally, if every commit compiles and passes tests, one can rely on git bisect to hunt bugs. It sounds like a powerful tool but frankly I rarely work with a codebase that would support using it in practice.

Squashing

Some prefer to squash a PR into just one commit. I think that is rare.

What is more divisive is whether to

  1. keep reiterating the commits during the review phase, constantly converging towards the ideal state of individual, correct commits, or
  2. add new fixup commits on top and after passing review autosquash the fixups before merging.

I think that is also dependent on the review tools available. I prefer option 1.
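For reference, option 2 is usually done with git's built-in fixup support (standard git commands; the placeholders are illustrative):

```shell
# During review: record a fix that targets an earlier commit.
git commit --fixup=<commit-to-fix>
# After approval: fold the fixup commits into their targets before merging.
git rebase -i --autosquash <base-branch>
```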

Merge commits vs linear history

As GitHub makes it pretty hard to review individual commits instead of the full PR, it also defaults to merging with --no-ff, using explicit merge commits. This can make the git history hard to follow. On the other hand, it is then enough that the merge commits are correct, while the "internal" PR commits can be messy.

GitHub has an option to "Rebase and merge" which uses --ff-only. This gives a pretty, linear git history but demands more from individual commits so git bisect remains usable.

Branching strategies

There are several popular git workflows. Git flow is one. GitHub flow another. I often find GitLab flow useful. It's actually a set of alternative flows but they make sense to me.

@haphut (Collaborator) commented Apr 16, 2021

I like your approach. It's clean and documented.

A general suggestion: For this PR, I might create roughly these individual commits:

  • Create test data
  • Create file .env_test
  • Create docker-compose.test.yml
  • Modify docker-compose.local.yml
  • Update README.md

@haphut (Collaborator) commented Apr 16, 2021

I had a bunch of problems getting this PR to run. I use an unusual umask of 0077. That upset this docker-compose.test.yml. As I have not had similar troubles with other Compose files, I think it might be because of the bind volume types.

To be able to run the Compose file, I had to run:

chmod -R go+r . && find ./* -type d -print0 | xargs -0 chmod -R go+rx

I did not try switching to type volume.
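For context, the umask effect and the workaround can be reproduced in isolation (a sketch with made-up paths and file names):

```shell
# With umask 0077, newly created files and directories get no group/other
# permissions, so a bind-mounted directory can be unreadable for the
# postgres user inside the container.
tmpdir=$(mktemp -d)
(
  umask 0077
  mkdir "$tmpdir/initdb"
  echo 'SELECT 1;' > "$tmpdir/initdb/001_test.sql"
)
stat -c '%a' "$tmpdir/initdb"              # -> 700 (no access for group/other)
# The fix used above: make everything readable, and directories traversable.
chmod -R go+r "$tmpdir"
find "$tmpdir" -type d -print0 | xargs -0 chmod go+rx
stat -c '%a' "$tmpdir/initdb"              # -> 755
stat -c '%a' "$tmpdir/initdb/001_test.sql" # -> 644
rm -rf "$tmpdir"
```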

@haphut (Collaborator) left a review


Stellar work! You were really quick to create this PR.

Review comments were left on these files:

  • .example_env (resolved)
  • README.md (resolved)
  • docker-compose.db.yml (resolved)
  • docker-compose.test.yml (several threads; outdated, resolved)
  • example_data/import.sql (resolved)
@haphut (Collaborator) commented Apr 16, 2021

Oh yeah, I could not find an issue on the board that would correspond to this PR. If there were one, adding "Fixes #123" to the description of this PR would connect the PR with that issue. Then, with the kanban board automation set up as it already is, merging this PR would automatically close the issue and move it to "Done" on the board.

@haphut (Collaborator) commented Apr 16, 2021

Many people find pair work more productive than code reviews. It takes time to hone a great review process in a team.

One source regarding code reviews that I enjoy is https://mtlynch.io/tags/code-review/.

@datarttu (Owner, Author) commented:

> "It would be nice to run timescaledb-tune straight away when the image is built --"
> [...] I would keep it as it is but it's your call.

Good point. We probably don't need to worry about timescaledb-tune at this point; at least it is not needed for quick tests with small amounts of example data. The main reason for using TimescaleDB anyway is to handle massive amounts of HFP data in a reasonable way (plus we might use some nice time-series-specific functions from the extension), but we are not quite there yet.

@datarttu (Owner, Author) commented:

> "Could we come up with a neat solution that would allow us to run the imports automatically if the database is created from scratch, and omit the imports if the database already exists on db_volume?"
> [...] I would go for alternative 1.

Thank you, alternative 1 sounds good! I just have to document the commands needed in the readme. Wrote #38 about this.

@datarttu (Owner, Author) commented:

> Individual commits [...] Squashing [...] Merge commits vs linear history [...] Branching strategies [...]

Thank you also for the helpful comments about commits and branching; I will get back to them as I learn more about PR workflows and try them in practice!

Co-authored-by: haphut <haphut@mistmap.com>
datarttu and others added 2 commits April 19, 2021 15:52
Co-authored-by: haphut <haphut@mistmap.com>
Co-authored-by: haphut <haphut@mistmap.com>
Co-authored-by: haphut <haphut@mistmap.com>
datarttu and others added 3 commits April 19, 2021 16:00
Co-authored-by: haphut <haphut@mistmap.com>
Co-authored-by: haphut <haphut@mistmap.com>
Co-authored-by: haphut <haphut@mistmap.com>
@datarttu datarttu merged commit 70255e3 into dev Apr 19, 2021
@datarttu datarttu deleted the test-compose branch April 19, 2021 13:04