Separate docker-compose files for test run and persistent db #37

Merged: 20 commits merged into dev from test-compose on Apr 19, 2021

Conversation

@datarttu (Owner) commented Apr 8, 2021

This PR creates docker-compose files for two different cases:

  1. docker-compose.test.yml: a one-off test run that launches a database instance, runs the DDL commands and imports the example data from scratch, so the developer can see whether all of this completes without errors. No database data is left hanging around on volumes.
  2. docker-compose.db.yml: launches a database instance for more continuous use. This version uses a separate db_volume Docker volume, so the database data is kept even if the services are removed and restarted later. The database is also made accessible via localhost. This can be used for exploring with psql or QGIS, for example.

Later on, a separate docker-compose file should probably be created for production use too, but for now we will do well with these two.
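Schematically, the two files differ mainly in data persistence and port exposure. A minimal sketch (not the literal file contents; the service name is illustrative, the volume and variable names follow the PR description):

```yaml
# docker-compose.test.yml: no named volume, so all database data is
# discarded when the services are removed.
# docker-compose.db.yml: named volume plus a host port, roughly:
services:
  db:
    ports:
      - "${POSTGRES_LOCAL_PORT}:5432"   # reachable via localhost, e.g. for psql or QGIS
    volumes:
      - db_volume:/var/lib/postgresql/data
volumes:
  db_volume:
```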

There are still some problems and todos with this PR; I will comment on them in more detail.

  • For some reason 'timescaledb' will not be written to shared_preload_libraries of postgresql.conf if 001_timescaledb_tune.sh is copied to /docker-entrypoint-initdb.d/ by the Dockerfile, but it works if the sh script is brought there by mapping it from db_test/.
  • depends_on does not work like we would like: v3 does not support a condition: ... mapping under depends_on.
  • This instance will be accessible through POSTGRES_LOCAL_PORT and can be used for exploration, e.g. with QGIS.
  • Document test run and persistent db separately.
@datarttu (Owner, Author) commented Apr 8, 2021

  • I was not able to include condition: service_healthy under depends_on of the dataimporter service. Apparently this is no longer supported in docker-compose file v3. Using v2 caused a whole lot of other problems, so I wanted to stick to v3. To prevent dataimporter from running the imports before the db service has run the DDL commands, I simply postponed the command: command: sh -c "sleep 5 && psql -f /data/import.sql". But this is not a neat way to do it...
  • It would be nice to run timescaledb-tune straight away when the image is built, but it is not possible because the /var/lib/postgresql/data directory does not even exist before the container is run for the first time and docker-entrypoint.sh is run. So now the TimescaleDB configuration is done and the database is restarted every time the container is started.
  • Originally, 001_timescaledb_tune.sh was copied to /docker-entrypoint-initdb.d already in the Dockerfile. It was then run there before other scripts brought there by the volume mapping in docker-compose files. But this eventually resulted in timescaledb not existing in postgresql.conf shared_preload_libraries, which in turn caused CREATE EXTENSION timescaledb to fail, and I could not figure out why. Now that 001_timescaledb_tune.sh is mapped from db/db_test and run like the DDL scripts there, the TimescaleDB extension seems to load without problems...
  • As for the "persistent" version of the database, the dataimporter service fails if the database has already been hydrated, i.e. the db service is started with a pre-existing db_volume. Of course, in that case there is no need to re-import the data that is already in the database. Could we come up with a neat solution that would allow us to run the imports automatically if the database is created from scratch, and omit the imports if the database already exists on db_volume?
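For context, an init script like 001_timescaledb_tune.sh is typically a thin wrapper around timescaledb-tune. This is a hypothetical sketch, not the actual file contents from the PR:

```shell
#!/bin/sh
# Hypothetical sketch of 001_timescaledb_tune.sh. Scripts under
# /docker-entrypoint-initdb.d/ are run by docker-entrypoint.sh on the
# first start of the container, after $PGDATA has been initialized.
timescaledb-tune --quiet --yes --conf-path="${PGDATA}/postgresql.conf"
# timescaledb-tune writes shared_preload_libraries = 'timescaledb' into
# postgresql.conf; the setting only takes effect after a server (re)start,
# which is why CREATE EXTENSION timescaledb fails if the preload is missing.
```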

@datarttu datarttu marked this pull request as ready for review April 8, 2021 16:14
@datarttu datarttu requested a review from haphut April 8, 2021 17:03
Works well with docker-compose version 1.29.0 now.
@datarttu (Owner, Author) commented:

Added a commit that implements condition: service_healthy; the problem was solved by upgrading docker-compose to version 1.29.0.
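The Compose v3 pattern that became usable after the upgrade looks roughly like this (a sketch; the healthcheck parameters are illustrative, not taken from the PR):

```yaml
services:
  db:
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5
  dataimporter:
    depends_on:
      db:
        condition: service_healthy
    command: psql -f /data/import.sql   # no "sleep 5" workaround needed
```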

@haphut (Collaborator) commented Apr 16, 2021

Great stuff!

@haphut (Collaborator) commented Apr 16, 2021

"It would be nice to run timescaledb-tune straight away when the image is built --"

This would work out only as long as the image is not pushed into a Docker image repository and is not used by others.

On the other hand, maybe the tuning is not necessary at all. We're not measuring performance on our development laptops. And the tests are probably not that much slower untuned.

On the third hand, we might gain some time when we run the tuning for the persistent development database.

We could tune one Compose setup and not the other. Yet we wish to catch bugs early, and we can aid that by keeping environments similar to each other. The current approach facilitates that.

I would keep it as it is but it's your call.

@haphut (Collaborator) commented Apr 16, 2021

"Could we come up with a neat solution that would allow us to run the imports automatically if the database is created from scratch, and omit the imports if the database already exists on db_volume?"

Some alternatives off the top of my head:

  1. Remove the dataimporter service from the local development Compose file. Run it manually as a one-off docker run --rm script, instead.
  2. Use IF NOT EXISTS etc. diligently and run the migrations every time.
  3. Write migrations conditionally by probing for sentinel values: "If this schema exists, we are probably done".

I would go for alternative 1.

Alternative 2 wastes our time by running the migrations every time. IF NOT EXISTS makes sense for idempotent migrations but we are not there yet regarding our migrations. Even with idempotent schema changes I would go for alternative 1.

Alternative 3 sounds messy and error-prone.
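Alternative 1 might look roughly like this on the command line (image name, network name and paths are illustrative, not taken from the PR):

```shell
# One-off import against the running db service; the container is removed
# afterwards and nothing extra is left in the Compose file.
docker run --rm \
  --network <project>_default \
  -v "$(pwd)/example_data:/data:ro" \
  postgres:13 \
  psql -h db -U postgres -f /data/import.sql
```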

@haphut (Collaborator) commented Apr 16, 2021

Individual commits

I prefer to see individual commits that each make a coherent contribution towards the goal of the PR branch. Once the review starts, we should aim to no longer have WIP commits in the PR. In this case it made sense, as we built the WIP commits together.

GitHub has a feature called draft PR that lets the developer toggle whether a PR is, from their point of view, ready for review and merging. Setting the draft boolean allows one to run CI checks without signalling that the work is finished.

Ideally, if every commit compiles and passes tests, one can rely on git bisect to hunt bugs. It sounds like a powerful tool but frankly I rarely work with a codebase that would support using it in practice.

Squashing

Some prefer to squash a PR into just one commit. I think that is rare.

What is more divisive is whether to

  1. keep reiterating the commits during the review phase, constantly converging towards the ideal state of individual, correct commits, or
  2. add new fixup commits on top and after passing review autosquash the fixups before merging.

I think that is also dependent on the review tools available. I prefer option 1.
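For reference, option 2 is usually done with git's built-in fixup support (standard git commands; the placeholders are illustrative):

```shell
# During review: record a fix that targets an earlier commit.
git commit --fixup=<commit-to-fix>
# After approval: fold the fixup commits into their targets before merging.
git rebase -i --autosquash <base-branch>
```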

Merge commits vs linear history

As GitHub makes it pretty hard to review individual commits instead of the full PR, it also defaults to merging with --no-ff, using explicit merge commits. This can make the git history hard to follow. On the other hand, it is then enough that the merge commits are correct, while the "internal" PR commits can be messy.

GitHub has an option to "Rebase and merge" which uses --ff-only. This gives a pretty, linear git history but demands more from individual commits so git bisect remains usable.

Branching strategies

There are several popular git workflows. Git flow is one. GitHub flow another. I often find GitLab flow useful. It's actually a set of alternative flows but they make sense to me.

@haphut (Collaborator) commented Apr 16, 2021

I like your approach. It's clean and documented.

A general suggestion: For this PR, I might create roughly these individual commits:

  • Create test data
  • Create file .env_test
  • Create docker-compose.test.yml
  • Modify docker-compose.local.yml
  • Update README.md

@haphut (Collaborator) commented Apr 16, 2021

I had a bunch of problems getting this PR to run. I use an unusual umask of 0077. That upset this docker-compose.test.yml. As I have not had similar troubles with other Compose files, I think it might be because of the bind volume types.

To be able to run the Compose file, I had to run:

chmod -R go+r . && find ./* -type d -print0 | xargs -0 chmod -R go+rx

I did not try switching to type volume.
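For context, the umask effect and the workaround can be reproduced in isolation (a sketch with made-up paths and file names):

```shell
# With umask 0077, newly created files and directories get no group/other
# permissions, so a bind-mounted directory can be unreadable for the
# postgres user inside the container.
tmpdir=$(mktemp -d)
(
  umask 0077
  mkdir "$tmpdir/initdb"
  echo 'SELECT 1;' > "$tmpdir/initdb/001_test.sql"
)
stat -c '%a' "$tmpdir/initdb"              # -> 700 (no access for group/other)
# The fix used above: make everything readable, and directories traversable.
chmod -R go+r "$tmpdir"
find "$tmpdir" -type d -print0 | xargs -0 chmod go+rx
stat -c '%a' "$tmpdir/initdb"              # -> 755
stat -c '%a' "$tmpdir/initdb/001_test.sql" # -> 644
rm -rf "$tmpdir"
```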

@haphut (Collaborator) left a review


Stellar work! You were really quick to create this PR.

Review comments were left on these files:

  • .example_env (resolved)
  • README.md (resolved)
  • docker-compose.db.yml (resolved)
  • docker-compose.test.yml (several threads; outdated, resolved)
  • example_data/import.sql (resolved)
@haphut (Collaborator) commented Apr 16, 2021

Oh yeah, I could not find an issue on the board that would correspond to this PR. If there were one, adding "Fixes #123" to the description of this PR would connect the PR with that issue. Then, with the kanban board automation set up as it already is, merging this PR would automatically close the issue and move it to "Done" on the board.

@haphut (Collaborator) commented Apr 16, 2021

Many people find pair work more productive than code reviews. It takes time to hone a great review process in a team.

One source regarding code reviews that I enjoy is https://mtlynch.io/tags/code-review/.

@datarttu (Owner, Author) commented:

> "It would be nice to run timescaledb-tune straight away when the image is built --"
> [...] I would keep it as it is but it's your call.

Good point. We probably don't need to worry about timescaledb-tune at this point; at least it is not needed for quick tests with small amounts of example data. The main reason for using TimescaleDB anyway is to handle massive amounts of HFP data in a reasonable way (plus we might use some nice time-series-specific functions from the extension), but we are not quite there yet.

@datarttu (Owner, Author) commented:

> "Could we come up with a neat solution that would allow us to run the imports automatically if the database is created from scratch, and omit the imports if the database already exists on db_volume?"
> [...] I would go for alternative 1.

Thank you, alternative 1 sounds good! I just have to document the commands needed in the readme. Wrote #38 about this.

@datarttu (Owner, Author) commented:

> Individual commits [...] Squashing [...] Merge commits vs linear history [...] Branching strategies [...]

Thank you also for the helpful comments about commits and branching; I will get back to them as I learn more about PR workflows and try them in practice!

Co-authored-by: haphut <haphut@mistmap.com>
datarttu and others added 2 commits April 19, 2021 15:52
Co-authored-by: haphut <haphut@mistmap.com>
Co-authored-by: haphut <haphut@mistmap.com>
Co-authored-by: haphut <haphut@mistmap.com>
datarttu and others added 3 commits April 19, 2021 16:00
Co-authored-by: haphut <haphut@mistmap.com>
Co-authored-by: haphut <haphut@mistmap.com>
Co-authored-by: haphut <haphut@mistmap.com>
@datarttu datarttu merged commit 70255e3 into dev Apr 19, 2021
@datarttu datarttu deleted the test-compose branch April 19, 2021 13:04