Triage database provisioner #840

thcrock · 2021-03-30T22:02:40Z

Not quite ready for merge (I would add some documentation around this and maybe clean up the code a bit), but I wanted to get thoughts on this interface for a local Dockerized database provisioner. The idea is Triage users could use this to easily set up a database outside of a Dirtyduck context but without having to do Postgres on their own. With just docker as a dependency, they can:

Bring up the thing from scratch; build the image, run Postgres through the image, and write a database.yaml file using the credentials created by the script.
Start the container if if stopped
Rebuild the container if it was removed

All using the command 'triage db up'.

Note: I don't know what's going on with the Crosstabs CLI but I couldn't get this to run while using those imports. Did that crosstab code get removed? It's obviously unrelated to this PR but if I can't run the CLI with those lines in there I had to take them out.

nanounanue · 2021-04-01T14:20:04Z

Thanks for this @thcrock

The idea here is to create just a dockerized DB or create the dockerized dirtyduck DB?
I am not a big fan of database.yaml thing (is another file, prone to errors, security risk, etc) but if we want to keep that interface we should support the environment variables (DATABASE_URL or the PG_... ones)

nanounanue · 2021-04-01T14:21:23Z

@thcrock weird the crosstabs thingy, last week I ran it from the repo without problems. I will check it to track the bug

thcrock · 2021-04-05T03:11:32Z

It does appear that I'm the only one who's had the crosstabs problem so I will try building my environment again.

As for environment variables: we do support environment variables for database credentials in Triage, but this database provisioner is just creating the credentials, not using them. Any way of attempting to automatically set the environment variables for the user would be shell-specific and OS-specific, and not work until they restart the shell. Do you have something in mind? Just print the credentials to screen and tell the user to set those environment variables?

thcrock · 2021-04-05T03:15:57Z

@nanounanue about the first question: This came up at the meeting. Ultimately it seems the team wants to use it for Dirty duck (as in, remove the docker-compose and bastion dependencies from Dirty Duck). That would require much more work than is present here. However, we did decide that we can merge this PR, as currently scoped, to just be an option for regular Triage users for now. So you might do this once you finish the tutorial and want to start running Triage on your own data.

shaycrk · 2021-04-06T18:26:52Z

Yeah, just installed triage from master in a new environment and didn't run into issues with crosstabs in the CLI either.

In terms of testing out here, I did a pip install git+https://github.com/dssg/triage.git@triage_db in a new environment and then tried running triage db up but hit this error:

(temp-3.7.2) code $ triage db up
[ProcessExecutionError] Command line: ['/usr/local/bin/docker', 'build', './dirtyduck/food_db/', '-t', 'triage_db']
Exit code: 1
Stderr:  | unable to prepare context: path "./dirtyduck/food_db/" not found

I think we might need to change it to look for food_db relative to the code path?

thcrock · 2021-04-10T21:06:55Z

@shaycrk So far I've been unable to get the contents of the dirtyduck folder to show up in the installed package, which is necessary for that docker image to be buildable.

But maybe I don't need to. For this purpose, are PostGIS and the dirtyduck data necessary? I could more easily just have the triage db command use a vanilla Postgres image, at least for now.

nanounanue · 2021-04-10T21:39:41Z

Manifests are a big problem, I solved this issue in the pastñet me check my notes ...

…

On Sat, Apr 10, 2021, at 16:07, Tristan Crockett wrote: @shaycrk <https://github.com/shaycrk> So far I've been unable to get the contents of the dirtyduck folder to show up in the installed package, which is necessary for that docker image to be buildable. But maybe I don't need to. For this purpose, are PostGIS and the dirtyduck data necessary? I could more easily just have the triage db command use a vanilla Postgres image, at least for now. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#840 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AADYXQFGQVSASH2JLV3G6P3TIC4XXANCNFSM42DCBEOA>.

thcrock · 2021-04-28T03:02:35Z

@nanounanue I spent some more time on the Manifest. Apparently the 'dirtyduck' directory is successfully getting into the 'sdist' (what the Manifest.in is supposed to do) but is not getting copied to the site-packages/triage directory. This is despite 'include_package_data' being set to True in setup.py

thcrock · 2021-05-05T03:32:55Z

@nanounanue Here is the deal with the manifest. The 'dirtyduck' directory wasn't getting into the build directory despite the MANIFEST.in and include_package_data=True combo because it's not within a package directory. If I move it to src/triage, it works.

I tried this based on this reply on a related issue: pypa/sampleproject#30 (comment)

Moving it into that directory may change some other things. Before deciding to do this, I think we should confirm that we want the database provisioner to actually include the dirtyduck data scripts. If the user types triage db up, should they get a vanilla working postgres database or should they get the food database sample data (including the one or two minutes it might take for the system to download the inspections data and load it)? @shaycrk

shaycrk · 2021-05-05T21:55:20Z

Thanks @thcrock! I've definitely been thinking about this mostly as motivated by making the dirty duck tutorial a little more seamless (as well as making it possible to run after a pip install triage without having to clone the repo), though I can certainly see some utility in providing an empty database, too. My sense is it might be nice to support both unless that seems unreasonable (e.g., maybe triage tutorial up vs triage db up?). Anyway, we also have our planning meeting tomorrow, so might be good to talk it through then if you can both make it?

thcrock · 2021-05-13T03:41:59Z

Hey @shaycrk this is ready for review again, with just the vanilla db.

One thing you'll note is that the docs/sources/index.md changed a bunch. This is because I ran 'manage docs', which copies README.md to docs/sources/index.md. This is something I put in a while ago to solve the problem of either 1. having content split between the documentation site and the README or 2. having the README be a stub and punt the user to the documentation site.

We can iterate on this process depending on what the team wants, but for now it looks README and docs/sources/index.md were out of sync. I'm not sure which version of this page correct. README was modified much more recently than index.md, so I'm guessing README is the right version to use for now, and if that's true this PR syncs them up again. But check it out in case anything is wrong.

shaycrk · 2021-05-22T01:38:56Z

docs/sources/db.md

+The command does require some version of Docker. We recommend getting it from the [official Docker downloads page](https://docs.docker.com/get-docker/).
+
+### Can't log in
+Because of the way Docker volumes work, if you manually remove the Docker container created by `triage db up`, the volume will still be around. This is usually fine, but the superuser credential information will persist as well, which means the next time you spawn the database, *the Postgres server will not take the new credential information into account*. Under normal usage (simply calling `triage db up` and never removing the container), you will never run into this situation. But if you do, and you would like to use a new username/password, you will have to remove the volume before recreating. This can be done with `docker volume rm db-data`. This will also remove all of the stored data in Postgres, so beware!


I wonder if we might want to name the volume something like triage-db-data as db-data feels a little too generic/possible to clash with something else a user might have?

shaycrk · 2021-05-22T01:46:42Z

src/triage/cli.py

+                return
+            if args.password:
+                password = getpass(prompt='Enter a password for your new database user: ')
+                logger.info(self.local['docker']['run', '-d', '-p', '5432:5432', '-e', 'POSTGRES_HOST=0.0.0.0', '-e', 'POSTGRES_USER=triage_user', '-e', 'POSTGRES_PORT=5432', '-e', f'POSTGRES_PASSWORD={password}', '-e', 'POSTGRES_DB=triage', '-v', 'db-data:/var/lib/postgresql/data', '--name', 'triage_db', 'postgres:12']())


maybe for the future, but would it be useful to allow the user pass an argument to map the postgres data directory to (e.g., possibly allowing them to use a separate, larger drive) as well as a custom port (e.g., if they do have a local postgres running but don't want to use that for triage)

shaycrk · 2021-05-22T01:49:10Z

src/triage/cli.py

+        (retcode, stdout, stderr) = self.local['docker']['container', 'inspect', '-f', "'{{.State.Status}}'", 'triage_db'].run(retcode=None)
+        if retcode != 0:
+            logger.info('Container does not exist')
+            if os.path.exists('database.yaml'):


This is checking in the working directory where the triage command is being run from, right? That's probably fine, but should make sure the documentation calls out that the CLI cares what directory it's being run from

shaycrk · 2021-05-22T01:50:44Z

src/triage/cli.py

+                        'db': 'triage'
+                    }
+                    out_fd.write(yaml.dump(config))
+                logger.info('New database created with credentials saved to database.yaml. You can watch it boot up with "docker logs triage_db --follow", and wait until it says "database system is ready to accept connections. At that point, you can either psql into it using the credentials in database.yaml, or use other triage commands which will look for the credentials in database.yaml')


Missing " after "database system is ready to accept connections

shaycrk · 2021-05-22T01:51:56Z

docs/sources/db.md

+`triage db up`
+
+This command attempts to use docker to spawn a new Postgres 12 database. If successful, it will prompt you for a password to use for a user, and populate the connection information in database.yaml. The next time you run `triage db up`, it will look for the existing container and reuse it.
+


Might be useful to provide some info/link out to installing psql and/or a GUI like dbeaver for people who aren't too familiar with using postgres generally?

shaycrk · 2021-05-22T01:52:46Z

Thanks @thcrock! Added a few inline comments, mostly around things like docs and naming, but looks good to me overall

thcrock · 2021-05-23T20:13:41Z

Thanks @shaycrk I just uploaded some changes based on your review

shaycrk · 2021-05-28T18:26:35Z

Thanks @thcrock -- looks good to me! Feel free to merge if you're happy with it

Triage database provisioner

c735181

thcrock requested review from nanounanue and shaycrk March 30, 2021 22:02

thcrock added 5 commits April 10, 2021 15:39

Include in manifest and use relative to code path

a1ca206

Fix use of pathlib

24a33e1

Include all of dirtyduck

5174529

Try going back

34715d9

Try graft

cd83048

thcrock added 2 commits May 12, 2021 22:27

Revert to vanilla Postgres, add documentation

7898743

Add note in README instead

87783fe

thcrock marked this pull request as ready for review May 13, 2021 03:37

shaycrk reviewed May 22, 2021

View reviewed changes

shaycrk approved these changes May 22, 2021

View reviewed changes

Changes from review

9f74d10

Merge branch 'master' into triage_db

bf9d31d

shaycrk merged commit a994f3e into master Aug 26, 2021

shaycrk deleted the triage_db branch August 26, 2021 21:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Triage database provisioner #840

Triage database provisioner #840

thcrock commented Mar 30, 2021

nanounanue commented Apr 1, 2021

nanounanue commented Apr 1, 2021

thcrock commented Apr 5, 2021

thcrock commented Apr 5, 2021

shaycrk commented Apr 6, 2021

thcrock commented Apr 10, 2021

nanounanue commented Apr 10, 2021 via email

thcrock commented Apr 28, 2021

thcrock commented May 5, 2021

shaycrk commented May 5, 2021

thcrock commented May 13, 2021

shaycrk May 22, 2021

thcrock May 23, 2021

shaycrk May 22, 2021

shaycrk May 22, 2021

shaycrk May 22, 2021

shaycrk May 22, 2021

shaycrk commented May 22, 2021

thcrock commented May 23, 2021

shaycrk commented May 28, 2021

		`triage db up`

		This command attempts to use docker to spawn a new Postgres 12 database. If successful, it will prompt you for a password to use for a user, and populate the connection information in database.yaml. The next time you run `triage db up`, it will look for the existing container and reuse it.

Triage database provisioner #840

Triage database provisioner #840

Conversation

thcrock commented Mar 30, 2021

nanounanue commented Apr 1, 2021

nanounanue commented Apr 1, 2021

thcrock commented Apr 5, 2021

thcrock commented Apr 5, 2021

shaycrk commented Apr 6, 2021

thcrock commented Apr 10, 2021

nanounanue commented Apr 10, 2021 via email

thcrock commented Apr 28, 2021

thcrock commented May 5, 2021

shaycrk commented May 5, 2021

thcrock commented May 13, 2021

shaycrk May 22, 2021

Choose a reason for hiding this comment

thcrock May 23, 2021

Choose a reason for hiding this comment

shaycrk May 22, 2021

Choose a reason for hiding this comment

shaycrk May 22, 2021

Choose a reason for hiding this comment

shaycrk May 22, 2021

Choose a reason for hiding this comment

shaycrk May 22, 2021

Choose a reason for hiding this comment

shaycrk commented May 22, 2021

thcrock commented May 23, 2021

shaycrk commented May 28, 2021