-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Triage database provisioner #840
Conversation
Thanks for this @thcrock
|
@thcrock weird the crosstabs thingy, last week I ran it from the repo without problems. I will check it to track the bug |
It does appear that I'm the only one who's had the crosstabs problem so I will try building my environment again. As for environment variables: we do support environment variables for database credentials in Triage, but this database provisioner is just creating the credentials, not using them. Any way of attempting to automatically set the environment variables for the user would be shell-specific and OS-specific, and not work until they restart the shell. Do you have something in mind? Just print the credentials to screen and tell the user to set those environment variables? |
@nanounanue about the first question: This came up at the meeting. Ultimately it seems the team wants to use it for Dirty duck (as in, remove the docker-compose and bastion dependencies from Dirty Duck). That would require much more work than is present here. However, we did decide that we can merge this PR, as currently scoped, to just be an option for regular Triage users for now. So you might do this once you finish the tutorial and want to start running Triage on your own data. |
Yeah, just installed triage from master in a new environment and didn't run into issues with crosstabs in the CLI either. In terms of testing out here, I did a
I think we might need to change it to look for |
@shaycrk So far I've been unable to get the contents of the dirtyduck folder to show up in the installed package, which is necessary for that docker image to be buildable. But maybe I don't need to. For this purpose, are PostGIS and the dirtyduck data necessary? I could more easily just have the triage db command use a vanilla Postgres image, at least for now. |
Manifests are a big problem, I solved this issue in the pastñet me check my notes ...
…On Sat, Apr 10, 2021, at 16:07, Tristan Crockett wrote:
@shaycrk <https://github.com/shaycrk> So far I've been unable to get the contents of the dirtyduck folder to show up in the installed package, which is necessary for that docker image to be buildable.
But maybe I don't need to. For this purpose, are PostGIS and the dirtyduck data necessary? I could more easily just have the triage db command use a vanilla Postgres image, at least for now.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub <#840 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AADYXQFGQVSASH2JLV3G6P3TIC4XXANCNFSM42DCBEOA>.
|
@nanounanue I spent some more time on the Manifest. Apparently the 'dirtyduck' directory is successfully getting into the 'sdist' (what the Manifest.in is supposed to do) but is not getting copied to the site-packages/triage directory. This is despite 'include_package_data' being set to True in setup.py |
@nanounanue Here is the deal with the manifest. The 'dirtyduck' directory wasn't getting into the build directory despite the I tried this based on this reply on a related issue: pypa/sampleproject#30 (comment) Moving it into that directory may change some other things. Before deciding to do this, I think we should confirm that we want the database provisioner to actually include the dirtyduck data scripts. If the user types |
Thanks @thcrock! I've definitely been thinking about this mostly as motivated by making the dirty duck tutorial a little more seamless (as well as making it possible to run after a |
Hey @shaycrk this is ready for review again, with just the vanilla db. One thing you'll note is that the docs/sources/index.md changed a bunch. This is because I ran 'manage docs', which copies README.md to docs/sources/index.md. This is something I put in a while ago to solve the problem of either 1. having content split between the documentation site and the README or 2. having the README be a stub and punt the user to the documentation site. We can iterate on this process depending on what the team wants, but for now it looks README and docs/sources/index.md were out of sync. I'm not sure which version of this page correct. README was modified much more recently than index.md, so I'm guessing README is the right version to use for now, and if that's true this PR syncs them up again. But check it out in case anything is wrong. |
docs/sources/db.md
Outdated
The command does require some version of Docker. We recommend getting it from the [official Docker downloads page](https://docs.docker.com/get-docker/). | ||
|
||
### Can't log in | ||
Because of the way Docker volumes work, if you manually remove the Docker container created by `triage db up`, the volume will still be around. This is usually fine, but the superuser credential information will persist as well, which means the next time you spawn the database, *the Postgres server will not take the new credential information into account*. Under normal usage (simply calling `triage db up` and never removing the container), you will never run into this situation. But if you do, and you would like to use a new username/password, you will have to remove the volume before recreating. This can be done with `docker volume rm db-data`. This will also remove all of the stored data in Postgres, so beware! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we might want to name the volume something like triage-db-data
as db-data
feels a little too generic/possible to clash with something else a user might have?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/triage/cli.py
Outdated
return | ||
if args.password: | ||
password = getpass(prompt='Enter a password for your new database user: ') | ||
logger.info(self.local['docker']['run', '-d', '-p', '5432:5432', '-e', 'POSTGRES_HOST=0.0.0.0', '-e', 'POSTGRES_USER=triage_user', '-e', 'POSTGRES_PORT=5432', '-e', f'POSTGRES_PASSWORD={password}', '-e', 'POSTGRES_DB=triage', '-v', 'db-data:/var/lib/postgresql/data', '--name', 'triage_db', 'postgres:12']()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe for the future, but would it be useful to allow the user pass an argument to map the postgres data directory to (e.g., possibly allowing them to use a separate, larger drive) as well as a custom port (e.g., if they do have a local postgres running but don't want to use that for triage)
(retcode, stdout, stderr) = self.local['docker']['container', 'inspect', '-f', "'{{.State.Status}}'", 'triage_db'].run(retcode=None) | ||
if retcode != 0: | ||
logger.info('Container does not exist') | ||
if os.path.exists('database.yaml'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is checking in the working directory where the triage
command is being run from, right? That's probably fine, but should make sure the documentation calls out that the CLI cares what directory it's being run from
src/triage/cli.py
Outdated
'db': 'triage' | ||
} | ||
out_fd.write(yaml.dump(config)) | ||
logger.info('New database created with credentials saved to database.yaml. You can watch it boot up with "docker logs triage_db --follow", and wait until it says "database system is ready to accept connections. At that point, you can either psql into it using the credentials in database.yaml, or use other triage commands which will look for the credentials in database.yaml') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing "
after "database system is ready to accept connections
`triage db up` | ||
|
||
This command attempts to use docker to spawn a new Postgres 12 database. If successful, it will prompt you for a password to use for a user, and populate the connection information in database.yaml. The next time you run `triage db up`, it will look for the existing container and reuse it. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be useful to provide some info/link out to installing psql
and/or a GUI like dbeaver
for people who aren't too familiar with using postgres generally?
Thanks @thcrock! Added a few inline comments, mostly around things like docs and naming, but looks good to me overall |
Thanks @shaycrk I just uploaded some changes based on your review |
Thanks @thcrock -- looks good to me! Feel free to merge if you're happy with it |
Not quite ready for merge (I would add some documentation around this and maybe clean up the code a bit), but I wanted to get thoughts on this interface for a local Dockerized database provisioner. The idea is Triage users could use this to easily set up a database outside of a Dirtyduck context but without having to do Postgres on their own. With just docker as a dependency, they can:
All using the command 'triage db up'.
Note: I don't know what's going on with the Crosstabs CLI but I couldn't get this to run while using those imports. Did that crosstab code get removed? It's obviously unrelated to this PR but if I can't run the CLI with those lines in there I had to take them out.