Docker container with Databaker, Jupyter, Pandas and other handy data wrangling tools.
This container is used by Jenkins and Data Engineers to build/run data pipelines within GSS' IDP Dissemination Branch.
The internally hosted GSS utilities are installed alongside common Python data wrangling tools.
As a convenience for local development, there is also a customised version of the master container, published under the :dev image tag. Its additions are:
- All Python packages listed under [dev-packages] in this repo's pyproject.toml are installed.
- The GSS-Cogs tool reposync is also installed.
- System package gpg2 is installed.
To use reposync, see the reposync README.md.
To add a Python package, add it to pyproject.toml, then run make. Do not run poetry lock locally to update the lock file: a lock generated on macOS carries macOS assumptions that we don't want in the Linux container environment.
Building the images is handled by the GitHub Actions workflows defined in /.github/workflows. To check build status, open the Actions tab of this repository.
- gsscogs/databaker:latest and gsscogs/databaker:<release_tag> are built when (a) you create a release and (b) the commit of the release matches the commit of the master branch.
- gsscogs/databaker:dev is built whenever you push or merge a change to master.
Note: "release" is meant literally here. Adding a tag on its own will not trigger a release build.
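
The build rules above can be summarised as a small decision function. This is only an illustrative sketch in Python, not the actual workflow code in /.github/workflows: the function name images_to_build, the event labels ("push", "release", "tag") and the commit arguments are all hypothetical, and it assumes a release triggers nothing beyond what the rules above state.

```python
from typing import List

REPO = "gsscogs/databaker"


def images_to_build(
    event: str,
    event_commit: str,
    master_commit: str,
    release_tag: str = "",
) -> List[str]:
    """Return the image tags the rules above would build for one event.

    `event` is a hypothetical label, not a real GitHub Actions event name:
      "push"    - a push or merge to master
      "release" - a release was created
      "tag"     - a bare tag was added, with no release
    """
    if event == "push":
        # Any push or merge to master rebuilds the dev image.
        return [f"{REPO}:dev"]
    if event == "release" and event_commit == master_commit:
        # A release builds latest and <release_tag>, but only when the
        # release commit matches the commit of the master branch.
        return [f"{REPO}:latest", f"{REPO}:{release_tag}"]
    # A bare tag, or a release from a commit other than master's head,
    # builds nothing.
    return []


if __name__ == "__main__":
    print(images_to_build("release", "abc123", "abc123", "v1.2.0"))
    print(images_to_build("tag", "abc123", "abc123", "v1.2.0"))
```

With these made-up inputs, the first call lists the latest and v1.2.0 tags, and the second returns an empty list because a bare tag does not count as a release.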