-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Dockerfile and workflow for building and pushing Docker image to GHCR #25
Add Dockerfile and workflow for building and pushing Docker image to GHCR #25
Conversation
3488fe1
to
60dfd2c
Compare
60dfd2c
to
c565cd1
Compare
.github/workflows/deploy.yaml
Outdated
# Tag the following types of images: | ||
# * On a branch, tag with the branch name (e.g. `master`) | ||
# * On a PR, tag with the PR number (e.g. `pr-12`) | ||
# * On all events, tag with the short git SHA (e.g. `e956384`) | ||
tags: | | ||
type=ref,event=branch | ||
type=ref,event=pr | ||
type=sha |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this seem like the correct tagging scheme to you?
One subtlety here is that images persist indefinitely in GHCR, so by tagging every image with its sha
we run the risk of using a very large amount of storage for images that correspond to individual commits. This strikes me as a good thing for the main branches (master
and *-assessment-year
) since we want to be able to recover old models that were used on those branches, but I bet we could reduce the storage use by not producing SHA-tagged images on pull requests.
That being said, my read of the pricing for GitHub Packages suggests that storage is free and unlimited for public packages, so maybe we just push everything and don't worry too much about it. I'm not sure, curious what you think!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Instead of tagging and creating an image for every commit, let's create an image for each git tag. That way, when we tag assessment-year-2024
, we'll make a long-lived image with the same name that has the dependencies necessary to run the 2024 model already built/bundled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call! I switched this to type=ref,event=tag
in 4699510.
fe65257
to
fac8fc5
Compare
be14c44
to
92c2b1e
Compare
Dockerfile
Outdated
ENV RENV_CONFIG_REPOS_OVERRIDE "https://cloud.r-project.org/" | ||
ENV RENV_PATHS_LIBRARY renv/library |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied these variables over from the GitLab version of the workflow, but I'm not 100% confident they're necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RENV_PATHS_LIBRARY
is likely necessary. If we want to setup package caching using Actions instead of (or in addition to) the Docker build cache, then we should set the other ones too. RENV_CONFIG_REPOS_OVERRIDE
was the product of some weird bugs on GitLab's runners that I now can't recall.
suggestion: We might want to set RENV_CONFIG_REPOS_OVERRIDE
to point to the Posit package mirror, since they have pre-built linux binaries. That would speed up our build times a lot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good thinking RE: PPM! I refactored to use it in 35b15ed and reduced the cache miss build time to 8min (see this workflow).
I think I'll leave off the cache variables until we decide to implement package manager caching.
Dockerfile
Outdated
RUN apt-get update && apt-get install --no-install-recommends -y \ | ||
libcurl4-openssl-dev libssl-dev libxml2-dev libgit2-dev git \ | ||
libudunits2-dev python3-dev python3-pip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These system dependencies were also copied over from GitLab.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Let's add the following as well, as they'll be needed down the road:
libgdal-dev libgeos-dev libproj-dev libfontconfig1-dev libharfbuzz-dev libfribidi-dev pandoc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, I added these in c41c3a7 👍🏻
Dockerfile
Outdated
# Install renv for R dependencies | ||
RUN Rscript -e "install.packages('renv')" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: This is probably redundant since renv will bootstrap itself in the presence of .Rprofile
files.
suggestion: I would change L29 to copy renv.lock
, .Rprofile
and renv/
to get the full bootstrap going.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dockerfile
Outdated
COPY renv.lock . | ||
|
||
# Install R dependencies | ||
RUN Rscript -e 'renv::restore()' | ||
|
||
# Copy the directory into the container | ||
ADD ./ model-res-avm/ | ||
WORKDIR model-res-avm/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: You might run into issues here at runtime because renv (and all its libraries) are installed at .
, while the working directory is model-res-avm
. Thus if you do something like docker exec -it $CONTAINER_NAME R
, it wouldn't load renv and the restored libraries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, I copied over the renv
directory to model-res-avm
in 3edd026.
# Copy pipenv files into the image. The reason this is a separate step from | ||
# the later step that adds files from the working directory is because we want | ||
# to avoid having to reinstall dependencies every time a file in the directory | ||
# changes, as Docker will bust the cache of every layer following a layer that | ||
# needs to change | ||
COPY Pipfile . | ||
COPY Pipfile.lock . | ||
|
||
# Install Python dependencies | ||
RUN pipenv install --system --deploy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (maybe blocking): We might be able to dump a huge proportion of the pip requirements, or even drop them entirely, depending on how things will run on AWS.
Basically, the only things using Python in this repo are sales val (which will shortly be removed) and DVC. If AWS is going to use the container image directly, then we should keep pip/DVC. If AWS is going to use a separate runner and then use the Docker container only to run the R pipeline, then we can remove the pip reqs here and move them to the runner.
Either way, we should only need to install the boto + DVC stuff from here on out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, I switched to just installing dvc[s3]
in Pipfile
in 62c144a and that seems to keep DVC and the S3 stuff 👍🏻
Dockerfile
Outdated
ENV RENV_CONFIG_REPOS_OVERRIDE "https://cloud.r-project.org/" | ||
ENV RENV_PATHS_LIBRARY renv/library |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RENV_PATHS_LIBRARY
is likely necessary. If we want to setup package caching using Actions instead of (or in addition to) the Docker build cache, then we should set the other ones too. RENV_CONFIG_REPOS_OVERRIDE
was the product of some weird bugs on GitLab's runners that I now can't recall.
suggestion: We might want to set RENV_CONFIG_REPOS_OVERRIDE
to point to the Posit package mirror, since they have pre-built linux binaries. That would speed up our build times a lot.
Dockerfile
Outdated
RUN apt-get update && apt-get install --no-install-recommends -y \ | ||
libcurl4-openssl-dev libssl-dev libxml2-dev libgit2-dev git \ | ||
libudunits2-dev python3-dev python3-pip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Let's add the following as well, as they'll be needed down the road:
libgdal-dev libgeos-dev libproj-dev libfontconfig1-dev libharfbuzz-dev libfribidi-dev pandoc
.github/workflows/deploy.yaml
Outdated
permissions: | ||
contents: read | ||
packages: write |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Are these strictly necessary? I haven't set them up in the other GHA Docker builds and they seem to work fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, I got these permissions from the docs on publishing a package using an action! Perhaps they're outdated? In any case, if app-shiny-rpie
pushes successfully without these permissions then I think we don't need them. Removed in 4e3ab96.
.github/workflows/deploy.yaml
Outdated
# Tag the following types of images: | ||
# * On a branch, tag with the branch name (e.g. `master`) | ||
# * On a PR, tag with the PR number (e.g. `pr-12`) | ||
# * On all events, tag with the short git SHA (e.g. `e956384`) | ||
tags: | | ||
type=ref,event=branch | ||
type=ref,event=pr | ||
type=sha |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Instead of tagging and creating an image for every commit, let's create an image for each git tag. That way, when we tag assessment-year-2024
, we'll make a long-lived image with the same name that has the dependencies necessary to run the 2024 model already built/bundled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good @jeancochrane. Eager to see this in action!
Two quick asks:
- It might be worthwhile to clear out the tagged images here once this is merged: https://github.com/ccao-data/model-res-avm/pkgs/container/model-res-avm
- In the Packages sidebar, it doesn't correctly display the container type if attestation is enabled. You can turn it off by setting
provenance: false
like this: https://github.com/ccao-data/app-shiny-rpie/blob/master/.github/workflows/docker-build.yaml#L42C11-L42C11
Good call @dfsnow, done in edcfeeb!
Hm, I don't seem to have permissions to delete images from our Packages repository -- are you able to either do that or grant me permissions to do so? |
This PR adds a Dockerfile and a new GitHub Actions workflow to perform the following operations:
Closes #22.