Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Dockerfile and workflow for building and pushing Docker image to GHCR #25

Conversation

jeancochrane
Copy link
Contributor

@jeancochrane jeancochrane commented Oct 24, 2023

This PR adds a Dockerfile and a new GitHub Actions workflow to perform the following operations:

  • Run on every pull request, commit to the main branch, or commit to a long-lived yearly assessment branch
  • Build a Docker image for the model
  • Publish it to the GitHub Container Registry

Closes #22.

@jeancochrane jeancochrane linked an issue Oct 24, 2023 that may be closed by this pull request
@jeancochrane jeancochrane force-pushed the jeancochrane/22-infra-updates-build-and-push-a-docker-image-for-the-model branch 5 times, most recently from 3488fe1 to 60dfd2c Compare October 24, 2023 18:35
@jeancochrane jeancochrane force-pushed the jeancochrane/22-infra-updates-build-and-push-a-docker-image-for-the-model branch from 60dfd2c to c565cd1 Compare October 24, 2023 18:42
.github/workflows/deploy.yaml Show resolved Hide resolved
.github/workflows/deploy.yaml Show resolved Hide resolved
Comment on lines 38 to 45
# Tag the following types of images:
# * On a branch, tag with the branch name (e.g. `master`)
# * On a PR, tag with the PR number (e.g. `pr-12`)
# * On all events, tag with the short git SHA (e.g. `e956384`)
tags: |
type=ref,event=branch
type=ref,event=pr
type=sha
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this seem like the correct tagging scheme to you?

One subtlety here is that images persist indefinitely in GHCR, so by tagging every image with its sha we run the risk of using a very large amount of storage for images that correspond to individual commits. This strikes me as a good thing for the main branches (master and *-assessment-year) since we want to be able to recover old models that were used on those branches, but I bet we could reduce the storage use by not producing SHA-tagged images on pull requests.

That being said, my read of the pricing for GitHub Packages suggests that storage is free and unlimited for public packages, so maybe we just push everything and don't worry too much about it. I'm not sure, curious what you think!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Instead of tagging and creating an image for every commit, let's create an image for each git tag. That way, when we tag assessment-year-2024, we'll make a long-lived image with the same name that has the dependencies necessary to run the 2024 model already built/bundled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call! I switched this to type=ref,event=tag in 4699510.

@jeancochrane jeancochrane force-pushed the jeancochrane/22-infra-updates-build-and-push-a-docker-image-for-the-model branch from fe65257 to fac8fc5 Compare October 24, 2023 21:12
@jeancochrane jeancochrane force-pushed the jeancochrane/22-infra-updates-build-and-push-a-docker-image-for-the-model branch from be14c44 to 92c2b1e Compare October 24, 2023 21:25
Dockerfile Outdated
Comment on lines 3 to 4
ENV RENV_CONFIG_REPOS_OVERRIDE "https://cloud.r-project.org/"
ENV RENV_PATHS_LIBRARY renv/library
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied these variables over from the GitLab version of the workflow, but I'm not 100% confident they're necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RENV_PATHS_LIBRARY is likely necessary. If we want to setup package caching using Actions instead of (or in addition to) the Docker build cache, then we should set the other ones too. RENV_CONFIG_REPOS_OVERRIDE was the product of some weird bugs on GitLab's runners that I now can't recall.

suggestion: We might want to set RENV_CONFIG_REPOS_OVERRIDE to point to the Posit package mirror, since they have pre-built linux binaries. That would speed up our build times a lot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good thinking RE: PPM! I refactored to use it in 35b15ed and reduced the cache miss build time to 8min (see this workflow).

I think I'll leave off the cache variables until we decide to implement package manager caching.

Dockerfile Outdated
Comment on lines 7 to 9
RUN apt-get update && apt-get install --no-install-recommends -y \
libcurl4-openssl-dev libssl-dev libxml2-dev libgit2-dev git \
libudunits2-dev python3-dev python3-pip
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These system dependencies were also copied over from GitLab.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Let's add the following as well, as they'll be needed down the road:

libgdal-dev libgeos-dev libproj-dev libfontconfig1-dev libharfbuzz-dev libfribidi-dev pandoc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, I added these in c41c3a7 👍🏻

@jeancochrane jeancochrane marked this pull request as ready for review October 25, 2023 14:41
Dockerfile Outdated
Comment on lines 14 to 15
# Install renv for R dependencies
RUN Rscript -e "install.packages('renv')"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: This is probably redundant since renv will bootstrap itself in the presence of .Rprofile files.

suggestion: I would change L29 to copy renv.lock, .Rprofile and renv/ to get the full bootstrap going.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that makes sense! Done in 3edd026 and d1e9ff7.

Dockerfile Outdated
Comment on lines 29 to 36
COPY renv.lock .

# Install R dependencies
RUN Rscript -e 'renv::restore()'

# Copy the directory into the container
ADD ./ model-res-avm/
WORKDIR model-res-avm/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: You might run into issues here at runtime because renv (and all its libraries) are installed at ., while the working directory is model-res-avm. Thus if you do something like docker exec -it $CONTAINER_NAME R, it wouldn't load renv and the restored libraries.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, I copied over the renv directory to model-res-avm in 3edd026.

Comment on lines +17 to +26
# Copy pipenv files into the image. The reason this is a separate step from
# the later step that adds files from the working directory is because we want
# to avoid having to reinstall dependencies every time a file in the directory
# changes, as Docker will bust the cache of every layer following a layer that
# needs to change
COPY Pipfile .
COPY Pipfile.lock .

# Install Python dependencies
RUN pipenv install --system --deploy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (maybe blocking): We might be able to dump a huge proportion of the pip requirements, or even drop them entirely, depending on how things will run on AWS.

Basically, the only things using Python in this repo are sales val (which will shortly be removed) and DVC. If AWS is going to use the container image directly, then we should keep pip/DVC. If AWS is going to use a separate runner and then use the Docker container only to run the R pipeline, then we can remove the pip reqs here and move them to the runner.

Either way, we should only need to install the boto + DVC stuff from here on out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, I switched to just installing dvc[s3] in Pipfile in 62c144a and that seems to keep DVC and the S3 stuff 👍🏻

Dockerfile Outdated
Comment on lines 3 to 4
ENV RENV_CONFIG_REPOS_OVERRIDE "https://cloud.r-project.org/"
ENV RENV_PATHS_LIBRARY renv/library
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RENV_PATHS_LIBRARY is likely necessary. If we want to setup package caching using Actions instead of (or in addition to) the Docker build cache, then we should set the other ones too. RENV_CONFIG_REPOS_OVERRIDE was the product of some weird bugs on GitLab's runners that I now can't recall.

suggestion: We might want to set RENV_CONFIG_REPOS_OVERRIDE to point to the Posit package mirror, since they have pre-built linux binaries. That would speed up our build times a lot.

Dockerfile Outdated
Comment on lines 7 to 9
RUN apt-get update && apt-get install --no-install-recommends -y \
libcurl4-openssl-dev libssl-dev libxml2-dev libgit2-dev git \
libudunits2-dev python3-dev python3-pip
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Let's add the following as well, as they'll be needed down the road:

libgdal-dev libgeos-dev libproj-dev libfontconfig1-dev libharfbuzz-dev libfribidi-dev pandoc

Comment on lines 16 to 18
permissions:
contents: read
packages: write
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Are these strictly necessary? I haven't set them up in the other GHA Docker builds and they seem to work fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, I got these permissions from the docs on publishing a package using an action! Perhaps they're outdated? In any case, if app-shiny-rpie pushes successfully without these permissions then I think we don't need them. Removed in 4e3ab96.

Comment on lines 38 to 45
# Tag the following types of images:
# * On a branch, tag with the branch name (e.g. `master`)
# * On a PR, tag with the PR number (e.g. `pr-12`)
# * On all events, tag with the short git SHA (e.g. `e956384`)
tags: |
type=ref,event=branch
type=ref,event=pr
type=sha
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Instead of tagging and creating an image for every commit, let's create an image for each git tag. That way, when we tag assessment-year-2024, we'll make a long-lived image with the same name that has the dependencies necessary to run the 2024 model already built/bundled.

.github/workflows/deploy.yaml Show resolved Hide resolved
Copy link
Member

@dfsnow dfsnow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good @jeancochrane. Eager to see this in action!

Two quick asks:

@jeancochrane jeancochrane merged commit 5d117e9 into master Oct 25, 2023
2 checks passed
@jeancochrane jeancochrane deleted the jeancochrane/22-infra-updates-build-and-push-a-docker-image-for-the-model branch October 25, 2023 20:13
@jeancochrane
Copy link
Contributor Author

In the Packages sidebar, it doesn't correctly display the container type if attestation is enabled. You can turn it off by setting provenance: false like this: https://github.com/ccao-data/app-shiny-rpie/blob/master/.github/workflows/docker-build.yaml#L42C11-L42C11

Good call @dfsnow, done in edcfeeb!

It might be worthwhile to clear out the tagged images here once this is merged: https://github.com/ccao-data/model-res-avm/pkgs/container/model-res-avm

Hm, I don't seem to have permissions to delete images from our Packages repository -- are you able to either do that or grant me permissions to do so?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Infra updates] Build and push a Docker image for the model
2 participants