Experiment: run base image builds in parallel #120

Merged
merged 12 commits into from
Oct 19, 2023

anayden
Collaborator

@anayden anayden commented Oct 13, 2023

There are 2 changes to the base image build process in this PR:

  1. Parallelise the different stages and platforms of the base image build. On its own, this reduces the full build time from ~90 minutes to ~45. The ARM build on an x86 machine remains the slowest part: the 2 ARM sub-builds running on x86 via QEMU take 40 minutes each, whereas everything else builds in ~10 minutes. Even so, this change can be accepted in isolation, since it cuts a full build from 90 to 45 minutes.
  2. Additionally, use BuildJet's (https://buildjet.com) native ARM workers for the ARM builds. This speeds up the ARM builds significantly (everything now finishes in under 10 minutes), but it costs money: 1 minute of ARM builder time is $0.004; a full rebuild (without cache) uses ~15 worker minutes in total, or $0.06; 10 base image builds per month come to $0.60. However, a malicious actor could in theory create hundreds of PRs, triggering hundreds of builds and running up the bill. Doing so wouldn't make much sense for an attacker (unless R2DT has enemies skilled in Dockerfile trickery, or BuildJet itself decides to earn an extra $10 this way), but it is a possibility.
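
As a rough sanity check on the figures above, here is a minimal Python sketch of the timing and cost arithmetic. The job names and the grouping into three jobs are assumptions made for illustration; only the durations and the per-minute price come from the description above.

```python
# Back-of-the-envelope model of the numbers quoted in this PR description.
# Job names and the three-way grouping are illustrative assumptions.

ARM_RATE_USD_PER_MIN = 0.004  # BuildJet ARM price quoted above

# Approximate durations (minutes) when ARM is emulated on x86 via QEMU.
qemu_jobs = {"arm-sub-build-1": 40, "arm-sub-build-2": 40, "everything-else": 10}


def sequential_minutes(jobs):
    """Wall-clock time when the jobs run one after another."""
    return sum(jobs.values())


def parallel_minutes(jobs):
    """Wall-clock time when all jobs run at once: the slowest job dominates."""
    return max(jobs.values())


def arm_cost_usd(worker_minutes, builds=1):
    """Cost of `builds` full rebuilds, each using `worker_minutes` of ARM workers."""
    return round(worker_minutes * ARM_RATE_USD_PER_MIN * builds, 2)


print(sequential_minutes(qemu_jobs))  # 90
print(parallel_minutes(qemu_jobs))    # 40
print(arm_cost_usd(15))               # 0.06
print(arm_cost_usd(15, builds=10))    # 0.6
```

The parallel figure of 40 minutes is the slowest emulated ARM sub-build; the ~45 minutes observed presumably includes job startup and other overhead. Under the same assumptions, `arm_cost_usd(15, builds=200)` puts 200 uncached builds at about $12, which is the scale of the abuse scenario in item 2.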

We can adopt just item 1, or both 1 and 2. Alternatively, instead of item 2, we could deploy a GitHub Actions runner on the free Oracle Cloud ARM64 machine, but I'm not sure how much of a speedup that would give.

@github-actions

github-actions bot commented Oct 13, 2023

Docker image tag(s) pushed:

rnacentral/r2dt:pr-120

Labels added to images:

org.opencontainers.image.created=2023-10-13T16:16:05.545Z
org.opencontainers.image.description=Visualise RNA secondary structure in consistent, reproducible and recognisable layouts
org.opencontainers.image.licenses=Apache-2.0
org.opencontainers.image.revision=6b912eaf0a37beca7693b1b1d6735eb9cbf977b6
org.opencontainers.image.source=https://github.com/RNAcentral/R2DT
org.opencontainers.image.title=R2DT
org.opencontainers.image.url=https://github.com/RNAcentral/R2DT
org.opencontainers.image.version=pr-120

@AntonPetrov
Member

The cost is not a problem, I am happy to bankroll this for now. Thank you for thoroughly investigating all the options!

Just one quick question: how would you suggest we manage the tags on Docker Hub? It seems like soon we'd need some kind of policy for cleaning old tags without deleting useful cache accidentally:
https://hub.docker.com/r/rnacentral/r2dt-base/tags

I can think of someone renaming a couple of build stages for readability without thinking that it would cause cache rebuilding and stale images on Docker Hub.

@anayden
Collaborator Author

anayden commented Oct 17, 2023

> The cost is not a problem, I am happy to bankroll this for now

Then could you please sign up at buildjet.com and add this repository to your account? I'll then remove it from mine. It's not urgent, though.

> how would you suggest we manage the tags on Docker Hub? It seems like soon we'd need some kind of policy for cleaning old tags without deleting useful cache accidentally:

Hm… I wouldn't care much about cleaning up old tags: this is an open source repo, and Docker Hub provides unlimited storage for such repos, right? People who use the images would normally only use ones that are either versioned (v1.4, etc.) or belong to their PRs. You might want to add a bit more info to the overview page, listing all useful tags (for instance, see how Python lists tags in their overview page).

> I can think of someone renaming a couple of build stages for readability without thinking that it would cause cache rebuilding and stale images on Docker Hub.

Even if someone does that to the Dockerfile, it's not a big deal. We'd lose the cache and the build would take an extra 10 minutes once (until the new caches are populated), and that's it. It's also an easy issue to spot when reviewing a pull request. Again, I wouldn't worry much about this.
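
To make the effect of a rename concrete, here is a small sketch that assumes one cache tag per (stage, platform) pair. The tag format and the stage names are hypothetical, chosen only for illustration, and are not necessarily how this repository names its cache images.

```python
# Hypothetical cache-tag scheme: one registry tag per (stage, platform) pair.
# The tag format and stage names are assumptions, not the project's real naming.


def cache_tag(stage, platform):
    """Build a cache tag such as 'cache-tools-linux-arm64'."""
    return f"cache-{stage}-{platform.replace('/', '-')}"


def plan_build(stages, platforms, existing_tags):
    """Return (misses, stale): tags to rebuild once and tags left behind."""
    wanted = {cache_tag(s, p) for s in stages for p in platforms}
    misses = sorted(wanted - existing_tags)  # no cache yet: rebuilt from scratch once
    stale = sorted(existing_tags - wanted)   # never read again, but still stored
    return misses, stale


platforms = ["linux/amd64", "linux/arm64"]
existing = {cache_tag(s, p) for s in ["deps", "tools"] for p in platforms}

# Rename the "tools" stage to "rna-tools" and see what changes.
misses, stale = plan_build(["deps", "rna-tools"], platforms, existing)
print(misses)  # ['cache-rna-tools-linux-amd64', 'cache-rna-tools-linux-arm64']
print(stale)   # ['cache-tools-linux-amd64', 'cache-tools-linux-arm64']
```

Under this assumed scheme, a rename costs one uncached build of the renamed stage per platform and leaves the old tags behind as clutter, while the untouched stages keep their cache.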

@AntonPetrov
Member

> Then could you please sign up at buildjet.com and add this repository to your account? I'll then remove it from mine. It's not urgent, though.

Done 🤞

> I wouldn't care much about cleaning up old tags

Cool.

> listing all useful tags

I listed one tag (v1.4) but I'll need to figure out how to do it more systematically and link to Dockerfiles later.

> it's not a big deal

👍

I am happy to merge this PR once I can verify that an image built like this is fully functional - but for that I need to deal with #119.

@AntonPetrov AntonPetrov merged commit b6ce5e7 into develop Oct 19, 2023
16 checks passed
@AntonPetrov AntonPetrov deleted the parallel-base-build branch October 19, 2023 09:14