Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(docker): speed up your build using caching #25422

Closed
wants to merge 0 commits into from

Conversation

alekseyolg
Copy link
Contributor

SUMMARY

Speed up your build using caching. Caching will prevent you from rebuilding all the layers that are already in Dockerhub.
https://docs.docker.com/engine/reference/commandline/buildx_build/

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@sebastianliebscher
Copy link
Contributor

Nice improvements!

ARCHITECTURE_FOR_BUILD="linux/amd64 linux/arm64"
else
# Login and push
docker logout
docker login --username "${DOCKERHUB_USER}" --password "${DOCKERHUB_TOKEN}"
DOCKER_ARGS="--push"
DOCKER_ARGS="--push --cache-from=type=registry,ref=apache/superset:cache --cache-to type=registry,ref=apache/superset:cache,mode=max"
Copy link
Contributor

@sebastianliebscher sebastianliebscher Sep 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am just curious. Doesn't this always overwrite the same image cache during a single build if using the same --cache-to arg for every image? E.g. superset:websocket overwrites superset:py310 cache. And right after that, superset:dockerize overwrtites superset:websocket cache etc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sebastianliebscher Thanks for your comment, you were completely right. I did what you said and got the cache overwritten by the new cache.
I eliminated this oversight and now everything should work fine in my opinion.

@pull-request-size pull-request-size bot added size/M and removed size/S labels Sep 28, 2023
else
# Login and push
docker logout
docker login --username "${DOCKERHUB_USER}" --password "${DOCKERHUB_TOKEN}"
DOCKER_ARGS="--push"
ARCHITECTURE_FOR_BUILD="linux/amd64,linux/arm64"
CACHE_TO="--cache-to type=registry,mode=max,ref=apache/superset:cache"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will cruft up the apache/superset registry and will clobber the tag with each branch build. Let's just leave the local cache-to for now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new cache will only be uploaded to Dockerhub after the doc is added to the master branch. Tests for pull requests will be collected much faster due to the presence of a cache, since only what has changed will need to be collected. Thus, this will significantly increase the speed of building the image and will allow you to use the ARM64 architecture for supersets in the future, as I tried and the current image based on the ARM64 architecture takes more than an hour to build due to the fact that many libraries need to be compiled, unlike AMD64. Or we switch to using the arm64 architecture and wait for all tests to complete for approximately 1.25 hours, which in my opinion is too long.
Actually, because of this, I came up with the idea of ​​giving the superset to use the cache for assembly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood, and this is great! I just think that it would be better to have a dedicated "cache" registry instead of conflating the "main" registry's contents. We can raise this in the next community meeting to see how folks are thinking around this area. I think that for now, using a local cache will probably suffice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will be ready to help if necessary, this way we will quickly come to the possibility of building both the AMD 64 and ARM 64 versions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, sounds good. I think that we can still build both versions, just without pushing to the main registry. If you don't mind switching back to the local cache, I'm happy to approve/merge this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's then try to build both architectures with a local cache, but I repeat - then with each code proposal you will have to wait more than an hour.
And we can return to the current issue after the group meeting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to speed this up by parallelizing the builds? It looks like the current setup is just running each CMD in a serial fashion. we would probably benefit from breaking this workflow up per platform/image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ephemeral env uses an AWS image registry (I think sponsored by Preset?). Maybe this could be used instead of Dockerhub.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@craig-rueda It is possible to parallelize the build process, but this will need to be done step by step, since various problems may arise during the build. To begin with, I plan to separate the assembly of the superset itself, websocket and dockerize. After adding a code proposal to the master branch, you can try to assemble the ARM 64 and AMD 64 architecture in parallel.
@sebastianliebscher Personally, for me it makes no difference where to load the cache for the build, I just need an example that can be used to build the Superset. Then I can configure the build, and you can check the quality of my code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants