
Trying to improve buildkit performance #671

Closed
wants to merge 3 commits

Conversation

beastawakens
Collaborator

@nightfury1204 Following up from #670, I thought I'd take a stab at improving things.
I haven't tested this locally, so feel free to change whatever needs changing, but something like this feels right to me.
I also added some custom buildkit config, following https://www.sliceofexperiments.com/p/a-comprehensive-guide-for-the-fastest, since the default garbage collection could also be reducing cache hits.
I think if you get this right, you could make a serious dent in everyone's build times!
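For reference, the garbage-collection tuning that blog post describes lives in buildkitd.toml. A minimal sketch is below; the byte limits, durations, and file path are illustrative assumptions, not the values in this PR, and would need to be sized against the build node's disk:

```toml
# Sketch of a more generous buildkitd GC policy (illustrative values only).
# Assumes buildkitd loads this from its default path, /etc/buildkit/buildkitd.toml.

[worker.oci]
  enabled = true
  gc = true

  # Keep the cache records that matter most for cache hits (local sources,
  # cache mounts, git checkouts) around for longer.
  [[worker.oci.gcpolicy]]
    keepBytes = 10240000000   # ~10 GB for these cache types
    keepDuration = 604800     # 7 days, in seconds
    filters = ["type==source.local", "type==exec.cachemount", "type==source.git.checkout"]

  # Catch-all rule: once total cache exceeds this cap, prune oldest records first.
  [[worker.oci.gcpolicy]]
    all = true
    keepBytes = 20480000000   # ~20 GB overall cap
```

The exact GC defaults differ between buildkit versions, so whatever values land here should probably be tied to the configurable build node disk size.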

@beastawakens
Collaborator Author

@nightfury1204 Hey man, any movement on this at all? It would be a game changer for build times in v3!

@nightfury1204
Collaborator

> @nightfury1204 Hey man, any movement on this at all? It would be a game changer for build times in v3!

@beastawakens Sorry, I still haven't had the time to look into it. I will try to get to it this week, hopefully.

@nightfury1204
Collaborator

My understanding is that this leverages caching for faster builds, which requires a good amount of storage in the node volume, so I don't want to enable it for all cases. That storage could cause problems for application workloads, depending on their ephemeral volume usage. Could you provide granular control over enabling this, e.g. enable it on the dedicated build node only, enable it behind a flag, and allow control over the cache storage size?

@beastawakens
Collaborator Author

Hmmm, considering that prior to the buildkit move all the Docker builds (build node or no build node) used the Docker cache in a similar way, I would hope this is equivalent. But yeah, maybe there are some issues with buildkit that need more management: moby/buildkit#1896?
It's just that, as it currently stands, you will never get a cache hit on any build on any Convox version running buildkit, which is really bad from a build-performance point of view 😔

@nightfury1204
Collaborator

> Hmmm, considering that prior to the buildkit move all the Docker builds (build node or no build node) used the Docker cache in a similar way, I would hope this is equivalent. But yeah, maybe there are some issues with buildkit that need more management: moby/buildkit#1896? It's just that, as it currently stands, you will never get a cache hit on any build on any Convox version running buildkit, which is really bad from a build-performance point of view 😔

I understand your point. But can you add the flexibility to enable or disable it, and to control the cache size?

@beastawakens
Collaborator Author

> Hmmm, considering that prior to the buildkit move all the Docker builds (build node or no build node) used the Docker cache in a similar way, I would hope this is equivalent. But yeah, maybe there are some issues with buildkit that need more management: moby/buildkit#1896? It's just that, as it currently stands, you will never get a cache hit on any build on any Convox version running buildkit, which is really bad from a build-performance point of view 😔
>
> I understand your point. But can you add the flexibility to enable or disable it, and to control the cache size?

@nightfury1204 To keep things simpler, I've set it to only happen when you have a build node, and added the ability to specify the build node disk size as well!

@beastawakens
Collaborator Author

@nightfury1204 Any movement on this please?

@nightfury1204
Collaborator

> @nightfury1204 Any movement on this please?

Sorry, I will test this next week.

@beastawakens
Collaborator Author

@nightfury1204 I saw you created some test releases 😄 Is this going to land soon? It will make a huge difference to lots of users!

@nightfury1204
Collaborator

nightfury1204 commented Aug 22, 2023

> @nightfury1204 I saw you created some test releases 😄 Is this going to land soon? It will make a huge difference to lots of users!

Yeah, I am testing it, but it's not passing the tests properly, so I am thinking that storage is the main issue here. I think it needs proper research into the implementation and how it's working.

@rhysawilliams2010

@beastawakens Thanks for creating this PR. I'm eager to be able to share layer caching between build pods as well.
@nightfury1204 It would be great to have an update to Convox so that we can use layer caching on our builds.

@beastawakens
Collaborator Author

> @nightfury1204 I saw you created some test releases 😄 Is this going to land soon? It will make a huge difference to lots of users!
>
> Yeah, I am testing it, but it's not passing the tests properly, so I am thinking that storage is the main issue here. I think it needs proper research into the implementation and how it's working.

Please see the original blog post I referenced for details on the custom buildkit configuration: https://www.sliceofexperiments.com/i/94286668/configuring-a-more-generous-garbage-collection-policy-for-buildkit

You can also see more detail in the buildkit repo documentation here: https://github.com/moby/buildkit/blob/master/docs/buildkitd.toml.md

@nightfury1204
Collaborator

> @nightfury1204 I saw you created some test releases 😄 Is this going to land soon? It will make a huge difference to lots of users!
>
> Yeah, I am testing it, but it's not passing the tests properly, so I am thinking that storage is the main issue here. I think it needs proper research into the implementation and how it's working.
>
> Please see the original blog post I referenced for details on the custom buildkit configuration: https://www.sliceofexperiments.com/i/94286668/configuring-a-more-generous-garbage-collection-policy-for-buildkit
>
> You can also see more detail in the buildkit repo documentation here: https://github.com/moby/buildkit/blob/master/docs/buildkitd.toml.md

I went through it before; it doesn't explain everything.

@beastawakens
Collaborator Author

> I went through it before; it doesn't explain everything.

What's missing? I'm not sure I can see your test run results... Happy to help out on this if I can! I see that in your release you changed the TOML to use static byte sizes rather than percentages. Did the build node have a sufficient disk size attached?

@nightfury1204
Collaborator

nightfury1204 commented Aug 28, 2023

> I went through it before; it doesn't explain everything.
>
> What's missing? I'm not sure I can see your test run results... Happy to help out on this if I can! I see that in your release you changed the TOML to use static byte sizes rather than percentages. Did the build node have a sufficient disk size attached?

@beastawakens Percentages weren't working, which is why I used static bytes, and your code needed some fixes. The way we run builds, every build pod we spin up has its own buildkitd running, so maybe each pod tries to claim that amount of storage, or something else needs more tuning. Without the tests passing cleanly, and without understanding how the volume is allocated when several build pods are running, I can't add this feature. I have already discussed it with the team (and added a ticket to the board), but we have some other higher priorities.
For now, the main questions that need answers:

  • If multiple build pods spin up, how is the volume allocated, given that buildkitd runs in each pod with this configuration?
  • Is the configuration applied per build pod's buildkitd?
  • We don't keep buildkitd running; how will GC work in this case to clean up storage that exceeds the limit?
  • If parallel build pods are running, will simultaneous writes to the storage volume cause errors?

@beastawakens
Collaborator Author

> @beastawakens Percentages weren't working, which is why I used static bytes, and your code needed some fixes. The way we run builds, every build pod we spin up has its own buildkitd running, so maybe each pod tries to claim that amount of storage, or something else needs more tuning. Without the tests passing cleanly, and without understanding how the volume is allocated when several build pods are running, I can't add this feature. I have already discussed it with the team (and added a ticket to the board), but we have some other higher priorities. For now, the main questions that need answers:
>
>   • If multiple build pods spin up, how is the volume allocated, given that buildkitd runs in each pod with this configuration?
>   • Is the configuration applied per build pod's buildkitd?
>   • We don't keep buildkitd running; how will GC work in this case to clean up storage that exceeds the limit?
>   • If parallel build pods are running, will simultaneous writes to the storage volume cause errors?

As we should be volume-mounting from the underlying instance into each build pod that runs, they will ultimately share the underlying filesystem. In terms of how buildkit sees the usage of that volume, I'm not sure...
I would assume that GC would start when buildkitd starts, analyse the cache, and take action if required.
Interestingly, buildkit provides an example of running a fleet of build pods with a shared local cache here: https://github.com/moby/buildkit/tree/master/examples/kubernetes/consistenthash but that suggests the cache is distributed across the pods and you should connect to the 'correct' pod to utilise the cache... 🤔

Maybe ultimately it will be easier to upgrade to buildkit 0.12 as it appears ECR now supports registry caching with that version... aws/containers-roadmap#876
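For what it's worth, registry-backed caching via buildctl would look roughly like the sketch below, assuming buildkit >= 0.12 on both ends; the ECR repository URL, image name, and tags are placeholders rather than anything from this PR:

```sh
# Rough sketch: export/import the layer cache to an ECR repository instead of
# relying on the build node's local disk. image-manifest=true emits the cache
# manifest format ECR accepts, which is why buildkit >= 0.12 matters here.
buildctl build \
  --frontend dockerfile.v0 \
  --local context=. \
  --local dockerfile=. \
  --output type=image,name=123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp:latest,push=true \
  --export-cache type=registry,ref=123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp:build-cache,mode=max,image-manifest=true,oci-mediatypes=true \
  --import-cache type=registry,ref=123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp:build-cache
```

That would sidestep the shared-volume questions above, since the cache lives in the registry rather than on each build node.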

@beastawakens
Collaborator Author

@nightfury1204 I opened a separate PR #694 for the buildkit update 😄

@nightfury1204
Collaborator

Closed by #694.
