Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

engine: record seen cache volumes at resolver level #5786

Merged
merged 2 commits into from
Sep 18, 2023

Conversation

marcosnils
Copy link
Contributor

@marcosnils marcosnils commented Sep 15, 2023

Fixes https://linear.app/dagger/issue/DEV-2629/automatically-create-cache-volumes-in-cloud-derived-from-pipelines

This PR tracks the seen cache volumes at the resolver level so they
can be automatically discovered for syncronization at the engine start
/ stop time. This solution isn't ideal since it should be getting the
cache mounts from buildkit's store instead of the resolver but we
couldn't find a way to make it work with @aluzzardi. We're falling back
to this workaround until maybe @sipsma can shed some light.

Signed-off-by: Marcos Lilljedahl marcosnils@gmail.com

@marcosnils marcosnils requested a review from a team as a code owner September 15, 2023 18:12
@marcosnils marcosnils force-pushed the fix/sync_seen_cache_volumes branch 3 times, most recently from 4f9d1f0 to a1e35f2 Compare September 15, 2023 18:26
@marcosnils marcosnils requested review from vito and aluzzardi and removed request for a team September 15, 2023 18:57
Copy link
Member

@aluzzardi aluzzardi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, left a few nits

engine/cache/mountsync.go Outdated Show resolved Hide resolved
allCacheMounts[k.(string)] = struct{}{}
return true
})

for _, syncedCacheMount := range syncedCacheMounts {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we still want to support manually synced mounts? I guess that's for backward compat / avoiding regressions. Should probably put a fat warning that we should remove this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually now that you mention about it I think we can remove this altogether without any regressions. IIUC the current code will always call the stopCacheMountSync regardless if the volume if used or not. Now that we can detect which volumes the engine ever used during its lifetime, seems more accurate to only sync those. WDYT?

@@ -560,6 +561,8 @@ func (container *Container) WithMountedFile(ctx context.Context, bk *buildkit.Cl
return container.withMounted(ctx, bk, target, file.LLB, file.File, file.Services, owner)
}

var SeenCacheKeys sync.Map
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I believe by convention in the codebase, this should be achieved by passing something in core/schema, and having that something passed again to containerSchema (just like the leaseManager, buildCache, etc) rather than using a global variable.

/cc @vito who added the most in there recently

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into this and the thing is that I'd have to pass this "something" across multiple layers so it gets all the way to the schema package. Thing is that the CacheManager defined here (https://github.com/marcosnils/dagger/blob/a1e35f20a546f00c4e4fcc13581b1ab833a1640d/cmd/engine/main.go#L760) which has the mount synchronization logic a methods gets passed as a solver.CacheManager here (https://github.com/marcosnils/dagger/blob/a1e35f20a546f00c4e4fcc13581b1ab833a1640d/cmd/engine/main.go#L802) to the buildkit controller and there's not to many things we could do from there. Additionally, the cache mount synchronization process happens in the main package here (https://github.com/marcosnils/dagger/blob/a1e35f20a546f00c4e4fcc13581b1ab833a1640d/cmd/engine/main.go#L802) which doesn't have way to access the schema or core structs to fetch the seen cache mount keys 🤷

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is OK as a special-case since this is really an engine-wide concern. We'll see if anything changes once we figure out cache namespacing/etc.

This PR tracks the seen cache volumes at the resolver level so they
can be automatically discovered for syncronization at the engine start
/ stop time. This solution isn't ideal since it should be getting the
cache mounts from buildkit's store instead of the resolver but we
couldn't find a way to make it work with @aluzzardi. We're falling back
to this workaround until maybe @sipsma can shed some light.

Signed-off-by: Marcos Lilljedahl <marcosnils@gmail.com>
core/container.go Outdated Show resolved Hide resolved
Co-authored-by: Alex Suraci <suraci.alex@gmail.com>
Signed-off-by: Marcos Nils <1578458+marcosnils@users.noreply.github.com>
@gerhard gerhard added this to the v0.8.6 milestone Sep 18, 2023
Copy link
Member

@aluzzardi aluzzardi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@marcosnils marcosnils merged commit 669898a into dagger:main Sep 18, 2023
32 checks passed
@marcosnils marcosnils deleted the fix/sync_seen_cache_volumes branch September 18, 2023 16:10
marcosnils added a commit to marcosnils/dagger that referenced this pull request Sep 18, 2023
this is a follow-up of dagger#5786 where
we have introduced automatic discovery of cache mounts so it's not
needed to check if the list is empty now.

Signed-off-by: Marcos Lilljedahl <marcosnils@gmail.com>
marcosnils added a commit that referenced this pull request Sep 18, 2023
this is a follow-up of #5786 where
we have introduced automatic discovery of cache mounts so it's not
needed to check if the list is empty now.

Signed-off-by: Marcos Lilljedahl <marcosnils@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants