Basic namespacing in API #3345
"Session" and the use of "project" here are related but not entirely the same. Session is an already existing concept today. It's how buildkit scopes certain resources (localdirs, gateway containers, among others). Sessions have a random ID (they also have a name you can set, but that's different than the ID) and are tied to a single client connection. Once that connection is closed, the session is gone.
Project as it's used here (and as I understand it, correct me if wrong) would be more persistent than a session because it would be scoping cache dirs. If you use a cache dir, disconnect and then reconnect, it should be possible for the cache dir to still exist (as contrasted with e.g. gateway containers where they are gone when the session ends). The direction I've been leaning more and more towards for a while is that the session concept we get from Buildkit is really hard to work with and often not what we want. What we really want is more like the "project" concept above where resources are less ephemeral. You should be able to let a service keep running even if you disconnect.
The vague, incomplete idea that's been knocking around my head is that we maybe should invent our own "dagger session" concept which encompasses not only services and local dirs (like Buildkit sessions) but probably also the loaded schema. Can add cache dirs to that too. Dagger sessions would have some degree of persistence. If we wanted to call "dagger session" a project instead, that'd make sense to me in isolation; we just need to sort out the conflicts with the existing concept called project. The two concepts have overlap in that they may both be associated with a schema, but in other ways they are not really the same. I'd actually be in favor of calling the "persistent-session thing" a "project" and the "config for an extension" (i.e. what's currently called "project") something else. Hard parts:
In order to avoid confusion with existing definitions of "session" and "project", for the purpose of this discussion I propose using the more neutral and functional "namespace". The analogy is e.g. to Kubernetes namespaces, containerd namespaces, and perhaps docker context? It doesn't have to be the actual name we use; it just seems useful to have our own name for now, to avoid interference. EDIT: add comparison to containerd namespace
containerd has namespaces too, which could be especially relevant to us if we decide to use the containerd worker
Update: we have gone with option 2 (defer namespace entirely). Now we are facing an urgent need to introduce namespacing, in order to implement persistence of cache volumes in Dagger Cloud. I am resurrecting this thread to make sure we avoid breaking the UX.
@shykes What is the connection between namespaces and cache volume syncing? I can imagine updating the cache volume API to support settings that would impact the cloud's sync of them, but I don't see yet the connection point with namespacing, which has more to do with multitenancy in my mind.
The connection is that when creating a cache volume in your code, you need to give it a name; that name is chosen with an assumption that cache volumes are namespaced. How namespacing works, exactly, will impact when my cache volume is reused and when it's not. My understanding is that, in the current implementation of Dagger Cloud, cache volume names are global to the entire Cloud organization. So if I run 5 unrelated pipelines, all with a cache volume named "npm", they will be synced as a single volume. This is leading to stopgaps where e.g. @jpadams is writing code with globally unique cache volume names. Instead of designing a cloud-specific way to namespace cache volumes, I would like us to design namespacing platform-wide, then implement it in Cloud.
I'm still missing how "cache volumes are shared" leads to us needing to namespace. To be clear, I'm not saying we don't want some type of namespacing ever (namespacing by npm version could be a nice feature to have for this specific case); it just feels independent of cloud synchronization.
@marcosnils can you clarify this point? I understand that in the backend we store layers+cache-mounts under org-specific prefixes, but that's a 100% totally internal implementation detail that users need zero awareness of. Where else does namespacing come into play that it impacts users?
Sorry, that was confusing on my part. I'm conflating cross-org global namespacing (an unrelated and very temporary cloud issue) with org-wide namespacing, which would require more qualified volume names. I'm also just illustrating that design considerations on the cloud side have an impact on DX on the engine side, so they should be considered carefully.
Oh okay, so we're thinking about when users want to split up their org into separate "sub-orgs" (teams, etc.) and then have independent synchronization of cache mounts? If so, makes much more sense and I can see the conceptual connection.

In terms of the implementation, I believe it still is orthogonal to cloud synchronization right now, but that could change in the future. Reason being that cloud sync currently works with the low-level name of the cache mount in buildkit, which would be one or two layers of abstraction below namespacing dagger applies. So it shouldn't need any awareness of namespacing; it just syncs based on whatever the namespace "compiles" to in terms of the low-level cache mount name.

One hypothetical situation where a connection point arises is if we want some sort of "tiered" caching that reflects org structures. So if you are in OrgFoo and on TeamBar, which is a part of OrgFoo, then maybe you want to be able to say something like: "give me the npm cache for TeamBar, but if there isn't any then check to see if there's any in OrgFoo."

I would have previously assumed that was a feature a ways off in terms of needing, but maybe not? Either way, agree it's worth more thought if we are starting down the "org hierarchy" path already. We should also consider whether layer caches would be impacted by this too though (mostly in terms of storage, pruning, pricing, etc.).
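The tiered lookup described above ("give me TeamBar's npm cache, falling back to OrgFoo's") can be sketched in a few lines. This is purely illustrative, not Dagger or Buildkit code; `Store`, `Lookup`, and the org/team labels are all hypothetical:

```go
// Hypothetical sketch of tiered cache lookup across a namespace hierarchy.
package main

import "fmt"

// Store maps namespace -> volume name -> opaque volume ID.
type Store map[string]map[string]string

// Lookup walks the namespace chain from most to least specific,
// returning the first match.
func (s Store) Lookup(chain []string, name string) (string, bool) {
	for _, ns := range chain {
		if id, ok := s[ns][name]; ok {
			return id, true
		}
	}
	return "", false
}

func main() {
	s := Store{
		"OrgFoo":         {"npm": "vol-orgfoo-npm"},
		"OrgFoo/TeamBar": {},
	}
	// TeamBar has no "npm" volume of its own, so the lookup
	// falls back to the parent org's volume.
	id, _ := s.Lookup([]string{"OrgFoo/TeamBar", "OrgFoo"}, "npm")
	fmt.Println(id) // vol-orgfoo-npm
}
```

The fallback order is just the namespace chain from leaf to root, which is why the sync layer can stay unaware of namespacing until tiering like this is actually wanted.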
One way to describe the problem: when I write code describing my pipeline logic (which includes creating and naming my cache volumes), I shouldn't have to know everything else that runs, or will run in the future, in the same Dagger Cloud org as my pipeline. But if Dagger Cloud considers cache volume names to be global to the org, then that's effectively what I have to do.
What is an example of such a low-level volume ID, and what other inputs go into "compiling" it? I still feel like when developing my pipeline logic, I need to know how my cache volumes will be namespaced - "don't worry about it" doesn't help me decide how to name my volume.
Agree that's a problem, though I think the solution to that in particular is either (or both) of the following:
But I don't think this has to do with namespacing. Even if you could namespace, you'd still have the same problem of needing to know what exists in your namespace. Maybe we have different definitions of what namespacing means? From the original description of this issue, I thought it was just a way of scoping the same name under different "contexts" (aka namespaces) for the purposes of isolation and multitenancy, which I think only comes into play here if we want tiered caches from org->team that I mentioned in my previous comment.
Right now (with no namespacing), the name is just a hash of the cache volume ID, which currently only contains the name the user provided. With namespacing, presumably the namespace hierarchy would also be mixed into the hash. I agree that none of that helps you decide what to name your volume; the point was that magicache doesn't need to care about this, it just syncs what it's told to sync. It would only need awareness of namespacing if we want tiered caching; otherwise it's just an opaque ID of a cache volume to sync.
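The "compiling" step above can be sketched concretely. This is not the actual Dagger implementation; `compiledKey` is a hypothetical name, and the exact hash construction is an assumption. The point it illustrates is just that mixing the namespace hierarchy into the hash makes the same user-facing name resolve to distinct low-level IDs:

```go
// Hypothetical sketch: deriving a low-level cache volume ID by hashing
// the namespace hierarchy together with the user-provided name.
package main

import (
	"crypto/sha256"
	"fmt"
)

func compiledKey(namespace []string, name string) string {
	h := sha256.New()
	for _, ns := range namespace {
		h.Write([]byte(ns))
		h.Write([]byte{0}) // separator so ("ab","c") can't collide with ("a","bc")
	}
	h.Write([]byte(name))
	return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
	// Same user-facing name "npm", different namespaces: distinct opaque IDs.
	fmt.Println(compiledKey([]string{"OrgFoo", "TeamBar"}, "npm"))
	fmt.Println(compiledKey([]string{"OrgFoo", "TeamBaz"}, "npm"))
}
```

Under a scheme like this, the sync layer only ever sees the opaque hex ID, which is why it needs no awareness of how the namespace was chosen.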
I didn't understand this part, what does "vendored out to users in your org" mean? |
I'm imagining a top-hat for an org creating their org's standalone environment and including its cache volumes. The DX would be that when users in that org need to use a cache volume, they just import their org's environment and then can choose cache volumes from it. So if top-hat wants their support
Either one of those above would be returning it. The overall idea is that Zenith gives orgs a way of cataloguing cache volumes for end-users, which solves the problem of "when I write code describing my pipeline logic (which includes creating and naming my cache volumes), I shouldn't have to know everything else that runs, or will run in the future, in the same Dagger Cloud org as my pipeline." You know what to use because your org's environment tells you what is available to use. There are actually a bunch more interesting features we could layer on top of something like this, but I'll try to not derail the conversation too much yet :-)
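The "org environment as a catalogue" idea might look something like the following. Everything here is invented for illustration (`OrgEnv`, `NewOrgEnv`, `CacheVolume`, the `top-hat` org); it is not the Zenith API, just a sketch of the DX where pipeline authors pick from a known list instead of inventing names:

```go
// Hypothetical sketch of an org environment that catalogues cache volumes.
package main

import "fmt"

type OrgEnv struct {
	volumes map[string]string // user-facing name -> org-qualified volume name
}

func NewOrgEnv(org string, names ...string) OrgEnv {
	v := make(map[string]string)
	for _, n := range names {
		v[n] = org + "/" + n
	}
	return OrgEnv{volumes: v}
}

// CacheVolume returns the catalogued, org-qualified name, or an error
// if the volume isn't part of the org's catalogue.
func (e OrgEnv) CacheVolume(name string) (string, error) {
	qualified, ok := e.volumes[name]
	if !ok {
		return "", fmt.Errorf("no cache volume %q in this org's environment", name)
	}
	return qualified, nil
}

func main() {
	env := NewOrgEnv("top-hat", "npm", "go-build")
	name, _ := env.CacheVolume("npm")
	fmt.Println(name) // top-hat/npm
}
```

The catalogue rejects names it doesn't know about, which is exactly what turns "I have to know everything else in my org" into "my org's environment tells me what exists."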
OK that's pretty cool - perhaps even mindblowing ;) This gives me a glimpse of a deeper integration with Dagger Cloud, where devs can take advantage of its features programmatically. Which I love. BUT it also scares me a little bit: wouldn't this cause cloud-specific and org-specific details to leak all over the place into otherwise perfectly portable code? I feel like we could have the best of both worlds if we could make this similar to the dynamic secrets API: 1) fully programmatic; but also 2) tracing a path to keeping the code portable in the future, with "secret providers". I feel like your idea for cache volumes isn't quite there, but perhaps it could be?
@shykes Sorry, I just remembered this whole thread and realized I never got back to you
I'd argue that "cloud-specific, and org-specific details" are not something to be avoided. We absolutely 100% want portable, re-usable environments, but ultimately end-users need to take those environments and use them for their precise needs, and it's great if that's just code too. I guess the analogies would be:
I can't totally imagine what this would look like in a way that's different than just having the ability to import a cache mount definition that's defined in an external environment, might need clarification on what you are imagining for something like this. |
@marcosnils @aluzzardi can you provide additional context on the current status of this? |
This is what I ran into recently: I have a project with two Go binary builds and an e2e test run (not to mention a golangci-lint module that uses the same Go module under the hood). All three steps use the same Go module, which defines the following cache volumes:

```go
WithMountedCache("/root/.cache/go-build", dag.CacheVolume("go-build")).
	WithMountedCache("/go/pkg/mod", dag.CacheVolume("go-mod"))
```

That results in the behavior outlined above: essentially the last one to complete wins and overwrites the cache of the other two. In this case, each of the three steps would need its own "namespace" to work. I also wonder if the namespace should include other information, like branch or commit hash (similarly to how caching works on GHA and other SaaS CI providers). Let's say I have two PRs running the same builds in parallel. They would also overwrite the cache.

One potential solution I can imagine is initializing a new module with some context accessible from the module:

```go
// When calling the module
dag.Go(/* add some context here */).FromVersion("1.21.5")

// In the module
dag.Context()
```

I considered adding a cache key argument to my Go module (I may still do it), but considering the rather complex API (partly due to the method chaining) it wouldn't be trivial. (For reference: here is the module)
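A stopgap along the lines of the "cache key argument" mentioned above could simply mix a caller-provided scope into the volume name itself. This is a hypothetical helper (`scopedVolumeName` is not part of the Dagger API), sketched only to show how per-step or per-branch scoping would sidestep the last-writer-wins behavior:

```go
// Hypothetical stopgap: scope cache volume names per step/branch until
// Dagger provides real namespacing.
package main

import "fmt"

func scopedVolumeName(scope, name string) string {
	if scope == "" {
		return name // unscoped: today's colliding behavior
	}
	return scope + "-" + name
}

func main() {
	// Three steps that previously all fought over the single "go-build" volume:
	for _, step := range []string{"build-a", "build-b", "e2e"} {
		fmt.Println(scopedVolumeName(step, "go-build"))
	}
}
```

The obvious downside, as noted in the thread, is that every module's API now has to thread this scope argument through its method chain.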
@sagikazarmark in your case I think you are running into 3 distinct (but related) consequences of the lack of namespacing:
In summary: the core namespacing problem is not specific to Dagger Cloud, but it can manifest itself in more visible ways when using Dagger Cloud with cache volumes, and that is what you are encountering. If I'm right, then grouping all three steps in the same Dagger session (i.e. in the same CI job) should solve the "last writer wins" problem.
@shykes makes sense, I'll try that, thanks. TBH, I'm not that comfortable with bundling all CI steps into a single run though. For one, I don't get the same feedback on GH right now with checks if I do that. Also, in some projects it's just not possible to scale the runner vertically (eg. OSS projects), so splitting CI runs is often for performance reasons (I don't have the numbers to back that up though...yet). |
I completely understand. To be clear I only suggested this as a stopgap to alleviate the immediate pain. I absolutely agree that you shouldn’t have to combine runs in order to get the cache volume sharing semantics you expect. |
I am deprecating this issue in favor of #7211 , which is more up-to-date and more narrowly focused on cache volumes. |
Overview
In the new cloak core API, cache volumes are associated with a key provided by the client. Buildkit does not provide a facility for namespacing these keys, so by default every client that looks up the cache volume `foo` on the same engine will use the same cache volume. This is not practical, and will probably lead to ad-hoc namespacing solutions. It would be better for Dagger itself to handle namespacing. But how?

As explained by @vito (see original discussion below), the answer is probably namespacing by something roughly equivalent to a "project". This issue is to discuss whether this feature is actually needed, and if so how to design it.
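A minimal model of the problem being described: with no namespacing, the engine keys cache volumes by the raw client-provided name, so every client asking for `foo` shares one volume. The `Engine`/`Volume` types below are illustrative, not Buildkit's actual types:

```go
// Toy model of an engine that keys cache volumes globally by name.
package main

import "fmt"

type Volume struct{ data []string }

type Engine struct{ volumes map[string]*Volume }

// CacheVolume returns the volume for a key, creating it on first use.
// With no namespacing, the key is just the client-provided name.
func (e *Engine) CacheVolume(key string) *Volume {
	if e.volumes == nil {
		e.volumes = map[string]*Volume{}
	}
	if v, ok := e.volumes[key]; ok {
		return v
	}
	v := &Volume{}
	e.volumes[key] = v
	return v
}

func main() {
	var e Engine
	a := e.CacheVolume("foo") // client A
	b := e.CacheVolume("foo") // unrelated client B, same engine
	fmt.Println(a == b)       // true: both clients got the same volume
}
```

Any namespacing design amounts to deciding what else besides the name goes into that key, and who controls it.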
Original discussion
As discussed in #3287 (comment):
You are right. I see two possible paths here:
I'm leaning towards option 1. Mostly because it seems really hard to ship a UX with global scoping, then change it to support namespacing that can support something like a project. I could be wrong though.
Note: I think @vito's use of "project" is roughly equivalent to @sipsma's use of "session" in this discussion, is this accurate?
This part I'm OK with deferring to later. I think if we have a good project scoping primitive, we can find a way to add this feature later.
IMO we need a UX primitive that works in a single-tenant and multi-tenant context. Totally agree that we don't want to actually make a buildkit instance multi-tenant (at least not with untrusted tenants) but we might want to implement multi-tenancy at a higher level (with cloud etc). So if the UX has a hook for that, it would be ideal.