Support `Socket` args from the CLI #6747

sipsma · 2024-02-27T03:56:29Z

Right now you can't invoke Functions that accept args of type `Socket` from the CLI. In theory, this should be as simple as supporting e.g. `dagger call fn --sock unix:///var/run/docker/docker.sock` or `dagger fn --sock tcp://localhost:1234`, etc.
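To make the two address forms above concrete, here is a minimal sketch of how a `--sock` value might be validated before it is turned into a `Socket`. The `SocketAddr` type and `parseSocketAddr` helper are hypothetical names for illustration, not part of the Dagger CLI; only the `unix://` and `tcp://` forms come from the issue text.

```go
package main

import (
	"fmt"
	"net/url"
)

// SocketAddr is a hypothetical parsed form of a --sock flag value.
type SocketAddr struct {
	Network string // "unix" or "tcp"
	Address string // filesystem path (unix) or host:port (tcp)
}

// parseSocketAddr accepts values such as unix:///path/to.sock and
// tcp://host:port, and rejects everything else.
func parseSocketAddr(raw string) (SocketAddr, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return SocketAddr{}, fmt.Errorf("invalid socket address %q: %w", raw, err)
	}
	switch u.Scheme {
	case "unix":
		// unix:///path has an empty host and an absolute path.
		if u.Host != "" || u.Path == "" {
			return SocketAddr{}, fmt.Errorf("unix socket address %q must look like unix:///path", raw)
		}
		return SocketAddr{Network: "unix", Address: u.Path}, nil
	case "tcp":
		if u.Hostname() == "" || u.Port() == "" {
			return SocketAddr{}, fmt.Errorf("tcp socket address %q must look like tcp://host:port", raw)
		}
		return SocketAddr{Network: "tcp", Address: u.Host}, nil
	default:
		return SocketAddr{}, fmt.Errorf("unsupported socket scheme in %q", raw)
	}
}

func main() {
	for _, raw := range []string{
		"unix:///var/run/docker/docker.sock",
		"tcp://localhost:1234",
	} {
		addr, err := parseSocketAddr(raw)
		fmt.Println(addr.Network, addr.Address, err)
	}
}
```

Rejecting unknown schemes up front keeps the flag surface small; the harder part, deciding which modules may then use the resulting socket, is the concern raised in the next paragraph.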

However, we need to be very careful when implementing this to not accidentally create a backdoor where any (malicious) module could gain access to any host socket. I think this will require explicit tracking of which modules have access to a given socket based on being passed it as an arg (or being passed an arg like a Container that has a socket embedded somewhere in its DAG definition).
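As a rough illustration of the explicit tracking described above, the sketch below keeps a per-module set of socket IDs and only allows access to sockets a module was actually handed. `SocketAccess`, `Grant`, and `Allowed` are hypothetical names, and the module and socket ID strings are made up; this is not how the engine implements it.

```go
package main

import (
	"fmt"
	"sync"
)

// SocketAccess records which modules were explicitly handed which sockets.
// Hypothetical type for illustration only.
type SocketAccess struct {
	mu     sync.RWMutex
	grants map[string]map[string]struct{} // module name -> set of socket IDs
}

func NewSocketAccess() *SocketAccess {
	return &SocketAccess{grants: make(map[string]map[string]struct{})}
}

// Grant is meant to be called when sockets reach a module through a function
// argument, either directly or embedded in another argument such as a Container.
func (s *SocketAccess) Grant(module string, socketIDs ...string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	set, ok := s.grants[module]
	if !ok {
		set = make(map[string]struct{})
		s.grants[module] = set
	}
	for _, id := range socketIDs {
		set[id] = struct{}{}
	}
}

// Allowed reports whether the module was ever granted the given socket.
// Modules with no grants are denied by default.
func (s *SocketAccess) Allowed(module, socketID string) bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	_, ok := s.grants[module][socketID]
	return ok
}

func main() {
	access := NewSocketAccess()
	access.Grant("trusted-module", "socket:docker")

	fmt.Println(access.Allowed("trusted-module", "socket:docker"))
	fmt.Println(access.Allowed("other-module", "socket:docker"))
}
```

The default-deny lookup is the important property: a module that was never passed a socket cannot reach it just by guessing or constructing its ID.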

Also need to make sure this is done in such a way that we don't invalidate cache with any random IDs.

Related: #6726 (comment)
Also related (plugs previous hole that made it possible to access these sockets from a module): #6748


jedevc · 2024-02-27T11:14:49Z

> I think this will require explicit tracking of which modules have access to a given socket based on being passed it as an arg

I think this sounds similar to what I was suggesting for secrets as well - #6601 (comment). Like I mentioned there, I think the tricky part is needing to handle the case where you pass objects around that contain secret fields (in a potentially nested way).

But I do really like this as the way that we perform isolation between these sensitive types, Sockets, Secrets, potentially even Files/Containers (not sure if these can be accessed via raw graphql today).

Regardless, kinda neat to have both thought up the same way of performing better module isolation, it definitely has my vote.

sipsma · 2024-03-20T17:40:43Z

Just to update, been slowly chipping away at the pre-reqs needed to support this (#6806 and #6836). There's one last PR needed to add support; it involves some internal refactoring that I need to coordinate with other refactorings going on now, but should make it to a release in the near future.

helderco · 2024-05-10T12:16:45Z

@sipsma, can you give an update on the pre-reqs here?

sipsma · 2024-05-10T16:04:40Z

@helderco Yeah sorry, there's been a ton of work going on that I've been mentioning in various other PRs, but I will summarize the status here.

Basically, in order to support this while also not creating a backdoor that would allow any malicious module to access any socket on the host, we need fine-grained isolation of session resources. But before I started this effort, the lack of isolation was a very deeply embedded assumption over almost the entire codebase.

Worth noting that this same problem applies to secrets, host services, registry auth, etc. so those holes are all gonna get plugged alongside this effort. The same general problem also applies to cache volumes once we want to isolate those.

There was a path to do this that just layered more hacks and confusing code on top of our existing pile of hacks and confusing code (which had grown "organically" during our various efforts over the past year, as is natural), but this felt like the right time to do things cleanly, so I have been going with that.

Thankfully it's already benefitted quite a few seemingly unrelated issues too.

The things already merged as part of this effort:

- dagql ID format + digest improvements to support ID walking #6836
  - Not related to isolation directly, but needed in order to identify what session resources are part of a given call without hitting O(2^n) codepaths, which will be needed for fine-grained isolation
  - Also reduced the size of IDs from O(2^n) to O(n), which helps performance across the engine and cloud
- engine: isolate buildkit client+session to each client #6806 + engine: nested exec simplifications and service fixes #7213
  - Did some initial isolation and simplification of how sessions+clients are configured
  - Also ended up being needed to fix a bug with services (which are tied to sessions)
- engine: remove ftp_proxy hack #7228
  - Fixed a really dumb hack we'd been relying on for a long time around passing random IDs associated with sessions/clients without busting their cache.
  - Ended up also reducing the flakiness of our CI enormously
- core: support automatic installation of custom CA certs. #7067 and engine: support for system proxy settings #7255
  - While not directly related, support for CAs/proxies relied on some of the above PRs and also took care of a few more things needed for this effort (specifically, introduction of our own buildkit worker implementation that can be customized per session+client+llb vertex)
- core: don't bust cache with random OTEL values in function calls #7336
  - Once again not directly related to isolation, but was a bug fix that was only possible via the above PRs
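To illustrate why ID walking needs to avoid O(2^n) codepaths, here is a minimal sketch of a digest-memoized walk over a call DAG in which each call references its input twice. The `ID` struct, its fields, and `collectSockets` are simplified stand-ins invented for this example and are not the real dagql ID format.

```go
package main

import (
	"fmt"
	"sort"
)

// ID is a simplified, hypothetical stand-in for a call ID: a node in a DAG
// identified by a digest, with the IDs it depends on (receiver and args).
type ID struct {
	Digest     string
	SocketAddr string // non-empty if this call produces a socket
	Inputs     []*ID
}

// collectSockets walks the DAG once per unique digest and returns the socket
// addresses it found plus the number of nodes visited.
func collectSockets(root *ID) ([]string, int) {
	seen := make(map[string]bool)
	var sockets []string
	var walk func(id *ID)
	walk = func(id *ID) {
		if id == nil || seen[id.Digest] {
			return // already handled; this is what avoids the exponential blowup
		}
		seen[id.Digest] = true
		if id.SocketAddr != "" {
			sockets = append(sockets, id.SocketAddr)
		}
		for _, in := range id.Inputs {
			walk(in)
		}
	}
	walk(root)
	sort.Strings(sockets)
	return sockets, len(seen)
}

// buildChain creates a chain where every call references the previous call
// twice, so a walk without memoization would visit on the order of 2^depth nodes.
func buildChain(depth int) *ID {
	node := &ID{Digest: "sock", SocketAddr: "unix:///var/run/docker/docker.sock"}
	for i := 0; i < depth; i++ {
		node = &ID{Digest: fmt.Sprintf("call-%d", i), Inputs: []*ID{node, node}}
	}
	return node
}

func main() {
	sockets, visited := collectSockets(buildChain(30))
	fmt.Println(sockets, visited)
}
```

With a depth of 30 the memoized walk touches 31 unique nodes, whereas following every edge without the `seen` check would take over a billion steps for the same answer.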

What I'm working on now:

- WIP refactor of server/sessions/buildkit-interfaces #7315
  - This is a follow-up to the first refactor that did some initial isolation; it finishes the rest of the isolation required for the various buildkit/dagger "entities" involved in all this
  - Also does a ton of simplification, which is becoming more and more necessary as the current convolution is proving to be a bountiful source of bugs

The last follow-up after the above will be the actual isolation of the resources themselves, using dagql IDs to find exactly what a given function call should have access to and only giving it access to those resources.

Other things that will be made possible once this is finished:

- We will no longer need the shim at all, which is currently a source of performance overhead and extreme convolution
- We can stop doing insane grpc-in-grpc tunneling (something we've picked up from buildkit) just for modules/nested execs to connect back to the engine, which is another source of performance overhead and confusing code/behavior.
- Storage drivers, specifically cache volumes, have a much simpler path to implementation via the new buildkit worker added to support all this

cc @aweris @shykes I know you have been eagerly anticipating this and it's been taking a while, but it's all nearing its end now and in the grand scheme of things will benefit quite a bit besides the issue itself

helderco · 2024-05-10T23:11:14Z

Outstanding! 🙌

Thank you for that amazing update (and amazing work!). I’ve seen those issues and PRs pop up here and there but didn’t have time to dig into everything, so I very much appreciate the bird's-eye view here. It's exactly what I needed.

> There was a path to do this that just layered more hacks and confusing code on top of our existing pile of hacks and confusing code (which had grown "organically" during our various efforts over the past year, as is natural), but this felt like the right time to do things cleanly, so I have been going with that.

Definitely! I always appreciate investing in some housekeeping to keep things more maintainable, otherwise we drown in technical debt.

sipsma added this to the v0.10.x milestone Feb 27, 2024

sipsma mentioned this issue Feb 27, 2024

core: only allow Host.unixSocket to be used from main client #6748

Merged

sipsma modified the milestones: v0.10.x, v0.10.1 Feb 27, 2024

sipsma self-assigned this Feb 27, 2024

sipsma mentioned this issue Mar 2, 2024

engine: isolate buildkit client+session to each client #6806

Merged

sipsma modified the milestones: v0.10.1, v0.10.x Mar 5, 2024

sipsma mentioned this issue Mar 6, 2024

dagql ID format + digest improvements to support ID walking #6836

Merged

3 tasks

sipsma mentioned this issue Mar 21, 2024

[WIP] Finish isolating sessions to each client #6916

Closed

shykes mentioned this issue Apr 27, 2024

Add socket support to CurrentModule #6726

Open

sipsma mentioned this issue Apr 29, 2024

Namespace cache volumes by module #7211

Open

sipsma mentioned this issue May 13, 2024

Remove shim #7367

Merged

5 tasks
