Make passing caller environment variables to modules more seamless #6112
Comments
My preference would be to start with something more lightweight, like adding a convenience in the CLI, rather than making irreversible changes in the core... We have a lot to lose if we get this wrong: part of the appeal of the Zenith design is the encapsulation. It's easy to understand what the inputs and outputs of a function are. We should be very cautious about anything that risks breaking that. Here's what a CLI convenience might look like:
The end result of this CLI convenience is that, in the most common cases, one could set env variables in advance, and omit the bulk of repetitive arguments. This would solve the UX problem that is causing this feature request in the first place. All without compromising the hermetic nature of functions. |
When designing this, please consider the Enterprise use cases too. For example, I have to pass my company proxy (sometimes authed) through to multiple areas for us to be able to use Dagger, because anything that goes out of our network has to go through our egress proxy. This is kinda? related - #6077 |
@shykes I like the general idea of your proposal in that we could make this a CLI-exclusive piece of functionality to start. The auto-matching of arg name to env var would be the most magical part and I can definitely think of various corner cases or situations it might not handle super well. But that's obviously fine provided we at least improve the situation significantly. I'm thinking about:
But again, entirely possible those cases don't matter, so I'm curious from users who have run into a need for this: would the idea Solomon outlined work for you? I'm especially interested in the env vars you want from the caller and whether those translate well into required args w/ matching names. |
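For concreteness, here is a rough Python sketch of the auto-matching being discussed. The mapping rule (kebab-case flag name to SCREAMING_SNAKE_CASE env var) and the helper name are assumptions for illustration, not an actual Dagger API:

```python
import os


def resolve_env_args(required_args, environ=None):
    """Hypothetical sketch: for each function argument the caller did not
    supply, look for a caller env var whose name is the SCREAMING_SNAKE_CASE
    form of the kebab-case flag name."""
    environ = os.environ if environ is None else environ
    resolved = {}
    for arg in required_args:
        env_name = arg.replace("-", "_").upper()  # github-token -> GITHUB_TOKEN
        if env_name in environ:
            resolved[arg] = environ[env_name]
    return resolved


# Example: only args with a matching env var get filled in.
print(resolve_env_args(["github-token", "region"],
                       {"GITHUB_TOKEN": "ghp_x", "HOME": "/root"}))
# → {'github-token': 'ghp_x'}
```

Even this tiny sketch surfaces the corner cases mentioned above: a flag like `no-cache` maps to `NO_CACHE`, which may collide with an unrelated variable already in the caller's environment.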
I have two use cases in mind where passing environment variables through would be beneficial:
I understand using GHA's or Jenkins' secret management solution is not ideal for many reasons, but it's the reality today, especially for companies just getting started. Being able to introduce Dagger to an existing pipeline step-by-step is crucial for adoption. On the DX side: I realized I just don't enjoy writing a function with dozens of parameters. Everything in me tells me not to do it. If I have to give up those principles in Dagger...that's not a great experience. On the UX side: I don't enjoy passing an endless number of arguments to commands either, regardless of whether it's in the GHA yaml hell or in my Makefile. I'm not sure adding new behavior on top of function arguments is going to improve that DX/UX for me. I would have to keep in mind how it works; that behavior is not immediately obvious from the code. Also, having to match parameter names with env var names...is a deep rabbit hole I'd like to stay away from (learned my lesson with Viper). Combining this issue with another one (too many function arguments), one potential solution is being able to pass in structs to functions:

```go
type Args struct {
	RunningInGHA Optional[bool]    `env:"CI"`
	GitHubToken  Optional[*Secret] `env:"GITHUB_TOKEN"`
}

func (m *Ci) Ci(args Args) error {
	// ...
}
```

Keeping security in mind, I agree that this should probably be limited to main modules (though it may become an inconvenience at some point if you have to pass a GitHub token to every single module). I could also imagine adding an explicit list of env vars to dagger.json to pass through. I wouldn't mind using some sort of experimental flag/config for this while the final details are being worked out (eg. dagger.experimental.json), although that's another incredibly deep rabbit hole. I don't think the case for env vars is going away anytime soon, and changing the semantics drastically and implicitly is going to make it harder for people to understand what's going on. |
Thanks for the thoughts @sagikazarmark, still digesting fully but want to leave some immediate thoughts before I forget:

```go
func (m *MyMod) Fn(struct {
	Foo string
	Bar Optional[int]
}) error {
	// ...
}
```

But we haven't fully fleshed it out and there are a handful of problems to figure out still. Support for custom struct tags (which could encompass the `env` tag idea) is something we've discussed. Just letting you know that it's on our radar.
The idea being that we bucket the sorts of use cases that call for "complex interaction between the caller host and modules" into "advanced use cases" and thus require that, rather than invoking the CLI directly, users write Dagger SDK code that implements the logic (reading the right env vars or other host resources and passing them as the right arguments). So with #5993 you'd be able to write a short dagger script that runs on the host but can still easily call modules. You'd just use the SDK directly. I think this approach strongly requires that the majority of use cases calling for this behavior are those that run in CI or other "non-human" environments. If, however, many users want this "auto-read env var" support for use cases that involve direct human interaction, it breaks down and would at minimum require tons more polish or just be non-viable. The most compelling argument for this is that Dagger is already scriptable and runnable outside of modules, so why not use that existing functionality to support these sorts of use cases. It also makes sense in that it may be a Sisyphean task to really support every possible variation on how users could want to map caller env vars (and other host resources) to module function args. I could keep going but curious what others' immediate reactions are. |
This is a MAJOR part of the appeal of dagger for us. Simply being able to run the same pipeline in local and in CI is a top feature of dagger for us. We simply can't drop our existing CI systems, and with dagger, having to maintain dagger code + CI system DSL will reduce the appeal of dagger within my company. Can I add another suggestion? Tell me if I'm crazy and this is a no-no. How about using dagger.json to have an explicit list of env vars to pass through? We can even give the choice to the module developer on when they can use the ENVs. They are exposed as internal ENV to every function within the module, but the function dev would have to explicitly opt in to read them. I don't know if there are any architectural/technical limitations to something like that. But I can see a few benefits Pros
Cons
|
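As a sketch of what the dagger.json allowlist suggested above might look like (the `passthroughEnv` key is purely hypothetical, not an existing Dagger config field):

```json
{
  "name": "my-module",
  "sdk": "go",
  "passthroughEnv": ["HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY", "CI"]
}
```

An explicit list like this keeps the pass-through visible and vettable: a consumer can see exactly which caller env vars a remote module would receive before running it.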
Thanks @sipsma!
Yeah, I'm aware of that feature, but it falls in the bucket of "I don't enjoy writing that code".
I haven't decided if it's the right way to think about this, but one of the reasons I liked switching to Zenith is the ability to drop Mage (and even Go) as a dependency for running my pipelines. If I understand correctly, this would essentially bring those back as dependencies. I can't really imagine what this would look like, particularly the env var pass-through, but I probably wouldn't want some additional step or code in my pipeline just to read env vars.
My immediate reaction to your second point: I just want env vars to work. 😄 I'd prefer it if it required minimal work from me to achieve that. And I also want it to be obvious to whoever reads my pipeline code how it works. Doesn't mean I'm right though. 😄 |
This is essentially what I'm looking for as well. Env vars 'just work' in almost every tool out there, I don't want to have to explain to colleagues that they now need to do something else with env vars.
I like this idea, but I'm not sure how well it would work in practice. The config side is simple and easy to understand. Some form of explicit fetch would be required to avoid breaking the above-mentioned workflow of "if a parameter value is provided use it, if none is provided use a default value I set here" by injecting a value in the middle of those conditions, but that also introduces another condition a module might have to check...
Also fine with an implementation like this, except the interactive prompt bit. No need for that if it can be explicitly enabled/disabled. As long as it's quick and simple to 1) see if this behaviour is on/off and 2) change the behaviour to on/off I'm happy. A really clear way of displaying this to users is also going to be super helpful - perhaps something in the |
Unfortunately, the reason env vars "just work" in other tools is also the reason devops is such a mess, and Dagger is needed to clean it up: the composition model is 40 years old, and it's just not good enough. If duct-taping shell scripts together with env variables and yaml actually scaled, we wouldn't be here having this conversation. A big part of the Dagger "magic" is balancing a) a better composition system, with b) a familiar user experience that fits into your existing stack. So that's our dilemma: to deliver on a), Dagger must break free of the traditional use of env vars for composition (because they are basically a worse version of arguments). But to deliver on b), we need env vars to be a first-class citizen in our UX. I just want to manage expectations: we will bend over backwards to make the UX better. But we will not, cannot compromise on the scalable composition model. A good rule of thumb for evaluating whether a proposal "breaks the composition model": does it create a system of "shadow arguments"? If the answer is yes, then it breaks the composition model and we shouldn't do it.
IMO this is a shadow argument system. Just use arguments + UX convenience.
IMO there's no benefit to asking module developers to think about env variable names. They already chose a name for their argument. Why make them choose a different name? That's just a step towards a shadow argument system.
It's a UX detail, but I think it will be needed for security reasons. Picture an untrusted module with an argument called
Yes agreed. |
I see what you mean here, and I agree. I'll give an example where just arguments would be messy and repetitive. Within my company, we have to utilize egress proxies for a lot of what we do, so if we pass the proxies as args, it gets messy and repetitive fast. Don't get me wrong, I really like the isolation and composition nature of what zenith is aspiring to do. But having to jump through hoops to set up a ubiquitous env like the proxies might become too tedious. No matter how hard we try, we can't have dagger modules behave exactly the same way in CI systems as they behave on local machines. There are restrictions that make it necessary to tweak the behavior of a module based on where it's running. Environment variables play a big part in CI systems, and having an easy way to pass them through will help adoption of Dagger. The less verbose we can make our existing DSL (which is still needed in some capacity to run dagger), the better the chance people will like/use it.
Surfacing the module's env var consumption to the module consumer (warnings/documentation etc.) could alleviate this (though it could also get annoying). But yeah, this is an issue inherent to most CLIs today, and I haven't come across many that actually warn/inform users of env var usage. I am glad Dagger is trying to break away from this.
Don't module developers still have to think about env vars like proxies? Without them, their module won't work.
An interactive prompt won't work in CI though. So there will have to be some way to tell a module it's running in CI and run non-interactively. Today with non-zenith dagger I do this by looking at the CI system env vars. I apologize for the long-winded response without any new proposals for a solution. But I am trying my best to represent the big Enterprise landscape from my experience and help Dagger run virtually anywhere. |
The only place where using env vars is beneficial is where you execute a pipeline using ‘dagger call’. I’d probably focus on those cases. I agree that env vars break composition in every other scenario (they are like global vars). But for pipeline entrypoints called from the CLI, I’d argue they are actually part of the interface. Here is what I would really like to avoid: passing countless arguments to commands just to avoid using env vars AND passing countless arguments to functions. As I said before: I don’t enjoy writing and maintaining those at all. |
All good points - I didn't properly consider how some automagic around env vars might compromise the composition model. Thanks for explaining that! If the end result of improving the env vars experience is a CLI convenience and more explicitness in the CLI that's fine - I'd be quite happy with that.
Interactive prompts (and digging through what is often several wrong answers before finding the right way to disable) are often a pain point, alleviating that is going to be important. To reply to your example, other marketplaces typically put the onus on the user to be responsible for knowing what they're downloading/running. It's trivially simple to install and run a malicious package from npmjs/pypi for example, and so the process of using those package repositories is super quick with no/minimal barriers. Not saying that's necessarily the right idea, just outlining what I think developers/engineers are used to. This does put Dagger a step ahead of other marketplaces in security posture, which if implemented I'd definitely highlight when talking about it! |
@nipuna-perera Proxy env vars are a great point to bring up, but I think they should be treated as an exceptional case. I definitely don't think module authors should ever have to think about them, because that's tedious repetitive work across the entire ecosystem. Like you said, proxy env config applies to everything. If every module author had to remember to add params for it, anyone using a proxy would end up constantly blocked. In Concourse we decided that this was an infra-level concern. Workers would pick up the "standard" proxy env vars and apply them. Aside from proxies, I'm in full agreement with #6112 (comment) |
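That infra-level approach can be sketched in a few lines. Here is a hypothetical helper (not part of any Dagger API) that forwards only the conventional proxy variables from the caller's environment:

```python
import os

# The conventional proxy variables, in both common capitalizations.
PROXY_VARS = [
    "HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY",
    "http_proxy", "https_proxy", "no_proxy",
]


def collect_proxy_env(environ=None):
    """Return only the standard proxy settings from the caller's env,
    suitable for forwarding as an infra-level concern rather than as
    per-module function arguments."""
    environ = os.environ if environ is None else environ
    return {k: environ[k] for k in PROXY_VARS if k in environ}


print(collect_proxy_env({"HTTPS_PROXY": "http://proxy:3128", "PATH": "/bin"}))
# → {'HTTPS_PROXY': 'http://proxy:3128'}
```

Because the set of variables is fixed and well-known, this stays out of "shadow argument" territory: nothing module-specific is smuggled in through the environment.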
Just one more thought: when designing this, consider that proxy settings can be challenging as they apply to the engine, not the client. In my case, I had the engine running on an AKS in the Azure cloud, which did not require any proxy settings, while the client was on my local computer in the company's network, which did require proxy settings. This automation caused complications because the proxy settings from my client were automatically applied to the container. However, since the proxy was only accessible from the company's network and not from the Azure cloud, I had to manually override the proxy environment variables to make networking work. |
@skycaptain that is a great and important point. Since 0.9, Dagger now has the plumbing to explicitly tunnel outbound traffic from the container back through the client's network stack. At the moment we don't use that capability as a way to address the problem you describe (build tools naively applying, on a remote host, proxy rules that will only work on the client). But with a little bit of polish, I think we could. Here's an artisanal example of something you could do today, to illustrate the general idea:

```go
func example(ctx context.Context, c *dagger.Client) {
	// A simple container
	ctr := c.Container().From("ubuntu")
	// Create a tunnel to an external service on the host network
	googleDotCom := c.Host().Service([]dagger.PortForward{
		{Frontend: 80, Backend: 80},
		{Frontend: 443, Backend: 443},
	})
	// Bind the host service to the container
	ctr = ctr.WithServiceBinding("www.google.com", googleDotCom)
	// Consume the service tunneled via the host network.
	result, err := ctr.WithExec([]string{"curl", "-L", "www.google.com"}).Stdout(ctx)
	_, _ = result, err
}
```

This should successfully pick up your client host's VPN configuration etc. It wouldn't apply HTTP proxy configuration, but support for that could be added to the Dagger CLI. And of course you would want a way to route all of a container's outbound traffic via the host, so the API would need some sort of catch-all, which doesn't exist today. So to be clear, this isn't a working solution, just a thought experiment. cc @vito |
Thanks for highlighting this feature. I can envision using it to access private services within our company's network, while still running builds in the cloud and without the need to make these services accessible in the cloud. However, I believe there may be use cases where tunneling all traffic might not be optimal, considering traffic speed (upload speed from my network) and costs with large build artefacts, especially in the cloud.
We are currently exploring the use of Dagger as the single solution to use in both local builds and CI. Dagger is appealing because it allows us to seamlessly integrate various build tools such as Npm, Python, Yocto, Conan, CMake, etc. It eliminates the need for any setup on a developer's machine (except dagger) and isolates all builds. However, in a zero-trust company environment, developers are required to authenticate against all services (Gitlab, binary repository manager, code quality services, etc.) using tokens whenever they access private dependencies, sources and artefacts during local development. This requires a convenient method for developers to provide their credentials to a container. Currently, we are utilising a combination of .netrc files, .env files (dotenv-cli), and password managers (such as 1password-cli), which pass secrets as env variables to sub-shells. It would be great to have a convenient way for developers/users of my modules to pass their credentials without having to pass them one by one as command line arguments.
Wouldn't that apply to the entire software supply chain when using environments to store secrets, since any library, module, or whatever is being run could be malicious and leak your secrets? Without having too deep an understanding of the architecture of Dagger, I could imagine functions such as |
To pass the envs from the caller shell to the module, I just wrote this dirty workaround. It's surely terrible, but I still want to share it with you. I'm not sure the same is possible with the other SDKs.

```python
import functools
import inspect
import json
import os
from typing import Optional

from dagger import function


def accept_envs(f):
    @functools.wraps(f)
    def _(*args, **kwargs):
        envs = kwargs.get("envs")
        if envs is not None:
            os.environ.update(json.loads(envs))
        return f(*args[:-1])  # Remove the envs from args

    # Faking the signature to trick dagger to accept the envs parameter
    signature = inspect.signature(f)
    signature = signature.replace(
        parameters=[
            *signature.parameters.values(),
            inspect.Parameter(
                "envs",
                inspect.Parameter.KEYWORD_ONLY,
                default=None,
                annotation=Optional[str],
            ),
        ]
    )
    _.__signature__ = signature
    _.__annotations__ = {**f.__annotations__, "envs": Optional[str]}
    return _


@function
@accept_envs
def hello(name: str | None = None) -> str:
    if name is None:
        name = os.getenv("USER", "world")
    return f"Hello {name}!"
```

```python
#!/usr/bin/env python3
import json
import os
import subprocess
import sys

if __name__ == "__main__":
    envs = json.dumps(os.environ.copy())
    if len(sys.argv) == 1:
        sys.exit(0)
    if sys.argv[1] == "--":
        args = sys.argv[2:]
    else:
        args = sys.argv[1:]
    subprocess.call([*args, "--envs", envs])
```

This can then be called with |
Would be awesome to have something like this to inject envs from a .env file:

```go
// List files uploaded to Container
func (m *Ci) ListDir(ctx context.Context, dir *Directory) (string, error) {
	godotenv.Load("../.env")
	tag := os.Getenv("REGISTRY_IMAGETAG")
	reg_user := os.Getenv("REGISTRY_USER")
	return dag.
		Container().
		From("alpine").
		WithEnvVariable("REGISTRY_IMAGETAG", tag).
		WithEnvVariable("REGISTRY_USER", reg_user).
		WithMountedDirectory("/dir", dir).
		WithExec([]string{"sh", "-c", "printenv"}).Stdout(ctx)
}
```

and run:
|
Env vars are a standard mechanism for handling secrets and configs, but primarily secrets. On any CI/CD system, we have several of those env vars, most of them critical from a security perspective; passing all those env vars using CLI args is not only cumbersome but also a security risk from many standpoints. Almost any tool can directly ingest OS environment variables, and Dagger should do the same or allow an easy way to do this.
Dagger can be used in many different scenarios, but this is what stands on the main page.
and in CI/CD scenarios, env var handling is critical |
@bitgandtter having to explicitly pass all env variables can indeed be cumbersome (and we will add conveniences to address that in the future) but it is NOT a security risk. On the contrary, sandboxing Dagger Functions by default makes them more secure. I am curious to hear how you think it would hurt security? Thanks. |
@shykes, indeed, having it all in a sandbox makes the inner process more secure, but it does not directly prevent secret exposure. That said, getting back to the argument passing: while it's true that this is not a Dagger issue itself, developers or CI/CD tools can make mistakes in exposing/printing those secrets to the logs/output, for example:
the second argument directly depends on the CI/CD tool handling/masking the output of that inner CLI exec. Usually most CI/CD tools and DevOps/developers know about this and take care. But what if Dagger automatically loads all those environments? Then there is no need for that extra step, helping DevOps/developers hide those values. Not sure if I managed to explain the use case and how Dagger can help there. PS: sadly, not all developers/DevOps check their code/pipelines to prevent that data leak |
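To make the concern concrete, here is a small, self-contained Python sketch (not Dagger-specific) showing that a secret expanded into a command line ends up verbatim in the child process's argv, where any command logger, shell trace, or `ps` on the host can see it:

```python
import subprocess
import sys

token = "super-secret-token"  # stand-in for e.g. a print-access-token result

# Passing the secret *value* as an argument: anything that records the
# command line (CI logs, `set -x` traces, `ps` output) now contains it.
cmd = [sys.executable, "-c", "import sys; print('argv:', sys.argv[1:])", token]
out = subprocess.run(cmd, capture_output=True, text=True).stdout
print(out)
```

This is exactly the failure mode that passing the env var *name* (rather than its value) avoids: the secret value never appears on any command line.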
You shouldn’t really be using print-access-token for automation; instead, make a secret JWT available via Vault, Infisical, Doppler, etc. All of those would deliver the token securely to an environment variable and would work across environments. |
Agreed, @rawkode. My point is that neither Dagger nor anyone/tool can control how DevOps/developers use it. That was just an example of how it can be used. If dagger ingests env vars automatically, then there is no need for that kind of approach. |
Perhaps there’s a compromise where the dagger.json can accept a list of environment variables to automatically propagate and then each module can be very specific about what it needs; this adds some security that we can vet remote modules and their env access. |
With this approach, our arguments to functions could be annotated to attach to existing environment variables. |
I understand. Dagger is not affected by this particular problem, because it doesn't allow passing the value of the env variable: you have to pass the name of the variable. So in your example, that potentially insecure command will not work. Instead you would use `--my-secret=env:MY_SECRET`.
This convenience comes at a steep price: a weaker sandbox, which makes reusing Dagger Functions less safe and less reliable. It's not a price we're willing to pay. But, the good news is that there are other ways to add convenience without compromising on strong sandboxing. In the meantime, you can safely pass environment variables from your CI environment to any Dagger Function, knowing that you are doing so securely. |
@shykes when you said
So in your example, that potentially insecure command will not work. Instead you would use --my-secret=env:MY_SECRET. I'm not sure. I'm using it right now sort of like that, I mean
on both local and GHA |
@bitgandtter you must be using an old |
using locally
|
@bitgandtter that is very surprising. Definitely not expected behavior. Possible explanations:
🤷 |
option 4 for sure ;) That said, and as I mentioned before, it's a DX issue: the need to pass the values puts developers in the position of either:
this can be solved by
Yes, this may be naive, but I just want to highlight the point that developers are allowed to expose secret values |
You're right that there will always be a way to leak secrets. Dagger doesn't aim to solve every security issue. If we 1) don't make security worse, and 2) make security better on average; we are in a good place. |
Right now callers can pass their environment variables to modules by just reading them and setting them to function arg values, e.g. w/ the cli:
This suffices for straightforward use cases but multiple users have found it to be tedious + clunky once you need to pass a lot of env vars. It also isn't ideal for some use cases where the caller may not know what they are even supposed to pass.
This issue is to discuss any possible improvements we can make.
One very obvious potential solution would be to just give module code an API to read these from the caller. We intentionally didn't go that route because it pokes a really big hole in the idea of modules being sandboxed. This is bad from both a security and a portability perspective.
Some ideas around potential solutions (not mutually exclusive):