Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API for setting default env for run() invocations #36440

Open
staticfloat opened this issue Jun 26, 2020 · 10 comments
Open

API for setting default env for run() invocations #36440

staticfloat opened this issue Jun 26, 2020 · 10 comments

Comments

@staticfloat
Copy link
Sponsor Member

JLL packages would benefit greatly from an API to set default environment variables for future run() invocations; like a default setenv() that other setenv() invocations eventually get merged into. We don't want to modify ENV itself, as that is thread-unsafe.

Proposed API: Provide a with_child_env(pairs...) API just like setenv and withenv. This will set a dictionary that is merged into by run() that causes child processes to be initialized with that environment.

@StefanKarpinski
Copy link
Sponsor Member

What had been proposed in the previous github issue discussing that changing ENV is inherently thread-unsafe was that we initialize ENV as a normal Dict{String,String} that is a copy of the real environment and when spawning child tasks we populate the child tasks environment from ENV. Would that suffice to cover what you want? The only case that didn't cover was communicating with in-process libraries that are configured via the environment, which is hopefully rare and which we could potentially have some sort of very slow stop-the-world API for doing.

@staticfloat
Copy link
Sponsor Member Author

In the world of binary dependencies, we have to setenv() (right now, done by ENV[key] = value) all the time, as that's the main way that we are able to communicate to those libraries. Two examples off the top of my head:

From what I have read online, as long as we're (a) using glibc, and (b) loading external libraries (that can start their own threads) it will always be unsafe to touch the environment at all, because if at any point anyone does a getenv() while someone else does a setenv(), bad things can happen. That being said, if we assume that most programs are generally well-behaved, putting a lock around it to prevent Julia from stepping on its own toes should probably work for most things.

Your proposed changes would work for this usecase; we wouldn't even need to change anything; using withenv() as we're doing now would continue to "just work", although the Julia packages we would need to alter our ENV[key] = value mappings to use whatever setenv() API we end up creating.

@StefanKarpinski
Copy link
Sponsor Member

Unclear if that's a "yes" or a "no" on if my proposed approach would work...

@staticfloat
Copy link
Sponsor Member Author

Yes, it will work. That's what I was trying to communicate in my last paragraph.

@tkf
Copy link
Member

tkf commented Jun 30, 2020

@staticfloat Any reason why JLL packages do not provide a Cmd with appropriate env bundled in it (instead of the withenv-based API)?

This doesn't help the Cairo.jl and Gtk.jl examples, of course. But the point in the OP seems to be specific to run.

@staticfloat
Copy link
Sponsor Member Author

staticfloat commented Jul 1, 2020

Two reasons:

  1. Because back in Julia 1.2 (when I was writing the JLL package wrappers) command interpolation with setenv() didn't work at all:
$ docker run -ti julia:1.2
julia> bash_cmd_with_env = setenv(`/bin/bash`, "FOO" => "BAR")
setenv(`/bin/bash`,["FOO=BAR"])

julia> run(`$bash_cmd_with_env -c "echo \$FOO"`);


julia>

In Julia 1.3+, it does! :)

$ docker run -ti julia:1.3
julia> bash_cmd_with_env = setenv(`/bin/bash`, "FOO" => "BAR")
setenv(`/bin/bash`,["FOO=BAR"])

julia> run(`$bash_cmd_with_env -c "echo \$FOO"`);
BAR

julia>
  1. But even today, setenv() doesn't merge things properly:
$ docker run -ti julia:1.4
julia> bash_cmd_with_env = setenv(`/bin/bash`, "FOO" => "BAR")
setenv(`/bin/bash`,["FOO=BAR"])

julia> run(setenv(`$bash_cmd_with_env -c "echo \$FOO \$BAZ"`, "BAZ" => "QUX"));
QUX

So if a user needs to provide environment variables to this particular invocation of a command, it drops anything that was set previously. :(

@tkf
Copy link
Member

tkf commented Jul 1, 2020

setenv() doesn't merge things properly

Isn't it better/easier to fix this? For example, maybe we can have (say) addenv to appropriately merge the environment variables.

@staticfloat
Copy link
Sponsor Member Author

staticfloat commented Jul 1, 2020

I think we should fix setenv(), for sure. That doesn't necessarily mean that this PR is undesirable though. I can imagine loading, for instance, a GTK Themes package that needs to set environment variables for future processes I will spawn to use those themes; but it may not have anything to integrate into a Cmd object. I don't want to have to cmd = GTKThemes.setup_env(cmd) for every command I might possibly want to run, I just want run(cmd) to automatically inherit something from a global ENV.

And as we have already discovered; using ENV["FOO"] = "bar" as it is implemented today is not good.

@tkf
Copy link
Member

tkf commented Jul 1, 2020

Hmm... I'm a bit uneasy about the dynamic-scoping nature of with* block but I do see that it can be handy. If we keep supporting such construct, I think we'd need to make environment variables task-local and propagate them to child tasks (as @StefanKarpinski mentioned). We can't implement a safe withenv even if we lock ENV (since concurrent withenv blocks do not necessarily nest). I think introducing some kind of "context variables" #35833 would be a better foundation to fix this.

@tkf
Copy link
Member

tkf commented Jul 2, 2020

BTW, another related API is cd(f, dir). Currently, it's not "concurrent-safe" but a similar technique can be used to make it safe to use it with @async/@spawn. I'm not 100% sure if that's a good approach but I do find it's handy to cd into a working directory and then run a bunch of git commands without setting cwd and using write with relative paths.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants