Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

makeBinaryWrapper: Variable expansion at runtime #172583

Open
bergkvist opened this issue May 11, 2022 · 39 comments
Open

makeBinaryWrapper: Variable expansion at runtime #172583

bergkvist opened this issue May 11, 2022 · 39 comments

Comments

@bergkvist
Copy link
Member

bergkvist commented May 11, 2022

makeBinaryWrapper: Variable expansion at runtime

Right now, several packages (typically desktop GUI applications) that are wrapped with makeWrapper (shell version) depend on variable expansion at runtime (as opposed to build time) happening in the bash wrappers. This is currently not possible to do using the existing implementation of binary wrappers.

makeWrapper "${browserBinary}" "$out/bin/chromium" \
--add-flags ${escapeShellArg commandLineArgs} \
--add-flags "\''${NIXOS_OZONE_WL:+\''${WAYLAND_DISPLAY:+--enable-features=UseOzonePlatform --ozone-platform=wayland}}"

Above you can see an example of such an application (chromium) being wrapped in a way which currently can't be migrated to using binary wrappers.

wordexp.h

wordexp.h (https://man7.org/linux/man-pages/man3/wordexp.3.html) seems like it could enable this for binary wrappers as well. Below you can see an example using the string from the chromium bash wrapper.

#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <wordexp.h>

int main(int argc, char **argv) {
    putenv("NIXOS_OZONE_WL=1");
    putenv("WAYLAND_DISPLAY=1");
    
    const char *str = "\''${NIXOS_OZONE_WL:+\''${WAYLAND_DISPLAY:+--enable-features=UseOzonePlatform --ozone-platform=wayland}}";
    wordexp_t p;
    int err = wordexp(str, &p, 0);
    if (err) {
        printf("wordexp failed with error code %i\n", err);
        return err;
    }

    printf("p.offset = %i\np.we_wordc = %i\n", p.we_offs, p.we_wordc);
    for (int i = 0; i < p.we_wordc; i++) {
        printf("p.we_wordv[%i] = \"%s\"\n", i, p.we_wordv[i]);
    }
}

stdout

p.offset = 0
p.we_wordc = 3
p.we_wordv[0] = ""
p.we_wordv[1] = "--enable-features=UseOzonePlatform"
p.we_wordv[2] = "--ozone-platform=wayland"

It seems like wordexp will split words into seperate entries in the output array. Note that several characters are unsupported (like newlines) in the string passed to wordexp. It will cause wordexp to return an error code; so wordexp is likely not something that can/should be used as a default for --set, --set-default, --add-flags, etc.

Thanks to @thiagokokada for the suggestion of using wordexp.h.

Related:

@thiagokokada
Copy link
Contributor

Well, this definitely looks much better than I thought initially. I think implementing this in makeBinaryWrapper will get us to a 95% replacement for makeWrapper, and the only remaining cases will be some very specific corner cases.

@thiagokokada
Copy link
Contributor

Also, returning an error code can be thought as an advantage. We can capture those errors and print a user friendly message to help with debugging.

@bergkvist
Copy link
Member Author

bergkvist commented May 11, 2022

I'm wondering if these errors would be possible to detect/rule out at build time already (like the string containing an explicit newline), or if there are some errors that would only be possible to detect at runtime by setting environment variables to something bad.

Another discovery, which seems like it could be very useful when it comes to flags in particular:

// gcc main.c -o main
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <wordexp.h>

int main(int argc, char **argv) {
    const char *str = "$0 -before $@ -after";
    wordexp_t p;
    int err = wordexp(str, &p, 0);
    if (err) {
        printf("wordexp failed with error code %i\n", err);
        return err;
    }

    printf("p.offset = %lu\np.we_wordc = %lu\n", p.we_offs, p.we_wordc);
    for (int i = 0; i < p.we_wordc; i++) {
        printf("p.we_wordv[%i] = \"%s\"\n", i, p.we_wordv[i]);
    }
}

Running ./main -a -b -c:

p.offset = 0
p.we_wordc = 6
p.we_wordv[0] = "./main"
p.we_wordv[1] = "-before"
p.we_wordv[2] = "-a"
p.we_wordv[3] = "-b"
p.we_wordv[4] = "-c"
p.we_wordv[5] = "-after"

@thiagokokada
Copy link
Contributor

We can probably just add an obligatory post build phase that runs the binary after wrapping. If something is wrong this will trigger the issue.

@bergkvist
Copy link
Member Author

bergkvist commented May 11, 2022

Single quotes also seem to have a special purpose in the string passed to wordexp:

#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <wordexp.h>

int main(int argc, char **argv) {
    putenv("expandme=xyz");
    
    const char *str = "$0 '$noexpansion with spaces' $@ -after $expandme";
    wordexp_t p;
    int err = wordexp(str, &p, 0);
    if (err) {
        printf("wordexp failed with error code %i\n", err);
        return err;
    }

    printf("p.offset = %lu\np.we_wordc = %lu\n", p.we_offs, p.we_wordc);
    for (int i = 0; i < p.we_wordc; i++) {
        printf("p.we_wordv[%i] = \"%s\"\n", i, p.we_wordv[i]);
    }
}
./main "-'a" $'-b\nc' "d\ne"

p.offset = 0
p.we_wordc = 8
p.we_wordv[0] = "./main"
p.we_wordv[1] = "$noexpansion with spaces"
p.we_wordv[2] = "-'a"
p.we_wordv[3] = "-b"
p.we_wordv[4] = "c"
p.we_wordv[5] = "d\ne"
p.we_wordv[6] = "-after"
p.we_wordv[7] = "xyz"

@bergkvist
Copy link
Member Author

bergkvist commented May 11, 2022

Executing the wrapper will have the side effect of executing the program it wraps though. How would we know if the wrapper crashes or the program itself does? A solution to this might be to replace EXECUTABLE with a dummy shell script and rebuild/run in the post build phase.

@bergkvist
Copy link
Member Author

bergkvist commented May 11, 2022

I just tested this on macOS (x86 Big Sur), and it does not behave the same way/give the same output:

// ...
putenv("expandme=xyz");
const char *str = "$0 '$noexpansion with spaces' $@ -after $expandme";
// ...
./main -a -b -c

p.offset = 140701795755056
p.we_wordc = 4
p.we_wordv[0] = "sh"
p.we_wordv[1] = "$noexpansion with spaces"
p.we_wordv[2] = "-after"
p.we_wordv[3] = "xyz"

This is likely a consequence of this quote from the documentation:

https://man7.org/linux/man-pages/man3/wordexp.3.html
The result of expansion of special parameters ($@, $*, $#, $?, $-, $$, $!, $0) is unspecified.

So we should probably disallow usage of these since they are not portable.

@thiagokokada
Copy link
Contributor

Executing the wrapper will have the side effect of executing the program it wraps though. How would we know if the wrapper crashes or the program itself does? A solution to this might be to replace EXECUTABLE with a dummy shell script and rebuild/run in the post build phase.

Yeah, I think substituting the wrapper with a dummy should be doable during testing.

However, we can also ignore this issue for now. I mean, most people should run the binary at least once.

The main issue would be during mass migrations though.

@ncfavier
Copy link
Member

ncfavier commented May 11, 2022

Can you also check that your example with ${:+ also works on macOS? I was surprised to see it working because the Linux man page isn't very specific about what expansions are performed, but the POSIX one is a bit more.

@ncfavier
Copy link
Member

The problem with running the wrapper after build is that we probably want to allow command substitution, but we definitely don't want to run the commands during build, and we don't want to treat them as errors either.

@bergkvist
Copy link
Member Author

bergkvist commented May 11, 2022

Seems to work on macOS (x86) with ${:+:

// ...
putenv("y=Hello");
const char *str = "${x:+$y} world";
// ...
gcc main.c -o main && ./main
p.offset = 140702032323664
p.we_wordc = 1
p.we_wordv[0] = "world"

gcc main.c -o main && x=1 ./main
p.offset = 140701805683792
p.we_wordc = 2
p.we_wordv[0] = "Hello"
p.we_wordv[1] = "world"

Not sure what the offset is for - but it seems to generally be 0 on Linux, and some huge number on macOS.

@thiagokokada
Copy link
Contributor

The problem with running the wrapper after build is that we probably want to allow command substitution, but we definitely don't want to run the commands during build, and we don't want to treat them as errors either.

What I was thinking is more like, create a binary wrapper for a dummy binary, build, run. If this results in an error bail out, if not continue.

I think this should be doable.

@ncfavier
Copy link
Member

I'm not sure we're talking about the same thing. Command substitution is the $(...) syntax, e.g. --add-flags '--path $(foo)/bar'.

@bergkvist
Copy link
Member Author

For testing the validity of a wordexp-string at build-time, how about simply doing something like this when the C code is being generated:

#include <stdio.h>
#include <wordexp.h>

int main(int argc, char **argv) {
  if (argc != 2) {
    printf("Usage: %s INPUT_STRING\n", argv[0]);
    return -1;
  }
  wordexp_t p;
  return wordexp(argv[1], &p, 0);
}
# This could be packaged in nixpkgs, so that we don't need to compile every time we generate C-code
gcc wordexp-is-valid.c -o wordexp-is-valid

if ./wordexp-is-valid "test-string"; then
  # print out wordexp-code
else
  # print out compiler error macro complaining about bad wordexp
fi

@ncfavier
Copy link
Member

Doesn't solve the command substitution problem. Really we'd need a DRY_RUN flag for wordexp.

@thiagokokada
Copy link
Contributor

I'm not sure we're talking about the same thing. Command substitution is the $(...) syntax, e.g. --add-flags '--path $(foo)/bar'.

Ah ok.

I wasn't thinking of actually covering this case at all with wordexp, because this would basically involve a subprocess and I am almost sure that wordexp wouldn't cover those cases (if this does I am impressed... and I also kinda of fear how this library actually works).

This would be probably the 5% that I said before that still needs makeWrapper.

@ncfavier
Copy link
Member

It does, see the man page. Probably just calls system(3) under the hood. Having to choose between that and build-time error detection is a tough call.

Found a derivation using this:

wrapProgram "$out/bin/docker-slim" --add-flags '--state-path "$(pwd)"'

@thiagokokada
Copy link
Contributor

thiagokokada commented May 11, 2022

It does, see the man page. Probably just calls system(3) under the hood. Having to choose between that and build-time error detection is a tough call.

Oh. This is really unexpected.

   The expansion done consists of the following stages: tilde
   expansion (replacing ~user by user's home directory), variable
   substitution (replacing $FOO by the value of the environment
   variable FOO), command substitution (replacing $(command) or
   `command` by the output of command), arithmetic expansion, field
   splitting, wildcard expansion, quote removal.

Found a derivation using this:

wrapProgram "$out/bin/docker-slim" --add-flags '--state-path "$(pwd)"'

This looks like it could easily be changed to use $PWD though. We actually probably should do this anyway because spawning a shell just to get current directory seems unecessary.

But yeah, I understand that other usages may be more legitimate. So for now I am pending more for not having any checks, at all. We can think about how to do a proper checking later on.

@bergkvist
Copy link
Member Author

bergkvist commented May 11, 2022

I guess we can look at some of the wordexp implemetations to find out how the error checking is done:

And then try to copy over the error checking-part of the implementation - so that we avoid side-effects of command substitution.

@ncfavier
Copy link
Member

That would pretty much amount to reimplementing wordexp in nixpkgs...

@thiagokokada
Copy link
Contributor

That would pretty much amount to reimplementing wordexp in nixpkgs...

Yeah, I also think so.

I am not completely against of implementing some type of build time check to avoid having issues during runtime. Like I said, this will be especially good to have when we do mass migrations (like the one in wrapGAppsHook).

However, I think for the initial version we can ignore the build checks because we will probably migrate a few packages that we will migrate by hand and we will actually test them manually. This should be already be useful. And them once we have a good idea what we want, we can create a better way to validate.

Otherwise, I think we will bikeshed how to create a proper way to check during builds and will not get anywhere 😅.

@bergkvist
Copy link
Member Author

Another option: disallow command substitution altogether - and set the flag WRDE_NOCMD. Then wordexp shouldn't be able to cause any side-effects - and we could use it safely as a build-time check.

@thiagokokada
Copy link
Contributor

For testing the validity of a wordexp-string at build-time, how about simply doing something like this when the C code is being generated:

#include <stdio.h>
#include <wordexp.h>

int main(int argc, char **argv) {
  if (argc != 2) {
    printf("Usage: %s INPUT_STRING\n", argv[0]);
    return -1;
  }
  wordexp_t p;
  return wordexp(argv[1], &p, 0);
}
# This could be packaged in nixpkgs, so that we don't need to compile every time we generate C-code
gcc wordexp-is-valid.c -o wordexp-is-valid

if ./wordexp-is-valid "test-string"; then
  # print out wordexp-code
else
  # print out compiler error macro complaining about bad wordexp
fi

I think there is still a way this idea can possibly work even with variable substitution. I am not sure if wordexp returns different error codes from "parser error" vs "substitution error", but if it does it is just a question of ignoring the substitution errors (assuming they will fail because the test environment will not have the called program on PATH) and returning the error codes that are related to parsing.

@ncfavier
Copy link
Member

In any case the function only returns one error. Also we can't rely on unsetting PATH because the commands might be absolute paths.

@thiagokokada
Copy link
Contributor

In any case the function only returns one error. Also we can't rely on unsetting PATH because the commands might be absolute paths.

I think inside the build environment it would be very difficult for any program to have access to absolute PATHs, at least on NixOS, thanks to the sandbox. Could be an issue in non-NixOS Linux/Darwin though. Of course, the program can still link directly on the package itself, but them I don't see that much issue (at least in most cases). For the few problematic cases out there we could have a option to disable the check or something.

@bergkvist
Copy link
Member Author

Yes, the the function only returns one error - and it will give up on the first problem it encounters.

Setting WRDE_NOCMD means that something like "$(pwd)\n" would fail with WRDE_CMDSUB before reaching the newline character. This means we can't really find out if it would return WRDE_BADCHAR without also executing the command inside of $().

@bergkvist
Copy link
Member Author

bergkvist commented May 11, 2022

Okay, I have another (very hacky) idea - we don't need to reimplement wordexp, we just need to recompile a version of it with WRDE_CMDSUB redefined as 0 instead of 4.

This means the command substitution code will return early without causing side effects - but with 0 (success), making it look like everything worked as expected - and as a consequence it will keep parsing the input string.

https://github.com/lattera/glibc/blob/895ef79e04a953cac1493863bcae29ad85657ee1/posix/wordexp.c#L890

@thiagokokada
Copy link
Contributor

Another option: disallow command substitution altogether - and set the flag WRDE_NOCMD. Then wordexp shouldn't be able to cause any side-effects - and we could use it safely as a build-time check.

This actually don't see that a bad idea actually. I mean, I still want to see an actual example of this being useful, and even if we found one that it is, this will probably negate most performance advantages that we have with binary wrappers since we will run a subprocess in those cases.

I would be totally fine disabling variable expansion by default and only enabling it by a unsafe flag or something.

@bergkvist
Copy link
Member Author

We would also want to catch usage of $@, $*, $#, $?, $-, $$, $!, $0 at build time, and validation using wordexp itself would not help here - since these are allowed, but have unspecified behaviour.

Maybe writing a basic wordexp parser/validator in bash rather than C would not be too difficult, which could be used for build-time validation.

@bergkvist
Copy link
Member Author

bergkvist commented May 12, 2022

Starting point for wordexp input parser/validator written in bash:

validate_wordexp_input() {
  local input
  input=$1

  while [ "${#input}" -gt 0 ]; do
    case "$input" in
      # Unspecified behaviour
      $'$@'*)  return 1;; # Unspecified behaviour
      $'$*'*)  return 1;; # Unspecified behaviour
      $'$#'*)  return 1;; # Unspecified behaviour
      $'$?'*)  return 1;; # Unspecified behaviour
      $'$-'*)  return 1;; # Unspecified behaviour
      $'$$'*)  return 1;; # Unspecified behaviour
      $'$!'*)  return 1;; # Unspecified behaviour
      $'$0'*)  return 1;; # Unspecified behaviour
      
      # Bad characters
      $'\n'*)  return 2;; # WRDE_BADCHAR
      $'|'*)   return 2;; # WRDE_BADCHAR
      $'&'*)   return 2;; # WRDE_BADCHAR
      $';'*)   return 2;; # WRDE_BADCHAR
      $'<'*)   return 2;; # WRDE_BADCHAR
      $'>'*)   return 2;; # WRDE_BADCHAR
      $'('*)   return 2;; # WRDE_BADCHAR
      $')'*)   return 2;; # WRDE_BADCHAR
      $'{'*)   return 2;; # WRDE_BADCHAR
      $'}'*)   return 2;; # WRDE_BADCHAR

      # More complex parsing
      $'\\'*)  input=$(parse_backslash "$input") || return $? ;;
      $'$'*)   input=$(parse_dollars "$input")   || return $? ;;
      $'`'*)   input=$(parse_backtick "$input")  || return $? ;;
      $'"'*)   input=$(parse_dquote "$input")    || return $? ;;
      $'\''*)  input=$(parse_squote "$input")    || return $? ;;
      $'~'*)   input=$(parse_tilde "$input")     || return $? ;;
      $'*'*)   input=$(parse_glob "$input")      || return $? ;;
      $'['*)   input=$(parse_glob "$input")      || return $? ;;
      $'?'*)   input=$(parse_glob "$input")      || return $? ;;
      *)       input="${input:1}";;
    esac
  done
}

Possible errors from wordexp:

  • WRDE_BADCHAR (build time)
  • WRDE_BADVAL (runtime only, if WRDE_UNDEF flag is set)
  • WRDE_CMDSUB (build time, if WRDE_NOCMD flag is set)
  • WRDE_NOSPACE (runtime only)
  • WRDE_SYNTAX (build time)

Note that WRDE_NOSPACE is an out of memory error, so this can't be detected at build-time. WRDE_BADVAL also can't be detected at build time since it happens when you disallow missing variables and a variable is missing. All the other errors should be possible to detect at build time.

@ncfavier
Copy link
Member

I strongly oppose this. Please don't add a shell parser to nixpkgs.

@bergkvist
Copy link
Member Author

The only purpose of the shell parser would be to validate the input string sent to wordexp as a build-time check in makeCWrapper.

Why are you strongly opposed to this?

@ncfavier
Copy link
Member

It's a lot of complexity for something which I'm not even sure might help. Wrappers should be tested, and errors caught then. For mass migrations, switching to wordexp won't make existing shell expressions invalid.

If you really want to check shell syntax, something like bash -nc "exp=($exp)" might do.

On another note, we'd probably also need to support something like --run, maybe using system calls. Those could be validated with bash -nc "$cmd".

system calls run in child processes so we couldn't do e.g. --run 'export FOO=$(bar)' anymore. That might be a problem.

@bergkvist
Copy link
Member Author

bergkvist commented May 12, 2022

Unspecified behaviour ($@, $*, $#, $?, $-, $$, $!, $0) in wordexp would likely not cause any errors, but instead cause weird differences in behavior between platforms. Also, only a subset of valid shell syntax is valid wordexp-syntax.

"$hello|$world;" # valid in a shell string, not allowed in wordexp, because of `|` and `;`

I think the only way to properly avoid these problems is to have a custom validator - or to use wordexp itself for validation, which can only be safely done if we disallow command substitution.

But even then, unspecified behavior will not be caught, so we'd need to check this seperately. And we can't just match the string against "$@" either for example, since we'd need to count the number of backslashes in front of the dollar sign to know if it is escaped or not.

I don't think the wordexp input validator would end up being all that complex - and the benefit for portability and resulting ease of use of makeBinaryWrapper could be worth the complexity in my mind.

@thiagokokada
Copy link
Contributor

but instead cause weird differences in behavior between platforms.

One question: why are we having different behaviors between platforms? Should be because we use different C libs right? Can we just force the binary wrappers on Darwin to use glibc instead of whatever Darwin uses by default?

@thiagokokada
Copy link
Contributor

I think the only way to properly avoid these problems is to have a custom validator - or to use wordexp itself for validation, which can only be safely done if we disallow command substitution.

Again, I am not against disabling command substitution. I want to see an actual usage of command substitution, because the only example we have right now can be rewritten to not use it anymore ($(pwd) -> $PWD). Even if we have good examples, they can always be kept in makeShellWrapper until we find a way to validate them.

@thiagokokada
Copy link
Contributor

thiagokokada commented May 12, 2022

BTW, I think we don't have to worry about command substitution:

$ rg -- '--add-flags .*\$\(.*\)'
pkgs/servers/zoneminder/default.nix
192:        --add-flags "$perlFlags $out/libexec/$(basename $f)"

pkgs/applications/virtualization/docker-slim/default.nix
35:    wrapProgram "$out/bin/docker-slim" --add-flags '--state-path "$(pwd)"'

pkgs/development/interpreters/cling/default.nix
102:    --add-flags "$(cat "$compilerIncludeFlags")" \

So looking at it, there are no cases where command substitution this would be useful. What all those cases are doing is calculating something at build time (look at the examples above and it will be clear).

Of course, maybe there is some cases using other wrapProgram flags. But I don't think we should care too much. This really seems that it would only complicate things, and also it seems to be a bad idea from performance perspective.

So my conclusion: let's use WRDE_NOCMD flag for now, and if we eventually see a case where command substitution will be useful we can revisit this later (but for me, maybe we should just leave those small cases to wrapShellProgram instead).

@wmertens
Copy link
Contributor

As the author of the runtime expansion (RTE) used as an example, is like to point out that the RTE was only used because it was possible, as a half hack to allow users to choose Wayland vs X for GUI programs without having multiple installs.

In this particular case, I believe a shell wrapper is justified and there are no scripts that will use the wrapped GUI programs on shebang lines. If a shell wrapper is not desired, a tiny single-purpose wrapper binary could be written that calls the regular binary wrapper of the GUI program.

Furthermore, IMHO the spirit of makeWrapper is to make the wrapped program behave correctly on Nix. I can't think of a use case for RTE for that purpose.

So, before trying to add RTE to makeBinaryWrapper, why not look for other examples of it in use? I think you'll not find any that justify adding generic support for RTE. Instead, we can split the wrapper into a static makeBinaryWrapper and a shell makeWrapper that calls it.

@nikp123
Copy link

nikp123 commented May 27, 2023

Are there any workarounds to this, as in examples of how to do RTE-like expansions. I need some to expand a HOME variable in commandline arguments for a program (otherwise it won't know where to put the files because it doesn't read those variables by itself, unfortunately)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants