Ease path to adding new external command parsers #81

tejing1 · 2022-07-04T12:38:42Z

I'd like to see a way to .override resholve from nixpkgs to effectively throw some arbitrary extra code into ExternalCommandParsers. The override should also affect resholve.writeScript and the like, of course. If this could be made part of the "solution" rather than an override, all the better.

It would be a comparatively easy-to-implement relief valve that at least lets a user handle the case where a command does execute its arguments and resholve has no parser for it. Currently there's no way forward in that case other than:

- externally resolving the relevant command somehow and tricking resholve into thinking resolution isn't necessary
- maintaining a patch file to apply to resholve during build
- maintaining a fork of resholve

Longer-term, it would be good to develop some kind of simple language for users to describe a program's argument structure that covers most cases and is a more stable interface, but at least this works as a near-universal fallback for users willing to put in the effort.


abathur · 2022-07-04T20:30:25Z

I've been hoping this day (i.e., someone asking this question) would come. :)

I'll be chewing on this, but I'll sketch out why it doesn't exist yet:

The task is still fraught/messy. This makes me hesitant to spill a lot of complexity on people who may think they're undertaking something simple. Especially since we might have to make N rounds of breaking changes (until we've wrestled with enough real-world complexity to create a solid pluggable abstraction).

Some commands just need a straightforward ~ArgParse parser with a magic name for picking out the command--and I have wanted to either make those pluggable with config (ideal) or code. The regular structure of these in the code might under-sell how much complexity is still lurking here. Some sources:

Early on, I thought I might be able to close over differences between linux/BSD variants. This is mostly true, so there are a lot of combined parsers like:

resholve/resholve

Lines 1608 to 1641 in 45583b7

    
    @staticmethod
    def _env():
        """
        coreutils src/env.c:execvp
        Usage: %s [OPTION] [COMMAND [ARG]...]
        both gnu and bsd support "-" as a synonym for -i
        argparse appears to support, but IDK if it'll cause trouble
        assignment-alikes should be stripped out already
        """
        generic = CommandParser("env")
        generic.add_argument("-P")  # bsd altpath
        # gnu
        generic.add_argument(
            # -i and - are also bsd
            "-i",
            "--ignore-environment",
            "-",
            action="store_true",
        )
        generic.add_argument("-0", "--null", action="store_true")
        generic.add_argument("-u", "--unset")  # -u is bsd
        generic.add_argument("-C", "--chdir")
        generic.add_argument("-S", "--split-string")  # -S is bsd
        generic.add_argument("--block-signal")
        generic.add_argument("--default-signal")
        generic.add_argument("--ignore-signal")
        generic.add_argument("--list-signal-handling")
        generic.add_argument("-v", "--debug", action="store_true")  # -v is bsd
        generic.add_argument(
            "commands", nargs=theirparse.REMAINDER, action="invocations"
        )
        return (generic,)

It didn't take long to start hitting cases where the syntax conflicts in some way that entails a separate parser like:

resholve/resholve

Lines 1782 to 1822 in 45583b7

    
    @staticmethod
    def _script():
        linux = CommandParser("script", "linux")
        linux.add_argument("-a", "--append", action="store_true")
        linux.add_argument("-E", "--echo")
        linux.add_argument("-e", "--return", action="store_true")
        linux.add_argument("-f", "--flush", action="store_true")
        linux.add_argument("--force", action="store_true")
        linux.add_argument("-B", "--log-io")
        linux.add_argument("-I", "--log-in")
        linux.add_argument("-O", "--log-out")
        linux.add_argument("-T", "--log-timing")
        linux.add_argument("-m", "--logging-format")
        linux.add_argument("-o", "--output-limit")
        linux.add_argument("-t", "--timing")
        linux.add_argument("-q", "--quiet", action="store_true")
        linux.add_argument("-V", "--version", action="store_true")
        linux.add_argument(
            "-c",
            "--command",
            dest="commands",
            action="invocations",
            split=True,
            nargs=1,
        )
        linux.add_argument("file", nargs="?")
        bsd = CommandParser("script", "bsd")
        bsd.add_argument("-d", action="store_true")
        bsd.add_argument("-k", action="store_true")
        bsd.add_argument("-p", action="store_true")
        bsd.add_argument("-r", action="store_true")
        bsd.add_argument("-q", action="store_true")
        bsd.add_argument("-t", action="store_true")
        bsd.add_argument("-a", action="store_true")
        bsd.add_argument("-F")
        bsd.add_argument("-T")
        bsd.add_argument("file", nargs="?")
        bsd.add_argument("commands", nargs=theirparse.REMAINDER, action="invocations")
        return (linux, bsd)

So now resholve tries a sequence of parsers and takes the first one that finds a sub-command.

This is working in practice, but I'm not sure we won't find cases where both forms would match (but mis-identify) sub-commands in invocations meant for the other form.
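The "try each parser in turn" behaviour described above can be sketched with the standard library's argparse. This is only an illustration under stated assumptions: `QuietParser`, `linux_script`, `bsd_script` and `first_subcommand` are hypothetical names, the option lists are cut down to a couple of flags, and resholve's own `CommandParser` and its `invocations` action are not reproduced here. It targets Python 3.

```python
import argparse


class QuietParser(argparse.ArgumentParser):
    """ArgumentParser that raises instead of exiting when a parse fails."""

    def error(self, message):
        raise ValueError(message)


def linux_script():
    # util-linux form: the command is the argument of -c/--command
    p = QuietParser(prog="script", add_help=False)
    p.add_argument("-q", "--quiet", action="store_true")
    p.add_argument("-c", "--command", dest="commands", nargs=1)
    p.add_argument("file", nargs="?")
    return p


def bsd_script():
    # BSD form: everything after the optional file is the command
    p = QuietParser(prog="script", add_help=False)
    p.add_argument("-q", action="store_true")
    p.add_argument("file", nargs="?")
    p.add_argument("commands", nargs=argparse.REMAINDER)
    return p


def first_subcommand(parsers, args):
    """Return (label, commands) from the first parser that finds a sub-command."""
    for label, parser in parsers:
        try:
            ns = parser.parse_args(args)
        except ValueError:
            continue  # this form does not fit the invocation; try the next
        if ns.commands:
            return label, ns.commands
    return None


PARSERS = [("linux", linux_script()), ("bsd", bsd_script())]
print(first_subcommand(PARSERS, ["-q", "-c", "ls -l", "out.log"]))
print(first_subcommand(PARSERS, ["-q", "out.log", "ls", "/tmp"]))
```

Because the first match wins, the order of the parsers matters, which is where the mis-identification worry comes from: an invocation that happens to parse under an earlier form never reaches the later one.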

A few commands have some kind of syntactic complexity that argparse can't handle. Examples so far include find and sed.

resholve/resholve

Lines 1371 to 1424 in 45583b7

    
    class FindParser(CommandParser):
        """
        find's exec options (-exec -execdir -ok -okdir)
        are terminated in a way argparse can't really
        deal with (;|+), so we'll declare these with
        nargs=1, pre-parse to replace these args with
        a fake word, and then swap them back in after
        the parse.
        """
        def _parse_known_args(self, arg_strings, namespace):
            simplified = list()
            capturing = False
            captures = list()
            for i, arg in enumerate(arg_strings):
                if arg in ("-exec", "-execdir", "-ok", "-okdir"):
                    capturing = True
                    simplified.append(arg)
                    simplified.append("fakecmd")
                    captures.append(list())
                elif arg in (";", "+"):
                    capturing = False
                elif capturing:
                    captures[-1].append(arg)
                    continue
                else:
                    simplified.append(arg)
            ns, rest = super(FindParser, self)._parse_known_args(simplified, namespace)
            if captures and ns.commands:
                ns.commands = [Invocation(words=x) for x in captures]
            return ns, rest

    class GnuSedParser(CommandParser):
        """
        gnused's -i/--in-place options can be bare, or have the
        argument directly attached (it CANNOT be a separate
        shell word).
        """
        def _parse_known_args(self, arg_strings, namespace):
            simplified = list()
            capturing = False
            captures = list()
            for i, arg in enumerate(arg_strings):
                if arg.startswith("--in-place="):
                    continue
                elif arg.startswith("-i"):
                    if len(arg) > 2:
                        continue
                else:
                    simplified.append(arg)
            ns, rest = super(GnuSedParser, self)._parse_known_args(simplified, namespace)
            return ns, rest
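To make the pre-parse step in FindParser easier to follow, here is a standalone restatement of just its capture loop, runnable on Python 3 without resholve. `split_find_exec` is a hypothetical helper name; the logic mirrors the snippet above but does not perform the argparse step or build `Invocation` objects.

```python
EXEC_FLAGS = ("-exec", "-execdir", "-ok", "-okdir")
TERMINATORS = (";", "+")


def split_find_exec(args):
    """Replace each exec clause with a placeholder and collect its words."""
    simplified = []   # what would be handed to argparse
    captures = []     # one word list per exec clause, in order
    capturing = False
    for arg in args:
        if arg in EXEC_FLAGS:
            capturing = True
            simplified.extend([arg, "fakecmd"])
            captures.append([])
        elif arg in TERMINATORS:
            capturing = False          # terminator is dropped, as in the original
        elif capturing:
            captures[-1].append(arg)   # part of the sub-command
        else:
            simplified.append(arg)
    return simplified, captures


simplified, captures = split_find_exec(
    [".", "-name", "*.sh", "-exec", "chmod", "+x", "{}", ";", "-print"]
)
print(simplified)
print(captures)
```

Each entry in `captures` corresponds to one exec clause, which is what lets the parser swap the real sub-commands back in after argparse has seen only the placeholder.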

I've found at least one command with mutually-exclusive syntax forms that affect where the command will be:

resholve/resholve

Lines 1678 to 1704 in 45583b7

    
    @staticmethod
    def _runcon():
        """
        coreutils src/runcon.c:execvp
        Usage: %s [OPTION] [COMMAND [ARG]...]
        two mutually-exclusive forms; if opts are present, the command
        is the first arg--if not, there's a combined context first.
        """
        opt_context = CommandParser("runcon", "opt context")
        context = opt_context.add_mutually_exclusive_group(required=True)
        context.add_argument("-c", "--compute", action="store_true")
        context.add_argument("-t", "--type")
        context.add_argument("-u", "--user")
        context.add_argument("-r", "--role")
        context.add_argument("-l", "--range")
        opt_context.add_argument(
            "commands", nargs=theirparse.REMAINDER, action="invocations"
        )
        # these two forms are mutua
        combined_context = CommandParser("runcon", "combined context")
        combined_context.add_argument("context")
        combined_context.add_argument(
            "commands", nargs=theirparse.REMAINDER, action="invocations"
        )
        return (opt_context, combined_context)

Something about the command syntax entails a more sophisticated/creative handling step. IIRC the main examples of this that I've handled so far are sed, awk, and dc.

resholve/resholve

Lines 3559 to 3768 in 45583b7

    
    def _find_sed_e_cmd(self, expr):
        """
        '{ s/a/b/ ; s/b/c/e ; }'
        sed: -e expression #1, char 20: e/r/w commands disabled in sandbox mode
        1
        --expression='s/a/b/' -e 's/b/c/e'
        sed: -e expression #2, char 7: e/r/w commands disabled in sandbox mode
        1
        -e 'e echo'
        'sed: -e expression #1, char 1: e/r/w commands disabled in sandbox mode\n'
        """
        sed = lookup("sed")
        if not sed:
            raise Exception(
                "Somehow I ended up trying to look for a sed `e` command in %r when sed isn't even present. Oops! Please report this @ https://github.com/abathur/resholve",
                expr,
            )
        p = Popen(
            [sed, "--sandbox", expr],
            shell=False,
            stdin=PIPE,
            stdout=PIPE,
            stderr=PIPE,
            close_fds=True,
        )
        stdout, stderr = p.communicate(input="something cute")
        #
        if p.returncode == 1 and "commands disabled in sandbox mode" in stderr:
            # gnused e/r/w command present
            # ['sed', '', '-e', 'expression', '#1,', 'char', '1', '', 'e/r/w', 'commands', 'disabled', 'in', 'sandbox', 'mode', '']
            badchar = int(re.split(r"\s|:", stderr)[6])
            badchar -= 1  # adjust 1 -> 0 index
            # trying to work around an issue
            # demo: https://gist.github.com/abathur/ca0c6ca342da292f4361acd826740691
            # report: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=48725
            # my best-guess is that the error, for s cmds, can't trigger until
            # the s-cmd ends? And since s supports internal spaces,
            # that isn't until a brace or a semicolon or the end of the arg?
            # in any case; walk back
            while expr[badchar] in (";", " "):
                badchar -= 1
            if expr[badchar] == "e":
                return True
                # TODO: maybe later we can do something smarter
        return False

    def handle_external_sed(self, parsed, invocation):
        """
        We want to check sed expressions on the CLI. They can
        either come after an -e/--expression flag, or they can
        be the first ~positional argument as long as -e or -f
        or --expression or --file weren't used.
        (I guess there's a hole here, in that someone can construct
        a heredoc or string and pass it into the script. I'm going
        to call that the arbitrary line in the sand. I'd be tickled
        pink to be able to hand off both these expressions *and* the
        any heredocs/etc to another program that is ~resholve for
        sed scripts/expressions.)
        This is also the main example of how to override a handler
        for a command that can't skate by with the generic handler.
        - return True if this is now handled (will try any additional
          parsers before failing if it isn't handled)
        """
        if parsed.expression:
            for expression in parsed.expression:
                if self._find_sed_e_cmd(expression):
                    observe(
                        SedECommand,
                        word=expression.ast,
                        span_id=expression.first_spid,
                        arena=self.arena,
                    )
            return True
        elif not parsed.file and parsed.input_files:
            # trailing positional expression
            first = parsed.input_files[0]
            if self._find_sed_e_cmd(first):
                observe(
                    SedECommand,
                    word=first.ast,
                    span_id=first.first_spid,
                    arena=self.arena,
                )
            return True
        return True

    def _find_awk_sub_cmd(self, expr):
        """
        $ awk --sandbox 'BEGIN { system("date"); close("date")}'
        awk: cmd. line:1: fatal: 'system' function not allowed in sandbox mode
        $ awk --sandbox 'BEGIN {
            cmd = "ls -lrth"
            while ( ( cmd | getline result ) > 0 ) {
                print result
            }
            close(cmd);
        }'
        awk: cmd. line:3: fatal: redirection not allowed in sandbox mode
        """
        awk = lookup("awk")
        if not awk:
            raise Exception(
                "Somehow I ended up trying to look for an awk sub command in %r when awk isn't even present. Oops! Please report this @ https://github.com/abathur/resholve",
                expr,
            )
        # TODO: abstract this pattern
        # close dupe in sed
        p = Popen(
            [awk, "--sandbox", expr],
            shell=False,
            stdin=PIPE,
            stdout=PIPE,
            stderr=PIPE,
            close_fds=True,
        )
        try:
            timer = Timer(2, p.kill)
            timer.start()
            stdout, stderr = p.communicate(input="something cute")
        finally:
            if timer.is_alive():
                timer.cancel()
            else:
                logger.warning(
                    "timeout after 1s waiting for awk expression test: \n%s\ncontinuing but I'm not certain it doesn't contain external commands.",
                    expr,
                )
        if p.returncode == 2:
            if "fatal: 'system' function not allowed in sandbox mode" in stderr:
                return True
            # test for | since normal > redirects also trigger
            elif (
                "fatal: redirection not allowed in sandbox mode" in stderr
                and "|" in expr
            ):
                return True
            logger.debug("unhandled awk error: %r", stderr)
            # TODO: maybe later we can do something smarter
        return False

    def handle_external_awk(self, parsed, invocation):
        """
        We want to check awk expressions on the CLI. They can
        either come after an -e/--source flag, or they can
        be the first ~positional argument
        TODO: validate this behavior for awk (as long as -e or -f
        or --expression or --file weren't used.)
        (I guess there's a hole here, in that someone could construct
        a heredoc or something and pass it into the script. I'm going
        to call that the arbitrary line in the sand. I'd be tickled
        pink to be able to hand off both these expressions *and* the
        any heredocs/files to another program that is ~resholve for
        awk scripts/expressions.)
        """
        if parsed.source:
            for expression in parsed.source:
                if self._find_awk_sub_cmd(expression):
                    observe(
                        AwkSubCommand,
                        word=expression.ast,
                        span_id=expression.first_spid,
                        arena=self.arena,
                    )
            return True
        elif not parsed.file and parsed.script:
            # trailing positional expression
            if self._find_awk_sub_cmd(parsed.script):
                observe(
                    AwkSubCommand,
                    word=parsed.script.ast,
                    span_id=parsed.script.first_spid,
                    arena=self.arena,
                )
            return True
        return True

    # TODO: maybe a better way for the caller to find these
    handle_external_gawk = handle_external_awk

    def handle_external_dc(self, parsed, invocation):
        """
        check dc expressions for ! cmds
        """
        if parsed.expression:
            for expression in parsed.expression:
                if "!" in expression:
                    bangpos = expression.index("!")
                    if expression[bangpos + 1] not in ("<", "=", ">"):
                        observe(
                            DcBangCommand,
                            word=expression.ast,
                            span_id=expression.first_spid,
                            arena=self.arena,
                        )
            return True
        return True
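The dc check is the smallest of these handlers, so it makes a convenient worked example of a command-specific rule. The sketch below is a variation on it rather than resholve's code: it works on a plain string, looks at every `!` instead of only the first, and treats a trailing `!` as suspicious rather than indexing past the end. `dc_has_shell_escape` is a hypothetical name, and the sketch makes no attempt to skip `!` characters inside dc's bracketed strings.

```python
def dc_has_shell_escape(expression):
    """Return True if any "!" in a dc expression is not a !<, != or !> comparison."""
    for pos, char in enumerate(expression):
        if char != "!":
            continue
        # Slice instead of index so a trailing "!" yields "" rather than an error.
        following = expression[pos + 1 : pos + 2]
        if following not in ("<", "=", ">"):
            return True
    return False


for expr in ("2 3 + p", "1 2 !<a", "!echo hi", "1 2 !=a !ls"):
    print(expr, dc_has_shell_escape(expr))
```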

I originally intended (and tried) to build a pressure-relief valve here alongside lore. The goal was to enable users to manually specify an "assay" that just explicitly told resholve which argument was executable. In order to sort out the logistics of actually substituting these (i.e., when and where in the process) I stubbed out a proof-of-concept based on specifying the word-number:

resholve/tests/behavior.bats

Lines 163 to 219 in 1e34e32

    
    @test "resolve fails without assay" {
      require <({
        status 9
        line -1 contains "'cat' _might_ be able to execute its arguments, and I don't have any command-specific rules for figuring out if this specific invocation does or not."
      })
    } <<CASES
    unset RESHOLVE_LORE && RESHOLVE_PATH="$RESHOLVE_PATH:$PKG_COREUTILS" resholve --interpreter $INTERP --execer "can${fs}$(type -p cat)" < assay.sh
    CASES

    @test "resolve fails with bad assay" {
      require <({
        status 9
        line -1 contains "'cat' _might_ be able to execute its arguments, and I don't have any command-specific rules for figuring out if this specific invocation does or not."
      })
    } <<CASES
    unset RESHOLVE_LORE && RESHOLVE_PATH="$RESHOLVE_PATH:$PKG_COREUTILS" resholve --interpreter $INTERP --execer "can${fs}$(type -p cat)" --assay <(echo $(type -p head)${fs}head${fs}yes${fs}4) < assay.sh
    unset RESHOLVE_LORE && RESHOLVE_PATH="$RESHOLVE_PATH:$PKG_COREUTILS" resholve --interpreter $INTERP --execer "can${fs}$(type -p cat)" --assay <(echo $(type -p cat)${fs}cat${fs}yes${fs}4) < assay.sh
    CASES

    @test "resolve fails with overshooting assay wordnum" {
      require <({
        status 10
        line 3 contains "I have an assay matching this invocation, but: the wordnum index"
        line 3 contains "is too large to zero-index args(4)"
      })
    } <<CASES
    unset RESHOLVE_LORE && RESHOLVE_PATH="$RESHOLVE_PATH:$PKG_COREUTILS" resholve --interpreter $INTERP --execer "can${fs}$(type -p cat)" --assay <(echo $(type -p cat)${fs}cat CANNOT="do this" --not-a-real-flag head${fs}yes${fs}4) < assay.sh
    unset RESHOLVE_LORE && RESHOLVE_PATH="$RESHOLVE_PATH:$PKG_COREUTILS" resholve --interpreter $INTERP --execer "can${fs}$(type -p cat)" --assay <(echo $(type -p cat)${fs}cat CANNOT="do this" --not-a-real-flag head${fs}yes${fs}5) < assay.sh
    CASES

    @test "resolve fails with assay wordnum 0" {
      require <({
        status 1 # TODO: should be 2
        line -1 contains "assay wordnum should be 1+ (0 is the same as the invoking command itself)"
      })
    } <<CASES
    unset RESHOLVE_LORE && RESHOLVE_PATH="$RESHOLVE_PATH:$PKG_COREUTILS" resholve --interpreter $INTERP --execer "can${fs}$(type -p cat)" --assay <(echo $(type -p cat)${fs}cat CANNOT="do this" --not-a-real-flag head${fs}yes${fs}0) < assay.sh
    CASES

    @test "resolve fails with undershooting assay wordnum" {
      require <({
        status 3
        line 3 contains "Couldn't resolve command"
      })
    } <<CASES
    unset RESHOLVE_LORE && RESHOLVE_PATH="$RESHOLVE_PATH:$PKG_COREUTILS" resholve --interpreter $INTERP --execer "can${fs}$(type -p cat)" --assay <(echo $(type -p cat)${fs}cat CANNOT="do this" --not-a-real-flag head${fs}yes${fs}1) < assay.sh
    unset RESHOLVE_LORE && RESHOLVE_PATH="$RESHOLVE_PATH:$PKG_COREUTILS" resholve --interpreter $INTERP --execer "can${fs}$(type -p cat)" --assay <(echo $(type -p cat)${fs}cat CANNOT="do this" --not-a-real-flag head${fs}yes${fs}2) < assay.sh
    CASES

    @test "resolve succeeds with assay" {
      require <({
        status 0
        line -1 contains "bin/head"
      })
    } <<CASES
    unset RESHOLVE_LORE && RESHOLVE_PATH="$RESHOLVE_PATH:$PKG_COREUTILS" resholve --interpreter $INTERP --execer "can${fs}$(type -p cat)" --execer "cannot${fs}$(type -p head)" --assay <(echo $(type -p cat)${fs}cat CANNOT="do this" --not-a-real-flag head${fs}yes${fs}3) < assay.sh
    CASES
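The word-number idea these tests exercise can be sketched in a few lines of Python 3. This is an assumption-laden illustration, not the proof-of-concept itself: `assay_target` is a hypothetical name, the invocation is split with `shlex` rather than by resholve's shell parser, and the error wording only loosely follows the messages quoted in the tests.

```python
import shlex


def assay_target(invocation, wordnum):
    """Pick the word an assay marks as executable, by zero-based word number."""
    words = shlex.split(invocation)
    if wordnum < 1:
        # word 0 is the invoking command itself, so it cannot be the target
        raise ValueError("assay wordnum should be 1+")
    if wordnum >= len(words):
        raise IndexError(
            "wordnum %d is too large to zero-index args(%d)" % (wordnum, len(words))
        )
    return words[wordnum]


invocation = 'cat CANNOT="do this" --not-a-real-flag head'
print(assay_target(invocation, 3))
```

With the invocation used in the tests, word 3 is `head`, which matches the passing case; word numbers 1 and 2 select words that are not commands (hence "Couldn't resolve command"), and 4 or 5 overshoot the four available words.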

When I turned to mulling how to make a ~syntax for triaging each invocation that would be easy to understand, work with, and not a PITA to maintain as the target script evolved, I started to suspect the triage approach might be a mis-feature. (Since I want resholve to carry the most-common of these, I want to avoid making a mechanism that is clumsy, a little miserable to use, and inadvertently burns energy that would be better spent upstreaming support for additional commands.)

I've been implementing these external-executable-intel bits in a fairly centralized way (parsers in resholve, resholve's Nix API invoking binlore to do the binary analysis) for expedience.

But, I don't love how far knowledge about the package (in the form of lore, lore overrides, and parsers) is from the package. In the long term we'll probably need extra mechanisms to manage complexity that comes with centralizing it (ways to disambiguate version/variant differences, deal with drift between rules and the upstream features, etc.). If it was in the package, resholve wouldn't need to worry about whether it's the BSD or GNU version, or whether there's a version-number difference.

I don't want to actually pursue that yet, but a good way to prepare would be figuring out a humane no-code format that can flexibly express either the full command syntax or which command arguments are executable with enough fidelity to reliably disambiguate them. (Or figuring out with some confidence that it just isn't tractable.)

Edit (April 2024): my thinking on the last paragraph above has evolved a good bit since I wrote it. Some notes:

I still want something here, but I'm not sure if it can be a true no-code format.
Some of the most-troublesome cases here involve commands that make dynamic decisions about how to parse their arguments as they go. (I.e., the semantics of an arg/flag or the number of words consumed in some position may change when they encounter some flag/arg.)
Perhaps it could still be fairly declarative, maybe something graph-based that can express conditional logic is enough to handle this?
A great solution might have to handle at least 3 more kinds of complexity:
- If the parsers live on/in the packages they describe, I previously speculated that we might be able to get away without needing a framework for disambiguating clashes between, say, BSD/GNU variants or backwards-incompatible version changes. As I've mulled this, though, I've come to wonder if this disambiguation is an ideal property of the best implementation (regardless of where the parser definition lives)? It's what we'd want in order to be able to throw an error if we're trying to use a GNU-only flag with the BSD variant, or use a removed flag with a version it's been removed in. I haven't convinced myself that this is table-stakes--this type of problem isn't currently in scope for resholve, and I think it's defensible to say it's out of scope. But, if we find an ~inspired way to achieve it, I think it could ultimately unlock a lot of use cases (which might help attract hands/eyes to help build and maintain the parsers).
- While at the moment we're most-concerned with executable arguments, there are other things (like environment variables and perhaps some static/relative paths outside of the store) under user control that any given command could potentially exec. If the scope of the effort was more like, "describe the external influences on how this runs", it might make sense to tackle these? (Command-line arguments and envs are two partly overlapping types, also config files/directories and any other path that could affect it.)
- Since this project is Sisyphean at best, I think it makes sense to do a little work up front to ensure that it supports the greatest number of use-cases that it can bear without overcomplicating it. It should probably be abstract enough that it can have implementations in most languages someone could plausibly want to write or parse CLI invocations from. It should probably be able to drive completion-generators. Once complete, it should probably be able to replace a command's own parsing routines. If it can do this, it should probably be able to do tricks like: compile down to something that supports only the ~edge spec for the command; merge in parsers that model other variants of the command and be able to run in a test mode that warns/errors when maintainers are adding a flag that'll clash with other popular implementations; keep you from re-using arguments with different semantics without swearing you aren't an idiot; either generate or force you to write deprecation/migration notes for people who run no-longer-supported arguments as long as they aren't shadowed by newer args; etc.
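As a rough illustration of the first point in the list above (erroring on a GNU-only flag used with the BSD variant, or on a flag used with a version that removed it), here is a minimal sketch. The command `frob`, its flag table, and the version numbers are invented for the example; a real format would need far more detail.

```python
# Hypothetical flag table for an imaginary command `frob`.
# "variants" lists the implementations that accept the flag;
# "removed_in" maps a variant to the first version lacking the flag.
FLAGS = {
    "-a": {"variants": {"gnu", "bsd"}},
    "--all": {"variants": {"gnu"}},
    "-x": {"variants": {"gnu", "bsd"}, "removed_in": {"gnu": (3, 0)}},
}


def check_flags(flags, variant, version):
    """Return a list of human-readable problems (empty if none)."""
    problems = []
    for flag in flags:
        info = FLAGS.get(flag)
        if info is None:
            problems.append(f"{flag}: unknown flag")
            continue
        if variant not in info["variants"]:
            problems.append(f"{flag}: not supported by the {variant} variant")
            continue
        removed = info.get("removed_in", {}).get(variant)
        if removed is not None and version >= removed:
            ver = ".".join(str(part) for part in removed)
            problems.append(f"{flag}: removed in {variant} {ver}")
    return problems
```

For example, `check_flags(["-a", "--all"], "bsd", (1, 0))` reports only `--all`, since `-a` is accepted by both variants in this made-up table.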

chasecaleb · 2022-12-28T22:42:27Z

I'm trying to figure out how to deal with nixos-enter, which I think is related to this issue since nixos-enter --command 'echo foo' and nixos-enter -- echo foo can execute commands and resholve/binlore don't have rules for it.

I started down the path of execer = [ "can:${pkgs.nixos-install-tools}/bin/nixos-enter" ];, but then got stuck because I couldn't figure out how to teach resholve about nixos-enter's use of echo. So is the only user-facing escape hatch currently to lie to resholve with cannot, like execer = [ "cannot:${pkgs.nixos-install-tools}/bin/nixos-enter" ];?

I'm happy to make a binlore issue for nixos-enter or something along those lines, but I'm not entirely sure I understand the situation correctly so I thought I'd start with a comment here.

abathur · 2022-12-29T01:08:51Z

> I'm trying to figure out how to deal with nixos-enter, which I think is related to this issue since nixos-enter --command 'echo foo' and nixos-enter -- echo foo can execute commands and resholve/binlore don't have rules for it.

A comment here is a good place to start.

I haven't used nixos-enter, but a quick look suggests that the commands would run in a chroot--are the inputs to resholve guaranteed to be available inside it? (I guess the "bad" case would be if only system packages in the config specified via the --system option are available?)

If everything is available, it is probably tractable w/ a bit of work. Its options smell like a good example of why staking out a user-facing mechanism is tricky (and why I'm taking it slow...). Are you able to link to the script you need to resholve?

IIUC the -- form would treat echo as an external command (which resholve can handle, but it takes adding a parser).

The --command form appears to run in a bash shell session where echo would refer to the shell builtin. I remember implementing a ~generic shell parser for things like bash -c, but I don't remember wrestling with the nested-shell-script-scope that a principled implementation would require.
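The difference between the two forms can be sketched as a small classifier. This is only an illustration built from the two invocations quoted in this thread; it ignores every other nixos-enter option, is not resholve's parser, and leaves open the chroot question of whether either payload should be resolved at all.

```python
def classify_invocation(argv):
    """Classify a nixos-enter-style invocation.

    Returns ("external", [command, args...]) for the `--` form,
    ("shell", "script text") for the `--command` form,
    and ("none", None) when neither form is present.
    Assumption: all other options are skipped without interpretation.
    """
    args = argv[1:]  # drop the program name
    for index, word in enumerate(args):
        if word == "--":
            rest = args[index + 1:]
            # The first word after `--` is an external command name.
            return ("external", rest) if rest else ("none", None)
        if word == "--command":
            if index + 1 < len(args):
                # The value is a script for a nested shell, so `echo`
                # there may be a builtin rather than an executable.
                return ("shell", args[index + 1])
            return ("none", None)
    return ("none", None)
```

The two results call for different handling: the "external" payload has a command word in a known position, while the "shell" payload would need to be parsed as a nested script with its own scope.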

> I started down the path of execer = [ "can:${pkgs.nixos-install-tools}/bin/nixos-enter" ];, but then got stuck because I couldn't figure out how to teach resholve about nixos-enter's use of echo. So is the only user-facing escape hatch currently to lie to resholve with cannot, like execer = [ "cannot:${pkgs.nixos-install-tools}/bin/nixos-enter" ];?

Correct.

This isn't an oversight, just one of the trickiest features on the roadmap to get right.

I briefly thought we could satisfice w/ a way to specify positions to resolve and replace--but after playing with a draft implementation I decided it was a misfeature. (Basically a worse version of patch+substitute* that is harder to use/understand and part of resholve's maintenance burden...)

The other ~obvious approach is enabling people to write parsers without having to upstream them to resholve. That does need to happen as soon as feasible, but I'm not happy with how easy it is to write and maintain the existing parsers and don't want to make it harder to change until I can either fix it or make my peace w/ its sharp corners.

tejing1 · 2022-12-29T01:45:22Z

nixos-enter is a chroot; the external system is no longer available for programs inside, at least not at the original file path. So really, resholve shouldn't be resolving anything inside the command it's configured to run. It's just a string, from resholve's perspective.

chasecaleb · 2022-12-29T17:13:45Z

@tejing1 and @abathur, you're both right about nixos-enter specifically being an edge case due to its use of chroot; thanks for pointing that out.

> The other ~obvious approach is enabling people to write parsers without having to upstream them to resholve. That does need to happen as soon as feasible, but I'm not happy with how easy it is to write and maintain the existing parsers and don't want to make it harder to change until I can either fix it or make my peace w/ its sharp corners.

Okay that makes sense, thanks.

abathur · 2023-03-12T18:14:42Z

Documenting something I should've noted a while back.

A few months ago, one of the Fig co-founders suggested on HN that Fig's autocomplete format might help (https://news.ycombinator.com/item?id=32862163).

Their format is somewhat documented in https://fig.io/docs/reference/arg and the other bits of the reference section in the sidebar. Some likely problems:

- I searched for "isCommand" in their autocomplete repo and have low confidence that their collected definitions for autocompletions are currently robust enough for resholve's needs.
- I didn't get an answer back re: how well what they're doing copes with thorny things resholve has to be able to handle (like combined no-argument short flags terminated by a short flag that does take an option), but it might be a tree worth barking up again (whether that's asking them, playing with fig directly, or figuring out if there's actually a parser in it, etc.).
- Their spec format is less dynamic than the formats I'm aware of for working shells--but the autocomplete focus still means that their specs use some dynamic behavior. If they were high-quality enough for our needs here, I imagine we'd still need to exclude those dynamic parts? I haven't audited deeply enough to know if that's going to be intrinsically fraught (i.e., no way to exclude them without knocking out support for some arguments, etc.).
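For reference, the "combined no-argument short flags terminated by a short flag that does take an option" case looks like this when handled by Python's standard `getopt` module. The option string below is made up for the example, and this is not a claim about how resholve or Fig parse arguments.

```python
import getopt

# Hypothetical option string: -a and -b take no argument; -f and -v take one.
SHORTOPTS = "abf:v:"


def split_short_flags(words):
    """Split clustered short flags into (flag, value) pairs plus leftovers."""
    opts, rest = getopt.getopt(words, SHORTOPTS)
    return opts, rest
```

With this option string, `["-abf", "out.txt", "rest"]` yields `-a`, `-b`, and `-f` with the value `out.txt`, leaving `rest` as a positional word. The attached spelling `-vfoo=bar` and the separated spelling `-v foo=bar` both yield `-v` with the value `foo=bar`. Any format that only models "flag, then a separate value word" would miss the attached spelling.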

Edit: Fig.io will be closing down, so I imagine there's a nonzero chance their repos stop improving. Unless the Fig community decides to fork and maintain Fig in some way, the prospect of having a resource (even if imperfect) that is maintained by a large community with many eyes seems grim.

Announcement for context:

Fig is sunsetting, migrate to Amazon CodeWhisperer
Dear Fig users,

Effective September 1, 2024 we will be ending access to Fig.

We encourage users to migrate to Amazon CodeWhisperer for command line. It’s free on the Individual tier and is designed to be faster and more reliable than Fig. To make this transition as easy as possible, users can upgrade to CodeWhisperer for command line directly from the Fig dashboard.

To learn more about the changes to Fig and how to export your data, read our blog post.

With hundreds of thousands of users, 22k GitHub stars, 13k Discord members, 400+ open source contributors, and 5 products, we are incredibly proud of what we accomplished. We are incredibly thankful to our community for their support, and we are excited to continue our journey with you at Amazon as part of Amazon CodeWhisperer.

Brendan, Matt, and the Fig team

Download CodeWhisperer for command line
Download Fig before it sunsets (existing users only)
View docs for Fig CLI completion specs
View Fig user manual
Contact Fig support

tejing1 mentioned this issue Jul 31, 2022

awk option parser cannot handle -vfoo=bar, only -v foo=bar #82

Closed

abathur mentioned this issue Dec 31, 2022

handling commands that run other commands in a different path-space (different system, chroot, vm, etc.) #90

Open
