proposal: cli-user driven delimiters #71

Closed
timotheecour opened this issue Dec 3, 2018 · 15 comments

Comments

@timotheecour
Contributor

timotheecour commented Dec 3, 2018

(follow-up on #70 (comment))

one thing I never liked with current design is that cli author sets the delimiter; this may not match what cli user finds convenient, and is less explicit (should be clear if set by cli user)

How about the following:

proposal: cli-user driven delimiters

  • get rid of delimiter in dispatch
  • get rid of this syntax: --foo@= to clear a seq; it'll be handled via --foo,= as a special case of multiargument syntax (namely, 0 arg), see below (helps discoverability!)
  • by default, flags assume no delimiter (for eg, a , shall be interpreted as a regular char, eg: arg1,arg2 is a single entry)
proc main(foo: seq[string] = @["bar1", "bar2"]) = discard

## single argument syntaxes:
--foo=arg #  => @["arg"] ; clobbering single assignment; assumes no delimiter
--foo= # => @[""] 
--foo+=arg # => @["bar1", "bar2", "arg"] ; single append
--foo^=arg  # => @["arg", "bar1", "bar2"] ; single prepend
--foo-=bar1  # => @["bar2"] ; single remove; removes ALL (0 or more) entries in `foo` equal to bar1

cligen automatically provides syntax sugar for delimiters; these are in DPSV format, always. The cligen author doesn't need to set anything to enable these.

each multiargument syntax starts with ,= followed by either 0 or 1 delimiter, a slight twist on your DPSV idea.

## multi argument syntaxes (0 or more)
--foo,= # => @[] ; clears seq; no delimiter given ; that's the twist on your DPSV idea; makes thing nice and uniform IMO
--foo,=,arg1,arg2 # => @["arg1", "arg2"] ; clobbering multi-assign
--foo,=@arg1@arg2 # => @["arg1", "arg2"] ; nothing special here; just to show usage using a different delimiter
--foo,=, # => @[""] ; nothing special here
--foo,=@ # => @[""] ; nothing special here
--foo,+=,arg1,arg2 # @["bar1", "bar2", "arg1", "arg2"] 
--foo,^=,arg1,arg2 # @["arg1", "arg2", "bar1", "bar2"] 
--foo,-=,bar1,bar2 # removes all entries bar1 and bar2 from seq; in this particular case we end up with `@[]`

## space can be used too, as long as there's proper shell quoting
--foo,+=" arg1 arg2" # @["bar1", "bar2", "arg1", "arg2"] ; nothing special
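The DPSV rule used by the multi-argument forms above (first character is the delimiter, the rest is split on it, and an empty value means zero arguments) could be sketched roughly like this. `splitDPSV` is a made-up name for illustration, not something cligen is known to export, and delimiter escaping (the open question from #70) is deliberately left out.

```nim
import strutils

proc splitDPSV(val: string): seq[string] =
  ## Hypothetical helper, not cligen's actual code.
  ## `val` is whatever follows `,=` on the command line (after shell parsing).
  if val.len == 0:
    return @[]                    # `--foo,=` : no delimiter given, zero args
  let delim = val[0]              # the cli-user picks the delimiter
  result = val.substr(1).split(delim)

when isMainModule:
  doAssert splitDPSV("").len == 0                       # --foo,=
  doAssert splitDPSV(",arg1,arg2") == @["arg1", "arg2"] # --foo,=,arg1,arg2
  doAssert splitDPSV("@arg1@arg2") == @["arg1", "arg2"] # --foo,=@arg1@arg2
  doAssert splitDPSV(",") == @[""]                      # --foo,=,
  doAssert splitDPSV(" arg1 arg2") == @["arg1", "arg2"] # --foo,+=" arg1 arg2"
```

Since the delimiter travels with the value, the cli author never has to configure one, which is the point of the proposal.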

benefits

  • cligen author doesn't have to predict the best delimiter; it's set by cli-user
  • no escaping of delimiter needed for single argument syntax (fewer surprises) ; for multi-arg, we can probably live without escaping, or maybe allow escaping via \, no strong opinion yet (see escaping delimiter for a seq[string] #70)
  • I think it's much simpler to explain that way; the single argument variants are powerful enough to express anything (along with the special multi-arg syntax --foo,= that clears seq) and don't require escaping any delimiter, and for advanced user the multi-arg syntax sugar is always available (doesn't depend on cligen author enabling it)
@c-blake
Owner

c-blake commented Dec 3, 2018

I do like the direction of this. There may be some details to figure out and possibly simplifications as I write it up. It had occurred to me to maybe require a delimiter to be in opChars and fold it in with the operator, but your proposal of just ,OP is cleaner. Like I said, I almost went with DPSV by default. This is basically single-arg plus DPSV for multi-arg with the extra emptiness to clear available via missing delimiter. I think as long as we can explain it well in a --help-syntax this will be fine. I will probably try to clean up that code and do this tomorrow or so.

@timotheecour
Contributor Author

one design decision is whether to allow all 3 =, :, tokens with the multiarg syntax, eg:

--foo,+=,arg1,arg2
--foo,+:,arg1,arg2
--foo,+ ,arg1,arg2

I'm leaning towards just allowing the 1st token (=) in multiarg syntax as the other 2 sound weird (we can always change our mind later if we want to allow the other 2, but better be conservative first)

@c-blake
Owner

c-blake commented Dec 4, 2018

Maybe your markdown just rendered weird, but what is --foo,+<SPACE>,arg1,arg2 supposed to do?

Also, do you like the current single assignment semantics of clobbering with an explicit = but appending without? That's sort of my own made-up thing, too, AFAIK. All your examples use an explicit operator and with an explicit operator you have to say += to get append. Most things have no way to either clobber or prepend or subtract. Most other things do just append rather than overwrite for seq-like destinations. We could require != or something for clobbering. (Shells seem to use that for clobbering files in re-direction, but it's definitely a pain to quote.)

These are all relatively small questions compared to the overall idea, though.

@timotheecour
Contributor Author

> Maybe your markdown just rendered weird, but what is --foo,+<SPACE>,arg1,arg2 supposed to do?

that was under the assumption that :, = and space are treated equivalently, so that --foo,+<SPACE>,arg1,arg2 (ie --foo,+ ,arg1,arg2, but NOT --foo,+' ',arg1,arg2, ie not escaping the space) would behave just like --foo,+=,arg1,arg2; but frankly I'd much rather require = for all the multi arg variants; it's a new syntax anyway, so we might as well be strict about what's allowed (unlike the single argument case, in which we can accept all 3 tokens)

other questions answered below

@timotheecour
Contributor Author

timotheecour commented Dec 4, 2018

> Also, do you like the current single assignment semantics of clobbering with an explicit = but appending without?

I actually had forgotten about that subtlety, and TBH I really don't like that there's a difference; it's surprising, error-prone behavior :) IMO:

--foo=arg 
--foo:arg 
--foo arg 
-f=arg
-f:arg
-f arg

should all be equivalent; furthermore, it shouldn't matter whether arg starts with a space (as long as it's quoted for the shell, obviously; all I'm discussing is AFTER shell parsing) or with -, so we don't have the same complications as in posix tools, eg:

grep foobart file
grep -e -foobart file # -e needed because pattern starts with `-`

Note: for special case when there's no arg at all, there's debate whether to allow -f and --foo or to error (see #72)

proposal: single argument --foo:arg --foo=arg --foo arg are identical
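To make the equivalence concrete, here is a rough sketch of normalizing the six spellings above into one (key, value) pair after shell tokenization. `keyVal` is a hypothetical helper, not cligen's or parseopt3's API; it only handles the plain separators (no += style operators, no bundled short flags), and returning an empty value for a naked option is an assumption, since that case is the one debated in #72.

```nim
import strutils

proc keyVal(args: seq[string]): tuple[key, val: string] =
  ## Hypothetical sketch. `args` holds one option as the shell delivered it:
  ## @["--foo=arg"], @["--foo:arg"] or @["--foo", "arg"] (same for -f).
  let tok = args[0].strip(trailing = false, chars = {'-'})
  let i = tok.find({'=', ':'})
  if i >= 0:
    result = (tok[0 ..< i], tok.substr(i + 1))  # separator inside the token
  elif args.len > 1:
    result = (tok, args[1])   # value is the next token, even if it starts with -
  else:
    result = (tok, "")        # assumption: naked option yields an empty value

when isMainModule:
  let want = keyVal(@["--foo", "arg"])
  doAssert want.key == "foo" and want.val == "arg"
  doAssert keyVal(@["--foo=arg"]) == want
  doAssert keyVal(@["--foo:arg"]) == want
  doAssert keyVal(@["-f=arg"]).val == "arg"
  doAssert keyVal(@["--foo", "-foobart"]).val == "-foobart"
```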

@timotheecour
Contributor Author

Note: I had initially thought of giving --foo=arg the meaning of append (for a seq) instead of clobber, but IMO making --foo=arg mean clobber is better for following reasons:

  • single way to do one thing: append (+=) vs assign (=) (otherwise there would be multiple ways to append, which is confusing)
  • matches common practice in programming languages (at the expense of common practice for cmdline -I foo, which is only downside)
  • biggest argument: self consistent with how other types (beyond seq[string], for example string) are treated, eg:
proc main(cmd = "clang -g -O2") = discard
./main --cmd+=" -std=c++11" # append (anything else would be weird)
./main --cmd="clang++ -O3" # clobbers (anything else would be weird)

whereas if --foo=arg had meant append, the above would be very inconsistent

@c-blake
Owner

c-blake commented Dec 4, 2018

Ah..Sorry about the confusion. I was mentally parsing that as a character in a syntactic position not as any amount of whitespace/os tokenization.

The problem is that out in the wider world, repetition has been the only way/most common way to append, and appending is very commonly needed and supported while clobbering isn't (except for strings). cc -Dd1 -Dd2 -Ip1 -Ip2 is the operative analogy. Virtually everyone is familiar with long series of -I flags and -L or -l flags building up logical sequences. nim c even does the same with its logical seq variables like -d: and --path:. I don't think we should force CL users to run --help-syntax to find out about += just to be able to do that. Multiple ways to do the same thing is ok -- there was always going to be at least --foo=arg and --foo arg.

I think we should target a situation where what people try first works as expected without looking at any documentation. Learning nothing at all is the least effort thing, even if that nothing is a kind of half capable/half consistent traditional CL syntax.

It may seem like a little thing to you, but those command lines with a dozen -I flags and a dozen -l flags are legendary, and to some people they are as much a signature of command-line interface syntax as the double-dash for long options. I feel like common CLI practice should win out here to save the hassle of people asking WTF it works this way. Bear in mind that CL users will still be using other programs not using cligen at the same time. No one likes to juggle subsyntaxes.

In short, having =|nothing clobber seq seems the wrong choice. Since it has to mean += for seq it should for sets and other types for internal consistency.

I hear you Re: string, but strings can be, as they so often are, a special circumstance. The dual singular/plural personality of strings gets them special syntax all the time (like their internal-delimiter-free literal). Because of that, people don't have rigid expectations about generalizing from "how strings work" to "how anything else does". I think that's a really hard argument to make, actually.

I'm with you that --foo=arg and --foo arg should work the same, though, and that the current difference is error prone. I always felt that was a bit of a stretch and messed up typing my own commands. Prior to this multi-assignment idea there was just no other way to do a clobbering/empty assignment and that seemed an important but sadly rare capability. Being uncommon, it is also fine for it to be accessible only in the more advanced multi-assign syntax. CL users who care/find consistent syntax easier to remember/use will do so after just a few --help-syntax runs.

So, = should clobber strings, but append to seq, and += should append to everything. This seems least likely to confuse. Multiple ways to append for all-but-string is fine. Coherency with broader world is good. The decision here is mostly of the form to treat =string specially as a singular clobber/overwrite, but then re-use the += operator for append. We could require &= or something to append to a string to hammer home this difference. That's probably unnecessarily harsh/different, though.

@timotheecour
Contributor Author

timotheecour commented Dec 4, 2018

ok, convinced.

so I guess here's the final deal; please let me know if there's anything where you disagree in what follows:

  • -f<whatever> is always equivalent to --foo<whatever> (where <whatever> starts with a space, :, =, or ,), ie short form == long form, for every type
  • for bool we also allow -abc to mean -a -b -c
  • --foo=<whatever> is always equivalent to --foo:<whatever> and --foo <whatever> (colon/equal/spc are equivalent) but ONLY for single param
  • a naked --foo as a last cmdline argument (ie without an argument, eg: ./main --foo) will result in error when foo in proc is anything except bool
  • semantics for seq[string]:
proc main(foo: seq[string] = @["bar1", "bar2"]) = discard
## single argument syntaxes: assumes no delimiter
--foo=arg #  => @["bar1", "bar2", "arg"] ; append
--foo= # => @["bar1", "bar2", ""] append (an empty string)
--foo+=arg # => @["bar1", "bar2", "arg"] ; append
--foo^=arg  # => @["arg", "bar1", "bar2"] ; prepend
--foo-=bar1  # => @["bar2"] ; remove; removes ALL (0 or more) entries in `foo` equal to bar1

## multi argument syntaxes (0 or more); `=` CANNOT be interchanged with `:` nor spc 
--foo,= # => @[] ; clears seq; no delimiter given
--foo,=,arg1,arg2 # => @["bar1", "bar2", "arg1", "arg2"] ; append multi
--foo,=@arg1@arg2 # => @["bar1", "bar2", "arg1", "arg2"] ; nothing special here; just to show usage using a different delimiter
--foo,=, # => @["bar1", "bar2", ""] ; nothing special here
--foo,=@ # => @["bar1", "bar2", ""] ; nothing special here
--foo,+=,arg1,arg2 # @["bar1", "bar2", "arg1", "arg2"] append multi
--foo,^=,arg1,arg2 # @["arg1", "arg2", "bar1", "bar2"] prepend multi
--foo,-=,bar1,bar2 # removes all entries bar1 and bar2 from seq; in this particular case we end up with `@[]`
--foo,:=,arg1,arg2 # @["arg1", "arg2"] clobbering (that's the only clobber syntax available for seq (modulo choice of delimiter); no clobber syntax for single param)
--foo,:=@arg1@arg2 # @["arg1", "arg2"] same as example above, to show example with another delimiter

## space can be used too, as long as there's proper shell quoting
--foo,+=" arg1 arg2" # @["bar1", "bar2", "arg1", "arg2"] ; nothing special
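For what it's worth, the multi-argument table above can be mimicked in a few lines of Nim. This is only a sketch of the semantics as listed in this comment, with `applyMulti` being a hypothetical name and not cligen's implementation. What an empty value does for operators other than `,=` isn't specified here, so leaving the seq unchanged in that case is an assumption.

```nim
import strutils, sequtils

proc applyMulti(cur: seq[string], op, val: string): seq[string] =
  ## Hypothetical sketch. `op` is the text between the option name and the
  ## value (eg ",+="); `val` is DPSV: its first char is the delimiter.
  if val.len == 0:
    if op == ",=": return newSeq[string]()  # `--foo,=` clears the seq
    return cur                              # assumption: nothing to apply
  let items = val.substr(1).split(val[0])
  case op
  of ",=", ",+=": result = cur & items                     # append multi
  of ",^=": result = items & cur                           # prepend multi
  of ",-=": result = cur.filterIt(it notin items)          # remove all matches
  of ",:=": result = items                                 # clobber
  else: raise newException(ValueError, "unknown operator: " & op)

when isMainModule:
  let dflt = @["bar1", "bar2"]
  doAssert applyMulti(dflt, ",=", "").len == 0
  doAssert applyMulti(dflt, ",=", ",arg1,arg2") == @["bar1", "bar2", "arg1", "arg2"]
  doAssert applyMulti(dflt, ",=", "@") == @["bar1", "bar2", ""]
  doAssert applyMulti(dflt, ",^=", ",arg1,arg2") == @["arg1", "arg2", "bar1", "bar2"]
  doAssert applyMulti(dflt, ",-=", ",bar1,bar2").len == 0
  doAssert applyMulti(dflt, ",:=", "@arg1@arg2") == @["arg1", "arg2"]
  doAssert applyMulti(dflt, ",+=", " arg1 arg2") == @["bar1", "bar2", "arg1", "arg2"]
```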

semantics for string (instead of seq[string]):

proc main(foo = "bar") = discard

--foo=arg # => "arg" ; clobbers instead of appends
--foo= # => "" clobbers (sets to an empty string)
--foo+=arg # => "bararg" ; append
--foo^=arg  # => "argbar" ; prepend
# --foo-=bar1 this syntax doesn't exist (would result in error)
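The string vs seq[string] difference for the single-argument forms can be expressed as two overloads. Again just a sketch to check that the two tables are self-consistent; `applySingle` is a hypothetical name and not part of cligen.

```nim
import sequtils

proc applySingle(cur: seq[string], op, arg: string): seq[string] =
  ## seq[string]: a bare `=` appends, like repeated -I flags.
  case op
  of "=", "+=": result = cur & @[arg]
  of "^=": result = @[arg] & cur
  of "-=": result = cur.filterIt(it != arg)   # removes ALL equal entries
  else: raise newException(ValueError, "unknown operator: " & op)

proc applySingle(cur: string, op, arg: string): string =
  ## string: a bare `=` clobbers; there is no `-=`.
  case op
  of "=": result = arg
  of "+=": result = cur & arg
  of "^=": result = arg & cur
  else: raise newException(ValueError, "unsupported for string: " & op)

when isMainModule:
  let dflt = @["bar1", "bar2"]
  doAssert applySingle(dflt, "=", "arg") == @["bar1", "bar2", "arg"]
  doAssert applySingle(dflt, "=", "") == @["bar1", "bar2", ""]
  doAssert applySingle(dflt, "^=", "arg") == @["arg", "bar1", "bar2"]
  doAssert applySingle(dflt, "-=", "bar1") == @["bar2"]
  doAssert applySingle("bar", "=", "arg") == "arg"
  doAssert applySingle("bar", "=", "") == ""
  doAssert applySingle("bar", "+=", "arg") == "bararg"
  doAssert applySingle("bar", "^=", "arg") == "argbar"
```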

note

one more thing: I think we should stick to chars that don't require shell quoting, so &= (that you had mentioned) is out

@c-blake
Owner

c-blake commented Dec 4, 2018

I agree with everything I see there, except --foo,:=,arg1,arg2 as the clobbering assignment. I realize it's possible if we stick to just = and no : in the multi-assignment syntax, but it seems like --foo,@=,arg1,arg2 would be less likely to confuse. That @ is also similar to Nim's @[] seq literal (also "about" whole sequence assignment) making it easy to remember for a CLI user who is also Nim-minded.

@c-blake
Owner

c-blake commented Dec 4, 2018

(That naked non-bool --foo as a last arg should error out with "arg expected". Haven't looked into that bug yet.)

@c-blake
Owner

c-blake commented Dec 4, 2018

And we probably want to use / rather than @ as an alternate to , in examples with more delimiters. People know /reg/ex notation, and it's (basically) a copy of that model. E.g., in bash/zsh you can say ${VAR:gs/old/new}. People sometimes use @, too, but since that's my proposed operator for clobbering, it would be distracting to use the same character for two distinct purposes. We also probably won't use / for anything else.

@timotheecour
Contributor Author

timotheecour commented Dec 4, 2018

  • +1 for not using same char for distinct purposes in --help-syntax; I'm fine with / as u suggest (only 1 worry is might be a tad confusing for windows users where / is used for cmd line arg beginnings, eg: myprog /cmd1 /cmd2, but meh, that's fine)
  • I really find --foo,:=,arg1,arg2 more intuitive than --foo,@=,arg1,arg2 (:= with meaning of assignment is common and intuitive in lots of languages) ; only downside is we're allowing : and = interchangeably for single param case, but the ,= syntax is new anyway so user will have to read --help-syntax at least once to get to these advanced options; but go ahead with --foo,@=,arg1,arg2 if you really prefer, no big deal for that particular point

overall I'm quite happy with resulting syntax / semantics we're finally converging to!

it's a unique mix of:

  • familiar looking syntax behaves as you'd expect
  • syntax sugar allows easy passing of multiple arguments
  • shell escapes are rarely needed (thanks to avoiding chars that need escaping, and thanks to DPSV to some extent)
  • uniform, predictable treatment (except for necessary evil of seq[string] vs string in treatment of --foo=bar)
  • strictly more powerful than standard posix cmdline (eg allow resetting, clobber, prepend instead of just append)

@c-blake
Owner

c-blake commented Dec 4, 2018

I get your := idea...If A) Nim/Windows/even our single-assign mode weren't using : for other stuff and not using @ at all and B) Nim didn't use @[] for its seq I would probably tilt that way, too. But given A&B I tilt toward @ and away from : for clarity/rememberability.

In impl terms, parseopt3's sepChars uses : long before it gets to an argParse, though. It could actually be that : (all of sepChars, really) should be disallowed as opChars. It might sorta work, but be strange/limited. It also seemed easier to sidestep with @.

Anyway, I should get to work re-coding all this now. I agree that it's a nice outcome that satisfies a lot of concerns. People also sometimes use % instead of / as an alternate for ,/@. We can just use % and , (if we need >1). We can see how that help message looks.

c-blake pushed a commit that referenced this issue Dec 4, 2018
@c-blake
Owner

c-blake commented Dec 4, 2018

Actually, I think I changed my mind about a missing argument to an option at the end of parameters. An empty argument is a valid one for several kinds of types. So, I think we should just set the val to be empty string and let the argParse for the type decide. For numbers/things that need non-empty string the argParse will error out. It should be consistent, though, and right now it isn't.
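A small sketch of that "let the type decide" idea: the option parser always hands over a value (empty when the argument is missing), and each type-specific parser accepts or rejects it. `parseArg` here is a hypothetical stand-in with a made-up signature, not cligen's actual argParse.

```nim
import strutils

proc parseArg(dst: var string, val: string): bool =
  ## An empty string is a perfectly valid string value.
  dst = val
  result = true

proc parseArg(dst: var int, val: string): bool =
  ## Numbers need a non-empty value; strutils.parseInt raises ValueError on "".
  try:
    dst = parseInt(val)
    result = true
  except ValueError:
    result = false      # caller would report "arg expected" or similar

when isMainModule:
  var s = "bar"
  var n = 5
  doAssert parseArg(s, "") and s == ""     # `./main --name` at end of line
  doAssert not parseArg(n, "") and n == 5  # `./main --count` at end of line
  doAssert parseArg(n, "42") and n == 42
```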

@c-blake
Owner

c-blake commented Dec 4, 2018

Ok. I think I got everything we outlined above (with the EOL missing equivalent to empty arg delta), but I would not be even a little surprised if I missed something. Give it a spin and let me know. We should probably add a bunch of invocations other than just --help to test.sh after everything is built. PR welcome. You can update test/ref or just let me do it. I try to run on an 80 column terminal so the line wrapping is consistent.
