Simplify docopt #104

Open
keleshev opened this issue Apr 7, 2013 · 62 comments
@keleshev
Member

keleshev commented Apr 7, 2013

@docopt/docopt, which features would you get rid of in docopt? I'm really concerned with 1.0.0 release and I wish to strip features and simplify implementation. Any candidates?

@shabbyrobe
Member

I'd be wary of drawing too much from JSON here - it's not an apples to apples comparison. docopt is targeting a much more complex problem and doesn't have a blank slate to work with. The legacy of thirty odd years worth of bizarre, twisted help messages is hard to escape!

I've been over the current README from top to bottom and identified a few things I thought could go, only to remember heaps of use cases for everything but the uppercase arg thing mentioned in #50. I beg you - please be careful before deprecating features - docopt is incredibly useful and flexible in its current form.

@keleshev
Member Author

Suggestions so far:

@shabbyrobe
Member

What about just removing --version? That seems less like docopt's job than --help.

@kblomqvist
Member

I suggest removing help as well. However, I like ARG more than <arg> so I would keep both. But yeah, I understand that it would be simpler to have only one way to do it.

@ryanartecona
Member

I think these are different types of features. ARG vs. <arg> syntax should absolutely be standardized, as it is core to the docopt language. Automatically handling help and version arguments are not strictly necessary for docopt to still parse those options and just let the user explicitly handle them. That said, those two options could be considered boring boilerplate, and conveniences to automatically handle them would be useful. I think whether to make it available and if so whether to enable it by default should be a port-specific decision, not a requirement for compatibility across ports.

@keleshev
Member Author

ARG vs. <arg> syntax should absolutely be standardized

You mean we should pick one of them?

@ryanartecona
Member

Sorry I've been absent; I'm in the middle of finals for my last year of undergraduate.

Yeah, I think it would be better to have one or the other form for positional args, not both. It seems like <arg> form would make for an easier implementation, though some people seem to strongly prefer ARG. Being relatively new to the command line, <arg> seems clearer, more difficult to mistake for a command. I just think if docopt is going to have a standardized syntax, it makes more sense to have only one way of writing positional arguments.

@keleshev
Member Author

The only thing stopping me from removing ARG convention is that it might confuse a lot of people. Also that would invalidate my PyCon UK talk as a good introduction to docopt.

@keleshev
Member Author

keleshev commented May 4, 2013

So, lately I'm thinking of:

  • remove ARG convention (as above)
  • require to write equal sign --long=<arg> in usage-pattern

Also, it is not documented, but options' arguments could be arbitrary strings right now, not just <arg> or ARG, so I'm thinking of removing that in favor of:

  • having only <arg> convention for options, not arg or ARG—both in usage-pattern and in options section

Some more questionable moves:

  • reject short options in usage-pattern; if you want a short option, use [options] or make an equivalent long option

If you apply all the above changes to the docopt usage-pattern, its grammar becomes context-free and can be parsed with a normal parser.

What do you think?

@keleshev
Member Author

keleshev commented May 4, 2013

/cc @docopt, since the above will likely influence everyone

@keleshev
Member Author

keleshev commented May 4, 2013

BTW, @johari, how is your success with parsing docopt using parser generator?

@johari
Contributor

johari commented May 4, 2013

@halst racc does the job very well. As long as you parse in two phases, it's really easy to parse docopt using lexers + parser generators. (docopt_racc needs some minor refinements, though. I'm looking forward to new test cases, especially error messages.)

I can write a separate document explaining implementation details of docopt_racc if you're interested. But generally, it parses "options block" first (figuring out which options have arguments), and then passes those data to "usage block" lexer. This lexer would provide tokens to "usage block" parser and that would eventually generate a simple parser that consumes argv. I'm thinking of extracting the latter consumer to a separate gem, so that any option parser can benefit from it, regardless of external API.
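A minimal Python sketch of the two-phase idea described here, for readers who want to see the data flow. This is not docopt_racc's code: the function names, token names, and the simplified options-section format (one option spec per line, description separated by two or more spaces) are all assumptions made for illustration, and brackets and pipes in the usage pattern are ignored.

```python
import re


def options_with_args(options_section):
    """Phase 1: collect the names of options that take an argument."""
    takes_arg = set()
    for line in options_section.splitlines():
        line = line.strip()
        if not line.startswith("-"):
            continue
        # The option spec ends at the first run of two or more spaces.
        spec = re.split(r"\s{2,}", line, maxsplit=1)[0]
        names = re.findall(r"(?<![\w-])--?[A-Za-z0-9][\w-]*", spec)
        # An argument looks like <arg> or ARG after a space or "=".
        if re.search(r"[ =](?:<[^>]+>|[A-Z][A-Z0-9_-]*)", spec):
            takes_arg.update(names)
    return takes_arg


def lex_usage(usage_line, takes_arg):
    """Phase 2: tokenize a usage pattern using the phase-1 result."""
    words = usage_line.split()
    tokens, i = [], 0
    while i < len(words):
        word = words[i]
        if word in takes_arg and i + 1 < len(words):
            # Context from phase 1 says the next word belongs to this option.
            tokens.append(("OPTION_WITH_ARG", word, words[i + 1]))
            i += 2
        elif word.startswith("-"):
            tokens.append(("OPTION", word))
            i += 1
        else:
            tokens.append(("WORD", word))
            i += 1
    return tokens
```

The point of the sketch is that `lex_usage` cannot decide what `-o <file>` means without the set computed by `options_with_args`, which is why a single context-free grammar over the usage pattern alone is not enough today.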

So basically, you don't need to simplify docopt in order to use a parser generator.

To be honest, I don't agree with any of the mentioned simplifications. I like docopt as it is.

I can't see why you want to simplify or remove any part of the language, as they are all based on rightful conventions made by programmers over the years.

@keleshev
Member Author

keleshev commented May 4, 2013

The thing is that right now you can't write a grammar for the language, it's just not possible. As you have shown, you can parse it in several stages, though.

If we make those changes it is possible to write a single grammar and declare it as the docopt language.

BTW, did you look into Parsing Expression Grammar? If I have to define docopt via grammar I would definitely use PEG, since you don't need tokenization as a separate step.

@keleshev
Member Author

keleshev commented May 4, 2013

BTW, why do you see "require to write equal sign --long=<arg> in usage-pattern" as a bad idea?

@johari
Contributor

johari commented May 4, 2013

BTW, did you look into Parsing Expression Grammar? If I have to define docopt via grammar I would definitely use PEG, since you don't need tokenization as a separate step.

Yep, but I found working with racc and crafting the lexer more fascinating. Nevertheless, I'd like to experiment with PEGs someday.

The thing is that right now you can't write a grammar for the language, it's just not possible.

Having a single grammar for docopt language is really cool (I love it!), but that shouldn't be the main goal for language decisions. I mostly like to consider CLI-developer happiness, and removing ARG is definitely not in that direction.

Here's an idea: we can have a small minidocopt side project aiming to restrict docopt to a meaningful subset expressible by PEGs in a business card. How's that?

Implementing minidocopt would also be easy in ruby if I extract that "argv consumer" to a separate gem. We can also have a similar project in python.

BTW, why do you see "require to write equal sign --long=<arg> in usage-pattern" as a bad idea?

This actually isn't bad. We should encourage people to do that in our docs. In fact, one of the reasons we parse docopt in two phases is to deal with --long <arg> cases in usage message.
But your solution wouldn't make docopt parsable in a single phase, or expressible in PEGs, since you might also see -s <arg> in usage message when -s has arguments. So you finally need to parse options block and provide those data to usage block lexer.

Removing --long arg wouldn't solve any problem unless you require people to write only long options for those options that have arguments, or omit <arg> for short options. That is a bad idea.

@keleshev
Member Author

keleshev commented May 4, 2013

I mostly like to consider CLI-developer happiness

It depends whether we are talking about short-term happiness or long-term happiness :-). In the short term it will be like "shit, why is docopt not working anymore?!", but in the long term, if docopt can influence the status quo of command-line interfaces, it will make more people happy: they will know that <arg> are arguments and that --long=<arg> are long options with arguments. No confusion about different conventions; there is a single convention.

Then, if we remove short options from usage-patterns, I would argue that the world would be a better place. Who likes the man ls page, right?

SYNOPSIS
     ls [-ABCFGHLOPRSTUW@abcdefghiklmnopqrstuwx1] [file ...]

It should have used [options], or just specified the long options.

@johari
Contributor

johari commented May 4, 2013

Well, we can't make any influence by breaking existing habits. The best way to influence people is to slowly encourage them to do the right thing.

For example, I think docopt did the right thing not to support long options with single "-" prefix (like ruby -rubygems), but deprecating ARG is just pushing too far.
What makes docopt so cool is that you don't need to read any DSL/API reference to write a CLI (we're not marketing it in our docs the right way though), since docopt complies with what a CLI-developer already knows and these simplifications would just break that.

Also, we don't have any sophisticated error reporting in docopt right now, so we'd just increase the wtf/s ratio if we deprecate common patterns, rendering people confused.

Here's an idea: we can have a "strict" option to enforce best practices for people who really care. This can also map to the minidocopt subset, or shall we say, strictdocopt.

Then, if we remove short options from usage-patterns, I would argue that the world would be a better place. Who likes the man ls page, right?

I like your man ls synopsis more than mine; it tells me what I needed to look up in a single line. 😀

NAME
       ls - list directory contents

SYNOPSIS
       ls [OPTION]... [FILE]...

@keleshev
Member Author

keleshev commented May 4, 2013

Maybe then allow single letter options:

  • single letter without argument
  • more specific format for options with argument
    • like require parens around: (-o <arg>), or
    • require it to be one word -o<arg>

Sounds reasonable to me. And we still get parsability.

@keleshev
Member Author

keleshev commented May 4, 2013

And that is possible only if <arg> is the only convention. You naturally can't write it as -oARG or -oarg, as you can now. That is a good thing.

@jric

jric commented May 5, 2013

It is less readable to use short options than long options, but it's more convenient when you have to use those options a lot.

My software-design philosophy is that it's worth some extra pain in implementing the infrastructure (docopt), so that the 99% of the usage out there, which includes writing and use of end-user apps, becomes easier or more intuitive.

The problem with (-o <arg>) is that it is a second meaning for the parens. The existing meaning is, these arguments come together. The new meaning is, these arguments come together AND the value of <arg> is stored in the dictionary as "-o". But what if someone really just wants the value stored as "<arg>"? For instance, (-o <output>). For code readability, they may want to access this as args["<output>"] rather than args["-o"].

The problem with -o<arg> is that it looks different than how it is used, which again could become confusing. It should be used as "-o myout.txt" because that's more readable than "-omyout.txt".

Requiring the '=' seems somewhat innocuous, since it's no more verbose or less readable than allowing "-o <arg>" and "--output <arg>" if you could allow "-o=<arg>" and "--output=<arg>" -- could you? Only problem is, as you noted in another thread, CLI interfaces are function signatures, and it could be a big hurdle to adoption of docopt, and hardship for existing or new users, to break all that backwards-compatibility. What about flags that introduce multiple arguments? Would you require them to be written as "--output=<basename>=<ext>" or as "--output=<basename> <ext>"? That starts to look confusing. And now you have two characters that need to be escaped for parsing, the space AND the equal sign.

Another good idea for a "strict" mode would be to require a long-name equivalent for every short option. I also think that listing both, or at least the long-form, in the usage block would be preferable to just listing the short-form of the option there.

The --help and --version support is probably not adding much complexity to your code -- more likely a small add-on, and again, is a time-saver for everybody using docopt, as well as being a way to ensure everybody's CLI powered by docopt has a help message!

Long story short, as nice as it would be to simplify the docopt code, I'm afraid I don't see how to do it in the specification without pushing that complexity out onto the broader base of end users.

@keleshev
Member Author

keleshev commented May 5, 2013

It is less readable to use short options than long options, but it's more convenient when you have to use those options a lot.

I agree, but the changes proposed above are only about usage-pattern, nothing will be changed for parsing the ARGV.

--output=<basename>=<ext>

This is not a problem since for an option to take several arguments you need to "repeat" it --output=<1> --output=<2> or --output=<arg> ....

@keleshev
Member Author

keleshev commented May 5, 2013

The problem with -o<arg> is that it looks different than how it is used

Well -obar is a valid use, like -ofile.txt. Take git for example, you can do git commit -mMyMessage and the commit message will be MyMessage. So it doesn't look different from the pattern.

it could be a big hurdle to adoption of docopt, and hardship for existing or new users, to break all that backwards-compatibility

My idea is to never break or even never change docopt after 1.0.0. Right now docopt is a long-lasting beta :-). Eventually docopt 1.0.0 will be released and we'll have a solid CLI description language for decades to come. That is why I want to make docopt right, and I would prefer omitting a good feature, than keeping a bad one.

@keleshev
Member Author

keleshev commented May 5, 2013

As you say, requiring = is harmless. So the last step to have parsable docopt is to somehow handle short options with arguments. So far -o<arg> seems to make sense. Maybe there are better alternatives?

@keleshev
Member Author

keleshev commented May 5, 2013

So to sum up my opinion to date (about usage-pattern-parsing, not argv-parsing):

  • remove ARG convention in favor of <arg>
  • make --long=<arg> the only convention for long options with argument
    • this effectively removes --long=ARG, --long=arg, --long ARG and --long arg conventions
  • make -s<arg> the only convention for short options with argument
    • this effectively removes -sARG, -sarg, -s ARG, -s arg and -s <arg> conventions

This way we keep all the functionality, and make docopt parsable.

Some of you may think "why should we care so much about parsability?". Well if we change the grammar to be context-free it simplifies not only computer reasoning, but also human reasoning. Context-free grammar allows humans to parse it without keeping any context in their heads. Context like "which option takes argument, which not". When you see -s<arg> you know, -s takes argument. When you see -s <arg> you need to look it up to be sure.
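To make the "no context needed" argument concrete, here is a small Python sketch that classifies usage-pattern tokens purely by their own shape under the proposed conventions. The token kind names and the exact regular expressions are assumptions for illustration; they are not taken from docopt or from the experimental parser.

```python
import re

# Order matters: more specific shapes are tried first.
TOKEN_KINDS = [
    ("LONG_WITH_ARG", re.compile(r"--[A-Za-z0-9][A-Za-z0-9-]*=<[^<>]+>")),
    ("LONG", re.compile(r"--[A-Za-z0-9][A-Za-z0-9-]*")),
    ("SHORT_WITH_ARG", re.compile(r"-[A-Za-z0-9]<[^<>]+>")),
    ("SHORTS", re.compile(r"-[A-Za-z0-9]+")),
    ("ARGUMENT", re.compile(r"<[^<>]+>")),
    ("COMMAND", re.compile(r"[a-z][a-z0-9-]*")),
]


def classify(token):
    """Return the kind of a single usage-pattern token, using its shape only."""
    for kind, pattern in TOKEN_KINDS:
        if pattern.fullmatch(token):
            return kind
    raise ValueError(f"unrecognized usage token: {token!r}")
```

With this scheme `-s<arg>` is one token that is recognizably an option with an argument, whereas `-s <arg>` splits into a `SHORTS` token followed by an `ARGUMENT` token, and nothing in the two tokens says whether they belong together.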

@ambv

ambv commented May 5, 2013

👍

Since it's not 1.0 yet, you don't have to think about backwards compatibility too hard. Just document it clearly. I personally liked the --long=ARG -sARG syntax. If you kept uppercasing ARG everywhere, it was easy to read by humans as well. That being said, the <arg> syntax is unambiguous and I like it, too. It's your call. I support simplifying the spec and implementation.

@keleshev
Member Author

keleshev commented May 5, 2013

@ambv thanks for the feedback, appreciated.

@keleshev
Member Author

keleshev commented May 5, 2013

I just pushed an experimental PEG-based parser and thought you guys might want to take a look:

78b8aea

Right now it handles most of usage-pattern, but it is not integrated with the rest of the code. It uses parsimonious PEG library.

@jric

jric commented May 5, 2013

I'd prefer -o=<arg> to -o<arg>, since it's more analogous to the --output=<arg> case.

@ambv

ambv commented May 5, 2013

@jric but it's also totally unlike how POSIX command line utilities have worked for the last 30 years or so.

@fsaintjacques

+1 to the latest proposition

@kblomqvist
Member

Could we still have -s <arg> in addition to -s<arg>? --long=<arg> is fine as is without other formats.

@keleshev
Member Author

keleshev commented May 9, 2013

@kblomqvist the problem with -s <arg> is you can't figure out if it's an option with argument, or option and a separate argument, without some context. This makes it unparsable with conventional context-free grammar (CFG) or parsing expression grammar (PEG).

But we may come up with a better context-free version than -s<arg>.

@KangOl

KangOl commented May 9, 2013

I would also allow ? in short_options

@ambv

ambv commented May 9, 2013

This is tricky, indeed. A thing to consider is how will a user react if tool -s arg throws an error but tool -sarg suddenly works. Same for --long=opt and --long opt. If you're going to cut on functionality here, better think of great (and I do mean excellent) error messages.

With the -s arg form docopt basically becomes a meta-grammar with each user configuration being a concrete grammar (where a PEG would know that the -s token is followed by an argument token).

Maybe a feasible route would be to make docopt generate a concrete grammar from the docstring. In other words, the docstring has to be -s<arg> but the user can pass -s arg and it's parsed correctly.

@keleshev
Member Author

keleshev commented May 9, 2013

If you're going to cut on functionality here, better think of great (and I do mean excellent) error messages.

Absolutely agree.

Maybe a feasible route would be to make docopt generate a concrete grammar from the docstring. In other words, the docstring has to be -s<arg> but the user can pass -s arg and it's parsed correctly.

I'm not sure, are you talking about usage-section-parsing or argv-parsing?

Just to be sure, the idea was to restrict the usage-section (docstring) to -s<arg> and --long=<arg> forms, while the end users (argv providers) are allowed to write -sfoo, -s foo and --long=foo, --long foo.
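A rough Python sketch of that split: the usage section is strict, but argv is accepted leniently and rewritten into the attached form before matching. `normalize_argv` and the `takes_arg` set are hypothetical names; the set is assumed to come from options written as -s<arg> or --long=<arg> in the usage section. Stacked short flags such as -abs foo are deliberately left out to keep the sketch short.

```python
def normalize_argv(argv, takes_arg):
    """Rewrite argv so that option arguments are always attached.

    -s foo     becomes -sfoo
    --long foo becomes --long=foo
    """
    out, i = [], 0
    while i < len(argv):
        tok = argv[i]
        if tok == "--":
            # Everything after a bare "--" is positional; copy it unchanged.
            out.extend(argv[i:])
            break
        if tok.startswith("--"):
            name, eq, _ = tok.partition("=")
            if name in takes_arg and not eq:
                if i + 1 >= len(argv):
                    raise ValueError(f"{name} requires an argument")
                i += 1
                tok = f"{name}={argv[i]}"
        elif tok.startswith("-") and len(tok) > 1:
            name, value = tok[:2], tok[2:]
            if name in takes_arg and not value:
                if i + 1 >= len(argv):
                    raise ValueError(f"{name} requires an argument")
                i += 1
                tok = name + argv[i]
        out.append(tok)
        i += 1
    return out
```

After normalization, `-s foo` and `-sfoo` are indistinguishable to whatever matches argv against the usage pattern, so the end user never sees the stricter docstring convention.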

@ambv

ambv commented May 9, 2013

Haha, this is what I was suggesting! Perfect. I'm absolutely in favor of simplifying the grammar for usage-section and being more lenient with argv parsing.

If the developer cannot write -s<arg> properly and consistently in the docstring, then we have other problems to deal with.

@erikrose

erikrose commented May 9, 2013

Here's a crazy idea: on your Parsimonious branch, you're using a Parsimonious grammar to parse docopt DSL, but then presumably you're still delegating to your existing ad hoc code to do the recognition of command-line input. Have you thought about instead generating a second Parsimonious grammar from that and running it against the command-line input? You could either generate a string and throw it at the Grammar constructor (safer) or construct a tree of Expression objects manually. Just a crazy thought that popped into my mind—I haven't looked at your code at all. :-)

@erikrose

Okay, I've looked at your code now. Just look how much alike it and parsimonious/expressions.py are! Wow. We even named a bunch of classes almost the same. Unless I'm missing something (which is quite likely), you could delete a whole pile of that code and just write a NodeVisitor which extracts the final dict from the tree. Done.

The one tricky bit is that you have a pre-divided argv coming in; you'd have to mash that back down to a string to have a grammar parse it, so you'd need to decide on some kind of backslashy lossless conversion or something. /idle commentary
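One possible shape for the "backslashy lossless conversion" mentioned above, sketched in plain Python. The function names are made up for illustration, and the encoding (backslash escapes, each argument terminated by a bare space) is just one assumption among several workable ones; it does not depend on Parsimonious.

```python
def join_argv(argv):
    """Encode a list of arguments as one string, losslessly."""
    parts = []
    for arg in argv:
        # Escape the escape character first, then the terminator.
        escaped = arg.replace("\\", "\\\\").replace(" ", "\\ ")
        parts.append(escaped + " ")  # a bare space terminates each argument
    return "".join(parts)


def split_argv(text):
    """Invert join_argv."""
    argv, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            current.append(text[i + 1])  # escaped character, taken literally
            i += 2
        elif ch == " ":
            argv.append("".join(current))  # a bare space ends an argument
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    return argv
```

Terminating every argument, rather than separating arguments, keeps empty strings and an empty argv distinguishable, which a grammar generated from the usage pattern would otherwise have no way to recover.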

@keleshev
Member Author

@erikrose that's an interesting idea, I need to think more about that. That could be a nice refactoring.

@YorikSar

I have a proposal on the topic.
Currently the module implements both docopt syntax parsing and command-line parsing. Why not let docopt parse its special syntax, generate a good old argparse parser, and let the standard library handle the rest?

@keleshev
Member Author

@YorikSar argparse can't handle many of docopt features, such as arbitrary nested patterns.

@YorikSar

Can you please give me one good example of what argparse can not do?

@keleshev
Member Author

@YorikSar

  • [--foo | --bar --baz]
  • [- | <file>...]
  • (<from> <to>)...
  • [-v | -vv | -vvv]

@YorikSar

Thanks!
I was under the impression that since argparse allows nesting argument groups, it handles them appropriately.

@fsaintjacques

I've been playing with the latest proposed grammar. There seems to be an ambiguity with short options, consider the following definition:

-abc<c_arg> -d<d_arg>

There is no problem parsing this definition in the usage section. The problem arises while parsing argv: how do we interpret -cabd123? If we're parsing in a greedy fashion, it should be Option("-c", "abd123"), but according to the grammar definition, it can also be interpreted as Option("-c"), Option("-a"), Option("-b"), Option("-d", "123").
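The ambiguity can be made visible with a short Python sketch that enumerates every reading of a packed short-option cluster. The function name and the representation of a reading are assumptions, and the sketch deliberately treats every known letter as possibly a flag or possibly an option that takes the rest of the cluster as its argument, which is what the cluster alone permits when no other context is consulted.

```python
def readings(cluster, known):
    """Enumerate the ways to read a short-option cluster such as "-cabd123".

    Each reading is a list of (option, argument) pairs; argument is None
    for a plain flag.
    """
    body = cluster[1:]
    results = []
    flags = []
    for i, letter in enumerate(body):
        if letter not in known:
            break  # an unknown letter ends the run of possible flags
        rest = body[i + 1:]
        if rest:
            # Reading in which this letter takes the remainder as its argument.
            results.append(flags + [("-" + letter, rest)])
        flags = flags + [("-" + letter, None)]
    else:
        # No break: every letter was a known option, so all-flags is valid too.
        results.append(flags)
    return results
```

For `-cabd123` with known letters a, b, c, d this yields four readings, including both interpretations given above; an implementation has to pick a rule (greedy, or "consult the declared options") to choose among them.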

To me, packing short options while allowing only the last one to have an argument is syntactic sugar that brings a lot of problems for almost no gain. IMHO, [options] is good enough and is almost exclusively what we use at $WORK. Is this because we want to support the ol' -vvvv (which should be replaced with -v<level>)?

On a related note, many times I have faced the situation where I wanted something like [options] but more strict, i.e. some options are command dependent. I was thinking of option sets à la [:group1:], [:group2:] ... where the sets are defined in the options section. That would solve the problem of packing options (at usage time at least) AND would also support long options.

@fsaintjacques

@keleshev should we plan a deadline for the implementation of this? :) Maybe make it more formal and call it docopt 1.0.0?

@Ericson2314

How about we use braces to enclose short options with arguments (for usage, not argv)? They haven't been put to use yet.

expr              <- space* basic_expr '...'? space*
basic_expr        <- '[options]' / required    / optional
                                 / long_option / short_options
                                 / short_options_arg
                                 / argument    / command
required          <- '(' expr+ ('|' expr+)* ')'
optional          <- '[' expr+ ('|' expr+)* ']'
long_option       <- '--' (alnum / '-')+ ('=' argument)?                    
short_options     <- '-' (alnum)+ 
short_options_arg <- '{' space+ '-' (alnum)+ (space+ argument)* space+ '}'
argument          <- '<' (alnum / ' ' / '-')+ '>'
command           <- [a-z-]+
space             <- [ \t\r]
alnum             <- [a-zA-Z0-9]

Also, I agree about -asd<Foo> not pulling its weight, and adjusted the grammar to require space accordingly.

Finally, I removed \n from spacing because we use that for the implicit top-level alternative matching.
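As a quick way to experiment with the brace proposal, the `short_options_arg` rule can be approximated with a regular expression in Python. This is only a sketch of that one rule, transcribed from the grammar above; the constant names are made up here, and a regex is a stand-in for a real PEG parser, not a replacement for one.

```python
import re

SPACE = r"[ \t\r]"
ALNUM = r"[a-zA-Z0-9]"
# argument <- '<' (alnum / ' ' / '-')+ '>'
ARGUMENT = "<(?:" + ALNUM + "|[ -])+>"
# short_options_arg <- '{' space+ '-' (alnum)+ (space+ argument)* space+ '}'
SHORT_OPTIONS_ARG = re.compile(
    r"\{" + SPACE + "+-" + ALNUM + "+"
    + "(?:" + SPACE + "+" + ARGUMENT + ")*"
    + SPACE + r"+\}"
)


def is_short_options_arg(text):
    """True if text is exactly one braced short-option group."""
    return SHORT_OPTIONS_ARG.fullmatch(text) is not None
```

Note that the rule as written requires whitespace just inside both braces, so `{ -o <file> }` is accepted while `{-o <file>}` is not.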

@Ericson2314

To get on to the next steps of formalization, this grammar parses a strict subset of my previous one (provided I didn't screw up) to create a richer parse tree showing which patterns can come in any order and which can't. This may or may not be useful to implementations, but is certainly good for fuzz testing.

A few examples of subtleties (edit: these are argvs not usages):

  • my-prog --foo --bar == my-prog --bar --foo
  • my-prog start stop != my-prog stop start, assuming start and stop are commands in usage
  • git -C /asdf confuse --verbose != git --verbose /asdf confuse -C

# Replaces `expr+`: the non_commute items must come first; anything after comes in any order
sequence          <- (non_commute+ commute*)+

# Can be matched in any order
# Technically also associative with [..] and (..)
commute           <- space* basic_commute space*
basic_commute     <- '[options]'
                   / required_commute
                   / optional_commute
                   / long_option
                   / short_options
                   / short_options_arg
required_commute  <- '(' commute+ ('|' commute+)* ')'
optional_commute  <- '[' commute+ ('|' commute+)* ']'
long_option       <- '--' (alnum / '-')+ ('=' argument)?
short_options     <- '-' (alnum)+
short_options_arg <- '{' space+ '-' (alnum)+ (space+ argument)* space+ '}'

# Must be matched in order
non_commute       <- space* basic_non_commute '...'? space*
basic_non_commute <- required
                   / optional
                   / argument
                   / command
required          <- '(' sequence ('|' sequence)* ')'
optional          <- '[' sequence ('|' sequence)* ']'
command           <- [a-z-]+

space             <- [ \t\r]
alnum             <- [a-zA-Z0-9]

@Ericson2314

Also, not sure if this has been discussed before, but I'd find it more intuitive if [foo bar baz] == [(foo bar baz)] rather than [foo] [bar] [baz].

Among other things, this makes this nice symmetry:

  • [foo bar baz | zxcv asdf qwer] == [(foo bar baz) | (zxcv asdf qwer)] == [((foo bar baz) | (zxcv asdf qwer))]
  • ((foo bar baz) | (zxcv asdf qwer)) == ((foo bar baz) | (zxcv asdf qwer))

[foo bar baz | zxcv asdf qwer] == [[foo bar baz] | [zxcv asdf qwer]] == [[foo] [bar] [baz] | [zxcv] [asdf] [qwer]] is arguably just as symmetric with the old way, but much more confusing semantically.

Also, with this, changing the new conservative grammar to

optional  <- '[' expr+ ']'

or even

optional  <- '[' expr ']'

would make it more minimal without loss of expressive power.
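The difference between the two readings of [foo bar baz] can be spelled out by listing what each one accepts. The Python sketch below does that for a list of plain words; the function names are invented for illustration, and it assumes the words are matched in order, as commands would be.

```python
from itertools import combinations


def accepted_separately(words):
    """[foo bar baz] read as [foo] [bar] [baz]: any subset, in order."""
    return {
        combo
        for size in range(len(words) + 1)
        for combo in combinations(words, size)
    }


def accepted_grouped(words):
    """[foo bar baz] read as [(foo bar baz)]: all of them or none."""
    return {(), tuple(words)}
```

For three words the first reading accepts eight argument lists and the second accepts two, so the grouped reading is strictly more restrictive, and the old behaviour remains expressible by writing [foo] [bar] [baz] explicitly.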

@TylerTemp

@Ericson2314

This will break the following format

cat [-benstuv] [<file> ...]

@Ericson2314

@TylerTemp That part about the commutation, I assume you mean? [Fun fact, BSD commands actually do work that way.] But yeah, I see how it is restrictive. Another route is to make it so arguments commute with things in commute but not with each other (this does not apply to option arguments).

git -C foo merge -X bar == git foo -C merge bar -X != git foo -X merge bar -C, where only git and merge are commands
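One way to read this rule is as a canonical form for argv: split at commands, and within each segment keep positional arguments in order while treating options as an unordered collection. The Python sketch below is an interpretation of the example, not a specification; `canonical` and its arguments are hypothetical, and options are assumed to take no arguments, as in the example.

```python
def canonical(argv, commands):
    """Reduce argv to a form in which equivalent invocations compare equal."""
    # Each segment is [command, options, positionals]; the first segment
    # collects anything that appears before the first command.
    segments = [[None, [], []]]
    for tok in argv:
        if tok in commands:
            segments.append([tok, [], []])
        elif tok.startswith("-"):
            segments[-1][1].append(tok)
        else:
            segments[-1][2].append(tok)
    # Sorting the options erases their order; positionals keep theirs.
    return [(cmd, tuple(sorted(opts)), tuple(pos)) for cmd, opts, pos in segments]
```

Under this reading the first two invocations reduce to the same form, while the third differs because -X and -C end up attached to different commands.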

felixSchl added a commit to felixSchl/neodoc that referenced this issue Mar 14, 2016
* ...but don't support abbreviations. They are considered a misfeature
  in the original (Refer: docopt/docopt#104)
* ...but keep alias in outputs. I.e. if there's an option: "-o,
  --output", then provide the matched value for both keys:
  '{ "--output": value, "-o": value }'