Another request for type annotations and conversions #61

Closed
drewfrank opened this Issue Oct 11, 2012 · 12 comments


I just learned about docopt from your video featured in Python Weekly, and I must say it looks amazing! However, I was surprised to find that there is no functionality provided for annotating or converting the types of arguments. Even though this has been raised and closed twice before --

#8
#58

-- I think it is an important feature and I want to revisit it to add my perspective and support.

The main arguments against adding this functionality seem to be:

  1. Simplicity is valuable.
  2. You can easily perform conversions elsewhere using Schema.

I will address each of these in turn:

  1. Yes, simplicity is valuable and one of the most compelling aspects of docopt. To preserve simplicity, let's consider allowing only the following type annotations: int, float, and string. Everything is a string unless otherwise annotated. This drastically reduces both implementation complexity and usage complexity as compared to a more featureful approach that would allow conversions to arbitrary, possibly user-defined Python objects.

    I think int and float would cover a large fraction of use cases. The only other types I would even consider supporting would be boolean and maybe file. Since docopt is presenting a generic command line interface, it makes sense to me that it would only support data types that are somehow "native" to the Unix ecosystem rather than arbitrary Python-specific types that have no Unix analogue. Ranges or sets of the supported primitives would also be candidates for inclusion -- I think they would be worth adding, but I agree they add complexity (both implementation-wise and in the DSL). So, start with just the really simple stuff!

  2. If an argument is required to be a particular type, I would like the command line help text to reflect that. If I've already specified the type in the usage message, why should I need to manually create a schema re-specifying portions of the interface?

    Schema looks like a very nice tool! What if docopt used the information it parsed from the usage string to automatically create the Schema and perform validations/conversions for me (behind the scenes)? That would be a big win for clarity and usability and incur only minor implementation complexity.

Anyway, I just wanted to add my two cents. Again, great job with docopt!

Owner

keleshev commented Oct 11, 2012

What API do you imagine? Like in #8, or #58, or something else?

I think the API in #8 is on the right track with something like:

Options:
  --count=N  number of operations    @int

The annotation could either be stripped from the usage string as suggested in #8, or (my personal preference) replaced with something like "Type: integer".
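To make the idea concrete, here is a minimal sketch of how a trailing "@int" or "@float" annotation could be read from the Options section and applied to the parsed arguments. This is only an illustration: docopt does not parse such annotations, and the names CONVERTERS, collect_annotations, and convert are hypothetical. A hand-written dict stands in for the result of docopt(DOC).

```python
import re

# Hypothetical annotation syntax from this thread: a trailing "@int" or
# "@float" on an option's description line. docopt itself does not parse it.
CONVERTERS = {"int": int, "float": float, "string": str}

# Matches lines such as "  --count=N  number of operations    @int".
ANNOTATED_OPTION = re.compile(r"^[ \t]*(--[\w-]+)(?:=\S+)?\s.*@(\w+)\s*$")


def collect_annotations(doc):
    """Map option names to converters for every annotated option line."""
    found = {}
    for line in doc.splitlines():
        match = ANNOTATED_OPTION.match(line)
        if match and match.group(2) in CONVERTERS:
            found[match.group(1)] = CONVERTERS[match.group(2)]
    return found


def convert(args, doc):
    """Return a copy of args with annotated, non-None values converted."""
    converters = collect_annotations(doc)
    return {
        key: converters[key](value) if key in converters and value is not None else value
        for key, value in args.items()
    }


DOC = """Usage: prog [--count=N] [--name=S]

Options:
  --count=N  number of operations    @int
  --name=S   label for the run
"""

# Stand-in for the dict docopt(DOC) would return; values arrive as strings.
parsed = {"--count": "3", "--name": "demo"}
print(convert(parsed, DOC))
```

Unannotated options stay strings, which matches the "everything is a string unless otherwise annotated" rule proposed above.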

Owner

keleshev commented Oct 11, 2012

  1. The DSL must stay programming-language agnostic. The definition of "integer" differs between languages, and so does its spelling.
  2. Stripping/replacing stuff makes DSL non-WYSIWYG, which is bad.

So if type conversion makes it into the DSL, it should be something elegant.

Or maybe you can think of a good function-level API that is radically better than using schema+docopt?

I agree that the DSL must be language agnostic. I can see two ways of dealing with this. One: annotations in the DSL refer to the abstract notion of a data type, and the specific conversion that is performed varies based on the language. For example, the "@float" (maybe "@floating"?) annotation would convert to a float in Python and a double in C. Two: each language supported by docopt supports its own set of type annotations. This could be done in a modular way, but obviously it trades off simplicity for flexibility.

To address the WYSIWYG problem: just leave the annotations in the usage string. Maybe that argues for a different spelling of the annotation ("@integer" doesn't really look like it's meant for humans), but this seems like an easy problem to solve. How about just annotating via "Type: <docopt_type>"?

Owner

keleshev commented Oct 11, 2012

My case against this is that you rarely need to verify that an argument is just an integer, or just a float, or just a file. In reality you want to verify that an integer is in range, a string belongs to a set, a file is writable.
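The three checks named here can be sketched as plain Python callables. This is a standard-library-only illustration, not part of docopt or schema; the helper names int_in_range, one_of, and writable_path are hypothetical. Each one either returns the (possibly converted) value or raises ValueError.

```python
import os


def int_in_range(low, high):
    """Build a converter that accepts integers in the closed range low..high."""
    def check(text):
        number = int(text)  # raises ValueError for non-numeric input
        if not low <= number <= high:
            raise ValueError(f"{number} is not in {low}..{high}")
        return number
    return check


def one_of(*choices):
    """Build a validator that accepts only the listed strings."""
    def check(text):
        if text not in choices:
            raise ValueError(f"{text!r} is not one of {choices}")
        return text
    return check


def writable_path(path):
    """Accept a path if the file, or its parent directory when the file
    does not exist yet, is writable by the current user."""
    target = path if os.path.exists(path) else os.path.dirname(os.path.abspath(path))
    if not os.access(target, os.W_OK):
        raise ValueError(f"{path} is not writable")
    return path


level = int_in_range(1, 9)("6")
fmt = one_of("json", "yaml")("json")
print(level, fmt)
```

Callables of this shape are also what schema's Use(...) wraps, so the same helpers could be reused there.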

However, if you come up with a particular, formulated proposal to extend API/DSL, I will review it carefully.

That's fair. You are right that specific ranges or sets of values are often required. My goal in creating this issue was just to emphasize the following points:

  1. If there are restrictions on acceptable values for an argument, those restrictions should be visible in the usage string.
  2. If we're already documenting variable types and acceptable values in the usage string, it would be wonderful to parse that information and avoid respecifying it elsewhere.
  3. I think accomplishing (2) could be made fairly simple by putting some reasonable limits on the supported data types / validation functions.

To be honest, given my current commitments I'm not likely to follow up with a more detailed proposal or implementation any time soon. Regardless, thank you for the discussion. I hope it will prove useful if you, I, or someone else decides to revisit the idea in the future.

@drewfrank drewfrank closed this Oct 11, 2012

hmm, don’t y’all think we need a tracking bug that stays open until we found a solution? after #8, #58, and #61, it’s pretty clear there is a need, and we just can’t decide on a syntax.

maybe we should collect use cases until we can determine if we can restrict ourselves to a simple DSL inside the docstring, or need to do it in the python API after docopt did its thing (e.g. with schema).

i’ll start:

Common usecase “bunch of flags”

when using schema with that, i have to test for a bunch of keys i don’t care about because i know they’re there.

docopt already does a good job in converting flags to bools, i don’t need schema for it, but schema can’t just partially validate every key existing in the schema dict.

it’s already too much useless boilerplate for the -h, --help, and --version keys. (see below)

Common usecase “range of ints”

we can use schema nicely here, but int conversions with ranges are so common that i’d rather want a specific syntax for it inside the docstring. i mean: the user needs to know about those ranges, too, and docopt is all about using the information we present to the user. this surely also applies for a few other things, e.g. colon-separated paths.
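one possible shape for such a syntax, purely as a sketch: a "[range: LOW-HIGH]" tag in the option description, modeled on docopt's existing "[default: ...]" tag. docopt does not recognize this tag; RANGE_TAG, ranges_from_doc, and apply_ranges are hypothetical names, and a hand-written dict stands in for docopt's result.

```python
import re

# Hypothetical "[range: LOW-HIGH]" tag in an option description, modeled on
# docopt's existing "[default: ...]" tag. docopt does not recognize it.
RANGE_TAG = re.compile(
    r"^[ \t]*(--[\w-]+).*\[range:\s*(-?\d+)-(-?\d+)\]", re.MULTILINE
)


def ranges_from_doc(doc):
    """Map option names to (low, high) bounds declared in the docstring."""
    return {name: (int(low), int(high)) for name, low, high in RANGE_TAG.findall(doc)}


def apply_ranges(args, doc):
    """Convert range-tagged options to int and reject out-of-range values."""
    result = dict(args)
    for name, (low, high) in ranges_from_doc(doc).items():
        if result.get(name) is None:
            continue  # option not given and no default
        number = int(result[name])
        if not low <= number <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {number}")
        result[name] = number
    return result


DOC = """Usage: zip2 [--level=N] <file>

Options:
  --level=N  compression level [range: 1-9] [default: 6]
"""

# Stand-in for the dict docopt(DOC) would return.
print(apply_ranges({"--level": "6", "<file>": "a.txt"}, DOC))
```

the range stays visible to the user in the help text, and the program reads it from the same place instead of restating it in a schema.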

Common usecase “optional file handles that can be stdin/out” aka “the dash”

this one is a convention. docopt says it supports it, but it only supports “stdin or not” instead of the more common “stdin/out or file”.

the problem with using schema is the choice between the complex, currying get_stream function below and too much code duplication

from sys import stdin, stdout

from docopt import docopt
from schema import Schema, Use

args = docopt("""d2d

converts <infile> to <outfile>

Usage:
    d2d [<infile> [<outfile>]]
    d2d -w <infile> <outfile>
    d2d -h | --help
    d2d --version

""")

def get_stream(default, mode='r'):
    return lambda f: open(f, mode) if f not in ('-', None) else default

Schema({
    '<infile>':  Use(get_stream(stdin)),
    '<outfile>': Use(get_stream(stdout, 'w')),
    '-w': bool, #don’t care
    '-h': bool, #don’t care
    '--help': bool, #don’t care
    '--version': bool, #don’t care either
}).validate(args)

@keleshev keleshev reopened this Oct 24, 2012

Owner

keleshev commented Oct 24, 2012

I'm all open to your proposals to change API or DSL.

I can see that some of it would be handled if schema supported keys to be literals and types at the same time:

Schema({'<infile>': Use(open), str: object})  # don't care about other str keys

Owner

keleshev commented Dec 10, 2012

If you have a specific API or DSL proposal I will reopen this.

@keleshev keleshev closed this Dec 10, 2012

Owner

keleshev commented Dec 10, 2012

I also added this: halst/schema#9

Owner

keleshev commented Mar 29, 2013

Since version 0.2.0 of schema you can now rewrite the following:

Schema({
    '<infile>':  Use(get_stream(stdin)),
    '<outfile>': Use(get_stream(stdout, 'w')),
    '-w': bool, #don’t care
    '-h': bool, #don’t care
    '--help': bool, #don’t care
    '--version': bool, #don’t care either
}).validate(args)

into:

Schema({
    '<infile>':  Use(get_stream(stdin)),
    '<outfile>': Use(get_stream(stdout, 'w')),
    str: bool, #don’t care
}).validate(args)

woo! progress!

thanks
