Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support command composition #7

Closed
evanunderscore opened this issue Feb 19, 2016 · 13 comments
Closed

Support command composition #7

evanunderscore opened this issue Feb 19, 2016 · 13 comments

Comments

@evanunderscore
Copy link
Collaborator

Specific feature request:

I wonder if you could support composition without adding too much complexity. A lot of my CLIs take an optional config argument which takes a path to a YAML file (defaults to ./config.yaml) and loads that YAML file and initializes a logger. I'd love to be able to re-use that code and simply compose it with other scripts. I don't know if that would be possible without sacrificing simplicity though.

Source

This is within the realm of possibility but I'm looking for a clean design. Suggestions are welcome.

@evanunderscore
Copy link
Collaborator Author

Possible design:

from common import config

def main(...):
    ...

defopt.run(main, setup=config)

When using a single command as above, arguments from config would be mixed with arguments for main (this will fail if they share a name). When using multiple commands, arguments for config go before the subcommand.

python script.py --config config.yaml subcommand args

config would called before the chosen command.

@AbeDillon
Copy link

AbeDillon commented Jun 9, 2016

Hi, I'm the guy you quoted on reddit for this enhancement suggestion. I just now got around to trying to implement something like this using defopt, but I've run into a few problems. Here's my approach:

I wanted a class that would manage layers of config, provide logging, and provide a CLI (I might need to work on Single Responsibility Principal, but we'll ignore that for now):

import script

# subclass script.Script to get all the nifty tools
class MyScript(script.Script):
    def work(self, pos_arg, *args, kw_only_arg, opt_arg=1, **kwargs):
        """doc string"""
        super().work(*args, **kwargs)
        # logging is built in
        self.log.info("starting work with: {}".format(locals())
        # script.Script is a subclass of collections.ChainMap and all config is stored in self
        host = self.get("host", "localhost")
        # config is layered and comes from several places including an optional --config-path cli argument
        ...do stuff 

if __name__ == "__main__":
    MyScript().run()

Here's how I'm going about it so far:

import collections
import logging
from logging import config as logging_config
import copy
import defopt
from . import util  # this is a place holder for various helper functions


class Script(collections.ChainMap):
    """
    Base class for configured, logged, command-line scripts.

    Config layers are prioritized by how close to run-time they are set, e.g. config parameters passed in
    through the command line take priority over config parameters specified in a file.

    Attributes:
        cls_config (Dict[str, Any]): default config specified as a class-level attribute (lowest priority)
        obj_config (Dict[str, Any]): config passed in to the __init__ method (low priority)
        pth_config (Dict[str, Any]): config loaded from a yaml file (high priority)
        cli_config (Dict[str, Any]): config key-value pairs from the command line (highest priority)
        log (logging.Logger): a logger
        default_config_path (str): fallback path to look for a yaml config file
        reserved (Set[str]): command line parameter names that are reserved
    """

    cls_config = {}
    default_config_path = "./config.yaml"
    reserved = frozenset(["-c", "--config-path", "-h", "--help"])

    def __init__(self, **obj_config):
        super().__init__()
        self.cli_config = {}
        self.maps.append(self.cli_config)
        self.pth_config = {}
        self.maps.append(self.pth_config)
        self.obj_config = obj_config
        self.maps.append(self.obj_config)
        self.maps.append(self.cls_config)
        self.log = logging.getLogger(type(self).__name__)
        self.config_log("basic logging enabled")

    def config_log(self, message=""):
        """configure or reconfigure the logger, then log the given message (if any)"""
        config = self.get("logging")
        if config:
            logging_config.dictConfig(config)
        self.log = logging.getLogger(type(self).__name__)
        if message:
            self.log.info(message)

    def run(self, argv=None):
        self.log.info("running")
        argv = sys.argv[1:] if argv is None else argv
        self._load_path_config(argv)
        # walks the MRO to build a function with a complete signature and doc-string for a method
        work = util.flatten_method(self.work)
        # automatically generates short aliases for parameters (excluding any reserved names)
        short = util.auto_short(work, reserved=self.reserved)
        # removes unknown args from argv and attempts to build a kwargs dict out of them
        cli_config = util.filter_kwargs(argv, work, short)
        self.cli_config.update(cli_config)
        self.config_log("all config loaded and logger fully configured")
        try:
            defopt.run(work, argv=argv, short=short)
        except KeyboardInterrupt:
            self.log.info("KeyboardInterrupt received. Exiting")
        except:
            self.log.exception()
            raise

    def work(self):
        """Override this method with custom script logic"""
        self.log.info("working")

    def _load_path_config(self, argv):
        self.log.info("loading path config")
        config_path = self.default_config_path
        if {"-c", "--config-path"} & set(argv):
            i = argv.index("-c") if "-c" in argv else argv.index("--config-path")
            argv.pop(i)  # remove the key
            config_path = argv.pop(i)  # remove the value
        self.log.info("attempting to load config from %s", config_path)
        try:
            with open(config_path) as f:
                pth_config = yaml.load(f)
            self.pth_config.update(pth_config)
        except FileNotFoundError:
            self.log.warinig("config file not found at %s", config_path)
        else:
            self.config_log("config file loaded")

OK, so hopefully most of that makes at least a little sense. The magic/troublesome bits are the util helper functions: flatten_method, auto_short, and filter_kwargs. flatten_method is kinda complicated and might not be a worth-while addition (I haven't quite gotten it to work yet), I might cover it in a different proposal. auto_short is pretty simple:

import inspect


def auto_short(func, reserved=frozenset()):
    """
    Build dictionary of shortened arguments for the given function

    Args:
        func (callable): the function from which the param list will be derived
        reserved (Set[str]): a set of reserved words
    Returns (Dict[str, str]): maps original param names to short versions
    """
    lowers = set("-"+c for c in string.ascii_lowercase) - reserved
    uppers = set("-"+c for c in string.ascii_uppercase) - reserved
    short = {}
    unshortened = []
    # pass one: try to map as many params to the proper case
    for name in inspect.signature(func).parameters:
        c = "-" + name[0]
        if c in lowers | uppers:
            short["--" + name.replace("_", "-")] = c
            lowers.discard(c)
            uppers.discard(c)
        else:
            unshortened.append(name)
    # pass two: try matching swapped cases for unshortened params
    for name in unshortened:
        c = "-" + name[0].swapcase()
        if c in lowers | uppers:
            short["--" + name.replace("_", "-")] = c
            lowers.discard(c)
            uppers.discard(c)
    return short

filter_kwargs is also pretty simple, but it relies on some of the internal workings of defopt, so it makes a better candidate for an enhancement to defopt rather than a stand-alone tool:

import defopt

def filter_kwargs(argv, func, short=None):
    """
    pre-filter the given argument vector such that only valid arguments remain
    return the invalid arguments as a dictionary of pairs.

    Args:
        argv (List[str]):
        func (callable):
        short (Dict[str, str]):

    Returns (Dict[str, str]):
    """
    short = short or {}
    parser = defopt.ArgumentParser(formatter_class=defopt._Formatter)
    parser.set_defaults()
    defopt._populate_parser(func, parser, None, short)
    _, unkonwns = parser.parse_known_args(argv)
    if len(unkonwns) % 2:
        raise ValueError(
            "odd number of remaining args preventing proper key-value pairing: %s", (unknowns,))
    kwargs = {k.lstrip("-").replace("-", "_"): v
              for k, v in zip(unkonwns[::2], unkonwns[1::2])}
    for arg in unkonwns:
        argv.remove(arg)
    return kwargs

In order to attempt casting kwargs to reasonable types, I also use a helper function: yamlize. It converts the Dict[str, str] to a yaml-loadable representation so that the yaml parser can guess the types:

import yaml

def yamlize(d):
    """
    Treat a dict mapping strings to strings as though it was defined in a yaml file so that the
    values can be converted to objects according to the yaml parser.

    Args:
        d (Dict[str, str]): a dict of mapping keys to serialized values

    Returns (Dic[str, Any]): the yaml-interpreted version of d

    Example:
        >>> d = {"a": "123", "b": "hello"}
        >>> yd = yamlize(d)
        >>> assert isinstance(yd['a'], int)
        >>> assert isinstance(yd['b'], str)

    """
    s = "\n".join("%s: %s" % i for i in d.items())
    return yaml.load(s)

Finally, I noticed that keyword-only arguments that don't have a default value are treated as positional arguments. I think that's a bug.

Anyway, I hope that all makes some sense, clears up my use case, and maybe gives you some ideas about a decent way to handle kwargs. This is a cool project. It's a joy to use. Thanks for your awesome tool!

@AbeDillon
Copy link

AbeDillon commented Jun 9, 2016

A similar approach to your proposed design could be:

import script

widget = script.Script()  # can be used for logging and config management, but separate from main

def main(a, *b, c, d=1, **e):
    """..."""
    widget.log.info("starting main")
    host = widget.get("host", "localhost")
    ...do stuff

if __name__ == "__main__":
    # widget has a configure method which will add params to the parser (e.g. '--config-path')
    # it'll throw a ParameterConflict exception if the params conflict with other mixins or functions
    defopt.run(main, mixins=[widget.configure])

@evanunderscore
Copy link
Collaborator Author

evanunderscore commented Jun 9, 2016

Thanks for the detailed comments! I've just skimmed them for the moment but I'll come back and have a proper look in the next few days.

I've opened #13 for keyword-only arguments without defaults. I had no idea you could even do that, so it's a pure accident that it works the way it does currently.

@evanunderscore
Copy link
Collaborator Author

The reason I initially didn't want to automatically pick short flags is that adding a parameter to any of the called functions could then unintentionally steal a short flag from another argument. I like the idea of auto_short making this sort of behavior explicit. I'll think about how best to integrate this or some equivalent.

@AbeDillon
Copy link

I've thought about this a lot more. Most of the little things I suggested like auto_short are bad ideas. They might be useful as utilities or something, but in general they don't belong in the core implementation for the reasons you've stated.

I've also realized that argparse has pretty much exactly what I want. You can compose parsers by adding parents, add sub-parsers (though, the interface for this is a little clunky), and even allow abbreviations (if **kwargs support gets implemented it should probably raise an error if your parser allows abbreviations and kwargs at the same time). I would keep the defopt.run function pretty much the way it is, but change the implementation to use a subclass of argparse.ArgumentParser under the hood and allow access to those parsers.

It could look like so:

class Parser(ArgumentParser):

    def __init__(self, *args, **kwargs):
        # I would rename 'parsers' to 'deserializers' or something to avoid confusion
        self.deserializers = kwargs.pop("parsers", []) or kwargs.pop("deserializers", [])
        # I would rename 'short' to 'aliases' and allow it to map a name to an iterable of stings
        self.aliases = kwargs.pop("short", {}) or kwargs.pop("aliases", {})
        self.func = kwargs.pop("func", None)
        kwargs.setdefault("formatter_class", _Formatter)
        super(Parser, self).__init__(*args, **kwargs)
        # handle to subparsers object if one is created
        self._subs = None
        if self.func is not None:
             self.populate(self.func)

    def populate(self, func):  # replaces _populate_parser
        _populate_parser(func, self, self.deserializers, self.aliases)

    def add_argument(self, *args, **kwargs):  # replaces _add_argument
        # it may be a good idea to allow a deserializer here to override the default deserializer
        name = kwargs.pop("name")
        positional = kwargs.pop("positional", False)
        _add_argument(self, name=name, short=self.aliases, _positional=positional)

    def add_subparser(self, func):
        if self._subparsers is None:
            self._subs = self.add_subparsers(title="subcommands")
        if isinstance(func, ArgumentParser):
            raise NotImplementedError("there should be a way to add a parser object to subparsers")
        else:
            subparser = self._subs.add_parser(
                func.__name__,
                formatter_class=self.formatter_class,
                deserializers=self.deserializers,
                aliases=self.aliases,
                func=func)
            subparser.set_defaults(_func=func)

    def parse_ak(self, argv=sys.argv[1:]):
         "return (args, kwargs) instead of a namespace?"
         arg_namespace, unknowns = self.parse_known_args(argv)
         # s

         if not all(a[0] in self.prefix_chars for a in unknowns):
             raise Exception()
         if len(unknowns) % 2:
             raise Exception()
         kwargs = dict(zip(unknowns[::2], unknowns[1::2]))
         # move all non-positional arguments from arg_namespace to kwargs, the rest go to args
         raise NotImplementedError()
         # deserialize arguments
         raise NotImplementedError()
         return args, kwargs

    def run(self, argv=sys.argv[1:]):
         args, kwargs = self.parse_ak(argv)
         # run any parent parsers
         raise NotImplementedError("don't know how exactly to do this")
         # run self.func if it exists else kwargs["_func"]
         func = self.func or kwargs.pop("_func", None)
         if func is None:
             raise Exception("no function to run")
         return func(*args, **kwargs)

That's all I've got so far. I'm not 100% how parents works with an ArgumentParser, it looks like they're added as container_actions (not sure what that means).

I can try to implement this in a feature branch and so that it's a little more fully baked.

@evanunderscore
Copy link
Collaborator Author

Do you have some examples in mind for how this would be used and what the resulting command line would look like?

@evanunderscore
Copy link
Collaborator Author

I think the main problem I have with zipping up the extra arguments into **kwargs is that it's difficult to emulate argument parsing in exactly the same way argparse does it. For example, does --arg1 --arg2 become {'arg1': '--arg2'} or does it raise an error? I can imagine that getting fairly fiddly and I'm not yet convinced much is gained from it.

@evanunderscore
Copy link
Collaborator Author

Having said that, I do like the idea of having a class to allow for more tweaking of the internals that you're after. Then it might be relatively simple to add **kwargs handling for your own project.

If you do end up working on this in a branch, keep me updated.

@AbeDillon
Copy link

Sorry it took me so long to get back to you. My thought is that you don't have to make a decision about how specifically to handle *_kwargs now or really ever. It would be nice to have a way to add a *_kwargs handler to a parser. The interface for a **kwargs handler is pretty simple: take a list of strings and return a keyword arguments dictionary. Argument parser has a function to produce that list of strings: arg_namespace, unknowns = self.parse_known_args(argv)

If there is a kwargs_handler and the function has kwargs argument, pass unknowns into the handler and unpack the result into the function call. Otherwise raise an error.

I will for sure keep you updated on any work I do on defopt.

@AbeDillon
Copy link

I'll work on a branch that implements some of the things i'm talking about so it's more clear

@evanunderscore evanunderscore changed the title Support command composition (concept) Support command composition Feb 23, 2017
@anntzer
Copy link
Owner

anntzer commented Jan 2, 2020

Adding an unknown_args_handler as suggested in #7 (comment) seems reasonable, e.g. along the lines of

--- i/lib/defopt.py
+++ w/lib/defopt.py
@@ -154,7 +154,11 @@ def run(funcs: Union[Callable, List[Callable]], *,
         show_defaults=show_defaults, show_types=show_types, version=version,
         argparse_kwargs=argparse_kwargs)
     with _colorama_text():
-        args = parser.parse_args(argv)
+        if unknown_args_handler:
+            args, rest = parser.parse_known_args(argv)
+            args = unknown_args_handler(args, rest)
+        else:
+            args = parser.parse_args(argv)
     # Workaround for http://bugs.python.org/issue9253#msg186387
     if not hasattr(args, '_func'):
         parser.error('too few arguments')

although this likely needs kwargs support (#3, or rather at least allow them to be ignored on the CLI side) to be really useful (so that the handler can e.g. add more kwargs).

I'll leave the implementation to anyone who has a real use for this, though.

@anntzer
Copy link
Owner

anntzer commented Apr 24, 2023

I'm going to consider that this is effectively provided by defopt.bind_known.

@anntzer anntzer closed this as completed Apr 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants