Skip to content
This repository has been archived by the owner on Jun 10, 2020. It is now read-only.

Command-line interface #783

Closed
buckley-w-david opened this issue Apr 24, 2018 · 7 comments
Closed

Command-line interface #783

buckley-w-david opened this issue Apr 24, 2018 · 7 comments

Comments

@buckley-w-david
Copy link
Contributor

As discussed in #775 we're in agreement that the CLI can/should be changed.

I suggest an interface along this line:

Usage: data [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  download
  process
  run
  update
  upload

---

Usage: data download [OPTIONS]

Options:
  --help  Show this message and exit.

Equivilant to the current data.update --just-download


---

Usage: data update [OPTIONS] [SCAN_ARGS]...

Options:
  --scan [skip|download|here]
  --gather [skip|here]
  --help                       Show this message and exit.

Just does the scan and gather operations, without any processing.

---

Usage: data process [OPTIONS]

Options:
  --date TEXT
  --help       Show this message and exit.

Just does the processing operations, assumes update data is in place (does a sanity check before doing anything)

--

Usage: data upload [OPTIONS]

Options:
  --date TEXT
  --help       Show this message and exit.

Just uploads the data to s3, assumes update data is in place (does a sanity check before doing anything)

---

Usage: data run [OPTIONS] [SCAN_ARGS]...

Options:
  --date TEXT
  --scan [skip|download|here]
  --gather [skip|here]
  --upload
  --help                       Show this message and exit.

Almost equivalent to the current interface, will do scan, gather, processing, and uploading. It is just a composition of the other commands.

---

With that configuration the update command passes along SCAN_ARGS to the scan and gather sections of update, formatted so that it is equivalent to the current options dictionary that they receive now.
As it is there it would accept anything, but I would like that to eventually be replaced by a concrete list of options that it can pass along. Maybe we don't even have this intermediate step of passing along what we are given and just go right to that concrete list. Thoughts?

Let me know what you think, if anything needs changing.

@konklone
Copy link
Contributor

I think this is a great proposal! A couple small thoughts:

  • It seems like data is a little generic in this context. Maybe pulse as a command name instead?
  • I don't see a lot of value to whitelisting the options it can accept or making them concrete, unless it's a really lightweight process. I would like to avoid adding a lot of argument parsing/cleaning code.

@buckley-w-david
Copy link
Contributor Author

pulse as a command name makes sense to me, it was just data since that's what the package is called right now.

On that subject for the same reason that package name can/should also change. Perhaps to the same name.

@buckley-w-david
Copy link
Contributor Author

As a counter to your comment on whitelisting, as the code is the whitelisting is already taking place, just in the command generation process.

. . .
# Allow some options passed to python -m data.update to go
# through to domain-scan.
# Boolean flags.
for flag in ["cache", "serial", "lambda"]:
    value = options.get(flag)
    if value:
        full_command += ["--%s" % flag]
. . .

Those sections are the existing whitelist, we would just transfer that processing into one central place, and be able to pass concrete lists of arguments instead of these options dictionaries.

Now to play devils advocate to my own argument, I can see some value in making suboptions generic and passed around "as is" in the case of arguments meant to processing once it is refactored into a more general form with "processors" that may have their own arguments, but I think that the scan and gather process is different and will remain mostly static with arguments known at time of writing.

@konklone
Copy link
Contributor

I'd love to get rid of that code sample you included too! ;) It's essentially there because those 3 flags are just pass-throughs to domain-scan, rather than Pulse-specific flags. Maybe there's some better way to effect that pass-through.

I'm not going to be strongly opinionated on this stuff, only that we avoid any significant argument parsing code. We recently worked on adding some argument parsing/whitelisting code to domain-scan itself, and the code ballooned substantially and created some instability for a time as we found regressions. I'd rather have things documented clearly and have the interface feel rational and unix-style, and am less excited about making the argument processing itself substantially more complex.

@buckley-w-david
Copy link
Contributor Author

Argument parsing is of course no fun.
Have you heard of/used the click library? It's a really easy to use CLI library that I personally really like (and would be a proponent of using for this).

The (mostly) complete initial implementation of the above proposal using it is in a PR on our fork right now https://github.com/cds-snc/pulse/pull/3/files, take a look if you're curious.

@konklone
Copy link
Contributor

click looks great, I'd be happy to give it a go.

@gbinal
Copy link
Member

gbinal commented Jun 9, 2020

Thank you for your contributions to this project over the year. The website has been decommissioned, so I'm going to go ahead and close this issue.

@gbinal gbinal closed this as completed Jun 9, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants