claycli discovery #46

Closed · nelsonpecora opened this issue Jan 31, 2018 · 23 comments


  • investigate and document current workflows using ad-hoc / scratch-cli scripts
  • establish use cases that are currently not being met by claycli
nelsonpecora self-assigned this Jan 31, 2018
@nelsonpecora

Per @amycheng:

These are the reasons why I don't use clay-cli 100% of the time:

  • lack of site-to-site import
  • messes up page import (see github issue)
  • doesn't publish the page(s) I just imported
  • doesn't do ambrose imports


nelsonpecora commented Feb 5, 2018

After reviewing various custom scripts and the old claycli GitHub issues, and interviewing devs, this looks like the main feature set claycli needs in order to gain widespread mindshare:

  • import/export
    • automatically handle unpublished pages' url property (should be customUrl, and should be prefixed correctly; see the sketch after this list)
    • test claycli import -f against all current first-run bootstraps
    • don't import layout automatically when importing page(s)
    • allow auto-publishing (via flag) when importing page(s)
      • note: should use original publish date + url
    • multi-page import (using pages index)
      • allow batched publishing (using auto-publish flag)
      • limit/offset/query flags
      • lists/users flags
    • multi-site export
      • limit/offset/query flags
      • lists/users flags
  • programmatic api
  • create (not yet, wait for styles)
  • clone (remove)
  • --page/--component/--url for individual item, should work with users/lists/pages/components
  • better logging (using clay-log)
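
For the customUrl item above, a rough sketch of the intended handling (page names, prefixes, and data below are hypothetical): an unpublished page's url should be stored prefix-free as customUrl in the exported data, then re-prefixed for the destination site on import.

# page data as it exists on the source site
/_pages/some-post:
  url: http://old-site.com/2018/some-post
  main:
    - old-site.com/_components/article/instances/some-post

# what claycli should write out: prefix-free, with url converted to customUrl
pages:
  some-post:
    customUrl: /2018/some-post
    main:
      - /_components/article/instances/some-post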

The Way Forward

  • Sketch out a redesigned API for claycli
  • remove dead code / docs (create, clone)
  • rewrite utils based on Chris's PR (incl. logging)
  • add programmatic hooks
  • build + test updates to single-url import
  • build + test updates to single-url export
  • build + test multi-page import
  • build + test multi-page export
  • build + test CQ importer
  • build + test Wordpress importer


nelsonpecora commented Feb 5, 2018

Arguments

  • --key, -k api key or alias
  • --source-key, -K source api key, used when doing multi-page import from a site (thus, querying the source's pages index) if the site requires a different key than the --key
  • --site, -s site prefix or alias
  • --file, -f file path or alias
  • --version, -v print version and exit
  • --verbose, -V debug mode
  • --help, -h print help and exit
  • --url, -u import, export, or lint specific url (component, page, or public url)
  • --users, -U import or export users
  • --lists, -L import or export lists
  • --limit, -l set number of most recent pages to import/export
  • --offset, -o set offset of most recent pages to import/export
  • --query, -q path to JSON/YAML elasticsearch query to run against pages index (rather than fetching latest w/ limit + offset)
  • --publish, -p boolean to publish imported page(s) / component(s)
  • --dry-run, -n list the effects of config/touch/import/export without actually performing them
  • --force, -F when importing pages, overwrite layouts with imported data. when exporting, overwrite file if it exists
  • --recursive, -r recursively lint children when linting
  • --amphora-legacy, -a use non-underscored api routes (applies like --key)
  • --source-amphora-legacy, -A use non-underscored api routes (applies like --source-key)

Commands

config -k|s|f [-n] <name> [value]
touch [-n] <url>
import [-u|s|f] (or stdin) [-kKaA] [-ULpnF] [-loq] <destination site>
export -u|s [-ka] [-ULnF] [-loq] <destination path> (or stdout) 
lint [-u|f] (or stdin) [-r]
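
A few hypothetical invocations under this proposed syntax, using the flags and aliases described above (file names are made up):

import -f backup.yml -k local -p di-local (import a bootstrap file into a site alias, auto-publishing pages)
export -s di-prod -k prod -l 20 -U -L pages.yml (export the latest 20 pages, plus users and lists, to a file)
lint -f backup.yml -r (recursively lint a bootstrap file and its children)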


amycheng commented Feb 5, 2018

What are the use cases of the --site argument? I don't find myself using the site prefix a lot while importing and exporting data, because the tool, as it is right now, can derive the site from the url argument.

With scratch-cli, I found that it was a hassle to add acceptable items to its sites list. This led to an annoying number of errors where scratch told me the site I passed in wasn't accepted.


nelsonpecora commented Feb 5, 2018

that’s for, say, importing or exporting multiple pages

e.g. clay import -s di-prod -l 10 -K prod -k local di-local means “import the latest 10 pages from prod DI into my local DI (both set as aliases via clay config)”

e.g. clay export -s localhost:3001/selectall -l 0 -U path/to/user-backup.yml would be "export only the users (limit 0 pages) from my local selectall site to user-backup.yml"


amycheng commented Feb 5, 2018

Setting the site aliases in an external config is a great idea. It was a hassle to maintain that list in scratch-cli.

@nelsonpecora

yuuup. The config is one of my better ideas 🙃


cperryk commented Feb 6, 2018

This may be outside the scope of this discovery, but I think we can use this opportunity to also rejigger import and export slightly to make them a little more straightforward. Right now, they have too many options, many of which are nonsensical in certain combinations, and which demand too much rote memorization and steepen the learning curve. I recommend we simplify both in two ways:

  • Keep them strictly separated. export writes asset data to stdout and import simply expects it from stdin. import should not be concerned with pulling data out from anywhere; it should only be concerned with putting it somewhere. This is similar to how other data migration tools work — e.g. mongorestore doesn't also do a mongodump.
  • Automatically infer asset type from a specified url instead of providing an option for each. I should just be able to do clay export foo.com/pages to get many pages, clay export foo.com/pages/1 to get one page, clay export foo.com for the whole site, clay export foo.com/components/a/instances/1 for a single cmpt instance, clay export foo.com/users for all users, etc.

Examples:

  • clay export foo.com/pages/1 > bootstrap.yml Export single page to file
  • clay import bar.com < bootstrap.yml Import from file
  • clay export foo.com/users | clay import bar.com copy users from foo.com to bar.com


amycheng commented Feb 6, 2018

Maybe clay-cli could accept an opts file, like mocha.opts. It just sweeps the arguments under the rug, but at least the dev would only have to pass in one argument instead of several?

This has the added benefit of letting devs share opts files and create different opts files for different tasks.

clay import foo.com/pages -opts cut-import
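
If we went this route, an opts file could just be a newline-delimited list of arguments, mocha.opts-style. A hypothetical cut-import file using the flags proposed above:

--key local
--publish
--limit 50
--force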


nelsonpecora commented Feb 6, 2018

A dispatch is a collection of components, pages, users, and other data that can be represented in an object. It uses prefix-free uris and is formatted as YAML (the written-out YAML files are known as bootstrap files). Dispatches can be exported from Clay installs, and imported to them. 3rd party exporters should create a dispatch that's sent to stdout, so it can be piped into claycli.

wordpress-export domain.com/blog | clay import my-clay-site

The passed-through object looks like:

pages:
  index:
    customUrl: /
    main:
      - /_components/feed/instances/index
  post1:
    customUrl: /2017/first-post
    main:
      - /_components/article/instances/post1
components:
  feed:
    instances:
      index:
        query:
          match_all: {}
  article:
    instances:
      post1:
        title: First Post
        content:
          - _ref: /_components/paragraph/instances/1
          - _ref: /_components/paragraph/instances/2
  paragraph:
    instances:
      1:
        text: Lorem ipsum dolor sit amet
      2:
        text: consectetur adipisicing elit

config

config -k|s <alias> [value]

Set aliases for api keys and site prefixes
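
e.g. (alias names and values are hypothetical):

config -k prod 123abc456def (alias an api key)
config -s di-prod http://domain.com/di (alias a site prefix)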

touch

touch <url>

Run GET requests against all instances of a specified component (parsed from the url provided)
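
e.g. (component name hypothetical):

touch domain.com/_components/article (GET every instance of article on domain.com)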

import

import accepts a dispatch from stdin and sends it to a site. The dispatch may come from a file, an importer, or another script.

import domain.com < bootstrap.yml
cq-exporter | import domain-local -k local (site alias, apikey alias)
import domain-local -p < pages.yml (auto-publish pages)

export

export prints a dispatch to stdout. It may be piped to a file, an importer, or another script.

export domain.com/_components/foo/instances/bar > foo-bar.yml
export domain.com/_components/foo > foo.yml
export domain.com/_pages/foo > foo.yml
export domain.com/_users > users.yml
export domain.com/_lists/tags > tags.yml
export domain.com/_lists > lists.yml
export domain.com/_users | import domain-local (import to site alias)

Passing a site prefix means searching the pages index (underscoring of routes is determined afterwards):

export domain.com > pages.yml (100 pages, no layouts)
export domain-prod -l 5 -L > pages.yml (site alias, 5 pages, layouts)
export domain-prod -l 5 -o 10 > pages.yml (limit + offset)
export domain-prod -l 5 < path/to/query.yml > result.yml (custom query)
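
The query.yml there would just be an elasticsearch query body run against the pages index; a hypothetical example (the field name depends on the pages index mapping):

query:
  bool:
    must:
      - term:
          published: true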


cperryk commented Feb 6, 2018

@nelsonpecora Wouldn't you have to collect all the data first to write the dispatch? That'll be too memory intensive.

@nelsonpecora

Hmm, yes, that's the balancing act (between memory and number of required api calls).


cperryk commented Feb 6, 2018

@nelsonpecora I don't see how this approach reduces the number of API calls unless we add some kind of "dispatch import" endpoint to Amphora. Is that what you're suggesting?

My thought was to have each line of clay export's stdout represent something that can be PUT into something else within one request. clay export foo.com/pages/1 would stream something like:

{"/pages/1": <page base data>}
{"/components/a/instances/1": <composed root-level cmpt data>}
{"/components/b/instances/1": <composed root-level cmpt data>}

And you could pipe this directly into the import command — it would just go line by line, issuing PUTs.

(This example above assumes cascading page PUTs don't work... I tried after you told me about them but could not get them to work — show me?)

If your concern is duplicate PUTs (e.g. a layout appearing a thousand times in a stream b/c a bunch of pages use it), we can use highland's uniqBy method to prevent that.

@nelsonpecora

The problem with that format is that it's really brittle and not human readable, and cannot be used to interact with bootstrap files. The problem with bootstrap files is that they aren't composable (hence, memory hogs). The problem with smaller chunks is that they require too many API calls.

We need something better.


cperryk commented Feb 6, 2018

🤔 clay bootstrap myBootstrapDir | clay import foo.com

bootstrap reads the specified bootstrap file / dir and streams asset objects.

@nelsonpecora

@cperryk we currently allow both importing and exporting via files, and there are very strong use cases for both types of action.

option 1: we abandon piping to/from files and go back to using --file arguments for that, which would allow stdin/stdout to use that machine-readable format rather than a human-readable one

option 2: we figure out a compromise format that can be used by both machines (to stream export → import) and humans (to import/export from/to files)


cperryk commented Feb 6, 2018

Using export/import with shell redirection (> and <) would allow us to export data to files and import from files -- it just wouldn't be bootstrap format. It wouldn't be instantly readable, but it would allow us to "save" data to put somewhere later. Do we need the ability to export to bootstrap format? Have we ever needed to generate a bootstrap file from a site?

If we do need it, here's how it could work:

  • clay export foo.com/pages | clay toBootstrap bootstrap.yml
  • clay fromBootstrap someDir | clay import foo.com


cperryk commented Feb 6, 2018

We could also come up with some kind of compromise format and make export's streaming to stdout and import's reading from stdin more intelligent, e.g. for import there would be logic like "read lines until condition x is met and I know I have a complete asset, then parse, import, and continue to the next line."


nelsonpecora commented Feb 6, 2018

I think we've figured out the compromise solution:

A --yaml, -y flag for both import and export specifies that the stdin/stdout format should be yaml (normalized bootstrap-style objects) rather than the optimized-for-apis dispatch format. (I also propose we call these formats "bootstrap" and "dispatch", for clarity.) We should also heavily discourage people from mucking with the dispatch format manually, since it's not human-friendly and they absolutely will muck it up.

clay import -y domain.com < bootstrap.yml
clay export -y domain.com/_pages/1 > backup.yml
clay import domain.com < db_dump.clay (newline-delimited, so not technically json)
clay export domain.com/users > user_dump.clay (or .txt, or no extension?)
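
For reference, a db_dump.clay in this scheme would just be the dispatch format from earlier in the thread: one composed asset per line, e.g. (uris and data hypothetical):

{"/_pages/1": {"main": ["/_components/article/instances/1"]}}
{"/_components/article/instances/1": {"title": "First Post", "content": [{"_ref": "/_components/paragraph/instances/1", "text": "Lorem ipsum dolor sit amet"}]}}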

@nelsonpecora

Hmm, this is as simple as I can make the api. @cperryk do you think there's a way to simplify the export api any more? (The --limit/--offset and query stuff sorta conflict)

Arguments

  • --key, -k api key or alias
  • --site, -s site prefix or alias
  • --version, -v print version and exit
  • --verbose, -V debug mode
  • --help, -h print help and exit
  • --limit, -l set number of most recent pages to export
  • --offset, -o set offset of most recent pages to export
  • --publish, -p boolean to publish pages/components while importing
  • --layouts, -L boolean to export pages' layouts (false by default)
  • --concurrency, -c run api calls concurrently
  • --yaml, -y handle yaml files in stdin/stdout (note: queries for exporting are always specified as yaml)

Commands

config (-k|-s) <alias> [value]
import [-k <apikey>] [-c <concurrency>] [-py] <site alias>
export [-k <apikey>] [-c <concurrency>] [-Ly] [-l <limit>] [-o <offset>] <url or site alias>
lint [-y] (<url>|stdin)

Export Examples

Exporting single items

export domain.com/_components/foo/instances/bar
export domain.com/_pages/foo # page only, no layout
export -L domain.com/_pages/foo # page + layout
export domain.com/_lists/authors > authors.clay

Exporting multiple items

export -y domain.com/_users > users.yml
export -l 10 domain.com # latest 10 pages
export -l 10 -o 10 domain.com # next 10 pages
export domain.com < query.yml > dump.clay


cperryk commented Feb 13, 2018

Yes, this looks great! Some thoughts:

  • offset and limit should work with any export, e.g. export foo.com/users -l 10 should export 10 users.
  • lint foo.com/pages should lint all pages.
  • (re: off-thread conversations) --concurrency is important when doing big data migration (e.g. nymag's SoU import) and we should keep it

@nelsonpecora

per our discussions:

  • lint and export should accept a newline-delimited stream of uris (pages/components/etc), rather than a query
  • --limit and --offset will work on stdin if it exists, then fall back to querying the pages index (if a site is passed in for multi-page export)
  • concurrency will default to 10, but can be controlled via a CLAYCLI_CONCURRENCY env variable
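
e.g. a hypothetical invocation:

CLAYCLI_CONCURRENCY=2 clay import -y domain-local < bootstrap.yml (throttle to 2 concurrent api calls)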

@nelsonpecora

Note: All claycli env variables should begin with CLAYCLI_ to prevent naming collisions
