Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Show progress of CREATE INDEX #113

Open
apstndb opened this issue Apr 14, 2021 · 21 comments
Open

Show progress of CREATE INDEX #113

apstndb opened this issue Apr 14, 2021 · 21 comments

Comments

@apstndb
Copy link
Collaborator

apstndb commented Apr 14, 2021

Currently, spanner-cli doesn't show anything until CREATE INDEX is done.

Recently, Cloud Spanner supports tracking of the progress of index backfill operations.
https://cloud.google.com/spanner/docs/release-notes#April_06_2021

You can now track the progress of long-running index backfill operations through the gcloud command line tool, REST API, and RPC API. For more information, see Checking the progress of a secondary index backfill.

So we can show progress of index creation using polling.

Consideration

  • How to visualize progress?
    • Bar or percent?
    • With elapsed time
  • Multiple DDL statements
@yfuruyama
Copy link
Collaborator

Thanks for filing the feature request. I was thinking of the same thing 👍

Bar or percent?

I prefer percent for simplicity, but basically both are fine. Besides the percent, it would be better to show the elapsed time to make it possible to guess when the backfill will finish.

Multiple DDL statements

I think the most time consuming task for updating DDL would be index backfilling, so it might make sense to show the index backfill progress even if it contains other DDL statements.

Other consideration is whether we show index backfill progress when it's running as batch mode. If it shows the progress, I think spanner-cli would be the good alternative to gcloud spanner databases ddl update for users.

@apstndb
Copy link
Collaborator Author

apstndb commented Apr 14, 2021

Blocker:
UpdateDatabaseDdlMetadata.progress is not yet added into UpdateDatabaseDdlMetadata struct.

@apstndb
Copy link
Collaborator Author

apstndb commented Apr 15, 2021

progress has added.
googleapis/go-genproto@9228244

@apstndb
Copy link
Collaborator Author

apstndb commented Apr 15, 2021

I don't have a concrete idea yet for the actual implementation in spanner-cli but I prototyped some output formats using the real UpdateDatabaseDdlMetadata.

Note: The elapsed time may rewind because the final end_time can be older than the previous call.

Rewriting a single line

render1618470581582

Rewriting multiple lines

render1618470556966

Non-rewriting multiple lines(for non-tty output)

render1618470531746

@yfuruyama
Copy link
Collaborator

Thanks for creating some prototypes. These are great resources to think about an appropriate format.

Rewriting multiple lines

Personally I prefer this one, but if it's hard to implement this, I think it's adequate to use "Rewriting a single line" pattern.

Non-rewriting multiple lines(for non-tty output)

I think it's common to see this pattern in some build logs like CircleCI's or Cloud Build's when we try to rewrite the output line. I'm not sure, but can't we rewrite the line if we run spanner-cli as a batch mode?

@apstndb
Copy link
Collaborator Author

apstndb commented Apr 15, 2021

Rewriting multiple lines

Personally I prefer this one, but if it's hard to implement this, I think it's adequate to use "Rewriting a single line" pattern.

I think it isn't so hard but I didn't yet think the change to program structure and portable implementation of rewriting(like Windows support).

I think it's common to see this pattern in some build logs like CircleCI's or Cloud Build's when we try to rewrite the output line. I'm not sure, but can't we rewrite the line if we run spanner-cli as a batch mode?

I think ordinal executions of the batch mode in a terminal is suited for rewriting multiple-line format.
(Actually, the interactive mode doesn't support batch execution of DDL statements.)

I have experienced the case that some CI build log has broken(including unprocessed escape sequences) because the execution environment is tty but build log doesn't support escape sequences.
So I think it may be better to support a flag to force non-rewriting format.
(cmd < /dev/null may be a workaround.)

@apstndb
Copy link
Collaborator Author

apstndb commented Apr 15, 2021

Consideration about "non-rewriting multiple lines":

  • Output when:
    • periodic even if there are no changes.
    • a status for each statement has updated.
      • end_time or progress_percent
    • a statement has finished.

@yfuruyama
Copy link
Collaborator

yfuruyama commented Apr 15, 2021

I think it isn't so hard but I didn't yet think the change to program structure and portable implementation of rewriting(like Windows support).

I think it's better to use a well-tested library to rewrite the line. One of the examples is gosuri/uilive which supports windows.

I have experienced the case that some CI build log has broken(including unprocessed escape sequences) because the execution environment is tty but build log doesn't support escape sequences.
So I think it may be better to support a flag to force non-rewriting format.

I'm not completely sure, but can we say it's a rare case? If it happens on the major CI providers like CircleCI or Cloud Build, it might be better to add such a flag, but if it's not, I don't think adding more flag is a good option because there is a workaround. Please correct me if my assumption is wrong.

@apstndb
Copy link
Collaborator Author

apstndb commented Apr 15, 2021

I'm not completely sure, but can we say it's a rare case? If it happens on the major CI providers like CircleCI or Cloud Build, it might be better to add such a flag, but if it's not, I don't think adding more flag is a good option because there is a workaround. Please correct me if my assumption is wrong.

Yeah, I think it is a major case.

Test repo
https://github.com/apstndb/circleci-sandbox

CircleCI

CircleCI seems to completely ignore unsupported ANSI escape code(no handling).

version: 2
jobs:
  build:
    docker:
      - image: ubuntu
    steps:
      - run: for i in $(seq 1 10); do echo "\033[F\033[2K${i}" && sleep 1; done

image

Cloud Build

Cloud Build doesn't support to rewrite and only strip ANSI escape code.

steps:
    - name: ubuntu
      entrypoint: /bin/sh
      args:
          - -c
          - |
              for i in $(seq 1 10); do echo "\033[F\033[2K${i}" && sleep 1; done

image

Refs:

@apstndb
Copy link
Collaborator Author

apstndb commented Apr 15, 2021

I think many tools see CI environment variable to switch behavior but Cloud Build doesn't populate that variable..

@yfuruyama
Copy link
Collaborator

Thank you for quickly trying out on some CI providers!

I think many tools see CI environment variable to switch behavior but Cloud Build doesn't populate that variable..

Ah that's a good idea. Even though we have to add some logic in ad hoc way, that looks better than providing a new flag.

@apstndb
Copy link
Collaborator Author

apstndb commented Apr 15, 2021

Little correction:
The case of CircleCI seems wrong because echo of bash doesn't print the escape character without -e option.

image

It is similar result with Cloud Build's one.

@apstndb
Copy link
Collaborator Author

apstndb commented Apr 15, 2021

Related discussion in CircleCI Discuss.

I think format like "Rewriting multiple lines" will be broken in major CI and file redirection so non-rewriting mode is needed but flag is not mandatory if some workaround is documented.(cmd < /dev/null, cmd | tee, cmd | cat, etc.)

@apstndb
Copy link
Collaborator Author

apstndb commented Apr 15, 2021

Another choice:

  • At the beginning, show progress only if it runs on tty regardless interactive mode or batch mode.
    • It will be silent on non-tty as same as current behavior.

@yfuruyama
Copy link
Collaborator

At the beginning, show progress only if it runs on tty regardless interactive mode or batch mode.
It will be silent on non-tty as same as current behavior.

Yeah that would be a viable option. But I'm wondering the use case where users need to run CREATE INDEX on the CI build system and check the progress of index backfilling in real time.

To cover that use case, what about always rewriting the lines and documenting the workaround for users who don't want that behavior?

@apstndb
Copy link
Collaborator Author

apstndb commented Apr 16, 2021

To cover that use case, what about always rewriting the lines and documenting the workaround for users who don't want that behavior?

Does "always" means regardless of whether or not on tty?
I think if there is no alternative mode, users can't do any workaround because I think the workaround is to force it into non-tty(non-rewriting format) mode.

@yfuruyama
Copy link
Collaborator

If I understand correctly the results of your experiments on CircleCI and Cloud Build, both CI systems just ignores the escape character for erasing the line. So if spanner-cli attempts to rewrite the line on those environments, it would show the following long listing outputs:

 0% CREATE INDEX xxxx : elapsed_time: 10s
 5% CREATE INDEX xxxx : elapsed_time: 15s
10% CREATE INDEX xxxx : elapsed_time: 20s
...

So I wanted to mean if a user doesn't want to be bothered with long listing outputs on non-tty environment like the preceding example, then I think they can suppress the output with > /dev/null; it becomes the same behavior as the current's.

One thing I don't fully understand is how turning the stdin/stdout to non-tty mode becomes the workaround for users who bothers with the broken log output (escape character is shown as it is) or long listing output like the above example. Could you elaborate on that?

@apstndb
Copy link
Collaborator Author

apstndb commented Apr 16, 2021

If I understand correctly the results of your experiments on CircleCI and Cloud Build, both CI systems just ignores the escape character for erasing the line. So if spanner-cli attempts to rewrite the line on those environments, it would show the following long listing outputs:

I think rewriting multiple-line output format isn't suitable without rewriting because it is multiple-lines and short interval update of elapsed_time even if there are no update in operation status.

If there are 4 statements in batch, output will be shown as below.

  0% CREATE INDEX TestIndex11 ON Albums(AlbumTitle, MarketingBudget): elapsed_time: 0s
  0% DROP INDEX TestIndex11: elapsed_time: 0s
  0% CREATE INDEX TestIndex12 ON Albums(AlbumTitle, MarketingBudget): elapsed_time: 0s
  0% DROP INDEX TestIndex12: elapsed_time: 0s
  0% CREATE INDEX TestIndex11 ON Albums(AlbumTitle, MarketingBudget): elapsed_time: 5s
  0% DROP INDEX TestIndex11: elapsed_time: 0s
  0% CREATE INDEX TestIndex12 ON Albums(AlbumTitle, MarketingBudget): elapsed_time: 0s
  0% DROP INDEX TestIndex12: elapsed_time: 0s
  0% CREATE INDEX TestIndex11 ON Albums(AlbumTitle, MarketingBudget): elapsed_time: 10s
  0% DROP INDEX TestIndex11: elapsed_time: 0s
  0% CREATE INDEX TestIndex12 ON Albums(AlbumTitle, MarketingBudget): elapsed_time: 0s
  0% DROP INDEX TestIndex12: elapsed_time: 0s
 10% CREATE INDEX TestIndex11 ON Albums(AlbumTitle, MarketingBudget): elapsed_time: 15s
  0% DROP INDEX TestIndex11: elapsed_time: 0s
  0% CREATE INDEX TestIndex12 ON Albums(AlbumTitle, MarketingBudget): elapsed_time: 0s
  0% DROP INDEX TestIndex12: elapsed_time: 0s
...
100% CREATE INDEX TestIndex11 ON Albums(AlbumTitle, MarketingBudget): elapsed_time: 1h
100% DROP INDEX TestIndex11: elapsed_time: 1h
100% CREATE INDEX TestIndex12 ON Albums(AlbumTitle, MarketingBudget): elapsed_time: 1h
  0% DROP INDEX TestIndex12: elapsed_time: 59m55s
100% CREATE INDEX TestIndex11 ON Albums(AlbumTitle, MarketingBudget): elapsed_time: 1h
100% DROP INDEX TestIndex11: elapsed_time: 1h
100% CREATE INDEX TestIndex12 ON Albums(AlbumTitle, MarketingBudget): elapsed_time: 1h
100% DROP INDEX TestIndex12: elapsed_time: 1h

(progressPercent was not granular at least my case.)

There may be some improvement(like separator) but basically it emit huge sized logs(in above format, <num of statements> × <total elapsed time> / <interval> lines).
I think these huge output is not well-readable and not suitable to some CI service logs viewer.

Non-rewriting output mode can resolve problems.

  • Can print only the current executed statement at once.
  • Can print only when a status of the statement is updated.
    • or longer interval for non-rewriting mode.

Below is the case of printing current statement when status changed or 10m interval.

1/4   0% CREATE INDEX TestIndex11 ON Albums(AlbumTitle, MarketingBudget): elapsed_time: 0s
1/4  10% CREATE INDEX TestIndex11 ON Albums(AlbumTitle, MarketingBudget): elapsed_time: 15s
...
1/4 100% CREATE INDEX TestIndex11 ON Albums(AlbumTitle, MarketingBudget): elapsed_time: 1h
2/4   0% DROP INDEX TestIndex11: elapsed_time: 0s
2/4   0% DROP INDEX TestIndex11: elapsed_time: 10m
...
2/4 100% DROP INDEX TestIndex11: elapsed_time: 1h
...
4/4 100% DROP INDEX TestIndex12: elapsed_time: 1h

@yfuruyama
Copy link
Collaborator

That makes sense. Thanks for describing the disadvantage of rewriting the lines on CI systems 👍

Okay, for the simplicity, how about using he following two different modes for showing the index backfill progress?

  1. Rewrite lines in real time; this is for users who uses spanner-cli with interactive mode (assuming that it has a tty).
  2. Append lines whenever the progress is changed or some time passed; this is for users who uses spanner-cli with batch mode (it may or may not have a tty).

@apstndb
Copy link
Collaborator Author

apstndb commented Apr 16, 2021

  1. Rewrite lines in real time; this is for users who uses spanner-cli with interactive mode (assuming that it has a tty).
  2. Append lines whenever the progress is changed or some time passed; this is for users who uses spanner-cli with batch mode (it may or may not have a tty).

Interactive mode doesn't support batch execution of multiple statements so it means:

  1. (Interactive mode) Rewrite a progress for single statement in realtime(short interval).
  2. (Batch mode) Append a line of progressing statement(triggered by status change or longer interval).

@yfuruyama
Copy link
Collaborator

That looks good to me 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants