Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-22940][SQL-CLIENT] Make sql client column max width configurable #16245

Conversation

sv3ndk
Copy link
Contributor

@sv3ndk sv3ndk commented Jun 22, 2021

What is the purpose of the change

As a resolution for FLINK-22940, this PR aims at replacing the hard-coded maximum display width currently used by the SQL client with a configurable one.

In the linked ticked we discussed allowing the user to set the max width as unlimited, but on second thought I find is hard to implement since the resulting data may not be materialized => we do not know the maximum width required by the data. I suggest we do not implement this as part of this ticket.

Brief change log

  • Aligned the column width strategy across all modes
    In tableau mode, the column width is computed by PrintUtils.columnWidthsByType() and depends on the column type (e.g. 10 for a DATE).
    In Table and Changelog modes however, all columns defaulted to MAX_COLUMN_WIDTH, as provided by computeColumnWidth(int idx), which lead to waste of screen space, inconsistent end-user experience and harder code to maintain.
    In order to use the same logic for all modes, I removed the computeColumnWidth(int idx) methods and replaced them
    with an initialization of the column widths in the constructor that relies on the same PrintUtils.columnWidthsByType() method as the Tableau mode.

  • Added a new option sql-client.display.max_column_width to SqlClientOptions
    I defined the default value as 30, which is the current hard-coded MAX_COLUMN_WIDTH

  • Added an instance of ReadableConfig as a member of ResultDescriptor
    ResultDescriptor already has 2 methods providing access to configuration: isTableauMode() and isStreamingMode(), that were previously initialized explicitly by the caller.
    => in order to simplify the initialization of ResultDescriptor and allow access to all configuration keys, I placed an instance of ReadableConfig directly inside it.
    Since this class is read-only, I made it public for simplicity.

  • Updated the view result in all modes to use the new parameter
    In any call to PrintUtils.columnWidthsByType() from within the SQL client, the logic is now depending on the new config parameter, obtained via the ResultDescriptor instance.

  • Updated documentation
    I updated the example of display for each mode in the documentation + re-organized the structure a bit in order to move all configuration-related aspects in the "Configuration" section.

  • Fixes a few deprecation warnings
    Some unit tests were using org.junit.Assert.assertThat, which is now deprecated in favour of org.hamcrest.MatcherAssert.assertThat => I replaced.

Verifying this change

See the 2 videos, shoing the old behavior (including inconsistencies of column width across modes) and the new behavior (including usage of the new parameter).

old_behavior.mp4
new_behavior.mp4

I also validated the existing tests and style with:

mvn spotless:apply install -pl flink-table/flink-sql-client

And validated the documentation updates with

hugo -b "" serve

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no
  • The S3 file system connector:no

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? docs / JavaDocs

@flinkbot
Copy link
Collaborator

flinkbot commented Jun 22, 2021

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit c333291 (Thu Sep 23 17:51:49 UTC 2021)

Warnings:

  • Documentation files were touched, but no docs/content.zh/ files: Update Chinese documentation or file Jira ticket.

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into to Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@flinkbot
Copy link
Collaborator

flinkbot commented Jun 22, 2021

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run travis re-run the last Travis build
  • @flinkbot run azure re-run the last Azure build

@pensz
Copy link
Contributor

pensz commented Jun 25, 2021

I think this PR contains too many changes...

@@ -85,16 +94,6 @@ public CliChangelogResultView(CliClient client, ResultDescriptor resultDescripto
return Arrays.copyOfRange(resultRow, 1, resultRow.length);
}

@Override
protected int computeColumnWidth(int idx) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To optimize changelog table result view ,I think computeColumnWIdth with following code will be better.

protected int computeColumnWidth(int idx) {
     PrintUtils.columnWidthByType(column, ....)
}

Another benefit is that constructor of CliResultView is no need to change, subclass of CliResultView can implement computeColumnWIdth differently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @pensz .

The PR is 200 lines indeed. About half of it is documentation though, and the previous behavior of column width was implemented differently for all 3 modes, so one factor that made the PR grow is my change to make them all depend on the same logic.

The computeColumnWidth(int idx) you refer to would be called for every row, which would be marginally slower I think. In this PR the width are now initialized only once, in the constructor, which seems a natural place for performing initialization.

IMHO, adding methods like protected int computeColumnWidth(int idx) adds complexity, since it contributes to making CliResultView more different than the tableau mode, and it adds a degree of freedom to the code which might not add that much value, since using the same column width policy everywhere increase the user experience I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I second @sv3ndk 's point here. computeColumnWidth is not a good design before, because tableau mode doesn't use it. ResultDescriptor is a good place to put the initialized column widths and make all modes to have the same getting column width logic.

Copy link
Member

@wuchong wuchong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the great contribution and nice videos @sv3ndk !

The changes looks good in general. I think we need to add a test to verify it. You can add a test in flink-table/flink-sql-client/src/test/resources/sql/select.q to change max width using SET command, and execute SELECT again.

private final boolean isTableauMode;

private final boolean isStreamingMode;
public final ReadableConfig config;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add a public int getMaxColumnWidth() method just like isStreamingMode, so that we don't need to expose the whole config. And users don't need to remember the config name when using, e.g. resultDescriptor.config.get(DISPLAY_MAX_COLUMN_WIDTH).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the very kind and useful feed-back @wuchong :)

I'll create the getMaxColumnWidth() as you described and hide the config as a private field.


@Documentation.TableOption(execMode = Documentation.ExecMode.BATCH_STREAMING)
public static final ConfigOption<Integer> DISPLAY_MAX_COLUMN_WIDTH =
ConfigOptions.key("sql-client.display.max_column_width")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be better to use sql-client.display.max-column-width, Flink configuration doesn't use _ as separator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah indeed, I'm not respecting the convention here, I'll fix it.

This query performs a bounded word count example.

In *changelog mode*, the visualized changelog should be similar to:
The documentation of the SQL client commands can be accessed as follows:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it's not easy to maintain the HELP list (it's not complete too), would be better to just add a link to SQL page dev/table/sql/overview/ where shows all supported statements.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll remove the output of the HELP command and add the link you mentioned.
Maybe we still keep the mention of the HELP command though? Even incomplete I find it's useful to know it exists I think.
Would it be useful if I opened another jira ticket for updating the output HELP and make it accurate? For example I learned recently the existence of ADD JAR and SHOW JAR that seems useful to know but are not documented yet AFAIK.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already an pending PR for this #16060. Would be great to help to review the PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, great! I'll have a look at that one and review it.

name cnt
Alice 1
Greg 1
Bob 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you show this display example using tableau mode which I think the community is tend to make it as default mode in the future. The tableau mode is also more readable than table mode.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good idea.
I'll also set the execution mode to batch, to avoid showing the +I -U and +U to keep first example as simple as possible

@sv3ndk
Copy link
Contributor Author

sv3ndk commented Jun 25, 2021

@wuchong , I added the unit test you suggested in select.q.
After doing so I'm realizing the truncation logic works as expected but might be a bit confusing to end-users: if some value is wider that the configured max width, if its type has variable length (e.g. String) it gets truncated, whereas if it has a fixed length (e.g. Timestamp) the width of the type is used instead.
This logic is currently implemented in PrintUtils.columnWidthsByType, which I've not updated.
Maybe I should update it to make sure the max width is applied to all types? It would impact the SQL client as well as other places of Flink that depend on that class.
What's your opinion on this?

@wuchong
Copy link
Member

wuchong commented Jun 26, 2021

@sv3ndk , this is as expected. This config option should only affect variable-length types in streaming mode to control the displayed characters. Fixed-length types and all types in batch mode should be able to be displayed in a deterministic column width.

Copy link
Member

@wuchong wuchong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR is a good shape and can be merged now. It depends on you whether to also mention the affect scope of this new config option, e.g. varaible-length types in streaming mode.

@wuchong
Copy link
Member

wuchong commented Jun 26, 2021

Besides, a side note, please do not use "git merge" to update branches, otherwise it's hard to track the commit changes. Please use "git rebase" instead. IntelliJ IDEA provide an easy tool to do git rebase, you can find the tool via VCS -> Git -> Rebase.

@sv3ndk
Copy link
Contributor Author

sv3ndk commented Jun 26, 2021

I added one last commit clarifying the documentation of the new option as you suggested and decorated it with @Documentation.TableOption(execMode = Documentation.ExecMode.STREAMING) instead of BATCH_STREAMING.

I'm taking note rebase is preferred over merge, sure, I'll use that in the future then.

As far as I know I think as well the PR is ready to be merged, thanks again for the review and all the feed-back @wuchong :)

Copy link
Member

@wuchong wuchong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@wuchong wuchong merged commit 84e1186 into apache:master Jun 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants