Web console: Multi-stage query support #12919

vogievetsky · 2022-08-18T06:38:57Z

Do you ever use the web console? Well, be prepared to have your experience elevated!

This PR adds functionality to the web console primarily driven by the desire to support the Multi-stage distributed queries effort with an appropriate GUI. This PR needs #12918 to work and will not pass CI until #12918 is merged.

Please note: The example data is hosted on static.imply.io. Storage space and bandwidth on this server is being donated by Imply as a service to the community. I thought about using druid.apache.org instead, but that didn't seem like a good place to host larger data files, since the server is designed for website content.

Here are the high level bits.

New query view (WorkbenchView) with tabs and long running query support

The new query lets you execute multi-stage, task based, queries via the /druid/v2/sql/task and /druid/indexer/v1/task/* APIs as well as native and sql-native queries just like the old Query view. A key point of the sql-msq-task based queries is that they may run for a long time. This inspired / necessitated many UX changes including (but not limited to):

Tabs

You can now have many queries stored and running at the same time. In my opinion tabs are a UX game changer

You can, open many tabs, duplicate them, and copy them as text to be pasted into any console and reopened there.

Progress reports (counter reports)

sql-msq-task queries have detailed progress reports and these can be visualized using the summary progress bar, or the in detail execution table that provides summaries of the counters for every step.

Error and warning reports

sql-msq-task queries go out of their way to present user friendly warnings and errors should anything go wrong.
The new query view has components to visualize these with their full detail including a stack-trace.

Recent query panel

Since sql-msq-task queries are tasks is it possible to show queries that are executing currently and that have executed in the recent past.

For any query in this panel you can view the execution details for it and you can also attach it as a new tab and continue iterating on the query. It is also possible to download the "query detail archive" which is a JSON file containing all the important details for a given query that can be easily shared for troubleshooting.

Connect external data flow

Running SQL-based ingestion queries is all well and good once you have the query, but where do queries come from?

The Connect external data flow lets you use the sampler to sample your data, determine its schema, and generate a fully formed SQL query that you can edit to your hearts content before clicking Run.

This point-and-click flow will save you a bunch of typing.

Preview button

The Preview button appears when you type in an INSERT or REPLACE SQL query.

When clicked it removes the INSERT or REPLACE clause and executes your query as an "inline" query (with a limit). This lets you get a sense of how your data will be shaped after all the transformations that you coded into the SQL query have been applied.

Results table

The query results table has been improved in style and function.

It now shows you type icons for the column types and supports the ability to manipulate nested columns with ease.

Helper queries

Do you like notebooks? Do you like CTEs? Well now they have some UI affordances in the console.
Helper queries are collapsable elements that hold a query that can be referenced from the main query just like they were defined with a WITH statement.

The idea is that if you are composing a complicated query it is nice to be able to break it down into queries and be able to preview each part on its own.

Misc tools

Along with these features there are also miscellaneous tools available from the ... menu.

Along these you will find:

Explain query - show the query plan for a sql-native and sql-msq-task query.
Convert ingestion spec to SQL - Helps you migrate your native batch and hadoop based specs to the SQL-based format.
Open query detail archive - lets you open a query detail archive downloaded earlier.
Load demo queries - lets you load a set of pre-made queries to play around with sql-msq-task functionality.

New SQL-based data loader

The data loader exists as a GUI wizard to help users craft a JSON ingestion spec using point and click and quick previews.
The SQL data loader is the SQL-based ingestion analog of that.

Just like the native based data loader that serves as its inspiration, all the state is stored in the actual SQL query and at any moment the user can opt to manipulate the query directly.

What is the difference between the SQL-based data loader and the "Connect external data" flow? They start off exactly the same, they figure out the source and format to use and generate the external. After the external has been defined the "Connect external data" flow generates a seed query that you can then edit as you wish. The key point is that (unless you just want to ingest with the seed query) you must edit a rather complex SQL statement and preview it using the "Preview" functionality that is not sub-second (it reached out to the data for every preview). On the other hand the data loader let's you manipulate the (SQL) ingestion spec completely with point-and-click it also samples the data and lets you preview it using sql-native making the previews sub-second. The SQL data loader limits you to what kind of query you can write, for example you can not currently use it to express a query involving a JOIN, but at any point in time it lets you take your query to the regular query view so there is no need to worry about being restricted.

Since the SQL-based data loader is batch only and the existing data loader is for streaming (supervisors) and batch data (tasks). The existing data loader has been split up into its streaming and batch components to make the comparison clearer.

Misc other changes

Since the MSQ part is so transformative it drove other parts of the console development as well.

Header rearrangment

Now that the query view has tabs and can do ingestion and data manipulation it is about time it took the helm as the most important view in the header.

There is now also a more ... button that can host all the views not important enough to be top level.

Data preview for both the Datasource and Segment detail modals

You can now click on a datasource or segment to see a preview of the data within.

Task cancel status

The task table now explicitly shows if a task has been canceled in a different color than a failed task.

Other improvements

Expose the tuning config property maxColumnsToMerge for batch ingestion and compaction in the form UI
Fixed docs link URL
Better filtering on all data tables (Datasources, Ingestion, Services, etc)
Made filtering on modules work better in the Status dialog
Added the file filter property to all inputSources where it applies
Mixed some confusing labels in the UI
Future proof the useQueryManager hook to work with React18

lgtm-com · 2022-08-18T07:05:33Z

This pull request introduces 10 alerts when merging 748f6e5 into 7fb1153 - view on LGTM.com

new alerts:

7 for Polynomial regular expression used on uncontrolled data
2 for Property access on null or undefined
1 for Useless conditional

lgtm-com · 2022-08-18T07:16:23Z

This pull request introduces 9 alerts when merging 1b5cac0 into 7fb1153 - view on LGTM.com

new alerts:

7 for Polynomial regular expression used on uncontrolled data
2 for Property access on null or undefined

lgtm-com · 2022-08-18T16:50:08Z

This pull request introduces 7 alerts and fixes 2 when merging f2cb02a into 7fb1153 - view on LGTM.com

new alerts:

7 for Polynomial regular expression used on uncontrolled data

fixed alerts:

2 for Property access on null or undefined

lgtm-com · 2022-08-19T20:03:27Z

This pull request introduces 1 alert and fixes 2 when merging fa623bc into 69fe1f0 - view on LGTM.com

new alerts:

1 for Polynomial regular expression used on uncontrolled data

fixed alerts:

2 for Property access on null or undefined

lgtm-com · 2022-08-19T22:04:16Z

This pull request introduces 1 alert and fixes 2 when merging bb112ca into 69fe1f0 - view on LGTM.com

new alerts:

1 for Polynomial regular expression used on uncontrolled data

fixed alerts:

2 for Property access on null or undefined

lgtm-com · 2022-08-20T02:36:14Z

This pull request fixes 2 alerts when merging 6355491 into f8097cc - view on LGTM.com

fixed alerts:

2 for Property access on null or undefined

lgtm-com · 2022-08-20T19:27:57Z

This pull request fixes 2 alerts when merging 23cf43e into f8097cc - view on LGTM.com

fixed alerts:

2 for Property access on null or undefined

vogievetsky · 2022-08-20T20:13:22Z

I have made a preview build of Druid 24 that includes #12918 and #12919 if you want to play with it and help test. You can get it by running:

wget https://static.imply.io/msq-preview/apache-druid-24-msq-preview1-bin.tar.gz
tar xzf apache-druid-24-msq-preview1-bin.tar.gz
cd apache-druid-0.24.0-SNAPSHOT
./bin/start-micro-quickstart

vogievetsky · 2022-08-22T03:39:12Z

If you want to see this PR in action I made a video based on the above build: https://www.youtube.com/watch?v=xUzYuPfzARk (make sure to watch in the highest available quality)

lgtm-com · 2022-08-22T05:37:13Z

This pull request fixes 2 alerts when merging 6ad1437 into f8097cc - view on LGTM.com

fixed alerts:

2 for Property access on null or undefined

* Use Ace.Completion directly * Another Ace.Completion

lgtm-com · 2022-08-22T20:26:58Z

This pull request fixes 2 alerts when merging 7870642 into 6fec1d4 - view on LGTM.com

fixed alerts:

2 for Property access on null or undefined

lgtm-com · 2022-08-22T23:10:52Z

This pull request fixes 2 alerts when merging 60a6bba into 6fec1d4 - view on LGTM.com

fixed alerts:

2 for Property access on null or undefined

lgtm-com · 2022-08-23T16:34:19Z

This pull request fixes 2 alerts when merging 1b360ce into f0fc450 - view on LGTM.com

fixed alerts:

2 for Property access on null or undefined

gianm

I have been using this while @vogievetsky was developing it and it is very cool. I'm not a frontend person, so I didn't dig through the files closely, but I did give a once-over to the general structure and to some of the non-JS files (like new scripts, license updates). They look OK to me.

@vogievetsky, there is one LGTM issue: "2 for Property access on null or undefined". Does it matter? If it doesn't matter is there a way to suppress it?

You also mention that some example files are hosted on static.imply.io. The decision makes sense to me, since I've heard we shouldn't use project sites (like druid.apache.org) for larger data files, and some of these are tens of MBs. It would be better, however, for these files to have a long term home on a server that the project team has direct access to. Could you inquire with ASF Infra to see if there's an appropriate place? I don't think this needs to block the PR, but it'd be good to figure out for the long term.

vogievetsky · 2022-08-23T17:56:35Z

Haha, thank you for the light review. If you look at the LGTM message closely you will see that I actually fixed those issues so after this PR there will be 2 fewer issues than before. I believe those issues spontaneously emerged due to new checks being added in LGTM. My new code originally added 2 of those issues and then I went in and fixed 4 of them 👍 .

I will check in with ASF infra.

gianm · 2022-08-23T21:43:28Z

Haha, thank you for the light review. If you look at the LGTM message closely you will see that I actually fixed those issues so after this PR there will be 2 fewer issues than before. I believe those issues spontaneously emerged due to new checks being added in LGTM. My new code originally added 2 of those issues and then I went in and fixed 4 of them 👍 .

Ha, got it. Maybe that's the same thing I ran into here: #12918 (comment)

I will check in with ASF infra.

Thank you.

lgtm-com · 2022-08-24T02:42:41Z

This pull request fixes 2 alerts when merging 2c8b8e6 into d7d15ba - view on LGTM.com

fixed alerts:

2 for Property access on null or undefined

lgtm-com · 2022-08-24T03:32:03Z

This pull request fixes 2 alerts when merging 1e3d276 into 31eda58 - view on LGTM.com

fixed alerts:

2 for Property access on null or undefined

lgtm-com · 2022-08-24T06:41:58Z

This pull request fixes 2 alerts when merging b21f180 into 3b58a01 - view on LGTM.com

fixed alerts:

2 for Property access on null or undefined

lgtm-com · 2022-08-24T15:57:32Z

This pull request fixes 2 alerts when merging f504f0c into 35aaaa9 - view on LGTM.com

fixed alerts:

2 for Property access on null or undefined

clintropolis · 2022-08-24T20:43:40Z

...gs/overlord-dynamic-config-dialog/__snapshots__/overload-dynamic-config-dialog.spec.tsx.snap

@@ -11,7 +11,7 @@ exports[`OverlordDynamicConfigDialog matches snapshot 1`] = `
    Edit the overlord dynamic configuration on the fly. For more information please refer to the

    <Memo(ExternalLink)
-      href="https://druid.apache.org/docs/0.23.0/configuration/index.html#overlord-dynamic-configuration"
+      href="https://druid.apache.org/docs/latest/configuration/index.html#overlord-dynamic-configuration"


what made this part change? Shouldn't the console link to the docs of the version of the build? Or is this some artifact of tests and in a release build it would point to the docs version of the release?

I changed the doc version to "latest". I think that is a state that makes sense or that setting on the master branch. I was imagining that we would cut the release branch and then change the version there. That way there is less "thrashing" on master.

Yeah, that makes sense, where is it set at? I looked around but you've got a lot of changes here and couldn't find it 😜

It is set on this line: https://github.com/apache/druid/blob/master/web-console/src/links.ts#L22

clintropolis · 2022-08-24T20:56:02Z

Ah, I think there is a hole in the check test script that caused this PR to have to run all the tests. This PR changed a file in licenses and https://github.com/apache/druid/blob/master/check_test_suite.py#L114 isn't ignoring changes to that directory so it counts as a "java" and so has to run the tests.

Since the license check job itself is always ran, i think we could just add it to the ignored list.

MSQ web console

4a82ab9

vogievetsky added Area - Web Console Area - Query UI Area - Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels Aug 18, 2022

fix typo in comments

748f6e5

remove useless conditional

1b5cac0

wrap SQL_DATA_TYPES

f2cb02a

fixes sus regex

fa623bc

rewrite regex

bb112ca

remove problematic regex

6355491

fix UTs

23cf43e

vogievetsky mentioned this pull request Aug 20, 2022

Add druid-multi-stage-query extension. #12918

Merged

convert PARTITIONED / CLUSTERED BY to ORDER BY for preview

6ad1437

vogievetsky and others added 3 commits August 22, 2022 12:42

fix log

57513da

updated to use shuffle

9b03b45

Web console: Use Ace.Completion directly (apache#1405)

7870642

* Use Ace.Completion directly * Another Ace.Completion

better comment

60a6bba

Merge remote-tracking branch 'upstream/master' into msq-web-console

1b360ce

gianm reviewed Aug 23, 2022

View reviewed changes

gianm approved these changes Aug 23, 2022

View reviewed changes

Merge remote-tracking branch 'upstream/master' into msq-web-console

2c8b8e6

fix column ordering in e2e test

1e3d276

add nested data example also

b21f180

Merge remote-tracking branch 'upstream/master' into msq-web-console

f504f0c

clintropolis approved these changes Aug 24, 2022

View reviewed changes

clintropolis mentioned this pull request Aug 24, 2022

ignore licenses changes in check test script #12964

Merged

vogievetsky merged commit 04ee7ab into apache:master Aug 24, 2022

vogievetsky deleted the msq-web-console branch August 24, 2022 23:17

clintropolis mentioned this pull request Aug 24, 2022

Update asf-release-process-guide.md #12966

Merged

vogievetsky mentioned this pull request Aug 26, 2022

Text box search in the Status pop-up window is broken in 0.23.0. #12921

Closed

abhishekagarwal87 added this to the 24.0.0 milestone Aug 26, 2022

techdocsmith mentioned this pull request Aug 26, 2022

[Draft] 24.0 Release notes #12825

Closed

asdf2014 mentioned this pull request Aug 30, 2022

Upgrade DRUID_DOCS_VERSION from 0.20.0 to latest version #12832

Closed

9 tasks

abhishekagarwal87 mentioned this pull request Sep 8, 2022

Test issue [Please ignore] #13055

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Web console: Multi-stage query support #12919

Web console: Multi-stage query support #12919

vogievetsky commented Aug 18, 2022 •

edited

lgtm-com bot commented Aug 18, 2022

lgtm-com bot commented Aug 18, 2022

lgtm-com bot commented Aug 18, 2022

lgtm-com bot commented Aug 19, 2022

lgtm-com bot commented Aug 19, 2022

lgtm-com bot commented Aug 20, 2022

lgtm-com bot commented Aug 20, 2022

vogievetsky commented Aug 20, 2022

vogievetsky commented Aug 22, 2022

lgtm-com bot commented Aug 22, 2022

lgtm-com bot commented Aug 22, 2022

lgtm-com bot commented Aug 22, 2022

lgtm-com bot commented Aug 23, 2022

gianm left a comment

vogievetsky commented Aug 23, 2022

gianm commented Aug 23, 2022

lgtm-com bot commented Aug 24, 2022

lgtm-com bot commented Aug 24, 2022

lgtm-com bot commented Aug 24, 2022

lgtm-com bot commented Aug 24, 2022

clintropolis Aug 24, 2022

vogievetsky Aug 24, 2022

clintropolis Aug 24, 2022

vogievetsky Aug 24, 2022

clintropolis commented Aug 24, 2022

Web console: Multi-stage query support #12919

Web console: Multi-stage query support #12919

Conversation

vogievetsky commented Aug 18, 2022 • edited

New query view (WorkbenchView) with tabs and long running query support

Tabs

Progress reports (counter reports)

Error and warning reports

Recent query panel

Connect external data flow

Preview button

Results table

Helper queries

Misc tools

New SQL-based data loader

Misc other changes

Header rearrangment

Data preview for both the Datasource and Segment detail modals

Task cancel status

Other improvements

lgtm-com bot commented Aug 18, 2022

lgtm-com bot commented Aug 18, 2022

lgtm-com bot commented Aug 18, 2022

lgtm-com bot commented Aug 19, 2022

lgtm-com bot commented Aug 19, 2022

lgtm-com bot commented Aug 20, 2022

lgtm-com bot commented Aug 20, 2022

vogievetsky commented Aug 20, 2022

vogievetsky commented Aug 22, 2022

lgtm-com bot commented Aug 22, 2022

lgtm-com bot commented Aug 22, 2022

lgtm-com bot commented Aug 22, 2022

lgtm-com bot commented Aug 23, 2022

gianm left a comment

Choose a reason for hiding this comment

vogievetsky commented Aug 23, 2022

gianm commented Aug 23, 2022

lgtm-com bot commented Aug 24, 2022

lgtm-com bot commented Aug 24, 2022

lgtm-com bot commented Aug 24, 2022

lgtm-com bot commented Aug 24, 2022

clintropolis Aug 24, 2022

Choose a reason for hiding this comment

vogievetsky Aug 24, 2022

Choose a reason for hiding this comment

clintropolis Aug 24, 2022

Choose a reason for hiding this comment

vogievetsky Aug 24, 2022

Choose a reason for hiding this comment

clintropolis commented Aug 24, 2022

vogievetsky commented Aug 18, 2022 •

edited