Introduce SystemSchema tables (#5989) #6094
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -430,6 +430,10 @@ plan SQL queries. This metadata is cached on broker startup and also updated per | |
[SegmentMetadata queries](segmentmetadataquery.html). Background metadata refreshing is triggered by | ||
segments entering and exiting the cluster, and can also be throttled through configuration. | ||
|
||
Druid exposes system information through special system tables. There are two such schemas available : Information Schema and System Schema | ||
|
||
## INFORMATION SCHEMA | ||
|
||
You can access table and column metadata through JDBC using `connection.getMetaData()`, or through the | ||
INFORMATION_SCHEMA tables described below. For example, to retrieve metadata for the Druid | ||
datasource "foo", use the query: | ||
|
@@ -481,6 +485,77 @@ SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = 'druid' AND TABLE_ | |
|COLLATION_NAME|| | ||
|JDBC_TYPE|Type code from java.sql.Types (Druid extension)| | ||
|
||
## SYSTEM SCHEMA | ||
|
||
SYSTEM_TABLES provide visibility into the druid segments, servers and tasks. | ||
Please correct capitalization and naming:
done |
||
For example to retrieve all segments for datasource "wikipedia", use the query: | ||
```sql | ||
select * from SYS.SEGMENTS where DATASOURCE='wikipedia'; | ||
Lowercase seems more Druid-y, so I think I'd prefer
ok, will make everything lowercase in docs and code. For the column names, do they need to be camelCase like
Hmm, good question. I think in SQL underscores are more normal, although If anyone else has an opinion please go for it. |
||
``` | ||
|
||
### SEGMENTS table | ||
Segments tables provides details on all the segments, both published and served(but not published). | ||
To me this reads a bit unclear, I'd suggest trying something like:
changed |
||
|
||
|
||
|Column|Notes| | ||
|------|-----| | ||
|SEGMENT_ID|| | ||
Please include a description for all of these columns, and capitalize the first letter of each description.
Please include "size", "version", and "partition_num" too -- they are all useful. I'd also include "replicas" which should be the number of replicas currently being served.
added |
||
|DATASOURCE|| | ||
|START|| | ||
|END|| | ||
|IS_PUBLISHED|segment in metadata store| | ||
It'd be clearer to expand this a bit: "True if this segment has been published to the metadata store." Similar comment for the other ones.
added more description |
||
|IS_AVAILABLE|segment is being served| | ||
|IS_REALTIME|segment served on a realtime server| | ||
|PAYLOAD|jsonified datasegment payload| | ||
|
||
### SERVERS table | ||
There should be a blurb here explaining what this table is all about. Currently, it's listing all data servers (anything that might host a segment) and that includes both historicals and ingestion tasks.
added blurb |
||
|
||
|
||
|Column|Notes| | ||
|------|-----| | ||
|SERVER|| | ||
Please include a description for all of these columns, including:
added description |
||
|SERVER_TYPE|| | ||
|TIER|| | ||
|CURRENT_SIZE|| | ||
|MAX_SIZE|| | ||
|
||
To retrieve all servers information, use the query | ||
Better grammar: "To retrieve information about all servers, use the query:"
changed |
||
```sql | ||
select * from SYS.SERVERS; | ||
``` | ||
|
||
### SEGMENTSERVERS table | ||
|
||
SEGMENTSERVERS is used to join SEGMENTS with SERVERS table | ||
changed to |
||
|
||
|Column|Notes| | ||
|------|-----| | ||
|SERVER|| | ||
Please include in the notes which column these correspond to in the other tables.
done |
||
|SEGMENT_ID|| | ||
|
||
### TASKS table | ||
|
||
TASKS table provides tasks info from overlord. | ||
How about:
And link "indexing tasks" to a useful page about that.
done |
||
|
||
|Column|Notes| | ||
|------|-----| | ||
|TASK_ID|| | ||
These should all have comments too.
yes added |
||
|TYPE|| | ||
|DATASOURCE|| | ||
|CREATED_TIME|| | ||
|QUEUE_INSERTION_TIME|| | ||
|STATUS|| | ||
|RUNNER_STATUS|| | ||
|DURATION|| | ||
|LOCATION|| | ||
|ERROR_MSG|| | ||
|
||
For example, to retrieve tasks information filtered by status, use the query | ||
```sql | ||
select * from SYS.TASKS where STATUS='FAILED'; | ||
``` | ||
|
||
|
||
## Server configuration | ||
|
||
The Druid SQL server is configured through the following properties on the broker. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -19,9 +19,13 @@ | |
|
||
package io.druid.sql.calcite.planner; | ||
|
||
import com.fasterxml.jackson.databind.ObjectMapper; | ||
import com.google.common.base.Preconditions; | ||
import com.google.common.io.BaseEncoding; | ||
import com.google.common.primitives.Chars; | ||
import io.druid.client.BrokerServerView; | ||
import io.druid.client.TimelineServerView; | ||
import io.druid.discovery.DruidLeaderClient; | ||
import io.druid.java.util.common.DateTimes; | ||
import io.druid.java.util.common.IAE; | ||
import io.druid.java.util.common.ISE; | ||
|
@@ -32,6 +36,7 @@ | |
import io.druid.server.security.AuthorizerMapper; | ||
import io.druid.sql.calcite.schema.DruidSchema; | ||
import io.druid.sql.calcite.schema.InformationSchema; | ||
import io.druid.sql.calcite.schema.SystemSchema; | ||
import org.apache.calcite.jdbc.CalciteSchema; | ||
import org.apache.calcite.rel.type.RelDataType; | ||
import org.apache.calcite.rel.type.RelDataTypeFactory; | ||
|
@@ -98,11 +103,30 @@ public static Charset defaultCharset() | |
return DEFAULT_CHARSET; | ||
} | ||
|
||
public static SchemaPlus createRootSchema(final Schema druidSchema, final AuthorizerMapper authorizerMapper) | ||
public static SchemaPlus createRootSchema( | ||
final TimelineServerView serverView, | ||
final Schema druidSchema, | ||
final AuthorizerMapper authorizerMapper, | ||
final DruidLeaderClient coordinatorDruidLeaderClient, | ||
final DruidLeaderClient overlordDruidLeaderClient, | ||
final ObjectMapper jsonMapper | ||
) | ||
{ | ||
final SchemaPlus rootSchema = CalciteSchema.createRootSchema(false, false).plus(); | ||
rootSchema.add(DruidSchema.NAME, druidSchema); | ||
rootSchema.add(InformationSchema.NAME, new InformationSchema(rootSchema, authorizerMapper)); | ||
if (serverView instanceof BrokerServerView) { | ||
Is this really necessary? It looks like you added
it doesn't seem necessary, removed the cast. |
||
rootSchema.add( | ||
SystemSchema.NAME, | ||
new SystemSchema( | ||
(BrokerServerView) serverView, | ||
authorizerMapper, | ||
coordinatorDruidLeaderClient, | ||
overlordDruidLeaderClient, | ||
jsonMapper | ||
) | ||
); | ||
} | ||
return rootSchema; | ||
} | ||
|
||
|
Spacing around the colon is weird: it should have a space after, but not before. Please also add some information about what each table is useful for (INFORMATION_SCHEMA provides details about tables/column types, and SYS provides information about Druid internals like segments/tasks/servers).
fixed
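For context on the SEGMENTSERVERS table in the docs diff above, which is described only as a way to join SEGMENTS with SERVERS, a query along the following lines may illustrate the intended use. This is a sketch, not part of the PR: it assumes the uppercase table and column names shown in this revision (the review thread agreed to switch to lowercase, and the join table was renamed, so the final names may differ), and the aliases `seg`, `ss`, and `srv` are illustrative only.

```sql
-- Sketch only: count how many "wikipedia" segments each data server is serving.
-- Table and column names are taken from this revision of the docs diff and may
-- not match the merged version.
SELECT srv.SERVER, COUNT(*) AS NUM_SEGMENTS
FROM SYS.SEGMENTS seg
INNER JOIN SYS.SEGMENTSERVERS ss ON seg.SEGMENT_ID = ss.SEGMENT_ID
INNER JOIN SYS.SERVERS srv ON ss.SERVER = srv.SERVER
WHERE seg.DATASOURCE = 'wikipedia'
GROUP BY srv.SERVER;
```

Because SEGMENTSERVERS holds one row per (server, segment) pair, a segment served by several replicas would be counted once for each server that hosts it.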