Introduce SystemSchema tables (#5989) #6094
@@ -477,6 +477,11 @@ plan SQL queries. This metadata is cached on broker startup and also updated per
[SegmentMetadata queries](segmentmetadataquery.html). Background metadata refreshing is triggered by
segments entering and exiting the cluster, and can also be throttled through configuration.

Druid exposes system information through special system tables, organized into two schemas: the Information Schema (INFORMATION_SCHEMA) and the Sys Schema (sys).
The Information Schema provides details about table and column types. The "sys" schema provides information about Druid internals such as segments, tasks, and servers.

## INFORMATION SCHEMA

You can access table and column metadata through JDBC using `connection.getMetaData()`, or through the
INFORMATION_SCHEMA tables described below. For example, to retrieve metadata for the Druid
datasource "foo", use the query:
@@ -528,6 +533,101 @@ SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = 'druid' AND TABLE_
|COLLATION_NAME||
|JDBC_TYPE|Type code from java.sql.Types (Druid extension)|
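
A narrower query can return just each column's name and type instead of every field. This is a sketch that reuses the datasource "foo" from the example above; it assumes INFORMATION_SCHEMA.COLUMNS exposes the standard TABLE_NAME, COLUMN_NAME, and DATA_TYPE fields alongside the TABLE_SCHEMA and JDBC_TYPE fields shown in this document.

```sql
-- Columns of datasource "foo" with their SQL type and java.sql.Types code.
-- TABLE_NAME, COLUMN_NAME and DATA_TYPE are assumed to follow the standard INFORMATION_SCHEMA layout.
SELECT COLUMN_NAME, DATA_TYPE, JDBC_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'druid' AND TABLE_NAME = 'foo';
```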
|
||
## SYSTEM SCHEMA | ||
|
||
The "sys" schema provides visibility into Druid segments, servers and tasks. | ||
For example to retrieve all segments for datasource "wikipedia", use the query: | ||
```sql | ||
SELECT * FROM sys.segments WHERE datasource = 'wikipedia' | ||
``` | ||
|
||
### SEGMENTS table | ||
Segments table provides details on all Druid segments, whether they are published yet or not. | ||
|
||
|
||
|Column|Notes| | ||
|------|-----| | ||
|segment_id|Unique segment identifier| | ||
|datasource|Name of datasource| | ||
|start|Interval start time (in ISO 8601 format)| | ||
|end|Interval end time (in ISO 8601 format)| | ||
|size|Size of segment in bytes| | ||
|version|Version string (generally an ISO8601 timestamp corresponding to when the segment set was first started). Higher version means the more recently created segment. Version comparing is based on string comparison.| | ||
|partition_num|Partition number (an integer, unique within a datasource+interval+version; may not necessarily be contiguous)| | ||
|num_replicas|Number of replicas of this segment currently being served| | ||
|num_rows|Number of rows in current segment, this value could be null if unkown to broker at query time| | ||
|is_published|Boolean is represented as long type where 1 = true, 0 = false. 1 represents this segment has been published to the metadata store| | ||
|is_available|Boolean is represented as long type where 1 = true, 0 = false. 1 if this segment is currently being served by any server(historical or realtime)| | ||
|is_realtime|Boolean is represented as long type where 1 = true, 0 = false. 1 if this segment is being served on any type of realtime tasks| | ||
|payload|JSON-serialized data segment payload| | ||
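
Because the boolean flags are stored as longs, they can be compared and filtered directly. The following sketch counts, for each datasource, the segments that are published to the metadata store but not currently served by any server, together with their total size in bytes. It assumes the system tables accept standard aggregates such as COUNT and SUM, and ordering by a select-list alias.

```sql
-- Published segments that no server is currently serving, per datasource.
-- is_published and is_available are longs: 1 = true, 0 = false.
-- "size" is double-quoted only as a precaution in case it clashes with a SQL keyword.
SELECT datasource,
       COUNT(*) AS unavailable_segments,
       SUM("size") AS unavailable_bytes
FROM sys.segments
WHERE is_published = 1 AND is_available = 0
GROUP BY datasource
ORDER BY unavailable_segments DESC;
```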

### SERVERS table
The servers table lists all data servers (any server that hosts a segment). It includes both historicals and peons.

|Column|Notes|
|------|-----|
|server|Server name in the form host:port|
|host|Hostname of the server|
|plaintext_port|Unsecured port of the server, or -1 if plaintext traffic is disabled|
|tls_port|TLS port of the server, or -1 if TLS is disabled|
|server_type|Type of Druid service. Possible values include: historical, realtime, and indexer_executor (peon).|
|tier|Distribution tier; see [druid.server.tier](../configuration/index.html#Historical-General-Configuration)|
|current_size|Total size in bytes of the segments currently on this server|
|max_size|Maximum size in bytes that this server recommends assigning to segments; see [druid.server.maxSize](../configuration/index.html#Historical-General-Configuration)|

To retrieve information about all servers, use the query:
```sql
SELECT * FROM sys.servers;
```
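
Since current_size and max_size are both reported in bytes, their ratio gives a rough view of how full each historical is. This sketch assumes CAST and arithmetic are available on system tables; the cast to DOUBLE avoids integer division, and servers reporting a max_size of 0 are filtered out.

```sql
-- Approximate fill ratio of each historical, fullest first.
SELECT server,
       tier,
       current_size,
       max_size,
       CAST(current_size AS DOUBLE) / max_size AS fill_ratio
FROM sys.servers
WHERE server_type = 'historical' AND max_size > 0
ORDER BY fill_ratio DESC;
```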

### SERVER_SEGMENTS table

The server_segments table is used to join the servers table with the segments table; each row pairs a server with a segment it serves.

|Column|Notes|
|------|-----|
|server|Server name in the format host:port (primary key of the [servers table](#SERVERS-table))|
|segment_id|Segment identifier (primary key of the [segments table](#SEGMENTS-table))|

A JOIN between "servers" and "segments" can be used to query the number of segments for a specific datasource,
grouped by server. For example:
```sql
SELECT servers.server, COUNT(segments.segment_id) AS num_segments
FROM sys.segments AS segments
INNER JOIN sys.server_segments AS server_segments
  ON segments.segment_id = server_segments.segment_id
INNER JOIN sys.servers AS servers
  ON servers.server = server_segments.server
WHERE segments.datasource = 'wikipedia'
GROUP BY servers.server;
```
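
The same join can aggregate other segment columns. For instance, grouping by tier and summing the segment size shows how much of a datasource each tier stores, counting every replica. This is a sketch built only from the columns documented above, and it assumes SUM is supported on system tables just as COUNT is.

```sql
-- Segment replicas and bytes of "wikipedia" held on each tier.
-- A segment served by several servers is counted once per server.
-- "size" is double-quoted only as a precaution in case it clashes with a SQL keyword.
SELECT servers.tier,
       COUNT(*) AS served_replicas,
       SUM(segments."size") AS total_bytes
FROM sys.segments AS segments
INNER JOIN sys.server_segments AS server_segments
  ON segments.segment_id = server_segments.segment_id
INNER JOIN sys.servers AS servers
  ON servers.server = server_segments.server
WHERE segments.datasource = 'wikipedia'
GROUP BY servers.tier;
```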

### TASKS table

The tasks table provides information about active and recently completed indexing tasks. For more information,
check out [ingestion tasks](../ingestion/tasks.html).

|Column|Notes|
|------|-----|
|task_id|Unique task identifier|
|type|Task type; for example, this value is "index" for indexing tasks. See [tasks-overview](../ingestion/tasks.md)|
|datasource|Name of the datasource being indexed|
|created_time|Timestamp in ISO 8601 format corresponding to when the ingestion task was created. Note that this value is populated for completed and waiting tasks. For running and pending tasks, this value is set to 1970-01-01T00:00:00Z|
|queue_insertion_time|Timestamp in ISO 8601 format corresponding to when this task was added to the queue on the overlord|
|status|Status of the task: RUNNING, FAILED, or SUCCESS|
|runner_status|Runner status of the task. For completed tasks this is NONE; for in-progress tasks it can be RUNNING, WAITING, or PENDING|
|duration|Time it took to finish the task, in milliseconds; present only for completed tasks|
|location|Server name where this task is running, in the format host:port; present only for RUNNING tasks|
|host|Hostname of the server where the task is running|
|plaintext_port|Unsecured port of the server, or -1 if plaintext traffic is disabled|
|tls_port|TLS port of the server, or -1 if TLS is disabled|
|error_msg|Detailed error message for FAILED tasks|

For example, to retrieve information about tasks filtered by status, use the query:
```sql
SELECT * FROM sys.tasks WHERE status = 'FAILED';
```
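
Filters on status and datasource can be combined with ORDER BY and LIMIT to inspect recent failures. The sketch below lists the ten most recently created failed tasks for the "wikipedia" datasource, with their duration and error message. created_time is populated for completed tasks, which include FAILED ones; sorting on it descending puts the newest tasks first, assuming all values share the same ISO 8601 layout so that string and chronological order agree.

```sql
-- Ten most recently created failed tasks for "wikipedia", with run time and error.
-- "type" is double-quoted only as a precaution in case it clashes with a SQL keyword.
SELECT task_id, "type", created_time, duration, error_msg
FROM sys.tasks
WHERE datasource = 'wikipedia' AND status = 'FAILED'
ORDER BY created_time DESC
LIMIT 10;
```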

## Server configuration

The Druid SQL server is configured through the following properties on the broker.