
Commit d0837ce

Ada Wong authored and Kent Yao (yaooqinn) committed
[KYUUBI #1866][DOCS] Add flink sql engine quick start
### _Why are the changes needed?_

Add quick start documents of the Flink SQL Engine.

### _How was this patch tested?_

- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before make a pull request

Closes #2106 from deadwind4/KYUUBI-1866-quickstart.

Closes #1866

2533aaf [Ada Wong] remove Yarn section
6aa4db8 [Ada Wong] compress png
ff6bff7 [Ada Wong] [KYUUBI #1866][DOCS] Add flink sql engine quick start

Authored-by: Ada Wong <rsl4@foxmail.com>
Signed-off-by: Kent Yao <yao@apache.org>
(cherry picked from commit 8f7b2c6)
Signed-off-by: Kent Yao <yao@apache.org>
1 parent 46638d7 commit d0837ce

File tree

2 files changed (+140, -19 lines)

docs/imgs/flink/flink_jobs_page.png (64.4 KB)

docs/quick_start/quick_start.md (140 additions, 19 deletions)

````diff
@@ -36,49 +36,51 @@ You can get the most recent stable release of Apache Kyuubi here:
 ## Requirements
 
 These are essential components required for Kyuubi to startup.
-For quick start deployment, the only thing you need is `JAVA_HOME` and `SPARK_HOME` being correctly set.
+For quick start deployment, the only thing you need is `JAVA_HOME` being correctly set.
 The Kyuubi release package you downloaded or built contains the rest prerequisites inside already.
 
 Components| Role | Optional | Version | Remarks
 --- | --- | --- | --- | ---
 Java | Java<br>Runtime<br>Environment | Required | Java 8/11 | Kyuubi is pre-built with Java 8
-Spark | Distributed<br>SQL<br>Engine | Required | 3.0.0 and above | By default Kyuubi binary release is delivered without<br> a Spark tarball.
+Spark | Distributed<br>SQL<br>Engine | Optional | 3.0.0 and above | By default Kyuubi binary release is delivered without<br> a Spark tarball.
+Flink | Distributed<br>SQL<br>Engine | Optional | 1.14.0 and above | By default Kyuubi binary release is delivered without<br> a Flink tarball.
 HDFS | Distributed<br>File<br>System | Optional | referenced<br>by<br>Spark | Hadoop Distributed File System is a <br>part of Hadoop framework, used to<br> store and process the datasets.<br> You can interact with any<br> Spark-compatible versions of HDFS.
 Hive | Metastore | Optional | referenced<br>by<br>Spark | Hive Metastore for Spark SQL to connect
 Zookeeper | Service<br>Discovery | Optional | Any<br>zookeeper<br>ensemble<br>compatible<br>with<br>curator(2.12.0) | By default, Kyuubi provides an<br> embedded Zookeeper server inside for<br> non-production use.
 
-Additionally, if you want to work with other Spark compatible systems or plugins, you only need to take care of them as using them with regular Spark applications.
-For example, you can run Spark SQL engines created by the Kyuubi on any cluster manager, including YARN, Kubernetes, Mesos, e.t.c...
-Or, you can manipulate data from different data sources with the Spark Datasource API, e.g. Delta Lake, Apache Hudi, Apache Iceberg, Apache Kudu and e.t.c...
+Additionally, if you want to work with other Spark/Flink compatible systems or plugins, you only need to take care of them as using them with regular Spark/Flink applications.
+For example, you can run Spark/Flink SQL engines created by the Kyuubi on any cluster manager, including YARN, Kubernetes, Mesos, e.t.c...
+Or, you can manipulate data from different data sources with the Spark Datasource/Flink Table API, e.g. Delta Lake, Apache Hudi, Apache Iceberg, Apache Kudu and e.t.c...
 
 ## Installation
 
 To install Kyuubi, you need to unpack the tarball. For example,
 
 ```bash
-tar zxf apache-kyuubi-1.3.1-incubating-bin.tgz
+tar zxf apache-kyuubi-1.5.0-incubating-bin.tgz
 ```
 
-This will result in the creation of a subdirectory named `apache-kyuubi-1.3.1-incubating-bin` shown below,
+This will result in the creation of a subdirectory named `apache-kyuubi-1.5.0-incubating-bin` shown below,
 
 ```bash
-apache-kyuubi-1.3.1-incubating-bin
+apache-kyuubi-1.5.0-incubating-bin
 ├── DISCLAIMER
 ├── LICENSE
 ├── NOTICE
 ├── RELEASE
+├── beeline-jars
 ├── bin
 ├── conf
 |   ├── kyuubi-defaults.conf.template
 │   ├── kyuubi-env.sh.template
-│   └── log4j.properties.template
+│   └── log4j2.properties.template
 ├── docker
 │   ├── Dockerfile
+│   ├── helm
 │   ├── kyuubi-configmap.yaml
+│   ├── kyuubi-deployment.yaml
 │   ├── kyuubi-pod.yaml
 │   └── kyuubi-service.yaml
-├── extension
-│   └── kyuubi-extension-spark-3-1_2.12-1.3.1-incubating.jar
 ├── externals
 │   └── engines
 ├── jars
````

````diff
@@ -97,7 +99,7 @@ From top to bottom are:
 - bin: the entry of the Kyuubi server with `kyuubi` as the startup script.
 - conf: all the defaults used by Kyuubi Server itself or creating a session with Spark applications.
 - externals
-  - engines: contains all kinds of SQL engines that we support, e.g. Apache Spark, Apache Flink(coming soon).
+  - engines: contains all kinds of SQL engines that we support, e.g. Apache Spark, Apache Flink, Trino(coming soon).
 - licenses: a bunch of licenses included.
 - jars: packages needed by the Kyuubi server.
 - logs: where the logs of the Kyuubi server locates.
````

````diff
@@ -106,7 +108,11 @@ From top to bottom are:
 
 ## Running Kyuubi
 
-As mentioned above, for a quick start deployment, then only you need to be sure is that your java runtime environment and `SPARK_HOME` are correct.
+As mentioned above, for a quick start deployment, all you need to be sure of is that the environments below are correct:
+
+- Java runtime environment
+- `SPARK_HOME` for the Spark engine
+- `FLINK_HOME` and `kyuubi.engine.type` in `$KYUUBI_HOME/conf/kyuubi-defaults.conf` for the Flink engine.
 
 ### Setup JAVA
 
````
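
The hunk below continues with `Setup JAVA`; as a quick sanity check of the three prerequisites listed above, something like the following sketch works (output and paths vary by install; nothing here is mandated by the patch):

```bash
# Sanity-check the quick-start prerequisites (a sketch; adjust to your environment).
java -version                 # the requirements table expects Java 8 or 11
echo "${SPARK_HOME:-unset}"   # needed only for the Spark engine
echo "${FLINK_HOME:-unset}"   # needed only for the Flink engine
```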

````diff
@@ -132,7 +138,9 @@ Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.5+10-LTS, mixed mode)
 The recommended place to set `JAVA_HOME` is `$KYUUBI_HOME/conf/kyuubi-env.sh`, as the ways above are too flaky.
 The `JAVA_HOME` in `$KYUUBI_HOME/conf/kyuubi-env.sh` will take precedence over the others.
 
-### Setup Spark
+### Spark Engine
+
+#### Setup Spark
 
 Similar to `JAVA_HOME`, you can also set `SPARK_HOME` in different ways. However, we recommend setting it in `$KYUUBI_HOME/conf/kyuubi-env.sh` too.
 
````
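
Since the doc now routes `JAVA_HOME`, `SPARK_HOME`, and `FLINK_HOME` through `$KYUUBI_HOME/conf/kyuubi-env.sh`, a combined env file could look like this minimal sketch; the Java path is an illustrative assumption, while the Spark and Flink values are the ones shown in the doc:

```bash
# $KYUUBI_HOME/conf/kyuubi-env.sh -- a minimal sketch, not part of the patch.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk              # hypothetical install path
export SPARK_HOME=~/Downloads/spark-3.2.0-bin-hadoop3.2   # value from the doc's example
export FLINK_HOME=/Downloads/flink-1.14.3                 # value from the doc's example
```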

````diff
@@ -142,6 +150,26 @@ For example,
 SPARK_HOME=~/Downloads/spark-3.2.0-bin-hadoop3.2
 ```
 
+### Flink Engine
+
+#### Setup Flink
+
+Similar to `JAVA_HOME`, you can also set `FLINK_HOME` in different ways. However, we recommend setting it in `$KYUUBI_HOME/conf/kyuubi-env.sh` too.
+
+For example,
+
+```bash
+FLINK_HOME=/Downloads/flink-1.14.3
+```
+
+#### Setup Kyuubi Flink Configuration
+
+To enable the Flink SQL engine, `kyuubi.engine.type` in `$KYUUBI_HOME/conf/kyuubi-defaults.conf` needs to be set to `FLINK_SQL`.
+
+```bash
+kyuubi.engine.type FLINK_SQL
+```
+
 ### Starting Kyuubi
 
 ```bash
````
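
Taken together, enabling the new Flink engine is two one-line edits applied before the server starts; a sketch, assuming the default locations of both config files:

```bash
# Point Kyuubi at a Flink install and switch the engine type -- run once before startup.
echo 'export FLINK_HOME=/Downloads/flink-1.14.3' >> "$KYUUBI_HOME/conf/kyuubi-env.sh"
echo 'kyuubi.engine.type FLINK_SQL' >> "$KYUUBI_HOME/conf/kyuubi-defaults.conf"
```
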
````diff
@@ -194,7 +222,7 @@ bin/kyuubi run
 
 ## Using Hive Beeline
 
-Kyuubi server is compatible with Apache Hive beeline, so you can use `$SPARK_HOME/bin/beeline` for testing.
+Kyuubi server is compatible with Apache Hive beeline, so you can use `$KYUUBI_HOME/bin/beeline` for testing.
 
 ### Opening a Connection
 
````
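
With beeline now shipped in the release (see the `beeline-jars` directory added above), a connection can be opened without a Spark install, for example against Kyuubi's default port shown later in this page:

```bash
# Connect to a local Kyuubi server on its default port as the current user.
$KYUUBI_HOME/bin/beeline -u 'jdbc:hive2://localhost:10009/'
```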

````diff
@@ -212,7 +240,7 @@ Beeline version 2.3.7 by Apache Hive
 
 In this case, the session will be created for the user named 'anonymous'.
 
-Kyuubi will create a Spark SQL engine application using `kyuubi-spark-sql-engine_2.12-<version>.jar`.
+Kyuubi will create a Spark/Flink SQL engine application using `kyuubi-<engine>-sql-engine_2.12-<version>.jar`.
 It will take a while for the application to be ready before fully establishing the session.
 Otherwise, an existing application will be reused, and the time cost here is negligible.
````

````diff
@@ -224,17 +252,28 @@ bin/beeline -u 'jdbc:hive2://localhost:10009/' -n kentyao
 
 The formerly created Spark application for user 'anonymous' will not be reused in this case, while a brand new application will be submitted for user 'kentyao' instead.
 
-Then, you can see 3 processes running in your local environment, including one `KyuubiServer` instance and 2 `SparkSubmit` instances as the SQL engines.
+Then, you can see two processes running in your local environment, including one `KyuubiServer` instance and one `SparkSubmit` or `FlinkSQLEngine` instance as the SQL engine.
+
+- Spark
 
 ```
 75730 Jps
 70843 KyuubiServer
 72566 SparkSubmit
-75356 SparkSubmit
+```
+
+- Flink
+
+```
+43484 Jps
+43194 KyuubiServer
+43260 FlinkSQLEngine
 ```
 
 ### Execute Statements
 
+#### Execute Spark SQL Statements
+
 If the beeline session is successfully connected, then you can run any query supported by Spark SQL now. For example,
 
 ```logtalk
````
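
To reproduce the process listing shown in this hunk, the JDK's `jps` tool is enough; a sketch:

```bash
# List local JVM processes; expect KyuubiServer plus SparkSubmit or FlinkSQLEngine.
jps
```
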
````diff
@@ -303,6 +342,88 @@ For example, you can get the Spark web UI from the log for debugging or tuning.
 
 ![](../imgs/spark_jobs_page.png)
 
+#### Execute Flink SQL Statements
+
+If the beeline session is successfully connected, then you can run any query supported by Flink SQL now. For example,
+
+```logtalk
+0: jdbc:hive2://127.0.0.1:10009/default> CREATE TABLE T (
+. . . . . . . . . . . . . . . . . . . . . . > a INT,
+. . . . . . . . . . . . . . . . . . . . . . > b VARCHAR(10)
+. . . . . . . . . . . . . . . . . . . . . . > ) WITH (
+. . . . . . . . . . . . . . . . . . . . . . > 'connector.type' = 'filesystem',
+. . . . . . . . . . . . . . . . . . . . . . > 'connector.path' = 'file:///tmp/T.csv',
+. . . . . . . . . . . . . . . . . . . . . . > 'format.type' = 'csv',
+. . . . . . . . . . . . . . . . . . . . . . > 'format.derive-schema' = 'true'
+. . . . . . . . . . . . . . . . . . . . . . > );
+16:28:47.164 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing anonymous's query[22a73e39-d9d7-479b-a118-33f9d2a5ad3f]: INITIALIZED_STATE -> PENDING_STATE, statement: CREATE TABLE T(
+a INT,
+b VARCHAR(10)
+) WITH (
+'connector.type' = 'filesystem',
+'connector.path' = 'file:///tmp/T.csv',
+'format.type' = 'csv',
+'format.derive-schema' = 'true'
+)
+16:28:47.187 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing anonymous's query[22a73e39-d9d7-479b-a118-33f9d2a5ad3f]: PENDING_STATE -> RUNNING_STATE, statement: CREATE TABLE T(
+a INT,
+b VARCHAR(10)
+) WITH (
+'connector.type' = 'filesystem',
+'connector.path' = 'file:///tmp/T.csv',
+'format.type' = 'csv',
+'format.derive-schema' = 'true'
+)
+16:28:47.320 INFO org.apache.kyuubi.operation.ExecuteStatement: Query[22a73e39-d9d7-479b-a118-33f9d2a5ad3f] in FINISHED_STATE
+16:28:47.322 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing anonymous's query[22a73e39-d9d7-479b-a118-33f9d2a5ad3f]: RUNNING_STATE -> FINISHED_STATE, statement: CREATE TABLE T(
+a INT,
+b VARCHAR(10)
+) WITH (
+'connector.type' = 'filesystem',
+'connector.path' = 'file:///tmp/T.csv',
+'format.type' = 'csv',
+'format.derive-schema' = 'true'
+), time taken: 0.134 seconds
++---------+
+| result |
++---------+
+| OK |
++---------+
+1 row selected (0.341 seconds)
+0: jdbc:hive2://127.0.0.1:10009/default> INSERT INTO T VALUES (1, 'Hi'), (2, 'Hello');
+16:28:52.780 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing anonymous's query[d79abf78-d2ae-468f-87b2-19db1fc6e19a]: INITIALIZED_STATE -> PENDING_STATE, statement: INSERT INTO T VALUES (1, 'Hi'), (2, 'Hello')
+16:28:52.786 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing anonymous's query[d79abf78-d2ae-468f-87b2-19db1fc6e19a]: PENDING_STATE -> RUNNING_STATE, statement: INSERT INTO T VALUES (1, 'Hi'), (2, 'Hello')
+16:28:57.827 INFO org.apache.kyuubi.operation.ExecuteStatement: Query[d79abf78-d2ae-468f-87b2-19db1fc6e19a] in RUNNING_STATE
+16:28:59.836 INFO org.apache.kyuubi.operation.ExecuteStatement: Query[d79abf78-d2ae-468f-87b2-19db1fc6e19a] in FINISHED_STATE
+16:28:59.837 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing anonymous's query[d79abf78-d2ae-468f-87b2-19db1fc6e19a]: RUNNING_STATE -> FINISHED_STATE, statement: INSERT INTO T VALUES (1, 'Hi'), (2, 'Hello'), time taken: 7.05 seconds
++-------------------------------------+
+| default_catalog.default_database.T |
++-------------------------------------+
+| -1 |
++-------------------------------------+
+1 row selected (7.104 seconds)
+0: jdbc:hive2://127.0.0.1:10009/default>
+0: jdbc:hive2://127.0.0.1:10009/default> SELECT * FROM T;
+16:29:08.092 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing anonymous's query[af5660c0-fcc4-4f80-b3fd-c4a799faf33f]: INITIALIZED_STATE -> PENDING_STATE, statement: SELECT * FROM T
+16:29:08.101 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing anonymous's query[af5660c0-fcc4-4f80-b3fd-c4a799faf33f]: PENDING_STATE -> RUNNING_STATE, statement: SELECT * FROM T
+16:29:12.519 INFO org.apache.kyuubi.operation.ExecuteStatement: Query[af5660c0-fcc4-4f80-b3fd-c4a799faf33f] in FINISHED_STATE
+16:29:12.520 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing anonymous's query[af5660c0-fcc4-4f80-b3fd-c4a799faf33f]: RUNNING_STATE -> FINISHED_STATE, statement: SELECT * FROM T, time taken: 4.419 seconds
++----+--------+
+| a | b |
++----+--------+
+| 1 | Hi |
+| 2 | Hello |
++----+--------+
+2 rows selected (4.466 seconds)
+```
+
+As shown in the above case, you can retrieve all the operation logs, the result schema, and the result to your client-side in the beeline console.
+
+Additionally, some useful information about the background Flink SQL application associated with this connection is also printed in the operation log.
+For example, you can get the Flink web UI from the log for debugging or tuning.
+
+![](../imgs/flink/flink_jobs_page.png)
+
 ### Closing a Connection
 
 Close the session between beeline and Kyuubi server by executing `!quit`, for example,
````
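
The session above is interactive; the same check can be replayed non-interactively with beeline's standard `-e` option, e.g. (a sketch, assuming the server is running and the table `T` from the example exists):

```bash
# Re-run the quick-start query without an interactive session.
$KYUUBI_HOME/bin/beeline -u 'jdbc:hive2://localhost:10009/' -e 'SELECT * FROM T'
```
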
````diff
@@ -338,4 +459,4 @@ Bye!
 
 The `KyuubiServer` instance will be stopped immediately while the SQL engine's application will still be alive for a while.
 
-If you start Kyuubi again before the SQL engine application terminates itself, it will reconnect to the newly created `KyuubiServer` instance.
\ No newline at end of file
+If you start Kyuubi again before the SQL engine application terminates itself, it will reconnect to the newly created `KyuubiServer` instance.
````
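
For the shutdown behavior this hunk describes, the `kyuubi` startup script named earlier is assumed here to accept a `stop` subcommand alongside the `run` shown in the hunk header above; a sketch:

```bash
# Stop the daemonized server; engine applications stay alive for a while, as noted above.
bin/kyuubi stop
```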
