
Commit da87ca5

zhouyifan279 authored and yaooqinn committed
[KYUUBI #3154][Subtask] Connectors for Spark SQL Query Engine -> TiDB/TiKV
### _Why are the changes needed?_

Close #3154

### _How was this patch tested?_

- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before make a pull request

Closes #3155 from zhouyifan279/3154.

Closes #3154

682aaf5 [zhouyifan279] [KYUUBI #3154][Subtask] Connectors for Spark SQL Query Engine -> TiDB
4301ca4 [zhouyifan279] [KYUUBI #3154][Subtask] Connectors for Spark SQL Query Engine -> TiDB
65acabe [zhouyifan279] [KYUUBI #3154][Subtask] Connectors for Spark SQL Query Engine -> TiSpark

Authored-by: zhouyifan279 <zhouyifan279@gmail.com>
Signed-off-by: Kent Yao <yao@apache.org>
1 parent 60cb4bd commit da87ca5

File tree

3 files changed: +104 -37 lines changed

docs/connector/spark/index.rst

Lines changed: 1 addition & 1 deletion
@@ -37,6 +37,6 @@ purpose.
    iceberg
    kudu
    flink_table_store
-   tispark
+   tidb
    tpcds
    tpch

docs/connector/spark/tidb.rst

Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
.. Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements. See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License. You may obtain a copy of the License at

.. http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.

`TiDB`_
=======

TiDB is an open-source NewSQL database that supports Hybrid Transactional and Analytical Processing
(HTAP) workloads.

TiSpark is a thin layer built for running Apache Spark on top of TiDB/TiKV to answer complex OLAP
queries. It enjoys the merits of both the Spark platform and the distributed clusters of TiKV,
while being seamlessly integrated with TiDB to provide a one-stop HTAP solution for online
transactions and analyses.

.. tip::
   This article assumes that you have mastered the basic knowledge and operation of TiDB and TiSpark.
   For knowledge not covered in this article, please refer to the TiDB `Official Documentation`_.

By using Kyuubi, we can run SQL queries against TiDB/TiKV in a way that is more convenient,
easier to understand, and easier to extend than manipulating TiDB/TiKV with Spark directly.

TiDB Integration
----------------

To enable the integration of the Kyuubi Spark SQL engine and TiDB through
the Apache Spark DataSource V2 and Catalog APIs, you need to:

- Reference the TiSpark :ref:`dependencies`
- Set the Spark extension and catalog :ref:`configurations`

.. _dependencies:

Dependencies
************

The classpath of the Kyuubi Spark SQL engine with TiDB support consists of:

1. kyuubi-spark-sql-engine-|release|.jar, the engine jar deployed with Kyuubi distributions
2. a copy of the Spark distribution
3. tispark-assembly-<spark.version>_<scala.version>-<tispark.version>.jar (for example, tispark-assembly-3.2_2.12-3.0.1.jar), which can be found on `Maven Central`_

To make the TiSpark package visible to the runtime classpath of the engine, use one of the following methods:

1. Put the TiSpark package into ``$SPARK_HOME/jars`` directly
2. Set ``spark.jars=/path/to/tispark-assembly``, as sketched below
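
For example, a minimal sketch of the second method in ``$SPARK_HOME/conf/spark-defaults.conf``; the jar location and TiSpark version below are illustrative assumptions, so adjust them to your environment:

.. code-block:: properties

   # path to the TiSpark assembly jar to ship with the engine's Spark application (illustrative)
   spark.jars /opt/tispark/tispark-assembly-3.2_2.12-3.0.1.jar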

.. warning::
   Please mind the compatibility of different TiDB, TiSpark and Spark versions, which can be confirmed on the `TiSpark Environment setup`_ page.

.. _configurations:

Configurations
**************

To activate the TiSpark functionality, we can set the following configurations:

.. code-block:: properties

   spark.tispark.pd.addresses $pd_host:$pd_port
   spark.sql.extensions org.apache.spark.sql.TiExtensions
   spark.sql.catalog.tidb_catalog org.apache.spark.sql.catalyst.catalog.TiCatalog
   spark.sql.catalog.tidb_catalog.pd.addresses $pd_host:$pd_port

The ``spark.tispark.pd.addresses`` and ``spark.sql.catalog.tidb_catalog.pd.addresses`` configurations
accept multiple PD servers. Specify the port number for each of them.

For example, when you have multiple PD servers on ``10.16.20.1,10.16.20.2,10.16.20.3`` with the port ``2379``,
put it as ``10.16.20.1:2379,10.16.20.2:2379,10.16.20.3:2379``.
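
With the three PD endpoints above, the corresponding settings would look like this (the addresses are illustrative):

.. code-block:: properties

   spark.tispark.pd.addresses 10.16.20.1:2379,10.16.20.2:2379,10.16.20.3:2379
   spark.sql.catalog.tidb_catalog.pd.addresses 10.16.20.1:2379,10.16.20.2:2379,10.16.20.3:2379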

TiDB Operations
---------------

Taking ``SELECT`` as an example,

.. code-block:: sql

   SELECT * FROM foo;
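
With the catalog configured above, a TiDB table can also be addressed by its fully qualified name. A minimal sketch, assuming a TiDB database named ``test`` that contains the table ``foo``:

.. code-block:: sql

   -- switch the current catalog to the TiDB catalog (database and table names are illustrative)
   USE tidb_catalog;
   SELECT * FROM test.foo;

   -- or qualify the table fully without switching catalogs
   SELECT * FROM tidb_catalog.test.foo;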

Taking ``DELETE FROM`` as an example, Spark 3 added support for ``DELETE FROM`` queries to remove data from tables.

.. code-block:: sql

   DELETE FROM foo WHERE id >= 1 AND id < 2;

.. note::
   As of now (TiSpark 3.0.1), TiSpark does not support ``CREATE TABLE`` or ``INSERT INTO/OVERWRITE`` operations
   through the Apache Spark DataSource V2 and Catalog APIs.

.. _Official Documentation: https://docs.pingcap.com/tidb/stable/overview
.. _Maven Central: https://repo1.maven.org/maven2/com/pingcap/tispark/
.. _TiSpark Environment setup: https://docs.pingcap.com/tidb/stable/tispark-overview#environment-setup

docs/connector/spark/tispark.rst

Lines changed: 0 additions & 36 deletions
This file was deleted.
