|
| 1 | +.. Licensed to the Apache Software Foundation (ASF) under one or more |
| 2 | + contributor license agreements. See the NOTICE file distributed with |
| 3 | + this work for additional information regarding copyright ownership. |
| 4 | + The ASF licenses this file to You under the Apache License, Version 2.0 |
| 5 | + (the "License"); you may not use this file except in compliance with |
| 6 | + the License. You may obtain a copy of the License at |
| 7 | +
|
| 8 | +.. http://www.apache.org/licenses/LICENSE-2.0 |
| 9 | +
|
| 10 | +.. Unless required by applicable law or agreed to in writing, software |
| 11 | + distributed under the License is distributed on an "AS IS" BASIS, |
| 12 | + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| 13 | + See the License for the specific language governing permissions and |
| 14 | + limitations under the License. |
| 15 | +
|
| 16 | +`TiDB`_ |
| 17 | +========== |
| 18 | + |
| 19 | +TiDB is an open-source NewSQL database that supports Hybrid Transactional and Analytical Processing |
| 20 | +(HTAP) workloads. |
| 21 | + |
| 22 | +TiSpark is a thin layer built for running Apache Spark on top of TiDB/TiKV to answer complex OLAP
| 23 | +queries. It takes advantage of both the Spark platform and the distributed TiKV cluster,
| 24 | +and integrates seamlessly with TiDB to provide a one-stop HTAP solution for online
| 25 | +transactions and analysis.
| 26 | + |
| 27 | +.. tip:: |
| 28 | + This article assumes that you are familiar with the basic concepts and operation of TiDB and TiSpark.
| 29 | + For anything not covered here, refer to the TiDB `Official Documentation`_.
| 30 | + |
| 31 | +With Kyuubi, we can run SQL queries against TiDB/TiKV, which is more
| 32 | +convenient, easier to understand, and easier to extend than using
| 33 | +Spark directly to manipulate TiDB/TiKV.
| 34 | + |
| 35 | +TiDB Integration |
| 36 | +------------------- |
| 37 | + |
| 38 | +To enable the integration of the Kyuubi Spark SQL engine and TiDB through
| 39 | +Apache Spark DataSource V2 and Catalog APIs, you need to:
| 40 | +
| 41 | +- Reference the TiSpark :ref:`dependencies`
| 42 | +- Set the Spark extension and catalog :ref:`configurations`
| 43 | + |
| 44 | +.. _dependencies: |
| 45 | + |
| 46 | +Dependencies |
| 47 | +************ |
| 48 | +The classpath of the Kyuubi Spark SQL engine with TiDB support consists of:
| 49 | +
| 50 | +1. kyuubi-spark-sql-engine-|release|.jar, the engine jar deployed with Kyuubi distributions
| 51 | +2. a copy of the Spark distribution
| 52 | +3. tispark-assembly-<spark.version>_<scala.version>-<tispark.version>.jar (example: tispark-assembly-3.2_2.12-3.0.1.jar), which can be found on `Maven Central`_
| 53 | + |
| 54 | +To make the TiSpark packages visible on the engines' runtime classpath, use one of these methods:
| 55 | + |
| 56 | +1. Put the TiSpark packages into ``$SPARK_HOME/jars`` directly |
| 57 | +2. Set ``spark.jars=/path/to/tispark-assembly`` |
| 58 | + |
| 59 | +.. warning:: |
| 60 | + Mind the compatibility between TiDB, TiSpark and Spark versions, which can be checked on the `TiSpark Environment setup`_ page.
| 61 | + |
| 62 | +.. _configurations: |
| 63 | + |
| 64 | +Configurations |
| 65 | +************** |
| 66 | + |
| 67 | +To activate TiSpark, set the following configurations:
| 68 | + |
| 69 | +.. code-block:: properties |
| 70 | +
|
| 71 | + spark.tispark.pd.addresses $pd_host:$pd_port |
| 72 | + spark.sql.extensions org.apache.spark.sql.TiExtensions |
| 73 | + spark.sql.catalog.tidb_catalog org.apache.spark.sql.catalyst.catalog.TiCatalog |
| 74 | + spark.sql.catalog.tidb_catalog.pd.addresses $pd_host:$pd_port |
| 75 | +
|
| 76 | +The ``spark.tispark.pd.addresses`` and ``spark.sql.catalog.tidb_catalog.pd.addresses`` configurations
| 77 | +accept multiple PD servers. Specify the port number for each of them.
| 78 | +
| 79 | +For example, if the PD servers run on ``10.16.20.1``, ``10.16.20.2`` and ``10.16.20.3`` with port ``2379``,
| 80 | +set the value to ``10.16.20.1:2379,10.16.20.2:2379,10.16.20.3:2379``.
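
The PD address format can also be generated rather than typed by hand. The following is only a
minimal sketch: ``pd_addresses`` and ``tispark_properties`` are hypothetical helper names and are
not part of Kyuubi or TiSpark. It assumes every PD server listens on the same port and prints the
four properties listed above.

.. code-block:: python

   def pd_addresses(hosts, port=2379):
       """Join PD hosts into a comma-separated host:port list.

       Hypothetical helper: assumes all PD servers share one port.
       """
       return ",".join(f"{host}:{port}" for host in hosts)


   def tispark_properties(hosts, port=2379, catalog="tidb_catalog"):
       """Build the TiSpark properties shown above as a dict (hypothetical helper)."""
       addresses = pd_addresses(hosts, port)
       return {
           "spark.tispark.pd.addresses": addresses,
           "spark.sql.extensions": "org.apache.spark.sql.TiExtensions",
           f"spark.sql.catalog.{catalog}": "org.apache.spark.sql.catalyst.catalog.TiCatalog",
           f"spark.sql.catalog.{catalog}.pd.addresses": addresses,
       }


   if __name__ == "__main__":
       # Print one "key value" pair per line, matching the properties layout above.
       hosts = ["10.16.20.1", "10.16.20.2", "10.16.20.3"]
       for key, value in tispark_properties(hosts).items():
           print(f"{key} {value}")

Both address properties are built from the same string, so the two settings stay in sync.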
| 81 | + |
| 82 | +TiDB Operations |
| 83 | +------------------ |
| 84 | + |
| 85 | +Taking ``SELECT`` as an example:
| 86 | + |
| 87 | +.. code-block:: sql |
| 88 | +
|
| 89 | + SELECT * FROM foo; |
| 90 | +
|
| 91 | +Taking ``DELETE FROM`` as an example: Spark 3 added support for ``DELETE FROM`` queries, which remove data from tables.
| 92 | + |
| 93 | +.. code-block:: sql |
| 94 | +
|
| 95 | + DELETE FROM foo WHERE id >= 1 and id < 2; |
| 96 | +
|
| 97 | +.. note:: |
| 98 | + As of TiSpark 3.0.1, TiSpark does not support ``CREATE TABLE`` or ``INSERT INTO/OVERWRITE`` operations
| 99 | + through Apache Spark DataSource V2 and Catalog APIs.
| 100 | + |
| 101 | +.. _Official Documentation: https://docs.pingcap.com/tidb/stable/overview |
| 102 | +.. _Maven Central: https://repo1.maven.org/maven2/com/pingcap/tispark/ |
| 103 | +.. _TiSpark Environment setup: https://docs.pingcap.com/tidb/stable/tispark-overview#environment-setup |