Commit f1312ea

a49a authored and yaooqinn committed
[KYUUBI #3068][DOC] Add the Hudi connector doc for Spark SQL Query Engine
### _Why are the changes needed?_

Add the Hudi connector doc for Spark SQL Query Engine

### _How was this patch tested?_

- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before make a pull request

Closes #3099 from deadwind4/hudi-spark-doc.

Closes #3068

fcd2cf6 [Luning Wang] update doc
0ee870d [Luning Wang] [KYUUBI #3068][DOC] Add the Hudi connector doc for Spark SQL Query Engine

Authored-by: Luning Wang <wang4luning@gmail.com>
Signed-off-by: Kent Yao <yao@apache.org>
1 parent 4b640b7 commit f1312ea

1 file changed, +77 -1 lines changed

docs/connector/spark/hudi.rst

Lines changed: 77 additions & 1 deletion
@@ -16,21 +16,97 @@
`Hudi`_
========

Apache Hudi (pronounced “hoodie”) is the next generation streaming data lake platform.
Apache Hudi brings core warehouse and database functionality directly to a data lake.

.. tip::
   This article assumes that you are familiar with the basic concepts and operations of `Hudi`_.
   For anything about Hudi not covered in this article,
   refer to its `Official Documentation`_.

By using Kyuubi, we can run SQL queries against Hudi, which is more convenient, easier to understand,
and easier to extend than manipulating Hudi directly through Spark.

Hudi Integration
----------------

To enable the integration of the Kyuubi Spark SQL engine and Hudi through
Catalog APIs, you need to:

- Reference the Hudi :ref:`dependencies`
- Set the Spark extension and catalog :ref:`configurations`

.. _dependencies:

Dependencies
************

The **classpath** of the Kyuubi Spark SQL engine with Hudi support consists of:

1. kyuubi-spark-sql-engine-|release|.jar, the engine jar deployed with Kyuubi distributions
2. a copy of the Spark distribution
3. hudi-spark<spark.version>-bundle_<scala.version>-<hudi.version>.jar (example: hudi-spark3.2-bundle_2.12-0.11.1.jar), which can be found on `Maven Central`_

To make the Hudi packages visible on the runtime classpath of the engines, use one of these methods:

1. Put the Hudi packages into ``$SPARK_HOME/jars`` directly
2. Set ``spark.jars=/path/to/hudi-spark-bundle``

.. _configurations:

Configurations
**************

To activate the Hudi functionality, set the following configurations:

.. code-block:: properties

   # Spark 3.2
   spark.serializer=org.apache.spark.serializer.KryoSerializer
   spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension
   spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog

   # Spark 3.1
   spark.serializer=org.apache.spark.serializer.KryoSerializer
   spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension

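One way to check that a session picked up these settings is to print them from a Kyuubi session.
The following is a minimal sketch that relies only on Spark SQL's ``SET <key>`` form, which displays
the current value of a property; a key that was never set is typically reported as ``<undefined>``.

.. code-block:: sql

   -- Each statement returns the key and its current value for this session.
   SET spark.serializer;
   SET spark.sql.extensions;
   -- Only expected to be set when following the Spark 3.2 configuration above.
   SET spark.sql.catalog.spark_catalog;

If a value does not match the configuration above, the settings were most likely not applied
before the engine was started.
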
Hudi Operations
---------------

Taking ``Create Table`` as an example,

.. code-block:: sql

   CREATE TABLE hudi_cow_nonpcf_tbl (
     uuid INT,
     name STRING,
     price DOUBLE
   ) USING HUDI;

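Hudi's Spark SQL syntax also accepts table properties at creation time. As a sketch only,
assuming the ``type``, ``primaryKey`` and ``preCombineField`` properties described in the Hudi
documentation, the record key and ordering field could be declared explicitly. The table name
``hudi_cow_pk_tbl`` and the ``ts`` column are hypothetical and are not used elsewhere in this article.

.. code-block:: sql

   -- Hypothetical table; names are for illustration only.
   CREATE TABLE hudi_cow_pk_tbl (
     uuid INT,
     name STRING,
     price DOUBLE,
     ts BIGINT
   ) USING HUDI
   TBLPROPERTIES (
     type = 'cow',            -- copy-on-write table
     primaryKey = 'uuid',     -- record key used to locate rows for UPDATE and DELETE
     preCombineField = 'ts'   -- field used to choose between records sharing a key
   );

Check the Hudi documentation for the release you deploy before relying on these property names
or their defaults.
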
Taking ``Query Data`` as an example,

.. code-block:: sql

   SELECT * FROM hudi_cow_nonpcf_tbl WHERE uuid < 20;

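Hudi stores bookkeeping columns alongside the user-defined ones. As a hedged sketch, assuming
the ``_hoodie_commit_time`` and ``_hoodie_record_key`` metadata columns described in the Hudi
documentation, they can be selected explicitly to see which commit wrote each row:

.. code-block:: sql

   -- Metadata column names are taken from the Hudi documentation, not from this article.
   SELECT _hoodie_commit_time, _hoodie_record_key, uuid, name, price
   FROM hudi_cow_nonpcf_tbl
   WHERE uuid < 20;
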
Taking ``Insert Data`` as an example,

.. code-block:: sql

   INSERT INTO hudi_cow_nonpcf_tbl SELECT 1, 'a1', 20;

Taking ``Update Data`` as an example,

.. code-block:: sql

   UPDATE hudi_cow_nonpcf_tbl SET name = 'foo', price = price * 2 WHERE uuid = 1;

Taking ``Delete Data`` as an example,

.. code-block:: sql

   DELETE FROM hudi_cow_nonpcf_tbl WHERE uuid = 1;

.. _Hudi: https://hudi.apache.org/
.. _Official Documentation: https://hudi.apache.org/docs/overview
.. _Maven Central: https://mvnrepository.com/artifact/org.apache.hudi
