`Hudi`_
========

Apache Hudi (pronounced “hoodie”) is the next generation streaming data lake platform.
Apache Hudi brings core warehouse and database functionality directly to a data lake.

.. tip::
   This article assumes that you have mastered the basic knowledge and operation of `Hudi`_.
   For the knowledge about Hudi not mentioned in this article,
   you can obtain it from its `Official Documentation`_.

By using Kyuubi, we can run SQL queries against Hudi, which is more convenient, easier to understand,
and easier to extend than manipulating Hudi with Spark directly.

Hudi Integration
----------------

To enable the integration of the Kyuubi Spark SQL engine and Hudi through
the Catalog APIs, you need to:

- Reference the Hudi :ref:`dependencies`
- Set the Spark extension and catalog :ref:`configurations`

.. _dependencies:

Dependencies
************

The **classpath** of the Kyuubi Spark SQL engine with Hudi supported consists of

1. kyuubi-spark-sql-engine-|release|.jar, the engine jar deployed with Kyuubi distributions
2. a copy of the Spark distribution
3. hudi-spark<spark.version>-bundle_<scala.version>-<hudi.version>.jar (example: hudi-spark3.2-bundle_2.12-0.11.1.jar), which can be found on `Maven Central`_

In order to make the Hudi packages visible to the runtime classpath of engines, we can use one of these methods:

1. Put the Hudi packages into ``$SPARK_HOME/jars`` directly
2. Set ``spark.jars=/path/to/hudi-spark-bundle``
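
For example, with the second approach, the bundle jar can be referenced in
``$KYUUBI_HOME/conf/kyuubi-defaults.conf`` so that every Spark engine launched by Kyuubi
picks it up. This is a sketch; the path below is a placeholder for wherever the bundle jar
is actually stored in your deployment:

.. code-block:: properties

   # placeholder path: point this at your downloaded Hudi bundle jar
   spark.jars=/path/to/hudi-spark3.2-bundle_2.12-0.11.1.jar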

.. _configurations:

Configurations
**************

To activate the functionality of Hudi, we can set the following configurations:

.. code-block:: properties

   # Spark 3.2
   spark.serializer=org.apache.spark.serializer.KryoSerializer
   spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension
   spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog

   # Spark 3.1
   spark.serializer=org.apache.spark.serializer.KryoSerializer
   spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension
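
If you want these settings applied to every engine by default, one option is to place them
in ``$KYUUBI_HOME/conf/kyuubi-defaults.conf``. This is a sketch assuming the default Kyuubi
layout and a Spark 3.2 engine (drop the catalog line for Spark 3.1):

.. code-block:: properties

   # sketch: engine-wide defaults for Hudi support (Spark 3.2)
   spark.serializer=org.apache.spark.serializer.KryoSerializer
   spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension
   spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog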

Hudi Operations
---------------

Taking ``Create Table`` as an example,

.. code-block:: sql

   CREATE TABLE hudi_cow_nonpcf_tbl (
     uuid INT,
     name STRING,
     price DOUBLE
   ) USING HUDI;

Taking ``Query Data`` as an example,

.. code-block:: sql

   SELECT * FROM hudi_cow_nonpcf_tbl WHERE uuid < 20;

Taking ``Insert Data`` as an example,

.. code-block:: sql

   INSERT INTO hudi_cow_nonpcf_tbl SELECT 1, 'a1', 20;

Taking ``Update Data`` as an example,

.. code-block:: sql

   UPDATE hudi_cow_nonpcf_tbl SET name = 'foo', price = price * 2 WHERE uuid = 1;

Taking ``Delete Data`` as an example,

.. code-block:: sql

   DELETE FROM hudi_cow_nonpcf_tbl WHERE uuid = 1;
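
Hudi's Spark SQL support also covers ``Merge Into``. The following is a sketch against the
same table, using an illustrative inline query as the merge source:

.. code-block:: sql

   -- sketch: upsert a single row; the inline source query is illustrative
   MERGE INTO hudi_cow_nonpcf_tbl AS target
   USING (SELECT 1 AS uuid, 'a1_new' AS name, 40.0 AS price) AS source
   ON target.uuid = source.uuid
   WHEN MATCHED THEN UPDATE SET *
   WHEN NOT MATCHED THEN INSERT *;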

.. _Hudi: https://hudi.apache.org/
.. _Official Documentation: https://hudi.apache.org/docs/overview
.. _Maven Central: https://mvnrepository.com/artifact/org.apache.hudi