### _Why are the changes needed?_
Following #3406, fixing spelling mistakes and adding new DDL usage for the JDBC source in PySpark client docs.
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before make a pull request
Closes #3552 from bowenliang123/pyspark-docs-improve.
Closes #3406

eb05a30 [Bowen Liang] add docs for using as JDBC Datasource table with DDL. and minor spelling fix.
Authored-by: Bowen Liang <liangbowen@gf.com.cn>
Signed-off-by: Cheng Pan <chengpan@apache.org>
(cherry picked from commit eb04c7f)
Signed-off-by: Cheng Pan <chengpan@apache.org>
For installation using Conda or manually downloading, please refer to [PySpark installation](https://spark.apache.org/docs/latest/api/python/getting_started/install.html).
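PySpark can also be installed directly from PyPI; the command below is an illustrative sketch (pick the version matching your Spark cluster, the pin shown here is only an example):

```shell
# install PySpark from PyPI; optionally pin the version to match your cluster
pip install pyspark
```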
## Preparation
### Prepare JDBC driver
Refer to the docs of the driver and prepare the JDBC driver jar file.
### Prepare JDBC Hive Dialect extension
Hive Dialect support is required by Spark for wrapping SQL correctly and sending it to the JDBC driver. Kyuubi provides a JDBC dialect extension with auto-registered Hive Dialect support for Spark. Follow the instructions in [Hive Dialect Support](../../engines/spark/jdbc-dialect.html) to prepare the plugin jar file `kyuubi-extension-spark-jdbc-dialect_-*.jar`.
### Including jars of JDBC driver and Hive Dialect extension
Choose one of the following ways to include jar files in Spark.
- Put the jar files of the JDBC driver and Hive Dialect extension into the `$SPARK_HOME/jars` directory to make them visible to the classpath of PySpark, and add `spark.sql.extensions = org.apache.spark.sql.dialect.KyuubiSparkJdbcDialectExtension` to `$SPARK_HOME/conf/spark-defaults.conf`.
- With Spark's startup shell, include the JDBC driver when submitting the application with `--packages`, and the Hive Dialect plugin with `--jars`.
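Putting the pieces together, a submit command might look like the following sketch; the package coordinates, jar path, versions, and script name are placeholders, not exact artifact names:

```shell
# hypothetical example: adjust coordinates, versions, and paths to your environment
$SPARK_HOME/bin/spark-submit \
  --packages org.apache.hive:hive-jdbc:3.1.3 \
  --jars /path/to/kyuubi-extension-spark-jdbc-dialect_2.12-1.6.0.jar \
  --conf spark.sql.extensions=org.apache.spark.sql.dialect.KyuubiSparkJdbcDialectExtension \
  your_app.py
```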
For further information about PySpark JDBC usage and options, please refer to Spark's [JDBC To Other Databases](https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html).
### Using as JDBC Datasource programmatically
```python
# Loading data from Kyuubi via HiveDriver as JDBC datasource
jdbcDF = spark.read \
    .format("jdbc") \
    .options(driver="org.apache.hive.jdbc.HiveDriver",
             url="jdbc:hive2://kyuubi_server_ip:port",
             user="user",
             password="password",
             query="SELECT * FROM testdb.some_table") \
    .load()
```
### Using as JDBC Datasource table with DDL

Since Spark 3.2.0, [`CREATE DATASOURCE TABLE`](https://spark.apache.org/docs/latest/sql-ref-syntax-ddl-create-table-datasource.html) is supported to create a JDBC source table with SQL.
```python
from pyspark.sql.functions import lit

# create JDBC Datasource table with DDL
spark.sql("""CREATE TABLE kyuubi_table USING JDBC
OPTIONS (
    driver='org.apache.hive.jdbc.HiveDriver',
    url='jdbc:hive2://kyuubi_server_ip:port',
    user='user',
    password='password',
    dbtable='testdb.some_table'
)""")

# read data to dataframe
jdbcDF = spark.sql("SELECT * FROM kyuubi_table")

# write data from dataframe in overwrite mode
df.writeTo("kyuubi_table").overwrite(lit(True))

# write data from query
spark.sql("INSERT INTO kyuubi_table SELECT * FROM some_table")
```

### Use PySpark with Pandas
From PySpark 3.2.0, PySpark supports pandas API on Spark which allows you to scale your pandas workload out.