
[Doc] Add missing docs for the coming version 1.1.1 #80

Merged

merged 3 commits into StarRocks:main on Sep 1, 2023

Conversation

banmoy
Collaborator

@banmoy banmoy commented Aug 30, 2023

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Which issues does this PR fix:

Fixes #

Problem Summary (Required):

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This PR will affect users' behaviors
  • This PR needs user documentation (for new or modified features or behaviors)
  • I have added documentation for my new feature or new function

Signed-off-by: PengFei Li <lpengfei2016@gmail.com>
@@ -161,6 +165,7 @@ The following parameters apply to all three reading methods: Spark SQL, Spark Da
| starrocks.deserialize.arrow.async | false | Specifies whether to support asynchronously converting the Arrow memory format to RowBatches required for the iteration of the Spark connector. |
| starrocks.deserialize.queue.size | 64 | The size of the internal queue that holds tasks for asynchronously converting the Arrow memory format to RowBatches. This parameter is valid when `starrocks.deserialize.arrow.async` is set to `true`. |
| starrocks.filter.query | None | The condition based on which you want to filter data on StarRocks. You can specify multiple filter conditions, which must be joined by `and`. StarRocks filters the data from the StarRocks table based on the specified filter conditions before the data is read by Spark. |
| starrocks.timezone | Default timezone of JVM | Supported since 1.1.1. The timezone used to convert StarRocks `DATETIME` into Spark `TimestampType`. The default is the JVM timezone returned by `ZoneId#systemDefault()`. The value can be a timezone name such as `Asia/Shanghai` or a zone offset such as `+08:00`. |
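
For context on how these read options are consumed, here is a minimal DataFrame-read sketch (not part of this PR). The `starrocks` source name, the FE endpoints, and the table name are illustrative assumptions; only the option keys shown in the table above are taken from the doc.

```scala
// Hypothetical read sketch: endpoints and table names are placeholders.
import org.apache.spark.sql.SparkSession

object StarRocksReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("starrocks-read-sketch").getOrCreate()

    val df = spark.read
      .format("starrocks")                                            // assumed data source name
      .option("starrocks.table.identifier", "test_db.score_board")    // assumed <db>.<table>
      .option("starrocks.fe.http.url", "127.0.0.1:8030")              // assumed option name
      .option("starrocks.fe.jdbc.url", "jdbc:mysql://127.0.0.1:9030")
      .option("starrocks.user", "root")
      .option("starrocks.password", "")
      .option("starrocks.deserialize.arrow.async", "true")            // from the table above
      .option("starrocks.deserialize.queue.size", "64")               // from the table above
      .option("starrocks.filter.query", "score > 50")                 // filter pushed down to StarRocks
      .option("starrocks.timezone", "Asia/Shanghai")                  // since 1.1.1
      .load()

    df.show(10)
    spark.stop()
  }
}
```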
Contributor

Should this parameter table get an extra column indicating whether each parameter is required?

Collaborator Author

Emm, it's better to have this column; we can improve it in another PR.

| starrocks.fe.jdbc.url | YES | None | The address that is used to connect to the MySQL server of the FE. Format: `jdbc:mysql://<fe_host>:<fe_query_port>`. |
| starrocks.table.identifier | YES | None | The name of the StarRocks table. Format: `<database_name>.<table_name>`. |
| starrocks.user | YES | None | The username of your StarRocks cluster account. |
| starrocks.password | YES | None | The password of your StarRocks cluster account. |
| starrocks.write.label.prefix | NO | spark- | The label prefix used by Stream Load. |
| starrocks.write.enable.transaction-stream-load | NO | TRUE | Whether to use [the transactional interface of Stream Load](https://docs.starrocks.io/en-us/latest/loading/Stream_Load_transaction_interface) to load data. This feature is supported in StarRocks v2.4 and later. |
| starrocks.write.enable.transaction-stream-load | NO | TRUE | Whether to use [the transactional interface of Stream Load](https://docs.starrocks.io/en-us/latest/loading/Stream_Load_transaction_interface) to load data. It requires StarRocks v2.5 or later. This feature can load more data in one transaction with less memory usage and improve performance. <br/> **NOTICE:** Since 1.1.1, this option takes effect only if `starrocks.write.max.retries` is non-positive; that is, transaction stream load does not support retries. |
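
For the write options quoted above, a minimal DataFrame-write sketch (not part of this PR) could look like the following; hosts and the database/table names are placeholders, and any option not listed in the table (such as the FE HTTP URL) is omitted here.

```scala
// Hypothetical write sketch: endpoints and table names are placeholders.
import org.apache.spark.sql.SparkSession

object StarRocksWriteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("starrocks-write-sketch").getOrCreate()
    import spark.implicits._

    val data = Seq((1, "starrocks"), (2, "spark")).toDF("id", "name")

    data.write
      .format("starrocks")                                                // assumed data source name
      .option("starrocks.fe.jdbc.url", "jdbc:mysql://127.0.0.1:9030")     // required
      .option("starrocks.table.identifier", "test_db.score_board")        // required
      .option("starrocks.user", "root")                                   // required
      .option("starrocks.password", "")                                   // required
      .option("starrocks.write.label.prefix", "spark-demo-")              // optional, default "spark-"
      .option("starrocks.write.enable.transaction-stream-load", "false")  // optional, see the notice above
      .mode("append")
      .save()

    spark.stop()
  }
}
```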
Contributor

About the data that is sent in one shot when the Stream Load transactional interface is used:
[screenshot: 2023-09-01, 2:18 PM]
The docs say the Stream Load transactional interface allows multiple small batches of data to be merged and sent on demand within one load job before the transaction is committed. Is this done by the Flink connector in the background? Does the user need to configure anything? What is the logic behind it?

Collaborator Author

Yes, the connector will split the data of one load into multiple batches.

### Upgrade from version 1.0.0 to version 1.1.0
### Upgrade from version 1.1.0 to version 1.1.1

* Since 1.1.1, the connector no longer bundles `mysql-connector-java`, the official JDBC driver for MySQL, because it uses the GPL license, which has some limitations.
Contributor

The MySQL JDBC driver seems to have always been under the GPL license; why do users now need to download it manually themselves? What exactly are the GPL restrictions?

Collaborator Author

GPL is an open source license that places some restrictions on the free use of the software: https://en.wikipedia.org/wiki/GNU_General_Public_License

* Since 1.1.1, the connector no longer bundles `mysql-connector-java`, the official JDBC driver for MySQL, because it uses the GPL license, which has some limitations.
The connector needs the JDBC driver to access StarRocks for table metadata, so you need to add the driver to the Spark classpath manually (one way to do this is sketched after this excerpt). You can find the
driver on the [MySQL site](https://dev.mysql.com/downloads/connector/j/) or on [Maven Central](https://repo1.maven.org/maven2/mysql/mysql-connector-java/).
* Since 1.1.1, the connector uses general Stream Load by default rather than transaction Stream Load, which was the default in version 1.1.0. If you want to go back to transaction Stream Load, you
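
Regarding the classpath note in the first bullet above, a minimal launch sketch (the jar path and driver version are placeholders, not from this PR):

```bash
# Hypothetical example: pass the MySQL JDBC driver to Spark via --jars
# so the connector can fetch table metadata from StarRocks.
spark-shell --jars /path/to/mysql-connector-java-8.0.33.jar
```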
Contributor

Why does the Flink connector recommend the transactional interface while the Spark connector recommends the non-transactional one?

Collaborator Author

From user feedback, retries matter more for making Spark loads stable, but transaction stream load cannot support retries, so it is disabled by default.

hellolilyliuyi and others added 2 commits September 1, 2023 16:46
Signed-off-by: hellolilyliuyi <hellolilyliuyi123@163.com>
@hellolilyliuyi hellolilyliuyi merged commit 30b0fda into StarRocks:main Sep 1, 2023
2 of 3 checks passed