[Doc] Add missing docs for the coming version 1.1.1 #80
Conversation
Signed-off-by: PengFei Li <lpengfei2016@gmail.com>
@@ -161,6 +165,7 @@ The following parameters apply to all three reading methods: Spark SQL, Spark DataFrame, and Spark RDD.
| starrocks.deserialize.arrow.async | false | Specifies whether to support asynchronously converting the Arrow memory format to RowBatches required for the iteration of the Spark connector. |
| starrocks.deserialize.queue.size | 64 | The size of the internal queue that holds tasks for asynchronously converting the Arrow memory format to RowBatches. This parameter is valid only when `starrocks.deserialize.arrow.async` is set to `true`. |
| starrocks.filter.query | None | The condition based on which you want to filter data on StarRocks. You can specify multiple filter conditions, which must be joined by `and`. StarRocks filters the data from the StarRocks table based on the specified filter conditions before the data is read by Spark. |
| starrocks.timezone | Default timezone of the JVM | Supported since 1.1.1. The timezone used to convert a StarRocks `DATETIME` to a Spark `TimestampType`. The default is the JVM timezone returned by `ZoneId#systemDefault()`. The format can be a timezone name such as `Asia/Shanghai`, or a zone offset such as `+08:00`. |
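The filter and timezone options above can be sketched as plain reader options. This is a minimal sketch, not connector code: the helper name, table name, and column names are hypothetical; only the option keys and the `and`-joined filter format come from the table above.

```python
# Sketch: assembling reader options for the StarRocks Spark connector.
# The helper, table, and columns are hypothetical; the option keys and the
# requirement to join filters with `and` come from the parameter table.

def build_read_options(filters, timezone="Asia/Shanghai"):
    """Build a dict of connector read options, joining filters with `and`."""
    return {
        "starrocks.table.identifier": "test_db.scores",  # hypothetical table
        "starrocks.filter.query": " and ".join(filters),
        # A timezone name such as "Asia/Shanghai" or an offset such as "+08:00".
        "starrocks.timezone": timezone,
    }

opts = build_read_options(["score > 50", "dt >= '2023-01-01'"])
print(opts["starrocks.filter.query"])
# score > 50 and dt >= '2023-01-01'
```

The resulting dict could then be passed as options to a Spark DataFrame reader.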
Should this parameter table also have a column indicating whether each parameter is required?
Emm, it's better to have this column, and we can optimize it in another PR.
docs/connector-write.md
| starrocks.fe.jdbc.url | YES | None | The address that is used to connect to the MySQL server of the FE. Format: `jdbc:mysql://<fe_host>:<fe_query_port>`. |
| starrocks.table.identifier | YES | None | The name of the StarRocks table. Format: `<database_name>.<table_name>`. |
| starrocks.user | YES | None | The username of your StarRocks cluster account. |
| starrocks.password | YES | None | The password of your StarRocks cluster account. |
| starrocks.write.label.prefix | NO | spark- | The label prefix used by Stream Load. |
| starrocks.write.enable.transaction-stream-load | NO | TRUE | Whether to use [the transactional interface of Stream Load](https://docs.starrocks.io/en-us/latest/loading/Stream_Load_transaction_interface) to load data. It requires StarRocks v2.5 or later. This feature can load more data in one transaction with less memory usage, and improve performance. <br/> **NOTICE:** Since 1.1.1, this option takes effect only if `starrocks.write.max.retries` is non-positive, that is, transaction stream load does not support retry. |
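The NOTICE above couples two options. A minimal sketch of that rule, assuming a plain options dict (the function name is ours, and the fallback defaults are illustrative only, not the connector's actual defaults):

```python
# Sketch of the 1.1.1 rule: transaction stream load takes effect only when
# it is enabled AND `starrocks.write.max.retries` is non-positive.
# The defaults used by .get() here are illustrative, not the connector's.

def transaction_stream_load_effective(options):
    enabled = options.get(
        "starrocks.write.enable.transaction-stream-load", "false"
    ).lower() == "true"
    retries = int(options.get("starrocks.write.max.retries", "3"))
    return enabled and retries <= 0

opts = {
    "starrocks.write.enable.transaction-stream-load": "true",
    "starrocks.write.max.retries": "0",
}
print(transaction_stream_load_effective(opts))  # True

opts["starrocks.write.max.retries"] = "3"
print(transaction_stream_load_effective(opts))  # False
```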
Yes, the connector will split the data in one load into multiple batches.
docs/connector-read.md
### Upgrade from version 1.0.0 to version 1.1.0

### Upgrade from version 1.1.0 to 1.1.1

* Since 1.1.1, the connector no longer provides `mysql-connector-java`, the official JDBC driver for MySQL, because it is licensed under the GPL, which has some limitations.
The MySQL JDBC driver seems to have always been under the GPL. Why do users now have to download it manually, and what are the GPL's limitations?
GPL is an open-source license, and it places some limitations on the free use of software. https://en.wikipedia.org/wiki/GNU_General_Public_License
docs/connector-write.md
* Since 1.1.1, the connector no longer provides `mysql-connector-java`, the official JDBC driver for MySQL, because it is licensed under the GPL, which has some limitations. The connector needs the JDBC driver to access StarRocks for table metadata, so you need to add the driver to the Spark classpath manually. You can find the driver on the [MySQL site](https://dev.mysql.com/downloads/connector/j/) or [Maven Central](https://repo1.maven.org/maven2/mysql/mysql-connector-java/).
* Since 1.1.1, the connector uses general Stream Load by default, rather than transaction Stream Load as in version 1.1.0. If you want to go back to transaction Stream Load, you can set `starrocks.write.enable.transaction-stream-load` to `true` and `starrocks.write.max.retries` to a non-positive value.
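Since the driver must now be added to the Spark classpath manually, one common way is to pass jars to `spark-submit` via its `--jars` flag, which takes a comma-separated list. A minimal sketch of composing such a command; the jar file names and application name are examples only:

```python
# Sketch: composing a spark-submit command that adds the MySQL JDBC driver
# to the classpath manually, as required since connector 1.1.1.
# `--jars` accepts a comma-separated list; file names below are examples.

def spark_submit_cmd(app, jars):
    """Return a spark-submit argument list with the given jars attached."""
    return ["spark-submit", "--jars", ",".join(jars), app]

cmd = spark_submit_cmd(
    "etl_job.py",  # hypothetical Spark application
    ["mysql-connector-java-8.0.33.jar", "starrocks-spark-connector.jar"],
)
print(" ".join(cmd))
```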
Why does the Flink connector recommend the transactional interface, but the Spark connector recommends the non-transactional one?
From user feedback, retry matters more for keeping Spark jobs stable, but transaction stream load cannot support retry, so it is disabled by default.
Signed-off-by: hellolilyliuyi <hellolilyliuyi123@163.com>