
[Doc] Add missing docs for the coming version 1.1.1 #80

Merged

merged 3 commits into StarRocks:main on Sep 1, 2023

Conversation

banmoy
Collaborator

@banmoy banmoy commented Aug 30, 2023

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Which issues does this PR fix:

Fixes #

Problem Summary (Required):

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This PR will affect users' behaviors
  • This PR needs user documentation (for new or modified features or behaviors)
  • I have added documentation for my new feature or new function

Signed-off-by: PengFei Li <lpengfei2016@gmail.com>
@@ -161,6 +165,7 @@ The following parameters apply to all three reading methods: Spark SQL, Spark Da
| starrocks.deserialize.arrow.async | false | Specifies whether to support asynchronously converting the Arrow memory format to RowBatches required for the iteration of the Spark connector. |
| starrocks.deserialize.queue.size | 64 | The size of the internal queue that holds tasks for asynchronously converting the Arrow memory format to RowBatches. This parameter is valid when `starrocks.deserialize.arrow.async` is set to `true`. |
| starrocks.filter.query | None | The condition based on which you want to filter data on StarRocks. You can specify multiple filter conditions, which must be joined by `and`. StarRocks filters the data from the StarRocks table based on the specified filter conditions before the data is read by Spark. |
| starrocks.timezone | Default timezone of JVM | Supported since 1.1.1. The timezone used to convert StarRocks `DATETIME` into Spark `TimestampType`. The default is the JVM timezone returned by `ZoneId#systemDefault()`. The value can be a timezone name such as `Asia/Shanghai` or a zone offset such as `+08:00`. |
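
For context on how these read options are consumed, here is a minimal DataFrame-read sketch (not part of this PR). The `starrocks` source name, the FE endpoints, and the table name are illustrative assumptions; only the option keys shown in the table above are taken from the doc.

```scala
// Hypothetical read sketch: endpoints and table names are placeholders.
import org.apache.spark.sql.SparkSession

object StarRocksReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("starrocks-read-sketch").getOrCreate()

    val df = spark.read
      .format("starrocks")                                            // assumed data source name
      .option("starrocks.table.identifier", "test_db.score_board")    // assumed <db>.<table>
      .option("starrocks.fe.http.url", "127.0.0.1:8030")              // assumed option name
      .option("starrocks.fe.jdbc.url", "jdbc:mysql://127.0.0.1:9030")
      .option("starrocks.user", "root")
      .option("starrocks.password", "")
      .option("starrocks.deserialize.arrow.async", "true")            // from the table above
      .option("starrocks.deserialize.queue.size", "64")               // from the table above
      .option("starrocks.filter.query", "score > 50")                 // filter pushed down to StarRocks
      .option("starrocks.timezone", "Asia/Shanghai")                  // since 1.1.1
      .load()

    df.show(10)
    spark.stop()
  }
}
```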
Contributor

Should this parameter table get an extra column indicating whether each parameter is required?

Collaborator Author

Emm, it's better to have this column; we can improve it in another PR.

| starrocks.fe.jdbc.url | YES | None | The address that is used to connect to the MySQL server of the FE. Format: `jdbc:mysql://<fe_host>:<fe_query_port>`. |
| starrocks.table.identifier | YES | None | The name of the StarRocks table. Format: `<database_name>.<table_name>`. |
| starrocks.user | YES | None | The username of your StarRocks cluster account. |
| starrocks.password | YES | None | The password of your StarRocks cluster account. |
| starrocks.write.label.prefix | NO | spark- | The label prefix used by Stream Load. |
| starrocks.write.enable.transaction-stream-load | NO | TRUE | Whether to use [the transactional interface of Stream Load](https://docs.starrocks.io/en-us/latest/loading/Stream_Load_transaction_interface) to load data. This feature is supported in StarRocks v2.4 and later. |
| starrocks.write.enable.transaction-stream-load | NO | TRUE | Whether to use [the transactional interface of Stream Load](https://docs.starrocks.io/en-us/latest/loading/Stream_Load_transaction_interface) to load data. It requires StarRocks v2.5 or later. This feature can load more data in one transaction with less memory usage and improve performance. <br/> **NOTICE:** Since 1.1.1, this option takes effect only if `starrocks.write.max.retries` is non-positive; that is, transaction stream load does not support retries. |
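
For the write options quoted above, a minimal DataFrame-write sketch (not part of this PR) could look like the following; hosts and the database/table names are placeholders, and any option not listed in the table (such as the FE HTTP URL) is omitted here.

```scala
// Hypothetical write sketch: endpoints and table names are placeholders.
import org.apache.spark.sql.SparkSession

object StarRocksWriteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("starrocks-write-sketch").getOrCreate()
    import spark.implicits._

    val data = Seq((1, "starrocks"), (2, "spark")).toDF("id", "name")

    data.write
      .format("starrocks")                                                // assumed data source name
      .option("starrocks.fe.jdbc.url", "jdbc:mysql://127.0.0.1:9030")     // required
      .option("starrocks.table.identifier", "test_db.score_board")        // required
      .option("starrocks.user", "root")                                   // required
      .option("starrocks.password", "")                                   // required
      .option("starrocks.write.label.prefix", "spark-demo-")              // optional, default "spark-"
      .option("starrocks.write.enable.transaction-stream-load", "false")  // optional, see the notice above
      .mode("append")
      .save()

    spark.stop()
  }
}
```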
Contributor

About the data that is sent in one shot when the Stream Load transactional interface is used:
[screenshot: 2023-09-01, 2:18 PM]
The docs say the Stream Load transactional interface allows multiple small batches of data to be merged and sent on demand within one load job before the transaction is committed. Is this done by the Flink connector in the background? Does the user need to configure anything? What is the logic behind it?

Collaborator Author

Yes, the connector will split the data of one load into multiple batches.

### Upgrade from version 1.0.0 to version 1.1.0
### Upgrade from version 1.1.0 to version 1.1.1

* Since 1.1.1, the connector no longer bundles `mysql-connector-java`, the official JDBC driver for MySQL, because it uses the GPL license, which has some limitations.
Contributor

The MySQL JDBC driver seems to have always been under the GPL license; why do users now need to download it manually themselves? What exactly are the GPL restrictions?

Collaborator Author

GPL is an open source license that places some restrictions on the free use of the software: https://en.wikipedia.org/wiki/GNU_General_Public_License

* Since 1.1.1, the connector no longer bundles `mysql-connector-java`, the official JDBC driver for MySQL, because it uses the GPL license, which has some limitations.
The connector needs the JDBC driver to access StarRocks for table metadata, so you need to add the driver to the Spark classpath manually (one way to do this is sketched after this excerpt). You can find the
driver on the [MySQL site](https://dev.mysql.com/downloads/connector/j/) or on [Maven Central](https://repo1.maven.org/maven2/mysql/mysql-connector-java/).
* Since 1.1.1, the connector uses general Stream Load by default rather than transaction Stream Load, which was the default in version 1.1.0. If you want to go back to transaction Stream Load, you
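
Regarding the classpath note in the first bullet above, a minimal launch sketch (the jar path and driver version are placeholders, not from this PR):

```bash
# Hypothetical example: pass the MySQL JDBC driver to Spark via --jars
# so the connector can fetch table metadata from StarRocks.
spark-shell --jars /path/to/mysql-connector-java-8.0.33.jar
```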
Contributor

Why does the Flink connector recommend the transactional interface while the Spark connector recommends the non-transactional one?

Collaborator Author

From user feedback, retries matter more for making Spark loads stable, but transaction stream load cannot support retries, so it is disabled by default.

hellolilyliuyi and others added 2 commits September 1, 2023 16:46
Signed-off-by: hellolilyliuyi <hellolilyliuyi123@163.com>
@hellolilyliuyi hellolilyliuyi merged commit 30b0fda into StarRocks:main Sep 1, 2023
2 of 3 checks passed