[Feature][Connector Hive] support hive savemode #6842
base: dev
Conversation
@EricJoy2048 @dailai @ruanwenjun hi, guys. PTAL when you have time.
@@ -66,7 +74,8 @@ public OptionRule optionRule() {

ReadonlyConfig finalReadonlyConfig =
        generateCurrentReadonlyConfig(readonlyConfig, catalogTable);
return () -> new HiveSink(finalReadonlyConfig, catalogTable);
CatalogTable finalCatalog = renameCatalogTable(finalReadonlyConfig, catalogTable);
Replace with the target Hive sink table name. If it is not replaced here, the source table name (e.g. `fake` from a FakeSource) will be passed through to the Hive sink, so later lookups against this catalog would hit the wrong table; that is why the rename happens here.
String describeFormattedTableQuery = "describe formatted " + tablePath.getFullName();
try (PreparedStatement ps = connection.prepareStatement(describeFormattedTableQuery)) {
    ResultSet rs = ps.executeQuery();
    return processResult(rs, tablePath, builder, partitionKeys);
Now the Hive table information is parsed from the query result. That's not very elegant, but it works.
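To make the "parse the query result" point concrete, here is a minimal sketch of what such parsing has to do (hypothetical helper, not the PR's `processResult`): `describe formatted` returns position-sensitive rows, with the column section ending at the first blank or `#`-prefixed row after it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: extract column-name -> type pairs from rows of
// `describe formatted <table>` output. Hive separates sections with header
// rows such as "# Partition Information", so we stop at the first blank or
// comment row once the column section has started.
public class DescribeFormattedParser {
    public static Map<String, String> parseColumns(String[][] rows) {
        Map<String, String> columns = new LinkedHashMap<>();
        for (String[] row : rows) {
            String name = row[0] == null ? "" : row[0].trim();
            if (name.isEmpty() || name.startsWith("#")) {
                if (!columns.isEmpty()) {
                    break; // end of the column section
                }
                continue; // skip leading header rows
            }
            columns.put(name, row[1].trim());
        }
        return columns;
    }
}
```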
.withValue(
        FIELD_DELIMITER.key(),
        ConfigValueFactory.fromAnyRef(
                parameters.get("field.delim")))
This line will have an issue if `field.delim` is `\t`: `ConfigValueFactory.fromAnyRef` will replace it with `\\t`, and the written data will then be wrong.
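One possible workaround for the issue described above can be sketched like this (a hypothetical helper, not the PR's code): un-escape the delimiter after it has round-tripped through the config layer, so a literal backslash-t becomes a real tab again.

```java
// Hypothetical workaround sketch for the escaping issue described above:
// when a tab delimiter round-trips through the config layer as the two
// characters '\' + 't', convert it back to the real control character
// before it is used to write data.
public class DelimiterUtil {
    public static String unescape(String delim) {
        if (delim == null) {
            return null;
        }
        return delim.replace("\\t", "\t")
                .replace("\\n", "\n");
    }
}
```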
Hi, there seems to be a problem with the statement that gets created.
Is it the statement that commits the partition information, or which statement?
docs/en/connector-v2/sink/Hive.md
Outdated
@@ -33,7 +33,7 @@ By default, we use 2PC commit to ensure `exactly-once`

| name          | type   | required | default value |
|---------------|--------|----------|---------------|
| table_name    | string | yes      | -             |
| metastore_uri | string | yes      | -             |
| hive_jdbc_url | string | yes      | -             |
How to be compatible with older versions?
It is not compatible with the old version, because I don't want to use both the Hive2 JDBC and the Hive metastore, so I removed the metastore and only use JDBC.
> It is not compatible with the old version, because I don't want to use both the Hive2 JDBC and the Hive metastore, so I removed the metastore and only use JDBC.
As an open source project we have to consider feature compatibility. We know that many users are using the Hive connector; in order to stay compatible with those existing users, I think it is better to support both JDBC and metastore.
Yes, but if we want the savemode feature to work with only the Hive metastore, it is difficult to create the table: things like the table format, bucket settings, table location, etc. need lots of parameters to configure. So I want to use a SQL template and let the user define the template; we can replace the table name and columns in this template and run the SQL to create the table.

I can add `metastore_url` back, and it will make the code easier. But then users need to configure both `jdbc` and `thrift` on the Hive connector.
> Yes, but if we want the savemode feature to work with only the Hive metastore, it is difficult to create the table: things like the table format, bucket settings, table location, etc. need lots of parameters to configure. So I want to use a SQL template and let the user define the template; we can replace the table name and columns in this template and run the SQL to create the table.
>
> I can add `metastore_url` back, and it will make the code easier. But then users need to configure both `jdbc` and `thrift` on the Hive connector.
Yes, you can add `metastore_url` back and tell users that save mode can only be used when `jdbc_url` is configured.

In the future, in SeaTunnel version 2.4.x, we can remove the `metastore_url` configuration; we can make some incompatible changes from 2.3.x to 2.4.x.
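The SQL-template idea described above can be sketched like this. The placeholder names `${table_name}` and `${columns}` are illustrative and not necessarily the PR's exact syntax; the point is that the connector substitutes runtime values into a user-supplied CREATE TABLE statement.

```java
import java.util.Map;

// Hypothetical sketch of the create-table template idea: the user writes a
// CREATE TABLE template, and at runtime the connector substitutes placeholders
// such as ${table_name} and ${columns} before executing the SQL over JDBC.
public class CreateTemplateRenderer {
    public static String render(String template, Map<String, String> vars) {
        String sql = template;
        for (Map.Entry<String, String> e : vars.entrySet()) {
            sql = sql.replace("${" + e.getKey() + "}", e.getValue());
        }
        return sql;
    }
}
```

This keeps table format, bucketing, and location out of connector options entirely: they simply stay in the user's template text.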
Yes. When schema_save_mode = "CREATE_SCHEMA_WHEN_NOT_EXIST", is save_mode_create_template required? When it is "", I get the error I described.
It is required; it is the CREATE TABLE statement to execute when your table does not exist.
Then if I don't know the source table's schema, does that mean this source-sink config cannot work? Is this different from the effect achieved by MySQL's schema_save_mode configuration?
It is slightly different.
The current code still has a few problems:
Modify the hive documentation by referring to mysql.md
991eb3b to d004328 (Compare)
docs/en/connector-v2/sink/Hive.md
Outdated
In order to use this connector, You must ensure your spark/flink cluster already integrated hive.

If you use SeaTunnel Engine, You need put `seatunnel-hadoop3-3.1.4-uber.jar` and `hive-exec-<hive_version>.jar` and `hive-jdbc-<hive_version>.jar` and `libfb303-0.9.3.jar` in $SEATUNNEL_HOME/lib/ dir.

## Key features
Key Features
docs/en/connector-v2/sink/Hive.md
Outdated
| abort_drop_partition_metadata | boolean | no | true | Flag to decide whether to drop partition metadata from Hive Metastore during an abort operation. Note: this only affects the metadata in the metastore, the data in the partition will always be deleted (data generated during the synchronization process). |
| common-options                |         | no | -    | Sink plugin common parameters, please refer to [Sink Common Options](common-options.md) for details |

### schema_save_mode[Enum]
Added to options
It is already in the options; this just adds more explanation.
docs/en/connector-v2/source/Hive.md
Outdated
In order to use this connector, You must ensure your spark/flink cluster already integrated hive.

If you use SeaTunnel Engine, You need put `seatunnel-hadoop3-3.1.4-uber.jar` and `hive-exec-<hive_version>.jar` and `libfb303-0.9.3.jar` in $SEATUNNEL_HOME/lib/ dir.

## Key features
ditto
docs/en/connector-v2/source/Hive.md
Outdated
## Source Options

| name | type | required | default value | Description |
capitalize the first letter
@Override
public boolean isExistsData(TablePath tablePath) {
    String tableName = tablePath.getFullName();
    String sql = String.format("select * from %s limit 1;", tableName);
Wouldn't it be better to use show create table
No, this method is to check whether the table has data. If we use `show create table`, we can't check whether it has data; we can only check that the table exists.
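The distinction discussed above can be sketched with a hypothetical helper (not the PR's actual code): the two checks need different statements, because `show create table` only proves existence while a has-data check must actually fetch a row.

```java
// Hypothetical sketch of the two different checks discussed above.
public class HiveTableChecks {
    // Existence check: fails if the table is missing, returns DDL otherwise.
    public static String tableExistsSql(String fullTableName) {
        return "show create table " + fullTableName;
    }

    // Has-data check: fetch at most one row. Written without a trailing
    // semicolon, since some JDBC drivers reject one inside a PreparedStatement.
    public static String hasDataSql(String fullTableName) {
        return String.format("select * from %s limit 1", fullTableName);
    }
}
```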
problem 1, 2 solved
The code has been updated; even without knowing the upstream table schema, you can use variables in the template and they are replaced at runtime.
Purpose of this pull request
subtask of #5390
Does this PR introduce any user-facing change?
How was this patch tested?
Check list
New License Guide
release-note