Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Improvement] jdbc support dialect.name to choose dialect. #7294

Merged
merged 1 commit into from
Aug 14, 2024

Conversation

liunaijie
Copy link
Member

@liunaijie liunaijie commented Aug 1, 2024

Purpose of this pull request

close #7293

Does this PR introduce any user-facing change?

How was this patch tested?

add new test

Check list

Copy link
Member

@Hisoka-X Hisoka-X left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -60,6 +60,7 @@ supports query SQL and can achieve projection effect.
| table_list | Array | No | - | The list of tables to be read, you can use this configuration instead of `table_path` |
| where_condition | String | No | - | Common row filter conditions for all tables/queries, must start with `where`. for example `where id > 100` |
| split.size | Int | No | 8096 | How many rows in one split, captured tables are split into multiple splits when read of table. |
| split.hash-function | String | No | - | Which hash function use to split. <br/> default is `MD5`, oracle is `ORA_HASH`, Postgres is `HASHTEXT`.<br/> if the default function is not work, you can use this parameter to customize the hash function. | |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| split.hash-function | String | No | - | Which hash function use to split. <br/> default is `MD5`, oracle is `ORA_HASH`, Postgres is `HASHTEXT`.<br/> if the default function is not work, you can use this parameter to customize the hash function. | |
| split.hash-function | String | No | - | Which hash function use to split. <br/> Default is `MD5`, Oracle is `ORA_HASH`, PostgreSQL is `HASHTEXT`.<br/> If the default function is not work, you can use this parameter to customize the hash function. | |

Hisoka-X
Hisoka-X previously approved these changes Aug 6, 2024
Copy link
Member

@Hisoka-X Hisoka-X left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Hisoka-X
Hisoka-X previously approved these changes Aug 6, 2024
@hailin0
Copy link
Member

hailin0 commented Aug 6, 2024

Why not separate the dialect from the driver?

cc @Hisoka-X

Jdbc {
    driver = com.mysql.cj.jdbc.Driver
    url = "jdbc:mysql://e2e_starRocksdb:9030"
    
    dialect = "starrocks"
}

@liunaijie
Copy link
Member Author

Why not separate the dialect from the driver?

cc @Hisoka-X

Jdbc {
    driver = com.mysql.cj.jdbc.Driver
    url = "jdbc:mysql://e2e_starRocksdb:9030"
    
    dialect = "starrocks"
}

Now the dialect is parse from jdbc url, mysql and starrocks both use mysql driver jdbc:mysql, so will use same mysql dialect.

@Hisoka-X
Copy link
Member

Hisoka-X commented Aug 7, 2024

Why not separate the dialect from the driver?

cc @Hisoka-X

Jdbc {
    driver = com.mysql.cj.jdbc.Driver
    url = "jdbc:mysql://e2e_starRocksdb:9030"
    
    dialect = "starrocks"
}

I prefer adding the split.hash-function parameter because it gives users more options.

@hailin0
Copy link
Member

hailin0 commented Aug 7, 2024

We often use a certain client to access a variety of different database servers, which have many differences. Dialects can cover more differences.

e.g:
mysql client -> doris,starrocks,ocenbase,...
pg client -> ...

@Hisoka-X
Copy link
Member

Hisoka-X commented Aug 7, 2024

We often use a certain client to access a variety of different database servers, which have many differences. Dialects can cover more differences.

e.g: mysql client -> doris,starrocks,ocenbase,... pg client -> ...

I get it. Make sense to me.

@liunaijie
Copy link
Member Author

Why not separate the dialect from the driver?

cc @Hisoka-X

Jdbc {
    driver = com.mysql.cj.jdbc.Driver
    url = "jdbc:mysql://e2e_starRocksdb:9030"
    
    dialect = "starrocks"
}

updated, has add a parameter dialect.name, when choose dialect, will both check jdbc url and name.

@@ -60,6 +60,7 @@ supports query SQL and can achieve projection effect.
| table_list | Array | No | - | The list of tables to be read, you can use this configuration instead of `table_path` |
| where_condition | String | No | - | Common row filter conditions for all tables/queries, must start with `where`. for example `where id > 100` |
| split.size | Int | No | 8096 | How many rows in one split, captured tables are split into multiple splits when read of table. |
| split.hash-function | String | No | - | Which hash function use to split. <br/> Default is `MD5`, Oracle is `ORA_HASH`, PostgreSQL is `HASHTEXT`.<br/> If the default function is not work, you can use this parameter to customize the hash function. | |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking back, is it necessary to add this parameter?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i will remove this parameter

@liunaijie liunaijie force-pushed the feature/jdbc_split_customize_function branch from e689924 to 214d5f9 Compare August 9, 2024 07:10
}

@Override
public String hashModForField(String functionName, String fieldName, int mod) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public String hashModForField(String functionName, String fieldName, int mod) {
public String hashModForField(String nativeType, String fieldName, int mod) {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -62,7 +63,9 @@ public static JdbcDialect load(String url, String compatibleMode, String fieldId
}

final List<JdbcDialectFactory> matchingFactories =
foundFactories.stream().filter(f -> f.acceptsURL(url)).collect(Collectors.toList());
foundFactories.stream()
.filter(f -> f.acceptsURL(url) && f.getDialectName().equals(dialectName))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignore case?


@Override
public String getDialectName() {
return "starrocks";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return "starrocks";
return DatabaseIdentifier.STARROCKS;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@Hisoka-X Hisoka-X changed the title [Improvement] jdbc split support customize function [Improvement] jdbc support dialect.name to choose dialect. Aug 12, 2024
Comment on lines 66 to 69
public static Map<TablePath, JdbcSourceTable> getTables(
JdbcConnectionConfig jdbcConnectionConfig, List<JdbcSourceTableConfig> tablesConfig)
JdbcConnectionConfig jdbcConnectionConfig,
List<JdbcSourceTableConfig> tablesConfig,
JdbcSourceConfig sourceConfig)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JdbcSourceConfig sourceConfig

change to

String dialectName

since we only use this parameter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will update

Comment on lines 38 to 43
Option<String> DIALECT_NAME =
Options.key("dialect.name")
.stringType()
.defaultValue("default")
.withDescription(
"The dialect name, when use same jdbc drive, use this parameter to choose the special implement.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if I use this config will not work right?

Jdbc {
    driver = com.mysql.cj.jdbc.Driver
    url = "jdbc:mysql://mysql:9030"
    user = root
    password = ""
    query = "select * from `test`.`e2e_table_source`"
    partition_column = "STRING_COL"
    dialect.name = "mysql"
  }

It's weird.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, if we support dialect.name = "mysql" then this parameter is required to mysql connector, it will impact the current user.

and i think this feature is the special requirement, use same jdbc driver, and the default dialect can not support some feature, it can use this parameter to choose the corresponding dialect implement.

In this case, only those user need this feature need add this parameter

@liunaijie liunaijie force-pushed the feature/jdbc_split_customize_function branch from d4713ea to d5f3ac7 Compare August 13, 2024 06:59
@github-actions github-actions bot removed the document label Aug 13, 2024
@liunaijie liunaijie force-pushed the feature/jdbc_split_customize_function branch 3 times, most recently from 6b62ed5 to a71f906 Compare August 14, 2024 00:47
@@ -39,6 +41,9 @@ public JdbcDialect create() {

@Override
public JdbcDialect create(@Nonnull String compatibleMode, String fieldIde) {
if (DatabaseIdentifier.STARROCKS.equalsIgnoreCase(compatibleMode)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost forgot! We have compatibleMode to do such thing.

@liunaijie liunaijie force-pushed the feature/jdbc_split_customize_function branch from a71f906 to 25d1b0f Compare August 14, 2024 03:05
Copy link
Member

@Hisoka-X Hisoka-X left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @liunaijie

@hailin0 hailin0 merged commit b5140f5 into apache:dev Aug 14, 2024
7 checks passed
@liunaijie liunaijie deleted the feature/jdbc_split_customize_function branch August 15, 2024 00:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] [JDBC] use mysql source extract starrocks split function has issue
3 participants