[improvement](sql-dialect) support multiple servers of dialect sql converter #36674

Closed

@Nicholas-ehui Nicholas-ehui commented Jun 21, 2024

Add a new FE config, dialect_converter_services.
If this config is set and the session variable enable_multi_dialect_convert_service is enabled,
Doris polls multiple SQL converter services to convert the user's input SQL to the
specified SQL dialect. However, if the session variable sql_converter_service_url is set at the same time, sql_converter_service_url takes precedence. For example:

Set in fe.conf:
dialect_converter_services = 192.168.1.13:5001;192.168.1.13:5002;192.168.1.13:5003;192.168.1.13:5004
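
As an illustration of the format above, the value can be read as a semicolon-separated list of host:port entries. The following is a minimal sketch of such a parser, not the code from this PR: the class name `ConverterServiceListParser` and its validation rules (trimming, skipping empty entries, requiring a numeric port) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not the implementation in this PR.
final class ConverterServiceListParser {
    private ConverterServiceListParser() {}

    // Splits "host:port;host:port" into trimmed, non-empty entries.
    static List<String> parse(String configValue) {
        List<String> services = new ArrayList<>();
        if (configValue == null) {
            return services;
        }
        for (String entry : configValue.split(";")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate trailing or doubled separators
            }
            int colon = trimmed.lastIndexOf(':');
            if (colon <= 0 || colon == trimmed.length() - 1) {
                throw new IllegalArgumentException("expected host:port but got: " + trimmed);
            }
            // NumberFormatException is an IllegalArgumentException, so a
            // non-numeric port is reported the same way as a missing one.
            Integer.parseInt(trimmed.substring(colon + 1));
            services.add(trimmed);
        }
        return services;
    }
}
```

An empty or unset config value yields an empty list, which a caller could treat as "multi-service conversion not configured".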

mysql> UNSET GLOBAL VARIABLE sql_converter_service_url;
Query OK, 0 rows affected (0.07 sec)

mysql> UNSET VARIABLE enable_multi_dialect_convert_service;
Query OK, 0 rows affected (0.01 sec)

mysql> set sql_dialect=presto;
Query OK, 0 rows affected (0.01 sec)

mysql> SELECT cast(start_time as varchar(20)) as col1,
    ->             array_distinct(arr_int) as col2,
    ->             FILTER(arr_str, x -> x LIKE '%World%') as col3,
    ->             to_date(value,'%Y-%m-%d') as col4,
    ->             YEAR(start_time) as col5,
    ->             date_add('month', 1, start_time) as col6,
    ->             REGEXP_EXTRACT_ALL(value, '-.') as col7,
    ->             JSON_EXTRACT('{"id": "33"}', '$.id')as col8,
    ->             element_at(arr_int, 1) as col9,
    ->             date_trunc('day',start_time) as col10
    ->          FROM test_sqlconvert
    ->          where date_trunc('day',start_time)= DATE'2024-05-20'
    ->      order by id;
ERROR 1105 (HY000): errCode = 2, detailMessage = Syntax error in line 3:
            FILTER(arr_str, x -> x LIKE '%World%') as col3,
                              ^
Encountered: ->
Expected: ||, COMMA, ., IDENTIFIER

mysql>
mysql>
mysql> set global sql_converter_service_url = "http://192.168.1.13:5001/api/v1/convert";
Query OK, 0 rows affected (0.01 sec)

mysql> SELECT cast(start_time as varchar(20)) as col1,
    ->             array_distinct(arr_int) as col2,
    ->             FILTER(arr_str, x -> x LIKE '%World%') as col3,
    ->             to_date(value,'%Y-%m-%d') as col4,
    ->             YEAR(start_time) as col5,
    ->             date_add('month', 1, start_time) as col6,
    ->             REGEXP_EXTRACT_ALL(value, '-.') as col7,
    ->             JSON_EXTRACT('{"id": "33"}', '$.id')as col8,
    ->             element_at(arr_int, 1) as col9,
    ->             date_trunc('day',start_time) as col10
    ->          FROM test_sqlconvert
    ->          where date_trunc('day',start_time)= DATE'2024-05-20'
    ->      order by id;
+---------------------+-----------+-----------+------------+------+---------------------+-------------+------+------+---------------------+
| col1                | col2      | col3      | col4       | col5 | col6                | col7        | col8 | col9 | col10               |
+---------------------+-----------+-----------+------------+------+---------------------+-------------+------+------+---------------------+
| 2024-05-20 13:14:52 | [1, 2, 3] | ["World"] | 2024-01-14 | 2024 | 2024-06-20 13:14:52 | ['-0','-1'] | "33" |    1 | 2024-05-20 00:00:00 |
+---------------------+-----------+-----------+------------+------+---------------------+-------------+------+------+---------------------+
1 row in set (0.36 sec)

mysql>
mysql> UNSET GLOBAL VARIABLE sql_converter_service_url;
Query OK, 0 rows affected (0.03 sec)

mysql> SELECT cast(start_time as varchar(20)) as col1,
    ->             array_distinct(arr_int) as col2,
    ->             FILTER(arr_str, x -> x LIKE '%World%') as col3,
    ->             to_date(value,'%Y-%m-%d') as col4,
    ->             YEAR(start_time) as col5,
    ->             date_add('month', 1, start_time) as col6,
    ->             REGEXP_EXTRACT_ALL(value, '-.') as col7,
    ->             JSON_EXTRACT('{"id": "33"}', '$.id')as col8,
    ->             element_at(arr_int, 1) as col9,
    ->             date_trunc('day',start_time) as col10
    ->          FROM test_sqlconvert
    ->          where date_trunc('day',start_time)= DATE'2024-05-20'
    ->      order by id;
ERROR 1105 (HY000): errCode = 2, detailMessage = Syntax error in line 3:
            FILTER(arr_str, x -> x LIKE '%World%') as col3,
                              ^
Encountered: ->
Expected: ||, COMMA, ., IDENTIFIER

mysql>
mysql>
mysql> set enable_multi_dialect_convert_service = true;
Query OK, 0 rows affected (0.01 sec)

mysql> SELECT cast(start_time as varchar(20)) as col1,
    ->             array_distinct(arr_int) as col2,
    ->             FILTER(arr_str, x -> x LIKE '%World%') as col3,
    ->             to_date(value,'%Y-%m-%d') as col4,
    ->             YEAR(start_time) as col5,
    ->             date_add('month', 1, start_time) as col6,
    ->             REGEXP_EXTRACT_ALL(value, '-.') as col7,
    ->             JSON_EXTRACT('{"id": "33"}', '$.id')as col8,
    ->             element_at(arr_int, 1) as col9,
    ->             date_trunc('day',start_time) as col10
    ->          FROM test_sqlconvert
    ->          where date_trunc('day',start_time)= DATE'2024-05-20'
    ->      order by id;
+---------------------+-----------+-----------+------------+------+---------------------+-------------+------+------+---------------------+
| col1                | col2      | col3      | col4       | col5 | col6                | col7        | col8 | col9 | col10               |
+---------------------+-----------+-----------+------------+------+---------------------+-------------+------+------+---------------------+
| 2024-05-20 13:14:52 | [1, 2, 3] | ["World"] | 2024-01-14 | 2024 | 2024-06-20 13:14:52 | ['-0','-1'] | "33" |    1 | 2024-05-20 00:00:00 |
+---------------------+-----------+-----------+------------+------+---------------------+-------------+------+------+---------------------+
1 row in set (0.40 sec)
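
The precedence shown in the session above can be summarized in a few lines. This is a sketch of the decision only, assuming the behavior described in this PR; `ConverterTargetResolver` and its `Mode` enum are hypothetical names, not identifiers from the Doris code base.

```java
import java.util.List;

// Hypothetical sketch of the precedence described in this PR.
final class ConverterTargetResolver {
    enum Mode { NONE, SINGLE_URL, MULTI_SERVICE }

    private ConverterTargetResolver() {}

    static Mode resolve(String sessionUrl, boolean multiEnabled, List<String> configuredServices) {
        // sql_converter_service_url wins whenever it is set.
        if (sessionUrl != null && !sessionUrl.trim().isEmpty()) {
            return Mode.SINGLE_URL;
        }
        // Otherwise the services from fe.conf are used, but only if the
        // session variable enable_multi_dialect_convert_service is on.
        if (multiEnabled && configuredServices != null && !configuredServices.isEmpty()) {
            return Mode.MULTI_SERVICE;
        }
        // No converter available: the SQL goes to the Doris parser unchanged,
        // which is why the Presto query fails with a syntax error.
        return Mode.NONE;
    }
}
```

The four queries in the session correspond to `NONE`, `SINGLE_URL`, `NONE`, and `MULTI_SERVICE`, in that order.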

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@Nicholas-ehui
Author

run buildall

@morningman morningman self-assigned this Jun 22, 2024
@morningman
Contributor

2 questions:

  1. Why do we need multiple converter services? What is the use case?
  2. Why not just use the sql_converter_service_url variable and make it support multiple URLs? That way we can remove the FE config item and also remove the enable_multi_dialect_convert_service variable. If sql_converter_service_url has only one URL, use that one. If it has multiple URLs, poll each of them.
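
The alternative in question 2 could look roughly like the sketch below: one variable value holding one or more URLs, with round-robin selection when there are several. The comma delimiter and the class name `ConverterUrlRotation` are assumptions made for the example; the thread does not settle on a format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a single variable value that may hold several URLs.
final class ConverterUrlRotation {
    private final List<String> urls = new ArrayList<>();
    private final AtomicInteger next = new AtomicInteger();

    // Assumption: URLs are separated by commas.
    ConverterUrlRotation(String variableValue) {
        for (String part : variableValue.split(",")) {
            String url = part.trim();
            if (!url.isEmpty()) {
                urls.add(url);
            }
        }
        if (urls.isEmpty()) {
            throw new IllegalArgumentException("no converter URL configured");
        }
    }

    // With one URL this always returns it; with several it cycles through them.
    String nextUrl() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), urls.size());
        return urls.get(index);
    }

    int size() {
        return urls.size();
    }
}
```

Using `Math.floorMod` instead of `%` avoids a negative index once the counter wraps around `Integer.MAX_VALUE`.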

@Nicholas-ehui Nicholas-ehui deleted the multi_sqlconvert branch June 22, 2024 05:06
@Nicholas-ehui Nicholas-ehui restored the multi_sqlconvert branch June 22, 2024 05:07
@Nicholas-ehui
Author

Nicholas-ehui commented Jun 22, 2024

image
1: If multiple FE nodes have each started the Doris SQL Converter service, how can these converter services be used? Currently, each FE can use at most one converter, the one set by sql_converter_service_url.

image
2: I thought about configuring multiple conversion servers using only sql_converter_service_url, but I didn't know how to describe them in one URL, so I used the fe.conf item dialect_converter_services plus the session variable enable_multi_dialect_convert_service.

@Nicholas-ehui Nicholas-ehui reopened this Jun 22, 2024
@morningman
Contributor

Oh, I see.
The recommendation in the documentation is not very clear.
What it actually means is that users can deploy the SQL converter service (the same service) on each FE node
and then set the URL using the loopback IP, 127.0.0.1,
so that each FE node can access its own converter service locally without network traffic.

Alternatively, users can deploy just one converter service on an arbitrary node and set the URL using that
node's IP, as long as this IP is reachable from all FEs.

@Nicholas-ehui
Author

Yes, what you describe above means each FE can use at most one dialect converter server. With this PR, each FE can use multiple dialect converter servers by polling them. Could you check whether this PR is needed? If not, feel free to close it.

@morningman
Contributor

I don't think we need to start multiple converter servers for a single FE?

    }

    public PluginInfo getPluginInfo() {
        return pluginInfo;
    }

    private static void analyzeConverterSevicesInfo() {
        String dialectConverterServices = Config.dialect_converter_services;
Collaborator

double spaces after String

                return resultStr;
            }
        }
        return originSql;
Collaborator

Should we log here if the conversion fails?
The log should contain how many times tried and which server has been tried.
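
One way to satisfy this request is to remember which servers were tried and log them once before falling back. The sketch below is not the PR's code: `LoggingDialectConverter` is a hypothetical name, the actual HTTP call is abstracted as a function that returns null on failure, and `java.util.logging` stands in for whatever logger the FE uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.logging.Logger;

// Hypothetical sketch; the real conversion call is passed in as a function.
final class LoggingDialectConverter {
    private static final Logger LOG = Logger.getLogger(LoggingDialectConverter.class.getName());

    private LoggingDialectConverter() {}

    // convertCall: (server, sql) -> converted SQL, or null if that server failed.
    static String convert(String originSql, List<String> servers,
                          BiFunction<String, String, String> convertCall) {
        List<String> tried = new ArrayList<>();
        for (String server : servers) {
            tried.add(server);
            String result;
            try {
                result = convertCall.apply(server, originSql);
            } catch (RuntimeException e) {
                result = null; // treat an exception like any other failed attempt
            }
            if (result != null) {
                return result;
            }
        }
        // Every server failed: report how many attempts were made and where.
        LOG.warning("dialect conversion failed after " + tried.size()
                + " attempt(s), servers tried: " + tried + "; using the original SQL");
        return originSql;
    }
}
```

Logging once at the end keeps the message count at one per failed statement, at the cost of not recording why each individual server failed.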


        int retryCount = 0;
        while (retryCount < dialectConverterSerInfo.size()) {
            String chooseServer = dialectConverterSerInfo.get(chooseServerIdx++ % dialectConverterSerInfo.size());
Collaborator

We may have to skip the servers that have been marked unavailable, which can be deduced by line 152, for better availability.
e.g. remember some unavailable servers, and reduce the possibility accessing that servers with a back-off algorithm.
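
The back-off idea could be sketched as a small tracker that the polling loop consults before choosing a server. This is only an illustration under assumptions: the class name `ServerBackoff`, the 1 second base delay, and the 60 second cap are invented for the example, and the caller supplies the current time so the logic stays easy to test.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of exponential back-off for unavailable converter servers.
final class ServerBackoff {
    private static final long BASE_DELAY_MS = 1_000L;  // assumed base delay
    private static final long MAX_DELAY_MS = 60_000L;  // assumed upper bound

    private final Map<String, Integer> failures = new HashMap<>();
    private final Map<String, Long> retryAt = new HashMap<>();

    // A server is available if it never failed or its back-off window has passed.
    synchronized boolean isAvailable(String server, long nowMs) {
        Long until = retryAt.get(server);
        return until == null || nowMs >= until;
    }

    // Each consecutive failure doubles the waiting time, up to MAX_DELAY_MS.
    synchronized void recordFailure(String server, long nowMs) {
        int count = failures.merge(server, 1, Integer::sum);
        long delay = Math.min(MAX_DELAY_MS, BASE_DELAY_MS << Math.min(count - 1, 16));
        retryAt.put(server, nowMs + delay);
    }

    // A successful call clears the failure history for that server.
    synchronized void recordSuccess(String server) {
        failures.remove(server);
        retryAt.remove(server);
    }
}
```

A polling loop would skip any server for which `isAvailable` returns false, and fall back to trying all servers if none are currently available.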

@@ -2612,6 +2612,10 @@ public class Config extends ConfigBase {
    public static boolean enable_proxy_protocol = false;
    public static int profile_async_collect_expire_time_secs = 5;

    @ConfField(description = {"方言转换服务器地址",
            "Dialect translation services info"})
    public static String dialect_converter_services = "";
Collaborator

Add an example here, what the format is
e.g. a.com:8080;b.com:8090

Collaborator

@gavinchou gavinchou left a comment

Fix some obvious errors and consider further improvement.

@morningman morningman closed this Jul 18, 2024