[CALCITE-5744] Add MAP_FROM_ARRAYS, STR_TO_MAP function (enabled in Spark library) #3238

Merged: 1 commit, Jun 26, 2023

Contributor

@liuyongvs liuyongvs commented Jun 2, 2023

 STR_TO_MAP(string[, stringDelimiter[, keyValueDelimiter]])

Returns a map after splitting the string into key/value pairs using delimiters. The default delimiters are ',' for stringDelimiter and ':' for keyValueDelimiter. Both stringDelimiter and keyValueDelimiter are treated as regular expressions.

Examples:

> SELECT str_to_map('a:1,b:2,c:3', ',', ':');

 {"a":"1","b":"2","c":"3"}

> SELECT str_to_map('a');
 {"a":null} 
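A minimal sketch of the splitting described above, in plain Java. This is not the code from the PR: the class name `StrToMapSketch` is hypothetical, and splitting each pair with limit 2 is an assumption of this sketch. The result prints in Java's `Map.toString` form rather than the Spark form shown in the examples.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StrToMapSketch {
  /** Splits the string into pairs, then splits each pair into key and value. */
  static Map<String, String> strToMap(String string, String stringDelimiter,
      String keyValueDelimiter) {
    final Map<String, String> map = new LinkedHashMap<>();
    // Both delimiters are regular expressions; limit -1 keeps trailing empty pairs.
    for (String keyValue : string.split(stringDelimiter, -1)) {
      // Assumption: split on the first key/value delimiter only (limit 2).
      final String[] kv = keyValue.split(keyValueDelimiter, 2);
      // A pair without a key/value delimiter maps its key to null.
      map.put(kv[0], kv.length > 1 ? kv[1] : null);
    }
    return map;
  }

  public static void main(String[] args) {
    System.out.println(strToMap("a:1,b:2,c:3", ",", ":")); // {a=1, b=2, c=3}
    System.out.println(strToMap("a", ",", ":"));           // {a=null}
  }
}
```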

MAP_FROM_ARRAYS (array1, array2)

Returns a map created from array1 and array2. Note that the two arrays must have the same length.

SELECT map_from_arrays(array(1.0, 3.0), array('2', '4'));
{1.0:"2",3.0:"4"}
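A rough sketch of the pairing and the length check, assuming plain Java lists stand in for SQL arrays. The class name `MapFromArraysSketch` is hypothetical and the code is illustrative rather than the PR's implementation. The error text is taken from the negative test in this PR.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MapFromArraysSketch {
  /** Pairs keys.get(i) with values.get(i); rejects arrays of different lengths. */
  static <K, V> Map<K, V> mapFromArrays(List<K> keys, List<V> values) {
    if (keys.size() != values.size()) {
      throw new IllegalArgumentException(
          "The length of the keys array " + keys.size()
              + " is not equal to the length of the values array " + values.size());
    }
    final Map<K, V> map = new LinkedHashMap<>();
    for (int i = 0; i < keys.size(); i++) {
      map.put(keys.get(i), values.get(i));
    }
    return map;
  }

  public static void main(String[] args) {
    // Prints {1.0=2, 3.0=4}
    System.out.println(mapFromArrays(Arrays.asList(1.0, 3.0), Arrays.asList("2", "4")));
  }
}
```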

requireNonNull(keysArrayType.getComponentType(), "inferred key type"),
requireNonNull(valuesArrayType.getComponentType(), "inferred value type"),
nullable);
}
Contributor Author

Comment: the nullability should be computed as
final boolean nullable = keysArrayType.isNullable() || valuesArrayType.isNullable();
so I added a unit test:
f.checkType("str_to_map(cast(null as varchar))",
"(VARCHAR, VARCHAR) MAP");

/** Support the STR_TO_MAP function. */
public static Map strToMap(String string, String stringDelimiter, String keyValueDelimiter) {
final Map map = new LinkedHashMap();
final String[] keyValues = string.split(stringDelimiter, -1);
Contributor Author

The limit should be -1, which aligns with Spark.
For the difference a limit of -1 makes, see:
https://stackabuse.com/how-to-split-a-string-in-java/
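To make the effect of the limit concrete, here is a small illustration using only `java.lang.String.split`. The class name `SplitLimitSketch` and the helper `count` are hypothetical. With the default limit (0), trailing empty strings are dropped; with -1 they are kept.

```java
import java.util.Arrays;

public class SplitLimitSketch {
  /** Number of pieces produced when splitting on "," with the given limit. */
  static int count(String s, int limit) {
    return s.split(",", limit).length;
  }

  public static void main(String[] args) {
    // Default limit (same as 0): the trailing empty string is discarded.
    System.out.println(Arrays.toString("a:1,b:2,".split(",")));     // [a:1, b:2]
    // Limit -1: the trailing empty string is preserved.
    System.out.println(Arrays.toString("a:1,b:2,".split(",", -1))); // [a:1, b:2, ]
  }
}
```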

/** Support the MAP_FROM_ARRAYS function. */
public static Map mapFromArrays(List keysArray, List valuesArray) {
if (keysArray.size() != valuesArray.size()) {
throw new IllegalArgumentException(
Contributor Author

The length check is needed, and I added a negative test for it:

    f.checkFails("map_from_arrays(array[1, 2], array['foo'])",
        "Invalid function MAP_FROM_ARRAYS call:\n"
            + "The length of the keys array 2 is not equal to the length of the values array 1",
        true);

@liuyongvs
Contributor Author

Hi @tanclary @JiajunBernoulli @MasseGuillaume, do you have time to help review?

Contributor

@tanclary tanclary left a comment

Left some comments, hope this helps!


@Override Expression implementSafe(RexToLixTranslator translator,
RexCall call, List<Expression> argValueList) {
String defaultStringDelimiter = ",";
Contributor

Is there a way to use the "SPLIT" implementation offered by BigQuery here? Some of this looks very similar.

Contributor Author

It is similar, but not the same.

/** Support the MAP_FROM_ARRAYS function. */
public static Map mapFromArrays(List keysArray, List valuesArray) {
if (keysArray.size() != valuesArray.size()) {
throw new IllegalArgumentException(
Contributor

Can this exception be added as a constant to Resources, like the other errors thrown from this class?

Contributor Author

Done, it now uses Resources.

@liuyongvs
Contributor Author

liuyongvs commented Jun 6, 2023

Hi @tanclary, thanks for your review and your valuable suggestions. +1

Do you have time to look again?

@liuyongvs liuyongvs requested a review from tanclary June 6, 2023 04:07
@liuyongvs liuyongvs force-pushed the str_to_map2 branch 2 times, most recently from 61de8ef to 067ac20 Compare June 6, 2023 05:12
@liuyongvs liuyongvs changed the title [CALCITE-5744] Add STR_TO_MAP function (enabled in Spark library) [CALCITE-5744] Add MAP_FROM_ARRAYS, STR_TO_MAP function (enabled in Spark library) Jun 6, 2023
@@ -965,6 +965,31 @@ private static RelDataType arrayReturnType(SqlOperatorBinding opBinding) {
ReturnTypes.TO_MAP_VALUES_NULLABLE,
OperandTypes.MAP);

private static RelDataType deriveTypeMapFromArrays(SqlOperatorBinding opBinding) {
final RelDataType keysArrayType = opBinding.collectOperandTypes().get(0);
Contributor

Can you use opBinding.getOperandType here? If so, that's a little cleaner, I think.

Contributor Author

Fixed, @tanclary.

@liuyongvs liuyongvs requested a review from tanclary June 6, 2023 15:22
@liuyongvs
Contributor Author

@tanclary @MasseGuillaume, will you help take another look?

Comment on lines +5557 to +5703
f.checkScalar("map_from_arrays(array[1, 1, null], array['foo', 'bar', 'name'])",
"{1=bar, null=name}", "(INTEGER, CHAR(4) NOT NULL) MAP NOT NULL");
Contributor

spark.sql("""select map_from_arrays(array(1, 1, null), array('foo', 'bar', 'name'))""").show
java.lang.RuntimeException: Duplicate map key 1 was found, please check the input data. If you want to remove the duplicated keys, you can set spark.sql.mapKeyDedupPolicy to LAST_WIN so that the key inserted at last takes precedence.

Contributor Author

This is because Spark's map implementation is different from Calcite's.
Spark supports two policies, while Calcite is always LAST_WIN. If we matched Spark 100%, it would break the behavior of all map functions.

  object MapKeyDedupPolicy extends Enumeration {
    val EXCEPTION, LAST_WIN = Value
  }
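As a hedged illustration of the two policies, the sketch below builds a map from two lists under either policy. The class `MapKeyDedupSketch`, its `Policy` enum, and the `build` method are hypothetical names for this example, not Spark or Calcite APIs. Under LAST_WIN, a plain `LinkedHashMap.put` already gives the result expected by the test above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MapKeyDedupSketch {
  /** Mirrors the two values of the Spark enumeration quoted above. */
  enum Policy { EXCEPTION, LAST_WIN }

  static <K, V> Map<K, V> build(List<K> keys, List<V> values, Policy policy) {
    final Map<K, V> map = new LinkedHashMap<>();
    for (int i = 0; i < keys.size(); i++) {
      final K key = keys.get(i);
      if (policy == Policy.EXCEPTION && map.containsKey(key)) {
        throw new IllegalStateException("Duplicate map key " + key + " was found");
      }
      // LAST_WIN: put overwrites the earlier value and keeps the key's first position.
      map.put(key, values.get(i));
    }
    return map;
  }

  public static void main(String[] args) {
    // Prints {1=bar, null=name}
    System.out.println(build(Arrays.asList(1, 1, null),
        Arrays.asList("foo", "bar", "name"), Policy.LAST_WIN));
  }
}
```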

@liuyongvs
Contributor Author

Because both of you, @tanclary and @MasseGuillaume, asked for a test with an array of mixed element types, which throws an exception,
I added:

   f.checkFails("map_from_arrays(^array[1, '1', true]^, array['a', 'b', 'c'])",
        "Parameters must be of the same type",
        false);
    f.checkFails("map_from_arrays(array['a', 'b', 'c'], ^array[1, '1', true]^)",
        "Parameters must be of the same type",
        false);

In my view, this should be tested in the array_value_constructor function rather than here, so I also added a test in array_value_constructor:

    f.checkFails("^array[1, '1', true]^", "Parameters must be of the same type", false);

@liuyongvs
Contributor Author

Hi @tanclary @MasseGuillaume, I have addressed all your review comments. Will you have a look again?

site/_docs/reference.md (outdated, resolved)
@liuyongvs
Contributor Author

Hi @tanclary, the PR has been approved and I have squashed the commits. Will you help review it again or merge it?

Contributor

@tanclary tanclary left a comment

LGTM aside from one nit

@liuyongvs
Contributor Author

Hi @tanclary, thanks for your review. I fixed the nit.

Contributor

@NobiGo NobiGo left a comment

LGTM. Only need to add some tests.

@sonarcloud

sonarcloud bot commented Jun 25, 2023

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 12 (rating A)

Coverage: 95.6%
Duplication: 0.0%

Contributor

@NobiGo NobiGo left a comment

LGTM

@NobiGo NobiGo merged commit 95089c8 into apache:main Jun 26, 2023
18 checks passed