[CALCITE-5744] Add MAP_FROM_ARRAYS, STR_TO_MAP function (enabled in Spark library) #3238
Conversation
Force-pushed from 3a1f8b8 to 0b37684
        requireNonNull(keysArrayType.getComponentType(), "inferred key type"),
        requireNonNull(valuesArrayType.getComponentType(), "inferred value type"),
        nullable);
  }
Comments:

final boolean nullable = keysArrayType.isNullable() || valuesArrayType.isNullable();

The result type should be nullable when either input array is nullable, so I added a unit test:

f.checkType("str_to_map(cast(null as varchar))",
    "(VARCHAR, VARCHAR) MAP");
/** Support the STR_TO_MAP function. */
public static Map strToMap(String string, String stringDelimiter, String keyValueDelimiter) {
  final Map map = new LinkedHashMap();
  final String[] keyValues = string.split(stringDelimiter, -1);
The limit should be -1, which aligns with Spark. For what difference limit -1 makes, see:
https://stackabuse.com/how-to-split-a-string-in-java/
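The difference is easy to see with a trailing delimiter (the input string here is a made-up example, not from the Calcite tests):

```java
// Demonstrates String.split limits: the default limit (0) drops trailing
// empty strings, while limit -1 keeps them, which matches Spark's behavior.
public class SplitLimitDemo {
  public static void main(String[] args) {
    String s = "a:1,b:2,";
    System.out.println(s.split(",").length);      // 2 -- trailing empty dropped
    System.out.println(s.split(",", -1).length);  // 3 -- trailing empty kept
  }
}
```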
/** Support the MAP_FROM_ARRAYS function. */
public static Map mapFromArrays(List keysArray, List valuesArray) {
  if (keysArray.size() != valuesArray.size()) {
    throw new IllegalArgumentException(
The length check is needed, and I added a negative test for it:

f.checkFails("map_from_arrays(array[1, 2], array['foo'])",
    "Invalid function MAP_FROM_ARRAYS call:\n"
    + "The length of the keys array 2 is not equal to the length of the values array 1",
    true);
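The check can be sketched end to end. This is a simplified, self-contained version of the MAP_FROM_ARRAYS runtime helper; the actual Calcite implementation reports the error through Resources (as discussed elsewhere in this review), and the class name here is illustrative.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch of the MAP_FROM_ARRAYS runtime helper.
public class MapFromArraysSketch {
  public static Map<Object, Object> mapFromArrays(List<?> keysArray, List<?> valuesArray) {
    // Length check: mismatched arrays cannot form key/value pairs.
    if (keysArray.size() != valuesArray.size()) {
      throw new IllegalArgumentException(
          "Invalid function MAP_FROM_ARRAYS call:\n"
          + "The length of the keys array " + keysArray.size()
          + " is not equal to the length of the values array " + valuesArray.size());
    }
    final Map<Object, Object> map = new LinkedHashMap<>();
    for (int i = 0; i < keysArray.size(); i++) {
      map.put(keysArray.get(i), valuesArray.get(i));
    }
    return map;
  }

  public static void main(String[] args) {
    System.out.println(mapFromArrays(Arrays.asList(1, 2), Arrays.asList("foo", "bar")));
    // {1=foo, 2=bar}
  }
}
```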
Hi @tanclary @JiajunBernoulli @MasseGuillaume, do you have time to help review?
Left some comments, hope this helps!
Resolved review thread on core/src/main/java/org/apache/calcite/adapter/enumerable/RexImpTable.java (outdated)
@Override Expression implementSafe(RexToLixTranslator translator,
    RexCall call, List<Expression> argValueList) {
  String defaultStringDelimiter = ",";
Is there a way to use the "SPLIT" implementation offered by BigQuery here? Some of this looks very similar.
It is similar, but not the same.
Resolved review thread on core/src/main/java/org/apache/calcite/adapter/enumerable/RexImpTable.java (outdated)
/** Support the MAP_FROM_ARRAYS function. */
public static Map mapFromArrays(List keysArray, List valuesArray) {
  if (keysArray.size() != valuesArray.size()) {
    throw new IllegalArgumentException(
can this exception be added as a constant to Resources like the other errors thrown from this class?
Updated to use Resources.
Resolved review thread on core/src/main/java/org/apache/calcite/sql/fun/SqlLibraryOperators.java (outdated)
Hi @tanclary, thanks for your review and your several valuable suggestions (+1). Do you have time to look again?
Force-pushed from 61de8ef to 067ac20
@@ -965,6 +965,31 @@ private static RelDataType arrayReturnType(SqlOperatorBinding opBinding) {
        ReturnTypes.TO_MAP_VALUES_NULLABLE,
        OperandTypes.MAP);

private static RelDataType deriveTypeMapFromArrays(SqlOperatorBinding opBinding) {
  final RelDataType keysArrayType = opBinding.collectOperandTypes().get(0);
Can you use opBinding.getOperandType? If so, I think that's a little cleaner.
Fixed, @tanclary.
Resolved review thread on core/src/main/java/org/apache/calcite/adapter/enumerable/RexImpTable.java
@tanclary @MasseGuillaume, would you take another look?
f.checkScalar("map_from_arrays(array[1, 1, null], array['foo', 'bar', 'name'])",
    "{1=bar, null=name}", "(INTEGER, CHAR(4) NOT NULL) MAP NOT NULL");
spark.sql("""select map_from_arrays(array(1, 1, null), array('foo', 'bar', 'name'))""").show
java.lang.RuntimeException: Duplicate map key 1 was found, please check the input data. If you want to remove the duplicated keys, you can set spark.sql.mapKeyDedupPolicy to LAST_WIN so that the key inserted at last takes precedence.
This is because Spark's map implementation differs from Calcite's. Spark supports two dedup policies, while Calcite effectively uses LAST_WIN; matching Spark 100% would change the behavior of all map functions:

object MapKeyDedupPolicy extends Enumeration {
  val EXCEPTION, LAST_WIN = Value
}

Since both of you (@tanclary @MasseGuillaume) suggested trying an array whose elements have different types, which would throw an exception: in my view that should be tested in the array_value_constructor function rather than here, and I have added a test in array_value_constructor as well.
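The LAST_WIN behavior falls out of a LinkedHashMap-based runtime: put() on a duplicate key overwrites the earlier value while keeping the key's first insertion position. A minimal sketch, mirroring the inputs of the checkScalar test above:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Demonstrates LAST_WIN dedup with a LinkedHashMap, mirroring the
// map_from_arrays(array[1, 1, null], array['foo', 'bar', 'name']) test.
public class LastWinDemo {
  public static void main(String[] args) {
    Map<Integer, String> map = new LinkedHashMap<>();
    map.put(1, "foo");
    map.put(1, "bar");     // duplicate key 1: the last value wins
    map.put(null, "name"); // null keys are permitted by LinkedHashMap
    System.out.println(map);  // {1=bar, null=name}
  }
}
```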
Hi @tanclary @MasseGuillaume, I have addressed all your review comments; would you have a look again?
Hi @tanclary, the PR has been approved and I have squashed the commits. Could you review it again or merge it?
LGTM aside from one nit
Hi @tanclary, thanks for your review; the nit is fixed.
LGTM. Only need to add some tests.
Kudos, SonarCloud Quality Gate passed!
LGTM
STR_TO_MAP(string[, stringDelimiter[, keyValueDelimiter]])
Returns a map after splitting string into key/value pairs using the delimiters. The default stringDelimiter is ',' and the default keyValueDelimiter is ':'. Both delimiters are treated as regular expressions.

MAP_FROM_ARRAYS(array1, array2)
Returns a map created from array1 and array2. Note that the lengths of the two arrays must be the same.
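The documented STR_TO_MAP behavior can be sketched in a self-contained way, assuming the Spark-aligned defaults (',' and ':') and the split limit -1 discussed earlier in the thread. The class name and the handling of tokens lacking a keyValueDelimiter are illustrative, not the exact Calcite code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of STR_TO_MAP: split into pairs, then split each pair once.
// Both delimiters go through String.split, so they act as regexes.
public class StrToMapSketch {
  public static Map<String, String> strToMap(String string, String stringDelimiter,
      String keyValueDelimiter) {
    final Map<String, String> map = new LinkedHashMap<>();
    // Limit -1 keeps trailing empty strings, aligning with Spark.
    for (String kv : string.split(stringDelimiter, -1)) {
      final String[] pair = kv.split(keyValueDelimiter, 2);
      // A token with no keyValueDelimiter maps the whole token to null.
      map.put(pair[0], pair.length == 2 ? pair[1] : null);
    }
    return map;
  }

  public static void main(String[] args) {
    System.out.println(strToMap("a:1,b:2", ",", ":"));  // {a=1, b=2}
  }
}
```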