[BEAM-2990] support MAP in SQL schema #5079
|
R: + @akedin Can you guys take a look and let's try to finish it before 2.5 cutoff? Thanks! |
|
retest this please |
akedin
left a comment
Looks good to me overall, few questions:
| // For MAP type, returns the type of the key element. | ||
| @Nullable public abstract FieldType getComponentKeyType(); | ||
| // For MAP type, returns the type of the value element. | ||
| @Nullable public abstract FieldType getComponentValueType(); |
to me it looks like both getComponentValueType() and getComponentType() mean the same thing, i.e. they both describe the type of the value in the container. Keep just one of them?
I would prefer to have different fields for MAP and ARRAY, if it doesn't cause significant performance issue.
| @@ -222,6 +224,9 @@ public boolean isDateType() { | |||
| public boolean isContainerType() { | |||
Is map a container type as well? A composite type?
Well, my thought is MAP -> MAP, ARRAY -> CONTAINER, ROW -> COMPOSITE, to make it clear for the backend types.
|
the failure seems unrelated, any tips to handle it? |
|
retest this please |
|
run java precommit |
| put(1, "value1"); | ||
| put(2, "value2"); | ||
| put(3, "value3"); | ||
| put(4, "value4"); |
I would also add a test where the map value is itself a complex or nested type.
will update to support primitive/array/map/row as value type in Map
| Map<Object, Object> valueMap = (Map<Object, Object>) value; | ||
| Map<Object, Object> verifiedMap = Maps.newHashMapWithExpectedSize(valueMap.size()); | ||
| for (Entry<Object, Object> kv : valueMap.entrySet()) { | ||
| verifiedMap.put(verifyPrimitiveType(kv.getKey(), componentKeyType.getTypeName(), fieldName), |
This is actually incorrect right now, since as coded the key type might not be a primitive type. However as I mentioned in Schema.java, I think it would be better to change the key type to be a TypeName, in which case this logic will be correct.
| return this; | ||
| } | ||
|
|
||
| public <T1, T2> Builder addMap(Map<T1, T2> data) { |
Is this overload needed? addArray was needed because otherwise passing an array into addValues tended to unroll the array, but I don't think Java will do that for maps.
it's not necessary, will remove.
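The array-versus-map point above can be illustrated with plain Java varargs. This is a minimal sketch: `countValues` is a hypothetical stand-in for a varargs method such as `addValues`, not Beam's actual builder API.

```java
import java.util.HashMap;
import java.util.Map;

final class VarargsUnrollDemo {
  // Hypothetical stand-in for a varargs builder method such as addValues(Object...).
  static int countValues(Object... values) {
    return values.length;
  }

  public static void main(String[] args) {
    Object[] array = {"a", "b", "c"};
    Map<Integer, String> map = new HashMap<>();
    map.put(1, "value1");
    map.put(2, "value2");

    // An Object[] is passed as the varargs array itself, so its elements are unrolled.
    System.out.println(countValues(array));
    // Casting to Object forces the array to be treated as a single value.
    System.out.println(countValues((Object) array));
    // A Map is not an array, so it always arrives as a single value.
    System.out.println(countValues(map));
  }
}
```

Because a map is never unrolled this way, a dedicated `addMap` overload adds nothing that the generic varargs method does not already provide.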
| public static final Set<TypeName> STRING_TYPES = ImmutableSet.of(STRING); | ||
| public static final Set<TypeName> DATE_TYPES = ImmutableSet.of(DATETIME); | ||
| public static final Set<TypeName> CONTAINER_TYPES = ImmutableSet.of(ARRAY); | ||
| public static final Set<TypeName> MAP_TYPES = ImmutableSet.of(MAP); |
should we consider this part of CONTAINER_TYPES?
would separate here, CONTAINER should be ARRAY/SET. List<KV<>> could be a CONTAINER_TYPE, Map<> is not.
| // For MAP type, returns the type of the key element. | ||
| @Nullable public abstract FieldType getComponentKeyType(); | ||
| // For MAP type, returns the type of the value element. | ||
| @Nullable public abstract FieldType getComponentValueType(); |
XuMingmin wrote:
I would prefer to have different fields for MAP and ARRAY, if it doesn't cause significant performance issue.
Two questions/comments:
- I think the key type should be a TypeName instead of a FieldType. Making it a FieldType makes it legal for the key to be a complex value (e.g. the key could be an array type), which doesn't sound very meaningful to me.
- Have you considered introducing a new key-value type here (pair of TypeName, FieldType)?
+1, will change to key as primitive, and value can be primitive/array/map/row
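A minimal sketch of the agreed shape, assuming a primitive-only key and an arbitrary value type. The `TypeName` enum and `FieldType` class here are simplified, hypothetical stand-ins, not the classes in Beam's `Schema.java`.

```java
import java.util.EnumSet;
import java.util.Set;

final class MapTypeSketch {
  // Simplified, hypothetical type names; not Beam's full TypeName enum.
  enum TypeName { INT32, STRING, BOOLEAN, DATETIME, ARRAY, MAP, ROW }

  // Types that are not allowed as map keys in this sketch.
  static final Set<TypeName> NON_PRIMITIVE_TYPES =
      EnumSet.of(TypeName.ARRAY, TypeName.MAP, TypeName.ROW);

  // Hypothetical field type: a map key is a bare TypeName, a map value is a full FieldType.
  static final class FieldType {
    final TypeName typeName;
    final TypeName mapKeyType; // set only for MAP
    final FieldType valueType; // element type for ARRAY, value type for MAP

    private FieldType(TypeName typeName, TypeName mapKeyType, FieldType valueType) {
      this.typeName = typeName;
      this.mapKeyType = mapKeyType;
      this.valueType = valueType;
    }

    static FieldType of(TypeName typeName) {
      return new FieldType(typeName, null, null);
    }

    static FieldType array(FieldType elementType) {
      return new FieldType(TypeName.ARRAY, null, elementType);
    }

    static FieldType map(TypeName keyType, FieldType valueType) {
      if (NON_PRIMITIVE_TYPES.contains(keyType)) {
        throw new IllegalArgumentException("Map key must be a primitive type, got " + keyType);
      }
      return new FieldType(TypeName.MAP, keyType, valueType);
    }
  }

  public static void main(String[] args) {
    // map<STRING, array<INT32>>: primitive key, nested value.
    FieldType nested =
        FieldType.map(TypeName.STRING, FieldType.array(FieldType.of(TypeName.INT32)));
    System.out.println(nested.mapKeyType + " -> " + nested.valueType.typeName);

    try {
      // An ARRAY key is rejected up front.
      FieldType.map(TypeName.ARRAY, FieldType.of(TypeName.STRING));
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Typing the key as a `TypeName` makes a complex key unrepresentable by construction; the explicit check is only needed because this sketch shares one enum for primitive and non-primitive names.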
| @@ -222,6 +224,9 @@ public boolean isDateType() { | |||
| public boolean isContainerType() { | |||
XuMingmin wrote:
Well, my thought is MAP -> MAP, ARRAY -> CONTAINER, ROW -> COMPOSITE, to make it clear for the backend types.
I think it's actually just another type of container, i.e. one way of imagining a map is that it's just a container of key-value pairs.
|
retest this please |
|
any comments on the change? Would like to close this PR asap as my repository is broken after #4964 |
|
run java precommit |
|
retest this please |
5949dbe to 78f845e
reuvenlax
left a comment
At sdks/java/core/src/main/java/org/apache/beam/sdk/values/Row.java:438:
} else if (TypeName.ROW.equals(componentType.getTypeName())) {
missing Map case.
I think we need to factor this switch out into a helper function used in all these cases, so we don't have to keep updating all of them.
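One way the suggested helper could look, as a rough sketch: a single recursive `verify` method that array elements and map values both route through, so a new type needs only one new case. `Kind`, `Type`, `verify`, and `verifyPrimitive` are hypothetical simplified names, not the code in `Row.java`, and the ROW case is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class VerifyHelperSketch {
  enum Kind { INT32, STRING, ARRAY, MAP }

  // Hypothetical, simplified type descriptor (not Beam's FieldType).
  static final class Type {
    final Kind kind;
    final Kind keyKind;   // MAP only
    final Type valueType; // element type for ARRAY, value type for MAP

    Type(Kind kind, Kind keyKind, Type valueType) {
      this.kind = kind;
      this.keyKind = keyKind;
      this.valueType = valueType;
    }

    static Type primitive(Kind kind) { return new Type(kind, null, null); }
    static Type array(Type element) { return new Type(Kind.ARRAY, null, element); }
    static Type map(Kind key, Type value) { return new Type(Kind.MAP, key, value); }
  }

  // Single dispatch point: nested arrays and maps recurse back into this method.
  static Object verify(Object value, Type type, String fieldName) {
    switch (type.kind) {
      case ARRAY: {
        List<Object> verified = new ArrayList<>();
        for (Object element : (List<?>) value) {
          verified.add(verify(element, type.valueType, fieldName));
        }
        return verified;
      }
      case MAP: {
        Map<Object, Object> verified = new LinkedHashMap<>();
        for (Map.Entry<?, ?> kv : ((Map<?, ?>) value).entrySet()) {
          verified.put(
              verifyPrimitive(kv.getKey(), type.keyKind, fieldName),
              verify(kv.getValue(), type.valueType, fieldName));
        }
        return verified;
      }
      default:
        return verifyPrimitive(value, type.kind, fieldName);
    }
  }

  static Object verifyPrimitive(Object value, Kind kind, String fieldName) {
    boolean ok;
    switch (kind) {
      case INT32: ok = value instanceof Integer; break;
      case STRING: ok = value instanceof String; break;
      default: ok = false; // non-primitive kinds are never valid here
    }
    if (!ok) {
      throw new IllegalArgumentException(
          "Field " + fieldName + ": expected " + kind + " but got " + value);
    }
    return value;
  }

  public static void main(String[] args) {
    Map<String, List<Integer>> data = new LinkedHashMap<>();
    data.put("a", List.of(1, 2));
    Type type = Type.map(Kind.STRING, Type.array(Type.primitive(Kind.INT32)));
    System.out.println(verify(data, type, "f_map"));
  }
}
```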
| public static final Set<TypeName> COMPOSITE_TYPES = ImmutableSet.of(ROW); | ||
|
|
||
| public boolean isPrimitiveType() { | ||
| return isNumericType() || isStringType() || isDateType(); |
this is not correct (e.g. it excludes boolean). better off making this exclusive (return !isContainerType() && !isCompositeType()).
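The difference between the two definitions can be sketched as below. The enum and sets are a simplified, hypothetical subset of the ones in the diff, and excluding MAP in the exclusive version is an assumption that follows from MAP being kept in its own set.

```java
import java.util.EnumSet;
import java.util.Set;

final class PrimitiveCheckSketch {
  // Simplified, hypothetical subset of type names.
  enum TypeName { INT32, DOUBLE, STRING, DATETIME, BOOLEAN, ARRAY, MAP, ROW }

  static final Set<TypeName> NUMERIC_TYPES = EnumSet.of(TypeName.INT32, TypeName.DOUBLE);
  static final Set<TypeName> STRING_TYPES = EnumSet.of(TypeName.STRING);
  static final Set<TypeName> DATE_TYPES = EnumSet.of(TypeName.DATETIME);
  static final Set<TypeName> CONTAINER_TYPES = EnumSet.of(TypeName.ARRAY);
  static final Set<TypeName> MAP_TYPES = EnumSet.of(TypeName.MAP);
  static final Set<TypeName> COMPOSITE_TYPES = EnumSet.of(TypeName.ROW);

  // Inclusive definition as in the diff: any primitive not listed (BOOLEAN) is missed.
  static boolean isPrimitiveInclusive(TypeName t) {
    return NUMERIC_TYPES.contains(t) || STRING_TYPES.contains(t) || DATE_TYPES.contains(t);
  }

  // Exclusive definition: primitive means "not a container, map, or composite".
  static boolean isPrimitiveExclusive(TypeName t) {
    return !CONTAINER_TYPES.contains(t) && !MAP_TYPES.contains(t) && !COMPOSITE_TYPES.contains(t);
  }

  public static void main(String[] args) {
    for (TypeName t : TypeName.values()) {
      System.out.println(
          t + ": inclusive=" + isPrimitiveInclusive(t) + ", exclusive=" + isPrimitiveExclusive(t));
    }
  }
}
```

The exclusive form stays correct when a new primitive type name is added, whereas the inclusive form has to be updated each time.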
| public static final Set<TypeName> STRING_TYPES = ImmutableSet.of(STRING); | ||
| public static final Set<TypeName> DATE_TYPES = ImmutableSet.of(DATETIME); | ||
| public static final Set<TypeName> CONTAINER_TYPES = ImmutableSet.of(ARRAY); | ||
| public static final Set<TypeName> MAP_TYPES = ImmutableSet.of(MAP); |
XuMingmin wrote:
would separate here, CONTAINER should be ARRAY/SET. List<KV<>> could be a CONTAINER_TYPE, Map<> is not.
I'm still confused about this. In most systems, Map is considered a container (e.g. in Java Map is a container type)
In Java, container extends Collection and map extends Map, they're very different IMO. If we merge them together I don't see any benefit as this is a backend function and developers are using either TypeName.ARRAY.type().withComponentType() or TypeName.MAP.type().withMapType.
To make it clear, I would prefer to use the term Collection instead of Component or Container. Any comments?
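Both positions in this thread can be checked against the standard library. A small sketch (the `isCollection` helper is hypothetical): `java.util.Map` does not extend `java.util.Collection`, but its entry set, i.e. the map viewed as key-value pairs, is a `Collection`.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

final class MapVersusCollectionDemo {
  // Hypothetical helper: true when the value implements java.util.Collection.
  static boolean isCollection(Object value) {
    return value instanceof Collection;
  }

  public static void main(String[] args) {
    Map<Integer, String> map = new HashMap<>();
    map.put(1, "value1");

    // A list is a Collection.
    System.out.println(isCollection(new ArrayList<String>()));
    // A Map itself is not a Collection.
    System.out.println(isCollection(map));
    // The map's key-value pairs, exposed as a Set, are a Collection.
    System.out.println(isCollection(map.entrySet()));
  }
}
```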
| return this; | ||
| } | ||
|
|
||
| public <T1, T2> Builder addMap(Map<T1, T2> data) { |
XuMingmin wrote:
it's not necessary, will remove.
Acknowledged.
|
run java precommit |
| abstract static class Builder { | ||
| abstract Builder setTypeName(TypeName typeName); | ||
| abstract Builder setComponentType(@Nullable FieldType componentType); | ||
| abstract Builder setCollectionType(@Nullable FieldType collectionType); |
nitpick here: collectionType reads weird to me, as it seems like it's the type of collection (e.g. array, list, tree, etc.) instead of the type of the component elements of the collection.
I didn't mind componentType, but if you want to include collection in there then maybe CollectionElementType or CollectionValueType?
ok, let me change it to CollectionElementType
ok, LGTM from me once that change is made.
updated, please have a look when you've time.
|
lgtm |
|
Appreciate it @reuvenlax, squashing and merging |
Add type MAP.