[FLINK-16996][table] Refactor planner and connectors to use new data structures #11925
wuchong wants to merge 9 commits into apache:master
Conversation
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot; I help the community.

Automated Checks: last check on commit fd62e8e (Mon Apr 27 16:23:50 UTC 2020).
Mention the bot in a comment to re-run the automated checks.

Review Progress: please see the Pull Request Review Guide for a full explanation of the review process. The bot is tracking the review progress through labels, which are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands: the @flinkbot bot supports the following commands:
Hi @dianfu, could you help review the Python part?
KurtYoung left a comment

I've checked the runtime part, LGTM.
@@ -66,4 +69,20 @@ public static boolean byteArrayEquals(
        return true;
    }

    public static String toString(RowData row, LogicalType[] types) {
This is only used in tests; move it to tests?
    str.ensureMaterialized();

-   if (precision > Decimal.MAX_LONG_DIGITS || str.getSizeInBytes() > Decimal.MAX_LONG_DIGITS) {
+   if (DecimalDataUtils.isByteArrayDecimal(precision) || DecimalDataUtils.isByteArrayDecimal(str.getSizeInBytes())) {
Is this right? Why do we check both precision and sizeInBytes with the same method?
cc @JingsongLi, do you know why we check both of them?
It is right.
The case is: the precision is 10 (less than MAX_LONG_DIGITS), but the string data may contain more than 10 digits. We can convert through a big decimal first, and then remove the extra decimal digits according to the precision.
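The rule described above can be sketched with plain java.math.BigDecimal (DecimalDataUtils works along the same lines; the helper below is a hypothetical stand-in, not Flink's actual implementation):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class DecimalCastSketch {

    // Hypothetical stand-in for the cast discussed above: the source string may
    // carry more digits than the target precision, so we parse it as a
    // BigDecimal first, round to the target scale, and only then check whether
    // the result still fits the declared precision.
    static BigDecimal castStringToDecimal(String s, int precision, int scale) {
        BigDecimal bd = new BigDecimal(s).setScale(scale, RoundingMode.HALF_UP);
        return bd.precision() <= precision ? bd : null; // null signals overflow
    }

    public static void main(String[] args) {
        // The input has 11 digits, but after rounding to scale 2 the value has
        // only 9 significant digits, so it fits precision 10.
        System.out.println(castStringToDecimal("1234567.8912", 10, 2));
    }
}
```

This is why checking only the declared precision is not enough: the byte length of the string decides whether a long-backed decimal can hold the parsed value at all.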
/**
 * Converts a {@link MapData} into a Java {@link Map}; the keys and values of the Java map
 * still hold objects of internal data structures.
 */
public static Map<Object, Object> convertToJavaMap(
@wuchong The Python part LGTM.
if (i != 0) {
    sb.append(",");
}
sb.append(StringUtils.arrayAwareToString(fields[i]));
Use org.apache.flink.table.utils.EncodingUtils#objectToString
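For reference, the comma-joining loop under review can also be expressed with java.util.StringJoiner; the snippet below is a simplified sketch that uses String.valueOf in place of Flink's StringUtils.arrayAwareToString / EncodingUtils#objectToString (which additionally handle arrays and binary data):

```java
import java.util.StringJoiner;

public class RowToStringSketch {

    // Simplified sketch of the reviewed loop: join the row's fields with
    // commas. String.valueOf stands in for the array-aware/encoding helpers
    // mentioned in the review.
    static String fieldsToString(Object[] fields) {
        StringJoiner joiner = new StringJoiner(",");
        for (Object field : fields) {
            joiner.add(String.valueOf(field));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(fieldsToString(new Object[]{1, null, "a"}));
    }
}
```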
 *
 * <p>By default the result type of an evaluation method is determined by Flink's type extraction
- * facilities. Currently, only support {@link org.apache.flink.types.Row} and {@code BaseRow} as
+ * facilities. Currently, only support {@link org.apache.flink.types.Row} and {@code RowData} as
case RAW:
    return RawValueData.class;
default:
    throw new UnsupportedOperationException("Not support type: " + type);
}

/**
 * Get internal(sql engine execution data formats) conversion class for {@link LogicalType}.
nit: Returns the conversion class for the given {@link LogicalType} that is used by the table runtime.
/**
 * Get internal(sql engine execution data formats) conversion class for {@link LogicalType}.
 */
public static Class<?> internalConversionClass(LogicalType type) {
nit: toInternalConversionClass
twalthr left a comment

Some feedback on 9517bc9.
What I don't like is that we now have a lot of runtime code in flink-table-common, which means it is also available in the API. I'm wondering if we could at least hide some util classes with default (package-private) visibility. At the very least we should move all of those utilities to the binary package.
 * Precision is not compact: can not call setNullAt when decimal is null, must call
 * setDecimal(i, null, precision) because we need update var-length-part.
 */
void setDecimal(int i, DecimalData value, int precision);
void setNullAt(int pos);

/**
 * Set boolean value.
Remove those JavaDocs. They are not useful and just make the code more complicated.
 * <p>Note:
 * Precision is compact: can call setNullAt when decimal is null.
 * Precision is not compact: can not call setNullAt when decimal is null, must call
 * setDecimal(i, null, precision) because we need update var-length-part.
nit: use {@code } to format the JavaDoc
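The compact/non-compact rule in the quoted JavaDoc can be pictured with a tiny sketch. In Flink, decimals up to DecimalData.MAX_COMPACT_PRECISION (18) fit in a long and live entirely in the fixed-length part of a binary row; the constant and helper below are assumptions of this sketch, not the real API:

```java
public class DecimalNullSketch {

    // Decimals up to this precision fit into a long and live entirely in the
    // fixed-length part of a binary row (mirrors Flink's
    // DecimalData.MAX_COMPACT_PRECISION, assumed to be 18 here).
    static final int MAX_COMPACT_PRECISION = 18;

    static boolean isCompact(int precision) {
        return precision <= MAX_COMPACT_PRECISION;
    }

    // Returns which setter a writer must use for a null decimal: compact
    // decimals only need a null marker, while non-compact ones must still
    // update the var-length part, hence setDecimal(pos, null, precision).
    static String nullSetterFor(int precision) {
        return isCompact(precision) ? "setNullAt(pos)" : "setDecimal(pos, null, precision)";
    }

    public static void main(String[] args) {
        System.out.println(nullSetterFor(10));
        System.out.println(nullSetterFor(38));
    }
}
```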
/**
 * Binary format spanning {@link MemorySegment}s.
 */
public interface BinaryFormat {
 * <p>It can lazy the conversions as much as possible. It will be converted into required form
 * only when it is needed.
 */
public abstract class LazyBinaryFormat<T> implements BinaryFormat {
 * Utilities for binary data segments which heavily uses {@link MemorySegment}.
 */
@Internal
public class BinarySegmentUtils {
Put this under org.apache.flink.table.data.binary. Make the class final with a private default constructor.
 * Murmur Hash. This is inspired by Guava's Murmur3_32HashFunction.
 */
@Internal
public final class MurmurHashUtils {
Put under org.apache.flink.table.data.binary; add a private default constructor.
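As a sketch of the kind of code living in such a utility class: the standard 32-bit Murmur3 finalization mix ("fmix"), which Guava's Murmur3_32HashFunction (the stated inspiration) uses to mix the final hash state. The constants are the well-known Murmur3 ones; the class shape follows the visibility advice above:

```java
public final class MurmurFmixSketch {

    private MurmurFmixSketch() {
        // utility class, no instances (the style suggested in the review)
    }

    // Standard Murmur3 32-bit finalization mix: forces avalanche so that
    // small input differences flip roughly half of the output bits.
    static int fmix32(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    public static void main(String[] args) {
        System.out.println(fmix32(42));
    }
}
```

Note that fmix32 is a bijection on 32-bit integers (xorshifts and odd-constant multiplications are all invertible), so distinct inputs always map to distinct outputs.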
 * Utilities for String UTF-8.
 */
@Internal
public class StringUtf8Utils {
Put under org.apache.flink.table.data.binary; make it final with a private default constructor.
 * used on the binary format such as {@link BinaryRowData}.
 */
@Internal
public interface TypedSetters {
If this is mainly used for binary formats, put it in the binary package.
Thanks @wuchong for this massive PR. I took a look at the classes in
Thanks @KurtYoung @dianfu @twalthr for the quick review. I have addressed all the comments. Hi Timo, currently it's hard to make all classes under
@Internal
public final class NestedRowData extends BinarySection implements RowData, TypedSetters {

    private static final long serialVersionUID = 1L;
import java.io.Serializable;

/**
- * Record equaliser for BaseRow which can compare two BaseRows and returns whether they are equal.
+ * Record equaliser for RowData which can compare two RowDatas and returns whether they are equal.
nit: "RowData" should not take a plural "s".
 * Returns a term for representing the given class in Java code.
 */
def typeTerm(clazz: Class[_]): String = {
  if (clazz == classOf[StringData]) {
Could this put the code generation at risk of a missing cast?
If possible, I'd prefer to cast when accessing its methods.
Considering RowData and the others, using StringData seems more reasonable to me.
There are too many calls on BinaryStringData in the code generation now; if we used StringData in code generation, we would have to refactor a lot of code, and we wouldn't benefit much from the effort. I created FLINK-17437 to track this, so we can refactor it in the future.
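The casting concern can be illustrated with simplified stand-ins for the two classes (deliberately renamed, not the real Flink types): if generated variables were typed as the StringData interface, every call to a BinaryStringData-only method would need an explicit cast at the use site.

```java
import java.nio.charset.StandardCharsets;

public class CodegenCastSketch {

    // Simplified stand-in for org.apache.flink.table.data.StringData; only
    // the conversion method is part of the interface.
    interface MyStringData {
        String toJavaString();
    }

    // Simplified stand-in for BinaryStringData with one binary-only method.
    static final class MyBinaryStringData implements MyStringData {
        private final byte[] bytes;

        MyBinaryStringData(String s) {
            this.bytes = s.getBytes(StandardCharsets.UTF_8);
        }

        // Not visible through the interface.
        int getSizeInBytes() {
            return bytes.length;
        }

        @Override
        public String toJavaString() {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        // With the variable declared via the interface, the binary-only call
        // requires a cast on every use site -- which is what the generated
        // code avoids by declaring BinaryStringData directly.
        MyStringData s = new MyBinaryStringData("hello");
        int size = ((MyBinaryStringData) s).getSizeInBytes();
        System.out.println(size);
    }
}
```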
case None =>
  val term = newName("typeSerializer")
- val ser = InternalSerializers.create(t, new ExecutionConfig)
+ val ser = InternalSerializers.createInternalSerializer(t, new ExecutionConfig)
nit: createInternalSerializer is a little redundant as a name.
There are still 100+ occurrences of BaseRow in the code (variable names or something else); you can modify them all.
Thanks for the review @JingsongLi. I have addressed the comments and renamed the legacy field names and methods which used BaseRow, SqlTimestamp and BinaryGeneric.
JingsongLi left a comment

Thanks @wuchong, looks good to me from my side.
…ructures
- Add to primitive array methods to ArrayData
- Add get(ArrayData, int, LogicalType) utility to ArrayData
- Add get(Object key) to GenericMapData
- Add get(RowData, int, LogicalType) utility to RowData
- Add toString() to RowData
…and serializers around RowData
3cbfdd0 to 0a1e660
Thanks all for the review. I have rebased the commits. Will merge this once the build passes.
Hi @wuchong, do you want to squash or not? Here is my question:
Hi @JingsongLi, I want to keep the split commits. From my point of view, squashing such a large PR into one commit is not good.
So the answer is that we can push commits with broken compilation.
@JingsongLi, as far as I know, the community doesn't have a rule that every independent commit must pass the build, but there is a rule to separate commits if it is a big change:
Travis passed: https://travis-ci.org/github/wuchong/flink/builds/680843224
@wuchong got you.
…ta structures This closes #11925
…and serializers around RowData This closes #11925
What is the purpose of the change
Refactors existing code to use the new data structures interfaces.
Brief change log
The commits, in order:
Some notable changes:
- StringData to BinaryStringData. This makes it easy for the code generator to generate operations based on strings. The same for RawValueData.
- Methods of Decimal have been moved to DecimalDataUtils, so I also updated the code generation logic.
- Removed the RecordEqualiser#equalsWithoutHeader interface method. This method is rarely used and can be replaced by RecordEqualiser#equals. This also avoids adding the equalsWithoutHeader method to the public API.
- GenericRowData.

Verifying this change
This change is covered by existing tests.
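The RecordEqualiser change mentioned in the change log (dropping equalsWithoutHeader in favor of a single equals) can be pictured with a minimal stand-in; this is not the real Flink interface, just a sketch of the reduced contract:

```java
import java.util.Arrays;

public class EqualiserSketch {

    // Minimal stand-in for RecordEqualiser: after the refactoring only a
    // single equals(...) remains; the former equalsWithoutHeader is gone.
    interface RecordEqualiser {
        boolean equals(Object[] row1, Object[] row2);
    }

    // A generated equaliser compares field by field; Arrays.deepEquals is
    // enough for this sketch.
    static final RecordEqualiser FIELDWISE =
            (r1, r2) -> Arrays.deepEquals(r1, r2);

    public static void main(String[] args) {
        System.out.println(FIELDWISE.equals(
                new Object[]{1, "a"}, new Object[]{1, "a"}));
    }
}
```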
Does this pull request potentially affect one of the following parts:
@Public(Evolving): yes

Documentation