Skip to content

Conversation

@wuchong
Copy link
Member

@wuchong wuchong commented Jul 30, 2019

What is the purpose of the change

Fix SinkCodeGenerator shouldn't compare type field names. Also we added blink planner for hbase IT cases.

Brief change log

  • SinkCodeGenerator still use type information to compare
  • make hbase IT case to run with blink planner and flink planner both.

Verifying this change

  • We added some IT case in blink planner to verify SinkCodeGenerator

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@wuchong
Copy link
Member Author

wuchong commented Jul 30, 2019

cc @carp84 could you help review the hbase part?
cc @JingsongLi @godfreyhe welcome to review when you are free.

@flinkbot
Copy link
Collaborator

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into to Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.

Details
The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@flinkbot
Copy link
Collaborator

flinkbot commented Jul 30, 2019

CI report:

return descriptorProperties.asMap();
}

@Ignore
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why ignore?

public class HBaseSinkITCase extends HBaseTestingClusterAutostarter {
private static final long serialVersionUID = 1L;

private static final String BLINK_PLANNER = "Blink";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use enum instead?

val fieldIndexProcessCode =
if (getCompositeTypes(convertOutputType).map(fromTypeInfoToLogicalType) sameElements
inputTypeInfo.getFieldTypes.map(fromTypeInfoToLogicalType)) {
if (getCompositeTypes(convertOutputType) sameElements inputTypeInfo.getFieldTypes) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it not work now...
inputTypeInfo is a BaseRowTypeInfo with internal type informations.
And convertOutputType is the type information from sink and users.
Consider Types.STRING and BinaryStringTypeInfo, Types.DECIMAL and DecimalTypeInfo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I think maybe you can try LogicalType.asSerializableString

public void testHBaseLookupTableSource() throws Exception {
EnvironmentSettings settings = EnvironmentSettings.newInstance()
// only blink planner support lookup table source
.useBlinkPlanner().inStreamingMode().build();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use parameterized test to both cover blink-planner and flink-planner just like the way in HBaseSinkITCase, and remove the testHBaseLookupFunction test.

@wuchong wuchong force-pushed the hbase-test branch 2 times, most recently from 9e7c51e to cef04c0 Compare July 31, 2019 12:41
@wuchong
Copy link
Member Author

wuchong commented Jul 31, 2019

Hi @carp84 , @JingsongLi , I have updated the PR.

I split it into 3 commits:

  • 0a59bfc: Add method to equals LogicalType without field names
  • 6649228: SinkCodeGenerator should not compare row type field names
  • cef04c0: Enable blink planner for integration tests of HBase

cc @twalthr , it would be great if you can help to review 0a59bfc.

Copy link
Contributor

@twalthr twalthr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @wuchong, I had a look at 0a59bfc and some questions regarding the use case for this utility method. Wouldn't it be enough to just go through the types returned by getChildren() for structured, distinct type, and row types? If we skip the field names we could also skip more logical attributes such as isFinal, description, etc. and focus on the pure "more physical" types.

}

public static boolean areTypeEqualsWithoutNames(LogicalType thisType, LogicalType thatType) {
if (thisType == null || thatType == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to add a precondition here. Types should not be null. If they are null, it is a bug that should throw an exception.

}
}

private static class TypeEquivalenceIgnoreNamesExtractor extends Extractor<Boolean> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should directly extend from LogicalTypeDefaultVisitor because we override the default method anyway.

* Returns true if tow LogicalType can equal. This is the same with {@link LogicalType#equals(Object)}.
*/
private boolean canEqual(LogicalType thatType) {
if (thisType == null || thatType == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above regarding null checking

if (!canEqual(thatType)) {
return false;
}
boolean thisIsFinal = ((DistinctType) thisType).isFinal();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Distinct types are always final. We don't need to check that.

return logicalType.accept(SINGLE_FIELD_INTERVAL_EXTRACTOR);
}

public static boolean areTypeEqualsWithoutNames(LogicalType thisType, LogicalType thatType) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rename to hasEqualTypesWithoutNames

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason to use hasEqualTypesWithoutNames? It looks like whether it contains some equal types (might children types are equal).

RowType.RowField thatField = thatFields.get(i);
// ignore field names
if (!thisField.getType().equals(thatField.getType())
|| !Objects.equals(thisField.getDescription(), thatField.getDescription())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do we check the description? Isn't it as meaningless as the field names?

return thisIsFinal == thatType.isFinal() &&
thisObjectIdentifier.equals(thatType.getObjectIdentifier()) &&
thisDescription.equals(thatType.getDescription()) &&
areTypeEqualsWithoutNames(thisSourceType, thatType.getSourceType());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just compare the source type. Everything else is logical meta data, no?

@wuchong
Copy link
Member Author

wuchong commented Jul 31, 2019

Hi @twalthr , thanks for the advice. The implementation of type equals has been much simpler.

Please have a look again.

Copy link
Member

@carp84 carp84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for further refactoring the patch @wuchong , added some more comments.


package org.apache.flink.addons.hbase.util;

import org.apache.flink.addons.hbase.HBaseConnectorITCase2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The L21-L26 imports are unused and should be removed, let alone L21 introduces a compilation error.

The same applies to the L32 import.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, it seems this issue is already resolved by the later force push, and now the code here looks good.

};
// split keys
byte[][] splitKeys = new byte[][]{ Bytes.toBytes(4) };
createTable(tableName, families, splitKeys);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: we could extract L101-L111 out to remove some duplicated codes.

// https://github.com/apache/hbase/blob/master/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/FilterTestingCluster.java
//
public class HBaseTestingClusterAutostarter extends TestLogger implements Serializable {
public abstract class HBaseTestingClusterAutostarter extends AbstractTestBase {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: HBaseTestingClusterAutostarter -> HBaseTestingClusterAutoStarter

@Test
public void testTableInputFormat() throws Exception {
if (BLINK_PLANNER.equals(planner)) {
// this case is for testing TableInputFormat which is not works for flink-table
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is a little bit ambiguous, from the code logic it seems we should say "this case is for testing TableInputFormat which only applies to flink-table planner" or so?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly. I mean TableInputFormat is a runtime implementation, not for flink table, not relative to what the planner is. You can see the test is only verified on DataSet, not on TableEnvironment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some offline discussion, the case is irrelative to planner actually, but in case that later change somehow making the test case relative to planner implementation (although low possibility), let's just don't skip the test for any planner.

Copy link
Contributor

@twalthr twalthr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update @wuchong. The code looks beautiful now. The only concern left is the naming of the method and visitor. Because it does more than just ignoring names now. How about hasEqualAtomicTypes?

wuchong added 2 commits August 1, 2019 20:25
Add a areTypesCompatible() method to LogicalTypeChecks. This will compare two LogicalTypes without field names and other logical attributes (e.g. description, isFinal).
This commit combines HBaseTableSourceITCase and HBaseLookupFunctionITCase and HBaseConnectorITCase into one class.
This can save much cluster initialization time for us.
@wuchong
Copy link
Member Author

wuchong commented Aug 1, 2019

As discussed with @twalthr , we would like to go with areTypeCompatible for the method. And add a detailed javadoc on the method.

PR updated and squashed.

Copy link
Member

@carp84 carp84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest changes LGTM. Thanks for the efforts @wuchong !

asfgit pushed a commit that referenced this pull request Aug 2, 2019
This commit combines HBaseTableSourceITCase and HBaseLookupFunctionITCase and HBaseConnectorITCase into one class.
This can save much cluster initialization time for us.

This closes #9275
@asfgit asfgit closed this in 5d981eb Aug 2, 2019
@wuchong wuchong deleted the hbase-test branch August 2, 2019 02:16
wzhero1 pushed a commit to wzhero1/flink that referenced this pull request Aug 2, 2019
This commit combines HBaseTableSourceITCase and HBaseLookupFunctionITCase and HBaseConnectorITCase into one class.
This can save much cluster initialization time for us.

This closes apache#9275
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants