Skip to content

Conversation

@lincoln-lil
Copy link
Contributor

@lincoln-lil lincoln-lil commented Jul 21, 2022

What is the purpose of the change

This is a followup implementation of FLINK-28570 which introduces a new lookup join operator (sync mode only) with state to eliminate the non determinism.

Brief change log

  • add KeyedLookupJoinWrapper to process I/+U vs D/-U differently
  • add ListenableCollector to offer a callback when original records were collected

Verifying this change

newly added KeyedLookupJoinHarnessTest and existing LookupJoinITCase、AsyncLookupJoinITCase

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @public(Evolving): (no)
  • The serializers: (no )
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)

@flinkbot
Copy link
Collaborator

flinkbot commented Jul 21, 2022

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@lincoln-lil lincoln-lil force-pushed the FLINK-28568 branch 3 times, most recently from 7fa8139 to 236db88 Compare August 6, 2022 13:10
@lincoln-lil
Copy link
Contributor Author

rebased latest master branch

Copy link
Contributor

@godfreyhe godfreyhe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution, I left some comments

*/
@Internal
public abstract class ListenableCollector<T> extends TableFunctionCollector<T> {
private CollectListener<T> collectListener;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should mark it as @Nullable

* TableFunctionCollector} which combines left and right into a JoinedRowData.
*/
public static final class TestingFetcherCollector extends TableFunctionCollector {
public static final class TestingFetcherCollector extends ListenableCollector {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type should be ListenableCollector<RowData> ?


List<Integer> refKeys =
allLookupKeys.entrySet().stream()
.filter(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

directly filter the FieldRefLookupKeys using filter(key -> (key.getValue() instanceof LookupJoinUtil.FieldRefLookupKey)) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

new KeyedProcessOperator<>(keyedLookupJoinWrapper);

List<Integer> refKeys =
allLookupKeys.entrySet().stream()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use allLookupKeys.values() can avoid key.getValue()

OneInputTransformation<RowData, RowData> transform =
ExecNodeUtil.createOneInputTransformation(
partitionedTransform,
createTransformationMeta(LOOKUP_JOIN_TRANSFORMATION, config),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should define another operatorName for the Transformation, Because LOOKUP_JOIN_TRANSFORMATION has used for the join Transformation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

}

@Test
def testAggAndAsyncLeftJoinWithTryResolveMode(): Unit = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have any IT case to verify the change? this pr aims to support sync LookupJoin

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this case can cover the change, the legacy source can provide both sync and async functions, so it can fallback to sync lookup function with state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added more cases in LookupJoinITCase

Copy link
Contributor Author

@lincoln-lil lincoln-lil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@godfreyhe thank you for reviewing this! I've address your comments and updated the pr


List<Integer> refKeys =
allLookupKeys.entrySet().stream()
.filter(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

OneInputTransformation<RowData, RowData> transform =
ExecNodeUtil.createOneInputTransformation(
partitionedTransform,
createTransformationMeta(LOOKUP_JOIN_TRANSFORMATION, config),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

}

@Test
def testAggAndAsyncLeftJoinWithTryResolveMode(): Unit = {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this case can cover the change, the legacy source can provide both sync and async functions, so it can fallback to sync lookup function with state.

}

@Test
public void testAggAndAllConstantLookupKeyWithTryResolveMode() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method should be moved to LookupJoinTest

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Contributor Author

@lincoln-lil lincoln-lil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@godfreyhe I've rebased master to resolve conflicts and added more cases according to your comments, thanks again !

}

@Test
public void testAggAndAllConstantLookupKeyWithTryResolveMode() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

}

@Test
def testAggAndAsyncLeftJoinWithTryResolveMode(): Unit = {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added more cases in LookupJoinITCase

Copy link
Contributor

@godfreyhe godfreyhe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update, I left some comments

return transform;
}

private LogicalType getLookupKeyLogicalType(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method can be removed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this newly method from master is unused, I'll remove it


public static final String LOOKUP_JOIN_TRANSFORMATION = "lookup-join";

public static final String LOOKUP_JOIN_WITH_STATE_TRANSFORMATION = "lookup-join-with-state";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about lookup-join-with-materialize ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to 'lookup-join-materialize', omit the word 'with'

? (RowType) toLogicalType(temporalTableOutputType.get())
: tableSourceRowType;
GeneratedCollector<TableFunctionCollector<RowData>> generatedCollector =
getProjectionRowTypeOnTemporalTable(relBuilder);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can call the getProjectionRowTypeOnTemporalTable method in if (projectionOnTemporalTable != null) branch, and the implementation of getProjectionRowTypeOnTemporalTable method can be simplified

There is some duplicated code in createAsyncLookupJoin method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

public abstract class ListenableCollector<T> extends TableFunctionCollector<T> {
@Nullable private CollectListener<T> collectListener;

public void setCollectListener(CollectListener<T> collectListener) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: @nullable CollectListener collectListener

RowData right = (RowData) record;
RowData right = record;
getCollectListener()
.ifPresent(listener -> ((CollectListener) listener).onCollect(record));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cast is no needed here

Copy link
Contributor

@godfreyhe godfreyhe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update, LGTM. Please rebase master, the failure test had been fixed

@lincoln-lil
Copy link
Contributor Author

rebased latest master to get rid of failed yarn it case

@lincoln-lil
Copy link
Contributor Author

@lincoln-lil run azure

@lincoln-lil
Copy link
Contributor Author

An irrelevant failure case of es sink https://issues.apache.org/jira/browse/FLINK-28877

@godfreyhe
Copy link
Contributor

An irrelevant failure case of es sink https://issues.apache.org/jira/browse/FLINK-28877

I will merge it

@godfreyhe godfreyhe closed this in 3f18caf Aug 9, 2022
huangxiaofeng10047 pushed a commit to huangxiaofeng10047/flink that referenced this pull request Nov 3, 2022
…ync mode only) with state to eliminate non-deterministic result

This closes apache#20324
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants