[FLINK-13433][table-planner-blink] Do not fetch data from LookupableTableSource if the JoinKey in left side of LookupJoin contains null value. #9285

Closed
wants to merge 2 commits into from

Conversation

beyond1920
Contributor

What is the purpose of the change

For LookupJoin, if the join key on the left side of a LeftOuterJoin/InnerJoin contains a null value, there is no need to fetch data from the LookupableTableSource.
However, we currently do not short-circuit the fetch in this case, so the correctness of the results depends on the TableFunction implementation of each LookupableTableSource.
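
To make the intended behavior concrete, here is a minimal sketch of the short-circuit, assuming plain `Object[]` rows and a `java.util.function.Function` as a stand-in for the source's lookup function. The class `NullSkippingLookupJoin` and its method `joinRow` are hypothetical names used for illustration only; this is not the planner code touched by this pull request.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch of the null short-circuit; not the Flink planner code. */
final class NullSkippingLookupJoin {

    /**
     * Joins one left row against a lookup function.
     *
     * @param leftRow    the probe-side row
     * @param keyIndices positions of the join keys inside leftRow
     * @param lookup     stand-in for the lookup function of the table source
     * @param rightArity number of columns on the lookup side
     * @param leftOuter  true for LEFT OUTER JOIN, false for INNER JOIN
     */
    static List<Object[]> joinRow(
            Object[] leftRow,
            int[] keyIndices,
            Function<Object[], List<Object[]>> lookup,
            int rightArity,
            boolean leftOuter) {
        Object[] keys = new Object[keyIndices.length];
        boolean hasNullKey = false;
        for (int i = 0; i < keyIndices.length; i++) {
            keys[i] = leftRow[keyIndices[i]];
            hasNullKey |= keys[i] == null;
        }

        // With '=' semantics a null key can never match, so the fetch is skipped.
        List<Object[]> matches =
                hasNullKey ? Collections.<Object[]>emptyList() : lookup.apply(keys);

        List<Object[]> out = new ArrayList<>();
        for (Object[] right : matches) {
            out.add(concat(leftRow, right));
        }
        if (out.isEmpty() && leftOuter) {
            // Left outer join: keep the left row and pad the lookup side with nulls.
            out.add(concat(leftRow, new Object[rightArity]));
        }
        return out;
    }

    private static Object[] concat(Object[] left, Object[] right) {
        Object[] joined = Arrays.copyOf(left, left.length + right.length);
        System.arraycopy(right, 0, joined, left.length, right.length);
        return joined;
    }
}
```

With this shape the result no longer depends on how an individual lookup function treats null arguments: an inner join emits nothing for a null key, and a left outer join emits the left row padded with nulls.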

Brief change log

  • Correct the behavior of LookupJoin when lookupKeys contain null.
  • Add more test cases.

Verifying this change

This change is verified by the added test cases.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

@flinkbot
Collaborator

flinkbot commented Jul 31, 2019

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit a10620b (Wed Aug 07 08:17:21 UTC 2019)

Warnings:

  • No documentation files were touched! Remember to keep the Flink docs up to date!

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The bot tracks the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@beyond1920
Contributor Author

cc @wuchong @lincoln-lil

@flinkbot
Collaborator

flinkbot commented Jul 31, 2019

CI report:

Contributor

@lincoln-lil lincoln-lil left a comment


Looks good! I've left some minor comments. By the way, would it be better to parameterize the synchronous/asynchronous mode for the IT case?

@JingsongLi
Contributor

I am interested in this SQL: SELECT T.id, T.len, T.content, D.name FROM T JOIN userTable for system_time as of T.proctime AS D ON T.id = D.id OR (T.id is null and D.id is null).
Does it work?

@beyond1920
Contributor Author

beyond1920 commented Jul 31, 2019

@JingsongLi At first glance, I guessed the query was not supported.
However, after actually running it, I found that the query can be executed but produces incorrect results,
because LookupJoin does not take filterNulls (which indicates whether a key pair is compared as k = v or as k IS NOT DISTINCT FROM v) into consideration.

Good catch, I will look into the problem.
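
To illustrate the distinction discussed here, the sketch below compares join keys with a per-key `filterNulls` flag. It assumes that `true` means the pair is compared with `=` (a null on either side never matches) and `false` means `IS NOT DISTINCT FROM` (null matches null); that reading is taken from the comment above and is an assumption. `NullAwareKeyMatcher`, `keysMatch` and `canSkipLookup` are hypothetical names, not planner classes.

```java
import java.util.Objects;

/** Hypothetical sketch of null-aware key comparison; not the Flink planner code. */
final class NullAwareKeyMatcher {

    /**
     * @param filterNulls assumed meaning: filterNulls[i] == true compares key i
     *                    with '=', so a null on either side never matches;
     *                    false compares with IS NOT DISTINCT FROM, so null
     *                    matches null.
     */
    static boolean keysMatch(Object[] left, Object[] right, boolean[] filterNulls) {
        for (int i = 0; i < left.length; i++) {
            boolean eitherNull = left[i] == null || right[i] == null;
            if (eitherNull && filterNulls[i]) {
                return false;
            }
            if (!Objects.equals(left[i], right[i])) {
                return false;
            }
        }
        return true;
    }

    /** The fetch may be skipped only if some null key is compared with '='. */
    static boolean canSkipLookup(Object[] leftKeys, boolean[] filterNulls) {
        for (int i = 0; i < leftKeys.length; i++) {
            if (leftKeys[i] == null && filterNulls[i]) {
                return true;
            }
        }
        return false;
    }
}
```

Under these assumptions, a condition such as T.id = D.id OR (T.id is null and D.id is null) corresponds to a flag of `false` for that key, so a null T.id must still reach the lookup instead of being short-circuited.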

@beyond1920 beyond1920 force-pushed the FLINK-13433 branch 4 times, most recently from 3e6a308 to 5958000 Compare August 2, 2019 08:32
Member

@wuchong wuchong left a comment


Thanks @beyond1920, could you also update InMemoryAsyncLookupFunction#eval and InMemoryLookupFunction#eval to throw an exception when one of the arguments is null?

@beyond1920
Contributor Author

beyond1920 commented Aug 5, 2019

@wuchong

could you also update InMemoryAsyncLookupFunction#eval and InMemoryLookupFunction#eval to throw an exception when one of the arguments is null?

Hi JarkWu, when an argument contains null, InMemoryAsyncLookupFunction and InMemoryLookupFunction look up records whose lookup key field is null, which complies with the contract mentioned in https://github.com/apache/flink/pull/9335/files. I think that behavior is also reasonable.

@wuchong
Member

wuchong commented Aug 5, 2019

I agree with you @beyond1920. However, I want a test that verifies we do not push any nulls into the LookupFunction. That's why I want to add such a check in InMemoryLookupFunction. Could we remove the check once we support pushing IS NULL into the LookupFunction?
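
The kind of check being requested could look roughly like the sketch below: a test-only, in-memory lookup that fails loudly if the join operator ever hands it a null key. `StrictInMemoryLookup` is a hypothetical stand-in written without any Flink dependencies; it is not the actual InMemoryLookupFunction, and the exception type is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical test helper; not Flink's InMemoryLookupFunction. */
final class StrictInMemoryLookup {

    // Rows grouped by the values of their key columns.
    private final Map<List<Object>, List<Object[]>> index = new HashMap<>();

    StrictInMemoryLookup(List<Object[]> rows, int... keyIndices) {
        for (Object[] row : rows) {
            List<Object> key = new ArrayList<>();
            for (int k : keyIndices) {
                key.add(row[k]);
            }
            index.computeIfAbsent(key, unused -> new ArrayList<>()).add(row);
        }
    }

    /** Returns the matching rows, or throws if any lookup key is null. */
    List<Object[]> eval(Object... keys) {
        for (Object key : keys) {
            if (key == null) {
                // A null here means the join operator failed to skip the row.
                throw new IllegalArgumentException(
                        "Lookup key must not be null; the join should have skipped this row.");
            }
        }
        return index.getOrDefault(Arrays.asList(keys), Collections.<Object[]>emptyList());
    }
}
```

Any integration test that feeds rows with null join keys through the join would then fail if the short-circuit regressed, which is the coverage asked for above.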

…ableSource if the JoinKey in left side of LookupJoin contains null value.
@beyond1920
Contributor Author

@wuchong , Ok

@wuchong
Member

wuchong commented Aug 7, 2019

Thanks @beyond1920 , looks good to me now.

asfgit pushed a commit that referenced this pull request Aug 7, 2019
…ableSource if the JoinKey in left side of LookupJoin contains null value

This closes #9285
@asfgit asfgit closed this in 4b48eb8 Aug 7, 2019
@KurtYoung
Contributor

BTW, could you also open a JIRA issue for the empty key in the HBase lookup function? HBase forbids empty row keys.

@beyond1920
Contributor Author

@KurtYoung The HBase lookup function is updated in #9335.

becketqin pushed a commit to becketqin/flink that referenced this pull request Aug 17, 2019
…ableSource if the JoinKey in left side of LookupJoin contains null value

This closes apache#9285
becketqin pushed a commit to becketqin/flink that referenced this pull request Aug 19, 2019
…ableSource if the JoinKey in left side of LookupJoin contains null value

This closes apache#9285
7 participants