[HUDI-6563] Supports flink lookup join #9228

Merged
danny0405 merged 1 commit into apache:master from waywtdcc:support_flink_lookup_join
May 20, 2024

Conversation

@waywtdcc
Contributor

@waywtdcc waywtdcc commented Jul 19, 2023

Change Logs

Adds support for Flink lookup joins against Hudi tables.

Example usage:

CREATE TABLE datagen_source (
  id INT,
  name STRING,
  proctime AS PROCTIME()
) WITH (
  'connector' = 'datagen',
  'rows-per-second' = '1',
  'number-of-rows' = '2',
  'fields.id.kind' = 'sequence',
  'fields.id.start' = '1',
  'fields.id.end' = '2'
);

SELECT o.id, o.name, b.id AS id2
FROM datagen_source AS o
JOIN hudi_table /*+ OPTIONS('lookup.join.cache.ttl' = '2 day') */
  FOR SYSTEM_TIME AS OF o.proctime AS b
  ON o.id = b.id;

This works on basically the same principle as Hive's lookup join: the Hudi table data is cached in memory with a configurable TTL, and lookups are served from that cache.

Impact

Adds support for Flink lookup joins.

Risk level (write none, low medium or high below)

low

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@danny0405 danny0405 self-assigned this Jul 21, 2023
@waywtdcc
Contributor Author

@hudi-bot run azure

@danny0405
Contributor

Thanks for the contribution, @waywtdcc. Can you explain at a high level how the Hudi table is loaded and what the table's refresh strategy is?

@waywtdcc
Contributor Author

Flink's FileSystemLookupFunction is reused here.
First load: all of the table data is loaded into the memory of each task.
Subsequent updates: the in-memory data is refreshed at regular intervals.
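
The load-then-refresh strategy described above can be sketched roughly as follows. This is a minimal illustration, not the Hudi or Flink implementation: the class `TtlLookupCache`, its `loader` and `clock` parameters, and the `loadCount` helper are all hypothetical names, and it assumes the refresh is checked lazily on each lookup rather than by a background timer.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical sketch of a full-table lookup cache that reloads after a TTL. */
final class TtlLookupCache<K, V> {
    private final Supplier<Map<K, List<V>>> loader; // assumed: reads a full table snapshot
    private final long ttlMillis;                   // how long a loaded snapshot stays valid
    private final LongSupplier clock;               // injectable clock (milliseconds)

    private Map<K, List<V>> cache = new HashMap<>();
    private long nextLoadTime = Long.MIN_VALUE;     // forces a load on the first lookup
    private int loadCount = 0;

    TtlLookupCache(Supplier<Map<K, List<V>>> loader, Duration ttl, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttl.toMillis();
        this.clock = clock;
    }

    /** Returns all cached rows for the join key, reloading first if the TTL has expired. */
    List<V> lookup(K key) {
        reloadIfExpired();
        return cache.getOrDefault(key, Collections.emptyList());
    }

    /** Number of full loads performed so far. */
    int loadCount() {
        return loadCount;
    }

    private void reloadIfExpired() {
        long now = clock.getAsLong();
        if (now < nextLoadTime) {
            return; // the cached snapshot is still fresh
        }
        // Copy the snapshot so later changes to the source are not visible until the next reload.
        cache = new HashMap<>(loader.get());
        nextLoadTime = now + ttlMillis;
        loadCount++;
    }
}
```

In this sketch each parallel task would hold its own instance, so memory use grows with both table size and parallelism, and rows written to the table after a load only become visible to lookups once the TTL expires.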

@waywtdcc
Contributor Author

@hudi-bot run azure

@danny0405
Contributor

So you mean Flink itself takes care of refreshing the data?

@waywtdcc
Contributor Author

waywtdcc commented Aug 1, 2023

@danny0405 hello?

@github-actions github-actions bot added the size:M PR with lines of changes in (100, 300] label Feb 26, 2024
@danny0405
Contributor

@waywtdcc Hi, can you rebase onto the latest master? I will then take a look at this PR.

@danny0405 danny0405 force-pushed the support_flink_lookup_join branch from 2e76abc to 28351cb Compare May 13, 2024 05:01
@github-actions github-actions bot added size:L PR with lines of changes in (300, 1000] and removed size:M PR with lines of changes in (100, 300] labels May 13, 2024
@danny0405 danny0405 force-pushed the support_flink_lookup_join branch 3 times, most recently from 8d29905 to 5292c26 Compare May 14, 2024 10:33
@danny0405 danny0405 changed the title [HUDI-6563]Supports flink lookup join [HUDI-6563] Supports flink lookup join May 14, 2024
@danny0405 danny0405 force-pushed the support_flink_lookup_join branch 4 times, most recently from d1a5bab to ab86410 Compare May 15, 2024 09:36
@danny0405 danny0405 force-pushed the support_flink_lookup_join branch from ab86410 to 5f38864 Compare May 15, 2024 11:17
@hudi-bot
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@danny0405 danny0405 merged commit 761af87 into apache:master May 20, 2024
Labels

engine:flink Flink integration size:L PR with lines of changes in (300, 1000]

Projects

Status: ✅ Done

3 participants