@crossoverJie
Contributor

@crossoverJie crossoverJie commented Sep 10, 2024

Why I'm doing:

Create three tables (two partitioned tables and one non-partitioned table):

CREATE TABLE IF NOT EXISTS test.par_tbl1
(
    datekey DATETIME,
    k1      INT,
    item_id STRING,
    v2      INT
)
PRIMARY KEY (`datekey`, `k1`)
PARTITION BY date_trunc('day', `datekey`);

CREATE TABLE IF NOT EXISTS test.par_tbl2
(
    datekey DATETIME,
    k1      INT,
    item_id STRING,
    v2      INT
)
PRIMARY KEY (`datekey`, `k1`)
PARTITION BY date_trunc('day', `datekey`);

CREATE TABLE IF NOT EXISTS test.par_tbl3
(
    datekey DATETIME,
    k1      INT,
    item_id STRING,
    v2      INT
)
PRIMARY KEY (`datekey`, `k1`);

Create the materialized view:

CREATE MATERIALIZED VIEW test.mv_test
REFRESH ASYNC
PARTITION BY a_time
PROPERTIES (
    "excluded_trigger_tables" = "par_tbl3"
)
AS
SELECT date_trunc("day", a.datekey) AS a_time,
       date_trunc("day", b.datekey) AS b_time,
       date_trunc("day", c.datekey) AS c_time
FROM test.par_tbl1 a
         LEFT JOIN test.par_tbl2 b ON a.datekey = b.datekey AND a.k1 = b.k1
         LEFT JOIN test.par_tbl3 c ON a.k1 = c.k1;

Of these three tables, par_tbl1 and par_tbl2 are partitioned tables and par_tbl3 is a non-partitioned table.

UPDATE `par_tbl1` SET `v2` = 2 WHERE `datekey` = '2024-08-05 01:00:00' AND `k1` = 3;
UPDATE `par_tbl3` SET `item_id` = '3' WHERE `datekey` = '2024-10-01 01:00:00' AND `k1` = 3;

When I update both a partitioned table and the non-partitioned table (par_tbl1 and par_tbl3), all data in par_tbl1 and par_tbl2 is refreshed.

The expected result is that only the modified data in par_tbl1 is refreshed into the materialized view.

Use this SQL to view the task execution records.

SELECT * FROM information_schema.task_runs order by create_time desc;

What I'm doing:

Therefore, I have added a new property, excluded_refresh_base_tables, to the materialized view.
With it, even if a partitioned table and the non-partitioned table are updated at the same time, only the updated data in the partitioned table is refreshed.

For example:

CREATE MATERIALIZED VIEW test.mv_test
REFRESH ASYNC
PARTITION BY a_time
PROPERTIES (
    "excluded_trigger_tables" = "par_tbl3",
    "excluded_refresh_base_tables" = "par_tbl3"
)
AS
SELECT date_trunc("day", a.datekey) AS a_time,
       date_trunc("day", b.datekey) AS b_time,
       date_trunc("day", c.datekey) AS c_time
FROM test.par_tbl1 a
         LEFT JOIN test.par_tbl2 b ON a.datekey = b.datekey AND a.k1 = b.k1
         LEFT JOIN test.par_tbl3 c ON a.k1 = c.k1;
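
To make the property value more concrete, here is a minimal sketch of how a comma-separated table list like the one above might be normalized. This is not the code from this PR: the class ExcludedTablesProperty, the method parse, and the rule that a bare table name falls back to a default database are all assumptions made for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not StarRocks code: normalizes a property value such as
// "par_tbl3" or "test.par_tbl3, par_tbl2" into a set of "db.table" names.
final class ExcludedTablesProperty {
    private ExcludedTablesProperty() {
    }

    // Assumption: entries are comma-separated and a bare table name
    // belongs to defaultDb.
    static Set<String> parse(String value, String defaultDb) {
        Set<String> result = new LinkedHashSet<>();
        if (value == null || value.trim().isEmpty()) {
            return result;
        }
        for (String raw : value.split(",")) {
            String name = raw.trim();
            if (name.isEmpty()) {
                continue; // tolerate stray commas
            }
            result.add(name.contains(".") ? name : defaultDb + "." + name);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [test.par_tbl3]
        System.out.println(parse("par_tbl3", "test"));
    }
}
```

Normalizing to a single "db.table" form up front would keep later membership checks to a plain set lookup.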

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.3
    • 3.2
    • 3.1
    • 3.0
    • 2.5

Signed-off-by: crossoverJie <crossoverJie@gmail.com>
Signed-off-by: crossoverJie <crossoverJie@gmail.com>
@crossoverJie crossoverJie requested a review from a team as a code owner September 10, 2024 10:52
@github-actions github-actions bot added the 3.3 label Sep 10, 2024
@crossoverJie
Contributor Author

@LiShuMing Please take a look, thanks.

@LiShuMing
Contributor

LiShuMing commented Sep 11, 2024

Thanks for your contributions.

Can you clarify the exact difference between excluded_refresh_base_tables and excluded_trigger_tables?

Could we fix the problem above within the excluded_trigger_tables option instead?

@crossoverJie
Contributor Author

Thanks for your reply.

Currently, excluded_trigger_tables only prevents a refresh task from being created for the materialized view when par_tbl3 (the non-partitioned table) changes.

Even with this setting, a change to one of the partitioned tables (par_tbl1 or par_tbl2) still triggers a refresh, and that refresh covers all the data in the partitioned tables, whereas we expect it to cover only the data that was modified in the partitioned tables.

UPDATE `par_tbl1` SET `v2` = 2 WHERE `datekey` = '2024-08-05 01:00:00' AND `k1` = 3;
UPDATE `par_tbl3` SET `item_id` = '3' WHERE `datekey` = '2024-10-01 01:00:00' AND `k1` = 3;

Expected: only the following data is refreshed:

select * from `par_tbl1` WHERE `datekey` = '2024-08-05 01:00:00' AND `k1` = 3;

The trigger code:

if (materializedView.shouldTriggeredRefreshBy(db.getFullName(), table.getName())) {
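
The effect of that check can be sketched as follows. This is a simplified stand-in, not the body of shouldTriggeredRefreshBy: the class TriggerSketch, the method shouldTriggerRefresh, and the matching on either a bare table name or a "db.table" name are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a trigger-time check: a change to an excluded table
// creates no refresh task, while a change to any other base table does.
final class TriggerSketch {
    private TriggerSketch() {
    }

    static boolean shouldTriggerRefresh(Set<String> excludedTriggerTables, String db, String table) {
        // Assumption: an entry may be written as "table" or as "db.table".
        return !excludedTriggerTables.contains(table)
                && !excludedTriggerTables.contains(db + "." + table);
    }

    public static void main(String[] args) {
        Set<String> excluded = new HashSet<>(Arrays.asList("par_tbl3"));
        System.out.println(shouldTriggerRefresh(excluded, "test", "par_tbl3")); // prints false
        System.out.println(shouldTriggerRefresh(excluded, "test", "par_tbl1")); // prints true
    }
}
```

Note that this check only decides whether a task is created. It says nothing about how much data the task refreshes once it runs, which is the gap described above.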


excluded_refresh_base_tables, by contrast, is evaluated while the refresh task runs to decide whether all data needs to be refreshed, so that only the updated data is refreshed into the materialized view.

The key code:

boolean isRefreshMvBaseOnNonRefTables = needsRefreshBasedOnNonRefTables(snapshotBaseTables);
Set<String> mvRangePartitionNames = getMVPartitionNamesWithTTL(mv, start, end, partitionTTLNumber, isAutoRefresh);
LOG.info("Get partition names by range with partition limit, start: {}, end: {}, force:{}, " +
        "partitionTTLNumber: {}, isAutoRefresh: {}, mvRangePartitionNames: {}, isRefreshMvBaseOnNonRefTables:{}",
        start, end, force, partitionTTLNumber, isAutoRefresh, mvRangePartitionNames, isRefreshMvBaseOnNonRefTables);
// check non-ref base tables or force refresh
if (force || isRefreshMvBaseOnNonRefTables) {
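
The decision made in this snippet can be sketched in isolation. This is an illustrative model under stated assumptions, not the PR's implementation: the class RefreshScopeSketch, both method names, the partition names, and the choice of par_tbl1 as the only partition-aligned ("ref") base table are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the refresh-time decision.
final class RefreshScopeSketch {
    private RefreshScopeSketch() {
    }

    // A changed base table forces a full refresh only if it is not
    // partition-aligned with the MV and is not excluded from refresh.
    static boolean needsFullRefresh(Set<String> changedTables, Set<String> refTables,
                                    Set<String> excludedRefreshTables) {
        for (String table : changedTables) {
            if (refTables.contains(table) || excludedRefreshTables.contains(table)) {
                continue;
            }
            return true;
        }
        return false;
    }

    // Mirrors the shape of `if (force || isRefreshMvBaseOnNonRefTables)`:
    // refresh everything, or only the partitions whose source data changed.
    static Set<String> partitionsToRefresh(boolean force, boolean fullRefresh,
                                           Set<String> allMvPartitions, Set<String> changedPartitions) {
        if (force || fullRefresh) {
            return new TreeSet<>(allMvPartitions);
        }
        Set<String> result = new TreeSet<>(changedPartitions);
        result.retainAll(allMvPartitions);
        return result;
    }

    public static void main(String[] args) {
        Set<String> changed = new HashSet<>(Arrays.asList("par_tbl1", "par_tbl3"));
        Set<String> ref = new HashSet<>(Arrays.asList("par_tbl1"));
        Set<String> excluded = new HashSet<>(Arrays.asList("par_tbl3"));
        Set<String> all = new HashSet<>(Arrays.asList("p20240804", "p20240805"));
        Set<String> touched = new HashSet<>(Arrays.asList("p20240805"));

        boolean full = needsFullRefresh(changed, ref, excluded);
        System.out.println(full); // prints false
        System.out.println(partitionsToRefresh(false, full, all, touched)); // prints [p20240805]
    }
}
```

With an empty exclusion set, the same inputs make needsFullRefresh return true and every partition is selected, which corresponds to the behavior reported in the PR description.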

Signed-off-by: crossoverJie <crossoverJie@gmail.com>
Signed-off-by: crossoverJie <crossoverJie@gmail.com>
LiShuMing previously approved these changes Oct 14, 2024
Signed-off-by: crossoverJie <crossoverJie@gmail.com>
LiShuMing previously approved these changes Oct 15, 2024
Signed-off-by: crossoverJie <crossoverJie@gmail.com>
@LiShuMing LiShuMing changed the title from "[Enhancement] Add properties excluded_refresh_base_tables for MaterializedView" to "[Enhancement] Add properties excluded_refresh_tables for MaterializedView" Oct 15, 2024
LiShuMing previously approved these changes Oct 15, 2024
Signed-off-by: crossoverJie <crossoverJie@gmail.com>
@github-actions
Contributor

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

@github-actions
Contributor

[FE Incremental Coverage Report]

pass : 46 / 56 (82.14%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|
| 🔵 com/starrocks/alter/AlterMVJobExecutor.java | 3 | 9 | 33.33% | [120, 125, 251, 252, 253, 254] |
| 🔵 com/starrocks/catalog/MaterializedView.java | 11 | 14 | 78.57% | [1194, 1195, 1201] |
| 🔵 com/starrocks/common/util/PropertyAnalyzer.java | 21 | 22 | 95.45% | [510] |
| 🔵 com/starrocks/catalog/MvRefreshArbiter.java | 1 | 1 | 100.00% | [] |
| 🔵 com/starrocks/catalog/TableProperty.java | 10 | 10 | 100.00% | [] |

@github-actions
Contributor

[BE Incremental Coverage Report]

pass : 0 / 0 (0%)

@LiShuMing LiShuMing merged commit 58fc586 into StarRocks:main Oct 17, 2024
@github-actions
Contributor

@Mergifyio backport branch-3.3

@github-actions github-actions bot removed the 3.3 label Oct 17, 2024
@mergify
Contributor

mergify bot commented Oct 17, 2024

backport branch-3.3

✅ Backports have been created

@LiShuMing
Contributor

Merged. Thanks for your contribution.

mergify bot pushed a commit that referenced this pull request Oct 17, 2024
…View (#50926)

Signed-off-by: crossoverJie <crossoverJie@gmail.com>
(cherry picked from commit 58fc586)
dirtysalt pushed a commit to dirtysalt/starrocks that referenced this pull request Oct 18, 2024
…View (StarRocks#50926)

Signed-off-by: crossoverJie <crossoverJie@gmail.com>
wanpengfei-git pushed a commit that referenced this pull request Oct 19, 2024
…View (backport #50926) (#52037)

Co-authored-by: crossoverJie <crossoverJie@gmail.com>
ZiheLiu pushed a commit to ZiheLiu/starrocks that referenced this pull request Oct 31, 2024
…View (StarRocks#50926)

Signed-off-by: crossoverJie <crossoverJie@gmail.com>
renzhimin7 pushed a commit to renzhimin7/starrocks that referenced this pull request Nov 7, 2024
…View (StarRocks#50926)

Signed-off-by: crossoverJie <crossoverJie@gmail.com>
Signed-off-by: zhiminr.ren <1240388654@qq.com>
@bulolo

bulolo commented Dec 31, 2024

What exactly does this change? It does not seem to take effect in my testing.

https://junyao.tech/posts/ee5c2ad.html

@crossoverJie
Contributor Author

@bulolo It now also supports base tables that are materialized views; see #56428.
