HIVE-24854: Incremental MV refresh in presence of update/delete operations #2119

Merged: 31 commits, merged Apr 14, 2021

@kasakrisz (Contributor) commented Mar 25, 2021

What changes were proposed in this pull request?

Implement incremental materialized view rebuild for the case where one or more source tables have had update/delete operations since the last rebuild.

The following steps, which precede generating the incremental rebuild plan, are already implemented:

  1. Load the materialization from the Registry/Metastore and update it with invalidation info.
  2. Create the MV rebuild plan using a Union:
    2.1. a branch that scans the existing view
    2.2. a branch that executes the view definition query but filters out source table rows that already exist in the view.
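
To illustrate the Union plan from step 2, here is a minimal Java sketch that models tables as in-memory lists. It assumes a single write ID high watermark (lastRebuildWriteId) as a simplified stand-in for the transaction snapshot recorded at the last rebuild; SourceRow, rebuild and the sample data are hypothetical and are not Hive APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical model of the existing (insert-only) Union rebuild plan.
// SourceRow and lastRebuildWriteId are illustrative names, not Hive classes.
public class UnionRebuildSketch {

    // A source table row tagged with the write ID of the transaction that wrote it.
    record SourceRow(int id, String value, long writeId) {}

    // Branch 2.1 returns the existing view rows as-is; branch 2.2 re-runs the view
    // definition (filter + projection) only over rows written after the last rebuild.
    static List<String> rebuild(List<String> existingView,
                                List<SourceRow> source,
                                long lastRebuildWriteId,
                                Predicate<SourceRow> viewFilter,
                                Function<SourceRow, String> viewProjection) {
        List<String> result = new ArrayList<>(existingView);      // 2.1: scan the view
        for (SourceRow row : source) {
            // Simplification: a row is "already in the view" if it was written
            // at or before the last rebuild's high watermark.
            boolean alreadyInView = row.writeId() <= lastRebuildWriteId;
            if (!alreadyInView && viewFilter.test(row)) {          // 2.2: new rows only
                result.add(viewProjection.apply(row));
            }
        }
        return result;                                              // union of both branches
    }

    public static void main(String[] args) {
        List<SourceRow> source = List.of(
            new SourceRow(1, "a", 5), new SourceRow(2, "b", 5),   // present at last rebuild
            new SourceRow(3, "c", 7), new SourceRow(4, "d", 8));  // inserted afterwards
        List<String> view = List.of("1:a", "2:b");
        List<String> rebuilt = rebuild(view, source, 5,
            r -> r.id() != 4, r -> r.id() + ":" + r.value());
        System.out.println(rebuilt); // [1:a, 2:b, 3:c]
    }
}
```

Because branch 2.2 can only add rows, this plan has no way to remove view rows whose source rows were later deleted, which is the gap this PR addresses.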

Prerequisites:

  • The source tables have not been compacted since the last MV rebuild.
  • At least one of the source tables has had update/delete operations since the last rebuild.
  • The view definition does not contain aggregation (this will be tackled by a follow-up patch).
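
The prerequisites can be read as a decision rule. The following is a hypothetical Java sketch of such a rule; SourceTableState, RebuildMode and chooseMode are illustrative names, and both the fallback to a full rebuild and applying the compaction check to the insert-only path are assumptions of the sketch, not behavior stated in this description.

```java
import java.util.List;

// Hypothetical decision rule derived from the prerequisites; not Hive code.
public class IncrementalRebuildCheck {

    // Illustrative per-source-table state since the last MV rebuild.
    record SourceTableState(String name,
                            boolean compactedSinceLastRebuild,
                            boolean hasUpdateOrDeleteSinceLastRebuild) {}

    enum RebuildMode { FULL, INCREMENTAL_INSERT_ONLY, INCREMENTAL_WITH_DELETES }

    static RebuildMode chooseMode(List<SourceTableState> sources, boolean viewHasAggregation) {
        // Sketch assumption: any compaction since the last rebuild rules out incremental modes.
        if (sources.stream().anyMatch(SourceTableState::compactedSinceLastRebuild)) {
            return RebuildMode.FULL;
        }
        // No update/delete anywhere: the existing Union-based rebuild applies.
        if (sources.stream().noneMatch(SourceTableState::hasUpdateOrDeleteSinceLastRebuild)) {
            return RebuildMode.INCREMENTAL_INSERT_ONLY;
        }
        // Aggregating views are left to a follow-up patch, so fall back to a full rebuild.
        return viewHasAggregation ? RebuildMode.FULL : RebuildMode.INCREMENTAL_WITH_DELETES;
    }

    public static void main(String[] args) {
        List<SourceTableState> sources = List.of(
            new SourceTableState("t1", false, true),
            new SourceTableState("t2", false, false));
        System.out.println(chooseMode(sources, false)); // INCREMENTAL_WITH_DELETES
        System.out.println(chooseMode(sources, true));  // FULL
    }
}
```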

Basic workflow:

  1. Rewrite the plan containing the Union into an incremental rebuild plan: replace the Union operator with a right outer join whose left input is the MV scan and whose right input queries the delta rows (HiveJoinIncrementalRewritingRule).
  2. Apply the pre-join-ordering transforms (applyPreJoinOrderingTransforms).
  3. Propagate the boolean column rowIsDeleted and enable fetching deleted rows in the HiveTableScan operators of the right outer join's right branch, i.e. the branch that queries the delta rows since the last MV rebuild (HiveRowIsDeletedPropagatorRule).
  4. Continue with CBO and AST conversion.
  5. Rewrite the new AST into a multi-insert statement (CalcitePlanner.fixUpASTJoinIncrementalRebuild()):
    5.1. Query the delta rows.
    5.2. The first insert writes into the MV delete delta; filter: rowIsDeleted.
    5.3. The second insert writes into the MV delta; filter: NOT rowIsDeleted.
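
The workflow above can be simulated in memory. The Java sketch below is a hypothetical model, not Hive code: the MV is a map from a row ID (standing in for the ACID row identifier) to a view row, delta rows carry the rowIsDeleted flag, the join matches on the whole view row, and view rows are assumed to be unique so that each deleted delta row matches at most one MV row. The sample data models an update as a delete of the old row plus an insert of the new one, which is how Hive ACID tables record updates.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of incremental rebuild with deletes; not Hive code.
public class DeleteAwareRebuildSketch {

    // One delta row from the view definition query; rowIsDeleted mirrors the propagated column.
    record DeltaRow(String value, boolean rowIsDeleted) {}

    // Output of the right outer join: mvRowId is null when no MV row matched.
    record JoinedRow(Long mvRowId, String value, boolean rowIsDeleted) {}

    // Step 1: MV scan (left) RIGHT OUTER JOIN delta rows (right), matching on the view row.
    static List<JoinedRow> rightOuterJoin(Map<Long, String> mv, List<DeltaRow> delta) {
        List<JoinedRow> out = new ArrayList<>();
        for (DeltaRow d : delta) {
            Long match = null;  // stays null for unmatched right-side rows
            for (Map.Entry<Long, String> e : mv.entrySet()) {
                if (e.getValue().equals(d.value())) {
                    match = e.getKey();
                    break;
                }
            }
            out.add(new JoinedRow(match, d.value(), d.rowIsDeleted()));
        }
        return out;
    }

    // Step 5: one pass over the joined rows feeds both inserts of the multi-insert.
    static Map<Long, String> applyMultiInsert(Map<Long, String> mv, List<JoinedRow> joined) {
        Map<Long, String> result = new LinkedHashMap<>(mv);
        long nextId = mv.keySet().stream().mapToLong(Long::longValue).max().orElse(0L) + 1;
        for (JoinedRow j : joined) {
            if (j.rowIsDeleted()) {
                // 5.2: the matched MV row ID goes to the delete delta.
                if (j.mvRowId() != null) {
                    result.remove(j.mvRowId());
                }
            } else {
                // 5.3: new view rows go to the insert delta with fresh row IDs.
                result.put(nextId++, j.value());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Long, String> mv = new LinkedHashMap<>();
        mv.put(1L, "a");
        mv.put(2L, "b");
        mv.put(3L, "c");
        // Update b -> b2 arrives as delete(b) + insert(b2); c is deleted; d is inserted.
        List<DeltaRow> delta = List.of(
            new DeltaRow("b", true), new DeltaRow("b2", false),
            new DeltaRow("c", true), new DeltaRow("d", false));
        Map<Long, String> rebuilt = applyMultiInsert(mv, rightOuterJoin(mv, delta));
        System.out.println(rebuilt); // {1=a, 4=b2, 5=d}
    }
}
```

The right outer join keeps every delta row, so inserted rows survive with a null MV row ID while deleted rows pick up the ID of the MV row they must remove; a single scan of the joined rows can then serve both inserts.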

Why are the changes needed?

This extends the existing incremental rebuild to cases where source tables have had update/delete operations since the last rebuild.

Does this PR introduce any user-facing change?

Yes. The materialized view rebuild plan may change; this is visible when the user executes commands such as EXPLAIN [CBO] ALTER MATERIALIZED VIEW <mv name> REBUILD;

How was this patch tested?

mvn test -Dtest.output.overwrite -DskipSparkTests -Dtest=TestMiniLlapLocalCliDriver -Dqfile=materialized_view_create_rewrite_5.q,materialized_view_create_rewrite_4.q,materialized_view_create_rewrite_dummy.q,materialized_view_create_rewrite_3.q -pl itests/qtest -Pitests

@kasakrisz marked this pull request as draft March 25, 2021 06:54
@kasakrisz changed the title from "HIVE-24854: Incremental MV refresh in presence of update/delete operations" to "[Draft] HIVE-24854: Incremental MV refresh in presence of update/delete operations" on Mar 25, 2021
@kasakrisz force-pushed the HIVE-24854-master-mv-inc-delete branch from 20ebfdd to a6b5f9b on March 25, 2021 20:21
@kasakrisz force-pushed the HIVE-24854-master-mv-inc-delete branch from a6b5f9b to 6574400 on March 26, 2021 06:51
@kasakrisz force-pushed the HIVE-24854-master-mv-inc-delete branch from 6574400 to e7eb3ff on March 28, 2021 12:48