Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Answer Query Using Materialized Views #298

Conversation

avamingli
Copy link
Collaborator

@avamingli avamingli commented Nov 14, 2023

AQUMV for short, is used to compute part or all of a Query from materialized views during planning.
It could provide massive improvements in query processing time, especially for aggregation queries over large tables[1].

AQUMV usually uses Incremental Materialized Views(IMV) as candidates, as IMV usually have the up-to-date data when there are writable operations on related tables.

Example:

create table aqumv_t1(c1 int, c2 int, c3 int) distributed by (c1);
insert into aqumv_t1 select i, i+1, i+2 from generate_series(1, 100000000) i;
analyze aqumv_t1;
gpadmin=# explain(costs off)
select sqrt(abs(abs(c2) - c1 - 1) + abs(c2)) from aqumv_t1 where c1 > 30 and c1 < 40 and sqrt(abs(c2)) > 5.8;
                                                 QUERY PLAN

----------------------------------------------------------------------------------------------------
---------
 Gather Motion 3:1  (slice1; segments: 3)
   ->  Seq Scan on aqumv_t1
         Filter: ((c1 > 30) AND (c1 < 40) AND (sqrt((abs(c2))::double precision) > '5.8'::double pre
cision))
 Optimizer: Postgres query optimizer
(4 rows)

select sqrt(abs(abs(c2) - c1 - 1) + abs(c2)) from aqumv_t1 where c1 > 30 and c1 < 40 and sqrt(abs(c2)) > 5.8;
       sqrt
-------------------
                 6
 6.082762530298219
 6.244997998398398
 5.916079783099616
 6.164414002968976
 6.324555320336759
 5.830951894845301
(7 rows)

Time: 7384.329 ms (00:07.384)

Answer Query Using Materialized Views:

create materialized view mvt1 as
  select c1 as mc1, c2 as mc2, abs(c2) as mc3, abs(abs(c2) - c1 - 1) as mc4
  from aqumv_t1 where c1 > 30 and c1 < 40;
analyze mvt1;

set answer_query_using_materialized_views = on;
select sqrt(abs(abs(c2) - c1 - 1) + abs(c2)) from aqumv_t1 where c1 > 30 and c1 < 40 and sqrt(abs(c2)) > 5.8;
       sqrt
-------------------
                 6
 6.082762530298219
 6.244997998398398
 5.830951894845301
 5.916079783099616
 6.164414002968976
 6.324555320336759
(7 rows)

Time: 45.701 ms

gpadmin=# explain(verbose, costs off)
select sqrt(abs(abs(c2) - c1 - 1) + abs(c2)) from aqumv_t1 where c1 > 30 and c1 < 40 and sqrt(abs(c2)) > 5.8;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)
   Output: (sqrt(((mc4 + mc3))::double precision))
   ->  Seq Scan on public.mvt1
         Output: sqrt(((mc4 + mc3))::double precision)
         Filter: (sqrt((mvt1.mc3)::double precision) > '5.8'::double precision)
 Settings: answer_query_using_materialized_views = 'on', optimizer = 'off'
 Optimizer: Postgres query optimizer
(7 rows)

This perfect example shows AQUMV's magic,
The mv has the quals : c1 > 30 and c1 < 40 from table aqumv_t1, and the rows we want to query c1 > 30 and c1 < 40 and sqrt(abs(c2)) > 5.8.

It means that all rows in mv meet the requirement c1 > 30 and c1 < 40 and mvt1 has column mc3 corresponding to abs(c2) in aqumv_t1, so that adding a qual sqrt(mc3) > 5.8 to mvt1 will filter all rows we want.

And the target list we want is sqrt(abs(abs(c2) - c1 - 1) + abs(c2)) while mvt1 has the column : abs(abs(c2) - c1 - 1) AS mc4, abs(c2) AS mc3, we could compute the target expression from mvt1:
sqrt(mc4 + mc3).

And the query :

select sqrt(abs(abs(c2) - c1 - 1) + abs(c2)) from aqumv_t1 where c1 > 30 and c1 < 40 and sqrt(abs(c2)) > 5.8;

could be rewritten to:

select sqrt(mc4 + mc3) from mvt1 where sqrt(mc3) > 5.8;

This example shows AQUMV has significant improvements Time: 7384.329 ms (00:07.384) -> Time: 45.701 ms, not only the rows are reduced to the corresponding results, but also expressions cloud be eliminated for each row.

See more in README.cbdb.aqumv and the codes and reference[1].
And internal talk in feishu(Chinese): https://hashdata.feishu.cn/minutes/obcn419j6v19e47snk2pfj6e
slide: https://hashdata.feishu.cn/file/GeMBbVRMNowL7Cxh52acWJD6n0e

AQUMV is actually a Equivalent Transformation on Query tree.
A materialized view(MV) could be use to compute a Query if:

  1. The view contains all rows needed by the query expression.
    If MV has more rows than query wants, additional filter may be addedif possible.
  2. All output expressions can be computed from the output of the view.
    The output expressions could be fully or partially matched from MV's target list.
  3. Cost-based.
    There may be multiple valid MV candidates, or select from MV is not better than select from origin table(ex: has an index and etc), let planner decide the best one.

Construct rows by splitting MV query quals(mv_query_quals) and Query quals (origin_query_quals) to difference set and intersection set. And post_quals formed by:{origin_query_quals - mv_query_quals} will be processed by MV query's target list, and rewritten to MV relation's target list expressions.

Construct columns expressions using a Greedy Algorithm. Sort the MV query's target list by complexity, and try to rewrite expressions by that order.
Expressions that have no Vars are kept to upper(Const Expressions) or be rewritten if there were corresponding expressions.

This pr is a start of AQUMV, for MVP0:
Only support SELECT FROM a single relation both for mv_query and the origin_query.
Below are not supported now:

  • AGG
  • Subquery
  • Order by(for origin_query)
  • Join
  • Sublink
  • Group by
  • Window Functions
  • CTE
  • Distinct On
  • Refresh Materialized View
  • Create AS

Reference:
[1] Optimizing Queries Using Materialized Views: A Practical,
Scalable Solution.
https://courses.cs.washington.edu/courses/cse591d/01sp/opt_views.pdf

Authored-by: Zhang Mingli avamingli@gmail.com

fix #ISSUE_Number


Change logs

Describe your change clearly, including what problem is being solved or what feature is being added.

If it has some breaking backward or forward compatibility, please clary.

Why are the changes needed?

Describe why the changes are necessary.

Does this PR introduce any user-facing change?

If yes, please clarify the previous behavior and the change this PR proposes.

How was this patch tested?

Please detail how the changes were tested, including manual tests and any relevant unit or integration tests.

Contributor's Checklist

Here are some reminders and checklists before/when submitting your pull request, please check them:

  • Make sure your Pull Request has a clear title and commit message. You can take git-commit template as a reference.
  • Sign the Contributor License Agreement as prompted for your first-time contribution(One-time setup).
  • Learn the coding contribution guide, including our code conventions, workflow and more.
  • List your communication in the GitHub Issues or Discussions (if has or needed).
  • Document changes.
  • Add tests for the change
  • Pass make installcheck
  • Pass make -C src/test installcheck-cbdb-parallel
  • Feel free to request cloudberrydb/dev team for review and approval when your PR is ready🥳

@avamingli avamingli self-assigned this Nov 14, 2023
@avamingli
Copy link
Collaborator Author

FYI: This pr is based on Incremental Materialized Views(IMV), should wait for #280 merged first, and a FIXME:
/* AQUMV_FIXME_MVP: TOOD use Incremental Materialized View. */ resolved.

@avamingli avamingli force-pushed the answer_query_using_materialized_views_mvp0 branch from 0e71814 to a7f459e Compare November 14, 2023 02:55
src/backend/optimizer/plan/planner.c Outdated Show resolved Hide resolved
src/test/regress/sql/aqumv.sql Show resolved Hide resolved
@yjhjstz
Copy link
Collaborator

yjhjstz commented Nov 17, 2023

Datalog program:

Q(T,Y,D) :− Movie(I,T,Y,G),Y ≥1950,G ="comedy"
                   Director(I,D),Actor(I,D)

V2 (I,T,Y) :− Movie(I,T,Y,G),Y ≥1950,G ="comedy"

V3(I,D) :−Director(I,D),Actor(ID,D)

V2 and V3 are useful for answering Q.
Q'(T,Y,D) :− V2(I,T,Y), V3(I,D)
How do we express this case ?

@avamingli
Copy link
Collaborator Author

Datalog program:

Q(T,Y,D) :− Movie(I,T,Y,G),Y ≥1950,G ="comedy"
                   Director(I,D),Actor(I,D)

V2 (I,T,Y) :− Movie(I,T,Y,G),Y ≥1950,G ="comedy"

V3(I,D) :−Director(I,D),Actor(ID,D)

V2 and V3 are useful for answering Q. Q'(T,Y,D) :− V2(I,T,Y), V3(I,D) How do we express this case ?

We have an offline discussion, it's not related to this MVP0, may be considered in the future.
MVP0 has the ability to process multiple mv candidates but only for one relation, see more in READEME.cbdb.aqumv

@avamingli avamingli force-pushed the answer_query_using_materialized_views_mvp0 branch from a7f459e to 2e06466 Compare November 19, 2023 13:40
src/backend/optimizer/plan/planner.c Show resolved Hide resolved
src/backend/optimizer/README.cbdb.aqumv Show resolved Hide resolved
src/backend/optimizer/plan/planner.c Show resolved Hide resolved
src/backend/optimizer/plan/planner.c Outdated Show resolved Hide resolved
src/backend/optimizer/plan/planner.c Show resolved Hide resolved
src/backend/optimizer/plan/planner.c Show resolved Hide resolved
src/backend/optimizer/plan/planner.c Outdated Show resolved Hide resolved
src/backend/optimizer/plan/planner.c Show resolved Hide resolved
src/backend/optimizer/plan/planner.c Outdated Show resolved Hide resolved
src/backend/optimizer/plan/planner.c Show resolved Hide resolved
@avamingli avamingli force-pushed the answer_query_using_materialized_views_mvp0 branch from 2e06466 to b584b5f Compare November 23, 2023 07:05
@avamingli avamingli force-pushed the answer_query_using_materialized_views_mvp0 branch from b584b5f to e977d0e Compare November 24, 2023 05:58
@avamingli avamingli force-pushed the answer_query_using_materialized_views_mvp0 branch from e977d0e to 4078ac3 Compare November 24, 2023 07:33
@avamingli avamingli force-pushed the answer_query_using_materialized_views_mvp0 branch from 4078ac3 to 9c2951c Compare December 1, 2023 03:01
AQUMV for short, is used to compute part or all of a Query from
materialized views during planning.
It could provide massive improvements in query processing time,
especially for aggregation queries over large tables[1].

AQUMV usually uses Incremental Materialized Views(IMV) as candidates,
as IMV usually have real time data when there are writable operations
on related tables.

AQUMV is actually a Equivalent Transformation on Query tree.
A materialized view(MV) could be use to compute a Query if:
1.The view contains all rows needed by the query expression.
  If MV has more rows than query wants, additional filter may be added
  if possible.
2.All output expressions can be computed from the output of the view.
  The output expressions could be fully or partially matched from MV's
  target list.
3.Cost-based.
  There may be multiple valid MV candidates, or select from MV is not
  better than select from origin table(ex: has an index and etc),
  let planner decide the best one.

Construct rows by splitting MV query quals(mv_query_quals) and Query quals
(origin_query_quals) to difference set and intersection set.
And post_quals formed by:{origin_query_quals - mv_query_quals} will be
processed by MV query's target list, and rewritten to MV relation's
target list expressions.

Construct columns expressions using a Greedy Algorithm.
Sort the MV query's target list by complexity, and try to rewrite expressions
by that order.
Expressions that have no Vars are kept to upper(Const Expressions) or be
rewritten if there were corresponding expressions.

Reference:
   [1] Optimizing Queries Using Materialized Views: A Practical,
Scalable Solution.
   https://courses.cs.washington.edu/courses/cse591d/01sp/opt_views.pdf

Authored-by: Zhang Mingli avamingli@gmail.com
@avamingli avamingli force-pushed the answer_query_using_materialized_views_mvp0 branch from 9c2951c to 7522c1c Compare December 1, 2023 03:50
@yjhjstz
Copy link
Collaborator

yjhjstz commented Dec 1, 2023

LGTM

@my-ship-it my-ship-it merged commit 2ffc7e6 into cloudberrydb:main Dec 1, 2023
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants