Skip to content
This repository has been archived by the owner on Feb 20, 2023. It is now read-only.

Add support for aliases in GROUP BY. Add Postgres-style date_part. #1607

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

lmwnshn
Copy link
Contributor

@lmwnshn lmwnshn commented Jun 2, 2021

Heading

Add support for aliases in GROUP BY. Add Postgres-style date_part.

Description

Add support for aliases in GROUP BY. Add Postgres-style date_part.

Note that the Postgres parser rewrites EXTRACT(YEAR FROM some_timestamp)
as date_part("year", some_timestamp).

Note that the optimizer gives us a bad OutputSchema when the GROUP BY expression
is on an aggregated column of the table, so this does not fix chbenchmark. It is currently
unclear if this is primarily a problem with the binder, optimizer, or translator.

Reasons:

  • Translator: This is where the output schema kaboom happens.
  • Optimizer: This produced the output schema. Maybe it never accounted for select abs(a) as x ... group by x.
  • Binder: But maybe the optimizer would be fine if the binder wasn't swapping out expressions (quite literally, the group by x child becomes the expression abs(a) in the above example. This seems to work for ORDER BY fwiw.)

This is an example of the OutputSchema produced by the optimizer for the following query:

SELECT n_name,
       extract(YEAR
               FROM o_entry_d) AS l_year,
       sum(ol_amount) AS sum_profit
FROM item,
     stock,
     supplier,
     order_line,
     oorder,
     nation
WHERE ol_i_id = s_i_id
  AND ol_supply_w_id = s_w_id
  AND MOD ((s_w_id * s_i_id), 10000) = su_suppkey
  AND ol_w_id = o_w_id
  AND ol_d_id = o_d_id
  AND ol_o_id = o_id
  AND ol_i_id = i_id
  AND su_nationkey = n_nationkey
  AND i_data LIKE '%bb'
GROUP BY n_name,
         l_year
ORDER BY n_name,
         l_year DESC;
>>> pprint.pprint({"columns":[{"expr":{"alias":"","children":[],"depth":-1,"expression_name":"","expression_type":"VALUE_TUPLE","has_subquery":false,"return_value_type":6,"tuple_idx":1,"value_idx":0},"name":"sum_profit","type":6},{"expr":{"alias":"","children":[],"depth":-1,"expression_name":"","expression_type":"VALUE_TUPLE","has_subquery":false,"return_value_type":10,"tuple_idx":0,"value_idx":0},"name":"n_name","type":10}]})
{'columns': [{'expr': {'alias': '',
                       'children': [],
                       'depth': -1,
                       'expression_name': '',
                       'expression_type': 'VALUE_TUPLE',
                       'has_subquery': False,
                       'return_value_type': 6,
                       'tuple_idx': 1,
                       'value_idx': 0},
              'name': 'sum_profit',
              'type': 6},
             {'expr': {'alias': '',
                       'children': [],
                       'depth': -1,
                       'expression_name': '',
                       'expression_type': 'VALUE_TUPLE',
                       'has_subquery': False,
                       'return_value_type': 10,
                       'tuple_idx': 0,
                       'value_idx': 0},
              'name': 'n_name',
              'type': 10}]}

Further work

This is enough to merge on its own, seeing if it gets through CI.

Note that the Postgres parser rewrites EXTRACT(YEAR FROM some_timestamp)
as date_part("year", some_timestamp).

Note that the optimizer gives us a bad OutputSchema when the GROUP BY expression
is on a column of the table, so this does not fix chbenchmark. It is currently
unclear if this is primarily a problem with the binder, optimizer, or translator.
@lmwnshn lmwnshn added in-progress This PR is being actively worked on and not ready to be reviewed or merged. Mark PRs with this. ready-for-ci Indicate that this build should be run through CI. labels Jun 2, 2021
@lmwnshn lmwnshn self-assigned this Jun 2, 2021
@apavlo
Copy link
Member

apavlo commented Jun 3, 2021

@lmwnshn This is a hot PR!

@noisepage-checks
Copy link

Performance Boost!

Nice job! This PR has increased the throughput of the system.

Could not find any performance results to compare for this commit.

@lmwnshn lmwnshn added ready-for-review This PR passes all checks and is ready to be reviewed. Mark PRs with this. and removed in-progress This PR is being actively worked on and not ready to be reviewed or merged. Mark PRs with this. labels Jun 4, 2021
@noisepage-checks
Copy link

Performance Boost!

Nice job! This PR has increased the throughput of the system.

Could not find any performance results to compare for this commit.

@noisepage-checks
Copy link

Minor Decrease in Performance

Be warned: this PR may have decreased the throughput of the system slightly.

tps (%change) benchmark_type wal_device details
-0.94% tpcc RAM disk
Detailsmaster tps=22192.31, commit tps=21983.07, query_mode=extended, benchmark_type=tpcc, scale_factor=32.0000, terminals=32, client_time=60, weights={'Payment': 43, 'Delivery': 4, 'NewOrder': 45, 'StockLevel': 4, 'OrderStatus': 4}, wal_device=RAM disk, max_connection_threads=32
-2.05% tpcc None
Detailsmaster tps=29640.72, commit tps=29032.2, query_mode=extended, benchmark_type=tpcc, scale_factor=32.0000, terminals=32, client_time=60, weights={'Payment': 43, 'Delivery': 4, 'NewOrder': 45, 'StockLevel': 4, 'OrderStatus': 4}, wal_device=None, max_connection_threads=32
-0.63% tpcc HDD
Detailsmaster tps=21889.82, commit tps=21752.79, query_mode=extended, benchmark_type=tpcc, scale_factor=32.0000, terminals=32, client_time=60, weights={'Payment': 43, 'Delivery': 4, 'NewOrder': 45, 'StockLevel': 4, 'OrderStatus': 4}, wal_device=HDD, max_connection_threads=32
7.81% tatp RAM disk
Detailsmaster tps=6430.86, commit tps=6932.9, query_mode=extended, benchmark_type=tatp, scale_factor=1.0000, terminals=16, client_time=60, weights={'GetAccessData': 35, 'UpdateLocation': 14, 'GetNewDestination': 10, 'GetSubscriberData': 35, 'DeleteCallForwarding': 2, 'InsertCallForwarding': 2, 'UpdateSubscriberData': 2}, wal_device=RAM disk, max_connection_threads=32
-0.18% tatp None
Detailsmaster tps=7417.62, commit tps=7403.93, query_mode=extended, benchmark_type=tatp, scale_factor=1.0000, terminals=16, client_time=60, weights={'GetAccessData': 35, 'UpdateLocation': 14, 'GetNewDestination': 10, 'GetSubscriberData': 35, 'DeleteCallForwarding': 2, 'InsertCallForwarding': 2, 'UpdateSubscriberData': 2}, wal_device=None, max_connection_threads=32
6.65% tatp HDD
Detailsmaster tps=6302.21, commit tps=6721.46, query_mode=extended, benchmark_type=tatp, scale_factor=1.0000, terminals=16, client_time=60, weights={'GetAccessData': 35, 'UpdateLocation': 14, 'GetNewDestination': 10, 'GetSubscriberData': 35, 'DeleteCallForwarding': 2, 'InsertCallForwarding': 2, 'UpdateSubscriberData': 2}, wal_device=HDD, max_connection_threads=32

@noisepage-checks
Copy link

Minor Decrease in Performance

Be warned: this PR may have decreased the throughput of the system slightly.

tps (%change) benchmark_type wal_device details
1.21% tpcc RAM disk
Detailsmaster tps=22501.62, commit tps=22774.19, query_mode=extended, benchmark_type=tpcc, scale_factor=32.0000, terminals=32, client_time=60, weights={'Payment': 43, 'Delivery': 4, 'NewOrder': 45, 'StockLevel': 4, 'OrderStatus': 4}, wal_device=RAM disk, max_connection_threads=32
-0.33% tpcc None
Detailsmaster tps=28985.3, commit tps=28890.16, query_mode=extended, benchmark_type=tpcc, scale_factor=32.0000, terminals=32, client_time=60, weights={'Payment': 43, 'Delivery': 4, 'NewOrder': 45, 'StockLevel': 4, 'OrderStatus': 4}, wal_device=None, max_connection_threads=32
2.1% tpcc HDD
Detailsmaster tps=21689.03, commit tps=22144.32, query_mode=extended, benchmark_type=tpcc, scale_factor=32.0000, terminals=32, client_time=60, weights={'Payment': 43, 'Delivery': 4, 'NewOrder': 45, 'StockLevel': 4, 'OrderStatus': 4}, wal_device=HDD, max_connection_threads=32
8.72% tatp RAM disk
Detailsmaster tps=6387.57, commit tps=6944.71, query_mode=extended, benchmark_type=tatp, scale_factor=1.0000, terminals=16, client_time=60, weights={'GetAccessData': 35, 'UpdateLocation': 14, 'GetNewDestination': 10, 'GetSubscriberData': 35, 'DeleteCallForwarding': 2, 'InsertCallForwarding': 2, 'UpdateSubscriberData': 2}, wal_device=RAM disk, max_connection_threads=32
1.62% tatp None
Detailsmaster tps=7309.67, commit tps=7428.41, query_mode=extended, benchmark_type=tatp, scale_factor=1.0000, terminals=16, client_time=60, weights={'GetAccessData': 35, 'UpdateLocation': 14, 'GetNewDestination': 10, 'GetSubscriberData': 35, 'DeleteCallForwarding': 2, 'InsertCallForwarding': 2, 'UpdateSubscriberData': 2}, wal_device=None, max_connection_threads=32
9.38% tatp HDD
Detailsmaster tps=6168.58, commit tps=6747.12, query_mode=extended, benchmark_type=tatp, scale_factor=1.0000, terminals=16, client_time=60, weights={'GetAccessData': 35, 'UpdateLocation': 14, 'GetNewDestination': 10, 'GetSubscriberData': 35, 'DeleteCallForwarding': 2, 'InsertCallForwarding': 2, 'UpdateSubscriberData': 2}, wal_device=HDD, max_connection_threads=32

@codecov
Copy link

codecov bot commented Jun 8, 2021

Codecov Report

Merging #1607 (ef282f2) into master (39d598b) will decrease coverage by 0.04%.
The diff coverage is 64.70%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1607      +/-   ##
==========================================
- Coverage   80.73%   80.69%   -0.05%     
==========================================
  Files         751      751              
  Lines       52798    52873      +75     
==========================================
+ Hits        42627    42665      +38     
- Misses      10171    10208      +37     
Impacted Files Coverage Δ
src/execution/vm/vm.cpp 83.60% <0.00%> (-0.31%) ⬇️
src/include/binder/bind_node_visitor.h 100.00% <ø> (ø)
src/include/execution/ast/builtins.h 100.00% <ø> (ø)
src/include/execution/vm/bytecode_handlers.h 69.08% <0.00%> (-0.50%) ⬇️
src/include/execution/vm/bytecodes.h 96.42% <ø> (ø)
src/execution/sema/sema_builtin.cpp 61.12% <30.00%> (-0.33%) ⬇️
src/execution/vm/bytecode_generator.cpp 83.75% <52.38%> (-0.27%) ⬇️
src/binder/bind_node_visitor.cpp 84.72% <84.31%> (-0.52%) ⬇️
src/catalog/postgres/pg_proc_impl.cpp 96.84% <100.00%> (+0.04%) ⬆️
src/include/parser/select_statement.h 75.78% <100.00%> (ø)
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 39d598b...ef282f2. Read the comment docs.

@lmwnshn
Copy link
Contributor Author

lmwnshn commented Aug 10, 2021

@lmwnshn fix up poke matt

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
ready-for-ci Indicate that this build should be run through CI. ready-for-review This PR passes all checks and is ready to be reviewed. Mark PRs with this.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants