merge alts & perf improvements #2378

Merged
merged 13 commits into from
Nov 9, 2021
Conversation

OleksiiKovalov
Contributor

@OleksiiKovalov OleksiiKovalov commented Nov 1, 2021

What does this PR do?
Many recursion -> Kleene operator conversions for lists
a_expr rewritten in "old style" to minimize ambiguities
merged alts for some rules
Why?
the grammar now looks uniform with other grammars
How is it checked?
Tested using unit tests over the test scripts from the original PostgreSQL repository

Pro

  1. Parse time improvements, ~10%
  2. Fewer ambiguities when parsing expressions -> less memory consumption/better performance

Contra

  1. The parse tree changes heavily for a_expr
  2. The a_expr tree is much deeper, even for simple expressions
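
As a rough illustration of the recursion-to-Kleene conversion (not code from this PR), the sketch below recognizes a comma-separated list in two ways: with a recursive rule, `list : ELEM | ELEM COMMA list`, and with the loop form, `list : ELEM ( COMMA ELEM ) *`. The token names and helper functions are hypothetical, and right recursion is used only to keep the hand-written parser simple. Both accept the same inputs, but the loop form builds one flat node instead of one nested node per element.

```python
from typing import List, Optional, Tuple

Tokens = List[str]
Result = Optional[Tuple[list, int]]  # (tree, next position) or None on failure


def parse_recursive(tokens: Tokens, pos: int = 0) -> Result:
    """list : ELEM | ELEM COMMA list   (one nested node per element)."""
    if pos >= len(tokens) or tokens[pos] != "ELEM":
        return None
    if pos + 1 < len(tokens) and tokens[pos + 1] == "COMMA":
        rest = parse_recursive(tokens, pos + 2)
        if rest is None:
            return None
        subtree, end = rest
        return ["list", "ELEM", "COMMA", subtree], end
    return ["list", "ELEM"], pos + 1


def parse_kleene(tokens: Tokens, pos: int = 0) -> Result:
    """list : ELEM ( COMMA ELEM ) *   (a single flat node)."""
    if pos >= len(tokens) or tokens[pos] != "ELEM":
        return None
    node = ["list", "ELEM"]
    pos += 1
    while pos + 1 < len(tokens) and tokens[pos] == "COMMA" and tokens[pos + 1] == "ELEM":
        node += ["COMMA", "ELEM"]
        pos += 2
    return node, pos


def accepts(parser, tokens: Tokens) -> bool:
    """True when the parser consumes the whole token list."""
    result = parser(tokens)
    return result is not None and result[1] == len(tokens)


def depth(node) -> int:
    """Nesting depth of rule nodes (lists); plain strings are tokens."""
    if not isinstance(node, list):
        return 0
    return 1 + max((depth(child) for child in node), default=0)


three = ["ELEM", "COMMA", "ELEM", "COMMA", "ELEM"]
print(accepts(parse_recursive, three), accepts(parse_kleene, three))
print(depth(parse_recursive(three)[0]), depth(parse_kleene(three)[0]))
```

The depth of the recursive tree grows with the number of elements, while the loop form stays at a single level, which is one reason list rules are a comparatively safe target for this rewrite.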

@OleksiiKovalov OleksiiKovalov marked this pull request as ready for review November 1, 2021 13:16
Member

@KvanTTT KvanTTT left a comment

a_expr rewritten in "old style" to minimize ambiguities

Not sure it's a good change. Have you tested performance before and after that change alone (ignoring the merging of alternatives and the Kleene operators)? Non-recursive rules create a lot of redundant nodes in the parse tree, and they take up extra memory. Also, they uglify the grammar.
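
To make the concern about redundant nodes concrete, here is a small sketch that uses a hypothetical five-level precedence chain rather than the real a_expr rules. In an "old style" grammar each precedence level is its own rule, so even a single literal is wrapped in one node per level; with a single left-recursive rule the same literal sits directly under one expression node.

```python
# Hypothetical precedence levels, lowest to highest; the real a_expr chain differs.
LEVELS = ["or_expr", "and_expr", "compare_expr", "add_expr", "mul_expr"]


def old_style_tree(literal: str):
    """Wrap a literal in one node per precedence level (a chain of unary rules)."""
    node = ("primary", literal)
    for rule in reversed(LEVELS):
        node = (rule, node)
    return node


def left_recursive_tree(literal: str):
    """With one left-recursive rule, the literal hangs directly under 'expr'."""
    return ("expr", ("primary", literal))


def count_nodes(node) -> int:
    """Count rule nodes (tuples); plain strings are leaf tokens."""
    if isinstance(node, str):
        return 0
    return 1 + sum(count_nodes(child) for child in node[1:])


print(count_nodes(old_style_tree("1")))       # one node per level plus the primary
print(count_nodes(left_recursive_tree("1")))  # just expr and primary
```

The extra nodes are the memory cost of the chained form; whether fewer ambiguities at parse time outweigh it is the kind of trade-off that only profiling on real scripts can settle.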

@OleksiiKovalov
Contributor Author

in short

  1. I do not merge all possible alts/common parts, as it does not (significantly) affect the parse time of a typical SQL script
  2. I do not change formatting (yet) - I'm using the antlr formatter to format the grammar and plan to do partial manual formatting when the grammar becomes stable. Also maybe @teverett can take a look/help with formatting for complex/long rules?

regarding a_expr:
yup, I'm worried about this change too - mostly 'cause it heavily changes the result tree
however, it greatly reduces the number of ambiguities (from hundreds of occurrences to isolated cases), and it looks like the antlr runtime consumes less memory. yep, the result tree is deeper, but that is compensated by the savings in runtime resources.
It looks ugly, agreed, but seems like it works better, faster, and with fewer resources.

@kaby76
Contributor

kaby76 commented Nov 4, 2021

There are a large number of transformations in this PR. Is it possible to break this PR into several? For example, one can focus on kleene-operator rewrites; a second can focus on alt merging; a third can focus on (non-controversial) reformatting. As it is, this is a lot to go over. And, there are several transformations that are missing or incorrect.

For example, there are three kleene rewrites possible before alter_table_cmds that you have missed (opt_execute_using_list, proc_exceptions, proc_conditions).

And, just reading some of the alt-merges, createfunctionstmt is not strictly equivalent because it now accepts create ... procedure ... returns ..., i.e., a procedure that returns a value, which it shouldn't unless a semantic predicate is added.
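
The over-acceptance described here can be sketched with regular expressions over a simplified token string. The two patterns are hypothetical stand-ins, not the actual alternatives from the grammar: the separate form allows `returns` only after `function`, while the merged form factors out the common prefix and makes `returns` optional for both keywords.

```python
import re

# Separate alternatives: only FUNCTION may carry a RETURNS clause.
SEPARATE = re.compile(r"create (function name( returns type)?|procedure name) body")

# Merged alternative: common parts factored out, RETURNS optional for both keywords.
MERGED = re.compile(r"create (function|procedure) name( returns type)? body")


def accepts(pattern: "re.Pattern[str]", text: str) -> bool:
    """True when the whole statement matches the pattern."""
    return pattern.fullmatch(text) is not None


SAMPLES = [
    "create function name returns type body",
    "create procedure name body",
    "create procedure name returns type body",  # should be rejected
]

for sample in SAMPLES:
    print(sample, accepts(SEPARATE, sample), accepts(MERGED, sample))
```

Only the last sample separates the two forms: the merged pattern accepts it and the separate one does not. If the grammar is only ever fed valid scripts, that difference may never matter in practice, which is the trade-off discussed in the replies.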

I can check these changes against what my tools in Trash do; they include these two transformations (and an ML reformatter), and many more.

And, by the way, the changes here are excellent. Thanks for making this PR.

--Ken

@OleksiiKovalov
Contributor Author

OleksiiKovalov commented Nov 4, 2021

hi @kaby76 , thanks for the comment
in short
I did not do all possible cleanups/transformations. the main goal (for me) now is to make the grammar faster when parsing "everyday scripts" - mainly DML/plsql. I assume that DDL scripts/system setting scripts are not used often, so if such scripts are not parsed as fast as they could be - no problem, we will not see significant performance penalties.
However, I plan to make more transformations in the future - it looks weird when the same grammar uses at least 2 different approaches for similar rules/cases.

yes, some (but not many) transformations are not strict, e.g. createfunctionstmt. I suppose that the grammar will be used to parse syntactically correct scripts, and will not be used to check the syntax of a script, so if the grammar is able to parse "non-allowed" statement variations - it is ok.

Why do I do it that way? I do it for practical use in a pet-project tool, so, obviously, I first make the changes that I specifically need.

About breaking the PR into several smaller ones.
Changes were made not by type - e.g. "kleene" / "merge alts" etc. - but "by hot points": using parse duration/profile info, I changed the rules that were "hottest", or went by DML statement - optimizations to select/insert/delete/update statements - or by pl/sql statements
I can re-do the same changes "from scratch", it is not a big deal. @KvanTTT what do you think?

btw, @kaby76 , what do you think about rewriting a_expr using old-style?
It looks faster than before, but it is much harder to maintain and it produces a deeper parse tree even in simple cases (for C# I plan to handle this by collapsing the tree in an a_expr post-action processor)

@kaby76
Contributor

kaby76 commented Nov 4, 2021

@OleksiiKovalov Sorry, I haven't looked over the changes that much. I only noticed the change an hour or two ago. I'll start with the non-controversial (i.e., easy) kleene rewrites to verify they are good.

@kaby76
Contributor

kaby76 commented Nov 4, 2021

Again, thank you @OleksiiKovalov for this PR.

I am first looking over the +- and *-closure transforms, which my tool can do, and compared what the tool produces vs. what you wrote. The tool discovered 61 of the rules that could be rewritten and resulted in the exact same rewrite (labeled in a table below with '='). 31 additional rules you missed that the tool pointed out and modified (labeled below with a '>'). 4 rules that you modified my tool also modified, but came up with different results (labeled with both '>' and '<'). I looked over the 4 rules that were different and 2 seemed ok, but 2 other rules were modified wrong by my tool. So, this is great--something I need to fix. If you like, you could add the rules labeled with a '>' in the list below. However, you should check the answer that my tool produces. I don't trust it yet.

BTW, the tool is important because I am using it to modify the C++14/17/20/23 grammars I am writing, derived from the ISO C++ Specs. So, I need it to work flawlessly.

> optrolelist : createoptroleelem * ;
> alteroptrolelist : alteroptroleelem * ;
> optschemaeltlist : schema_stmt * ;

= alter_table_cmds : alter_table_cmd ( COMMA alter_table_cmd ) * ;

= alter_identity_column_option_list : alter_identity_column_option + ;

= hash_partbound : hash_partbound_elem ( COMMA hash_partbound_elem ) * ;

= alter_type_cmds : alter_type_cmd ( COMMA alter_type_cmd ) * ;

= copy_opt_list : copy_opt_item * ;

= copy_generic_opt_list : copy_generic_opt_elem ( COMMA copy_generic_opt_elem ) * ;

= copy_generic_opt_arg_list : copy_generic_opt_arg_list_item ( COMMA copy_generic_opt_arg_list_item ) * ;

> colquallist : colconstraint * ;
> tablelikeoptionlist : ( ( INCLUDING | EXCLUDING ) tablelikeoption ) * ;
> seqoptlist : seqoptelem + ;
> create_extension_opt_list : create_extension_opt_item * ;
> alter_extension_opt_list : alter_extension_opt_item * ; 
> fdw_options : fdw_option + ;
> triggerevents : triggeroneevent ( OR triggeroneevent ) * ;
> triggertransitions : triggertransition + ;
> triggerfuncargs : ( triggerfuncarg | ) ( COMMA triggerfuncarg ) * ;
> constraintattributespec : constraintattributeElem * ;

= event_trigger_when_list : event_trigger_when_item ( AND event_trigger_when_item ) * ;

= event_trigger_value_list : sconst ( COMMA sconst ) * ;

= def_list : def_elem ( COMMA def_elem ) * ;

= old_aggr_list : old_aggr_elem ( COMMA old_aggr_elem ) * ;

= enum_val_list : sconst ( COMMA sconst ) * ;

= opclass_item_list : opclass_item ( COMMA opclass_item ) * ;

= opclass_drop_list : opclass_drop ( COMMA opclass_drop ) * ;

= any_name_list : any_name ( COMMA any_name ) * ;

> attrs : ( DOT attr_name + ) ;
> type_name_list : typename ( COMMA typename ) * ;
> privilege_list : privilege ( COMMA privilege ) * ;

= grantee_list : grantee ( COMMA grantee ) * ;

> defacloptionlist : defacloption * ;
> index_params : index_elem ( COMMA index_elem ) * ;
> index_including_params : index_elem ( COMMA index_elem ) * ;

= func_args_list : func_arg ( COMMA func_arg ) * ;

= function_with_argtypes_list : function_with_argtypes ( COMMA function_with_argtypes ) * ;

= func_args_with_defaults_list : func_arg_with_default ( COMMA func_arg_with_default ) * ;

= aggr_args_list : aggr_arg ( COMMA aggr_arg ) * ;

= aggregate_with_argtypes_list : aggregate_with_argtypes ( COMMA aggregate_with_argtypes ) * ;

> transform_type_list : ( FOR TYPE_P typename ) ( COMMA FOR TYPE_P typename ) * ;

= table_func_column_list : table_func_column ( COMMA table_func_column ) * ;

= alterfunc_opt_list : common_func_opt_item + ;

> any_operator : ( colid DOT ) * all_op ;

= operator_with_argtypes_list : operator_with_argtypes ( COMMA operator_with_argtypes ) * ;

= dostmt_opt_list : dostmt_opt_item + ;

= reindex_option_list : reindex_option_elem ( COMMA reindex_option_elem ) * ;

= operator_def_list : operator_def_elem ( COMMA operator_def_elem ) * ;

= publication_name_list : publication_name_item ( COMMA publication_name_item ) * ;

> ruleactionmulti : ruleactionstmtOrEmpty ( SEMI ruleactionstmtOrEmpty ) * ;

< transaction_mode_list : transaction_mode_item ( COMMA ? transaction_mode_item ) * ;
> transaction_mode_list : transaction_mode_item ( COMMA transaction_mode_item | transaction_mode_item ) * ;

= createdb_opt_items : createdb_opt_item + ;

= drop_option_list : drop_option ( COMMA drop_option ) * ;

= vac_analyze_option_list : vac_analyze_option_elem ( COMMA vac_analyze_option_elem ) * ;

= vacuum_relation_list : vacuum_relation ( COMMA vacuum_relation ) * ;

= explain_option_list : explain_option_elem ( COMMA explain_option_elem ) * ;

= insert_column_list : insert_column_item ( COMMA insert_column_item ) * ;

= set_clause_list : set_clause ( COMMA set_clause ) * ;

= set_target_list : set_target ( COMMA set_target ) * ;

= cursor_options : ( NO SCROLL | SCROLL | BINARY | INSENSITIVE ) * ;

> simple_select : ( SELECT ( opt_all_clause into_clause opt_target_list | distinct_clause target_list ) into_clause from_clause where_clause group_clause having_clause window_clause | values_clause | TABLE relation_expr | select_with_parens set_operator_with_all_or_distinct ( simple_select | select_with_parens ) ) ( set_operator_with_all_or_distinct ( simple_select | select_with_parens ) ) * ;

= cte_list : common_table_expr ( COMMA common_table_expr ) * ;

= group_by_list : group_by_item ( COMMA group_by_item ) * ;

= for_locking_items : for_locking_item + ;

< values_clause : VALUES OPEN_PAREN expr_list CLOSE_PAREN ( COMMA OPEN_PAREN expr_list CLOSE_PAREN ) * ;
> values_clause : ( VALUES OPEN_PAREN expr_list CLOSE_PAREN ) ( COMMA OPEN_PAREN expr_list CLOSE_PAREN ) * ;

> table_ref : ( relation_expr opt_alias_clause tablesample_clause ? | func_table func_alias_clause | xmltable opt_alias_clause | select_with_parens opt_alias_clause | LATERAL_P ( xmltable opt_alias_clause | func_table func_alias_clause | select_with_parens opt_alias_clause ) | OPEN_PAREN table_ref ( CROSS JOIN table_ref | NATURAL join_type ? JOIN table_ref | join_type ? JOIN table_ref join_qual ) ? CLOSE_PAREN opt_alias_clause ) ( ( CROSS JOIN table_ref | NATURAL join_type ? JOIN table_ref | join_type ? JOIN table_ref join_qual ) ) * ;

= relation_expr_list : relation_expr ( COMMA relation_expr ) * ;

= rowsfrom_list : rowsfrom_item ( COMMA rowsfrom_item ) * ;

= tablefuncelementlist : tablefuncelement ( COMMA tablefuncelement ) * ;

= xmltable_column_list : xmltable_column_el ( COMMA xmltable_column_el ) * ;

= xml_namespace_list : xml_namespace_el ( COMMA xml_namespace_el ) * ;

> opt_array_bounds : ( OPEN_BRACKET iconst ? CLOSE_BRACKET ) * ;

= xml_attribute_list : xml_attribute_el ( COMMA xml_attribute_el ) * ;

= window_definition_list : window_definition ( COMMA window_definition ) * ;

= func_arg_list : func_arg_expr ( COMMA func_arg_expr ) * ;

= type_list : typename ( COMMA typename ) * ;

= array_expr_list : array_expr ( COMMA array_expr ) * ;

= when_clause_list : when_clause + ;

= indirection : indirection_el + ;

> opt_indirection : indirection_el * ;

= qualified_name_list : qualified_name ( COMMA qualified_name ) * ;

= name_list : name ( COMMA name ) * ;

> role_list : rolespec ( COMMA rolespec ) * ;
> comp_options : comp_option * ;

= decl_stmts : decl_stmt + ;

= decl_cursor_arglist : decl_cursor_arg ( COMMA decl_cursor_arg ) * ;

> proc_sect : proc_stmt * ;

= getdiag_list : getdiag_list_item ( COMMA getdiag_list_item ) * ;

> assign_var : ( any_name | PARAM ) ( OPEN_BRACKET expr_until_rightbracket CLOSE_BRACKET ) * ;
> stmt_elsifs : ( ELSIF a_expr THEN proc_sect ) * ;

= case_when_list : case_when + ;

< opt_raise_list : | ( COMMA a_expr ) + ;
> opt_raise_list : ( COMMA a_expr * ) ;

< opt_raise_using_elem_list : opt_raise_using_elem ( COMMA opt_raise_using_elem ) * ;
> opt_raise_using_elem_list : ( opt_raise_using_elem COMMA ) * opt_raise_using_elem ;

= opt_execute_using_list : a_expr ( COMMA a_expr ) * ;

= opt_open_bound_list : opt_open_bound_list_item ( COMMA opt_open_bound_list_item ) * ;

= proc_exceptions : proc_exception + ;

= proc_conditions : proc_condition ( OR proc_condition ) * ;
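
One way to spot-check the pairs marked with both '<' and '>' is to translate each form into a regular expression over single-character tokens and compare the two on all short inputs. The sketch below does this for the opt_raise_list and transaction_mode_list pairs from the list above, writing `c` for COMMA, `a` for a_expr, and `i` for transaction_mode_item. Treating a_expr as a single token is a simplification, and this is only an illustration, not how the tool mentioned in the thread works.

```python
import re
from itertools import product


def language(pattern: str, alphabet: str, max_len: int = 4) -> set:
    """Every string over `alphabet`, up to `max_len` long, that the pattern fully matches."""
    compiled = re.compile(pattern)
    words = set()
    for length in range(max_len + 1):
        for letters in product(alphabet, repeat=length):
            word = "".join(letters)
            if compiled.fullmatch(word):
                words.add(word)
    return words


# opt_raise_list, with c = COMMA and a = a_expr
raise_hand = language(r"(?:ca)*", "ca")   # | ( COMMA a_expr ) +
raise_tool = language(r"ca*", "ca")       # ( COMMA a_expr * )

# transaction_mode_list, with c = COMMA and i = transaction_mode_item
mode_hand = language(r"i(?:c?i)*", "ci")    # item ( COMMA ? item ) *
mode_tool = language(r"i(?:ci|i)*", "ci")   # item ( COMMA item | item ) *

print(sorted(raise_hand - raise_tool))  # accepted only by the '<' form
print(sorted(raise_tool - raise_hand))  # accepted only by the '>' form
print(mode_hand == mode_tool)           # whether the two forms agree on short inputs
```

Under these simplifications the two opt_raise_list forms disagree (for example on the empty input and on `caca`), while the two transaction_mode_list forms agree on every input up to length four. A bounded comparison like this can expose a wrong rewrite quickly, but agreement on short inputs is not a proof of equivalence.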

@OleksiiKovalov
Contributor Author

hi @KvanTTT , I've applied all recommended transformations, including undoing "avoid leading options", formatting long rules, recommendations by kaby76, etc

@KvanTTT
Member

KvanTTT commented Nov 9, 2021

Looks good. Let's leave other fixes to further pull requests.

@tisonkun
Contributor

@OleksiiKovalov @KvanTTT May I ask what "a_expr", "b_expr", and "c_expr" are in the PG grammar?

@tisonkun
Contributor

I know that they come from PG's definitions now.

| TABLE relation_expr
| select_with_parens set_operator_with_all_or_distinct (simple_select | select_with_parens)
)
(set_operator_with_all_or_distinct (simple_select | select_with_parens))*
;

set_operator
Contributor

It seems PG binds INTERSECT tighter than UNION and EXCEPT, but in our grammar file they all have the same precedence?

Contributor

https://www.postgresql.org/docs/current/queries-union.html

As shown here, you can use parentheses to control the order of evaluation. Without parentheses, UNION and EXCEPT associate left-to-right, but INTERSECT binds more tightly than those two operators. Thus
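
A small sketch with Python sets shows why the grouping matters. The sets `a`, `b`, and `c` are made-up data standing in for query results; `|` plays the role of UNION and `&` the role of INTERSECT, both with duplicate elimination. Python's `&` happens to bind tighter than `|`, which mirrors the precedence rule in the documentation quoted above.

```python
# Made-up result sets for three hypothetical queries.
a = {1, 2}
b = {2, 3}
c = {3, 4}

# "a UNION b INTERSECT c" with INTERSECT binding tighter, as the PostgreSQL docs describe.
intersect_first = a | (b & c)

# The same query if all set operators had equal precedence and associated left to right,
# which is how a grammar with a single flat set-operator loop would group it.
left_to_right = (a | b) & c

# Python's own operator precedence matches the INTERSECT-first grouping.
unparenthesized = a | b & c

print(sorted(intersect_first))
print(sorted(left_to_right))
print(unparenthesized == intersect_first)
```

The two groupings give different results for the same unparenthesized query, so a consumer of the parse tree would need to know which grouping the grammar produces before evaluating or rewriting set operations.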
