Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: sql parser for CREATE FUNCTION statement #83891

Merged

Conversation

chengxiong-ruan
Copy link
Contributor

@chengxiong-ruan chengxiong-ruan commented Jul 6, 2022

handles #83218

This commit added parser support for CREATE FUNCTION sql
statement. Scanner was extended so that it can recognize the
BEGIN ATOMIC context so that it doesnot return early when it
sees ; charater which normally indicates the end of a
statement.

Release note (sql change): CREATE FUNCTION statement now
can be parsed by crdb, but an unimplemented error would be
thrown since the statement processing is not done yet.

@chengxiong-ruan chengxiong-ruan requested a review from a team July 6, 2022 14:09
@chengxiong-ruan chengxiong-ruan requested review from a team as code owners July 6, 2022 14:09
@cockroach-teamcity
Copy link
Member

This change is Reviewable

if lval.id == 0 || lval.id == ';' {

if preValId == BEGIN && lval.id == ATOMIC {
curFuncBodyCnt += 1
Copy link
Contributor Author

@chengxiong-ruan chengxiong-ruan Jul 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this hack is to recognize the function body context so that the scanner won't return early. for example:

CREATE FUNCTION f() RETURNS INT LANGUAGE SQL
BEGIN ATOMIC
SELECT 1;
SELECT 2;
END;

SELECT 1 and SELECT 2 are considered function body statements instead of individual statements out side of CREATE FUNCTION. We'd need to fix sql prompt to recognize the BEGIN ATOMIC context so that it would treat the input above as a multiline input instead of concluding a statement when it sees ;.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also feeling open to cut the support of BEGIN ATOMIC thing since we can define function with string function body as well...

@chengxiong-ruan chengxiong-ruan requested a review from a team as a code owner July 6, 2022 17:02
@chengxiong-ruan chengxiong-ruan force-pushed the create-function-sql-parser branch 2 times, most recently from 25a51b2 to 5711894 Compare July 6, 2022 18:14
Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @chengxiong-ruan and @knz)


pkg/sql/parser/sql.y line 1613 at r3 (raw file):

  HELPTOKEN { return helpWith(sqllex, "") }
| stmt_without_legacy_transaction
| legacy_transaction_stmt

can you choose a different name than "legacy" - I don't think there's anything legacy about this, it simply is a different syntax. Do you perhaps mean "extended txn stmt"?

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @chengxiong-ruan, and @knz)


pkg/sql/parser/sql.y line 1613 at r3 (raw file):

Previously, knz (kena) wrote…

can you choose a different name than "legacy" - I don't think there's anything legacy about this, it simply is a different syntax. Do you perhaps mean "extended txn stmt"?

https://github.com/postgres/postgres/blob/2d7ead85267cc0a41ea4e94ee0ac144d5214d353/src/backend/parser/gram.y#L10990

Apparently postgres considers BEGIN to be legacy :). @chengxiong-ruan taught me this earlier today

@chengxiong-ruan
Copy link
Contributor Author

pkg/sql/parser/sql.y line 1613 at r3 (raw file):

Previously, ajwerner wrote…

https://github.com/postgres/postgres/blob/2d7ead85267cc0a41ea4e94ee0ac144d5214d353/src/backend/parser/gram.y#L10990

Apparently postgres considers BEGIN to be legacy :). @chengxiong-ruan taught me this earlier today

correct, I only found that out when trying to sort out a reduce/reduce conflict on keyword END and realized that they don't consider BEGIN and END transaction statements in BEGIN ATOMIC...END context (by grouping them as legacy....).

@knz
Copy link
Contributor

knz commented Jul 6, 2022

thanks for explaining. Maybe worth mentioning in the grammar rules which are the rules you took over from pg's gram.y, and which are the new rules you've invented.

(We've referred to pg's grammar in other places already)

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I want to give this another pass, but it's very close.

Comment on lines 155 to 161
var preValID int32
curFuncBodyCnt := 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add some commentary to indicate what this is about?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added.

Comment on lines +3999 to +4019
| EXTERNAL SECURITY DEFINER
{
return unimplemented(sqllex, "create function...security")
}
| EXTERNAL SECURITY INVOKER
{
return unimplemented(sqllex, "create function...security")
}
| SECURITY DEFINER
{
return unimplemented(sqllex, "create function...security")
}
| SECURITY INVOKER
{
return unimplemented(sqllex, "create function...security")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One downside I could see of returning these errors at parse time is I'm not sure we'll get telemetry for the feature we don't support. Maybe create a follow-up issue to make sure we get good telemetry out of unsupported features?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, good point, I created #83964 to create this

FuncName FunctionName
Args FuncArgs
ReturnType FuncReturnType
Options FunctionOptions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there or should there be a canonical ordering of these options? I don't think we need to preserve the order the user gave us. In particular, I feel like we may want to put the definition string at the end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. Are you concerned about the formatting of the string representation? Yeah, I think it'd be good to have a canonical ordering of them and having function body at the end. I'll make it that way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh.. I actually already put function body at the end of the formatting output.. Maybe you have different concerns. One thing to note that we'll convert this statement into a PlanNode which would have these options as individual fields, and we'd add validation on conflicting options like having IMMUTABLE and STABLE in one statement...

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Great work!

Reviewed 6 of 14 files at r1, 15 of 19 files at r2, 1 of 1 files at r3, 2 of 2 files at r4, 4 of 4 files at r5, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @chengxiong-ruan, and @knz)


pkg/sql/parser/parse.go line 166 at r1 (raw file):

Previously, chengxiong-ruan (Chengxiong Ruan) wrote…

I'm also feeling open to cut the support of BEGIN ATOMIC thing since we can define function with string function body as well...

We're not aiming to support the sql_body syntax initially, correct? I'd be in favor of dropping this until we want to add support for sql_body.


pkg/sql/parser/sql.y line 3866 at r3 (raw file):

| /* EMPTY */ { $$.val = false }

func_create_name:

nit: Can we call this just func_name?


pkg/sql/parser/sql.y line 3875 at r3 (raw file):

func_arg_with_default_list:
  func_arg_with_default { $$.val = tree.FuncArgs{$1.functionArg()} }
| func_arg_with_default_list ',' func_arg_with_default { $$.val = append($1.functionArgs(), $3.functionArg()) }

nit: use multiple lines to make this a bit more legible:

| func_arg_with_default_list ',' func_arg_with_default
  {
    $$.val = append($1.functionArgs(), $3.functionArg())
  }

pkg/sql/parser/sql.y line 3885 at r3 (raw file):

    $$.val = arg
  }
| func_arg '=' a_expr

TIL this syntax is supported...


pkg/sql/parser/sql.y line 3943 at r3 (raw file):

func_return_type:
  func_arg_type

nit: Do we need the func_arg_type and func_return_type aliases, or can we just use typename instead?


pkg/sql/parser/sql.y line 3951 at r3 (raw file):

create_func_opt_list:
  create_func_opt_item { $$.val = tree.FunctionOptions{$1.functionOption()} }
| create_func_opt_list create_func_opt_item { $$.val = append($1.functionOptions(), $2.functionOption())}

nit: multiple lines


pkg/sql/parser/sql.y line 3967 at r3 (raw file):

  }
| TRANSFORM { return unimplemented(sqllex, "create function...transform") }
| WINDOW { return unimplemented(sqllex, "create function...window") }

nit: "create transform function" and "create window function".


pkg/sql/parser/sql.y line 4014 at r3 (raw file):

    return unimplemented(sqllex, "create function...security")
  }
| LEAKPROOF

We'll have to error if the user supplies STABLE or VOLATILE with LEAKPROOF, because our current volatility only has a concept of IMMUTABLE+LEAKPROOF. No need to change anything here, but thought I'd mention it so we don't forget.


pkg/sql/parser/sql.y line 4081 at r3 (raw file):

| /* Empty */
  {
    $$.val = (*tree.RoutineBody)(nil)

Should this be an unimplemented error for now? We will only support functions with bodies initially, correct?


pkg/sql/parser/sql.y line 4014 at r4 (raw file):

Previously, chengxiong-ruan (Chengxiong Ruan) wrote…

ah, good point, I created #83964 to create this

It would be nice if unimplemented reported telemetry automatically. I imagine that'd involve unraveling some import cycles...


pkg/sql/sem/tree/udf.go line 93 at r5 (raw file):

}

// RoutineReturn represent a RETURN statement in a UDF body.

Should we hold off on adding this until we add support for sql_body?


pkg/sql/sem/tree/udf.go line 119 at r5 (raw file):

func (FunctionLanguage) functionOption()          {}

// FunctionNullInputBehavior represent UDF property on null parameters.

nit: represents the


pkg/sql/parser/testdata/create_function line 56 at r5 (raw file):

CREATE OR REPLACE FUNCTION f(IN a INT8) RETURNS INT8 LANGUAGE SQL BEGIN ATOMIC SELECT _; CREATE OR REPLACE FUNCTION g() RETURNS INT8 BEGIN ATOMIC SELECT _; END; END -- literals removed
CREATE OR REPLACE FUNCTION _(IN _ INT8) RETURNS INT8 LANGUAGE SQL BEGIN ATOMIC SELECT 1; CREATE OR REPLACE FUNCTION _() RETURNS INT8 BEGIN ATOMIC SELECT 2; END; END -- identifiers removed

nit: add tests for RETURNS SETOF ...

You could also add tests for some of the unimplemented syntaxes, up to you.

This commit added parser support for `CREATE FUNCTION` sql
statement. Scanner was extended so that it can recognize the
`BEGIN ATOMIC` context so that it doesnot return early when it
sees `;` charater which normally indicates the end of a
statement.

Release note (sql change): `CREATE FUNCTION` statement now
can be parsed by crdb, but an unimplemented error would be
thrown since the statement processing is not done yet.
Copy link
Contributor Author

@chengxiong-ruan chengxiong-ruan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @knz, and @mgartner)


pkg/sql/parser/parse.go line 166 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

We're not aiming to support the sql_body syntax initially, correct? I'd be in favor of dropping this until we want to add support for sql_body.

We could just throw an unimplemented with telemetry support if we see this version of function body is set. It would be interesting data to look up.
Sorry that I wasn't consistent on other unimplemented UDF options... I have an issue tracking this work for them #83964.


pkg/sql/parser/sql.y line 3866 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: Can we call this just func_name?

func_name has been taken, I recall I didn't like just using func_name because it uses prefixed_column_path and I was not comfortable with that.


pkg/sql/parser/sql.y line 3875 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: use multiple lines to make this a bit more legible:

| func_arg_with_default_list ',' func_arg_with_default
  {
    $$.val = append($1.functionArgs(), $3.functionArg())
  }

done


pkg/sql/parser/sql.y line 3885 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

TIL this syntax is supported...

:)


pkg/sql/parser/sql.y line 3943 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: Do we need the func_arg_type and func_return_type aliases, or can we just use typename instead?

oh, I created the aliases just in case for future syntax extension on the types.. for example you could use table_name.column_name%TYPE as the type name >_>


pkg/sql/parser/sql.y line 3951 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: multiple lines

done!


pkg/sql/parser/sql.y line 3967 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: "create transform function" and "create window function".

done!


pkg/sql/parser/sql.y line 4014 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

We'll have to error if the user supplies STABLE or VOLATILE with LEAKPROOF, because our current volatility only has a concept of IMMUTABLE+LEAKPROOF. No need to change anything here, but thought I'd mention it so we don't forget.

acked.


pkg/sql/parser/sql.y line 4081 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Should this be an unimplemented error for now? We will only support functions with bodies initially, correct?

answered above, hope that make sense :)


pkg/sql/sem/tree/udf.go line 93 at r5 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Should we hold off on adding this until we add support for sql_body?

same as sql_body will throw unimplemented in post-parsing validations.


pkg/sql/sem/tree/udf.go line 119 at r5 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: represents the

done!


pkg/sql/parser/testdata/create_function line 56 at r5 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: add tests for RETURNS SETOF ...

You could also add tests for some of the unimplemented syntaxes, up to you.

added!

@chengxiong-ruan
Copy link
Contributor Author

TFTR!

bors r+

@craig craig bot merged commit bef1101 into cockroachdb:master Jul 11, 2022
@craig
Copy link
Contributor

craig bot commented Jul 11, 2022

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants