Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UDFs and SqlFunction quoting and schema logic #12757

Closed
roji opened this issue Jul 22, 2018 · 19 comments · Fixed by #16166
Closed

UDFs and SqlFunction quoting and schema logic #12757

roji opened this issue Jul 22, 2018 · 19 comments · Fixed by #16166
Assignees
Labels
closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. customer-reported type-enhancement
Milestone

Comments

@roji
Copy link
Member

roji commented Jul 22, 2018

While implementing the tests for UDFs (yeah I'm a bit late to the party...), I came across some odd quoting/schema logic for SQL functions that causes a problem in PostgreSQL - this was implemented in #9558. I understand the problems it was meant to solve, but that doesn't really work from a cross-database perspective.

So in https://github.com/aspnet/EntityFrameworkCore/blob/ea9b3d84f9028a8a5eb35ae02babd50836947163/src/EFCore.Relational/Query/Sql/DefaultQuerySqlGenerator.cs#L1548, it seems that GenerateSqlFunctionName() passes the function name through DelimitIdentifier() only if a schema has been defined on it. This in itself is a bit strange - why should function name quoting depend on whether a schema has been defined for that function? Note that in PostgreSQL function names behave exactly like other identifiers (e.g. table names) and need to be quoted if they contain upper-case letters.

The way things are implemented in SQL Server, it turns out that functions implicitly have a schema - SqlServerDbFunctionConvention sets the DefaultSchema of all functions to be dbo (#9303). This makes generated SQL contain [dbo].[FunctionName](...) instead of a simpler/leaner [FunctionName]. This seems to go against EF Core's principle of generating clean, concise SQL that resembles what we'd write by hand; after all why specify dbo everywhere for functions but not for tables? Is this some sort of SQL Server-specific requirement?

Setting up the same convention in Npgsql (with default schema public) makes the 1st problem go away for some cases, since functions now have a schema by default. But it doesn't handle the case where the user explicitly sets an empty schema, like for IsDate in the tests - since there's no schema there's no quoting, and PostgreSQL fails to find isdate after converting the name to lowercase.

For now, my approach is to override DefaultQuerySqlGenerator.GenerateSqlFunctionName() to make it always quote. This creates an issue for some built-in function which EF Core generates upper-case (e.g. SUM), so I have a hardcoded list of built-in functions that aren't to be quoted (this would be obviated if EF Core starts generating lower-case sum() instead of SUM()). I'm also modifying all my expression translators to generate function names in lower-case (e.g. lower() instead of LOWER()).

I realize this is a bit late to be raising this (we should have thought about PostgreSQL when doing UDFs and #9558), just letting you know.

/cc @pmiddleton

roji added a commit to roji/efcore.pg that referenced this issue Jul 22, 2018
Function quoting logic was changed (for now): all function names are
quoted just like any other identifier (the default EF Core behavior
is only to quote schema-less functions). This meant changing all
expression translators to generate lower-case names (lower() instead
of LOWER()), and also to add a hardcoded list of functions which
EF Core generates in upper-case but which shouldn't get quoted
(e.g. SUM()).

See:
* dotnet/efcore#8507
* dotnet/efcore#12044
* dotnet/efcore#12757
* dotnet/efcore#9558
* dotnet/efcore#9303
@pmiddleton
Copy link
Contributor

Welcome to the wonderful world of quirky Sql Server syntax.

Sql Server differentiates calls to built in functions vs user defined functions based on the presence of the schema name. If a function call contains a schema name then Sql Server tries to resolve it as a UDF. Without a schema name and it tries to resolve it as a built in function. This is why SqlServerDbFunctionConvention sets the default schema.

On top of that you also are not allowed to escape built in function calls in Sql Server. This is illegal [IsDate](...) it has to be IsDate(...)

The current solution in place is to look at the presence of the schema to know whether or not to escape the function. If you read my comment on #9558 I was concerned this solution might not work for all providers.

@divega - The uppercase/lowercase issue (SUM vs sum) for PostgreSQL is another use case for why we need a metadata store for built in functions which can be modified at run time.

I think we need a way to flag a DbFunction/SqlFunctionExpression to know if it is a built in function vs UDF so providers can make appropriate decisions on rendering them.

@roji
Copy link
Member Author

roji commented Jul 22, 2018

Oh, thanks for that explanation... That explains some of the weirdness I saw this weekend :)

Yeah, some sort of flag on the function call would be good here - I was just about to add a NeedsQuoting flag on my own provider-specific function expression.

On the PostgreSQL sides things are a bit simpler but still a bit quirky. Built-in functions aren't special as far as I'm aware - they're just functions in the public (default) schema. It's totally fine to quote any function in order to preserve case (otherwise PostgreSQL folds to lowercase), or in case the function name has special characters.

Even in the PostgreSQL case it may make sense to add a "built-in" flag, to avoid quoting under any circumstances. This way, a method translator can specify uppercase SUM and have that go as-is into the SQL, to be folded to lowercase by PG (if it gets quoted PG will error because the actual function is lowercase sum). This can be useful to make the SQL prettier/more readable.

@ajcvickers
Copy link
Member

@roji Are you considering submitting a PR for this?

@roji
Copy link
Member Author

roji commented Jul 31, 2018

I guess I can - the idea would be to add an IsUserDefined property to SqlFunctionExpression (only true in instances returned by DbFunction), and look at that when generating SQL for calling it? For SQL Server (and PostgreSQL) only user-defined functions would be quoted... Sounds reasonable to everyone?

@pmiddleton
Copy link
Contributor

There is still an issue with Sql Server. You are allowed to map a built in function using a UDF. We really need to carry the IsUserDefined through to IDbFunction and the appropriate fluent configuration methods.

Without doing this we have to continue to rely on the presence of a supplied schema to make the decision which is not ideal.

@roji
Copy link
Member Author

roji commented Jul 31, 2018

@pmiddleton exposing this built-in/UDF distinction in the fluent APIs isn't ideal because it seems to be an SQL Server-specific concern (at least in PostgreSQL it's pretty meaningless)...

Maybe it's just better to leave things as they are? I can always handle it internally in the Npgsql provider (i.e. always use lower-case names etc.)? Or do you think we can do better in the general case?

@pmiddleton
Copy link
Contributor

Really neither solution is ideal for Sql Server. On one hand you have to know of the behavior of how schema is used (udf vs built in) on the other you have a fluent setting that doesn't make a ton of sense.

Ideally @ajcvickers moves the EF fort and lays siege to the Sql Server team until they change TSQL :)

I really don't see a good generalized solution to this right now given how different each system is, combined with the ability to use UDFs to map built in functions.

@ajcvickers
Copy link
Member

@pmiddleton That's @divega's job. ;-)

@roji
Copy link
Member Author

roji commented Aug 1, 2018

Yeah, it seems like there's nowhere to go with this really, I'd definitely prefer for EF Core's public API and general internals to stay clean and for any hacks to be completely SQL Server-specific, which is more or less the current situation. So I'm going to close this for now, unless anyone has any better idea.

@roji roji closed this as completed Aug 1, 2018
@divega
Copy link
Contributor

divega commented Aug 1, 2018

FWIW, in EF6, function metadata contains a flag to indicate whether it is built-in, alongside the ones that indicate whether they are niladic, aggregate, conposable, etc. So it wouldn’t be the first time this problem has been approached that way.

roji added a commit to roji/efcore.pg that referenced this issue Aug 3, 2018
Function quoting logic was changed (for now): all function names are
quoted just like any other identifier (the default EF Core behavior
is only to quote schema-less functions). This meant changing all
expression translators to generate lower-case names (lower() instead
of LOWER()), and also to add a hardcoded list of functions which
EF Core generates in upper-case but which shouldn't get quoted
(e.g. SUM()).

See:
* dotnet/efcore#8507
* dotnet/efcore#12044
* dotnet/efcore#12757
* dotnet/efcore#9558
* dotnet/efcore#9303
@roji roji reopened this Aug 3, 2018
@roji
Copy link
Member Author

roji commented Aug 3, 2018

Reopening to rethink.

If I try hard I can see the utility of a fluent-API-exposed IsBuiltIneven for PostgreSQL, meaning simply that the function is never to be quoted, like in SQL Server. Unlike SQL Server this isn't really necessary, but would allow users to specify function names in any case without fear of them being quoted, which is nice for readability. This could be useful for functions with names such as ST_AsGeometry (if you add quotes around them it has to be "st_asgeometry"). So there does seem to be at least some cross-database justification for this.

If that makes sense to people I can submit a PR for that.

I can also submit a PR for niladic, which also makes sense for PostgreSQL (e.g. current_timestamp which can only be called without parentheses).

@pmiddleton
Copy link
Contributor

What name do we want to go with? I'm not sure if IsBuiltIn is descriptive enough. IsServer is another option but I think that is even more confusing. Have I mentioned that I hate naming things.

Are these fluent only options or should DbFunctionAttribute support them. I think they are advanced options so fluent only is ok.

Thoughts on this @divega

@roji
Copy link
Member Author

roji commented Aug 3, 2018

Another naming option is to flip the logic and have IsUserDefined - maybe that's more descriptive (not sure).

@roji
Copy link
Member Author

roji commented Aug 9, 2018

@ajcvickers any specific reason this needs to wait for 3.0.0 (aside from workload)? If not I can work on a PR for 2.2.

@roji
Copy link
Member Author

roji commented Aug 9, 2018

(I'm assuming this can be done without breaking backwards compatibility, which I admit doesn't seem very clear)

@ajcvickers
Copy link
Member

@roji I think we would consider a PR before 3.0. I believe it is in 3.0 because it involves query, for which we are focusing initially on #12795 since depending on the outcome of that issue things done in query may or may not need to be redone if we do them before. @smitpatel can confirm if there is any other reason for pushing this to 3.0.

roji added a commit to roji/efcore.pg that referenced this issue Aug 12, 2018
For now, only unquoted function names are supported (i.e. all lower-
case, no special chars). Quoting logic for functions is quite
complicated, see below issues for details.

See:
* dotnet/efcore#8507
* dotnet/efcore#12044
* dotnet/efcore#12757
* dotnet/efcore#9558
* dotnet/efcore#9303
roji added a commit to roji/efcore.pg that referenced this issue Aug 12, 2018
For now, only unquoted function names are supported (i.e. all lower-
case, no special chars). Quoting logic for functions is quite
complicated, see below issues for details.

See:
* dotnet/efcore#8507
* dotnet/efcore#12044
* dotnet/efcore#12757
* dotnet/efcore#9558
* dotnet/efcore#9303
roji added a commit to roji/efcore.pg that referenced this issue Aug 12, 2018
For now, only unquoted function names are supported (i.e. all lower-
case, no special chars). Quoting logic for functions is quite
complicated, see below issues for details.

See:
* dotnet/efcore#8507
* dotnet/efcore#12044
* dotnet/efcore#12757
* dotnet/efcore#9558
* dotnet/efcore#9303
@smitpatel
Copy link
Member

Design meeting notes:
SqlServer

  • BuiltIn functions cannot be quoted
  • Non-builtIn functions must be quoted
  • Functions are case insensitive
    SQLite
  • Functions with name containing space etc has to quoted
  • Anything else does not matter quoted or not
    Postgre
  • Functions are case sensitive
  • Needs to be quoted to preserve case or with special character
  • Can remove quoting

Options:

  • IsBuiltIn flag
    • SqlServer can look at the flag and determine.
    • On the top, providers can look at the string and can determine to quote or not.
  • List of built in function in providers
    • Requires full list of functions. Over the time more may be added. Spatial kind package brings in more functions.
  • FunctionName must be passed quoted. UDF functionName would be quoted automatically.
    • Inconsistent with other parts of stack.

Decision:
We will add IsBuiltIn flag on SqlFunctionExpression. It is required argument unless we see huge demand for default.

@pmiddleton
Copy link
Contributor

@smitpatel - We need to carry the IsBuiltIn flag into the DbFunctions fluent api / attribute. That way we can properly deal with builtin functions that are mapped by the end user.

I can do that once we agree that is the way we want to go.

smitpatel added a commit that referenced this issue Jun 19, 2019
Fuctions without arguments passed considered niladic.
Non-niladic functions without arguments must pass empty array of args.
Functions with instance are considered built in.
Functions without schema are considered built in.
Functions with schema are non-built-in (Schema can be null)
To map a DbFunction to built-in function use HasTranslation API

Resolves #12757
Resolves #14882
Resolves #15501
@smitpatel smitpatel added the closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. label Jun 19, 2019
smitpatel added a commit that referenced this issue Jun 21, 2019
Fuctions without arguments passed considered niladic.
Non-niladic functions without arguments must pass empty array of args.
Functions with instance are considered built in.
Functions without schema are considered built in.
Functions with schema are non-built-in (Schema can be null)
To map a DbFunction to built-in function use HasTranslation API

Resolves #12757
Resolves #14882
Resolves #15501
@smitpatel
Copy link
Member

We added IsBuitIn flag which will determine if function would be quoted or not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. customer-reported type-enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants