Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DbFunctions: self-bootstrapping functions #9810

Closed
divega opened this issue Sep 13, 2017 · 24 comments · Fixed by #21487
Closed

DbFunctions: self-bootstrapping functions #9810

divega opened this issue Sep 13, 2017 · 24 comments · Fixed by #21487
Assignees
Labels
area-query closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-enhancement
Milestone

Comments

@divega
Copy link
Contributor

divega commented Sep 13, 2017

TL;DR

We can enable instance functions and extension methods defined on the DbContext to be called directly outside of queries, since access to "this" DbContext instance can infuse the ability to bootstrap server LINQ queries without having to go through a DbSet<TEntity>:

  var count = db.PostReadCount(p.Id);

This applies to scalar functions as well as TVFs (which would look like any other self-bootstrapping function but would return IQueryable<T>).

Details

(originally from comment at #9213 (comment))

One of the reasons to support instance functions based on DbContext (#9213) is to be able to pass the DbContext to the implementation of the function, which may involve querying for data that exists in the database.

There is another click-stop beyond that which is to be able to use the function directly. E.g. instead of always having to call the function like this:

foreach(var p in posts.Where(p => db.PostReadCount(p.Id) > 5))
{
  // do something
}

We can enable calling the function directly in code:

foreach(var p in posts)
{
  ...
  if (db.PostReadCount(p.Id) > 5)
  {
    // do something
  }
}

(I understand the scenario above is a bit contrived, because in theory calling the function in the query may avoid additional round-trips, but I don't think this example negates the value of self-bootstrapping functions in general)

Since in the body of the method we have access to "this" DbContext, we can use it to execute a query in which the method is invoked:

public int PostReadCount(int postId)
{
  return Posts.Where(p => p.Id == postId).Select(p => db.PostReadCount(p.Id)).FirstOrDefault();  
}

Although ideally the user should be able to do something like this some day:

public int PostReadCount(int postId)
{
  return Execute(db => db.PostReadCount(postId));  
}

Where assuming there is a PostReadCount() scalar function in the database would translate to something like this:

SELECT dbo.PostReadCount(@postId);

We actually had this kind of capability in EF6 and even previous versions, and it was useful for mapping scalar functions as well as TVFs, although it wasn't at all easy to setup.

Impact of client evaluation

Note that it was also safer in EF6 to invoke the method through the LINQ provider as part of the implementation of the method, because there was no automatic client evaluation. In EF Core we should probably have a way to fork re-entrant calls to either evaluate in memory or throw if there is no reasonable in-memory implementation. Otherwise there are scenarios that would result into infinite recursion.

@ajcvickers ajcvickers added this to the Backlog milestone Sep 15, 2017
@ajcvickers
Copy link
Member

@pmiddleton Let us know if you're interested in working on this.

@pmiddleton
Copy link
Contributor

@ajcvickers Yes I am. I actually have already started looking into something along the lines of the second example.

public int PostReadCount(int postId)
{
  return Execute(db => db.PostReadCount(postId));  
}

I ran into some issues with ReLinq and then put it aside. I was going to pick it up again once I had the instance methods on DbContext (#9213) working as it is required first.

@pmiddleton
Copy link
Contributor

This feature is what is really needed to make niladic functions (#9022) useful. It was my work on that item which started me looking at what is in effect this feature.

@rafalkasa
Copy link

Hi @pmiddleton I would like to call function mapped by [DbFunction] attribute. Do you know approximately date when your worki on it will be finalized ?

@pmiddleton
Copy link
Contributor

@rafalkasa

DbFunctions that are a part of a larger query were added as a part of 2.0. Those you can use today.

This issue tracks the ability to directly call a function outside of a query. If you need this functionality I can't give you a good eta as I am doing this in my free time. I have started on it and am making progress, I would like to have it ready for the 2.1 release but no promises.

If you need help just making a regular dbFunction call let me know and I can help out.

@pmiddleton
Copy link
Contributor

@divega

I have a prototype put together that allows running scalar methods directly via syntax similar to what you outlined above.

public int PostReadCount(int postId) { return Execute(db => db.PostReadCount(postId)); }

There are two limitations I have run into with this approach which are worth discussing.

  1. You can only run a single function per database round trip.
    Say for example you want to run some similar metadata functions like Schema_Name() and Schema_Id() on Sql Server. You will have to make two db calls instead of being able to run them at the same time.

  2. You can't use function calls as parameters to other function calls. That mean you can't easily run code like this. context.PostReadCount(context.FindNewestPostId()). In order to run something like that the end use would have to manually build the expression tree to pass in the Execute method. That is just too end user unfriendly.

There is an alternative approach I have been thinking about, but I want to run it past you before I spend much time digging into it.

What if we add a special DbSet to DbContext which we can use to build and run the queries. This DbSet would not be added to the model and could be used to run the two types of queries I outlined above.

context.EmptySet.Select(e => new 
{ 
    Count = context.PostReadCount(context.FindNewestPostId()), 
    SchemaName = context.SchemaName 
})`

I have no idea how feasible this idea is, but I don't want to start digging into it if it's a nonstarter.

@pmiddleton
Copy link
Contributor

@ajcvickers @smitpatel @anpete - Do any of you have any input on this?

Thanks

@divega
Copy link
Contributor Author

divega commented Dec 17, 2017

@pmiddleton Cool, thank you very much! Some answers:

You can only run a single function per database round trip...

I guess you are saying that is the case if you take advantage of the self-bootstrapping capability of one of these functions (i.e. you invoke it directly) and the function returns a scalar. That is expected, but you should still be able to use the function inside the body of a regular query (e.g. one bootstrapped from a DbSet), and if the function is composable (i.e. a TVF) then you should be able to call additional function in query composition like you would normally do with LINQ queries that start with a DbSet.

You can't use function calls as parameters to other function calls...

Based on the example you provided what I expect to happen is that the two functions will be invoked in succession, with the result of the first one passed to the first one. Is that the case, or am I missing some other detail? If we (or the user) want to enable composition in this manner, we would need to tweak the argument for the second function to be an expression (scalars parameters are not composable in the right way because they don't capture expressions). Then you should be able to write it like this:

var count = context.PostReadCount(() => context.FindNewestPostId());

This could in fact be an additional overload of PostReadCount.

What if we add a special DbSet to DbContext which we can use to build and run the queries...

I see this "null" DbSet as a possible complementary approach, but there is also some potential for it to converge with the Execute method described previously. Semantically this is an untyped query root that when used will return a single element of whatever you pass in the projection. It is equivalent to SELECT select_list without a FROM clause that you can use in SQL.

You should be able to achieve something functionally similar by taking any regular DbSet and performing a .GroupBy(x => someConstant) then projecting something not correlated to x, but I am almost sure we lack the smarts to simplify such a query to the minimal SQL translation required.

@smitpatel
Copy link
Member

While current SelectExpression gives ability to create without FromExpression, though the only issue to encounter the assumptions made around root being there. My suspicion is QueryModelVisitor is more prone to the assumption than SelectExpression.
Though idea of Empty DbSet, does not sound really good. On bare minimum level, it will break apart a lot of invariants of the system.

@pfaisant
Copy link

any news on this ?

@pmiddleton
Copy link
Contributor

@pfaisant - This just missed the cut for 2.2.

A redesign for query generation is happening in 3.0. Depending on the time frame of when that is done will determine if there is enough time for me to update my PR to work with the new approach before release. Therefore I can't say if this will make 3.0 or not.

@smitpatel smitpatel self-assigned this Mar 15, 2020
@smitpatel smitpatel modified the milestones: Backlog, 5.0.0 Mar 15, 2020
@smitpatel
Copy link
Member

Pulling this in 5.0. Most of the work has been done with TVF work which are self-bootstrapping. We probably just need to iron out kink to finish this feature. The self-bootstrapping db functions would work for scalars also.

@smitpatel
Copy link
Member

Related #20363

@AndriySvyryd
Copy link
Member

Review whether DbContext.CreateQuery is still needed

@smitpatel
Copy link
Member

Researching more on this:
There are 4 variations since input/output both could be scalar(s) or queryable.

  • input and output both scalars are our basic UDFs which we added in 2.1. Since they don't have queryables, they are used inside lambdas only. I don't think we have to do anything more than that at this point.
  • input and output both being queryables
    • For extension method perspective, it is same as any queryable method (e.g. Where/OrderBy...)
    • For non-extension method perspective, it is refactoring which users typically do to build larger queries.
    • In either case, since queryable already has query provider, either user can compose the query with natural LINQ syntax or build it dynamically.
    • Supporting an additional queryable operator like Where is like database Select Expression having new construct. Which would be extremely rare to do and would require changes to a lot more areas in translation/SQL tree etc. It is possible to add such thing by providers but from application it is probably too expensive.

Which leaves us with rest 2 patterns

  • Scalar input returning Queryable which is what we added using CreateQuery. (TVFs). It requires EF support to inject IAsyncQueryProvider.
  • Queryable input returning scalar (basically aggregate functions). While it can be done by user with special processing of calling Execute on query provider, the method call still would be integrated. EF6 had a method BootstrapFunction on DbFunction do this thing albeit private.
    EF6 method: private static TOut BootstrapFunction<TIn, TOut>(Expression<Func<IEnumerable<TIn>, TOut>> methodExpression, IEnumerable<TIn> collection)
    It takes enumerable because of collection navs/IGrouping.

I am proposing following,
BootstrapFunction method with 2 overloads and 1 BootstrapFunctionAsync method.

private IQueryable<TElement> BootstrapFunction<TElement>(Expression<Func<IQueryable<TElement>>> expression)
{
	var queryProvider = this.GetService<IAsyncQueryProvider>();

	return queryProvider.CreateQuery<TElement>(expression.Body);
}

private TResult BootstrapFunction<TResult>(Expression<Func<TResult>> expression)
{
	var queryProvider = this.GetService<IAsyncQueryProvider>();

	return queryProvider.Execute<TResult>(expression.Body);
}

private Task<TResult> BootstrapFunctionAsync<TResult>(Expression<Func<TResult>> expression, CancellationToken cancellationToken = default)
{
	var queryProvider = this.GetService<IAsyncQueryProvider>();

	return queryProvider.ExecuteAsync<Task<TResult>>(expression.Body, cancellationToken);
}

Definition in application DbContext

public IQueryable<Blog> TVF(int id)
{
	return BootstrapFunction(() => TVF(id));
}

public int Aggregate(IEnumerable<Blog> blogs)
{
	return BootstrapFunction(() => Aggregate(blogs));
}

public Task<int> AggregateAsync(IEnumerable<Blog> blogs, CancellationToken cancellationToken = default)
{
	return BootstrapFunctionAsync(() => Aggregate(blogs), cancellationToken);
}

Usage in application code

var query1 = db.TVF(1).ToList();

var query2 = db.Aggregate(db.Blogs);

var query3 = await db.AggregateAsync(db.Blogs);

M1: works like TVF and injects query provider

M2, M3: works like aggregate by invoking Execute* methods and returning a scalar result. User methods take IEnumerable rather than IQueryable in order to use navigation properties in subqueries. Though this is upto users to decide.
Slight catch, if user wants to use Async method, they still have to define sync method and register that as DbFunction. This is because, our query pipeline only deals with sync operations (even for Queryable methods since they don't have async counter parts). ExecuteAsync method passes in correct result type for us to generate result wrapped in Task. We could change this but it would be large change in query pipeline how we process any terminating operator. We should probably wait for user feedback if they want to use async methods without defining sync one. Sync method does not need to have an implementation for this.
Even though we are fetching different query provider, the query is never aware of the query provider which invoked Execute method hence you could use any context instance to invoke query from other context instances. Though all contexts other than the one on which function is invoked must be the same.

I verified inside our pipeline and with proper DbFunction mapping, we get expression tree as expected.

While testing I put BootstrapFunction methods on DbContext directly but since it is relational it should live somewhere else and not on DbContext directly as long as we can resolve IAsyncQueryProvider service there.

@pmiddleton
Copy link
Contributor

In the original queryable functions code I had for 2.2 I did have the ability to directly run scalar functions. It would be possible to add it back, but how useful would it be to an end user. There are some metadata and security functions that might be worth calling directly, say SCOPE_IDENTITY, SCHEMA_NAME, or CURRENT_USER. I've never seen anyone ask for this so the demand is probably pretty low.

I don't follow how you're aggregate methods example works. Aggregates require a scalar column as input and you are passing in a DbSet.

Your last sentence is the thing I was never able to find a good answer to when I originally wrote this feature. Where can you put the bootstrap function, besides on DbContext, and still have it easily accessible to the end user inside of DbContext. You don't want to force the user to DI something into their context in order to use queryable functions.

You could try making them static methods on DbFunctions and creating a new service scope in order to get access to an IAsyncQueryProvider. I have no idea how well creating another scope would play with the rest of the system but I'm going to guess not well.

@smitpatel
Copy link
Member

I've never seen anyone ask for this so the demand is probably pretty low.

That is trying to use EF Core to connect to database to implement a calculator. :trollface:

On serious note,

Aggregates may require scalar column but even EF6 pattern did not take it since any aggregate with Scalar can be written as .Select(scalar).Aggregate() Implementation can take care of attaching scalar in appropriate place. The way TVF does not allow users to use HasTranslation API, I feel that same would be true for aggregate too.

I think there are some places like context.Database which provides access to service. We don't need actual context, we just need service locator to get IAsyncQueryProvider. If nothing else then context.Database.BootstrapFunction is not too terrible.

@smitpatel
Copy link
Member

Remove CreateQuery API.
Add public FromExpression method which takes Func<IQueryable> on DbContext.
(will write more detailed notes later)

@pmiddleton
Copy link
Contributor

Have a PR out yet? Would love to see how you tie that back into IAsyncQueryProvider

@smitpatel
Copy link
Member

It is same as CreateQuery API just public and different name.

@smitpatel
Copy link
Member

smitpatel commented Jul 1, 2020

Options discussed in 1h55m long design meeting.

  • CreateQuery essentially what is currently implemented API. It could cause pit of failure for beginners as this is the way they need to create the queries.
  • Set<T>().FromFunction(() =>TVF(id)) the syntax of passing in parameterless lambda in LINQ query is somewhat ugly and verbose syntax.
  • BootstrapFunction It can be used for more than just functions. Potentially allowing users to pass us a tree which gets scalar value back, even if not aggregate function.
  • Query - too broad of API. Faces same ambiguity. We also had Query API before which was to get DbSet for query types.
  • CreateExpression to avoid query in the name. Can be used to written either queryable/scalar.
  • CreateQueryable to be explicit but returns queryable so it is redundant in the name.
  • CreateUnrootedQuery - Nicer API to create queryable without query root. Does not exactly solve issue of scalars.
  • context.Set<T>(() => TVF(id)) overload of Set operation.
  • context.UnrootedQuery<T>(() => TVF(id)) same as above but different method.
  • `context.Query.Create/context.Query.Execute - Additional indirect with Query. 2 explicit methods for queryable/scalar.
  • context.FromExpression<T>(() => TVF(id)) can be used to pass any expression and returns IQueryable
  • context.FromUnrooted<T>(() => TVF(id)) same as above but different name.

We decided to use context.FromExpression<T>(() => TVF(id)) for bootstrapping TVF. Users can use it to pass any expression tree which ends in Queryable to hook it up with query provider.
Relational in future can add similar method context.FromSql<T>("SQL") for sql query returning Queryable
For scalars,

  • We can add overload of FromExpression which based on return type of the lambda can call execute and return back scalar, or
  • We add a parallel API to context.Database.ExecuteSql method at core level. This would also make it possible for us to have async support. (overload does not work for async terminating one.

@smitpatel smitpatel added the closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. label Jul 1, 2020
smitpatel added a commit that referenced this issue Jul 1, 2020
Public method FromExpression on DbContext to tie up our query provider with the expression

Resolves #9810
smitpatel added a commit that referenced this issue Jul 1, 2020
Public method FromExpression on DbContext to tie up our query provider with the expression

Resolves #9810
@ghost ghost closed this as completed in #21487 Jul 2, 2020
ghost pushed a commit that referenced this issue Jul 2, 2020
Public method FromExpression on DbContext to tie up our query provider with the expression

Resolves #9810
@pmiddleton
Copy link
Contributor

Glad to know I'm not the only one who has issues naming things :)

@MisinformedDNA
Copy link

For scalars,

  • We can add overload of FromExpression which based on return type of the lambda can call execute and return back scalar, or
  • We add a parallel API to context.Database.ExecuteSql method at core level. This would also make it possible for us to have async support. (overload does not work for async terminating one.

@smitpatel Are there any open issues to address self-bootstrapping scalar functions? It seems like TVFs were solved, but scalars were not.

@smitpatel
Copy link
Member

@MisinformedDNA - #11624 tracks that.

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-query closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants