Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query with GroupBy or GroupJoin throws exception #17068

Closed
smitpatel opened this issue Aug 9, 2019 · 46 comments
Closed

Query with GroupBy or GroupJoin throws exception #17068

smitpatel opened this issue Aug 9, 2019 · 46 comments
Labels
closed-out-of-scope This is not something that will be fixed/implemented and the issue is closed. punted-for-3.0 type-enhancement

Comments

@smitpatel
Copy link
Member

smitpatel commented Aug 9, 2019

If you run into issue with GroupBy/GroupJoin and arrive here, then please take time to read #17068 (comment) fully to direct your feedback in specific issue for your specific scenario.


Read our documentation about this here


Group By

Linq GroupBy operator allows to create IGrouping<TKey, TElement> where TKey & TElement could be arbitrary types. Further, IGrouping implements IEnumerable<TElement> which means you can compose over using any queryable operator after grouping.
In contrast SQL GROUP BY keyword is very restrictive. In SQL, you can only use scalar values in GROUP BY. Most database also allows only referencing grouping key columns or aggregate applied over any other column. Hence selecting a column which is not in key or does not have aggregate applied is not possible.
Due to this mismatch, not all linq GroupBy can be translated to SQL GROUP BY. In EF Core 3.0 we have support for SQL GROUP BY but not for all linq GroupBy. There are few bugs/limitation in this translation for which we have various issues filed, mainly for scenarios where Distinct or a predicate is applied before aggregate operation.
Another GroupBy scenario which can be translated to server is selecting first element of each group which is being tracked by #13805
Most other linq GroupBy cannot be evaluated on server and must be evaluated on client side. This issue tracks about allowing EF Core to do this client side grouping implicitly. Absence of this feature means users would have to manually call AsEnumerable before GroupBy to explicitly falling into client side grouping.

GroupJoin

Linq GroupJoin is a queryable operator which does not have equivalent SQL keyword. The biggest use of GroupJoin for an ORM is to generate Left join on server side as mentioned here. Customers.GroupJoin(Orders,...) would create IEnumerable<Order> for each Customer. As with GroupBy, you can compose over this in Linq but there is no translation. Again, GroupJoin requires client side grouping to generate correct result. This issue also tracks adding implicit support for it.


In recent discussion it came up that, allowing this may be pit of failure (with escape hatch at the bottom :trollface: ).

@roji
Copy link
Member

roji commented Aug 27, 2019

#12560 (comment) is a good example of why it's dangerous to implement implicit client evaluation of GroupBy. Requiring users to do explicit client evaluation with AsEnumerable forces them to think and understand the distinctions and perf impact, etc.

Reiterating previous thoughts, the more I think of it, the more I think we should aim to avoid implicit client evaluation for any scenario where client evaluation is (significantly) more expensive than server evaluation, when possible.

In retrospect it actually seems like quite a mistake for LINQ to have GroupBy returning groupings rather than a reduced/aggregated scalar, given that it tends to follow SQL in operator naming and the general prevalence of SQL.

@smitpatel smitpatel removed their assignment Sep 3, 2019
@smitpatel
Copy link
Member Author

Linking all other bugs from backlog which requires client side GroupBy. If we decide to fix this, then please consider all the scenarios in linked closed bugs.

@smitpatel
Copy link
Member Author

If we decide to close this, also close related issues which are blocked on this one. And also clean up tests in codebase.

@ajcvickers
Copy link
Member

Note from team meeting: @smitpatel will discuss with team in a design meeting.

@smitpatel
Copy link
Member Author

We discussed this issue in a team meeting in detailed and I have read all above comments thoroughly. Following are our conclusions
GroupBy operator
As listed in this issue earlier following are missing translations in GroupBy operator which we will add in future and we will not client eval any of those patterns.

GroupJoin operator
GroupJoin is mainly used to generate LeftJoin and there is no SQL equivalent. Hence we will not translate any composition over GroupJoin.

GroupJoin <-> GroupBy equivalence
Due to high cost of pattern matching GroupJoin operator, we are not going to add any translation in GroupJoin even if equivalent GroupBy query would be translated. Customers would be required use GroupBy syntax as that is closest representation of what is the SQL.

AsAsyncEnumerable usage
As our exception message says, when client evaluating use AsAsyncEnumerable operator. We understand that after getting IAsyncEnumerable, composing over is not easy. Since Enumerable operators which works on IEnumerable are defined in .NET runtime, we believe that most beneficial thing for all the customers of .NET would be to have equivalent operators which works on IAsyncEnumerable in .NET runtime only. If you need async enumerable operators in your query, please file an issue in https://github.com/dotnet/runtime In the meantime, you can System.Interactive.Async - an external library, which allows enumerable like operators over IAsyncEnumerable for composition. Since this issue is much broader than EF Core and there is a reasonable open source work-around, we are not making any changes to EF Core regarding this.

WithClientEvaluation operator

  • We don't see any value in such operator. EF Core is not going to do any client eval except in top level projection as described in documentation. In order to evaluated further operators on client, use appropriate operator as indicated by message. We understand it is a big breaking changes for some apps but for the best performance it is something which needs to be done at some point. Hence we made the breaking change in 3.x release.

With all of above, I believe that it addresses all the points raised in discussion here.
If you run into issue with GroupBy/GroupJoin and arrive here, then please take time to read this comment fully to direct your feedback in specific issue for your specific scenario.

@smitpatel
Copy link
Member Author

Closing this issue as non-needed since sub-issues are created to track individual translation patterns.

@smitpatel smitpatel removed this from the 5.0.0 milestone Mar 6, 2020
@smitpatel smitpatel removed their assignment Jan 12, 2022
@ajcvickers ajcvickers added the closed-out-of-scope This is not something that will be fixed/implemented and the issue is closed. label Mar 10, 2022
@ajcvickers ajcvickers reopened this Oct 16, 2022
@ajcvickers ajcvickers closed this as not planned Won't fix, can't repro, duplicate, stale Oct 16, 2022
@pvg8v6g
Copy link

pvg8v6g commented Feb 29, 2024

i'm arriving late to the party i guess but the loss of server side evaluation from entity framework on groupby is disgusting and gamebreaking.

something as simple as this does NOT work. whoever is sleeping over there on the ef core team needs to wake up and start doing their jobs.

MyTable
	.GroupBy(x => x.Id)
	.OrderBy(x => x.Key)
InvalidOperationException: The LINQ expression 'DbSet<MyTable>()
    .GroupBy(x => x.Id)
    .OrderBy(x => x.Key)' could not be translated. Either rewrite the query in a form that can be translated, or switch to client evaluation explicitly by inserting a call to 'AsEnumerable', 'AsAsyncEnumerable', 'ToList', or 'ToListAsync'. 

seriously..................... it's 20 stinking 24 and STILL no resolution to these issues that have been brought up since 2017ish. c'mon man.

@roji
Copy link
Member

roji commented Feb 29, 2024

@pvg8v6g we've done quite a lot of work on closing the gap on GroupBy functionality in recent releases, but there are indeed some problematic corners remaining. This specific querying pattern - ordering after a final GroupBy (without an aggregate function) - is something that IIRC hasn't actually been requested by anyone, which is one reason why it is unimplemented; it also was never server-evaluated, GroupBy without an aggregate function isn't supported by SQL.

As a workaround, the translation for final GroupBy without aggregate function already contains an ORDER BY on the key:

_ = await context.Blogs.GroupBy(b => b.Name).ToListAsync();
SELECT [b].[Name], [b].[Id], [b].[DateTime]
FROM [Blogs] AS [b]
ORDER BY [b].[Name]

So just doing GroupBy without the OrderBy will happen to do what you're looking for.

Otherwise, I'd recommend simply explicitly switching to client evaluation:

_ = context.Blogs
    .AsEnumerable()
    .GroupBy(b => b.Name)
    .OrderBy(b => b.Key)
    .ToList();

@pvg8v6g
Copy link

pvg8v6g commented Mar 8, 2024

@pvg8v6g we've done quite a lot of work on closing the gap on GroupBy functionality in recent releases, but there are indeed some problematic corners remaining. This specific querying pattern - ordering after a final GroupBy (without an aggregate function) - is something that IIRC hasn't actually been requested by anyone, which is one reason why it is unimplemented; it also was never server-evaluated, GroupBy without an aggregate function isn't supported by SQL.

As a workaround, the translation for final GroupBy without aggregate function already contains an ORDER BY on the key:

_ = await context.Blogs.GroupBy(b => b.Name).ToListAsync();
SELECT [b].[Name], [b].[Id], [b].[DateTime]
FROM [Blogs] AS [b]
ORDER BY [b].[Name]

So just doing GroupBy without the OrderBy will happen to do what you're looking for.

Otherwise, I'd recommend simply explicitly switching to client evaluation:

_ = context.Blogs
    .AsEnumerable()
    .GroupBy(b => b.Name)
    .OrderBy(b => b.Key)
    .ToList();

This is not an answer. I chose a VERY simple query to show things are still broken not because that’s what I was trying to do. I am doing VERY complicated queries on a daily basis and do NOT want to bring in tons and tons of data to memory to do a group by. Grouping is an SQL feature and should be translatable server side.

These things used to be possible back in entity framework 6 and that functionality was lost moving to ef core. I should be able to do groupby ANYWHERE in the query with any number of statements after and it be translated to SQL. Bringing them into memory is the opposite of what we need. That’s the whole point of groupby on iqueryable… why not just remove the function off of that class and just leave it on ienumerable if you’re not going to support the functionality….

I’m only upset because we LOST functionality and we would love for it to be returned.

@roji
Copy link
Member

roji commented Mar 9, 2024

@pvg8v6g you seem to have some incorrect assumptions there - I'd very much recommend looking at the actual SQL that the old EF6 generated here.

This is the LINQ query you want EF translate:

MyTable
	.GroupBy(x => x.Id)
	.OrderBy(x => x.Key)

This query by design brings the entire table data to the client - that's simply what it's supposed to do; there's no filtering here (no Where), no aggregation (Sum, Average...) and no projection to prevent all columns from being fetched - you're querying all rows and all columns from the database, and just using LINQ to reorganize the data in groups. So you'd be "bringing in tons and tons of data to memory" no matter what.

Further, LINQ GroupBy cannot be translated to SQL GROUP BY unless you use Select() to apply an aggregate function:

_ = await context.Blogs
    .GroupBy(b => b.Category)
    .Select(g => new { g.Key, MaxId = g.Max(b => b.Id) })
    .ToListAsync();

This can translate to SQL GROUP BY because of the aggregate function (Max); but without the Select() it cannot. EF does support queries where the last operator is a GroupBy, but these don't translate to SQL GROUP BY (not possible) - an ordering is applied (as I wrote above) and the grouping is performed client-side. Again, this doesn't imply any more data being transfered to the client - it merely performs the grouping client-side rather than server-side.

To summarize, for your original query, you can place an AsEnumerable() before the OrderBy - or before the GroupBy - without impacting query performance in a meaningful way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
closed-out-of-scope This is not something that will be fixed/implemented and the issue is closed. punted-for-3.0 type-enhancement
Projects
None yet
Development

No branches or pull requests