Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relational: Support translating GroupBy() to SQL #2341

Closed
divega opened this issue Jun 6, 2015 · 100 comments

Comments

Projects
None yet
@divega
Copy link
Member

commented Jun 6, 2015

Updated as of 3/14/2018 based on implementation introduced

LINQ's GroupBy() operators can sometimes be translated to SQL's GROUP BY clauses, in particular when aggregate functions are applied in the projection.

Scope for 2.1 release

Our current intention is for the scope of the work in 2.1 to improve how LINQ's GroupBy is evaluated in this particular scenario:

Grouping on a simple expression that references a single column or multiple columns (using an anonymous type) of the query root and then projecting any aggregate operation from (Count, Sum, Average, Min, Max) and/or any individual property that is part of the grouping key (this will be translated to GROUP BY in SQL)

What is supported in 2.1.0-preview2

Apart from what is supported in 2.1.0-preview1 (details below), we have also added some more patterns

  • Grouping by constant/variables
  • Grouping by properties of reference navigations
  • Ordering or filtering on grouping key or aggregate
  • Aggregate being DTO/nominal type

Examples

// Grouping by constant or a variable from closure
db.Orders.GroupBy(o => 2).Select(g => g.Count());
var a = 5;
db.Orders.GroupBy(o => a).Select(g => g.Count());

// Grouping by scalar properties from reference navigations
db.Orders.GroupBy(o => o.Customer.City).Select(g => g.Count());

// Ordering by Key/Aggregate after GroupBy
db.Orders.GroupBy(o => o.CustomerID).OrderBy(o => o.Count()).Select(g => new { g.Key, Count = g.Count() });

// Filtering on Key/Aggregate after GroupBy (Translates to Having clause in SQL)
db.Orders.GroupBy(o => o.CustomerID).Where(o => o.Count() > 0).Select(g => new { g.Key, Count = g.Count() });

// Projecting aggregate into nominal type
db.Orders.GroupBy(o => o.CustomerID).Select(g => new CustomerCountInfo { g.Key, Count = g.Count() });

And a few bugfixes - #11218 #11157 #11176

What is supported in 2.1.0-preview1

// Grouping by single column projecting aggregate or key
db.Orders.GroupBy(o => o.CustomerId).Select(g => g.Count());
db.Orders.GroupBy(o => o.CustomerId).Select(g => new { CustomerId = g.Key, Count = g.Count() });

// Grouping by multiple columns (using anonymous type) projecting aggregate or key or nested key
db.Orders.GroupBy(o => new { o.CustomerId, o.OrderDate }).Select(g => g.Sum(o => o.Cost));
db.Orders.GroupBy(o => new { o.CustomerId, o.OrderDate })
  .Select(g => new { CustomerId = g.Key.CustomerId, Sum = g.Sum(o => o.Cost) });
db.Orders.GroupBy(o => new { o.CustomerId, o.OrderDate })
  .Select(g => new { Key = g.Key, Sum = g.Sum(o => o.Cost) }); // Where Key will be anonymous object

// Grouping after complex query root
(from o in db.Orders.Where(o => o.OrderID < 10400).OrderBy(o => o.OrderDate).Take(100)
    join c in db.Customers.Where(c => c.CustomerID != "DRACD" && c.CustomerID != "FOLKO").OrderBy(c => c.City).Skip(10).Take(50)
        on o.CustomerID equals c.CustomerID
    group o by c.CustomerID)
.Select(g => new { g.Key, Count = g.Average(o => o.OrderID) });

db.Orders.OrderBy(o => o.OrderID)
    .Skip(80)
    .Take(500)
    .GroupBy(o => o.CustomerID)
    .Select(g => g.Max(o => o.OrderID));

// All above examples have group of entity types after GroupBy

// Selecting Group of anonymous types containing multiple columns
db.Orders.GroupBy(o => o.CustomerId, new {o.OrderDate, o.Price}).Select(g => g.Sum(t => t.Price));

Scenarios that we are not planning to improve in the 2.1 release

1. Grouping on constants (available in 2.1.0-preview2)
2. Grouping on an entity (e.g. a reference navigation property)
3. Projecting non-aggregate scalar subqueries after grouping, e.g. FirstOrDefault()
4. Making groups of multiple entityTypes using anonymous types.
5. Using Key/Aggregate values after GroupBy in joins (#10012)

All the scenarios above present different variations depending on what happens after the GroupBy, e.g. is there an aggregate function or not, is the key mentioned in the projection or not, etc. These scenarios will still result in client evaluation.

We would appreciate if customers that care about EF Core supporting any of those scenarios that are scoped out from 2.1 to create individual issues for them, up-vote them, and keep the discussion there.

@rowanmiller rowanmiller added this to the 7.0.0 milestone Jun 12, 2015

@alexvaluyskiy

This comment has been minimized.

Copy link

commented Jul 14, 2015

I have these classes

public class Blog
{
    public int BlogId { get; set; }
    public string Url { get; set; }

    public List<Post> Posts { get; set; }
}

public class Post
{
    public int PostId { get; set; }
    public string Title { get; set; }
    public string Content { get; set; }

    public int BlogId { get; set; }
    public Blog Blog { get; set; }
}

If I use this code

var query1 = from p in context.Posts
             where p.BlogId == 1
             group p by p.Title into g
             select new
             {
                 Title = g.Key,
                 Count = g.Count()
             };

EF7 generates this SQL

SELECT [p].[PostId], [p].[BlogId], [p].[Content], [p].[Title]
FROM [Post] AS [p]
WHERE [p].[BlogId] = 1

without grouping and projection and groups data on the client side

If I use Group by with Inner Join with this code

var query = from p in context.Posts
            join b in context.Blogs on p.BlogId equals b.BlogId
            where p.BlogId == 1
            group p by p.Title into g
            select new
            {
                Title = g.Key,
                Count = g.Count()
            };

I get this error

{"Unable to cast object of type 'Microsoft.Data.Entity.Query.QueryResultScope`1[<>f__AnonymousType1`2[EF7Test.Post,EF7Test.Blog]]' to type 'Microsoft.Data.Entity.Query.QueryResultScope`1[EF7Test.Post]'."}

EF 7.0.0-beta6-13735

@rowanmiller

This comment has been minimized.

Copy link
Member

commented Jul 24, 2015

Consider these cases when translating SQL

@leus

This comment has been minimized.

Copy link

commented Oct 2, 2015

Just another test case:

public class Categoria
{
    public int Id { get; set; }
    public int? CategoriaId { get; set; }
    public string Nombre { get; set; }
}

var q = from c in dbContext.Categorias
                join cc in dbContext.Categorias on c.CategoriaId equals cc.Id
                group c by c.CategoriaId into g
                select new { Id = g.Key.Value, total = g.Count() };

Throws the following exception:

{"Unable to cast object of type 'Microsoft.Data.Entity.Query.QueryResultScope`1[<>f__AnonymousType3`2[nyp.DataModels.Categoria,nyp.DataModels.Categoria]]' to type 'Microsoft.Data.Entity.Query.QueryResultScope`1[nyp.DataModels.Categoria]'."}

This is for a recursive tree-like structure. Here I'm just trying to count children.

@rowanmiller

This comment has been minimized.

Copy link
Member

commented Dec 7, 2015

@anpete should this go to backlog for now?

@anpete

This comment has been minimized.

Copy link

commented Dec 7, 2015

@rowanmiller Sounds good.

@rowanmiller rowanmiller removed the pri0 label Dec 7, 2015

@rowanmiller rowanmiller modified the milestones: Backlog, 7.0.0 Dec 7, 2015

@YehudahA

This comment has been minimized.

Copy link

commented Dec 9, 2015

Why it's go to backlog?!
It's critical for for information and reports systems.
For example, I have 2 million records and I need to show sales by customer. Grouping in client side, is impossible!

@ErikEJ

This comment has been minimized.

Copy link
Contributor

commented Dec 9, 2015

View/ Indexed view?

@rowanmiller

This comment has been minimized.

Copy link
Member

commented Dec 9, 2015

Or a raw SQL query. It's not that we aren't going to support it, just that we don't have time to get this translating (and stabilize it) before the 7.0.0 release. This one will be one of the first things we tackle after that.

@rjperes

This comment has been minimized.

Copy link

commented Jul 22, 2016

What would be the recommended approach using SQL (FromSql)?
The problem I see is that we need an entity to materialize the records into, and I think it would need to be mapped.

@divega divega changed the title Relational: Support translating of GroupBy() to SQL Relational: Support translating GroupBy() to SQL Aug 8, 2016

@divega

This comment has been minimized.

Copy link
Member Author

commented Aug 8, 2016

@rjperes that would require a feature we are planning to add in 1.1.0: the ability to boostrap queries using arbitrary types: #1862.

@chris-oswald

This comment has been minimized.

Copy link

commented Nov 28, 2016

@rowanmiller I was hoping this was a 1.1 milestone, but I don't see this in 1.2 either. Any idea when this will get pulled off the backlog? I know we have work-arounds in the mean time.

@rowanmiller

This comment has been minimized.

Copy link
Member

commented Nov 28, 2016

@chris-oswald not yet, we are planning for the next release this week.

@simeyla

This comment has been minimized.

Copy link

commented Dec 8, 2016

It's good to at least see it is listed as item #2 in the Critical O/RM features section.

Even if all situations cannot be accounted for - I hope we are going to at least get the simplest cases covered.

And to be fair @chris-oswald there is literally nothing in the current 1.2 feature list - you scared me for a moment!

@tuespetre

This comment has been minimized.

Copy link
Contributor

commented Jan 13, 2017

Heads up: I have made a lot of headway in this area. Expect a PR soon 😸

@smitpatel

This comment has been minimized.

Copy link
Contributor

commented Oct 12, 2017

@jzabroski - At present grouping on constant does not throw. It would evaluate it on client. Change it to throw an exception is breaking change.

@sonphnt

This comment has been minimized.

Copy link

commented Dec 2, 2017

Hi. Have we had any supports about this?

@mchenx

This comment has been minimized.

Copy link

commented Feb 20, 2018

This seems still not suppprted... I'm currently working on a project and facing the performance issue as it doesn't translate to select count(*), xxxx group by xxxx , it actually fetches all the data from DB and does group by locally which causes the performance issue, coz I have got a million records...

var counts = (from t in _dbContext.VoteTickets
join m in _dbContext.VoteItemData
on t.VoteItemId equals m.Id
where m.City == city
group t by m.District into grp
select new
{
District = grp.Key,
Count = grp.Count()
})
.ToArray();

@jzabroski

This comment has been minimized.

Copy link

commented Feb 20, 2018

@smitpatel

This comment has been minimized.

Copy link
Contributor

commented Feb 20, 2018

upgrading EF6 to EFCore will bring unexpected surprises.
EF6 to EF Core is not in-place upgrade. EF Core is rewrite of EF6 and in many cases have behavioral differences. Any application expecting the group by constant to work same as EF6 is gonna fail regardless of it client eval or throw. By doing client eval, there is higher probability of external tools may just work.

@Ben-Pattinson

This comment has been minimized.

Copy link

commented Feb 20, 2018

@mchenx I'm affraid you won't get any joy with EF Core anytime soon. I recommend doing the simple stuff in EF Core and anything complicated in dapper or similar. It really is WAY easier than even full EF. Yes there's hard coded strings, but with string interpolation, nameof and a data layer, you can really make it robust.
Give it a shot, you'll be surprised.

@ethanli83

This comment has been minimized.

Copy link

commented Feb 26, 2018

@Ben-Pattinson, maybe you will be interested in this project I worked on EFSqlTranslator. It translates EFCore lambda into string and executes it on Dapper.

@Ben-Pattinson

This comment has been minimized.

Copy link

commented Feb 26, 2018

@ethanli83, very nice! I must say, initially I was rather sceptical, having been badly burnt by EF's horrific SQL generation. Assuming you haven't cherry picked your examples of SQL generation, they are nice and readable, and moreover, sane. Very un-EF like. You'll get into trouble for stuff like that, the Union will be along in a moment to give you a good talking to :) Comprehensive examples too, keep up the good work!

@ethanli83

This comment has been minimized.

Copy link

commented Feb 27, 2018

Thank you @Ben-Pattinson, I will take a look at Union!~

@anpete

This comment has been minimized.

Copy link

commented Feb 27, 2018

@Ben-Pattinson SQL generation is something we have tried really hard to improve in EF Core. Do you have some examples of where you feel we can improve?

@Ben-Pattinson

This comment has been minimized.

Copy link

commented Feb 27, 2018

@anpete (sorry for the rambling reply)
I probably should have been more specific, I was referring to full framework's EF sql. I have spent literally days trying to get full EF to output sql that did what I meant. Trying to decipher the generated sql was often a nightmare. Getting the same operation was minutes in dapper. This is why I settled on a hybrid approach. Use EF for CRUD and simple selects, dapper for anything complex. When I started porting to core, I was pleased to see the SQL is indeed much more readable, which is a great improvement.... however imho the lack of group-by support makes it a bit of a joke.
The silent nature in which the group-by executes client-side is particularly nasty for the unwary. It reinforces my position of allways checking the SQL EF outputs - which the new generation does really help with.
I can work round this new limitation by simply extending my dapper use, but I'm sure you can see the problems anyone not on a hybrid approach will have.

I appreciate the problems you have when you need to support the same linq syntax for many different types of query, but when personal projects are running rings around your official data layer implementation, something isn't right.

Personally, all I ever wanted was to write sql in my data layer and have it compile time validated against my data layer. It's so much easier and flexible than translating between unrelated syntaxes. I've got that now with dapper, nameof, string interpolation and t4 templates. It would have been nice to get slicker intellisense support, but it's pretty usable.

@jzabroski

This comment has been minimized.

Copy link

commented Feb 27, 2018

@jzabroski

This comment has been minimized.

Copy link

commented Feb 27, 2018

@anpete

This comment has been minimized.

Copy link

commented Feb 28, 2018

@Ben-Pattinson Thanks for the feedback and glad to hear that you have come up with a solution that works for you.

A couple of responses to the points you make: LINQ GroupBy is very different from SQL GROUP BY and so we need client eval here for many cases anyway. It was a shame we weren't able to get to GroupBy with aggregates translation until 2.1, but it is almost here. One reason for this prioritization was that we have the FromSql API, which gives you a Dapper-like experience for creating queries with raw SQL. Have you tried it? If so, it would be great if you could let us know why it is not working for you - It could be because we didn't have Query Types, which are also landing in 2.1, and make the FromSql experience much better because you no longer need an entity result type.

@simeyla

This comment has been minimized.

Copy link

commented Mar 4, 2018

@anpete It's really great to finally see this working :-)

But oddly enough I still receive the error message telling me it cannot translate to GROUP BY, even though the generated SQL right underneath most certainly uses GROUP BY! I assume this must be a known issue?

Entity Framework Core 2.1.0-preview1-28290 initialized ......

warn: Microsoft.EntityFrameworkCore.Query[20500]
The LINQ expression 'GroupBy(new <>f__AnonymousType1`1(OrderDateDt = [x].OrderDateDt), [x])' could not be translated and will be evaluated locally.

info: Microsoft.EntityFrameworkCore.Database.Command[20101]
Executed DbCommand (33ms) [Parameters=[], CommandType='Text', CommandTimeout='30']


      SELECT [x].[OrderDateDt], COUNT(*)
      FROM [Order] AS [x]
      WHERE [x].[CompletedOrderId] IS NOT NULL AND ([x].[OrderDateDt] > DATEADD(day, -30E0, GETDATE()))
      GROUP BY [x].[OrderDateDt]

This is from this source C#

        var groupedOrders = _context.Order
                    .Where(x => x.CompletedOrderId != null)
                    .Where(x => x.OrderDateDt > DateTime.Now.AddDays(-30))
                    .GroupBy(x => new { x.OrderDateDt })
                    .Select(x => new
                    {
                        Date = x.Key.OrderDateDt,
                        Count = x.Count()
                    })
                    .ToList();

It returns instantly and definitely works as expected so I'm happy for now - just wanted to point this out.

@smitpatel

This comment has been minimized.

Copy link
Contributor

commented Mar 5, 2018

@simeyla - Thanks for info. I have filed #11157 to track the issue and fix it.

smitpatel added a commit that referenced this issue Mar 14, 2018

Query: Improvements to Relational GroupBy translation for composition
- Add support for translating OrderBy after GroupBy operator
- Add support for `HAVING` clause in SQL which would be generated when translating predicate after GroupByAggregate Resolves #10870
- Make sure client eval warning is not issued when translating GroupByAggregate Resolves #11157
- GroupBy Aggregate works when element/result selector is DTO instead of anonymous type Resolves #11176 (KeySelector has to be client evaluated)
- Make sure that SQL added to GROUP BY clause is not aliased Resolves #11218
- Translate GroupBy Constant/Parameter with aggregates Resolves #9969

Part of #10012
Part of #2341

smitpatel added a commit that referenced this issue Mar 14, 2018

Query: Improvements to Relational GroupBy translation for composition
- Add support for translating OrderBy after GroupBy operator
- Add support for `HAVING` clause in SQL which would be generated when translating predicate after GroupByAggregate Resolves #10870
- Make sure client eval warning is not issued when translating GroupByAggregate Resolves #11157
- GroupBy Aggregate works when element/result selector is DTO instead of anonymous type Resolves #11176 (KeySelector has to be client evaluated)
- Make sure that SQL added to GROUP BY clause is not aliased Resolves #11218
- Translate GroupBy Constant/Parameter with aggregates Resolves #9969

Part of #10012
Part of #2341

smitpatel added a commit that referenced this issue Mar 14, 2018

Query: Improvements to Relational GroupBy translation for composition
- Add support for translating OrderBy after GroupBy operator
- Add support for `HAVING` clause in SQL which would be generated when translating predicate after GroupByAggregate Resolves #10870
- Make sure client eval warning is not issued when translating GroupByAggregate Resolves #11157
- GroupBy Aggregate works when element/result selector is DTO instead of anonymous type Resolves #11176 (KeySelector has to be client evaluated)
- Make sure that SQL added to GROUP BY clause is not aliased Resolves #11218
- Translate GroupBy Constant/Parameter with aggregates Resolves #9969

Part of #10012
Part of #2341
@smitpatel

This comment has been minimized.

Copy link
Contributor

commented Mar 22, 2018

For preview2 version, there are more patterns now being translated to server (including group by constant). I have updated first post to capture the details.

@artemvalmus

This comment has been minimized.

Copy link

commented Aug 14, 2018

Please let me know if I can solve the following in 2.1.1:

I need to take the latest item from each group, but I receive warnings saying the query cannot be translated. Is there a workaround and is it going to be possible in the foreseeable future?

Example:

    public class Author
    {
        public int AuthorId { get; set; }
        public string FirstName { get; set; }
        public string LastName { get; set; }
    }

    public class Book
    {
        public int BookId { get; set; }
        public int AuthorId { get; set; }
        public string Name { get; set; }
        public DateTime CreatedAt { get; set; }

        public virtual Author Author { get; set; }
    }

    public class EFCoreDemoContext : DbContext
    {
        public static readonly LoggerFactory MyLoggerFactory
            = new LoggerFactory(new[] { new ConsoleLoggerProvider((_, __) => true, true) });

        public DbSet<Author> Authors { get; set; }
        public DbSet<Book> Books { get; set; }

        protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        {
            optionsBuilder.UseLoggerFactory(MyLoggerFactory).UseSqlServer("Server=(localdb)\\MSSQLLocalDB;Database=Test;Trusted_Connection=True");
        }
    }

    static void Main(string[] args)
    {
        using (var context = new EFCoreDemoContext())
        {
            context.Database.EnsureCreated();

            // works as expected
            var test1 = context.Books.GroupBy(x => x.AuthorId).Select(x => x.Count()).ToList();

            // warn: The LINQ expression 'GroupBy([x].AuthorId, [x])' could not be translated and will be evaluated locally.
            // warn: The LINQ expression 'orderby [b].CreatedAt asc' could not be translated and will be evaluated locally.
            // warn: The LINQ expression 'Last()' could not be translated and will be evaluated locally.
            var test2 = context.Books.GroupBy(x => x.AuthorId).Select(x => x.OrderBy(b => b.CreatedAt).Last()).ToList();
        }

        Console.ReadKey();
    }
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.