Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Function counts differently when collection from LINQ isn't materialized #12315

Closed
pklejnowski opened this issue Jun 9, 2018 · 33 comments

Comments

@pklejnowski
Copy link

commented Jun 9, 2018

After I updated my application from .net Core 2.0 to 2.1 (with packages) I've noticed that one function is returning only half records. So I was trying to debug this problem and check what I've found:
image

That method returns only 5 elements (what is totally incorrect). I put the same function (but with .ToList() before .Take(pageSize) in watch and it returns 10 elements (what is correct).

Generated SQL query that I catched in SQL Profiler returns 10 elements.

Steps to reproduce

The bug is in LINQ query

public Page<GroupTotal> Get(QueryParameter queryParameter)
{
    var summariesQuery = _context.Groups.AsNoTracking()
        .Where(g => g.IsActive)
        .GroupBy(g => new { g.GroupName, g.GroupId });
    
    return summariesQuery
        .Select(summary => new GroupTotal
        {
            Id = summary.Key.GroupId,
            Name = summary.Key.GroupName,
            Mode = summary.Select(s => s.Mode).ToArray() // THIS LINE MAKES PROBLEM
        })
        .ToPage(queryParameter.PageSize, queryParameter.PageNumber);
}

public static Page<T> ToPage<T>(this IQueryable<T> source, int pageSize, int pageNumber)
{
    var items = source.Skip(pageSize * (pageNumber - 1))
        .Take(pageSize)
        .ToList();

    return new Page<T>
    {
        PageSize = pageSize,
        PageNumber = pageNumber,
        Items = items,
        TotalCount = source.Count()
    };
}

If I'll remove Mode = summary.Select(s => s.Mode).ToArray() this line from my LINQ query then everything works fine.

Further technical details

EF Core version: EntityFrameworkCore.SqlServer 2.1.0
Database Provider: Microsoft.EntityFrameworkCore.SqlServer 2.1.0
Operating system: Windows 10
IDE: Visual Studio 2017 15.7.3

@pklejnowski

This comment has been minimized.

Copy link
Author

commented Jun 11, 2018

Perhaps it might be because of .GroupBy(g => new { g.GroupName, g.GroupId }); I heard that peopole have problem with GroupBy in LINQ

@sdanyliv

This comment has been minimized.

Copy link

commented Jun 11, 2018

I’m not so expereinced in EF but i’m good in SQL and LINQ, so there are my 5 cents.
Your query is not effective for SQL Server and EF have to create special transformation of this query to:

var summariesGrouped = _context.Groups.AsNoTracking()
        .Where(g => g.IsActive)
        .Select(g => new { g.GroupName, g.GroupId })
        .Distinct();

var summariesPaged = summariesGrouped
        .Skip(pageSize * (pageNumber - 1))
        .Take(pageSize);

return summariesPaged.
        .Join(_context.Groups, l => new { l.GroupName, l.GroupId }, r => new { r.GroupName, r.GroupId },
              (l, r) => new { r.GroupName, r.GroupId, r.Mode })
        .AsEnumerable()  // enforce client side evaluation
        .GroupBy(g => new { g.GroupName, g.GroupId })
        .Select(summary => new GroupTotal
        {
            Id = summary.Key.GroupId,
            Name = summary.Key.GroupName,
            Mode = summary.Select(s => s.Mode).ToArray() 
        });

If EF Core is not doing this non trivial transformation - your query is not effective and you will get client side evaluation of almost whole table on client side which equals to server death.

Wellcome to the real world.

@pklejnowski

This comment has been minimized.

Copy link
Author

commented Jun 11, 2018

@sdanyliv How you know that? How you know that will be transformated in that way and it won't be efficient? Or how can I learn it somehow to avoid not efficient LINQ queries in future?
Although still no idea why it works fine for EntityFrameworkCore 2.0

@sdanyliv

This comment has been minimized.

Copy link

commented Jun 11, 2018

@pklejnowski, i don‘t know how EF optimizes queries. It should, but it is on early stages of that, but possibly i’m wrong.
I have writen query that is most effective in your situation based on my experience. And just warning you - analyse SQL. If you will get expected result but SQL does not contain OFFSET, FETCH keywords - you have client evaluation.

@smitpatel

This comment has been minimized.

Copy link
Contributor

commented Jun 11, 2018

@pklejnowski - The issues people having are around Group By with aggregate operators which would generate Server GROUP BY, which we started translating in 2.1 release. This is not that case.

@maumar - Would correlated subquery optimization apply in this case? There is ToArray but also GroupBy which would be client eval.

@pklejnowski

This comment has been minimized.

Copy link
Author

commented Jun 11, 2018

As I said in first post. Generated SQL looks fine and returns correct results. I have tracked it by sql profiler.

@smitpatel

This comment has been minimized.

Copy link
Contributor

commented Jun 11, 2018

@pklejnowski - Can you post generated SQL(s)?

@sdanyliv

This comment has been minimized.

Copy link

commented Jun 11, 2018

@pklejnowski, your line with comment // THIS LINE MAKES PROBLEM brings additional complexity for the query. That’s what i’m trying to explain.

@pklejnowski

This comment has been minimized.

Copy link
Author

commented Jun 12, 2018

@smitpatel
Here is the generated query tracked by SQL Profiler:

exec sp_executesql N'
SELECT 
	[g].[GroupId], 
	[g].[GroupName],
	[g].[Mode]
FROM 
	[Read].[Groups] AS [g] 
WHERE 
	[g].[Mode] = @__mode_0 
	ORDER BY [g].[GroupName], [g].[GroupId] OFFSET @__p_1 ROWS FETCH NEXT @__p_2 ROWS ONLY',
	N'@__mode_0 int,
	@__p_1 int,
	@__p_2 int',
	@__mode_0=1,
	@__p_1=0,
	@__p_2=10

And of course it return 10 records. I don't know why LINQ query returns only 5 though.
Unless I'll materialize it before .Take(pageSize) in ToPage method

var items = source.Skip(pageSize * (pageNumber - 1))
                .ToList() // here
                .Take(pageSize)
                .ToList();

Then it returns properly result (but of course is not efficient because query will be significantly different and will take all records from database and after that will take 10 records.

Or I can remove this line Mode = summary.Select(s => s.Mode).ToArray() as I said in first post.

@pklejnowski

This comment has been minimized.

Copy link
Author

commented Jun 12, 2018

Intresting thing. I've returned to EntityFrameworkCore2.0 and I've checked generated SQL query and is without OFFSET and FETCH part. So is totally inefficient in 2.0 and in 2.1 doesn't work ^^

@sdanyliv

This comment has been minimized.

Copy link

commented Jun 12, 2018

@pklejnowski, last one try. Just trying to help you in this situation.

It is because your query is trying to retrieve details of grouped records and apply paging to that group. It's not trivial query without client evaluation and i have rewritten it to you to work as expected here. I know no one ORM that can do this automatically (I'm maintainer of one of them).

Query should have the following structure:

  1. Group records (or distinct) and select only grouping key - result1
  2. Apply paging to result1 - only at this stage you can apply paging!
  3. Join result1 to same recordset (it filters records and paging will be applied correctly) and select only needed columns - result2
  4. Group result2 on client side because we need details ummary.Select(s => s.Mode).ToArray()
@pklejnowski

This comment has been minimized.

Copy link
Author

commented Jun 12, 2018

Ahhh I see, I'll check this. By the way what is "client evaluation"? Can you explain it?

@sdanyliv

This comment has been minimized.

Copy link

commented Jun 12, 2018

Client evaluation means loading almost all table to your application and group records on your machine.
It's very dangerous because of high memory usage and may lead to OutOfMemory.

@pklejnowski

This comment has been minimized.

Copy link
Author

commented Jun 12, 2018

Thank you. What characterizes not trivial query? In the future I would like to intuitively recognize which query may pose a problem.

@sdanyliv

This comment has been minimized.

Copy link

commented Jun 12, 2018

@pklejnowski, there are a lot of them but usually you will have problem with grouping.

For GROUP BY query SQL Server can return only grouping keys and aggregation result with predefined (or custom) aggregation functions COUNT, SUM, MIN, MAX, ect..

SELECT
  GroupKey1,
  GroupKey2,
  COUNT(*),
  MIN(Field),
  ...
FROM SomeTable
GROUP BY
  GroupKey1,
  GroupKey2

It's easy and EF works well in that case and you can apply paging to this query. BUT if you need details of the grouped values - things become complicated.

For example change your query to get only first Mode:

summariesQuery
        .Select(summary => new GroupTotal
        {
            Id = summary.Key.GroupId,
            Name = summary.Key.GroupName,
            Mode = summary.Select(s => s.Mode).First() // THIS ALSO MAKES PROBLEM
        })

Solving this sample is ineffective even with my solution and can be done only by Window Functions, which are not supported by EF Core, and you have to write raw SQL. You can check my sample in our repository linq2db/linq2db#1129 (comment) , but as mentioned before, this functionality is not available in EF Core, but not forever.

@pklejnowski

This comment has been minimized.

Copy link
Author

commented Jun 12, 2018

@sdanyliv I see. You explained a lot to me. I always thought that everything in EF+LINQ is happening magically in best way but clearly not.

Based on your example if I need for my Pager total count, then it wouldn't be a problem to add var totalCount = summariesGrouped.Count() at the end?

@sdanyliv

This comment has been minimized.

Copy link

commented Jun 12, 2018

If total count of groups - yes, but notice that this will run additional query.

@pklejnowski

This comment has been minimized.

Copy link
Author

commented Jun 12, 2018

Yes I'm aware of that. But I have no choice :) I need to know how many of them are in the database, since I have to display page number list.

@sdanyliv

This comment has been minimized.

Copy link

commented Jun 12, 2018

I always thought that everything in EF+LINQ is happening magically in best way but clearly not.

Don't trust anyone :)
disabling-client-evaluation

@pklejnowski

This comment has been minimized.

Copy link
Author

commented Jun 12, 2018

This warning looks nice but is too sensitive I think. Because it throw exception even for var maxYear = _context.Groups.Max(g => g.Year);

@sdanyliv

This comment has been minimized.

Copy link

commented Jun 12, 2018

In this case also?

_context.Groups.AsNoTracking().Max(g => g.year);
@pklejnowski

This comment has been minimized.

Copy link
Author

commented Jun 12, 2018

Also. However on EntityFrameworkCore2.1 works fine even without AsNoTracking. So maybe they had bug in 2.0 version. Anyway should I turn on this option? When Client Evaluation can be useful?

@sdanyliv

This comment has been minimized.

Copy link

commented Jun 12, 2018

We had discussion about this in our team. Conclusion is: never, if you do not need change tracking for these entities. You can always turn on client side evaluation in appropriate place via AsEnumerable() or by ToArrayAsync() extension methods.
Otherwise instead of SQL Server calculation you will get uncontrolled copy of the table on the client side.

@maumar

This comment has been minimized.

Copy link
Contributor

commented Jun 12, 2018

@pklejnowski

_context.Groups.Max(g => g.Year)

should not cause client evaluation. What is the exception message you are seeing? Can you post a full code sample that shows the problem?

@pklejnowski

This comment has been minimized.

Copy link
Author

commented Jun 12, 2018

@maumar
System.InvalidOperationException: 'Warning as error exception for warning 'Microsoft.EntityFrameworkCore.Query.QueryClientEvaluationWarning': The LINQ expression 'Max()' could not be translated and will be evaluated locally. To suppress this Exception use the DbContextOptionsBuilder.ConfigureWarnings API. ConfigureWarnings can be used when overriding the DbContext.OnConfiguring method or using AddDbContext on the application service provider.'

@maumar

This comment has been minimized.

Copy link
Contributor

commented Jun 12, 2018

you only see this in 2.0 right? or 2.1 also?

@pklejnowski

This comment has been minimized.

Copy link
Author

commented Jun 12, 2018

Only 2.0, in 2.1 is fine

@ajcvickers ajcvickers added the type-bug label Jun 12, 2018
@ajcvickers ajcvickers added this to the 2.2.0 milestone Jun 12, 2018
@pklejnowski

This comment has been minimized.

Copy link
Author

commented Jun 13, 2018

@sdanyliv
Client Evaluation is the new feature of EntityFrameworkCore? Is something like that for the old EF in .net 4.5 as well? I am shocked that there is very little information in google about Client Evaluation. And VS does not warn you in any way about this unless you turn on warning manually.

@sdanyliv

This comment has been minimized.

Copy link

commented Jun 13, 2018

@pklejnowski, i'm just keeping one eye open on how things are going in EF Core. As i know this "feature" exists from first versions of EF Core.
Maybe i'm wrong but, from my observations, if EF Core can not generate SQL - client evaluation happens.

@pklejnowski

This comment has been minimized.

Copy link
Author

commented Jun 14, 2018

@sdanyliv one more question about this code you have suggested. Shouldn't I add .Where(g => g.IsActive) to query since this condition is in summariesGrouped as well? Without that data might be filtered wrong I think?

So it would be:

return summariesPaged.
        .Join(_context.Groups, l => new { l.GroupName, l.GroupId }, r => new { r.GroupName, r.GroupId },
              (l, r) => new { r.GroupName, r.GroupId, r.Mode })
        .Where(g => g.IsActive) // <<< HERE
        .AsEnumerable()  // enforce client side evaluation
        .GroupBy(g => new { g.GroupName, g.GroupId })
        .Select(summary => new GroupTotal
        {
            Id = summary.Key.GroupId,
            Name = summary.Key.GroupName,
            Mode = summary.Select(s => s.Mode).ToArray() 
        });
@sdanyliv

This comment has been minimized.

Copy link

commented Jun 14, 2018

Yes, my fault. Looks like filter needed.

@smitpatel smitpatel removed the type-bug label Jun 14, 2018
@ajcvickers ajcvickers modified the milestones: 2.2.0, 3.0.0 Aug 3, 2018
@ajcvickers ajcvickers removed this from the 3.0.0 milestone Aug 3, 2018
@ajcvickers ajcvickers assigned smitpatel and unassigned maumar Aug 6, 2018
@ajcvickers ajcvickers added this to the 3.0.0 milestone Aug 6, 2018
@ajcvickers ajcvickers added the type-bug label Aug 6, 2018
@smitpatel

This comment has been minimized.

Copy link
Contributor

commented Aug 6, 2018

@pklejnowski - Based on code snippets posted in first post, I am not able to reproduce this issue. I am seeing correct number of records and SQL generated after applying Skip/Take.
Repro code:

using System;
using System.Linq;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Logging;

namespace EFSampleApp
{
    public class Program
    {
        public static void Main(string[] args)
        {
            using (var db = new MyContext())
            {
                // Recreate database
                db.Database.EnsureDeleted();
                db.Database.EnsureCreated();

                // Seed database
                db.AddRange(new Group
                {
                    GroupName = "Group1",
                    IsActive = true,
                    Mode = new Mode()
                },
                new Group
                {
                    GroupName = "Group2",
                    IsActive = true,
                    Mode = new Mode()
                }, new Group
                {
                    GroupName = "Group3",
                    IsActive = true,
                    Mode = new Mode()
                }, new Group
                {
                    GroupName = "Group4",
                    IsActive = true,
                    Mode = new Mode()
                }, new Group
                {
                    GroupName = "Group5",
                    IsActive = true,
                    Mode = new Mode()
                }, new Group
                {
                    GroupName = "Group6",
                    IsActive = true,
                    Mode = new Mode()
                }, new Group
                {
                    GroupName = "Group7",
                    IsActive = true,
                    Mode = new Mode()
                }, new Group
                {
                    GroupName = "Group8",
                    IsActive = true,
                    Mode = new Mode()
                }, new Group
                {
                    GroupName = "Group9",
                    IsActive = true,
                    Mode = new Mode()
                }, new Group
                {
                    GroupName = "Group10",
                    IsActive = true,
                    Mode = new Mode()
                });

                db.SaveChanges();
            }

            using (var db = new MyContext())
            {
                // Run queries
                var query = db.Groups.AsNoTracking()
                    .Where(g => g.IsActive)
                    .GroupBy(g => new { g.GroupName, g.GroupId })
                    .Select(summary => new GroupTotal
                    {

                        Id = summary.Key.GroupId,
                        Name = summary.Key.GroupName,
                        Mode = summary.Select(s => s.Mode).ToArray() // THIS LINE MAKES PROBLEM
                    })
                    .Skip(0)
                    .Take(10)
                    .ToList();

                Console.WriteLine(query.Count);
            }

            Console.WriteLine("Program finished.");
        }
    }


    public class MyContext : DbContext
    {
        private static ILoggerFactory LoggerFactory => new LoggerFactory().AddConsole(LogLevel.Trace);

        // Declare DBSets
        public DbSet<Blog> Blogs { get; set; }
        public DbSet<Group> Groups { get; set; }

        protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        {
            // Select 1 provider
            optionsBuilder
                .UseSqlServer(@"Server=(localdb)\mssqllocaldb;Database=_ModelApp;Trusted_Connection=True;Connect Timeout=5;ConnectRetryCount=0")
                //.UseSqlite("filename=_modelApp.db")
                //.UseInMemoryDatabase(databaseName: "_modelApp")
                .EnableSensitiveDataLogging()
                .UseLoggerFactory(LoggerFactory);
        }

        protected override void OnModelCreating(ModelBuilder modelBuilder)
        {
            // Configure model
        }
    }

    public class Blog
    {
        public int Id { get; set; }
    }

    public class Group
    {
        public int GroupId { get; set; }
        public string GroupName { get; set; }
        public bool IsActive { get; set; }
        public Mode Mode { get; set; }
    }

    public class GroupTotal
    {
        public int Id { get; set; }
        public string Name { get; set; }
        public Mode[] Mode { get; set; }
    }

    public class Mode
    {
        public int Id { get; set; }
    }
}

Please modify above repro code to demonstrate the issue you are seeing.

@smitpatel smitpatel removed this from the 3.0.0 milestone Aug 6, 2018
@ajcvickers

This comment has been minimized.

Copy link
Member

commented Aug 13, 2018

EF Team Triage: Closing this issue as the requested additional details have not been provided and we have been unable to reproduce it.

BTW this is a canned response and may have info or details that do not directly apply to this particular issue. While we'd like to spend the time to uniquely address every incoming issue, we get a lot traffic on the EF projects and that is not practical. To ensure we maximize the time we have to work on fixing bugs, implementing new features, etc. we use canned responses for common triage decisions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.