Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EF7 generating incorrect SQL for the Concat/Union All #30273

Closed
onlypritam opened this issue Feb 13, 2023 · 14 comments · Fixed by #30451 or #30452
Closed

EF7 generating incorrect SQL for the Concat/Union All #30273

onlypritam opened this issue Feb 13, 2023 · 14 comments · Fixed by #30451 or #30452
Assignees
Labels
area-query area-set-operations closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. customer-reported regression Servicing-approved type-bug
Milestone

Comments

@onlypritam
Copy link

It seems the same code that used to work in EF core 6.4.4 is not breaking in EF 7 and creating an incorrect SQL statement.

Because of this incorrect SQL we are getting an error: "All queries combined using a UNION, INTERSECT or EXCEPT operator must have an equal number of expressions in their target lists."

//Just create a Student class with 2 or 3 fields (with one of the varchar column name as City), and create the table from it. Enter some dummy data (use Delhi, Bangalore And Kolkata in the city column or change the filters in the code below accordingly) Then execute the following code 
//Basically the "Union All" SQL query that it's creating is incorrect

public static string M1()
{
BDbContext db = new BDbContext();
IQueryable<Student> DelhiStudents = (from N in db.students where N.City== "Delhi" select N).Distinct();
IQueryable<Student> BlrStudents = (from N in db.students where N.City == "Bangalore" select N).Distinct();
IQueryable<Student> KolStudents = (from N in db.students where N.City == "Kolkata" select N).Distinct();
IQueryable<Student> AllStudents= DelhiStudents.Concat(BlrStudents).Concat(KolStudents);
IList<string>Students= AllStudents.Select(x=>x.Name).ToList(); //<< this is the line of code that throws the exception>>
return "Success";
}

Include stack traces

Microsoft.Data.SqlClient.SqlException
HResult=0x80131904
Message=All queries combined using a UNION, INTERSECT or EXCEPT operator must have an equal number of expressions in their target lists.
  Source=Core Microsoft SqlClient Data Provider
  StackTrace:
  at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
  at Microsoft.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
  at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
  at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
  at Microsoft.Data.SqlClient.SqlDataReader.TryConsumeMetaData()
  at Microsoft.Data.SqlClient.SqlDataReader.get_MetaData()
  at Microsoft.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString, Boolean isInternal, Boolean forDescribeParameterEncryption, Boolean shouldCacheForAlwaysEncrypted)
 at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean isAsync, Int32 timeout, Task& task, Boolean asyncWrite, Boolean inRetry, SqlDataReader ds, Boolean describeParameterEncryptionRequest)
 at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean& usedCache, Boolean asyncWrite, Boolean inRetry, String method)
 at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method)
 at Microsoft.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior)
 at Microsoft.Data.SqlClient.SqlCommand.ExecuteDbDataReader(CommandBehavior behavior)
 at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReader(RelationalCommandParameterObject parameterObject)
 at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.Enumerator.InitializeReader(Enumerator enumerator)
 at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.Enumerator.<>c.<MoveNext>b__21_0(DbContext _, Enumerator enumerator)
 at Microsoft.EntityFrameworkCore.SqlServer.Storage.Internal.SqlServerExecutionStrategy.Execute[TState,TResult](TState state, Func`3 operation, Func`3 verifySucceeded)
 at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.Enumerator.MoveNext()
 at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
 at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
 at Beginers.Program.M1() 
 at Beginers.Program.Main(String[] args) 

### Include provider and version information

EF Core version: 7.x
Database provider: (e.g. Microsoft.EntityFrameworkCore.SqlServer)
Target framework: ( .NET 7.0)
Operating system: Windows 10
IDE: (Visual Studio 2022 17.4)
@ajcvickers
Copy link
Member

This issue is lacking enough information for us to be able to fully understand what is happening. Please attach a small, runnable project or post a small, runnable code listing that reproduces what you are seeing so that we can investigate.

@onlypritam
Copy link
Author

Beginers.NET6 bug check.zip

@roji
Copy link
Member

roji commented Feb 20, 2023

Minimal repro:

await using var ctx = new BlogContext();
await ctx.Database.EnsureDeletedAsync();
await ctx.Database.EnsureCreatedAsync();

_ = ctx.Blogs
    .Distinct()
    .Concat(ctx.Blogs.Distinct())
    .Concat(ctx.Blogs.Distinct())
    .Select(s => s.Name)
    .ToList();

public class BlogContext : DbContext
{
    public DbSet<Blog> Blogs { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder
            .UseSqlServer(@"Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0;Encrypt=false")
            .LogTo(Console.WriteLine, LogLevel.Information)
            .EnableSensitiveDataLogging();
}

public class Blog
{
    public int Id { get; set; }
    public string? Name { get; set; }
}

Resulting SQL:

SELECT [t].[Name]
FROM (
    SELECT DISTINCT [b].[Id], [b].[Name]
    FROM [Blogs] AS [b]
    UNION ALL
    SELECT DISTINCT [b0].[Id], [b0].[Name]
    FROM [Blogs] AS [b0]
) AS [t]
UNION ALL
SELECT DISTINCT [b1].[Id], [b1].[Name]
FROM [Blogs] AS [b1]

Note that we only project out the name from the subquery, and attempt to union that with id and name. Ideally we'd filter the projection down into the 3 different joined queries (and avoid the subquery altogether).

@roji
Copy link
Member

roji commented Feb 22, 2023

As a workaround specifically for the above issue, it's possible to perform the concatenation on the client side; where the concatenation is performed (server- or client-side) doesn't impact performance in a significant way, since the same data has to be delivered across in any case.

This can be done by adding AsEnumerable() (causing it to get executed separately) at the end of each sub-query, and pass the results to Concat on the client side. For example, the original code can be modified as follows:

public static string M1()
{
    BDbContext db = new BDbContext();
    var DelhiStudents = (from N in db.students where N.City== "Delhi" select N.Name).Distinct().AsEnumerable();
    var BlrStudents = (from N in db.students where N.City == "Bangalore" select N.Name).Distinct().AsEnumerable();
    var KolStudents = (from N in db.students where N.City == "Kolkata" select N.Name).Distinct().AsEnumerable;
    var StudentNames = DelhiStudents.Concat(BlrStudents).Concat(KolStudents);
    return "Success";
}

@stevendarby
Copy link
Contributor

Although there does seem to be a bug here I just wanted to note that OP's example doesn't do Distinct() on the Name column (as in the workaround above), but on the whole entity. Given the entity has a key (Id), I don't think the Distinct()s in OP's code are really doing anything useful and could be removed entirely for a simpler workaround for that specific case?

@ajcvickers ajcvickers added this to the Backlog milestone Feb 22, 2023
@roji
Copy link
Member

roji commented Feb 22, 2023

@stevendarby thanks, yeah, that's possible - if Distinct isn't needed (due to the primary key be unique), then removing it should indeed be the better workaround.

@lvendramini
Copy link

I have the same issue

Entity Framework version 7.0.2, behaves differently than the version 6.0.6 we are using until 7 is fixed.

Our code is
IQueryable wpParties =
(from a in _context.OTAddresses.Where(e => staffRegionCvIds.Contains((Guid)e.RegionCvId) && !e.IsDeleted)
join lp in _context.O_V_Latest_Party
on a.PartyId equals lp.Id
join wp in _context.OTWorkplaces.Where(e => !e.IsDeleted)
on lp.Id equals wp.Id
select lp).Distinct();

IQueryable orgParties =
(from wp in wpParties
join pr in _context.OTPartyRelationships.Where(e => !e.IsDeleted)
on wp.CentralPartyId equals pr.ParentCentralPartyId
join lp in _context.O_V_Latest_Party on pr.ChildCentralPartyId equals lp.CentralPartyId
join org in _context.OTOrganizations.Where(e => e.IsActive && !e.IsDeleted) on lp.Id equals org.Id
select lp
).Distinct();

IQueryable individuals =
(from lp in _context.O_V_Latest_Party
join ind in _context.OTIndividuals on lp.Id equals ind.Id
select lp
).Distinct();

IQueryable latestParties = null!;
latestParties = wpParties.Concat(orgParties).Concat(individuals);

IList latestCentralPartyIds = null!;
latestCentralPartyIds = await latestParties.Select(x => x.CentralPartyId).ToListAsync();

What we expect (which was previously working on the older release) if for the 3 lists wpParties, orgParties, individuals to be combined, then get all the CentralPartyIds into a list.
What seems to be happening it unions the first lists (wpParties and orgParties), then does a select centralPartyId on it. Then it tries to union the individual list on (but the columns do not match so it fails).

EF Core 7.0.2 --> this is not correct. Below is what is happening giving an error since it's trying to union before the last concat
Select Id
from ( Select ID, Name From TabA Union all Select Id, Name from TabB) as t0
union all
Select ID,Name from TabC

Old Version - EF 6.0.6  this is correct
Select Id
from ( Select ID, Name From TabA Union all Select Id, Name from TabB union all Select ID,Name from TabC) as t0

@roji
Copy link
Member

roji commented Feb 24, 2023

Confirmed that this is a regression from 6.0, which produces the following SQL:

SELECT [t0].[Name]
FROM (
    SELECT DISTINCT [b].[Id], [b].[Name]
    FROM [Blogs] AS [b]
    UNION ALL
    SELECT DISTINCT [b0].[Id], [b0].[Name]
    FROM [Blogs] AS [b0]
    UNION ALL
    SELECT DISTINCT [b1].[Id], [b1].[Name]
    FROM [Blogs] AS [b1]
) AS [t0]

@roji roji removed this from the Backlog milestone Feb 24, 2023
@lvendramini
Copy link

As a workaround specifically for the above issue, it's possible to perform the concatenation on the client side; where the concatenation is performed (server- or client-side) doesn't impact performance in a significant way, since the same data has to be delivered across in any case.

This can be done by adding AsEnumerable() (causing it to get executed separately) at the end of each sub-query, and pass the results to Concat on the client side. For example, the original code can be modified as follows:

public static string M1()
{
    BDbContext db = new BDbContext();
    var DelhiStudents = (from N in db.students where N.City== "Delhi" select N.Name).Distinct().AsEnumerable();
    var BlrStudents = (from N in db.students where N.City == "Bangalore" select N.Name).Distinct().AsEnumerable();
    var KolStudents = (from N in db.students where N.City == "Kolkata" select N.Name).Distinct().AsEnumerable;
    var StudentNames = DelhiStudents.Concat(BlrStudents).Concat(KolStudents);
    return "Success";
}

This would around would not work for us. It is bring back much more data than is needed.

@roji
Copy link
Member

roji commented Feb 24, 2023

This would around would not work for us. It is bring back much more data than is needed

@lvendramini I'm not sure how that could be. Concat brings the same amount of data regardless of whether it's evaluated at the client or at the server.

@ajcvickers
Copy link
Member

Note from triage: given that this looks like a regression from EF Core 6 to EF Core 7 (and not a difference in behavior between EF6 and EF Core as originally reported), we will investigate a patch fix. /cc @maumar

@maumar
Copy link
Contributor

maumar commented Mar 9, 2023

problem is in SelectExpression pruning logic. For set ops we prune both sources individually, however in the problematic case, the second source is distinct, so we can't prune it (we would change the result if we did), but the first source is UNION ALL, so it's ok to prune. To mitigate this, we can check if any of the sources have Distinct on them before we attempt to prune either.

@maumar
Copy link
Contributor

maumar commented Mar 9, 2023

@onlypritam as alternative workaround you can try removing 3 distincts applied to individual cities:

public static string M1()
{
BDbContext db = new BDbContext();
IQueryable<Student> DelhiStudents = (from N in db.students where N.City== "Delhi" select N);// removed distinct
IQueryable<Student> BlrStudents = (from N in db.students where N.City == "Bangalore" select N);// removed distinct
IQueryable<Student> KolStudents = (from N in db.students where N.City == "Kolkata" select N);// removed distinct
IQueryable<Student> AllStudents= DelhiStudents.Concat(BlrStudents).Concat(KolStudents); 
IList<string>Students= AllStudents.Select(x=>x.Name).ToList(); 
return "Success";
}

those 3 distincts you had don't do anything anyway - students are entities so they are already distinct within DbSet.

maumar added a commit that referenced this issue Mar 9, 2023
…, The same code was working fine for EF6.4.4

Problem was that during SelectExpression pruning for set operations we would try to prune both sources independently. However, if a source is Distinct we can't prune (that could affect the result). So the problematic case is where we prune one source but can't prune the second due to Distinct, and we end up with both sources having not matching projections.
Fix is to check if either source has Distinct before attempting to prune them.

Fixed #30273
maumar added a commit that referenced this issue Mar 10, 2023
…, The same code was working fine for EF6.4.4

Problem was that during SelectExpression pruning for set operations we would try to prune both sources independently. However, if a source is Distinct we can't prune (that could affect the result). So the problematic case is where we prune one source but can't prune the second due to Distinct, and we end up with both sources having not matching projections. Fix is to check if either source has Distinct before attempting to prune them.

Fixed #30273
@maumar maumar added the closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. label Mar 10, 2023
maumar added a commit that referenced this issue Mar 10, 2023
…, The same code was working fine for EF6.4.4 (#30452)

Problem was that during SelectExpression pruning for set operations we would try to prune both sources independently. However, if a source is Distinct we can't prune (that could affect the result). So the problematic case is where we prune one source but can't prune the second due to Distinct, and we end up with both sources having not matching projections. Fix is to check if either source has Distinct before attempting to prune them.

Fixed #30273
@maumar maumar reopened this Mar 12, 2023
@maumar
Copy link
Contributor

maumar commented Mar 12, 2023

reopening for potential patch

@maumar maumar added this to the 7.0.5 milestone Mar 14, 2023
ajcvickers pushed a commit that referenced this issue Mar 14, 2023
…ncat/Union All, The same code was working fine for EF6.4.4 (#30451)

Fixed #30273
@ajcvickers ajcvickers changed the title EF7 generating incorrect SQL for the Concat/Union All, The same code was working fine for EF6.4.4 EF7 generating incorrect SQL for the Concat/Union All Mar 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment