Skip to content

Avoiding SQL Server plan cache pollution with EF Core 3 and Enumerable.Contains() #2

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
ErikEJ opened this issue Mar 29, 2020 · 21 comments
Labels
comment Blog comment

Comments

@ErikEJ
Copy link
Owner

ErikEJ commented Mar 29, 2020

No description provided.

@roji
Copy link

roji commented Mar 30, 2020

Any reason to generate so many ORs rather than to just put the parameter placeholders in the IN parentheses?

@ErikEJ
Copy link
Owner Author

ErikEJ commented Mar 31, 2020

Just because I would not know how to do that! Do you think it makes any difference performance wise?

@joelweiss
Copy link

Do you think it makes any difference performance wise?

this says no

@ksks88
Copy link

ksks88 commented May 10, 2020

Hey @ErikEJ,

Do you think it would be safe to remove the condition for checking argument length for Postgresql (Npgsql)?

Good job BTW!

@ErikEJ
Copy link
Owner Author

ErikEJ commented May 10, 2020

Yes, looks like some source say short.MaxValue is the limit, but not sure about the ADO.NET provider - @roji ?

@roji
Copy link

roji commented May 10, 2020

Sorry, am missing some context... What exactly are we talking about here? :)

@ksks88
Copy link

ksks88 commented May 10, 2020

@roji
Copy link

roji commented May 11, 2020

Oh, OK. Yeah, PostgreSQL supports up to ushort.MaxValue (65535) parameters for each statement - but a batched command can have more than 65535 parameters (as long as no single statement does). What current argument length check do you have?

@ErikEJ
Copy link
Owner Author

ErikEJ commented May 11, 2020

Current max for SQL Server is 2048, so it can be much higher for postgress

@roji
Copy link

roji commented May 11, 2020

Yep, absolutely.

@ransagy
Copy link

ransagy commented Nov 21, 2021

Sorry for raising the dead here a bit, But im trying to put it into context and this seemed a better location than the EF Core issue for it (if you think otherwise, let me know and ill mirror there):

You chose to update only the "bucketed" solution from divega's proposals - Is that solely because its more globally supported by the various DB vendors? Strictly in SQL Server (and Azure SQL), do you think the string_split or TVP solutions are preferable in the context of plan cache pollution?

I'm specifically more worried about using FromSql/FromSqlInterpolated if i can avoid it entirely.

@ErikEJ
Copy link
Owner Author

ErikEJ commented Nov 21, 2021

I am actually writing a blog post about using string_split just now.

This solution is provider agnostic, string_split SQL Server only.

And when I wrote the code, TVF mapping was not available in EF Core, it is now, so no FromSql needed!

@ransagy
Copy link

ransagy commented Nov 21, 2021

Then i'll be looking forward for that article and will take another look at TVFs (which i thought not relevant in this context for some reason). Thanks!

@ErikEJ
Copy link
Owner Author

ErikEJ commented Nov 21, 2021

I just blogged: "Avoiding SQL Server plan cache pollution due to unparameterized Contains queries generated by EF Core"
https://erikej.github.io/efcore/sqlserver/2021/11/22/efcore-sqlserver-cache-pollution.html

@joelmandell
Copy link

I had to use the IQueryableExtensions.In(), to support older SQL Server versions.
And had an issue when using a library that parses the table name from generated query (I am using multiple Interceptors). The issue is when IEnumerable<TKey> values is empty.

It has the check if(!values.Any()) and returns queryable.Take(0).
When that call is used EF Core generates an subquery, and the library that I use (EFCoreSecondLevelCacheInterceptor) fails trying to fetch table name from query.

I resorted to return queryable.Where(x => true) instead. And that seems to work after that. Posting this in case someone else has this problem.

Updated code of that extension method, according to that change:

public static IQueryable<TQuery> In<TKey, TQuery>(
            this IQueryable<TQuery> queryable,
            IEnumerable<TKey> values,
            Expression<Func<TQuery, TKey>> keySelector)
        {
            if (values == null)
            {
                throw new ArgumentNullException(nameof(values));
            }

            if (keySelector == null)
            {
                throw new ArgumentNullException(nameof(keySelector));
            }

            if (!values.Any())
            {
                //.Where instead of .Take(0), cause that seem to produce "funky" SQL.
                return queryable.Where(x => true);
            }

            var distinctValues = Bucketize(values);

            if (distinctValues.Length > 2048)
            {
                throw new ArgumentException("Too many parameters for SQL Server, reduce the number of parameters", nameof(keySelector));
            }

            var predicates = distinctValues
                .Select(v =>
                {
                    // Create an expression that captures the variable so EF can turn this into a parameterized SQL query
                    Expression<Func<TKey>> valueAsExpression = () => v;
                    return Expression.Equal(keySelector.Body, valueAsExpression.Body);
                })
                .ToList();

            while (predicates.Count > 1)
            {
                predicates = PairWise(predicates).Select(p => Expression.OrElse(p.Item1, p.Item2)).ToList();
            }

            var body = predicates.Single();

            var clause = Expression.Lambda<Func<TQuery, bool>>(body, keySelector.Parameters);

            return queryable.Where(clause);
        }

@ErikEJ ErikEJ changed the title Discussion Avoiding SQL Server plan cache pollution with EF Core 3 and Enumerable.Contains() Jan 26, 2022
Copy link

I really like the solution proposed but can't seem to get the query below working. I also tried using String_Split solution but that defaults to nvarchar(4000) converting to varchar kills the db :) as db I am working on is using varchars

var childItemFilter = List{"childItem1","childItem2"};
itemQuery = itemQuery.Where(m => m.ChildItems.Any(s => childItemFilter.Contains(s.childItemReference)));

@ErikEJ
Copy link
Owner Author

ErikEJ commented Feb 5, 2022

@Henery309 thanks for the kind words.

I do not see any attempt to use the In extension method in the code Shared. Please share a full repro and I will have a look.

Also, what does not working mean?

Copy link

Thanks for helping Erik appreciate it a lot.

I have the following query which retrieves all projects that are assigned to staff members
Because I am using a contains in the linq query sql is generating 1 sql plan for every staff member.

var dbcontet = new ProjectDbContext();
var query = dbcontext.Projects.Where(m=> m.StaffMembers.Any(s=> staffReferences.Contains(s.StaffReference)));

I tried using the extension method IN:
query = query.In(staffReferences, s => s.StaffMembers.SelectMany(s => s.StaffReference));

But this fails with the following error message:
System.InvalidOperationException: 'The LINQ expression 's => s.StaffReference' could not be translated. Either rewrite the query in a form that can be translated, or switch to client evaluation explicitly by inserting a call to 'AsEnumerable', 'AsAsyncEnumerable', 'ToList', or 'ToListAsync'. See https://go.microsoft.com/fwlink/?linkid=2101038 for more information.'

@ErikEJ
Copy link
Owner Author

ErikEJ commented Feb 6, 2022

@Henery309 I need a complete runnable repo in order to help, not code fragments

Copy link

@ErikEJ have prepared a running example: https://github.com/Henery309/EfCoreTest
The runnable code uses sqllite as the database and outputs the generated sql to the console.

Using the following code:
var projs = db.Projects.In(staffIds, m => m.StaffMembers.Single().Id).ToList();

The output I get is:

SELECT [p].[Id], [p].[Name]
FROM [dbo].[Project] AS [p]
WHERE ((
SELECT TOP(1) [s].[Id]
FROM [dbo].[StaffMember] AS [s]
WHERE [p].[Id] = [s].[ProjectId]) = @__v_0) OR ((
SELECT TOP(1) [s0].[Id]
FROM [dbo].[StaffMember] AS [s0]
WHERE [p].[Id] = [s0].[ProjectId]) = @__v_1)

multiple selects are added to where clause

I am trying to figure out how to get:

SELECT [p].[Id], [p].[Name]
FROM [dbo].[Project] AS [p]
WHERE EXISTS (
SELECT 1
FROM [dbo].[StaffMember] AS [s]
WHERE ([p].[Id] = [s].[ProjectId]) AND ([s].[Id] = @param1 OR [s].Id == @Param2))

@ErikEJ
Copy link
Owner Author

ErikEJ commented Feb 6, 2022

@Henery309 Hmmm... This is made to solve a SQL Server specific issue with hardcoded paramters, and I see no hardcoded values in the SQL your are sharing. If you use SQLite, this may not be needed at all.

I think SQL Server does not have issus with the multiple SELECTs. I am not sure which implementation if the In method you are using, but there are some suggesions for collapsing ORs in the comments in the gist.

@ErikEJ ErikEJ added the comment Blog comment label Feb 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comment Blog comment
Projects
None yet
Development

No branches or pull requests

7 participants